(Variant 3) (commit in 11.4)
When a derived table has a GROUP BY clause:
SELECT ...
FROM (SELECT ... GROUP BY col1, col2) AS tbl
The optimizer would use inner join's output cardinality as an estimate
of derived table size, ignoring the fact that GROUP BY operation would
produce much fewer groups.
Add code to produce tighter bounds:
- The GROUP BY list is split into per-table lists. If GROUP BY list has
expressions that refer to multiple tables, we fall back to join output
cardinality.
- For each table, the first cardinality estimate is join_tab->read_records.
- Then, we try to get a tighter bound by using index statistics.
- If indexes do not cover all GROUP BY columns, we try to use per-column
EITS statistics.
Item_func_group_concat::print() did not take into account
that Item_func_group_concat::separator can be of a different character set
than the "String *str" (when the printing is being done to).
Therefore, printing did not work correctly for:
- non-ASCII separators when GROUP_CONCAT is done on 8bit data
or multi-byte data with mbminlen==1.
- all separators (even including simple ones like comma)
when GROUP_CONCAT is done on ucs2/utf16/utf32 data (mbminlen>1).
Because of this problem, VIEW definitions did not print correctly to
their FRM files. This later led to a wrong SELECT and SHOW CREATE output.
Fix:
- Adding new String methods:
bool append_for_single_quote_using_mb_wc(const char *str, size_t length,
CHARSET_INFO *cs);
bool append_for_single_quote_opt_convert(const char *str,
size_t length,
CHARSET_INFO *cs)
which perform both escaping and character set conversion at the same time.
- Adding a new String method escaped_wc_for_single_quote(),
to reuse the code between the old and the new methods.
- Fixing Item_func_group_concat::print() to use the new
method append_for_single_quote_opt_convert().
Under terms of MDEV 27490 we'll add support for non-BMP identifiers
and upgrade casefolding information to Unicode version 14.0.0.
In Unicode-14.0.0 conversion to lower and upper cases can increase octet length
of the string, so conversion won't be possible in-place any more.
This patch removes virtual functions performing in-place casefolding:
- my_charset_handler_st::casedn_str()
- my_charset_handler_st::caseup_str()
and fixes the code to use the non-inplace functions instead:
- my_charset_handler_st::casedn()
- my_charset_handler_st::caseup()
If a query with GROUP_CONCAT is executed then the server reports a warning
every time when the length of the result of this function exceeds the set
value of the system variable group_concat_max_len. This bug led to the set
of warnings from the second execution of the prepared statement that did
not coincide with the one from the first execution if the executed query
was a grouping query over a join of tables using GROUP_CONCAT function and
join cache was not allowed to be employed.
The descrepancy of the sets of warnings was due to lack of cleanup for
Item_func_group_concat::row_count after execution of the query.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Backported from MYSQL
Bug #25331425: DISTINCT CLAUSE DOES NOT WORK IN GROUP_CONCAT
Issue:
------
The problem occurs when:
1) GROUP_CONCAT (DISTINCT ....) is used in the query.
2) Data size greater than value of system variable:
tmp_table_size.
The result would contain values that are non-unique.
Root cause:
-----------
An in-memory structure is used to filter out non-unique
values. When the data size exceeds tmp_table_size, the
overflow is written to disk as a separate file. The
expectation here is that when all such files are merged,
the full set of unique values can be obtained.
But the Item_func_group_concat::add function is in a bit of
hurry. Even as it is adding values to the tree, it wants to
decide if a value is unique and write it to the result
buffer. This works fine if the configured maximum size is
greater than the size of the data. But since tmp_table_size
is set to a low value, the size of the tree is smaller and
hence requires the creation of multiple copies on disk.
Item_func_group_concat currently has no mechanism to merge
all the copies on disk and then generate the result. This
results in duplicate values.
Solution:
---------
In case of the DISTINCT clause, don't write to the result
buffer immediately. Do the merge and only then put the
unique values in the result buffer. This has be done in
Item_func_group_concat::val_str.
Note regarding result file changes:
-----------------------------------
Earlier when a unique value was seen in
Item_func_group_concat::add, it was dumped to the output.
So result is in the order stored in SE. But with this fix,
we wait until all the data is read and the final set of
unique values are written to output buffer. So the data
appears in the sorted order.
This only fixes the cases when we have DISTINCT without ORDER BY clause
in GROUP_CONCAT.
Backported from MYSQL
Bug #25331425: DISTINCT CLAUSE DOES NOT WORK IN GROUP_CONCAT
Issue:
------
The problem occurs when:
1) GROUP_CONCAT (DISTINCT ....) is used in the query.
2) Data size greater than value of system variable:
tmp_table_size.
The result would contain values that are non-unique.
Root cause:
-----------
An in-memory structure is used to filter out non-unique
values. When the data size exceeds tmp_table_size, the
overflow is written to disk as a separate file. The
expectation here is that when all such files are merged,
the full set of unique values can be obtained.
But the Item_func_group_concat::add function is in a bit of
hurry. Even as it is adding values to the tree, it wants to
decide if a value is unique and write it to the result
buffer. This works fine if the configured maximum size is
greater than the size of the data. But since tmp_table_size
is set to a low value, the size of the tree is smaller and
hence requires the creation of multiple copies on disk.
Item_func_group_concat currently has no mechanism to merge
all the copies on disk and then generate the result. This
results in duplicate values.
Solution:
---------
In case of the DISTINCT clause, don't write to the result
buffer immediately. Do the merge and only then put the
unique values in the result buffer. This has be done in
Item_func_group_concat::val_str.
Note regarding result file changes:
-----------------------------------
Earlier when a unique value was seen in
Item_func_group_concat::add, it was dumped to the output.
So result is in the order stored in SE. But with this fix,
we wait until all the data is read and the final set of
unique values are written to output buffer. So the data
appears in the sorted order.
This only fixes the cases when we have DISTINCT without ORDER BY clause
in GROUP_CONCAT.