In the function make_sortkey a tmp buffer was defined and in the absence of
param->tmp_buffer, tmp buffer used the sort_keys buffer. sort_keys buffer
has a length defined in sort_field->length, while param->tmp_buffer is
stored in param->rec_length. Make sure to use the appropriate length
based on which buffer we are using otherwise we'll overflow.
Also added a type cast to size_t during the calculation of the sort keys
buffer size to avoid an oveflow if the buffer size exceeds 32 bits.
The crash was caused by this problem:
get_best_group_min_max() tries to construct query plans for keys that
are not processed by the range optimizer. This wasn't a problem as long
as SEL_TREE::keys was an array of MAX_KEY elements.
However, now it is a Mem_root_array and only has elements for the used
keys, and get_best_group_min_max attempts to address beyond the end of
the array.
The obvious way to fix the crash was to port (and improve) a part of
96fcfcbd7b5120e8f64fd45985001eca8d36fbfb from mysql-5.7. This makes
get_best_group_min_max not to consider indexes that Mem_root_arrays
have no element for.
After that, I got non-sensical query plans (see MDEV-10325 for details).
Fixed that by making get_best_group_min_max to check if the index is in
table->keys_in_use_for_group_by bitmap.
- "Early NULLs filtering" optimization used to "peel off" Item_ref and
Item_direct_ref wrappers from an outside column reference before
adding "outer_table_col IS NOT NULL" into JOIN::outer_ref_cond.
- When this happened in a subquery that was evaluated in a post-GROUP-BY
context, attempt to evaluate JOIN::outer_ref_cond would fetch an
incorrect value of outer_table_col.
Fixed several optimizer issues relatied to GROUP BY:
a) Refering to a SELECT column in HAVING sometimes calculated it twice, which caused problems with non determinstic functions
b) Removing duplicate fields and constants from GROUP BY was done too late for "using index for group by" optimization to work
c) EXPLAIN SELECT ... GROUP BY did wrongly show 'Using filesort' in some cases involving "Using index for group-by"
a) was fixed by:
- Changed last argument to Item::split_sum_func2() from bool to int to allow more flags
- Added flag argument to Item::split_sum_func() to allow on to specify if the item was in the SELECT part
- Mark all split_sum_func() calls from SELECT with SPLIT_SUM_SELECT
- Changed split_sum_func2() to do nothing if called with an argument that is not a sum function and doesn't include sum functions, if we are not an argument to SELECT.
This ensures that in a case like
select a*sum(b) as f1 from t1 where a=1 group by c having f1 <= 10;
That 'a' in the SELECT part is stored as a reference in the temporary table togeher with sum(b) while the 'a' in having isn't (not needed as 'a' is already a reference to a column in the result)
b) was fixed by:
- Added an extra remove_const() pass for GROUP BY arguments before make_join_statistics() in case of one table SELECT.
This allowes get_best_group_min_max() to optimize things better.
c) was fixed by:
- Added test for group by optimization in JOIN::exec_inner for
select->quick->get_type() == QUICK_SELECT_I::QS_TYPE_GROUP_MIN_MAX
item.cc:
- Simplifed Item::split_sum_func2()
- Split test to make them faster and easier to read
- Changed last argument to Item::split_sum_func2() from bool to int to allow more flags
- Added flag argument to Item::split_sum_func() to allow on to specify if the item was in the SELECT part
- Changed split_sum_func2() to do nothing if called with an argument that is not a sum function and doesn't include sum functions, if we are not an argument to SELECT.
opt_range.cc:
- Simplified get_best_group_min_max() by calcuating first how many group_by elements.
- Use join->group instead of join->group_list to test if group by, as join->group_list may be NULL if everything was optimized away.
sql_select.cc:
- Added an extra remove_const() pass for GROUP BY arguments before make_join_statistics() in case of one table SELECT.
- Use group instead of group_list to test if group by, as group_list may be NULL if everything was optimized away.
- Moved printing of "Error in remove_const" to remove_const() instead of having it in caller.
- Simplified some if tests by re-ordering code.
- update_depend_map_for_order() and remove_const() fixed to handle the case where make_join_statistics() has not yet been called (join->join_tab is 0 in this case)
mysql-test/r/group_by.result:
Test for MDEV-6855
mysql-test/t/group_by.test:
Test for MDEV-6855
sql/item.h:
Fixed spelling error
sql/opt_range.cc:
Added handling of cond_type == Item::CACHE_ITEM in WHERE clauses for MIN/MAX optimization.
Fixed indentation
- test_if_skip_sort_order()/create_ref_for_key() may change table
access from EQ_REF(index1) to REF(index2).
- Doing so doesn't make much sense from optimization POV, but since
they are doing it, they should update tab->read_record.unlock_row
accordingly.
Back-ported from the mysql 5.6 code line the patch with
the following comment:
Fix for Bug#11757108 CHANGE IN EXECUTION PLAN FOR COUNT_DISTINCT_GROUP_ON_KEY
CAUSES PEFORMANCE REGRESSION
The cause for the performance regression is that the access strategy for the
GROUP BY query is changed form using "index scan" in mysql-5.1 to use "loose
index scan" in mysql-5.5. The index used for group by is unique and thus each
"loose scan" group will only contain one record. Since loose scan needs to
re-position on each "loose scan" group this query will do a re-position for
each index entry. Compared to just reading the next index entry as a normal
index scan does, the use of loose scan for this query becomes more expensive.
The cause for selecting to use loose scan for this query is that in the current
code when the size of the "loose scan" group is one, the formula for
calculating the cost estimates becomes almost identical to the cost of using
normal index scan. Differences in use of integer versus floating point arithmetic
can cause one or the other access strategy to be selected.
The main issue with the formula for estimating the cost of using loose scan is
that it does not take into account that it is more costly to do a re-position
for each "loose scan" group compared to just reading the next index entry.
Both index scan and loose scan estimates the cpu cost as:
"number of entries needed too read/scan" * ROW_EVALUATE_COST
The results from testing with the query in this bug indicates that the real
cost for doing re-position four to eight times higher than just reading the
next index entry. Thus, the cpu cost estimate for loose scan should be increased.
To account for the extra work to re-position in the index we increase the
cost for loose index scan to include the cost of navigating the index.
This is modelled as a function of the height of the b-tree:
navigation cost= ceil(log(records in table)/log(indexes per block))
* ROWID_COMPARE_COST;
This will avoid loose index scan being used for indexes where the "loose scan"
group contains very few index entries.
Fixed crashing bug for union queries where there was no real tables.
mysql-test/r/group_by.result:
Added test case
mysql-test/t/group_by.test:
Added test case
sql/db.opt:
Removed genrated file
sql/item.cc:
Handled case when table_list->pos_in_tables is not set. Can only happens when there is no real tables in query
This only happend when using an ORDER BY on a primary key part, where all other key parts where constant.
Remove of duplicated expressions in ORDER BY (as the old code did this in some strange cases)
mysql-test/r/group_by.result:
Fixed results to take into account that duplicate order by parts are now deleted
mysql-test/r/group_by_innodb.result:
Ensure extended keys are on
mysql-test/r/innodb_ext_key.result:
More tests
mysql-test/r/order_by.result:
More tests
mysql-test/t/group_by.test:
Fixed results to take into account that duplicate order by parts are now deleted
mysql-test/t/group_by_innodb.test:
Ensure extended keys are on
mysql-test/t/innodb_ext_key.test:
More tests
mysql-test/t/order_by.test:
More tests
sql/sql_select.cc:
Fixed bug where we looked at extended key parts when we shouldn't
Remove of duplicated expressions in ORDER BY
sql/table.cc:
Indentation fixes
Analysis:
st_select_lex_unit::prepare() computes can_skip_order_by as TRUE.
As a result join->prepare() gets called with order == NULL, and
doesn't do name resolution for the inner ORDER clause. Due to this
the prepare phase doesn't detect that the query references non-exiting
function and field.
Later join->optimize() calls update_used_tables() for a non-resolved
Item_field, which understandably has no Field object. This call results
in a crash.
Solution:
Resolve unnecessary ORDER BY clauses to detect if they reference non-exising
objects. Then remove such clauses from the JOIN object.
Analysis:
Range analysis discoveres that the query can be executed via loose index scan for GROUP BY.
Later, GROUP BY analysis fails to confirm that the GROUP operation can be computed via an
index because there is no logic to handle duplicate field references in the GROUP clause.
As a result the optimizer produces an inconsistent plan. It constructs a temporary table,
but on the other hand the group fields are not set to point there.
Solution:
Make loose scan analysis work in sync with order by analysis. In the case of duplicate
columns loose scan will not be applicable. This limitation will be lifted in 10.0 by
removing duplicate columns.
Fixed MDEV-4002: Server crash or valgrind errors in Item_func_group_concat::setup and Item_func_group_concat::add
mysql-test/r/group_by.result:
Added test case for failing GROUP_CONCAT ... ROLLUP queries
mysql-test/t/group_by.test:
Added test case for failing GROUP_CONCAT ... ROLLUP queries
sql/item_sum.cc:
Fixed issue where field->table pointed to different temporary table than expected.
Ensure that order->next points to the right object (could cause problems with setup_order())