optimizer_use_condition_selectivity>=3
Selectivity analysis should be disabled for Geometrical columns
for the case like geometric_field= string_constant.
and use_stat_tables= PREFERABLY
Currently the code that calculates selectivity for a table does not take into account the case when
we can have GROUP BY optimization (looses index scan).
After the commit b76b69cd5f
loose index scan for queries with DISTINCT stopped working.
That is why that commit has to be reverted.
Additionally this patch fixes the problem of MDEV-10880.
This should not have caused any notable errors in most cases.
After fix, we are not using keys to solve MIN/MAX if the string used for comparision is longer thant the column-
When the optimizer considers an option to use Loose Scan, it should
still consider UNIQUE keys (Previously, MDEV-4120 disabled loose scan
for all kinds of unique indexes. That was wrong)
However, we should not use Loose Scan when trying to satisfy
"SELECT DISTINCT col1, col2, .. colN"
when using an index defined as UNIQU(col1, col2, ... colN).
~40% bugfixed(*) applied
~40$ bugfixed reverted (incorrect or we're not buggy)
~20% bugfixed applied, despite us being not buggy
(*) only changes in the server code, e.g. not cmakefiles
"HAVING SUM(DISTINCT)": WRONG RESULTS.
ISSUE:
------
If a query uses loose index scan and it has both
AGG(DISTINCT) and MIN()/MAX()functions. Then, result values
of MIN/MAX() is set improperly.
When query has AGG(DISTINCT) then end_select is set to
end_send_group. "end_send_group" keeps doing aggregation
until it sees a record from next group. And, then it will
send out the result row of that group.
Since query also has MIN()/MAX() and loose index scan is
used, values of MIN/MAX() are set as part of loose index
scan itself. Setting MIN()/MAX() values as part of loose
index scan overwrites values computed in end_send_group.
This caused invalid result.
For such queries to work loose index scan should stop
performing MIN/MAX() aggregation. And, let end_send_group to
do the same. But according to current design loose index
scan can produce only one row per group key. If we have both
MIN() and MAX() then it has to give two records out. This is
not possible as interface has to use common buffer
record[0]! for both records at a time.
SOLUTIONS:
----------
For such queries to work we need a new interface for loose
index scan. Hence, do not choose loose_index_scan for such
cases. So a new rule SA7 is introduced to take care of the
same.
SA7: "If Q has both AGG_FUNC(DISTINCT ...) and
MIN/MAX() functions then loose index scan access
method is not used."
mysql-test/r/group_min_max.result:
Expected result.
mysql-test/t/group_min_max.test:
1. Test with various combination of AGG(DISTINCT) and
MIN(), MAX() functions.
2. Corrected the plan for old queries.
sql/opt_range.cc:
A new rule SA7 is introduced.
Back-ported from the mysql 5.6 code line the patch with
the following comment:
Fix for Bug#11757108 CHANGE IN EXECUTION PLAN FOR COUNT_DISTINCT_GROUP_ON_KEY
CAUSES PEFORMANCE REGRESSION
The cause for the performance regression is that the access strategy for the
GROUP BY query is changed form using "index scan" in mysql-5.1 to use "loose
index scan" in mysql-5.5. The index used for group by is unique and thus each
"loose scan" group will only contain one record. Since loose scan needs to
re-position on each "loose scan" group this query will do a re-position for
each index entry. Compared to just reading the next index entry as a normal
index scan does, the use of loose scan for this query becomes more expensive.
The cause for selecting to use loose scan for this query is that in the current
code when the size of the "loose scan" group is one, the formula for
calculating the cost estimates becomes almost identical to the cost of using
normal index scan. Differences in use of integer versus floating point arithmetic
can cause one or the other access strategy to be selected.
The main issue with the formula for estimating the cost of using loose scan is
that it does not take into account that it is more costly to do a re-position
for each "loose scan" group compared to just reading the next index entry.
Both index scan and loose scan estimates the cpu cost as:
"number of entries needed too read/scan" * ROW_EVALUATE_COST
The results from testing with the query in this bug indicates that the real
cost for doing re-position four to eight times higher than just reading the
next index entry. Thus, the cpu cost estimate for loose scan should be increased.
To account for the extra work to re-position in the index we increase the
cost for loose index scan to include the cost of navigating the index.
This is modelled as a function of the height of the b-tree:
navigation cost= ceil(log(records in table)/log(indexes per block))
* ROWID_COMPARE_COST;
This will avoid loose index scan being used for indexes where the "loose scan"
group contains very few index entries.
Currently the loose scan code in opt_range.cc considers all indexes as
possible for the access method. Due to inexact statistics it may happen
that a loose scan is selected over a unique index.
This is clearly wrong since a "loose scan" over a unique index will read
the same keys as a direct index scan, but the loose scan has more overhead.
This task adds a rule to skip unique indexes for loose scan.
In the case of loose scan used as input for order by, end_send()
didn't detect correctly that a loose scan was used, and didn't copy
the non-aggregated fields from the temp table used for ORDER BY.
The fix uses the fact that the quick select used for sorting is
attached to JOIN::pre_sort_join_tab instead of JOIN::join_tab.
WITH COMPOSITE KEY COLUMNS
Problem:-
While running a SELECT query with several AGGR(DISTINCT) function
and these are referring to different field of same composite key,
Returned incorrect value.
Analysis:-
In a table, where we have composite key like (a,b,c)
and when we give a query like
select COUNT(DISTINCT b), SUM(DISTINCT a) from ....
here, we first make a list of items in Aggr(distinct) function
(which is a, b), where order of item doesn't matter.
and then we see, whether we have a composite key where the prefix
of index columns matches the items of the aggregation function.
(in this case we have a,b,c).
if yes, so we can use loose index scan and we need not perform
duplicate removal to distinct in our aggregate function.
In our table, we traverse column marked with <-- and get the result as
(a,b,c) count(distinct b) sum(distinct a)
treated as count b treated as sum(a)
(1,1,2)<-- 1 1
(1,2,2)<-- 1++=2 1+1=2
(1,2,3)
(2,1,2)<-- 2++=3 1+1+2=4
(2,2,2)<-- 3++=4 1+1+2+2=6
(2,2,3)
result will be 4,6, but it should be (2,3)
As in this case, our assumption is incorrect. If we have
query like
select count(distinct a,b), sum(distinct a,b)from ..
then we can use loose index scan
Solution:-
In our query, when we have more then one aggr(distinct) function
then they should refer to same fields like
select count(distinct a,b), sum(distinct a,b) from ..
-->we can use loose scan index as both aggr(distinct) refer to same fields a,b.
If they are referring to different field like
select count(distinct a), sum(distinct b) from ..
-->will not use loose scan index as both aggr(distinct) refer to different fields.
Analysis:
Range analysis detects that the subquery is expensive and doesn't
build a range access method. Later, the applicability test for loose
scan doesn't take that into account, and builds a loose scan method
without a range scan on the min/max column. As a result loose scan
fetches the first key in each group, rather than the first key that
satisfies the condition on the min/max column.
Solution:
Since there is no SEL_ARG tree to be used for the min/max column,
it is not possible to use loose scan if the min/max column is compared
with an expensive scalar subquery. Make the test for loose scan
applicability to be in sync with the range analysis code by testing if
the min/max argument is compared with an expensive predicate.
Analys:
The cause for the wrong result was that the optimizer
incorrectly chose min/max loose scan when it is not
applicable. The applicability test missed the case when
a condition on the MIN/MAX argument was OR-ed with a
condition on some other field. In this case, the MIN/MAX
condition cannot be used for loose scan.
Solution:
Extend the test check_group_min_max_predicates() to check
that the WHERE clause is of the form: "cond1 AND cond2"
where
cond1 - does not use min_max_column at all.
cond2 - is an AND/OR tree with leaves in form "min_max_column $CMP$ const"
or $CMP$ is one of the functions between, is [not] null
The range optimizer incorrectly chose a loose scan for group by
when there is a correlated WHERE condition. This range access
method cannot be executed for correlated conditions also with the
"range checked for each record" because generally the range access
method can change for each outer record. Loose scan destructively
changes the query plan and removes the GROUP operation, which will
result in wrong query plans if another range access is chosen
dynamically.
Analysis:
The class member QUICK_GROUP_MIN_MAX_SELECT::seen_first_key
was not reset between subquery re-executions. Thus each
subsequent execution continued from the group that was
reached by the previous subquery execution. As a result
loose scan reached end of file much earlier, and returned
empty result where it shouldn't.
Solution:
Reset seen_first_key before each re-execution of the
loose scan.
sql/sql_insert.cc:
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
******
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
small cleanup
******
small cleanup
Fixed query plans with loose index scan degraded into index scan.
Analysis:
With MWL#89 subqueries are no longer executed and substituted during
the optimization of the outer query. As a result subquery predicates
that were previously executed and substituted by a constant before
the range optimizer were present as regular subquery predicates during
range optimization. The procedure check_group_min_max_predicates()
had a naive test that ruled out all queries with subqueries in the
WHERE clause. This resulted in worse plans with MWL#89.
Solution:
The solution is to refine the test in check_group_min_max_predicates()
to check if each MIN/MAX argument is referred to by a subquery predicate.
Item*) at opt_sum.cc:305
Queries applying MIN/MAX functions to indexed columns are
optimized to read directly from the index if all key parts
of the index preceding the aggregated key part are bound to
constants by the WHERE clause. A prefix length is also
produced, equal to the total length of the bound key
parts. If the aggregated column itself is bound to a
constant, however, it is also included in the prefix.
Such full search keys are read as closed intervals for
reasons beyond the scope of this bug. However, the procedure
missed one case where a key part meant for use as range
endpoint was being overwritten with a NULL value destined
for equality checking. In this case the key part was
overwritten but the range flag remained, causing open
interval reading to be performed.
Bug was fixed by adding more stringent checking to the
search key building procedure (matching_cond) and never
allow overwrites of range predicates with non-range
predicates.
An assertion was added to make sure open intervals are never
used with full search keys.
Conflicts:
Text conflict in mysql-test/r/partition_innodb.result
Text conflict in sql/field.h
Text conflict in sql/item.h
Text conflict in sql/item_cmpfunc.h
Text conflict in sql/item_sum.h
Text conflict in sql/log_event_old.cc
Text conflict in sql/protocol.cc
Text conflict in sql/sql_select.cc
Text conflict in sql/sql_yacc.yy
This has been back-ported from 6.0 as the problems proved to afflict
5.1 as well.
The fix exposed two new bugs. They were reported as follows.
Bug no 52174: Sometimes wrong plan when reading a MAX value
from non-NULL index
Bug no 52173: Reading NULL value from non-NULL index gives wrong
result in embedded server
Both bugs taken together affect a much smaller class of queries than #47762,
so the fix stays for now.
NULL column for NULL
The optimization to read MIN() and MAX() values from an
index did not properly handle comparisons with NULL
values. Fixed by giving up the particular optimization step
if there are non-NULL safe comparisons with NULL values, as
the result is NULL anyway.
Also, Oracle copyright notice was added to all files.
function with distinct.
Loose index scan is used to find MIN/MAX values using appropriate index and
thus allow to avoid grouping. For each found row it updates non-aggregated
fields with values from row with found MIN/MAX value.
Without loose index scan non-aggregated fields are copied by end_send_group
function. With loose index scan there is no need in end_send_group and
end_send is used instead. Non-aggregated fields still need to be copied and
this was wrongly implemented in QUICK_GROUP_MIN_MAX_SELECT::get_next.
WL#3220 added a case when loose index scan can be used with end_send_group to
optimize calculation of aggregate functions with distinct. In this case
the row found by QUICK_GROUP_MIN_MAX_SELECT::get_next might belong to a next
group and copying it will produce wrong result.
Update of non-aggregated fields is moved to the end_send function from
QUICK_GROUP_MIN_MAX_SELECT::get_next.
mysql-test/r/group_min_max.result:
Added a test case for the bug#50539.
mysql-test/t/group_min_max.test:
Added a test case for the bug#50539.
sql/opt_range.cc:
Bug#50539: Wrong result when loose index scan is used for an aggregate
function with distinct.
Update of non-aggregated fields is moved to the end_send function from
QUICK_GROUP_MIN_MAX_SELECT::get_next.
sql/sql_select.cc:
Bug#50539: Wrong result when loose index scan is used for an aggregate
function with distinct.
Update of non-aggregated fields is moved to the end_send function from
QUICK_GROUP_MIN_MAX_SELECT::get_next.
Queries optimized with GROUP_MIN_MAX didn't cleanup KEYREAD
optimization properly. As a result subsequent queries may
return incomplete rows (fields are initialized to default
values).
mysql-test/r/group_min_max.result:
A test case for BUG#49902.
mysql-test/t/group_min_max.test:
A test case for BUG#49902.
sql/opt_range.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/opt_sum.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/sql_select.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/sql_update.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/table.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/table.h:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
WHERE conditions
check_group_min_max() checks if the loose index scan
optimization is applicable for a given WHERE condition, that is
if the MIN/MAX attribute participates only in range predicates
comparing the corresponding field with constants.
The problem was that it considered the whole predicate suitable
for the loose index scan optimization as soon as it encountered
a constant as a predicate argument. This is obviously wrong for
cases when a constant is the first argument of a predicate
which does not satisfy the above condition.
Fixed check_group_min_max() so that all arguments of the input
predicate are considered to decide if it passes the test, even
though a constant has already been encountered.
mysql-test/r/group_min_max.result:
Added a test case for bug #48472.
mysql-test/t/group_min_max.test:
Added a test case for bug #48472.
sql/opt_range.cc:
Fixed check_group_min_max() so that all arguments of the input
predicate are considered to decide if it passes the test, even
though a constant has already been encountered.
results in server crash
check_group_min_max_predicates() assumed the input condition
item to be one of COND_ITEM, SUBSELECT_ITEM, or FUNC_ITEM.
Since a condition of the form "field" is also a valid condition
equivalent to "field <> 0", using such a condition in a query
where the loose index scan was chosen resulted in a debug
assertion failure.
Fixed by handling conditions of the FIELD_ITEM type in
check_group_min_max_predicates().
mysql-test/r/group_min_max.result:
Added a test case for bug #46607.
mysql-test/t/group_min_max.test:
Added a test case for bug #46607.
sql/opt_range.cc:
Handle conditions of the FUNC_ITEM type in
check_group_mix_max_predicates().
without error
When using quick access methods for searching rows in UPDATE or
DELETE there was no check if a fatal error was not already sent
to the client while evaluating the quick condition.
As a result a false OK (following the error) was sent to the
client and the error was thus transformed into a warning.
Fixed by checking for errors sent to the client during
SQL_SELECT::check_quick() and treating them as real errors.
Fixed a wrong test case in group_min_max.test
Fixed a wrong return code in mysql_update() and mysql_delete()
mysql-test/r/bug40113.result:
Bug #40013: test case
mysql-test/r/group_min_max.result:
Bug #40013: fixed a wrong test case
mysql-test/t/bug40113-master.opt:
Bug #40013: test case
mysql-test/t/bug40113.test:
Bug #40013: test case
mysql-test/t/group_min_max.test:
Bug #40013: fixed a wrong test case
sql/sql_delete.cc:
Bug #40113: check for errors evaluating the quick select
sql/sql_update.cc:
Bug #40113: check for errors evaluating the quick select