Analys:
The cause for the wrong result was that the optimizer
incorrectly chose min/max loose scan when it is not
applicable. The applicability test missed the case when
a condition on the MIN/MAX argument was OR-ed with a
condition on some other field. In this case, the MIN/MAX
condition cannot be used for loose scan.
Solution:
Extend the test check_group_min_max_predicates() to check
that the WHERE clause is of the form: "cond1 AND cond2"
where
cond1 - does not use min_max_column at all.
cond2 - is an AND/OR tree with leaves in form "min_max_column $CMP$ const"
or $CMP$ is one of the functions between, is [not] null
The result of materialization of the right part of an IN subquery predicate
is placed into a temporary table. Each row of the materialized table is
distinct. A unique key over all fields of the temporary table is defined and
created. It allows to perform key look-ups into the table.
The table created for a materialized subquery can be accessed by key as
any other table. The function best_access-path search for the best access
to join a table to a given partial join. With some where conditions this
function considers a possibility of a ref_or_null access. If such access
employs the unique key on the temporary table then when estimating
the cost this access the function tries to use the array rec_per_key. Yet,
such array is not built for this unique key. This causes a crash of the server.
Rows returned by the subquery that contain nulls don't have to be placed
into temporary table, as they cannot be match any row produced by the
left part of the subquery predicate. So all fields of the temporary table
can be defined as non-nullable. In this case any ref_or_null access
to the temporary table does not make any sense and it does not make sense
to estimate such an access.
The fix makes sure that the temporary table for a materialized IN subquery
is defined with columns that are all non-nullable. The also ensures that
any row with nulls returned by the subquery is not placed into the
temporary table.
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare()
The range optimizer incorrectly chose a loose scan for group by
when there is a correlated WHERE condition. This range access
method cannot be executed for correlated conditions also with the
"range checked for each record" because generally the range access
method can change for each outer record. Loose scan destructively
changes the query plan and removes the GROUP operation, which will
result in wrong query plans if another range access is chosen
dynamically.
Analysis:
The class member QUICK_GROUP_MIN_MAX_SELECT::seen_first_key
was not reset between subquery re-executions. Thus each
subsequent execution continued from the group that was
reached by the previous subquery execution. As a result
loose scan reached end of file much earlier, and returned
empty result where it shouldn't.
Solution:
Reset seen_first_key before each re-execution of the
loose scan.
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
Analysis:
Constant table optimization of the outer query finds that
the right side of the equality is a constant that can
be used for an eq_ref access to fetch one row from t1,
and substitute t1 with a constant. Thus constant optimization
triggers evaluation of the subquery during the optimize
phase of the outer query.
The innermost subquery requires a plan with a temporary
table because with InnoDB tables the exact count of rows
is not known, and the empty tables cannot be optimzied
way. JOIN::exec for the innermost subquery substitutes
the subquery tables with a temporary table.
When EXPLAIN gets to print the tables in the innermost
subquery, EXPLAIN needs to print the name of each table
through the corresponding TABLE_LIST object. However,
the temporary table created during execution doesn't
have a corresponding TABLE_LIST, so we get a null
pointer exception.
Solution:
The solution is to forbid using expensive constant
expressions for eq_ref access for contant table
optimization. Notice that eq_ref with a subquery
providing the value is still possible during regular
execution.
revno: 2876.47.174
revision-id: jorgen.loland@oracle.com-20110519120355-qn7eprkad9jqwu5j
parent: mayank.prasad@oracle.com-20110518143645-bdxv4udzrmqsjmhq
committer: Jorgen Loland <jorgen.loland@oracle.com>
branch nick: mysql-trunk-11765831
timestamp: Thu 2011-05-19 14:03:55 +0200
message:
BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
AWAY QUALIFYING ROWS
The problem was that the ranges created when OR'ing two
conditions could be incorrect. Without the bugfix,
"I <> 6 OR (I <> 8 AND J = 5)" would create these ranges:
"NULL < I < 6",
"6 <= I <= 6 AND 5 <= J <= 5",
"6 < I < 8",
"8 <= I <= 8 AND 5 <= J <= 5",
"8 < I"
While the correct ranges is
"NULL < I < 6",
"6 <= I <= 6 AND 5 <= J <= 5",
"6 < I"
The problem occurs when key_or() ORs
(1) "NULL < I < 6, 6 <= I <= 6 AND 5 <= J <= 5, 6 < I" with
(2) "8 < I AND 5 <= J <= 5"
The reason for the bug is that in key_or(), SEL_ARG *tmp is
used to point to the range in (1) above that is merged with
(2) while key1 points to the root of the red-black tree of
(1). When merging (1) and (2), tmp refers to the "6 < I"
part whereas the root is the "6 <= ... AND 5 <= J <= 5" part.
key_or() decides that the tmp range needs to be split into
"6 < I < 8, 8 <= I <= 8, 8 < I", in which next_key_part of the
second range should be that of tmp. However, next_key_part is
set to key1->next_key_part ("5 <= J <= 5") instead of
tmp->next_key_part (empty). Fixing this gives the correct but
not optimal ranges:
"NULL < I < 6",
"6 <= I <= 6 AND 5 <= J <= 5",
"6 < I < 8",
"8 <= I <= 8",
"8 < I"
A second problem can be seen above: key_or() may create
adjacent ranges that could be replaced with a single range.
Fixes for this is also included in the patch so that the range
above becomes correct AND optimal:
"NULL < I < 6",
"6 <= I <= 6 AND 5 <= J <= 5",
"6 < I"
Merging adjacent ranges like this gives a slightly lower cost
estimate for the range access.
semijoin=on,firstmatch=on,loosescan=on
to
semijoin=off,firstmatch=off,loosescan=off
Adjust the testcases:
- Modify subselect*.test and join_cache.test so that all tests
use the same execution paths as before (i.e. optimizations that
are being tested are enabled)
- Let all other test files run with the new default settings (i.e.
with new optimizations disabled)
- Copy subquery testcases from these files into t/subselect_extra.test
which will run them with new optimizations enabled.
Analysis:
This bug consists of two related problems that are
result of too early evaluation of single-row subqueries
during the optimization phase of the outer query.
Several optimizer code paths try to evaluate single-row
subqueries in order to produce a constant and use that
constant for further optimzation.
When the execution of the subquery peforms destructive
changes to the representation of the subquery, and these
changes are not anticipated by the subsequent optimization
phases of the outer query, we tipically get a crash or
failed assert.
Specifically, in this bug the inner-most suqbuery with
DISTINCT triggers a substitution of the original JOIN
object by a single-table JOIN object with a temp table
needed to perform the DISTINCT operation (created by
JOIN::make_simple_join).
This substitution breaks EXPLAIN because:
a) in the first example JOIN::cleanup no longer can
reach the original table of the innermost subquery, and
close all indexes, and
b) in this second test query, EXPLAIN attempts to print
the name of the internal temp table, and crashes because
the temp table has no name (NULL pointer instead).
Solution:
a) fully disable subquery evaluation during optimization
in all cases - both for constant propagation and range
optimization, and
b) change JOIN::join_free() to perform cleanup irrespective
of EXPLAIN or not.
Analysis:
The subquery is evaluated first during ref-optimization of the outer
query because the subquery is considered constant from the perspective
of the outer query. Thus an attempt is made to evaluate the MAX subquery
and use the new constant to drive an index nested loops join.
During this evaluation the inner-most subquery replaces the JOIN_TAB
with a new one that fetches the data from a temp table.
The function select_describe crashes at the lines:
TABLE_LIST *real_table= table->pos_in_table_list;
item_list.push_back(new Item_string(real_table->alias,
strlen(real_table->alias),
cs));
because 'table' is a temp table, and it has no corresponding table
reference. This 'real_table' is NULL, and real_table->alias results
in a crash.
Solution:
In the spirit of MWL#89 prevent the evaluation of expensive predicates
during optimization. This patch prevents the evaluation of expensive
predicates during ref optimization.
sql/item_subselect.h:
Remove unused class member. Not needed for the fix, but noticed now and removed.
Fixed query plans with loose index scan degraded into index scan.
Analysis:
With MWL#89 subqueries are no longer executed and substituted during
the optimization of the outer query. As a result subquery predicates
that were previously executed and substituted by a constant before
the range optimizer were present as regular subquery predicates during
range optimization. The procedure check_group_min_max_predicates()
had a naive test that ruled out all queries with subqueries in the
WHERE clause. This resulted in worse plans with MWL#89.
Solution:
The solution is to refine the test in check_group_min_max_predicates()
to check if each MIN/MAX argument is referred to by a subquery predicate.
Item*) at opt_sum.cc:305
Queries applying MIN/MAX functions to indexed columns are
optimized to read directly from the index if all key parts
of the index preceding the aggregated key part are bound to
constants by the WHERE clause. A prefix length is also
produced, equal to the total length of the bound key
parts. If the aggregated column itself is bound to a
constant, however, it is also included in the prefix.
Such full search keys are read as closed intervals for
reasons beyond the scope of this bug. However, the procedure
missed one case where a key part meant for use as range
endpoint was being overwritten with a NULL value destined
for equality checking. In this case the key part was
overwritten but the range flag remained, causing open
interval reading to be performed.
Bug was fixed by adding more stringent checking to the
search key building procedure (matching_cond) and never
allow overwrites of range predicates with non-range
predicates.
An assertion was added to make sure open intervals are never
used with full search keys.
This has been back-ported from 6.0 as the problems proved to afflict
5.1 as well.
The fix exposed two new bugs. They were reported as follows.
Bug no 52174: Sometimes wrong plan when reading a MAX value
from non-NULL index
Bug no 52173: Reading NULL value from non-NULL index gives wrong
result in embedded server
Both bugs taken together affect a much smaller class of queries than #47762,
so the fix stays for now.
NULL column for NULL
The optimization to read MIN() and MAX() values from an
index did not properly handle comparisons with NULL
values. Fixed by giving up the particular optimization step
if there are non-NULL safe comparisons with NULL values, as
the result is NULL anyway.
Also, Oracle copyright notice was added to all files.
Queries optimized with GROUP_MIN_MAX didn't cleanup KEYREAD
optimization properly. As a result subsequent queries may
return incomplete rows (fields are initialized to default
values).
mysql-test/r/group_min_max.result:
A test case for BUG#49902.
mysql-test/t/group_min_max.test:
A test case for BUG#49902.
sql/opt_range.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/opt_sum.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/sql_select.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/sql_update.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/table.cc:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
sql/table.h:
Refactor of KEYREAD optimization switch so that KEYREAD
handler state is in sync with st_table::key_read flag.
All SQL code is supposed to switch KEYREAD optimization
via st_table::set_keyread().
WHERE conditions
check_group_min_max() checks if the loose index scan
optimization is applicable for a given WHERE condition, that is
if the MIN/MAX attribute participates only in range predicates
comparing the corresponding field with constants.
The problem was that it considered the whole predicate suitable
for the loose index scan optimization as soon as it encountered
a constant as a predicate argument. This is obviously wrong for
cases when a constant is the first argument of a predicate
which does not satisfy the above condition.
Fixed check_group_min_max() so that all arguments of the input
predicate are considered to decide if it passes the test, even
though a constant has already been encountered.
mysql-test/r/group_min_max.result:
Added a test case for bug #48472.
mysql-test/t/group_min_max.test:
Added a test case for bug #48472.
sql/opt_range.cc:
Fixed check_group_min_max() so that all arguments of the input
predicate are considered to decide if it passes the test, even
though a constant has already been encountered.
covering index
When two range predicates were combined under an OR
predicate, the algorithm tried to merge overlapping ranges
into one. But the case when a range overlapped several other
ranges was not handled. This lead to
1) ranges overlapping, which gave repeated results and
2) a range that overlapped several other ranges was cut off.
Fixed by
1) Making sure that a range got an upper bound equal to the
next range with a greater minimum.
2) Removing a continue statement
mysql-test/r/group_min_max.result:
Bug#42846: Changed query plans
mysql-test/r/range.result:
Bug#42846: Test result.
mysql-test/t/range.test:
Bug#42846: Test case.
sql/opt_range.cc:
Bug#42846: The fix.
Part1: Previously, both endpoints from key2 were copied,
which is not safe. Since ranges are processed in ascending
order of minimum endpoints, it is safe to copy the minimum
endpoint from key2 but not the maximum. The maximum may only
be copied if there is no other range or the other range's
minimum is greater than key2's maximum.
results in server crash
check_group_min_max_predicates() assumed the input condition
item to be one of COND_ITEM, SUBSELECT_ITEM, or FUNC_ITEM.
Since a condition of the form "field" is also a valid condition
equivalent to "field <> 0", using such a condition in a query
where the loose index scan was chosen resulted in a debug
assertion failure.
Fixed by handling conditions of the FIELD_ITEM type in
check_group_min_max_predicates().
mysql-test/r/group_min_max.result:
Added a test case for bug #46607.
mysql-test/t/group_min_max.test:
Added a test case for bug #46607.
sql/opt_range.cc:
Handle conditions of the FUNC_ITEM type in
check_group_mix_max_predicates().
without error
When using quick access methods for searching rows in UPDATE or
DELETE there was no check if a fatal error was not already sent
to the client while evaluating the quick condition.
As a result a false OK (following the error) was sent to the
client and the error was thus transformed into a warning.
Fixed by checking for errors sent to the client during
SQL_SELECT::check_quick() and treating them as real errors.
Fixed a wrong test case in group_min_max.test
Fixed a wrong return code in mysql_update() and mysql_delete()
mysql-test/r/bug40113.result:
Bug #40013: test case
mysql-test/r/group_min_max.result:
Bug #40013: fixed a wrong test case
mysql-test/t/bug40113-master.opt:
Bug #40013: test case
mysql-test/t/bug40113.test:
Bug #40013: test case
mysql-test/t/group_min_max.test:
Bug #40013: fixed a wrong test case
sql/sql_delete.cc:
Bug #40113: check for errors evaluating the quick select
sql/sql_update.cc:
Bug #40113: check for errors evaluating the quick select
WHERE and GROUP BY clause
Loose index scan may use range conditions on the argument of
the MIN/MAX aggregate functions to find the beginning/end of
the interval that satisfies the range conditions in a single go.
These range conditions may have open or closed minimum/maximum
values. When the comparison returns 0 (equal) the code should
check the type of the min/max values of the current interval
and accept or reject the row based on whether the limit is
open or not.
There was a wrong composite condition on checking this and it was
not working in all cases.
Fixed by simplifying the conditions and reversing the logic.
mysql-test/r/group_min_max.result:
Bug #45386: test case
mysql-test/t/group_min_max.test:
Bug #45386: test case
sql/opt_range.cc:
Bug #45386: fix the check whether to use the value if on the
interval boundry
Range analysis did not request sorted output from the storage engine,
which cause partitioned handlers to process one partition at a time
while reading key prefixes in ascending order, causing values to be
missed. Fixed by always requesting sorted order during range analysis.
This fix is introduced in 6.0 by the fix for bug no 41136.
mysql-test/r/group_min_max.result:
Bug#44821: Test result.
mysql-test/t/group_min_max.test:
Bug#44821: Test case
sql/opt_range.cc:
Bug#44821: Fix.
return no rows
The algorithm of determining the best key for loose index scan is doing a loop
over the available indexes and selects the one that has the best cost.
It retrieves the parameters of the current index into a set of variables.
If the cost of using the current index is lower than the best cost so far it
copies these variables into another set of variables that contain the
information for the best index so far.
After having checked all the indexes it uses these variables (outside of the
index loop) to create the table read plan object instance.
The was a single omission : the key_infix/key_infix_len variables were used
outside of the loop without being preserved in the loop for the best index
so far.
This causes these variables to get overwritten by the next index(es) checked.
Fixed by adding variables to hold the data for the current index, passing
the new variables to the function that assigns values to them and copying
the new variables into the existing ones when selecting a new current best
index.
To avoid further such problems moved the declarations of the variables used
to keep information about the current index inside the loop's compound
statement.
mysql-test/r/group_min_max.result:
Bug #41610: test case
mysql-test/t/group_min_max.test:
Bug #41610: test case
sql/opt_range.cc:
Bug #41610: copy the infix data for the current best index
used causes server crash.
When the loose index scan access method is used values of aggregated functions
are precomputed by it. Aggregation of such functions shouldn't be performed
in this case and functions should be treated as normal ones.
The create_tmp_table function wasn't taking this into account and this led to
a crash if a query has MIN/MAX aggregate functions and employs temporary table
and loose index scan.
Now the JOIN::exec and the create_tmp_table functions treat MIN/MAX aggregate
functions as normal ones when the loose index scan is used.
mysql-test/r/group_min_max.result:
Added a test case for the bug#38195.
mysql-test/t/group_min_max.test:
Added a test case for the bug#38195.
sql/sql_select.cc:
Bug#38195: Incorrect handling of aggregate functions when loose index scan is
used causes server crash.
The JOIN::exec and the create_tmp_table functions treat MIN/MAX aggregate
functions as normal ones when the loose index scan is used.
used causes server crash.
When the loose index scan access method is used values of aggregated functions
are precomputed by it. Aggregation of such functions shouldn't be performed
in this case and functions should be treated as normal ones.
The create_tmp_table function wasn't taking this into account and this led to
a crash if a query has MIN/MAX aggregate functions and employs temporary table
and loose index scan.
Now the JOIN::exec and the create_tmp_table functions treat MIN/MAX aggregate
functions as normal ones when the loose index scan is used.
mysql-test/r/group_min_max.result:
Added a test case for the bug#38195.
mysql-test/t/group_min_max.test:
Added a test case for the bug#38195.
sql/sql_select.cc:
Bug#38195: Incorrect handling of aggregate functions when loose index scan is
used causes server crash.
Now the JOIN::exec and the create_tmp_table functions treat MIN/MAX aggregate
functions as normal ones when the loose index scan is used.