Analysis:
The fix for lp:944706 introduces early subquery optimization.
While a subquery is being optimized some of its predicates may be
removed. In the test case, the EXISTS subquery is constant, and is
evaluated to TRUE. As a result the whole OR is TRUE, and thus the
correlated condition "b = alias1.b" is optimized away. The subquery
becomes non-correlated.
The subquery cache is designed to work only for correlated subqueries.
If constant subquery optimization is disallowed, then the constant
subquery is not evaluated, the subquery remains correlated, and its
execution is cached. As a result execution is fast.
However, once the constant subquery was optimized away, the now
non-correlated subquery was neither cached by the subquery cache,
nor was it cached by the internal subquery caching. The latter was
due to the fact that the subquery still appeared as correlated to
the subselect_XYZ_engine::exec methods, which re-executed the
subquery on each call to Item_subselect::exec.
Solution:
The solution is to update the correlated status of the subquery after it has
been optimized. This status consists of:
- st_select_lex::is_correlated
- Item_subselect::is_correlated
- SELECT_LEX::uncacheable
- SELECT_LEX_UNIT::uncacheable
The status is updated by st_select_lex::update_correlated_cache(), and its
caller st_select_lex::optimize_unflattened_subqueries. The solution relies
on the fact that the optimizer has already called
st_select_lex::update_used_tables() for each subquery. This makes it
possible to update the correlated status of each subquery efficiently,
without walking the whole subquery tree. A sketch of the idea follows.
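A minimal sketch of the recomputation, assuming MariaDB-style names
(OUTER_REF_TABLE_BIT, UNCACHEABLE_DEPENDENT); it is illustrative of the
approach, not the literal patch:

  void st_select_lex::update_correlated_cache()
  {
    bool correlated= false;
    /*
      After update_used_tables(), used_tables() of each condition is
      current: the subquery is correlated iff something in it still
      references a table of an outer SELECT. (The real method must
      also inspect the select list, GROUP BY, etc.)
    */
    if (where)
      correlated|= (where->used_tables() & OUTER_REF_TABLE_BIT) != 0;
    if (having)
      correlated|= (having->used_tables() & OUTER_REF_TABLE_BIT) != 0;
    is_correlated= correlated;
    if (!correlated)
    {
      /* The subquery result may now be cached across executions. */
      uncacheable&= ~UNCACHEABLE_DEPENDENT;
      master_unit()->uncacheable&= ~UNCACHEABLE_DEPENDENT;
    }
  }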
Notice that this patch is an improvement over MySQL 5.6 and older, where
subqueries are not pre-optimized, and the above analysis is not possible.
Analysis:
The optimizer detects an empty result through constant table optimization.
Then it calls return_zero_rows(), which in turn indirectly calls
Item_maxmin_subselect::no_rows_in_result(). The latter method sets
"value=0", however "value" is a pointer to Item_cache, and not just an
integer value. All of the Item_[maxmin | singlerow]_subselect::val_XXX
methods do:
  if (forced_const)
    return value->val_real();
which of course crashes when value is a NULL pointer.
Solution:
When the optimizer discovers an empty result set, set
Item_singlerow_subselect::value to a FALSE constant Item instead of NULL.
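A minimal sketch of the fix; how the constant is installed here is an
assumption of this sketch, not necessarily the literal patch:

  void Item_maxmin_subselect::no_rows_in_result()
  {
    /*
      Install a valid constant item instead of clearing the pointer,
      so that the forced_const shortcut in the val_XXX methods
      dereferences a real Item_cache rather than NULL.
    */
    Item *zero= new Item_int((longlong) 0);
    value= Item_cache::get_cache(zero);
    value->store(zero);
    forced_const= TRUE;
  }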
The cause for this bug is that the method JOIN::get_examined_rows iterates over all
JOIN_TABs of the join assuming they are just a sequence. In the query above, the
innermost subquery is merged into its parent query. When we call
JOIN::get_examined_rows for the second-level subquery, the iteration that
assumes sequential order of join tabs goes outside the join_tab array and calls
the method JOIN_TAB::get_examined_rows on uninitialized memory.
The fix is to iterate over the JOIN_TABs in a way that takes into account
the nested semi-join structure of the join. In particular, iterate in the
same way as select_describe() does. A sketch follows.
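A minimal sketch of the corrected traversal; first_top_level_tab() and
next_top_level_tab() are hypothetical names standing in for whatever
structure-aware iteration select_describe() uses:

  ha_rows JOIN::get_examined_rows()
  {
    ha_rows examined_rows= 0;
    /*
      Walk only JOIN_TABs that really belong to this join, descending
      into nested semi-join blocks explicitly, instead of assuming the
      join_tab array is one flat sequence (after subquery merging it
      is not, and a flat scan runs off the end of the array).
    */
    for (JOIN_TAB *tab= first_top_level_tab(this);
         tab;
         tab= next_top_level_tab(this, tab))
      examined_rows+= tab->get_examined_rows();
    return examined_rows;
  }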
The patch re-enables constant subquery execution during
query optimization; it had been disabled during the development
of MWL#89 (cost-based choice of IN-TO-EXISTS vs MATERIALIZATION).
The main idea is that constant subqueries are allowed to be executed
during optimization if their execution is not expensive.
The approach is as follows:
- Constant subqueries are recursively optimized at the beginning of
JOIN::optimize of the outer query, by the new method
JOIN::optimize_constant_subqueries(), so that the cost of executing
these subqueries can be estimated.
- Optimization of the outer query proceeds normally. During this phase
the optimizer may request execution of non-expensive constant subqueries.
Each place where the optimizer may potentially execute an expensive
expression is guarded with the predicate Item::is_expensive().
- The implementation of Item_subselect::is_expensive has been extended
to use the number of examined rows (estimated by the optimizer) as a
way to determine whether the subquery is expensive or not (see the
sketch after this list).
- The new system variable "expensive_subquery_limit" controls how many
examined rows are considered to be not expensive. The default is 100.
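A minimal sketch of the extended check, with simplified member access
(the real method must also handle subqueries whose JOIN is missing or
merged):

  bool Item_subselect::is_expensive()
  {
    double examined_rows= 0;
    for (SELECT_LEX *sl= unit->first_select(); sl; sl= sl->next_select())
    {
      JOIN *cur_join= sl->join;
      if (!cur_join || !cur_join->optimized)
        return true;           /* no cost estimate => assume expensive */
      examined_rows+= cur_join->get_examined_rows();
    }
    /* Cheap enough to execute during optimization? */
    return examined_rows >
           (double) current_thd->variables.expensive_subquery_limit;
  }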
In addition, multiple changes were needed to make this solution work
with the changes made by MWL#89. These changes fix various crashes and
wrong results, as well as legacy bugs discovered during development.
Analysis:
The reason for the wrong result is the interaction between constant
optimization (in this case 1-row table) and subquery optimization.
- First the outer query is optimized, and 'make_join_statistics' finds that
table t2 has one row, reads that row, and marks the whole table as constant.
This also means that all fields of t2 are constant.
- Next, we optimize the subquery at the end of the outer 'make_join_statistics'.
The field 'f2' is considered constant, with value '3'. The subquery predicate
is rewritten as the constant TRUE.
- The outer query execution detects early that the whole query result is empty
and calls 'return_zero_rows'. Since the query is with implicit grouping, we
have to produce one row with special values for the aggregates (depending on
each aggregate function), and NULL values for all non-aggregate fields. This
function calls 'no_rows_in_result' to set each aggregate function to the
default value when it aggregates over an empty result, and then calls
'send_data', which in turn evaluates each Item in the SELECT list.
- When evaluation reaches the subquery predicate, it executes the subquery
with field 'f2' having a constant value '3', and the subquery produces the
incorrect result '7'.
Solution:
Implement Item::no_rows_in_result for all subquery predicates. In order to
make this work, it is also necessary to make all val_* methods of all
subquery predicates respect the Item_subselect::forced_const flag. Otherwise
subqueries are executed anyway, and override the default value set by
no_rows_in_result with whatever result the subquery evaluation produces.
A sketch of both pieces follows.
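A minimal sketch of both pieces for the IN predicate, assuming the
boolean 'value' member of Item_in_subselect; illustrative, not the
literal patch:

  void Item_in_subselect::no_rows_in_result()
  {
    /* IN over an empty result set is FALSE, regardless of what a
       (stale) subquery execution would produce. */
    value= 0;
    null_value= 0;
    forced_const= TRUE;
  }

  longlong Item_in_subselect::val_int()
  {
    if (forced_const)
      return value;      /* honor the no_rows_in_result() default */
    /* ... normal path: execute the subquery and read its result ... */
  }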
The table contains one time value: '00:00:32'
This value is converted to timestamp by a subquery.
In convert_constant_item we call (*item)->is_null()
which triggers execution of the Item_singlerow_subselect subquery,
and the string "0000-00-00 00:00:32" is cached
by Item_cache_datetime.
We continue execution and call update_null_value, which calls val_int()
on the cached item, which converts the time value to ((longlong) 32).
Then we continue to do (*item)->save_in_field(), which ends up in
Item_cache_datetime::val_str(), which fails, since (32 < 101) in
number_to_datetime, and val_str() returns NULL.
Item_singlerow_subselect::val_str isn't prepared for this:
if exec() succeeds and returns !null_value, then val_str()
*must* succeed.
Solution: refuse to cache strings like "0000-00-00 00:00:32"
in Item_cache_datetime::cache_value, and return NULL instead.
This is similar to the solution for
Bug#11766860 - 60085: CRASH IN ITEM::SAVE_IN_FIELD() WITH TIME DATA TYPE
This patch is for 5.5 only.
The issue is not present after WL#946, since a time value
will be converted to a proper timestamp, with the current date
rather than "0000-00-00".
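A minimal sketch of the validation idea, in terms of the packed
YYYYMMDDhhmmss integer layout that number_to_datetime() expects; the
helper and constant names are assumptions of this sketch (the actual
patch names the constant in item_timefunc.cc for readability):

  /* Smallest packed value with a non-zero date part: 0000-01-01. */
  static const longlong MIN_VALID_PACKED_DATETIME= 101;

  static bool datetime_value_is_cachable(longlong packed)
  {
    /*
      "0000-00-00 00:00:32" packs to 32, and number_to_datetime()
      rejects anything below 101, so caching such a value would make
      a later val_str() fail after exec() had already reported
      !null_value.
    */
    return packed >= MIN_VALID_PACKED_DATETIME;
  }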
mysql-test/r/subselect.result:
New test case.
mysql-test/t/subselect.test:
New test case.
sql/item.cc:
Verify proper date format before caching timestamps.
sql/item_timefunc.cc:
Use named constant for readability.
The function subselect_uniquesubquery_engine::copy_ref_key has to take into
account that when EXPLAIN is processed, the array of store_key objects
created for a TABLE_REF may contain elements for constant items. These
items should be ignored by the function.
Problem: When building the condition for JOIN::outer_ref_cond, the
optimizer did not take into account that this condition could depend on
constant tables as well.
Completed the fix for this bug.
Note: in 5.3 the affected 'if' statement in
Item_in_subselect::single_value_transformer(), starting with the condition
(thd->variables.sql_mode & MODE_ONLY_FULL_GROUP_BY), should be removed
altogether. The change in table.cc is not needed either.
This is because in 5.3:
- the min/max transformation for subqueries is done at the optimization phase
- evaluation of expensive subqueries is done at the execution phase
Added an EXPLAIN EXTENDED to the test case for bug #12329653.
Fixed several defects in the greedy optimizer:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'.
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP (see the sketch after this list).
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost().
2) Furthermore, to get equal cost for a 'best' optimized query and a
straight_join, the new code subtracted the same '0.001' in
optimize_straight_join() as had already been done in
best_extension_by_limited_search().
3) When best_extension_by_limited_search() aggregated the 'best' plan,
a plan was considered 'best' by the check:
  'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1)' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
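A self-contained model of the cost accumulation in (1); the
TIME_FOR_COMPARE value and the fixed join order are assumptions of
this sketch:

  #include <cstdio>

  static const double TIME_FOR_COMPARE= 5.0;    /* assumed ratio */

  /*
    Accumulated cost of a fixed join order. The per-level compare
    (CPU) cost must be added to read_time *before* it is carried to
    the next level; otherwise a huge fanout early in the QEP never
    shows up in the total plan cost.
  */
  static double plan_cost(const double *fanout, const double *io_cost,
                          int n_tables)
  {
    double record_count= 1.0, read_time= 0.0;
    for (int i= 0; i < n_tables; i++)
    {
      record_count*= fanout[i];                  /* fanout so far */
      read_time+= io_cost[i];                    /* I/O cost, level i */
      read_time+= record_count / TIME_FOR_COMPARE;   /* the fix */
    }
    return read_time;
  }

  int main()
  {
    double fanout[]=  { 1000.0, 1.0, 1.0 };      /* big fanout early */
    double io_cost[]= { 10.0, 10.0, 10.0 };
    printf("total cost: %g\n", plan_cost(fanout, io_cost, 3));
    return 0;
  }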
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare(). A sketch of the removals follows.
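A minimal sketch of the removals; member names follow the server
sources loosely and are illustrative:

  /*
    Called for each IN/ALL/ANY/SOME/EXISTS subquery during
    JOIN::prepare(); subq_lex is the subquery's SELECT_LEX.
  */
  static void remove_redundant_subquery_clauses(st_select_lex *subq_lex)
  {
    /* ORDER BY is useless: these subqueries support no LIMIT. */
    subq_lex->order_list.empty();
    /* DISTINCT does not change IN/EXISTS semantics. */
    subq_lex->options&= ~SELECT_DISTINCT;
    /* GROUP BY is useless without HAVING and aggregate functions. */
    if (!subq_lex->having && !subq_lex->with_sum_func)
      subq_lex->group_list.empty();
  }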
The cause of the wrong result was that Item_ref_null_helper::get_date()
didn't use a method of the *_result() family, and fetched the data
for the field from the current row instead of from result_field. Changed
it to use the correct *_result() method, like all other similar methods
of Item_ref_null_helper. A sketch follows.
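A minimal sketch of the corrected method, modeled on the other
Item_ref_null_helper methods; the exact signature is an assumption:

  bool Item_ref_null_helper::get_date(MYSQL_TIME *ltime, uint fuzzydate)
  {
    /* Fetch via the *_result() family so the value comes from
       result_field, just as val_int()/val_str() etc. do. */
    return (owner->was_null|=
            null_value= (*ref)->get_date_result(ltime, fuzzydate));
  }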
Analysis:
lp:894397 was a consequence of a prior incorrect fix for lp:833777,
which didn't take into account that even when all tables are
constant there may be correlated conditions, and that the WHERE clause
is then not equivalent to the constant conditions.
Solution:
When there are only constant tables, evaluate only the conditions
that reference outer fields, because the constant conditions have
already been checked, and the WHERE clause has no conditions other
than constant ones and outer-referencing ones. The fix for
lp:894397 also fixes lp:833777. A sketch follows.
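A minimal self-contained model of the selection; the value of
OUTER_REF_TABLE_BIT here is an assumption (in the server it is a
reserved bit of the used-tables bitmap):

  #include <vector>

  typedef unsigned long long table_map;
  static const table_map OUTER_REF_TABLE_BIT= 1ULL << 63;  /* assumed */

  struct Cond
  {
    table_map used_tables;  /* tables referenced by this condition */
    bool value;             /* result of evaluating the condition */
  };

  /*
    With constant tables only, the non-outer parts of the WHERE clause
    were already verified during constant table optimization; only the
    outer-referencing conditions still need to be evaluated.
  */
  static bool check_outer_ref_conds(const std::vector<Cond> &conds)
  {
    for (const Cond &c : conds)
      if ((c.used_tables & OUTER_REF_TABLE_BIT) && !c.value)
        return false;
    return true;
  }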
The problem was that for a single-row subquery that produces no rows,
the Item_cache objects that represent the result row were not set to
NULL, and when requested via element_index() they returned a random
value.
The fix is to set all Item_cache objects to NULL before executing the
subquery (in the reset() method), which guarantees a NULL value for the
whole subquery result, or for any of its elements requested in any way,
if no rows are found.
A set_null() method was added to Item_cache to guarantee a correct NULL
value when the cache is reset. A sketch of set_null() follows.
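A minimal sketch of the new method, assuming Item_cache's
value_cached/null_value members:

  void Item_cache::set_null()
  {
    value_cached= true;   /* nothing to fetch from the example item */
    null_value= true;     /* element_index() readers now see NULL */
  }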
Stop attempts to apply IN/ALL/ANY optimizations to the so-called
"fake_select" (used for ordering and filtering the results of a union)
during union subquery execution.
Analysis:
The optimizer distinguishes two kinds of 'constant' conditions:
expensive ones, and non-expensive ones. The non-expensive conditions
are evaluated inside make_join_select(), and if they are false, the
optimizer detects the empty query result already at that point.
In order to avoid arbitrarily expensive optimization, the evaluation of
expensive constant conditions is delayed until execution. These conditions
are attached to JOIN::exec_const_cond and evaluated in the beginning of
JOIN::exec. The relevant execution logic is:
  JOIN::exec()
  {
    if (! join->exec_const_cond->val_int())
    {
      produce an empty result;
      stop execution;
    }
    continue execution;
    execute the original WHERE clause (that contains exec_const_cond);
    ...
  }
As a result, when an expensive constant condition is
TRUE, it is evaluated twice - once through
JOIN::exec_const_cond, and once through JOIN::cond.
When the expensive constant condition is a subquery
predicate, the subquery is evaluated twice. If we have
many levels of subqueries, this logic results in a chain
of recursive subquery executions that walk a perfect
binary tree. The result is that for subqueries of depth N,
JOIN::exec is executed O(2^N) times, as the model below illustrates.
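A self-contained model of the blow-up (the depth and counter are
illustrative):

  #include <cstdio>

  static long executions= 0;

  /*
    Each nesting level evaluates its expensive constant condition
    twice - once as exec_const_cond and once inside the WHERE clause -
    so every call fans out into two recursive subquery executions.
  */
  static void exec_subquery(int depth)
  {
    executions++;
    if (depth == 0)
      return;
    exec_subquery(depth - 1);   /* via exec_const_cond */
    exec_subquery(depth - 1);   /* via the WHERE clause */
  }

  int main()
  {
    exec_subquery(10);
    printf("executions at depth 10: %ld\n", executions);  /* 2^11 - 1 */
    return 0;
  }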
Solution:
Notice that the second execution of the constant conditions
happens inside do_select(), in the branch:
  if (join->table_count == join->const_tables) { ... }
In this case exec_const_cond is equivalent to the whole WHERE
clause, therefore the WHERE clause has already been checked at
the beginning of JOIN::exec, and has been found to be true.
The bug is addressed by not evaluating the WHERE clause if there
was an exec_const_cond, and it was TRUE. A sketch follows.
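A minimal sketch of the guard, simplified from the description above
(member names follow the text; the real branch does more):

  if (join->table_count == join->const_tables)
  {
    /*
      If exec_const_cond existed, JOIN::exec only reached this point
      because it evaluated to TRUE, and here it is equivalent to the
      whole WHERE clause - re-checking join->cond would re-execute
      its expensive subquery predicates.
    */
    if (join->exec_const_cond != NULL ||
        join->cond == NULL || join->cond->val_int())
    {
      /* produce the single result row */
    }
  }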
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.
In MariaDB, when running in ONLY_FULL_GROUP_BY mode,
the server produced an incorrect error message that there
is an aggregate function without GROUP BY, for artificially
created MIN/MAX functions during subquery MIN/MAX optimization.
The fix introduces a way to distinguish between artificially
created MIN/MAX functions that result from a rewrite, and normal
ones present in the query. The test for ONLY_FULL_GROUP_BY violation
now additionally tests whether a MIN/MAX function was part of a MIN/MAX
subquery rewrite.
In order to be able to distinguish these MIN/MAX functions, the
patch introduces an additional flag in Item_in_subselect::in_strategy -
SUBS_STRATEGY_CHOSEN. This flag is set when the optimizer makes its
final choice of a subquery strategy. In order to make the choice
consistent, access to Item_in_subselect::in_strategy is provided
via new class methods. A sketch of these accessors follows.
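A minimal self-contained model of the accessors; the flag values other
than the described SUBS_STRATEGY_CHOSEN are made up for illustration:

  typedef unsigned char uchar;

  enum { SUBS_IN_TO_EXISTS=    1,    /* illustrative values */
         SUBS_MATERIALIZATION= 2,
         SUBS_STRATEGY_CHOSEN= 4 };

  struct strategy_holder      /* stands in for Item_in_subselect */
  {
    uchar in_strategy;

    bool test_strategy(uchar strategy) const
    { return (in_strategy & strategy) != 0; }

    /* Record a candidate strategy while the choice is still open. */
    void add_strategy(uchar strategy)
    { in_strategy|= strategy; }

    /* The optimizer's final, consistent choice. */
    void set_strategy(uchar strategy)
    { in_strategy= (uchar) (SUBS_STRATEGY_CHOSEN | strategy); }

    bool is_set_strategy() const
    { return test_strategy(SUBS_STRATEGY_CHOSEN); }
  };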
******
Fix MySQL BUG#12329653