upon SELECT .. LIMIT 0
The code must differentiate between a SELECT with contradictory
WHERE/HAVING and one with LIMIT 0.
Also for the latter printed 'Zero limit' instead of 'Impossible where'
in the EXPLAIN output.
Any expensive WHERE condition for a table-less query with
implicit aggregation was lost. As a result the used aggregate
functions were calculated over a non-empty set of rows even
in the case when the condition was false.
For the query having an IN subquery with no tables, we were converting the subquery with an expression between
the left part and the select list of the subquery . This can give incorrect results when we have a condition
in the subquery with a dual table (as this is treated as a no table).
The fix is that we don't do this conversion when we have conds in the subquery with a dual table.
failed with SELECT SQ, TEXT field
The functon find_all_keys does call Item_subselect::walk, which calls walk() for the subquery
The issue is that when a field is represented by Item_outer_ref(Item_direct_ref(Item_copy_string( ...))).
Item_copy_string does have a pointer to an Item_field in Item_copy::item but does not implement Item::walk method, so we are not
able to set the bitmap for that field. This is the reason why the assert fails.
Fixed by adding the walk method to Item_copy class.
Also, implement MDEV-11027 a little differently from 5.5:
recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.
recv_apply_hashed_log_recs(): Rename the parameter to last_batch.
The bug occurred when a subquery
- has a reference to outside, to grand-parent query or further up
- is converted to a semi-join (i.e. merged into its parent).
Then the reference to outside had form Item_ref(Item_field(...)).
- Conversion to semi-join would call item->fix_after_pullout() for the
outside reference.
- Item_ref::fix_after_pullout would call Item_field->fix_after_pullout
- The Item_field would construct a new Name_resolution_context object
This process ignored the fact that the Item_field does not belong to
any of the subselects being flattened.
The result was crash in the next call to Item_field::fix_fields(), where
we would try to use an invalid Name_resolution_context object.
Fixed by not creating Name_resolution_context object if the Item_field's
context does not belong to the subselect(s) that were flattened.
The calls of the function remove_eq_conds() may change the and/or structure
of the where conditions. So JOIN::equal_cond should be updated for non-recursive
calls of remove_eq_conds().
The field JOIN::select_lex->where should be updated after the call
of remove_eq_conds() in the function make_join_statistics(). This
matters for subselects.
The earlier pushed fix for the bug was incomplete. It did not remove
the main cause of the problem: the function remove_eq_conds()
removed always true multiple equalities from any conjunct, but did not
adjust the list of them stored in Item_cond_and::cond_equal.current_level.
Simplified the test case for the bug and moved it to another test file.
The fix triggered changes in EXPLAIN EXTENDED for some queries.
Analysis:
The reason for the inefficent plan was that Item_subselect::is_expensive()
didn't detect the special case when a subquery was optimized, but had no
join plan because it either has no table, or its tables have been optimized
away, or the optimizer detected that the result set is empty.
Solution:
Identify the special cases above in the Item_subselect::is_expensive(),
and consider such degenerate subqueries inexpensive.
The problem was that maybe_null of Item_row and its componetes was unsynced after update_used_tables() (and so pushed_cond_guards was not initialized but then requested).
Fix updates Item_row::maybe_null on update_used_tables().
Analysis:
The following call stack shows that it is possible to set Item_cache::value_cached, and the relevant value
without setting Item_cache::example.
#0 Item_cache_temporal::store_packed at item.cc:8395
#1 get_datetime_value at item_cmpfunc.cc:915
#2 resolve_const_item at item.cc:7987
#3 propagate_cond_constants at sql_select.cc:12264
#4 propagate_cond_constants at sql_select.cc:12227
#5 optimize_cond at sql_select.cc:13026
#6 JOIN::optimize at sql_select.cc:1016
#7 st_select_lex::optimize_unflattened_subqueries at sql_lex.cc:3161
#8 JOIN::optimize_unflattened_subqueries at opt_subselect.cc:4880
#9 JOIN::optimize at sql_select.cc:1554
The fix is to set Item_cache_temporal::example even when the value is
set directly by Item_cache_temporal::store_packed. This makes the
Item_cache_temporal object consistent.
Analysys:
In the beginning of JOIN::cleanup there is code that is supposed to
free all filesort buffers. The code assumes that the table being sorted
is the first non-constant table. To get this table it calls:
first_top_level_tab(this, WITHOUT_CONST_TABLES)
However, first_top_level_tab() instead returned the wrong table - the first
one in the plan, instead of the first non-constant table. There is no other
place outside filesort() where sort buffers may be freed. As a result, the
sort buffer was not freed, and there was a memory leak.
Solution:
Change first_top_level_tab(), to test for WITH_CONST_TABLES instead of
WITHOUT_CONST_TABLES.
Analysis:
The queries in question use the [unique | index]_subquery execution methods.
These methods reuse the ref keys constructed by create_ref_for_key(). The
way create_ref_for_key() works is that it doesn't store in ref.key_copy[]
store_key elements that represent constants. In particular it doesn't store
the store_key for NULL constants.
The execution of [unique | index]_subquery calls
subselect_uniquesubquery_engine::copy_ref_key, which in addition to copy
the left IN argument into a index lookup key, is supposed to detect if
the left IN argument contains NULLs. Since the store_key for the NULL
constant is not copied into the key array, the null is not detected, and
execution erroneously proceeds as if it should look for a complete match.
Solution:
The solution (unlike MySQL) is to reuse already computed information about
NULL presence. Item_in_optimizer::val_int already finds out if the left IN
operand contains NULLs. The fix propagates this to the execution methods
subselect_[unique | index]subquery_engine::exec so it knows if there were
NULL values independent of the presence of keys.
In addition the patch siplifies copy_ref_key() and the logic that hanldes
the case of NULLs in the left IN operand.
Analysis:
Queries with implicit grouping (there is aggregate, but no group by)
follow some non-obvious semantics in the case of empty result set.
Aggregate functions produce some special "natural" value depending on
the function. For instance MIN/MAX return NULL, COUNT returns 0.
The complexity comes from non-aggregate expressions in the select list.
If the non-aggregate expression is a constant, it can be computed, so
we should return its value, however if the expression is non-constant,
and depends on columns from the empty result set, then the only meaningful
value is NULL.
The cause of the wrong result was that for subqueries the optimizer didn't
make a difference between constant and non-constant ones in the case of
empty result for implicit grouping.
Solution:
In all implementations of Item_subselect::no_rows_in_result() check if the
subquery predicate is constant. If it is constant, do not set it to the
default value for implicit grouping, instead let it be evaluated.
Analysis:
When the method JOIN::choose_subquery_plan() decided to apply
the IN-TO-EXISTS strategy, it set the unit and select_lex
uncacheable flag to UNCACHEABLE_DEPENDENT_INJECTED unconditionally.
As result, even if IN-TO-EXISTS injected non-correlated predicates,
the subquery was still treated as correlated.
Solution:
Set the subquery as correlated only if the injected predicate(s) depend
on the outer query.
The patch enables back constant subquery execution during
query optimization after it was disabled during the development
of MWL#89 (cost-based choice of IN-TO-EXISTS vs MATERIALIZATION).
The main idea is that constant subqueries are allowed to be executed
during optimization if their execution is not expensive.
The approach is as follows:
- Constant subqueries are recursively optimized in the beginning of
JOIN::optimize of the outer query. This is done by the new method
JOIN::optimize_constant_subqueries(). This is done so that the cost
of executing these queries can be estimated.
- Optimization of the outer query proceeds normally. During this phase
the optimizer may request execution of non-expensive constant subqueries.
Each place where the optimizer may potentially execute an expensive
expression is guarded with the predicate Item::is_expensive().
- The implementation of Item_subselect::is_expensive has been extended
to use the number of examined rows (estimated by the optimizer) as a
way to determine whether the subquery is expensive or not.
- The new system variable "expensive_subquery_limit" controls how many
examined rows are considered to be not expensive. The default is 100.
In addition, multiple changes were needed to make this solution work
in the light of the changes made by MWL#89. These changes were needed
to fix various crashes and wrong results, and legacy bugs discovered
during development.
make sure that stored routines are evaluated (that is, de facto - cached) in convert_const_to_int().
revert the fix for lp:806943 because it cannot be repeated anymore.
add few tests for convert_const_to_int()
- After the exec_const_cond->val_int() call, check for error and return.
(if we don't do it, we will eventually hit an error when trying to set status OK in
the diagnostics area, which already has an error status).
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare()
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.