- Make query plan be re-saved after the first join execution
(saving it after JOIN::cleanup is too late because EXPLAIN output
is currently produced before that)
- Handle QPF allocation/deallocation for edge cases, like unsuccessful
BINLOG command.
- Work around the problem with UNION's direct subselects not being visible.
- Update test results ("Using temporary; Using filesort" are now always printed
last in the Extra column)
- This cset gets rid of memory leaks/crashes. Some result mismatches still remain.
The wrong result set returned by the left join query from
the bug test case happened due to several inconsistencies
and bugs of the legacy mysql code.
The bug test case uses an execution plan that employs a scan
of a materialized IN subquery from the WHERE condition.
When materializing such an IN- subquery the optimizer injects
additional equalities into the WHERE clause. These equalities
express the constraints imposed by the subquery predicate.
The injected equality of the query in the test case happens
to belong to the same equality class, and a new equality
imposing a condition on the rows of the materialized subquery
is inferred from this class. Simultaneously the multiple
equality is added to the ON expression of the LEFT JOIN
used in the main query.
The inferred equality of the form f1=f2 is taken into account
when optimizing the scan of the rows the temporary table
that is the result of the subquery materialization: only the
values of the field f1 are read from the table into the record
buffer. Meanwhile the inferred equality is removed from the
WHERE conditions altogether as a constraint on the fields
of the temporary table that has been used when filling this table.
This equality is supposed to be removed from the ON expression
when the multiple equalities of the ON expression are converted
into an optimal set of equality predicates. It supposed to be
removed from the ON expression as an equality inferred from only
equalities of the WHERE condition. Yet, it did not happened
due to the following bug in the code.
Erroneously the code tried to build multiple equality for ON
expression twice: the first time, when it called optimize_cond()
for the WHERE condition, the second time, when it called
this function for the HAVING condition. When executing
optimize_con() for the WHERE condition a reference
to the multiple equality of the WHERE condition is set
in the multiple equality of the ON expression. This reference
would allow later to convert multiple equalities of the
ON expression into equality predicates. However the
the second call of build_equal_items() for the ON expression
that happened when optimize_cond() was called for the
HAVING condition reset this reference to NULL.
This bug fix blocks calling build_equal_items() for ON
expressions for the second time. In general, it will be
beneficial for many queries as it removes from ON
expressions any equalities that are to be checked for the
WHERE condition.
The patch also fixes two bugs in the list manipulation
operations and a bug in the function
substitute_for_best_equal_field() that resulted
in passing wrong reference to the multiple equalities
of where conditions when processing multiple
equalities of ON expressions.
The code of substitute_for_best_equal_field() and
the code the helper function eliminate_item_equal()
were also streamlined and cleaned up.
Now the conversion of the multiple equalities into
an optimal set of equality predicates first produces
the sequence of the all equalities processing multiple
equalities one by one, and, only after this, it inserts
the equalities at the beginning of the other conditions.
The multiple changes in the output of EXPLAIN
EXTENDED are mainly the result of this streamlining,
but in some cases is the result of the removal of
unneeded equalities from ON expressions. In
some test cases this removal were reflected in the
output of EXPLAIN resulted in disappearance of
“Using where” in some rows of the execution plans.
instead of single_select_engine
This task changes the IN-EXISTS rewrite for multi-column subqueries
"(a, b) IN (select b, c ...)" to work in the same way as for
single-column subqueries "a IN (select b ...) with respect to the
injection of NULL-rejecting predicates.
More specifically, the method
Item_in_subselect::create_row_in_to_exists_cond()
adds Item_is_not_null_test and Item_func_trig_cond only if the left
IN operand can be NULL. Not having these predicates when not necessary,
makes it possible to rewrite the subquery into a "unique_subquery" or
"index_subquery" when there is a suitable index on the only
subquery table.
This task fixes an ineffeciency that is a remainder from MySQL 5.0/5.1. There, subqueries
were optimized in a lazy manner, when executed for the first time. During this lazy optimization
it may happen that the server finds a more efficient subquery engine, and substitute the current
engine of the query being executed with the new engine. This required re-execution of the engine.
MariaDB 5.3 pre-optimizes subqueries in almost all cases, and the engine is chosen in most cases,
except when subquery materialization found that it must use partial matching. In this case, the
current code was performing one extra re-execution although it was not needed at all. The patch
performs the re-execution only if the engine was changed while executing.
In addition the patch performs small cleanup by removing "enum store_key_result" because it is
essentially a boolean, and the code that uses it already maps it to a boolean.
The patch enables back constant subquery execution during
query optimization after it was disabled during the development
of MWL#89 (cost-based choice of IN-TO-EXISTS vs MATERIALIZATION).
The main idea is that constant subqueries are allowed to be executed
during optimization if their execution is not expensive.
The approach is as follows:
- Constant subqueries are recursively optimized in the beginning of
JOIN::optimize of the outer query. This is done by the new method
JOIN::optimize_constant_subqueries(). This is done so that the cost
of executing these queries can be estimated.
- Optimization of the outer query proceeds normally. During this phase
the optimizer may request execution of non-expensive constant subqueries.
Each place where the optimizer may potentially execute an expensive
expression is guarded with the predicate Item::is_expensive().
- The implementation of Item_subselect::is_expensive has been extended
to use the number of examined rows (estimated by the optimizer) as a
way to determine whether the subquery is expensive or not.
- The new system variable "expensive_subquery_limit" controls how many
examined rows are considered to be not expensive. The default is 100.
In addition, multiple changes were needed to make this solution work
in the light of the changes made by MWL#89. These changes were needed
to fix various crashes and wrong results, and legacy bugs discovered
during development.
- When doing join optimization, pre-sort the tables so that they mimic the execution
order we've had with 'semijoin=off'.
- That way, we will not get regressions when there are two query plans (the old and the
new) that have indentical costs but different execution times (because of factors that
the optimizer was not able to take into account).
- If LooseScan is used with quick select, require that quick select produces
data in key order (this disables use of MRR, which can return data in arbitrary order).
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
- Correctly handle plan refinement stage for LooseScan plans: run create_ref_for_key() if LooseScan
plan includes a ref access, and if we don't have any fixed key components, switch to a full index scan.
* rename all debugging related command-line options
and variables to start from "debug-", and made them all
OFF by default.
* replace "MySQL" with "MariaDB" in error messages
* "Cast ... converted ... integer to it's ... complement"
is now a note, not a warning
* @@query_cache_strip_comments now has a session scope,
not global.
- if we're considering FirstMatch access with one inner table, and
@@optimizer_switch has semijoin_with_cache flag, calculate costs
as if we used join cache (because we will be able to do so)
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
- Make EXPLAIN display "Start temporary" at the start of the fanout (it used to display
at the first table whose rowid gets into temp. table which is not that useful for
the user)
- Updated test results (all checked)
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.
sql/sql_insert.cc:
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
******
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
small cleanup
******
small cleanup
in the function best_access_path revealed another bug: currently
table scans on NULL keys used for NOT IN subqueries cannot work
together with employment of join caches for inner tables of these
subqueries. Otherwise the result can be wrong as it could be seen
with the result of the test case constructed for bug #37894
in the file subselect3_jcl6.result.
of the 5.3 code line after a merge with 5.2 on 2010-10-28
in order not to allow the cost to access a joined table to be equal
to 0 ever.
Expanded data sets for many test cases to get the same execution plans
as before.
- Set the default
- Adjust the testcases so that 'new' tests are run with optimizations turned on.
- Pull out relevant tests from "irrelevant" tests and run them with optimizations on.
- Run range.test and innodb.test with both mrr=on and mrr=off
semijoin=on,firstmatch=on,loosescan=on
to
semijoin=off,firstmatch=off,loosescan=off
Adjust the testcases:
- Modify subselect*.test and join_cache.test so that all tests
use the same execution paths as before (i.e. optimizations that
are being tested are enabled)
- Let all other test files run with the new default settings (i.e.
with new optimizations disabled)
- Copy subquery testcases from these files into t/subselect_extra.test
which will run them with new optimizations enabled.
Resolved all conflicts, bad merges and fixed a few minor bugs in the code.
Commented out the queries from multi_update, view, subselect_sj, func_str,
derived_view, view_grant that failed either with crashes in ps-protocol or
with wrong results.
The failures are clear indications of some bugs in the code and these bugs
are to be fixed.
Analysis:
Build_equal_items_for_cond() rewrites the WHERE clause in such a way,
that it may merge the list join->cond_equal->current_level with the
list of child Items in an AND condition of the WHERE clause.
The place where this is done is:
static COND *build_equal_items_for_cond(THD *thd, COND *cond,
COND_EQUAL *inherited)
{
...
if (and_level)
{
args->concat(&eq_list);
args->concat((List<Item> *)&cond_equal.current_level);
}
...
}
As a result, later transformations on the WHERE clause may change the
structure of the list join->cond_equal->current_level without knowing this.
Specifically in this bug, Item_in_subselect::inject_in_to_exists_cond
creates a new AND of the old WHERE clause and the IN->EXISTS conditions.
It then calls fix_fields() for the new AND. Among other things, fix_fields
flattens all nested ANDs into one by merging the AND argument lists.
When there is a cond_equal for the JOIN, its list of Item_equal objects
is attached to the end of the original AND. When a lower-level AND is
merged into the top-level one, the argument list of the lower-level AND
is concatenated to the list of multiple equalities in the upper-level AND.
As a result, when substitute_for_best_equal_field processes the
multiple equalities, it turns out that the multiple equality list contains
the Items from the lower-level AND which were concatenated to the end of
the join->cond_equal->current_level list. This results in a crash because
this list must not contain any other Items except for the previously found
Item_equal ones.
Solution:
When performing IN->EXIST predicate injection, and the where clause is an
AND, detach the list of Item_equal objects before calling fix_fields on
the injected where clause.
After fix_fields is done, reattach back the multiple equalities list to
the end of the argument list of the new AND.
Analysis:
The wrong result is a consquence of sorting the subquery
result and then selecting only the first row due to the
artificial LIMIT 1 introduced by the fix_fields phase.
Normally, if there is an ORDER BY in a subquery, the ORDER
is removed (Item_in_subselect::select_in_like_transformer),
however if a GROUP BY is transformed into ORDER, this happens
later, after the removal of the ORDER clause of subqueries, so
we end up with a subquery with an ORDER clause, and an artificially
added LIMIT 1.
The reason why the same works in the main 5.3 without MWL#89, is
that the 5.3 performs all subquery transformations, including
IN->EXISTS before JOIN::optimize(). The beginning of JOIN::optimize
does:
if (having || (select_options & OPTION_FOUND_ROWS))
select_limit= HA_POS_ERROR;
which sets the limit back to infinity, thus 5.3 sorts the whole
subquery result, and IN performs the lookup into all subquery result
rows.
Solution:
Sorting of subqueries without LIMIT is meaningless. Since LIMIT in
subqueries is not supported, the patch removes sorting by setting
join->skip_sort_order= true
for each subquery JOIN object. This improves a number of execution
plans to not perform unnecessary sorting at all.
- Let advance_sj_state() save the value of JOIN::cur_dups_producing_tables
in POSITION::prefix_dups_producing_tables, and restore_sj_state() restore
it.
- "Using MRR" is no longer shown with range access.
- Instead, both range and BKA accesses will show one of the following:
= "Rowid-ordered scan"
= "Key-ordered scan"
= "Key-ordered Rowid-ordered scan"
depending on whether DS-MRR implementation will do scan keys in order, rowids in order,
or both.
- The patch also introduces a way for other storage engines/MRR implementations to
pass information to EXPLAIN output about the properties of employed MRR scans.