Analysis:
The fix for lp:944706 introduces early subquery optimization.
While a subquery is being optimized, some of its predicates may be
removed. In the test case, the EXISTS subquery is constant and is
evaluated to TRUE. As a result, the whole OR is TRUE, and thus the
correlated condition "b = alias1.b" is optimized away. The subquery
becomes non-correlated.
The subquery cache is designed to work only for correlated subqueries.
If constant subquery optimization is disallowed, then the constant
subquery is not evaluated, the subquery remains correlated, and its
execution is cached. As a result execution is fast.
However, when the constant subquery was optimized away, it was neither
cached by the subquery cache nor by the internal subquery caching.
The latter was because the subquery still appeared correlated to the
subselect_XYZ_engine::exec methods, so they re-executed the subquery
on each call to Item_subselect::exec.
Solution:
The solution is to update the correlated status of the subquery after it has
been optimized. This status consists of:
- st_select_lex::is_correlated
- Item_subselect::is_correlated
- SELECT_LEX::uncacheable
- SELECT_LEX_UNIT::uncacheable
The status is updated by st_select_lex::update_correlated_cache() and its
caller st_select_lex::optimize_unflattened_subqueries(). The solution relies
on the fact that the optimizer has already called
st_select_lex::update_used_tables() for each subquery. This makes it possible
to update the correlated status of each subquery efficiently, without walking
the whole subquery tree.
Notice that this patch is an improvement over MySQL 5.6 and older, where
subqueries are not pre-optimized and the above analysis is not possible.
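As a rough illustration (not the actual server code), the post-optimization
update amounts to re-deriving the correlated/uncacheable flags from the
already refreshed used-tables information. The struct and function names
below are hypothetical stand-ins for the members listed above; the real work
is done by st_select_lex::update_correlated_cache().

  #include <cstdio>

  /* Hypothetical stand-ins for st_select_lex / Item_subselect state. */
  struct SubqueryState
  {
    bool is_correlated;   /* mirrors st_select_lex::is_correlated   */
    bool uncacheable;     /* mirrors SELECT_LEX(_UNIT)::uncacheable */
  };

  /* Sketch: once predicates have been removed, the only remaining question
     is whether any outer references survived in the used-tables info. */
  void update_correlated_status(SubqueryState *st, bool outer_refs_remain)
  {
    st->is_correlated= outer_refs_remain;
    if (!outer_refs_remain)
      st->uncacheable= false;   /* non-correlated: evaluate once, cache result */
  }

  int main()
  {
    SubqueryState st= { true, true };
    update_correlated_status(&st, /* outer_refs_remain= */ false);
    std::printf("correlated=%d uncacheable=%d\n", st.is_correlated, st.uncacheable);
    return 0;
  }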
The result of materialization of the right part of an IN subquery predicate
is placed into a temporary table. Each row of the materialized table is
distinct. A unique key over all fields of the temporary table is defined and
created. It allows key look-ups into the table.
The table created for a materialized subquery can be accessed by key like
any other table. The function best_access_path() searches for the best access
method to join a table to a given partial join. With some WHERE conditions
this function considers the possibility of a ref_or_null access. If such an
access employs the unique key on the temporary table, then when estimating
the cost of this access the function tries to use the rec_per_key array. Yet
such an array is not built for this unique key, which causes a server crash.
Rows returned by the subquery that contain NULLs don't have to be placed
into the temporary table, as they cannot match any row produced by the
left part of the subquery predicate. So all fields of the temporary table
can be defined as non-nullable. In this case a ref_or_null access
to the temporary table makes no sense, and neither does estimating
the cost of such an access.
The fix makes sure that the temporary table for a materialized IN subquery
is defined with columns that are all non-nullable. It also ensures that
any row with nulls returned by the subquery is not placed into the
temporary table.
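A minimal sketch of the second part of the fix, with hypothetical container
types standing in for the server's temporary-table machinery: a subquery row
containing a NULL is simply never written into the materialized table, so a
NOT NULL unique key over all columns stays valid.

  #include <optional>
  #include <vector>

  using Row= std::vector<std::optional<int>>;   /* std::nullopt models SQL NULL */

  /* Returns true if the row was stored; rows with any NULL are skipped,
     since they can never match the left part of the IN predicate. */
  bool store_materialized_row(std::vector<Row> *tmp_table, const Row &row)
  {
    for (const auto &col : row)
      if (!col.has_value())
        return false;
    tmp_table->push_back(row);
    return true;
  }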
Fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'.
This cost was only used locally for the 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
the 'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost().
2) Furthermore, to get equal cost for a 'best' optimized query and a
straight_join, the new code subtracted the same '0.001' in
optimize_straight_join() as was already done in
best_extension_by_limited_search().
3) When best_extension_by_limited_search() collected the 'best' plan, a
plan was considered 'best' by the check:
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1)' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
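The sketch below condenses the three corrections into one illustrative
function. TIME_FOR_COMPARE, best_read and the 0.001 epsilon come from the
commit text; everything else (names, the assumed constant value) is invented
for the example and is not the real best_extension_by_limited_search().

  static const double TIME_FOR_COMPARE= 5.0;   /* assumed value, sketch only */

  struct Join { double best_read; };

  /* current_read_time is the cost accumulated so far for the partial plan. */
  void extend_plan(Join *join, double current_record_count,
                   double current_read_time, unsigned search_depth)
  {
    /* (1) charge the CPU cost of scanning the partial result *before* it is
           passed down as the accumulated cost, so a huge fanout early in the
           QEP shows up in the cost of the final QEP */
    current_read_time+= current_record_count / TIME_FOR_COMPARE;

    if (search_depth > 1)
    {
      /* recurse over the remaining tables:
         extend_plan(join, next_record_count, current_read_time, search_depth - 1); */
    }
    else if (current_read_time < join->best_read)   /* (3) no "|| search_depth == 1" */
    {
      join->best_read= current_read_time - 0.001;   /* (2) same epsilon as straight_join */
    }
  }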
Materialized subqueries are now shown in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
The bug happened because in some cases the function JOIN::exec
did not save the value of TABLE::pre_idx_push_select_cond in
TABLE::select->pre_idx_push_select_cond for the sort table.
Noticed and fixed a bug in the function make_cond_remainder
that builds the remainder condition after extraction of an index
pushdown condition from the where condition. The code
erroneously assumed that the function make_cond_for_table left
the value of ICP_COND_USES_INDEX_ONLY in sub-condition markers.
Adjusted many result files from the regression test suite
after this fix.
of the 5.3 code line after a merge with 5.2 on 2010-10-28
in order to never allow the cost of accessing a joined table to be equal
to 0.
Expanded data sets for many test cases to get the same execution plans
as before.
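In isolation the rule is just a clamp; the function name and the epsilon
below are arbitrary placeholders for the illustration, not the constant
actually used in the server.

  #include <algorithm>

  /* Keep any per-table access cost strictly above zero so that cost
     comparisons and multiplications never degenerate. */
  double clamp_table_access_cost(double raw_cost)
  {
    return std::max(raw_cost, 1e-9);
  }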
Reset 'examined_rows_count' in union to not count the same rows twice
mysql-test/r/subselect_mat_cost.result:
Test also slow query logging
mysql-test/t/subselect_mat_cost.test:
Test also slow query logging
sql/sql_union.cc:
Reset 'examined_rows_count' in union to not count the same rows twice
- Set the default
- Adjust the testcases so that 'new' tests are run with optimizations turned on.
- Pull out relevant tests from "irrelevant" tests and run them with optimizations on.
- Run range.test and innodb.test with both mrr=on and mrr=off
- Added regression test with queries over the WORLD database.
- Discovered and fixed several bugs in the related cost calculation
functionality, both in the semijoin and non-semijoin subquery code.
- Added DBUG printing of the cost variables used to decide between
IN-EXISTS and MATERIALIZATION.
Split the tests for MWL#89 into two parts - one for bugs
(currently active), and one for functionality tests
(currently in progress, and thus disabled).
Disable the test for LP BUG#718593.
The patch also adjusts several unstable test results
by ordering the result sets.
Analysis:
The function prev_record_reads() may skip (jump over)
some query plan nodes where record_count < 1. At the
same time, even though get_partial_join_cost() uses
all of the first N plan nodes after the last constant table,
it may produce a smaller record_count than
prev_record_reads(), because the record count for
some plan nodes may be < 1, and these nodes may not
participate in prev_record_reads().
Solution:
The current solution is to treat the result of
get_partial_join_cost() as the upper bound for the
total number of unique lookup keys.
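A sketch of the clamp described above, with hypothetical names standing in
for the two estimates; it is an illustration of the rule, not the server's
code.

  #include <algorithm>

  /* outer_record_count ~ result of get_partial_join_cost(),
     lookup_keys        ~ estimate from prev_record_reads(). */
  double bound_unique_lookup_keys(double lookup_keys, double outer_record_count)
  {
    /* get_partial_join_cost() may multiply in per-node counts < 1, so its
       result is treated only as an upper bound on the distinct lookup keys */
    return std::min(lookup_keys, outer_record_count);
  }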
This patch extends the fix for LP BUG#715027 to cover one
more case of an Item being transformed while its property
Item::with_subselect is not updated, because
Item::quick_fix_field() doesn't recalculate any properties.
Analysis:
Before calling:
write_record= (select->skip_record(thd) > 0);
the function find_all_keys needs to restore the original read/write
sets of the table that is sorted if the condition select->cond
contains a subquery.
This didn't happen in this test case because the flag "with_subselect"
was not set properly for select->cond.
The reason the flag was not set properly was that this condition
was rewritten by add_cond_and_fix() inside make_join_select():
  /* Add conditions added by add_not_null_conds(). */
  if (tab->select_cond)
    add_cond_and_fix(thd, &tmp, tab->select_cond);
However, the function add_cond_and_fix() called the shortcut method
Item::quick_fix_field() that didn't update the "with_subselect"
property.
Solution:
Call the complete Item::fix_fields() to update all Item properties,
including "with_subselect".
Analysis:
The failed assert is a result of calling Item_sum_distinct::clear()
on an incomplete object for which Item_sum_distinct::setup() was
not yet called.
The reason is that JOIN::exec for the outer query calls JOIN::reinit()
for all its subqueries, which in turn calls clear() for all aggregate
functions of the subqueries. The call stack is:
mysql_explain_union -> mysql_select -> JOIN::exec -> select_describe ->
mysql_explain_union -> mysql_select -> JOIN::reinit
This assert doesn't fail in the main 5.3 because constant subqueries
are being executed during the optimize phase of the outer query,
thus the Unique object is created before calling JOIN::exec for the
outer query, and Item_sum_distinct::clear() actually cleans the
Unique object.
Solution:
The best solution is the obvious one - substitute the assert with
a test whether Item_sum_distinct::tree is NULL.
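Sketch of the resulting guard, with placeholder classes in place of
Item_sum_distinct and its Unique tree; it only illustrates the check, not
the server's actual clear() body.

  struct Unique { void reset() {} };      /* placeholder for the real Unique */

  struct DistinctSum                      /* placeholder for Item_sum_distinct */
  {
    Unique *tree= nullptr;                /* still NULL until setup() has run */

    void clear()
    {
      if (!tree)                          /* replaces the failing assert */
        return;
      tree->reset();
    }
  };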
Analysis:
The crash in EXPLAIN resulted from an attempt to print the
name of the internal temporary table created to compute
distinct for the innermost subquery of the test case.
Such tables do not have a corresponding TABLE_LIST (table
reference), hence the crash. The reason for this was that
the subquery was executed as part of constant condition
evaluation before EXPLAIN attempts to print the table name.
During the subquery execution, the subquery JOIN_TAB and
its table are substituted by a temporary table in
make_simple_join.
Solution:
Similar to the analogous case for other Items than the
IS NULL function, do not evaluate expensive constant
conditions.
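A compact illustration of the rule with placeholder types: a constant
condition is folded during optimization only when it is not "expensive",
i.e. does not require executing a subquery; expensive constants are left
for execution time. All names here are invented for the sketch.

  enum FoldResult { FOLD_TRUE, FOLD_FALSE, KEEP_COND };

  struct Cond
  {
    bool constant;
    bool expensive;                 /* e.g. contains a subquery */
    bool value;                     /* what evaluation would return */
    bool evaluate() const { return value; }
  };

  FoldResult fold_constant_cond(const Cond &c)
  {
    if (!c.constant || c.expensive) /* never execute expensive constants here */
      return KEEP_COND;
    return c.evaluate() ? FOLD_TRUE : FOLD_FALSE;
  }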
Fixed LP BUG#714808 Assertion `outer_lookup_keys <= outer_record_count'
Analysis:
The function best_access_path() computes the number of records as
follows:
  ...
  if (rec < MATCHING_ROWS_IN_OTHER_TABLE)
    rec= MATCHING_ROWS_IN_OTHER_TABLE;      // Fix for small tables
  ...
  if (table->quick_keys.is_set(key))
    records= (double) table->quick_rows[key];
  else
  {
    /* quick_range couldn't use key! */
    records= (double) s->records/rec;
  }
Above, MATCHING_ROWS_IN_OTHER_TABLE == 10 and s->records == 1,
thus we get an estimate of 0.1 records. As a result JOIN::get_partial_join_cost()
for the outer query computes outer_record_count == 0.1 records, which is
meaningless in this context.
Solution:
Round row count estimates that are < 1 to 1.
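The rounding itself is a one-line clamp; the function name is a placeholder
for the sketch, not the server's code.

  /* Row estimates below one row are meaningless for the outer cost
     formulas, so they are rounded up to a single row. */
  double sanitize_record_estimate(double records)
  {
    return records < 1.0 ? 1.0 : records;
  }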
The fix for #643424 was part of the fix for #652727, which is why both
fixes are pushed together.
- The cause of #643424 was the improper use of get_partial_join_cost(),
which assumed that the 'n_tables' parameter was the upper bound for
query plan node indexes.
Fixed by generalizing get_partial_join_cost() as a method that computes
the cost of any partial join.
- The cause of #652727 was that JOIN::choose_subquery_plan() incorrectly
deleted the contents of the old keyuse array in the cases when an injected
plan would not provide more key accesses, and reoptimization was not actually
performed.
- Added more tests to the MWL#89 specific test, and made the test more modular.
- Updated test files.
- Fixed a memory leak.
- More comments.
mysql-test/r/subselect_mat.result:
- Updated the test file to reflect the new optimizer switches related to
materialized subquery execution.
- Added one extra test to test all cases that expose BUG#40037 (this is an old bug from 5.x).
- Updated the test result with correct results that expose BUG#40037.
mysql-test/t/subselect_mat.test:
- Updated the test file to reflect the new optimizer switches related to
materialized subquery execution.
- Added one extra test to test all cases that expose BUG#40037 (this is an old bug from 5.x).
- Updated the test result with correct results that expose BUG#40037.
sql/sql_select.cc:
Fixed a memory leak reported by Valgrind.
- Changed the default optimizer switches to provide 5.1/5.2 compatible behavior
- Added a regression test file to test consistently all cases covered by MWL#89
- Added/corrected/improved comments.