The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare()
- Let JTBM optimization code handle the case where the subquery is degenerate and doesn't have a
join query plan. Regular materialization would fall back to IN->EXISTS for such cases. Semi-Join
materialization does not have such option, instead we introduce and use "constant JTBM join tabs".
- Do a "more thorough" cleanup of SJ-Materialization join tab in JOIN_TAB::cleanup. The bug
was due to the fact that JOIN_TAB::cleanup() may be called multiple times for the same tab
if the join has grouping.
* rename all debugging related command-line options
and variables to start from "debug-", and made them all
OFF by default.
* replace "MySQL" with "MariaDB" in error messages
* "Cast ... converted ... integer to it's ... complement"
is now a note, not a warning
* @@query_cache_strip_comments now has a session scope,
not global.
- Part 1 of the fix: for semi-join merged subqueries, calling child_join->optimize() until we're done with all
PS-lifetime optimizations in the parent.
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.
- The problem was that JOIN::save/restore_query_plan() did not save/restore parts of
the query plan that are located inside SJ_MATERIALIZATION_INFO structures. This could
cause parts of one plan to be used with another, which led get_best_combination() to
constructing non-sensical join plans (and crash).
Fixed by saving/restoring SJM parts of the query plans.
- check_and_do_in_subquery_rewrites() will not set SUBS_MATERIALIZATION flag when it
records that the subquery predicate is to be converted into semi-join.
If convert_join_subqueries_to_semijoins() later decides not to convert to semi-join,
let it set SUBS_MATERIALIZATION flag, if appropriate.
- are_tables_local() failed to recognize the fact that OUTER_REF_TABLE_BIT is ok
for SJ-Materialization. This caused zero-length ref access to be constructed, which
led to an assert.
- Item_in_subselect::inject_in_to_exists_cond() should not call
((Item_cond*)join->conds)->argument_list()->concat(join->cond_equal->current_level)
as that makes two lists share their tail, and the cond_equal list
will end up containing non-Item_equal objects when substitute_for_best_equal_field()
walks through join->conds and replaces all Item_equal objects with Item_func_eq objects.
- So, instead of using List::concat(), manually copy entries from one list to another.
- setup_sj_materialization() code failed to take into account that it can be that
the first [in join order ordering] table inside semi-join-materialization nest
is also an inner table wrt an outer join (that is embedded in the semi-join).
This can happen when all of the tables that are inside the semi-join but not inside
the outer join are constant.
- Made a trivial to not assume that table's embedding join nest is the semi-join
nest: instead, walk up the outer join nests until we reach the semi-join nest.
Analysis:
Partial matching is used even when there are no NULLs in
a materialized subquery, as long as the left NOT IN operand
may contain NULL values.
This case was not handled correctly in two different places.
First, the implementation of parital matching did not clear
the set of matching columns when the merge process advanced
to the next row.
Second, there is no need to perform partial matching at all
when the left operand has no NULLs.
Solution:
First fix subselect_rowid_merge_engine::partial_match() to
properly cleanup the bitmap of matching keys when advancing
to the next row.
Second, change subselect_partial_match_engine::exec() so
that when the materialized subquery doesn't contain any
NULLs, and the left operand of [NOT] IN doesn't contain
NULLs either, the method returns without doing any
unnecessary partial matching. The correct result in this
case is in Item::in_value.
Identified all test cases in the MySQL file subquery_mat.inc that are
not present in MariaDB. In total found 8 test cases for the following
MySQL bugs:
* BUG#49630 - not a bug in MariaDB, added test case
* BUG#52538 - not a bug in MariaDB, added test case (checked with VG)
* BUG#53103 - not a bug in MariaDB, added test case
* BUG#54511 - not a bug in MariaDB, added test case
* BUG#56367 - not a bug in MariaDB, added test case
* BUG#59833 - not a bug in MariaDB, added test case
* BUG#11852644 - not a bug in MariaDB, added test case
* BUG#12668294 - not a bug in MariaDB, added test case
All of these MySQL bugs are not present in MariaDB 5.3.
The comparison was based on the following version of
mysql-trunk:
revno: 3350 [merge]
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: mysql-trunk
timestamp: Mon 2011-08-08 12:42:09 +0300
message:
Merge mysql-5.5 to mysql-trunk.
The function matching_cond should take into account that
there may be always false constant conjunctive conditions
that has not been evaluated yet,for example, conjunctive
conditions with non-correlated subqueries.
- Set the default
- Adjust the testcases so that 'new' tests are run with optimizations turned on.
- Pull out relevant tests from "irrelevant" tests and run them with optimizations on.
- Run range.test and innodb.test with both mrr=on and mrr=off
semijoin=on,firstmatch=on,loosescan=on
to
semijoin=off,firstmatch=off,loosescan=off
Adjust the testcases:
- Modify subselect*.test and join_cache.test so that all tests
use the same execution paths as before (i.e. optimizations that
are being tested are enabled)
- Let all other test files run with the new default settings (i.e.
with new optimizations disabled)
- Copy subquery testcases from these files into t/subselect_extra.test
which will run them with new optimizations enabled.
Resolved all conflicts, bad merges and fixed a few minor bugs in the code.
Commented out the queries from multi_update, view, subselect_sj, func_str,
derived_view, view_grant that failed either with crashes in ps-protocol or
with wrong results.
The failures are clear indications of some bugs in the code and these bugs
are to be fixed.
Analysis:
During optimization of the subquery, in the call chain:
update_ref_and_keys -> add_key_fields ->
merge_key_fields -> Item_direct_ref::is_null -> Item_cache::is_null
The call to Item_cache::is_null() returns TRUE, which is wrong.
This results in Item_null replacing the field 'f3' in the KEY_FIELD,
then this Item_null is used for index access, producing a wrong result.
The reason why Item_cache::is_null returns wrong result is that
this Item_cache object is a cache of the left operand of IN, and was
updated in Item_in_optimizer::val_int. In MWL#89 the latter method is
called during the execution phase, which is after we optimize the subquery.
Therefore during the optization phase the left operand cache of IN was
not updated.
Solution:
Update the left operand cache during optimization if it is a constant.
This bug fix also discoveres and fixes a wrong IF statement in
convert_constant_item().
Analysis:
The wrong result is a consquence of sorting the subquery
result and then selecting only the first row due to the
artificial LIMIT 1 introduced by the fix_fields phase.
Normally, if there is an ORDER BY in a subquery, the ORDER
is removed (Item_in_subselect::select_in_like_transformer),
however if a GROUP BY is transformed into ORDER, this happens
later, after the removal of the ORDER clause of subqueries, so
we end up with a subquery with an ORDER clause, and an artificially
added LIMIT 1.
The reason why the same works in the main 5.3 without MWL#89, is
that the 5.3 performs all subquery transformations, including
IN->EXISTS before JOIN::optimize(). The beginning of JOIN::optimize
does:
if (having || (select_options & OPTION_FOUND_ROWS))
select_limit= HA_POS_ERROR;
which sets the limit back to infinity, thus 5.3 sorts the whole
subquery result, and IN performs the lookup into all subquery result
rows.
Solution:
Sorting of subqueries without LIMIT is meaningless. Since LIMIT in
subqueries is not supported, the patch removes sorting by setting
join->skip_sort_order= true
for each subquery JOIN object. This improves a number of execution
plans to not perform unnecessary sorting at all.
- "Using MRR" is no longer shown with range access.
- Instead, both range and BKA accesses will show one of the following:
= "Rowid-ordered scan"
= "Key-ordered scan"
= "Key-ordered Rowid-ordered scan"
depending on whether DS-MRR implementation will do scan keys in order, rowids in order,
or both.
- The patch also introduces a way for other storage engines/MRR implementations to
pass information to EXPLAIN output about the properties of employed MRR scans.
- Auto-merge with 5.3 main.
- Changed the test for LP BUG#719198 so that
an two more queries were added, and removed a
query that produces a wrong result due to an
unrelated problem. The wrong result is submitted
as a separate bug.
Analysis (BUG#719198):
The assert failed because the execution code for
partial matching is designed with the assumption that
NULLs on the left side are detected as early as possible,
and a NULL result is returned before any lookups are
performed at all.
However, in the case of an Item_cache object on the left
side, null was not detected properly, because detection
was done via Item::is_null(), which is not implemented at
all for Item_cache, and resolved to the default Item::is_null()
which always returns FALSE.
Solution:
Imlpement Item::is_null().
******
Analysis (BUG#730604):
The method Item_field::is_null() determines if an item is NULL from its
Item_field::field object. However, for Item_fields that represent internal
temporary tables, Item_field::field represents the field of the original
table that was the source for the temporary table (in this case t1.f3).
Both in the committed test case, and in the original bug report the current
value of t1.f3 is not NULL. This results in an incorrect count of NULLs
for this column. As a consequence, all related Ordered_key buffers are
allocated with incorrect sizes. Depending on the exact query and data,
these incorrect sizes result in various crashes or failed asserts.
Solution:
The correct value of the current field of the internal temp table is
in Item_field::result_field. This value is determined by
Item::is_null_result().
- In join buffering code, call join_tab_execution_startup() (#1) before we call join_tab_scan->open() (#2).
This is important with SJ-Materialization because #1 fills the materialized table, while
#2 will actually try to read the first row. Attempt to read the first row before we have
populated the materialized table would cause zero rows to be returned when actually there were matches.