- If LooseScan is used with quick select, require that quick select produces
data in key order (this disables use of MRR, which can return data in arbitrary order).
- Disable use of join cache when we're using FirstMatch strategy, and the join
order is such that subquery's inner tables are interleaved with outer. Join
buffering code is incapable of handling such join orders.
- The testcase requires use of @@debug_optimizer_prefer_join_prefix to hit the bug,
but I'm pushing it anyway (including the mention of the variable in .test file),
so that it can be found and enabled when/if we get something comparable in the
main tree.
The problem was that LooseScan execution code assumed that tab->key holds
the index used for looseScan. This is only true when range or full index
scan are used. In case of ref access, the index is in tab->ref.key (and
tab->index==0 which explains how LooseScan passed tests with ref access: they
used one index)
Fixed by setting/using loosescan_key, which always the correct index#.
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
- Let JTBM optimization code handle the case where the subquery is degenerate and doesn't have a
join query plan. Regular materialization would fall back to IN->EXISTS for such cases. Semi-Join
materialization does not have such option, instead we introduce and use "constant JTBM join tabs".
If the duplicate elimination strategy is used for a semi-join and potentially
one of the block-based join algorithms can be employed to join the inner
tables of the semi-join then sorting of the head (first non-constant) table
for a query with ORDER BY / GROUP BY cannot be used.
The function setup_sj_materialization_part1() forgot to set the value
of TABLE::map for any materialized IN subquery.
This could lead to wrong results for queries with subqueries that were
converted to queries with semijoins.
- if we're considering FirstMatch access with one inner table, and
@@optimizer_switch has semijoin_with_cache flag, calculate costs
as if we used join cache (because we will be able to do so)
- Part 1 of the fix: for semi-join merged subqueries, calling child_join->optimize() until we're done with all
PS-lifetime optimizations in the parent.
- Make EXPLAIN display "Start temporary" at the start of the fanout (it used to display
at the first table whose rowid gets into temp. table which is not that useful for
the user)
- Updated test results (all checked)
The patch also fixes an unrelated compiler warning.
Analysis:
The temporary table created during SJ-materialization
might be used for sorting for a group by operation. The
sort buffers for this internal temporary table were not
cleared by the execution code after each subquery
re-execution. This resulted in a memory leak detected
by valgrind.
Solution:
Cleanup the sort buffers for the semijon tables as well.
sql/item_subselect.cc:
- Fix a compiler warning and add logic to revert to table
scan partial match when there are more rows in the materialized
subquery than there can be bits in the NULL bitmap index used
for partial matching.
sql/opt_subselect.cc:
- Fixed a memory leak detected by valgrind
Stop attempts to apply IN/ALL/ANY optimizations to so called "fake_select"
(used for ordering and filtering results of union) in union subquery execution.
- Break down POSITION/advance_sj_state() into four classes
representing potential semi-join strategies.
- Treat all strategies uniformly (before, DuplicateWeedout
was special as it was the catch-all strategy. Now, we're
still relying on it to be the catch-all, but are able to
function,e.g. with firstmatch=on,duplicate_weedout=off.
- Update test results (checked)
This bug in the function setup_semijoin_dups_elimination() could
lead to invalid choice of the sequence of tables for which semi-join
duplicate elimination was applied.
The function setup_semijoin_dups_elimination erroneously assumed
that if join_cache_level is set to 3 or 4 then the type of the
access to a table cannot be JT_REF or JT_EQ_REF. This could lead
to wrong query result sets.
In MariaDB, when running in ONLY_FULL_GROUP_BY mode,
the server produced in incorrect error message that there
is an aggregate function without GROUP BY, for artificially
created MIN/MAX functions during subquery MIN/MAX optimization.
The fix introduces a way to distinguish between artifially
created MIN/MAX functions as a result of a rewrite, and normal
ones present in the query. The test for ONLY_FULL_GROUP_BY violation
now tests in addition if a MIN/MAX function was part of a MIN/MAX
subquery rewrite.
In order to be able to distinguish these MIN/MAX functions, the
patch introduces an additional flag in Item_in_subselect::in_strategy -
SUBS_STRATEGY_CHOSEN. This flag is set when the optimizer makes its
final choice of a subuqery strategy. In order to make the choice
consistent, access to Item_in_subselect::in_strategy is provided
via new class methods.
******
Fix MySQL BUG#12329653
In MariaDB, when running in ONLY_FULL_GROUP_BY mode,
the server produced in incorrect error message that there
is an aggregate function without GROUP BY, for artificially
created MIN/MAX functions during subquery MIN/MAX optimization.
The fix introduces a way to distinguish between artifially
created MIN/MAX functions as a result of a rewrite, and normal
ones present in the query. The test for ONLY_FULL_GROUP_BY violation
now tests in addition if a MIN/MAX function was part of a MIN/MAX
subquery rewrite.
In order to be able to distinguish these MIN/MAX functions, the
patch introduces an additional flag in Item_in_subselect::in_strategy -
SUBS_STRATEGY_CHOSEN. This flag is set when the optimizer makes its
final choice of a subuqery strategy. In order to make the choice
consistent, access to Item_in_subselect::in_strategy is provided
via new class methods.
- convert_subq_to_jtbm() didn't check that subuqery optimization was successful. If it wasn't (in this
example because of @@max_join_size violation), it would proceed further and eventually crash when
trying to execute the un-optimized subquery.
- The problem was that JOIN::save/restore_query_plan() did not save/restore parts of
the query plan that are located inside SJ_MATERIALIZATION_INFO structures. This could
cause parts of one plan to be used with another, which led get_best_combination() to
constructing non-sensical join plans (and crash).
Fixed by saving/restoring SJM parts of the query plans.
- check_and_do_in_subquery_rewrites() will not set SUBS_MATERIALIZATION flag when it
records that the subquery predicate is to be converted into semi-join.
If convert_join_subqueries_to_semijoins() later decides not to convert to semi-join,
let it set SUBS_MATERIALIZATION flag, if appropriate.
- If convert_join_subqueries_to_semijoins() decides to wrap Item_in_subselect in Item_in_optimizer,
it should do so in prep_on_expr/prep_where, too, as long as they are present.
There seems to be two possibilities of how we arrive in this function:
- prep_on_expr/prep_where==NULL, and will be set later by simplify_joins()
- prep_on_expr/prep_where!=NULL, and it is a copy_and_or_structure()-made copy of on_expr/where.
the latter can happen for some (but not all!) nested joins. This bug was that we didn't handle this case.
- Make subquery_types_allow_materialization() detect a case where
create_tmp_table() would create a blob column which would make it
impossible to use materialization
Non-semi-join materialization worked because it detected that this case
and felt back to use IN->EXISTS. Semi-join Materialization cannot easily
fallback, so we have to detect this case early.
- setup_sj_materialization() code failed to take into account that it can be that
the first [in join order ordering] table inside semi-join-materialization nest
is also an inner table wrt an outer join (that is embedded in the semi-join).
This can happen when all of the tables that are inside the semi-join but not inside
the outer join are constant.
- Made a trivial to not assume that table's embedding join nest is the semi-join
nest: instead, walk up the outer join nests until we reach the semi-join nest.
Analysis:
In the test query semi-join merges the inner-most subquery
into the outer subquery, and the optimization of the merged
subquery finds some new index access methods. Later the
IN-EXISTS transformation is applied to the unmerged subquery.
Since the optimizer is instructed to not consider
materialization, it reoptimizes the plan in-place to take into
account the new IN-EXISTS conditions. Just before reoptimization
JOIN::choose_subquery_plan resets the query plan, which also
resets the access methods found during the semi-join merge.
Then reoptimization discovers there are no new access methods,
but it leaves the query plan in its reset state. Later semi-join
crashes because it assumes these access methods are present.
Solution:
When reoptimizing in-place, reset the query plan only after new
access methods were discovered. If no new access methods were
discovered, leave the current plan as it was.
- The problem was that the code that made the check whether the subquery is an AND-part of the WHERE
clause didn't work correctly for nested subqueries. In particular, grand-child subquery in HAVING was
treated as if it was in the WHERE, which eventually caused an assert when replace_where_subcondition
looked for the subquery predicate in the WHERE and couldn't find it there.
- The fix: Removed implementation of "thd_marker approach". thd->thd_marker was used to determine the
location of subquery predicate: setup_conds() would set accordingly it when making the
{where|on_expr}->fix_fields(...)
call so that AND-parts of the WHERE/ON clauses can determine they are the AND-parts.
Item_cond_or::fix_fields(), Item_func::fix_fields(), Item_subselect::fix_fields (this one was missed),
and all other items-that-contain-items had to reset thd->thd_marker before calling fix_fields() for
their children items, so that the children can see they are not AND-parts of WHERE/ON.
- The "thd_marker approach" required that a lot of code in different locations maintains correct value of
thd->thd_marker, so it was replaced with:
- The new approach with mark_as_condition_AND_part does not keep context in thd->thd_marker. Instead,
setup_conds() now calls
{where|on_expr}->mark_as_condition_AND_part()
and implementations of that function make sure that:
- parts of AND-expressions get the mark_as_condition_AND_part() call
- Item_in_subselect objects record that they are AND-parts of WHERE/ON