The wrong result set returned by the left join query from
the bug test case happened due to several inconsistencies
and bugs of the legacy mysql code.
The bug test case uses an execution plan that employs a scan
of a materialized IN subquery from the WHERE condition.
When materializing such an IN- subquery the optimizer injects
additional equalities into the WHERE clause. These equalities
express the constraints imposed by the subquery predicate.
The injected equality of the query in the test case happens
to belong to the same equality class, and a new equality
imposing a condition on the rows of the materialized subquery
is inferred from this class. Simultaneously the multiple
equality is added to the ON expression of the LEFT JOIN
used in the main query.
The inferred equality of the form f1=f2 is taken into account
when optimizing the scan of the rows the temporary table
that is the result of the subquery materialization: only the
values of the field f1 are read from the table into the record
buffer. Meanwhile the inferred equality is removed from the
WHERE conditions altogether as a constraint on the fields
of the temporary table that has been used when filling this table.
This equality is supposed to be removed from the ON expression
when the multiple equalities of the ON expression are converted
into an optimal set of equality predicates. It supposed to be
removed from the ON expression as an equality inferred from only
equalities of the WHERE condition. Yet, it did not happened
due to the following bug in the code.
Erroneously the code tried to build multiple equality for ON
expression twice: the first time, when it called optimize_cond()
for the WHERE condition, the second time, when it called
this function for the HAVING condition. When executing
optimize_con() for the WHERE condition a reference
to the multiple equality of the WHERE condition is set
in the multiple equality of the ON expression. This reference
would allow later to convert multiple equalities of the
ON expression into equality predicates. However the
the second call of build_equal_items() for the ON expression
that happened when optimize_cond() was called for the
HAVING condition reset this reference to NULL.
This bug fix blocks calling build_equal_items() for ON
expressions for the second time. In general, it will be
beneficial for many queries as it removes from ON
expressions any equalities that are to be checked for the
WHERE condition.
The patch also fixes two bugs in the list manipulation
operations and a bug in the function
substitute_for_best_equal_field() that resulted
in passing wrong reference to the multiple equalities
of where conditions when processing multiple
equalities of ON expressions.
The code of substitute_for_best_equal_field() and
the code the helper function eliminate_item_equal()
were also streamlined and cleaned up.
Now the conversion of the multiple equalities into
an optimal set of equality predicates first produces
the sequence of the all equalities processing multiple
equalities one by one, and, only after this, it inserts
the equalities at the beginning of the other conditions.
The multiple changes in the output of EXPLAIN
EXTENDED are mainly the result of this streamlining,
but in some cases is the result of the removal of
unneeded equalities from ON expressions. In
some test cases this removal were reflected in the
output of EXPLAIN resulted in disappearance of
“Using where” in some rows of the execution plans.
reached by fix_fields() (via reference) before row which it belongs to (on the second execution)
and fix_field for row did not follow usual protocol for Items with argument
(first check that the item fixed then call fix_fields).
Item_row::fix_field fixed.
The bug could lead to a wrong estimate of the number of expected rows
in the output of the EXPLAIN commands for queries with GROUP BY.
This could be observed in the test case for LP bug 934348.
If the setting of system variables does not allow to use join buffer
for a join query with GROUP BY <f1,...> / ORDER BY <f1,...> then
filesort is not needed if the first joined table is scanned in
the order compatible with order specified by the list <f1,...>.
- In JOIN::exec(), make the having->update_used_tables() call before we've
made the JOIN::cleanup(full=true) call. The latter frees SJ-Materialization
structures, which correlated subquery predicate items attempt to walk afterwards.
- Let fix_semijoin_strategies_for_picked_join_order() set
POSITION::prefix_record_count for POSITION records that it copies from
SJ_MATERIALIZATION_INFO::tables.
(These records do not have prefix_record_count set, because they are optimized
as joins-inside-semijoin-nests, without full advance_sj_state() processing).
Part#1: make EXPLAIN's plan match the one by actual execution:
Item_subselect::used_tables() should return the same value irrespectively
of whether we're running an EXPLAIN or a SELECT.
- When doing join optimization, pre-sort the tables so that they mimic the execution
order we've had with 'semijoin=off'.
- That way, we will not get regressions when there are two query plans (the old and the
new) that have indentical costs but different execution times (because of factors that
the optimizer was not able to take into account).
- The problem was that convert_subq_to_jtbm() attached the semi-join
TABLE_LIST object into the wrong list: they used to attach it to the
end of parent_lex->leaf_tables.head()->next_local->...->next_local.
This was apparently inccorect, as one can construct an example where
JTBM nest is attached to a table that is inside some mergeable VIEW, which
breaks (causes crash) for name resolution on the subsequent statement
re-execution.
- Solution: Attach to the "right" list. The "wording" was copied from
st_select_lex::handle_derived.
The result of materialization of the right part of an IN subquery predicate
is placed into a temporary table. Each row of the materialized table is
distinct. A unique key over all fields of the temporary table is defined and
created. It allows to perform key look-ups into the table.
The table created for a materialized subquery can be accessed by key as
any other table. The function best_access-path search for the best access
to join a table to a given partial join. With some where conditions this
function considers a possibility of a ref_or_null access. If such access
employs the unique key on the temporary table then when estimating
the cost this access the function tries to use the array rec_per_key. Yet,
such array is not built for this unique key. This causes a crash of the server.
Rows returned by the subquery that contain nulls don't have to be placed
into temporary table, as they cannot be match any row produced by the
left part of the subquery predicate. So all fields of the temporary table
can be defined as non-nullable. In this case any ref_or_null access
to the temporary table does not make any sense and it does not make sense
to estimate such an access.
The fix makes sure that the temporary table for a materialized IN subquery
is defined with columns that are all non-nullable. The also ensures that
any row with nulls returned by the subquery is not placed into the
temporary table.
This bug is the result of an incomplete/inconsistent change introduced into
5.3 code when the cond_equal parameter were added to the function optimize_cond.
The change was made during a merge from 5.2 in October 2010.
The bug could affect only queries with HAVING.
An outer join query with a semi-join subquery could return a wrong result
if the optimizer chose to materialize the subquery.
It happened because when substituting for the best field into a ref item
used to build access keys not all COND_EQUAL objects that could be employed
at substitution were checked.
Also refined some code in the function check_join_cache_usage to make it
safer.
- If LooseScan is used with quick select, require that quick select produces
data in key order (this disables use of MRR, which can return data in arbitrary order).
not be reproduced in the latest release of mariadb-5.3 as it was was fixed
by Sergey Petrunia when working on the problems concerning outer joins within
in subqueries converted to semi-joins.
- Disable use of join cache when we're using FirstMatch strategy, and the join
order is such that subquery's inner tables are interleaved with outer. Join
buffering code is incapable of handling such join orders.
- The testcase requires use of @@debug_optimizer_prefer_join_prefix to hit the bug,
but I'm pushing it anyway (including the mention of the variable in .test file),
so that it can be found and enabled when/if we get something comparable in the
main tree.
The problem was that LooseScan execution code assumed that tab->key holds
the index used for looseScan. This is only true when range or full index
scan are used. In case of ref access, the index is in tab->ref.key (and
tab->index==0 which explains how LooseScan passed tests with ref access: they
used one index)
Fixed by setting/using loosescan_key, which always the correct index#.
of mysql-5.6 code line. The bugs could not be reproduced in the latest release
of mariadb-5.3 as they were fixed either when the code of subquery optimization
was back-ported from mysql-6.0 or later when some other bugs were fixed.
of mysql-5.6 code line. The bugs could not be reproduced in the latest release
of mariadb-5.3 as they were fixed either when the code of subquery optimization
was back-ported from mysql-6.0 or later when some other bugs were fixed.
- equality substitution code was geared towards processing WHERE/ON clauses.
that is, it assumed that it was doing substitions on the code that
= wasn't attached to any particular join_tab yet
= was going to be fed to make_join_select() which would take the condition
apart and attach various parts of it to tables inside/outside semi-joins.
- However, somebody added equality substition for ref access. That is, if
we have a ref access on TBL.key=expr, they would do equality substition in
'expr'. This possibility wasn't accounted for.
- Fixed equality substition code by adding a mode that does equality
substition under assumption that the processed expression will be
attached to a certain particular table TBL.
- Create/use do_copy_nullable_row_to_notnull() function for ref access, which is used
when copying from not-NULL field in table that can be NULL-complemented to not-NULL field.
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
- Correctly handle plan refinement stage for LooseScan plans: run create_ref_for_key() if LooseScan
plan includes a ref access, and if we don't have any fixed key components, switch to a full index scan.
The function setup_sj_materialization_part1() forgot to set the value
of TABLE::map for any materialized IN subquery.
This could lead to wrong results for queries with subqueries that were
converted to queries with semijoins.
- if we're considering FirstMatch access with one inner table, and
@@optimizer_switch has semijoin_with_cache flag, calculate costs
as if we used join cache (because we will be able to do so)
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
- Make EXPLAIN display "Start temporary" at the start of the fanout (it used to display
at the first table whose rowid gets into temp. table which is not that useful for
the user)
- Updated test results (all checked)
This bug in the function Loose_scan_opt::check_ref_access_part1 could lead
to choosing an invalid execution plan employing a loose scan access to a
semi-join table even in the cases when such access could not be used at all.
This could result in wrong answers for some queries with IN subqueries.
This bug in the function setup_semijoin_dups_elimination() could
lead to invalid choice of the sequence of tables for which semi-join
duplicate elimination was applied.
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.
- when create_ref_for_key() is constructing a ref access for
a table that's inside a SJ-Materialization nest, it may not
use references to fields of tables that are unside the nest (because
these fields will not yet have values when ref access will be used)
The check was performed in the first of create_ref_for_key's loops (the
one which counts how many key parts are usable) but not in the second
(the one which actually fills the TABLE_REF structure).