- if we're considering FirstMatch access with one inner table, and
@@optimizer_switch has semijoin_with_cache flag, calculate costs
as if we used join cache (because we will be able to do so)
The execution plan cannot use sorting on the first table from the
sequence of the joined tables if it plans to employ the block-based
hash join algorithm.
- Part 1 of the fix: for semi-join merged subqueries, calling child_join->optimize() until we're done with all
PS-lifetime optimizations in the parent.
- Make create_tmp_table() set KEY_PART_INFO attributes for the keys it creates.
This wasn't needed before but is needed now, when temp. tables that are
results of SJ-Materialization are being used for joins.
This particular bug depended on HA_VAR_LENGTH_PART being set,
but also added code to set HA_BLOB_PART and HA_NULL_PART when appropriate.
KEYUSE elements for a possible hash join key are not sorted by field
numbers of the second table T of the hash join operation. Besides
some of these KEYUSE elements cannot be used to build any key as their
key expressions depend on the tables that are planned to be accessed
after the table T.
The code before the patch did not take this into account and, as a result,
execition of a query the employing block-based hash join algorithm could
cause a crash or return a wrong result set.
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
If has been decided that the first match strategy is to be used to join table T
from a semi-join nest while no buffer can be employed to join this table
then no join buffer can be used to join any table in the join sequence between
the first one belonging to the semi-join nest and table T.
The tables from the same semi-join or outer join nest cannot use
join buffers if in the join sequence of the query execution plan
they are separated by a table that is planned to be joined without
usage of a join buffer.
The cause of the wrong result was that Item_ref_null_helper::get_date()
didn't use a method of the *_result() family, and fetched the data
for the field from the current row instead of result_field. Changed to
use the correct *_result() method, like to all other similar methods
of Item_ref_null_helper.
Analysis:
lp:894397 was a consequence of a prior incorrect fix of lp:833777
which didn't take into account that even when all tables are
constant there may be correlated conditions, and the where clause
is not equivalent to the constant conditions.
Solution:
When there are constant tables only, evaluate only the conditions
that reference outer fields, because the constant conditions are
already checked, and the where clause doesn't have other conditions
than constant ones, and outer referencing ones. The fix for
lp:894397 also fixes lp:833777.
The problem was that when we have single row subquery with no rows
Item_cache(es) which represent result row was not null and being
requested via element_index() returned random value.
The fix is setting all Item_cache(es) in NULL before executing the
query (reset() method) which guaranty NULL value of whole query
or its elements requested in any way if no rows was found.
set_null() method was added to Item_cache to guaranty correct NULL
value in case of reseting the cache.
- Make EXPLAIN display "Start temporary" at the start of the fanout (it used to display
at the first table whose rowid gets into temp. table which is not that useful for
the user)
- Updated test results (all checked)
Stop attempts to apply IN/ALL/ANY optimizations to so called "fake_select"
(used for ordering and filtering results of union) in union subquery execution.
Analysis:
The bug is a result of an incomplete fix for bug lp:869036.
That fix didn't take into account that there may be a case
when ther are no NULLs in the materialized subquery, however
all columns without NULLs may not be grouped in the only
non-null index. This is the case when the left subquery expression
has nullable columns.
Solution:
The patch handles two missing sub-cases of the case when there are
no value (non-null matches) for any outer expression, and there are
both NULLs and non-NUll values in the outer reference.
a) If the materialized subquery contains no NULLs there cannot be a
partial match, because there are no NULLs in those columns where
the outer reference has no NULLs.
b) If the materialized subquery contains NULLs, but there exists a
column, such that its corresponding outer expression has no NULL,
and this column also has no NULL. Then there cannot be a partial
match either.
- Break down POSITION/advance_sj_state() into four classes
representing potential semi-join strategies.
- Treat all strategies uniformly (before, DuplicateWeedout
was special as it was the catch-all strategy. Now, we're
still relying on it to be the catch-all, but are able to
function,e.g. with firstmatch=on,duplicate_weedout=off.
- Update test results (checked)
This bug in the function Loose_scan_opt::check_ref_access_part1 could lead
to choosing an invalid execution plan employing a loose scan access to a
semi-join table even in the cases when such access could not be used at all.
This could result in wrong answers for some queries with IN subqueries.
Analysis:
The optimizer distinguishes two kinds of 'constant' conditions:
expensive ones, and non-expensive ones. The non-expensive conditions
are evaluated inside make_join_select(), and if false, already the
optimizer detects empty query results.
In order to avoid arbitrarily expensive optimization, the evaluation of
expensive constant conditions is delayed until execution. These conditions
are attached to JOIN::exec_const_cond and evaluated in the beginning of
JOIN::exec. The relevant execution logic is:
JOIN::exec()
{
if (! join->exec_const_cond->val_int())
{
produce an empty result;
stop execution
}
continue execution
execute the original WHERE clause (that contains exec_const_cond)
...
}
As a result, when an expensive constant condition is
TRUE, it is evaluated twice - once through
JOIN::exec_const_cond, and once through JOIN::cond.
When the expensive constant condition is a subquery,
predicate, the subquery is evaluated twice. If we have
many levels of subqueries, this logic results in a chain
of recursive subquery executions that walk a perfect
binary tree. The result is that for subquries with depth N,
JOIN::exec is executed O(2^N) times.
Solution:
Notice that the second execution of the constant conditions
happens inside do_select(), in the branch:
if (join->table_count == join->const_tables) { ... }
In this case exec_const_cond is equivalent to the whole WHERE
clause, therefore the WHERE clause has already been checked in
the beginnig of JOIN::exec, and has been found to be true.
The bug is addressed by not evaluating the WHERE clause if there
was exec_const_conds, and it was TRUE.
A non-first execution of a prepared statement missed a call of the
TABLE_LIST::process_index_hints() method in the code of the function
setup_tables().
At some scenarios this could lead to the choice of a quite inefficient
execution plan for the base query of the prepared statement.
This bug in the function setup_semijoin_dups_elimination() could
lead to invalid choice of the sequence of tables for which semi-join
duplicate elimination was applied.
Due to this bug the function SEL_IMERGE::or_sel_tree_with_checks()
could build an inconsistent merge tree if one of the SEL_TREEs in the
resulting index merge happened to contain a full key range.
This could trigger an assertion failure.
The function key_and() erroneously called SEL_ARG::increment_use_count()
when SEL_ARG::incr_refs() should had been called. This could lead to
wrong values of use_count for some SEL_ARG trees.
Apart from the fix, the patch also adds few more unrelated test
cases for partial matching, and fixes few typos.
Analysis:
This bug uncovered that partial matching via rowid intersection
didn't handle the case when:
- the left IN argument has some NULLs,
- there are no non-null value matches, and there is no non-null
column,
- the subquery columns that are not covered with the NULLs in
the left IN argument contain at least one row, such that it
has NULL values in all columns where the left IN operand has
no NULLs.
In this case there is a partial match.
In addition the analysis of the related code uncovered incorrect
handling of few other related cases.
Solution:
The solution for the bug is to check if there exists a row with
NULLs in all columns other than the ones having NULL in the
let IN operand.
The check is implemented via checking whether the bitmaps that
store NULL information in class Ordered_key have a non-empty
intersection for the relevant columns.
The intersection itself is implemented via the function
bitmap_exists_intersection() in my_bitmap.c.
The function setup_semijoin_dups_elimination erroneously assumed
that if join_cache_level is set to 3 or 4 then the type of the
access to a table cannot be JT_REF or JT_EQ_REF. This could lead
to wrong query result sets.
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.