- When doing join optimization, pre-sort the tables so that they mimic the execution
order we've had with 'semijoin=off'.
- That way, we will not get regressions when there are two query plans (the old and the
new) that have indentical costs but different execution times (because of factors that
the optimizer was not able to take into account).
The result of materialization of the right part of an IN subquery predicate
is placed into a temporary table. Each row of the materialized table is
distinct. A unique key over all fields of the temporary table is defined and
created. It allows to perform key look-ups into the table.
The table created for a materialized subquery can be accessed by key as
any other table. The function best_access-path search for the best access
to join a table to a given partial join. With some where conditions this
function considers a possibility of a ref_or_null access. If such access
employs the unique key on the temporary table then when estimating
the cost this access the function tries to use the array rec_per_key. Yet,
such array is not built for this unique key. This causes a crash of the server.
Rows returned by the subquery that contain nulls don't have to be placed
into temporary table, as they cannot be match any row produced by the
left part of the subquery predicate. So all fields of the temporary table
can be defined as non-nullable. In this case any ref_or_null access
to the temporary table does not make any sense and it does not make sense
to estimate such an access.
The fix makes sure that the temporary table for a materialized IN subquery
is defined with columns that are all non-nullable. The also ensures that
any row with nulls returned by the subquery is not placed into the
temporary table.
Problem: When building the condition for JOIN::outer_ref_cond the optimizer forgot to take into account
that this condition could depend on constant tables as well.
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare()
* rename all debugging related command-line options
and variables to start from "debug-", and made them all
OFF by default.
* replace "MySQL" with "MariaDB" in error messages
* "Cast ... converted ... integer to it's ... complement"
is now a note, not a warning
* @@query_cache_strip_comments now has a session scope,
not global.
- if we're considering FirstMatch access with one inner table, and
@@optimizer_switch has semijoin_with_cache flag, calculate costs
as if we used join cache (because we will be able to do so)
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
The cause of the wrong result was that Item_ref_null_helper::get_date()
didn't use a method of the *_result() family, and fetched the data
for the field from the current row instead of result_field. Changed to
use the correct *_result() method, like to all other similar methods
of Item_ref_null_helper.
Analysis:
lp:894397 was a consequence of a prior incorrect fix of lp:833777
which didn't take into account that even when all tables are
constant there may be correlated conditions, and the where clause
is not equivalent to the constant conditions.
Solution:
When there are constant tables only, evaluate only the conditions
that reference outer fields, because the constant conditions are
already checked, and the where clause doesn't have other conditions
than constant ones, and outer referencing ones. The fix for
lp:894397 also fixes lp:833777.
The problem was that when we have single row subquery with no rows
Item_cache(es) which represent result row was not null and being
requested via element_index() returned random value.
The fix is setting all Item_cache(es) in NULL before executing the
query (reset() method) which guaranty NULL value of whole query
or its elements requested in any way if no rows was found.
set_null() method was added to Item_cache to guaranty correct NULL
value in case of reseting the cache.
- Make EXPLAIN display "Start temporary" at the start of the fanout (it used to display
at the first table whose rowid gets into temp. table which is not that useful for
the user)
- Updated test results (all checked)
Stop attempts to apply IN/ALL/ANY optimizations to so called "fake_select"
(used for ordering and filtering results of union) in union subquery execution.
Analysis:
The optimizer distinguishes two kinds of 'constant' conditions:
expensive ones, and non-expensive ones. The non-expensive conditions
are evaluated inside make_join_select(), and if false, already the
optimizer detects empty query results.
In order to avoid arbitrarily expensive optimization, the evaluation of
expensive constant conditions is delayed until execution. These conditions
are attached to JOIN::exec_const_cond and evaluated in the beginning of
JOIN::exec. The relevant execution logic is:
JOIN::exec()
{
if (! join->exec_const_cond->val_int())
{
produce an empty result;
stop execution
}
continue execution
execute the original WHERE clause (that contains exec_const_cond)
...
}
As a result, when an expensive constant condition is
TRUE, it is evaluated twice - once through
JOIN::exec_const_cond, and once through JOIN::cond.
When the expensive constant condition is a subquery,
predicate, the subquery is evaluated twice. If we have
many levels of subqueries, this logic results in a chain
of recursive subquery executions that walk a perfect
binary tree. The result is that for subquries with depth N,
JOIN::exec is executed O(2^N) times.
Solution:
Notice that the second execution of the constant conditions
happens inside do_select(), in the branch:
if (join->table_count == join->const_tables) { ... }
In this case exec_const_cond is equivalent to the whole WHERE
clause, therefore the WHERE clause has already been checked in
the beginnig of JOIN::exec, and has been found to be true.
The bug is addressed by not evaluating the WHERE clause if there
was exec_const_conds, and it was TRUE.
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.
In MariaDB, when running in ONLY_FULL_GROUP_BY mode,
the server produced in incorrect error message that there
is an aggregate function without GROUP BY, for artificially
created MIN/MAX functions during subquery MIN/MAX optimization.
The fix introduces a way to distinguish between artifially
created MIN/MAX functions as a result of a rewrite, and normal
ones present in the query. The test for ONLY_FULL_GROUP_BY violation
now tests in addition if a MIN/MAX function was part of a MIN/MAX
subquery rewrite.
In order to be able to distinguish these MIN/MAX functions, the
patch introduces an additional flag in Item_in_subselect::in_strategy -
SUBS_STRATEGY_CHOSEN. This flag is set when the optimizer makes its
final choice of a subuqery strategy. In order to make the choice
consistent, access to Item_in_subselect::in_strategy is provided
via new class methods.
******
Fix MySQL BUG#12329653
In MariaDB, when running in ONLY_FULL_GROUP_BY mode,
the server produced in incorrect error message that there
is an aggregate function without GROUP BY, for artificially
created MIN/MAX functions during subquery MIN/MAX optimization.
The fix introduces a way to distinguish between artifially
created MIN/MAX functions as a result of a rewrite, and normal
ones present in the query. The test for ONLY_FULL_GROUP_BY violation
now tests in addition if a MIN/MAX function was part of a MIN/MAX
subquery rewrite.
In order to be able to distinguish these MIN/MAX functions, the
patch introduces an additional flag in Item_in_subselect::in_strategy -
SUBS_STRATEGY_CHOSEN. This flag is set when the optimizer makes its
final choice of a subuqery strategy. In order to make the choice
consistent, access to Item_in_subselect::in_strategy is provided
via new class methods.
Analysis:
Equality propagation propagated the constant '7' into
args[0] of the Item_in_optimizer that stands for the
"< ANY" predicate. At the same the min/max subquery
rewrite swapped the order of the left and right operands
of the "<" predicate, but used Item_in_subselect::left_expr.
As a result, when the <ANY predicate is executed early in the
execution phase as a contant condition, instead of a constant
right (swapped) argument of the < predicate, there was a field
(t3.a). This field had no data, since the whole predicate is
considered constant, and it is evaluated before any tables are
read. Having junk in the field row buffer produced wrong result
Solution:
Fix create_swap to pick the correct Item_in_optimizer left
argument.