The select mentioned in the bug attempted to create a temporary table
using the maria storage engine. The table needs to have primary keys such that
duplicates can be removed. Unfortunately this use case has a longer
than allowed key and the tmp table got created without a temporary key.
We must not allow materialization for the subquery if the total key
length and key parts is greater than what the storage engine supports.
Problem was in rewriting left expression which had 2 references on it. Solved with making subselect reference main.
Item_in_optimized can have not Item_in_subselect reference in left part so type casting with no check is dangerous.
Item::cols() should be checked after Item::fix_fields().
Analysis :
==========
During JOIN::prepare of sub-query which creates the
derived tables we call setup_procedure. Here we call
fix_fields for parameters of procedure clause. Calling
setup_procedure at this point may cause issue. If
sub-query is one of parameter being fixed it might
lead to complicated dependencies on derived tables
being prepared.
SOLUTION :
==========
In 5.6 with WL#6242, we have made procedure clause
parameters can only be NUM, so sub-queries are not
allowed as parameters. So in 5.5 we can block
sub-queries in procedure clause parameters.
This eliminates above conflicting dependencies.
update_used_tables for the the where condition to update cached
indicators of constant subexpressions. It should be done before further
possible simplification of the where condition.
This change caused simplification of the executed where conditions
in many test cases.
Analysis:
st_select_lex_unit::prepare() computes can_skip_order_by as TRUE.
As a result join->prepare() gets called with order == NULL, and
doesn't do name resolution for the inner ORDER clause. Due to this
the prepare phase doesn't detect that the query references non-exiting
function and field.
Later join->optimize() calls update_used_tables() for a non-resolved
Item_field, which understandably has no Field object. This call results
in a crash.
Solution:
Resolve unnecessary ORDER BY clauses to detect if they reference non-exising
objects. Then remove such clauses from the JOIN object.
The patch to fix mdev-4418 turned out to be incorrect.
At the substitution of single row tables in make_join_statistics()
the used multiple equalities may change and references to the new multiple
equalities must be updated. The function remove_eq_conds() takes care of it and
it should be called right after the substitution of single row tables.
Calling it after the call of make_join_statistics was a mistake.
After single row substitutions there might appear new equalities.
They should be properly propagated to all AND/OR levels the WHERE
condition. It's done now with an additional call of remove_eq_conds().
Analysis:
The reason for the inefficent plan was that Item_subselect::is_expensive()
didn't detect the special case when a subquery was optimized, but had no
join plan because it either has no table, or its tables have been optimized
away, or the optimizer detected that the result set is empty.
Solution:
Identify the special cases above in the Item_subselect::is_expensive(),
and consider such degenerate subqueries inexpensive.
The wrong result set returned by the left join query from
the bug test case happened due to several inconsistencies
and bugs of the legacy mysql code.
The bug test case uses an execution plan that employs a scan
of a materialized IN subquery from the WHERE condition.
When materializing such an IN- subquery the optimizer injects
additional equalities into the WHERE clause. These equalities
express the constraints imposed by the subquery predicate.
The injected equality of the query in the test case happens
to belong to the same equality class, and a new equality
imposing a condition on the rows of the materialized subquery
is inferred from this class. Simultaneously the multiple
equality is added to the ON expression of the LEFT JOIN
used in the main query.
The inferred equality of the form f1=f2 is taken into account
when optimizing the scan of the rows the temporary table
that is the result of the subquery materialization: only the
values of the field f1 are read from the table into the record
buffer. Meanwhile the inferred equality is removed from the
WHERE conditions altogether as a constraint on the fields
of the temporary table that has been used when filling this table.
This equality is supposed to be removed from the ON expression
when the multiple equalities of the ON expression are converted
into an optimal set of equality predicates. It supposed to be
removed from the ON expression as an equality inferred from only
equalities of the WHERE condition. Yet, it did not happened
due to the following bug in the code.
Erroneously the code tried to build multiple equality for ON
expression twice: the first time, when it called optimize_cond()
for the WHERE condition, the second time, when it called
this function for the HAVING condition. When executing
optimize_con() for the WHERE condition a reference
to the multiple equality of the WHERE condition is set
in the multiple equality of the ON expression. This reference
would allow later to convert multiple equalities of the
ON expression into equality predicates. However the
the second call of build_equal_items() for the ON expression
that happened when optimize_cond() was called for the
HAVING condition reset this reference to NULL.
This bug fix blocks calling build_equal_items() for ON
expressions for the second time. In general, it will be
beneficial for many queries as it removes from ON
expressions any equalities that are to be checked for the
WHERE condition.
The patch also fixes two bugs in the list manipulation
operations and a bug in the function
substitute_for_best_equal_field() that resulted
in passing wrong reference to the multiple equalities
of where conditions when processing multiple
equalities of ON expressions.
The code of substitute_for_best_equal_field() and
the code the helper function eliminate_item_equal()
were also streamlined and cleaned up.
Now the conversion of the multiple equalities into
an optimal set of equality predicates first produces
the sequence of the all equalities processing multiple
equalities one by one, and, only after this, it inserts
the equalities at the beginning of the other conditions.
The multiple changes in the output of EXPLAIN
EXTENDED are mainly the result of this streamlining,
but in some cases is the result of the removal of
unneeded equalities from ON expressions. In
some test cases this removal were reflected in the
output of EXPLAIN resulted in disappearance of
“Using where” in some rows of the execution plans.
Fix by Sergey Petrunia.
This patch only prevents the evaluation of expensive subqueries during optimization.
The crash reported in this bug has been fixed by some other patch.
The fix is to call value->is_null() only when !value->is_expensive(), because is_null()
may trigger evaluation of the Item, which in turn triggers subquery evaluation if the
Item is a subquery.
The problem was that was_null and null_value variables was reset in each reexecution of IN subquery, but engine rerun only for non-constant subqueries.
Fixed checking constant in Item_equal sort.
Fix constant reporting in Item_subselect.
The fix backports from MWL#182: Explain running statements the logic that
saves the original JOIN_TAB array of a query plan after optimization. This
array is later used during EXPLAIN to iterate over the original JOIN plan
nodes in the cases when this plan could be changed by early subquery
execution during the optimization phase of the outer query.
"ORDER BY" AND "LIMIT BY" CLAUSE
PROBLEM:
When a 'limit' clause is specified in a query along with
group by and order by, optimizer chooses wrong index
there by examining more number of rows than required.
However without the 'limit' clause, optimizer chooses
the right index.
ANALYSIS:
With respect to the query specified, range optimizer chooses
the first index as there is a range present ( on 'a'). Optimizer
then checks for an index which would give records in sorted
order for the 'group by' clause.
While checking chooses the second index (on 'c,b,a') based on
the 'limit' specified and the selectivity of
'quick_condition_rows' (number of rows present in the range)
in 'test_if_skip_sort_order' function.
But, it fails to consider that an order by clause on a
different column will result in scanning the entire index and
hence the estimated number of rows calculated above are
wrong (which results in choosing the second index).
FIX:
Do not enforce the 'limit' clause in the call to
'test_if_skip_sort_order' if we are creating a temporary
table. Creation of temporary table indicates that there would be
more post-processing and hence will need all the rows.
This fix is backported from 5.6. This problem is fixed in 5.6 as
part of changes for work log #5558
"ORDER BY" AND "LIMIT BY" CLAUSE
PROBLEM:
When a 'limit' clause is specified in a query along with
group by and order by, optimizer chooses wrong index
there by examining more number of rows than required.
However without the 'limit' clause, optimizer chooses
the right index.
ANALYSIS:
With respect to the query specified, range optimizer chooses
the first index as there is a range present ( on 'a'). Optimizer
then checks for an index which would give records in sorted
order for the 'group by' clause.
While checking chooses the second index (on 'c,b,a') based on
the 'limit' specified and the selectivity of
'quick_condition_rows' (number of rows present in the range)
in 'test_if_skip_sort_order' function.
But, it fails to consider that an order by clause on a
different column will result in scanning the entire index and
hence the estimated number of rows calculated above are
wrong (which results in choosing the second index).
FIX:
Do not enforce the 'limit' clause in the call to
'test_if_skip_sort_order' if we are creating a temporary
table. Creation of temporary table indicates that there would be
more post-processing and hence will need all the rows.
This fix is backported from 5.6. This problem is fixed in 5.6 as
part of changes for work log #5558
mysql-test/r/subselect.result:
Changes for Bug#11762052 results in the correct number of rows.
sql/sql_select.cc:
Do not pass the actual 'limit' value if 'need_tmp' is true.
Analysis:
The fix for bug lp:985667 implements the method Item_subselect::no_rows_in_result()
for all main kinds of subqueries. The purpose of this method is to be called from
return_zero_rows() and set Items to some default value in the case when a query
returns no rows. Aggregates and subqueries require special treatment in this case.
Every implementation of Item_subselect::no_rows_in_result() called
Item_subselect::make_const() to set the subquery predicate to its default value
irrespective of where the predicate was located in the query. Once the predicate
was set to a constant it was never executed.
At the same time, the JOIN object of the fake select for UNIONs (the one used for
the final result of the UNION), was set after all subqueries in the union were
executed. Since we set the subquery as constant, it was never executed, and the
corresponding JOIN was never created.
In order to decide whether the result of NOT IN is NULL or FALSE, Item_in_optimizer
needs to check if the subquery result was empty or not. This is where we got the
crash, because subselect_union_engine::no_rows() checks for
unit->fake_select_lex->join->send_records, and the join object was NULL.
Solution:
If a subquery is in the HAVING clause it must be evaluated in order to know its
result, so that we can properly filter the result records. Once subqueries in the
HAVING clause are executed even in the case of no result rows, this specific
crash will be solved, because the UNION will be executed, and its JOIN will be
constructed. Therefore the fix for this crash is to narrow the fix for lp:985667,
and to apply Item_subselect::no_rows_in_result() only when the subquery predicate
is in the SELECT clause.
Analysis:
When the method JOIN::choose_subquery_plan() decided to apply
the IN-TO-EXISTS strategy, it set the unit and select_lex
uncacheable flag to UNCACHEABLE_DEPENDENT_INJECTED unconditionally.
As result, even if IN-TO-EXISTS injected non-correlated predicates,
the subquery was still treated as correlated.
Solution:
Set the subquery as correlated only if the injected predicate(s) depend
on the outer query.
CHEAP SQ: Valgrind warnings "Memory lost" with IN and EXISTS nested subquery, materialization+semijoin
Analysis:
The memory leak was a result of the interaction of semi-join optimization
with early optimization of constant subqueries. The function:
setup_jtbm_semi_joins() created a dummy temporary table "dummy_table"
in order to make some JOIN_TAB objects complete. Normally, such temporary
tables are freed inside JOIN_TAB::cleanup.
However, the inner-most subquery is pre-optimized, which allows the
optimization fo the MAX subquery to determine that its WHERE is TRUE,
and thus to compute the result of the MAX during optimization. This
ultimately allows the optimize phase of the outer query to find that
it WHERE clause is FALSE. Once JOIN::optimize finds that the result
set is empty, it sets zero_result_cause, and returns *before* it ever
reached make_join_statistics(). As a result the query plan has no
JOIN_TABs at all. Since the temporary table is supposed to be cleanup
via JOIN_TAB::cleanup, this never happens because there is no JOIN_TAB
for this table. Hence we get a memory leak.
Solution:
Whenever there are no JOIN_TABs, iterate over all table reference in
JOIN::join_list, and free the ones that contain semi-join temporary
tables.