values return too many records
WHERE clauses with "outer_value_list NOT IN subselect" were
handled incorrectly if the outer value list contained multiple
items where at least one of these could be NULL. The first
outer record with NULL value was handled correctly, but if a
second record with NULL value existed, the optimizer would
choose to reuse the result it got on the last execution of the
subselect. This is incorrect if the outer value list has
multiple items.
The fix is to make Item_in_optimizer::val_int (in
item_cmpfunc.cc) reuse the result of the latest execution
for NULL values only if all values in the outer_value_list
are NULL.
EXPLAIN EXTENDED of nested query containing a error:
1054 Unknown column '...' in 'field list'
may cause a server crash.
Parse error like described above forces a call to
JOIN::destroy() on malformed subquery.
That JOIN::destroy function closes and frees temporary
tables. However, temporary fields of these tables
may be listed in st_select_lex::group_list of outer
query, and that st_select_lex may not cleanup them
properly. So, after the JOIN::destroy call that
st_select_lex::group_list may have Item_field
objects with dangling pointers to freed temporary
table Field objects. That caused a crash.
messed up
"ROW(...) IN (SELECT ... FROM DUAL)" always returned TRUE.
Item_in_subselect::row_value_transformer rewrites "ROW(...)
IN SELECT" conditions into the "EXISTS (SELECT ... HAVING ...)"
form.
For a subquery from the DUAL pseudotable resulting HAVING
condition is an expression on constant values, so further
transformation with optimize_cond() eliminates this HAVING
condition and resets JOIN::having to NULL.
Then JOIN::exec treated that NULL as an always-true-HAVING
and that caused a bug.
To distinguish an optimized out "HAVING TRUE" clause from
"HAVING FALSE" we already have the JOIN::having_value flag.
However, JOIN::exec() ignored JOIN::having_value as described
above as if it always set to COND_TRUE.
The JOIN::exec method has been modified to take into account
the value of the JOIN::having_value field.
Select with a "NULL NOT IN" condition containing complex
subselect from the same table as in the outer select failed
with an assertion.
The failure was caused by a concatenation of circumstances:
1) an inner select was optimized by make_join_statistics to use
the QUICK_RANGE_SELECT access method (that implies an index
scan of the table);
2) a subselect was independent (constant) from the outer select;
3) a condition was pushed down into inner select.
During the evaluation of a constant IN expression an optimizer
temporary changed the access method from index scan to table
scan, but an engine handler was already initialized for index
access by make_join_statistics. That caused an assertion.
Unnecessary index initialization has been removed from
the QUICK_RANGE_SELECT::init method (QUICK_RANGE_SELECT::reset
reinvokes this initialization).
impossible WHERE/HAVING clause
(subselect_single_select_engine::exec).
Allocation and initialization of joined table list t1, t2... of
subqueries like:
NOT IN (SELECT ... FROM t1,t2,... WHERE 0)
is optimized out, however server tries to traverse this list.
Queries like:
SELECT ROW(1, 2) IN (SELECT t1.a, 2)
FROM t1 GROUP BY t1.a
or
SELECT ROW(1, 2) IN (SELECT t1.a, 2 FROM t2)
FROM t1 GROUP BY t1.a
lead to assertion failure in the
Item_in_subselect::row_value_transformer method in debugging
build, or to unexpected error message in release build:
ERROR 1247 (42S22): Reference '<list ref>' not supported (forward
reference in item list)
Unexpected error message and assertion failure have been
eliminated.
Index lookup does not always guarantee that we can
simply remove the relevant conditions from the WHERE
clause. Reasons can be e.g. conversion errors,
partial indexes etc.
The optimizer was removing these parts of the WHERE
condition without any further checking.
This leads to "false positives" when using indexes.
Fixed by checking the index reference conditions
(using WHERE) when using indexes with sub-queries.
Conversion errors when constructing the condition for an
IN predicates were treated as if the affected column contains
NULL. If such a IN predicate is inside NOT we get wrong
results.
Corrected the handling of conversion errors in an IN predicate
that is resolved by unique_subquery (through
subselect_uniquesubquery_engine).
a crash when the left operand of the predicate is evaluated to NULL.
It happens when the rows from the inner tables (tables from the subquery)
are accessed by index methods with key values obtained by evaluation of
the left operand of the subquery predicate. When this predicate is
evaluated to NULL an alternative access with full table scan is used
to check whether the result set returned by the subquery is empty or not.
The crash was due to the fact the info about the access methods used for
regular key values was not properly restored after a switch back from the
full scan access method had occurred.
The patch restores this info properly.
The same problem existed for queries with IN subquery predicates if they
were used not at the top level of the queries.
- added join cache indication in EXPLAIN (Extra column).
- prefer filesort over full scan over
index for ORDER BY (because it's faster).
- when switching from REF to RANGE because
RANGE uses longer key turn off sort on
the head table only as the resulting
RANGE access is a candidate for join cache
and we don't want to disable it by sorting
on the first table only.
conditions when executing an equijoin query with WHERE condition
containing a subquery predicate of the form join_attr NOT IN (SELECT ...).
To resolve a problem of the correct evaluation of the expression
attr NOT IN (SELECT ...)
an array of guards is created to make it possible to filter out some
predicates of the EXISTS subquery into which the original subquery
predicate is transformed, in the cases when a takes the NULL value.
If attr is defined as a field that cannot be NULL than such an array
is not needed and is not created.
However if the field a occurred also an an equijoin predicate t2.a=t1.b
and table t1 is accessed before table t2 then it may happen that the
the EXISTS subquery is pushed down to the condition evaluated just after
table t1 has been accessed. In this case any occurrence of t2.a is
substituted for t1.b. When t1.b takes the value of NULL an attempt is
made to turn on the corresponding guard. This action caused a crash as
no guard array had been created.
Now the code of Item_in_subselect::set_cond_guard_var checks that the guard
array has been created before setting a guard variable on. Otherwise the
method does nothing. It cannot results in returning a row that could be
rejected as the condition t2.a=t1.b will be checked later anyway.
The Item_outer_ref class based on the Item_direct_ref class was always used
to represent an outer field. But if the outer select is a grouping one and the
outer field isn't under an aggregate function which is aggregated in that
outer select an Item_ref object should be used to represent such a field.
If the outer select in which the outer field is resolved isn't grouping then
the Item_field class should be used to represent such a field.
This logic also should be used for an outer field resolved through its alias
name.
Now the Item_field::fix_outer_field() uses Item_outer_field objects to
represent aliased and non-aliased outer fields for grouping outer selects
only.
Now the fix_inner_refs() function chooses which class to use to access outer
field - the Item_ref or the Item_direct_ref. An object of the chosen class
substitutes the original field in the Item_outer_ref object.
The direct_ref and the found_in_select_list fields were added to the
Item_outer_ref class.
To correctly decide which predicates can be evaluated with a given table
the optimizer must know the exact set of tables that a predicate depends
on. If that mask is too wide (refer to non-existing tables) the optimizer
can erroneously skip a predicate.
One such case of wrong table usage mask were the aggregate functions.
The have a all-1 mask (meaning depend on all tables, including non-existent
ones).
Fixed by making a real used_tables mask for the aggregates. The mask is
constructed in the following way :
1. OR the table dependency masks of all the arguments of the aggregate.
2. If all the arguments of the function are from the local name resolution
context and it is evaluated in the same name resolution
context where it is referenced all the tables from that name resolution
context are OR-ed to the dependency mask. This is to denote that an
aggregate function depends on the number of rows it processes.
3. Handle correctly the case of an aggregate function optimization (such that
the aggregate function can be pre-calculated and made a constant).
Made sure that an aggregate function is never a constant (unless subject of a
specific optimization and pre-calculation).
One other flaw was revealed and fixed in the process : references were
not calling the recalculation method for used_tables of their targets.
created for sorting.
Any outer reference in a subquery was represented by an Item_field object.
If the outer select employs a temporary table all such fields should be
replaced with fields from that temporary table in order to point to the
actual data. This replacement wasn't done and that resulted in a wrong
subquery evaluation and a wrong result of the whole query.
Now any outer field is represented by two objects - Item_field placed in the
outer select and Item_outer_ref in the subquery. Item_field object is
processed as a normal field and the reference to it is saved in the
ref_pointer_array. Thus the Item_outer_ref is always references the correct
field. The original field is substituted for a reference in the
Item_field::fix_outer_field() function.
New function called fix_inner_refs() is added to fix fields referenced from
inner selects and to fix references (Item_ref objects) to these fields.
The new Item_outer_ref class is a descendant of the Item_direct_ref class.
It additionally stores a reference to the original field and designed to
behave more like a field.
Objects of the classes Item_func_is_not_null_test and Item_func_trig_cond
must be transparent for the method Item::split_sum_func2 as these classes
are pure helpers. It means that the method Item::split_sum_func2 should
look at those objects as at pure wrappers.
- Make the code produce correct result: use an array of triggers to turn on/off equalities for each
compared column. Also turn on/off optimizations based on those equalities.
- Make EXPLAIN output show "Full scan on NULL key" for tables for which we switch between
ref/unique_subquery/index_subquery and ALL access.
- index_subquery engine now has HAVING clause when it is needed, and it is
displayed in EXPLAIN EXTENDED
- Fix incorrect presense of "Using index" for index/unique-based subqueries (BUG#22930)
// bk trigger note: this commit refers to BUG#24127
When transforming "oe IN (SELECT ie ...)" wrap the pushed-down predicates
iff "oe can be null", not "ie can be null".
The fix doesn't cover row-based subqueries, those will be fixed in #24127.
Evaluate "NULL IN (SELECT ...)" in a special way: Disable pushed-down
conditions and their "consequences":
= Do full table scans instead of unique_[index_subquery] lookups.
= Change appropriate "ref_or_null" accesses to full table scans in
subquery's joins.
Also cache value of NULL IN (SELECT ...) if the SELECT is not correlated
wrt any upper select.