Before this fix, a IN predicate of the form: "IN (( subselect ))", with two
parenthesis, would be evaluated as a single row subselect: if the subselect
returns more that 1 row, the statement would fail.
The SQL:2003 standard defines a special exception in the specification,
and mandates that this particular form of IN predicate shall be equivalent
to "IN ( subselect )", which involves a table subquery and works with more
than 1 row.
This fix implements "IN (( subselect ))", "IN ((( subselect )))" etc
as per the SQL:2003 requirement.
All the details related to the implementation of this change have been
commented in the code, and the relevant sections of the SQL:2003 spec
are given for reference, so they are not repeated here.
Having access to the spec is a requirement to review in depth this patch.
The bug report has demonstrated the following two problems.
1. If an ORDER/GROUP BY list includes a constant expression being
optimized away and, at the same time, containing single-row
subselects that return more that one row, no error is reported.
Strictly speaking the standard allows to ignore error in this case.
Yet, now a corresponding fatal error is reported in this case.
2. If a query requires sorting by expressions containing single-row
subselects that, however, return more than one row, then the execution
of the query may cause a server crash.
To fix this some code has been added that blocks execution of a subselect
item in case of a fatal error in the method Item_subselect::exec.
UNION over correlated and uncorrelated SELECTS.
In such subqueries each uncorrelated SELECT should be considered as
uncacheable. Otherwise join_free is called for it and in many cases
it causes some problems.
when they contain the '!' operator.
Added an implementation for the method Item_func_not::print.
The method encloses any NOT expression into extra parentheses to avoid
incorrect stored representations of views that use the '!' operators.
Without this change when a view was created that contained
the expression !0*5 its stored representation contained not this
expression but rather the expression not(0)*5 .
The operator '!' is of a higher precedence than '*', while NOT is
of a lower precedence than '*'. That's why the expression !0*5
is interpreted as not(0)*5, while the expression not(0)*5 is interpreted
as not((0)*5) unless sql_mode is set to HIGH_NOT_PRECEDENCE.
Now we translate !0*5 into (not(0))*5.
- Make the code produce correct result: use an array of triggers to turn on/off equalities for each
compared column. Also turn on/off optimizations based on those equalities.
- Make EXPLAIN output show "Full scan on NULL key" for tables for which we switch between
ref/unique_subquery/index_subquery and ALL access.
- index_subquery engine now has HAVING clause when it is needed, and it is
displayed in EXPLAIN EXTENDED
- Fix incorrect presense of "Using index" for index/unique-based subqueries (BUG#22930)
// bk trigger note: this commit refers to BUG#24127
When transforming "oe IN (SELECT ie ...)" wrap the pushed-down predicates
iff "oe can be null", not "ie can be null".
The fix doesn't cover row-based subqueries, those will be fixed in #24127.
We create Item_cache_* object for each operand for each left operand of
a subquery predicate. We also create Item_func_conv_charset for each string
constant that needs charset conversion. So here we have Item_cache wrapped
into Item_func_conv_charset.
When Item_func_conv_charset wraps an constant Item it gets it's value
in constructor. The problem is that Item_cache is ready to be used only
at execution time, which is too late.
The fix makes Item_cache wrapping constant to get ready at fix_fields() time.
- When returning metadata for scalar subqueries the actual type of the
column was calculated based on the value type, which limits the actual
type of a scalar subselect to the set of (currently) 3 basic types :
integer, double precision or string. This is the reason that columns
of types other then the basic ones (e.g. date/time) are reported as
being of the corresponding basic type.
Fixed by storing/returning information for the column type in addition
to the result type.
This is a performance issue for queries with subqueries evaluation
of which requires filesort.
Allocation of memory for the sort buffer at each evaluation of a
subquery may take a significant amount of time if the buffer is rather big.
With the fix we allocate the buffer at the first evaluation of the
subquery and reuse it at each subsequent evaluation.
Evaluate "NULL IN (SELECT ...)" in a special way: Disable pushed-down
conditions and their "consequences":
= Do full table scans instead of unique_[index_subquery] lookups.
= Change appropriate "ref_or_null" accesses to full table scans in
subquery's joins.
Also cache value of NULL IN (SELECT ...) if the SELECT is not correlated
wrt any upper select.
If elements a not top-level IN subquery were accessed by an index and
the subquery result set included a NULL value then the quantified
predicate that contained the subquery was evaluated to NULL when
it should return a non-null value.
list using a function
When executing dependent subqueries they are re-inited and re-exec() for
each row of the outer context.
The cause for the bug is that during subquery reinitialization/re-execution,
the optimizer reallocates JOIN::join_tab, JOIN::table in make_simple_join()
and the local variable in 'sortorder' in create_sort_index(), which is
allocated by make_unireg_sortorder().
Care must be taken not to allocate anything into the thread's memory pool
while re-initializing query plan structures between subquery re-executions.
All such items mush be cached and reused because the thread's memory pool
is freed at the end of the whole query.
Note that they must be cached and reused even for queries that are not
otherwise cacheable because otherwise it will grow the thread's memory
pool every time a cacheable query is re-executed.
We provide additional members to the JOIN structure to store references
to the items that need to be cached.
strings
MySQL is setting the flag HA_END_SPACE_KEYS for all the keys that reference
text or varchar columns with collation different than binary.
This was done to handle correctly the situation where a lookup on such a key
may return more than 1 row because of the presence of many rows that differ
only by the amount of trailing space in the table's string column.
Inserting such values however appears to violate the unique checks on
INSERT/UPDATE. Thus that flag must not be set as it will prevent the optimizer
from choosing a faster access method.
This fix removes the setting of the HA_END_SPACE_KEYS flag.
an ALL/ANY quantified subquery in HAVING.
The Item::split_sum_func2 method should not create Item_ref
for objects of any class derived from Item_subselect.
wrong results
Mark the containing Item(s) (Item_subselect descendant usually) of
a subselect as containing aggregate functions if it has references
to aggregates functions that are calculated outside its context.
This tels end_send_group() not to make an Item_subselect descendant in
select list a copy and causes the correct value being returned.
Made the parser to support parenthesis around UNION branches.
This is done by amending the rules of the parser so it generates the correct
structure.
Currently it supports arbitrary subquery/join/parenthesis operations in the
EXISTS clause.
In the IN/scalar subquery case it will allow adding nested parenthesis only
if there is an UNION clause after the parenthesis. Otherwise it will just
treat the multiple nested parenthesis as a scalar expression.
It adds extra lex level for ((SELECT ...) UNION ...) to accommodate for the
UNION clause.
Treat queries with no FROM and aggregate functions as normal queries,
so the aggregate function get correctly calculated as if there is 1 row.
This means that they will be considered to have one row, so COUNT(*) will return
1 instead of 0. Other aggregates will behave in compatible manner.
- Make the range-et-al optimizer produce E(#table records after table
condition is applied),
- Make the join optimizer use this value,
- Add "filtered" column to EXPLAIN EXTENDED to show
fraction of records left after table condition is applied
- Adjust test results, add comments
The problem was in that opt_sum_query() replaced MIN/MAX functions
with the corresponding constant found in a key, but due to imprecise
representation of float numbers, when evaluating the where clause,
this comparison failed.
When MIN/MAX optimization detects that all tables can be removed,
also remove all conjuncts in a where clause that refer to these
tables. As a result of this fix, these conditions are not evaluated
twice, and in the case of float number comparisons we do not discard
result rows due to imprecise float representation.
As a side-effect this fix also corrects an unnoticed problem in
bug 12882.
The bug caused a crash of the server if a subquery with
ORDER BY DESC used the range access method.
The bug happened because the method QUICK_SELECT_DESC::reset
was not reworked after MRR interface had been introduced.
The bug was due to a loss happened during a refactoring made
on May 30 2005 that modified the function JOIN::reinit.
As a result of it for any subquery the value of offset_limit_cnt
was not restored for the following executions. Yet the first
execution of the subquery made it equal to 0.
The fix restores this value in the function JOIN::reinit.
DESCRIBE returned the type BIGINT for a column of a view if the column
was specified by an expression over values of the type INT.
E.g. for the view defined as follows:
CREATE VIEW v1 SELECT COALESCE(f1,f2) FROM t1
DESCRIBE returned type BIGINT for the only column of the view if f1,f2 are
columns of the INT type.
At the same time DESCRIBE returned type INT for the only column of the table
defined by the statement:
CREATE TABLE t2 SELECT COALESCE(f1,f2) FROM t1.
This inconsistency was removed by the patch.
Now the code chooses between INT/BIGINT depending on the
precision of the aggregated column type.
Thus both DESCRIBE commands above returns type INT for v1 and t2.
may return a wrong result.
An Item_sum_hybrid object has the was_values flag which indicates whether any
values were added to the sum function. By default it is set to true and reset
to false on any no_rows_in_result() call. This method is called only in
return_zero_rows() function. An ALL/ANY subquery can be optimized by MIN/MAX
optimization. The was_values flag is used to indicate whether the subquery
has returned at least one row. This bug occurs because return_zero_rows() is
called only when we know that the select will return zero rows before
starting any scans but often such information is not known.
In the reported case the return_zero_rows() function is not called and
the was_values flag is not reset to false and yet the subquery return no rows
Item_func_not_all and Item_func_nop_all functions return a wrong
comparison result.
The end_send_group() function now calls no_rows_in_result() for each item
in the fields_list if there is no rows were found for the (sub)query.
The ALL/ANY subqueries are the subject of MIN/MAX optimization. The matter
of this optimization is to embed MIN() or MAX() function into the subquery
in order to get only one row by which we can tell whether the expression
with ALL/ANY subquery is true or false.
But when it is applied to a subquery like 'select a_constant' the reported bug
occurs. As no tables are specified in the subquery the do_select() function
isn't called for the optimized subquery and thus no values have been added
to a MIN()/MAX() function and it returns NULL instead of a_constant.
This leads to a wrong query result.
For the subquery like 'select a_constant' there is no reason to apply
MIN/MAX optimization because the subquery anyway will return at most one row.
Thus the Item_maxmin_subselect class is more appropriate for handling such
subqueries.
The Item_in_subselect::single_value_transformer() function now checks
whether tables are specified for the subquery. If no then this subselect is
handled like a UNION using an Item_maxmin_subselect object.
The problem was that we restored SQL_CACHE, SQL_NO_CACHE flags in SELECT
statement from internal structures based on value set later at runtime, not
the original value set by the user.
The solution is to remember that original value.
The convert_constant_item() function converts constant items to ints on
prepare phase to optimize execution speed. In this case it tries to evaluate
subselect which contains a derived table and is contained in a derived table.
All derived tables are filled only after all derived tables are prepared.
So evaluation of subselect with derived table at the prepare phase will
return a wrong result.
A new flag with_subselect is added to the Item class. It indicates that
expression which this item represents is a subselect or contains a subselect.
It is set to 0 by default. It is set to 1 in the Item_subselect constructor
for subselects.
For Item_func and Item_cond derived classes it is set after fixing any argument
in Item_func::fix_fields() and Item_cond::fix_fields accordingly.
The convert_constant_item() function now doesn't convert a constant item
if the with_subselect flag set in it.
When a view statement is compiled on CREATE VIEW time, most of the
optimizations should not be done. Finding the right optimization
for a subquery is one of them.
Unfortunately the optimizer is resolving the column references of
the left expression of IN subqueries in the process of deciding
witch optimization to use (if needed). So there should be a
special case in Item_in_subselect::fix_fields() : check the
validity of the left expression of IN subqueries in CREATE VIEW
mode and then proceed as normal.
Re-work best_access_path() and find_best() to reuse E(#rows(range access)) as
E(#rows(ref[_or_null](const) access) only when it is appropriate.
[This is the final cumulative patch]
This performance degradation was due to the fact that some
cost evaluation code added into 4.1 in the function find_best was
not merged into the code of the function best_access_path added
together with other code for greedy optimizer.
Added a parameter to the function print_plan. The parameter contains
accumulated cost for a given partial join.
The patch does not include a special test case since this performance
degradation is hard to reproduse with a simple example.
TODO: make the function find_best use the function best_access_path
in order to remove duplication of code which might result in incomplete
merges in the future.
In the code that converts IN predicates to EXISTS predicates it is changing
the select list elements to constant 1. Example :
SELECT ... FROM ... WHERE a IN (SELECT c FROM ...)
is transformed to :
SELECT ... FROM ... WHERE EXISTS (SELECT 1 FROM ... HAVING a = c)
However there can be no FROM clause in the IN subquery and it may not be
a simple select : SELECT ... FROM ... WHERE a IN (SELECT f(..) AS
c UNION SELECT ...) This query is transformed to : SELECT ... FROM ...
WHERE EXISTS (SELECT 1 FROM (SELECT f(..) AS c UNION SELECT ...)
x HAVING a = c) In the above query c in the HAVING clause is made to be
an Item_null_helper (a subclass of Item_ref) pointing to the real
Item_field (which is not referenced anywhere else in the query anymore).
This is done because Item_ref_null_helper collects information whether
there are NULL values in the result. This is OK for directly executed
statements, because the Item_field pointed by the Item_null_helper is
already fixed when the transformation is done. But when executed as
a prepared statement all the Item instances are "un-fixed" before the
recompilation of the prepared statement. So when the Item_null_helper
gets fixed it discovers that the Item_field it points to is not fixed
and issues an error. The remedy is to keep the original select list
references when there are no tables in the FROM clause. So the above
becomes : SELECT ... FROM ... WHERE EXISTS (SELECT c FROM (SELECT f(..)
AS c UNION SELECT ...) x HAVING a = c) In this way c is referenced
directly in the select list as well as by reference in the HAVING
clause. So it gets correctly fixed even with prepared statements. And
since the Item_null_helper subclass of Item_ref_null_helper is not used
anywhere else it's taken out.
Multiple equalities were not adjusted after reading constant tables.
It resulted in neglecting good index based methods that could be
used to access of other tables.
When there is conjunction of conds, the substitute_for_best_equal_field()
will call the eliminate_item_equal() function in loop to build final
expression. But if eliminate_item_equal() finds that some cond will always
evaluate to 0, then that cond will be substituted by Item_int with value ==
0. In this case on the next iteration eliminate_item_equal() will get that
Item_int and treat it as Item_cond. This is leads to memory corruption and
server crash on cleanup phase.
To the eliminate_item_equal() function was added DBUG_ASSERT for checking
that all items treaten as Item_cond are really Item_cond.
The substitute_for_best_equal_field() now checks that if
eliminate_item_equal() returns Item_int and it's value is 0 then this
value is returned as the result of whole conjunction.
A subquery transformation changes the HAVING clause of the embedding query if the subquery contains
a GROUP BY clause. Yet the split_sum_func2 function was not applied to the modified HAVING clause.
This could result in wrong answers.
Bug #10308: Parse 'purge master logs' with subselect correctly.
subselect.test:
Bug #10308: Test for 'purge master logs' with subselect.
subselect.result:
Bug #10308: Test result for 'purge master logs' with subselect.
* Provide backwards compatibility extension to name resolution of
coalesced columns. The patch allows such columns to be qualified
with a table (and db) name, as it is in 4.1.
Based on a patch from Monty.
* Adjusted tests accordingly to test both backwards compatible name
resolution of qualified columns, and ANSI-style resolution of
non-qualified columns.
For this, each affected test has two versions - one with qualified
columns, and one without.
Fixed bug #11479.
The JOIN::reinit method cannot call setup_tables
after the optimization phase since this function
removes some optimization settings for joined
tables. E.g. it resets values of the null_row flag to 0.
subselect.result, subselect.test:
Added a test case for bug #11479.
"Process NATURAL and USING joins according to SQL:2003".
* Some of the main problems fixed by the patch:
- in "select *" queries the * expanded correctly according to
ANSI for arbitrary natural/using joins
- natural/using joins are correctly transformed into JOIN ... ON
for any number/nesting of the joins.
- column references are correctly resolved against natural joins
of any nesting and combined with arbitrary other joins.
* This patch also contains a fix for name resolution of items
inside the ON condition of JOIN ... ON - in this case items must
be resolved only against the JOIN operands. To support such
'local' name resolution, the patch introduces a stack of
name resolution contexts used at parse time.
NOTICE:
- This patch is not complete in the sense that
- there are 2 test cases that still do not pass -
one in join.test, one in select.test. Both are marked
with a comment "TODO: WL#2486".
- it does not include a new test specific for the task
Added a test case for bug #12392.
item_cmpfunc.cc:
Fixed bug #12392.
Missing handling of rows containing NULL components
when evaluating IN predicates caused a crash.
- Fixed some error condtion when handling dates with 'T'
- Added extra test for bug #11867 (Wrong result with "... WHERE ROW( a, b ) IN ( SELECT DISTINCT a, b WHERE ...)" to show it's not yet fixed
- Safety fixes and cleanups
Added test case for bug #11867.
Fixed results for two existing test cases.
subselect.test:
Added test case for bug #11867.
item_subselect.cc:
Fixed bug #11867.
Added missing code in Item_in_subselect::row_value_transformer
that caused problems for queries with
ROW(elems) IN (SELECT DISTINCT cols FROM ...).
Item_type_holder doesn't store information about length and exact type of
original item which results in redefining length to max_length and geometry
type to longtext.
Changed the way derived tables except unions are built. Now they are created
from original field list instead of list of Item_type_holder.
Fixed buf #11487.
Added a call of QUICK_RANGE_SELECT::init to the
QUICK_RANGE_SELECT::reset method. Without it the second
evaluation of a subquery employing the range access failed.
subselect.result, subselect.test:
Added a test case for bug #11487.
Added a test case for bug #9516.
item_subselect.h:
Fixed bug #9516.
The bug was due to that fact that the class Item_subselect
inherited the generic implementation of the function
not_null_tables that was not valid for the objects
of this class. As a result evaluation of the
not_null_tables attribute was not correct for subqueries.
This caused invalid transformations of outer joins into
inner joins.
Added a test case for bug #9338.
sql_select.cc:
Fixed bug #9338.
When an occurence of a field reference has to be replaced
by another field reference the whole Item_field must be
replaced.
item.cc:
Fixed bug #9338.
The method Item_field::replace_equal_field_processor was
replaced by Item_field::replace_equal_field. The new method
is used to replace the occurences of Item_field objects.
item.h:
Fixed bug #9338.
The virtual function replace_equal_field_processor was replaced
by replace_equal_field. The latter is supposed to be used as a
callback function in calls of the method transform.
CAST() now produces warnings when casting a wrong INTEGER or CHAR values. This also applies to implicite string to number casts. (Bug #5912)
ALTER TABLE now fails in STRICT mode if it generates warnings.
Inserting a zero date in a DATE, DATETIME or TIMESTAMP column during TRADITIONAL mode now produces an error. (Bug #5933)
Remove TMP_TABLE_PARAM::copy_funcs_it. TMP_TABLE_PARAM is a member of JOIN which is
copied via memcpy, and List_iterator_fast TMP_TABLE_PARAM::copy_funcs_it ends up
pointing to the wrong List.
Added test cases for bug #7351.
item_cmpfunc.cc:
Fixed bug #7351: incorrect result for a query with a
subquery returning empty set.
If in the predicate v IN (SELECT a FROM t WHERE cond)
v is null, then the result of the predicate is either
INKNOWN or FALSE. It is FALSE if the subquery returns
an empty set.
item_subselect.cc:
Fixed bug #7351: incorrect result for a query with a
subquery returning empty set.
The problem was due to not a quite legal transformation
for 'IN' subqueries. A subquery containing a predicate
of the form
v IN (SELECT a FROM t WHERE cond)
was transformed into
EXISTS(SELECT a FROM t WHERE cond AND (a=v OR a IS NULL)).
Yet, this transformation is valid only if v is not null.
If v is null, then, in the case when
(SELECT a FROM t WHERE cond) returns an empty set the value
of the predicate is FALSE, otherwise the result of the
predicate is INKNOWN.
The fix resolves this problem by changing the result
of the transformation to
EXISTS(SELECT a FROM t WHERE cond AND (v IS NULL OR (a=v OR a IS NULL)))
in the case when v is nullable.
The new transformation prevents applying the lookup
optimization for IN subqueries. To make it still
applicable we have to introduce guarded access methods.
Renamed HA_VAR_LENGTH to HA_VAR_LENGTH_PART
Renamed in all files FIELD_TYPE_STRING and FIELD_TYPE_VAR_STRING to MYSQL_TYPE_STRING and MYSQL_TYPE_VAR_STRING to make it easy to catch all possible errors
Added support for VARCHAR KEYS to heap
Removed support for ISAM
Now only long VARCHAR columns are changed to TEXT on demand (not CHAR)
Internal temporary files can now use fixed length tables if the used VARCHAR columns are short
results." a.k.a. "Proper cleanup of subqueries is missing for SET and DO
statements". (Version #2 with after-review fixes).
To perform proper cleanup for statements that can contain subqueries
but don't have main select we must call free_undelaid_joins().
FOUND is not a reserved keyword anymore
Added Item_field::set_no_const_sub() to be able to mark fields that can't be substituted
Added 'simple_select' method to be able to quickly determinate if a select_result is a normal SELECT
Note that the 5.0 tree is not yet up to date: Sanja will have to fix multi-update-locks for this merge to be complete
improved mechanisn of detection posibility to be NULL for single row queries
switched off substitution optimisation for single row subqueries in PS due to problem in resolving substituted expressions
(changes to make subselects test working with PS protocol)
prepared statements when LIMIT is used" and post-review comments.
The fix changes the approach we calculate the need for ORDER BY
in UNION: the previous was not PS friendly, as it damaged SELECT_LEX
options in case of single select.
Added the code processing on expressions for applying
multiple equalities.
sql_select.cc:
Post-merge fixes for Item_equal patch.
Added the code processing on expressions for applying
multiple equalities.
Many files:
Post-merge fixes for Item_equal patch.
item_cmpfunc.cc:
Post-merge fixes for Item_equal patch.
Fixed a problem when an equality field=const cannot be applied to
the predicate P(field,c) for constant propagation as a conversion
of field is needed.
item.h, item.cc:
Fixed a problem when an equality field=const cannot be applied to
the predicate P(field,c) for constant propagation as a conversion
of field is needed.