query / no aggregate of subquery
The optimizer counts the aggregate functions that
appear as top level expressions (in all_fields) in
the current subquery. Later it makes a list of these
that it uses to actually execute the aggregates in
end_send_group().
That count is used in several places as a flag whether
there are aggregates functions.
While collecting the above info it must not consider
aggregates that are not aggregated in the current
context. It must treat them as normal expressions
instead. Not doing that leads to incorrect data about
the query, e.g. running a query that actually has no
aggregate functions as if it has some (and hence is
expected to return only one row).
Fixed by ignoring the aggregates that are not aggregated
in the current context.
One other smaller omission discovered and fixed in the
process : the place of aggregation was not calculated for
user defined functions. Fixed by calling
Item_sum::init_sum_func_check() and
Item_sum::check_sum_func() as it's done for the rest of
the aggregate functions.
ORDER BY and LIMIT 1.
The bug was introduced by the patch for bug 21727. The patch
erroneously skipped initialization of the array of headers
for sorted records for non-first evaluations of the subquery.
To fix the problem a new parameter has been added to the
function make_char_array that performs the initialization.
Now this function is called for any invocation of the
filesort procedure. Yet it allocates the buffer for sorted
records only if this parameter is NULL.
using a derived table over a grouping subselect.
This crash happens only when materialization of the derived tables
requires creation of auxiliary temporary table, for example when
a grouping operation is carried out with usage of a temporary table.
The crash happened because EXPLAIN EXTENDED when printing the query
expression made an attempt to use the objects created in the mem_root
of the temporary table which has been already freed by the moment
when printing is called.
This bug appeared after the method Item_field::print() had been
introduced.
Non-correlated scalar subqueries may get executed
in EXPLAIN at the optimization phase if they are
part of a right hand sargable expression.
If the scalar subquery uses a temp table to
materialize its results it will replace the
subquery structure from the parser with a simple
select from the materialization table.
As a result the EXPLAIN will crash as the
temporary materialization table is not to be shown
in EXPLAIN at all.
Fixed by preserving the original query structure
right after calling optimize() for scalar subqueries
with temp tables executed during EXPLAIN.
Validity checks for nested set functions
were not taking into account that the enclosed
set function may be on a nest level that is
lower than the nest level of the enclosing set
function.
Fixed by :
- propagating max_sum_func_level
up the enclosing set functions chain.
- updating the max_sum_func_level of the
enclosing set function when the enclosed set
function is aggregated above or on the same
nest level of as the level of the enclosing
set function.
- updating the max_arg_level of the enclosing
set function on a reference that refers to
an item above or on the same nest level
as the level of the enclosing set function.
- Treating both Item_field and Item_ref as possibly
referencing items from outer nest levels.
The Item_outer_ref class based on the Item_direct_ref class was always used
to represent an outer field. But if the outer select is a grouping one and the
outer field isn't under an aggregate function which is aggregated in that
outer select an Item_ref object should be used to represent such a field.
If the outer select in which the outer field is resolved isn't grouping then
the Item_field class should be used to represent such a field.
This logic also should be used for an outer field resolved through its alias
name.
Now the Item_field::fix_outer_field() uses Item_outer_field objects to
represent aliased and non-aliased outer fields for grouping outer selects
only.
Now the fix_inner_refs() function chooses which class to use to access outer
field - the Item_ref or the Item_direct_ref. An object of the chosen class
substitutes the original field in the Item_outer_ref object.
The direct_ref and the found_in_select_list fields were added to the
Item_outer_ref class.
If a set function with a outer reference s(outer_ref) cannot be aggregated
the outer query against which the reference has been resolved then MySQL
interpretes s(outer_ref) in the same way as it would interpret s(const).
Hovever the standard requires throwing an error in this situation.
Added some code to support this requirement in ansi mode.
Corrected another minor bug in Item_sum::check_sum_func.
context was used as an argument of GROUP_CONCAT.
Ensured correct setting of the depended_from field in references
generated for set functions aggregated in outer selects.
A wrong value of this field resulted in wrong maps returned by
used_tables() for these references.
Made sure that a temporary table field is added for any set function
aggregated in outer context when creation of a temporary table is
needed to execute the inner subquery.
aggregated in outer context returned wrong results.
This happened only if the subquery did not contain any references
to outer fields.
As there were no references to outer fields the subquery erroneously
was taken for non-correlated one.
Now any set function aggregated in outer context makes the subquery
correlated.
when the column is to be read from a derived table column which
was specified as a concatenation of string literals.
The bug happened because the Item_string::append did not adjust the
value of Item_string::max_length. As a result of it the temporary
table column defined to store the concatenation of literals was
not wide enough to hold the whole value.
Before this fix, a IN predicate of the form: "IN (( subselect ))", with two
parenthesis, would be evaluated as a single row subselect: if the subselect
returns more that 1 row, the statement would fail.
The SQL:2003 standard defines a special exception in the specification,
and mandates that this particular form of IN predicate shall be equivalent
to "IN ( subselect )", which involves a table subquery and works with more
than 1 row.
This fix implements "IN (( subselect ))", "IN ((( subselect )))" etc
as per the SQL:2003 requirement.
All the details related to the implementation of this change have been
commented in the code, and the relevant sections of the SQL:2003 spec
are given for reference, so they are not repeated here.
Having access to the spec is a requirement to review in depth this patch.
The bug report has demonstrated the following two problems.
1. If an ORDER/GROUP BY list includes a constant expression being
optimized away and, at the same time, containing single-row
subselects that return more that one row, no error is reported.
Strictly speaking the standard allows to ignore error in this case.
Yet, now a corresponding fatal error is reported in this case.
2. If a query requires sorting by expressions containing single-row
subselects that, however, return more than one row, then the execution
of the query may cause a server crash.
To fix this some code has been added that blocks execution of a subselect
item in case of a fatal error in the method Item_subselect::exec.
(Mostly in DBUG_PRINT() and unused arguments)
Fixed bug in query cache when used with traceing (--with-debug)
Fixed memory leak in mysqldump
Removed warnings from mysqltest scripts (replaced -- with #)
- When returning metadata for scalar subqueries the actual type of the
column was calculated based on the value type, which limits the actual
type of a scalar subselect to the set of (currently) 3 basic types :
integer, double precision or string. This is the reason that columns
of types other then the basic ones (e.g. date/time) are reported as
being of the corresponding basic type.
Fixed by storing/returning information for the column type in addition
to the result type.
This is a performance issue for queries with subqueries evaluation
of which requires filesort.
Allocation of memory for the sort buffer at each evaluation of a
subquery may take a significant amount of time if the buffer is rather big.
With the fix we allocate the buffer at the first evaluation of the
subquery and reuse it at each subsequent evaluation.
If elements a not top-level IN subquery were accessed by an index and
the subquery result set included a NULL value then the quantified
predicate that contained the subquery was evaluated to NULL when
it should return a non-null value.
list using a function
When executing dependent subqueries they are re-inited and re-exec() for
each row of the outer context.
The cause for the bug is that during subquery reinitialization/re-execution,
the optimizer reallocates JOIN::join_tab, JOIN::table in make_simple_join()
and the local variable in 'sortorder' in create_sort_index(), which is
allocated by make_unireg_sortorder().
Care must be taken not to allocate anything into the thread's memory pool
while re-initializing query plan structures between subquery re-executions.
All such items mush be cached and reused because the thread's memory pool
is freed at the end of the whole query.
Note that they must be cached and reused even for queries that are not
otherwise cacheable because otherwise it will grow the thread's memory
pool every time a cacheable query is re-executed.
We provide additional members to the JOIN structure to store references
to the items that need to be cached.
an ALL/ANY quantified subquery in HAVING.
The Item::split_sum_func2 method should not create Item_ref
for objects of any class derived from Item_subselect.
wrong results
Mark the containing Item(s) (Item_subselect descendant usually) of
a subselect as containing aggregate functions if it has references
to aggregates functions that are calculated outside its context.
This tels end_send_group() not to make an Item_subselect descendant in
select list a copy and causes the correct value being returned.