query / no aggregate of subquery
The optimizer counts the aggregate functions that
appear as top level expressions (in all_fields) in
the current subquery. Later it makes a list of these
that it uses to actually execute the aggregates in
end_send_group().
That count is used in several places as a flag whether
there are aggregates functions.
While collecting the above info it must not consider
aggregates that are not aggregated in the current
context. It must treat them as normal expressions
instead. Not doing that leads to incorrect data about
the query, e.g. running a query that actually has no
aggregate functions as if it has some (and hence is
expected to return only one row).
Fixed by ignoring the aggregates that are not aggregated
in the current context.
One other smaller omission discovered and fixed in the
process : the place of aggregation was not calculated for
user defined functions. Fixed by calling
Item_sum::init_sum_func_check() and
Item_sum::check_sum_func() as it's done for the rest of
the aggregate functions.
Problem: separator was not converted to the result character set,
so the result was a mixture of two different character sets,
which was especially bad for UCS2.
Fix: convert separator to the result character set.
When using GROUP_CONCAT with ORDER BY, a tree is used for the sorting, as
opposed to normal nested loops join used when there is no ORDER BY.
The tree traversal that generates the result counts the lines that have been
cut down. (as they get cut down to the field's max_size)
But the check of that count was before the tree traversal, so no
warning was generated if the output is truncated.
Fixed by moving the check to after the tree traversal.
Validity checks for nested set functions
were not taking into account that the enclosed
set function may be on a nest level that is
lower than the nest level of the enclosing set
function.
Fixed by :
- propagating max_sum_func_level
up the enclosing set functions chain.
- updating the max_sum_func_level of the
enclosing set function when the enclosed set
function is aggregated above or on the same
nest level of as the level of the enclosing
set function.
- updating the max_arg_level of the enclosing
set function on a reference that refers to
an item above or on the same nest level
as the level of the enclosing set function.
- Treating both Item_field and Item_ref as possibly
referencing items from outer nest levels.
If a set function with a outer reference s(outer_ref) cannot be aggregated
the outer query against which the reference has been resolved then MySQL
interpretes s(outer_ref) in the same way as it would interpret s(const).
Hovever the standard requires throwing an error in this situation.
Added some code to support this requirement in ansi mode.
Corrected another minor bug in Item_sum::check_sum_func.
When creating a temporary table the concise column type
of a string expression is decided based on its length:
- if its length is under 512 it is stored as either
varchar or char.
- otherwise it is stored as a BLOB.
There is a flag (convert_blob_length) to create_tmp_field
that, when >0 allows to force creation of a varchar if the
max blob length is under convert_blob_length.
However it must be verified that convert_blob_length
(settable through a SQL option in some cases) is
under the maximum that can be stored in a varchar column.
While performing that check for expressions in
create_tmp_field_from_item the max length of the blob was
used instead. This causes blob columns to be created in the
heap temp table used by GROUP_CONCAT (where blobs must not
be created in the temp table because of the constant
convert_blob_length that is passed to create_tmp_field() ).
And since these blob columns are not expected in that place
we get wrong results.
Fixed by checking that the value of the flag variable is
in the limits that fit into VARCHAR instead of the max length
of the blob column.
from func_group.test after the patch for bug #27229 had been applied.
The memory corruption happened because in some rare cases the function
count_field_types underestimated the number of elements in
in the array param->items_to_copy.
context was used as an argument of GROUP_CONCAT.
Ensured correct setting of the depended_from field in references
generated for set functions aggregated in outer selects.
A wrong value of this field resulted in wrong maps returned by
used_tables() for these references.
Made sure that a temporary table field is added for any set function
aggregated in outer context when creation of a temporary table is
needed to execute the inner subquery.
The problem in this bug is when we create temporary tables. When
temporary tables are created for unions, there is some
inferrence being carried out regarding the type of the column.
Whenever this column type is inferred to be REAL (i.e. FLOAT or
DOUBLE), MySQL will always try to maintain exact precision, and
if that is not possible (there are hardware limits, since FLOAT
and DOUBLE are stored as approximate values) will switch to
using approximate values. The problem here is that at this point
the information about number of significant digits is not
available. Furthermore, the number of significant digits should
be increased for the AVG function, however, this was not properly
handled. There are 4 parts to the problem:
#1: DOUBLE and FLOAT fields don't display their proper display
lengths in max_display_length(). This is hard-coded as 53 for
DOUBLE and 24 for FLOAT. Now changed to instead return the
field_length.
#2: Type holders for temporary tables do not preserve the
max_length of the Item's from which they are created, and is
instead reverted to the 53 and 24 from above. This causes
*all* fields to get non-fixed significant digits.
#3: AVG function does not update max_length (display length)
when updating number of decimals.
#4: The function that switches to non-fixed number of
significant digits should use DBL_DIG + 2 or FLT_DIG + 2 as
cut-off values (Since fixed precision does not use the 'e'
notation)
Of these points, #1 is the controversial one, but this
change is preferred and has been cleared with Monty. The
function causes quite a few unit tests to blow up and they had
to b changed, but each one is annotated and motivated. We
frequently see the magical 53 and 24 give way to more relevant
numbers.
aggregated in outer context returned wrong results.
This happened only if the subquery did not contain any references
to outer fields.
As there were no references to outer fields the subquery erroneously
was taken for non-correlated one.
Now any set function aggregated in outer context makes the subquery
correlated.
To correctly decide which predicates can be evaluated with a given table
the optimizer must know the exact set of tables that a predicate depends
on. If that mask is too wide (refer to non-existing tables) the optimizer
can erroneously skip a predicate.
One such case of wrong table usage mask were the aggregate functions.
The have a all-1 mask (meaning depend on all tables, including non-existent
ones).
Fixed by making a real used_tables mask for the aggregates. The mask is
constructed in the following way :
1. OR the table dependency masks of all the arguments of the aggregate.
2. If all the arguments of the function are from the local name resolution
context and it is evaluated in the same name resolution
context where it is referenced all the tables from that name resolution
context are OR-ed to the dependency mask. This is to denote that an
aggregate function depends on the number of rows it processes.
3. Handle correctly the case of an aggregate function optimization (such that
the aggregate function can be pre-calculated and made a constant).
Made sure that an aggregate function is never a constant (unless subject of a
specific optimization and pre-calculation).
One other flaw was revealed and fixed in the process : references were
not calling the recalculation method for used_tables of their targets.
When only one row was present, the subtraction of nearly the same number
resulted in catastropic cancellation, introducing an error in the
VARIANCE calculation near 1e-15. That was sqrt()ed to get STDDEV, the
error was escallated to near 1e-8.
The simple fix of testing for a row count of 1 and forcing that to yield
0.0 is insufficient, as two rows of the same value should also have a
variance of 0.0, yet the error would be about the same.
So, this patch changes the formula that computes the VARIANCE to be one
that is not subject to catastrophic cancellation.
In addition, it now uses only (faster-than-decimal) floating point numbers
to calculate, and renders that to other types on demand.
We use val_int() calls (followed by null_value check) to determine
nullness in some Item_sum_count' and Item_sum_count_distinct' methods,
as a side effect we get extra warnings raised in the val_int().
Fix: use is_null() instead.
- Removed not used variables and functions
- Added #ifdef around code that is not used
- Renamed variables and functions to avoid conflicts
- Removed some not used arguments
Fixed some class/struct warnings in ndb
Added define IS_LONGDATA() to simplify code in libmysql.c
I did run gcov on the changes and added 'purecov' comments on almost all lines that was not just variable name changes
When implicitly converting string fields to numbers the
string-to-number conversion error was not sent to the client.
Added code to send the conversion error as warning.
We also need to prevent generation of warnings from the places
where val_xxx() methods are called for the sole purpose of updating
the Item::null_value flag.
To achieve that a special function is added (and called) :
update_null_value(). This function will set the no_errors flag and
will call val_xxx(). The warning generation in Field_string::val_xxx()
will use the flag when generating the conversion warnings.
expression cols.
The problem was that MYSQL_FIELD::org_name was set for MIN() and MAX()
functions (COUNT() is also mentioned in the bug report but was already
fixed).
After this patch for expressions MYSQL_FIELD::name is set to either
expression itself or its alias, and other data origin fields of
MYSQL_FILED (db, org_table, table, org_name) are empty strings.
Problem: GROUP_CONCAT on a multi-byte column can truncate
in the middle of a multibyte character when applying
group_concat_max_len limit. It produces an invalid
multi-byte character in the result string.
The second, easier version - reusing old "warning_for_row" flag,
instead of introducing of "result_is_full" - which was
added in the previous commit.
Item::val_xxx() may be called by the server several times at execute time
for a single query. Calls to val_xxx() may be very expensive and sometimes
(count(distinct), sum(distinct), avg(distinct)) not possible.
To avoid that problem the results of calculation for these aggregate
functions are cached so that val_xxx() methods just return the calculated
value for the second and subsequent calls.
(COUNT(*) = 1) not working in SELECT inside prepared statement.
Note: the warning was introduced in 5.0 and 5.1, 4.1 is OK with the
original fix.
The problem was that in 5.0 and 5.1 clear() for group functions may
access hybrid_type member, and this member is initialized in
fix_fields().
So we should not call clear() from item cleanup() methods, as cleanup()
may be called for unfixed items.