sort_buffer_size cannot allocate
The NULL return from tree_insert() (on low memory) was not
checked for in Item_func_group_concat::add(). As a result,
a crash could happen under low-memory conditions.
Fixed by properly checking the return code.
with gcc 4.3.2
Compiling MySQL with gcc 4.3.2 and later produces a number of
warnings, many of which are new with the recent compiler
versions.
This bug will be resolved in more than one patch to limit the
size of changesets. This is the first patch, fixing a number
of the warnings, predominantly "suggest using parentheses
around && in ||", and empty for and while bodies.
The copy of the original arguments of an aggregate function was not
initialized until after fix_fields().
Sometimes (e.g. when there's an error processing the statement)
the print() can be called with no corresponding fix_fields() call.
Fixed by adding a check if the Item is fixed before using the arguments
copy.
The bug is a regression introduced by the patch for bug32798.
The code in Item_func_group_concat::clear() relied on the 'distinct'
variable to check if 'unique_filter' was initialized. That, however,
is not always valid because Item_func_group_concat::setup() can do
take shortcuts in some cases without initializing 'unique_filter'.
Fixed by checking the value of 'unique_filter' instead of 'distinct'
before dereferencing.
Mixing aggregate functions and non-grouping columns is not allowed in the
ONLY_FULL_GROUP_BY mode. However, in some cases the error wasn't thrown
because the check was insufficient.
In order to check more thoroughly, the new algorithm employs a list of outer
fields used in a sum function and a SELECT_LEX::full_group_by_flag.
Each non-outer field is checked to find out whether it's aggregated or not,
and the current select is marked accordingly.
All outer fields that are used under an aggregate function are added to the
Item_sum::outer_fields list and later checked by the Item_sum::check_sum_func
function.
returns wrong results
Casting AVG() to DECIMAL led to incorrect results when the arguments
had a non-DECIMAL type, because in this case
Item_sum_avg::val_decimal() performed the division by the number of
arguments twice.
Fixed by changing Item_sum_avg::val_decimal() to not rely on
Item_sum_sum::val_decimal(), i.e. calculate sum and divide using
DECIMAL arithmetic for DECIMAL arguments, and utilize val_real() with
subsequent conversion to DECIMAL otherwise.
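
The double division can be illustrated with a small Python sketch. It is
not the server code: the function names are hypothetical, and the mechanism
shown (a helper that already returns an average for non-DECIMAL input, with
the caller dividing again) is an assumption made for illustration.

```python
from decimal import Decimal


def avg_decimal_buggy(values):
    """Mimics the described flaw: division by the row count happens twice."""
    count = len(values)
    # Assumed stand-in for the inherited helper: for non-DECIMAL input it
    # goes through the real-valued path, which already yields the average.
    already_averaged = Decimal(repr(sum(values) / count))
    return already_averaged / count  # second, erroneous division


def avg_decimal_fixed(values):
    """DECIMAL arithmetic for DECIMAL arguments, real arithmetic otherwise."""
    count = len(values)
    if all(isinstance(v, Decimal) for v in values):
        return sum(values, Decimal(0)) / count
    # Non-DECIMAL arguments: compute as a real, then convert once.
    return Decimal(repr(sum(float(v) for v in values) / count))
```

For the inputs 2.0, 4.0 and 6.0 the fixed variant yields 4, while the buggy
variant divides that average by 3 a second time.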
There was a double free of the Unique member of Item_func_group_concat.
This was not causing a crash because Unique is a descendant of
Sql_alloc.
Fixed to free the Unique only if it was allocated for the instance
of Item_func_group_concat it was referenced from.
suite)
Under some circumstances a combination of aggregate functions and
GROUP BY in a SELECT query over a VIEW could lead to incorrect
calculation of the result type of the aggregate function. This in
turn could result in incorrect results, or assertion failures on debug
builds.
Fixed by changing the logic in Item_sum_hybrid::fix_fields() so that
the argument's item is dereferenced before calling its type() method.
w/ Field_date instead of Field_newdate
Field_date was still used in temp table creation.
Fixed by using Field_newdate consistently throughout the server
except when reading tables defined with an older MySQL version.
No test suite is possible because both Field_date and Field_newdate
return the same values in all the metadata calls.
with null values
For queries containing GROUP_CONCAT(DISTINCT fields ORDER BY fields), there
was a limitation that the DISTINCT fields had to be the same as ORDER BY
fields, owing to the fact that one single sorted tree was used for keeping
track of tuples, ordering and uniqueness. Fixed by introducing a second
structure to handle uniqueness so that the original structure has only to
order the result.
The fix is a copy of Martin Friebe's suggestion.
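
A minimal Python sketch of the two-structure idea follows. It is an
illustration only, not the suggested patch: the function and parameter
names are hypothetical, a set and a sorted list stand in for the server's
tree structures, and keeping the first row seen for each DISTINCT value is
an assumption.

```python
def group_concat_distinct(rows, distinct_key, order_key, separator=","):
    """GROUP_CONCAT(DISTINCT d ORDER BY o) with separate structures."""
    seen = set()   # second structure: tracks uniqueness only
    kept = []      # first structure: only has to order the result
    for row in rows:
        key = distinct_key(row)
        if key in seen:
            continue  # duplicate DISTINCT value, drop the row
        seen.add(key)
        kept.append(row)
    kept.sort(key=order_key)
    return separator.join(str(distinct_key(row)) for row in kept)


rows = [("b", 2), ("a", 3), ("b", 1), ("c", 0)]
result = group_concat_distinct(
    rows,
    distinct_key=lambda r: r[0],  # DISTINCT on the first column
    order_key=lambda r: r[1],     # ORDER BY the second column
)
```

Because uniqueness and ordering no longer share one key, the DISTINCT and
ORDER BY expressions are free to differ.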
Added a test of no_appended, which will be false if anything,
including the empty string, is in the result.
Problem: GROUP_CONCAT(DISTINCT BIT_FIELD...) uses a tree to store keys,
which are constructed from the fields of a temporary table
(see Item_func_group_concat::setup()).
Because a) we don't store the null bits in the tree, where bit fields store
parts of their data, and b) there's no method to properly compare two table
records, we have a problem.
Fix: convert BIT fields to INT in the temporary table used.
Item_sum_distinct::setup(THD*): Assertion
There was an assertion to detect a bug in the ROLLUP
implementation. However, the asserted condition does not
hold when used in a subquery context with non-cacheable
statements.
Fixed by turning the asserted condition into an accepted case
(just as is done for the other aggregate functions).
- The bug was caused by the COUNT(DISTINCT ...) code using the Unique object
in a way that assumed that a BIT(N) column occupies contiguous space in the
temp_table->record[0] buffer.
- The fix is to make the COUNT(DISTINCT ...) code instruct create_tmp_table
to create the temporary table with a column of type BIGINT, not BIT(N).
a temporary table.
The result string of Item_func_group_concat wasn't initialized in the
copy constructor of the Item_func_group_concat class. This led to a
wrong charset of the GROUP_CONCAT result when the select employs a
temporary table.
The copy constructor of the Item_func_group_concat class now correctly
initializes the charset of the result string.
query / no aggregate of subquery
The optimizer counts the aggregate functions that
appear as top level expressions (in all_fields) in
the current subquery. Later it makes a list of these
that it uses to actually execute the aggregates in
end_send_group().
That count is used in several places as a flag whether
there are aggregate functions.
While collecting the above info it must not consider
aggregates that are not aggregated in the current
context. It must treat them as normal expressions
instead. Not doing that leads to incorrect data about
the query, e.g. running a query that actually has no
aggregate functions as if it has some (and hence is
expected to return only one row).
Fixed by ignoring the aggregates that are not aggregated
in the current context.
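
The counting rule can be sketched in Python as follows. The Expr class and
its fields are hypothetical stand-ins, not the server's data structures;
aggr_level is assumed to record the nest level at which an aggregate is
actually evaluated.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    is_aggregate: bool
    aggr_level: int = -1  # assumed: nest level where the aggregate is evaluated


def count_local_aggregates(all_fields, current_level):
    """Count only aggregates that are aggregated at the current nest level.

    Aggregates evaluated in an outer query are treated as normal
    expressions here, so they do not turn the subquery into a
    one-row aggregate query.
    """
    return sum(
        1 for e in all_fields
        if e.is_aggregate and e.aggr_level == current_level
    )


# A subquery at nest level 1 whose only aggregate belongs to the outer query.
subquery_fields = [Expr(is_aggregate=True, aggr_level=0), Expr(is_aggregate=False)]
local_count = count_local_aggregates(subquery_fields, current_level=1)
```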
One other smaller omission discovered and fixed in the
process : the place of aggregation was not calculated for
user defined functions. Fixed by calling
Item_sum::init_sum_func_check() and
Item_sum::check_sum_func() as it's done for the rest of
the aggregate functions.
Problem: separator was not converted to the result character set,
so the result was a mixture of two different character sets,
which was especially bad for UCS2.
Fix: convert separator to the result character set.
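
The effect can be reproduced outside the server with a short Python sketch.
The function and its parameters are hypothetical, and Python's "utf-16-be"
codec is used here as a stand-in for UCS2.

```python
def group_concat_bytes(values, separator, result_charset,
                       separator_charset, convert_separator=True):
    """Join encoded values, optionally converting the separator first."""
    encoded = [v.encode(result_charset) for v in values]
    # The fix: the separator is expressed in the result character set.
    sep_charset = result_charset if convert_separator else separator_charset
    return separator.encode(sep_charset).join(encoded)


fixed = group_concat_bytes(["a", "b"], ",", "utf-16-be", "latin-1")
mixed = group_concat_bytes(["a", "b"], ",", "utf-16-be", "latin-1",
                           convert_separator=False)
```

Here `mixed` contains a one-byte separator between two-byte characters, so
it is no longer valid in the result character set, whereas `fixed` decodes
cleanly.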
When using GROUP_CONCAT with ORDER BY, a tree is used for the sorting, as
opposed to normal nested loops join used when there is no ORDER BY.
The tree traversal that generates the result counts the lines that have been
cut down (they get cut down to the field's max_size).
But the check of that count was before the tree traversal, so no
warning was generated if the output is truncated.
Fixed by moving the check to after the tree traversal.
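
The ordering problem can be shown with a small Python sketch. The names are
hypothetical, sorted() stands in for the tree traversal, and the truncation
rule is simplified.

```python
def group_concat_ordered(values, max_size, check_before_traversal=False):
    """Return (result, warnings); the warning depends on where the check is."""
    warnings = []
    cut_rows = 0
    if check_before_traversal and cut_rows:
        # Old position of the check: cut_rows is still 0 here, so this
        # branch can never fire.
        warnings.append(f"{cut_rows} line(s) were cut")
    result = ""
    for value in sorted(values):  # stands in for the tree traversal
        piece = value if not result else "," + value
        if len(result) + len(piece) > max_size:
            result = (result + piece)[:max_size]
            cut_rows += 1
            break
        result += piece
    if not check_before_traversal and cut_rows:
        # New position: the count is known only after the traversal.
        warnings.append(f"{cut_rows} line(s) were cut")
    return result, warnings
```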
Validity checks for nested set functions
were not taking into account that the enclosed
set function may be on a nest level that is
lower than the nest level of the enclosing set
function.
Fixed by :
- propagating max_sum_func_level
up the enclosing set functions chain.
- updating the max_sum_func_level of the
enclosing set function when the enclosed set
function is aggregated above or on the same
nest level as the level of the enclosing
set function.
- updating the max_arg_level of the enclosing
set function on a reference that refers to
an item above or on the same nest level
as the level of the enclosing set function.
- Treating both Item_field and Item_ref as possibly
referencing items from outer nest levels.
If a set function with an outer reference, s(outer_ref), cannot be aggregated
in the outer query against which the reference has been resolved, then MySQL
interprets s(outer_ref) in the same way as it would interpret s(const).
However, the standard requires throwing an error in this situation.
Added some code to support this requirement in ansi mode.
Corrected another minor bug in Item_sum::check_sum_func.
When creating a temporary table the concise column type
of a string expression is decided based on its length:
- if its length is under 512 it is stored as either
varchar or char.
- otherwise it is stored as a BLOB.
There is a flag (convert_blob_length) to create_tmp_field
that, when >0, allows forcing creation of a varchar if the
max blob length is under convert_blob_length.
However it must be verified that convert_blob_length
(settable through a SQL option in some cases) is
under the maximum that can be stored in a varchar column.
While performing that check for expressions in
create_tmp_field_from_item the max length of the blob was
used instead. This causes blob columns to be created in the
heap temp table used by GROUP_CONCAT (where blobs must not
be created in the temp table because of the constant
convert_blob_length that is passed to create_tmp_field() ).
And since these blob columns are not expected in that place
we get wrong results.
Fixed by checking that the value of the flag variable is
in the limits that fit into VARCHAR instead of the max length
of the blob column.
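
The difference between the two checks can be sketched in Python. This is a
simplification, not create_tmp_field_from_item itself: the function name is
hypothetical, the 512 threshold comes from the description above, and the
VARCHAR maximum of 65535 is an assumption.

```python
SHORT_STRING_LIMIT = 512     # threshold taken from the description above
MAX_VARCHAR_LENGTH = 65535   # assumed maximum size of a VARCHAR column


def tmp_column_type(max_length, convert_blob_length, check_flag_value=True):
    """Pick the temporary-table column type for a string expression."""
    if max_length < SHORT_STRING_LIMIT:
        return "VARCHAR"
    # Fixed behaviour validates the flag value; the old behaviour
    # validated the blob's max length instead.
    checked = convert_blob_length if check_flag_value else max_length
    if convert_blob_length > 0 and checked <= MAX_VARCHAR_LENGTH:
        return "VARCHAR"
    return "BLOB"
```

With a long expression (say a max length of 1,000,000) and a flag value of
1024, the old check falls through to BLOB while the corrected check yields
VARCHAR.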
from func_group.test after the patch for bug #27229 had been applied.
The memory corruption happened because in some rare cases the function
count_field_types underestimated the number of elements in
the array param->items_to_copy.
context was used as an argument of GROUP_CONCAT.
Ensured correct setting of the depended_from field in references
generated for set functions aggregated in outer selects.
A wrong value of this field resulted in wrong maps returned by
used_tables() for these references.
Made sure that a temporary table field is added for any set function
aggregated in outer context when creation of a temporary table is
needed to execute the inner subquery.
The problem in this bug occurs when we create temporary tables. When
temporary tables are created for unions, some
inference is carried out regarding the type of the column.
Whenever this column type is inferred to be REAL (i.e. FLOAT or
DOUBLE), MySQL will always try to maintain exact precision, and
if that is not possible (there are hardware limits, since FLOAT
and DOUBLE are stored as approximate values) will switch to
using approximate values. The problem here is that at this point
the information about number of significant digits is not
available. Furthermore, the number of significant digits should
be increased for the AVG function, however, this was not properly
handled. There are 4 parts to the problem:
#1: DOUBLE and FLOAT fields don't display their proper display
lengths in max_display_length(). This is hard-coded as 53 for
DOUBLE and 24 for FLOAT. Now changed to instead return the
field_length.
#2: Type holders for temporary tables do not preserve the
max_length of the Items from which they are created; it is
instead reverted to the 53 and 24 from above. This causes
*all* fields to get non-fixed significant digits.
#3: AVG function does not update max_length (display length)
when updating number of decimals.
#4: The function that switches to non-fixed number of
significant digits should use DBL_DIG + 2 or FLT_DIG + 2 as
cut-off values (Since fixed precision does not use the 'e'
notation)
Of these points, #1 is the controversial one, but this
change is preferred and has been cleared with Monty. The
function causes quite a few unit tests to blow up and they had
to be changed, but each one is annotated and motivated. We
frequently see the magical 53 and 24 give way to more relevant
numbers.
aggregated in outer context returned wrong results.
This happened only if the subquery did not contain any references
to outer fields.
As there were no references to outer fields the subquery erroneously
was taken for non-correlated one.
Now any set function aggregated in outer context makes the subquery
correlated.
To correctly decide which predicates can be evaluated with a given table
the optimizer must know the exact set of tables that a predicate depends
on. If that mask is too wide (refer to non-existing tables) the optimizer
can erroneously skip a predicate.
One such case of a wrong table usage mask was the aggregate functions.
They have an all-1 mask (meaning they depend on all tables, including
non-existent ones).
Fixed by making a real used_tables mask for the aggregates. The mask is
constructed in the following way :
1. OR the table dependency masks of all the arguments of the aggregate.
2. If all the arguments of the function are from the local name resolution
context and it is evaluated in the same name resolution
context where it is referenced all the tables from that name resolution
context are OR-ed to the dependency mask. This is to denote that an
aggregate function depends on the number of rows it processes.
3. Handle correctly the case of an aggregate function optimization (such that
the aggregate function can be pre-calculated and made a constant).
Made sure that an aggregate function is never a constant (unless subject of a
specific optimization and pre-calculation).
One other flaw was revealed and fixed in the process : references were
not calling the recalculation method for used_tables of their targets.
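
The three mask-construction steps listed above can be sketched with plain
integer bitmasks in Python. The function and parameter names are
hypothetical, and representing a pre-calculated aggregate by an empty mask
is an assumption.

```python
def aggregate_used_tables(arg_masks, local_tables_mask,
                          args_all_local, evaluated_locally,
                          precalculated=False):
    """Build a used_tables mask for an aggregate function."""
    if precalculated:
        # Step 3: the aggregate was optimized into a constant, so it is
        # assumed here to depend on no tables at all.
        return 0
    mask = 0
    for arg_mask in arg_masks:
        mask |= arg_mask  # step 1: OR the masks of all the arguments
    if args_all_local and evaluated_locally:
        # Step 2: the aggregate depends on the number of rows it
        # processes, hence on every table of its own context.
        mask |= local_tables_mask
    return mask


# Tables t1, t2, t3 occupy bits 0, 1 and 2 of the mask.
local_mask = aggregate_used_tables([0b001, 0b010], 0b111, True, True)
```

Unlike the all-1 mask, the result never refers to tables outside the name
resolution context.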
When only one row was present, the subtraction of nearly equal numbers
resulted in catastrophic cancellation, introducing an error near 1e-15
in the VARIANCE calculation. When that was sqrt()ed to get STDDEV, the
error was escalated to near 1e-8.
The simple fix of testing for a row count of 1 and forcing that to yield
0.0 is insufficient, as two rows of the same value should also have a
variance of 0.0, yet the error would be about the same.
So, this patch changes the formula that computes the VARIANCE to be one
that is not subject to catastrophic cancellation.
In addition, it now uses only (faster-than-decimal) floating point numbers
to calculate, and renders that to other types on demand.
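
The message does not say which formula the patch adopted. One well-known
formula with this property is Welford's recurrence, sketched below in
Python next to the cancellation-prone textbook form. Both function names
are hypothetical, and a population variance is assumed.

```python
def variance_naive(values):
    """Textbook form: subtracts two nearly equal sums, so it can cancel."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (total_sq - total * total / n) / n


def variance_welford(values):
    """Welford's recurrence: updates the mean and squared deviations."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count if count else None
```

With a single row, or with several rows of the same value, every delta
after the first update is exactly zero, so the recurrence returns exactly
0.0 and STDDEV inherits no rounding error.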
We use val_int() calls (followed by a null_value check) to determine
nullness in some Item_sum_count and Item_sum_count_distinct methods;
as a side effect, extra warnings are raised in val_int().
Fix: use is_null() instead.
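
The side effect can be illustrated with a toy Python class. StringItem is a
hypothetical stand-in, not the server's Item hierarchy, and the warning
text is made up for the example.

```python
class StringItem:
    """Toy item holding a string value and a shared warning list."""

    def __init__(self, value, warnings):
        self.value = value
        self.warnings = warnings
        self.null_value = False

    def val_int(self):
        """Convert to an integer; a failed conversion raises a warning."""
        self.null_value = self.value is None
        if self.null_value:
            return 0
        try:
            return int(self.value)
        except ValueError:
            self.warnings.append(f"Truncated incorrect value: {self.value!r}")
            return 0

    def is_null(self):
        """Report nullness without attempting any conversion."""
        return self.value is None


warnings = []
item = StringItem("abc", warnings)
null_via_is_null = item.is_null()   # no warning raised
warnings_after_is_null = len(warnings)
item.val_int()                      # raises a warning as a side effect
null_via_val_int = item.null_value
```

Both paths agree that the value is not NULL, but only the val_int() path
leaves an extra warning behind.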