Various parts of code used different 'precision' arguments for sprintf("%g") when converting
floating point numbers to a string. This led to differences in results in some cases
depending on whether the text-based or prepared statements protocol is used for a query.
Fixed by changing arguments to sprintf("%g") to always be 15 (DBL_DIG) so that results are
consistent regardless of the protocol.
This patch will be null-merged to 6.0 as the problem does not exists there (fixed by the
patch for WL#2934).
ONLY_FULL_GROUP_BY
The check for non-aggregated columns in queries with aggregate function, but without
GROUP BY was treating all the parts of the query as if they are in the SELECT list.
Fixed by ignoring the non-aggregated fields in the WHERE clause.
The optimizer pulls up aggregate functions which should be aggregated in
an outer select. At some point it may substitute such a function for a field
in the temporary table. The setup_copy_fields function doesn't take this
into account and may overrun the copy_field buffer.
Fixed by filtering out the fields referenced through the specialized
reference for aggregates (Item_aggregate_ref).
Added an assertion to make sure bugs that cause similar discrepancy
don't go undetected.
returns wrong results
Casting AVG() to DECIMAL led to incorrect results when the arguments
had a non-DECIMAL type, because in this case
Item_sum_avg::val_decimal() performed the division by the number of
arguments twice.
Fixed by changing Item_sum_avg::val_decimal() to not rely on
Item_sum_sum::val_decimal(), i.e. calculate sum and divide using
DECIMAL arithmetics for DECIMAL arguments, and utilize val_real() with
subsequent conversion to DECIMAL otherwise.
When resolving references we need to take into consideration
the view "fields" and allow qualified access to them.
Fixed by extending the reference resolution to process view
fields correctly.
The HAVING clause is subject to the same rules as the SELECT list
about using aggregated and non-aggregated columns.
But this was not enforced when processing implicit grouping from
using aggregate functions.
Fixed by performing the same checks for HAVING as for SELECT.
file .\opt_sum.cc, line
The optimizer pre-calculates the MIN/MAX values for queries like
SELECT MIN(kp_k) WHERE kp_1 = const AND ... AND kp_k-1 = const
when there is a key over kp_1...kp_k
In doing so it was not checking correctly nullability and
there was a superfluous assert().
Fixed by making sure that the field can be null before checking and
taking out the wrong assert().
.
Introduced a correct check for nullability
The MIN(field) can return NULL when all the row values in the group
are NULL-able or if there were no rows.
Fixed the assertion to reflect the case when there are no rows.
Item_sum_distinct::setup(THD*): Assertion
There was an assertion to detect a bug in ROLLUP
implementation. However the assertion is not true
when used in a subquery context with non-cacheable
statements.
Fixed by turning the assertion to accepted case
(just like it's done for the other aggregate functions).
to NULL
For queries of the form SELECT MIN(key_part_k) FROM t1
WHERE key_part_1 = const and ... and key_part_k-1 = const,
the opt_sum_query optimization tries to
use an index to substitute MIN/MAX functions with their values according
to the following rules:
1) Insert the minimum non-null values where the WHERE clause still matches, or
3) A row of nulls
However, the correct semantics requires that there is a third case 2)
such that a NULL value is substituted if there are only NULL values for
key_part_k.
The patch modifies opt_sum_query() to handle this missing case.
- added join cache indication in EXPLAIN (Extra column).
- prefer filesort over full scan over
index for ORDER BY (because it's faster).
- when switching from REF to RANGE because
RANGE uses longer key turn off sort on
the head table only as the resulting
RANGE access is a candidate for join cache
and we don't want to disable it by sorting
on the first table only.
When only one row was present, the subtraction of nearly the same number
resulted in catastropic cancellation, introducing an error in the
VARIANCE calculation near 1e-15. That was sqrt()ed to get STDDEV, the
error was escallated to near 1e-8.
The simple fix of testing for a row count of 1 and forcing that to yield
0.0 is insufficient, as two rows of the same value should also have a
variance of 0.0, yet the error would be about the same.
So, this patch changes the formula that computes the VARIANCE to be one
that is not subject to catastrophic cancellation.
In addition, it now uses only (faster-than-decimal) floating point numbers
to calculate, and renders that to other types on demand.
We use val_int() calls (followed by null_value check) to determine
nullness in some Item_sum_count' and Item_sum_count_distinct' methods,
as a side effect we get extra warnings raised in the val_int().
Fix: use is_null() instead.
Item::val_xxx() may be called by the server several times at execute time
for a single query. Calls to val_xxx() may be very expensive and sometimes
(count(distinct), sum(distinct), avg(distinct)) not possible.
To avoid that problem the results of calculation for these aggregate
functions are cached so that val_xxx() methods just return the calculated
value for the second and subsequent calls.
wrong results
Mark the containing Item(s) (Item_subselect descendant usually) of
a subselect as containing aggregate functions if it has references
to aggregates functions that are calculated outside its context.
This tels end_send_group() not to make an Item_subselect descendant in
select list a copy and causes the correct value being returned.
Treat queries with no FROM and aggregate functions as normal queries,
so the aggregate function get correctly calculated as if there is 1 row.
This means that they will be considered to have one row, so COUNT(*) will return
1 instead of 0. Other aggregates will behave in compatible manner.