NULLable BIGINT and INT columns in comparison
Problem: a consequence of the fix for 43668.
Some Arg_comparator inner initialization missed,
that may lead to unpredictable (wrong) comparison
results.
Fix: always properly initialize Arg_comparator
before its usage.
MySQL manual describes values of the YEAR(2) field type as follows:
values 00 - 69 mean 2000 - 2069 years and values 70 - 99 mean 1970 - 1999
years. MIN/MAX and comparison functions was comparing them as int values
thus producing wrong result.
Now the Arg_comparator class is extended with compare_year function which
performs correct comparison of the YEAR type.
The Item_sum_hybrid class now uses Item_cache and Arg_comparator objects to
correctly calculate its value.
To allow Arg_comparator to use func_name() function for Item_func and Item_sum
objects the func_name declaration is moved to the Item_result_field class.
A helper function is_owner_equal_func is added to the Arg_comparator class.
It checks whether the Arg_comparator object owner is the <=> function or not.
A helper function setup is added to the Item_sum_hybrid class. It sets up
cache item and comparator.
When values of different types are compared they're converted to a type that
allows correct comparison. This conversion is done for each comparison and
takes some time. When a constant is being compared it's possible to cache the
value after conversion to speedup comparison. In some cases (large dataset,
complex WHERE condition with many type conversions) query might be executed
7% faster.
A test case isn't provided because all changes are internal and isn't visible
outside.
The behavior of the Item_cache is changed to cache values on the first request
of cached value rather than at the moment of storing item to be cached.
A flag named value_cached is added to the Item_cache class. It's set to TRUE
when cache holds the value of the last stored item.
Function named cache_value() is added to the Item_cache class and derived classes.
This function actually caches the value of the saved item.
Item_cache_xxx::store functions now only store item to be cached and set
value_cached flag to FALSE.
Item_cache_xxx::val_xxx functions are changed to call cache_value function
prior to returning cached value if value_cached is FALSE.
The Arg_comparator::set_cmp_func function now calls cache_converted_constant
to cache constants if they need a type conversion.
The Item_cache::get_cache function is overloaded to allow setting of the
cache type.
The cache_converted_constant function is added to the Arg_comparator class.
It checks whether a value can and should be cached and if so caches it.
The SE API requires mysql to notify the storage engine that
it's going to read certain tables at the beginning of the
statement (by calling start_stmt(), store_lock() or
external_lock()).
These are typically called by the lock_tables().
However SHOW CREATE TABLE is not pre-locking the tables
because it's not expected to access the data at all.
But for some view definitions (that include comparing a
date/datetime/timestamp column to a string returning
scalar subquery) the JOIN::prepare may still access data
when materializing the scalar non-correlated subquery
in Arg_comparator::can_compare_as_dates().
Fixed by not materializing the subquery when the function
is called in a SHOW/EXPLAIN/CREATE VIEW
values return too many records
WHERE clauses with "outer_value_list NOT IN subselect" were
handled incorrectly if the outer value list contained multiple
items where at least one of these could be NULL. The first
outer record with NULL value was handled correctly, but if a
second record with NULL value existed, the optimizer would
choose to reuse the result it got on the last execution of the
subselect. This is incorrect if the outer value list has
multiple items.
The fix is to make Item_in_optimizer::val_int (in
item_cmpfunc.cc) reuse the result of the latest execution
for NULL values only if all values in the outer_value_list
are NULL.
When a query was using a DATE or DATETIME value formatted
using any other separator characters beside hyphen '-', a
query with a greater-or-equal '>=' condition matching only
the greatest value in an indexed column, the result was
empty if index range scan was employed.
The range optimizer got a new feature between 5.1.38 and
5.1.39 that changes a greater-or-equal condition to a
greater-than if the value matching that in the query was not
present in the table. But the value comparison function
compared the dates as strings instead of dates.
The bug was fixed by splitting the function
get_date_from_str in two: One part that parses and does
error checking. This function is now visible outside the
module. The old get_date_from_str now calls the new
function.
SELECT ... WHERE ... IN (NULL, ...) does full table scan,
even if the same query without the NULL uses efficient range scan.
The bugfix for the bug 18360 introduced an optimization:
if
1) all right-hand arguments of the IN function are constants
2) result types of all right argument items are compatible
enough to use the same single comparison function to
compare all of them to the left argument,
then
we can convert the right-hand list of constant items to an array
of equally-typed constant values for the further
QUICK index access etc. (see Item_func_in::fix_length_and_dec()).
The Item_null constant item objects have STRING_RESULT
result types, so, as far as Item_func_in::fix_length_and_dec()
is aware of NULLs in the right list, this improvement efficiently
optimizes IN function calls with a mixed right list of NULLs and
string constants. However, the optimization doesn't affect mixed
lists of NULLs and integers, floats etc., because there is no
unique common comparator.
New optimization has been added to ignore the result type
of NULL constants in the static analysis of mixed right-hand lists.
This is safe, because at the execution phase we care about
presence of NULLs anyway.
1. The collect_cmp_types() function has been modified to optionally
ignore NULL constants in the item list.
2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
function has been modified to work not only with in_string
vectors but with in_vectors of other types.
with gcc 4.3.2
This patch fixes a number of GCC warnings about variables used
before initialized. A new macro UNINIT_VAR() is introduced for
use in the variable declaration, and LINT_INIT() usage will be
gradually deprecated. (A workaround is used for g++, pending a
patch for a g++ bug.)
GCC warnings for unused results (attribute warn_unused_result)
for a number of system calls (present at least in later
Ubuntus, where the usual void cast trick doesn't work) are
also fixed.
The problem was that creating a DECIMAL column from a decimal
value could lead to a failed assertion as decimal values can
have a higher precision than those attached to a table. The
assert could be triggered by creating a table from a decimal
with a large (> 30) scale. Also, there was a problem in
calculating the number of digits in the integral and fractional
parts if both exceeded the maximum number of digits permitted
by the new decimal type.
The solution is to ensure that truncation procedure is executed
when deducing a DECIMAL column from a decimal value of higher
precision. If the integer part is equal to or bigger than the
maximum precision for the DECIMAL type (65), the integer part
is truncated to fit and the fractional becomes zero. Otherwise,
the fractional part is truncated to fit into the space left
after the integer part is copied.
This patch borrows code and ideas from Martin Hansson's patch.
Using DECIMAL constants with more than 65 digits in CREATE
TABLE ... SELECT led to bogus errors in release builds or
assertion failures in debug builds.
The problem was in inconsistency in how DECIMAL constants and
fields are handled internally. We allow arbitrarily long
DECIMAL constants, whereas DECIMAL(M,D) columns are limited to
M<=65 and D<=30. my_decimal_precision_to_length() was used in
both Item and Field code and truncated precision to
DECIMAL_MAX_PRECISION when calculating value length without
adjusting precision and decimals. As a result, a DECIMAL
constant with more than 65 digits ended up having length less
than precision or decimals which led to assertion failures.
Fixed by modifying my_decimal_precision_to_length() so that
precision is truncated to DECIMAL_MAX_PRECISION only for Field
object which is indicated by the new 'truncate' parameter.
Another inconsistency fixed by this patch is how DECIMAL
constants and expressions are handled for CREATE ... SELECT.
create_tmp_field_from_item() (which is used for constants) was
changed as a part of the bugfix for bug #24907 to handle long
DECIMAL constants gracefully. Item_func::tmp_table_field()
(which is used for expressions) on the other hand was still
using a simplistic approach when creating a Field_new_decimal
from a DECIMAL expression.
with gcc 4.3.2
Compiling MySQL with gcc 4.3.2 and later produces a number of
warnings, many of which are new with the recent compiler
versions.
This bug will be resolved in more than one patch to limit the
size of changesets. This is the first patch, fixing a number
of the warnings, predominantly "suggest using parentheses
around && in ||", and empty for and while bodies.
with gcc 4.3.2
Compiling MySQL with gcc 4.3.2 and later produces a number of
warnings, many of which are new with the recent compiler
versions.
This bug will be resolved in more than one patch to limit the
size of changesets. This is the first patch, fixing a number
of the warnings, predominantly "suggest using parentheses
around && in ||", and empty for and while bodies.
HAVING
When calculating GROUP BY the server caches some expressions. It does
that by allocating a string slot (Item_copy_string) and assigning the
value of the expression to it. This effectively means that the result
type of the expression can be changed from whatever it was to a string.
As this substitution takes place after the compile-time result type
calculation for IN but before the run-time type calculations,
it causes the type calculations in the IN function done at run time
to get unexpected results different from what was prepared at compile time.
In the CASE ... WHEN ... THEN ... statement there was a similar problem
and it was solved by artificially adding a STRING argument to the set of
types of the IN/CASE arguments at compile time, so if any of the
arguments of the CASE function changes its type to a string it will
still be covered by the information prepared at compile time.
and HAVING
When calculating GROUP BY the server caches some expressions. It does
that by allocating a string slot (Item_copy_string) and assigning the
value of the expression to it. This effectively means that the result
type of the expression can be changed from whatever it was to a string.
As this substitution takes place after the compile-time result type
calculation for IN but before the run-time type calculations,
it causes the type calculations in the IN function done at run time
to get unexpected results different from what was prepared at compile time.
In the CASE ... WHEN ... THEN ... statement there was a similar problem
and it was solved by artificially adding a STRING argument to the matrix
at compile time, so if any of the arguments of the CASE function changes
its type to a string it will still be covered by the information prepared
at compile time.
Extended the CASE fix for cover the IN case.
An alternative way of fixing this problem is by caching the result type of
the arguments at compile time and using the cached information at run time
instead of re-calculating the result types.
Preferred the CASE approach for uniformity and fix localization.
In case of ROW item each compared pair does not
check if argumet collations can be aggregated and
thus appropiriate item conversion does not happen.
The fix is to add the check and convertion for ROW
pairs.
Item_in_optimizer::is_null() evaluated "NULL IN (SELECT ...)" to NULL regardless of
whether subquery produced any records, this was a documented limitation.
The limitation has been removed (see bugs 8804, 24085, 24127) now
Item_in_optimizer::val_int() correctly handles all cases with NULLs. Make
Item_in_optimizer::is_null() invoke val_int() to return correct values for
"NULL IN (SELECT ...)".
Execution of queries containing the CASE function of
aggregate function like in "SELECT ... CASE ARGV(...) WHEN ..."
crashed the server.
The CASE function caches pointers to concrete comparison
functions for an each pair of types of CASE-WHERE clause
parameters, i.e. for the "CASE INT_RESULT WHERE REAL_RESULT
THEN ... WHERE DECIMAL_RESULT ... END" function call it
caches comparisons for INT_RESULT with REAL_RESULT and
for INT_RESULT with DECIMAL_RESULT. Usually a result
type is known after a call to the fix_fields function,
however, the setup_copy_fields function call may
wrap aggregate items with Item_copy_string that has
STRING_RESULT result type, so setup_copy_fields may
change argument result types of the CASE function after
call to Item_func_case::fix_fields/fix_length_and_dec.
Then the Item_func_case::find_item function tries to
use comparison function for unexpected pair of the
STRING_RESULT and some other type - that caused
an assertion failure of server crash.
The Item_func_case::fix_length_and_dec function has
been modified to take into account possible STRING_RESULT
result type in the presence of aggregate arguments of
the CASE function.
IF(..., CAST(longtext AS UNSIGNED), signed_val)
(was: LEFT JOIN on inline view crashes server)
Select from a LONGTEXT column wrapped with an expression
like "IF(..., CAST(longtext_column AS UNSIGNED), smth_signed)"
failed an assertion or crashed the server. IFNULL function was
affected too.
LONGTEXT column item has a maximum length of 32^2-1 bytes,
at the same time this is a maximum possible length of any
MySQL item. CAST(longtext_column AS UNSIGNED) returns some
unsigned numeric result of length 32^2-1, so the result of
IF/IFNULL function of this number and some other signed number
will have text length of (32^2-1)+1=32^2 (one byte for the
minus sign) - there is integer overflow, and the length is
equal to zero. That caused assert/crash.
The bug has been fixed by the same solution as in the CASE
function implementation.
Field_varstring::store
The code that temporary saved the bitmaps of the read set and the write set so that
it can set it to all columns for debug purposes was not expecting that the
table->read_set and table->write_set can be the same. And was always saving both in
sequence.
As a result the original value was never restored.
Fixed by saving & restoring the original value only once if the two sets are the
same (in a special set of functions).
We pretended that TIMEDIFF() would always return positive results;
this gave strange results in comparisons of the TIMEDIFF(low,hi)<TIME(0)
type that rendered a negative result, but still gave false in comparison.
We also inadvertantly dropped the sign when converting times to
decimal.
CAST(time AS DECIMAL) handles signs of the times correctly.
TIMEDIFF() marked up as signed. Time/date comparison code switched to
signed for clarity.
The convert_constant_item function converts a constant to integer using
field for condition like 'field = a_constant'. In some cases the
convert_constant_item is called for a subquery when outer select is already
being executed, so convert_constant_item saves field's value to prevent its
corruption. For EXPLAIN and at the prepare phase field's value isn't
initialized yet, thus when convert_constant_item tries to restore saved
value it fails assertion.
Now the convert_constant_item doesn't save/restore field's value if it's
haven't been read yet. Outer constant values are always saved.
The convert_constant_item function converts a constant to integer using
field for condition like 'field = a_constant'. When the convert_constant_item
is called for a subquery the outer select is already being executed, so
convert_constant_item saves field's value to prevent its corruption.
For EXPLAIN field's value isn't initialized thus when convert_constant_item
tries to restore saved value it fails assertion.
Now the convert_constant_item doesn't save/restore field's value
for EXPLAIN.
min() and max() functions are implemented in MySQL as macros.
This means that max(a,b) is expanded to: ((a) > (b) ? (a) : (b))
Note how 'a' is quoted two times.
Now imagine 'a' is a recursive function call that's several 10s of levels deep.
And the recursive function does max() with a function arg as well to dive into
recursion.
This means that simple function call can take most of the clock time.
Identified and fixed several such calls to max()/min() : including the IF()
sql function implementation.
and value-list
The server returns unexpected results if a right side of the
NOT IN clause consists of NULL value and some constants of
the same type, for example:
SELECT * FROM t WHERE NOT t.id IN (NULL, 1, 2)
may return 3, 4, 5 etc if a table contains these values.
The Item_func_in::val_int method has been modified:
unnecessary resets of an Item_func_case::has_null field
value has been moved outside of an argument comparison
loop. (Also unnecessary re-initialization of the null_value
field has been moved).