Valgrind warning happens due to early null values check
in Item_func_in::fix_length_and_dec(before item evaluation).
As result null value items with uninitialized values are
placed into array and it leads to valgrind warnings during
value array sorting.
The fix is to check null value after item evaluation, item
is evaluated in in_array::set() method.
Incorrect handling of NULL arguments could lead to a crash on
the IN or CASE operations when either NULL arguments were
passed explicitly as arguments (IN) or implicitly generated by
the WITH ROLLUP modifier (both IN and CASE).
Item_func_case::find_item() assumed all necessary comparators
to be instantiated in fix_length_and_dec(). However, in the
presence of WITH ROLLUP modifier, arguments could be
substituted with an Item_null leading to an "unexpected"
STRING_RESULT comparator being invoked.
In addition to the problem identical to the above,
Item_func_in::val_int() could crash even with explicitly passed
NULL arguments due to an optimization in fix_length_and_dec()
leading to NULL arguments being ignored during comparators
creation.
SELECT ... WHERE ... IN (NULL, ...) does full table scan,
even if the same query without the NULL uses efficient range scan.
The bugfix for the bug 18360 introduced an optimization:
if
1) all right-hand arguments of the IN function are constants
2) result types of all right argument items are compatible
enough to use the same single comparison function to
compare all of them to the left argument,
then
we can convert the right-hand list of constant items to an array
of equally-typed constant values for the further
QUICK index access etc. (see Item_func_in::fix_length_and_dec()).
The Item_null constant item objects have STRING_RESULT
result types, so, as far as Item_func_in::fix_length_and_dec()
is aware of NULLs in the right list, this improvement efficiently
optimizes IN function calls with a mixed right list of NULLs and
string constants. However, the optimization doesn't affect mixed
lists of NULLs and integers, floats etc., because there is no
unique common comparator.
New optimization has been added to ignore the result type
of NULL constants in the static analysis of mixed right-hand lists.
This is safe, because at the execution phase we care about
presence of NULLs anyway.
1. The collect_cmp_types() function has been modified to optionally
ignore NULL constants in the item list.
2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
function has been modified to work not only with in_string
vectors but with in_vectors of other types.
HAVING
When calculating GROUP BY the server caches some expressions. It does
that by allocating a string slot (Item_copy_string) and assigning the
value of the expression to it. This effectively means that the result
type of the expression can be changed from whatever it was to a string.
As this substitution takes place after the compile-time result type
calculation for IN but before the run-time type calculations,
it causes the type calculations in the IN function done at run time
to get unexpected results different from what was prepared at compile time.
In the CASE ... WHEN ... THEN ... statement there was a similar problem
and it was solved by artificially adding a STRING argument to the set of
types of the IN/CASE arguments at compile time, so if any of the
arguments of the CASE function changes its type to a string it will
still be covered by the information prepared at compile time.
and HAVING
When calculating GROUP BY the server caches some expressions. It does
that by allocating a string slot (Item_copy_string) and assigning the
value of the expression to it. This effectively means that the result
type of the expression can be changed from whatever it was to a string.
As this substitution takes place after the compile-time result type
calculation for IN but before the run-time type calculations,
it causes the type calculations in the IN function done at run time
to get unexpected results different from what was prepared at compile time.
In the CASE ... WHEN ... THEN ... statement there was a similar problem
and it was solved by artificially adding a STRING argument to the matrix
at compile time, so if any of the arguments of the CASE function changes
its type to a string it will still be covered by the information prepared
at compile time.
Extended the CASE fix for cover the IN case.
An alternative way of fixing this problem is by caching the result type of
the arguments at compile time and using the cached information at run time
instead of re-calculating the result types.
Preferred the CASE approach for uniformity and fix localization.
Execution of queries containing the CASE function of
aggregate function like in "SELECT ... CASE ARGV(...) WHEN ..."
crashed the server.
The CASE function caches pointers to concrete comparison
functions for an each pair of types of CASE-WHERE clause
parameters, i.e. for the "CASE INT_RESULT WHERE REAL_RESULT
THEN ... WHERE DECIMAL_RESULT ... END" function call it
caches comparisons for INT_RESULT with REAL_RESULT and
for INT_RESULT with DECIMAL_RESULT. Usually a result
type is known after a call to the fix_fields function,
however, the setup_copy_fields function call may
wrap aggregate items with Item_copy_string that has
STRING_RESULT result type, so setup_copy_fields may
change argument result types of the CASE function after
call to Item_func_case::fix_fields/fix_length_and_dec.
Then the Item_func_case::find_item function tries to
use comparison function for unexpected pair of the
STRING_RESULT and some other type - that caused
an assertion failure of server crash.
The Item_func_case::fix_length_and_dec function has
been modified to take into account possible STRING_RESULT
result type in the presence of aggregate arguments of
the CASE function.
and value-list
The server returns unexpected results if a right side of the
NOT IN clause consists of NULL value and some constants of
the same type, for example:
SELECT * FROM t WHERE NOT t.id IN (NULL, 1, 2)
may return 3, 4, 5 etc if a table contains these values.
The Item_func_in::val_int method has been modified:
unnecessary resets of an Item_func_case::has_null field
value has been moved outside of an argument comparison
loop. (Also unnecessary re-initialization of the null_value
field has been moved).
The `SELECT col FROM t WHERE col NOT IN (col, ...) GROUP BY col'
crashed in the range optimizer.
The get_func_mm_tree function has been modified to check the
Item_func_in::array field for the NULL value before using of that
value.
Problem: we may get unexpected results comparing [u]longlong values as doubles.
Fix: adjust the test to use integer comparators.
Note: it's not a real fix, we have to implement some new comparators
to completely solve the original problem (see my comment in the bug report).
of its argument happened to be a decimal expression returning
the NULL value.
The crash was due to the fact the function in_decimal::set did
not take into account that val_decimal() could return 0 if
the decimal expression had been evaluated to NULL.
Several problems here :
1. The conversion to double of an hex string const item
was not taking into account the unsigned flag.
2. IN was not behaving in the same was way as comparisons
when performed over an INT/DATE/DATETIME/TIMESTAMP column
and a constant. The ordinary comparisons in that case
convert the constant to an INTEGER value and do int
comparisons. Fixed the IN to do the same.
3. IN is not taking into account the unsigned flag when
calculating <expr> IN (<int_const1>, <int_const2>, ...).
Extended the implementation of IN to store and process
the unsigned flag for its arguments.
When checking if an IN predicate can be evaluated using a key
the optimizer makes sure that all the arguments of IN are of
the same result type. To assure that it check whether
Item_func_in::array is filled in.
However Item_func_in::array is set if the types are
the same AND all the arguments are compile time constants.
Fixed by introducing Item_func_in::arg_types_compatible
flag to allow correct checking of the desired condition.
The optimizer needs to evaluate whether predicates are better
evaluated using an index. IN is one such predicate.
To qualify an IN predicate must involve a field of the index
on the left and constant arguments on the right.
However whether an expression is a constant can be determined only
by knowing the preceding tables in the join order.
Assuming that only IN predicates with expressions on the right that
are constant for the whole query qualify limits the scope of
possible optimizations of the IN predicate (more specifically it
doesn't allow the "Range checked for each record" optimization for
such an IN predicate.
Fixed by not pre-determining the optimizability of the IN predicate
in the case when all right IN operands are not SQL constant expressions
The problem was that some functions (namely IN() starting with 4.1, and
CHAR() starting with 5.0) were returning NULL in certain conditions,
while they didn't set their maybe_null flag. Because of that there could
be some problems with 'IS NULL' check, and statements that depend on the
function value domain, like CREATE TABLE t1 SELECT 1 IN (2, NULL);.
The fix is to set maybe_null correctly.
result
The IN function aggregates result types of all expressions. It uses that
type in comparison of left expression and expressions in right part.
This approach works in most cases. But let's consider the case when the
right part contains both strings and integers. In that case this approach may
cause wrong results because all strings which do not start with a digit are
evaluated as 0.
CASE uses the same approach when a CASE expression is given thus it's also
affected.
The idea behind this fix is to make IN function to compare expressions with
different result types differently. For example a string in the left
part will be compared as string with strings specified in right part and
will be converted to real for comparison to int or real items in the right
part.
A new function called collect_cmp_types() is added. It collects different
result types for comparison of first item in the provided list with each
other item in the list.
The Item_func_in class now can refer up to 5 cmp_item objects: 1 for each
result type for comparison purposes. cmp_item objects are allocated according
to found result types. The comparison of the left expression with any
right part expression is now based only on result types of these expressions.
The Item_func_case class is modified in the similar way when a CASE
expression is specified. Now it can allocate up to 5 cmp_item objects
to compare CASE expression with WHEN expressions of different types.
The comparison of the CASE expression with any WHEN expression now based only
on result types of these expressions.
The IN() function uses agg_cmp_type() to aggregate all types of its arguments
to find out some common type for comparisons. In this particular case the
char() and the int was aggregated to double because char() can contain values
like '1.5'. But all strings which do not start from a digit are converted to
0. thus 'a' and 'z' become equal.
This behaviour is reasonable when all function arguments are constants. But
when there is a field or an expression this can lead to false comparisons. In
this case it makes more sense to coerce constants to the type of the field
argument.
The agg_cmp_type() function now aggregates types of constant and non-constant
items separately. If some non-constant items will be found then their
aggregated type will be returned. Thus after the aggregation constants will be
coerced to the aggregated type.
- When manually constructing a SEL_TREE for "t.key NOT IN(...)", take into account that
get_mm_parts may return a tree with type SEL_TREE::IMPOSSIBLE
- Added missing OOM checks
- Added comments
too much memory. Instead, either create the equvalent SEL_TREE manually, or create only two ranges that
strictly include the area to scan
(Note: just to re-iterate: increasing NOT_IN_IGNORE_THRESHOLD will make optimization run slower for big
IN-lists, but the server will not run out of memory. O(N^2) memory use has been eliminated)
new file
mysql_fix_privilege_tables.sql, mysql_create_system_tables.sh:
Adding true BINARY/VARBINARY: fixing "password" type, not to be 0x00-padding.
Many files:
Adding true BINARY/VARBINARY: fixing tests not to output 0x00 bytes.
Adding true BINARY/VARBINARY: new pad_char structure member.
ctype-bin.c:
Adding true BINARY/VARBINARY: new pad_char structure member.
New strnxfrm, with two trailing length bytes.
field.cc:
Adding true BINARY/VARBINARY.
Fixed bug #11885.
sql_select.cc:
Fixed bug #11885.
Predicates of the forms 'a IN (v)' 'a NOT IN (v)' now
is replaced by 'a=v' and 'a<>v' at the parsing stage.
sql_yacc.yy:
Fixed bug #11885.
Predicates of the forms 'a IN (v)' 'a NOT IN (v)' now
is replaced by 'a=v' and 'a<>v' at the parsing stage.
Bug#7834 Illegal mix of collations in IN operator
IN was the first function supporting
character set convertion.
agg_arg_charsets() was written afterwards,
which is more flexible.
Now IN just reuses this function.
Added a case for bug #6365.
item_cmpfunc.cc:
Fixed bug #6365 : Server crashed when list of values
in IN predicate contains NULL while the tested field is
of the character type and not of the default set;
e.g. when f in 'f IN (NULL,'aa') belongs to binary
character set, while the default character set is latin1.
Added more DBUG statements
Ensure that we are comparing end space with BINARY strings
Use 'any_db' instead of '' to mean any database. (For HANDLER command)
Only strip ' ' when comparing CHAR, not other space-like characters (like \t)