Base Tables
The type inferrence of a view column caused the result to be
interpreted as the wrong type: DATE colums were interpreted
as TIME and TIME as DATETIME. This happened because view
columns are represented by Item_ref objects as opposed to
Item_field's. Item_ref had no method for retrieving a TIME
value and thus was forced to depend on the default
implementation for any expression, which caused the
expression to be evaluated as a string and then parsed into
a TIME/DATETIME value.
Fixed by letting Item_ref classes forward the request for a
TIME value to the referred Item - which is a field in this
case - this reads the TIME value directly without
conversion.
SunStudio
SunStudio compilers of late warn about methods that might hide
methods in base classes due to the use of overloading combined
with overriding. SunStudio also warns about variables defined
in local socpe or method arguments that have the same name as
a member attribute of the class.
This patch renames methods that might hide base class methods,
to make it easier both for humans and compilers to see what is
actually called. It also renames variables in local scope.
The problem is that Item_direct_view_ref which is inherited
from Item_ident updates orig_table_name and table_name with
the same values. The fix is introduction of new constructor
into Item_ident and up which updates orig_table_name and
table_name separately.
Grouping by a subquery in a query with a distinct aggregate
function lead to a wrong result (wrong and unordered
grouping values).
There are two related problems:
1) The query like this:
SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
FROM t1 GROUP BY aa
returned wrong result, because the outer reference "t1.a"
in the subquery was substituted with the Item_ref item.
The Item_ref item obtains data from the result_field object
that refreshes once after the end of each group. This data
is not applicable to filesort since filesort() doesn't care
about groups (and doesn't update result_field objects with
copy_fields() and so on). Also that data is not applicable
to group separation algorithm: end_send_group() checks every
record with test_if_group_changed() that evaluates Item_ref
items, but it refreshes those Item_ref-s only after the end
of group, that is a vicious circle and the grouped column
values in the output are shifted.
Fix: if
a) we grouping by a subquery and
b) that subquery has outer references to FROM list
of the grouping query,
then we substitute these outer references with
Item_direct_ref like references under aggregate
functions: Item_direct_ref obtains data directly
from the current record.
2) The query with a non-trivial grouping expression like:
SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
FROM t1 GROUP BY aa+0
also returned wrong result, since JOIN::exec() substitutes
references to top-level aliases in SELECT list with Item_copy
caching items. Item_copy items have same refreshing policy
as Item_ref items, so the whole groping expression with
Item_copy inside returns wrong result in filesort() and
end_send_group().
Fix: include aliased items into GROUP BY item tree instead
of Item_ref references to them.
MySQL handles the join syntax "JOIN ... USING( field1,
... )" and natural joins by building the same parse tree as
a corresponding join with an "ON t1.field1 = t2.field1 ..."
expression would produce. This parse tree was not cleaned up
properly in the following scenario. If a thread tries to
lock some tables and finds that the tables were dropped and
re-created while waiting for the lock, it cleans up column
references in the statement by means a per-statement free
list. But if the statement was part of a stored procedure,
column references on the stored procedure's free list weren't
cleaned up and thus contained pointers to freed objects.
Fixed by adding a call to clean up the current prepared
statement's free list.
Several problems fixed :
1. Non constant expressions in UNION ... ORDER BY were not correctly cleaned up
in st_select_lex_unit::cleanup() causing crashes in EXPLAIN EXTENDED because of
fields quoted by these expressions pointing to the already freed temporary table
used to calculate the UNION.
Fixed by correctly cleaning up expressions of any depth.
2. Subqueries in the order by part of UNION ... ORDER BY ... caused a crash in
EXPLAIN EXTENDED because of a transformation attempt made during EXPLAIN EXTENDED
execution. Fixed by not doing the transformation when in EXPLAIN.
3. Fulltext functions caused crash when in the ORDER BY part of an un-parenthesized
UNION that gets "promoted" to be valid for the whole union, e.g.
SELECT * FROM t1 UNION SELECT * FROM t2 ORDER BY MATCHES (a) AGAINST ('abc' IN BOOLEAN MODE).
This is a case that demonstrates a more general problem of parts of the query being
moved to another level. When doing such transformation late in the optimization run
when most of the flags about the contents of the query are already aggregated it's possible
to "split" the flags so that they correctly reflect the new queries after the transformation.
In specific the ST_SELECT_LEX::ftfunc_list is holding all the free text function for all the
parts of the second SELECT in the UNION and we don't know what part of that is in the ORDER BY
that we're to move to the UNION level and what part is about the other parts of the second SELECT.
Fixed by throwing and error when such statements are about to be processed by adding a check
for the presence of MATCH() inside the ORDER BY clause that's going to get promoted to UNION.
To workaround this new limitation one must parenthesize the UNION SELECTs and provide a real
global ORDER BY for the UNION outside of the parenthesis.
timestamp primary key
Since TIMESTAMP values are adjusted by the current time zone
settings in both numeric and string contexts, using any
expressions involving TIMESTAMP values as a
(sub)partitioning function leads to undeterministic behavior of
partitioned tables. The effect may vary depending on a storage
engine, it can be either incorrect data being retrieved or
stored, or an assertion failure. The root cause of this is the
fact that the calculated partition ID may differ from a
previously calculated ID for the same data due to timezone
adjustments of the partitioning expression value.
Fixed by disabling any expressions involving TIMESTAMP values
to be used in partitioning functions with the follwing two
exceptions:
1. Creating or altering into a partitioned table that violates
the above rule is not allowed, but opening existing such tables
results in a warning rather than an error so that such tables
could be fixed.
2. UNIX_TIMESTAMP() is the only way to get a
timezone-independent value from a TIMESTAMP column, because it
returns the internal representation (a time_t value) of a
TIMESTAMP argument verbatim. So UNIX_TIMESTAMP(timestamp_column)
is allowed and should be used to fix existing tables if one
wants to use TIMESTAMP columns with partitioning.
MySQL manual describes values of the YEAR(2) field type as follows:
values 00 - 69 mean 2000 - 2069 years and values 70 - 99 mean 1970 - 1999
years. MIN/MAX and comparison functions was comparing them as int values
thus producing wrong result.
Now the Arg_comparator class is extended with compare_year function which
performs correct comparison of the YEAR type.
The Item_sum_hybrid class now uses Item_cache and Arg_comparator objects to
correctly calculate its value.
To allow Arg_comparator to use func_name() function for Item_func and Item_sum
objects the func_name declaration is moved to the Item_result_field class.
A helper function is_owner_equal_func is added to the Arg_comparator class.
It checks whether the Arg_comparator object owner is the <=> function or not.
A helper function setup is added to the Item_sum_hybrid class. It sets up
cache item and comparator.
When values of different types are compared they're converted to a type that
allows correct comparison. This conversion is done for each comparison and
takes some time. When a constant is being compared it's possible to cache the
value after conversion to speedup comparison. In some cases (large dataset,
complex WHERE condition with many type conversions) query might be executed
7% faster.
A test case isn't provided because all changes are internal and isn't visible
outside.
The behavior of the Item_cache is changed to cache values on the first request
of cached value rather than at the moment of storing item to be cached.
A flag named value_cached is added to the Item_cache class. It's set to TRUE
when cache holds the value of the last stored item.
Function named cache_value() is added to the Item_cache class and derived classes.
This function actually caches the value of the saved item.
Item_cache_xxx::store functions now only store item to be cached and set
value_cached flag to FALSE.
Item_cache_xxx::val_xxx functions are changed to call cache_value function
prior to returning cached value if value_cached is FALSE.
The Arg_comparator::set_cmp_func function now calls cache_converted_constant
to cache constants if they need a type conversion.
The Item_cache::get_cache function is overloaded to allow setting of the
cache type.
The cache_converted_constant function is added to the Arg_comparator class.
It checks whether a value can and should be cached and if so caches it.
When a query was using a DATE or DATETIME value formatted
using any other separator characters beside hyphen '-', a
query with a greater-or-equal '>=' condition matching only
the greatest value in an indexed column, the result was
empty if index range scan was employed.
The range optimizer got a new feature between 5.1.38 and
5.1.39 that changes a greater-or-equal condition to a
greater-than if the value matching that in the query was not
present in the table. But the value comparison function
compared the dates as strings instead of dates.
The bug was fixed by splitting the function
get_date_from_str in two: One part that parses and does
error checking. This function is now visible outside the
module. The old get_date_from_str now calls the new
function.
Problem was that the partition containing NULL values
was pruned away, since '2001-01-01' < '2001-02-00' but
TO_DAYS('2001-02-00') is NULL.
Added the NULL partition for RANGE/LIST partitioning on TO_DAYS()
function to be scanned too.
Also fixed a bug that added ALLOW_INVALID_DATES to sql_mode
(SELECT * FROM t WHERE date_col < '1999-99-99' on a RANGE/LIST
partitioned table would add it).
There were a problem since pruning uses the field
for comparison (while evaluate_join_record uses longlong),
resulting in pruning failures when comparing DATE to DATETIME.
Fix was to always comparing DATE vs DATETIME as DATETIME,
by adding ' 00:00:00' to the DATE string.
And adding optimization for comparing with 23:59:59, so that
DATETIME_col > '2001-02-03 23:59:59' ->
TO_DAYS(DATETIME_col) > TO_DAYS('2001-02-03 23:59:59') instead
of '>='.
The problem was that creating a DECIMAL column from a decimal
value could lead to a failed assertion as decimal values can
have a higher precision than those attached to a table. The
assert could be triggered by creating a table from a decimal
with a large (> 30) scale. Also, there was a problem in
calculating the number of digits in the integral and fractional
parts if both exceeded the maximum number of digits permitted
by the new decimal type.
The solution is to ensure that truncation procedure is executed
when deducing a DECIMAL column from a decimal value of higher
precision. If the integer part is equal to or bigger than the
maximum precision for the DECIMAL type (65), the integer part
is truncated to fit and the fractional becomes zero. Otherwise,
the fractional part is truncated to fit into the space left
after the integer part is copied.
This patch borrows code and ideas from Martin Hansson's patch.
HAVING
When calculating GROUP BY the server caches some expressions. It does
that by allocating a string slot (Item_copy_string) and assigning the
value of the expression to it. This effectively means that the result
type of the expression can be changed from whatever it was to a string.
As this substitution takes place after the compile-time result type
calculation for IN but before the run-time type calculations,
it causes the type calculations in the IN function done at run time
to get unexpected results different from what was prepared at compile time.
In the CASE ... WHEN ... THEN ... statement there was a similar problem
and it was solved by artificially adding a STRING argument to the set of
types of the IN/CASE arguments at compile time, so if any of the
arguments of the CASE function changes its type to a string it will
still be covered by the information prepared at compile time.
In case of ROW item each compared pair does not
check if argumet collations can be aggregated and
thus appropiriate item conversion does not happen.
The fix is to add the check and convertion for ROW
pairs.
- Remove bothersome warning messages. This change focuses on the warnings
that are covered by the ignore file: support-files/compiler_warnings.supp.
- Strings are guaranteed to be max uint in length
IS NULL was not checking the correct row in a HAVING context.
At the first row of a new group (where the HAVING clause is evaluated)
the column and SELECT list references in the HAVING clause should
refer to the last row of the previous group and not to the current one.
This was not done for IS NULL, because it was using Item::is_null() doesn't
have a Item_is_null_result() counterpart to access the data from the
last row of the previous group. Note that all the Item::val_xxx() functions
(e.g. Item::val_int()) have their _result counterparts (e.g. Item::val_int_result()).
Fixed by implementing a is_null_result() (similarly to int_result()) and
calling this instead of is_null() column and SELECT list references inside
the HAVING clause.
The code to get read the value of a system variable was extracting its value
on PREPARE stage and was substituting the value (as a constant) into the parse tree.
Note that this must be a reversible transformation, i.e. it must be reversed before
each re-execution.
Unfortunately this cannot be reliably done using the current code, because there are
other non-reversible source tree transformations that can interfere with this
reversible transformation.
Fixed by not resolving the value at PREPARE, but at EXECUTE (as the rest of the
functions operate). Added a cache of the value (so that it's constant throughout
the execution of the query). Note that the cache also caches NULL values.
Updated an obsolete related test suite (variables-big) and the code to test the
result type of system variables (as per bug 74).
The optimizer pulls up aggregate functions which should be aggregated in
an outer select. At some point it may substitute such a function for a field
in the temporary table. The setup_copy_fields function doesn't take this
into account and may overrun the copy_field buffer.
Fixed by filtering out the fields referenced through the specialized
reference for aggregates (Item_aggregate_ref).
Added an assertion to make sure bugs that cause similar discrepancy
don't go undetected.
Length value is the length of the field,
Max_length is the length of the field value.
So Max_length can not be more than Length.
The fix: fixed calculation of the Item_empty_string item length
(Patch applied and queued on demand of Trudy/Davi.)
This fix is for 5.0 only : back porting the 6.0 patch manually
The parser code in sql/sql_yacc.yy needs to be more robust to out of
memory conditions, so that when parsing a query fails due to OOM,
the thread gracefully returns an error.
Before this fix, a new/alloc returning NULL could:
- cause a crash, if dereferencing the NULL pointer,
- produce a corrupted parsed tree, containing NULL nodes,
- alter the semantic of a query, by silently dropping token values or nodes
With this fix:
- C++ constructors are *not* executed with a NULL "this" pointer
when operator new fails.
This is achieved by declaring "operator new" with a "throw ()" clause,
so that a failed new gracefully returns NULL on OOM conditions.
- calls to new/alloc are tested for a NULL result,
- The thread diagnostic area is set to an error status when OOM occurs.
This ensures that a request failing in the server properly returns an
ER_OUT_OF_RESOURCES error to the client.
- OOM conditions cause the parser to stop immediately (MYSQL_YYABORT).
This prevents causing further crashes when using a partially built parsed
tree in further rules in the parser.
No test scripts are provided, since automating OOM failures is not
instrumented in the server.
Tested under the debugger, to verify that an error in alloc_root cause the
thread to returns gracefully all the way to the client application, with
an ER_OUT_OF_RESOURCES error.
WL#4165 Prepared statements: validation
WL#4166 Prepared statements: automatic re-prepare
Fixes
Bug#27430 Crash in subquery code when in PS and table DDL changed after PREPARE
Bug#27690 Re-execution of prepared statement after table was replaced with a view crashes
Bug#27420 A combination of PS and view operations cause error + assertion on shutdown
The basic idea of the patch is to keep track of table metadata between
prepared statement prepare and execute. If some table used in the statement
has changed, the prepared statement is re-prepared before execution.
See WL#4165 and WL#4166 contents and comments in the code for details
of the implementation.
Assertion `0' failed
If ROW item is a part of an expression that also has
aggregate function calls (COUNT/SUM/AVG...), a
"splitting" with an Item::split_sum_func2 function
is applied to that ROW item.
Current implementation of Item::split_sum_func2
replaces this Item_row with a newly created
Item_aggregate_ref reference to it.
Then the row cache tries to work with the
Item_aggregate_ref object as with the Item_row object:
row cache calls row-emulation methods such as cols and
element_index. Item_aggregate_ref (like it's parent
Item_ref) inherits dummy implementations of those
methods from the hierarchy root Item, and call to
them leads to failed assertions and wrong data
output.
Row-emulation virtual functions (cols, element_index, addr,
check_cols, null_inside and bring_value) of Item_ref have
been overloaded to forward calls to an underlying item
reference.
The problem is that passing anything other than a integer to a limit
clause in a prepared statement would fail. This limitation was introduced
to avoid replication problems (e.g: replicating the statement with a
string argument would cause a parse failure in the slave).
The solution is to convert arguments to the limit clause to a integer
value and use this converted value when persisting the query to the log.
NAME_CONST('whatever', -1) * MAX(whatever) bombed since -1 was
not seen as constant, but as FUNCTION_UNARY_MINUS(constant)
while we are at the same time pretending it was a basic const
item. This confused the aggregate handlers in exciting ways.
We now make NAME_CONST() behave more consistently.
between 5.0 and 5.1.
The problem was that in the patch for Bug#11986 it was decided
to store original query in UTF8 encoding for the INFORMATION_SCHEMA.
This approach however turned out to be quite difficult to implement
properly. The main problem is to preserve the same IS-output after
dump/restore.
So, the fix is to rollback to the previous functionality, but also
to fix it to support multi-character-set-queries properly. The idea
is to generate INFORMATION_SCHEMA-query from the item-tree after
parsing view declaration. The IS-query should:
- be completely in UTF8;
- not contain character set introducers.
For more information, see WL4052.