Search conditions were evaluated using val_int(), which was wrong.
Fixing the code to use val_bool() instead.
Details:
- Adding a new item_base_t::IS_COND flag which marks Items used
as <search condition> in WHERE, HAVING, JOIN ON, CASE WHEN clauses.
The flag is at the parse time.
These expressions must be evaluated using val_bool() rather than val_int().
Note, the optimizer creates more Items which are used as search conditions.
Most of these items are not marked with IS_COND yet. This is OK for now,
but eventually these Items can also be fixed to have the flag.
- Adding a method Item::is_cond() which tests if the Item has the IS_COND flag.
- Implementing Item_cache_bool. It evaluates the cached expression using
val_bool() rather than val_int().
Overriding Type_handler_bool::Item_get_cache() to create Item_cache_bool.
- Implementing Item::save_bool_in_field(). It uses val_bool() rather than
val_int() to evaluate the expression.
- Implementing Type_handler_bool::Item_save_in_field()
using Item::save_bool_in_field().
- Fixing all Item_bool_func descendants to implement a virtual val_bool()
rather than a virtual val_int().
- To find places where val_int() should be fixed to val_bool(), a few
DBUG_ASSERT(!is_cond()) where added into val_int() implementations
of selected (most frequent) classes:
Item_field
Item_str_func
Item_datefunc
Item_timefunc
Item_datetimefunc
Item_cache_bool
Item_bool_func
Item_func_hybrid_field_type
Item_basic_constant descendants
- Fixing all places where DBUG_ASSERT() happened during an "mtr" run
to use val_bool() instead of val_int().
The `Item` class methods `get_copy()`, `build_clone()`, and `clone_item()`
face an issue where they may be defined in a descendant class
(e.g., `Item_func`) but not in a further descendant (e.g., `Item_func_child`).
This can lead to scenarios where `build_clone()`, when operating on an
instance of `Item_func_child` with a pointer to the base class (`Item`),
returns an instance of `Item_func` instead of `Item_func_child`.
Since this limitation cannot be resolved at compile time, this commit
introduces runtime type checks for the copy/clone operations.
A debug assertion will now trigger in case of a type mismatch.
`get_copy()`, `build_clone()`, and `clone_item()` are no more virtual,
but virtual `do_get_copy()`, `do_build_clone()`, and `do_clone_item()`
are added to the protected section of the class `Item`.
Additionally, const qualifiers have been added to certain methods
to enhance code reliability.
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
Functions extracting non-negative datetime components:
- YEAR(dt), EXTRACT(YEAR FROM dt)
- QUARTER(td), EXTRACT(QUARTER FROM dt)
- MONTH(dt), EXTRACT(MONTH FROM dt)
- WEEK(dt), EXTRACT(WEEK FROM dt)
- HOUR(dt),
- MINUTE(dt),
- SECOND(dt),
- MICROSECOND(dt),
- DAYOFYEAR(dt)
- EXTRACT(YEAR_MONTH FROM dt)
did not set their max_length properly, so in the DECIMAL
context they created a too small DECIMAL column, which
led to the 'Out of range value' error.
The problem is that most of these functions historically
returned the signed INT data type.
There were two simple ways to fix these functions:
1. Add +1 to max_length.
But this would also change their size in the string context
and create too long VARCHAR columns, with +1 excessive size.
2. Preserve max_length, but change the data type from INT to INT UNSIGNED.
But this would break backward compatibility.
Also, using UNSIGNED is generally not desirable,
it's better to stay with signed when possible.
This fix implements another solution, which it makes all these functions
work well in all contexts: int, decimal, string.
Fix details:
- Adding a new special class Type_handler_long_ge0 - the data type
handler for expressions which:
* should look like normal signed INT
* but which known not to return negative values
Expressions handled by Type_handler_long_ge0 store in Item::max_length
only the number of digits, without adding +1 for the sign.
- Fixing Item_extract to use Type_handler_long_ge0
for non-negative datetime components:
YEAR, YEAR_MONTH, QUARTER, MONTH, WEEK
- Adding a new abstract class Item_long_ge0_func, for functions
returning non-negative datetime components.
Item_long_ge0_func uses Type_handler_long_ge0 as the type handler.
The class hierarchy now looks as follows:
Item_long_ge0_func
Item_long_func_date_field
Item_func_to_days
Item_func_dayofmonth
Item_func_dayofyear
Item_func_quarter
Item_func_year
Item_long_func_time_field
Item_func_hour
Item_func_minute
Item_func_second
Item_func_microsecond
- Cleanup: EXTRACT(QUARTER FROM dt) created an excessive VARCHAR column
in string context. Changing its length from 2 to 1.
Item_func_tochar::check_arguments() didn't check if its arguments
each had one column. Failing to make this check and proceeding would
eventually cause either an assertion failure or the execution would
reach "MY_ASSERT_UNREACHABLE();" which would produce a crash with
a misleading stack trace.
* Fixed Item_func_tochar::check_arguments() to do the required check.
* Also fixed MY_ASSERT_UNREACHABLE() to terminate the program. Just
"executing" __builtin_unreachable() used to cause "undefined results",
which in my experience was a crash with corrupted stack trace.
Rewrite datetime comparison conditions into sargeable. For example,
YEAR(col) <= val -> col <= YEAR_END(val)
YEAR(col) < val -> col < YEAR_START(val)
YEAR(col) >= val -> col >= YEAR_START(val)
YEAR(col) > val -> col > YEAR_END(val)
YEAR(col) = val -> col BETWEEN YEAR_START(val) AND YEAR_END(val)
Do the same with DATE(col), for example:
DATE(col) <= val -> col <= DAY_END(val)
After such a rewrite index lookup on column "col" can be employed
TO_CHAR(expr, fmt)
- expr: required parameter, data/time/timestamp type expression
- fmt: optional parameter, format string, supports
YYYY/YYY/YY/RRRR/RR/MM/MON/MONTH/MI/DD/DY/HH/HH12/HH24/SS and special
characters. The default value is "YYYY-MM-DD HH24:MI:SS"
In Oracle, TO_CHAR() can also be used to convert numbers to strings, but
this is not supported. This will gave an error in this patch.
Other things:
- If format strings is a constant, it's evaluated only once and if there
is any errors in it, they are given at once and the statement will abort.
Original author: woqutech
Lots of optimizations and cleanups done as part of review
Changes:
- To detect automatic strlen() I removed the methods in String that
uses 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usage of append(const char*) is changed to either use
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most usage of the above functions never tested the
result for -1 and would have given wrong results or crashes in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- Changed for some functions the name argument from const char * to
to const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removes String_space() as it was only used in one cases and that
could be simplified to not use String_space(), thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Rename of protocol function:
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done to both clarify the difference between this 'store' function
and also to make it easier to find unoptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now used LEX_CSTRING for name.
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtain but was not used.
- Removed some not needed 'virtual' definition for functions that was
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate needed buffer. Old code could
case memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
This was done to simplify copying of with_* flags
Other things:
- Changed Flags to C++ enums, which enables gdb to print
out bit values for the flags. This also enables compiler
errors if one tries to manipulate a non existing bit in
a variable.
- Added set_maybe_null() as a shortcut as setting the
MAYBE_NULL flags was used in a LOT of places.
- Renamed PARAM flag to SP_VAR to ensure it's not confused with persistent
statement parameters.
One should instead use Item::fixed() and Item::with_subquery()
Removed Item::is_fixed() and has_subquery() and did the following replace:
replace is_fixed() fixed() -- *.*
replace 'has_subquery()' 'with_subquery()' -- *.*
The reason for the change is that neither clang or gcc can do efficient
code when several bit fields are change at the same time or when copying
one or more bits between identical bit fields.
Updated bits explicitely with & and | is MUCH more efficient than what
current compilers can do.
- Changed order of class fields to remove dead alignment space.
- Changed bool fields in Item to bit fields.
- Used packed enum's for some fields in common classes
- Removed not used Item::rsize.
- Changed some class variables from uint/int to smaller type int's.
- Ensured that field_index is uint16 in all classes and functions. Fixed
also that we proparly compare with NO_CACHED_FIELD_INDEX when checking
if variable is not set.
- Removed checking of highest bit of unireg_check (has not been used in
a long time)
- Fixed wrong arguments to make_cond_for_table() for join_tab_idx_arg
from false to 0.
One of the result was reducing the size if class Item with ~24 bytes
Problem:
When calculatung MIN() and MAX() in a query with GROUP BY, like this:
SELECT MIN(time_expr), MAX(time_expr) FROM t1 GROUP BY i;
the code in Item_sum_min_max::update_field() erroneosly used
string format comparison, therefore '100:20:30' was considered as
smaller than '10:20:30'.
Fix:
1. Implementing low level "native" related methods in class Time:
Time::Time(const Native &native) - convert native to Time
Time::to_native(Native *to, uint decimals) - convert Time to native
The "native" binary representation for TIME is equal to
the binary data format of Field_timef, which is used to
store TIME when mysql56_temporal_format is ON (default).
2. Implementing Type_handler_time_common "native" related methods:
Type_handler_time_common::cmp_native()
Type_handler_time_common::Item_val_native_with_conversion()
Type_handler_time_common::Item_val_native_with_conversion_result()
Type_handler_time_common::Item_param_val_native()
3. Implementing missing "native representation" related methods
in Field_time and Field_timef:
Field_time::store_native()
Field_time::val_native()
Field_timef::store_native()
Field_timef::val_native()
4. Implementing missing "native" related methods in all Items
that can have the TIME data type:
Item_timefunc::val_native()
Item_name_const::val_native()
Item_time_literal::val_native()
Item_cache_time::val_native()
Item_handled_func::val_native()
5. Marking Type_handler_time_common as "native ready".
So now Item_sum_min_max::update_field() calculates
values using min_max_update_native_field(),
which uses native binary representation rather than string representation.
Before this change, only the TIMESTAMP data type used native
representation to calculate MIN() and MAX().
Benchmarks (see more details in MDEV):
This change not only fixes the wrong result, but also
makes a "SELECT .. MAX.. GROUP BY .." query faster:
# TIME(0)
CREATE TABLE t1 (id INT, time_col TIME) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.159 sec
10.3: 0.108 sec
10.4: 0.094 sec (fixed)
# TIME(6):
CREATE TABLE t1 (id INT, time_col TIME(6)) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10.999999'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
My80: 0.154
10.3: 0.135
10.4: 0.093 (fixed)