Problem:
When calculatung MIN() and MAX() in a query with GROUP BY, like this:
SELECT MIN(time_expr), MAX(time_expr) FROM t1 GROUP BY i;
the code in Item_sum_min_max::update_field() erroneosly used
string format comparison, therefore '100:20:30' was considered as
smaller than '10:20:30'.
Fix:
1. Implementing low level "native" related methods in class Time:
Time::Time(const Native &native) - convert native to Time
Time::to_native(Native *to, uint decimals) - convert Time to native
The "native" binary representation for TIME is equal to
the binary data format of Field_timef, which is used to
store TIME when mysql56_temporal_format is ON (default).
2. Implementing Type_handler_time_common "native" related methods:
Type_handler_time_common::cmp_native()
Type_handler_time_common::Item_val_native_with_conversion()
Type_handler_time_common::Item_val_native_with_conversion_result()
Type_handler_time_common::Item_param_val_native()
3. Implementing missing "native representation" related methods
in Field_time and Field_timef:
Field_time::store_native()
Field_time::val_native()
Field_timef::store_native()
Field_timef::val_native()
4. Implementing missing "native" related methods in all Items
that can have the TIME data type:
Item_timefunc::val_native()
Item_name_const::val_native()
Item_time_literal::val_native()
Item_cache_time::val_native()
Item_handled_func::val_native()
5. Marking Type_handler_time_common as "native ready".
So now Item_sum_min_max::update_field() calculates
values using min_max_update_native_field(),
which uses native binary representation rather than string representation.
Before this change, only the TIMESTAMP data type used native
representation to calculate MIN() and MAX().
Benchmarks (see more details in MDEV):
This change not only fixes the wrong result, but also
makes a "SELECT .. MAX.. GROUP BY .." query faster:
# TIME(0)
CREATE TABLE t1 (id INT, time_col TIME) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.159 sec
10.3: 0.108 sec
10.4: 0.094 sec (fixed)
# TIME(6):
CREATE TABLE t1 (id INT, time_col TIME(6)) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10.999999'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
My80: 0.154
10.3: 0.135
10.4: 0.093 (fixed)
MDEV-17625 Different warnings when comparing a garbage to DATETIME vs TIME
- Splitting processes of data type conversion (to TIME/DATE,DATETIME)
and warning generation.
Warning are now only get collected during conversion (in an "int" variable),
and are pushed in the very end of conversion (not in parallel).
Warnings generated by the low level routines str_to_xxx() and number_to_xxx()
can now be changed at the end, when TIME_FUZZY_DATES is applied,
from "Invalid value" to "Truncated invalid value".
Now "Illegal value" is issued only when the low level routine returned
an error and TIME_FUZZY_DATES was not set. Otherwise, if the low level
routine returned "false" (success), or if NULL was converted to a zero
datetime by TIME_FUZZY_DATES, then "Truncated illegal value"
is issued. This gives better warnings.
- Methods Type_handler::Item_get_date() and
Type_handler::Item_func_hybrid_field_type_get_date() now only
convert and collect warning information, but do not push warnings.
- Changing the return data type for Type_handler::Item_get_date()
and Type_handler::Item_func_hybrid_field_type_get_date() from
"bool" to "void". The conversion result (success vs error) can be
checked by testing ltime->time_type. MYSQL_TIME_{NONE|ERROR}
mean mean error, other values mean success.
- Adding new wrapper methods Type_handler::Item_get_date_with_warn() and
Type_handler::Item_func_hybrid_field_type_get_date_with_warn()
to do conversion followed by raising warnings, and changing
the code to call new Type_handler::***_with_warn() methods.
- Adding a helper class Temporal::Status, a wrapper
for MYSQL_TIME_STATUS with automatic initialization.
- Adding a helper class Temporal::Warn, to collect warnings
but without actually raising them. Moving a part of ErrConv
into a separate class ErrBuff, and deriving both Temporal::Warn
and ErrConv from ErrBuff. The ErrBuff part of Temporal::Warn
is used to collect textual representation of the input data.
- Adding a helper class Temporal::Warn_push. It's used
to collect warning information during conversion, and
automatically pushes warnings to the diagnostics area
on its destructor time (in case of non-zero warning).
- Moving more code from various functions inside class Temporal.
- Adding more Temporal_hybrid constructors and
protected Temporal methods make_from_xxx(),
which convert and only collect warning information, but do not
actually raise warnings.
- Now the low level functions str_to_datetime() and str_to_time()
always set status->warning if the return value is "true" (error).
- Now the low level functions number_to_time() and number_to_datetime()
set the "*was_cut" argument if the return value is "true" (error).
- Adding a few DBUG_ASSERTs to make sure that str_to_xxx() and
number_to_xxx() always set warnings on error.
- Adding new warning flags MYSQL_TIME_WARN_EDOM and MYSQL_TIME_WARN_ZERO_DATE
for the code symmetry. Before this change there was a special
code path for (rc==true && was_cut==0) which was treated by
Field_temporal::store_invalid_with_warning as "zero date violation".
Now was_cut==0 always means that there are no any error/warnings/notes
to be raised, not matter what rc is.
- Using new Temporal_hybrid constructors in combination with
Temporal::Warn_push inside str_to_datetime_with_warn(),
double_to_datetime_with_warn(), int_to_datetime_with_warn(),
Field::get_date(), Item::get_date_from_string(), and a few other places.
- Removing methods Dec_ptr::to_datetime_with_warn(),
Year::to_time_with_warn(), my_decimal::to_datetime_with_warn(),
Dec_ptr::to_datetime_with_warn().
Fixing Sec6::to_time() and Sec6::to_datetime() to
convert and only collect warnings, without raising warnings.
Now warning raising functionality resides in Temporal::Warn_push.
- Adding classes Longlong_hybrid_null and Double_null, to
return both value and the "IS NULL" flag. Adding methods
Item::to_double_null(), to_longlong_hybrid_null(),
Item_func_hybrid_field_type::to_longlong_hybrid_null_op(),
Item_func_hybrid_field_type::to_double_null_op().
Removing separate classes VInt and VInt_op, as they
have been replaced by a single class Longlong_hybrid_null.
- Adding a helper method Temporal::type_name_by_timestamp_type(),
moving a part of make_truncated_value_warning() into it,
and reusing in Temporal::Warn::push_conversion_warnings().
- Removing Item::make_zero_date() and
Item_func_hybrid_field_type::make_zero_mysql_time().
They provided duplicate functionality.
Now this code resides in Temporal::make_fuzzy_date().
The latter is now called for all Item types when data type
conversion (to DATE/TIME/DATETIME) is involved, including
Item_field and Item_direct_view_ref.
This fixes MDEV-17563: Item_direct_view_ref now correctly converts
NULL to a zero date when TIME_FUZZY_DATES says so.
Problems:
Functions LEAST() and GREATEST() in TIME context, as well as functions
TIMESTAMP(a,b) and ADDTIME(a,b), returned confusing results when the
input TIME-alike value in a number or in a string was out of the TIME
supported range.
In case of TIMESTAMP(a,b) and ADDTIME(a,b), the second argument
value could get extra unexpected digits. For example, in:
ADDTIME('2001-01-01 00:00:00', 10000000) or
ADDTIME('2001-01-01 00:00:00', '1000:00:00')
the second argument was converted to '838:59:59.999999'
with six fractional digits, which contradicted "decimals"
previously set to 0 in fix_length_and_dec().
These unexpected fractional digits led to confusing function results.
Changes:
1. GREATEST(), LEAST()
- fixing Item_func_min_max::get_time_native()
to respect "decimals" set by fix_length_and_dec().
If a value of some numeric or string time-alike argument
goes outside of the TIME range and gets limited to '838:59:59.999999',
it's now right-truncated to the correct fractional precision.
- fixing, Type_handler_temporal_result::Item_func_min_max_fix_attributes()
to take into account arguments' time_precision() or datetime_precision(),
rather than rely on "decimals" calculated by the generic implementation
in Type_handler::Item_func_min_max_fix_attributes(). This makes
GREATEST() and LEAST() return better data types, with the same
fractional precision with what TIMESTAMP(a,b) and ADDTIME(a,b) return
for the same arguments, and with DATE(a) and TIMESTAMP(a).
2. Item_func_add_time and Item_func_timestamp
It was semantically wrong to apply the limit of the TIME data type
to the argument "b", which plays the role of "INTERVAL DAY TO SECOND" here.
Changing the code to fetch the argument "b" as INTERVAL rather than as TIME.
The low level routine calc_time_diff() now gets the interval
value without limiting to '838:59:59.999999', so in these examples:
ADDTIME('2001-01-01 00:00:00', 10000000)
ADDTIME('2001-01-01 00:00:00', '1000:00:00')
calc_time_diff() gets '1000:00:00' as is. The SQL function result
now gets limited to the supported result data type range
(datetime or time) inside calc_time_diff(), which now calculates
the return value using the real fractional digits that
came directly from the arguments (without the effect of limiting
to the TIME range), so the result does not have any unexpected
fractional digits any more.
Detailed changes in TIMESTAMP() and ADDTIME():
- Adding a new class Interval_DDhhmmssff. It's similar to Time, but:
* does not try to parse datetime format, as it's not needed for
functions TIMESTAMP() and ADDTIME().
* does not cut values to '838:59:59.999999'
The maximum supported Interval_DDhhmmssff's hard limit is
'UINT_MAX32:59:59.999999'. The maximum used soft limit is:
- '87649415:59:59.999999' (in 'hh:mm:ss.ff' format)
- '3652058 23:59:59.999999' (in 'DD hh:mm:ss.ff' format)
which is a difference between:
- TIMESTAMP'0001-01-01 00:00:00' and
- TIMESTAMP'9999-12-31 23:59:59.999999'
(the minimum datetime that supports arithmetic, and the
maximum possible datetime value).
- Fixing get_date() methods in the classes related to functions
ADDTIME(a,b) and TIMESTAMP(a,b) to use the new class Interval_DDhhmmssff
for fetching data from the second argument, instead of get_date().
- Fixing fix_length_and_dec() methods in the classes related
to functions ADDTIME(a,b) and TIMESTAMP(a,b) to use
Interval_DDhhmmssff::fsp(item) instead of item->time_precision()
to get the fractional precision of the second argument correctly.
- Splitting the low level function str_to_time() into smaller pieces
to reuse the code. Adding a new function str_to_DDhhmmssff(), to
parse "INTERVAL DAY TO SECOND" values.
After these changes, functions TIMESTAMP() and ADDTIME()
return much more predictable results, in terms of fractional
digits, and in terms of the overall result.
The full ranges of DATETIME and TIME values are now covered by TIMESTAMP()
and ADDTIME(), so the following can now be calculated:
SELECT ADDTIME(TIMESTAMP'0001-01-01 00:00:00', '87649415:59:59.999999');
-> '9999-12-31 23:59:59.999999'
SELECT TIMESTAMP(DATE'0001-01-01', '87649415:59:59.999999')
-> '9999-12-31 23:59:59.999999'
SELECT ADDTIME(TIME'-838:59:59.999999', '1677:59:59.999998');
-> '838:59:59.999999'
- Implementing the task according to the MDEV description.
- Adding a helper class Sec6_add to share the code in type-specific
branches in Item_func_add_time::get_date().
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
TODO:
- Make get_thd_memroot() inline
- To do this, we need to reduce dependence of include files, especially
so that sql_class.h is not depending in item.h
Fixing the data type for the "fuzzydate" parameter to
Item_func_hybrid_field_type::date_op() from uint to ulonglong,
for consistency with Item::get_date().