The task "MDEV-25829 Change default Unicode collation to uca1400_ai_ci"
previously changed collation derivation for string user variables
from DERIVATION_EXPLICIT to DERIVATION_COERCIBLE, to resolve illegal
collation mix conflicts between table columns and user variables
when they have different collations.
However, DERIVATION_COERCIBLE was a wrong choice because it caused
conflicts between string literals and user variables when they have
different collations.
Adding a new collation derivation level DERIVATION_USERVAR.
This makes the collation of a user variable:
- weaker than a table column (like it was intended by MDEV-25829)
- but stronger than a literal (like it was in pre-MDEV-25829)
Cleanup in sql_type.h:
Removing the line "- BINARY(expr)" from the before-DERIVATION_CAST
comment, as it was on a wrong place. It's also listed on the correct
place before DERIVATION_IMPLICIT.
Changing the return type of the following functions:
- CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), NOW()
- SYSDATE()
- FROM_UNIXTIME()
from DATETIME to TIMESTAMP.
Note, the old function NOW() returning DATETIME is still available
as LOCALTIMESTAMP or LOCALTIMESTAMP(), e.g.:
SELECT
LOCALTIMESTAMP, -- DATETIME
CURRENT_TIMESTAMP; -- TIMESTAMP
The change in the functions return data type fixes some problems
that occurred near a DST change:
- Problem #1
INSERT INTO t1 (timestamp_field) VALUES (CURRENT_TIMESTAMP);
INSERT INTO t1 (timestamp_field) VALUES (COALESCE(CURRENT_TIMESTAMP));
could result into two different values inserted.
- Problem #2
INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526));
INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526+3600));
could result into two equal TIMESTAMP values near a DST change.
Additional changes:
- FROM_UNIXTIME(0) now returns SQL NULL instead of '1970-01-01 00:00:00'
(assuming time_zone='+00:00')
- UNIX_TIMESTAMP('1970-01-01 00:00:00') now returns SQL NULL instead of 0
(assuming time_zone='+00:00'
These additional changes are needed for consistency with TIMESTAMP fields,
which cannot store '1970-01-01 00:00:00 +00:00'
The code erroneously called sec_since_epoch() for dates with zeros,
e.g. '2024-00-01'.
Fixi: adding a test that the date does not have zeros before
calling TIME_to_native().
Changing the alias LOCALTIME->CURRENT_TIMESTAMP to LOCALTIME->CURRENT_TIME.
This changes the return type of LOCALTIME from DATETIME to TIME,
according to the SQL Standard.
This patch extends the timestamp from
2038-01-19 03:14:07.999999 to 2106-02-07 06:28:15.999999
for 64 bit hardware and OS where 'long' is 64 bits.
This is true for 64 bit Linux but not for Windows.
This is done by treating the 32 bit stored int as unsigned instead of
signed. This is safe as MariaDB has never accepted dates before the epoch
(1970).
The benefit of this approach that for normal timestamp the storage is
compatible with earlier version.
However for tables using system versioning we before stored a
timestamp with the year 2038 as the 'max timestamp', which is used to
detect current values. This patch stores the new 2106 year max value
as the max timestamp. This means that old tables using system
versioning needs to be updated with mariadb-upgrade when moving them
to 11.4. That will be done in a separate commit.
Step#2 - Adding a new collation derivation level for CAST and CONVERT.
Now character string cast functions:
- CAST(string_expr AS CHAR)
- CONVERT(expr USING charset_name)
have a new collation derivation level between:
- string literals
- utf8 metadata functions, e.g. user() and database()
Before the change these cast functions had collation derivation equal
to table columns, which caused more illegal mix of collation conflicts.
Note, binary string cast functions:
- BINARY(expr)
- CAST(string_expr AS BINARY)
- CONVERT(expr USING binary)
did not change their collation derivation, to preserve the behaviour of
queries like these:
SELECT database()=BINARY'test';
SELECT user()=CAST('root' AS BINARY);
SELECT current_role()=CONVERT('role' USING binary);
Derivation levels after the change look as follows:
DERIVATION_IGNORABLE= 7, // Explicit NULL
DERIVATION_NUMERIC= 6, // Numbers in string context,
// Numeric user variables
// CAST(numeric_expr AS CHAR)
DERIVATION_COERCIBLE= 5, // Literals, string user variables
DERIVATION_CAST= 4, // CAST(string_expr AS CHAR),
// CONVERT(string_expr USING cs)
DERIVATION_SYSCONST= 3, // utf8 metadata functions, e.g. user(), database()
DERIVATION_IMPLICIT= 2, // Table columns, SP variables, BINARY(expr)
DERIVATION_NONE= 1, // A mix (e.g. CONCAT) of two differrent collations
DERIVATION_EXPLICIT= 0 // An explicit COLLATE clause
TIME-alike string and numeric arguments to TIMEDIFF()
can get additional fractional seconds during the supported
TIME range adjustment in get_time().
For example, during TIMEDIFF('839:00:00','00:00:00') evaluation
in Item_func_timediff::get_date(), the call for args[0]->get_time()
returns MYSQL_TIME '838:59:59.999999'.
Item_func_timediff::get_date() did not handle these extra digits
and returned a MYSQL_TIME result with fractional digits outside
of Item_func_timediff::decimals. This mismatch could further be
caught by a DBUG_ASSERT() in various other pieces of the code,
leading to a crash.
Fix:
In case if get_time() returned MYSQL_TIMESTAMP_TIME,
let's truncate all extra digits using my_time_trunc(&l_time,decimals).
This guarantees that the rest of the code returns a MYSQL_TIME
with second_part not conflicting with Item_func_timediff::decimals.
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
Functions extracting non-negative datetime components:
- YEAR(dt), EXTRACT(YEAR FROM dt)
- QUARTER(td), EXTRACT(QUARTER FROM dt)
- MONTH(dt), EXTRACT(MONTH FROM dt)
- WEEK(dt), EXTRACT(WEEK FROM dt)
- HOUR(dt),
- MINUTE(dt),
- SECOND(dt),
- MICROSECOND(dt),
- DAYOFYEAR(dt)
- EXTRACT(YEAR_MONTH FROM dt)
did not set their max_length properly, so in the DECIMAL
context they created a too small DECIMAL column, which
led to the 'Out of range value' error.
The problem is that most of these functions historically
returned the signed INT data type.
There were two simple ways to fix these functions:
1. Add +1 to max_length.
But this would also change their size in the string context
and create too long VARCHAR columns, with +1 excessive size.
2. Preserve max_length, but change the data type from INT to INT UNSIGNED.
But this would break backward compatibility.
Also, using UNSIGNED is generally not desirable,
it's better to stay with signed when possible.
This fix implements another solution, which it makes all these functions
work well in all contexts: int, decimal, string.
Fix details:
- Adding a new special class Type_handler_long_ge0 - the data type
handler for expressions which:
* should look like normal signed INT
* but which known not to return negative values
Expressions handled by Type_handler_long_ge0 store in Item::max_length
only the number of digits, without adding +1 for the sign.
- Fixing Item_extract to use Type_handler_long_ge0
for non-negative datetime components:
YEAR, YEAR_MONTH, QUARTER, MONTH, WEEK
- Adding a new abstract class Item_long_ge0_func, for functions
returning non-negative datetime components.
Item_long_ge0_func uses Type_handler_long_ge0 as the type handler.
The class hierarchy now looks as follows:
Item_long_ge0_func
Item_long_func_date_field
Item_func_to_days
Item_func_dayofmonth
Item_func_dayofyear
Item_func_quarter
Item_func_year
Item_long_func_time_field
Item_func_hour
Item_func_minute
Item_func_second
Item_func_microsecond
- Cleanup: EXTRACT(QUARTER FROM dt) created an excessive VARCHAR column
in string context. Changing its length from 2 to 1.
This was done after discussions with Igor, Sanja and Bar.
The main reason for removing the deprication was to ensure that MariaDB
is always backward compatible whenever possible.
Other things:
- Added statistics counters, mainly for the feedback plugin.
- INTO OUTFILE
- INTO variable
- If INTO is using the old syntax (end of query)
Bit operators (~ ^ | & << >>) and the function BIT_COUNT()
always called val_int() for their arguments.
It worked correctly only for INT type arguments.
In case of DECIMAL and DOUBLE arguments it did not work well:
the argument values were truncated to the maximum SIGNED BIGINT value
of 9223372036854775807.
Fixing the code as follows:
- If the argument if of an integer data type,
it works using val_int() as before.
- If the argument if of some other data type, it gets the argument value
using val_decimal(), to avoid truncation, and then converts the result
to ulonglong.
Using Item_handled_func to switch between the two approaches easier.
As an additional advantage, with Item_handled_func it will be easier
to implement overloading in the future, so data type plugings will be able
to define their own behavioir of bit operators and BIT_COUNT().
Moving the code from the former val_int() implementations
as methods to Longlong_null, to avoid code duplication in the
INT and DECIMAL branches.
Remove usage of deprecated variable storage_engine. It was deprecated in 5.5 but
it never issued a deprecation warning. Make it issue a warning in 10.5.1.
Replaced with default_storage_engine.
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.
In the tests archive.rnd_pos main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.