Item_func_conv::fix_length_and_dec() incorrectly set maximum
length as 64 character. But for negative numbers it can
return up to 65 charcters (including the sign).
Update `SESSION_USER()` behaviour to be comparable with `CURRENT_USER()`.
`SESSION_USER()` will return the user and host columns from `mysql.user`
used to authenticate the user when the session was created.
Historically `SESSION_USER()` was an alias of `USER()` function. The
main difference with `USER()` behaviour after this changes is that
`SESSION_USER()` now returns the host column from `mysql.user` instead of
the client host or ip.
NOTE: `SESSION_USER_IS_USER` old mode is added to make the change
backward compatible.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
Based on the current logic, objects of classes Item_func_charset and
Item_func_coercibility (responsible for CHARSET() and COERCIBILITY()
functions) are always considered constant.
However, SQL syntax allows their use in a non-constant manner, such as
CHARSET(t1.a), COERCIBILITY(t1.a).
In these cases, the `used_tables()` parameter corresponds to table names
in the function parameters, creating an inconsistency: the item is marked
as constant but accesses tables. This leads to crashes when
conditions with CHARSET()/COERCIBILITY() are pushed into derived tables.
This commit addresses the issue by setting `used_tables()` to 0 for
`Item_func_charset` and `Item_func_coercibility`. Additionally, the items
now store the return values during the preparation phase and return
them during the execution phase. This ensures that the items do not call
its arguments methods during the execution and are truly constant.
Reviewer: Alexander Barkov <bar@mariadb.com>
The `Item` class methods `get_copy()`, `build_clone()`, and `clone_item()`
face an issue where they may be defined in a descendant class
(e.g., `Item_func`) but not in a further descendant (e.g., `Item_func_child`).
This can lead to scenarios where `build_clone()`, when operating on an
instance of `Item_func_child` with a pointer to the base class (`Item`),
returns an instance of `Item_func` instead of `Item_func_child`.
Since this limitation cannot be resolved at compile time, this commit
introduces runtime type checks for the copy/clone operations.
A debug assertion will now trigger in case of a type mismatch.
`get_copy()`, `build_clone()`, and `clone_item()` are no more virtual,
but virtual `do_get_copy()`, `do_build_clone()`, and `do_clone_item()`
are added to the protected section of the class `Item`.
Additionally, const qualifiers have been added to certain methods
to enhance code reliability.
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
Item_func_hex::fix_length_and_dec() evaluated a too short data type
for signed numeric arguments, which resulted in a 'Data too long for column'
error on CREATE..SELECT.
Fixing the code to take into account that a short negative
numer can produce a long HEX value: -1 -> 'FFFFFFFFFFFFFFFF'
Also fixing Item_func_hex::val_str_ascii_from_val_real().
Without this change, MTR test with HEX with negative float point arguments
failed on some platforms (aarch64, ppc64le, s390-x).
Step#2 - Adding a new collation derivation level for CAST and CONVERT.
Now character string cast functions:
- CAST(string_expr AS CHAR)
- CONVERT(expr USING charset_name)
have a new collation derivation level between:
- string literals
- utf8 metadata functions, e.g. user() and database()
Before the change these cast functions had collation derivation equal
to table columns, which caused more illegal mix of collation conflicts.
Note, binary string cast functions:
- BINARY(expr)
- CAST(string_expr AS BINARY)
- CONVERT(expr USING binary)
did not change their collation derivation, to preserve the behaviour of
queries like these:
SELECT database()=BINARY'test';
SELECT user()=CAST('root' AS BINARY);
SELECT current_role()=CONVERT('role' USING binary);
Derivation levels after the change look as follows:
DERIVATION_IGNORABLE= 7, // Explicit NULL
DERIVATION_NUMERIC= 6, // Numbers in string context,
// Numeric user variables
// CAST(numeric_expr AS CHAR)
DERIVATION_COERCIBLE= 5, // Literals, string user variables
DERIVATION_CAST= 4, // CAST(string_expr AS CHAR),
// CONVERT(string_expr USING cs)
DERIVATION_SYSCONST= 3, // utf8 metadata functions, e.g. user(), database()
DERIVATION_IMPLICIT= 2, // Table columns, SP variables, BINARY(expr)
DERIVATION_NONE= 1, // A mix (e.g. CONCAT) of two differrent collations
DERIVATION_EXPLICIT= 0 // An explicit COLLATE clause
This patch also fixes:
MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0
- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER
- Adding a wrapper function CHARSET_INFO::streq(), to compare
two strings for equality. For now it calls strnncoll() internally.
In the future it will turn into a virtual function.
- Adding new accent sensitive case insensitive collations:
- utf8mb4_general1400_as_ci
- utf8mb3_general1400_as_ci
They implement accent sensitive case insensitive comparison.
The weight of a character is equal to the code point of its
upper case variant. These collations use Unicode-14.0.0 casefolding data.
The result of
my_charset_utf8mb3_general1400_as_ci.strcoll()
is very close to the former
my_charset_utf8mb3_general_ci.strcasecmp()
There is only a difference in a couple dozen rare characters, because:
- the switch from "tolower" to "toupper" comparison, to make
utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
- the switch from Unicode-3.0.0 to Unicode-14.0.0
This difference should be tolarable. See the list of affected
characters in the MDEV description.
Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all BMP characters
as equal.
- Adding classes representing names of the file based database objects:
Lex_ident_db
Lex_ident_table
Lex_ident_trigger
Their comparison collation depends on the underlying
file system case sensitivity and on --lower-case-table-names
and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
- Adding classes representing names of other database objects,
whose names have case insensitive comparison style,
using my_charset_utf8mb3_general1400_as_ci:
Lex_ident_column
Lex_ident_sys_var
Lex_ident_user_var
Lex_ident_sp_var
Lex_ident_ps
Lex_ident_i_s_table
Lex_ident_window
Lex_ident_func
Lex_ident_partition
Lex_ident_with_element
Lex_ident_rpl_filter
Lex_ident_master_info
Lex_ident_host
Lex_ident_locale
Lex_ident_plugin
Lex_ident_engine
Lex_ident_server
Lex_ident_savepoint
Lex_ident_charset
engine_option_value::Name
- All the mentioned Lex_ident_xxx classes implement a method streq():
if (ident1.streq(ident2))
do_equal();
This method works as a wrapper for CHARSET_INFO::streq().
- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
in class members and in function/method parameters.
- Replacing all calls like
system_charset_info->coll->strcasecmp(ident1, ident2)
to
ident1.streq(ident2)
- Taking advantage of the c++11 user defined literal operator
for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
data types. Use example:
const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
is now a shorter version of:
const Lex_ident_column primary_key_name=
Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
Turning REGEXP_REPLACE into two schema-qualified functions:
- mariadb_schema.regexp_replace()
- oracle_schema.regexp_replace()
Fixing oracle_schema.regexp_replace(subj,pattern,replacement) to treat
NULL in "replacement" as an empty string.
Adding new classes implementing oracle_schema.regexp_replace():
- Item_func_regexp_replace_oracle
- Create_func_regexp_replace_oracle
Adding helper methods:
- String *Item::val_str_null_to_empty(String *to)
- String *Item::val_str_null_to_empty(String *to, bool null_to_empty)
and reusing these methods in both Item_func_replace and
Item_func_regexp_replace.
Item_func_quote did not calculate its max_length correctly for nullable
arguments.
Fix:
In case if the argument is nullable, reserve at least 4 characters
so the string "NULL" fits.
The crash happened with an indexed virtual column whose
value is evaluated using a function that has a different meaning
in sql_mode='' vs sql_mode=ORACLE:
- DECODE()
- LTRIM()
- RTRIM()
- LPAD()
- RPAD()
- REPLACE()
- SUBSTR()
For example:
CREATE TABLE t1 (
b VARCHAR(1),
g CHAR(1) GENERATED ALWAYS AS (SUBSTR(b,0,0)) VIRTUAL,
KEY g(g)
);
So far we had replacement XXX_ORACLE() functions for all mentioned function,
e.g. SUBSTR_ORACLE() for SUBSTR(). So it was possible to correctly re-parse
SUBSTR_ORACLE() even in sql_mode=''.
But it was not possible to re-parse the MariaDB version of SUBSTR()
after switching to sql_mode=ORACLE. It was erroneously mis-interpreted
as SUBSTR_ORACLE().
As a result, this combination worked fine:
SET sql_mode=ORACLE;
CREATE TABLE t1 ... g CHAR(1) GENERATED ALWAYS AS (SUBSTR(b,0,0)) VIRTUAL, ...;
INSERT ...
FLUSH TABLES;
SET sql_mode='';
INSERT ...
But the other way around it crashed:
SET sql_mode='';
CREATE TABLE t1 ... g CHAR(1) GENERATED ALWAYS AS (SUBSTR(b,0,0)) VIRTUAL, ...;
INSERT ...
FLUSH TABLES;
SET sql_mode=ORACLE;
INSERT ...
At CREATE time, SUBSTR was instantiated as Item_func_substr and printed
in the FRM file as substr(). At re-open time with sql_mode=ORACLE, "substr()"
was erroneously instantiated as Item_func_substr_oracle.
Fix:
The fix proposes a symmetric solution. It provides a way to re-parse reliably
all sql_mode dependent functions to their original CREATE TABLE time meaning,
no matter what the open-time sql_mode is.
We take advantage of the same idea we previously used to resolve sql_mode
dependent data types.
Now all sql_mode dependent functions are printed by SHOW using a schema
qualifier when the current sql_mode differs from the function sql_mode:
SET sql_mode='';
CREATE TABLE t1 ... SUBSTR(a,b,c) ..;
SET sql_mode=ORACLE;
SHOW CREATE TABLE t1; -> mariadb_schema.substr(a,b,c)
SET sql_mode=ORACLE;
CREATE TABLE t2 ... SUBSTR(a,b,c) ..;
SET sql_mode='';
SHOW CREATE TABLE t1; -> oracle_schema.substr(a,b,c)
Old replacement names like substr_oracle() are still understood for
backward compatibility and used in FRM files (for downgrade compatibility),
but they are not printed by SHOW any more.
KDF(key_str, salt [, {info | iterations} [, kdf_name [, width ]]])
kdf_name is "hkdf" or "pbkdf2_hmac" (default).
width (in bits) can be any number divisible by 8,
by default it's taken from @@block_encryption_mode
iterations must be positive, and is 1000 by default
OpenSSL 1.0 doesn't support HKDF, so it'll return NULL.
This OpenSSL version is still used in SLES 12 and CentOS 7