Commit graph

554 commits

Author SHA1 Message Date
Alexander Barkov
903b5d6a83 MDEV-25829 Change default Unicode collation to uca1400_ai_ci
Step#3 The main patch
2024-05-24 15:50:05 +04:00
Alexander Barkov
a3117c7983 MDEV-25829 Change default Unicode collation to uca1400_ai_ci
Step#2 - Adding a new collation derivation level for CAST and CONVERT.

Now character string cast functions:
  - CAST(string_expr AS CHAR)
  - CONVERT(expr USING charset_name)

have a new collation derivation level between:

  - string literals
  - utf8 metadata functions, e.g. user() and database()

Before the change these cast functions had collation derivation equal
to table columns, which caused more illegal mix of collation conflicts.

Note, binary string cast functions:
  - BINARY(expr)
  - CAST(string_expr AS BINARY)
  - CONVERT(expr USING binary)
did not change their collation derivation, to preserve the behaviour of
queries like these:
SELECT database()=BINARY'test';
SELECT user()=CAST('root' AS BINARY);
SELECT current_role()=CONVERT('role' USING binary);

Derivation levels after the change look as follows:

  DERIVATION_IGNORABLE= 7, // Explicit NULL

  DERIVATION_NUMERIC= 6,   // Numbers in string context,
                           // Numeric user variables
                           // CAST(numeric_expr AS CHAR)

  DERIVATION_COERCIBLE= 5, // Literals, string user variables

  DERIVATION_CAST= 4,      // CAST(string_expr AS CHAR),
                           // CONVERT(string_expr USING cs)

  DERIVATION_SYSCONST= 3,  // utf8 metadata functions, e.g. user(), database()
  DERIVATION_IMPLICIT= 2,  // Table columns, SP variables, BINARY(expr)
  DERIVATION_NONE= 1,      // A mix (e.g. CONCAT) of two differrent collations
  DERIVATION_EXPLICIT= 0   // An explicit COLLATE clause
2024-05-24 15:37:47 +04:00
Alexander Barkov
fd247cc21f MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp()
This patch also fixes:
  MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
  MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
  MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
  MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
  MDEV-33088 Cannot create triggers in the database `MYSQL`
  MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
  MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
  MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
  MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
  MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0

- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER

- Adding a wrapper function CHARSET_INFO::streq(), to compare
  two strings for equality. For now it calls strnncoll() internally.
  In the future it will turn into a virtual function.

- Adding new accent sensitive case insensitive collations:
    - utf8mb4_general1400_as_ci
    - utf8mb3_general1400_as_ci
  They implement accent sensitive case insensitive comparison.
  The weight of a character is equal to the code point of its
  upper case variant. These collations use Unicode-14.0.0 casefolding data.

  The result of
     my_charset_utf8mb3_general1400_as_ci.strcoll()
  is very close to the former
     my_charset_utf8mb3_general_ci.strcasecmp()

  There is only a difference in a couple dozen rare characters, because:
    - the switch from "tolower" to "toupper" comparison, to make
      utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
    - the switch from Unicode-3.0.0 to Unicode-14.0.0
  This difference should be tolarable. See the list of affected
  characters in the MDEV description.

  Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
  Unlike utf8mb4_general_ci, it does not treat all BMP characters
  as equal.

- Adding classes representing names of the file based database objects:

    Lex_ident_db
    Lex_ident_table
    Lex_ident_trigger

  Their comparison collation depends on the underlying
  file system case sensitivity and on --lower-case-table-names
  and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.

- Adding classes representing names of other database objects,
  whose names have case insensitive comparison style,
  using my_charset_utf8mb3_general1400_as_ci:

  Lex_ident_column
  Lex_ident_sys_var
  Lex_ident_user_var
  Lex_ident_sp_var
  Lex_ident_ps
  Lex_ident_i_s_table
  Lex_ident_window
  Lex_ident_func
  Lex_ident_partition
  Lex_ident_with_element
  Lex_ident_rpl_filter
  Lex_ident_master_info
  Lex_ident_host
  Lex_ident_locale
  Lex_ident_plugin
  Lex_ident_engine
  Lex_ident_server
  Lex_ident_savepoint
  Lex_ident_charset
  engine_option_value::Name

- All the mentioned Lex_ident_xxx classes implement a method streq():

  if (ident1.streq(ident2))
     do_equal();

  This method works as a wrapper for CHARSET_INFO::streq().

- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
  in class members and in function/method parameters.

- Replacing all calls like
    system_charset_info->coll->strcasecmp(ident1, ident2)
  to
    ident1.streq(ident2)

- Taking advantage of the c++11 user defined literal operator
  for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
  data types. Use example:

  const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;

  is now a shorter version of:

  const Lex_ident_column primary_key_name=
    Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
2024-04-18 15:22:10 +04:00
Oleksandr Byelkin
d21cb43db1 Merge branch '11.2' into 11.3 2024-02-04 16:42:31 +01:00
Sergei Golubchik
79580f4f96 Merge branch '11.1' into 11.2 2024-02-02 17:43:57 +01:00
Sergei Golubchik
6ef0b2ee5f Merge branch '10.11' into 11.0 2024-02-01 18:57:08 +01:00
Sergei Golubchik
87e13722a9 Merge branch '10.6' into 10.11 2024-02-01 18:36:14 +01:00
Sergei Golubchik
3f6038bc51 Merge branch '10.5' into 10.6 2024-01-31 18:04:03 +01:00
Sergei Golubchik
01f6abd1d4 Merge branch '10.4' into 10.5 2024-01-31 17:32:53 +01:00
Oleksandr Byelkin
fe490f85bb Merge branch '10.11' into 11.0 2024-01-30 08:54:10 +01:00
Oleksandr Byelkin
14d930db5d Merge branch '10.6' into 10.11 2024-01-30 08:17:58 +01:00
Oleksandr Byelkin
25c0806867 Merge branch '10.5' into 10.6 2024-01-30 07:43:15 +01:00
Oleksandr Byelkin
50107c4b22 Merge branch '10.4' into 10.5 2024-01-30 07:26:17 +01:00
Alexander Barkov
f738cc9876 MDEV-29095 REGEXP_REPLACE treats empty strings different than REPLACE in ORACLE mode
Turning REGEXP_REPLACE into two schema-qualified functions:
- mariadb_schema.regexp_replace()
- oracle_schema.regexp_replace()

Fixing oracle_schema.regexp_replace(subj,pattern,replacement) to treat
NULL in "replacement" as an empty string.

Adding new classes implementing oracle_schema.regexp_replace():
- Item_func_regexp_replace_oracle
- Create_func_regexp_replace_oracle

Adding helper methods:
- String *Item::val_str_null_to_empty(String *to)
- String *Item::val_str_null_to_empty(String *to, bool null_to_empty)

and reusing these methods in both Item_func_replace and
Item_func_regexp_replace.
2024-01-24 10:59:17 +04:00
Alexander Barkov
81d01855fe MDEV-28651 quote(NULL) returns incorrect result in view ('NU' instead of 'NULL')
Item_func_quote did not calculate its max_length correctly for nullable
arguments.

Fix:

In case if the argument is nullable, reserve at least 4 characters
so the string "NULL" fits.
2024-01-23 13:22:58 +04:00
Sergei Golubchik
7f0094aac8 Merge branch '11.2' into 11.3 2023-12-21 02:14:59 +01:00
Sergei Golubchik
fef31a26f3 Merge branch '11.1' into 11.2 2023-12-20 23:43:05 +01:00
Sergei Golubchik
8c8bce05d2 Merge branch '10.11' into 11.0 2023-12-19 15:53:18 +01:00
Sergei Golubchik
fd0b47f9d6 Merge branch '10.6' into 10.11 2023-12-18 11:19:04 +01:00
Sergei Golubchik
e95bba9c58 Merge branch '10.5' into 10.6 2023-12-17 11:20:43 +01:00
Sergei Golubchik
98a39b0c91 Merge branch '10.4' into 10.5 2023-12-02 01:02:50 +01:00
Alexander Barkov
2b6d241ee4 MDEV-27744 LPAD in vcol created in ORACLE mode makes table corrupted in non-ORACLE
The crash happened with an indexed virtual column whose
value is evaluated using a function that has a different meaning
in sql_mode='' vs sql_mode=ORACLE:

- DECODE()
- LTRIM()
- RTRIM()
- LPAD()
- RPAD()
- REPLACE()
- SUBSTR()

For example:

CREATE TABLE t1 (
  b VARCHAR(1),
  g CHAR(1) GENERATED ALWAYS AS (SUBSTR(b,0,0)) VIRTUAL,
  KEY g(g)
);

So far we had replacement XXX_ORACLE() functions for all mentioned function,
e.g. SUBSTR_ORACLE() for SUBSTR(). So it was possible to correctly re-parse
SUBSTR_ORACLE() even in sql_mode=''.

But it was not possible to re-parse the MariaDB version of SUBSTR()
after switching to sql_mode=ORACLE. It was erroneously mis-interpreted
as SUBSTR_ORACLE().

As a result, this combination worked fine:

SET sql_mode=ORACLE;
CREATE TABLE t1 ... g CHAR(1) GENERATED ALWAYS AS (SUBSTR(b,0,0)) VIRTUAL, ...;
INSERT ...
FLUSH TABLES;
SET sql_mode='';
INSERT ...

But the other way around it crashed:

SET sql_mode='';
CREATE TABLE t1 ... g CHAR(1) GENERATED ALWAYS AS (SUBSTR(b,0,0)) VIRTUAL, ...;
INSERT ...
FLUSH TABLES;
SET sql_mode=ORACLE;
INSERT ...

At CREATE time, SUBSTR was instantiated as Item_func_substr and printed
in the FRM file as substr(). At re-open time with sql_mode=ORACLE, "substr()"
was erroneously instantiated as Item_func_substr_oracle.

Fix:

The fix proposes a symmetric solution. It provides a way to re-parse reliably
all sql_mode dependent functions to their original CREATE TABLE time meaning,
no matter what the open-time sql_mode is.

We take advantage of the same idea we previously used to resolve sql_mode
dependent data types.

Now all sql_mode dependent functions are printed by SHOW using a schema
qualifier when the current sql_mode differs from the function sql_mode:

SET sql_mode='';
CREATE TABLE t1 ... SUBSTR(a,b,c) ..;
SET sql_mode=ORACLE;
SHOW CREATE TABLE t1;   ->   mariadb_schema.substr(a,b,c)

SET sql_mode=ORACLE;
CREATE TABLE t2 ... SUBSTR(a,b,c) ..;
SET sql_mode='';
SHOW CREATE TABLE t1;   ->   oracle_schema.substr(a,b,c)

Old replacement names like substr_oracle() are still understood for
backward compatibility and used in FRM files (for downgrade compatibility),
but they are not printed by SHOW any more.
2023-11-08 15:01:20 +04:00
Sergei Golubchik
4f9396b9f8 MDEV-31474 KDF() function
KDF(key_str, salt [, {info | iterations} [, kdf_name [, width ]]])

kdf_name is "hkdf" or "pbkdf2_hmac" (default).

width (in bits) can be any number divisible by 8,
by default it's taken from @@block_encryption_mode

iterations must be positive, and is 1000 by default

OpenSSL 1.0 doesn't support HKDF, so it'll return NULL.
This OpenSSL version is still used in SLES 12 and CentOS 7
2023-09-30 14:43:12 +02:00
Sergei Golubchik
eaa87eb89f allow random_bytes() in virtual columns 2023-08-15 10:16:10 +02:00
Sergei Golubchik
5de39c5ae3 MDEV-9069 extend AES_ENCRYPT() and AES_DECRYPT() to support IV and the algorithm
AES_ENCRYPT(str, key, [, iv [, mode ]])
AES_DECRYPT(str, key, [, iv [, mode ]])

mode is aes-{128,192,256}-{ecb,cbc,ctr} e.g. "aes-128-cbc".

and a @@block_encryption_mode variable for the default value of mode

change in behavior: AES_ENCRYPT(str, key) can no longer
be used in persistent virtual columns (and alike)
2023-08-02 13:29:48 +02:00
Sergei Golubchik
c2b6916393 MDEV-19629 post-merge fixes
* it isn't "pfs" function, don't call it Item_func_pfs,
  don't use item_pfsfunc.*
* tests don't depend on performance schema, put in the main suite
* inherit from Item_str_ascii_func
* use connection collation, not utf8mb3_general_ci
* set result length in fix_length_and_dec
* do not set maybe_null
* use my_snprintf() where possible
* don't set m_value.ptr on every invocation
* update sys schema to use the format_pico_time()
* len must be size_t (compilation error on Windows)
* the correct function name for double->double is fabs()
* drop volatile hack
2023-03-27 21:27:27 +02:00
Oleksandr Byelkin
76bcea3154 Merge branch '10.9' into 10.10 2023-01-31 11:01:48 +01:00
Oleksandr Byelkin
de2d089942 Merge branch '10.8' into 10.9 2023-01-31 10:37:31 +01:00
Oleksandr Byelkin
638625278e Merge branch '10.7' into 10.8 2023-01-31 09:57:52 +01:00
Oleksandr Byelkin
b923b80cfd Merge branch '10.6' into 10.7 2023-01-31 09:33:58 +01:00
Oleksandr Byelkin
c3a5cf2b5b Merge branch '10.5' into 10.6 2023-01-31 09:31:42 +01:00
Oleksandr Byelkin
7fa02f5c0b Merge branch '10.4' into 10.5 2023-01-27 13:54:14 +01:00
Alexander Barkov
284ac6f2b7 MDEV-27653 long uniques don't work with unicode collations 2023-01-19 20:33:03 +04:00
Marko Mäkelä
fa389b9098 Merge 10.9 into 10.10 2022-12-14 08:57:39 +02:00
Alexander Barkov
133446828c MDEV-27009 Add UCA-14.0.0 collations
- Added one neutral and 22 tailored (language specific) collations based on
  Unicode Collation Algorithm version 14.0.0.

  Collations were added for Unicode character sets
  utf8mb3, utf8mb4, ucs2, utf16, utf32.

  Every tailoring was added with four accent and case
  sensitivity flag combinations, e.g:

  * utf8mb4_uca1400_swedish_as_cs
  * utf8mb4_uca1400_swedish_as_ci
  * utf8mb4_uca1400_swedish_ai_cs
  * utf8mb4_uca1400_swedish_ai_ci

  and their _nopad_ variants:

  * utf8mb4_uca1400_swedish_nopad_as_cs
  * utf8mb4_uca1400_swedish_nopad_as_ci
  * utf8mb4_uca1400_swedish_nopad_ai_cs
  * utf8mb4_uca1400_swedish_nopad_ai_ci

- Introducing a conception of contextually typed named collations:

  CREATE DATABASE db1 CHARACTER SET utf8mb4;
  CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);

  The idea is that there is no a need to specify the character set prefix
  in the new collation names. It's enough to type just the suffix
  "uca1400_as_ci". The character set is taken from the context.

  In the above example script the context character set is utf8mb4.
  So the CREATE TABLE will make a column with the collation
  utf8mb4_uca1400_as_ci.

  Short collations names can be used in any parts of the SQL syntax
  where the COLLATE clause is understood.

- New collations are displayed only one time
  (without character set combinations) by these statements:

     SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
     SHOW COLLATION;

  For example, all these collations:
  - utf8mb3_uca1400_swedish_as_ci
  - utf8mb4_uca1400_swedish_as_ci
  - ucs2_uca1400_swedish_as_ci
  - utf16_uca1400_swedish_as_ci
  - utf32_uca1400_swedish_as_ci
  have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION,
  with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix
  without the character set name:

SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';

+-----------------------+
| COLLATION_NAME        |
+-----------------------+
| uca1400_swedish_as_ci |
+-----------------------+

  Note, the behaviour of old collations did not change.
  Non-unicode collations (e.g. latin1_swedish_ci) and
  old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci)
  are still displayed with the character set prefix, as before.

- The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed.

  The NOT NULL constraint was removed from these columns:
  - CHARACTER_SET_NAME
  - ID
  - IS_DEFAULT
  and from the corresponding columns in SHOW COLLATION.

  For example:

SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+--------------------+------+------------+
| COLLATION_NAME        | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------+--------------------+------+------------+
| uca1400_swedish_as_ci | NULL               | NULL | NULL       |
+-----------------------+--------------------+------+------------+

  The NULL value in these columns now means that the collation
  is applicable to multiple character sets.
  The behavioir of old collations did not change.
  Make sure your client programs can handle NULL values in these columns.

- The structure of the table
  INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed.

  Three new NOT NULL columns were added:
  - FULL_COLLATION_NAME
  - ID
  - IS_DEFAULT

  New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY.
  The column COLLATION_NAME contains the collation name without the character
  set prefix. The column FULL_COLLATION_NAME contains the collation name with
  the character set prefix.

  Old collations have full collation name in both FULL_COLLATION_NAME and
  COLLATION_NAME.

SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4|latin1).*swedish.*ci$';
+-----------------------------+-------------------------------------+--------------------+------+------------+
| COLLATION_NAME              | FULL_COLLATION_NAME                 | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------------+-------------------------------------+--------------------+------+------------+
| latin1_swedish_ci           | latin1_swedish_ci                   | latin1             |    8 | Yes        |
| latin1_swedish_nopad_ci     | latin1_swedish_nopad_ci             | latin1             | 1032 |            |
| utf8mb4_swedish_ci          | utf8mb4_swedish_ci                  | utf8mb4            |  232 |            |
| uca1400_swedish_ai_ci       | utf8mb4_uca1400_swedish_ai_ci       | utf8mb4            | 2368 |            |
| uca1400_swedish_as_ci       | utf8mb4_uca1400_swedish_as_ci       | utf8mb4            | 2370 |            |
| uca1400_swedish_nopad_ai_ci | utf8mb4_uca1400_swedish_nopad_ai_ci | utf8mb4            | 2372 |            |
| uca1400_swedish_nopad_as_ci | utf8mb4_uca1400_swedish_nopad_as_ci | utf8mb4            | 2374 |            |
+-----------------------------+-------------------------------------+--------------------+------+------------+

- Other INFORMATION_SCHEMA queries:

  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
  SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
  SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;

  display full collation names, including character sets prefix,
  for all collations, including new collations.

  Corresponding SHOW commands also display full collation names
  in collation related columns:

  SHOW CREATE TABLE t1;
  SHOW CREATE DATABASE db1;
  SHOW TABLE STATUS;
  SHOW CREATE FUNCTION f1;
  SHOW CREATE PROCEDURE p1;
  SHOW CREATE EVENT ev1;
  SHOW CREATE TRIGGER tr1;
  SHOW CREATE VIEW;

  These INFORMATION_SCHEMA queries and SHOW statements may change in
  the future, to display show collation names.
2022-08-10 15:04:24 +02:00
Daniel Black
d7e3265dd3 MDEV-29154 Excessive warnings upon a call to RANDOM_BYTES
Bring the 5 warnings of select random_bytes(cast('x' as unsigned)+1);
back to two. 1 for Item_func_random_bytes::fix_length_and_dec and
one from Item_func_random_bytes::val_str.

The warnings are from args[0]->val_int().
2022-07-31 14:54:37 +02:00
Daniel Black
8b9ac5bfe0 MDEV-29029 RANDOM_BYTES cannot be virtual column 2022-07-31 14:54:37 +02:00
Vanislavsky
3c2b0cac52 MDEV-25704 Add RANDOM_BYTES function
MySQL 5.6 added the RANDOM_BYTES function.
https://dev.mysql.com/doc/refman/5.6/en/encryption-functions.html#function_random-bytes

This is needed for compatibility purposes.
2022-07-31 14:54:37 +02:00
Sergei Golubchik
fe8b99dd5c MDEV-27104 deprecate DES_ENCRYPT/DECRYPT functions 2022-07-28 00:53:20 +02:00
Rucha Deodhar
3eb1e11d8a MDEV-23479: Add a THD* argument to Item_func_or_sum::fix_length_and_dec()
Fix: Added THD *thd argument in Item_func_or_sum::fix_length_and_dec() and in
fix_length_and_dec() for all derived classes of Item_func_or_sum.
2022-03-30 17:00:17 +05:30
Oleksandr Byelkin
4fb2cb1a30 Merge branch '10.7' into 10.8 2022-02-04 14:50:25 +01:00
Oleksandr Byelkin
9ed8deb656 Merge branch '10.6' into 10.7 2022-02-04 14:11:46 +01:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Oleksandr Byelkin
41a163ac5c Merge branch '10.2' into 10.3 2022-01-29 15:41:05 +01:00
Alexander Barkov
3d69213e74 MDEV-26953 Assertion `!str || str != Ptr || !is_alloced()' failed in String::copy upon SELECT with sjis
Item::save_str_in_field() passes &Item::str_value as a parameter
to val_str().

Item_func::make_empty_result() also fills and returns str_value.

As a result, in the reported scenario in
Item_func::val_str_from_val_str_ascii()
both "str" and "res" pointed to Item::str_value,
which made the DBUG_ASSERT inside String::copy()
(preventing copying to itself) crash:

  if ((null_value= str->copy(res->ptr(), res->length(),
                             &my_charset_latin1, collation.collation,
                             &errors)))

Fix:
- Adding a String* parameter to make_empty_result()
- Passing the val_str() parameter to make_empty_string().
2022-01-27 09:44:11 +04:00
Marko Mäkelä
5b3ad94c7b MDEV-27208: Extend CRC32() and implement CRC32C()
We used to define a native unary function CRC32() that computes the CRC-32
of a string using the ISO 3309 polynomial that is being used by zlib
and many others.

Often, a CRC is computed in pieces. To faciliate this, we introduce a
2-ary variant of the function that inputs a previous CRC as the first
argument: CRC32('MariaDB')=CRC32(CRC32('Maria'),'DB').

InnoDB and MyRocks use a different polynomial, which was implemented
in SSE4.2 instructions that were introduced in the
Intel Nehalem microarchitecture. This is commonly called CRC-32C
(Castagnoli).

We introduce a native function that uses the Castagnoli polynomial:
CRC32C('MariaDB')=CRC32C(CRC32C('Maria'),'DB'). This allows
SELECT...INTO DUMPFILE to be used for the creation of files with
valid checksums, such as a logically empty InnoDB redo log file
ib_logfile0 corresponding to a particular log sequence number.
2022-01-21 19:24:00 +02:00
Alexander Barkov
e4b302e436 MDEV-27018 IF and COALESCE lose "json" property
Hybrid functions (IF, COALESCE, etc) did not preserve the JSON property
from their arguments. The same problem was repeatable for single row subselects.

The problem happened because the method Item::is_json_type() was inconsistently
implemented across the Item hierarchy. For example, Item_hybrid_func
and Item_singlerow_subselect did not override is_json_type().

Solution:

- Removing Item::is_json_type()

- Implementing specific JSON type handlers:
  Type_handler_string_json
  Type_handler_varchar_json
  Type_handler_tiny_blob_json
  Type_handler_blob_json
  Type_handler_medium_blob_json
  Type_handler_long_blob_json

- Reusing the existing data type infrastructure to pass JSON
  type handlers across all item types, including classes Item_hybrid_func
  and Item_singlerow_subselect. Note, these two classes themselves do not
  need any changes!

- Extending the data type infrastructure so data types can inherit
  their properties (e.g. aggregation rules) from their base data types.
  E.g. VARCHAR/JSON acts as VARCHAR, LONGTEXT/JSON acts as LONGTEXT
  when mixed to a non-JSON data type. This is done by:
    - adding virtual method Type_handler::type_handler_base()
    - adding a helper class Type_handler_pair
    - refactoring Type_handler_hybrid_field_type methods
      aggregate_for_result(), aggregate_for_min_max(),
      aggregate_for_num_op() to use Type_handler_pair.

This change also fixes:

  MDEV-27361 Hybrid functions with JSON arguments do not send format metadata

Also, adding mtr tests for JSON replication. It was not covered yet.
And the current patch changes the replication code slightly.
2022-01-21 19:28:48 +04:00
Daniel Black
1d27b5789a MDEV-27544 database() function should return 64 characters
Database names are 64 utf8 characters per the system tables
that refer to them.

The current database() function is returning 34 characters.

The result of limiting this function results to max length of 34
became apparent when used in a UNION ALL where the results are
truncated to 34 characters.

For (uninvestigated) reasons, SELECT DATABASE() on its own
would always return the right number of characters.

Thanks Alexander Barkov for the review.

Thanks dave for noticing the bug in the stackexchange post
https://dba.stackexchange.com/questions/306183/why-is-my-database-name-truncated
2022-01-20 19:01:36 +11:00