This is done by mapping most of the existing MySQL Unicode 0900 collations
to MariaDB 1400 Unicode collations. The assumption is that 1400 is a
superset of 0900 for all practical purposes.
I also added a new function 'compare_collations()' and changed most code
to use this instead of comparing character sets directly.
This enables one to seamlessly mix-and-match the corresponding 0900 and
1400 sets. Field comparison and ALTER TABLE treat the character sets
as identical.
All MySQL 8.0 0900 collations are supported except:
- utf8mb4_ja_0900_as_cs
- utf8mb4_ja_0900_as_cs_ks
- utf8mb4_ru_0900_as_cs
- utf8mb4_zh_0900_as_cs
These do not have corresponding entries in the MariaDB 1400 collations.
Other things:
- Added COMMENT column to information_schema.collations. For utf8mb4_0900
collations it contains the corresponding alias collation.
strerror_s on Linux will, for unknown error codes, display
'Unknown error <codenum>' and our tests are written with this assumption.
However, on macOS, strerror_s returns 'Unknown error: <codenum>' in the
same case, which breaks tests. Make my_strerror consistent across the
platforms by removing the ':' when present.
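A minimal sketch of the normalization described above. The helper name
normalize_unknown_error() is hypothetical and this is not the actual
my_strerror() code; it only shows the idea of dropping the ':' in place:

```c
#include <string.h>

/*
  Hypothetical helper (an assumption, not the real my_strerror() code):
  rewrite "Unknown error: <codenum>" (macOS style) in place to
  "Unknown error <codenum>" (Linux style). Other messages are untouched.
*/
static void normalize_unknown_error(char *msg)
{
  static const char prefix[] = "Unknown error:";
  size_t plen = sizeof(prefix) - 1;        /* length without the '\0' */
  if (strncmp(msg, prefix, plen) == 0)
  {
    char *colon = msg + plen - 1;          /* points at the ':' */
    /* Shift the tail, including the terminating '\0', one byte left */
    memmove(colon, colon + 1, strlen(colon + 1) + 1);
  }
}
```

Editing in place keeps the caller's buffer handling unchanged, since the
result is never longer than the input.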
The code in my_strtoll10_mb2 and my_strtoll10_utf32
could hit undefined behavior by negation of LONGLONG_MIN.
Fixing to avoid this.
Also, fixing my_strtoll10() in the same style.
The previous reduction produced a redundant warning on
CAST(_latin1'-9223372036854775808' AS SIGNED)
The code in my_strntoull_8bit() and my_strntoull_mb2_or_mb4()
could hit undefined behavior by negating LONGLONG_MIN.
Fixing the code to avoid this.
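A minimal sketch of the general technique, not the actual patch: do the
negation in unsigned arithmetic, where wrap-around is well defined, and
special-case the magnitude 2^63 when applying a sign. The helper names
abs_as_unsigned() and apply_sign() are hypothetical, and standard
LLONG_MIN/LLONG_MAX stand in for the server's LONGLONG_MIN/LONGLONG_MAX:

```c
#include <limits.h>

/*
  Return the magnitude of a signed value as an unsigned one.
  Negating LLONG_MIN in signed arithmetic is undefined behavior;
  unsigned subtraction is well defined (it wraps modulo 2^64).
*/
static unsigned long long abs_as_unsigned(long long v)
{
  return v < 0 ? 0ULL - (unsigned long long) v : (unsigned long long) v;
}

/*
  Apply a sign to a magnitude produced by digit accumulation.
  Returns 0 on success, 1 if the result does not fit into long long.
*/
static int apply_sign(unsigned long long magnitude, int negative,
                      long long *res)
{
  const unsigned long long min_magnitude = (unsigned long long) LLONG_MAX + 1;
  if (negative)
  {
    if (magnitude > min_magnitude)
      return 1;                           /* below LLONG_MIN */
    if (magnitude == min_magnitude)
    {
      *res = LLONG_MIN;                   /* no negation needed */
      return 0;
    }
    *res = -(long long) magnitude;        /* safe: magnitude <= LLONG_MAX */
    return 0;
  }
  if (magnitude > (unsigned long long) LLONG_MAX)
    return 1;                             /* above LLONG_MAX */
  *res = (long long) magnitude;
  return 0;
}
```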
This patch fixes two problems:
- The code inside my_strtod_int() in strings/dtoa.c could test the byte
past the end of the string when processing the mantissa.
Rewriting the code to avoid this.
- The code in test_if_number() in sql/sql_analyse.cc called my_atof(),
which is unsafe and makes the called my_strtod_int() read past
the end of the string if the input string is not 0-terminated.
Fixing test_if_number() to use my_strtod() instead, passing the correct
end pointer.
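A minimal sketch of the bounded-scan pattern behind both fixes, assuming
the input is given as a [s, end) range. The helper count_digits_bounded()
is hypothetical and much simpler than my_strtod_int(); it only shows that
the bound is checked before each dereference:

```c
#include <stddef.h>

/*
  Hypothetical helper: count leading decimal digits in [s, end).
  The bound is tested before every dereference, so the byte at 'end'
  is never read, even if the buffer is not 0-terminated.
*/
static size_t count_digits_bounded(const char *s, const char *end)
{
  const char *p = s;
  while (p < end && *p >= '0' && *p <= '9')
    p++;
  return (size_t) (p - s);
}
```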
nullptr+0 is UB (undefined behavior).
- Fixing my_string_metadata_get_mb() to handle {nullptr,0} without UB.
- Fixing THD::copy_with_error() to disallow {nullptr,0} by DBUG_ASSERT().
- Fixing parse_client_handshake_packet() to call THD::copy_with_error()
with an empty string {"",0} instead of NULL string {nullptr,0}.
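A minimal sketch of handling {nullptr,0} without pointer arithmetic on a
null pointer. The helper count_non_ascii() is hypothetical and is not the
real my_string_metadata_get_mb(); it only illustrates the early return:

```c
#include <stddef.h>

/*
  Hypothetical helper: count bytes with the high bit set in {str, length}.
  Returning early for length == 0 means 'str + length' is never
  evaluated for a null pointer.
*/
static size_t count_non_ascii(const char *str, size_t length)
{
  size_t n = 0;
  if (length == 0)
    return 0;
  for (const char *end = str + length; str < end; str++)
    n += ((unsigned char) *str) >= 0x80;
  return n;
}
```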
- Fixing the code in get_interval_value() to use Longlong_hybrid_null.
This makes it possible to correctly handle:
- Signed and unsigned arguments
(the old code assumed the argument to be signed)
- The corner case with LONGLONG_MIN, avoiding undefined negation behavior
This fixes the UBSAN warning:
negation of -9223372036854775808 cannot be represented
in type 'long long int';
- Fixing the code in get_interval_value() to avoid overflow in
the INTERVAL_QUARTER and INTERVAL_WEEK branches.
This fixes the UBSAN warning:
signed integer overflow: -9223372036854775808 * 7 cannot be represented
in type 'long long int'
- Fixing the INTERVAL_WEEK branch in date_add_interval() to handle
huge numbers correctly. Before the change, huge positive numbers
were treated as their negative complements.
Note, some other branches can still be affected by this problem
and should also be fixed eventually.
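A minimal sketch of an overflow-checked multiplication of the kind needed
in the INTERVAL_WEEK (times 7) and INTERVAL_QUARTER (times 3) branches.
The helper mul_checked() is hypothetical and is not the code used in
get_interval_value(); it tests the operand against the limits before
multiplying, so the signed overflow never happens:

```c
#include <limits.h>

/*
  Hypothetical helper: multiply a signed interval count by a positive
  factor (7 for weeks to days, 3 for quarters to months).
  Returns 1 and leaves *res untouched when the product would not fit
  into long long; returns 0 and stores the product otherwise.
*/
static int mul_checked(long long value, long long factor, long long *res)
{
  if (factor <= 0)
    return 1;                 /* this sketch handles positive factors only */
  if (value > LLONG_MAX / factor || value < LLONG_MIN / factor)
    return 1;                 /* value * factor would overflow */
  *res = value * factor;
  return 0;
}
```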
Fixing the condition to raise an overflow if the ulonglong
representation of the number is greater than or equal to 0x8000000000000000ULL.
Before this change the condition did not catch -9223372036854775808
(the smallest possible signed negative longlong number).
The problem was introduced by MDEV-30879.
The function my_strntoll_8bit() was correctly changed by MDEV-30879.
The function my_strntoll_mb2_or_mb4() was not.
Applying the missing change to my_strntoll_mb2_or_mb4().
BASE 62 uses 0-9, A-Z and then a-z to give the numbers 0-61. This patch
increases the range of the string functions to cover this.
Based on ideas and tests in PR #2589, but re-written into the charset
functions.
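A minimal sketch of the digit mapping described above, assuming ASCII
input. The helpers base62_digit_value() and parse_base62() are
hypothetical and are not the charset functions changed by this patch:

```c
#include <limits.h>

/*
  Map one character to its base 62 digit value:
  '0'-'9' -> 0-9, 'A'-'Z' -> 10-35, 'a'-'z' -> 36-61.
  Returns -1 for any other character. Assumes ASCII.
*/
static int base62_digit_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
  if (c >= 'a' && c <= 'z') return c - 'a' + 36;
  return -1;
}

/*
  Parse an unsigned base 62 number from [s, end).
  Returns 0 on success, 1 on empty input, a bad digit, or overflow.
*/
static int parse_base62(const char *s, const char *end,
                        unsigned long long *res)
{
  unsigned long long acc = 0;
  if (s == end)
    return 1;
  for (; s < end; s++)
  {
    int d = base62_digit_value(*s);
    if (d < 0)
      return 1;
    /* acc * 62 + d fits only if acc <= (ULLONG_MAX - d) / 62 */
    if (acc > (ULLONG_MAX - (unsigned long long) d) / 62)
      return 1;
    acc = acc * 62 + (unsigned long long) d;
  }
  *res = acc;
  return 0;
}
```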
Includes fix by Sergei, UBSAN complained:
ctype-simple.c:683:38: runtime error: negation of -9223372036854775808
cannot be represented in type 'long long int'; cast to an unsigned
type to negate this value to itself
Co-authored-by: Weijun Huang <huangweijun1001@gmail.com>
Co-authored-by: Sergei Golubchik <serg@mariadb.org>
Modify the NS_ZERO state in the JSON number parser to allow
exponential notation with a zero coefficient (e.g. 0E-4).
The NS_ZERO state transition on 'E' was updated to move to the
NS_EX state rather than returning a syntax error. A similar change
was made for the NS_ZE1 (negative zero) starter state.
This allows accepted number grammar to include cases like:
- 0E4
- -0E-10
which were previously disallowed. Numeric parsing remains
the same for all other states.
Test cases are added to func_json.test to validate parsing for
various exponential numbers starting with zero coefficients.
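A minimal sketch of the accepted number grammar, written as a direct
check of the JSON grammar (RFC 8259) rather than as the NS_* state table
used by the parser. The function json_number_is_valid() is hypothetical:

```c
#include <stddef.h>

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/*
  Hypothetical checker: return 1 if s[0..len) is a JSON number:
    -? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
  A zero integer part may be followed directly by an exponent.
*/
static int json_number_is_valid(const char *s, size_t len)
{
  size_t i = 0;
  if (i < len && s[i] == '-')
    i++;
  if (i >= len)
    return 0;
  if (s[i] == '0')
    i++;                                  /* zero: no more integer digits */
  else if (s[i] >= '1' && s[i] <= '9')
    while (i < len && is_digit(s[i])) i++;
  else
    return 0;
  if (i < len && s[i] == '.')
  {
    i++;
    if (i >= len || !is_digit(s[i]))
      return 0;                           /* fraction needs a digit */
    while (i < len && is_digit(s[i])) i++;
  }
  if (i < len && (s[i] == 'e' || s[i] == 'E'))
  {
    i++;
    if (i < len && (s[i] == '+' || s[i] == '-'))
      i++;
    if (i >= len || !is_digit(s[i]))
      return 0;                           /* exponent needs a digit */
    while (i < len && is_digit(s[i])) i++;
  }
  return i == len;
}
```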
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services.
The patch for "MDEV-25440: Indexed CHAR ... broken with NO_PAD collations"
fixed these scenarios from MDEV-26743:
- Basic latin letter vs equal accented letter
- Two letters vs equal (but space padded) expansion
However, this scenario was still broken:
- Basic latin letter (but followed by an ignorable character)
vs equal accented letter
Fix:
When processing a string with trailing ignorable characters for a NOPAD
collation, like:
'<non-ignorable><ignorable><ignorable>'
the string gets virtually converted to:
'<non-ignorable><ignorable><ignorable><space><space><space>...'
After the fix the code works differently in these two cases:
1. <space> fits into the "nchars" limit
2. <space> does not fit into the "nchars" limit
Details:
1. If "nchars" is large enough (4+ in this example),
return weights as follows:
'[weight-for-non-ignorable, 1 char] [weight-for-space-character, 3 chars]'
i.e. the weight for the virtual trailing space character now indicates
that it corresponds to a total of 3 characters:
- two ignorable characters
- one virtual trailing space character
2. If "nchars" is small (3), then the virtual trailing space character
does not fit into the "nchars" limit, so return 0x00 as weight, e.g.:
'[weight-for-non-ignorable, 1 char] [0x00, 2 chars]'
Adding corresponding MTR tests and unit tests.