The patch for "MDEV-25440: Indexed CHAR ... broken with NO_PAD collations"
fixed these scenarios from MDEV-26743:
- Basic latin letter vs equal accented letter
- Two letters vs equal (but space padded) expansion
However, this scenario was still broken:
- Basic latin letter (but followed by an ignorable character)
vs equal accented letter
Fix:
When processing for a NOPAD collation a string with trailing ignorable
characters, like:
'<non-ignorable><ignorable><ignorable>'
the string gets virtually converted to:
'<non-ignorable><ignorable><ignorable><space><space><space>...'
After the fix the code works differently in these two cases:
1. <space> fits into the "nchars" limit
2. <space> does not fit into the "nchars" limit
Details:
1. If "nchars" is large enough (4+ in this example),
return weights as follows:
'[weight-for-non-ignorable, 1 char] [weight-for-space-character, 3 chars]'
i.e. the weight for the virtual trailing space character now indicates
that it corresponds to total 3 characters:
- two ignorable characters
- one virtual trailing space character
2. If "nchars" is small (3), then the virtual trailing space character
does not fit into the "nchar" limit, so return 0x00 as weight, e.g.:
'[weight-for-non-ignorable, 1 char] [0x00, 2 chars]'
Adding corresponding MTR tests and unit tests.
- Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars()
and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.
The flag defines if strnncollsp_nchars() should emulate trailing spaces
which were possibly trimmed earlier (e.g. in InnoDB CHAR compression).
This is important for NOPAD collations.
For example, with this input:
- str1= 'a ' (Latin letter a followed by one space)
- str2= 'a ' (Latin letter a followed by two spaces)
- nchars= 3
if the flag is given, strnncollsp_nchars() will virtually restore
one trailing space to str1 up to nchars (3) characters and compare two
strings as equal:
- str1= 'a ' (one extra trailing space emulated)
- str2= 'a ' (as is)
If the flag is not given, strnncollsp_nchars() does not add trailing
virtual spaces, so in case of a NOPAD collation, str1 will be compared
as less than str2 because it is shorter.
- Field_string::cmp_prefix() now passes the new flag.
Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do
not pass the new flag.
- The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc
(which handles the CHAR data type) now also passed the new flag.
- Fixing UCA collations to respect the new flag.
Other collations are possibly also affected, however
I had no success in making an SQL script demonstrating the problem.
Other collations will be extended to respect this flags in a separate
patch later.
- Changing the meaning of the last parameter of Field::cmp_prefix()
from "number of bytes" (internal length)
to "number of characters" (user visible length).
The code calling cmp_prefix() from handler.cc was wrong.
After this change, the call in handler.cc became correct.
The code calling cmp_prefix() from key_rec_cmp() in key.cc
was adjusted according to this change.
- Old strnncollsp_nchar() related tests in unittest/strings/strings-t.c
now pass the new flag.
A few new tests also were added, without the flag.
Also fixes:
MDEV-27768 MDEV-25440: Assertion `(cs->state & 0x20000) == 0' failed in my_strnncollsp_nchars_generic_8bit
The "strnncollsp_nchars" virtual function pointer for tis620_thai_nopad_ci
was incorrectly initialized to a generic function
my_strnncollsp_nchars_generic_8bit(), which crashed on assert.
Implementing a tis620 specific function version.
- Removing the "diff_if_only_endspace_difference" argument from
MY_COLLATION_HANDLER::strnncollsp(), my_strnncollsp_simple(),
as well as in the function template MY_FUNCTION_NAME(strnncollsp)
in strcoll.ic
- Removing the "diff_if_only_space_different" from ha_compare_text(),
hp_rec_key_cmp().
- Adding a new function my_strnncollsp_padspace_bin() and reusing
it instead of duplicate code pieces in my_strnncollsp_8bit_bin(),
my_strnncollsp_latin1_de(), my_strnncollsp_tis620(),
my_strnncollsp_utf8_cs().
- Adding more tests for better coverage of the trailing space handling.
- Removing the unused definition of HA_END_SPACE_ARE_EQUAL
1. unused static inline functions are only removed at -xO4,
otherwise test binaries will depend on various mysys
symbols that they don't use. Link test with libmysys.
2. Sphinx - don't instantiate (explicitly) templates before
they're defined. Or, rather, don't instantiate them explicitly at
all.
3. GIS - don't use anonymous unions and structs.
The autotools-based build system has been superseded and
is being removed in order to ease the maintenance burden on
developers tweaking and maintaining the build system.
In order to support tools that need to extract the server
version, a new file that (only) contains the server version,
called VERSION, is introduced. The file contents are human
and machine-readable. The format is:
MYSQL_VERSION_MAJOR=5
MYSQL_VERSION_MINOR=5
MYSQL_VERSION_PATCH=8
MYSQL_VERSION_EXTRA=-rc
The CMake based version extraction in cmake/mysql_version.cmake
is changed to extract the version from this file. The configure
to CMake wrapper is retained for backwards compatibility and to
support the BUILD/ scripts. Also, a new a makefile target
show-dist-name that prints the server version is introduced.
VERSION:
Add top-level version file.
cmake/mysql_version.cmake:
Get version information from the top-level VERSION file.
Do not cache the version components (MAJOR_VERSION, etc).
Add MYSQL_RPM_VERSION as a replacement for MYSQL_U_SCORE_VERSION.
- Removed SCCS rule from Makefile.am
- Made dummy rule in sql_yacc.yy to get rid of compiler warning about not used label.
Don't use maintainer mode with valgrind (as we don't want to initialize all variables)
config/ac-macros/maintainer.m4:
Don't use maintainer mode with valgrind (as we don't want to initialize all variables)
Force initialization of variables when using -Werror (To get rid of compiler warnings)
configure.in:
Don't use maintainer mode with valgrind (as we don't want to initialize all variables)
sql/sql_yacc.yy:
Made dummy rule in sql_yacc.yy to get rid of compiler warning about not used label.
mysql-test/t/bug46080-master.opt:
Lower limits to be able to run tests
regex/main.c:
Fixed compiler warnings
storage/maria/ma_key_recover.c:
Fixed compiler warnings
storage/maria/ma_recovery.c:
Fixed compiler warnings
storage/maria/ma_unique.c:
Fixed compiler warnings
strings/ctype-uca.c:
Added comment
strings/xml.c:
Fixed compiler warnings
support-files/compiler_warnings.supp:
Added suppressions for windows
unittest/strings/strings-t.c:
Added ifdef to fix compilation failure when compiling without UCA
Problem: The functions my_like_range_xxx() returned
badly formed maximum strings for Asian character sets,
which made problems for storage engines.
Fix:
- Removed a number my_like_range_xxx() implementations,
which were in fact dumplicate code pieces.
- Using generic my_like_range_mb() instead.
- Setting max_sort_char member properly for Asian character sets
- Adding unittest/strings/strings-t.c,
to test that my_like_range_xxx() return well-formed
min and max strings.
Notes:
- No additional tests in mysql/t/ available.
Old tests cover the affected code well enough.