Cleanup unnecessary whitespace at the end of lines and end of files
in the unittest/ directory. Note that all code changes are
non-functional.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
The patch for "MDEV-25440: Indexed CHAR ... broken with NO_PAD collations"
fixed these scenarios from MDEV-26743:
- Basic latin letter vs equal accented letter
- Two letters vs equal (but space padded) expansion
However, this scenario was still broken:
- Basic latin letter (but followed by an ignorable character)
vs equal accented letter
Fix:
When processing for a NOPAD collation a string with trailing ignorable
characters, like:
'<non-ignorable><ignorable><ignorable>'
the string gets virtually converted to:
'<non-ignorable><ignorable><ignorable><space><space><space>...'
After the fix the code works differently in these two cases:
1. <space> fits into the "nchars" limit
2. <space> does not fit into the "nchars" limit
Details:
1. If "nchars" is large enough (4+ in this example),
return weights as follows:
'[weight-for-non-ignorable, 1 char] [weight-for-space-character, 3 chars]'
i.e. the weight for the virtual trailing space character now indicates
that it corresponds to total 3 characters:
- two ignorable characters
- one virtual trailing space character
2. If "nchars" is small (3), then the virtual trailing space character
does not fit into the "nchar" limit, so return 0x00 as weight, e.g.:
'[weight-for-non-ignorable, 1 char] [0x00, 2 chars]'
Adding corresponding MTR tests and unit tests.
- Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars()
and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.
The flag defines if strnncollsp_nchars() should emulate trailing spaces
which were possibly trimmed earlier (e.g. in InnoDB CHAR compression).
This is important for NOPAD collations.
For example, with this input:
- str1= 'a ' (Latin letter a followed by one space)
- str2= 'a ' (Latin letter a followed by two spaces)
- nchars= 3
if the flag is given, strnncollsp_nchars() will virtually restore
one trailing space to str1 up to nchars (3) characters and compare two
strings as equal:
- str1= 'a ' (one extra trailing space emulated)
- str2= 'a ' (as is)
If the flag is not given, strnncollsp_nchars() does not add trailing
virtual spaces, so in case of a NOPAD collation, str1 will be compared
as less than str2 because it is shorter.
- Field_string::cmp_prefix() now passes the new flag.
Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do
not pass the new flag.
- The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc
(which handles the CHAR data type) now also passed the new flag.
- Fixing UCA collations to respect the new flag.
Other collations are possibly also affected, however
I had no success in making an SQL script demonstrating the problem.
Other collations will be extended to respect this flags in a separate
patch later.
- Changing the meaning of the last parameter of Field::cmp_prefix()
from "number of bytes" (internal length)
to "number of characters" (user visible length).
The code calling cmp_prefix() from handler.cc was wrong.
After this change, the call in handler.cc became correct.
The code calling cmp_prefix() from key_rec_cmp() in key.cc
was adjusted according to this change.
- Old strnncollsp_nchar() related tests in unittest/strings/strings-t.c
now pass the new flag.
A few new tests also were added, without the flag.
Also fixes:
MDEV-27768 MDEV-25440: Assertion `(cs->state & 0x20000) == 0' failed in my_strnncollsp_nchars_generic_8bit
The "strnncollsp_nchars" virtual function pointer for tis620_thai_nopad_ci
was incorrectly initialized to a generic function
my_strnncollsp_nchars_generic_8bit(), which crashed on assert.
Implementing a tis620 specific function version.
Adding two levels of optimization:
1. For every bytes pair [00..FF][00..FF] which:
a. consists of two ASCII characters or makes a well-formed two-byte character
b. whose total weight string fits into 4 weights
(concatenated weight string in case of two ASCII characters,
or a single weight string in case of a two-byte character)
c. whose weight is context independent (i.e. does not depend on contractions
or previous context pairs)
store weights in a separate array of MY_UCA_2BYTES_ITEM,
so during scanner_next() we can scan two bytes at a time.
Byte pairs that do not match the conditions a-c are marked in this array
as not applicable for optimization and scanned as before.
2. For every byte pair which is applicable for optimization in #1,
and which produces only one or two weights, store
weights in one more array of MY_UCA_WEIGHT2. So in the beginning
of strnncoll*() we can skip equal prefixes using an even more efficient
loop. This loop consumes two bytes at a time. The loop scans while the
two bytes on both sides produce weight strings of equal length
(i.e. one weight on both sides, or two weight on both sides).
This allows to compare efficiently:
- Context independent sequences consisting of two ASCII characters
- Context independent 2-byte characters
- Contractions consisting of two ASCII characters, e.g. Czech "ch".
- Some tricky cases: "ss" vs "SHARP S"
("ss" produces two weights, 0xC39F also produces two weights)
This change removed 68 explict strlen() calls from the code.
The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name
Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible
Other things:
- There where compiler issues with ensuring that all character set names
points to the same string: gcc doesn't allow one to use integer constants
when defining global structures (constant char * pointers works fine).
To get around this, I declared defines for each character set name
length.
- Removing the "diff_if_only_endspace_difference" argument from
MY_COLLATION_HANDLER::strnncollsp(), my_strnncollsp_simple(),
as well as in the function template MY_FUNCTION_NAME(strnncollsp)
in strcoll.ic
- Removing the "diff_if_only_space_different" from ha_compare_text(),
hp_rec_key_cmp().
- Adding a new function my_strnncollsp_padspace_bin() and reusing
it instead of duplicate code pieces in my_strnncollsp_8bit_bin(),
my_strnncollsp_latin1_de(), my_strnncollsp_tis620(),
my_strnncollsp_utf8_cs().
- Adding more tests for better coverage of the trailing space handling.
- Removing the unused definition of HA_END_SPACE_ARE_EQUAL
1. unused static inline functions are only removed at -xO4,
otherwise test binaries will depend on various mysys
symbols that they don't use. Link test with libmysys.
2. Sphinx - don't instantiate (explicitly) templates before
they're defined. Or, rather, don't instantiate them explicitly at
all.
3. GIS - don't use anonymous unions and structs.
The autotools-based build system has been superseded and
is being removed in order to ease the maintenance burden on
developers tweaking and maintaining the build system.
In order to support tools that need to extract the server
version, a new file that (only) contains the server version,
called VERSION, is introduced. The file contents are human
and machine-readable. The format is:
MYSQL_VERSION_MAJOR=5
MYSQL_VERSION_MINOR=5
MYSQL_VERSION_PATCH=8
MYSQL_VERSION_EXTRA=-rc
The CMake based version extraction in cmake/mysql_version.cmake
is changed to extract the version from this file. The configure
to CMake wrapper is retained for backwards compatibility and to
support the BUILD/ scripts. Also, a new a makefile target
show-dist-name that prints the server version is introduced.
VERSION:
Add top-level version file.
cmake/mysql_version.cmake:
Get version information from the top-level VERSION file.
Do not cache the version components (MAJOR_VERSION, etc).
Add MYSQL_RPM_VERSION as a replacement for MYSQL_U_SCORE_VERSION.
- Removed SCCS rule from Makefile.am
- Made dummy rule in sql_yacc.yy to get rid of compiler warning about not used label.
Don't use maintainer mode with valgrind (as we don't want to initialize all variables)
config/ac-macros/maintainer.m4:
Don't use maintainer mode with valgrind (as we don't want to initialize all variables)
Force initialization of variables when using -Werror (To get rid of compiler warnings)
configure.in:
Don't use maintainer mode with valgrind (as we don't want to initialize all variables)
sql/sql_yacc.yy:
Made dummy rule in sql_yacc.yy to get rid of compiler warning about not used label.