Bug#51676 Server crashes on SELECT, ORDER BY on 'utf8mb4' column
An additional fix. We should use 0xFFFD as a weight for supplementary
characters, not the "weight for character U+FFFD".
Problem: the "caseinfo" member of CHARSET_INFO structure was not
initialized for user-defined Unicode collations, which made the
server crash.
Fix: initializing caseinfo properly.
This patch provides performance improvements:
- send_fields() when character_set_results = latin1
is now about twice faster for column/table/database
names, consisting on ASCII characters.
Changes:
- Protocol doesn't use "convert" temporary buffer anymore,
and converts strings directly to "packet".
- General conversion optimization: quick conversion
of ASCII strings was added.
modified files:
include/m_ctype.h
- Adding a new flag.
- Adding a new function prototype
libmysqld/lib_sql.cc
- Adding quick conversion method for embedded library:
conversion is now done directly to result buffer,
without using a temporary buffer.
mysys/charset.c
- Mark all dynamic ucs2 character sets as non-ASCII
- Mark some dymamic 7bit and 8bit charsets as non-ASCII
(for example swe7 is not fully ASCII compatible).
sql/protocol.cc
- Adding quick method to convert a string directly
into protocol buffer, without using a temporary buffer.
sql/protocol.h
- Adding a new method prototype
sql/sql_string.cc
Optimization for conversion between two ASCII-compatible charsets:
- quickly convert ASCII strings,
switch to mc_wc->wc_mb method only when a non-ASCII character is met.
- copy four ASCII characters at once on i386
strings/conf_to_src.c
- Marking non-ASCII character sets with a flag.
strings/ctype-extra.c
- Regenerating ctype-extra.c by running "conf_to_src".
strings/ctype-uca.c
- Marking UCS2 character set as non-ASCII.
strings/ctype-ucs2.c
- Marking UCS2 character set as non-ASCII.
strings/ctype.c
- A new function to detect if a 7bit or 8bit character set
is ascii compatible.
Problem:
Crash happened with a user-defined utf8 collation,
on attempt to insert a value longer than the column
to store.
Reason:
The "ctype" member was not initialized (NULL) when
allocating a user-defined utf8 collation, so an attempt
to call my_ctype(cs, *str) to check if we loose any important
data when truncating the value made the server crash.
Fix:
Initializing tge "ctype" member to a proper value.
mysql-test/r/ctype_ldml.result
Adding tests
mysql-test/t/ctype_ldml.test
Adding tests
strings/ctype-uca.c
Adding initialization of "ctype" member.
modified:
mysql-test/r/ctype_ldml.result
mysql-test/t/ctype_ldml.test
strings/ctype-uca.c
Problem: like_range() returned wrong ranges for contractions (like 'ch' in Czech').
Fix: adding a special code to handle tricky cases:
- contraction head followed by a wild character
- full contraction
- contraction part followed by another contraction part,
but they are not a contraction together.
and is not described in the manual
- Adding missing initialization for utf8 collations
- Minor code clean-ups: renaming variables,
moving code into a new separate function.
- Adding test, to check that both ucs2 and utf8 user
defined collations work (ucs2_test_ci and utf8_test_ci)
- Adding Vietnamese collation as a complex user defined
collation example.
The following type conversions was done:
- Changed byte to uchar
- Changed gptr to uchar*
- Change my_string to char *
- Change my_size_t to size_t
- Change size_s to size_t
Removed declaration of byte, gptr, my_string, my_size_t and size_s.
Following function parameter changes was done:
- All string functions in mysys/strings was changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocoll functions changed to use size_t instead of uint
- Functions that used a pointer to a string length was changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrom(), ha_discover and handler::discover()
the type for argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
Other changes:
- Removed a lot of not needed casts
- Added a few new cast required by other changes
- Added some cast to my_multi_malloc() arguments for safety (as string lengths
needs to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
explicitely as this conflict was often hided by casting the function to
hash_get_key).
- Changed some buffers to memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables to hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1).
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
(portability fix)
- Removed windows specific code to restore cursor position as this
causes slowdown on windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated function comment to
reflect this. Changed function that depended on original behavior of
my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it's not depending on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
Problem: GROUP BY on empty ucs2 strings crashed server.
Reason: sometimes mi_unique_hash() is executed with
ptr=null and length=0, which means "empty string".
The branch of code handling UCS2 character set
was not safe against ptr=null and fell into and
endless loop even if length=0 because of poiter
arithmetic overflow.
Fix: adding special check for length=0 to avoid pointer arithmetic
overflow.
- Removed not used variables and functions
- Added #ifdef around code that is not used
- Renamed variables and functions to avoid conflicts
- Removed some not used arguments
Fixed some class/struct warnings in ndb
Added define IS_LONGDATA() to simplify code in libmysql.c
I did run gcov on the changes and added 'purecov' comments on almost all lines that was not just variable name changes
new file
mysql_fix_privilege_tables.sql, mysql_create_system_tables.sh:
Adding true BINARY/VARBINARY: fixing "password" type, not to be 0x00-padding.
Many files:
Adding true BINARY/VARBINARY: fixing tests not to output 0x00 bytes.
Adding true BINARY/VARBINARY: new pad_char structure member.
ctype-bin.c:
Adding true BINARY/VARBINARY: new pad_char structure member.
New strnxfrm, with two trailing length bytes.
field.cc:
Adding true BINARY/VARBINARY.
In cp932, '\' character can be the second byte in a
multi-byte character stream. This makes it difficult to use
mysql_escape_string. Added flag to indicate which languages allow
'\' as second byte of multibyte sequence so that when putting a prepared
statement into the binlog we can decide at runtime whether hex encoding
is really needed.
UPPER/LOWER now can return a string with different length.
mi_test1.c:
Adding new arguments.
Many files:
Changeing caseup/casedn to return a result with different
length than argument.
sql_string.h:
Removing unused method,
mysql_priv.h:
Removing unused method
fixing test results accordingly.
ctype-uca.c:
It appeared that in traditional Spanish collation
'RR' is not equal to 'R', as Unicode and Mimer state.
We'll go Oracle and IBM way instead:
No special rules to 'RR'.
Renamed HA_VAR_LENGTH to HA_VAR_LENGTH_PART
Renamed in all files FIELD_TYPE_STRING and FIELD_TYPE_VAR_STRING to MYSQL_TYPE_STRING and MYSQL_TYPE_VAR_STRING to make it easy to catch all possible errors
Added support for VARCHAR KEYS to heap
Removed support for ISAM
Now only long VARCHAR columns are changed to TEXT on demand (not CHAR)
Internal temporary files can now use fixed length tables if the used VARCHAR columns are short