This change removed 68 explict strlen() calls from the code.
The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name
Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible
Other things:
- There where compiler issues with ensuring that all character set names
points to the same string: gcc doesn't allow one to use integer constants
when defining global structures (constant char * pointers works fine).
To get around this, I declared defines for each character set name
length.
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
This patch ensures that all identical character sets shares the same
cs->csname.
This allows us to replace strcmp() in my_charset_same() with comparisons
of pointers. This fixes a long standing performance issue that could cause
as strcmp() for every item sent trough the protocol class to the end user.
One consequence of this patch is that we don't allow one to add a character
definition in the Index.xml file that changes the csname of an existing
character set. This is by design as changing character set names of existing
ones is extremely dangerous, especially as some storage engines just records
character set numbers.
As we now have a hash over character set's csname, we can in the future
use that for faster access to a specific character set. This could be done
by changing the hash to non unique and use the hash to find the next
character set with same csname.
The code did not take into account that:
- U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis)
- Some character sets do not have a code for U+005C (e.g. swe7)
Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to
cover all special cases easier.
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
- Moving detection of the MY_CS_CSSORT, MY_CS_PUREASCII, MY_CS_NONASCII
flags of loadable collations from add_collation() in mysys.c
to my_cset_init_8bit() and my_coll_init_simple() in ctype-simple.c.
- Adding tests that these flags are set properly for loadable collations
- Moving LDML test related *.xml files from mysql-test/std_data/
to mysql-test/std_data/ldml/, as there will be more *.xml test files
The Item_string constructors called set_name() on the source string,
which was wrong because in case of UCS2/UTF16/UTF32 the source value
might be a not well formed string (e.g. have incomplete leftmost character).
Now set_name() is called on str_value after its copied
(with optionally left zero padding) from the source string.
- MDEV-6694 Illegal mix of collation with a PS parameter
Item_param::convert_str_value() did not set repertoire.
Introducing a new structure MY_STRING_METADATA to collect
character length and repertoire of a string in a single loop,
to avoid two separate loops. Adding a new class Item_basic_value::Metadata
as a convenience wrapper around MY_STRING_METADATA, to reuse the
code between Item_string and Item_param.
"mtr ctype_ldml" failed when compiled with "gcc -funsigned-char".
Changing the code not to depend on the signed/unsigned compiler defaults
for the "char" data type.
sql/sql_insert.cc:
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
******
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
small cleanup
******
small cleanup