This patch also fixes:
MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0
- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER
- Adding a wrapper function CHARSET_INFO::streq(), to compare
two strings for equality. For now it calls strnncoll() internally.
In the future it will turn into a virtual function.
- Adding new accent sensitive case insensitive collations:
- utf8mb4_general1400_as_ci
- utf8mb3_general1400_as_ci
They implement accent sensitive case insensitive comparison.
The weight of a character is equal to the code point of its
upper case variant. These collations use Unicode-14.0.0 casefolding data.
The result of
my_charset_utf8mb3_general1400_as_ci.strcoll()
is very close to the former
my_charset_utf8mb3_general_ci.strcasecmp()
There is only a difference in a couple dozen rare characters, because:
- the switch from "tolower" to "toupper" comparison, to make
utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
- the switch from Unicode-3.0.0 to Unicode-14.0.0
This difference should be tolarable. See the list of affected
characters in the MDEV description.
Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all BMP characters
as equal.
- Adding classes representing names of the file based database objects:
Lex_ident_db
Lex_ident_table
Lex_ident_trigger
Their comparison collation depends on the underlying
file system case sensitivity and on --lower-case-table-names
and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
- Adding classes representing names of other database objects,
whose names have case insensitive comparison style,
using my_charset_utf8mb3_general1400_as_ci:
Lex_ident_column
Lex_ident_sys_var
Lex_ident_user_var
Lex_ident_sp_var
Lex_ident_ps
Lex_ident_i_s_table
Lex_ident_window
Lex_ident_func
Lex_ident_partition
Lex_ident_with_element
Lex_ident_rpl_filter
Lex_ident_master_info
Lex_ident_host
Lex_ident_locale
Lex_ident_plugin
Lex_ident_engine
Lex_ident_server
Lex_ident_savepoint
Lex_ident_charset
engine_option_value::Name
- All the mentioned Lex_ident_xxx classes implement a method streq():
if (ident1.streq(ident2))
do_equal();
This method works as a wrapper for CHARSET_INFO::streq().
- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
in class members and in function/method parameters.
- Replacing all calls like
system_charset_info->coll->strcasecmp(ident1, ident2)
to
ident1.streq(ident2)
- Taking advantage of the c++11 user defined literal operator
for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
data types. Use example:
const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
is now a shorter version of:
const Lex_ident_column primary_key_name=
Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
String length growth during upper/lower conversion
in Unicode collations depends only on the underlying MY_UNICASE_INFO
used in the collation.
Maintaining a separate member CHARSET_INFO::caseup_multiply and
CHARSET_INFO::casedn_multiply duplicated this information
and caused bugs like this (when MY_UNICASE_INFO and case??_multiply
when out of sync because of incomplete CHARSET_INFO initialization).
Fix:
Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply
from members to virtual functions.
The virtual functions in Unicode collations calculate case conversion
growth factors from the MY_UNICASE_INFO. This guarantees that the growth
factors are always in sync with the MY_UNICASE_INFO.
- Added one neutral and 22 tailored (language specific) collations based on
Unicode Collation Algorithm version 14.0.0.
Collations were added for Unicode character sets
utf8mb3, utf8mb4, ucs2, utf16, utf32.
Every tailoring was added with four accent and case
sensitivity flag combinations, e.g:
* utf8mb4_uca1400_swedish_as_cs
* utf8mb4_uca1400_swedish_as_ci
* utf8mb4_uca1400_swedish_ai_cs
* utf8mb4_uca1400_swedish_ai_ci
and their _nopad_ variants:
* utf8mb4_uca1400_swedish_nopad_as_cs
* utf8mb4_uca1400_swedish_nopad_as_ci
* utf8mb4_uca1400_swedish_nopad_ai_cs
* utf8mb4_uca1400_swedish_nopad_ai_ci
- Introducing a conception of contextually typed named collations:
CREATE DATABASE db1 CHARACTER SET utf8mb4;
CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);
The idea is that there is no a need to specify the character set prefix
in the new collation names. It's enough to type just the suffix
"uca1400_as_ci". The character set is taken from the context.
In the above example script the context character set is utf8mb4.
So the CREATE TABLE will make a column with the collation
utf8mb4_uca1400_as_ci.
Short collations names can be used in any parts of the SQL syntax
where the COLLATE clause is understood.
- New collations are displayed only one time
(without character set combinations) by these statements:
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
SHOW COLLATION;
For example, all these collations:
- utf8mb3_uca1400_swedish_as_ci
- utf8mb4_uca1400_swedish_as_ci
- ucs2_uca1400_swedish_as_ci
- utf16_uca1400_swedish_as_ci
- utf32_uca1400_swedish_as_ci
have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION,
with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix
without the character set name:
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+
| COLLATION_NAME |
+-----------------------+
| uca1400_swedish_as_ci |
+-----------------------+
Note, the behaviour of old collations did not change.
Non-unicode collations (e.g. latin1_swedish_ci) and
old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci)
are still displayed with the character set prefix, as before.
- The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed.
The NOT NULL constraint was removed from these columns:
- CHARACTER_SET_NAME
- ID
- IS_DEFAULT
and from the corresponding columns in SHOW COLLATION.
For example:
SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+--------------------+------+------------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT |
+-----------------------+--------------------+------+------------+
| uca1400_swedish_as_ci | NULL | NULL | NULL |
+-----------------------+--------------------+------+------------+
The NULL value in these columns now means that the collation
is applicable to multiple character sets.
The behavioir of old collations did not change.
Make sure your client programs can handle NULL values in these columns.
- The structure of the table
INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed.
Three new NOT NULL columns were added:
- FULL_COLLATION_NAME
- ID
- IS_DEFAULT
New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY.
The column COLLATION_NAME contains the collation name without the character
set prefix. The column FULL_COLLATION_NAME contains the collation name with
the character set prefix.
Old collations have full collation name in both FULL_COLLATION_NAME and
COLLATION_NAME.
SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4|latin1).*swedish.*ci$';
+-----------------------------+-------------------------------------+--------------------+------+------------+
| COLLATION_NAME | FULL_COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT |
+-----------------------------+-------------------------------------+--------------------+------+------------+
| latin1_swedish_ci | latin1_swedish_ci | latin1 | 8 | Yes |
| latin1_swedish_nopad_ci | latin1_swedish_nopad_ci | latin1 | 1032 | |
| utf8mb4_swedish_ci | utf8mb4_swedish_ci | utf8mb4 | 232 | |
| uca1400_swedish_ai_ci | utf8mb4_uca1400_swedish_ai_ci | utf8mb4 | 2368 | |
| uca1400_swedish_as_ci | utf8mb4_uca1400_swedish_as_ci | utf8mb4 | 2370 | |
| uca1400_swedish_nopad_ai_ci | utf8mb4_uca1400_swedish_nopad_ai_ci | utf8mb4 | 2372 | |
| uca1400_swedish_nopad_as_ci | utf8mb4_uca1400_swedish_nopad_as_ci | utf8mb4 | 2374 | |
+-----------------------------+-------------------------------------+--------------------+------+------------+
- Other INFORMATION_SCHEMA queries:
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;
display full collation names, including character sets prefix,
for all collations, including new collations.
Corresponding SHOW commands also display full collation names
in collation related columns:
SHOW CREATE TABLE t1;
SHOW CREATE DATABASE db1;
SHOW TABLE STATUS;
SHOW CREATE FUNCTION f1;
SHOW CREATE PROCEDURE p1;
SHOW CREATE EVENT ev1;
SHOW CREATE TRIGGER tr1;
SHOW CREATE VIEW;
These INFORMATION_SCHEMA queries and SHOW statements may change in
the future, to display show collation names.
If someone on whatever reasons uses --default-character-set=cp850,
this will avoid incorrect display, and inserting incorrect data.
Adjusting console codepage sometimes also needs to happen with
--default-charset=auto, on older Windows. This is because autodetection
is not always exact. For example, console codepage on US editions of
Windows is 437. Client autodetects it as cp850, a rather loose
approximation, given 46 code point differences. We change the console
codepage to cp850, so that there is no discrepancy.
That fix is currently Windows-only, and serves people who used combination
of chcp to achieve WYSIWYG effect (although, this would mostly likely used
with utf8 in the past)
Now, --default-character-set would be a replacement for that.
Fix fs_character_set() detection of current codepage.
- Use corresponding entry in the manifest, as described in
https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
- If if ANSI codepage is UTF8 (i.e for Windows 1903 and later)
Use UTF8 as default client charset
Set console codepage(s) to UTF8, in case process is using console
- Allow some previously disabled MTR tests, that used Unicode for in "exec",
for the recent Windows versions
This change removed 68 explict strlen() calls from the code.
The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name
Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible
Other things:
- There where compiler issues with ensuring that all character set names
points to the same string: gcc doesn't allow one to use integer constants
when defining global structures (constant char * pointers works fine).
To get around this, I declared defines for each character set name
length.
Changes:
- To detect automatic strlen() I removed the methods in String that
uses 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usage of append(const char*) is changed to either use
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most usage of the above functions never tested the
result for -1 and would have given wrong results or crashes in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- Changed for some functions the name argument from const char * to
to const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removes String_space() as it was only used in one cases and that
could be simplified to not use String_space(), thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Rename of protocol function:
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done to both clarify the difference between this 'store' function
and also to make it easier to find unoptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now used LEX_CSTRING for name.
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtain but was not used.
- Removed some not needed 'virtual' definition for functions that was
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate needed buffer. Old code could
case memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
This patch ensures that all identical character sets shares the same
cs->csname.
This allows us to replace strcmp() in my_charset_same() with comparisons
of pointers. This fixes a long standing performance issue that could cause
as strcmp() for every item sent trough the protocol class to the end user.
One consequence of this patch is that we don't allow one to add a character
definition in the Index.xml file that changes the csname of an existing
character set. This is by design as changing character set names of existing
ones is extremely dangerous, especially as some storage engines just records
character set numbers.
As we now have a hash over character set's csname, we can in the future
use that for faster access to a specific character set. This could be done
by changing the hash to non unique and use the hash to find the next
character set with same csname.
Restore the detection of default charset in command line utilities.
It worked up to 10.1, but was broken by Connector/C.
Moved code for detection of default charset from sql-common/client.c
to mysys, and make command line utilities to use this code if charset
was not specified on the command line.
- Moving detection of the MY_CS_CSSORT, MY_CS_PUREASCII, MY_CS_NONASCII
flags of loadable collations from add_collation() in mysys.c
to my_cset_init_8bit() and my_coll_init_simple() in ctype-simple.c.
- Adding tests that these flags are set properly for loadable collations
- Moving LDML test related *.xml files from mysql-test/std_data/
to mysql-test/std_data/ldml/, as there will be more *.xml test files
- Character set code & tests from Alexander Barkov
- Integration with ALTER TABLE, REPAIR and open_table from Monty
The problem was that MySQL 5.6 added some croatian and vitanamese character set collations that are incompatible with MariaDB.
The fix is to move the MariaDB conflicting collation numbers out of the region that MySQL is likely to use.
mysql_upgrade, REPAIR TABLE or ALTER TABLE will fix the collations.
If one tries to access and old incompatible table, one will get the error "Table upgrade required...."
After this patch, MariaDB supports all the MySQL character set collations and the old MariaDB croatian collations, which are closer to the latest standard than the MySQL versions.
New character sets:
ucs2_croatian_mysql561_uca_ci
utf8_croatian_mysql561_uca_ci
utf16_croatian_mysql561_uca_ci
utf32_croatian_mysql561_uca_ci
utf8mb4_croatian_mysql561_uca_ci
Other things:
- Fixed some compiler warnings
- mysql_upgrade prints information about repaired tables.
- Increased version number
VERSION:
Increased VERSION number
client/mysqlcheck.c:
Print repaired table name when using --verbose
include/m_ctype.h:
Add new MariaDB collation regions that are not likely to conflict with MySQL
include/my_base.h:
Added flag to detect if table was opened for ALTER TABLE
mysql-test/r/ctype_ldml.result:
Updated result
mysql-test/r/ctype_uca.result:
Updated result
mysql-test/r/ctype_upgrade.result:
Updated result
mysql-test/r/ctype_utf16_uca.result:
Updated result
mysql-test/r/ctype_utf32_uca.result:
Updated result
mysql-test/r/ctype_utf8mb4_uca.result:
Updated result
mysql-test/std_data/ctype_upgrade:
Test files for testing upgrading of conflicting collations
mysql-test/suite/engines/funcs/r/db_alter_collate_ascii.result:
New collations added
mysql-test/suite/engines/funcs/r/db_alter_collate_utf8.result:
New collations added
mysql-test/suite/innodb/r/innodb_ctype_ldml.result:
Updated test result
mysql-test/suite/innodb/t/innodb_ctype_ldml.test:
Updated test result
mysql-test/suite/plugins/r/show_all_plugins.result:
Updated version number
mysql-test/suite/roles/create_and_drop_role_invalid_user_table.result:
Updated version number
mysql-test/t/ctype_ldml.test:
Updated test
mysql-test/t/ctype_uca.test:
Testing of new collations
mysql-test/t/ctype_upgrade.test:
Testing of upgrading tables with old collations
The test ensures that:
- We will get an error if we try to open a table with old collations.
- CHECK TABLE will detect that the table needs to be upgraded.
- ALTER TABLE and REPAIR will fix the table.
- mysql_upgrade works as expected
mysql-test/t/ctype_utf16_uca.test:
Testing of new collations
mysql-test/t/ctype_utf32_uca.test:
Testing of new collations
mysql-test/t/ctype_utf8mb4_uca.test:
Testing of new collations
mysys/charset-def.c:
Added new character sets
mysys/charset.c:
Always give an error, if requested, if a character set didn't exist
sql/handler.cc:
- Added upgrade_collation() to check if collation is compatible with old version
- check_collation_compatibility() checks if we are using an old collation from MariaDB 5.5 or MySQL 5.6
- ha_check_for_upgrade() returns HA_ADMIN_NEEDS_ALTER if we have an incompatible collation
sql/handler.h:
Added new prototypes
sql/sql_table.cc:
- Mark that tables are opened for ALTER TABLE
- If table needs to be upgraded, ensure we are not using online alter table.
sql/table.cc:
- If we are using an old incompatible collation, change to use the new one and mark table as incompatible.
- Give an error if we try to open an incompatible table.
sql/table.h:
Added error that table needs to be rebuild
storage/connect/ha_connect.cc:
Fixed compiler warning
strings/ctype-uca.c:
New character sets
sql/sql_insert.cc:
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
******
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
small cleanup
******
small cleanup
client/mysqltest.cc:
Column names.
mysql-test/r/grant_cache_no_prot.result:
fix of text.
mysql-test/r/grant_cache_ps_prot.result:
Fix of test.
mysql-test/r/query_cache.result:
Switching on and off query cache.
mysql-test/t/query_cache.test:
Switching on and off query cache.
mysys/charset.c:
Fix of parser.
sql/handler.cc:
thd added to parameters.
sql/log_event.cc:
thd added to parameters.
sql/log_event_old.cc:
thd added to parameters.
sql/mysql_priv.h:
Fixed functions definitions.
sql/mysqld.cc:
Comments stripping.
sql/set_var.cc:
Switching on and off query cache.
sql/set_var.h:
Switching on and off query cache.
sql/share/errmsg.txt:
New errors.
sql/sql_cache.cc:
Switching query cache on and off, removing comments.
sql/sql_cache.h:
thd added to parameters.
sql/sql_class.h:
Comments stripping.
sql/sql_db.cc:
thd added to parameters.
sql/sql_lex.cc:
lex fixed.
sql/sql_parse.cc:
thd added to parameters.
The retrieval of a charset by number was not
doing bounds checking before accessing the internal
character sets array.
Added checks for valid charset number.
Added asserts for valid charset number to some of
the internal functions.
Removed one superfluous check for charset_number 0
(since the all_charsets_array[0] is set to 0 anyway) for
uniformity.
Test suite added.
configure.in:
Added comment
mysql-test/suite/innodb_plugin/t/innodb_bug56680.test:
Disable test when run with valgrind as we get errors from buf_buddy_relocate() on work for this test.
(Should probably be investigated as this may be an issue in xtradb, but probably harmless)
Work is an amd-64 running openSUSE 1.11 and valgrind 3.4.1
mysys/charset.c:
Remove static function if not used (to remove compiler warning)
storage/xtradb/srv/srv0srv.c:
Added casts to get rid of compiler warnings