mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-27 09:14:17 +01:00

Author	SHA1	Message	Date
Alexander Barkov	133446828c	MDEV-27009 Add UCA-14.0.0 collations - Added one neutral and 22 tailored (language specific) collations based on Unicode Collation Algorithm version 14.0.0. Collations were added for Unicode character sets utf8mb3, utf8mb4, ucs2, utf16, utf32. Every tailoring was added with four accent and case sensitivity flag combinations, e.g.: * utf8mb4_uca1400_swedish_as_cs * utf8mb4_uca1400_swedish_as_ci * utf8mb4_uca1400_swedish_ai_cs * utf8mb4_uca1400_swedish_ai_ci and their _nopad_ variants: * utf8mb4_uca1400_swedish_nopad_as_cs * utf8mb4_uca1400_swedish_nopad_as_ci * utf8mb4_uca1400_swedish_nopad_ai_cs * utf8mb4_uca1400_swedish_nopad_ai_ci - Introducing the concept of contextually typed named collations: CREATE DATABASE db1 CHARACTER SET utf8mb4; CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci); The idea is that there is no need to specify the character set prefix in the new collation names. It's enough to type just the suffix "uca1400_as_ci". The character set is taken from the context. In the above example script the context character set is utf8mb4. So the CREATE TABLE will make a column with the collation utf8mb4_uca1400_as_ci. Short collation names can be used in any part of the SQL syntax where the COLLATE clause is understood.
- New collations are displayed only one time (without character set combinations) by these statements: SELECT * FROM INFORMATION_SCHEMA.COLLATIONS; SHOW COLLATION; For example, all these collations: - utf8mb3_uca1400_swedish_as_ci - utf8mb4_uca1400_swedish_as_ci - ucs2_uca1400_swedish_as_ci - utf16_uca1400_swedish_as_ci - utf32_uca1400_swedish_as_ci have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION, with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix without the character set name: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+ \| COLLATION_NAME \| +-----------------------+ \| uca1400_swedish_as_ci \| +-----------------------+ Note, the behaviour of old collations did not change. Non-Unicode collations (e.g. latin1_swedish_ci) and old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci) are still displayed with the character set prefix, as before. - The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed. The NOT NULL constraint was removed from these columns: - CHARACTER_SET_NAME - ID - IS_DEFAULT and from the corresponding columns in SHOW COLLATION. For example: SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+--------------------+------+------------+ \| COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------+--------------------+------+------------+ \| uca1400_swedish_as_ci \| NULL \| NULL \| NULL \| +-----------------------+--------------------+------+------------+ The NULL value in these columns now means that the collation is applicable to multiple character sets. The behaviour of old collations did not change. Make sure your client programs can handle NULL values in these columns.
- The structure of the table INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed. Three new NOT NULL columns were added: - FULL_COLLATION_NAME - ID - IS_DEFAULT New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY. The column COLLATION_NAME contains the collation name without the character set prefix. The column FULL_COLLATION_NAME contains the collation name with the character set prefix. Old collations have full collation name in both FULL_COLLATION_NAME and COLLATION_NAME. SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4\|latin1).swedish.ci$'; +-----------------------------+-------------------------------------+--------------------+------+------------+ \| COLLATION_NAME \| FULL_COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------------+-------------------------------------+--------------------+------+------------+ \| latin1_swedish_ci \| latin1_swedish_ci \| latin1 \| 8 \| Yes \| \| latin1_swedish_nopad_ci \| latin1_swedish_nopad_ci \| latin1 \| 1032 \| \| \| utf8mb4_swedish_ci \| utf8mb4_swedish_ci \| utf8mb4 \| 232 \| \| \| uca1400_swedish_ai_ci \| utf8mb4_uca1400_swedish_ai_ci \| utf8mb4 \| 2368 \| \| \| uca1400_swedish_as_ci \| utf8mb4_uca1400_swedish_as_ci \| utf8mb4 \| 2370 \| \| \| uca1400_swedish_nopad_ai_ci \| utf8mb4_uca1400_swedish_nopad_ai_ci \| utf8mb4 \| 2372 \| \| \| uca1400_swedish_nopad_as_ci \| utf8mb4_uca1400_swedish_nopad_as_ci \| utf8mb4 \| 2374 \| \| +-----------------------------+-------------------------------------+--------------------+------+------------+ - Other INFORMATION_SCHEMA queries: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS; SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES; SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA; SELECT 
COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS; display full collation names, including the character set prefix, for all collations, including new collations. Corresponding SHOW commands also display full collation names in collation related columns: SHOW CREATE TABLE t1; SHOW CREATE DATABASE db1; SHOW TABLE STATUS; SHOW CREATE FUNCTION f1; SHOW CREATE PROCEDURE p1; SHOW CREATE EVENT ev1; SHOW CREATE TRIGGER tr1; SHOW CREATE VIEW; These INFORMATION_SCHEMA queries and SHOW statements may change in the future, to display short collation names.	2022-08-10 15:04:24 +02:00
Oleksandr Byelkin	9ed8deb656	Merge branch '10.6' into 10.7	2022-02-04 14:11:46 +01:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Oleksandr Byelkin	a576a1cea5	Merge branch '10.3' into 10.4	2022-01-30 09:46:52 +01:00
Alexander Barkov	b915f79e4e	MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings	2022-01-21 12:16:07 +04:00
Vladislav Vaintroub	47e18af906	MDEV-27494 Rename .ic files to .inl	2022-01-17 16:41:51 +01:00
Marko Mäkelä	b36d6f92a8	Merge 10.6 into 10.7	2021-09-30 11:01:07 +03:00
Alexander Barkov	0d68b0a2d6	MDEV-26669 Add MY_COLLATION_HANDLER functions min_str() and max_str()	2021-09-27 17:10:22 +04:00
Alexander Barkov	0629711db4	MDEV-26572 Improve simple multibyte collation performance on the ASCII range	2021-09-13 08:03:25 +04:00
Monty	a206658b98	Change CHARSET_INFO character set and collation names to LEX_CSTRING This change removed 68 explicit strlen() calls from the code. The following renames were done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for print function could result in crashes: - charset->csname renamed to charset->cs_name - charset->name renamed to charset->coll_name Almost everything were mechanical changes except: - Changed to use the new Protocol::store(LEX_CSTRING..) when possible - Changed to use field->store(LEX_CSTRING, CHARSET_INFO) when possible - Changed to use String->append(LEX_CSTRING&) when possible Other things: - There were compiler issues with ensuring that all character set names point to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers work fine). To get around this, I declared defines for each character set name length.	2021-05-19 22:54:07 +02:00
Monty	dbcd3384e0	MDEV-7947 strcmp() takes 0.37% in OLTP RO This patch ensures that all identical character sets share the same cs->csname. This allows us to replace strcmp() in my_charset_same() with comparisons of pointers. This fixes a long standing performance issue that could cause a strcmp() for every item sent through the protocol class to the end user. One consequence of this patch is that we don't allow one to add a character definition in the Index.xml file that changes the csname of an existing character set. This is by design as changing character set names of existing ones is extremely dangerous, especially as some storage engines just record character set numbers. As we now have a hash over character set's csname, we can in the future use that for faster access to a specific character set. This could be done by changing the hash to non unique and use the hash to find the next character set with same csname.	2020-07-23 10:54:33 +03:00
Alexander Barkov	cfe5ee90c8	MDEV-22043 Special character leads to assertion in my_wc_to_printable_generic on 10.5.2 (debug) The code did not take into account that: - U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis) - Some character sets do not have a code for U+005C (e.g. swe7) Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to cover all special cases more easily.	2020-05-09 16:01:30 +04:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	cb248f8806	Merge branch '5.5' into 10.1	2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru	5543b75550	Update FSF Address * Update wrong zip-code	2019-05-11 21:29:06 +03:00
Alexander Barkov	5058ced5df	MDEV-7769 MY_CHARSET_INFO refactoring# On branch 10.2 Part 3 (final): removing MY_CHARSET_HANDLER::well_formed_len().	2016-10-10 14:36:09 +04:00
Alexander Barkov	ee19806b8e	MDEV-9711 NO PAD collations Based on the patch from Daniil Medvedev (a Google Summer of Code task)	2016-09-06 12:50:02 +04:00
Alexander Barkov	e7ff281d2e	MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring	2016-05-17 15:27:10 +04:00
Alexander Barkov	e09299511e	MDEV-9665 Remove cs->cset->ismbchar() Using a more powerful cs->cset->charlen() instead.	2016-03-16 10:55:12 +04:00
Alexander Barkov	78b80cb6ba	Adding MY_CHARSET_HANDLER::native_to_mb(). This is a pre-requisite patch for: - MDEV-8433 Make field<'broken-string' use indexes - MDEV-8625 Bad result set with ignorable characters when using a prefix key - MDEV-8626 Bad result set with contractions when using a prefix key	2015-08-14 18:34:41 +04:00
Alexander Barkov	4f828a1cac	MDEV-8214 Asian MB2 charsets: compare broken bytes as "greater than any non-broken character"	2015-06-26 13:40:28 +04:00
Alexander Barkov	197afb413f	MDEV-6566 Different INSERT behaviour on bad bytes with and without character set conversion	2015-03-13 16:51:36 +04:00
Alexander Barkov	a7ed8523e3	Adding a shared include file ctype-mb.ic and removing a number of very similar copies of my_well_formed_len_xxx(), implemented for big5, cp932, euckr, eucjpms, gb2312, gbk, sjis, ujis.	2015-03-04 09:16:43 +04:00
Alexander Barkov	b1b6101af2	A preparatory patch for MDEV-6566. Adding a new virtual function MY_CHARSET_HANDLER::copy_abort(). Moving character set specific code into the corresponding implementations (for simple, multi-byte and mbmaxlen>1 character sets).	2015-03-02 18:24:22 +04:00
Sergei Golubchik	2ae7541bcf	cleanup: s/const CHARSET_INFO/CHARSET_INFO/ as CHARSET_INFO is already const, using const on it is redundant and results in compiler warnings (on Windows)	2014-12-04 10:41:51 +01:00
Alexander Barkov	426d246f5b	MDEV-5163 Merge WEIGHT_STRING function from MySQL-5.6	2013-10-23 20:25:52 +04:00
Alexander Barkov	0b6c4bb34f	MDEV-4928 Merge collation customization improvements Merging the following MySQL-5.6 changes: - WL#5624: Collation customization improvements http://dev.mysql.com/worklog/task/?id=5624 - WL#4013: Unicode german2 collation http://dev.mysql.com/worklog/task/?id=4013 - Bug#62429 XML: ExtractValue, UpdateXML max arg length 127 chars http://bugs.mysql.com/bug.php?id=62429 (required by WL#5624)	2013-10-02 15:04:07 +04:00
Sergei Golubchik	4f435bddfd	5.3 merge	2012-01-13 15:50:02 +01:00
Michael Widenius	6920457142	Merge with MariaDB 5.1	2011-11-24 18:48:58 +02:00
Michael Widenius	a8d03ab235	Initial merge with MySQL 5.1 (XtraDB still needs to be merged) Fixed up copyright messages.	2011-11-21 19:13:14 +02:00
Sergei Golubchik	0e007344ea	mysql-5.5.18 merge	2011-11-03 19:17:05 +01:00
Sergei Golubchik	76f0b94bb0	merge with 5.3 sql/sql_insert.cc: CREATE ... IF NOT EXISTS may do nothing, but it is still not a failure. don't forget to my_ok it. **** CREATE ... IF NOT EXISTS may do nothing, but it is still not a failure. don't forget to my_ok it. sql/sql_table.cc: small cleanup **** small cleanup	2011-10-19 21:45:18 +02:00
Sergei Golubchik	9809f05199	5.5-merge	2011-07-02 22:08:51 +02:00
Kent Boortz	68f00a5686	Updated/added copyright headers	2011-06-30 17:37:13 +02:00
Kent Boortz	02e07e3b51	Updated/added copyright headers	2011-06-30 17:46:53 +02:00
Michael Widenius	1be5462d59	Merge with MariaDB 5.1	2011-05-03 19:10:10 +03:00
Michael Widenius	e415ba0fb2	Merge with MySQL 5.1.57/58 Moved some BSD string functions from Unireg	2011-05-02 20:58:45 +03:00
Michael Widenius	3358cdd504	Merge with 5.1 to get in changes from MySQL 5.1.55	2011-02-28 19:39:30 +02:00
Michael Widenius	785695e7c3	Flush DBUG log in case of DBUG_ASSERT() Added strings_def.h into strings library to be able to have a DBUG_ASSERT() version without _db_flush() call (as strings.a should not depend on dbug.a) Remove include of m_string.h in all string files (as it's included by string_def.h). Fixed include order. Changed "m_ctype.h" -> <m_ctype.h> include/my_dbug.h: Flush DBUG log in case of DBUG_ASSERT() strings/bchange.c: Include strings_def.h strings/bcmp.c: Include strings_def.h strings/bfill.c: Include strings_def.h strings/bmove.c: Include strings_def.h strings/bmove512.c: Include strings_def.h strings/bmove_upp.c: Include strings_def.h strings/conf_to_src.c: Include strings_def.h Fixed copyright strings/ctype-big5.c: Include strings_def.h strings/ctype-bin.c: Include strings_def.h strings/ctype-cp932.c: Include strings_def.h strings/ctype-czech.c: Include strings_def.h strings/ctype-euc_kr.c: Include strings_def.h strings/ctype-eucjpms.c: Include strings_def.h strings/ctype-extra.c: Include strings_def.h strings/ctype-gbk.c: Include strings_def.h strings/ctype-latin1.c: Include strings_def.h strings/ctype-mb.c: Include strings_def.h strings/ctype-simple.c: Include strings_def.h strings/ctype-sjis.c: Include strings_def.h strings/ctype-tis620.c: Include strings_def.h strings/ctype-uca.c: Include strings_def.h strings/ctype-ucs2.c: Include strings_def.h strings/ctype-ujis.c: Include strings_def.h strings/ctype-utf8.c: Include strings_def.h strings/ctype-win1250ch.c: Include strings_def.h strings/ctype.c: Include strings_def.h strings/decimal.c: Include strings_def.h strings/do_ctype.c: Include strings_def.h strings/int2str.c: Include strings_def.h strings/is_prefix.c: Include strings_def.h strings/llstr.c: Include strings_def.h strings/longlong2str.c: Include strings_def.h strings/longlong2str_asm.c: Include strings_def.h strings/my_strchr.c: Include strings_def.h strings/my_strtoll10.c: Include strings_def.h strings/my_vsnprintf.c: Include 
strings_def.h strings/r_strinstr.c: Include strings_def.h strings/str2int.c: Include strings_def.h strings/str_alloc.c: Include strings_def.h strings/str_test.c: Include strings_def.h Fixed compiler warnings strings/strappend.c: Include strings_def.h strings/strcend.c: Include strings_def.h strings/strcont.c: Include strings_def.h strings/strend.c: Include strings_def.h strings/strfill.c: Include strings_def.h strings/strinstr.c: Include strings_def.h strings/strmake.c: Include strings_def.h strings/strmov.c: Include strings_def.h strings/strmov_overlapp.c: Include strings_def.h strings/strnlen.c: Include strings_def.h strings/strnmov.c: Include strings_def.h strings/strstr.c: Include strings_def.h strings/strto.c: Include strings_def.h strings/strtod.c: Include strings_def.h strings/strtol.c: Include strings_def.h strings/strtoll.c: Include strings_def.h strings/strtoul.c: Include strings_def.h strings/strtoull.c: Include strings_def.h strings/strxmov.c: Include strings_def.h strings/strxnmov.c: Include strings_def.h strings/uctypedump.c: Include strings_def.h Fixed compiler warnings Removed double include of m_ctype.h strings/udiv.c: Include strings_def.h strings/xml.c: Include strings_def.h	2011-01-30 12:41:44 +02:00
Alexander Barkov	435289acd4	Updating Copyright information	2011-01-19 16:17:52 +03:00
Alexander Barkov	dfb7930b33	Merging Copyright update from 5.1	2011-01-19 16:31:17 +03:00
Sergei Golubchik	65ca700def	merge. checkpoint. does not compile.	2010-11-25 18:17:28 +01:00
Sergei Golubchik	a3d80d952d	merge with 5.1	2010-09-11 20:43:48 +02:00
Michael Widenius	ad6d95d3cb	Merge with MySQL 5.1.50 - Changed to still use bcmp() in certain cases because - Faster for short unaligned strings than memcmp() - Better when using valgrind - Changed to use my_sprintf() instead of sprintf() to get higher portability for old systems - Changed code to use MariaDB version of select->skip_record() - Removed -%::SCCS/s.% from Makefile.am:s to remove automake warnings	2010-08-27 17:12:44 +03:00
Alexander Barkov	fc74792a6b	Merging from mysql-5.1-bugteam	2010-07-26 10:47:03 +04:00
Alexander Barkov	e57a9d6fe0	Bug#45012 my_like_range_cp932 generates invalid string Problem: The functions my_like_range_xxx() returned badly formed maximum strings for Asian character sets, which caused problems for storage engines. Fix: - Removed a number of my_like_range_xxx() implementations, which were in fact duplicate code pieces. - Using generic my_like_range_mb() instead. - Setting max_sort_char member properly for Asian character sets - Adding unittest/strings/strings-t.c, to test that my_like_range_xxx() return well-formed min and max strings. Notes: - No additional tests in mysql/t/ available. Old tests cover the affected code well enough.	2010-07-26 09:06:18 +04:00
Davi Arnaut	13f7a1d244	WL#5486: Remove code for unsupported platforms Remove MS-DOS specific code.	2010-07-15 08:16:06 -03:00
Alexander Barkov	4e4122b999	WL#3090 Japanese Character Set adjustments added: @ mysql-test/include/ctype_utf8_table.inc Adding a shared file to populate all utf8 values [U+0000..U+FFFF] modified: @ include/m_ctype.h Introducing MB2 and MY_PUT_MB2 macros @ mysql-test/r/ctype_cp932_binlog_stm.result @ mysql-test/r/ctype_eucjpms.result @ mysql-test/r/ctype_sjis.result @ mysql-test/r/ctype_ujis.result @ mysql-test/t/ctype_cp932_binlog_stm.test @ mysql-test/t/ctype_eucjpms.test @ mysql-test/t/ctype_sjis.test @ mysql-test/t/ctype_ujis.test Adding test @ strings/ctype-cp932.c @ strings/ctype-eucjpms.c @ strings/ctype-sjis.c @ strings/ctype-ujis.c Adding new functions using Big-Table approach.	2010-02-15 09:57:24 +04:00
Alexander Barkov	8dfc3fbbab	WL#4583 Case conversion in Asian character sets modified: include/m_ctype.h - Changing type for tolower/toupper members, to store values >= 0xFFFF. - Adding function prototypes mysql-test/r/ctype_big5.result mysql-test/r/ctype_cp932_binlog_stm.result mysql-test/r/ctype_eucjpms.result* mysql-test/r/ctype_euckr.result mysql-test/r/ctype_gb2312.result mysql-test/r/ctype_gbk.result mysql-test/r/ctype_sjis.result mysql-test/r/ctype_ujis.result mysql-test/t/ctype_big5.test mysql-test/t/ctype_cp932_binlog_stm.test mysql-test/t/ctype_eucjpms.test mysql-test/t/ctype_euckr.test mysql-test/t/ctype_gb2312.test mysql-test/t/ctype_gbk.test mysql-test/t/ctype_sjis.test mysql-test/t/ctype_ujis.test - Adding tests strings/ctype-big5.c strings/ctype-cp932.c strings/ctype-euc_kr.c strings/ctype-eucjpms.c strings/ctype-gb2312.c strings/ctype-gbk.c strings/ctype-sjis.c - Adding upper/lower case conversion data strings/ctype-mb.c - Adding handling of upper/lower conversion for multi-byte characters. strings/ctype-ujis.c - Implementing shared upper/lower conversion functions for ujis and eucjpms - Adding upper/lower case conversion data for ujis	2010-01-14 15:17:57 +04:00


162 commits
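
The MDEV-27009 entry above describes two things client code may need to handle: short collation names that take their character set from context, and NULL values in INFORMATION_SCHEMA.COLLATIONS. A minimal Python sketch of both follows. The helper names `resolve_collation` and `describe_row` are hypothetical, and the prefix rule is an illustrative assumption drawn from the commit message, not MariaDB's actual implementation.

```python
from typing import Optional

# Character sets for which the commit message says UCA-14.0.0
# collations were added.
UCA1400_CHARSETS = ("utf8mb3", "utf8mb4", "ucs2", "utf16", "utf32")


def resolve_collation(name: str, context_charset: str) -> str:
    """Return the full collation name for `name` in the given context.

    Hypothetical helper: a short name starting with "uca1400_" gets the
    context character set as a prefix; any other name is returned as-is.
    """
    if name.startswith("uca1400_"):
        if context_charset not in UCA1400_CHARSETS:
            raise ValueError(
                f"{name} is not applicable to character set {context_charset}"
            )
        return f"{context_charset}_{name}"
    return name


def describe_row(collation_name: str, charset: Optional[str]) -> str:
    """Format one INFORMATION_SCHEMA.COLLATIONS row, tolerating NULL.

    `charset` is None when CHARACTER_SET_NAME is NULL, which the commit
    message says means the collation applies to multiple character sets.
    """
    if charset is None:
        return f"{collation_name} (multiple character sets)"
    return f"{collation_name} ({charset})"
```

With a context character set of utf8mb4, `resolve_collation("uca1400_as_ci", "utf8mb4")` yields `utf8mb4_uca1400_as_ci`, matching the CREATE TABLE example in the commit message.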