Commit graph

155 commits

Author SHA1 Message Date
Alexander Barkov
133446828c MDEV-27009 Add UCA-14.0.0 collations
- Added one neutral and 22 tailored (language specific) collations based on
  Unicode Collation Algorithm version 14.0.0.

  Collations were added for Unicode character sets
  utf8mb3, utf8mb4, ucs2, utf16, utf32.

  Every tailoring was added with four accent and case
  sensitivity flag combinations, e.g:

  * utf8mb4_uca1400_swedish_as_cs
  * utf8mb4_uca1400_swedish_as_ci
  * utf8mb4_uca1400_swedish_ai_cs
  * utf8mb4_uca1400_swedish_ai_ci

  and their _nopad_ variants:

  * utf8mb4_uca1400_swedish_nopad_as_cs
  * utf8mb4_uca1400_swedish_nopad_as_ci
  * utf8mb4_uca1400_swedish_nopad_ai_cs
  * utf8mb4_uca1400_swedish_nopad_ai_ci

- Introducing a conception of contextually typed named collations:

  CREATE DATABASE db1 CHARACTER SET utf8mb4;
  CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);

  The idea is that there is no a need to specify the character set prefix
  in the new collation names. It's enough to type just the suffix
  "uca1400_as_ci". The character set is taken from the context.

  In the above example script the context character set is utf8mb4.
  So the CREATE TABLE will make a column with the collation
  utf8mb4_uca1400_as_ci.

  Short collations names can be used in any parts of the SQL syntax
  where the COLLATE clause is understood.

- New collations are displayed only one time
  (without character set combinations) by these statements:

     SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
     SHOW COLLATION;

  For example, all these collations:
  - utf8mb3_uca1400_swedish_as_ci
  - utf8mb4_uca1400_swedish_as_ci
  - ucs2_uca1400_swedish_as_ci
  - utf16_uca1400_swedish_as_ci
  - utf32_uca1400_swedish_as_ci
  have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION,
  with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix
  without the character set name:

SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';

+-----------------------+
| COLLATION_NAME        |
+-----------------------+
| uca1400_swedish_as_ci |
+-----------------------+

  Note, the behaviour of old collations did not change.
  Non-unicode collations (e.g. latin1_swedish_ci) and
  old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci)
  are still displayed with the character set prefix, as before.

- The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed.

  The NOT NULL constraint was removed from these columns:
  - CHARACTER_SET_NAME
  - ID
  - IS_DEFAULT
  and from the corresponding columns in SHOW COLLATION.

  For example:

SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+--------------------+------+------------+
| COLLATION_NAME        | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------+--------------------+------+------------+
| uca1400_swedish_as_ci | NULL               | NULL | NULL       |
+-----------------------+--------------------+------+------------+

  The NULL value in these columns now means that the collation
  is applicable to multiple character sets.
  The behavioir of old collations did not change.
  Make sure your client programs can handle NULL values in these columns.

- The structure of the table
  INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed.

  Three new NOT NULL columns were added:
  - FULL_COLLATION_NAME
  - ID
  - IS_DEFAULT

  New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY.
  The column COLLATION_NAME contains the collation name without the character
  set prefix. The column FULL_COLLATION_NAME contains the collation name with
  the character set prefix.

  Old collations have full collation name in both FULL_COLLATION_NAME and
  COLLATION_NAME.

SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4|latin1).*swedish.*ci$';
+-----------------------------+-------------------------------------+--------------------+------+------------+
| COLLATION_NAME              | FULL_COLLATION_NAME                 | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------------+-------------------------------------+--------------------+------+------------+
| latin1_swedish_ci           | latin1_swedish_ci                   | latin1             |    8 | Yes        |
| latin1_swedish_nopad_ci     | latin1_swedish_nopad_ci             | latin1             | 1032 |            |
| utf8mb4_swedish_ci          | utf8mb4_swedish_ci                  | utf8mb4            |  232 |            |
| uca1400_swedish_ai_ci       | utf8mb4_uca1400_swedish_ai_ci       | utf8mb4            | 2368 |            |
| uca1400_swedish_as_ci       | utf8mb4_uca1400_swedish_as_ci       | utf8mb4            | 2370 |            |
| uca1400_swedish_nopad_ai_ci | utf8mb4_uca1400_swedish_nopad_ai_ci | utf8mb4            | 2372 |            |
| uca1400_swedish_nopad_as_ci | utf8mb4_uca1400_swedish_nopad_as_ci | utf8mb4            | 2374 |            |
+-----------------------------+-------------------------------------+--------------------+------+------------+

- Other INFORMATION_SCHEMA queries:

  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
  SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
  SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;

  display full collation names, including character sets prefix,
  for all collations, including new collations.

  Corresponding SHOW commands also display full collation names
  in collation related columns:

  SHOW CREATE TABLE t1;
  SHOW CREATE DATABASE db1;
  SHOW TABLE STATUS;
  SHOW CREATE FUNCTION f1;
  SHOW CREATE PROCEDURE p1;
  SHOW CREATE EVENT ev1;
  SHOW CREATE TRIGGER tr1;
  SHOW CREATE VIEW;

  These INFORMATION_SCHEMA queries and SHOW statements may change in
  the future, to display show collation names.
2022-08-10 15:04:24 +02:00
Alexander Barkov
0c4c064f98 MDEV-27743 Remove Lex::charset
This patch also fixes:

MDEV-27690 Crash on `CHARACTER SET csname COLLATE DEFAULT` in column definition
MDEV-27853 Wrong data type on column `COLLATE DEFAULT` and table `COLLATE some_non_default_collation`
MDEV-28067 Multiple conflicting column COLLATE clauses are not rejected
MDEV-28118 Wrong collation of `CAST(.. AS CHAR COLLATE DEFAULT)`
MDEV-28119 Wrong column collation on MODIFY + CONVERT
2022-03-22 17:12:15 +04:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Alexander Barkov
b915f79e4e MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings 2022-01-21 12:16:07 +04:00
Alexander Barkov
0d68b0a2d6 MDEV-26669 Add MY_COLLATION_HANDLER functions min_str() and max_str() 2021-09-27 17:10:22 +04:00
Monty
a206658b98 Change CHARSET_INFO character set and collaction names to LEX_CSTRING
This change removed 68 explict strlen() calls from the code.

The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name

Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible

Other things:
- There where compiler issues with ensuring that all character set names
  points to the same string: gcc doesn't allow one to use integer constants
  when defining global structures (constant char * pointers works fine).
  To get around this, I declared defines for each character set name
  length.
2021-05-19 22:54:07 +02:00
Marko Mäkelä
133b4b46fe Merge 10.4 into 10.5 2020-11-03 16:24:47 +02:00
Marko Mäkelä
533a13af06 Merge 10.3 into 10.4 2020-11-03 14:49:17 +02:00
Marko Mäkelä
8036d0a359 MDEV-22387: Do not violate __attribute__((nonnull))
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.

After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
2020-11-02 14:19:21 +02:00
Monty
dbcd3384e0 MDEV-7947 strcmp() takes 0.37% in OLTP RO
This patch ensures that all identical character sets shares the same
cs->csname.
This allows us to replace strcmp() in my_charset_same() with comparisons
of pointers. This fixes a long standing performance issue that could cause
as strcmp() for every item sent trough the protocol class to the end user.

One consequence of this patch is that we don't allow one to add a character
definition in the Index.xml file that changes the csname of an existing
character set. This is by design as changing character set names of existing
ones is extremely dangerous, especially as some storage engines just records
character set numbers.

As we now have a hash over character set's csname, we can in the future
use that for faster access to a specific character set. This could be done
by changing the hash to non unique and use the hash to find the next
character set with same csname.
2020-07-23 10:54:33 +03:00
Alexander Barkov
cfe5ee90c8 MDEV-22043 Special character leads to assertion in my_wc_to_printable_generic on 10.5.2 (debug)
The code did not take into account that:
- U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis)
- Some character sets do not have a code for U+005C (e.g. swe7)

Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to
cover all special cases easier.
2020-05-09 16:01:30 +04:00
Monty
4d61f1247a Fixed compiler warnings from gcc 7.4.1
- Fixed possible error in rocksdb/rdb_datadic.cc
2020-01-29 23:23:55 +02:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru
5543b75550 Update FSF Address
* Update wrong zip-code
2019-05-11 21:29:06 +03:00
Marko Mäkelä
ef3070e997 Merge 10.1 into 10.2 2018-08-02 08:19:57 +03:00
Oleksandr Byelkin
cb5952b506 Merge branch '10.0' into bb-10.1-merge-sanja 2018-07-25 22:24:40 +02:00
Alexander Barkov
e2ac4098ed Simplify caseup() and casedn() in charsets
After the MDEV-13118 fix there's no code in the server that
wants caseup/casedn to change the argument in place for simple
charsets.  Let's remove this logic and always return the result in a
new string for all charsets, both simple and complex.

1. Removing the optimization that *some* character sets used in casedn()
  and caseup(), which allowed (and required) to change the case in-place,
  overwriting the string passed as the "src" argument.
  Now all CHARSET_INFO's work in the same way:
  non of them change the source string in-place, all of them now convert
  case from the source string to the destination string, leaving
  the source string untouched.

2. Adding "const" qualifier to the "char *src" parameter
   to caseup() and casedn().

3. Removing duplicate implementations in ctype-mb.c.
  Now both caseup() and casedn() implementations for all CJK character sets
  use internally the same function my_casefold_mb()
  (the former my_casefold_mb_varlen()).

4. Removing the "unused" attribute from parameters of some my_case{up|dn}_xxx()
   implementations, as the affected parameters are now *used* in the code.
   Previously these parameters were used only in DBUG_ASSERT().
2018-07-19 13:02:14 +04:00
Vladislav Vaintroub
9891ee5a2a Fix and reenable Windows compiler warning C4800 (size_t conversion). 2018-01-26 10:37:46 +00:00
Alexander Barkov
5058ced5df MDEV-7769 MY_CHARSET_INFO refactoring# On branch 10.2
Part 3 (final): removing MY_CHARSET_HANDLER::well_formed_len().
2016-10-10 14:36:09 +04:00
Alexander Barkov
ee19806b8e MDEV-9711 NO PAD collations
Based on the patch from Daniil Medvedev (a Google Summer of Code task)
2016-09-06 12:50:02 +04:00
Alexander Barkov
e7ff281d2e MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring 2016-05-17 15:27:10 +04:00
Alexander Barkov
1d73005bf3 MDEV-8360 Clean-up CHARSET_INFO: strnncollsp: diff_if_only_endspace_difference
- Removing the "diff_if_only_endspace_difference" argument from
  MY_COLLATION_HANDLER::strnncollsp(), my_strnncollsp_simple(),
  as well as in the function template MY_FUNCTION_NAME(strnncollsp)
  in strcoll.ic

- Removing the "diff_if_only_space_different" from ha_compare_text(),
  hp_rec_key_cmp().

- Adding a new function my_strnncollsp_padspace_bin() and reusing
  it instead of duplicate code pieces in my_strnncollsp_8bit_bin(),
  my_strnncollsp_latin1_de(), my_strnncollsp_tis620(),
  my_strnncollsp_utf8_cs().

- Adding more tests for better coverage of the trailing space handling.

- Removing the unused definition of HA_END_SPACE_ARE_EQUAL
2016-03-31 11:04:48 +04:00
Alexander Barkov
e09299511e MDEV-9665 Remove cs->cset->ismbchar()
Using a more powerfull cs->cset->charlen() instead.
2016-03-16 10:55:12 +04:00
Alexander Barkov
3ba2a958be MDEV-8694 Wrong result for SELECT..WHERE a NOT LIKE 'a ' AND a='a'
Note, the patch for MDEV-8661 unintentionally fixed MDEV-8694 as well,
as a side effect. Adding a real clear fix: implementing
Item_func_like::propagate_equal_fields() with comments.
2015-08-28 17:03:09 +04:00
Alexander Barkov
78b80cb6ba Adding MY_CHARSET_HANDLER::native_to_mb().
This is a pre-requisite patch for:
- MDEV-8433 Make field<'broken-string' use indexes
- MDEV-8625 Bad result set with ignorable characters when using a prefix key
- MDEV-8626 Bad result set with contractions when using a prefix key
2015-08-14 18:34:41 +04:00
Alexander Barkov
197afb413f MDEV-6566 Different INSERT behaviour on bad bytes with and without character set conversion 2015-03-13 16:51:36 +04:00
Alexander Barkov
b1b6101af2 A preparatory patch for MDEV-6566.
Adding a new virtual function MY_CHARSET_HANDLER::copy_abort().
Moving character set specific code into the correspoding implementations
(for simple, multi-byte and mbmaxlen>1 character sets).
2015-03-02 18:24:22 +04:00
Alexander Barkov
807934d083 MDEV-7086 main.ctype_cp932 fails in buildbot on a valgrind build
Removing a redundant and wrong condition which could access beyond
the pattern string range.
2014-11-18 13:07:37 +04:00
Michael Widenius
c4f5326bb7 MDEV-6255 DUPLICATE KEY Errors on SELECT .. GROUP BY that uses temporary and filesort.
The problem was that my_hash_sort didn't properly delete end-space characters properly, so strings that should compare
identically was seen as different strings.  (Space was handled correctly, but not NBSP)
This caused duplicate key errors when a heap table was converted to Aria as part of overflow in group by.

Fixed by removing all characters that compares as end space when creating a hash.

Other things:
- Fixed that --sorted_results also works for errors in mysqltest.
- Speed up hash by not comparing strings that has different hash.
- Speed up many my_hash_sort functions by using registers to calculate hash instead of pointers.
  This was previously done for some functions, but not for all.
- Made a macro of the hash function, to simplify code and to be able to experiment with new hash functions.







client/mysqltest.cc:
  Fixed that --sorted_results also works for error messages.
mysql-test/r/ctype_partitions.result:
  New test to ensure that partitions on hash works
mysql-test/suite/multi_source/gtid.result:
  Updated result
mysql-test/suite/multi_source/gtid.test:
  Test that --sorted_result works for error messages
mysql-test/suite/multi_source/gtid_ignore_duplicates.result:
  Updated result
mysql-test/suite/multi_source/gtid_ignore_duplicates.test:
  Updated result
mysql-test/suite/multi_source/load_data.result:
  Updated result
mysql-test/suite/multi_source/load_data.test:
  Updated result
mysql-test/t/ctype_partitions.test:
  New test to ensure that partitions on hash works
storage/heap/hp_write.c:
  Speed up hash by not comparing strings that has different hash.
storage/maria/ma_check.c:
  Extra debug
strings/ctype-bin.c:
  Use macro for hash function
strings/ctype-latin1.c:
  Use macro for hash function
  Use registers to calculate hash (speedup)
strings/ctype-mb.c:
  Use macro for hash function
  Use registers to calculate hash (speedup)
strings/ctype-simple.c:
  Use macro for hash function
  Use same variable names as in other my_hash_sort functions.
  Update my_hash_sort_simple() to properly remove end space (patch by Bar)
strings/ctype-uca.c:
  Ignore duplicated space inside strings and end space in my_hash_sort_uca(). This fixed MDEV-6255
  Use macro for hash function
  Use registers to calculate hash (speedup)
strings/ctype-ucs2.c:
  Use macro for hash function
  Use registers to calculate hash (speedup)
strings/ctype-utf8.c:
  Use macro for hash function
  Use registers to calculate hash (speedup)
strings/strings_def.h:
  Made a macro of the hash function, to simplify code and to be able to experiment with new hash functions.
2014-09-11 22:42:35 +03:00
Sergei Golubchik
1c6ad62a26 mysql-5.5.39 merge
~40% bugfixed(*) applied
~40$ bugfixed reverted (incorrect or we're not buggy)
~20% bugfixed applied, despite us being not buggy
(*) only changes in the server code, e.g. not cmakefiles
2014-08-02 21:26:16 +02:00
Sergei Golubchik
6fb17a0601 5.5.39 merge 2014-08-07 18:06:56 +02:00
Erlend Dahl
13d4101a39 Bug#18850241 WRONG COPYRIGHT HEADER IN SOME STRINGS/CTYPE-* FILES 2014-06-23 12:11:13 +02:00
Sergey Vojtovich
b95c8ce530 MDEV-5675 - Performance: my_hash_sort_bin is called too often
Reduced number of my_hash_sort_bin() calls from 4 to 1 per query.
Reduced number of memory accesses done by my_hash_sort_bin().

Details:
- let MDL subsystem use pre-calculated hash value for hash
  inserts and deletes
- let table cache use pre-calculated MDL hash value
- MDL namespace is excluded from hash value calculation, so that
  hash value can be used by table cache as is
- hash value for MDL is calculated as resulting hash value + MDL
  namespace
- extended hash implementation to accept user defined hash function
2014-03-06 16:19:12 +04:00
Sergei Golubchik
8dce8ecfda minor cleanup 2014-03-01 10:19:42 +01:00
Alexander Barkov
426d246f5b MDEV-5163 Merge WEIGHT_STRING function from MySQL-5.6 2013-10-23 20:25:52 +04:00
Alexander Barkov
0b6c4bb34f MDEV-4928 Merge collation customization improvements
Merging the following MySQL-5.6 changes:
- WL#5624: Collation customization improvements
  http://dev.mysql.com/worklog/task/?id=5624

- WL#4013: Unicode german2 collation
  http://dev.mysql.com/worklog/task/?id=4013

- Bug#62429 XML: ExtractValue, UpdateXML max arg length 127 chars
  http://bugs.mysql.com/bug.php?id=62429
  (required by WL#5624)
2013-10-02 15:04:07 +04:00
Sergei Golubchik
b7b5f6f1ab 10.0-monty merge
includes:
* remove some remnants of "Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING"
* introduce LOCK_share, now LOCK_ha_data is strictly for engines
* rea_create_table() always creates .par file (even in "frm-only" mode)
* fix a 5.6 bug, temp file leak on dummy ALTER TABLE
2013-07-21 16:39:19 +02:00
Sergei Golubchik
005c7e5421 mysql-5.5.32 merge 2013-07-16 19:09:54 +02:00
Sergei Golubchik
b381cf843c mysql-5.5.31 merge 2013-05-07 13:05:09 +02:00
Michael Widenius
068c61978e Temporary commit of 10.0-merge 2013-03-26 00:03:13 +02:00
Murthy Narkedimilli
8afe262ae5 Fix for Bug 16395495 - OLD FSF ADDRESS IN GPL HEADER 2013-03-19 15:53:48 +01:00
Neeraj Bisht
84d798a1d5 BUG#14303860 - EXECUTING A SELECT QUERY WITH TOO
MANY WILDCARDS CAUSES A SEGFAULT
      Back port from 5.6 and trunk
2013-01-14 16:51:52 +05:30
Neeraj Bisht
99645e5be5 BUG#14303860 - EXECUTING A SELECT QUERY WITH TOO
MANY WILDCARDS CAUSES A SEGFAULT

Back port from 5.6 and trunk
2013-01-14 14:59:48 +05:30
Sergei Golubchik
4f435bddfd 5.3 merge 2012-01-13 15:50:02 +01:00
Michael Widenius
6920457142 Merge with MariaDB 5.1 2011-11-24 18:48:58 +02:00
Michael Widenius
a8d03ab235 Initail merge with MySQL 5.1 (XtraDB still needs to be merged)
Fixed up copyright messages.
2011-11-21 19:13:14 +02:00
Sergei Golubchik
76f0b94bb0 merge with 5.3
sql/sql_insert.cc:
  CREATE ... IF NOT EXISTS may do nothing, but
  it is still not a failure. don't forget to my_ok it.
  ******
  CREATE ... IF NOT EXISTS may do nothing, but
  it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
  small cleanup
  ******
  small cleanup
2011-10-19 21:45:18 +02:00
Sergei Golubchik
9809f05199 5.5-merge 2011-07-02 22:08:51 +02:00