mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-15 19:42:28 +01:00

Author	SHA1	Message	Date
Alexander Barkov	0d17c540a5	MDEV-27277 Add a warning when max_sort_length is reached Step#1: fixing the return type of strnxfrm() from size_t to this structure: typedef struct { size_t m_output_length; size_t m_source_length_used; uint m_warnings; } my_strnxfrm_ret_t;	2024-10-22 21:42:53 +07:00
Marko Mäkelä	a009280e60	Merge 10.9 into 10.10	2023-04-14 12:24:14 +03:00
Marko Mäkelä	5bada1246d	Merge 10.5 into 10.6	2023-04-11 16:15:19 +03:00
Alexander Barkov	62e137d4d7	Merge remote-tracking branch 'origin/10.4' into 10.5	2023-04-05 16:16:19 +04:00
Alexander Barkov	8020b1bd73	MDEV-30034 UNIQUE USING HASH accepts duplicate entries for tricky collations - Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars() and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES. The flag defines if strnncollsp_nchars() should emulate trailing spaces which were possibly trimmed earlier (e.g. in InnoDB CHAR compression). This is important for NOPAD collations. For example, with this input: - str1= 'a ' (Latin letter a followed by one space) - str2= 'a ' (Latin letter a followed by two spaces) - nchars= 3 if the flag is given, strnncollsp_nchars() will virtually restore one trailing space to str1 up to nchars (3) characters and compare two strings as equal: - str1= 'a ' (one extra trailing space emulated) - str2= 'a ' (as is) If the flag is not given, strnncollsp_nchars() does not add trailing virtual spaces, so in case of a NOPAD collation, str1 will be compared as less than str2 because it is shorter. - Field_string::cmp_prefix() now passes the new flag. Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do not pass the new flag. - The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc (which handles the CHAR data type) now also passed the new flag. - Fixing UCA collations to respect the new flag. Other collations are possibly also affected, however I had no success in making an SQL script demonstrating the problem. Other collations will be extended to respect this flags in a separate patch later. - Changing the meaning of the last parameter of Field::cmp_prefix() from "number of bytes" (internal length) to "number of characters" (user visible length). The code calling cmp_prefix() from handler.cc was wrong. After this change, the call in handler.cc became correct. The code calling cmp_prefix() from key_rec_cmp() in key.cc was adjusted according to this change. - Old strnncollsp_nchar() related tests in unittest/strings/strings-t.c now pass the new flag. A few new tests also were added, without the flag.	2023-04-04 12:30:50 +04:00
Alexander Barkov	7f6b648d7d	MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 String length growth during upper/lower conversion in Unicode collations depends only on the underlying MY_UNICASE_INFO used in the collation. Maintaining a separate member CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply duplicated this information and caused bugs like this (when MY_UNICASE_INFO and case??_multiply when out of sync because of incomplete CHARSET_INFO initialization). Fix: Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply from members to virtual functions. The virtual functions in Unicode collations calculate case conversion growth factors from the MY_UNICASE_INFO. This guarantees that the growth factors are always in sync with the MY_UNICASE_INFO.	2023-02-17 17:33:27 +04:00
Alexander Barkov	133446828c	MDEV-27009 Add UCA-14.0.0 collations - Added one neutral and 22 tailored (language specific) collations based on Unicode Collation Algorithm version 14.0.0. Collations were added for Unicode character sets utf8mb3, utf8mb4, ucs2, utf16, utf32. Every tailoring was added with four accent and case sensitivity flag combinations, e.g: * utf8mb4_uca1400_swedish_as_cs * utf8mb4_uca1400_swedish_as_ci * utf8mb4_uca1400_swedish_ai_cs * utf8mb4_uca1400_swedish_ai_ci and their _nopad_ variants: * utf8mb4_uca1400_swedish_nopad_as_cs * utf8mb4_uca1400_swedish_nopad_as_ci * utf8mb4_uca1400_swedish_nopad_ai_cs * utf8mb4_uca1400_swedish_nopad_ai_ci - Introducing a conception of contextually typed named collations: CREATE DATABASE db1 CHARACTER SET utf8mb4; CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci); The idea is that there is no a need to specify the character set prefix in the new collation names. It's enough to type just the suffix "uca1400_as_ci". The character set is taken from the context. In the above example script the context character set is utf8mb4. So the CREATE TABLE will make a column with the collation utf8mb4_uca1400_as_ci. Short collations names can be used in any parts of the SQL syntax where the COLLATE clause is understood. - New collations are displayed only one time (without character set combinations) by these statements: SELECT * FROM INFORMATION_SCHEMA.COLLATIONS; SHOW COLLATION; For example, all these collations: - utf8mb3_uca1400_swedish_as_ci - utf8mb4_uca1400_swedish_as_ci - ucs2_uca1400_swedish_as_ci - utf16_uca1400_swedish_as_ci - utf32_uca1400_swedish_as_ci have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION, with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix without the character set name: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+ \| COLLATION_NAME \| +-----------------------+ \| uca1400_swedish_as_ci \| +-----------------------+ Note, the behaviour of old collations did not change. Non-unicode collations (e.g. latin1_swedish_ci) and old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci) are still displayed with the character set prefix, as before. - The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed. The NOT NULL constraint was removed from these columns: - CHARACTER_SET_NAME - ID - IS_DEFAULT and from the corresponding columns in SHOW COLLATION. For example: SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+--------------------+------+------------+ \| COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------+--------------------+------+------------+ \| uca1400_swedish_as_ci \| NULL \| NULL \| NULL \| +-----------------------+--------------------+------+------------+ The NULL value in these columns now means that the collation is applicable to multiple character sets. The behavioir of old collations did not change. Make sure your client programs can handle NULL values in these columns. - The structure of the table INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed. Three new NOT NULL columns were added: - FULL_COLLATION_NAME - ID - IS_DEFAULT New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY. The column COLLATION_NAME contains the collation name without the character set prefix. The column FULL_COLLATION_NAME contains the collation name with the character set prefix. Old collations have full collation name in both FULL_COLLATION_NAME and COLLATION_NAME. SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4\|latin1).swedish.ci$'; +-----------------------------+-------------------------------------+--------------------+------+------------+ \| COLLATION_NAME \| FULL_COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------------+-------------------------------------+--------------------+------+------------+ \| latin1_swedish_ci \| latin1_swedish_ci \| latin1 \| 8 \| Yes \| \| latin1_swedish_nopad_ci \| latin1_swedish_nopad_ci \| latin1 \| 1032 \| \| \| utf8mb4_swedish_ci \| utf8mb4_swedish_ci \| utf8mb4 \| 232 \| \| \| uca1400_swedish_ai_ci \| utf8mb4_uca1400_swedish_ai_ci \| utf8mb4 \| 2368 \| \| \| uca1400_swedish_as_ci \| utf8mb4_uca1400_swedish_as_ci \| utf8mb4 \| 2370 \| \| \| uca1400_swedish_nopad_ai_ci \| utf8mb4_uca1400_swedish_nopad_ai_ci \| utf8mb4 \| 2372 \| \| \| uca1400_swedish_nopad_as_ci \| utf8mb4_uca1400_swedish_nopad_as_ci \| utf8mb4 \| 2374 \| \| +-----------------------------+-------------------------------------+--------------------+------+------------+ - Other INFORMATION_SCHEMA queries: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS; SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES; SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS; display full collation names, including character sets prefix, for all collations, including new collations. Corresponding SHOW commands also display full collation names in collation related columns: SHOW CREATE TABLE t1; SHOW CREATE DATABASE db1; SHOW TABLE STATUS; SHOW CREATE FUNCTION f1; SHOW CREATE PROCEDURE p1; SHOW CREATE EVENT ev1; SHOW CREATE TRIGGER tr1; SHOW CREATE VIEW; These INFORMATION_SCHEMA queries and SHOW statements may change in the future, to display show collation names.	2022-08-10 15:04:24 +02:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Alexander Barkov	b915f79e4e	MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings	2022-01-21 12:16:07 +04:00
Monty	a206658b98	Change CHARSET_INFO character set and collaction names to LEX_CSTRING This change removed 68 explict strlen() calls from the code. The following renames was done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for print function could result in crashes: - charset->csname renamed to charset->cs_name - charset->name renamed to charset->coll_name Almost everything where mechanical changes except: - Changed to use the new Protocol::store(LEX_CSTRING..) when possible - Changed to use field->store(LEX_CSTRING, CHARSET_INFO) when possible - Changed to use String->append(LEX_CSTRING&) when possible Other things: - There where compiler issues with ensuring that all character set names points to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers works fine). To get around this, I declared defines for each character set name length.	2021-05-19 22:54:07 +02:00
Rucha Deodhar	2fdb556e04	MDEV-8334: Rename utf8 to utf8mb3 This patch changes the main name of 3 byte character set from utf8 to utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default, so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.	2021-05-19 06:48:36 +02:00
Monty	dbcd3384e0	MDEV-7947 strcmp() takes 0.37% in OLTP RO This patch ensures that all identical character sets shares the same cs->csname. This allows us to replace strcmp() in my_charset_same() with comparisons of pointers. This fixes a long standing performance issue that could cause as strcmp() for every item sent trough the protocol class to the end user. One consequence of this patch is that we don't allow one to add a character definition in the Index.xml file that changes the csname of an existing character set. This is by design as changing character set names of existing ones is extremely dangerous, especially as some storage engines just records character set numbers. As we now have a hash over character set's csname, we can in the future use that for faster access to a specific character set. This could be done by changing the hash to non unique and use the hash to find the next character set with same csname.	2020-07-23 10:54:33 +03:00
Alexander Barkov	cfe5ee90c8	MDEV-22043 Special character leads to assertion in my_wc_to_printable_generic on 10.5.2 (debug) The code did not take into account that: - U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis) - Some character sets do not have a code for U+005C (e.g. swe7) Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to cover all special cases easier.	2020-05-09 16:01:30 +04:00
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	f177f125d4	Merge branch '5.5' into 10.1	2019-05-11 19:15:57 +03:00
Michal Schorm	17b4f99928	Update FSF address This commit is based on the work of Michal Schorm, rebased on the earliest MariaDB version. Th command line used to generate this diff was: find ./ -type f \ -exec sed -i -e 's/Foundation, Inc., 59 Temple Place, Suite 330, Boston, /Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, /g' {} \; \ -exec sed -i -e 's/Foundation, Inc. 59 Temple Place.* Suite 330, Boston, /Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, /g' {} \; \ -exec sed -i -e 's/MA......-1307.USA/MA 02110-1335 USA/g' {} \; \ -exec sed -i -e 's/Foundation, Inc., 59 Temple/Foundation, Inc., 51 Franklin/g' {} \; \ -exec sed -i -e 's/Place, Suite 330, Boston, MA.02111-1307.USA/Street, Fifth Floor, Boston, MA 02110-1335 USA/g' {} \; \ -exec sed -i -e 's/MA.*.....-1307/MA 02110-1335/g' {} \;	2019-05-10 20:52:00 +03:00
luz.paz	3dd01669b4	Misc. typos Found via `codespell -i 3 -w --skip="./debian/po" -I ../mariadb-server-word-whitelist.txt ./cmake/ ./debian/ ./Docs/ ./include/ ./man/ ./plugin/ ./strings/`	2018-04-05 15:26:57 +04:00
Monty	536215e32f	Added DBUG_ASSERT_AS_PRINTF compile flag If compiling a non DBUG binary with -DDBUG_ASSERT_AS_PRINTF asserts will be changed to printf + stack trace (of stack trace are enabled). - Changed #ifndef DBUG_OFF to #ifdef DBUG_ASSERT_EXISTS for those DBUG_OFF that was just used to enable assert - Assert checking that could greatly impact performance where changed to DBUG_ASSERT_SLOW which is not affected by DBUG_ASSERT_AS_PRINTF - Added one extra option to my_print_stacktrace() to get more silent in case of stack trace printing as part of assert.	2017-08-24 01:05:50 +02:00
Sergei Golubchik	da4d71d10d	Merge branch '10.1' into 10.2	2017-03-30 12:48:42 +02:00
iangilfillan	f0ec34002a	Correct FSF address	2017-03-10 18:21:29 +01:00
Alexander Barkov	ee19806b8e	MDEV-9711 NO PAD collations Based on the patch from Daniil Medvedev (a Google Summer of Code task)	2016-09-06 12:50:02 +04:00
Alexander Barkov	1ca595fbf7	LDML refactoring for "MDEV-9711 NO PAD collations" - Moving detection of the MY_CS_CSSORT, MY_CS_PUREASCII, MY_CS_NONASCII flags of loadable collations from add_collation() in mysys.c to my_cset_init_8bit() and my_coll_init_simple() in ctype-simple.c. - Adding tests that these flags are set properly for loadable collations - Moving LDML test related .xml files from mysql-test/std_data/ to mysql-test/std_data/ldml/, as there will be more .xml test files	2016-09-03 09:05:56 +04:00
Michael Widenius	9276e0e911	Cleanup of my_hash_sort patch strings/ctype-uca.c: HASH needs to be done in opposite order to preserve partitioned tables	2014-09-12 14:45:11 +03:00
Michael Widenius	c4f5326bb7	MDEV-6255 DUPLICATE KEY Errors on SELECT .. GROUP BY that uses temporary and filesort. The problem was that my_hash_sort didn't properly delete end-space characters properly, so strings that should compare identically was seen as different strings. (Space was handled correctly, but not NBSP) This caused duplicate key errors when a heap table was converted to Aria as part of overflow in group by. Fixed by removing all characters that compares as end space when creating a hash. Other things: - Fixed that --sorted_results also works for errors in mysqltest. - Speed up hash by not comparing strings that has different hash. - Speed up many my_hash_sort functions by using registers to calculate hash instead of pointers. This was previously done for some functions, but not for all. - Made a macro of the hash function, to simplify code and to be able to experiment with new hash functions. client/mysqltest.cc: Fixed that --sorted_results also works for error messages. mysql-test/r/ctype_partitions.result: New test to ensure that partitions on hash works mysql-test/suite/multi_source/gtid.result: Updated result mysql-test/suite/multi_source/gtid.test: Test that --sorted_result works for error messages mysql-test/suite/multi_source/gtid_ignore_duplicates.result: Updated result mysql-test/suite/multi_source/gtid_ignore_duplicates.test: Updated result mysql-test/suite/multi_source/load_data.result: Updated result mysql-test/suite/multi_source/load_data.test: Updated result mysql-test/t/ctype_partitions.test: New test to ensure that partitions on hash works storage/heap/hp_write.c: Speed up hash by not comparing strings that has different hash. storage/maria/ma_check.c: Extra debug strings/ctype-bin.c: Use macro for hash function strings/ctype-latin1.c: Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-mb.c: Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-simple.c: Use macro for hash function Use same variable names as in other my_hash_sort functions. Update my_hash_sort_simple() to properly remove end space (patch by Bar) strings/ctype-uca.c: Ignore duplicated space inside strings and end space in my_hash_sort_uca(). This fixed MDEV-6255 Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-ucs2.c: Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-utf8.c: Use macro for hash function Use registers to calculate hash (speedup) strings/strings_def.h: Made a macro of the hash function, to simplify code and to be able to experiment with new hash functions.	2014-09-11 22:42:35 +03:00
Sergei Golubchik	76f0b94bb0	merge with 5.3 sql/sql_insert.cc: CREATE ... IF NOT EXISTS may do nothing, but it is still not a failure. don't forget to my_ok it. **** CREATE ... IF NOT EXISTS may do nothing, but it is still not a failure. don't forget to my_ok it. sql/sql_table.cc: small cleanup **** small cleanup	2011-10-19 21:45:18 +02:00
Michael Widenius	785695e7c3	Flush DBUG log in case of DBUG_ASSERT() Added strings_def.h into strings library to be able to have a DBUG_ASSERT() version without _db_flush() call (as strings.a should not depend on dbug.a) Remove include of m_string.h in all string files (as it's included by string_def.h). Fixed include order. Changed "m_ctype.h" -> <m_ctype.h> include/my_dbug.h: Flush DBUG log in case of DBUG_ASSERT() strings/bchange.c: Include strings_def.h strings/bcmp.c: Include strings_def.h strings/bfill.c: Include strings_def.h strings/bmove.c: Include strings_def.h strings/bmove512.c: Include strings_def.h strings/bmove_upp.c: Include strings_def.h strings/conf_to_src.c: Include strings_def.h Fixed copyright strings/ctype-big5.c: Include strings_def.h strings/ctype-bin.c: Include strings_def.h strings/ctype-cp932.c: Include strings_def.h strings/ctype-czech.c: Include strings_def.h strings/ctype-euc_kr.c: Include strings_def.h strings/ctype-eucjpms.c: Include strings_def.h strings/ctype-extra.c: Include strings_def.h strings/ctype-gbk.c: Include strings_def.h strings/ctype-latin1.c: Include strings_def.h strings/ctype-mb.c: Include strings_def.h strings/ctype-simple.c: Include strings_def.h strings/ctype-sjis.c: Include strings_def.h strings/ctype-tis620.c: Include strings_def.h strings/ctype-uca.c: Include strings_def.h strings/ctype-ucs2.c: Include strings_def.h strings/ctype-ujis.c: Include strings_def.h strings/ctype-utf8.c: Include strings_def.h strings/ctype-win1250ch.c: Include strings_def.h strings/ctype.c: Include strings_def.h strings/decimal.c: Include strings_def.h strings/do_ctype.c: Include strings_def.h strings/int2str.c: Include strings_def.h strings/is_prefix.c: Include strings_def.h strings/llstr.c: Include strings_def.h strings/longlong2str.c: Include strings_def.h strings/longlong2str_asm.c: Include strings_def.h strings/my_strchr.c: Include strings_def.h strings/my_strtoll10.c: Include strings_def.h strings/my_vsnprintf.c: Include strings_def.h strings/r_strinstr.c: Include strings_def.h strings/str2int.c: Include strings_def.h strings/str_alloc.c: Include strings_def.h strings/str_test.c: Include strings_def.h Fixed compiler warnings strings/strappend.c: Include strings_def.h strings/strcend.c: Include strings_def.h strings/strcont.c: Include strings_def.h strings/strend.c: Include strings_def.h strings/strfill.c: Include strings_def.h strings/strinstr.c: Include strings_def.h strings/strmake.c: Include strings_def.h strings/strmov.c: Include strings_def.h strings/strmov_overlapp.c: Include strings_def.h strings/strnlen.c: Include strings_def.h strings/strnmov.c: Include strings_def.h strings/strstr.c: Include strings_def.h strings/strto.c: Include strings_def.h strings/strtod.c: Include strings_def.h strings/strtol.c: Include strings_def.h strings/strtoll.c: Include strings_def.h strings/strtoul.c: Include strings_def.h strings/strtoull.c: Include strings_def.h strings/strxmov.c: Include strings_def.h strings/strxnmov.c: Include strings_def.h strings/uctypedump.c: Include strings_def.h Fixed compiler warnings Removed double include of m_ctype.h strings/udiv.c: Include strings_def.h strings/xml.c: Include strings_def.h	2011-01-30 12:41:44 +02:00

28 commits