mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-28 17:54:16 +01:00

Author	SHA1	Message	Date
Oleksandr Byelkin	036df5f970	Merge branch '10.10' into 10.11	2023-08-08 14:57:31 +02:00
Oleksandr Byelkin	ced243a099	Merge branch '10.9' into 10.10	2023-08-05 20:34:09 +02:00
Oleksandr Byelkin	34a8e78581	Merge branch '10.6' into 10.9	2023-08-04 08:01:06 +02:00
Oleksandr Byelkin	6bf8483cac	Merge branch '10.5' into 10.6	2023-08-01 15:08:52 +02:00
Oleksandr Byelkin	7564be1352	Merge branch '10.4' into 10.5	2023-07-26 16:02:57 +02:00
Oleksandr Byelkin	f52954ef42	Merge commit '10.4' into 10.5	2023-07-20 11:54:52 +02:00
Alexander Barkov	03c2157dd6	MDEV-28384 UBSAN: null pointer passed as argument 1, which is declared to never be null in my_strnncoll_binary on SELECT ... COUNT or GROUP_CONCAT Also fixes: MDEV-30982 UBSAN: runtime error: null pointer passed as argument 2, which is declared to never be null in my_strnncoll_binary on DELETE Calling memcmp() with a NULL pointer is undefined behaviour according to the C standard, even if the length argument is 0. Adding tests for length==0 before calling memcmp() into: - my_strnncoll_binary() - my_strnncoll_8bit_bin	2023-07-20 11:56:19 +04:00
Marko Mäkelä	c04284e747	Merge 10.10 into 10.11	2023-06-07 15:01:43 +03:00
Marko Mäkelä	82230aa423	Merge 10.9 into 10.10	2023-06-07 14:48:37 +03:00
anson1014	1db4fc543b	Ensure that source files contain only valid UTF8 encodings (#2188) Modern software (including text editors, static analysis software, and web-based code review interfaces) often requires source code files to be interpretable via a consistent character encoding, with UTF-8 or ASCII (a strict subset of UTF-8) as the default. Several of the MariaDB source files contain bytes that are not valid in either the UTF-8 or ASCII encodings, but instead represent strings encoded in the ISO-8859-1/Latin-1 or ISO-8859-2/Latin-2 encodings. These inconsistent encodings may prevent software from correctly presenting or processing such files. Converting all source files to valid UTF8 characters will ensure correct handling. Comments written in Czech were replaced with lightly-corrected translations from Google Translate. Additionally, comments describing the proper handling of special characters were changed so that the comments are now purely UTF8. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc. Co-authored-by: Andrew Hutchings <andrew@linuxjedi.co.uk>	2023-05-19 13:21:34 +01:00
Rucha Deodhar	b7b8a9ee43	MDEV-23187: Assorted assertion failures in json_find_path with certain collations Fix by Alexey Botchkov The 'value_len' is calculated wrong for the multibyte charsets. In the read_strn() function we get the length of the string with the final ' " ' character. So have to subtract its length from the value_len. And the length of '1' isn't correct for the ucs2 charset (must be 2).	2023-05-16 01:52:16 +05:30
Rucha Deodhar	3b34454c9d	MDEV-23187: Assorted assertion failures in json_find_path with certain collations Analysis: When we have negative index, the value in array_counter[] array is going to be -1 at some point ( because in case of negative index in json path, the initial value for a path with negative index is -<size_of_array>, and as we move forward in array while parsing it and finding path, this value increments). Since SKIPPED_STEP_MARK, is maximum uint value, it gets compared to some int value in the array and eventually equates to -1 and messes with path. Fix: Make SKIPPED_STEP_MARK maximum of INT32.	2023-05-15 12:17:30 +05:30
Oleksandr Byelkin	06d03dcdd3	Merge branch '10.10' into 10.11	2023-05-03 21:05:34 +02:00
Marko Mäkelä	e02a2f4e9f	Merge 10.9 into 10.10	2023-05-02 10:22:43 +03:00
Marko Mäkelä	d8997f875e	Merge 10.8 into 10.9	2023-04-28 13:39:33 +03:00
Marko Mäkelä	7d967423fe	MDEV-31147 json_normalize does not work correctly with MSAN build json_normalize_number(): Avoid accessing str past str_len. The function would seem to work incorrectly when some digits are not followed by a decimal point (.) or an exponent (E or e).	2023-04-28 12:15:45 +03:00
Alexander Barkov	b0ecf4693d	Merge remote-tracking branch 'origin/10.10' into 10.11	2023-04-26 13:10:57 +04:00
Alexander Barkov	c21745dbe4	MDEV-30577 Case folding for uca1400 collations is not up to date Adding casefolding for Unicode-14.0.0 into uca1400 collations.	2023-04-18 11:31:05 +04:00
Alexander Barkov	6075f12c65	MDEV-31071 Refactor case folding data types in Unicode collations This is a non-functional change. It changes the way how case folding data and weight data (for simple Unicode collations) are stored: - Removing data types MY_UNICASE_CHARACTER, MY_UNICASE_INFO - Using data types MY_CASEFOLD_CHARACTER, MY_CASEFOLD_INFO instead. This patch changes simple Unicode collations in a similar way how MDEV-30695 previously changed Asian collations. No new MTR tests are needed. The underlying code is thoroughly covered by a number of ctype__ws.test and ctype__casefold.test files, which were added recently as a preparation for this change. Old and new Unicode data layout ------------------------------- Case folding data is now stored in separate tables consisting of MY_CASEFOLD_CHARACTER elements with two members: typedef struct casefold_info_char_t { uint32 toupper; uint32 tolower; } MY_CASEFOLD_CHARACTER; while weight data (for simple non-UCA collations xxx_general_ci and xxx_general_mysql500_ci) is stored in separate arrays of uint16 elements. Before this change case folding data and simple weight data were stored together, in tables of the following elements with three members: typedef struct unicase_info_char_st { uint32 toupper; uint32 tolower; uint32 sort; /* weights for simple collations */ } MY_UNICASE_CHARACTER; This data format was redundant, because weights (the "sort" member) were needed only for these two simple Unicode collations: - xxx_general_ci - xxx_general_mysql500_ci Adding case folding information for Unicode-14.0.0 using the old format would waste memory without purpose. Detailed changes ---------------- - Changing the underlying data types as described above - Including unidata-dump.c into the sources. This program was earlier used to dump UnicodeData.txt (e.g. https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt) into MySQL / MariaDB source files. It was originally written in 2002, but has not been distributed yet together with MySQL / MariaDB sources. - Removing the old format Unicode data earlier dumped from UnicodeData.txt (versions 3.0.0 and 5.2.0) from ctype-utf8.c. Adding Unicode data in the new format into separate header files, to maintain the code easier: - ctype-unicode300-casefold.h - ctype-unicode300-casefold-tr.h - ctype-unicode300-general_ci.h - ctype-unicode300-general_mysql500_ci.h - ctype-unicode520-casefold.h - Adding a new file ctype-unidata.c as an aggregator for the header files listed above.	2023-04-18 11:29:25 +04:00
Alexander Barkov	2ad287caad	MDEV-31069 Reuse duplicate char-to-weight conversion code in ctype-utf8.c and ctype-ucs2.c Removing similar functions from ctype-utf8.c and ctype-ucs2.c - my_tosort_utf16() - my_tosort_utf32() - my_tosort_ucs2() - my_tosort_unicode() Adding new shared functions into ctype-unidata.h: - my_tosort_unicode_bmp() - reused for utf8mb3, ucs2 - my_tosort_unicode() - reused for utf8mb4, utf16, utf32 For simplicity, the new version of my_tosort_unicode*() does not include the code handling the MY_CS_LOWER_SORT flag because: - it affects performance negatively - we don't have any collations with this flag yet anyway (This code was most likely earlier erroneously merged from MySQL's utf8_tolower_ci at some point.)	2023-04-18 10:24:05 +04:00
Alexander Barkov	30b4bb4204	MDEV-31068 Reuse duplicate case conversion code in ctype-utf8.c and ctype-ucs2.c	2023-04-18 06:44:03 +04:00
Marko Mäkelä	656c2e18b1	Merge 10.10 into 10.11	2023-04-14 13:08:28 +03:00
Marko Mäkelä	a009280e60	Merge 10.9 into 10.10	2023-04-14 12:24:14 +03:00
Marko Mäkelä	44281b88f3	Merge 10.8 into 10.9	2023-04-14 11:32:36 +03:00
Marko Mäkelä	1d1e0ab2cc	Merge 10.6 into 10.8	2023-04-12 15:50:08 +03:00
Marko Mäkelä	5bada1246d	Merge 10.5 into 10.6	2023-04-11 16:15:19 +03:00
Alexander Barkov	62e137d4d7	Merge remote-tracking branch 'origin/10.4' into 10.5	2023-04-05 16:16:19 +04:00
Alexander Barkov	8020b1bd73	MDEV-30034 UNIQUE USING HASH accepts duplicate entries for tricky collations - Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars() and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES. The flag defines if strnncollsp_nchars() should emulate trailing spaces which were possibly trimmed earlier (e.g. in InnoDB CHAR compression). This is important for NOPAD collations. For example, with this input: - str1= 'a ' (Latin letter a followed by one space) - str2= 'a  ' (Latin letter a followed by two spaces) - nchars= 3 if the flag is given, strnncollsp_nchars() will virtually restore one trailing space to str1 up to nchars (3) characters and compare two strings as equal: - str1= 'a  ' (one extra trailing space emulated) - str2= 'a  ' (as is) If the flag is not given, strnncollsp_nchars() does not add trailing virtual spaces, so in case of a NOPAD collation, str1 will be compared as less than str2 because it is shorter. - Field_string::cmp_prefix() now passes the new flag. Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do not pass the new flag. - The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc (which handles the CHAR data type) now also passes the new flag. - Fixing UCA collations to respect the new flag. Other collations are possibly also affected, however I had no success in making an SQL script demonstrating the problem. Other collations will be extended to respect this flag in a separate patch later. - Changing the meaning of the last parameter of Field::cmp_prefix() from "number of bytes" (internal length) to "number of characters" (user visible length). The code calling cmp_prefix() from handler.cc was wrong. After this change, the call in handler.cc became correct. The code calling cmp_prefix() from key_rec_cmp() in key.cc was adjusted according to this change. - Old strnncollsp_nchar() related tests in unittest/strings/strings-t.c now pass the new flag. A few new tests also were added, without the flag.	2023-04-04 12:30:50 +04:00
Oleksandr Byelkin	ac5a534a4c	Merge remote-tracking branch '10.4' into 10.5	2023-03-31 21:32:41 +02:00
Christian Gonzalez	8b0f766c6c	Minimize unsafe C functions usage Replace calls to `sprintf` and `strcpy` by the safer options `snprintf` and `safe_strcpy` in the following directories: - libmysqld - mysys - sql-common - strings All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.	2023-03-08 10:36:25 +00:00
Marko Mäkelä	9267160c11	Merge 10.10 into 10.11	2023-03-06 13:39:12 +02:00
Alexander Barkov	0bf400a19a	A cleanup for MDEV-30695 Refactor case folding data types in Asian collations Adding "const" qualifiers to casefold_info_st::page	2023-03-03 04:49:28 +04:00
Alexander Barkov	965bdf3e66	MDEV-30746 Regression in ucs2_general_mysql500_ci 1. Adding a separate MY_COLLATION_HANDLER my_collation_ucs2_general_mysql500_ci_handler implementing a proper order for ucs2_general_mysql500_ci The problem happened because ucs2_general_mysql500_ci erroneously used my_collation_ucs2_general_ci_handler. 2. Cosmetic changes: Renaming: - plane00_mysql500 to my_unicase_mysql500_page00 - my_unicase_pages_mysql500 to my_unicase_mysql500_pages to use the same naming style with: - my_unicase_default_page00 - my_unicase_defaul_pages 3. Moving code fragments from - handler::check_collation_compatibility() in handler.cc - upgrade_collation() in table.cc into new methods in class Charset, to reuse the code easier.	2023-03-01 15:38:02 +04:00
Marko Mäkelä	95d51369c9	Merge 10.10 into 10.11	2023-02-28 10:52:42 +02:00
Marko Mäkelä	f14d9fa09a	Merge 10.9 into 10.10	2023-02-28 10:43:29 +02:00
Marko Mäkelä	c3246e4bf0	Merge 10.8 into 10.9	2023-02-28 10:37:11 +02:00
Alexander Barkov	b62123e0d5	MDEV-30716 Wrong casefolding in xxx_unicode_520_ci for U+0700..U+07FF The array my_unicase_pages_unicode520[7] erroneously mapped to plane06 instead of plane07.	2023-02-23 23:40:45 +04:00
Helmut Grohne	6f6fa3bec2	MDEV-30694: Cross building on x86_64 to arch i686 fails Currently cross compilation on x86_64 to arch i686 fails with error: > ctype-uca1400data.h /bin/sh: 1: uca-dump: not found Commit makes sure that uca-dump is treated correctly when cross compiling MariaDB to another architecture	2023-02-22 16:01:46 +00:00
Alexander Barkov	33f8f92b74	MDEV-30695 Refactor case folding data types in Asian collations This is a non-functional change and should not change the server behavior. Casefolding information is now stored in items of a new data type MY_CASEFOLD_CHARACTER: typedef struct casefold_info_char_t { uint32 toupper; uint32 tolower; } MY_CASEFOLD_CHARACTER; Before this change, casefolding tables for Asian collations were stored in: typedef struct unicase_info_char_st { uint32 toupper; uint32 tolower; uint32 sort; } MY_UNICASE_CHARACTER; The "sort" member was not used in the code handling Asian collations, it only wasted space. (it's only used by Unicode _general_ci and _general_mysql500_ci collations). Unicode collations (at least UCA and _bin) should also be refactored later, but under terms of a separate task.	2023-02-21 14:10:25 +04:00
Alexander Barkov	7e341cc740	MDEV-30692 conf_to_src is not up to date Fixing conf_to_src.c according to changes made by `a206658b98` Re-generating ctype-extra.c at once, to fix the indentation from manually edited to automatic.	2023-02-21 11:07:25 +04:00
Alexander Barkov	7f6b648d7d	MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 String length growth during upper/lower conversion in Unicode collations depends only on the underlying MY_UNICASE_INFO used in the collation. Maintaining a separate member CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply duplicated this information and caused bugs like this (when MY_UNICASE_INFO and case??_multiply went out of sync because of incomplete CHARSET_INFO initialization). Fix: Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply from members to virtual functions. The virtual functions in Unicode collations calculate case conversion growth factors from the MY_UNICASE_INFO. This guarantees that the growth factors are always in sync with the MY_UNICASE_INFO.	2023-02-17 17:33:27 +04:00
Marko Mäkelä	1fd0099839	Merge 10.10 into 10.11	2023-02-16 11:41:18 +02:00
Marko Mäkelä	345356b868	Merge 10.9 into 10.10	2023-02-16 11:36:38 +02:00
Marko Mäkelä	0d55914d96	Merge 10.8 into 10.9	2023-02-16 10:25:34 +02:00
Marko Mäkelä	dbab3e8d90	Merge 10.6 into 10.8	2023-02-10 13:43:53 +02:00
Marko Mäkelä	6aec87544c	Merge 10.5 into 10.6	2023-02-10 13:03:01 +02:00
Marko Mäkelä	c41c79650a	Merge 10.4 into 10.5	2023-02-10 12:02:11 +02:00
Alexander Barkov	0845bce0d9	MDEV-30556 UPPER() returns an empty string for U+0251 in Unicode-5.2.0+ collations for utf8	2023-02-03 18:18:32 +04:00
Oleksandr Byelkin	c7c415734d	Merge branch '10.10' into 10.11	2023-01-31 11:07:08 +01:00
Oleksandr Byelkin	76bcea3154	Merge branch '10.9' into 10.10	2023-01-31 11:01:48 +01:00

1759 commits