Commit graph

1832 commits

Author SHA1 Message Date
Alexander Barkov
8020b1bd73 MDEV-30034 UNIQUE USING HASH accepts duplicate entries for tricky collations
- Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars()
  and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.
  The flag defines whether strnncollsp_nchars() should emulate trailing
  spaces that may have been trimmed earlier (e.g. by InnoDB CHAR compression).
  This is important for NOPAD collations.

  For example, with this input:
   - str1= 'a '    (Latin letter a followed by one space)
   - str2= 'a  '   (Latin letter a followed by two spaces)
   - nchars= 3
  if the flag is given, strnncollsp_nchars() will virtually restore
  one trailing space to str1, padding it to nchars (3) characters, and
  compare the two strings as equal:
  - str1= 'a  '  (one extra trailing space emulated)
  - str2= 'a  '  (as is)

  If the flag is not given, strnncollsp_nchars() does not add trailing
  virtual spaces, so in case of a NOPAD collation, str1 will be compared
  as less than str2 because it is shorter.

- Field_string::cmp_prefix() now passes the new flag.
  Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do
  not pass the new flag.

- The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc
  (which handles the CHAR data type) now also passes the new flag.

- Fixing UCA collations to respect the new flag.
  Other collations are possibly also affected; however,
  I was not able to write an SQL script demonstrating the problem.
  Other collations will be extended to respect this flag in a separate
  patch later.

- Changing the meaning of the last parameter of Field::cmp_prefix()
  from "number of bytes" (internal length)
  to "number of characters" (user visible length).

  The code calling cmp_prefix() from handler.cc was wrong.
  After this change, the call in handler.cc became correct.

  The code calling cmp_prefix() from key_rec_cmp() in key.cc
  was adjusted according to this change.

- Old strnncollsp_nchars() related tests in unittest/strings/strings-t.c
  now pass the new flag.
  A few new tests were also added, without the flag.
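A minimal sketch of the comparison rule described above, assuming a single-byte character set and a plain binary NOPAD ordering (real UCA collations compare weights, not bytes). The function nopad_cmp_nchars() and the flag SKETCH_EMULATE_TRIMMED_TRAILING_SPACES are hypothetical stand-ins for strnncollsp_nchars() and MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES, not the real API.

```c
#include <stddef.h>

/* Hypothetical stand-in for MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES */
#define SKETCH_EMULATE_TRIMMED_TRAILING_SPACES 1u

/*
  Hypothetical sketch, not the MariaDB implementation.
  Compares at most nchars single-byte characters of a and b using
  binary NOPAD rules: trailing spaces are significant, and a string
  that ends earlier sorts before a longer one it is a prefix of.
  With the flag set, a string shorter than nchars behaves as if it
  were padded with spaces up to nchars characters.
  Returns <0, 0 or >0.
*/
static int nopad_cmp_nchars(const char *a, size_t alen,
                            const char *b, size_t blen,
                            size_t nchars, unsigned flags)
{
  int emulate = (flags & SKETCH_EMULATE_TRIMMED_TRAILING_SPACES) != 0;
  size_t i;
  for (i = 0; i < nchars; i++)
  {
    /* -1 marks "string already ended"; it sorts before any byte */
    int ca = i < alen ? (unsigned char) a[i] : (emulate ? ' ' : -1);
    int cb = i < blen ? (unsigned char) b[i] : (emulate ? ' ' : -1);
    if (ca != cb)
      return ca < cb ? -1 : 1;
    if (ca == -1)   /* both strings ended at the same position */
      return 0;
  }
  return 0;
}
```

With the inputs from the example, nopad_cmp_nchars("a ", 2, "a  ", 3, 3, SKETCH_EMULATE_TRIMMED_TRAILING_SPACES) returns 0, while passing 0 as flags returns a negative value because str1 ends first.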
2023-04-04 12:30:50 +04:00
Oleksandr Byelkin
ac5a534a4c Merge remote-tracking branch '10.4' into 10.5 2023-03-31 21:32:41 +02:00
Christian Gonzalez
8b0f766c6c Minimize unsafe C functions usage
Replace calls to `sprintf` and `strcpy` with the safer alternatives `snprintf`
and `safe_strcpy` in the following directories:

- libmysqld
- mysys
- sql-common
- strings
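The kind of replacement described above can be sketched as follows. sketch_safe_strcpy() is a hypothetical helper written for illustration, not the actual safe_strcpy() from the MariaDB sources (its exact signature and return value are not given here); format_id() is a hypothetical example of a sprintf() call rewritten with snprintf(), which never writes past the buffer size and returns the length it would have needed.

```c
#include <stdio.h>
#include <string.h>

/*
  Hypothetical helper (not MariaDB's safe_strcpy): copies at most
  dst_size - 1 bytes, always NUL-terminates when dst_size > 0, and
  returns 1 if src did not fit, 0 otherwise.
*/
static int sketch_safe_strcpy(char *dst, size_t dst_size, const char *src)
{
  size_t len = strlen(src);
  if (dst_size == 0)
    return 1;                     /* no room even for the terminator */
  if (len >= dst_size)
  {
    memcpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return 1;                     /* truncated */
  }
  memcpy(dst, src, len + 1);      /* copy including the terminator */
  return 0;
}

/*
  Before: sprintf(buf, "%s-%d", name, id);   (can overflow buf)
  After:  snprintf() writes at most size bytes, terminator included.
  Returns 1 if the output was truncated or snprintf() failed.
*/
static int format_id(char *buf, size_t size, const char *name, int id)
{
  int n = snprintf(buf, size, "%s-%d", name, id);
  return n < 0 || (size_t) n >= size;
}
```

Returning a truncation indicator instead of silently cutting the string lets callers decide whether a shortened result is acceptable.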

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
2023-03-08 10:36:25 +00:00
Marko Mäkelä
9267160c11 Merge 10.10 into 10.11 2023-03-06 13:39:12 +02:00
Alexander Barkov
0bf400a19a A cleanup for MDEV-30695 Refactor case folding data types in Asian collations
Adding "const" qualifiers to casefold_info_st::page
2023-03-03 04:49:28 +04:00
Alexander Barkov
965bdf3e66 MDEV-30746 Regression in ucs2_general_mysql500_ci
1. Adding a separate MY_COLLATION_HANDLER
   my_collation_ucs2_general_mysql500_ci_handler
   implementing a proper order for ucs2_general_mysql500_ci.
   The problem happened because ucs2_general_mysql500_ci
   erroneously used my_collation_ucs2_general_ci_handler.

2. Cosmetic changes: Renaming:
   - plane00_mysql500 to my_unicase_mysql500_page00
   - my_unicase_pages_mysql500 to my_unicase_mysql500_pages
   to use the same naming style as:
   - my_unicase_default_page00
   - my_unicase_default_pages

3. Moving code fragments from
   - handler::check_collation_compatibility() in handler.cc
   - upgrade_collation() in table.cc
   into new methods in class Charset, to make the code easier to reuse.
2023-03-01 15:38:02 +04:00
Marko Mäkelä
95d51369c9 Merge 10.10 into 10.11 2023-02-28 10:52:42 +02:00
Marko Mäkelä
f14d9fa09a Merge 10.9 into 10.10 2023-02-28 10:43:29 +02:00
Marko Mäkelä
c3246e4bf0 Merge 10.8 into 10.9 2023-02-28 10:37:11 +02:00
Alexander Barkov
b62123e0d5 MDEV-30716 Wrong casefolding in xxx_unicode_520_ci for U+0700..U+07FF
The array my_unicase_pages_unicode520[7] erroneously mapped to plane06
instead of plane07.
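To show why a single wrong pointer matters, here is a minimal sketch of a two-level case-folding lookup, assuming the high byte of a BMP code point selects a 256-entry page and the low byte selects the entry. All names are hypothetical, and the pages hold identity-mapped placeholder data so the result reveals which page was consulted; the real my_unicase_pages_unicode520 tables hold actual case mappings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry type: uppercase and lowercase mapping of one character */
typedef struct { uint32_t toupper, tolower; } sketch_casefold_char;

/* Placeholder pages: every character maps to itself */
static sketch_casefold_char plane06[256], plane07[256];

/* pages[hi] covers U+hi00..U+hiFF; NULL means "no case mappings" */
static const sketch_casefold_char *pages_ok[256];
static const sketch_casefold_char *pages_buggy[256];

static void sketch_init_tables(void)
{
  uint32_t i;
  for (i = 0; i < 256; i++)
  {
    plane06[i].toupper = plane06[i].tolower = 0x0600 + i;
    plane07[i].toupper = plane07[i].tolower = 0x0700 + i;
  }
  pages_ok[6] = plane06;
  pages_ok[7] = plane07;
  pages_buggy[6] = plane06;
  pages_buggy[7] = plane06;   /* the MDEV-30716 mistake: index 7 -> plane06 */
}

static uint32_t sketch_toupper(const sketch_casefold_char *const *pages,
                               uint32_t code)
{
  const sketch_casefold_char *page;
  if (code > 0xFFFF)
    return code;              /* this sketch only covers the BMP */
  page = pages[code >> 8];
  return page ? page[code & 0xFF].toupper : code;
}
```

With the buggy table, every character in U+0700..U+07FF silently takes the mapping of the character 0x100 positions earlier, which is why the fix is a one-line change.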
2023-02-23 23:40:45 +04:00
Helmut Grohne
6f6fa3bec2 MDEV-30694: Cross building on x86_64 to arch i686 fails
Currently, cross compiling on x86_64 for the i686 architecture fails
with the error:

> ctype-uca1400data.h
/bin/sh: 1: uca-dump: not found

This commit makes sure that uca-dump is treated correctly
when cross compiling MariaDB to another architecture.
2023-02-22 16:01:46 +00:00
Alexander Barkov
33f8f92b74 MDEV-30695 Refactor case folding data types in Asian collations
This is a non-functional change and should not change the server behavior.

Casefolding information is now stored in items of a new data type MY_CASEFOLD_CHARACTER:

typedef struct casefold_info_char_t
{
  uint32 toupper;
  uint32 tolower;
} MY_CASEFOLD_CHARACTER;

Before this change, casefolding tables for Asian collations were stored in:

typedef struct unicase_info_char_st
{
  uint32 toupper;
  uint32 tolower;
  uint32 sort;
} MY_UNICASE_CHARACTER;

The "sort" member was not used in the code handling Asian collations;
it only wasted space. (It is only used by the Unicode _general_ci and
_general_mysql500_ci collations.)

Unicode collations (at least UCA and _bin) should also be refactored later,
but as a separate task.
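The space saving can be checked with a small sketch that declares both layouts side by side. It assumes uint32_t from <stdint.h> in place of MariaDB's uint32 typedef and a typical ABI where three 32-bit members need no padding; sketch_bytes_saved() is a hypothetical helper that assumes 256 characters per page.

```c
#include <stddef.h>
#include <stdint.h>

/* New layout: only the case mappings (uint32_t stands in for uint32) */
typedef struct casefold_info_char_t
{
  uint32_t toupper;
  uint32_t tolower;
} MY_CASEFOLD_CHARACTER;

/* Old layout: the extra "sort" member is unused by Asian collations */
typedef struct unicase_info_char_st
{
  uint32_t toupper;
  uint32_t tolower;
  uint32_t sort;
} MY_UNICASE_CHARACTER;

/* Hypothetical helper: bytes saved for n_pages pages of 256 characters */
static size_t sketch_bytes_saved(size_t n_pages)
{
  return n_pages * 256 *
         (sizeof(MY_UNICASE_CHARACTER) - sizeof(MY_CASEFOLD_CHARACTER));
}
```

On such an ABI each entry shrinks from 12 to 8 bytes, so every 256-character page saves 1024 bytes.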
2023-02-21 14:10:25 +04:00
Alexander Barkov
7e341cc740 MDEV-30692 conf_to_src is not up to date
Fixing conf_to_src.c according to changes made by
 a206658b98

Regenerating ctype-extra.c at the same time, to replace the manually
edited indentation with the automatically generated one.
2023-02-21 11:07:25 +04:00
Alexander Barkov
7f6b648d7d MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8
String length growth during upper/lower conversion
in Unicode collations depends only on the underlying MY_UNICASE_INFO
used in the collation.

Maintaining separate members CHARSET_INFO::caseup_multiply and
CHARSET_INFO::casedn_multiply duplicated this information
and caused bugs like this one (when MY_UNICASE_INFO and case??_multiply
went out of sync because of incomplete CHARSET_INFO initialization).

Fix:

Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply
from members to virtual functions.
The virtual functions in Unicode collations calculate case conversion
growth factors from the MY_UNICASE_INFO. This guarantees that the growth
factors are always in sync with the MY_UNICASE_INFO.
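A minimal sketch of deriving the growth factor from the case table instead of storing it separately, assuming UTF-8 output and a flat table of (code, uppercase) pairs. sketch_case_pair, sketch_utf8_len() and sketch_caseup_multiply() are hypothetical; the real virtual functions work on MY_UNICASE_INFO. As an example, U+0251 (LATIN SMALL LETTER ALPHA, 2 bytes in UTF-8) uppercases to U+2C6D (3 bytes), so a table containing it needs a factor of 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a code point and its uppercase mapping */
typedef struct { uint32_t code, upper; } sketch_case_pair;

/* Number of bytes needed to encode wc in UTF-8 */
static unsigned sketch_utf8_len(uint32_t wc)
{
  if (wc < 0x80) return 1;
  if (wc < 0x800) return 2;
  if (wc < 0x10000) return 3;
  return 4;
}

/*
  Worst-case byte growth of UPPER(), computed from the table itself,
  so it cannot drift out of sync with the mappings.
*/
static unsigned sketch_caseup_multiply(const sketch_case_pair *t, size_t n)
{
  unsigned mult = 1;
  size_t i;
  for (i = 0; i < n; i++)
  {
    unsigned from = sketch_utf8_len(t[i].code);
    unsigned to = sketch_utf8_len(t[i].upper);
    unsigned m = (to + from - 1) / from;   /* ceil(to / from) */
    if (m > mult)
      mult = m;
  }
  return mult;
}
```

Because the factor is recomputed from the same data that drives the conversion, a table change such as adding U+0251 automatically raises the buffer size the caller reserves.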
2023-02-17 17:33:27 +04:00
Marko Mäkelä
1fd0099839 Merge 10.10 into 10.11 2023-02-16 11:41:18 +02:00
Marko Mäkelä
345356b868 Merge 10.9 into 10.10 2023-02-16 11:36:38 +02:00
Marko Mäkelä
0d55914d96 Merge 10.8 into 10.9 2023-02-16 10:25:34 +02:00
Marko Mäkelä
dbab3e8d90 Merge 10.6 into 10.8 2023-02-10 13:43:53 +02:00
Marko Mäkelä
6aec87544c Merge 10.5 into 10.6 2023-02-10 13:03:01 +02:00
Marko Mäkelä
c41c79650a Merge 10.4 into 10.5 2023-02-10 12:02:11 +02:00
Alexander Barkov
0845bce0d9 MDEV-30556 UPPER() returns an empty string for U+0251 in Unicode-5.2.0+ collations for utf8 2023-02-03 18:18:32 +04:00
Oleksandr Byelkin
c7c415734d Merge branch '10.10' into 10.11 2023-01-31 11:07:08 +01:00
Oleksandr Byelkin
76bcea3154 Merge branch '10.9' into 10.10 2023-01-31 11:01:48 +01:00
Oleksandr Byelkin
de2d089942 Merge branch '10.8' into 10.9 2023-01-31 10:37:31 +01:00
Oleksandr Byelkin
638625278e Merge branch '10.7' into 10.8 2023-01-31 09:57:52 +01:00
Oleksandr Byelkin
b923b80cfd Merge branch '10.6' into 10.7 2023-01-31 09:33:58 +01:00
Oleksandr Byelkin
c3a5cf2b5b Merge branch '10.5' into 10.6 2023-01-31 09:31:42 +01:00
Oleksandr Byelkin
7fa02f5c0b Merge branch '10.4' into 10.5 2023-01-27 13:54:14 +01:00
Sergei Golubchik
0c27559994 MDEV-26817 runtime error: index 24320 out of bounds for type 'json_string_char_classes [128]' *and* ASAN: global-buffer-overflow on address ... READ of size 4 on SELECT JSON_VALID
Protect against out-of-bounds array access.

This was already done in all other places; this one was the only one missed.
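The general shape of such a guard can be sketched as follows, assuming a 128-entry class table indexed by character value, where the value may be a decoded code point far above 127 (such as 24320 in the report). The enum values and function names are hypothetical, not the actual json_lib.c definitions.

```c
/* Hypothetical character classes for a JSON scanner */
enum sketch_char_class { SC_ETC, SC_LCURB, SC_RCURB, SC_QUOTE, SC_BAD, SC_NON_ASCII };

/* Entries not listed default to 0, i.e. SC_ETC */
static const enum sketch_char_class sketch_classes[128] = {
  ['{'] = SC_LCURB, ['}'] = SC_RCURB, ['"'] = SC_QUOTE
};

/* Check the range before indexing, so large code points never read past the table */
static enum sketch_char_class sketch_classify(int c)
{
  if (c < 0)
    return SC_BAD;
  if (c >= (int) (sizeof sketch_classes / sizeof sketch_classes[0]))
    return SC_NON_ASCII;
  return sketch_classes[c];
}
```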
2023-01-20 19:43:15 +01:00
Marko Mäkelä
3a237f7666 Merge 10.10 into 10.11 2023-01-11 11:13:56 +02:00
Marko Mäkelä
cae5a0328b Merge 10.9 into 10.10 2023-01-10 15:06:25 +02:00
Alexander Freiherr von Buddenbrock
0225159a8d MDEV-29381: JSON paths containing dashes are reported as syntax errors in procedures

MDEV-22224 broke the parsing of keys containing hyphens by setting
the state transitions for parsing keys to JE_SYN (syntax error) when
they encounter a hyphen. However, JSON key names may contain hyphens and
still be valid JSON.

This patch changes the state transition table so that key names with
hyphens remain valid. Note that unquoted key names in paths like
$.key-name are also valid again. This restores the previous behaviour,
in which hyphens were considered part of the P_ETC character class.
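A toy transition table illustrating the fix, assuming a scanner that classifies each character and looks up the next state in a [state][class] table. Here '-' is placed in the same class as ordinary key characters (the role P_ETC plays in the commit), so $.key-name scans as one key; mapping '-' to a class whose transition is a syntax error would reproduce the regression. All names are hypothetical and far simpler than the real json_lib.c tables.

```c
/* Hypothetical character classes and states for scanning one unquoted key */
enum sk_class { SK_ETC, SK_DELIM, SK_BAD, SK_NCLASSES };
enum sk_state { SK_KEY_START, SK_IN_KEY, SK_DONE, SK_SYNTAX_ERROR };

static enum sk_class sk_classify(unsigned char c)
{
  if (c == '.' || c == '[' || c == '\0')
    return SK_DELIM;            /* ends the current key */
  if (c == ' ' || c == '"' || c == '$' || c == ']')
    return SK_BAD;              /* not allowed in an unquoted key */
  return SK_ETC;                /* letters, digits, '_' and also '-' */
}

/* next state = sk_next[current state][character class] */
static const enum sk_state sk_next[2][SK_NCLASSES] = {
  /* SK_KEY_START */ { SK_IN_KEY, SK_SYNTAX_ERROR, SK_SYNTAX_ERROR },
  /* SK_IN_KEY    */ { SK_IN_KEY, SK_DONE,         SK_SYNTAX_ERROR },
};

/* Returns the length of the unquoted key at s, or -1 on a syntax error */
static int sk_scan_key(const char *s)
{
  enum sk_state st = SK_KEY_START;
  int i;
  for (i = 0; ; i++)
  {
    st = sk_next[st][sk_classify((unsigned char) s[i])];
    if (st == SK_SYNTAX_ERROR)
      return -1;
    if (st == SK_DONE)
      return i;
  }
}
```

Keeping the decision in a table means the fix is a change of one character's class, with no change to the scanning loop.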
2023-01-06 12:55:51 +05:30
Marko Mäkelä
0aca3012a1 Merge 10.10 into 10.11 2022-12-14 09:18:30 +02:00
Marko Mäkelä
fa389b9098 Merge 10.9 into 10.10 2022-12-14 08:57:39 +02:00
Marko Mäkelä
b7914f562d Merge 10.8 into 10.9 2022-12-13 18:24:51 +02:00
Marko Mäkelä
d7a4ce3c80 Merge 10.7 into 10.8 2022-12-13 18:11:24 +02:00
Marko Mäkelä
25b91c3f13 Merge 10.6 into 10.7 2022-12-13 18:01:49 +02:00
Marko Mäkelä
a8a5c8a1b8 Merge 10.5 into 10.6 2022-12-13 16:58:58 +02:00
Marko Mäkelä
1dc2f35598 Merge 10.4 into 10.5 2022-12-13 14:39:18 +02:00
Marko Mäkelä
fdf43b5c78 Merge 10.3 into 10.4 2022-12-13 11:37:33 +02:00
Marko Mäkelä
64071d30bd Merge 10.10 into 10.11 2022-12-07 10:00:52 +02:00
Marko Mäkelä
3ff4eb07ed Merge 10.9 into 10.10 2022-12-07 09:49:38 +02:00
Marko Mäkelä
23f705f3a2 Merge 10.8 into 10.9 2022-12-07 09:43:38 +02:00
Marko Mäkelä
b3c254339b Merge 10.7 into 10.8 2022-12-07 09:43:13 +02:00
Marko Mäkelä
9e27e53dfa Merge 10.6 into 10.7 2022-12-07 09:39:46 +02:00
Marko Mäkelä
e55397a46d Merge 10.5 into 10.6 2022-12-05 18:04:23 +02:00
Jan Lindström
4eb8e51c26 Merge 10.4 into 10.5 2022-11-30 13:10:52 +02:00
Alexander Barkov
931549ff66 MDEV-27670 Assertion `(cs->state & 0x20000) == 0' failed in my_strnncollsp_nchars_generic_8bit
Also fixes:

MDEV-27768 MDEV-25440: Assertion `(cs->state & 0x20000) == 0' failed in my_strnncollsp_nchars_generic_8bit

The "strnncollsp_nchars" virtual function pointer for tis620_thai_nopad_ci
was incorrectly initialized to a generic function
my_strnncollsp_nchars_generic_8bit(), which crashed on assert.

Implementing a tis620-specific version of the function.
2022-11-22 14:03:23 +04:00
Alexander Barkov
6216a2dfa2 MDEV-29473 UBSAN: Signed integer overflow: X * Y cannot be represented in type 'int' in strings/dtoa.c
Fixing a few problems revealed by UBSAN in type_float.test

- multiplication overflow in dtoa.c

- uninitialized Field::geom_type (and Field::srid as well)

- Wrong callback function types were used in combination with SHOW_FUNC.
  Changes to the mysql_show_var_func data type definition made by the
  following commits were not properly propagated throughout the code:
    b4ff64568c
    18feb62fee
    0ee879ff8a

  Adding a helper SHOW_FUNC_ENTRY() function and replacing
  all mysql_show_var_func declarations using SHOW_FUNC
  with SHOW_FUNC_ENTRY, so that mismatched mysql_show_var_func types
  are caught at compile time in the future.
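The compile-time check can be sketched as a typed inline helper, assuming the value slot of a status-variable entry is an untyped void * that a bare cast would fill with a callback of any signature. The names and the simplified parameter list are hypothetical; the real mysql_show_var_func takes MariaDB-specific types. Converting between function pointers and void * is not guaranteed by ISO C, but POSIX platforms and common compilers support it.

```c
#include <stddef.h>

typedef struct sketch_show_var sketch_show_var;

/* Simplified, hypothetical callback type */
typedef int (*sketch_show_var_func)(void *thd, sketch_show_var *var, void *buff);

struct sketch_show_var
{
  const char *name;
  void *value;          /* untyped slot: a bare cast would accept anything */
  int type;
};

/*
  Only a correctly typed callback can be passed here, so a mismatched
  signature produces an incompatible-pointer-type diagnostic at compile
  time instead of undefined behavior at run time.
*/
static inline void *sketch_show_func_entry(sketch_show_var_func func)
{
  return (void *) func;
}

/* A callback with the expected signature */
static int sketch_show_uptime(void *thd, sketch_show_var *var, void *buff)
{
  (void) thd;
  (void) var;
  (void) buff;
  return 0;
}
```

A usage such as `sketch_show_var v = { "Uptime", sketch_show_func_entry(sketch_show_uptime), 0 };` compiles cleanly, whereas passing a function with a different parameter list to sketch_show_func_entry() is flagged by the compiler, which builds can turn into a hard error.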
2022-11-17 17:51:01 +04:00
Jan Lindström
90608bd649 Merge 10.10 into 10.11 2022-09-06 11:32:54 +03:00