Commit graph

1788 commits

Author SHA1 Message Date
Marko Mäkelä
d62b0368ca Merge 10.4 into 10.5 2022-03-29 12:59:18 +03:00
Marko Mäkelä
ae6e214fd8 Merge 10.3 into 10.4 2022-03-29 11:13:18 +03:00
Marko Mäkelä
020e7d89eb Merge 10.2 into 10.3 2022-03-29 09:53:15 +03:00
Alexander Barkov
0c4c064f98 MDEV-27743 Remove Lex::charset
This patch also fixes:

MDEV-27690 Crash on `CHARACTER SET csname COLLATE DEFAULT` in column definition
MDEV-27853 Wrong data type on column `COLLATE DEFAULT` and table `COLLATE some_non_default_collation`
MDEV-28067 Multiple conflicting column COLLATE clauses are not rejected
MDEV-28118 Wrong collation of `CAST(.. AS CHAR COLLATE DEFAULT)`
MDEV-28119 Wrong column collation on MODIFY + CONVERT
2022-03-22 17:12:15 +04:00
Alexey Botchkov
6277e7df6b MDEV-22742 UBSAN: Many overflow issues in strings/decimal.c - runtime error: signed integer overflow: x * y cannot be represented in type 'long long int' (on optimized builds).
Avoid integer overflow, do the check before the calculation.
2022-03-21 15:05:42 +04:00
Marko Mäkelä
18bb95b608 Merge 10.7 into 10.8 2022-03-14 11:52:11 +02:00
Marko Mäkelä
e67d46e4a1 Merge 10.6 into 10.7 2022-03-14 11:30:32 +02:00
Marko Mäkelä
572e34304e Merge 10.5 into 10.6 2022-03-14 10:59:46 +02:00
Marko Mäkelä
59359fb44a MDEV-24841 Build error with MSAN use-of-uninitialized-value in comp_err
The MemorySanitizer implementation in clang includes some built-in
instrumentation (interceptors) for GNU libc. In GNU libc 2.33, the
interface to the stat() family of functions was changed. Until the
MemorySanitizer interceptors are adjusted, any MSAN code builds
will act as if that the stat() family of functions failed to initialize
the struct stat.

A fix was applied in
https://reviews.llvm.org/rG4e1a6c07052b466a2a1cd0c3ff150e4e89a6d87a
but it fails to cover the 64-bit variants of the calls.

For now, let us work around the MemorySanitizer bug by defining
and using the macro MSAN_STAT_WORKAROUND().
2022-03-14 09:28:55 +02:00
Sergei Golubchik
a4f0ae7c18 UBSAN: out of bound array read in json
json_lib.c:847:25: runtime error: index 200 out of bounds for type 'json_string_char_classes [128]'
json_lib.c:847:25: runtime error: load of address 0x56286f7175a0 with insufficient space for an object of type 'json_string_char_classes'

fixes main.json_equals  and main.json_normalize
2022-02-24 19:18:19 +01:00
Oleksandr Byelkin
4fb2cb1a30 Merge branch '10.7' into 10.8 2022-02-04 14:50:25 +01:00
Oleksandr Byelkin
9ed8deb656 Merge branch '10.6' into 10.7 2022-02-04 14:11:46 +01:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Sergei Golubchik
bc10f58a58 MDEV-24909 JSON functions don't respect KILL QUERY / max_statement_time limit
pass the pointer to thd->killed down to the json library,
check it while scanning,
use thd->check_killed() to generate the proper error message
2022-01-30 12:07:31 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Oleksandr Byelkin
41a163ac5c Merge branch '10.2' into 10.3 2022-01-29 15:41:05 +01:00
Alexander Barkov
b915f79e4e MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings 2022-01-21 12:16:07 +04:00
Alexander Barkov
47463e5796 MDEV-27552 Change the return type of my_uca_context_weight_find() to MY_CONTRACTION* 2022-01-20 15:44:13 +04:00
Sergei Petrunia
ce4956f322 Code cleanup 2022-01-19 18:14:07 +03:00
Sergei Petrunia
3936dc3353 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Part#3:
- make json_escape() return different errors on conversion error
  and on out-of-space condition.
- Make histogram code handle conversion errors.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
dde6d76995 Trivial code cleanup 2022-01-19 18:10:09 +03:00
Sergei Petrunia
72c0ba43b2 Code cleanup part #1 2022-01-19 18:10:09 +03:00
Michael Okoko
fb2edab3eb Extract json parser functions from class
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
6bc2df5fa4 Add parser to read JSON array (of histograms) into string vector
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Vladislav Vaintroub
47e18af906 MDEV-27494 Rename .ic files to .inl 2022-01-17 16:41:51 +01:00
Marko Mäkelä
4f7574b10c MDEV-27042 fixup: GCC 11 -Og -Wmaybe-uninitialized 2021-11-29 09:24:58 +02:00
Alexander Barkov
f9ad8072cd MDEV-27042 UCA: Resetting contractions to ignorable does not work well
The weight scanner routine scanner_next() did not properly handle the cases
when a contraction produces no weights (is ignorable).

Adding a helper routine my_uca_scanner_set_weight() and using
it in all cases:

- A single ASCII character
- A contraction starting with an ASCII character
- A multi-byte character
- A contraction starting with a multi-byte character

Also adding two other helper routines:

- my_uca_scanner_next_expansion_weight()
- my_uca_scanner_set_weight_outside_maxchar()

to avoid using scanner->wbeg directly inside scanner_next().
This reduces the probability of similar future bugs.
2021-11-24 13:45:35 +04:00
Alexander Barkov
0a3d1d106a Refactoring for MDEV-27042 and MDEV-27009
This patch prepares the code for upcoming changes:

MDEV-27009 Add UCA-14.0.0 collations
MDEV-27042 UCA: Resetting contractions to ignorable does not work well

1. Adding "const" qualifiers to return type and parameters in functions:
- my_uca_contraction2_weight()
- my_wmemcmp()
- my_uca_contraction_weight()
- my_uca_scanner_contraction_find()
- my_uca_previous_context_find()
- my_uca_context_weight_find()

2. Adding a helper function my_uca_true_contraction_eq()

3. Changing the way how scanner->wbeg is set during context weight handling.
   It was previously set inside functions:
   - my_uca_scanner_contraction_find()
   - my_uca_previous_context_find()
   Now it's set inside scanner_next(), which makes the code more symmetric
   for context-free and context-dependent sequences.
   This makes then upcoming fix for MDEV-27042 simpler.
2021-11-24 13:35:57 +04:00
Marko Mäkelä
06988bdcaa Merge 10.6 into 10.7 2021-11-09 09:40:29 +02:00
Marko Mäkelä
25ac047baf Merge 10.5 into 10.6 2021-11-09 09:11:50 +02:00
Marko Mäkelä
9c18b96603 Merge 10.4 into 10.5 2021-11-09 08:50:33 +02:00
Marko Mäkelä
47ab793d71 Merge 10.3 into 10.4 2021-11-09 08:40:14 +02:00
Marko Mäkelä
524b4a89da Merge 10.2 into 10.3 2021-11-09 08:26:59 +02:00
Alexander Barkov
d0b611a76d MDEV-24335 Unexpected question mark in the end of a TINYTEXT column
my_copy_fix_mb() passed MIN(src_length,dst_length) to
my_append_fix_badly_formed_tail(). It could break a multi-byte
character in the middle, which put the question mark to the
destination.

Fixing the code to pass the true src_length to
my_append_fix_badly_formed_tail().
2021-11-02 09:00:49 +04:00
Alexander Barkov
059797ed44 MDEV-24901 SIGSEGV in fts_get_table_name, SIGSEGV in ib_vector_size, SIGSEGV in row_merge_fts_doc_tokenize, stack smashing
strmake() puts one extra 0x00 byte at the end of the string.
The code in my_strnxfrm_tis620[_nopad] did not take this into
account, so in the reported scenario the 0x00 byte was put outside
of a stack variable, which made ASAN crash.

This problem is already fixed in in MySQL:

  commit 19bd66fe43c41f0bde5f36bc6b455a46693069fb
  Author: bin.x.su@oracle.com <>
  Date:   Fri Apr 4 11:35:27 2014 +0800

But the fix does not seem to be correct, as it breaks when finds a zero byte
in the source string.

Using memcpy() instead of strmake().

- Unlike strmake(), memcpy() it does not write beyond the destination
  size passed.
- Unlike the MySQL fix, memcpy() does not break on the first 0x00 byte found
  in the source string.
2021-10-29 12:37:29 +04:00
Eric Herman
401ff6994d MDEV-26221: DYNAMIC_ARRAY use size_t for sizes
https://jira.mariadb.org/browse/MDEV-26221
my_sys DYNAMIC_ARRAY and DYNAMIC_STRING inconsistancy

The DYNAMIC_STRING uses size_t for sizes, but DYNAMIC_ARRAY used uint.
This patch adjusts DYNAMIC_ARRAY to use size_t like DYNAMIC_STRING.

As the MY_DIR member number_of_files is copied from a DYNAMIC_ARRAY,
this is changed to be size_t.

As MY_TMPDIR members 'cur' and 'max' are copied from a DYNAMIC_ARRAY,
these are also changed to be size_t.

The lists of plugins and stored procedures use DYNAMIC_ARRAY,
but their APIs assume a size of 'uint'; these are unchanged.
2021-10-19 16:00:26 +03:00
Marko Mäkelä
b36d6f92a8 Merge 10.6 into 10.7 2021-09-30 11:01:07 +03:00
Alexander Barkov
0d68b0a2d6 MDEV-26669 Add MY_COLLATION_HANDLER functions min_str() and max_str() 2021-09-27 17:10:22 +04:00
Alexander Barkov
0629711db4 MDEV-26572 Improve simple multibyte collation performance on the ASCII range 2021-09-13 08:03:25 +04:00
Eric Herman
105e4148bf Add json_normalize function to json_lib
This patch implements a library for normalizing json documents.
The algorithm is:
* Recursively sort json keys according to utf8mb4_bin collation.
* Normalize numbers to be of the form [-]<digit>.<frac>E<exponent>
* All unneeded whitespace and line endings are removed.
* Arrays are not sorted.

Co-authored-by: Vicențiu Ciorbaru <vicentiu@mariadb.org>
2021-07-21 16:32:11 +03:00
Eric Herman
7b587fcbe7 fix json typo s/UNINITALIZED/UNINITIALIZED/ 2021-07-21 16:32:11 +03:00
Vladislav Vaintroub
3d6eb7afcf MDEV-25602 get rid of __WIN__ in favor of standard _WIN32
This fixed the MySQL bug# 20338 about misuse of double underscore
prefix __WIN__, which was old MySQL's idea of identifying Windows
Replace it by _WIN32 standard symbol for targeting Windows OS
(both 32 and 64 bit)

Not that connect storage engine is not fixed in this patch (must be
fixed in "upstream" branch)
2021-06-06 13:21:03 +02:00
Monty
a206658b98 Change CHARSET_INFO character set and collaction names to LEX_CSTRING
This change removed 68 explict strlen() calls from the code.

The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name

Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible

Other things:
- There where compiler issues with ensuring that all character set names
  points to the same string: gcc doesn't allow one to use integer constants
  when defining global structures (constant char * pointers works fine).
  To get around this, I declared defines for each character set name
  length.
2021-05-19 22:54:07 +02:00
Monty
5c7d243b29 Add support for minimum field width for strings to my_vsnprintf()
This patch adds support for right aligned strings and numbers.
Left alignment is left as an exercise for anyone needing it.

MDEV-25612 "Assertion `to <= end' failed in process_args" fixed.
(Was caused by the original version of this patch)
2021-05-19 22:27:29 +02:00
Monty
fa7d4abf16 Added typedef decimal_digits_t (uint16) for number of digits in most
aspects of decimals and integers

For fields and Item's uint8 should be good enough. After
discussions with Alexander Barkov we choose uint16 (for now)
as some format functions may accept +256 digits.

The reason for this patch was to make the usage and storage of decimal
digits simlar. Before this patch decimals was stored/used as uint8,
int and uint.  The lengths for numbers where also using a lot of
different types.

Changed most decimal variables and functions to use the new typedef.

squash! af7f09106b6c1dc20ae8c480bff6fd22d266b184

Use decimal_digits_t for all aspects of digits (total, precision
and scale), both for decimals and integers.
2021-05-19 22:27:27 +02:00
Rucha Deodhar
2fdb556e04 MDEV-8334: Rename utf8 to utf8mb3
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
2021-05-19 06:48:36 +02:00
Monty
4d53a7585c Updated tests in decimal.c that match current API and usage 2021-05-11 21:25:08 +03:00
Marko Mäkelä
80ed136e6d Merge 10.4 into 10.5 2021-04-21 09:01:01 +03:00
Monty
031f11717d Fix all warnings given by UBSAN
The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There is still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.

The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
  complains about.
- x86 version of optimized int#korr() methods. UBSAN do not like unaligned
  memory access of integers.  Fixed by using byte_order_generic.h when
  compiling with UBSAN
- We use smaller thread stack with ASAN and UBSAN, which forced me to
  disable a few tests that prints the thread stack size.
- Verifying class types does not work for shared libraries. I added
  suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
  safe to have overflows (two cases, in item_func.cc).

Things fixed:
- Don't left shift signed values
  (byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign not non existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
  constructors.  This was needed as UBSAN checks that these types has
  correct values when one copies an object.
  (gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not called handler functions on unallocated objects or
  deleted objects.
  (events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call constructor
  on Query_arena object.
- Fixed several cast of objects to an incompatible class!
  (Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
   sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes over or underflows.
  This includes also ++ and -- of integers.
  (Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
  value_type is initialized to this instead of to -1, which is not a valid
  enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
  instead of a null string (safer as it ensures we do not do arithmetic
  on null strings).

Other things:

- Changed struct st_position to an OBJECT and added an initialization
  function to it to ensure that we do not copy or use uninitialized
  members. The change to a class was also motived that we used "struct
  st_position" and POSITION randomly trough the code which was
  confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed in sql_partition to use '^' instead of '-'. This is safe as
  the operator is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
  avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed bug in find_set() where could could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr.  (This variable was before
  only in 10.5 and up).  It can now have one of two values:
  ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
  it virtual. This was an effort to get UBSAN to work with loaded storage
  engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
  in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
  server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
  to integer arithmetic.

Changes that should not be needed but had to be done to suppress warnings
from UBSAN:

- Added static_cast<<uint16_t>> around shift to get rid of a LOT of
  compiler warnings when using UBSAN.
- Had to change some '/' of 2 base integers to shift to get rid of
  some compile time warnings.

Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
2021-04-20 12:30:09 +03:00