collations
Fix by Alexey Botchkov
The 'value_len' is calculated wrong for the multibyte charsets. In the
read_strn() function we get the length of the string with the final ' " '
character. So have to subtract it's length from the value_len. And the
length of '1' isn't correct for the ucs2 charset (must be 2).
The MDEV-25004 test innodb_fts.versioning is omitted because ever since
commit 685d958e38 InnoDB would not allow
writes to a database where the redo log file ib_logfile0 is missing.
Range can be thought about in similar manner as wildcard (*) where
more than one elements are processed. To implement range notation, extended
json parser to parse the 'to' keyword and added JSON_PATH_ARRAY_RANGE for
path type. If there is 'to' keyword then use JSON_PATH_ARRAY range for
path type along with existing type.
This new integer to store the end index of range is n_item_end.
When there is 'to' keyword, store the integer in n_item_end else store in
n_item.
This patch can be viewed as combination of two parts:
1) Enabling '-' in the path so that the parser does not give out a warning.
2) Setting the negative index to a correct value and returning the
appropriate value.
1) To enable using the negative index in the path:
To make the parser not return warning when negative index is used in path
'-' needs to be allowed in json path characters. P_NEG is added
to enable this and is made recognizable by setting the 45th index of
json_path_chr_map[] to P_NEG (instead of previous P_ETC)
because 45 corresponds to '-' in unicode.
When the path is being parsed and '-' is encountered, the parser should
recognize it as parsing '-' sign, so a new json state PS_NEG is required.
When the state is PS_NEG, it means that a negative integer is
going to be parsed so set is_negative_index of current step to 1 and
n_item is set accordingly when integer is encountered after '-'.
Next proceed with parsing rest of the path and get the correct path.
Next thing is parsing the json and returning correct value.
2) Setting the negative index to a correct value and returning the value:
While parsing json if we encounter array and the path step for the array
is a negative index (n_item < 0), then we can count the number of elements
in the array and set n_item to correct corresponding value. This is done in
json_skip_array_and_count.
Part#3:
- make json_escape() return different errors on conversion error
and on out-of-space condition.
- Make histogram code handle conversion errors.
This patch implements a library for normalizing json documents.
The algorithm is:
* Recursively sort json keys according to utf8mb4_bin collation.
* Normalize numbers to be of the form [-]<digit>.<frac>E<exponent>
* All unneeded whitespace and line endings are removed.
* Arrays are not sorted.
Co-authored-by: Vicențiu Ciorbaru <vicentiu@mariadb.org>
The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There is still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.
The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
complains about.
- x86 version of optimized int#korr() methods. UBSAN do not like unaligned
memory access of integers. Fixed by using byte_order_generic.h when
compiling with UBSAN
- We use smaller thread stack with ASAN and UBSAN, which forced me to
disable a few tests that prints the thread stack size.
- Verifying class types does not work for shared libraries. I added
suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
safe to have overflows (two cases, in item_func.cc).
Things fixed:
- Don't left shift signed values
(byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign not non existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
constructors. This was needed as UBSAN checks that these types has
correct values when one copies an object.
(gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not called handler functions on unallocated objects or
deleted objects.
(events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call constructor
on Query_arena object.
- Fixed several cast of objects to an incompatible class!
(Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes over or underflows.
This includes also ++ and -- of integers.
(Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
value_type is initialized to this instead of to -1, which is not a valid
enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
instead of a null string (safer as it ensures we do not do arithmetic
on null strings).
Other things:
- Changed struct st_position to an OBJECT and added an initialization
function to it to ensure that we do not copy or use uninitialized
members. The change to a class was also motived that we used "struct
st_position" and POSITION randomly trough the code which was
confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed in sql_partition to use '^' instead of '-'. This is safe as
the operator is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed bug in find_set() where could could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr. (This variable was before
only in 10.5 and up). It can now have one of two values:
ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
it virtual. This was an effort to get UBSAN to work with loaded storage
engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
to integer arithmetic.
Changes that should not be needed but had to be done to suppress warnings
from UBSAN:
- Added static_cast<<uint16_t>> around shift to get rid of a LOT of
compiler warnings when using UBSAN.
- Had to change some '/' of 2 base integers to shift to get rid of
some compile time warnings.
Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
character (").
The my_wildcmp function doesn't expect the string parameter to
have escapements, only the template. So the string
should be unescaped if necessary.
check_contains() fixed. When an item of an array is a complex
structure, it can be half-read after the end of the recursive
check_contains() call. So we just manually get to it's ending.