collations
Fix by Alexey Botchkov
The 'value_len' is calculated wrong for the multibyte charsets. In the
read_strn() function we get the length of the string with the final ' " '
character. So have to subtract it's length from the value_len. And the
length of '1' isn't correct for the ucs2 charset (must be 2).
starts with '['
Analysis:
When type is non-scalar and the json document has syntax error
then it is not detected during validating type. And Since other validate
functions take const argument, the error state is not stored eventually.
Fix:
After we run out of all schemas (in case of no error during validation) from
the schema list, go over the json document until there is error in parsing
or json doc has ended.
The MDEV-25004 test innodb_fts.versioning is omitted because ever since
commit 685d958e38 InnoDB would not allow
writes to a database where the redo log file ib_logfile0 is missing.
Range can be thought about in similar manner as wildcard (*) where
more than one elements are processed. To implement range notation, extended
json parser to parse the 'to' keyword and added JSON_PATH_ARRAY_RANGE for
path type. If there is 'to' keyword then use JSON_PATH_ARRAY range for
path type along with existing type.
This new integer to store the end index of range is n_item_end.
When there is 'to' keyword, store the integer in n_item_end else store in
n_item.
This patch can be viewed as combination of two parts:
1) Enabling '-' in the path so that the parser does not give out a warning.
2) Setting the negative index to a correct value and returning the
appropriate value.
1) To enable using the negative index in the path:
To make the parser not return warning when negative index is used in path
'-' needs to be allowed in json path characters. P_NEG is added
to enable this and is made recognizable by setting the 45th index of
json_path_chr_map[] to P_NEG (instead of previous P_ETC)
because 45 corresponds to '-' in unicode.
When the path is being parsed and '-' is encountered, the parser should
recognize it as parsing '-' sign, so a new json state PS_NEG is required.
When the state is PS_NEG, it means that a negative integer is
going to be parsed so set is_negative_index of current step to 1 and
n_item is set accordingly when integer is encountered after '-'.
Next proceed with parsing rest of the path and get the correct path.
Next thing is parsing the json and returning correct value.
2) Setting the negative index to a correct value and returning the value:
While parsing json if we encounter array and the path step for the array
is a negative index (n_item < 0), then we can count the number of elements
in the array and set n_item to correct corresponding value. This is done in
json_skip_array_and_count.
Part#3:
- make json_escape() return different errors on conversion error
and on out-of-space condition.
- Make histogram code handle conversion errors.
This patch implements a library for normalizing json documents.
The algorithm is:
* Recursively sort json keys according to utf8mb4_bin collation.
* Normalize numbers to be of the form [-]<digit>.<frac>E<exponent>
* All unneeded whitespace and line endings are removed.
* Arrays are not sorted.
Co-authored-by: Vicențiu Ciorbaru <vicentiu@mariadb.org>
The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There is still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.
The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
complains about.
- x86 version of optimized int#korr() methods. UBSAN do not like unaligned
memory access of integers. Fixed by using byte_order_generic.h when
compiling with UBSAN
- We use smaller thread stack with ASAN and UBSAN, which forced me to
disable a few tests that prints the thread stack size.
- Verifying class types does not work for shared libraries. I added
suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
safe to have overflows (two cases, in item_func.cc).
Things fixed:
- Don't left shift signed values
(byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign not non existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
constructors. This was needed as UBSAN checks that these types has
correct values when one copies an object.
(gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not called handler functions on unallocated objects or
deleted objects.
(events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call constructor
on Query_arena object.
- Fixed several cast of objects to an incompatible class!
(Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes over or underflows.
This includes also ++ and -- of integers.
(Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
value_type is initialized to this instead of to -1, which is not a valid
enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
instead of a null string (safer as it ensures we do not do arithmetic
on null strings).
Other things:
- Changed struct st_position to an OBJECT and added an initialization
function to it to ensure that we do not copy or use uninitialized
members. The change to a class was also motived that we used "struct
st_position" and POSITION randomly trough the code which was
confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed in sql_partition to use '^' instead of '-'. This is safe as
the operator is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed bug in find_set() where could could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr. (This variable was before
only in 10.5 and up). It can now have one of two values:
ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
it virtual. This was an effort to get UBSAN to work with loaded storage
engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
to integer arithmetic.
Changes that should not be needed but had to be done to suppress warnings
from UBSAN:
- Added static_cast<<uint16_t>> around shift to get rid of a LOT of
compiler warnings when using UBSAN.
- Had to change some '/' of 2 base integers to shift to get rid of
some compile time warnings.
Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
character (").
The my_wildcmp function doesn't expect the string parameter to
have escapements, only the template. So the string
should be unescaped if necessary.
check_contains() fixed. When an item of an array is a complex
structure, it can be half-read after the end of the recursive
check_contains() call. So we just manually get to it's ending.