collations
Fix by Alexey Botchkov
The 'value_len' is calculated wrong for the multibyte charsets. In the
read_strn() function we get the length of the string with the final ' " '
character. So have to subtract it's length from the value_len. And the
length of '1' isn't correct for the ucs2 charset (must be 2).
collations
Analysis:
When we have negative index, the value in array_counter[] array is going to
be -1 at some point ( because in case of negative index in json path, the
initial value for a path with negative index is -<size_of_array>, and as we
move forward in array while parsing it and finding path, this value
increments). Since SKIPPED_STEP_MARK, is maximum uint value, it gets
compared to some int value in the array and eventually equates to -1
and messes with path.
Fix:
Make SKIPPED_STEP_MARK maximum of INT32.
json_normalize_number(): Avoid accessing str past str_len.
The function would seem to work incorrectly when some digits are
not followed by a decimal point (.) or an exponent (E or e).
Analysis:
When we skip level when path is found, it changes the state of the json
engine. This breaks the sequence for json_get_path_next() which is called at
the end to ensure json document is valid and leads to crash.
Fix:
Use json_scan_next() at the end to check if json document has correct
syntax (is valid).
Analysis:
Parsing json path happens only once. When paring, we set types of path
(types_used) to use later. If the path type has range or wild card, only
then multiple values get added to the result set.
However for each row in the table, types_used still gets
overwritten to default (no multiple values) and is also not set again
(because path is already parsed). Since multiple values depend on the
type of path, they dont get added to result set either.
Fix:
set default for types_used only if path is not parsed already.
procedures
MDEV-22224 caused the parsing of keys with hyphens to break by setting
the state transitions for parsing keys to JE_SYN (syntax error) when
they encounter a hyphen. However json key names may contain hyphens and
still be considered valid json.
This patch changes the state transition table so that key names with
hyphens remain valid. Note that unquoted key names in paths like
$.key-name are also valid again. This restores the previous behaviour
when hyphens were considered part of the P_ETC character class.
Analysis: The JSON functions(JSON_ARRAY[OBJECT|ARRAY_APPEND|ARRAY_INSERT|INSERT|SET|REPLACE]) result is truncated when the function is called based on LONGTEXT field. The overflow occurs when computing the result length due to the LONGTEXT max length is same as uint32 max length. It lead to wrong result length.
Fix: Add static_cast<ulonglong> to avoid uint32 overflow and fix the arguments used.
Analysis: JSON_VALUE() returns "null" string instead of NULL pointer.
Fix: When the type is JSON_VALUE_NULL (which is also a scalar) set
null_value to true and return 0 instead of returning string.
Analysis: JSON_OVERLAPS() does not check nested key-value pair completely.
If there is nested object, then it only scans and validates if two json values
overlap until one of the value (which is of type object) is exhausted.
This does not really check if the two values of keys are exacly the same, instead
it only checks if key-value pair of one is present in key-value pair of the
other
Fix: Normalize the values (which are of type object) and compare
using string compare. This will validate if two values
are exactly the same.