The tests main.func_json and json.json_no_table fail on a server built with
the option -DWITH_PROTECT_STATEMENT_MEMROOT=YES because memory is
allocated on the statement's memory root on the second execution of
a query that uses the function json_contains_path().
Memory is allocated on the second execution of a prepared statement
that invokes the function json_contains_path() because memory is
allocated on every call of the method Item_json_str_multipath::fix_fields.
To fix the issue, the memory allocation should be done only once, on the
first call of the method Item_json_str_multipath::fix_fields. A similar
issue takes place inside the method Item_func_json_contains_path::fix_fields.
Both methods are modified to perform the memory allocation only once, on
their first execution, and to re-use the allocated memory later.
Before this patch the memory referenced by the pointers stored in the array
tmp_paths was released by the method Item_func_json_contains_path::cleanup,
which is called on finishing execution of a prepared statement. Now that the
memory allocation performed inside the method Item_json_str_multipath::fix_fields
is done only once, the item cleanup has a degenerate form and can be
delegated to the cleanup() method of the base class, and memory deallocation
can be performed in the destructor.
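A minimal SQL sketch (hypothetical document and paths, not copied from the
mtr tests) of the prepared-statement pattern that triggered the allocation
on re-execution:

    SET @doc = '{"a": 1, "b": {"c": 2}}';
    PREPARE stmt FROM 'SELECT JSON_CONTAINS_PATH(?, ''one'', ''$.a'', ''$.b.c'')';
    EXECUTE stmt USING @doc;  -- first execution: fix_fields allocates, which is fine
    EXECUTE stmt USING @doc;  -- second execution: used to allocate again on the statement memroot
    DEALLOCATE PREPARE stmt;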
data from a table similar to other JSON functions
Analysis:
Since we are fetching values for every row (because we are running SELECT
for all rows of a table), the correct value can only be obtained at the time
of calling val_int(), because it is called to get the value for each row.
Fix:
Set up the hash for each row instead of doing it while fixing fields.
* invoke the parent's cleanup()
* don't reinitialize the memroot if it is already initialized (that causes
a memory leak)
Also move free_root() from the destructor to cleanup() so as not to
accumulate allocations from prepare and multiple executes.
The idea is to have simple functions that the user can combine to produce
exactly the result one wants, whether that is a JSON object that has
common keys with another JSON object, the same key/value pairs, etc.
Making simpler functions helps here.
We accomplish this by making three separate functions (usage sketches
follow the list):
1) JSON_OBJECT_FILTER_KEYS(Obj, Arr_keys):
Put the keys (which are basically strings) in a hash, go over the object
and get the keys one by one. If a key is present in the hash,
add the key-value pair to the result.
2) JSON_OBJECT_TO_ARRAY(Obj): Create a string variable, go over the json
object, and add each key-value pair as an array into the result.
3) JSON_ARRAY_INTERSECT(arr1, arr2):
Go over one of the json arrays and add each item of the array
to a hash (after normalizing each item). Go over the second array and
search for each normalized item in the hash. If the item is found,
add it to the result.
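Hedged usage sketches for the three functions (the expected results are my
reading of the descriptions above, not output copied from the test suite):

    SELECT JSON_OBJECT_FILTER_KEYS('{"a": 1, "b": 2, "c": 3}', '["a", "c"]');
    -- expected: {"a": 1, "c": 3}
    SELECT JSON_OBJECT_TO_ARRAY('{"a": 1, "b": 2}');
    -- expected: [["a", 1], ["b", 2]]
    SELECT JSON_ARRAY_INTERSECT('[1, 2, 3]', '[2, 3, 4]');
    -- expected: [2, 3]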
Implementation Idea: Holyfoot (Alexey Botchkov)
Author: tanruixiang and Rucha Deodhar
objects
Idea behind the implementation:
We get the json object specified by the json path. Then we transform it into
key-value pairs by going over the json, getting each key-value pair
one by one and returning the result.
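Assuming this describes the JSON_KEY_VALUE() function (the name is not
stated above), a hypothetical usage sketch with my expected output:

    SELECT JSON_KEY_VALUE('{"a": 1, "b": {"c": 2, "d": 3}}', '$.b');
    -- expected: [{"key": "c", "value": 2}, {"key": "d", "value": 3}]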
Analysis: null_value is not set if any one of the arguments is NULL, so it
returns 1.
Fix: when either argument is NULL, set null_value to true, so that NULL can
be returned.
Implementation:
The implementation follows the json schema validation draft 2020.
A JSON schema basically has the same structure as a json object, consisting
of key-value pairs, so it can be parsed in the same manner as
any json object.
However, none of the keywords are mandatory, so guessing the
json value type based only on the keywords would be incorrect.
Hence we need separate objects denoting each keyword.
So during create_object_and_handle_keyword() we create appropriate objects
based on the keywords and validate each of them individually against the
json document by calling the respective validate() function if the type
matches. If any of them fails, return false; else return true.
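Assuming the user-facing entry point is JSON_SCHEMA_VALID(schema, doc), a
usage sketch (illustrative schema, not taken from the test suite):

    SET @schema = '{"type": "object", "properties": {"num": {"type": "number", "maximum": 10}}}';
    SELECT JSON_SCHEMA_VALID(@schema, '{"num": 5}');   -- expected: 1
    SELECT JSON_SCHEMA_VALID(@schema, '{"num": 50}');  -- expected: 0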
1) When at least one of the two json documents is of scalar type:
1.a) If the value and the json document are both scalar, then return true
if they have the same type and value.
1.b) If the json document is scalar but the other is an array (or vice
versa), then return true if the array has at least one element of the same
type and value as the scalar.
1.c) If one is scalar and the other is an object, then return false because
they can't be compared.
2) When both arguments are of non-scalar type and the conditions below
are satisfied, then return true:
2.a) When both arguments are arrays:
Iterate over the value and the json document, and check whether there
exists at least one element in the other array of the same type and value
as an element of the value.
2.b) If both arguments are objects:
Iterate over the value and the json document, and check whether there
exists at least one key-value pair common between the two objects.
2.c) If one of the json document and the value is an array and the other
is an object:
Iterate over the array; if an element of type object is found,
then compare it with the object (which is the other argument), and check
whether the entire object matches, i.e. all the key-value pairs match.
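The rules above map onto JSON_OVERLAPS() roughly as follows (my own
illustration of the cases, assuming this commit describes that function;
the results are not copied from the test suite):

    SELECT JSON_OVERLAPS('true', 'true');                  -- 1.a: same type and value -> 1
    SELECT JSON_OVERLAPS('[1, 2, 3]', '3');                -- 1.b: array contains the scalar -> 1
    SELECT JSON_OVERLAPS('1', '{"a": 1}');                 -- 1.c: scalar vs object -> 0
    SELECT JSON_OVERLAPS('[1, 2, 3]', '[5, 4, 3]');        -- 2.a: one common element -> 1
    SELECT JSON_OVERLAPS('{"a": 1, "b": 2}', '{"b": 2}');  -- 2.b: one common key-value pair -> 1
    SELECT JSON_OVERLAPS('[{"a": 1}]', '{"a": 1}');        -- 2.c: array element equals the object -> 1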
Hybrid functions (IF, COALESCE, etc) did not preserve the JSON property
of their arguments. The same problem was repeatable for single-row subselects.
The problem happened because the method Item::is_json_type() was inconsistently
implemented across the Item hierarchy. For example, Item_hybrid_func
and Item_singlerow_subselect did not override is_json_type().
Solution:
- Removing Item::is_json_type()
- Implementing specific JSON type handlers:
Type_handler_string_json
Type_handler_varchar_json
Type_handler_tiny_blob_json
Type_handler_blob_json
Type_handler_medium_blob_json
Type_handler_long_blob_json
- Reusing the existing data type infrastructure to pass JSON
type handlers across all item types, including classes Item_hybrid_func
and Item_singlerow_subselect. Note, these two classes themselves do not
need any changes!
- Extending the data type infrastructure so data types can inherit
their properties (e.g. aggregation rules) from their base data types.
E.g. VARCHAR/JSON acts as VARCHAR, and LONGTEXT/JSON acts as LONGTEXT,
when mixed with a non-JSON data type. This is done by:
- adding virtual method Type_handler::type_handler_base()
- adding a helper class Type_handler_pair
- refactoring Type_handler_hybrid_field_type methods
aggregate_for_result(), aggregate_for_min_max(),
aggregate_for_num_op() to use Type_handler_pair.
This change also fixes:
MDEV-27361 Hybrid functions with JSON arguments do not send format metadata
Also, adding mtr tests for JSON replication, which was not covered yet,
and the current patch changes the replication code slightly.
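One observable effect (my illustration, relying on JSON_ARRAY() embedding
JSON-typed arguments without re-quoting them as plain strings):

    SELECT JSON_ARRAY(JSON_OBJECT('a', 1));
    -- [{"a": 1}]
    SELECT JSON_ARRAY(COALESCE(JSON_OBJECT('a', 1)));
    -- with the fix the JSON property is preserved: [{"a": 1}]
    -- instead of the re-quoted string ["{\"a\": 1}"]
    SELECT JSON_ARRAY((SELECT JSON_OBJECT('a', 1)));
    -- a single-row subselect now also preserves the JSON property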
The JSON_REPLACE() function was executed with an error on the Spider SE.
This patch fixes the problem, and it also fixes MDEV-24541.
The problem is that Item_func_json_insert::func_name() returns
the wrong function name "json_update".
The Spider SE reconstructs a query based on the return value
in some cases. Thus, if the return value is wrong, the Spider SE
may generate a wrong query.
This patch implements JSON_EQUALS SQL function. The function takes
advantage of the json_normalize functionality and does the following:
norm_a = json_normalize(a)
norm_b = json_normalize(b)
return strcmp(norm_a, norm_b)
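For example (illustrative values; normalization also makes the numbers 1
and 1.0 compare equal):

    SELECT JSON_EQUALS('{"a": 1, "b": [2, 3]}', '{"b": [2, 3], "a": 1.0}');  -- 1
    SELECT JSON_EQUALS('{"a": 1}', '{"a": 2}');                              -- 0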
Co-authored-by: Vicențiu Ciorbaru <vicentiu@mariadb.org>
Changes:
- To detect automatic strlen() I removed the methods in String that
use 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usage of append(const char*) is changed to either use
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most usage of the above functions never tested the
result for -1 and would have given wrong results or crashes in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- Changed for some functions the name argument from const char * to
const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removed String_space() as it was only used in one case and that
could be simplified to not use String_space(), thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Renamed the protocol function
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done both to clarify the difference between this 'store' function
and the others, and to make it easier to find suboptimal usage of store()
calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now uses LEX_CSTRING for the name.
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtained but was not used.
- Removed some unneeded 'virtual' definitions for functions that were
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate the needed buffer. Old code
could cause memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
This was done to simplify copying of with_* flags
Other things:
- Changed Flags to C++ enums, which enables gdb to print
out bit values for the flags. This also enables compiler
errors if one tries to manipulate a non-existing bit in
a variable.
- Added set_maybe_null() as a shortcut, as setting the
MAYBE_NULL flag was done in a LOT of places.
- Renamed PARAM flag to SP_VAR to ensure it's not confused with persistent
statement parameters.
The reason for the change is that neither clang nor gcc can generate
efficient code when several bit fields are changed at the same time or
when copying one or more bits between identical bit fields.
Updating bits explicitly with & and | is MUCH more efficient than what
current compilers can do.
Quick grouping is not supported for JSON_OBJECTAGG. The same is true for
GROUP_CONCAT, so make sure that Item::quick_group is set to FALSE. We need
to make sure that in the case of JSON_OBJECTAGG we don't create an index
over the grouping fields of the temp table and update the result after
each iteration.
Instead we should first sort the result in accordance with the
GROUP BY fields and then perform the grouping and
write the result to the temp table.
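The kind of query affected (illustrative schema, not from the test suite):
the aggregate below must see its input sorted by the GROUP BY field rather
than having its result updated through a temp-table index:

    CREATE TABLE t1 (g INT, k VARCHAR(10), v INT);
    INSERT INTO t1 VALUES (1, 'a', 10), (1, 'b', 20), (2, 'c', 30);
    SELECT g, JSON_OBJECTAGG(k, v) FROM t1 GROUP BY g;
    -- expected: {"a":10, "b":20} for g = 1 and {"c":30} for g = 2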
Item_func_json_extract did not implement val_decimal(),
so CAST(JSON_EXTRACT('{"x":true}', '$.x') AS DECIMAL) erroneously
returned 0 with a warning because of the conversion from the string "true"
to decimal.
Implementing val_decimal(), so boolean values are correctly handled.
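With val_decimal() implemented, the boolean is handled as a value instead
of going through the failing string-to-decimal conversion:

    SELECT CAST(JSON_EXTRACT('{"x": true}', '$.x') AS DECIMAL);
    -- before the fix: 0 with a warning; after the fix: 1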