Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces and were used interchangeably,
being unsafely cast to one another.
This patch consolidates the functions all into the
qsort_cmp2 interface.
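As an illustration of the kind of cast the warning flags (a hedged sketch; the
exact historical typedefs differ slightly and are not quoted here):

  /* Two comparator typedefs of the same general shape but with
     incompatible parameter types: */
  typedef int (*queue_compare)(void *arg, unsigned char *a, unsigned char *b);
  typedef int (*qsort_cmp2)(void *arg, const void *a, const void *b);

  int cmp_rows(void *arg, const void *a, const void *b);

  /* Before: the same comparator was reused through a function-pointer cast,
     which clang-16 rejects with -Wcast-function-type-strict and which is
     undefined behaviour when the pointer is called through the wrong type: */
  queue_compare qc= (queue_compare) cmp_rows;

  /* After: all callers use the single qsort_cmp2-shaped signature,
     so no cast is needed: */
  qsort_cmp2 cmp= cmp_rows;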
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
Field_blob::store() has special code for GROUP_CONCAT temporary table
(to store blob values in Blob_mem_storage - this prevents them
from being freed/overwritten when a next row is read).
Field_geom and Field_blob_compressed inherit from Field_blob but they
have their own ::store() method without this special Blob_mem_storage
support.
Considering that non-grouping CONCAT() of such fields converts
them to plain BLOB, let's do the same for GROUP_CONCAT. To do it,
Item_func_group_concat::setup will signal that it's creating
a temporary table for GROUP_CONCAT, and the Field_blob::make_new_field()
override will create a base Field_blob when under GROUP_CONCAT.
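A minimal model of the idea (the flag name and the way it is propagated to the
field are assumptions, not the server's actual members):

  struct TABLE_model
  {
    bool group_concat_tmp;   /* set by Item_func_group_concat::setup() */
  };

  struct Field_blob_model
  {
    virtual ~Field_blob_model() {}
    /* Under GROUP_CONCAT the new field is always a plain Field_blob, so the
       Blob_mem_storage handling in Field_blob::store() is used; otherwise
       the derived type (Field_geom, Field_blob_compressed) is kept. */
    virtual Field_blob_model *make_new_field(TABLE_model *t)
    {
      if (t->group_concat_tmp)
        return new Field_blob_model();
      return make_same_type();
    }
    virtual Field_blob_model *make_same_type() { return new Field_blob_model(); }
  };

  struct Field_geom_model : public Field_blob_model
  {
    Field_blob_model *make_same_type() { return new Field_geom_model(); }
  };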
This bug could affect queries containing a join of derived tables over
grouping views such that one of the derived tables contains a window
function while another uses view V with dependent subquery DSQ containing
a set function aggregated outside of the subquery in the view V. The
subquery also refers to the fields from the group clause of the view. Due to
this bug, execution of such queries could produce wrong result sets.
When the fix_fields() method performs context analysis of a set function AF,
at the very beginning the function Item_sum::init_sum_func_check()
is called. The function copies the pointer to the embedding set function,
if any, stored in THD::LEX::in_sum_func into the corresponding field of the
set function AF simultaneously changing the value of THD::LEX::in_sum_func
to point to AF. When at the very end of the fix_fields() method the function
Item_sum::check_sum_func() is called, it is supposed to restore the value
of THD::LEX::in_sum_func to point to the embedding set function. And in
fact Item_sum::check_sum_func() did it, but only for regular set functions,
not for those used in window functions. As a result after the context
analysis of AF had finished THD::LEX::in_sum_func still pointed to AF.
It confused the further context analysis. In particular it led to wrong
resolution of Item_outer_ref objects in the fix_inner_refs() function.
This wrong resolution forced reading the values of grouping fields referred to
in DSQ not from the temporary table used for aggregation, from which they
were supposed to be read, but from the table used as the source table for
aggregation.
This patch guarantees that the value of THD::LEX::in_sum_func is properly
restored after the call of fix_fields() for any set function.
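A minimal model of the save/restore protocol described above (member names
mirror the description; this is not the server code):

  struct Item_sum_model;
  struct LEX_model { Item_sum_model *in_sum_func; };

  struct Item_sum_model
  {
    Item_sum_model *in_sum_func;       /* the embedding set function, if any */

    void init_sum_func_check(LEX_model *lex)
    {
      in_sum_func= lex->in_sum_func;   /* remember the embedding set function */
      lex->in_sum_func= this;          /* context analysis of AF begins       */
    }
    void check_sum_func(LEX_model *lex)
    {
      /* The fix: restore unconditionally, i.e. also when AF is used in a
         window function, not only for regular set functions. */
      lex->in_sum_func= in_sum_func;
    }
  };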
Item_func_group_concat::print() did not take into account
that Item_func_group_concat::separator can be of a different character set
than the "String *str" (when the printing is being done to).
Therefore, printing did not work correctly for:
- non-ASCII separators when GROUP_CONCAT is done on 8bit data
or multi-byte data with mbminlen==1.
- all separators (even including simple ones like comma)
when GROUP_CONCAT is done on ucs2/utf16/utf32 data (mbminlen>1).
Because of this problem, VIEW definitions did not print correctly to
their FRM files. This later led to a wrong SELECT and SHOW CREATE output.
Fix:
- Adding new String methods:
bool append_for_single_quote_using_mb_wc(const char *str, size_t length,
CHARSET_INFO *cs);
bool append_for_single_quote_opt_convert(const char *str,
size_t length,
CHARSET_INFO *cs)
which perform both escaping and character set conversion at the same time.
- Adding a new String method escaped_wc_for_single_quote(),
to reuse the code between the old and the new methods.
- Fixing Item_func_group_concat::print() to use the new
method append_for_single_quote_opt_convert().
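A tiny, self-contained model of "escape and convert in one pass" for the
single-quote case (the real methods walk the source through the CHARSET_INFO
mb_wc/wc_mb handlers and convert to the destination String's character set;
everything below is illustrative only):

  #include <string>

  static void append_for_single_quote_model(std::string *to,
                                            const std::string &from)
  {
    for (size_t i= 0; i < from.size(); i++)
    {
      /* In the real code each character is decoded with the source
         charset's mb_wc(), escaped if needed, and re-encoded with the
         destination charset's wc_mb(); here the "conversion" is the
         identity. */
      if (from[i] == '\'' || from[i] == '\\')
        to->push_back('\\');
      to->push_back(from[i]);
    }
  }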
If a query with GROUP_CONCAT is executed, the server reports a warning
every time the length of the result of this function exceeds the set
value of the system variable group_concat_max_len. This bug led to the set
of warnings from the second execution of the prepared statement that did
not coincide with the one from the first execution if the executed query
was a grouping query over a join of tables using GROUP_CONCAT function and
join cache was not allowed to be employed.
The discrepancy between the sets of warnings was due to the lack of cleanup of
Item_func_group_concat::row_count after execution of the query.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Functions extracting non-negative datetime components:
- YEAR(dt), EXTRACT(YEAR FROM dt)
- QUARTER(dt), EXTRACT(QUARTER FROM dt)
- MONTH(dt), EXTRACT(MONTH FROM dt)
- WEEK(dt), EXTRACT(WEEK FROM dt)
- HOUR(dt),
- MINUTE(dt),
- SECOND(dt),
- MICROSECOND(dt),
- DAYOFYEAR(dt)
- EXTRACT(YEAR_MONTH FROM dt)
did not set their max_length properly, so in the DECIMAL
context they created a too small DECIMAL column, which
led to the 'Out of range value' error.
The problem is that most of these functions historically
returned the signed INT data type.
There were two simple ways to fix these functions:
1. Add +1 to max_length.
But this would also change their size in the string context
and create too long VARCHAR columns, with +1 excessive size.
2. Preserve max_length, but change the data type from INT to INT UNSIGNED.
But this would break backward compatibility.
Also, using UNSIGNED is generally not desirable,
it's better to stay with signed when possible.
This fix implements another solution, which makes all these functions
work well in all contexts: int, decimal, string.
Fix details:
- Adding a new special class Type_handler_long_ge0 - the data type
handler for expressions which:
* should look like a normal signed INT
* but which are known not to return negative values
Expressions handled by Type_handler_long_ge0 store in Item::max_length
only the number of digits, without adding +1 for the sign.
- Fixing Item_extract to use Type_handler_long_ge0
for non-negative datetime components:
YEAR, YEAR_MONTH, QUARTER, MONTH, WEEK
- Adding a new abstract class Item_long_ge0_func, for functions
returning non-negative datetime components.
Item_long_ge0_func uses Type_handler_long_ge0 as the type handler.
The class hierarchy now looks as follows:
Item_long_ge0_func
Item_long_func_date_field
Item_func_to_days
Item_func_dayofmonth
Item_func_dayofyear
Item_func_quarter
Item_func_year
Item_long_func_time_field
Item_func_hour
Item_func_minute
Item_func_second
Item_func_microsecond
- Cleanup: EXTRACT(QUARTER FROM dt) created an excessive VARCHAR column
in string context. Changing its length from 2 to 1.
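A short worked example of the sizing rule (a simplification of the real
type-handler code, not a quote of it):

  With the plain signed INT handler, max_length reserves room for a '-' sign,
  so the DECIMAL precision derived from it loses one digit:
      YEAR(dt): max_length= 4  ->  DECIMAL(3)  ->  'Out of range value' for 2023
  Type_handler_long_ge0 keeps only the digit count in max_length and knows
  that no sign position is needed, so both contexts come out right:
      YEAR(dt): max_length= 4  ->  DECIMAL(4) in decimal context,
                                   VARCHAR(4) in string context
  (the rejected fix #1 above would instead have produced VARCHAR(5)).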
This patch fixes a too strong condition in an assert in the method
Item_func_group_concat::fix_fields
that holds in the case of a stored routine but is obviously broken
for a prepared statement.
During the 10.5->10.6 merge please use the 10.6 code on conflicts.
This is the 10.5 version of the patch (a backport of the 10.6 version).
Unlike 10.6 version, it makes changes in plugin/type_inet/sql_type_inet.*
rather than in sql/sql_type_fixedbin.h
Item_bool_rowready_func2, Item_func_between, Item_func_in
did not check if a not-NULL argument of an arbitrary data type
can produce a NULL value on conversion to INET6.
This caused a crash on DBUG_ASSERT() in conversion failures,
because the function returned SQL NULL for something that
has Item::maybe_null() equal to false.
Adding code that sets NULL-ability in such cases.
Details:
- Removing the code in Item_func::setup_args_and_comparator()
performing character set aggregation with optional narrowing.
This aggregation is done inside Arg_comparator::set_cmp_func_string().
So this code was redundant
- Removing Item_func::setup_args_and_comparator() as it got simplified to
just two lines:
convert_const_compared_to_int_field(thd);
return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
Using these lines directly in:
- Item_bool_rowready_func2::fix_length_and_dec()
- Item_func_nullif::fix_length_and_dec()
- Adding a new virtual method:
- Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
- Adding tests detecting if the data type conversion can return SQL NULL into
the following methods of Type_handler_inet6:
- Item_bool_rowready_func2_fix_length_and_dec
- Item_func_between_fix_length_and_dec
- Item_func_in_fix_comparator_compatible_types
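A self-contained model of the NULL-ability rule (the types and the helper are
illustrative, not the server's API):

  struct Arg_model
  {
    bool is_inet6;     /* argument already has the INET6 data type */
    bool maybe_null;   /* Item::maybe_null() of the argument       */
  };

  /* A comparison such as inet6_col = other_expr implies converting
     other_expr to INET6; that conversion can fail and yield SQL NULL,
     so the comparison itself must be marked as possibly NULL even when
     both arguments are declared NOT NULL. */
  static bool comparison_maybe_null(const Arg_model &a, const Arg_model &b)
  {
    bool conversion_can_return_null= !a.is_inet6 || !b.is_inet6;
    return a.maybe_null || b.maybe_null || conversion_can_return_null;
  }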
This is the follow-up patch that removes explicit use of thd->stmt_arena
for memory allocation and replaces it with call of the method
THD::active_stmt_arena_to_use()
Additionally, this patch adds an extra DBUG_ASSERT to check that the right
query arena is in use.
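Illustrative before/after at a hypothetical allocation site
(active_stmt_arena_to_use() is the method named above; the rest is an
assumption):

  /* before: */
  void *p= thd->stmt_arena->alloc(size);

  /* after: */
  Query_arena *arena= thd->active_stmt_arena_to_use();
  void *p= arena->alloc(size);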
MDEV-30668 Set function aggregated in outer select used in view definition
This patch fixes two bugs concerning views whose specifications contain
subqueries with set functions aggregated in outer selects.
Due to the first bug, such views with implicit grouping were
considered mergeable. This led to wrong result sets for selects from
these views.
Due to the second bug the aggregation select was determined incorrectly and
this led to bogus error messages.
The patch added several test cases for these two bugs and for four other
duplicate bugs.
The patch also enables view-protocol for many other test cases.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
The problem was that when storing rows into a temporary table,
MIN/MAX items that were marked as constants (as their value had
been computed at the start of the query) would be reset.
Fixed by not resetting MIN/MAX items that are marked as const in
Item_sum_min_max::clear().
Arithmetic can overflow the uint type when the maximum possible
group_concat_max_len is multiplied by collation.mbmaxlen (which can easily
be 4). So use ulonglong there for the calculations.
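For example (a sketch of the kind of calculation meant above; the exact
expression in the code may differ):

  /* group_concat_max_len is a 32-bit value; with mbmaxlen = 4 the product
     can exceed UINT_MAX, so widen before multiplying: */
  ulonglong max_byte_len= (ulonglong) thd->variables.group_concat_max_len *
                          collation.collation->mbmaxlen;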
Precision should be kept below DECIMAL_MAX_SCALE for computations.
It can be bigger in Item_decimal. I'd fix this too, but it changes the
existing behaviour, so it is problematic to fix.
This crash happens on a combination of multiple conditions:
- There is a thread#1 running an "ANALYZE FORMAT=JSON" query for a
"SELECT .. FROM INFORMATION_SCHEMA.COLUMNS WHERE .. "
- The WHERE clause contains a stored function call, say f1().
- The WHERE clause is built in the way so that the function f1()
is never actually called, e.g.
WHERE .. AND (TRUE OR f1()=expr)
- The database contains multiple VIEWs that have the function f1() call,
e.g. in their <select list>
- The WHERE clause is built in the way so that these VIEWs match
the condition.
- There is a parallel thread#2 running. It creates or drops or recreates
some other stored routine, say f2(), which is not used in the ANALYZE query.
It effectively invalidates the stored routine cache for thread#1
without locking.
Note, it is important that f2() is NOT used by the ANALYZE query.
Otherwise, thread#2 would be locked until the ANALYZE query
finishes.
When all of the above conditions are met, the following happens:
1. thread#1 starts the ANALYZE query. It notices a call for the stored function
f1() in the WHERE condition. The function f1() gets parsed and cached
to the SP cache. Its address also gets assigned to Item_func_sp::m_sp.
2. thread#1 starts iterating through all tables that
match the WHERE condition to find the information about their columns.
3. thread#1 processes columns of the VIEW v1.
It notices a call for f1() in the VIEW v1 definition.
But f1() is already cached in step#1 and it is up to date.
So nothing happens with the SP cache.
4. thread#2 re-creates f2() in a non-locking mode.
It effectively invalidates the SP cache in thread#1.
5. thread#1 processes columns of the VIEW v2.
It notices a call for f1() in the VIEW v2 definition.
It also notices that the cached version of f1() is not up to date.
It frees the old definition of f1(), parses it again, and puts a
new version of f1() to the SP cache.
6. thread#1 finishes processing rows and generates the JSON output.
When printing the "attached_condition" value, it calls
Item_func_sp::print() for f1(). But this Item_func_sp links
to the old (freed) version of f1().
The above scenario demonstrates that Item_func_sp::m_sp can point to an
already freed instance when Item_func_sp::func_name() is called,
so accessing Item_sp::m_sp->m_handler is not safe.
This patch rewrites the code to use Item_func_sp::m_handler instead,
which is always reliable.
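A self-contained model of the hazard and of the direction of the fix (names
are illustrative; in the server the pointer that stays valid is
Item_func_sp::m_handler):

  struct Sp_handler_model { const char *name; };     /* effectively static */
  struct sp_head_model { const Sp_handler_model *m_handler; };  /* lives in
                                                      the SP cache, may be freed */
  struct Item_func_sp_model
  {
    sp_head_model *m_sp;                 /* may point into an invalidated cache  */
    const Sp_handler_model *m_handler;   /* captured at parse time, always valid */

    /* old code path: reaches the handler through a possibly freed sp_head */
    const char *name_unsafe() const { return m_sp->m_handler->name; }

    /* the fix: use the handler stored in the Item itself */
    const char *name_safe() const { return m_handler->name; }
  };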
Note, this patch is only a cleanup for MDEV-28166 to quickly fix the regression.
It fixes MDEV-28267. But it does not fix the core problem:
The code behind I_S does not take into account that the SP
cache can be updated while evaluating rows of the COLUMNS table.
This is a corner case and it never happens with any other tables.
I_S.COLUMNS is very special.
Another example of the core problem is reported in MDEV-25243.
The code accesses Item_sp::m_sp->m_chistics of an
already freed m_sp, again. It will be addressed separately.
This patch reverts the fixes of the bugs MDEV-24454 and MDEV-25631 from
the commit 3690c549c6.
It leaves the changes in plugin/feedback/feedback.cc and corresponding
test files introduced in this commit intact.
Proper fixes for the bugs MDEV-24454 and MDEV-25631 will follow immediately.
Use in_sum_func (and so nest_level) only in the LEX to which the SELECT lex belongs.
Reduce usage of current_select (because it does not always point to the correct
SELECT_LEX, for example with prepare).
Change context for all classes inherited from Item_ident (was only for Item_field) in case of pushing down it to HAVING.
Now the name resolution context has to have a SELECT_LEX reference if the context is present.
Fixed feedback plugin stack usage.
The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There are still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.
The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
complains about).
- x86 version of optimized int#korr() methods. UBSAN does not like unaligned
memory access of integers. Fixed by using byte_order_generic.h when
compiling with UBSAN.
- We use a smaller thread stack with ASAN and UBSAN, which forced me to
disable a few tests that print the thread stack size.
- Verifying class types does not work for shared libraries. I added
suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
safe to have overflows (two cases, in item_func.cc).
Things fixed:
- Don't left shift signed values (see the example after this list).
(byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign non-existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
constructors. This was needed as UBSAN checks that these types have
correct values when one copies an object.
(gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not call handler functions on unallocated objects or
deleted objects.
(events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call the constructor
of the Query_arena object.
- Fixed several casts of objects to an incompatible class!
(Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes overflows or underflows.
This also includes ++ and -- of integers.
(Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNINITIALIZED to json_value_types and ensure that
value_type is initialized to this instead of to -1, which is not a valid
enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
instead of a null string (safer as it ensures we do not do arithmetic
on null strings).
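A generic example of the signed-shift item above (not a quote of any
particular fix in this commit):

  #include <stdint.h>

  /* Shifting a 1 into the sign bit of a signed int is undefined behaviour
     and is reported by UBSAN: */
  /* int32_t bad= (int32_t) 1 << 31; */

  /* Doing the shift on an unsigned type is well defined; convert back
     afterwards if a signed value is really needed: */
  uint32_t shifted= (uint32_t) 1U << 31;
  int32_t  value= (int32_t) shifted;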
Other things:
- Changed struct st_position to a class and added an initialization
function to it to ensure that we do not copy or use uninitialized
members. The change to a class was also motivated by the fact that we
used "struct st_position" and POSITION randomly throughout the code,
which was confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed sql_partition to use '^' instead of '-'. This is safe as
the operand is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed a bug in find_set() where we could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr. (This variable was previously
only in 10.5 and up.) It can now have one of two values:
ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
it virtual. This was an effort to get UBSAN to work with loaded storage
engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
to integer arithmetic.
Changes that should not be needed but had to be done to suppress warnings
from UBSAN:
- Added static_cast<uint16_t> around shifts to get rid of a LOT of
compiler warnings when using UBSAN.
- Had to change some divisions by powers of 2 into shifts to get rid of
some compile-time warnings.
Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
This follows up
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
The problem here is similar to the case with DISTINCT: the tree used for ORDER BY
needs to also hold the null bytes of the record. This was not done for GROUP_CONCAT
as NULLs are rejected by GROUP_CONCAT.
Also introduced a comparator function for the order by tree to handle null
values with JSON_ARRAYAGG.
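A self-contained model of such a comparator (the layout and names are
illustrative; in the server the null byte comes from the record's null-flags
area):

  struct Order_key_model
  {
    unsigned char is_null;   /* copy of the field's null byte */
    long value;              /* the field value when not NULL */
  };

  static int order_key_cmp_with_nulls(const void *a_, const void *b_)
  {
    const Order_key_model *a= (const Order_key_model *) a_;
    const Order_key_model *b= (const Order_key_model *) b_;
    if (a->is_null != b->is_null)          /* NULLs sort before values */
      return a->is_null ? -1 : 1;
    if (a->is_null)                        /* both NULL: equal keys    */
      return 0;
    return (a->value > b->value) - (a->value < b->value);
  }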