Oracle mode has different set operator precedence and handling (not per
the SQL standard). In Oracle mode the test case below is evaluated as-is,
in plain order from left to right. The MariaDB default mode follows the
SQL standard and gives INTERSECT higher precedence, so the UNION takes its
input from a derived table holding the INTERSECT result (here and below
the same applies to EXCEPT).
A non-distinct set operator (UNION ALL/INTERSECT ALL) works by releasing
the unique key, but that can be done only once: we cannot add an index to
a non-empty heap table (see heap_enable_indexes()). So every UNION ALL
before the rightmost UNION DISTINCT works as UNION DISTINCT. That is the
common behaviour; MySQL, MSSQL and Oracle work that way.
The union_distinct property points to the rightmost distinct UNION (at
least that keeps the algorithm simple: the loop in
st_select_lex_unit::exec() releases the unique key after processing
union_distinct).
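As an illustration only (hypothetical types; the real loop is in
st_select_lex_unit::exec()), the shape of that logic is:

    // Sketch: a "rightmost distinct" marker drives a one-time index
    // release during left-to-right execution of the set operation chain.
    #include <vector>

    struct SelectBlock { /* one SELECT of the unit */ };

    struct UnitSketch
    {
      std::vector<SelectBlock*> selects;
      SelectBlock *union_distinct= nullptr; // rightmost distinct UNION
      bool unique_key_released= false;

      void exec_sketch()
      {
        for (SelectBlock *sl : selects)
        {
          // ... materialize *sl into the temporary table ...
          if (sl == union_distinct)
            // Everything to the right uses ALL semantics; release the
            // unique key once, since it cannot be re-added to a
            // non-empty heap table.
            unique_key_released= true;
        }
      }
    };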
The INTERSECT ALL code (implemented by MDEV-18844 in a896beb) does not
know about Oracle mode and treats union_distinct as the last operation,
which is why it releases the unique key on the union_distinct operation.
INTERSECT ALL requires the unique key in order to work, so the unique key
must not be released before any INTERSECT ALL (see
select_unit_ext::send_data()).
The patch tweaks the INTERSECT ALL code for Oracle mode. In
disable_index_if_needed() it does not allow releasing the unique key
before the last operation, and it allows unfold on the last operation.
The test case with UNION DISTINCT following INTERSECT ALL at least does
not return invalid data, but in fact the whole INTERSECT ALL code could
be refactored to use clearer semantic triggers.
The patch fixes a typo in st_select_lex_unit::prepare() where a local
variable have_except_all_or_intersect_all masked the eponymous data
member and wrongly triggered the unique key release.
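The shadowing pattern, reduced to a hypothetical standalone example
(names reused only for illustration; this is not the server code):

    struct unit_sketch
    {
      bool have_except_all_or_intersect_all= false;

      void prepare(bool found_all_operator)
      {
        // Bug pattern: the local declaration shadows the data member, so
        // the member keeps its old value and later decisions about the
        // unique key release are made on the wrong flag.
        bool have_except_all_or_intersect_all= found_all_operator;
        (void) have_except_all_or_intersect_all;
      }
    };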
The patch also fixes an unknown error being returned in case
ha_disable_indexes() fails.
Note: optimize_bag_operation() does some operator substitutions, but it
does not run under prepared statements (PS). So if a test result differs
with --ps, that means the non-optimized
(have_except_all_or_intersect_all == true) code path is misbehaving.
Note 2: a VIEW is stored and executed in normal mode (see
Sql_mode_save_for_frm_handling), hence when the SELECT order differs in
Oracle mode (defined by parsed_select_expr_cont()) the test must be
wrapped in --disable_view_protocol.
In the main.plugin test this function is called assuming the prototype
int (*)(THD *, st_mysql_show_var *, void *, system_status_var *, enum_var_type),
as changed in b4ff64568c.
We update ha_example::show_func_example to match the prototype with which
it is called.
If the execution of the two reads in log_t::get_lsn_approx() is
interleaved with concurrent writes of those fields in
log_t::write_buf() or log_t::persist(), the returned approximation
will be an upper bound. If log_t::append_prepare_wait() is pending,
the approximation could be a lower bound.
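A standalone sketch of why two unsynchronized loads can only yield a
bound (std::atomic fields stand in for the real log_sys members; names
are illustrative):

    #include <atomic>
    #include <cstdint>

    std::atomic<uint64_t> field_a{0};  // e.g. updated by log_t::persist()
    std::atomic<uint64_t> field_b{0};  // e.g. updated by log_t::write_buf()

    // The reader holds no latch, so a writer may run between the two
    // loads; the combined value is then an upper (or lower) bound rather
    // than an exact snapshot, and callers must tolerate that.
    uint64_t get_lsn_approx_sketch()
    {
      uint64_t a= field_a.load(std::memory_order_relaxed);
      uint64_t b= field_b.load(std::memory_order_relaxed);
      return a > b ? a : b;
    }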
We must adjust each caller of log_t::get_lsn_approx() for the
possibility that the return value is larger than
MAX(oldest_modification) in buf_pool.flush_list.
af_needed_for_redo(): Add a comment that explains why the glitch
is not a problem.
page_cleaner_flush_pages_recommendation(): Revise the logic for
the unlikely case cur_lsn < oldest_lsn. The original logic would have
invoked af_get_pct_for_lsn() with a very large age value, which
would likely cause an overflow of the local variable lsn_age_factor,
and make pct_for_lsn a "random number". Based on that value,
total_ratio would be normalized to something between 0.0 and 1.0.
Nothing extremely bad should have happened in this case;
innodb_io_capacity_max would not have been exceeded.
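One defensive way to express the revised handling (a sketch under the
assumption that the age is simply clamped; not the exact server code):

    #include <cstdint>
    typedef uint64_t lsn_t;

    // If cur_lsn is momentarily behind oldest_lsn, clamp the age to 0 so
    // the unsigned subtraction cannot yield a huge value that overflows
    // the later lsn_age_factor percentage arithmetic.
    static inline lsn_t flush_list_age(lsn_t cur_lsn, lsn_t oldest_lsn)
    {
      return cur_lsn >= oldest_lsn ? cur_lsn - oldest_lsn : 0;
    }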
buf_pool_t::resize(): After successfully shrinking the buffer pool,
announce the success. The size had already been updated in shrunk().
After failing to shrink the buffer pool, re-enable the adaptive
hash index if it had been enabled.
Reviewed by: Debarun Banerjee
Calling SetArrayOptions with Nodes[i - 1].Key as its nm argument
exposed a MemorySanitizer error when i=0 for the tests:
* connect.bson_udf
* connect.json_udf
* connect.json_udf_bin
It is assumed that a basic optimization would have eliminated
these invalid expressions.
As the nm argument was unused, it has been removed.
THD::reset_sub_statement_state and THD::restore_sub_statement_state swap
the auto_inc_intervals_forced (Discrete_intervals_list) member of the THD
class with a temporary local variable, so that other work can be done
before it is restored at the end of Table_triggers_list::process_triggers
under the rpl_master_erroneous_autoinc(true) condition, as exposed by the
rpl.rpl_trigger test.
The uninitialized data isn't used and the only required action is to copy
the data in one direction. As the intent is for the auto_inc_intervals_forced
value to be overwritten or unused, MEM_UNDEFINED is applied to it to
ensure the previous state is considered invalid.
The other uses of reset_sub_statement_state in Item_sp::execute_impl
also follow the same pattern of taking a copy to restore within the
same function.
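The annotation pattern, sketched with a stand-in MEM_UNDEFINED macro
(the server's real macro expands to an MSAN/Valgrind annotation; here it
is stubbed so the sketch stays self-contained, and the struct is
hypothetical):

    #ifndef MEM_UNDEFINED
    # define MEM_UNDEFINED(addr, len) ((void) (addr), (void) (len))
    #endif

    struct Intervals_sketch { unsigned long long head; unsigned elements; };

    void swap_forced_intervals(Intervals_sketch *thd_member,
                               Intervals_sketch *local_copy)
    {
      // Only one direction of the copy carries meaningful data; the source
      // is going to be overwritten or never read again, so mark it as
      // undefined to keep MSAN from treating its stale bytes as valid.
      *local_copy= *thd_member;
      MEM_UNDEFINED(thd_member, sizeof *thd_member);
    }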
Without this increase the mtr test case pre/post conditions will fail,
as the stack usage has increased under MSAN with clang-20.1.
ASAN takes an 11M stack; however, there was no obvious gain in MSAN
test success beyond 2M.
The behaviour observed with a smaller stack size was normally a SEGV.
Hide the default stack size from the sysvar tests that expose
thread-stack as a variable with its default value.
Despite being included in the HAVE_valgrind define, MSAN is best
differentiated from valgrind in the server identifier, as for these
purposes they have a distinct and different set of behaviours.
MSAN has its own set of test inclusions that differ from valgrind's, so
including "valgrind" in a server string that gets tested for valgrind
would incorrectly exclude some tests that are suitable for MSAN but not
for valgrind.
There is a have_sanitizer system variable for exposing which sanitizer is
in use, so there is no need for version verboseness.
Correct the have_sanitizer system variable description: including MSAN
there has been possible for a while.
In the original fix, commit 82d7419e06,
a 16k stack frame limit was imposed. Under MSAN the stack usage is
doubled; debug builds without optimization can use more as well.
ASAN Debug builds also exceeded the 16k stack frame limit.
To keep some safety limit, a 64k limit is imposed on the compiler
under MSAN or ASAN with CMAKE_BUILD_TYPE=Debug.
Clang ~16+ with MSAN became quite strict about uninitialized data being
passed to and returned from functions. Non-debug builds have a basic
optimization that hides these cases from those builds.
Two InnoDB cases violate the assumptions; however, once inlined with a
basic optimization, the reads that existed for uninitialized values are
removed.
(MDEV-36316) rec_set_bit_field_2 calling mach_read_from_2 hits a read of
bits it wasn't actually changing.
(MDEV-36327) The function dict_process_sys_columns_rec left nth_v_col
uninitialized unless it was a virtual column. This was OK because the
function i_s_sys_columns_fill_table also didn't read this value unless it
was a virtual column.
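A reduced illustration of the MDEV-36327 pattern (hypothetical names):
the callee only fills a field on one path and the caller only reads it on
the same path, so no garbage is ever used, yet a strict unoptimized MSAN
build may still flag the by-value copy until inlining removes it:

    #include <cstdio>

    struct ColInfo { bool is_virtual; unsigned nth_v_col; };

    static ColInfo read_column(bool is_virtual)
    {
      ColInfo info;
      info.is_virtual= is_virtual;
      if (is_virtual)
        info.nth_v_col= 42;
      return info;   // partly uninitialized for non-virtual columns
    }

    int main()
    {
      ColInfo c= read_column(false);
      if (c.is_virtual)              // guard matches the callee's condition
        printf("%u\n", c.nth_v_col);
      return 0;
    }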
my_time_fraction_remainder(): Remove a DBUG_ASSERT, because there is
none in sec_part_shift() or sec_part_unshift() either. A buffer
overflow should be caught by cmake -DWITH_ASAN=ON in all three.
This fixes a build with GCC 14.2 and
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=-Og.
Reviewed by: Daniel Black
New warnings come from 3 places:
1. Warning C5287 comes from json_lib.c, from code like
   compile_time_assert((int) JSON_VALUE_NULL == (int) JSV_NULL);
2. Warning C5287: a similar warning comes from wc_static_assert() in
   wolfSSL's header file.
3. Warning C5286 in wolfSSL code, where -enum_value
   (i.e. multiplying an enum by -1) is used.
To fix:
- Disable the warnings in wolfSSL code, using the /wd<num> flag.
- Work around the warning for users of wolfSSL by disabling
  wc_static_assert() with the -DWC_NO_STATIC_ASSERT compile flag.
- Rewrite some compile_time_assert() calls in json_lib.c to avoid the
  warning (one possible rewrite is sketched below).
- Add target_link_libraries(vio ${SSL_LIBRARIES}) so that
  vio picks up -DWC_NO_STATIC_ASSERT.
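One possible compile_time_assert() rewrite that avoids putting two
different enum types into a single expression (illustrative only; the
actual change in json_lib.c may differ):

    // Assert each enum constant against the shared plain integer value
    // instead of comparing the two enum types to each other.
    enum json_value_types { JSON_VALUE_NULL = 0 };
    enum json_scanner_values { JSV_NULL = 0 };

    static_assert((int) JSON_VALUE_NULL == 0, "JSON_VALUE_NULL must be 0");
    static_assert((int) JSV_NULL == 0, "JSV_NULL must be 0");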
Problem:
=======
- In 10.11, during the copy algorithm, InnoDB uses bulk insert
  for row-by-row insert operations. When the temporary directory
  runs out of memory, row_mysql_handle_errors() fails to handle
  DB_TEMP_FILE_WRITE_FAIL.
- During the inplace algorithm, concurrent DML fails to write
  the log operation into the temporary file. InnoDB fails to
  mark the error for the online log.
- ddl_log_write() releases the global ddl lock prematurely, before
  releasing the log memory entry.
Fix:
===
row_mysql_handle_errors(): Roll back the transaction when
InnoDB encounters DB_TEMP_FILE_WRITE_FAIL.
convert_error_code_to_mysql(): Report an aborted transaction
when InnoDB encounters DB_TEMP_FILE_WRITE_FAIL during
ALTER TABLE ... ALGORITHM=COPY or an InnoDB bulk insert operation.
row_log_online_op(): Mark the error in the online log when
InnoDB runs out of temporary space.
fil_space_extend_must_retry(): Set os_has_said_disk_full
to true if os_file_set_size() fails.
btr_cur_pessimistic_update(): Return the error code when
btr_cur_pessimistic_insert() fails.
ddl_log_write(): Release the global ddl lock after releasing
the log memory entry when an error was encountered.
btr_cur_optimistic_update(): Relax the assertion so that the
blob pointer can be null during rollback, because InnoDB can
run out of space while allocating the external page.
ha_innobase::extra(): Roll back the transaction during DDL before
calling convert_error_code_to_mysql().
row_undo_mod_upd_exist_sec(): Remove the assertion which says
that InnoDB should fail to build an index entry when rolling back
an incomplete transaction after crash recovery. This scenario
can happen when InnoDB runs out of space.
row_upd_changes_ord_field_binary_func(): Relax the assertion so
that an externally stored field can be null when InnoDB runs out
of space.
The problem was that the aria_backup_client code did not initialize
maria_tmpdir, which is used during recovery if a table repair is needed to
reconstruct indexes.
rename_table_in_stat_tables(): Allocate TABLE_LIST[STATISTICS_TABLES]
from the heap instead of the stack, to pass -Wframe-larger-than=16384
in an optimized CMAKE_BUILD_TYPE=RelWithDebInfo build.
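The general pattern, sketched in standard C++ (TableRef stands in for
the much larger TABLE_LIST; the server uses its own allocators):

    #include <memory>

    struct TableRef { char pad[4096]; };   // stand-in for a large struct
    static const int STAT_TABLES= 3;

    void rename_in_stat_tables_sketch()
    {
      // A stack array of large structs counts against the frame size that
      // -Wframe-larger-than=16384 checks; a heap allocation does not.
      std::unique_ptr<TableRef[]> tables(new TableRef[STAT_TABLES]);
      // ... fill and use tables[0 .. STAT_TABLES-1] ...
    }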
When table2myisam() prepares the recinfo structures, the BIT field was
skipped because pack_length_in_rec() returns 0. Instead of the BIT field,
the DB_ROW_HASH_1 field was taken into the recinfo structure and its
length was added to reclength. This is wrong because fields that are not
stored must not be prepared as record columns (MI_COLUMNDEF) in the
storage layer; 0-length fields are prepared in the "reserve space for
null bits" branch.
The problem only occurs with tables where there is no data for the
main record outside of the null bits.
The fix updates the minpos condition so that we skip fields located after
stored_rec_length, which by definition are not stored.
share->reclength already includes the lengths of not-stored fields from
CREATE TABLE, so we cannot use it as the minpos starting point.
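Roughly, the column-definition loop must only consider fields whose data
lives inside the stored part of the record; a simplified sketch with
hypothetical names (not the actual table2myisam() code):

    struct FieldSketch { unsigned offset; bool stored_in_db; };

    // Count only fields stored in the record image; anything placed at or
    // beyond stored_rec_length (e.g. the long-unique hash field) must not
    // become an MI_COLUMNDEF-style record column.
    unsigned count_record_columns(const FieldSketch *fields, unsigned n,
                                  unsigned stored_rec_length)
    {
      unsigned columns= 0;
      for (unsigned i= 0; i < n; i++)
        if (fields[i].stored_in_db && fields[i].offset < stored_rec_length)
          columns++;
      return columns;
    }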
In Aria there is no "reserve space for null bits" branch and it does not
create a column definition for BIT. Also there is no
setup_vcols_for_repair() to reproduce the issue. Nonetheless it
redundantly creates column definitions for not-stored fields, so we patch
table2maria() as well. The test case for Aria tries to demonstrate that
the BIT field works; it does not reproduce any issue (a redundant column
definition for a not-stored field does not seem to cause any problems).
ER_DUP_ENTRY on a partitioned table.
Now that c1492f3d07 (MDEV-36115) restores m_last_part, table->file
points to partition p0 while the error happens in p1, so the error index
does not match ib_table in innobase_get_mysql_key_number_for_index().
This case is handled by a separate code block in
innobase_get_mysql_key_number_for_index(), which was wrong in using a
secondary index for dict_index_is_auto_gen_clust() and was not
covered by the tests.
The --result_format 2 command fails with an assertion:
DbugExit (why=0x7fffffff49e0 "missing DBUG_RETURN or DBUG_VOID_RETURN
macro in function \"do_result_format_version\"\n") at
../src/dbug/dbug.c:2045
After the membership change on a newly synced node, this is just a
warning and is safe to suppress.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Test changes only. Both warnings are expected and
should be suppressed because we intentionally inject
different inconsistencies on two nodes and then join
them back with membership change.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Test changes only. Add a wait_condition so that all nodes
are in the expected state, and add debug output in case the issue
does reproduce.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Issue:
MariaDB acquires additional MDL locks for UPDATE/INSERT/DELETE statements
on tables with foreign keys. For example, if table t1 references t2, an
UPDATE to t1 will MDL-lock t2 in addition to t1.
A replica may deliver an ALTER on t1 and an UPDATE on t2 concurrently for
applying. The UPDATE may then acquire an MDL lock on t1, followed by a
conflict when the ALTER attempts to take an MDL lock on t1, causing a
BF-BF conflict.
Solution:
Additional keys for the referenced/foreign table need to be added
to avoid potential MDL conflicts with concurrent updates and DDL.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Add wait_condition to wait until all inserted rows are replicated
so that show create table is deterministic.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
page_is_corrupted(): Do not allocate the buffers from stack,
but from the heap, in xb_fil_cur_open().
row_quiesce_write_cfg(): Issue one type of message when we
fail to create the .cfg file.
update_statistics_for_table(), read_statistics_for_table(),
delete_statistics_for_table(), rename_table_in_stat_tables():
Use a common stack buffer for Index_stat, Column_stat, Table_stat.
ha_connect::FileExists(): Invoke push_warning_printf() so that
we can avoid allocating a buffer for snprintf().
translog_init_with_table(): Do not duplicate TRANSLOG_PAGE_SIZE_BUFF.
Let us also globally enable the GCC 4.4 and clang 3.0 option
-Wframe-larger-than=16384 to reduce the possibility of introducing
such stack overflow in the future. For RocksDB and Mroonga we relax
these limits.
Reviewed by: Vladislav Lesin
In commit b6923420f3 (MDEV-29445)
some hash tables were accidentally created with the minimum size
(101 entries) instead of correctly deriving the size from the
initial innodb_buffer_pool_size. This led to very long hash bucket
chains, which are very slow to traverse.
ut_find_prime(): Assert that the size is nonzero in order to catch
this type of regression in the future.
innodb_init_params(): Do not bother reading buf_pool.curr_size()
when it is known to be 0.
srv_start(): Correctly initialize srv_lock_table_size to 5 times
buf_pool.curr_size(), that is, the buffer pool size in pages,
between invoking buf_pool.create() and lock_sys.create().
btr_search_enable(), dict_sys_t::create(), dict_sys_t::resize():
Correctly refer to buf_pool.curr_pool_size(), that is,
innodb_buffer_pool_size in bytes, when calculating the hash table size.
In MDEV-29445 the expressions buf_pool_get_curr_size() were
accidentally replaced with buf_pool.curr_size().
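In sketch form, the intended sizing (next_prime() stands in for InnoDB's
ut_find_prime(); the multiplier follows the commit text, everything else
is illustrative):

    #include <cassert>
    #include <cstdint>

    static uint64_t next_prime(uint64_t n)
    {
      assert(n > 0);          // catches "sized from an uninitialized 0"
      for (;; n++)
      {
        bool prime= n > 1;
        for (uint64_t d= 2; d * d <= n; d++)
          if (n % d == 0) { prime= false; break; }
        if (prime) return n;
      }
    }

    // Derive table sizes from the actual buffer pool size, not from a
    // fixed minimum: e.g. about 5 lock_sys cells per buffer pool page.
    uint64_t lock_table_size_sketch(uint64_t pool_size_in_pages)
    {
      return next_prime(5 * pool_size_in_pages);
    }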
buf_buddy_shrink(): Properly cover the case when KEY_BLOCK_SIZE
corresponds to the innodb_page_size, that is, the ROW_FORMAT=COMPRESSED
page frame is directly allocated from the buffer pool, not via the
binary buddy allocator.
buf_LRU_check_size_of_non_data_objects(): Avoid a crash when the
buffer pool is being shrunk.
buf_pool_t::shrink(): Abort if over 95% of the shrunk buffer pool
would be occupied by the adaptive hash index or record locks.
In commit b6923420f3 (MDEV-29445)
we started to specify the MAP_POPULATE flag for allocating the
InnoDB buffer pool. This would cause a lot of time to be spent
on __mm_populate() inside the Linux kernel, such as 16 seconds
to pre-fault or commit innodb_buffer_pool_size=64G.
Let us revert to the previous way of allocating the buffer pool
at startup. Note: An attempt to increase the buffer pool size by
SET GLOBAL innodb_buffer_pool_size (up to innodb_buffer_pool_size_max)
will invoke my_virtual_mem_commit(), which will use MAP_POPULATE
to zero-fill and prefault the requested additional memory area, blocking
buf_pool.mutex.
Before MDEV-29445 we allocated the InnoDB buffer pool by invoking
mmap(2) once (via my_large_malloc()). After the change, we would
invoke mmap(2) twice, first via my_virtual_mem_reserve() and then
via my_virtual_mem_commit(). Outside Microsoft Windows, we are
reverting back to my_large_malloc() like allocation.
my_virtual_mem_reserve(): Define only for Microsoft Windows.
Other platforms should invoke my_large_virtual_alloc() and
update_malloc_size() instead of my_virtual_mem_reserve() and
my_virtual_mem_commit().
my_large_virtual_alloc(): Define only outside Microsoft Windows.
Do not specify MAP_NORESERVE nor MAP_POPULATE, to preserve compatibility
with my_large_malloc(). Were MAP_POPULATE specified, the mmap()
system call would be significantly slower, for example 18 seconds
to reserve 64 GiB upfront.
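For reference, the cost difference comes down to the mmap(2) flags; a
minimal Linux-specific sketch (not the server's actual allocator):

    #include <sys/mman.h>
    #include <cstddef>

    // Without MAP_POPULATE the pages are faulted in lazily on first use;
    // with MAP_POPULATE the kernel pre-faults everything in
    // __mm_populate(), which can take many seconds for tens of gigabytes.
    void *reserve_pool(size_t bytes, bool prefault)
    {
      int flags= MAP_PRIVATE | MAP_ANONYMOUS;
      if (prefault)
        flags|= MAP_POPULATE;
      void *p= mmap(nullptr, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
      return p == MAP_FAILED ? nullptr : p;
    }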
log_t::append_prepare_wait(): Do not attempt to read log_sys.write_lsn
because it is not protected by log_sys.latch but by write_lock, which
we cannot hold here. The assertion could fail if log_t::write_buf()
is executing concurrently, and it has not yet executed log_write_buf()
or updated log_sys.write_lsn.
Fixes up commit acd071f599 (MDEV-21923)
There were two issues with the test:
1. A race between a race_condition.inc and a status variable, where the
   status variable Rpl_semi_sync_master_status could be ON before the
   semi-sync connection finished establishing, resulting in
   Rpl_semi_sync_master_clients showing 0 (instead of 1). To fix this,
   we instead wait for Rpl_semi_sync_master_clients to be 1 before
   proceeding.
2. Another race between a race_condition.inc and a status variable,
   where the wait_condition waited for a processlist command of
   'BINLOG DUMP' to disappear in order to infer the binlog dump thread
   was killed, after which we verified the semi-sync state was correct
   using status variables. However, the 'BINLOG DUMP' command is
   overridden with a killed status before the semi-sync tear-down
   happens, and thereby we could see invalid values. The fix for
   this is to change the wait_condition to instead wait until the
   connection with the replication user is gone, because that stays
   through the binlog dump thread tear-down life-cycle.
The SQL service leaves the affected rows uninitialized.
The initialization of the spider plugin that uses this service will fail
under MSAN because there is no initialized value to return at the end of
the query.