MDEV-33502 Slowdown when running nested statement with many partitions
This change was triggered to help some MariaDB users with close to
10000 bits in their bitmaps.
- Change the underlying storage to be 64 bit instead of 32 bit.
- This reduces the number of loops needed to scan bitmaps.
- This can cause some bitmaps to be 4 bytes larger.
- Ensure that all unused top bits are always 0 (simplifies the code as
  the last 64-bit word is not a special case anymore).
- Use my_find_first_bit() to find the first set bit, which is much faster
  than scanning through the bitmap byte by byte and then bit by bit
  (see the sketch below).
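A minimal sketch of the approach, assuming a GCC/Clang
count-trailing-zeros builtin (this is not the actual
my_find_first_bit() source):

  #include <stddef.h>
  #include <stdint.h>

  /* Skip zero 64-bit words, then locate the lowest set bit in one
     step instead of testing byte by byte and bit by bit. Relies on
     the invariant above: unused top bits are always 0. */
  static size_t find_first_set_bit(const uint64_t *map, size_t words)
  {
    for (size_t i= 0; i < words; i++)
      if (map[i])
        return i * 64 + (size_t) __builtin_ctzll(map[i]);
    return words * 64;                      /* no bit was set */
  }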
Other things:
- Added a bool to remember whether my_bitmap_init() allocated the bitmap
  array. my_bitmap_free() will only free arrays it did allocate.
  This allowed me to remove setting 'bitmap=0' before calling
  my_bitmap_free() in cases where the bitmaps were allocated externally.
- my_bitmap_init() sets bitmap to 0 in case of failure.
- Added 'universal' asserts to most bitmap functions.
- Change all remaining calls to bitmap_init() to my_bitmap_init().
- To finish the change from 2014.
- Changed all usage of uint32 in my_bitmap.h to my_bitmap_map.
- Updated bitmap_copy() to handle bitmaps of different size.
- Removed const from bitmap_exists_intersection() as the const caused
  casts at all call sites.
- Removed not used function bitmap_set_above().
- Renamed create_last_word_mask() to create_last_bit_mask() (to match
name changes in my_bitmap.cc)
- Extended bitmap-t with test for more bitmap functions.
Problem:
=======
- When an online ALTER of the InnoDB statistics table happens,
any transaction which updates the statistics table
has to read the undo log and log the DML changes during
transaction commit. Applying the undo log
(UndorecApplier::apply_undo_rec) requires a shared
lock on the dictionary cache, but dict_stats_save() already
holds a write lock on the dictionary cache. This leads to
a server abort during the commit of statistics table changes.
Solution:
========
- Disallow LOCK=NONE operation for the InnoDB statistics table.
The reasoning is that statistics tables are typically
rather small, so any blocking would be brief.
Writes to the statistics tables should be a rare operation.
srw_lock_debug::have_rd(), srw_lock_debug::have_wr():
For SUX_LOCK_GENERIC (no futex-based synchronization primitives),
we cannot check whether the underlying srw_lock is held by us.
Thanks to Dmitry Shulga for pointing out this build failure.
Problem:
========
Currently, the import operation fails with a schema mismatch when the
cfg file has a hidden fts document id and a hidden fts document index.
Fix:
====
To fix this issue, simply add the fts doc id column and
indexes to the table definition and try to import the table.
In case of success:
1) Update the fts document id in sys columns.
2) Update the number of columns in sys tables.
3) Insert the new fts index entry in the sys indexes table
and sys fields.
4) Reload the table with the new table definition.
purge_sys_t::truncating_tablespace(): Clamp the
innodb_max_undo_log_size to the maximum number of pages
before converting the result into a 32-bit unsigned integer.
This fixes up commit f8c88d905b (MDEV-33213).
In later major versions, a 32-bit unsigned integer would be used here
due to commit ca501ffb04
and the code would crash also on 64-bit processors.
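A hedged illustration of the clamping step (names are illustrative,
not the actual purge_sys_t code):

  #include <algorithm>
  #include <cstdint>

  /* Cap the page count at the 32-bit maximum before narrowing, so a
     huge innodb_max_undo_log_size cannot wrap around on conversion. */
  static uint32_t truncation_limit_pages(uint64_t max_undo_log_size,
                                         uint64_t page_size)
  {
    uint64_t pages= max_undo_log_size / page_size;
    return static_cast<uint32_t>(std::min<uint64_t>(pages, UINT32_MAX));
  }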
Reviewed by: Debarun Banerjee
buf_read_ahead_linear(): If buf_pool.watch_is_sentinel(*bpage),
do not attempt to read the page frame because the pointer would be null
for the elements of buf_pool.watch[].
Hitting this bug requires the use of a non-default value of
innodb_change_buffering.
libxml2 2.12.0 made `xmlGetLastError()` return a `const` pointer:
61034116d0
Clang 16 does not like this:
error: assigning to 'xmlErrorPtr' (aka '_xmlError *') from 'const xmlError *' (aka 'const _xmlError *') discards qualifiers
error: cannot initialize a variable of type 'xmlErrorPtr' (aka '_xmlError *') with an rvalue of type 'const xmlError *' (aka 'const _xmlError *')
Let’s update the variables to `const`.
For older versions, it will be automatically converted.
But then `xmlResetError(xmlError*)` will not like the `const` pointer:
error: no matching function for call to 'xmlResetError'
note: candidate function not viable: 1st argument ('const xmlError *' (aka 'const _xmlError *')) would lose const qualifier
Let’s replace it with `xmlResetLastError()`.
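A sketch of the adjusted usage (simplified; xmlGetLastError() and
xmlResetLastError() are real libxml2 API, the wrapper is made up):

  #include <libxml/xmlerror.h>

  void report_last_xml_error()
  {
    /* const matches the libxml2 >= 2.12 signature; with older
       versions the non-const pointer converts automatically */
    const xmlError *err= xmlGetLastError();
    if (err && err->message)
    {
      /* log err->message somewhere */
    }
    xmlResetLastError();   /* instead of xmlResetError(err) */
  }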
Also remove the `LIBXMLDOC::Xerr` protected member.
It was introduced in 65b0e5455b
along with the `xmlResetError` calls.
It does not appear to be used for anything.
ha_innobase::check_if_supported_inplace_alter(): Require ALGORITHM=COPY
when creating a FULLTEXT INDEX on a versioned table.
row_merge_buf_add(), row_merge_read_clustered_index(): Remove the parameter
or local variable history_fts that had been added in the attempt to fix
MDEV-25004.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
After MDEV-31400, plugins are allowed to ask for retries when failing
initialisation. However, such failures also cause plugin system
variables to be deleted (plugin_variables_deinit()) before retrying
and are not re-added during retry.
We fix this by not deleting the variables when the plugin has requested
a retry. Because plugin_deinitialize() also calls
plugin_variables_deinit(), the variables will still be deleted if the
retry fails.
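A hypothetical sketch of the resulting control flow (all names are
simplified stand-ins, not the real sql/sql_plugin.cc interfaces):

  struct plugin_int { bool retry_requested; };

  int  plugin_initialize(plugin_int*);        /* returns 0 on success */
  void plugin_variables_deinit(plugin_int*);

  void initialize_once(plugin_int *plugin)
  {
    /* Keep the system variables when a retry was requested, so the
       retry can reuse them; plugin_deinitialize() still calls
       plugin_variables_deinit(), so a failed retry cleans up. */
    if (plugin_initialize(plugin) && !plugin->retry_requested)
      plugin_variables_deinit(plugin);
  }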
Alternatives considered:
- remove the plugin_variables_deinit() from plugin_initialize() error
handling altogether. We decide to take a more conservative approach
here.
- re-add the system variables during retry. It is more complicated
  than simply iterating over plugin->system_vars and calling
my_hash_insert(). For example we will need to assign values to
the test_load field and extract more code from test_plugin_options(),
if that is possible.
fts_doc_id_cmp(): Replaces several duplicated functions for
comparing two doc_id_t*. On IA-32, AMD64, ARMv7, ARMv8, RISC-V
this should make use of some conditional ALU instructions.
On POWER there will be conditional jumps. Unlike the original
functions, these will return the correct result even if the
difference of the two doc_id does not fit in the int data type.
We use static_assert() and offsetof() to check at compilation time
that this function is compatible with the rbt_create() calls.
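A minimal sketch of the overflow-safe pattern (the exact server code
may differ):

  #include <cstdint>

  typedef uint64_t doc_id_t;

  /* Returning (int)(a - b) truncates and can flip sign when the ids
     are far apart; explicit comparisons never overflow and compile
     to conditional ALU instructions on most targets. */
  static int doc_id_cmp_sketch(const void *p1, const void *p2)
  {
    const doc_id_t a= *static_cast<const doc_id_t*>(p1);
    const doc_id_t b= *static_cast<const doc_id_t*>(p2);
    return a < b ? -1 : a > b;
  }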
fts_query_compare_rank(): As documented, return -1 and not 1
when the ranks are equal and r1->doc_id < r2->doc_id. This will
affect the result of ha_innobase::ft_read().
fts_ptr2_cmp(), fts_ptr1_ptr2_cmp(): These replace
fts_trx_table_cmp(), fts_trx_table_id_cmp().
The fts_savepoint_t::tables will be sorted by dict_table_t*
rather than dict_table_t::id. There was no correctness bug in
the previous comparison predicates. We can avoid one level of
unnecessary pointer dereferencing in this way.
Actually, fts_savepoint_t is duplicating trx_t::mod_tables.
MDEV-33401 was filed about removing it.
The added unit test innodb_rbt-t covers both the previous buggy comparison
predicate and the revised fts_doc_id_cmp(), using keys which led to
finding the bug. Thanks to Shaohua Wang from Alibaba for providing the
example and the revised comparison predicate.
Reviewed by: Thirunarayanan Balathandayuthapani
fts_doc_ids_sort(): Sort an array of doc_id_t by C++11 std::sort().
fts_doc_id_cmp(), ib_vector_sort(): Remove. The comparison was
returning an incorrect result when the difference exceeded the int range.
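A sketch of the replacement's shape (assumed, not the verbatim patch):

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  typedef uint64_t doc_id_t;

  /* std::sort() orders by operator<, so it cannot suffer the int
     truncation of the removed comparison callback. */
  static void doc_ids_sort_sketch(std::vector<doc_id_t> &ids)
  {
    std::sort(ids.begin(), ids.end());
  }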
Reviewed by: Thirunarayanan Balathandayuthapani
MDEV-33308 CHECK TABLE is modifying .frm file even if --read-only
As noted in commit d0ef1aaf61,
MySQL as well as older versions of MariaDB server would during
ALTER TABLE ... IMPORT TABLESPACE write bogus values to the
PAGE_MAX_TRX_ID field to pages of the clustered index, instead of
letting that field remain 0.
In commit 8777458a6e this field
was repurposed for PAGE_ROOT_AUTO_INC in the clustered index root page.
To avoid trouble when upgrading from MySQL or older versions of MariaDB,
we will try to detect and correct bogus values of PAGE_ROOT_AUTO_INC
when opening a table for the first time from the SQL layer.
btr_read_autoinc_with_fallback(): Add the parameters mysql_version, max
to indicate the TABLE_SHARE::mysql_version of the .frm file and the
maximum value allowed for the type of the AUTO_INCREMENT column.
In case the table was originally created in MySQL or an older version of
MariaDB, read also the maximum value of the AUTO_INCREMENT column from
the table and reset the PAGE_ROOT_AUTO_INC if it is above the limit.
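A simplified sketch of the fallback rule (parameter names are
illustrative, not the actual function signature):

  #include <cstdint>

  /* Distrust PAGE_ROOT_AUTO_INC when the .frm originates from MySQL
     or an older MariaDB and the stored value exceeds what the
     AUTO_INCREMENT column type can hold; fall back to the maximum
     value actually present in the column. */
  static uint64_t autoinc_with_fallback_sketch(uint64_t page_root_autoinc,
                                               uint64_t type_max,
                                               uint64_t column_max,
                                               bool     old_origin)
  {
    if (old_origin && page_root_autoinc > type_max)
      return column_max + 1;
    return page_root_autoinc;
  }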
dict_table_t::get_index(const dict_col_t &) const: Find an index that
starts with the specified column.
ha_innobase::check_for_upgrade(): Return HA_ADMIN_FAILED if InnoDB
needs upgrading but is in read-only mode. In this way, the call to
update_frm_version() will be skipped.
row_import_autoinc(): Adjust the AUTO_INCREMENT column at the end of
ALTER TABLE...IMPORT TABLESPACE. This refinement was suggested by
Debarun Banerjee.
The changes outside InnoDB were developed by Michael 'Monty' Widenius:
Added print_check_msg() service for easy reporting of check/repair messages
in ENGINE=Aria and ENGINE=InnoDB.
Fixed CHECK TABLE so that it does not update the .frm file under --read-only.
Added 'handler_flags' to HA_CHECK_OPT as a way for storage engines to
store state from handler::check_for_upgrade().
Reviewed by: Debarun Banerjee
row_discard_tablespace(): Do not invoke dict_index_t::clear_instant_alter()
because that would corrupt any adaptive hash index entries in the table.
row_import_for_mysql(): Invoke dict_index_t::clear_instant_alter()
after detaching any adaptive hash index entries.
lock_table_children(): A new function to lock all child tables of a table.
We will only hold dict_sys.latch while traversing
dict_table_t::referenced_set. To prevent a race condition with
std::set::erase() we will copy the pointers to the child tables to a
local vector. Once we have acquired MDL and references to all child tables,
we can safely release dict_sys.latch, wait for the locks, and finally
release the references.
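A minimal illustration of the "copy under latch, lock outside"
pattern (generic C++ names, not the InnoDB types):

  #include <mutex>
  #include <set>
  #include <vector>

  struct Table;

  std::mutex latch;                 /* stands in for dict_sys.latch */
  std::set<Table*> referenced_set;  /* child tables of some table   */

  /* Hold the latch only while copying the pointers; MDL and table
     locks are then acquired without the latch, avoiding a race with
     std::set::erase(). */
  std::vector<Table*> snapshot_children()
  {
    std::lock_guard<std::mutex> g(latch);
    return std::vector<Table*>(referenced_set.begin(),
                               referenced_set.end());
  }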
dict_acquire_mdl_shared(): A new variant that takes mdl_context as a
parameter.
lock_table_for_trx(): Assert that we are not holding dict_sys.latch.
ha_innobase::truncate(): When foreign_key_checks=ON, assert that
no child tables exist (other than the current table).
In any case, we will invoke lock_table_children()
so that the child table metadata can be safely updated.
(It is possible that a child table is being created concurrently
with TRUNCATE TABLE.)
ha_innobase::delete_table(): Before and after acquiring exclusive
locks on the current table as well as all child tables,
check that FOREIGN KEY constraints will not be violated.
In this way, we can reject impossible DROP TABLE without having to
wait for locks first.
This fixes up commit 2ca1123464 (MDEV-26217)
and commit c3c53926c4 (MDEV-26554).
While the index_lock and block_lock include debug instrumentation
to keep track of shared lock holders, such instrumentation was
never part of the simpler srw_lock, and therefore some users of the
class implemented a limited form of bookkeeping.
srw_lock_debug encapsulates srw_lock and adds the data members
writer, readers_lock, and readers to keep track of the threads that
hold the exclusive latch or any shared latches. The debug checks
are available also with SUX_LOCK_GENERIC (in environments that do not
implement a futex-like system call).
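A rough sketch of the bookkeeping idea in portable C++ (not the
actual srw_lock_debug source):

  #include <atomic>
  #include <mutex>
  #include <set>
  #include <shared_mutex>
  #include <thread>

  class rw_lock_debug_sketch
  {
    std::shared_mutex lock;            /* the underlying srw_lock  */
    std::atomic<std::thread::id> writer{std::thread::id()};
    std::mutex readers_lock;           /* protects 'readers'       */
    std::set<std::thread::id> readers; /* current shared holders   */
  public:
    void wr_lock()
    { lock.lock(); writer= std::this_thread::get_id(); }
    void wr_unlock()
    { writer= std::thread::id(); lock.unlock(); }
    void rd_lock()
    {
      lock.lock_shared();
      std::lock_guard<std::mutex> g(readers_lock);
      readers.insert(std::this_thread::get_id());
    }
    void rd_unlock()
    {
      {
        std::lock_guard<std::mutex> g(readers_lock);
        readers.erase(std::this_thread::get_id());
      }
      lock.unlock_shared();
    }
    bool have_wr() const
    { return writer == std::this_thread::get_id(); }
    bool have_rd()
    {
      std::lock_guard<std::mutex> g(readers_lock);
      return readers.count(std::this_thread::get_id()) != 0;
    }
  };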
dict_sys_t::latch: Use srw_lock_debug in debug builds.
This makes the debug fields latch_ex, latch_readers redundant.
fil_space_t::latch: Use srw_lock_debug in debug builds.
This makes the debug field latch_count redundant.
The field latch_owner must be preserved, because
fil_space_t::is_owner() is being used in all builds.
lock_sys_t::latch: Use srw_lock_debug in debug builds.
This makes the debug fields writer, readers redundant.
lock_sys_t::is_holder(): A new debug predicate to check if
the current thread is holding lock_sys.latch in any mode.
trx_rseg_t::latch: Use srw_lock_debug in debug builds.
If we fail to open a tablespace while looking for FILE_CHECKPOINT, we
set the corruption flag. Specifically, if an encryption key is missing,
we would not be able to open an encrypted tablespace and the flag could
be set. We miss checking for this flag and report "Missing
FILE_CHECKPOINT" instead.
Address a review comment to improve the test: flush pages before
starting the no-checkpoint block. This should reduce the number of
cases where the test is skipped because some intermediate checkpoint
was triggered.
This column (`COUNT_TRANSACTIONS_RETRIES`) is defined as `BIGINT UNSIGNED`
(64-bit unsigned integer) in the user-visible SQL definition:
182ff21ace/storage/perfschema/table_replication_applier_status.cc (L66)
"COUNT_TRANSACTIONS_RETRIES BIGINT unsigned not null comment 'The number of retries that were made because the replication SQL thread failed to apply a transaction.',"
And its value is internally set/updated using the `set_field_ulonglong`
function:
182ff21ace/storage/perfschema/table_replication_applier_status.cc (L231-L233)
case 3: /* total number of times transactions were retried */
set_field_ulonglong(f, m_row.count_transactions_retries);
break;
… but the structure where it is stored allocates only `ulong` for it:
182ff21ace/storage/perfschema/table_replication_applier_status.h (L62)
ulong count_transactions_retries;
As a result of this inconsistency:
1. On any platform where `ulong` is `uint32_t` and `ulonglong` is `uint64_t`,
setting this value would corrupt the 4 bytes of memory *following* the 4
bytes actually allocated for it.
Likely this problem was never noticed because this is the final element
in the structure, and the compiler's trailing padding absorbed the
out-of-bounds write.
2. On any BIG-ENDIAN platform where `ulong` is `uint32_t` and `ulonglong`
is `uint64_t`, reading back the value of this column will result in
total garbage.
Likely this problem was never noticed because MariaDB has not been
tested on 32-bit big-endian platforms.
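A self-contained illustration of the width mismatch (simplified, not
the perfschema code):

  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  int main()
  {
    unsigned char storage[8]= {0};  /* 4-byte field + trailing bytes  */
    uint64_t v= 42;                 /* width of set_field_ulonglong() */
    std::memcpy(storage, &v, sizeof v);   /* oversized 8-byte store   */

    uint32_t field;
    std::memcpy(&field, storage, sizeof field); /* ulong-width read   */
    /* little-endian: field == 42, but the 4 following bytes were
       clobbered; big-endian: field == 0, the value is lost */
    std::printf("field=%u\n", (unsigned) field);
    return 0;
  }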
In order not to affect the user-visible/SQL definition of this column, the
correct way to fix this issue is to change it to `ulonglong` in the
structure definition. See
https://github.com/MariaDB/server/pull/2763/files#r1329110832 for the
original identification and discussion of this issue.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services
mariadb-backup with the --prepare option could result in an empty redo
log file. When --prepare is followed by --prepare --export, we exit
early from the srv_start function without opening the ibdata1
tablespace. Later, while trying to read the rollback segment header
page, we hit a debug assertion which claims that the system tablespace
should already have been opened.
There are two assert cases here.
Issue-1: System tablespace object is not there in fil space hash i.e.
srv_sys_space.open_or_create() is not called.
Issue-2: The system tablespace data file ibdata1 is not opened i.e.
fil_system.sys_space->open() is not called.
Fix: For empty redo log and restore operation, open system tablespace
before returning.
This follows up commit 383f77cd84
which simplified dict_table_schema_check().
Note: We can display quoted names like this:
my_snprintf(buf, sizeof buf, "%`.*s.%`s",
int(t->name.dblen()), t->name.m_name, t->name.basename());
During import, if a cfg file is not specified, we don't update the
autoinc field in the innodb dictionary object dict_table_t. The next
insert tries to insert from the starting position of the auto increment
and fails.
It can be observed that the issue is resolved once the server is
restarted, as the persistent value is read correctly from
PAGE_ROOT_AUTO_INC in the index root page. The patch fixes the issue by
reading the auto increment value directly from PAGE_ROOT_AUTO_INC
during import if a cfg file is not specified.
Test Fix:
1. import_bugs.test: The embedded-mode warning contains an absolute
path; added a regular expression replacement in the test.
2. full_crc32_import.test: Table-level auto increment mismatch after
import. The test was using the auto increment data from the table prior
to discard and import, which is not right. That value has a cached auto
increment higher than the actual inserted value and the value stored in
PAGE_ROOT_AUTO_INC. Updated the result file and added validation for
checking the maximum value of the auto increment column.
Add UTF8_IS_UTF8MB3 to the (session) old_mode in connections made with
the sql service to run init queries.
The connection is new, so the global variable value takes effect rather
than the session value from the caller of spider_db_init.
Reason:
======
undo_space_dblwr test case fails if the first page of the undo
tablespace is not flushed before restarting the server. While
restarting the server, InnoDB fails to detect the first
page of the undo tablespace from the doublewrite buffer.
Fix:
===
Use "ib_log_checkpoint_avoid_hard" debug sync point
to avoid checkpoint and make sure to flush the
dirtied page before killing the server.
innodb_make_page_dirty(): Fails to set the
srv_fil_make_page_dirty_debug variable.
row_search_mvcc(): Revise an overflow check, disabling it on 64-bit
systems. The maximum number of consecutive record reads in a key range
scan should be limited by the maximum number of records per page
(less than 2^13) and the maximum number of pages per tablespace (2^32)
to less than 2^45. On 32-bit systems we can simplify the overflow check.
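The bound can even be checked at compile time; a tiny illustrative
static_assert (not part of the patch):

  #include <cstdint>

  /* fewer than 2^13 records per page times 2^32 pages per tablespace
     stays below 2^45, far within the range of a 64-bit counter */
  static_assert((uint64_t{1} << 13) * (uint64_t{1} << 32)
                == (uint64_t{1} << 45),
                "worst-case read count fits in 64 bits");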
Reviewed by: Vladislav Lesin
There are two array fields in spider_share with similar names:
share->use_sql_dbton_ids that maps from i to the i-th dbton used by
share. Thus it should be used only when i iterates over all distinct
dbtons used by share.
share->sql_dbton_ids that maps from i to the dbton used by the i-th
link of the share. Thus it should be used only when i iterates over
all links of a share.
We correct instances where share->sql_dbton_ids should be used instead
of share->use_sql_dbton_ids.
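A small made-up example of how the two mappings differ:

  #include <cstdio>

  /* Hypothetical share with 3 links whose backends (dbtons) are
     {0, 2, 2}: */
  static const unsigned use_sql_dbton_ids[]= {0, 2};    /* distinct */
  static const unsigned sql_dbton_ids[]=     {0, 2, 2}; /* per link */

  int main()
  {
    for (unsigned i= 0; i < 2; i++)   /* iterate distinct dbtons   */
      std::printf("distinct dbton: %u\n", use_sql_dbton_ids[i]);
    for (unsigned i= 0; i < 3; i++)   /* iterate the share's links */
      std::printf("link %u uses dbton %u\n", i, sql_dbton_ids[i]);
    return 0;
  }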
Most of the problems were in the test suite.
The one thing that was an actual bug was that table_map_id was in some
places defined as ulong and in other places as ulonglong. On 64-bit
Linux this is not a problem, as ulong == ulonglong, but on Windows it
caused failures.
Fixed by ensuring that all instances of table_map_id are ulonglong.
MCOL-5611 added support for Boost 1.80, the version in which
"next_prime" disappears from
https://github.com/boostorg/unordered/blob/boost-1.79.0/include/boost/unordered/detail/implementation.hpp
making 1.80 the currently highest supported version.
Let's check for this version.
While CMake 3.19+ supports version ranges in package determination,
this isn't supported for Boost even in CMake 3.28. So we check for 1.80
and don't compile ColumnStore with anything newer.