Code review fixes.
The general change, proposed by Debarun Banerjee, is bypassing not only
X, but also S-lock to avoid deadlock for cases like:
S1 X2(waiting for S1) S3(witing for X2) <- X1 should bypass both S3 and X2
This commit implements
mysql/mysql-server@7037a0bdc8
functionality, i.e. if some transaction A holds not-gap S-lock on some
record, and some other transactions B={b1, b2, ..., bn} have not-gap
X-locks waiting for the S-lock of transaction A, and transaction A
requests not-gap and not insert intention X-lock which conflicts with
the X-locks of transactions B and does not conflict with another locks
in the queue, then grant the X-lock to transaction A.
MySQL's commit contains the following explanation of why insert-intention
locks must not overtake a waiting ordinary or gap locks:
"It is important that this decission rule doesn't allow
INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`,
`X`, `X|GAP`), as inserting a record into a gap would split such WAITING
lock, violating the invariant that each transaction can have at most
single WAITING lock at any time."
I would add to the explanation the following. Suppose we have trx 1 which
holds ordinary X-lock on some record. And trx 2 executes "DELETE FROM t"
or "SELECT * FOR UPDATE" in RR(see lock_delete_updated.test and
MDEV-27992), i.e. it creates waiting ordinary X-lock on the same record.
And then trx 1 wants to insert some record just before the locked record.
It requests insert-intention lock, and if the lock overtakes trx 2 lock,
there will be phantom records for trx 2 in RR. lock_delete_updated.test
shows how "DELETE" allows to insert some records in already scanned gap
and misses some records to delete.
The current implementation differs from MySQL implementation. There are
two key differences:
1. Lock queue ordering. In MySQL all waiting locks precede all granted
locks. A new waiting lock is added to the head of the queue, a new
granted lock is added to the end of the queue, if some waiting lock
is granted, it's moved to the end of the queue. In MariaDB any new
lock is added to the end of the queue and waiting lock does not change
its position in the queue where the lock is granted. The rule is that
blocking lock must be located before blocked lock in lock queue. We
maintain the rule with inserting bypassing lock just before bypassed
one.
2. MySQL implementation uses some object(locksys::Trx_locks_cache) which
can be passed to consecutive calls to rec_lock_has_to_wait() for the
same trx and heap_no to cache the result of checking if trx has a
granted lock which is blocking the waiting lock(see
locksys::Trx_locks_cache::has_granted_blocker()). The current
implementation does not use such object, because it looks for such
granted lock on the level of lock_rec_other_has_conflicting() and
lock_rec_has_to_wait_in_queue(). I.e. there is no need in additional
lock queue iteration in
locksys::Trx_locks_cache::has_granted_blocker(), as we already iterate
it in lock_rec_other_has_conflicting() and
lock_rec_has_to_wait_in_queue().
During the testing the following case was found. Suppose we have
delete-marked record and going to do inplace insert into
that delete-marked record. Usually we don't create explicit lock if
there are no conlicting with not gap X-lock locks(see
lock_clust_rec_modify_check_and_lock(), btr_cur_update_in_place()). The
implicit lock will be converted to explicit one by demand.
That can happen during INSERT, the not-gap S-lock can
be acquired on searching for duplicates(see
row_ins_duplicate_error_in_clust()), and, if delete-marked record is
found, inplace insert(see btr_cur_upd_rec_in_place()) modifies the
record, what is treated as implicit lock.
But there can be a case when some transaction trx1 holds not-gap S-lock,
another transaction trx2 creates waiting X-lock, and then trx2 tries to
do inplace insert. Before the fix the waiting X-lock of trx2 would be
conflicting lock, and trx1 would try to create explicit X-lock, what
would cause deadlock, and one of the transactions whould be rolled back.
But after the fix, trx2 waiting X-lock is not treated as conflicting
with trx1 X-lock anymore, as trx1 already holds S-lock. If we don't create
explicit lock, then some other transaction trx3 can create it during
implicit to explicit lock conversion and place it at the end of the
queue. So there can be the following locks order in the queue:
S1(granted) X2(waiting) X1(granted)
The above queue is not valid, because all granted trx1 locks must be
placed before waiting trx2 lock. Besides, lock_rec_release_try() can
remove S(granted, trx1) lock and grant X lock to trx 2, and there can be
two granted X-locks on the same record:
X2(granted) X1(granted)
Taking into account that lock_rec_release_try() can release cell and
lock_sys latches leaving some locks unreleased, the queue validation
function can fail in any unexpected place.
It can be fixed with two ways:
1) Place explicit X(granted, trx1) lock before X(waiting, trx2) lock
during implicit to explicit lock conversion. This option is implemented
in MySQL, as granted lock is always placed at the top of locks queue,
and waiting locks are placed at the bottom of the queue. MariaDB does
not do this, and implementing this variant would require conflicting
locks search before converting implicit to explicit lock, what, in
turns, would require cell and/or lock_sys latch acquiring.
2) Create and place X(granted, trx1) lock before X(waiting, trx2) during
inplace INSERT, i.e. when lock_rec_lock() is invoked from
lock_clust_rec_modify_check_and_lock() or
lock_sec_rec_modify_check_and_lock(), if X(waiting, trx2) is
bypassed. Such a way we don't need in additional conflicting locks
search, as they are searched anyway in lock_rec_low().
This fix implements the second variant(see the changes around
c_lock_info.insert_after in lock_rec_lock). I.e. if some record was
delete-marked and we do inplace insert in such a record, and some lock for
bypass was found, create explicit lock to avoid conflicting lock search on
each implicit to explicit lock conversion. We can remove it if MDEV-35624
is implemented.
lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue():
search locks to bypass along with conflicting locks searching in the
same loop. The result is returned in conflicting_lock_info object.
There can be several locks to bypass, only the first one is returned to
limit lock_rec_find_similar_on_page() with the first bypassed lock to
preserve "blocking before blocked" invariant. conflicting_lock_info also
contains a pointer to the lock, after which we can insert bypassing
lock. This lock precedes bypassed one.
Bypassing lock can be next-key lock, and the following cases are
possible:
1. S1(not-gap, granted) II2(granted) X3(waiting for S1),
When new X1(ordinary) lock is acquired, there will be the following
locks queue:
S1(not-gap, granted) II2(granted) X1(ordinary, granted) X3(waiting for
S1)
If we had inserted new X1 lock just after S1, and S1 had been released
on transaction commit or rollback, we would have the following
sequence in the locks queue:
X1(ordinary, granted) II2(granted) X3(waiting for X1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is not a real issue as II lock once granted can be
ignored but it could possibly hit some assert(taking into account
that lock_release_try() can release lock_sys latch, and other threads
can acquire the latch and validate lock queue) as it breaks our design
constraint that any granted lock in the queue should not conflict
with locks ahead in the queue. But lock_rec_queue_validate() does not
check the above constraint. We place new bypassing lock just before
bypassed one, but there still can be the case when lock bitmap is used
instead of creating new lock object(see lock_rec_add_to_queue() and
lock_rec_find_similar_on_page()), and the lock, which owns the
bitmap, can precede II2(granted). We can either disable
lock_rec_find_similar_on_page() space optimization for bypassing locks
or treat "X1(ordinary, granted) II2(granted)" sequence as valid. As
we don't currently have the function which would fail on the above
sequence, let treat it as valid for the case, when lock_release()
execution is in process.
2. S1(ordinary, granted) II2(waiting for S1) X3(waiting for S1)
When new X1(ordinary) lock is acquired, there will be the following
locks queue:
S1(ordinary, granted) II2(waiting for S1) X1(ordinary, granted)
X3(waiting for S1).
After S1 releasing there will be:
II2(granted) X1(ordinary, granted) X3(waiting for X1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The above queue is valid because ordinary lock does not conflict with
II-lock(see lock_rec_has_to_wait()).
lock_rec_create_low(): insert new lock to the position which
lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue()
return if the lock is bypassing.
lock_rec_find_similar_on_page(): add ability to limit similiar lock search
with the certain lock to preserve "blocking before blocked" invariant for
all bypassed locks.
lock_rec_add_to_queue(): don't treat bypassed locks as waiting ones to
let lock bitmap reusing for bypassing locks.
lock_rec_lock(): fix inplace insert case, explained above.
lock_rec_dequeue_from_page(), lock_rec_rebuild_waiting_queue: move
bypassing lock to the correct place to preserve "blocking before blocked"
invariant.
Reviewed by: Debarun Banerjee, Marko Mäkelä.
Let us make some member functions of lock_sys_t non-static
to avoid some shuffling of function parameter registers.
lock_cancel_waiting_and_release(): Declare static, because there
are no external callers.
Reviewed by: Debarun Banerjee
trx_t::autoinc_locks: Use small_vector<lock_t*,4> in order to avoid any
dynamic memory allocation in the most common case (a statement is
holding AUTO_INCREMENT locks on at most 4 tables or partitions).
lock_cancel_waiting_and_release(): Instead of removing elements from
the middle, simply assign nullptr, like lock_table_remove_autoinc_lock().
The added test innodb.auto_increment_lock_mode covers the dynamic memory
allocation as well as nondeterministically (occasionally) covers
the out-of-order lock release in lock_table_remove_autoinc_lock().
Reviewed by: Debarun Banerjee
MY_RELAX_CPU(): On GCC and compatible compilers (including clang and
its derivatives), let us use a null inline assembler block as the
fallback. This should benefit s390x and LoongArch, for example.
Also, let us remove the generic fallback block that does exactly the
opposite of what this function aims to achieve: avoid hogging the
memory bus so that other threads will have a chance to let our spin
loop to proceed.
On RISC-V, we will use __builtin_riscv_pause() which is a valid
instruction encoding in all ISA versions according to
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562936.html
-DCMAKE_BUILD_TYPE=xxx sets some C compiler flags according to the build type.
-DBUILD_CONFIG was completely overwriting them in some compiler / arch
combinations and not in others. Make it consistently "append-only", not
overwrite.
Also, enforce the same set of flags for Release and RelWithDebInfo.
This reverts ff1f611a0d as it is no longer
necessary.
Avoid assert()
By default, CMAKE_BUILD_TYPE RelWithDebInfo or Release implies
-DNDEBUG, which disables the assert() macro. MariaDB is deviating
from that. Let us be explicit to use assert() only in debug builds.
By default, CMAKE_BUILD_TYPE RelWithDebInfo or Release implies -DNDEBUG,
which disables the assert() macro. MariaDB is deviating from that.
Let us be explicit to use assert() only in debug builds.
This fixes up 1b8358d943
The macros ut_ad() and DBUG_ASSERT() can evaluate their argument twice.
That is wrong for any read-modify-write arguments.
Thanks to Nikita Malyavin for pointing this out.
when testing MDEV-34539 create a table specifically for the test,
don't use a system table as a shortcut to save a couple of lines.
followup for 8d813f080b
use the same condition in
fill_schema_table_from_frm() when open_table_from_share() fails, as in
fill_schema_table_from_frm() when tdc_aquire_share() fails and as in
fill_schema_table_from_open() when open_table_from_share() fails
get_all_tables() skipped tables if the user has no privileges on
the schema itself and no granted privilege on any tables in the schema.
that is, it was skipping performance_schema tables (privileges
on them aren't explicitly granted, but internally hard-coded)
To fix:
* extend ACL_internal_table_access::check() method with
`bool any_combination_will_do`
* fix all perfschema privilege checks to take it into account.
* don't reuse table_acl_check object for all tables, initialize it
for every table otherwise GRANT_INTERNAL_INFO will leak
* remove incorrect privilege check from get_all_tables()
cannot have an assert in Warning_info::push_warning()
because SQL command SIGNAL can set an absolutely arbitrary
message, even an empty one or ending with '\n'
move the assert into push_warning() and my_message_sql().
followup for 9508a44c37
Most InnoDB functions do not throw any exceptions, not even indirectly
std::bad_alloc, which could be thrown by a C++ memory allocation function.
Let us annotate many functions with noexcept in order to reduce the code
footprint related to exception handling.
Reviewed by: Thirunarayanan Balathandayuthapani
The issue is caused by a logic error in Item_sum::get_tmp_table_item() method:
it resets arguments of the item to point to the result fields during
change_ref_to_tmp_fields() call. However, Item_sum arguments must not be modified.
It is enough for Item_sum objects to call ancestor's implementation
Item::get_tmp_table_item().
This fix is in accordance with MySQL commit 2e3dc09087c24798c90e05163ed3d931f6b93db3
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
The LOCK_global_system_variables must not be held when taking mutexes
such as LOCK_commit_ordered and LOCK_log, as this causes inconsistent
mutex locking order that can theoretically cause the server to
deadlock.
To avoid this, temporarily release LOCK_global_system_variables in two
system variable update functions, like it is done in many other
places.
Enforce the correct locking order at server startup, to more easily
catch (in debug builds) any remaining wrong orders that may be hidden
elsewhere in the code.
Note that when this is merged to 11.4, similar unlock/lock of
LOCK_global_system_variables must be added in update_binlog_space_limit()
as is done in binlog_checksum_update() and fix_max_binlog_size(), as this
is a new function added in 11.4 that also needs the same fix. Tests will
fail with wrong mutex order until this is done.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
ha_innobase::delete_table(): Clear trx->dict_operation_lock_mode
after, not before invoking trx->rollback(), so that
row_undo_mod_parse_undo_rec() will be invoked with dict_locked=true
and dict_sys_t::freeze() will not be invoked for loading a table
definition. Inside dict_sys_t::freeze(), an assertion !have_any()
would fail when the current thread is already holding the latch.
This fixes up commit c5fd9aa562 (MDEV-25919).
Reviewed by: Debarun Banerjee
during FLUSH PRIVILEGES, allow_all_hosts temporarily goes out of sync
with acl_check_hosts and acl_wild_hosts.
As it's tested in acl_check_host() without a mutex, let's re-test it
under a mutex to make sure the value is correct.
Note that it's just an optimization and it's ok to see outdated
allow_all_hosts value here.
* replace the message away in the test result
* remove "feedback plugin:" prefix, it's a server message, not plugin's
* downgrade to the warning, because
1) it's not a failure, no operation was aborted, server still works
2) it's something actionable, so not a [Note] either
- InnoDB fails to recover the full crc32 encrypted page from
doublewrite buffer. The reason is that buf_dblwr_t::recover()
fails to identify the space id from the page because the page has
been encrypted from FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION bytes.
Fix:
===
buf_dblwr_t::recover(): preserve any pages whose space_id
does not match a known tablespace. These could be encrypted pages
of tablespaces that had been created with
innodb_checksum_algorithm=full_crc32.
buf_page_t::read_complete(): If the page looks corrupted and the
tablespace is encrypted and in full_crc32 format, try to
restore the page from doublewrite buffer.
recv_dblwr_t::recover_encrypted_page(): Find the page which
has the same page number and try to decrypt the page using
space->crypt_data. After decryption, compare the space id.
Write the recovered page back to the file.
Mariabackup (mariadb-backup) supports the --use-memory option that
sets the buffer pool size for innodb. However, current SST scripts
do not use this option. This commit adds support for this option,
the value for which can be specified via the "use_memory" parameter
in the configuration file in the [sst], [mariabackup] or [xtrabackup]
sections (supported only for compatibility with old configurations).
In addition, if the innodb_buffer_pool_size option is specified in
the user configuration (in the main server configuration sections)
or passed to the SST scripts or the server via arguments, its value
is also passed to mariadb-backup as the value for the --use-memory
option.
A new section name [mariabackup] has also been added, which can
be used instead of the deprecated [xtrabackup] (the section name
"mariabackup" was specified in the documentation, but was not
actually supported by SST scripts before this commit).
CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'
An sql thread can reach the "Slave has read all relay log" state
and then start reading relay log again. Let's use a more generic
pattern to retrieve the sql thread ID even if it's not
in the "read all relay log" state.