Commit graph

698 commits

Author SHA1 Message Date
Marko Mäkelä
8d16da1487 MDEV-24789: Reduce lock_sys mutex contention further
lock_sys_t::deadlock_check(): Assume that only lock_sys.wait_mutex
is being held by the caller.

lock_sys_t::rd_lock_try(): New function.

lock_sys_t::cancel(trx_t*): Kill an active transaction that may be
holding a lock.

lock_sys_t::cancel(trx_t*, lock_t*): Cancel a waiting lock request.

lock_trx_handle_wait(): Avoid acquiring mutexes in some cases,
and never acquire lock_sys.latch in exclusive mode.
This function is only invoked in a semi-consistent read
(locking a clustered index record only if it matches the search condition).
Normally, lock_wait() will take care of lock waits.

lock_wait(): Invoke the new function lock_sys_t::cancel() at the end,
to avoid acquiring exclusive lock_sys.latch.

lock_rec_other_trx_holds_expl(): Use LockGuard instead of LockMutexGuard.

lock_release_autoinc_locks(): Explicitly acquire table->lock_mutex,
in case only a shared lock_sys.latch is being held. Deadlock::report()
will still hold exclusive lock_sys.latch while invoking
lock_cancel_waiting_and_release().

lock_cancel_waiting_and_release(): Acquire trx->mutex in this function,
instead of expecting the caller to do so.

lock_unlock_table_autoinc(): Only acquire shared lock_sys.latch.

lock_table_has_locks(): Do not acquire lock_sys.latch at all.

Deadlock::check_and_resolve(): Only acquire shared lock_sys.latch
for invoking lock_sys_t::cancel(trx, wait_lock).

innobase_query_caching_table_check_low(),
row_drop_tables_for_mysql_in_background(): Do not acquire lock_sys.latch.
2021-03-02 14:26:33 +02:00
Marko Mäkelä
01b44c054d MDEV-25026 Various code paths are accessing freed pages
The test case encryption.innodb_encrypt_freed was failing in
MemorySanitizer builds.

recv_recover_page(): Mark non-recovered pages as freed.

fil_crypt_rotate_page(): Before comparing the block->frame contents,
check if the block was marked as freed.

Other places: Whenever using BUF_GET_POSSIBLY_FREED, check the
block->page.status before accessing the page frame.

(Both uses of BUF_GET_IF_IN_POOL should be correct now.)
2021-03-02 11:51:22 +02:00
Marko Mäkelä
7cf4419fc4 MDEV-24789: Reduce lock_sys.wait_mutex contention
A performance regression was introduced by
commit e71e613353 (MDEV-24671)
and mostly addressed by
commit 455514c800.

The regression is likely caused by increased contention on
lock_sys.latch (formerly lock_sys.mutex), possibly indirectly
caused by contention on lock_sys.wait_mutex. This change aims to
reduce both, but further improvements will be needed.

lock_wait(): Minimize the lock_sys.wait_mutex hold time.

lock_sys_t::deadlock_check(): Add a parameter for indicating
whether lock_sys.latch is exclusively locked.

trx_t::was_chosen_as_deadlock_victim: Always use atomics.

lock_wait_wsrep(): Assume that no mutex is being held.

Deadlock::report(): Always kill the victim transaction.

lock_sys_t::timeout: New counter to back MONITOR_TIMEOUT.
2021-02-26 14:58:48 +02:00
Marko Mäkelä
21987e5919 MDEV-20612 fixup: Reduce hash table lookups
Let us calculate the hash table cell address while we are calculating
the latch address, to avoid repeated computations of the address.
The latch address can be derived from the cell address with a simple
bitmask operation.
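
As a rough illustration of the bitmask idea, the following C++ sketch assumes a hypothetical layout in which slots are grouped into 64-byte aligned lines, with the first slot of each line acting as the latch for the cells behind it. The names Slot, Line, cell_for() and latch_for(), as well as the group size, are assumptions made for this example and are not taken from the InnoDB source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: slots are grouped into 64-byte lines. Slot 0 of
// each line stands in for the latch; the remaining slots are hash cells.
constexpr std::size_t LINE_BYTES = 64;

struct Slot { void* ptr; };

constexpr std::size_t SLOTS_PER_LINE = LINE_BYTES / sizeof(Slot);
constexpr std::size_t CELLS_PER_LINE = SLOTS_PER_LINE - 1;

struct alignas(LINE_BYTES) Line { Slot slot[SLOTS_PER_LINE]; };
static_assert(sizeof(Line) == LINE_BYTES, "one line per aligned group");

// Map a logical cell index to its slot, skipping the latch slots.
inline Slot* cell_for(Line* lines, std::size_t cell_index)
{
  Line& line = lines[cell_index / CELLS_PER_LINE];
  return &line.slot[1 + cell_index % CELLS_PER_LINE];
}

// Derive the latch slot from the cell address with a single bitmask
// operation: clear the low bits to reach the start of the aligned line.
inline Slot* latch_for(Slot* cell)
{
  assert(cell != nullptr);
  const auto addr = reinterpret_cast<std::uintptr_t>(cell);
  return reinterpret_cast<Slot*>(addr & ~std::uintptr_t{LINE_BYTES - 1});
}
```

Once cell_for() has computed the cell address, latch_for() needs no second hash computation or division, which is the saving the commit message describes.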
2021-02-24 14:47:42 +02:00
Marko Mäkelä
43b239a081 MDEV-24915 Galera conflict resolution is unnecessarily complex
The fix of MDEV-23328 introduced a background thread for
killing conflicting transactions.
Thanks to the refactoring that was conducted in MDEV-24671,
the high-priority ("brute-force") applier thread can kill the
conflicting transactions itself, before waiting for the
locks to be finally released (after the conflicting transactions
have been rolled back).

This also allows us to remove the hack LockGGuard that had to
be added in MDEV-20612, and remove Galera-related function
parameters from lock creation.
2021-02-18 12:16:51 +02:00
Marko Mäkelä
18dc5b0192 MDEV-20612 fixup: Remove a redundant check
lock_wait_rpl_report(): Only reload trx->lock.wait_lock
if lock_sys.wait_mutex had to be released and reacquired.
2021-02-18 12:02:36 +02:00
Marko Mäkelä
94b4578704 Merge 10.5 into 10.6 2021-02-17 19:39:05 +02:00
Marko Mäkelä
9f13670004 MDEV-24738 fixup: heap-use-after-poison in lock_sys_t::deadlock_check()
Deadlock::report(): Require the caller to acquire lock_sys.latch
if invoking on a transaction that is not owned by the current thread.
2021-02-17 17:44:23 +02:00
Marko Mäkelä
c68007d958 MDEV-24738 Improve the InnoDB deadlock checker
A new configuration parameter innodb_deadlock_report is introduced:
* innodb_deadlock_report=off: Do not report any details of deadlocks.
* innodb_deadlock_report=basic: Report transactions and waiting locks.
* innodb_deadlock_report=full (default): Report also the blocking locks.

The improved deadlock checker will consider all involved transactions
in one loop, even if the deadlock loop includes several transactions.
The theoretical maximum number of transactions that can be involved in
a deadlock is `innodb_page_size` * 8, limited by the persistent data
structures.

Note: Similar to
mysql/mysql-server@3859219875
our deadlock checker will consider at most one blocking transaction
for each waiting transaction. The new field trx->lock.wait_trx will be
nullptr if and only if trx->lock.wait_lock is nullptr. Note that
trx->lock.wait_lock->trx == trx (the waiting transaction), while
trx->lock.wait_trx points to one of the transactions whose lock is
conflicting with trx->lock.wait_lock.

Considering only one blocking transaction will greatly simplify
our deadlock checker, but it may also make the deadlock checker
blind to some deadlocks where the deadlock cycle is 'hidden' by
the fact that the registered trx->lock.wait_trx is not actually
waiting for any InnoDB lock, but something else. So, instead of
deadlocks, sometimes lock wait timeout may be reported.

To improve on this, whenever trx->lock.wait_trx is changed, we
will register further 'candidate' transactions in Deadlock::to_check(),
and check for 'revealed' deadlocks as soon as possible, in lock_release()
and innobase_kill_query().

The old DeadlockChecker was holding lock_sys.latch, even though using
lock_sys.wait_mutex should be less contended (and thus preferred)
in the likely case that no deadlock is present.

lock_wait(): Defer the deadlock check to this function, instead of
executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().

DeadlockChecker: Complete rewrite:
(1) Explicitly keep track of transactions that are being waited for,
in trx->lock.wait_trx, protected by lock_sys.wait_mutex. Previously,
we were painstakingly traversing the lock heaps while blocking
concurrent registration or removal of any locks (even uncontended ones).
(2) Use Brent's cycle-detection algorithm for deadlock detection,
traversing each trx->lock.wait_trx edge at most 2 times.
(3) If a deadlock is detected, release lock_sys.wait_mutex,
acquire LockMutexGuard, re-acquire lock_sys.wait_mutex and re-invoke
find_cycle() to find out whether the deadlock is still present.
(4) Display information on all transactions that are involved in the
deadlock, and choose a victim to be rolled back.

lock_sys.deadlocks: Replaces lock_deadlock_found. Protected by wait_mutex.

Deadlock::find_cycle(): Quickly find a cycle of trx->lock.wait_trx...
using Brent's cycle detection algorithm.
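
The cycle search can be sketched in a few lines of C++. This is a minimal, single-threaded model of Brent's algorithm over one outgoing edge per transaction; the Trx struct and this find_cycle() are hypothetical stand-ins, and the real code additionally has to hold lock_sys.wait_mutex while following the edges.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for trx_t: each waiting transaction records
// at most one blocking transaction.
struct Trx {
  Trx* wait_trx = nullptr;
};

// Brent's cycle detection over the wait_trx edges, starting at `start`.
// Returns some transaction that lies on a cycle, or nullptr if the
// chain of waits ends (no deadlock reachable from `start`).
inline Trx* find_cycle(Trx* start)
{
  assert(start != nullptr);
  Trx* tortoise = start;
  std::size_t power = 1, steps = 0;
  for (Trx* hare = start->wait_trx; hare; hare = hare->wait_trx) {
    if (hare == tortoise)
      return hare;           // the hare caught up: a cycle exists
    if (++steps == power) {  // teleport the tortoise, double the window
      tortoise = hare;
      power *= 2;
      steps = 0;
    }
  }
  return nullptr;
}
```

Because every node has at most one outgoing edge, the walk either reaches a transaction that is not waiting or runs into a cycle. The starting transaction need not be part of that cycle.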

Deadlock::report(): Report a deadlock cycle that was found by
Deadlock::find_cycle(), and choose a victim with the least weight.
Altogether, we may traverse each trx->lock.wait_trx edge up to 5
times (2*find_cycle()+1 time for reporting and choosing the victim).

Deadlock::check_and_resolve(): Find and resolve a deadlock.

lock_wait_rpl_report(): Report the waits-for information to
replication. This used to be executed as part of DeadlockChecker.
Replication must know the waits-for relations even if no deadlocks
are present in InnoDB.

Reviewed by: Vladislav Vaintroub
2021-02-17 12:44:08 +02:00
Marko Mäkelä
584e52118c MDEV-20612 fixup: Make comments refer to lock_sys.latch 2021-02-17 12:18:03 +02:00
Marko Mäkelä
e5d83ad472 MDEV-20612 fixup: Fix a memory leak in buffer pool resize 2021-02-16 11:27:13 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. The 10.4 version of the fix is coming up separately.
2021-02-12 17:44:22 +01:00
Marko Mäkelä
26d6224dd6 MDEV-20612: Enable concurrent lock_release()
lock_release_try(): Try to release locks while only holding
shared lock_sys.latch.

lock_release(): If 5 attempts of lock_release_try() fail,
proceed to acquire exclusive lock_sys.latch.
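
The retry-then-fallback pattern might look roughly like the following C++ sketch. The function name release_with_fallback() and its callback parameters are hypothetical; what makes a single attempt fail inside InnoDB is not stated in the commit message, so the attempt is modelled here as an arbitrary callable that reports success or failure.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Try the cheap path a bounded number of times while holding the global
// latch only in shared mode; if every attempt fails, take the latch in
// exclusive mode and do the work unconditionally.
// Returns the number of the successful attempt, or 0 if the exclusive
// fallback was needed.
template <typename TryFn, typename SlowFn>
int release_with_fallback(std::shared_mutex& latch,
                          TryFn try_release, SlowFn release_all)
{
  constexpr int max_attempts = 5;
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    std::shared_lock<std::shared_mutex> shared(latch);
    if (try_release())
      return attempt;
  }
  std::unique_lock<std::shared_mutex> exclusive(latch);
  release_all();
  return 0;
}
```

Bounding the number of optimistic attempts keeps the common case concurrent while still guaranteeing progress under heavy contention.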
2021-02-12 17:44:58 +02:00
Marko Mäkelä
b08448de64 MDEV-20612: Partition lock_sys.latch
We replace the old lock_sys.mutex (which was renamed to lock_sys.latch)
with a combination of a global lock_sys.latch and table or page hash lock
mutexes.

The global lock_sys.latch can be acquired in exclusive mode, or
it can be acquired in shared mode and another mutex will be acquired
to protect the locks for a particular page or a table.
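
A minimal C++ sketch of this two-level scheme follows, assuming a fixed number of shards selected by a simple modulo. The class ShardedLocks, its members, and the shard count are hypothetical and only model the locking discipline, not the InnoDB data structures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>

class ShardedLocks {
  static constexpr std::size_t N_SHARDS = 16;     // arbitrary for the sketch
  std::shared_mutex latch_;                       // global latch
  std::array<std::mutex, N_SHARDS> shard_mutex_;  // one mutex per shard
  std::array<int, N_SHARDS> n_locks_{};           // data guarded per shard

public:
  // Fast path: shared global latch plus the mutex of a single shard,
  // so operations on different shards can run concurrently.
  void add_lock(std::size_t page_id)
  {
    std::shared_lock<std::shared_mutex> global(latch_);
    const std::size_t s = page_id % N_SHARDS;
    std::lock_guard<std::mutex> shard(shard_mutex_[s]);
    ++n_locks_[s];
  }

  // Slow path: the exclusive global latch excludes all shard users,
  // so every shard may be read without taking its mutex.
  int total_locks()
  {
    std::unique_lock<std::shared_mutex> global(latch_);
    int total = 0;
    for (int n : n_locks_)
      total += n;
    return total;
  }
};
```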

This is inspired by
mysql/mysql-server@1d259b87a6
but the optimization of lock_release() will be done in the next commit.
Also, we will interleave mutexes with the hash table elements, similar
to how buf_pool.page_hash was optimized
in commit 5155a300fa (MDEV-22871).

dict_table_t::autoinc_trx: Use Atomic_relaxed.

dict_table_t::autoinc_mutex: Use srw_mutex in order to reduce the
memory footprint. On 64-bit Linux or OpenBSD, both this and the new
dict_table_t::lock_mutex should be 32 bits and be stored in the same
64-bit word. On Microsoft Windows, the underlying SRWLOCK is 32 or 64
bits, and on other systems, sizeof(pthread_mutex_t) can be much larger.

ib_lock_t::trx_locks, trx_lock_t::trx_locks: Document the new rules.
Writers must assert lock_sys.is_writer() || trx->mutex_is_owner().

LockGuard: A RAII wrapper for acquiring a page hash table lock.

LockGGuard: Like LockGuard, but when Galera Write-Set Replication
is enabled, we must acquire all shards, for updating arbitrary trx_locks.

LockMultiGuard: A RAII wrapper for acquiring two page hash table locks.

lock_rec_create_wsrep(), lock_table_create_wsrep(): Special
Galera conflict resolution in non-inlined functions in order
to keep the common code paths shorter.

lock_sys_t::prdt_page_free_from_discard(): Refactored from
lock_prdt_page_free_from_discard() and
lock_rec_free_all_from_discard_page().

trx_t::commit_tables(): Replaces trx_update_mod_tables_timestamp().

lock_release(): Let trx_t::commit_tables() invalidate the query cache
for those tables that were actually modified by the transaction.
Merge lock_check_dict_lock() to lock_release().

We must never release lock_sys.latch while holding any
lock_sys_t::hash_latch. Violating this rule could lead to
memory corruption if the buffer pool is resized between
the time lock_sys.latch is released and the hash_latch is released.
2021-02-12 17:44:32 +02:00
Marko Mäkelä
b01d8e1a33 MDEV-20612: Replace lock_sys.mutex with lock_sys.latch
For now, we will acquire the lock_sys.latch only in exclusive mode,
that is, use it as a mutex.

This is preparation for the next commit where we will introduce
a less intrusive alternative, combining a shared lock_sys.latch
with dict_table_t::lock_mutex or a mutex embedded in
lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash.
2021-02-11 14:52:10 +02:00
Marko Mäkelä
903464929c MDEV-20612 preparation: LockMutexGuard
Let us use the RAII wrapper LockMutexGuard for most operations where
lock_sys.mutex is acquired.
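
For readers unfamiliar with the idiom, here is a small C++ sketch of such a RAII guard. LockSys, MutexGuard, and the `held` flag are hypothetical stand-ins built on std::mutex; the actual LockMutexGuard is defined in the InnoDB source and may differ.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for lock_sys. The `held` flag exists only so
// that the example can observe whether the guard is active.
struct LockSys {
  std::mutex mutex;
  bool held = false;   // written only while `mutex` is owned
};

// RAII wrapper: the constructor acquires the mutex and the destructor
// releases it, on every exit path of the enclosing scope.
class MutexGuard {
  LockSys& sys_;
public:
  explicit MutexGuard(LockSys& sys) : sys_(sys)
  {
    sys_.mutex.lock();
    sys_.held = true;
  }
  ~MutexGuard()
  {
    sys_.held = false;
    sys_.mutex.unlock();
  }
  MutexGuard(const MutexGuard&) = delete;
  MutexGuard& operator=(const MutexGuard&) = delete;
};

// An early return needs no explicit unlock; the guard takes care of it.
inline bool guarded_operation(LockSys& sys, bool bail_out)
{
  MutexGuard guard(sys);
  if (bail_out)
    return false;
  return sys.held;     // true: the mutex is owned at this point
}
```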
2021-02-11 14:36:11 +02:00
Marko Mäkelä
2e64513fba MDEV-20612 preparation: Fewer calls to buf_page_t::id() 2021-02-11 12:48:07 +02:00
Marko Mäkelä
74ab97f58f Cleanup: Remove lock_trx_lock_list_init(), lock_table_get_n_locks() 2021-02-07 11:18:21 +02:00
Marko Mäkelä
487fbc2e15 MDEV-21452 fixup: Introduce trx_t::mutex_is_owner()
When we replaced trx_t::mutex with srw_mutex
in commit 38fd7b7d91
we lost the SAFE_MUTEX instrumentation.
Let us introduce a replacement and restore the assertions.
2021-02-05 16:37:06 +02:00
Marko Mäkelä
455514c800 MDEV-24789: Try to reduce mutex contention 2021-02-05 16:16:44 +02:00
Marko Mäkelä
3e45f8e36a MDEV-24789: Reduce sizeof(trx_lock_t)
trx_lock_t::cond: Use pthread_cond_t directly, because no instrumentation
will ever be used. This saves sizeof(void*) and removes some duplicated
inline code.

trx_lock_t::was_chosen_as_wsrep_victim: Fold into
trx_lock_t::was_chosen_as_deadlock_victim.

trx_lock_t::cancel, trx_lock_t::rec_cached, trx_lock_t::table_cached:
Use only one byte of storage, reducing memory alignment waste.

On AMD64 GNU/Linux, MDEV-24671 caused a sizeof(trx_lock_t) increase
of 48 bytes (plus the PLUGIN_PERFSCHEMA overhead of trx_lock_t::cond).
These changes should save 32 bytes.
2021-02-05 13:15:56 +02:00
Marko Mäkelä
465bdabb7a Cleanup: Reduce some lock_sys.mutex contention
lock_table(): Remove the constant parameter flags=0.

lock_table_resurrect(): Merge lock_table_ix_resurrect() and
lock_table_x_resurrect().

lock_rec_lock(): Only acquire LockMutexGuard if lock_table_has()
does not hold.
2021-02-05 13:14:50 +02:00
Marko Mäkelä
de407e7cb4 MDEV-24731 fixup: bogus assertion
DeadlockChecker::search(): Move a bogus assertion into a condition.
If the current transaction is waiting for a table lock (on something
else than an auto-increment lock), it is well possible that other
transactions are holding not only a conflicting lock, but also an
auto-increment lock.

This mistake was noticed during the testing of MDEV-24731, but it was
accidentally introduced in commit 5f46385764.

lock_wait_end(): Remove an unused variable, and add an assertion.
2021-02-05 08:35:15 +02:00
Marko Mäkelä
5f46385764 MDEV-24731 Excessive mutex contention in DeadlockChecker::check_and_resolve()
The DeadlockChecker expects to be able to freeze the waits-for graph.
Hence, it is best executed somewhere where we are not holding any
additional mutexes.

lock_wait(): Defer the deadlock check to this function, instead
of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().

DeadlockChecker::trx_rollback(): Merge with the only caller,
check_and_resolve().

LockMutexGuard: RAII accessor for lock_sys.mutex.

lock_sys.deadlocks: Replaces lock_deadlock_found.

trx_t: Clean up some comments.
2021-02-04 16:38:07 +02:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Jan Lindström
75546dfbb1 MDEV-24704 : Galera test failure on galera.galera_nopk_unicode
Analysis:
=========

Reason for test failure was a mutex deadlock between DeadlockChecker with stack

Thread 6 (Thread 0xffff70066070 (LWP 24667)):
0  0x0000ffff784e850c in __lll_lock_wait (futex=futex@entry=0xffff04002258, private=0) at lowlevellock.c:46
1  0x0000ffff784e19f0 in __GI___pthread_mutex_lock (mutex=mutex@entry=0xffff04002258) at pthread_mutex_lock.c:135
2  0x0000aaaaac8cd014 in inline_mysql_mutex_lock (src_file=0xaaaaacea0f28 "/home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc", src_line=762, that=0xffff04002258) at /home/buildbot/buildbot/build/mariadb-10.2.37/include/mysql/psi/mysql_thread.h:675
3  wsrep_thd_is_BF (thd=0xffff040009a8, sync=sync@entry=1 '\001') at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:762
4  0x0000aaaaacadce68 in lock_rec_has_to_wait (for_locking=false, lock_is_on_supremum=<optimized out>, lock2=0xffff628952d0, type_mode=291, trx=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:826
5  lock_has_to_wait (lock1=<optimized out>, lock2=0xffff628952d0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:873
6  0x0000aaaaacadd0b0 in DeadlockChecker::search (this=this@entry=0xffff70061fe8) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7142
7  0x0000aaaaacae2dd8 in DeadlockChecker::check_and_resolve (lock=lock@entry=0xffff62894120, trx=trx@entry=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7286
8  0x0000aaaaacae3070 in lock_rec_enqueue_waiting (c_lock=0xffff628952d0, type_mode=type_mode@entry=3, block=block@entry=0xffff62076c40, heap_no=heap_no@entry=2, index=index@entry=0xffff4c076f28, thr=thr@entry=0xffff4c078810, prdt=prdt@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:1796
9  0x0000aaaaacae3900 in lock_rec_lock_slow (thr=0xffff4c078810, index=0xffff4c076f28, heap_no=2, block=0xffff62076c40, mode=3, impl=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2106
10 lock_rec_lock (impl=false, mode=3, block=0xffff62076c40, heap_no=2, index=0xffff4c076f28, thr=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2168
11 0x0000aaaaacae3ee8 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0xffff62076c40, rec=rec@entry=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", index=index@entry=0xffff4c076f28, offsets=0xffff4c080690, offsets@entry=0xffff70062a30, mode=LOCK_X, mode@entry=1653162096, gap_mode=0, gap_mode@entry=281470749427104, thr=thr@entry=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:6082
12 0x0000aaaaacb684c4 in sel_set_rec_lock (pcur=0xaaaac841c270, pcur@entry=0xffff4c077d58, rec=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", rec@entry=0x28 <error: Cannot access memory at address 0x28>, index=index@entry=0xffff4c076f28, offsets=0xffff70062a30, mode=281472334905456, type=281470749427104, thr=0xffff4c078810, thr@entry=0x9f, mtr=0x0, mtr@entry=0xffff70063928) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:1270
13 0x0000aaaaacb6bb64 in row_search_mvcc (buf=buf@entry=0xffff4c080690 "\376\026", mode=mode@entry=PAGE_CUR_GE, prebuilt=0xffff4c077b98, match_mode=match_mode@entry=1, direction=direction@entry=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:5181
14 0x0000aaaaacaae568 in ha_innobase::index_read (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key_ptr=<optimized out>, key_len=768, find_flag=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:9393
15 0x0000aaaaac9201cc in handler::ha_index_read_map (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key=0xffff4c07ccf8 "", keypart_map=keypart_map@entry=18446744073709551615, find_flag=find_flag@entry=HA_READ_KEY_EXACT) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/handler.cc:2718
16 0x0000aaaaac9f36b0 in Rows_log_event::find_row (this=this@entry=0xffff4c030098, rgi=rgi@entry=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13461
17 0x0000aaaaac9f3e44 in Update_rows_log_event::do_exec_row (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13936
18 0x0000aaaaac9e7ee8 in Rows_log_event::do_apply_event (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:11101
19 0x0000aaaaac8ca4e8 in Log_event::apply_event (rgi=0xffff4c01b510, this=0xffff4c030098) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.h:1454
20 wsrep_apply_events (buf_len=0, events_buf=0x1, thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:164
21 wsrep_apply_cb (ctx=0xffff4c0009a8, buf=0x1, buf_len=18446743528248705000, flags=<optimized out>, meta=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:267
22 0x0000ffff7322d29c in galera::TrxHandle::apply (this=this@entry=0xffff4c027960, recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=apply_cb@entry=0xaaaaac8c9fe8 <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, meta=...) at /home/buildbot/buildbot/build/galera/src/trx_handle.cpp:317
23 0x0000ffff73239664 in apply_trx_ws (recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=0xaaaaac8c9fe8 <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, commit_cb=0xaaaaac8ca8d0 <wsrep_commit_cb(void*, uint32_t, wsrep_trx_meta_t const*, wsrep_bool_t*, bool)>, trx=..., meta=...) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:34
24 0x0000ffff7323c0c4 in galera::ReplicatorSMM::apply_trx (this=this@entry=0xaaaac7c7ebc0, recv_ctx=recv_ctx@entry=0xffff4c0009a8, trx=trx@entry=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:454
25 0x0000ffff7323e8b8 in galera::ReplicatorSMM::process_trx (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8, trx=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:1258
26 0x0000ffff73268f68 in galera::GcsActionSource::dispatch (this=this@entry=0xaaaac7c7f348, recv_ctx=recv_ctx@entry=0xffff4c0009a8, act=..., exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:115
27 0x0000ffff73269dd0 in galera::GcsActionSource::process (this=0xaaaac7c7f348, recv_ctx=0xffff4c0009a8, exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:180
28 0x0000ffff7323ef5c in galera::ReplicatorSMM::async_recv (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:362
29 0x0000ffff73217760 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /home/buildbot/buildbot/build/galera/src/wsrep_provider.cpp:244
30 0x0000aaaaac8cb344 in wsrep_replication_process (thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:486
31 0x0000aaaaac8bc3a0 in start_wsrep_THD (arg=arg@entry=0xaaaac7cb3e38) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_mysqld.cc:2173
32 0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869
33 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void*)>) at pthread_create.c:335
34 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

and background victim transaction kill with stack

Thread 28 (Thread 0xffff485fa070 (LWP 24870)):
0  0x0000ffff784e530c in __pthread_cond_wait (cond=cond@entry=0xaaaac83e98e0, mutex=mutex@entry=0xaaaac83e98b0) at pthread_cond_wait.c:186
1  0x0000aaaaacb10788 in os_event::wait (this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:158
2  os_event::wait_low (reset_sig_count=2, this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:325
3  os_event_wait_low (event=0xaaaac83e98a0, reset_sig_count=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:507
4  0x0000aaaaacb98480 in sync_array_wait_event (arr=arr@entry=0xaaaac7dbb450, cell=@0xffff485f96e8: 0xaaaac7dbb560) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/sync/sync0arr.cc:471
5  0x0000aaaaacab53c8 in TTASEventMutex<GenericPolicy>::enter (line=19524, filename=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", max_delay=<optimized out>, max_spins=0, this=0xaaaac83cc8c0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:516
6  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0xaaaac83cc8c0, n_spins=<optimized out>, n_delay=<optimized out>, name=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", line=19524) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:637
7  0x0000aaaaacaaa52c in bg_wsrep_kill_trx (void_arg=0xffff4c057430) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:19524
8  0x0000aaaaac79e7f0 in handle_manager (arg=arg@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/sql_manager.cc:112
9  0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869
10 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void*)>) at pthread_create.c:335
11 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Fix:
====

Do not acquire the THD::LOCK_thd_data mutex if we already hold lock_sys->mutex, because
that would violate the mutex ordering. A victim transaction holding conflicting
locks can't be committed or rolled back while we hold lock_sys->mutex. Thus,
it is safe to do wsrep_thd_is_BF call with no additional mutexes.
2021-01-27 19:46:20 +02:00
Marko Mäkelä
68b2819342 Cleanup: Remove many C-style lock_get_ accessors
Let us prefer member functions to the old C-style accessor functions.
Also, prefer bitwise AND operations for checking multiple flags.
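
To illustrate the second point, the C++ sketch below checks two flags with one bitwise AND instead of two separate accessor calls. The flag constants and the Lock struct are hypothetical and do not use the actual InnoDB bit values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for the example.
constexpr std::uint32_t FLAG_TABLE = 1U << 4;
constexpr std::uint32_t FLAG_WAIT  = 1U << 8;
constexpr std::uint32_t FLAG_GAP   = 1U << 9;

struct Lock {
  std::uint32_t type_mode;

  // Member functions replace free-standing C-style accessors.
  bool is_table() const { return (type_mode & FLAG_TABLE) != 0; }
  bool is_waiting() const { return (type_mode & FLAG_WAIT) != 0; }

  // Both flags set: one AND against the combined mask, one comparison.
  bool is_waiting_gap() const
  {
    constexpr std::uint32_t mask = FLAG_WAIT | FLAG_GAP;
    return (type_mode & mask) == mask;
  }

  // At least one of the flags set: one AND, compared against zero.
  bool is_waiting_or_gap() const
  {
    return (type_mode & (FLAG_WAIT | FLAG_GAP)) != 0;
  }
};
```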
2021-01-27 18:41:58 +02:00
Marko Mäkelä
cbb0a60c57 Cleanup: Remove lock_get_size() 2021-01-27 18:02:11 +02:00
Marko Mäkelä
121d0f7f53 MDEV-20612: Speed up lock_table_other_has_incompatible()
dict_table_t::n_lock_x_or_s: Keep track of LOCK_S or LOCK_X on the table.

lock_table_other_has_incompatible(): In the likely case that no
transaction is waiting for or holding LOCK_S or LOCK_X on the table,
return early: conflicts cannot exist.
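
The shortcut can be modelled with a small C++ sketch. It assumes that the early return applies to intention-lock requests (IS or IX), since an S or X request can still conflict with an existing IX lock; it also leaves out the auto-increment lock mode. The types Table and TableLock and the function other_has_incompatible() are hypothetical simplifications.

```cpp
#include <cassert>
#include <vector>

enum LockMode { LOCK_IS, LOCK_IX, LOCK_S, LOCK_X };

struct TableLock { int trx_id; LockMode mode; };

struct Table {
  std::vector<TableLock> locks;
  unsigned n_lock_x_or_s = 0;   // LOCK_S or LOCK_X requests in `locks`

  void add(int trx_id, LockMode mode)
  {
    locks.push_back({trx_id, mode});
    if (mode == LOCK_S || mode == LOCK_X)
      ++n_lock_x_or_s;
  }
};

// Usual multi-granularity rules: X conflicts with everything,
// S conflicts with IX, intention locks never conflict with each other.
inline bool compatible(LockMode a, LockMode b)
{
  if (a == LOCK_X || b == LOCK_X) return false;
  if (a == LOCK_S) return b != LOCK_IX;
  if (b == LOCK_S) return a != LOCK_IX;
  return true;
}

inline bool other_has_incompatible(const Table& t, int trx_id, LockMode mode)
{
  // Shortcut: an intention lock can only conflict with S or X,
  // and the counter says that none exist on this table.
  if (mode <= LOCK_IX && t.n_lock_x_or_s == 0)
    return false;
  for (const TableLock& l : t.locks)
    if (l.trx_id != trx_id && !compatible(l.mode, mode))
      return true;
  return false;
}
```

With the counter in place, the common case of many transactions holding only IS or IX locks never walks the lock list.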

This is based on the idea of Zhai Weixiang, who reported MySQL Bug #72948.

lock_table_has_to_wait_in_queue(), lock_table_dequeue():
Extend the optimization, inspired by
mysql/mysql-server@bb7191d6cb
by Jakub Łopuszański.
2021-01-27 15:45:39 +02:00
Marko Mäkelä
3329f0ed0c Cleanup: Remove LOCK_REC (which was mutually exclusive with LOCK_TABLE) 2021-01-27 15:45:39 +02:00
Marko Mäkelä
e71e613353 MDEV-24671: Replace lock_wait_timeout_task with mysql_cond_timedwait()
lock_wait(): Replaces lock_wait_suspend_thread(). Wait for the lock to
be granted or the transaction to be killed using mysql_cond_timedwait()
or mysql_cond_wait().

lock_wait_end(): Replaces que_thr_end_lock_wait() and
lock_wait_release_thread_if_suspended().

lock_wait_timeout_task: Remove. The operating system kernel will
resume the mysql_cond_timedwait() in lock_wait(). An added benefit
is that innodb_lock_wait_timeout no longer has a 'jitter' of 1 second,
which was caused by this wake-up task waking up only once per second,
and then waking up any threads for which the timeout (which was only
measured in seconds) was exceeded.
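
The difference from a once-per-second wake-up task can be seen in a small C++ sketch that uses std::condition_variable in place of mysql_cond_timedwait(). WaitState, wait_for_grant(), and grant() are hypothetical names; the sketch only shows that the waiter itself carries the timeout, so no helper task has to poll.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

struct WaitState {
  std::mutex wait_mutex;
  std::condition_variable cond;
  bool waiting = true;   // stand-in for "wait_lock is still set"
};

// Block until the lock is granted or the timeout expires.
// Returns true if granted, false on timeout. The timeout is enforced
// by the timed wait itself, with whatever resolution the platform has.
inline bool wait_for_grant(WaitState& w, std::chrono::milliseconds timeout)
{
  std::unique_lock<std::mutex> lk(w.wait_mutex);
  return w.cond.wait_for(lk, timeout, [&] { return !w.waiting; });
}

// Called by whoever grants the lock: update the state under the mutex,
// then wake the waiter.
inline void grant(WaitState& w)
{
  {
    std::lock_guard<std::mutex> lk(w.wait_mutex);
    w.waiting = false;
  }
  w.cond.notify_one();
}
```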

innobase_kill_query(): Set trx->error_state=DB_INTERRUPTED,
so that a call trx_is_interrupted(trx) in lock_wait() can be avoided.

We will protect things more consistently with lock_sys.wait_mutex,
which will be moved below lock_sys.mutex in the latching order.

trx_lock_t::cond: Condition variable for !wait_lock, used with
lock_sys.wait_mutex.

srv_slot_t: Remove. Replaced by trx_lock_t::cond.

lock_grant_after_reset(): Merged to lock_grant().

lock_rec_get_index_name(): Remove.

lock_sys_t: Introduce wait_pending, wait_count, wait_time, wait_time_max
that are protected by wait_mutex.

trx_lock_t::que_state: Remove.

que_thr_state_t: Remove QUE_THR_COMMAND_WAIT, QUE_THR_LOCK_WAIT.

que_thr_t: Remove is_active, start_running(), stop_no_error().

que_fork_t::n_active_thrs, trx_lock_t::n_active_thrs: Remove.
2021-01-27 15:45:39 +02:00
Marko Mäkelä
7f1ab8f742 Cleanups:
que_thr_t::fork_type: Remove.

QUE_THR_SUSPENDED, TRX_QUE_COMMITTING: Remove.

Cleanup lock_cancel_waiting_and_release()
2021-01-27 15:45:39 +02:00
Marko Mäkelä
898dcf93a8 Cleanup the lock creation
LOCK_MAX_N_STEPS_IN_DEADLOCK_CHECK, LOCK_MAX_DEPTH_IN_DEADLOCK_CHECK,
LOCK_RELEASE_INTERVAL: Replace with the bare use of the constants.

lock_rec_create_low(): Remove LOCK_PAGE_BITMAP_MARGIN altogether.
We already have REDZONE_SIZE as a 'safety margin' in AddressSanitizer
builds, to catch any out-of-bounds access.

lock_prdt_add_to_queue(): Avoid a useless search when enqueueing
a waiting lock request.

lock_prdt_lock(): Reduce the size of the trx->mutex critical section.
2021-01-27 15:45:38 +02:00
Marko Mäkelä
469da6c34d Cleanup: Remove trx_get_id_for_print()
Any transaction that has requested a lock must have trx->id!=0.

trx_print_low(): Distinguish non-locking or inactive transaction
objects by displaying the pointer in parentheses.

fill_trx_row(): Do not try to map trx->id to a pointer-based value.
2021-01-27 15:45:38 +02:00
Marko Mäkelä
3cef4f8f0f MDEV-515 Reduce InnoDB undo logging for insert into empty table
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.

For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().

Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.

Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.
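
The visibility rule can be sketched in C++ under a deliberately simplified read view model. ReadView, Table, and table_appears_empty() below are hypothetical; in particular, the real read view has more state than a limit and a list of active transaction identifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified read view: changes are visible if the transaction started
// before the view was created and was no longer active at that time.
struct ReadView {
  std::uint64_t low_limit_id;          // ids >= this started after the view
  std::vector<std::uint64_t> active;   // ids active at view creation

  bool changes_visible(std::uint64_t id) const
  {
    if (id >= low_limit_id)
      return false;
    return std::find(active.begin(), active.end(), id) == active.end();
  }
};

struct Table {
  std::uint64_t bulk_trx_id = 0;       // 0: no bulk insert recorded
};

// If the view cannot see the transaction that inserted into the empty
// table, the reader must keep treating the table as empty.
inline bool table_appears_empty(const ReadView& view, const Table& table)
{
  return table.bulk_trx_id != 0 && !view.changes_visible(table.bulk_trx_id);
}
```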

This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).
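
A toy C++ model may help to picture the two logging modes. Everything here is hypothetical: a std::set stands in for the table, UndoRec for undo log records, and Trx for the transaction. Locking, statement-level rollback, and INSERT IGNORE are left out.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class UndoType { EMPTY, INSERT_ROW };
struct UndoRec { UndoType type; int key; };

struct Trx {
  std::set<int>& table;
  std::vector<UndoRec> undo;
  bool bulk = false;   // true during the first INSERT into an empty table

  explicit Trx(std::set<int>& t) : table(t) {}

  void insert(int key)
  {
    if (undo.empty() && table.empty()) {
      undo.push_back({UndoType::EMPTY, 0});  // one record for the statement
      bulk = true;
    } else if (!bulk) {
      undo.push_back({UndoType::INSERT_ROW, key});  // row-level logging
    }
    table.insert(key);
  }

  // After the first statement, later statements log every row.
  void end_statement() { bulk = false; }

  // Apply the undo records in reverse order.
  void rollback()
  {
    for (auto it = undo.rbegin(); it != undo.rend(); ++it) {
      if (it->type == UndoType::EMPTY)
        table.clear();
      else
        table.erase(it->key);
    }
    undo.clear();
  }
};
```

In this model, a first statement that inserts any number of rows into an empty table produces a single EMPTY record, while a later statement in the same transaction produces one record per row.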

INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.

In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.

Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.

dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.

ins_node_t::bulk_insert: Whether bulk insert was initiated.

trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.

trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().

trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.

row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.

lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.

btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().

trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.

ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.

row_sel(), row_sel_try_search_shortcut(), row_search_mvcc(),
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

btr_root_page_init(): Refactored from btr_create().

dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.

ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.
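
As a rough illustration of the undo logging decision described above for
trx_undo_report_row_operation() and trx_t::end_bulk_insert(), here is a
minimal C++ sketch. The types Table, ModTableTime, Trx and UndoAction and
the function report_insert() are hypothetical stand-ins, not the InnoDB
definitions; only the use of emplace() to detect the first modification
of a table is taken from the description.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for dict_table_t; 'empty' would really be
// determined from the clustered index, not stored as a flag.
struct Table { bool empty; };

// Hypothetical stand-in for trx_mod_table_time_t.
struct ModTableTime {
  bool bulk = false;
  void start_bulk_insert() { assert(!bulk); bulk = true; }
  void end_bulk_insert() { bulk = false; }
  bool is_bulk_insert() const { return bulk; }
};

enum class UndoAction { WRITE_EMPTY_RECORD, SKIP_ROW_UNDO, WRITE_ROW_UNDO };

struct Trx {
  std::map<const Table*, ModTableTime> mod_tables;

  // Decide what kind of undo logging an INSERT into 'table' needs.
  UndoAction report_insert(const Table& table) {
    // emplace() inserts only if the table was not modified before;
    // .second tells us whether this is the first modification.
    auto result = mod_tables.emplace(&table, ModTableTime());
    ModTableTime& mod = result.first->second;
    if (result.second && table.empty) {
      // First INSERT into an empty table: one table-level undo record.
      mod.start_bulk_insert();
      return UndoAction::WRITE_EMPTY_RECORD;
    }
    // Later rows of the same statement need no row-level undo;
    // later statements fall back to row-level undo logging.
    return mod.is_bulk_insert() ? UndoAction::SKIP_ROW_UNDO
                                : UndoAction::WRITE_ROW_UNDO;
  }

  // Called at the end of each statement in this sketch.
  void end_statement() {
    for (auto& entry : mod_tables) entry.second.end_bulk_insert();
  }
};
```

In this sketch, only the first statement that touches an initially empty
table is covered by table-level undo logging; once end_statement() has
run, further inserts into the same table report WRITE_ROW_UNDO.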

This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
2021-01-25 18:41:27 +02:00
sjaakola
beaea31ab1 MDEV-23851 BF-BF Conflict issue because of UK GAP locks
Some DML operations on tables having unique secondary keys cause scanning
in the secondary index, for instance to find potential unique key violations
in the secondary index. This scanning may involve GAP locking in the index.
As this locking happens also when applying replication events in high priority
applier threads, there is a probability for lock conflicts between two wsrep
high priority threads.

This PR avoids lock conflicts of high priority wsrep threads, which do
secondary index scanning e.g. for duplicate key detection.

The actual fix is the patch in sql_class.cc:thd_need_ordering_with(), where
we allow relaxed GAP locking protocol between wsrep high priority threads.
wsrep high priority threads (replication appliers, replayers and TOI processors)
are ordered by the replication provider, and they will not need serializability
support gained by secondary index GAP locks.

The PR also contains an mtr test, which exercises a scenario where two replication
applier threads have a false positive conflict in GAP of unique secondary index.
The conflicting local committing transaction has to replay, and the test verifies
also that the replaying phase will not conflict with the latter replication applier.
Commit also contains new test scenario for galera.galera_UK_conflict.test,
where replayer starts applying after a slave applier thread, with later seqno,
has advanced to commit phase. The applier and replayer have false positive GAP
lock conflict on secondary unique index, and replayer should ignore this.
This test scenario caused crash with earlier version in this PR, and to fix this,
the secondary index uniqueness checking has been relaxed even further.

Now innodb trx_t structure has new member: bool wsrep_UK_scan, which is set to
true, when high priority thread is performing unique secondary index scanning.
The member trx_t::wsrep_UK_scan is defined inside WITH_WSREP directive, to make
it possible to prepare a MariaDB build where this additional trx_t member is
not present and is not used in the code base. trx->wsrep_UK_scan is set to true
only for the duration of the lock_rec_lock() call. trx->wsrep_UK_scan
is used only in the lock_rec_has_to_wait() function, to relax the need to wait if
wsrep_UK_scan is set and the conflicting transaction is also high priority.
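
The relaxed wait rule can be sketched roughly as follows. This is a
simplified illustration under stated assumptions, not the code in
lock_rec_has_to_wait(): the struct WsrepTrx, its high_priority field and
the function has_to_wait() are hypothetical names, and the real function
takes many more conditions into account.

```cpp
#include <cassert>

// Hypothetical, reduced view of a transaction.
struct WsrepTrx {
  bool high_priority;  // wsrep applier, replayer or TOI processor
  bool wsrep_UK_scan;  // set only while scanning a unique secondary index
};

// Returns true if 'requester' must wait for a lock held by 'holder'.
// 'locks_conflict' stands for the outcome of the ordinary lock mode checks.
bool has_to_wait(const WsrepTrx& requester, const WsrepTrx& holder,
                 bool locks_conflict) {
  if (!locks_conflict)
    return false;
  // Relaxation: a high priority thread doing a unique key scan does not
  // wait for another high priority thread, because the replication
  // provider already orders them.
  if (requester.wsrep_UK_scan && requester.high_priority &&
      holder.high_priority)
    return false;
  return true;
}
```

A conflict with a local (not high priority) transaction still leads to a
wait in this sketch; only the conflict between two high priority threads
during a unique key scan is ignored.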

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-01-18 08:09:06 +02:00
Marko Mäkelä
666565c7f0 Merge 10.5 into 10.6 2021-01-11 17:32:08 +02:00
Marko Mäkelä
8de233af81 Merge 10.4 into 10.5 2021-01-11 16:29:51 +02:00
Marko Mäkelä
fd5e103aa4 Merge 10.3 into 10.4 2021-01-11 10:35:06 +02:00
Marko Mäkelä
5a1a714187 Merge 10.2 into 10.3 (except MDEV-17556)
The fix of MDEV-17556 (commit e25623e78a
and commit 61a362c949) has been
omitted due to conflicts and will have to be applied separately later.
2021-01-11 09:41:54 +02:00
Jan Lindström
775fccea0c MDEV-23536 : Race condition between KILL and transaction commit
A race condition may occur between the execution of transaction commit,
and an execution of a KILL statement that would attempt to abort that
transaction.

MDEV-17092 worked around this race condition by modifying InnoDB code.
After that issue was closed, Sergey Vojtovich pointed out that this
race condition would better be fixed above the storage engine layer:

If you look carefully into the above, you can conclude that
thd->free_connection() can be called concurrently with
KILL/thd->awake(). Which is the bug. And it is partially fixed in
THD::~THD(), that is destructor waits for KILL completion:

Fix: Add necessary mutex operations to THD::free_connection()
and move WSREP specific code also there. This ensures that no
one is using THD while we do free_connection(). These mutexes
will also ensure that there can't be concurrent KILL/THD::awake().
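
The idea of serializing the two operations can be sketched as follows.
This is only an illustration with standard C++ primitives: the class
Session and its members are hypothetical, and the actual THD code uses
its own mutexes and does considerably more work.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for THD, reduced to the state relevant here.
class Session {
  std::mutex kill_mutex;  // assumed: one mutex shared by both operations
  bool connected = true;
  bool killed = false;

public:
  // Tear down the connection while holding the mutex, so that a
  // concurrent awake() cannot observe a half-freed session.
  void free_connection() {
    std::lock_guard<std::mutex> guard(kill_mutex);
    connected = false;
  }

  // KILL path: takes the same mutex, and does nothing if the
  // connection has already been freed. Returns whether the kill applied.
  bool awake() {
    std::lock_guard<std::mutex> guard(kill_mutex);
    if (!connected)
      return false;
    killed = true;
    return true;
  }
};
```

Because both functions take the same mutex, awake() runs either entirely
before or entirely after free_connection(), never concurrently with it.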

innobase_kill_query
  We can now remove usage of trx_sys_mutex introduced in MDEV-17092.

trx_t::free()
  Poison trx->state and trx->mysql_thd

This patch is validated with an RQG run similar to the one that
reproduced MDEV-17092.
2021-01-08 17:11:54 +02:00
Marko Mäkelä
8a4ca33938 Cleanup: Declare trx_weight_ge() inline 2021-01-05 14:18:10 +02:00
Marko Mäkelä
bd52f1a2dd Cleanup: Remove lock_number_of_rows_locked()
Let us access trx->lock.n_rec_locks directly.
2021-01-04 15:30:34 +02:00
Marko Mäkelä
a64cb6d265 Merge 10.3 into 10.4 2020-12-28 13:46:22 +02:00
Marko Mäkelä
7f037b8c9f Merge 10.2 into 10.3 2020-12-28 13:30:20 +02:00
sjaakola
8e3e87d2fc MDEV-23851 MDEV-24229 BF-BF conflict issues
Issues MDEV-23851 and MDEV-24229 are probably duplicates and are caused by the new self-asserting function lock0lock.cc:wsrep_assert_no_bf_bf_wait().
The assertion criteria are too strict and do not take into consideration scenarios of "false positive" lock conflicts, which are resolved by replaying the local transaction.
As a fix, this PR relaxes the assert criteria with two conditions, which skip the assert if high priority transactions are locking in the correct order or if the conflicting high priority lock holder is aborting and has just not yet released the lock.

An alternative fix would be to remove wsrep_assert_no_bf_bf_wait() altogether, or to remove the assert in this function and let it only print warnings in the error log.
But in my high conflict rate multi-master test scenario, this relaxed asserting appears to be safe.

This PR also removes two wsrep_report_bf_lock_wait() calls in innodb lock manager, which cause mutex access assert in debug builds.

Foreign key appending missed handling of the float and double data types in INSERT execution. This is not directly related to the actual issue here but is fixed in this PR nevertheless. Missing these foreign key values in certification could cause problems in some multi-master load scenarios.

Finally, some problem reports suggest that some of the issues reported in MDEV-23851 might relate to false positive lock conflicts over unique secondary index gaps. There is separate work for relaxing UK index gap locking of replication appliers, and a separate PR will be submitted for it, with a related mtr test as well.
2020-12-28 09:06:16 +02:00
Marko Mäkelä
cf2480dd77 MDEV-21452: Retain the watchdog only on dict_sys.mutex, for performance
Most hangs seem to involve dict_sys.mutex. While holding lock_sys.mutex
we rarely acquire any buffer pool page latches, which are a frequent
source of potential hangs.
2020-12-15 17:56:18 +02:00
Marko Mäkelä
ff5d306e29 MDEV-21452: Replace ib_mutex_t with mysql_mutex_t
SHOW ENGINE INNODB MUTEX functionality is completely removed,
as are the InnoDB latching order checks.

We will enforce innodb_fatal_semaphore_wait_threshold
only for dict_sys.mutex and lock_sys.mutex.

dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex.

lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex.

FIXME: srv_sys should be removed altogether; it is duplicating tpool
functionality.

fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must
not hold fil_system.mutex.

fil_close_all_files(): To prevent SAFE_MUTEX warnings for
fil_space_destroy_crypt_data(), we must not hold fil_system.mutex
while invoking fil_space_free_low() on a detached tablespace.
2020-12-15 17:56:18 +02:00
Marko Mäkelä
38fd7b7d91 MDEV-21452: Replace all direct use of os_event_t
Let us replace os_event_t with mysql_cond_t, and replace the
necessary ib_mutex_t with mysql_mutex_t so that they can be
used with condition variables.

Also, let us replace polling (os_thread_sleep() or timed waits)
with plain mysql_cond_wait() wherever possible.

Furthermore, we will use the lightweight srw_mutex for trx_t::mutex,
to hopefully reduce contention on lock_sys.mutex.

FIXME: Add test coverage of
mariabackup --backup --kill-long-queries-timeout
2020-12-15 17:56:17 +02:00