mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 11:01:52 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	8d16da1487	MDEV-24789: Reduce lock_sys mutex contention further lock_sys_t::deadlock_check(): Assume that only lock_sys.wait_mutex is being held by the caller. lock_sys_t::rd_lock_try(): New function. lock_sys_t::cancel(trx_t): Kill an active transaction that may be holding a lock. lock_sys_t::cancel(trx_t, lock_t*): Cancel a waiting lock request. lock_trx_handle_wait(): Avoid acquiring mutexes in some cases, and in never acquire lock_sys.latch in exclusive mode. This function is only invoked in a semi-consistent read (locking a clustered index record only if it matches the search condition). Normally, lock_wait() will take care of lock waits. lock_wait(): Invoke the new function lock_sys_t::cancel() at the end, to avoid acquiring exclusive lock_sys.latch. lock_rec_other_trx_holds_expl(): Use LockGuard instead of LockMutexGuard. lock_release_autoinc_locks(): Explicitly acquire table->lock_mutex, in case only a shared lock_sys.latch is being held. Deadlock::report() will still hold exclusive lock_sys.latch while invoking lock_cancel_waiting_and_release(). lock_cancel_waiting_and_release(): Acquire trx->mutex in this function, instead of expecting the caller to do so. lock_unlock_table_autoinc(): Only acquire shared lock_sys.latch. lock_table_has_locks(): Do not acquire lock_sys.latch at all. Deadlock::check_and_resolve(): Only acquire shared lock_sys.latchm for invoking lock_sys_t::cancel(trx, wait_lock). innobase_query_caching_table_check_low(), row_drop_tables_for_mysql_in_background(): Do not acquire lock_sys.latch.	2021-03-02 14:26:33 +02:00
Marko Mäkelä	01b44c054d	MDEV-25026 Various code paths are accessing freed pages The test case encryption.innodb_encrypt_freed was failing in MemorySanitizer builds. recv_recover_page(): Mark non-recovered pages as freed. fil_crypt_rotate_page(): Before comparing the block->frame contents, check if the block was marked as freed. Other places: Whenever using BUF_GET_POSSIBLY_FREED, check the block->page.status before accessing the page frame. (Both uses of BUF_GET_IF_IN_POOL should be correct now.)	2021-03-02 11:51:22 +02:00
Marko Mäkelä	7cf4419fc4	MDEV-24789: Reduce lock_sys.wait_mutex contention A performance regression was introduced by commit `e71e613353` (MDEV-24671) and mostly addressed by commit `455514c800`. The regression is likely caused by increased contention lock_sys.latch (former lock_sys.mutex), possibly indirectly caused by contention on lock_sys.wait_mutex. This change aims to reduce both, but further improvements will be needed. lock_wait(): Minimize the lock_sys.wait_mutex hold time. lock_sys_t::deadlock_check(): Add a parameter for indicating whether lock_sys.latch is exclusively locked. trx_t::was_chosen_as_deadlock_victim: Always use atomics. lock_wait_wsrep(): Assume that no mutex is being held. Deadlock::report(): Always kill the victim transaction. lock_sys_t::timeout: New counter to back MONITOR_TIMEOUT.	2021-02-26 14:58:48 +02:00
Marko Mäkelä	21987e5919	MDEV-20612 fixup: Reduce hash table lookups Let us calculate the hash table cell address while we are calculating the latch address, to avoid repeated computations of the address. The latch address can be derived from the cell address with a simple bitmask operation.	2021-02-24 14:47:42 +02:00
Marko Mäkelä	43b239a081	MDEV-24915 Galera conflict resolution is unnecessarily complex The fix of MDEV-23328 introduced a background thread for killing conflicting transactions. Thanks to the refactoring that was conducted in MDEV-24671, the high-priority ("brute-force") applier thread can kill the conflicting transactions itself, before waiting for the locks to be finally released (after the conflicting transactions have been rolled back). This also allows us to remove the hack LockGGuard that had to be added in MDEV-20612, and remove Galera-related function parameters from lock creation.	2021-02-18 12:16:51 +02:00
Marko Mäkelä	18dc5b0192	MDEV-20612 fixup: Remove a redundant check lock_wait_rpl_report(): Only reload trx->lock.wait_lock if lock_sys.wait_mutex had to be released and reacquired.	2021-02-18 12:02:36 +02:00
Marko Mäkelä	94b4578704	Merge 10.5 into 10.6	2021-02-17 19:39:05 +02:00
Marko Mäkelä	9f13670004	MDEV-24738 fixup: heap-use-after-poison in lock_sys_t::deadlock_check() Deadlock::report(): Require the caller to acquire lock_sys.latch if invoking on a transaction that is now owned by the current thread.	2021-02-17 17:44:23 +02:00
Marko Mäkelä	c68007d958	MDEV-24738 Improve the InnoDB deadlock checker A new configuration parameter innodb_deadlock_report is introduced: * innodb_deadlock_report=off: Do not report any details of deadlocks. * innodb_deadlock_report=basic: Report transactions and waiting locks. * innodb_deadlock_report=full (default): Report also the blocking locks. The improved deadlock checker will consider all involved transactions in one loop, even if the deadlock loop includes several transactions. The theoretical maximum number of transactions that can be involved in a deadlock is `innodb_page_size` * 8, limited by the persistent data structures. Note: Similar to mysql/mysql-server@3859219875 our deadlock checker will consider at most one blocking transaction for each waiting transaction. The new field trx->lock.wait_trx be nullptr if and only if trx->lock.wait_lock is nullptr. Note that trx->lock.wait_lock->trx == trx (the waiting transaction), while trx->lock.wait_trx points to one of the transactions whose lock is conflicting with trx->lock.wait_lock. Considering only one blocking transaction will greatly simplify our deadlock checker, but it may also make the deadlock checker blind to some deadlocks where the deadlock cycle is 'hidden' by the fact that the registered trx->lock.wait_trx is not actually waiting for any InnoDB lock, but something else. So, instead of deadlocks, sometimes lock wait timeout may be reported. To improve on this, whenever trx->lock.wait_trx is changed, we will register further 'candidate' transactions in Deadlock::to_check(), and check for 'revealed' deadlocks as soon as possible, in lock_release() and innobase_kill_query(). The old DeadlockChecker was holding lock_sys.latch, even though using lock_sys.wait_mutex should be less contended (and thus preferred) in the likely case that no deadlock is present. lock_wait(): Defer the deadlock check to this function, instead of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting(). DeadlockChecker: Complete rewrite: (1) Explicitly keep track of transactions that are being waited for, in trx->lock.wait_trx, protected by lock_sys.wait_mutex. Previously, we were painstakingly traversing the lock heaps while blocking concurrent registration or removal of any locks (even uncontended ones). (2) Use Brent's cycle-detection algorithm for deadlock detection, traversing each trx->lock.wait_trx edge at most 2 times. (3) If a deadlock is detected, release lock_sys.wait_mutex, acquire LockMutexGuard, re-acquire lock_sys.wait_mutex and re-invoke find_cycle() to find out whether the deadlock is still present. (4) Display information on all transactions that are involved in the deadlock, and choose a victim to be rolled back. lock_sys.deadlocks: Replaces lock_deadlock_found. Protected by wait_mutex. Deadlock::find_cycle(): Quickly find a cycle of trx->lock.wait_trx... using Brent's cycle detection algorithm. Deadlock::report(): Report a deadlock cycle that was found by Deadlock::find_cycle(), and choose a victim with the least weight. Altogether, we may traverse each trx->lock.wait_trx edge up to 5 times (2*find_cycle()+1 time for reporting and choosing the victim). Deadlock::check_and_resolve(): Find and resolve a deadlock. lock_wait_rpl_report(): Report the waits-for information to replication. This used to be executed as part of DeadlockChecker. Replication must know the waits-for relations even if no deadlocks are present in InnoDB. Reviewed by: Vladislav Vaintroub	2021-02-17 12:44:08 +02:00
Marko Mäkelä	584e52118c	MDEV-20612 fixup: Make comments refer to lock_sys.latch	2021-02-17 12:18:03 +02:00
Marko Mäkelä	e5d83ad472	MDEV-20612 fixup: Fix a memory leak in buffer pool resize	2021-02-16 11:27:13 +02:00
Sergei Golubchik	25d9d2e37f	Merge branch 'bb-10.4-release' into bb-10.5-release	2021-02-15 16:43:15 +01:00
Sergei Golubchik	00a313ecf3	Merge branch 'bb-10.3-release' into bb-10.4-release Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution" was null-merged. 10.4 version of the fix is coming up separately	2021-02-12 17:44:22 +01:00
Marko Mäkelä	26d6224dd6	MDEV-20612: Enable concurrent lock_release() lock_release_try(): Try to release locks while only holding shared lock_sys.latch. lock_release(): If 5 attempts of lock_release_try() fail, proceed to acquire exclusive lock_sys.latch.	2021-02-12 17:44:58 +02:00
Marko Mäkelä	b08448de64	MDEV-20612: Partition lock_sys.latch We replace the old lock_sys.mutex (which was renamed to lock_sys.latch) with a combination of a global lock_sys.latch and table or page hash lock mutexes. The global lock_sys.latch can be acquired in exclusive mode, or it can be acquired in shared mode and another mutex will be acquired to protect the locks for a particular page or a table. This is inspired by mysql/mysql-server@1d259b87a6 but the optimization of lock_release() will be done in the next commit. Also, we will interleave mutexes with the hash table elements, similar to how buf_pool.page_hash was optimized in commit `5155a300fa` (MDEV-22871). dict_table_t::autoinc_trx: Use Atomic_relaxed. dict_table_t::autoinc_mutex: Use srw_mutex in order to reduce the memory footprint. On 64-bit Linux or OpenBSD, both this and the new dict_table_t::lock_mutex should be 32 bits and be stored in the same 64-bit word. On Microsoft Windows, the underlying SRWLOCK is 32 or 64 bits, and on other systems, sizeof(pthread_mutex_t) can be much larger. ib_lock_t::trx_locks, trx_lock_t::trx_locks: Document the new rules. Writers must assert lock_sys.is_writer() \|\| trx->mutex_is_owner(). LockGuard: A RAII wrapper for acquiring a page hash table lock. LockGGuard: Like LockGuard, but when Galera Write-Set Replication is enabled, we must acquire all shards, for updating arbitrary trx_locks. LockMultiGuard: A RAII wrapper for acquiring two page hash table locks. lock_rec_create_wsrep(), lock_table_create_wsrep(): Special Galera conflict resolution in non-inlined functions in order to keep the common code paths shorter. lock_sys_t::prdt_page_free_from_discard(): Refactored from lock_prdt_page_free_from_discard() and lock_rec_free_all_from_discard_page(). trx_t::commit_tables(): Replaces trx_update_mod_tables_timestamp(). lock_release(): Let trx_t::commit_tables() invalidate the query cache for those tables that were actually modified by the transaction. Merge lock_check_dict_lock() to lock_release(). We must never release lock_sys.latch while holding any lock_sys_t::hash_latch. Failure to do that could lead to memory corruption if the buffer pool is resized between the time lock_sys.latch is released and the hash_latch is released.	2021-02-12 17:44:32 +02:00
Marko Mäkelä	b01d8e1a33	MDEV-20612: Replace lock_sys.mutex with lock_sys.latch For now, we will acquire the lock_sys.latch only in exclusive mode, that is, use it as a mutex. This is preparation for the next commit where we will introduce a less intrusive alternative, combining a shared lock_sys.latch with dict_table_t::lock_mutex or a mutex embedded in lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash.	2021-02-11 14:52:10 +02:00
Marko Mäkelä	903464929c	MDEV-20612 preparation: LockMutexGuard Let us use the RAII wrapper LockMutexGuard for most operations where lock_sys.mutex is acquired.	2021-02-11 14:36:11 +02:00
Marko Mäkelä	2e64513fba	MDEV-20612 preparation: Fewer calls to buf_page_t::id()	2021-02-11 12:48:07 +02:00
Marko Mäkelä	74ab97f58f	Cleanup: Remove lock_trx_lock_list_init(), lock_table_get_n_locks()	2021-02-07 11:18:21 +02:00
Marko Mäkelä	487fbc2e15	MDEV-21452 fixup: Introduce trx_t::mutex_is_owner() When we replaced trx_t::mutex with srw_mutex in commit `38fd7b7d91` we lost the SAFE_MUTEX instrumentation. Let us introduce a replacement and restore the assertions.	2021-02-05 16:37:06 +02:00
Marko Mäkelä	455514c800	MDEV-24789: Try to reduce mutex contention	2021-02-05 16:16:44 +02:00
Marko Mäkelä	3e45f8e36a	MDEV-24789: Reduce sizeof(trx_lock_t) trx_lock_t::cond: Use pthread_cond_t directly, because no instrumentation will ever be used. This saves sizeof(void*) and removes some duplicated inline code. trx_lock_t::was_chosen_as_wsrep_victim: Fold into trx_lock_t::was_chosen_as_deadlock_victim. trx_lock_t::cancel, trx_lock_t::rec_cached, trx_lock_t::table_cached: Use only one byte of storage, reducing memory alignment waste. On AMD64 GNU/Linux, MDEV-24671 caused a sizeof(trx_lock_t) increase of 48 bytes (plus the PLUGIN_PERFSCHEMA overhead of trx_lock_t::cond). These changes should save 32 bytes.	2021-02-05 13:15:56 +02:00
Marko Mäkelä	465bdabb7a	Cleanup: Reduce some lock_sys.mutex contention lock_table(): Remove the constant parameter flags=0. lock_table_resurrect(): Merge lock_table_ix_resurrect() and lock_table_x_resurrect(). lock_rec_lock(): Only acquire LockMutexGuard if lock_table_has() does not hold.	2021-02-05 13:14:50 +02:00
Marko Mäkelä	de407e7cb4	MDEV-24731 fixup: bogus assertion DeadlockChecker::search(): Move a bogus assertion into a condition. If the current transaction is waiting for a table lock (on something else than an auto-increment lock), it is well possible that other transactions are holding not only a conflicting lock, but also an auto-increment lock. This mistake was noticed during the testing of MDEV-24731, but it was accidentally introduced in commit `5f46385764`. lock_wait_end(): Remove an unused variable, and add an assertion.	2021-02-05 08:35:15 +02:00
Marko Mäkelä	5f46385764	MDEV-24731 Excessive mutex contention in DeadlockChecker::check_and_resolve() The DeadlockChecker expects to be able to freeze the waits-for graph. Hence, it is best executed somewhere where we are not holding any additional mutexes. lock_wait(): Defer the deadlock check to this function, instead of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting(). DeadlockChecker::trx_rollback(): Merge with the only caller, check_and_resolve(). LockMutexGuard: RAII accessor for lock_sys.mutex. lock_sys.deadlocks: Replaces lock_deadlock_found. trx_t: Clean up some comments.	2021-02-04 16:38:07 +02:00
Sergei Golubchik	60ea09eae6	Merge branch '10.2' into 10.3	2021-02-01 13:49:33 +01:00
Jan Lindström	75546dfbb1	MDEV-24704 : Galera test failure on galera.galera_nopk_unicode Analysis: ========= Reason for test failure was a mutex deadlock between DeadlockChecker with stack Thread 6 (Thread 0xffff70066070 (LWP 24667)): 0 0x0000ffff784e850c in __lll_lock_wait (futex=futex@entry=0xffff04002258, private=0) at lowlevellock.c:46 1 0x0000ffff784e19f0 in __GI___pthread_mutex_lock (mutex=mutex@entry=0xffff04002258) at pthread_mutex_lock.c:135 2 0x0000aaaaac8cd014 in inline_mysql_mutex_lock (src_file=0xaaaaacea0f28 "/home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc", src_line=762, that=0xffff04002258) at /home/buildbot/buildbot/build/mariadb-10.2.37/include/mysql/psi/mysql_thread.h:675 3 wsrep_thd_is_BF (thd=0xffff040009a8, sync=sync@entry=1 '\001') at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:762 4 0x0000aaaaacadce68 in lock_rec_has_to_wait (for_locking=false, lock_is_on_supremum=<optimized out>, lock2=0xffff628952d0, type_mode=291, trx=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:826 5 lock_has_to_wait (lock1=<optimized out>, lock2=0xffff628952d0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:873 6 0x0000aaaaacadd0b0 in DeadlockChecker::search (this=this@entry=0xffff70061fe8) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7142 7 0x0000aaaaacae2dd8 in DeadlockChecker::check_and_resolve (lock=lock@entry=0xffff62894120, trx=trx@entry=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7286 8 0x0000aaaaacae3070 in lock_rec_enqueue_waiting (c_lock=0xffff628952d0, type_mode=type_mode@entry=3, block=block@entry=0xffff62076c40, heap_no=heap_no@entry=2, index=index@entry=0xffff4c076f28, thr=thr@entry=0xffff4c078810, prdt=prdt@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:1796 9 0x0000aaaaacae3900 in lock_rec_lock_slow (thr=0xffff4c078810, index=0xffff4c076f28, heap_no=2, block=0xffff62076c40, mode=3, impl=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2106 10 lock_rec_lock (impl=false, mode=3, block=0xffff62076c40, heap_no=2, index=0xffff4c076f28, thr=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2168 11 0x0000aaaaacae3ee8 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0xffff62076c40, rec=rec@entry=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", index=index@entry=0xffff4c076f28, offsets=0xffff4c080690, offsets@entry=0xffff70062a30, mode=LOCK_X, mode@entry=1653162096, gap_mode=0, gap_mode@entry=281470749427104, thr=thr@entry=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:6082 12 0x0000aaaaacb684c4 in sel_set_rec_lock (pcur=0xaaaac841c270, pcur@entry=0xffff4c077d58, rec=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", rec@entry=0x28 <error: Cannot access memory at address 0x28>, index=index@entry=0xffff4c076f28, offsets=0xffff70062a30, mode=281472334905456, type=281470749427104, thr=0xffff4c078810, thr@entry=0x9f, mtr=0x0, mtr@entry=0xffff70063928) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:1270 13 0x0000aaaaacb6bb64 in row_search_mvcc (buf=buf@entry=0xffff4c080690 "\376\026", mode=mode@entry=PAGE_CUR_GE, prebuilt=0xffff4c077b98, match_mode=match_mode@entry=1, direction=direction@entry=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:5181 14 0x0000aaaaacaae568 in ha_innobase::index_read (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key_ptr=<optimized out>, key_len=768, find_flag=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:9393 15 0x0000aaaaac9201cc in handler::ha_index_read_map (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key=0xffff4c07ccf8 "", keypart_map=keypart_map@entry=18446744073709551615, find_flag=find_flag@entry=HA_READ_KEY_EXACT) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/handler.cc:2718 16 0x0000aaaaac9f36b0 in Rows_log_event::find_row (this=this@entry=0xffff4c030098, rgi=rgi@entry=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13461 17 0x0000aaaaac9f3e44 in Update_rows_log_event::do_exec_row (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13936 18 0x0000aaaaac9e7ee8 in Rows_log_event::do_apply_event (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:11101 19 0x0000aaaaac8ca4e8 in Log_event::apply_event (rgi=0xffff4c01b510, this=0xffff4c030098) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.h:1454 20 wsrep_apply_events (buf_len=0, events_buf=0x1, thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:164 21 wsrep_apply_cb (ctx=0xffff4c0009a8, buf=0x1, buf_len=18446743528248705000, flags=<optimized out>, meta=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:267 22 0x0000ffff7322d29c in galera::TrxHandle::apply (this=this@entry=0xffff4c027960, recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=apply_cb@entry=0xaaaaac8c9fe8 <wsrep_apply_cb(void, void const, size_t, uint32_t, wsrep_trx_meta_t const)>, meta=...) at /home/buildbot/buildbot/build/galera/src/trx_handle.cpp:317 23 0x0000ffff73239664 in apply_trx_ws (recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=0xaaaaac8c9fe8 <wsrep_apply_cb(void, void const, size_t, uint32_t, wsrep_trx_meta_t const)>, commit_cb=0xaaaaac8ca8d0 <wsrep_commit_cb(void, uint32_t, wsrep_trx_meta_t const, wsrep_bool_t, bool)>, trx=..., meta=...) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:34 24 0x0000ffff7323c0c4 in galera::ReplicatorSMM::apply_trx (this=this@entry=0xaaaac7c7ebc0, recv_ctx=recv_ctx@entry=0xffff4c0009a8, trx=trx@entry=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:454 25 0x0000ffff7323e8b8 in galera::ReplicatorSMM::process_trx (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8, trx=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:1258 26 0x0000ffff73268f68 in galera::GcsActionSource::dispatch (this=this@entry=0xaaaac7c7f348, recv_ctx=recv_ctx@entry=0xffff4c0009a8, act=..., exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:115 27 0x0000ffff73269dd0 in galera::GcsActionSource::process (this=0xaaaac7c7f348, recv_ctx=0xffff4c0009a8, exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:180 28 0x0000ffff7323ef5c in galera::ReplicatorSMM::async_recv (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:362 29 0x0000ffff73217760 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /home/buildbot/buildbot/build/galera/src/wsrep_provider.cpp:244 30 0x0000aaaaac8cb344 in wsrep_replication_process (thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:486 31 0x0000aaaaac8bc3a0 in start_wsrep_THD (arg=arg@entry=0xaaaac7cb3e38) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_mysqld.cc:2173 32 0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869 33 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void)>) at pthread_create.c:335 34 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89 and background victim transaction kill with stack Thread 28 (Thread 0xffff485fa070 (LWP 24870)): 0 0x0000ffff784e530c in __pthread_cond_wait (cond=cond@entry=0xaaaac83e98e0, mutex=mutex@entry=0xaaaac83e98b0) at pthread_cond_wait.c:186 1 0x0000aaaaacb10788 in os_event::wait (this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:158 2 os_event::wait_low (reset_sig_count=2, this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:325 3 os_event_wait_low (event=0xaaaac83e98a0, reset_sig_count=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:507 4 0x0000aaaaacb98480 in sync_array_wait_event (arr=arr@entry=0xaaaac7dbb450, cell=@0xffff485f96e8: 0xaaaac7dbb560) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/sync/sync0arr.cc:471 5 0x0000aaaaacab53c8 in TTASEventMutex<GenericPolicy>::enter (line=19524, filename=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", max_delay=<optimized out>, max_spins=0, this=0xaaaac83cc8c0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:516 6 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0xaaaac83cc8c0, n_spins=<optimized out>, n_delay=<optimized out>, name=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", line=19524) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:637 7 0x0000aaaaacaaa52c in bg_wsrep_kill_trx (void_arg=0xffff4c057430) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:19524 8 0x0000aaaaac79e7f0 in handle_manager (arg=arg@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/sql_manager.cc:112 9 0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869 10 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void*)>) at pthread_create.c:335 11 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89 Fix: ==== Do not use THD::LOCK_thd_data mutex if we already hold lock_sys->mutex because it will cause mutexing order violation. Victim transaction holding conflicting locks can't be committed or rolled back while we hold lock_sys->mutex. Thus, it is safe to do wsrep_thd_is_BF call with no additional mutexes.	2021-01-27 19:46:20 +02:00
Marko Mäkelä	68b2819342	Cleanup: Remove many C-style lock_get_ accessors Let us prefer member functions to the old C-style accessor functions. Also, prefer bitwise AND operations for checking multiple flags.	2021-01-27 18:41:58 +02:00
Marko Mäkelä	cbb0a60c57	Cleanup: Remove lock_get_size()	2021-01-27 18:02:11 +02:00
Marko Mäkelä	121d0f7f53	MDEV-20612: Speed up lock_table_other_has_incompatible() dict_table_t::n_lock_x_or_s: Keep track of LOCK_S or LOCK_X on the table. lock_table_other_has_incompatible(): In the likely case that no transaction is waiting for or holding LOCK_S or LOCK_X on the table, return early: conflicts cannot exist. This is based on the idea of Zhai Weixiang, who reported MySQL Bug #72948. lock_table_has_to_wait_in_queue(), lock_table_dequeue(): Extend the optimization, inspired by mysql/mysql-server@bb7191d6cb by Jakub Łopuszański.	2021-01-27 15:45:39 +02:00
Marko Mäkelä	3329f0ed0c	Cleanup: Remove LOCK_REC (which was mutually exclusive with LOCK_TABLE)	2021-01-27 15:45:39 +02:00
Marko Mäkelä	e71e613353	MDEV-24671: Replace lock_wait_timeout_task with mysql_cond_timedwait() lock_wait(): Replaces lock_wait_suspend_thread(). Wait for the lock to be granted or the transaction to be killed using mysql_cond_timedwait() or mysql_cond_wait(). lock_wait_end(): Replaces que_thr_end_lock_wait() and lock_wait_release_thread_if_suspended(). lock_wait_timeout_task: Remove. The operating system kernel will resume the mysql_cond_timedwait() in lock_wait(). An added benefit is that innodb_lock_wait_timeout no longer has a 'jitter' of 1 second, which was caused by this wake-up task waking up only once per second, and then waking up any threads for which the timeout (which was only measured in seconds) was exceeded. innobase_kill_query(): Set trx->error_state=DB_INTERRUPTED, so that a call trx_is_interrupted(trx) in lock_wait() can be avoided. We will protect things more consistently with lock_sys.wait_mutex, which will be moved below lock_sys.mutex in the latching order. trx_lock_t::cond: Condition variable for !wait_lock, used with lock_sys.wait_mutex. srv_slot_t: Remove. Replaced by trx_lock_t::cond, lock_grant_after_reset(): Merged to to lock_grant(). lock_rec_get_index_name(): Remove. lock_sys_t: Introduce wait_pending, wait_count, wait_time, wait_time_max that are protected by wait_mutex. trx_lock_t::que_state: Remove. que_thr_state_t: Remove QUE_THR_COMMAND_WAIT, QUE_THR_LOCK_WAIT. que_thr_t: Remove is_active, start_running(), stop_no_error(). que_fork_t::n_active_thrs, trx_lock_t::n_active_thrs: Remove.	2021-01-27 15:45:39 +02:00
Marko Mäkelä	7f1ab8f742	Cleanups: que_thr_t::fork_type: Remove. QUE_THR_SUSPENDED, TRX_QUE_COMMITTING: Remove. Cleanup lock_cancel_waiting_and_release()	2021-01-27 15:45:39 +02:00
Marko Mäkelä	898dcf93a8	Cleanup the lock creation LOCK_MAX_N_STEPS_IN_DEADLOCK_CHECK, LOCK_MAX_DEPTH_IN_DEADLOCK_CHECK, LOCK_RELEASE_INTERVAL: Replace with the bare use of the constants. lock_rec_create_low(): Remove LOCK_PAGE_BITMAP_MARGIN altogether. We already have REDZONE_SIZE as a 'safety margin' in AddressSanitizer builds, to catch any out-of-bounds access. lock_prdt_add_to_queue(): Avoid a useless search when enqueueing a waiting lock request. lock_prdt_lock(): Reduce the size of the trx->mutex critical section.	2021-01-27 15:45:38 +02:00
Marko Mäkelä	469da6c34d	Cleanup: Remove trx_get_id_for_print() Any transaction that has requested a lock must have trx->id!=0. trx_print_low(): Distinguish non-locking or inactive transaction objects by displaying the pointer in parentheses. fill_trx_row(): Do not try to map trx->id to a pointer-based value.	2021-01-27 15:45:38 +02:00
Marko Mäkelä	3cef4f8f0f	MDEV-515 Reduce InnoDB undo logging for insert into empty table We implement an idea that was suggested by Michael 'Monty' Widenius in October 2017: When InnoDB is inserting into an empty table or partition, we can write a single undo log record TRX_UNDO_EMPTY, which will cause ROLLBACK to clear the table. For this to work, the insert into an empty table or partition must be covered by an exclusive table lock that will be held until the transaction has been committed or rolled back, or the INSERT operation has been rolled back (and the table is empty again), in lock_table_x_unlock(). Clustered index records that are covered by the TRX_UNDO_EMPTY record will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot be distinguished from what MDEV-12288 leaves behind after purging the history of row-logged operations. Concurrent non-locking reads must be adjusted: If the read view was created before the INSERT into an empty table, then we must continue to imagine that the table is empty, and not try to read any records. If the read view was created after the INSERT was committed, then all records must be visible normally. To implement this, we introduce the field dict_table_t::bulk_trx_id. This special handling only applies to the very first INSERT statement of a transaction for the empty table or partition. If a subsequent statement in the transaction is modifying the initially empty table again, we must enable row-level undo logging, so that we will be able to roll back to the start of the statement in case of an error (such as duplicate key). INSERT IGNORE will continue to use row-level logging and locking, because implementing it would require the ability to roll back the latest row. Since the undo log that we write only allows us to roll back the entire statement, we cannot support INSERT IGNORE. We will introduce a handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage engines that INSERT IGNORE is being executed. In many test cases, we add an extra record to the table, so that during the 'interesting' part of the test, row-level locking and logging will be used. Replicas will continue to use row-level logging and locking until MDEV-24622 has been addressed. Likewise, this optimization will be disabled in Galera cluster until MDEV-24623 enables it. dict_table_t::bulk_trx_id: The latest active or committed transaction that initiated an insert into an empty table or partition. Protected by exclusive table lock and a clustered index leaf page latch. ins_node_t::bulk_insert: Whether bulk insert was initiated. trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert). Unlike earlier, this collection will cover also temporary tables. trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(), is_bulk_insert(), was_bulk_insert(). trx_undo_report_row_operation(): Before accessing any undo log pages, invoke trx->mod_tables.emplace() in order to determine whether undo logging was disabled, or whether this is the first INSERT and we are supposed to write a TRX_UNDO_EMPTY record. row_ins_clust_index_entry_low(): If we are inserting into an empty clustered index leaf page, set the ins_node_t::bulk_insert flag for the subsequent trx_undo_report_row_operation() call. lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock(): Remove the redundant parameter 'flags' that can be checked in the caller. btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation(). trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT), ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that the next statement will not be covered by table-level undo logging. ReadView::changes_visible(trx_id_t) const: New accessor for the case where the trx_id_t is not read from a potentially corrupted index page but directly from the memory. In this case, we can skip a sanity check. row_sel(), row_sel_try_search_shortcut(), row_search_mvcc(): row_sel_try_search_shortcut_for_mysql(), row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id. row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees(). lock_sec_rec_cons_read_sees(): Replaced with lower-level code. btr_root_page_init(): Refactored from btr_create(). dict_index_t::clear(), dict_table_t::clear(): Empty an index or table, for the ROLLBACK of an INSERT operation. ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT into an empty table. This is joint work with Thirunarayanan Balathandayuthapani, who created a working prototype. Thanks to Matthias Leich for extensive testing.	2021-01-25 18:41:27 +02:00
sjaakola	beaea31ab1	MDEV-23851 BF-BF Conflict issue because of UK GAP locks Some DML operations on tables having unique secondary keys cause scanning in the secondary index, for instance to find potential unique key violations in the seconday index. This scanning may involve GAP locking in the index. As this locking happens also when applying replication events in high priority applier threads, there is a probabality for lock conflicts between two wsrep high priority threads. This PR avoids lock conflicts of high priority wsrep threads, which do secondary index scanning e.g. for duplicate key detection. The actual fix is the patch in sql_class.cc:thd_need_ordering_with(), where we allow relaxed GAP locking protocol between wsrep high priority threads. wsrep high priority threads (replication appliers, replayers and TOI processors) are ordered by the replication provider, and they will not need serializability support gained by secondary index GAP locks. PR contains also a mtr test, which exercises a scenario where two replication applier threads have a false positive conflict in GAP of unique secondary index. The conflicting local committing transaction has to replay, and the test verifies also that the replaying phase will not conflict with the latter repllication applier. Commit also contains new test scenario for galera.galera_UK_conflict.test, where replayer starts applying after a slave applier thread, with later seqno, has advanced to commit phase. The applier and replayer have false positive GAP lock conflict on secondary unique index, and replayer should ignore this. This test scenario caused crash with earlier version in this PR, and to fix this, the secondary index uniquenes checking has been relaxed even further. Now innodb trx_t structure has new member: bool wsrep_UK_scan, which is set to true, when high priority thread is performing unique secondary index scanning. The member trx_t::wsrep_UK_scan is defined inside WITH_WSREP directive, to make it possible to prepare a MariaDB build where this additional trx_t member is not present and is not used in the code base. trx->wsrep_UK_scan is set to true only for the duration of function call for: lock_rec_lock() trx->wsrep_UK_scan is used only in lock_rec_has_to_wait() function to relax the need to wait if wsrep_UK_scan is set and conflicting transaction is also high priority. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-01-18 08:09:06 +02:00
Marko Mäkelä	666565c7f0	Merge 10.5 into 10.6	2021-01-11 17:32:08 +02:00
Marko Mäkelä	8de233af81	Merge 10.4 into 10.5	2021-01-11 16:29:51 +02:00
Marko Mäkelä	fd5e103aa4	Merge 10.3 into 10.4	2021-01-11 10:35:06 +02:00
Marko Mäkelä	5a1a714187	Merge 10.2 into 10.3 (except MDEV-17556) The fix of MDEV-17556 (commit `e25623e78a` and commit `61a362c949`) has been omitted due to conflicts and will have to be applied separately later.	2021-01-11 09:41:54 +02:00
Jan Lindström	775fccea0c	MDEV-23536 : Race condition between KILL and transaction commit A race condition may occur between the execution of transaction commit, and an execution of a KILL statement that would attempt to abort that transaction. MDEV-17092 worked around this race condition by modifying InnoDB code. After that issue was closed, Sergey Vojtovich pointed out that this race condition would better be fixed above the storage engine layer: If you look carefully into the above, you can conclude that thd->free_connection() can be called concurrently with KILL/thd->awake(). Which is the bug. And it is partially fixed in THD::~THD(), that is destructor waits for KILL completion: Fix: Add necessary mutex operations to THD::free_connection() and move WSREP specific code also there. This ensures that no one is using THD while we do free_connection(). These mutexes will also ensures that there can't be concurrent KILL/THD::awake(). innobase_kill_query We can now remove usage of trx_sys_mutex introduced on MDEV-17092. trx_t::free() Poison trx->state and trx->mysql_thd This patch is validated with an RQG run similar to the one that reproduced MDEV-17092.	2021-01-08 17:11:54 +02:00
Marko Mäkelä	8a4ca33938	Cleanup: Declare trx_weight_ge() inline	2021-01-05 14:18:10 +02:00
Marko Mäkelä	bd52f1a2dd	Cleanup: Remove lock_number_of_rows_locked() Let us access trx->lock.n_rec_locks directly.	2021-01-04 15:30:34 +02:00
Marko Mäkelä	a64cb6d265	Merge 10.3 into 10.4	2020-12-28 13:46:22 +02:00
Marko Mäkelä	7f037b8c9f	Merge 10.2 into 10.3	2020-12-28 13:30:20 +02:00
sjaakola	8e3e87d2fc	MDEV-23851 MDEV-24229 BF-BF conflict issues Issues MDEV-23851 and MDEV-24229 are probably duplicates and are caused by the new self-asserting function lock0lock.cc:wsrep_assert_no_bf_bf_wait(). The criteria for asserting is too strict and does not take in consideration scenarios of "false positive" lock conflicts, which are resolved by replaying the local transaction. As a fix, this PR is relaxing the assert criteria by two conditions, which skip assert if high priority transactions are locking in correct order or if conflicting high priority lock holder is aborting and has just not yet released the lock. Alternative fix would be to remove wsrep_assert_no_bf_bf_wait() altogether, or remove the assert in this function and let it only print warnings in error log. But in my high conflict rate multi-master test scenario, this relaxed asserting appears to be safe. This PR also removes two wsrep_report_bf_lock_wait() calls in innodb lock manager, which cause mutex access assert in debug builds. Foreign key appending missed handling of data types of float and double in INSERT execution. This is not directly related to the actual issue here but is fixed in this PR nevertheless. Missing these foreign keys values in certification could cause problems in some multi-master load scenarios. Finally, some problem reports suggest that some of the issues reported in MDEV-23851 might relate to false positive lock conflicts over unique secondary index gaps. There is separate work for relaxing UK index gap locking of replication appliers, and separate PR will be submitted for it, with a related mtr test as well.	2020-12-28 09:06:16 +02:00
Marko Mäkelä	cf2480dd77	MDEV-21452: Retain the watchdog only on dict_sys.mutex, for performance Most hangs seem to involve dict_sys.mutex. While holding lock_sys.mutex we rarely acquire any buffer pool page latches, which are a frequent source of potential hangs.	2020-12-15 17:56:18 +02:00
Marko Mäkelä	ff5d306e29	MDEV-21452: Replace ib_mutex_t with mysql_mutex_t SHOW ENGINE INNODB MUTEX functionality is completely removed, as are the InnoDB latching order checks. We will enforce innodb_fatal_semaphore_wait_threshold only for dict_sys.mutex and lock_sys.mutex. dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex. lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex. FIXME: srv_sys should be removed altogether; it is duplicating tpool functionality. fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must not hold fil_system.mutex. fil_close_all_files(): To prevent SAFE_MUTEX warnings for fil_space_destroy_crypt_data(), we must not hold fil_system.mutex while invoking fil_space_free_low() on a detached tablespace.	2020-12-15 17:56:18 +02:00
Marko Mäkelä	38fd7b7d91	MDEV-21452: Replace all direct use of os_event_t Let us replace os_event_t with mysql_cond_t, and replace the necessary ib_mutex_t with mysql_mutex_t so that they can be used with condition variables. Also, let us replace polling (os_thread_sleep() or timed waits) with plain mysql_cond_wait() wherever possible. Furthermore, we will use the lightweight srw_mutex for trx_t::mutex, to hopefully reduce contention on lock_sys.mutex. FIXME: Add test coverage of mariabackup --backup --kill-long-queries-timeout	2020-12-15 17:56:17 +02:00

1 2 3 4 5 ...

698 commits