mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Jan Lindström	d217a925b2	MDEV-24923 : Port selected Galera conflict resolution changes from 10.6 Add condition on trx->state == TRX_STATE_COMMITTED_IN_MEMORY in order to avoid unnecessary work. If a transaction has already been committed or rolled back, it will release its locks in lock_release() and let the waiting thread(s) continue execution. Let BF wait on lock_rec_has_to_wait and if necessary other BF is replayed. wsrep_trx_order_before If BF is not even replicated yet then they are ordered correctly. bg_wsrep_kill_trx Make sure victim_trx is found and check also its state. If state is TRX_STATE_COMMITTED_IN_MEMORY transaction is already committed or rolled back and will release it locks soon. wsrep_assert_no_bf_bf_wait Transaction requesting new record lock should be TRX_STATE_ACTIVE Conflicting transaction can be in states TRX_STATE_ACTIVE, TRX_STATE_COMMITTED_IN_MEMORY or in TRX_STATE_PREPARED. If conflicting transaction is already committed in memory or prepared we should wait. When transaction is committed in memory we held trx mutex, but not lock_sys->mutex. Therefore, we could end here before transaction has time to do lock_release() that is protected with lock_sys->mutex. lock_rec_has_to_wait We very well can let bf to wait normally as other BF will be replayed in case of conflict. For debug builds we will do additional sanity checks to catch unsupported bf wait if any. wsrep_kill_victim Check is victim already in TRX_STATE_COMMITTED_IN_MEMORY state and if it is we can return. lock_rec_dequeue_from_page lock_rec_unlock Remove unnecessary wsrep_assert_no_bf_bf_wait function calls. We can very well let BF wait here.	2021-03-30 08:58:10 +03:00
Jan Lindström	75546dfbb1	MDEV-24704 : Galera test failure on galera.galera_nopk_unicode Analysis: ========= Reason for test failure was a mutex deadlock between DeadlockChecker with stack Thread 6 (Thread 0xffff70066070 (LWP 24667)): 0 0x0000ffff784e850c in __lll_lock_wait (futex=futex@entry=0xffff04002258, private=0) at lowlevellock.c:46 1 0x0000ffff784e19f0 in __GI___pthread_mutex_lock (mutex=mutex@entry=0xffff04002258) at pthread_mutex_lock.c:135 2 0x0000aaaaac8cd014 in inline_mysql_mutex_lock (src_file=0xaaaaacea0f28 "/home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc", src_line=762, that=0xffff04002258) at /home/buildbot/buildbot/build/mariadb-10.2.37/include/mysql/psi/mysql_thread.h:675 3 wsrep_thd_is_BF (thd=0xffff040009a8, sync=sync@entry=1 '\001') at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:762 4 0x0000aaaaacadce68 in lock_rec_has_to_wait (for_locking=false, lock_is_on_supremum=<optimized out>, lock2=0xffff628952d0, type_mode=291, trx=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:826 5 lock_has_to_wait (lock1=<optimized out>, lock2=0xffff628952d0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:873 6 0x0000aaaaacadd0b0 in DeadlockChecker::search (this=this@entry=0xffff70061fe8) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7142 7 0x0000aaaaacae2dd8 in DeadlockChecker::check_and_resolve (lock=lock@entry=0xffff62894120, trx=trx@entry=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7286 8 0x0000aaaaacae3070 in lock_rec_enqueue_waiting (c_lock=0xffff628952d0, type_mode=type_mode@entry=3, block=block@entry=0xffff62076c40, heap_no=heap_no@entry=2, index=index@entry=0xffff4c076f28, thr=thr@entry=0xffff4c078810, prdt=prdt@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:1796 9 0x0000aaaaacae3900 in lock_rec_lock_slow (thr=0xffff4c078810, index=0xffff4c076f28, heap_no=2, block=0xffff62076c40, mode=3, impl=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2106 10 lock_rec_lock (impl=false, mode=3, block=0xffff62076c40, heap_no=2, index=0xffff4c076f28, thr=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2168 11 0x0000aaaaacae3ee8 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0xffff62076c40, rec=rec@entry=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", index=index@entry=0xffff4c076f28, offsets=0xffff4c080690, offsets@entry=0xffff70062a30, mode=LOCK_X, mode@entry=1653162096, gap_mode=0, gap_mode@entry=281470749427104, thr=thr@entry=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:6082 12 0x0000aaaaacb684c4 in sel_set_rec_lock (pcur=0xaaaac841c270, pcur@entry=0xffff4c077d58, rec=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", rec@entry=0x28 <error: Cannot access memory at address 0x28>, index=index@entry=0xffff4c076f28, offsets=0xffff70062a30, mode=281472334905456, type=281470749427104, thr=0xffff4c078810, thr@entry=0x9f, mtr=0x0, mtr@entry=0xffff70063928) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:1270 13 0x0000aaaaacb6bb64 in row_search_mvcc (buf=buf@entry=0xffff4c080690 "\376\026", mode=mode@entry=PAGE_CUR_GE, prebuilt=0xffff4c077b98, match_mode=match_mode@entry=1, direction=direction@entry=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:5181 14 0x0000aaaaacaae568 in ha_innobase::index_read (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key_ptr=<optimized out>, key_len=768, find_flag=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:9393 15 0x0000aaaaac9201cc in handler::ha_index_read_map (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key=0xffff4c07ccf8 "", keypart_map=keypart_map@entry=18446744073709551615, find_flag=find_flag@entry=HA_READ_KEY_EXACT) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/handler.cc:2718 16 0x0000aaaaac9f36b0 in Rows_log_event::find_row (this=this@entry=0xffff4c030098, rgi=rgi@entry=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13461 17 0x0000aaaaac9f3e44 in Update_rows_log_event::do_exec_row (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13936 18 0x0000aaaaac9e7ee8 in Rows_log_event::do_apply_event (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:11101 19 0x0000aaaaac8ca4e8 in Log_event::apply_event (rgi=0xffff4c01b510, this=0xffff4c030098) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.h:1454 20 wsrep_apply_events (buf_len=0, events_buf=0x1, thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:164 21 wsrep_apply_cb (ctx=0xffff4c0009a8, buf=0x1, buf_len=18446743528248705000, flags=<optimized out>, meta=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:267 22 0x0000ffff7322d29c in galera::TrxHandle::apply (this=this@entry=0xffff4c027960, recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=apply_cb@entry=0xaaaaac8c9fe8 <wsrep_apply_cb(void, void const, size_t, uint32_t, wsrep_trx_meta_t const)>, meta=...) at /home/buildbot/buildbot/build/galera/src/trx_handle.cpp:317 23 0x0000ffff73239664 in apply_trx_ws (recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=0xaaaaac8c9fe8 <wsrep_apply_cb(void, void const, size_t, uint32_t, wsrep_trx_meta_t const)>, commit_cb=0xaaaaac8ca8d0 <wsrep_commit_cb(void, uint32_t, wsrep_trx_meta_t const, wsrep_bool_t, bool)>, trx=..., meta=...) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:34 24 0x0000ffff7323c0c4 in galera::ReplicatorSMM::apply_trx (this=this@entry=0xaaaac7c7ebc0, recv_ctx=recv_ctx@entry=0xffff4c0009a8, trx=trx@entry=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:454 25 0x0000ffff7323e8b8 in galera::ReplicatorSMM::process_trx (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8, trx=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:1258 26 0x0000ffff73268f68 in galera::GcsActionSource::dispatch (this=this@entry=0xaaaac7c7f348, recv_ctx=recv_ctx@entry=0xffff4c0009a8, act=..., exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:115 27 0x0000ffff73269dd0 in galera::GcsActionSource::process (this=0xaaaac7c7f348, recv_ctx=0xffff4c0009a8, exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:180 28 0x0000ffff7323ef5c in galera::ReplicatorSMM::async_recv (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:362 29 0x0000ffff73217760 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /home/buildbot/buildbot/build/galera/src/wsrep_provider.cpp:244 30 0x0000aaaaac8cb344 in wsrep_replication_process (thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:486 31 0x0000aaaaac8bc3a0 in start_wsrep_THD (arg=arg@entry=0xaaaac7cb3e38) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_mysqld.cc:2173 32 0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869 33 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void)>) at pthread_create.c:335 34 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89 and background victim transaction kill with stack Thread 28 (Thread 0xffff485fa070 (LWP 24870)): 0 0x0000ffff784e530c in __pthread_cond_wait (cond=cond@entry=0xaaaac83e98e0, mutex=mutex@entry=0xaaaac83e98b0) at pthread_cond_wait.c:186 1 0x0000aaaaacb10788 in os_event::wait (this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:158 2 os_event::wait_low (reset_sig_count=2, this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:325 3 os_event_wait_low (event=0xaaaac83e98a0, reset_sig_count=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:507 4 0x0000aaaaacb98480 in sync_array_wait_event (arr=arr@entry=0xaaaac7dbb450, cell=@0xffff485f96e8: 0xaaaac7dbb560) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/sync/sync0arr.cc:471 5 0x0000aaaaacab53c8 in TTASEventMutex<GenericPolicy>::enter (line=19524, filename=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", max_delay=<optimized out>, max_spins=0, this=0xaaaac83cc8c0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:516 6 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0xaaaac83cc8c0, n_spins=<optimized out>, n_delay=<optimized out>, name=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", line=19524) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:637 7 0x0000aaaaacaaa52c in bg_wsrep_kill_trx (void_arg=0xffff4c057430) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:19524 8 0x0000aaaaac79e7f0 in handle_manager (arg=arg@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/sql_manager.cc:112 9 0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869 10 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void*)>) at pthread_create.c:335 11 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89 Fix: ==== Do not use THD::LOCK_thd_data mutex if we already hold lock_sys->mutex because it will cause mutexing order violation. Victim transaction holding conflicting locks can't be committed or rolled back while we hold lock_sys->mutex. Thus, it is safe to do wsrep_thd_is_BF call with no additional mutexes.	2021-01-27 19:46:20 +02:00
sjaakola	beaea31ab1	MDEV-23851 BF-BF Conflict issue because of UK GAP locks Some DML operations on tables having unique secondary keys cause scanning in the secondary index, for instance to find potential unique key violations in the seconday index. This scanning may involve GAP locking in the index. As this locking happens also when applying replication events in high priority applier threads, there is a probabality for lock conflicts between two wsrep high priority threads. This PR avoids lock conflicts of high priority wsrep threads, which do secondary index scanning e.g. for duplicate key detection. The actual fix is the patch in sql_class.cc:thd_need_ordering_with(), where we allow relaxed GAP locking protocol between wsrep high priority threads. wsrep high priority threads (replication appliers, replayers and TOI processors) are ordered by the replication provider, and they will not need serializability support gained by secondary index GAP locks. PR contains also a mtr test, which exercises a scenario where two replication applier threads have a false positive conflict in GAP of unique secondary index. The conflicting local committing transaction has to replay, and the test verifies also that the replaying phase will not conflict with the latter repllication applier. Commit also contains new test scenario for galera.galera_UK_conflict.test, where replayer starts applying after a slave applier thread, with later seqno, has advanced to commit phase. The applier and replayer have false positive GAP lock conflict on secondary unique index, and replayer should ignore this. This test scenario caused crash with earlier version in this PR, and to fix this, the secondary index uniquenes checking has been relaxed even further. Now innodb trx_t structure has new member: bool wsrep_UK_scan, which is set to true, when high priority thread is performing unique secondary index scanning. The member trx_t::wsrep_UK_scan is defined inside WITH_WSREP directive, to make it possible to prepare a MariaDB build where this additional trx_t member is not present and is not used in the code base. trx->wsrep_UK_scan is set to true only for the duration of function call for: lock_rec_lock() trx->wsrep_UK_scan is used only in lock_rec_has_to_wait() function to relax the need to wait if wsrep_UK_scan is set and conflicting transaction is also high priority. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-01-18 08:09:06 +02:00
Jan Lindström	775fccea0c	MDEV-23536 : Race condition between KILL and transaction commit A race condition may occur between the execution of transaction commit, and an execution of a KILL statement that would attempt to abort that transaction. MDEV-17092 worked around this race condition by modifying InnoDB code. After that issue was closed, Sergey Vojtovich pointed out that this race condition would better be fixed above the storage engine layer: If you look carefully into the above, you can conclude that thd->free_connection() can be called concurrently with KILL/thd->awake(). Which is the bug. And it is partially fixed in THD::~THD(), that is destructor waits for KILL completion: Fix: Add necessary mutex operations to THD::free_connection() and move WSREP specific code also there. This ensures that no one is using THD while we do free_connection(). These mutexes will also ensures that there can't be concurrent KILL/THD::awake(). innobase_kill_query We can now remove usage of trx_sys_mutex introduced on MDEV-17092. trx_t::free() Poison trx->state and trx->mysql_thd This patch is validated with an RQG run similar to the one that reproduced MDEV-17092.	2021-01-08 17:11:54 +02:00
sjaakola	8e3e87d2fc	MDEV-23851 MDEV-24229 BF-BF conflict issues Issues MDEV-23851 and MDEV-24229 are probably duplicates and are caused by the new self-asserting function lock0lock.cc:wsrep_assert_no_bf_bf_wait(). The criteria for asserting is too strict and does not take in consideration scenarios of "false positive" lock conflicts, which are resolved by replaying the local transaction. As a fix, this PR is relaxing the assert criteria by two conditions, which skip assert if high priority transactions are locking in correct order or if conflicting high priority lock holder is aborting and has just not yet released the lock. Alternative fix would be to remove wsrep_assert_no_bf_bf_wait() altogether, or remove the assert in this function and let it only print warnings in error log. But in my high conflict rate multi-master test scenario, this relaxed asserting appears to be safe. This PR also removes two wsrep_report_bf_lock_wait() calls in innodb lock manager, which cause mutex access assert in debug builds. Foreign key appending missed handling of data types of float and double in INSERT execution. This is not directly related to the actual issue here but is fixed in this PR nevertheless. Missing these foreign keys values in certification could cause problems in some multi-master load scenarios. Finally, some problem reports suggest that some of the issues reported in MDEV-23851 might relate to false positive lock conflicts over unique secondary index gaps. There is separate work for relaxing UK index gap locking of replication appliers, and separate PR will be submitted for it, with a related mtr test as well.	2020-12-28 09:06:16 +02:00
Marko Mäkelä	1595189250	MDEV-23897 SIGSEGV on commit with innodb_lock_schedule_algorithm=VATS This regression for debug builds was introduced by MDEV-23101 (commit `224c950462`). Due to MDEV-16664, the parameter innodb_lock_schedule_algorithm=VATS is not enabled by default. The purpose of the added assertions was to enforce the invariant that Galera replication cannot be enabled together with VATS due to MDEV-12837. However, upon closer inspection, it is obvious that the variable 'lock' may be assigned to the null pointer if no match is found in the previous->hash list. lock_grant_and_move_on_page(), lock_grant_and_move_on_rec(): Assert !lock->trx->is_wsrep() only after ensuring that lock is not a null pointer.	2020-10-06 22:35:43 +03:00
Jan Lindström	224c950462	MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue and move condition to correct callers. Add a function to report BF lock waits and assert if incorrect BF-BF lock wait happens. wsrep_report_bf_lock_wait Add a new function to report BF lock wait. wsrep_assert_no_bf_bf_wait Add a new function to check do we have a BF-BF wait and if we have report this case and assert as it is a bug. lock_rec_has_to_wait Use new wsrep_assert_bf_wait to check BF-BF wait. lock_rec_create_low lock_table_create Use new function to report BF lock waits. lock_rec_insert_by_trx_age lock_grant_and_move_on_page lock_grant_and_move_on_rec Assert that trx is not Galera as VATS is not compatible with Galera. lock_rec_add_to_queue If there is conflicting lock in a queue make sure that transaction is BF. lock_rec_has_to_wait_in_queue Remove incorrect BF handling. If there is conflicting locks in a queue all transactions must wait. lock_rec_dequeue_from_page lock_rec_unlock If there is conflicting lock make sure it is not BF-BF case. lock_rec_queue_validate Add Galera record locking rules comment and use new function to report BF lock waits. All attempts to reproduce the original assertion have been failed. Therefore, there is no test case on this commit.	2020-09-10 13:18:12 +03:00
Marko Mäkelä	91caf130b7	MDEV-23101 fixup: Remove redundant code lock_rec_has_to_wait_in_queue(): Remove an obviously redundant assertion that was added in commit `a8ec45863b` and also enclose a Galera-specific condition in #ifdef WITH_WSREP.	2020-08-04 09:56:09 +03:00
Jan Lindström	a8ec45863b	MDEV-23101: SIGSEGV in lock_rec_unlock() when Galera is enabled lock_rec_has_to_wait wsrep_kill_victim lock_rec_create_low lock_rec_add_to_queue DeadlockChecker::select_victim() THD can't change from normal transaction to BF (brute force) transaction here, thus there is no need to syncronize access in wsrep_thd_is_BF function. lock_rec_has_to_wait_in_queue Add condition that lock is not NULL and add assertions if we are in strong state.	2020-08-03 15:15:40 +03:00
Marko Mäkelä	eba2d10ac5	MDEV-22721 Remove bloat caused by InnoDB logger class Introduce a new ATTRIBUTE_NOINLINE to ib::logger member functions, and add UNIV_UNLIKELY hints to callers. Also, remove some crash reporting output. If needed, the information will be available using debugging tools. Furthermore, remove some fts_enable_diag_print output that included indexed words in raw form. The code seemed to assume that words are NUL-terminated byte strings. It is not clear whether a NUL terminator is always guaranteed to be present. Also, UCS2 or UTF-16 strings would typically contain many NUL bytes.	2020-06-04 10:24:10 +03:00
Marko Mäkelä	c93f8aca65	MDEV-22618 Assertion !dict_index_is_online_ddl ... in lock_table_locks_lookup lock_table_locks_lookup(): Relax the assertion. Locks must not exist while online secondary index creation is in progress. However, if CREATE UNIQUE INDEX has not been committed yet, but the index creation has been completed, concurrent DML transactions may acquire record locks on the index. Furthermore, such concurrent DML may cause duplicate key violation, causing the DDL operation to be rolled back. After that, the online_status may be ONLINE_INDEX_ABORTED or ONLINE_INDEX_ABORTED_DROPPED. So, the debug assertion may only forbid the state ONLINE_INDEX_CREATION.	2020-05-19 10:42:56 +03:00
Daniel Black	ba2061da52	MDEV-21595: innodb offset_t rename to rec_offs thanks to: perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)	2020-04-29 12:02:47 +03:00
Marko Mäkelä	c06845d6f0	Merge 10.1 into 10.2	2020-04-27 13:28:13 +03:00
Marko Mäkelä	edbdfc2f99	MDEV-7962 wsrep_on() takes 0.14% in OLTP RO The function wsrep_on() was being called rather frequently in InnoDB and XtraDB. Let us cache it in trx_t and invoke trx_t::is_wsrep() instead. innobase_trx_init(): Cache trx->wsrep = wsrep_on(thd). ha_innobase::write_row(): Replace many repeated calls to current_thd, and test the cheapest condition first.	2020-04-27 11:18:11 +03:00
Eugene Kosov	f0aa073f2b	MDEV-20950 Reduce size of record offsets offset_t: this is a type which represents one record offset. It's unsigned short int. a lot of functions: replace ulint with offset_t btr_pcur_restore_position_func(), page_validate(), row_ins_scan_sec_index_for_duplicate(), row_upd_clust_rec_by_insert_inherit_func(), row_vers_impl_x_locked_low(), trx_undo_prev_version_build(): allocate record offsets on the stack instead of waiting for rec_get_offsets() to allocate it from mem_heap_t. So, reducing memory allocations. RECORD_OFFSET, INDEX_OFFSET: now it's less convenient to store pointers in offset_t* array. One pointer occupies now several offset_t. And those constant are start indexes into array to places where to store pointer values REC_OFFS_HEADER_SIZE: adjusted for the new reality REC_OFFS_NORMAL_SIZE: increase size from 100 to 300 which means less heap allocations. And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which is smaller than previous 800 bytes. REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality rem0rec.h, rem0rec.ic, rem0rec.cc: various arguments, return values and local variables types were changed to fix numerous integer conversions issues. enum field_type_t: offset types concept was introduces which replaces old offset flags stuff. Like in earlier version, 2 upper bits are used to store offset type. And this enum represents those types. REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed get_type(), set_type(), get_value(), combine(): these are convenience functions to work with offsets and it's types rec_offs_base()[0]: still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL rec_offs_base()[i]: these have type offset_t now. Two upper bits contains type.	2019-12-13 00:26:50 +07:00
Thirunarayanan Balathandayuthapani	fb3e3a6a3d	MDEV-20483 trx_lock_t::table_locks is not a subset of trx_lock_t::trx_locks Problem: ======= Transaction left with nonempty table locks list. This leads to assumption that table_locks is not subset of trx_locks. Problem is that lock_wait_timeout_thread() doesn't remove the table lock from table_locks for transaction. Solution: ======== In lock_wait_timeout_thread(), remove the lock from table vector of transaction.	2019-09-17 19:54:55 +05:30
Marko Mäkelä	dae1b3b04c	MDEV-15326: Backport trx_t::is_referenced() Backport the applicable part of Sergey Vojtovich's commit `0ca2ea1a65` from MariaDB Server 10.3. trx reference counter was updated under mutex and read without any protection. This is both slow and unsafe. Use atomic operations for reference counter accesses.	2019-09-04 09:42:38 +03:00
Marko Mäkelä	b07beff894	MDEV-15326: InnoDB: Failing assertion: !other_lock MySQL 5.7.9 (and MariaDB 10.2.2) introduced a race condition between InnoDB transaction commit and the conversion of implicit locks into explicit ones. The assertion failure can be triggered with a test that runs 3 concurrent single-statement transactions in a loop on a simple table: CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB; thread1: INSERT INTO t SET a=1; thread2: DELETE FROM t; thread3: SELECT * FROM t FOR UPDATE; -- or DELETE FROM t; The failure scenarios are like the following: (1) The INSERT statement is being committed, waiting for lock_sys->mutex. (2) At the time of the failure, both the DELETE and SELECT transactions are active but have not logged any changes yet. (3) The transaction where the !other_lock assertion fails started lock_rec_convert_impl_to_expl(). (4) After this point, the commit of the INSERT removed the transaction from trx_sys->rw_trx_set, in trx_erase_lists(). (5) The other transaction consulted trx_sys->rw_trx_set and determined that there is no implicit lock. Hence, it grabbed the lock. (6) The !other_lock assertion fails in lock_rec_add_to_queue() for the lock_rec_convert_impl_to_expl(), because the lock was 'stolen'. This assertion failure looks genuine, because the INSERT transaction is still active (trx->state=TRX_STATE_ACTIVE). The problematic step (4) was introduced in mysql/mysql-server@e27e0e0bb7 which fixed something related to MVCC (covered by the test innodb.innodb-read-view). Basically, it reintroduced an error that had been mentioned in an earlier commit mysql/mysql-server@a17be6963f: "The active transaction was removed from trx_sys->rw_trx_set prematurely." Our fix goes along the following lines: (a) Implicit locks will released by assigning trx->state=TRX_STATE_COMMITTED_IN_MEMORY as the first step. This transition will no longer be protected by lock_sys_t::mutex, only by trx->mutex. This idea is by Sergey Vojtovich. (b) We detach the transaction from trx_sys before starting to release explicit locks. (c) All callers of trx_rw_is_active() and trx_rw_is_active_low() must recheck trx->state after acquiring trx->mutex. (d) Before releasing any explicit locks, we will ensure that any activity by other threads to convert implicit locks into explicit will have ceased, by checking !trx_is_referenced(trx). There was a glitch in this check when it was part of lock_trx_release_locks(); at the end we would release trx->mutex and acquire lock_sys->mutex and trx->mutex, and fail to recheck (trx_is_referenced() is protected by trx_t::mutex). (e) Explicit locks can be released in batches (LOCK_RELEASE_INTERVAL=1000) just like we did before. trx_t::state: Document that the transition to COMMITTED is only protected by trx_t::mutex, no longer by lock_sys_t::mutex. trx_rw_is_active_low(), trx_rw_is_active(): Document that the transaction state should be rechecked after acquiring trx_t::mutex. trx_t::commit_state(): New function to change a transaction to committed state, to release implicit locks. trx_t::release_locks(): New function to release the explicit locks after commit_state(). lock_trx_release_locks(): Move much of the logic to the caller (which must invoke trx_t::commit_state() and trx_t::release_locks() as needed), and assert that the transaction will have locks. trx_get_trx_by_xid(): Make the parameter a pointer to const. lock_rec_other_trx_holds_expl(): Recheck trx->state after acquiring trx->mutex, and avoid a redundant lookup of the transaction. lock_rec_queue_validate(): Recheck impl_trx->state while holding impl_trx->mutex. row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): Document that the transaction state must be rechecked after trx_mutex_enter(). trx_free_prepared(): Adjust for the changes to lock_trx_release_locks().	2019-09-04 09:42:38 +03:00
Marko Mäkelä	7c79c12784	MDEV-15326 preparation: Remove trx_sys_t::n_prepared_trx This is a backport of `900b07908b` from MariaDB Server 10.3.	2019-09-04 09:42:38 +03:00
Marko Mäkelä	2842ae03bc	Remove a bogus comment Changes of PAGE_MAX_TRX_ID must be redo-logged for correctness. That was fixed in the InnoDB Plugin for MySQL 5.1 already.	2019-08-28 15:27:35 +03:00
Marko Mäkelä	25af2a183b	MDEV-15326/MDEV-16136 dead code removal Revert part of `fa2a74e08d`. trx_reference(): Remove, and merge the relevant part to the only caller trx_rw_is_active(). If the statements trx = NULL; were ever executed, the function would have dereferenced a NULL pointer and crashed in trx_mutex_exit(trx). Hence, those statements must have been unreachable, and they can be replaced with debug assertions. trx_rw_is_active(): Avoid unnecessary acquisition and release of trx->mutex when do_ref_count=false. lock_trx_release_locks(): Do not reset trx->id=0. Had the statement been necessary, we would have experienced crashes in trx_reference().	2019-08-27 16:38:57 +03:00
Marko Mäkelä	b6ac67389d	Merge 10.1 into 10.2	2019-07-25 12:14:27 +03:00
Marko Mäkelä	0c7c61019d	Remove the wrappers ut_time(), ut_difftime(), ib_time_t	2019-07-24 21:59:26 +03:00
Marko Mäkelä	9e5df96751	Reduce the amount of time(NULL) calls for lock processing lock_t::requested_time: Document what the field is used for. lock_t::wait_time: Document that the field is only used for diagnostics and may be garbage if the system time is being adjusted. srv_slot_t::suspend_time: Document that this is duplicating trx_lock_t::wait_started. lock_table_print(), lock_rec_print(): Declare in static scope. Add a parameter for the current time. lock_deadlock_check_and_resolve(), lock_deadlock_lock_print(), lock_deadlock_joining_trx_print(): Add a parameter for the current time.	2019-07-24 21:59:26 +03:00
Marko Mäkelä	10727b6953	Always initialize trx_t::start_time_micro This affects the function has_higher_priority() for internal or recovered transactions.	2019-07-24 21:59:26 +03:00
Marko Mäkelä	10ee1b95b8	Remove ut_usectime(), ut_gettimeofday() Replace ut_usectime() with my_interval_timer(), which is equivalent, but monotonically counting nanoseconds instead of counting the microseconds of real time. os_event_wait_time_low(): Use my_hrtime() instead of ut_usectime(). FIXME: Set a clock attribute on the condition variable that allows a monotonic clock to be chosen as the time base, so that the wait is immune to adjustments of the system clock.	2019-07-24 21:59:26 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	c0ac0b8860	Update FSF address	2019-05-11 19:25:02 +03:00
Marko Mäkelä	810f014ca7	MDEV-18429: Simpler implementation This reverts commit `61f370a3c9` and implements a simpler fix that is straightforward to merge to 10.3. lock_print_info: Renamed from PrintNotStarted. Dump the entire contents of trx_sys->mysql_trx_list. lock_print_info_rw_recovered: Like lock_print_info, but dump only recovered transactions in trx_sys->rw_trx_list. lock_print_info_all_transactions(): Dump both trx_sys->mysql_trx_list and trx_sys->rw_trx_list. TrxLockIterator, TrxListIterator, lock_rec_fetch_page(): Remove. This is a partial backport of the 10.3 commit `a447980ff3` which removed the race-condition-prone ability of the InnoDB monitor to read relevant pages into the buffer pool for some record locks.	2019-04-29 17:49:55 +03:00
Marko Mäkelä	5fb4c0a8a8	Clean up ut_list ut_list_validate(), ut_list_map(): Add variants with const Functor& so that these functions can be called with an rvalue. Remove wrapper macros, and add #ifdef UNIV_DEBUG around debug-only code.	2019-04-29 15:11:06 +03:00
Eugene Kosov	61f370a3c9	MDEV-18429 Consistent non-locking reads do not appear in TRANSACTIONS section of SHOW ENGINE INNODB STATUS lock_print_info_all_transactions(): print transactions which are started but not in RW or RO lists. print_not_started(): Replaces PrintNotStarted. Collect the skipped transactions.	2019-04-29 15:11:06 +03:00
Marko Mäkelä	793bd3ee13	lock_rec_convert_impl_to_expl_for_trx(): Remove unused parameter	2019-04-26 17:40:20 +03:00
Marko Mäkelä	bc145193c1	Merge 10.1 into 10.2	2019-04-25 09:04:09 +03:00
Marko Mäkelä	bfb0726fc2	Merge 5.5 into 10.1	2019-04-24 12:03:11 +03:00
Thirunarayanan Balathandayuthapani	d5da8ae04d	MDEV-15772 Potential list overrun during XA recovery InnoDB could return the same list again and again if the buffer passed to trx_recover_for_mysql() is smaller than the number of transactions that InnoDB recovered in XA PREPARE state. We introduce the transaction state TRX_PREPARED_RECOVERED, which is like TRX_PREPARED, but will be set during trx_recover_for_mysql() so that each transaction will only be returned once. Because init_server_components() is invoking ha_recover() twice, we must reset the state of the transactions back to TRX_PREPARED after returning the complete list, so that repeated traversals will see the complete list again, instead of seeing an empty list. Without this tweak, the test main.tc_heuristic_recover would hang in MariaDB 10.1.	2019-04-24 11:46:14 +03:00
Marko Mäkelä	d0116e10a5	Revert MDEV-18464 and MDEV-12009 This reverts commit `21b2fada7a` and commit `81d71ee6b2`. The MDEV-18464 change introduces a few data race issues. Contrary to the documentation, the field trx_t::victim is not always being protected by lock_sys_t::mutex and trx_t::mutex. Most importantly, it seems that KILL QUERY could wrongly avoid acquiring both mutexes when invoking lock_trx_handle_wait_low(), in case another thread had already set trx->victim=true. We also revert MDEV-12009, because it should depend on the MDEV-18464 fix being present.	2019-03-28 12:39:50 +02:00
Jan Lindström	21b2fada7a	MDEV-18464: Port kill_one_trx fixes from 10.4 to 10.1 Pushed the decision for innodb transaction and system locking down to lock0lock.cc level. With this, we can avoid releasing these mutexes for executions where these mutexes were acquired upfront. This patch will also fix BF aborting of native threads, e.g. threads which have declared wsrep_on=OFF. Earlier, we have used, for innodb trx locks, was_chosen_as_deadlock_victim flag, for marking inodb transactions, which are victims for wsrep BF abort. With native threads (wsrep_on==OFF), re-using was_chosen_as_deadlock_victim flag may lead to inteference with real deadlock, and to deal with this, the patch has added new flag for marking wsrep BF aborts only: victim=true Similar way if replication decides to abort one of the threads we mark victim by: victim=true innobase_kill_query Remove lock sys and trx mutex handling. wsrep_innobase_kill_one_trx Mark victim trx with victim=true trx0trx.h Remove trx_abort_t type and abort type variable from trx struct. Add victim variable to trx. wsrep_kill_victim Remove abort_type lock_report_waiters_to_mysql Take also trx mutex and mark trx as a victim for replication abort. lock_trx_handle_wait_low New low level function to check whether the transaction has already been rolled back because it was selected as a deadlock victim, or if it has to wait then cancel the wait lock. lock_trx_handle_wait If transaction is not marked as victim take lock sys and trx mutex before calling lock_trx_handle_wait_low and release them after that. row_search_for_mysql Remove lock sys and trx mutex taking and releasing. trx_rollback_to_savepoint_for_mysql_low trx_commit_in_memory Clean up victim variable.	2019-03-28 07:40:03 +02:00
Sergei Golubchik	f97d879bf8	cmake: re-enable -Werror in the maintainer mode now we can afford it. Fix -Werror errors. Note: * old gcc is bad at detecting uninit variables, disable it. * time_t is int or long, cast it for printf's	2019-03-27 22:51:37 +01:00
Marko Mäkelä	447e493179	Remove some unnecessary InnoDB #include	2018-11-29 12:53:44 +02:00
Marko Mäkelä	a81fceafb1	MDEV-14409 Assertion `page_rec_is_leaf(rec)' failed in lock_rec_validate_page lock_rec_queue_validate(): Assert page_rec_is_leaf(rec), except when the record is a page infimum or supremum. lock_rec_validate_page(): Relax the assertion that failed. The assertion was reachable when the record lock bitmap was empty. lock_rec_insert_check_and_lock(): Assert page_is_leaf().	2018-11-26 10:10:49 +02:00
Marko Mäkelä	ff88e4bb8a	Remove many redundant #include from InnoDB	2018-11-19 11:42:14 +02:00
Marko Mäkelä	cb5bca721b	MDEV-17765 lock_discard_page() may fail to discard locks for SPATIAL INDEX lock_discard_page(): Traverse and discard the B-tree record locks only if they exist. Else, discard the R-tree (spatial) index locks.	2018-11-19 11:40:10 +02:00
Marko Mäkelä	055a3334ad	MDEV-13564 Mariabackup does not work with TRUNCATE Implement undo tablespace truncation via normal redo logging. Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP. Note: Orphan #sql-ib.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib.ibd but the data dictionary will refer to the table using the original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3. ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them. rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating. == TRUNCATE TABLE == WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs. In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE. A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717. ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails. ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints. ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table. create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx. row_drop_table_for_mysql(): Replace a bool parameter with sqlcom. row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql(). dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove. row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place. The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization. We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations. == Undo tablespace truncation == MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint. We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4. recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen. namespace undo: Remove some unnecessary declarations. fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references. fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more. buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces. fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced. fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number. os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function. fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[]. recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them. trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log. trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed. recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file. recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point. buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0). trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.	2018-09-07 22:10:02 +03:00
Marko Mäkelä	9258097fa3	Merge 10.1 into 10.2	2018-08-21 15:20:34 +03:00
Marko Mäkelä	fa2a74e08d	MDEV-16136: Prevent wrong reuse in trx_reference() If trx_free() and trx_create_low() were called while a call to trx_reference() was pending, we could get a reference to a wrong transaction object. trx_reference(): Return NULL if the trx->id no longer matches. lock_trx_release_locks(): Assign trx->id = 0, so that trx_reference() will not return a reference to this object. trx_cleanup_at_db_startup(): Assign trx->id = 0. assert_trx_is_free(): Assert !trx->n_ref. Assert trx->id == 0, now that it will be cleared as part of a transaction commit.	2018-08-16 05:54:11 +03:00
Marko Mäkelä	6583506570	Rename lock_pool_t to lock_list Also, replace reverse iteration with forward iteration. lock_table_has(): Remove a redundant condition.	2018-08-16 05:53:50 +03:00
Marko Mäkelä	3ce8a0fc49	MDEV-16136: Simplify trx_lock_t memory management Allocate trx->lock.rec_pool and trx->lock.table_pool directly from trx_t. Remove unnecessary use of std::vector. In order to do this, move some definitions from lock0priv.h to lock0types.h, so that ib_lock_t will not be an opaque type.	2018-08-16 05:53:38 +03:00
Jan Lindström	3b37edee1a	MDEV-13333: Deadlock failure that does not occur elsewhere InnoDB executed code that is mean to execute only when Galera is used and in bad luck one of the transactions is selected incorrectly as deadlock victim. Fixed by adding wsrep_on_trx() condition before entering actual Galera transaction handling. No always repeatable test case for this issue is known.	2018-08-06 15:45:44 +03:00
Marko Mäkelä	be370efac4	Do not declare an unused variable	2018-08-03 14:37:31 +03:00
Thirunarayanan Balathandayuthapani	d897b4dc25	MDEV-15855: Use atomics for dict_table_t::n_ref_count Make dict_table_t::n_ref_count private, and protect it with a combination of dict_sys->mutex and atomics. We want to be able to invoke dict_table_t::release() without holding dict_sys->mutex.	2018-07-05 15:47:51 +03:00

1 2 3 4 5 ...

369 commits