mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-04-16 12:15:38 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	67caeca284	MDEV-36122: Protect table references with a lock dict_table_open_on_id(): Simplify the logic. dict_stats: A helper for acquiring MDL and opening the tables mysql.innodb_table_stats and mysql.innodb_index_stats. innodb_ft_aux_table_validate(): Contiguously hold dict_sys.latch while accessing the table that we open with dict_table_open_on_name(). lock_table_children(): Do not hold a table reference while invoking dict_acquire_mdl_shared<false>(), which may temporarily release and reacquire the shared dict_sys.latch that we are holding. With these changes, no caller of dict_acquire_mdl_shared<false> should be holding a table reference. All remaining calls to dict_table_open_on_name(dict_locked=false) except the one in fts_lock_table() and possibly in the DDL recovery predicate innodb_check_version() should be protected by MDL, but there currently is no assertion that would enforce this. Reviewed by: Debarun Banerjee	2025-03-26 14:22:58 +02:00
Vlad Lesin	c05e7c4e0e	MDEV-35708 lock_rec_get_prev() returns only the first record lock It's supposed that the function gets the previous lock set on a record. But if there are several locks set on a record, it will return only the first one. Continue locks list iteration till the certain lock even if the certain bit in lock bitmap is set.	2025-01-20 12:03:50 +03:00
Marko Mäkelä	0abef37ccd	Minor lock_sys cleanup Let us make some member functions of lock_sys_t non-static to avoid some shuffling of function parameter registers. lock_cancel_waiting_and_release(): Declare static, because there are no external callers. Reviewed by: Debarun Banerjee	2025-01-15 16:55:29 +02:00
Marko Mäkelä	b82abc7163	MDEV-35701 trx_t::autoinc_locks causes unnecessary dynamic memory allocation trx_t::autoinc_locks: Use small_vector<lock_t*,4> in order to avoid any dynamic memory allocation in the most common case (a statement is holding AUTO_INCREMENT locks on at most 4 tables or partitions). lock_cancel_waiting_and_release(): Instead of removing elements from the middle, simply assign nullptr, like lock_table_remove_autoinc_lock(). The added test innodb.auto_increment_lock_mode covers the dynamic memory allocation as well as nondeterministically (occasionally) covers the out-of-order lock release in lock_table_remove_autoinc_lock(). Reviewed by: Debarun Banerjee	2025-01-15 16:55:01 +02:00
Sergei Golubchik	9929a0a76e	MDEV-32576 increase query length in the InnoDB deadlock output * increase target buffer size to 3072 * remove the parameter, just use the buffer size as a limit	2025-01-09 10:00:36 +01:00
Marko Mäkelä	ddd7d5d8e3	MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure Under unknown circumstances, the SQL layer may wrongly disregard an invocation of thd_mark_transaction_to_rollback() when an InnoDB transaction had been aborted (rolled back) due to one of the following errors: * HA_ERR_LOCK_DEADLOCK * HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON) * HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON) Such an error used to cause a crash of InnoDB during transaction commit. These changes aim to catch and report the error earlier, so that not only this crash can be avoided but also the original root cause be found and fixed more easily later. The idea of this fix is from Michael 'Monty' Widenius. HA_ERR_ROLLBACK: A new error code that will be translated into ER_ROLLBACK_ONLY, signalling that the current transaction has been aborted and the only allowed action is ROLLBACK. trx_t::state: Add TRX_STATE_ABORTED that is like TRX_STATE_NOT_STARTED, but noting that the transaction had been rolled back and aborted. trx_t::is_started(): Replaces trx_is_started(). ha_innobase: Check the transaction state in various places. Simplify the logic around SAVEPOINT. ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only(). The InnoDB logic around transaction savepoints, commit, and rollback was unnecessarily complex and might have contributed to this inconsistency. So, we are simplifying that logic as well. trx_savept_t: Replace with const undo_no_t*. When we rollback to a savepoint, all we need to know is the number of undo log records that must survive. trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t directly in the space allocated at innobase_hton->savepoint_offset. fts_trx_create(): Do not copy previous savepoints. fts_savepoint_rollback(): If a savepoint was not found, roll back everything after the default savepoint of fts_trx_create(). 
The test innodb_fts.savepoint is extended to cover this code. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2024-12-12 18:02:00 +02:00
Daniele Sciascia	e821c9fa7c	MDEV-35281 SR transaction crashes with innodb_snapshot_isolation Ignore snapshot isolation conflict during fragment removal, before streaming transaction commits. This happens when a streaming transaction creates a read view that precedes the INSERTion of fragments into the streaming_log table. Fragments are INSERTed using a different transaction. These fragments are then removed as part of COMMIT of the streaming transaction. This fragment removal operation could fail when the fragments were not part of the transaction's read view, thus violating snapshot isolation.	2024-11-29 08:06:32 +01:00
Marko Mäkelä	895cd553a3	MDEV-32175: Reduce page_align(), page_offset() calls When srv_page_size and innodb_page_size were introduced, the functions page_align() and page_offset() got more expensive. Let us try to replace such calls with simpler pointer arithmetics with respect to the buffer page frame. page_rec_get_next_non_del_marked(): Add a page frame as a parameter, and template<bool comp>. page_rec_next_get(): A more efficient variant of page_rec_get_next(), with template<bool comp> and const page_t* parameters. lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks. fseg_free_step(), fseg_free_step_not_header(): Take the header block as a parameter. Reviewed by: Vladislav Lesin	2024-11-21 11:01:30 +02:00
Marko Mäkelä	3c312d247c	MDEV-35190 HASH_SEARCH duplicates effort before HASH_INSERT or HASH_DELETE The HASH_ macros are unnecessarily obfuscating the logic, so we had better replace them. hash_cell_t::search(): Implement most of the HASH_DELETE logic, for a subsequent insert or remove(). hash_cell_t::remove(): Remove an element. hash_cell_t::find(): Implement the HASH_SEARCH logic. xb_filter_hash_free(): Avoid any hash table lookup; just traverse the hash bucket chains and free each element. xb_register_filter_entry(): Search databases_hash only once. rm_if_not_found(): Make use of find_filter_in_hashtable(). dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table(): Define non-inline to avoid unnecessary code duplication. dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache(): Look for duplicate while finding the insert position. dict_table_change_id_in_cache(): Merged to the only caller row_discard_tablespace(). hash_insert(): Helper function of dict_sys_t::resize(). fil_space_t::create(): Look for a duplicate (and crash if found) when searching for the insert position. lock_rec_discard(): Take the hash array cell as a parameter to avoid a duplicated lookup. lock_rec_free_all_from_discard_page(): Remove a parameter. Reviewed by: Debarun Banerjee	2024-11-21 08:59:02 +02:00
Vlad Lesin	8c7786e7d5	MDEV-34690 lock_rec_unlock_unmodified() causes deadlock lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock() or under a combination of lock_sys.rd_lock() + record locks hash table cell latch. It also requests page latch to check if locked records were changed by the current transaction or not. Usually InnoDB requests page latch to find a certain record on the page, and then requests lock_sys and/or record lock hash cell latch to request record lock. lock_rec_unlock_unmodified() requests the latches in the opposite order, which causes deadlocks. One possible scenario for the deadlock is the following: thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table cell latch, the latch is acquired; thread 2 - purge thread acquires page latch and tries to remove delete-marked record, it invokes lock_update_delete(), which requests locks hash table cell latch, held by thread 1; thread 1 - requests page latch, held by thread 2. To fix it we need to release lock_sys.latch and/or lock hash cell latch, acquire page latch and re-acquire lock_sys related latches. When lock_sys.latch and/or lock hash cell latch are released in lock_release_on_prepare() and lock_release_on_prepare_try(), the page on which the current lock is held can be merged. In this case the bitmap of the current lock must be cleared, and the new lock must be added to the end of trx->lock.trx_locks list, or bitmap of already existing lock must be changed. The new field trx_lock_t::set_nth_bit_calls indicates if new locks (bits in existing lock bitmaps or new lock objects) were created during the period when lock_sys was released in trx->lock.trx_locks list iteration loop in lock_release_on_prepare() or lock_release_on_prepare_try(). And, if so, we traverse the list again. The block can be freed during page merging, which causes an assertion failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page get mode to it.
That is why the page_get_mode parameter was added to btr_block_get() to pass BUF_GET_POSSIBLY_FREED from lock_release_on_prepare() and lock_release_on_prepare_try() to buf_page_get_gen(). Because searching for the id of the trx that modified a secondary index record is quite an expensive operation, restrict its usage for master. A system variable was added to remove the restriction in order to simplify testing. The variable exists only either for debug build or for build with -DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option to increase the probability of catching bugs for release build with RQG. Note that the code, which does primary index lookup to find out what transaction modified secondary index record, is necessary only when there is no primary key and no unique secondary key on replica with row based replication, because only in this case extra X locks on unmodified records can be set during scan phase. Reviewed by Marko Mäkelä.	2024-10-23 12:36:17 +03:00
Vlad Lesin	92180ad513	MDEV-34466 XA prepare don't release unmodified records for some cases There is no need to exclude exclusive non-gap locks from the procedure of locks releasing on XA PREPARE execution in lock_release_on_prepare_try() after commit `17e59ed3aa` (MDEV-33454), because lock_rec_unlock_unmodified() should check if the record was modified with the XA, and release the lock if it was not. lock_release_on_prepare_try(): don't skip X-locks, let lock_rec_unlock_unmodified() to process them. lock_sec_rec_some_has_impl(): add template parameter for not acquiring trx_t::mutex for the case if a caller already holds the mutex, don't crash if lock's bitmap is clean. row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument to skip trx_t::mutex acquiring. rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the current thread already holds it. Thanks to Andrei Elkin for finding the bug. Reviewed by Marko Mäkelä, Debarun Banerjee.	2024-10-23 12:36:17 +03:00
Jan Lindström	b3be3c2157	MDEV-30653 : With wsrep_mode=REPLICATE_ARIA only part of mixed-engine transactions is replicated Replication of non-transactional engines is experimental and uses TOI. This naturally means that if there is an open transaction with a transactional engine, its changes will be rolled back. Fixed by adding an error message if a non-transactional engine is part of a multi-engine transaction with warning. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-10-23 04:00:52 +02:00
Denis Protivensky	231900e5bb	MDEV-34836: TOI on parent table must BF abort SR in progress on a child Applied SR transaction on the child table was not BF aborted by TOI running on the parent table for several reasons: Although SR correctly collected FK-referenced keys to parent, TOI in Galera disregards common certification index and simply sets itself to depend on the latest certified write set seqno. Since this write set was the fragment of SR transaction, TOI was allowed to run in parallel with SR presuming it would BF abort the latter. At the same time, DML transactions in the server don't grab MDL locks on FK-referenced tables, thus parent table wasn't protected by an MDL lock from SR and it couldn't provoke MDL lock conflict for TOI to BF abort SR transaction. In InnoDB, DDL transactions grab shared MDL locks on child tables, which is not enough to trigger MDL conflict in Galera. InnoDB-level Wsrep patch didn't contain correct conflict resolution logic due to the fact that it was believed MDL locking should always produce conflicts correctly. The fix brings conflict resolution rules similar to MDL-level checks to InnoDB, thus accounting for the problematic case. Apart from that, wsrep_thd_is_SR() is patched to return true only for executing SR transactions. It should be safe as any other SR state is either the same as for any single write set (thus making the two logically equivalent), or it reflects an SR transaction as being aborting or prepared, which is handled separately in BF-aborting logic, and for regular execution path it should not matter at all. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-09-24 11:14:01 +02:00
Marko Mäkelä	0e76c1ba94	Merge 10.5 into 10.6	2024-08-28 15:51:36 +03:00
Marko Mäkelä	e7bb9b7c55	MDEV-24923 fixup: Correct a function comment	2024-08-27 18:06:24 +03:00
Thirunarayanan Balathandayuthapani	3359ac09a4	MDEV-34066 Output of SHOW ENGINE INNODB STATUS uses the nanoseconds suffix for microseconds - This issue is caused by commit `e71e613353` (MDEV-24671). Change the output of transaction lock wait time in microseconds suffix.	2024-07-23 21:36:13 +05:30
Yuchen Pei	f071b7620b	Merge branch '10.5' into 10.6	2024-07-16 15:54:22 +08:00
Thirunarayanan Balathandayuthapani	00d2c7f7f4	MDEV-34542 Assertion `lock_trx_has_sys_table_locks(trx) == __null' failed in void row_mysql_unfreeze_data_dictionary(trx_t*) - During XA PREPARE, InnoDB releases the non-exclusive locks. But it fails to remove the non-exclusive table lock from the transaction table locks. In the meantime, the main thread evicts the table from the LRU cache. While rolling back the XA transaction, InnoDB iterates through the table locks to check whether it holds a lock on any system tables and wrongly assumes the evicted table to be a system table since the table id is 0. Fix: During XA PREPARE, remove the table locks of the transaction while releasing the non-exclusive locks.	2024-07-12 17:42:14 +05:30
Julius Goryavsky	4026f04425	Merge branch 10.5 into 10.6	2024-07-09 11:56:47 +02:00
Denis Protivensky	b7718a1c1c	MDEV-32738: Don't roll back high-prio txn waiting on a lock in InnoDB DML transactions on FK-child tables also get table locks on FK-parent tables. If there is a DML transaction holding such a lock, and a TOI transaction starts, the latter BF-aborts the former and puts itself into a waiting state. If at this moment another DML transaction on FK-child table starts, it doesn't check that the transaction waiting on a parent table lock is TOI, and it erroneously BF-aborts the waiting TOI transaction. The fix: don't roll back high-priority transaction waiting on a lock in InnoDB, instead roll back an incoming DML transaction. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-07-08 23:36:21 +02:00
Marko Mäkelä	d1ecf5cc5f	MDEV-32176 Contention in ha_innobase::info_low() During a Sysbench oltp_point_select workload with 1 table and 400 concurrent connections, a bottleneck on dict_table_t::lock_mutex was observed in ha_innobase::info_low(). dict_table_t::lock_latch: Replaces lock_mutex. In ha_innobase::info_low() and several other places, we will acquire a shared dict_table_t::lock_latch or we may elide the latch if hardware memory transactions are available. innobase_build_v_templ(): Remove the parameter "bool locked", and require the caller to hold exclusive dict_table_t::lock_latch (instead of holding an exclusive dict_sys.latch). Tested by: Vladislav Vaintroub Reviewed by: Vladislav Vaintroub	2024-06-28 15:57:07 +03:00
Iaroslav Babanin	5d49a2add7	MDEV-33935 fix deadlock counter - The deadlock counter was moved from Deadlock::find_cycle into Deadlock::report, because the find_cycle method is called multiple times during deadlock detection flow, which means it shouldn't have such side effects. But report(), which is called only once for a victim transaction, can. - Also, the deadlock_detect.test and *.result test case has been extended to cover the fix.	2024-06-19 20:43:33 +03:00
Jan Lindström	ee974ca5e0	MDEV-31658 : Deadlock found when trying to get lock during applying Problem was that there were two non-conflicting local idle transactions in node_1 that both inserted a key to primary key. Then two transactions from other nodes inserted also a key to primary key so that insert from node_2 conflicted with one of the local transactions in node_1 so that there would be duplicate key if both are committed. For this, the insert from the other node tries to acquire S-lock for this record and because this insert is high priority brute force (BF) transaction it will kill idle local transaction. Concurrently, second insert from node_3 conflicts with the second idle insert transaction in node_1. Again, it tries to acquire S-lock for this record and kills idle local transaction. At this point we have two non-conflicting high priority transactions holding S-lock on different records in node_1. For example like this: rec s-lock-node2-rec s-lock-node3-rec rec. Because these high priority BF-transactions do not wait for each other, the insert from node3 that has later seqno compared to insert from node2 can continue. It will try to acquire insert intention for record it tries to insert (to avoid duplicate key to be inserted by local transaction). However, it will note that there is conflicting S-lock in same gap between records. This will lead to a deadlock error as we have defined that BF-transactions may not wait for record lock but we can't kill conflicting BF-transaction because it has lower seqno and it should commit first. BF-transactions are executed concurrently because their values to primary key are different i.e. they do not conflict. Galera certification will make sure that inserts from other nodes i.e. these high priority BF-transactions can't insert duplicate keys. Local transactions naturally can but they will be killed when BF-transaction acquires required record locks.
Therefore, we can allow a situation where there is a conflicting S-lock and insert intention lock regardless of their seqno order and let both continue with no wait. This will lead to a situation where we need to allow a BF-transaction to wait when lock_rec_has_to_wait_in_queue is called, because this function is also called from lock_rec_queue_validate and because the lock is waiting there would be an assertion in ut_a(lock->is_gap() || lock_rec_has_to_wait_in_queue(cell, lock)); lock_wait_wsrep_kill: Add debug sync points for BF-transactions killing local transaction. wsrep_assert_no_bf_bf_wait: Print also requested lock information. lock_rec_has_to_wait: Add function to handle wsrep transaction lock wait cases. lock_rec_has_to_wait_wsrep: New function to handle wsrep transaction lock wait exceptions. lock_rec_has_to_wait_in_queue: Remove wsrep exception, in this function all conflicting locks need to wait in queue. Conflicts between BF and local transactions are handled in lock_wait. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-06-19 14:09:11 +02:00
Marko Mäkelä	27834ebc91	Merge 10.5 into 10.6	2024-06-10 15:22:15 +03:00
Marko Mäkelä	a2bd936c52	MDEV-33161 Function pointer signature mismatch in LF_HASH In cmake -DWITH_UBSAN=ON builds with clang but not with GCC, -fsanitize=undefined will flag several runtime errors on function pointer mismatch related to the lock-free hash table LF_HASH. Let us use matching function signatures and remove function pointer casts in order to avoid potential bugs due to undefined behaviour. These errors could be caught at compilation time by -Wcast-function-type-strict, which is available starting with clang-16, but not available in any version of GCC as of now. The old GCC flag -Wcast-function-type is enabled as part of -Wextra, but it specifically does not catch these errors. Reviewed by: Vladislav Vaintroub	2024-06-10 12:35:33 +03:00
Marko Mäkelä	5ba542e9ee	Merge 10.5 into 10.6	2024-05-30 14:27:07 +03:00
mariadb-DebarunBanerjee	b2944adb76	MDEV-34166 Server could hang with BP < 80M under stress BUF_LRU_MIN_LEN (256) is too high value for low buffer pool(BP) size. For example, for BP size lower than 80M and 16 K page size, the limit is more than 5% of total BP and for lowest BP 5M, it is 80% of the BP. Non-data objects like explicit locks could occupy part of the BP pool reducing the pages available for LRU. If LRU reaches minimum limit and if no free pages are available, server would hang with page cleaner not able to free any more pages. Fix: To avoid such hang, we adjust the LRU limit lower than the limit for data objects as checked in buf_LRU_check_size_of_non_data_objects() i.e. one page less than 5% of BP.	2024-05-21 14:13:29 +05:30
mariadb-DebarunBanerjee	8047c8bc71	MDEV-28800 SIGABRT due to running out of memory for InnoDB locks This regression is introduced in 10.6 by following commit. commit `898dcf93a8` (Cleanup the lock creation) It removed one important optimization for lock bitmap pre-allocation. We pre-allocate about 8 byte extra space along with every lock object to adjust for similar locks on newly created records on the same page by same transaction. When it is exhausted, a new lock object is created with similar 8 byte pre-allocation. With this optimization removed we are left with only 1 byte pre-allocation. When large number of records are inserted and locked in a single page, we end up creating too many new locks almost in n^2 order. Fix-1: Bring back LOCK_PAGE_BITMAP_MARGIN for pre-allocation. Fix-2: Use the extra space (40 bytes) for bitmap in trx->lock.rec_pool.	2024-05-20 21:19:13 +05:30
Marko Mäkelä	4aa92911c7	MDEV-33802 Weird read view after ROLLBACK of another transaction Even after commit `b8a6719889` there is an anomaly where a locking read could return inconsistent results. If a locking read would have to wait for a record lock, then by the definition of a read view, the modifications made by the current lock holder cannot be visible in the read view. This is because the read view must exclude any transactions that had not been committed at the time when the read view was created. lock_rec_convert_impl_to_expl_for_trx(), lock_rec_convert_impl_to_expl(): Return an unsafe-to-dereference pointer to a transaction that holds or held the lock, or nullptr if the lock was available. lock_clust_rec_modify_check_and_lock(), lock_sec_rec_read_check_and_lock(), lock_clust_rec_read_check_and_lock(): Return DB_RECORD_CHANGED if innodb_snapshot_isolation=ON and the lock was being held by another transaction. The test case, which is based on a bug report by Zhuang Liu, covers the function lock_sec_rec_read_check_and_lock(). Reviewed by: Vladislav Lesin	2024-04-09 12:50:24 +03:00
Jan Lindström	b762541dd6	MDEV-33278 : Assertion failure in thd_get_thread_id at lock_wait_wsrep Problem is that not all conflicting transactions have a THD object. Therefore, it must be checked that the victim has a THD before its identification is added to the victim list, as the victim's thread identification is later requested using the thd_get_thread_id function, which requires that we have a valid pointer to a THD object in trx->mysql_thd. Victim might not have trx->mysql_thd in two cases: (1) An incomplete transaction that was recovered from undo logs on server startup (and not yet rolled back). (2) Transaction that is in XA PREPARE state and whose client connection was disconnected. Neither of these can complete before lock_wait_wsrep() releases lock_sys.latch. (1) trx_t::commit_in_memory() is clearing both trx_t::state and trx_t::is_recovered before it invokes lock_release(trx_t*) (which would be blocked by the exclusive lock_sys.latch that we are holding here). Hence, it is not possible to write a debug assertion to document this scenario. (2) If it is in XA PREPARE state, it would eventually be rolled back and the lock conflict would be resolved when an XA COMMIT or XA ROLLBACK statement is executed in some other connection. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-03-26 02:06:51 +01:00
Marko Mäkelä	17e59ed3aa	MDEV-33454 release row locks for non-modified rows at XA PREPARE From the correctness point of view, it should be safe to release all locks on index records that were not modified by the transaction. Doing so should make the locks after XA PREPARE fully compatible with what would happen if the server were restarted: InnoDB table IX locks and exclusive record locks would be resurrected based on undo log records. Concurrently running transactions that are waiting for a lock may invoke lock_rec_convert_impl_to_expl() to create an explicit record lock object on behalf of the lock-owning transaction so that they can attach their waiting lock request to the explicit record lock object. Explicit locks would be released by trx_t::release_locks() during commit or rollback. Any clustered index record whose DB_TRX_ID belongs to a transaction that is in active or XA PREPARE state will be implicitly locked by that transaction. On XA PREPARE, we can release explicit exclusive locks on records whose DB_TRX_ID does not match the current transaction identifier. lock_rec_unlock_unmodified(): Release record locks that are not implicitly held by the current transaction. lock_release_on_prepare_try(), lock_release_on_prepare(): Invoke lock_rec_unlock_unmodified(). row_trx_id_offset(): Declare non-static. lock_rec_unlock(): Replaces lock_rec_unlock_supremum(). Reviewed by: Vladislav Lesin	2024-03-22 14:33:48 +02:00
Marko Mäkelä	b8a6719889	MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation https://jepsen.io/analyses/mysql-8.0.34 highlights that the transaction isolation levels in the InnoDB storage engine do not correspond to any widely accepted definitions, such as "Generalized Isolation Level Definitions" https://pmg.csail.mit.edu/papers/icde00.pdf (PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ, PL-3 = SERIALIZABLE). Only READ UNCOMMITTED in InnoDB seems to match the above definition. The issue is that InnoDB does not detect write/write conflicts (Section 4.4.3, Definition 6) in the above. It appears that as soon as we implement write/write conflict detection (SET SESSION innodb_snapshot_isolation=ON), the default isolation level (SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of "A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf Locking reads inside InnoDB used to read the latest committed version, ignoring what should actually be visible to the transaction. The added test innodb.lock_isolation illustrates this. The statement UPDATE t SET a=3 WHERE b=2; is executed in a transaction that was started before a read view or a snapshot of the current transaction was created, and committed before the current transaction attempts to execute UPDATE t SET b=3; If SET innodb_snapshot_isolation=ON is in effect when the second transaction was started, the second transaction will be aborted with the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF), the second transaction would execute inconsistently, displaying an incorrect SELECT COUNT(*) FROM t in its read view. 
If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a record that does not exist in the current read view is made, an error DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will be raised. This error will be treated in the same way as a deadlock: the transaction will be rolled back. lock_clust_rec_read_check_and_lock(): If the current transaction has a read view where the record is not visible and innodb_snapshot_isolation=ON, fail before trying to acquire the lock. row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON, disable the "semi-consistent read" logic that had been implemented by myself on the directions of Heikki Tuuri in order to address https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer wanting UPDATE to skip locked rows that do not match the WHERE condition. It looks like my changes were included in the MySQL 5.1.5 commit ad126d90e019f223470e73e1b2b528f9007c4532; at that time, employees of Innobase Oy (a recent acquisition of Oracle) had lost write access to the repository. The only reason why we set innodb_snapshot_isolation=OFF by default is backward compatibility with applications, such as the one that motivated the implementation of "semi-consistent read" back in 2005. In a later major release, we can default to innodb_snapshot_isolation=ON. Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations and some testing of this fix. Thanks to Vladislav Lesin for the initial test for MDEV-26643, as well as reviewing these changes.	2024-03-20 09:48:03 +02:00
Marko Mäkelä	c3a00dfa53	Merge 10.5 into 10.6	2024-03-12 09:19:57 +02:00
mariadb-DebarunBanerjee	afe9632913	MDEV-33593 Auto increment deadlock error causes ASSERT in subsequent save point The issue here is ha_innobase::get_auto_increment() could cause a deadlock involving auto-increment lock and rollback the transaction implicitly. For such cases, storage engines usually call thd_mark_transaction_to_rollback() to inform SQL engine about it which in turn takes appropriate actions and close the transaction. In innodb, we call it while converting Innodb error code to MySQL. However, since ::innobase_get_autoinc() returns void, we skip the call for error code conversion and also miss marking the transaction for rollback for deadlock error. We assert eventually while releasing a savepoint as the transaction state is not active. Since convert_error_code_to_mysql() is handling some generic error handling part, like invoking the callback when needed, we should call that function in ha_innobase::get_auto_increment() even if we don't return the resulting mysql error code back.	2024-03-07 21:54:06 +05:30
Marko Mäkelä	b2654ba826	MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL lock_table_children(): A new function to lock all child tables of a table. We will only hold dict_sys.latch while traversing dict_table_t::referenced_set. To prevent a race condition with std::set::erase() we will copy the pointers to the child tables to a local vector. Once we have acquired MDL and references to all child tables, we can safely release dict_sys.latch, wait for the locks, and finally release the references. dict_acquire_mdl_shared(): A new variant that takes mdl_context as a parameter. lock_table_for_trx(): Assert that we are not holding dict_sys.latch. ha_innobase::truncate(): When foreign_key_checks=ON, assert that no child tables exist (other than the current table). In any case, we will invoke lock_table_children() so that the child table metadata can be safely updated. (It is possible that a child table is being created concurrently with TRUNCATE TABLE.) ha_innobase::delete_table(): Before and after acquiring exclusive locks on the current table as well as all child tables, check that FOREIGN KEY constraints will not be violated. In this way, we can reject impossible DROP TABLE without having to wait for locks first. This fixes up commit `2ca1123464` (MDEV-26217) and commit `c3c53926c4` (MDEV-26554).	2024-02-08 14:22:35 +11:00
Marko Mäkelä	5f2dcd112b	MDEV-24167 fixup: srw_lock_debug instrumentation While the index_lock and block_lock include debug instrumentation to keep track of shared lock holders, such instrumentation was never part of the simpler srw_lock, and therefore some users of the class implemented a limited form of bookkeeping. srw_lock_debug encapsulates srw_lock and adds the data members writer, readers_lock, and readers to keep track of the threads that hold the exclusive latch or any shared latches. The debug checks are available also with SUX_LOCK_GENERIC (in environments that do not implement a futex-like system call). dict_sys_t::latch: Use srw_lock_debug in debug builds. This makes the debug fields latch_ex, latch_readers redundant. fil_space_t::latch: Use srw_lock_debug in debug builds. This makes the debug field latch_count redundant. The field latch_owner must be preserved, because fil_space_t::is_owner() is being used in all builds. lock_sys_t::latch: Use srw_lock_debug in debug builds. This makes the debug fields writer, readers redundant. lock_sys_t::is_holder(): A new debug predicate to check if the current thread is holding lock_sys.latch in any mode. trx_rseg_t::latch: Use srw_lock_debug in debug builds.	2024-02-08 14:22:35 +11:00
Marko Mäkelä	21560bee9d	Revert "MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL" This reverts commit `569da6a7ba`, commit `768a736174`, and commit `ba6bf7ad9e` because of a regression that was filed as MDEV-33104.	2024-01-19 12:46:11 +02:00
Marko Mäkelä	ba6bf7ad9e	MDEV-32899 instrumentation In debug builds, let us declare dict_sys.latch as index_lock instead of srw_lock, so that we will benefit from the full tracking of lock ownership. lock_table_for_trx(): Assert that the current thread is not holding dict_sys.latch. If the dict_sys.unfreeze() call were moved to the end of lock_table_children(), this assertion would fail in the test innodb.innodb and many other tests that use FOREIGN KEY.	2023-11-29 10:48:10 +02:00
Marko Mäkelä	569da6a7ba	MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL lock_table_children(): A new function to lock all child tables of a table. We will only hold dict_sys.latch while traversing dict_table_t::referenced_set. To prevent a race condition with std::set::erase() we will copy the pointers to the child tables to a local vector. Once we have acquired references to all child tables, we can safely release dict_sys.latch, wait for the locks, and finally release the references. This fixes up commit `2ca1123464` (MDEV-26217) and commit `c3c53926c4` (MDEV-26554).	2023-11-28 15:50:41 +02:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Marko Mäkelä	b78b77e77d	MDEV-32530 Race condition in lock_wait_rpl_report() After acquiring lock_sys.latch, always load trx->lock.wait_lock. It could have been changed by another thread that did lock_rec_move() and released lock_sys.latch right before lock_sys.wr_lock_try() succeeded. This regression was introduced in commit `e039720bf3` (MDEV-32096). Reviewed by: Vladislav Lesin	2023-10-24 14:33:14 +03:00
Vlad Lesin	18fa00a54c	MDEV-32272 lock_release_on_prepare_try() does not release lock if supremum bit is set along with other bits set in lock's bitmap The error is caused by the MDEV-30165 fix in the following commit: `d13a57ae81` There is a logical error in lock_release_on_prepare_try(): if (supremum_bit) lock_rec_unlock_supremum(*cell, lock); else lock_rec_dequeue_from_page(lock, false); There can be other bits set in the lock's bitmap, and the lock type can meet the release criteria, but the above logic releases only the supremum bit of the lock. The fix is to release the lock if it meets the release criteria, and otherwise to unlock the supremum if the supremum is locked. There is also a test for the case, which was reported by the QA team. I placed it in a separate file, because it requires a debug build. Reviewed by: Marko Mäkelä	2023-10-13 16:29:04 +03:00
Vlad Lesin	96ae37abc5	MDEV-30658 lock_row_lock_current_waits counter in information_schema.innodb_metrics may become negative The MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT monitor should have the MONITOR_DISPLAY_CURRENT flag set in its definition, as it shows the current state and does not accumulate anything. Reviewed by: Marko Mäkelä	2023-10-05 18:27:54 +03:00
Jan Lindström	076df87b4c	MDEV-30217 : Assertion `mode_ == m_local || transaction_.is_streaming()' failed in int wsrep::client_state::bf_abort(wsrep::seqno) Problem was that a brute force (BF) thread requested a conflicting lock and was trying to kill the victim transaction, but this victim was also a brute force thread. However, this victim was not actually holding a conflicting lock; instead, both the brute force transaction and the victim transaction had insert intention locks. We should not kill a brute force victim transaction if the requesting lock does not need to wait. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-09-25 16:38:55 +02:00
Vlad Lesin	d13a57ae81	Merge 10.5 into 10.6.	2023-09-22 15:21:15 +03:00
Vlad Lesin	95730372bd	MDEV-30165 X-lock on supremum for prepared transaction for RR trx_t::set_skip_lock_inheritance() must be invoked at the very beginning of lock_release_on_prepare(). Currently trx_t::set_skip_lock_inheritance() is invoked at the end of lock_release_on_prepare() when lock_sys and trx are released, and there can be a case when locks on prepare are released, but "not inherit gap locks" bit has not yet been set, and page split inherits lock to supremum. Also reset supremum bit and rebuild waiting queue when XA is prepared. Reviewed by: Marko Mäkelä	2023-09-21 20:07:53 +03:00
Marko Mäkelä	4a8291fc5f	MDEV-30531 Corrupt index(es) on busy table when using FOREIGN KEY lock_wait(): Never return the transient error code DB_LOCK_WAIT. In commit `78a04a4c22` (MDEV-29869) some assignments trx->error_state = DB_SUCCESS were removed, and it was possible that the field was left at its initial value DB_LOCK_WAIT. The test case for this is nondeterministic; without this fix, it would only occasionally fail. Reviewed by: Vladislav Lesin	2023-09-11 14:52:05 +03:00
Marko Mäkelä	e039720bf3	MDEV-32096 Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait lock_sys_t::cancel(trx_t*): Remove, and merge to its only caller innobase_kill_query(). innobase_kill_query(): Before reading trx->lock.wait_lock, do acquire lock_sys.wait_mutex, like we did before commit `e71e613353` (MDEV-24671). In this way, we should not miss a recently started lock wait by the killee transaction. lock_rec_lock(): Add a DEBUG_SYNC "lock_rec" for the test case. lock_wait(): Invoke trx_is_interrupted() before entering the wait, in case innobase_kill_query() was invoked some time earlier and some longer-running operation did not check for interrupts. As suggested by Vladislav Lesin, do not overwrite trx->error_state==DB_INTERRUPTED with DB_SUCCESS. This would avoid a call to trx_is_interrupted() when the test is modified to use the DEBUG_SYNC point lock_wait_start instead of lock_rec. Avoid some redundant loads of trx->lock.wait_lock; cache the value in the local variable wait_lock. Deadlock::check_and_resolve(): Take wait_lock as a parameter and return wait_lock (or -1 or nullptr). We only need to reload trx->lock.wait_lock if lock_sys.wait_mutex had been released and reacquired. trx_t::error_state: Correctly document the data member. trx_lock_t::was_chosen_as_deadlock_victim: Clarify that other threads may set the field (or flags in it) while holding lock_sys.wait_mutex. Thanks to Johannes Baumgarten for reporting the problem and testing the fix, as well as to Kristian Nielsen for suggesting the fix. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-09-11 14:51:02 +03:00
Kristian Nielsen	7c9837ce74	Merge 10.4 into 10.5 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 18:02:18 +02:00
Kristian Nielsen	805e0668c9	MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication Remove the exception that InnoDB does not report auto-increment lock waits to parallel replication. There was an assumption that these waits could not cause conflicts with in-order parallel replication and thus need not be reported. However, this assumption is wrong and it is possible to get conflicts that lead to hangs for the duration of --innodb-lock-wait-timeout. This can be seen with three transactions: 1. T1 is waiting for T3 on an autoinc lock 2. T2 is waiting for T1 to commit 3. T3 is waiting on a normal row lock held by T2 Here, T3 needs to be deadlock killed on the wait by T1. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:40:02 +02:00

778 commits