mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 12:02:42 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	15b607b552	Merge 10.5 into 10.6	2024-04-19 16:01:26 +03:00
Marko Mäkelä	ec7db2bdf8	MDEV-33325 fixup ibuf_remove_free_page(): Correct the calculation of root_savepoint(). The first entry acquired by ibuf_tree_root_get() will be ibuf.index.lock and not the change buffer root page. Thanks to Matthias Leich for finding this bug in RQG. Unfortunately, this code is very difficult to cover in our regression test suite.	2024-04-19 12:39:48 +03:00
mariadb-DebarunBanerjee	5928e04d5f	MDEV-32489 Change buffer index fails to delete the records When the change buffer records for a page span across multiple change buffer leaf pages or the starting record is at the beginning of a page with a left sibling, ibuf_delete_recs deletes only the records in first page and fails to move to subsequent pages. Subsequently a slow shutdown hangs trying to delete those left over records. Fix-A: Position the cursor to an user record in B-tree and exit only when all records are exhausted. Fix-B: Make sure we call ibuf_delete_recs during slow shutdown for pages with IBUF entries to cleanup any previously left over records.	2024-04-18 08:30:21 +05:30
mariadb-DebarunBanerjee	040069f4ba	MDEV-33431 Latching order violation reported fil_system.sys_space.latch and ibuf_pessimistic_insert_mutex Issue: ------ The actual order of acquisition of the IBUF pessimistic insert mutex (SYNC_IBUF_PESS_INSERT_MUTEX) and IBUF header page latch (SYNC_IBUF_HEADER) w.r.t space latch (SYNC_FSP) differs from the order defined in sync0types.h. It was not discovered earlier as the path to ibuf_remove_free_page was not covered by the mtr test. Ideal order and one defined in sync0types.h is as follows. SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP In ibuf_remove_free_page, we acquire space latch earlier and we have the order as follows resulting in the assert with innodb_sync_debug=on. SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX Fix: --- We do maintain this order in other places and there doesn't seem to be any real issue here. To reduce impact in GA versions, we avoid doing extensive changes in mutex ordering to match the current SYNC_IBUF_PESS_INSERT_MUTEX order. Instead we relax the ordering check for IBUF pessimistic insert mutex using SYNC_NO_ORDER_CHECK.	2024-04-17 15:16:50 +05:30
Marko Mäkelä	263932d505	MDEV-33325 Crash in flst_read_addr on corrupted data flst_read_addr(): Remove assertions. Instead, we will check these conditions in the callers and avoid a crash in case of corruption. We will check the conditions more carefully, because the callers know more exact bounds for the page numbers and the byte offsets withing pages. flst_remove(), flst_add_first(), flst_add_last(): Add a parameter for passing fil_space_t::free_limit. None of the lists may point to pages that are beyond the current initialized length of the tablespace. trx_rseg_mem_restore(): Access the first page of the tablespace, so that we will correctly recover rseg->space->free_limit in case some log based recovery is pending. ibuf_remove_free_page(): Only look up the root page once, and validate the last page number. Reviewed by: Debarun Banerjee	2024-04-11 09:58:53 +03:00
Marko Mäkelä	6538a91e94	Merge 10.5 into 10.6	2024-01-08 14:39:56 +02:00
Marko Mäkelä	f074223ae7	MDEV-32068 Some calls to buf_read_ahead_linear() seem to be useless The linear read-ahead (enabled by nonzero innodb_read_ahead_threshold) works best if index leaf pages or undo log pages have been allocated on adjacent page numbers. The read-ahead is assumed not to be helpful in other types of page accesses, such as non-leaf index pages. buf_page_get_low(): Do not invoke buf_page_t::set_accessed(), buf_page_make_young_if_needed(), or buf_read_ahead_linear(). We will invoke them in those callers of buf_page_get_gen() or buf_page_get() where it makes sense: the access is not one-time-on-startup and the page and not going to be freed soon. btr_copy_blob_prefix(), btr_pcur_move_to_next_page(), trx_undo_get_prev_rec_from_prev_page(), trx_undo_get_first_rec(), btr_cur_t::search_leaf(), btr_cur_t::open_leaf(): Invoke buf_read_ahead_linear(). We will not invoke linear read-ahead in functions that would essentially allocate or free pages, because pages that are freshly allocated are expected to be initialized by buf_page_create() and not read from the data file. Likewise, freeing pages should not involve accessing any sibling pages, except for freeing singly-linked lists of BLOB pages. We will not invoke read-ahead in btr_cur_t::pessimistic_search_leaf() or in a pessimistic operation of btr_cur_t::open_leaf(), because it is assumed that pessimistic operations should be preceded by optimistic operations, which should already have invoked read-ahead. buf_page_make_young_if_needed(): Invoke also buf_page_t::set_accessed() and return the result. btr_cur_nonleaf_make_young(): Like buf_page_make_young_if_needed(), but do not invoke buf_page_t::set_accessed(). Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-12-05 12:31:29 +02:00
Marko Mäkelä	5b62644e68	MDEV-31621 Remove ibuf_read_merge_pages() call from ibuf_insert_low() When InnoDB attempts to buffer a change operation of a secondary index leaf page (to insert, delete-mark or remove a record) and the change buffer is too large, InnoDB used to trigger a change buffer merge that could affect any tables. This could lead to huge variance in system throughput and potentially unpredictable crashes, in case the change buffer was corrupted and a crash occurred while attempting to merge changes to a table that is not being accessed by the current SQL statement. ibuf_insert_low(): Simply return DB_STRONG_FAIL when the maximum size of the change buffer is exceeded. ibuf_contract_after_insert(): Remove. ibuf_get_merge_page_nos_func(): Remove a constant parameter. The function ibuf_contract() will be our only caller, during shutdown with innodb_fast_shutdown=0.	2023-07-05 08:48:37 +03:00
Marko Mäkelä	8a86df37ef	MDEV-31088 Server freeze due to innodb_change_buffering A 3-thread deadlock has been frequently observed when using innodb_change_buffering!=none and innodb_file_per_table=0: (1) ibuf_merge_or_delete_for_page() holding an exclusive latch on the block and waiting for an exclusive tablespace latch in fseg_page_is_allocated() (2) btr_free_but_not_root() in fseg_free_step() waiting for an exclusive tablespace latch (3) fsp_alloc_free_page() holding the exclusive tablespace latch and waiting for a latch on the block, which it is reallocating for something else While this was reproduced using innodb_file_per_table=0, this hang should be theoretically possible in .ibd files as well, when the recovery or cleanup of a failed DROP INDEX or ADD INDEX is executing concurrently with something that involves page allocation. ibuf_merge_or_delete_for_page(): Avoid invoking fseg_page_is_allocated() when block==nullptr. The call was redundant in this case, and it could cause deadlocks due to latching order violation. ibuf_read_merge_pages(): Acquire an exclusive tablespace latch before invoking buf_page_get_gen(), which may cause fseg_page_is_allocated() to be invoked in ibuf_merge_or_delete_for_page(). Note: This will not fix all latching order violations in this area! Deadlocks involving ibuf_merge_or_delete_for_page(block!=nullptr) are still possible if the caller is not acquiring an exclusive tablespace latch upfront. This would be the case in any read operation that involves a change buffer merge, such as SELECT, CHECK TABLE, or any DML operation that cannot be buffered in the change buffer.	2023-06-02 10:44:34 +03:00
Marko Mäkelä	818d5e4814	Merge 10.5 into 10.6	2023-04-25 13:10:33 +03:00
Oleksandr Byelkin	1d74927c58	Merge branch '10.4' into 10.5	2023-04-24 12:43:47 +02:00
Thirunarayanan Balathandayuthapani	660afb1e9c	MDEV-30076 ibuf_insert tries to insert the entry for uncommitted index - Change buffer should not buffer the changes for uncommitted index	2023-04-19 17:11:14 +05:30
Marko Mäkelä	f169dfb41a	Merge 10.5 into 10.6	2023-03-10 09:35:50 +02:00
Marko Mäkelä	08267ba0c8	MDEV-30819 InnoDB fails to start up after downgrading from MariaDB 11.0 While downgrades are not supported and misguided attempts at it could cause serious corruption especially after commit `b07920b634` it might be useful if InnoDB would start up even after an upgrade to MariaDB Server 11.0 or later had removed the change buffer. innodb_change_buffering_update(): Disallow anything else than innodb_change_buffering=none when the change buffer is corrupted. ibuf_init_at_db_start(): Mention a possible downgrade in the corruption error message. If innodb_change_buffering=none, ignore the error but do not initialize ibuf.index. ibuf_free_excess_pages(), ibuf_contract(), ibuf_merge_space(), ibuf_update_max_tablespace_id(), ibuf_delete_for_discarded_space(), ibuf_print(): Check for !ibuf.index. ibuf_check_bitmap_on_import(): Remove some unnecessary code. This function is only accessing change buffer bitmap pages in a data file that is not attached to the rest of the database. It is not accessing the change buffer tree itself, hence it does not need any additional mutex protection. This has been tested both by starting up MariaDB Server 10.8 on a 11.0 data directory, and by running ./mtr --big-test while ibuf_init_at_db_start() was tweaked to always fail.	2023-03-09 16:16:58 +02:00
Marko Mäkelä	96a3b11d13	Merge 10.5 into 10.6	2023-02-14 15:23:23 +02:00
Thirunarayanan Balathandayuthapani	1a5c7552ea	MDEV-30552 InnoDB recovery crashes when error handling scenario - InnoDB fails to reset the after_apply variable before applying the redo log in last batch during multi-batch recovery.	2023-02-14 14:36:17 +05:30
Marko Mäkelä	de4030e4d4	MDEV-30400 Assertion height == btr_page_get_level(...) on INSERT This also fixes part of MDEV-29835 Partial server freeze which is caused by violations of the latching order that was defined in https://dev.mysql.com/worklog/task/?id=6326 (WL#6326: InnoDB: fix index->lock contention). Unless the current thread is holding an exclusive dict_index_t::lock, it must acquire page latches in a strict parent-to-child, left-to-right order. Not all cases of MDEV-29835 are fixed yet. Failure to follow the correct latching order will cause deadlocks of threads due to lock order inversion. As part of these changes, the BTR_MODIFY_TREE mode is modified so that an Update latch (U a.k.a. SX) will be acquired on the root page, and eXclusive latches (X) will be acquired on all pages leading to the leaf page, as well as any left and right siblings of the pages along the path. The DEBUG_SYNC test innodb.innodb_wl6326 will be removed, because at the time the DEBUG_SYNC point is hit, the thread is actually holding several page latches that will be blocking a concurrent SELECT statement. We also remove double bookkeeping that was caused due to excessive information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo store information of latched pages, and ensure that mtr_memo_slot_t::object is never a null pointer. The tree_blocks[] and tree_savepoints[] were redundant. buf_page_get_low(): If innodb_change_buffering_debug=1, to avoid a hang, do not try to evict blocks if we are holding a latch on a modified page. The test innodb.innodb-change-buffer-recovery will be removed, because change buffering may no longer be forced by debug injection when the change buffer comprises multiple pages. Remove a debug assertion that could fail when innodb_change_buffering_debug=1 fails to evict a page. For other cases, the assertion is redundant, because we already checked that right after the got_block: label. The test innodb.innodb-change-buffering-recovery will be removed, because due to this change, we will be unable to evict the desired page. mtr_t::lock_register(): Register a change of a page latch on an unmodified buffer-fixed block. mtr_t::x_latch_at_savepoint(), mtr_t::sx_latch_at_savepoint(): Replaced by the use of mtr_t::upgrade_buffer_fix(), which now also handles RW_S_LATCH. mtr_t::set_modified(): For temporary tables, invoke buf_page_t::set_modified() here and not in mtr_t::commit(). We will never set the MTR_MEMO_MODIFY flag on other than persistent data pages, nor set mtr_t::m_modifications when temporary data pages are modified. mtr_t::commit(): Only invoke the buf_flush_note_modification() loop if persistent data pages were modified. mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo. This avoids many redundant entries in mtr_t::m_memo, as well as redundant calls to buf_page_get_gen() for blocks that had already been looked up in a mini-transaction. btr_get_latched_root(): Return a pointer to an already latched root page. This replaces btr_root_block_get() in cases where the mini-transaction has already latched the root page. btr_page_get_parent(): Fetch a parent page that was already latched in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched(). If needed, upgrade the root page U latch to X. This avoids bloating mtr_t::m_memo as well as performing redundant buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for B-tree defragmentation, we will invoke btr_cur_search_to_nth_level(). btr_cur_search_to_nth_level(): This will only be used for non-leaf (level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be removed altogether, or retained for the case of CHECK TABLE without QUICK. btr_cur_t::left_block: Remove. btr_pcur_move_backward_from_page() can retrieve the left sibling from the end of mtr_t::m_memo. btr_cur_t::open_leaf(): Some clean-up. btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level() for searches to level=0 (the leaf level). We will never release parent page latches before acquiring leaf page latches. If we need to temporarily release the level=1 page latch in the BTR_SEARCH_PREV or BTR_MODIFY_PREV latch_mode, we will reposition the cursor on the child node pointer so that we will land on the correct leaf page. btr_cur_t::pessimistic_search_leaf(): Implement new BTR_MODIFY_TREE latching logic in the case that page splits or merges will be needed. The parent pages (and their siblings) should already be latched on the first dive to the leaf and be present in mtr_t::m_memo; there should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost suffices; it must be revised in MDEV-29835 and work-arounds removed for cases where mtr_t::get_already_latched() fails to find a block. rtr_search_to_nth_level(): A SPATIAL INDEX version of btr_search_to_nth_level() that can search to any level (including the leaf level). rtr_search_leaf(), rtr_insert_leaf(): Wrappers for rtr_search_to_nth_level(). rtr_search(): Replaces rtr_pcur_open(). rtr_latch_leaves(): Replaces btr_cur_latch_leaves(). Note that unlike in the B-tree code, there is no error handling in case the sibling pages are corrupted. rtr_cur_restore_position(): Remove an unused constant parameter. btr_pcur_open_on_user_rec(): Remove the constant parameter mode=PAGE_CUR_GE. row_ins_clust_index_entry_low(): Use a new mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC. BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove. BTR_CONT_MODIFY_TREE: Note that this is only used by rtr_search_to_nth_level(). btr_pcur_optimistic_latch_leaves(): Replaces btr_cur_optimistic_latch_leaves(). ibuf_delete_rec(): Acquire exclusive ibuf.index->lock in order to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV). btr_blob_log_check_t(): Acquire a U latch on the root page, so that btr_page_alloc() in btr_store_big_rec_extern_fields() will avoid a deadlock. btr_store_big_rec_extern_fields(): Assert that the root page latch is being held. Tested by: Matthias Leich Reviewed by: Vladislav Lesin	2023-01-24 14:09:21 +02:00
Marko Mäkelä	e41fb3697c	Revert "MDEV-30400 Assertion height == btr_page_get_level(...) on INSERT" This reverts commit `f9cac8d2cb` which was accidentally pushed prematurely.	2023-01-23 14:52:49 +02:00
Marko Mäkelä	f9cac8d2cb	MDEV-30400 Assertion height == btr_page_get_level(...) on INSERT This also fixes part of MDEV-29835 Partial server freeze which is caused by violations of the latching order that was defined in https://dev.mysql.com/worklog/task/?id=6326 (WL#6326: InnoDB: fix index->lock contention). Unless the current thread is holding an exclusive dict_index_t::lock, it must acquire page latches in a strict parent-to-child, left-to-right order. Not all cases are fixed yet. Failure to follow the correct latching order will cause deadlocks of threads due to lock order inversion. As part of these changes, the BTR_MODIFY_TREE mode is modified so that an Update latch (U a.k.a. SX) will be acquired on the root page, and eXclusive latches (X) will be acquired on all pages leading to the leaf page, as well as any left and right siblings of the pages along the path. The test innodb.innodb_wl6326 will be removed, because at the time the DEBUG_SYNC point is hit, the thread is actually holding several page latches that will be blocking a concurrent SELECT statement. We also remove double bookkeeping that was caused due to excessive information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo store information of latched pages, and ensure that mtr_memo_slot_t::object is never a null pointer. The tree_blocks[] and tree_savepoints[] were redundant. mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo. This avoids many redundant entries in mtr_t::m_memo, as well as redundant calls to buf_page_get_gen() for blocks that had already been looked up in a mini-transaction. btr_get_latched_root(): Return a pointer to an already latched root page. This replaces btr_root_block_get() in cases where the mini-transaction has already latched the root page. btr_page_get_parent(): Fetch a parent page that was already latched in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched(). If needed, upgrade the root page U latch to X. This avoids bloating mtr_t::m_memo as well as redundant buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for B-tree defragmentation, we will invoke btr_cur_search_to_nth_level(). btr_cur_search_to_nth_level(): This will only be used for non-leaf (level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be removed altogether, or retained for the case of CHECK TABLE without QUICK. btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level() for searches to level=0 (the leaf level). btr_cur_t::pessimistic_search_leaf(): Implement the new BTR_MODIFY_TREE latching logic in the case that page splits or merges will be needed. The parent pages (and their siblings) should already be latched on the first dive to the leaf and be present in mtr_t::m_memo; there should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost suffices; MDEV-29835 will have to revise it and remove work-arounds where mtr_t::get_already_latched() fails to find a block. rtr_search_to_nth_level(): A SPATIAL INDEX version of btr_search_to_nth_level() that can search to any level (including the leaf level). rtr_search_leaf(), rtr_insert_leaf(): Wrappers for rtr_search_to_nth_level(). rtr_search(): Replaces rtr_pcur_open(). rtr_cur_restore_position(): Remove an unused constant parameter. btr_pcur_open_on_user_rec(): Remove the constant parameter mode=PAGE_CUR_GE. btr_cur_latch_leaves(): Update a pre-existing mtr_t::m_memo entry for the current leaf page. row_ins_clust_index_entry_low(): Use a new mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC. btr_cur_t::open_leaf(): Some clean-up. mtr_t::lock_register(): Register a page latch on a buffer-fixed block. BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove. BTR_CONT_MODIFY_TREE: Note that this is only used by rtr_search_to_nth_level(). btr_pcur_optimistic_latch_leaves(): Replaces btr_cur_optimistic_latch_leaves(). ibuf_delete_rec(): Acquire ibuf.index->lock.u_lock() in order to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV). Tested by: Matthias Leich	2023-01-19 17:19:18 +02:00
Marko Mäkelä	0a7d85c97f	MDEV-30148 Race condition between non-persistent statistics and purge btr_cur_t::open_random_leaf(): Replaces btr_cur_open_at_rnd_pos(). Acquire a shared latch on each page, and finally release all latches except the one on the leaf page. This fixes a race condition between the purge of history and btr_estimate_number_of_different_key_vals(), which turned out to only hold a buffer-fix on the randomly chosen leaf page. Typically, an assertion would fail in page_rec_is_supremum(). ibuf_contract(): Start from the beginning of the change buffer, to simplify the logic. Starting with commit `b42294bc64` it does not matter much where the change buffer merge is being initiated. The race condition may have been introduced as early as mysql/mysql-server@ac74632293 from where it was copied to commit `2e814d4702`. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2022-12-05 18:00:22 +02:00
Marko Mäkelä	fdc582fd98	Merge 10.5 into 10.6	2022-11-28 12:20:17 +02:00
Marko Mäkelä	db14eb16f9	MDEV-30106 InnoDB fails to validate the change buffer on startup ibuf_init_at_db_start(): Validate the change buffer root page. A later version may stop creating a change buffer, and this validation check will prevent a downgrade from such later versions. ibuf_max_size_update(): If the change buffer was not loaded, do nothing. dict_boot(): Merge the local variable "error" to "err". Ignore failures of ibuf_init_at_db_start() if innodb_force_recovery>=4.	2022-11-28 11:34:22 +02:00
Marko Mäkelä	6d40274f65	Merge 10.5 into 10.6	2022-11-23 18:13:28 +02:00
Marko Mäkelä	165564d3c3	MDEV-30009 InnoDB shutdown hangs when the change buffer is corrupted The InnoDB change buffer (ibuf.index, stored in the system tablespace) and the change buffer bitmaps in persistent tablespaces could get out of sync with each other: According to the bitmap, no changes exist for a page, while there actually exist buffered entries in ibuf.index. InnoDB performs lazy deletion of buffered changes. When a secondary index leaf page is freed (possibly as part of DROP INDEX), any buffered changes will not be deleted. Instead, they would be deleted on a subsequent buf_page_create_low(). One scenario where InnoDB failed to delete buffered changes is as follows: 1. Some changes were buffered for a secondary index leaf page. 2. The index page had been freed. 3. ibuf_read_merge_pages() invoked ibuf_merge_or_delete_for_page(), which noticed that the page had been freed, and reset the change buffer bits, but did not delete the records from ibuf.index. 4. The index page was reallocated for something else. 5. The index page was removed from the buffer pool. 6. Some changes were buffered for the newly created page. 7. Finally, the buffered changes from both 1. and 6. were merged. 8. The index is corrupted. An alternative outcome is: 4. Shutdown with innodb_fast_shutdown=0 gets into an infinite loop. An alternative scenario is: 3. ibuf_set_bitmap_for_bulk_load() reset the IBUF_BITMAP_BUFFERED bit but did not delete the ibuf.index records for that page number. The shutdown hang was already once fixed in commit `d7a2401750`, refactored for 10.5 in commit `77e8a311e1` and disabled in commit `310dff5d84` due to corruption. We will fix this as follows: ibuf_delete_recs(): Delete all ibuf.index entries for the specified page. ibuf_merge_or_delete_for_page(): When the change buffer bitmap bits were set and the page had been freed, and the page does not belong to ibuf.index itself, invoke ibuf_delete_recs(). This prevents the corruption from occurring when a DML operation is allocating a previously freed page for which changes had been buffered. ibuf_set_bitmap_for_bulk_load(): When the change buffer bitmap bits were set, invoke ibuf_delete_recs(). This prevents the corruption from occurring when CREATE INDEX is reusing a previously freed page. ibuf_read_merge_pages(): On slow shutdown, remove the orphan records by invoking ibuf_delete_recs(). This fixes the hang when the change buffer had become corrupted. We also remove the dops[] accounting, because nothing can monitor it during shutdown. We invoke ibuf_delete_recs() if: (a) buf_page_get_gen() failed to load the page or merge changes (b) the page is not a valid index leaf page (c) the page number is out of tablespace bounds srv_shutdown(): Invoke ibuf_max_size_update(0) to ensure that the race condition that motivated us to disable the code in ibuf_read_merge_pages() in commit `310dff5d84` is no longer possible. That is, during slow shutdown, both the rollback of transactions and the purge of history will return early from ibuf_insert_low(). ibuf_merge_space(), ibuf_delete_for_discarded_space(): Cleanup: Do not allocate a memory heap. This was implemented by Thirunarayanan Balathandayuthapani and tested with innodb_change_buffering_debug=1 by Matthias Leich.	2022-11-23 17:34:05 +02:00
Marko Mäkelä	24fe53477c	MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin	2022-11-17 08:19:01 +02:00
Marko Mäkelä	89ec4b53ac	MDEV-29603: Implement btr_cur_t::open_leaf() btr_cur_t::open_leaf(): Replaces btr_cur_open_at_index_side() for most calls, except dict_stats_analyze_index(), which is the only place where we need to open a page at the non-leaf level. Use btr_block_get() for better error handling. Also, use the enumeration type btr_latch_mode wherever possible. Reviewed by: Vladislav Lesin	2022-11-16 09:43:48 +02:00
Marko Mäkelä	6b2d6a81d4	Cleanup: Remove btr_pcur_t::old_stored The Boolean field btr_pcur_t::old_stored mostly duplicates old_rec. Let us remove it.	2022-11-16 09:36:36 +02:00
Marko Mäkelä	af7bd64edb	Cleanup: Use BTR_INSERT_TREE for pessimistic insert	2022-11-16 09:36:36 +02:00
Marko Mäkelä	b363d9758a	Cleanup: Use the alias BTR_PURGE_TREE for pessimistic delete	2022-11-16 09:36:36 +02:00
Marko Mäkelä	ae6ebafd81	Merge 10.5 into 10.6	2022-11-14 15:44:55 +02:00
Marko Mäkelä	dc2741be52	MDEV-29984 innodb_fast_shutdown=0 fails to report change buffer merge progress ibuf.size, ibuf.max_size: Changed the type to Atomic_relaxed<ulint> in order to fix some (not all) race conditions. ibuf_contract(): Renamed from ibuf_merge_pages(ulint*). ibuf_merge(), ibuf_merge_all(): Removed. srv_shutdown(): Invoke log_free_check() and ibuf_contract(). Even though ibuf_contract() is not writing anything, it will trigger calls of ibuf_merge_or_delete_for_page(), which will write something. Because we cannot invoke log_free_check() at that low level, we must invoke it at the high level. srv_shutdown_print(): Replaces srv_shutdown_print_master_pending(). Report progress and remaining work every 15 seconds. For the change buffer merge, the remaining work is indicated by ibuf.size.	2022-11-14 15:40:28 +02:00
Marko Mäkelä	2ac1edb1c3	Merge 10.5 into 10.6	2022-11-08 17:37:22 +02:00
Marko Mäkelä	a732d5e2ba	Merge 10.4 into 10.5	2022-11-08 17:01:28 +02:00
Marko Mäkelä	93b4f84ab2	Merge 10.3 into 10.4	2022-11-08 16:04:01 +02:00
Marko Mäkelä	b737d09dbc	MDEV-29905 Change buffer operations fail to check for log file overflow Every operation that is going to write redo log is supposed to invoke log_free_check() before acquiring any latches. If there is a risk of log buffer overrun, a log checkpoint would be triggered by that call. ibuf_merge_space(), ibuf_merge_in_background(), ibuf_delete_for_discarded_space(): Invoke log_free_check() when the current thread is not holding any page latches. Unfortunately, in lower-level code called from ibuf_insert() or ibuf_merge_or_delete_for_page(), some page latches may be held and a call to log_free_check() could hang. ibuf_set_bitmap_for_bulk_load(): Use the caller's mini-transaction. The caller should have invoked log_free_check() while not holding any page latches.	2022-11-08 11:37:43 +02:00
Marko Mäkelä	1e8189fcee	MDEV-13564 follow-up: Correct a bogus comment This fixes up commit `e3c39c0be8`	2022-11-08 10:55:22 +02:00
Marko Mäkelä	63478e72de	MDEV-21098: Assertion failure in rec_get_offsets_func() The function rec_get_offsets_func() used to hit ut_error due to an invalid rec_get_status() value of a ROW_FORMAT!=REDUNDANT record. This fix is twofold: We will not only avoid a crash on corruption in this case, but we will also make more effort to validate each record every time we are iterating over index page records. rec_get_offsets_func(): Do not crash on a corrupted record. page_rec_get_nth(): Return nullptr on error. page_dir_slot_get_rec_validate(): Like page_dir_slot_get_rec(), but validate the pointer and return nullptr on error. page_cur_search_with_match(), page_cur_search_with_match_bytes(), page_dir_split_slot(), page_cur_move_to_next(): Indicate failure in a return value. page_cur_search(): Replaced with page_cur_search_with_match(). rec_get_next_ptr_const(), rec_get_next_ptr(): Replaced with page_rec_get_next_low(). TODO: rtr_page_split_initialize_nodes(), rtr_update_mbr_field(), and possibly other SPATIAL INDEX functions fail to properly handle errors. Reviewed by: Thirunarayanan Balathandayuthapani Tested by: Matthias Leich Performance tested by: Axel Schwenke	2022-08-01 11:25:50 +03:00
Marko Mäkelä	77b3959b5c	MDEV-28457 Crash in page_dir_find_owner_slot() A prominent remaining source of crashes on corrupted index pages is page directory corruption. A frequent caller of page_dir_find_owner_slot() is page_rec_get_prev(). Some of those calls can be replaced with simpler logic that is less prone to fail. page_dir_find_owner_slot(), page_rec_get_prev(), page_rec_get_prev_const(), btr_pcur_move_to_prev(), btr_pcur_move_to_prev_on_page(), btr_cur_upd_rec_sys(), page_delete_rec_list_end(), rtr_page_copy_rec_list_end_no_locks(), rtr_page_copy_rec_list_start_no_locks(): Return an error code on failure. fil_space_t::io(), buf_page_get_low(): Use DB_CORRUPTION for out-of-bounds page reads. PageBulk::getSplitRec(), PageBulk::copyOut(): Simplify the code. btr_validate_level(): Prevent some more CHECK TABLE crashes on corrupted pages. btr_block_get(), btr_pcur_move_to_next_page(): Implement some checks that were previously only part of IndexPurge::next(). IndexPurge::next(): Use btr_pcur_move_to_next_page().	2022-06-08 14:53:24 +03:00
Marko Mäkelä	0b47c126e3	MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit `177d8b0c12` is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.	2022-06-06 14:03:22 +03:00
Marko Mäkelä	bc113b873f	MDEV-28465 Some calls to btr_pcur_close() are unnecessary The function btr_pcur_close() is being invoked on local variables even when no cleanup needs to be done. In particular, for B-tree indexes (not SPATIAL INDEX), unless btr_pcur_store_position() was invoked in the past, there is no need to invoke btr_pcur_close(). On purge and rollback, we will retain btr_pcur_close(&pcur) because otherwise some ./mtr --suite=innodb_gis tests would leak memory.	2022-05-03 17:49:26 +03:00
Marko Mäkelä	caa985e6e6	MDEV-28454 Latching order violation (and hang) in ibuf_insert_low Since commit `2ca1123464` (MDEV-26217) the function trx_t::commit(std::vector<pfs_os_file_t>&) holds exclusive lock_sys.latch while invoking fil_delete_tablespace(), which in turn may wait for change buffer tree latches in ibuf_delete_for_discarded_space(). ibuf_insert_low(): If a shared lock_sys.latch cannot be acquired without waiting, refuse to buffer the insert. In this way, a deadlock with a DDL operation will be avoided. ibuf_insert_to_index_page(), ibuf_delete(): Remove redundant calls to record locking. In ibuf_insert_low() we already ensured that no record locks existed on the page. No locks can be added before the buffered changes have been merged.	2022-05-03 13:31:40 +03:00
Marko Mäkelä	2ca1123464	MDEV-26217 Failing assertion: list.count > 0 in ut_list_remove or Assertion `lock->trx == this' failed in dberr_t trx_t::drop_table This follows up the previous fix in commit `c3c53926c4` (MDEV-26554). ha_innobase::delete_table(): Work around the insufficient metadata locking (MDL) during DML operations by acquiring exclusive InnoDB table locks on all child tables. Previously, this was only done on TRUNCATE and ALTER. ibuf_delete_rec(), btr_cur_optimistic_delete(): Do not invoke lock_update_delete() during change buffer operations. The revised trx_t::commit(std::vector<pfs_os_file_t>&) will hold exclusive lock_sys.latch while invoking fil_delete_tablespace(), which in turn may invoke ibuf_delete_rec(). dict_index_t::has_locking(): A new predicate, replacing the dummy !dict_table_is_locking_disabled(index->table). Used for skipping lock operations during ibuf_delete_rec(). trx_t::commit(std::vector<pfs_os_file_t>&): Release the locks and remove the table from the cache while holding exclusive lock_sys.latch. trx_t::commit_in_memory(): Skip release_locks() if dict_operation holds. trx_t::commit(): Reset dict_operation before invoking commit_in_memory() via commit_persist(). lock_release_on_drop(): Release locks while lock_sys.latch is exclusively locked. lock_table(): Add a parameter for a pointer to the table. We must not dereference the table before a lock_sys.latch has been acquired. If the pointer to the table does not match the table at that point, the table is invalid and DB_DEADLOCK will be returned. row_ins_foreign_check_on_constraint(): Improve the checks. Remove a bogus DB_LOCK_WAIT_TIMEOUT return that was needed before commit `c5fd9aa562` (MDEV-25919). row_upd_check_references_constraints(), wsrep_row_upd_check_foreign_constraints(): Simplify checks.	2022-04-26 18:09:03 +03:00
Marko Mäkelä	fae0ccad6e	Merge 10.5 into 10.6	2022-04-21 17:46:40 +03:00
Marko Mäkelä	620c55e708	Merge 10.4 into 10.5	2022-04-21 15:33:50 +03:00
Marko Mäkelä	394784095e	Merge 10.3 into 10.4	2022-04-21 11:33:59 +03:00
Marko Mäkelä	4730314a70	MDEV-28369 ibuf_bitmap_mutex is an unnecessary contention point The only purpose of ibuf_bitmap_mutex is to prevent a deadlock between two concurrent invocations of ibuf_update_free_bits_for_two_pages_low() on the same pair of bitmap pages, but in opposite order. The mutex is unnecessarily serializing the execution of the function even when it is being invoked on totally different tablespaces. To avoid deadlocks, it suffices to ensure that the two page latches are being acquired in a deterministic (sorted) order.	2022-04-21 09:15:18 +03:00
Marko Mäkelä	7b957316cb	Merge 10.3 into 10.4	2022-04-07 10:32:56 +03:00
Jan Lindström	3c99a48db3	MDEV-28247 : Disable background ibuf merge during Galera SST This failure was caused by MDEV-25975, which removed the parameter innodb_disallow_writes. Added a check for wsrep_sst_disable_writes to the function ibuf_merge_in_background().	2022-04-07 08:45:01 +03:00
Vlad Lesin	f6f055a191	Merge 10.3 into 10.4	2022-02-21 14:10:27 +03:00
Vlad Lesin	a6f258e47f	MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration Backported from 10.5 `20e9e804c1` and `5948d7602e`. sel_restore_position_for_mysql() moves forward persistent cursor position after btr_pcur_restore_position() call if cursor relative position is BTR_PCUR_ON and the cursor points to the record with NOT the same field values as in a stored record(and some other not important for this case conditions). It was done because btr_pcur_restore_position() sets page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON before opening cursor. So we are searching for the record less or equal to stored one. And if the found record is not equal to stored one, then it is less and we need to move cursor forward. But there can be a situation when the stored record was purged, but the new one with the same key but different value was inserted while row_search_mvcc() was suspended. In this case, when the thread is awaken, it will invoke sel_restore_position_for_mysql(), which, in turns, invoke btr_pcur_restore_position(), which will return false because found record don't match stored record, and sel_restore_position_for_mysql() will move forward cursor position. The above can lead to the case when awaken row_search_mvcc() do not see records inserted by other transactions while it slept. The mtr test case shows the example how it can be. The fix is to return special value from persistent cursor restoring function which would notify its caller that uniq fields of restored record and stored record are the same, and in this case sel_restore_position_for_mysql() don't move cursor forward. Delete-marked records are correctly processed in row_search_mvcc(). Non-unique secondary indexes are "uniquified" by adding the PK, the index->n_uniq should then be index->n_fields. So there is no need in additional checks in the fix. If transaction's readview can't see the changes made in secondary index record, it requests clustered index record in row_search_mvcc() to check its transaction id and get the correspondent record version. After this row_search_mvcc() commits mtr to preserve clustered index latching order, and starts mtr. Between those mtr commit and start secondary index pages are unlatched, and purge has the ability to remove stored in the cursor record, what causes rows duplication in result set for non-locking reads, as cursor position is restored to the previously visited record. To solve this the changes are just switched off for non-locking reads, it's quite simple solution, besides the changes don't make sense for non-locking reads. The more complex and effective from performance perspective solution is to create mtr savepoint before clustered record requesting and rolling back to that savepoint after that. See MDEV-27557. One more solution is to have per-record transaction id for secondary indexes. See MDEV-17598. If any of those is implemented, just remove select_lock_type argument in sel_restore_position_for_mysql().	2022-02-21 12:49:54 +03:00

1 2 3 4 5 ...

383 commits