In commit 03ca6495df and
commit ff5d306e29
we forgot to remove some Google copyright notices related to
a contribution of using atomic memory access in the old InnoDB
mutex_t and rw_lock_t implementation.
The copyright notices had been mostly added in
commit c6232c06fa
due to commit a1bb700fd2.
The following Google contributions remain:
* some logic related to the parameter innodb_io_capacity
* innodb_encrypt_tables, added in MariaDB Server 10.1
The main problem is that ever since
commit aaef2e1d8c removed the
function buf_wait_for_read(), it is not safe to invoke
buf_page_get_low() with RW_NO_LATCH, that is, only buffer-fixing
the page. If a page read (or decryption or decompression) is in
progress, there would be a race condition when executing consistency
checks, and a page would wrongly be flagged as corrupted.
Furthermore, if the page is actually corrupted and the initial
access to it was with RW_NO_LATCH (only buffer-fixing), the
page read handler would likely end up in an infinite loop in
buf_pool_t::corrupted_evict(). It is not safe to invoke
mtr_t::upgrade_buffer_fix() on a block on which a page latch
was not initially acquired in buf_page_get_low().
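As a rough standalone illustration (plain C++ with made-up names, not the
actual InnoDB interfaces) of why such checks must wait for the page latch
rather than rely on a buffer-fix alone: the thread completing the read holds
the exclusive page latch until the frame is fully initialized, so a checker
that takes the shared latch can never observe a half-read page.

    #include <cassert>
    #include <shared_mutex>
    #include <thread>
    #include <vector>

    // Hypothetical stand-in for a buffer pool page: a latch and a frame
    // that remains empty until the simulated read has completed.
    struct page
    {
      std::shared_mutex latch;           // analogue of block->page.lock
      std::vector<unsigned char> frame;
    };

    // A consistency check is only safe after acquiring the page latch,
    // which the read completion releases.  With a mere buffer-fix we could
    // observe the frame while it is still being filled and wrongly flag
    // the page as corrupted.
    static bool validate(page &p)
    {
      std::shared_lock<std::shared_mutex> s(p.latch);
      return p.frame.size() == 16384;
    }

    int main()
    {
      page p;
      p.latch.lock();                    // read in progress: X-latch held
      std::thread checker([&] { assert(validate(p)); });
      p.frame.assign(16384, 0);          // simulate read/decrypt/decompress
      p.latch.unlock();                  // only now may checkers see the frame
      checker.join();
    }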
btr_block_reget(): Remove the constant parameter rw_latch=RW_X_LATCH.
btr_block_get(): Assert that RW_NO_LATCH is not being used,
and change the parameter type of rw_latch.
btr_pcur_move_to_next_page(), innobase_table_is_empty(): Adjust for the
parameter type change of btr_block_get().
btr_root_block_get(): If mode==RW_NO_LATCH, do not check the integrity of
the page, because it is not safe to do so.
btr_page_alloc_low(), btr_page_free(): If the root page latch is not
previously held by the mini-transaction, invoke btr_root_block_get()
again with the proper latching mode.
btr_latch_prev(): Helper function to safely acquire a latch on a
preceding sibling page while holding a latch on a B-tree page.
To avoid deadlocks, we must not wait for the latch while holding
a latch on the current page, because another thread may be waiting
for our page latch when moving to the next page from our preceding
sibling page. If s_lock_try() or x_lock_try() on the preceding page fails,
we must release the current page latch, and wait for the latch on the
preceding page as well as the current page, in that order.
Page splits or merges will be prevented by the parent page latch
that we are holding.
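A minimal sketch of this try-then-back-off pattern, using std::shared_mutex
in place of the InnoDB page latches (the names and signature are
illustrative, not the real btr_latch_prev()):

    #include <shared_mutex>

    // Illustrative stand-in for an InnoDB page latch.
    using page_latch = std::shared_mutex;

    // Latch the preceding sibling while already holding 'current' (shared).
    // Returns true if 'current' had to be released and re-acquired, in
    // which case the caller must re-validate the current page.  The parent
    // page latch, held throughout by the caller, prevents splits and merges.
    static bool latch_prev(page_latch &prev, page_latch &current)
    {
      if (prev.try_lock_shared())        // s_lock_try() analogue: never wait here
        return false;
      // Waiting for 'prev' while holding 'current' could deadlock with a
      // thread that holds 'prev' and waits for 'current' (left-to-right
      // traversal).  Release, then wait in the correct order: prev, current.
      current.unlock_shared();
      prev.lock_shared();
      current.lock_shared();
      return true;
    }

    int main()
    {
      page_latch prev, cur;
      cur.lock_shared();
      latch_prev(prev, cur);             // fast path: try_lock_shared() succeeds
      prev.unlock_shared();
      cur.unlock_shared();
    }

In the slow path the caller must re-validate the current page after
re-latching; the parent page latch that remains held throughout is what
keeps the sibling relationship stable in the meantime.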
btr_cur_t::search_leaf(): Make use of btr_latch_prev().
btr_cur_t::open_leaf(): Make use of btr_latch_prev(). Do not invoke
mtr_t::upgrade_buffer_fix() (when latch_mode == BTR_MODIFY_TREE),
because we will already have acquired all page latches upfront.
btr_cur_t::pessimistic_search_leaf(): Do acquire an exclusive index latch
before accessing the page. Make use of btr_latch_prev().
btr_search_hash_table_validate(), btr_search_validate(): Add the
parameter THD for checking if the statement has been killed.
Any non-QUICK CHECK TABLE will validate the entire adaptive hash index
for all InnoDB tables, which may be extremely slow when running
multiple concurrent CHECK TABLE.
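A hypothetical sketch of the kind of periodic kill check this parameter
enables inside a long validation loop (generic C++; the real code consults
the THD through the server's kill-statement interface):

    #include <cstddef>
    #include <functional>

    // Hypothetical stand-ins: in the server, a THD* would be passed down
    // and consulted; a plain callback suffices for illustration.
    struct session { std::function<bool()> killed; };

    enum validate_result { VALID, INTERRUPTED };

    // Validate a large structure cell by cell, bailing out early when the
    // statement has been killed, so a slow CHECK TABLE can be cancelled.
    static validate_result validate_cells(const session &s, std::size_t n_cells)
    {
      for (std::size_t i = 0; i < n_cells; i++)
      {
        if (i % 1024 == 0 && s.killed())   // cheap periodic check
          return INTERRUPTED;
        /* ... validate cell i ... */
      }
      return VALID;
    }

    int main()
    {
      session s{[] { return false; }};
      return validate_cells(s, 1u << 20) == VALID ? 0 : 1;
    }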
The issue is caused by the MDEV-30400 fix.
There are two cursors in btr_estimate_n_rows_in_range() - p1 and p2 - and
both share the same mtr. Each cursor stores an mtr savepoint for the
previously fetched block, so that the block can be released when the
current block is fetched.
Before MDEV-30400 the block was released with
mtr_t::release_block_at_savepoint(), which just unfixed the block and
released its page latch. In MDEV-30400 it was replaced with
mtr_t::rollback_to_savepoint(ulint begin, ulint end), which does the same
as the former mtr_t::release_block_at_savepoint() but also erases the
corresponding slots from the mtr memo, which invalidates any stored memo
savepoints greater than or equal to "begin".
The idea of the fix is to get rid of savepoints altogether in
btr_estimate_n_rows_in_range() and
btr_estimate_n_rows_in_range_on_level(). Because
mtr_t::rollback_to_savepoint() erases elements from mtr_t::m_memo, we
know which element of mtr_t::m_memo can be deleted in each case,
so there is no need to store savepoints.
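A toy example of why erasing memo slots invalidates later savepoints
(savepoints being nothing more than indexes into the memo):

    #include <cassert>
    #include <cstddef>
    #include <vector>

    int main()
    {
      // A mini-transaction memo with slots referred to by index (savepoint).
      std::vector<int> memo{10, 11, 12, 13};
      std::size_t savepoint = 3;          // refers to the slot holding 13

      // release_block_at_savepoint() only cleared a slot in place, so stored
      // indexes of later slots remained valid.  rollback_to_savepoint(1, 2)
      // also erases the slot, shifting everything behind it:
      memo.erase(memo.begin() + 1, memo.begin() + 2);

      assert(memo[2] == 13);              // the block moved down...
      assert(savepoint >= memo.size());   // ...so the old savepoint is stale
    }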
See also the following slides for details:
https://docs.google.com/presentation/d/1RFYBo7EUhM22ab3GOYctv3j_3yC0vHtBY9auObZec8U
Reviewed by: Marko Mäkelä
btr_cur_need_opposite_intention(): Check also page_zip_available()
so that we will escalate to exclusive index latch when a non-leaf
page may have to be split further due to ROW_FORMAT=COMPRESSED page
overflow.
Tested by: Matthias Leich
row_merge_buf_add(): Has a strict assertion that a fixed-length mismatch
must not happen while rebuilding a ROW_FORMAT=REDUNDANT table.
btr_index_rec_validate(): A fixed-size column can be stored externally,
so the sum of the inline stored length and the externally stored length
of the column should equal the total column length.
btr_cur_upd_rec_in_place(): Avoid calling page_zip_write_rec() if we
are not modifying any fields that are stored in compressed format.
btr_cur_update_in_place_zip_check(): New function to check if a
ROW_FORMAT=COMPRESSED record can actually be updated in place.
btr_cur_pessimistic_update(): If the BTR_KEEP_POS_FLAG is not set
(we are in a ROLLBACK and cannot write any BLOBs), ignore the potential
overflow and let page_zip_reorganize() or page_zip_compress() handle it.
This avoids a failure when an attempted UPDATE of a NULL column to 0 is
rolled back. During the ROLLBACK, we would try to move a non-updated
long column to off-page storage in order to avoid a compression failure
of the ROW_FORMAT=COMPRESSED page.
page_zip_write_trx_id_and_roll_ptr(): Remove an assertion that would fail
in row_upd_rec_in_place() because the uncompressed page would already
have been modified there.
This is a 10.5 version of commit ff3d4395d8
(different because of commit 08ba388713).
row_upd_rec_in_place(): Avoid calling page_zip_write_rec() if we
are not modifying any fields that are stored in compressed format.
btr_cur_update_in_place_zip_check(): New function to check if a
ROW_FORMAT=COMPRESSED record can actually be updated in place.
btr_cur_pessimistic_update(): If the BTR_KEEP_POS_FLAG is not set
(we are in a ROLLBACK and cannot write any BLOBs), ignore the potential
overflow and let page_zip_reorganize() or page_zip_compress() handle it.
This avoids a failure when an attempted UPDATE of a NULL column to 0 is
rolled back. During the ROLLBACK, we would try to move a non-updated
long column to off-page storage in order to avoid a compression failure
of the ROW_FORMAT=COMPRESSED page.
page_zip_write_trx_id_and_roll_ptr(): Remove an assertion that would fail
in row_upd_rec_in_place() because the uncompressed page would already
have been modified there.
Thanks to Jean-François Gagné for providing a copy of a page that
triggered these bugs on the ROLLBACK of UPDATE and DELETE.
A 10.6 version of this was tested by Matthias Leich using
cmake -DWITH_INNODB_EXTRA_DEBUG=ON a.k.a. UNIV_ZIP_DEBUG.
For more convenient monitoring of something that could greatly affect
the volume of page writes, we add the status variable
Innodb_buffer_pool_pages_split that was previously only available
via information_schema.innodb_metrics as "innodb_page_splits".
This was suggested by Axel Schwenke.
buf_flush_page_count: Replaced with buf_pool.stat.n_pages_written.
We protect buf_pool.stat (except n_page_gets) with buf_pool.mutex
and remove unnecessary export_vars indirection.
buf_pool.flush_list_bytes: Moved from buf_pool.stat.flush_list_bytes.
Protected by buf_pool.flush_list_mutex.
buf_pool_t::page_cleaner_status: Replaces buf_pool_t::n_flush_LRU_,
buf_pool_t::n_flush_list_, and buf_pool_t::page_cleaner_is_idle.
Protected by buf_pool.flush_list_mutex. We will exclusively broadcast
buf_pool.done_flush_list by the buf_flush_page_cleaner thread,
and only wait for it when communicating with buf_flush_page_cleaner.
There is no need to keep a count of pending writes by the
buf_pool.flush_list processing. A single flag suffices for that.
Waits for page write completion can be performed by
simply waiting on block->page.lock, or by invoking
buf_dblwr.wait_for_page_writes().
buf_LRU_block_free_non_file_page(): Broadcast buf_pool.done_free and
set buf_pool.try_LRU_scan when freeing a page. This will also be
executed as part of buf_page_write_complete().
buf_page_write_complete(): Do not broadcast buf_pool.done_flush_list,
and do not acquire buf_pool.mutex unless buf_pool.LRU eviction is needed.
Let buf_dblwr count all writes to persistent pages and broadcast a
condition variable when no outstanding writes remain.
buf_flush_page_cleaner(): Prioritize LRU flushing and eviction right after
"furious flushing" (lsn_limit). Simplify the conditions and reduce the
hold time of buf_pool.flush_list_mutex. Refuse to shut down
or sleep if buf_pool.ran_out(), that is, LRU eviction is needed.
buf_pool_t::page_cleaner_wakeup(): Add the optional parameter for_LRU.
buf_LRU_get_free_block(): Protect buf_lru_free_blocks_error_printed
with buf_pool.mutex. Invoke buf_pool.page_cleaner_wakeup(true) to
ensure that buf_flush_page_cleaner() will process the LRU flush
request.
buf_do_LRU_batch(), buf_flush_list(), buf_flush_list_space():
Update buf_pool.stat.n_pages_written when submitting writes
(while holding buf_pool.mutex), not when completing them.
buf_page_t::flush(), buf_flush_discard_page(): Require that
the page U-latch be acquired upfront, and remove
buf_page_t::ready_for_flush().
buf_pool_t::delete_from_flush_list(): Remove the parameter "bool clear".
buf_flush_page(): Count pending page writes via buf_dblwr.
buf_flush_try_neighbors(): Take the block of page_id as a parameter.
If the tablespace is dropped before our page has been written out,
release the page U-latch.
buf_pool_invalidate(): Let the caller ensure that there are no
outstanding writes.
buf_flush_wait_batch_end(false),
buf_flush_wait_batch_end_acquiring_mutex(false):
Replaced with buf_dblwr.wait_for_page_writes().
buf_flush_wait_LRU_batch_end(): Replaces buf_flush_wait_batch_end(true).
buf_flush_list(): Remove some broadcast of buf_pool.done_flush_list.
buf_flush_buffer_pool(): Invoke also buf_dblwr.wait_for_page_writes().
buf_pool_t::io_pending(), buf_pool_t::n_flush_list(): Remove.
Outstanding writes are reflected by buf_dblwr.pending_writes().
buf_dblwr_t::init(): New function, to initialize the mutex and
the condition variables, but not the backing store.
buf_dblwr_t::is_created(): Replaces buf_dblwr_t::is_initialised().
buf_dblwr_t::pending_writes(), buf_dblwr_t::writes_pending:
Keeps track of writes of persistent data pages.
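A minimal sketch (illustrative names, not the actual buf_dblwr_t interface)
of the counter-plus-condition-variable pattern described here: every
submitted page write increments the counter, every completion decrements
it and broadcasts once no outstanding writes remain.

    #include <condition_variable>
    #include <mutex>
    #include <thread>

    // Illustrative pending-write tracker.
    class pending_writes
    {
      std::mutex m;
      std::condition_variable cv;
      unsigned n = 0;
    public:
      void submit()   { std::lock_guard<std::mutex> g(m); ++n; }
      void complete()
      {
        std::lock_guard<std::mutex> g(m);
        if (!--n)
          cv.notify_all();          // broadcast: no outstanding writes remain
      }
      void wait_for_page_writes()   // block until all submitted writes completed
      {
        std::unique_lock<std::mutex> g(m);
        cv.wait(g, [this] { return n == 0; });
      }
    };

    int main()
    {
      pending_writes dblwr;
      dblwr.submit();
      std::thread io([&] { dblwr.complete(); });   // simulated write completion
      dblwr.wait_for_page_writes();
      io.join();
    }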
buf_flush_LRU(): Allow calls while LRU flushing may be in progress
in another thread.
Tested by Matthias Leich (correctness) and Axel Schwenke (performance)
This is a follow-up to
commit de4030e4d4 (MDEV-30400),
which fixed some hangs related to B-tree split or merge.
btr_root_block_get(): Use and update the root page guess. This is just
a minor performance optimization, not affecting correctness.
btr_validate_level(): Remove the parameter "lockout", and always
acquire an exclusive dict_index_t::lock in CHECK TABLE without QUICK.
This is needed in order to avoid latching order violation in
btr_page_get_father_node_ptr_for_validate().
btr_cur_need_opposite_intention(): Return true in case
btr_cur_compress_recommendation() would hold later during the
mini-transaction, or if a page underflow or overflow is possible.
If we return true, our caller will escalate to acquiring an exclusive
dict_index_t::lock, to prevent a latching order violation and deadlock
during btr_compress() or btr_page_split_and_insert().
btr_cur_t::search_leaf(), btr_cur_t::open_leaf():
Also invoke btr_cur_need_opposite_intention() on the leaf page.
btr_cur_t::open_leaf(): When escalating to exclusive index locking,
acquire exclusive latches on all pages as well.
innobase_instant_try(): Return an error code if the root page cannot
be retrieved.
In addition to the normal stress testing with Random Query Generator (RQG)
this has been tested with
./mtr --mysqld=--loose-innodb-limit-optimistic-insert-debug=2
but with the injection in btr_cur_optimistic_insert() for non-leaf pages
adjusted so that it would use the value 3. (Otherwise, infinite page
splits could occur in some mtr tests.)
Tested by: Matthias Leich
btr_cur_t::open_leaf(): When we have to reopen the root page in
a different mode, ensure that we will actually acquire a latch upfront,
instead of using RW_NO_LATCH. This prevents a race condition where
the index tree would be split between the time we released the
root page S latch and finally acquired a latch in
mtr->upgrade_buffer_fix(), actually on a non-leaf root page.
This race condition was introduced in
commit 89ec4b53ac (MDEV-29603).
This also fixes part of MDEV-29835 Partial server freeze
which is caused by violations of the latching order that was
defined in https://dev.mysql.com/worklog/task/?id=6326
(WL#6326: InnoDB: fix index->lock contention). Unless the
current thread is holding an exclusive dict_index_t::lock,
it must acquire page latches in a strict parent-to-child,
left-to-right order. Not all cases of MDEV-29835 are fixed yet.
Failure to follow the correct latching order will cause deadlocks
of threads due to lock order inversion.
As part of these changes, the BTR_MODIFY_TREE mode is modified
so that an Update latch (U a.k.a. SX) will be acquired on the
root page, and eXclusive latches (X) will be acquired on all pages
leading to the leaf page, as well as any left and right siblings
of the pages along the path. The DEBUG_SYNC test innodb.innodb_wl6326
will be removed, because at the time the DEBUG_SYNC point is hit,
the thread is actually holding several page latches that will be
blocking a concurrent SELECT statement.
We also remove double bookkeeping that was caused by excessive
information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo
store information of latched pages, and ensure that
mtr_memo_slot_t::object is never a null pointer.
The tree_blocks[] and tree_savepoints[] were redundant.
buf_page_get_low(): If innodb_change_buffering_debug=1, to avoid
a hang, do not try to evict blocks if we are holding a latch on
a modified page. The test innodb.innodb-change-buffer-recovery
will be removed, because change buffering may no longer be forced
by debug injection when the change buffer comprises multiple pages.
Remove a debug assertion that could fail when
innodb_change_buffering_debug=1 fails to evict a page.
For other cases, the assertion is redundant, because we already
checked that right after the got_block: label. The test
innodb.innodb-change-buffering-recovery will be removed, because
due to this change, we will be unable to evict the desired page.
mtr_t::lock_register(): Register a change of a page latch
on an unmodified buffer-fixed block.
mtr_t::x_latch_at_savepoint(), mtr_t::sx_latch_at_savepoint():
Replaced by the use of mtr_t::upgrade_buffer_fix(), which now
also handles RW_S_LATCH.
mtr_t::set_modified(): For temporary tables, invoke
buf_page_t::set_modified() here and not in mtr_t::commit().
We will never set the MTR_MEMO_MODIFY flag on other than
persistent data pages, nor set mtr_t::m_modifications when
temporary data pages are modified.
mtr_t::commit(): Only invoke the buf_flush_note_modification() loop
if persistent data pages were modified.
mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo.
This avoids many redundant entries in mtr_t::m_memo, as well as
redundant calls to buf_page_get_gen() for blocks that had already
been looked up in a mini-transaction.
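A rough sketch (illustrative types, not the real mtr_t) of such a lookup
over the memo: a repeated request for the same block reuses the existing
slot instead of appending a duplicate and repeating the page_hash lookup.

    #include <cstdint>
    #include <vector>

    // Illustrative memo slot: which block is latched and how.
    struct memo_slot { const void *object; std::uint8_t type; };

    struct mini_transaction
    {
      std::vector<memo_slot> m_memo;

      // get_already_latched() analogue: the newest entries are the most
      // likely hits, so scan backwards; nullptr when not latched yet.
      memo_slot *get_already_latched(const void *block)
      {
        for (auto it = m_memo.rbegin(); it != m_memo.rend(); ++it)
          if (it->object == block)
            return &*it;
        return nullptr;
      }

      void latch(const void *block, std::uint8_t type)
      {
        if (memo_slot *s = get_already_latched(block))
          s->type |= type;                 // reuse the slot; maybe upgrade
        else
          m_memo.push_back({block, type}); // otherwise record a new latch
      }
    };

    int main()
    {
      int page;
      mini_transaction mtr;
      mtr.latch(&page, 1);
      mtr.latch(&page, 2);                 // no duplicate slot is added
      return mtr.m_memo.size() == 1 ? 0 : 1;
    }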
btr_get_latched_root(): Return a pointer to an already latched root page.
This replaces btr_root_block_get() in cases where the mini-transaction
has already latched the root page.
btr_page_get_parent(): Fetch a parent page that was already latched
in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched().
If needed, upgrade the root page U latch to X.
This avoids bloating mtr_t::m_memo as well as performing redundant
buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for
B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().
btr_cur_search_to_nth_level(): This will only be used for non-leaf
(level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE
or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be
removed altogether, or retained for the case of
CHECK TABLE without QUICK.
btr_cur_t::left_block: Remove. btr_pcur_move_backward_from_page()
can retrieve the left sibling from the end of mtr_t::m_memo.
btr_cur_t::open_leaf(): Some clean-up.
btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level()
for searches to level=0 (the leaf level). We will never release
parent page latches before acquiring leaf page latches. If we need to
temporarily release the level=1 page latch in the BTR_SEARCH_PREV or
BTR_MODIFY_PREV latch_mode, we will reposition the cursor on the
child node pointer so that we will land on the correct leaf page.
btr_cur_t::pessimistic_search_leaf(): Implement new BTR_MODIFY_TREE
latching logic in the case that page splits or merges will be needed.
The parent pages (and their siblings) should already be latched on
the first dive to the leaf and be present in mtr_t::m_memo; there
should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost
suffices; it must be revised in MDEV-29835 and work-arounds removed
for cases where mtr_t::get_already_latched() fails to find a block.
rtr_search_to_nth_level(): A SPATIAL INDEX version of
btr_search_to_nth_level() that can search to any level
(including the leaf level).
rtr_search_leaf(), rtr_insert_leaf(): Wrappers for
rtr_search_to_nth_level().
rtr_search(): Replaces rtr_pcur_open().
rtr_latch_leaves(): Replaces btr_cur_latch_leaves(). Note that unlike
in the B-tree code, there is no error handling in case the sibling
pages are corrupted.
rtr_cur_restore_position(): Remove an unused constant parameter.
btr_pcur_open_on_user_rec(): Remove the constant parameter
mode=PAGE_CUR_GE.
row_ins_clust_index_entry_low(): Use a new
mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page
when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC.
BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.
BTR_CONT_MODIFY_TREE: Note that this is only used by
rtr_search_to_nth_level().
btr_pcur_optimistic_latch_leaves(): Replaces
btr_cur_optimistic_latch_leaves().
ibuf_delete_rec(): Acquire exclusive ibuf.index->lock in order
to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).
btr_blob_log_check_t(): Acquire a U latch on the root page,
so that btr_page_alloc() in btr_store_big_rec_extern_fields()
will avoid a deadlock.
btr_store_big_rec_extern_fields(): Assert that the root page latch
is being held.
Tested by: Matthias Leich
Reviewed by: Vladislav Lesin
This also fixes part of MDEV-29835 Partial server freeze
which is caused by violations of the latching order that was
defined in https://dev.mysql.com/worklog/task/?id=6326
(WL#6326: InnoDB: fix index->lock contention). Unless the
current thread is holding an exclusive dict_index_t::lock,
it must acquire page latches in a strict parent-to-child,
left-to-right order. Not all cases are fixed yet. Failure to
follow the correct latching order will cause deadlocks of threads
due to lock order inversion.
As part of these changes, the BTR_MODIFY_TREE mode is modified
so that an Update latch (U a.k.a. SX) will be acquired on the
root page, and eXclusive latches (X) will be acquired on all pages
leading to the leaf page, as well as any left and right siblings
of the pages along the path. The test innodb.innodb_wl6326
will be removed, because at the time the DEBUG_SYNC point is hit,
the thread is actually holding several page latches that will be
blocking a concurrent SELECT statement.
We also remove double bookkeeping that was caused by excessive
information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo
store information of latched pages, and ensure that
mtr_memo_slot_t::object is never a null pointer.
The tree_blocks[] and tree_savepoints[] were redundant.
mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo.
This avoids many redundant entries in mtr_t::m_memo, as well as
redundant calls to buf_page_get_gen() for blocks that had already
been looked up in a mini-transaction.
btr_get_latched_root(): Return a pointer to an already latched root page.
This replaces btr_root_block_get() in cases where the mini-transaction
has already latched the root page.
btr_page_get_parent(): Fetch a parent page that was already latched
in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched().
If needed, upgrade the root page U latch to X.
This avoids bloating mtr_t::m_memo as well as redundant
buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for
B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().
btr_cur_search_to_nth_level(): This will only be used for non-leaf
(level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE
or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be
removed altogether, or retained for the case of
CHECK TABLE without QUICK.
btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level()
for searches to level=0 (the leaf level).
btr_cur_t::pessimistic_search_leaf(): Implement the new
BTR_MODIFY_TREE latching logic in the case that page splits
or merges will be needed. The parent pages (and their siblings)
should already be latched on the first dive to the leaf and be
present in mtr_t::m_memo; there should be no need for
BTR_CONT_MODIFY_TREE. This pre-latching almost suffices;
MDEV-29835 will have to revise it and remove work-arounds where
mtr_t::get_already_latched() fails to find a block.
rtr_search_to_nth_level(): A SPATIAL INDEX version of
btr_search_to_nth_level() that can search to any level
(including the leaf level).
rtr_search_leaf(), rtr_insert_leaf(): Wrappers for
rtr_search_to_nth_level().
rtr_search(): Replaces rtr_pcur_open().
rtr_cur_restore_position(): Remove an unused constant parameter.
btr_pcur_open_on_user_rec(): Remove the constant parameter
mode=PAGE_CUR_GE.
btr_cur_latch_leaves(): Update a pre-existing mtr_t::m_memo entry
for the current leaf page.
row_ins_clust_index_entry_low(): Use a new
mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page
when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC.
btr_cur_t::open_leaf(): Some clean-up.
mtr_t::lock_register(): Register a page latch on a buffer-fixed block.
BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.
BTR_CONT_MODIFY_TREE: Note that this is only used by
rtr_search_to_nth_level().
btr_pcur_optimistic_latch_leaves(): Replaces
btr_cur_optimistic_latch_leaves().
ibuf_delete_rec(): Acquire ibuf.index->lock.u_lock() in order
to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).
Tested by: Matthias Leich
btr_cur_t::open_random_leaf(): Replaces btr_cur_open_at_rnd_pos().
Acquire a shared latch on each page, and finally release all
latches except the one on the leaf page.
This fixes a race condition between the purge of history and
btr_estimate_number_of_different_key_vals(), which turned out
to only hold a buffer-fix on the randomly chosen leaf page.
Typically, an assertion would fail in page_rec_is_supremum().
ibuf_contract(): Start from the beginning of the change buffer,
to simplify the logic. Starting with
commit b42294bc64
it does not matter much where the change buffer merge is being initiated.
The race condition may have been introduced as early as
mysql/mysql-server@ac74632293
from where it was copied to
commit 2e814d4702.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
- The InnoDB adaptive hash index (AHI) tries to access a column that a
concurrent instant ALTER TABLE is changing, leading to an ASAN failure.
Instant ALTER TABLE should acquire the clustered index search latch in
exclusive mode before changing the cached table definition.
- Removed the default parameter for the function
btr_search_drop_page_hash_index()
- Addressed the -DWITH_INNODB_AHI=0 compilation failure
by passing two parameters from all callers of
btr_search_drop_page_hash_index()
btr_cur_t: Zero-initialize all fields in the default constructor.
btr_cur_t::index: Remove; it duplicated page_cur.index.
Many functions: Remove arguments that were duplicating
page_cur_t::index and page_cur_t::block.
page_cur_open_level(), btr_pcur_open_level(): Replaces
btr_cur_open_at_index_side() for dict_stats_analyze_index().
At the end, release all latches except the dict_index_t::lock
and the buf_page_t::lock on the requested page.
dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint()
to release all uninteresting page latches.
btr_search_guess_on_hash(): Simplify the logic, and invoke
mtr_t::rollback_to_savepoint().
We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo.
In this way, we can avoid setting mtr_memo_slot_t::object to nullptr
and instead just remove garbage from m_memo.
mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this
in dict_stats_analyze_index(), where we will release page latches and
only retain the index->lock in mtr_t::m_memo.
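A minimal sketch of the vector-based memo with savepoints (a savepoint is
simply the current size of the vector; illustrative types, not the real
mtr_t):

    #include <cassert>
    #include <cstddef>
    #include <vector>

    struct memo_slot { const void *object; };  // never a null pointer

    struct mini_transaction
    {
      std::vector<memo_slot> m_memo;

      std::size_t get_savepoint() const { return m_memo.size(); }

      // Release every latch acquired after the savepoint (newest first),
      // then shrink the vector; earlier entries such as the index lock stay.
      void rollback_to_savepoint(std::size_t savepoint)
      {
        while (m_memo.size() > savepoint)
        {
          /* release the latch recorded in m_memo.back() here */
          m_memo.pop_back();
        }
      }
    };

    int main()
    {
      int index_lock, page1, page2;
      mini_transaction mtr;
      mtr.m_memo.push_back({&index_lock});
      std::size_t sp = mtr.get_savepoint();
      mtr.m_memo.push_back({&page1});
      mtr.m_memo.push_back({&page2});
      mtr.rollback_to_savepoint(sp);       // drop page latches, keep index lock
      assert(mtr.m_memo.size() == 1);
    }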
mtr_t::release_last_page(): Release the last acquired page latch.
Replaces btr_leaf_page_release().
mtr_t::release(const buf_block_t&): Release a single page latch.
Used in btr_pcur_move_backward_from_page().
mtr_t::memo_release(): Replaced with mtr_t::release().
mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page.
This replaces the double bookkeeping in btr_cur_t::open_leaf().
Reviewed by: Vladislav Lesin
btr_cur_t::open_leaf(): Replaces btr_cur_open_at_index_side() for
most calls, except dict_stats_analyze_index(), which is the only
place where we need to open a page at the non-leaf level.
Use btr_block_get() for better error handling.
Also, use the enumeration type btr_latch_mode wherever possible.
Reviewed by: Vladislav Lesin
btr_cur_search_to_nth_level(): Simply acquire a latch on the already
buffer-fixed page. There is no need to release the buffer-fix and
re-lookup the page.
Starting with commit 8f8ba75855
the assertion would fail in
./mtr --mysqld=--innodb-adaptive-hash-index innodb.instant_alter_crash
and it would keep failing even after
commit d2e649aec2
This is a backport of commit 8b6a308e46
from MariaDB Server 10.6.11. No attempt was made to reproduce the hang
in a version of MariaDB Server earlier than 10.6.
In each caller of fseg_n_reserved_pages() except ibuf_init_at_db_start(),
which is a special case for ibuf.index at database startup, we must hold
an index latch that prevents concurrent allocation or freeing of index
pages.
Any operation that allocates or frees pages that belong to an index tree
must first acquire an index latch in non-shared mode, and while
holding that, acquire an index root page latch in non-shared mode.
btr_get_size(), btr_get_size_and_reserved(): Assert that a strong enough
index latch is being held.
dict_stats_update_transient_for_index(),
dict_stats_analyze_index(): Acquire a strong enough index latch.
These operations had followed the same order of acquiring latches in
every InnoDB version since the very beginning
(commit c533308a15).
The hang was introduced in
commit 2e814d4702 which imported
mysql/mysql-server@ac74632293
which failed to strengthen the locking requirements of the function
btr_get_size().
- During an ALTER operation on a ROW_FORMAT=COMPRESSED table, the page
split operation chooses the first record of the page as the split record,
which leads to an empty left page. This issue was caused by
commit 77b3959b5c (MDEV-28457).
page_rec_is_second(), page_rec_is_second_last(): Removed these functions
since they are dead code.
Every operation that is going to write redo log is supposed to
invoke log_free_check() before acquiring any latches. If there
is a risk of log buffer overrun, a log checkpoint would be
triggered by that call.
ibuf_merge_space(), ibuf_merge_in_background(),
ibuf_delete_for_discarded_space(): Invoke log_free_check()
when the current thread is not holding any page latches.
Unfortunately, in lower-level code called from ibuf_insert()
or ibuf_merge_or_delete_for_page(), some page latches may be
held and a call to log_free_check() could hang.
ibuf_set_bitmap_for_bulk_load(): Use the caller's mini-transaction.
The caller should have invoked log_free_check() while not holding
any page latches.
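A toy sketch of the ordering rule (stubbed log_free_check(), a plain mutex
as the page latch): the check may wait, so it must happen before any page
latch is acquired.

    #include <mutex>

    std::mutex page_latch;   // illustrative page latch

    // Stub: the real log_free_check() may wait for a log checkpoint, and a
    // checkpoint may in turn need page latches held by other threads.
    static void log_free_check() { /* may block until log space is free */ }

    static void mtr_write_something()
    {
      // 1. Ensure log space while holding no page latches; waiting is safe.
      log_free_check();
      // 2. Only then acquire page latches and generate redo log.
      std::lock_guard<std::mutex> g(page_latch);
      /* ... modify the page, write redo log records ... */
      // Calling log_free_check() at this point instead could hang: the
      // checkpoint might wait for the very latch we are holding.
    }

    int main() { mtr_write_something(); }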
The test innodb.innodb-wl5522-debug would occasionally hang
(especially when run with ./mtr --rr) due to a deadlock between
btr_store_big_rec_extern_fields() and dict_stats_analyze_index().
The two threads would acquire the clustered index root page latch and
the tablespace latch in the opposite order. The deadlock was possible
because dict_stats_analyze_index() was holding the index latch in
shared mode and an index root page latch, while waiting for the
tablespace latch. If a stronger dict_index_t::lock had been held
by dict_stats_analyze_index(), any operations that free or allocate
index pages would have been blocked.
In each caller of fseg_n_reserved_pages() except ibuf_init_at_db_start(),
which is a special case for ibuf.index at database startup, we must hold
an index latch that prevents concurrent allocation or freeing of index
pages.
Any operation that allocates or frees pages that belong to an index tree
must first acquire an index latch in Update or Exclusive mode, and while
holding that, acquire an index root page latch in Update or Exclusive
mode.
dict_index_t::clear(): Also acquire an index latch. Otherwise,
the test innodb.insert_into_empty could hang.
btr_get_size_and_reserved(): Assert that a strong enough index latch
is being held. Only acquire a shared fil_space_t::latch; we are only
reading, not modifying any data.
dict_stats_update_transient_for_index(),
dict_stats_analyze_index(): Acquire a strong enough index latch. Only
acquire a shared fil_space_t::latch.
These operations had followed the same order of acquiring latches in
every InnoDB version since the very beginning
(commit c533308a15).
The calls for acquiring tablespace latch had previously been moved in
commit 87839258f8 and
commit 1e9c922fa7.
The hang was introduced in
commit 2e814d4702 which imported
mysql/mysql-server@ac74632293
which failed to strengthen the locking requirements of the function
btr_get_size().
Until now, the attribute EXTENDED of CHECK TABLE was ignored by InnoDB,
and InnoDB only counted the records in each index according
to the current read view. Unless the attribute QUICK was specified, the
function btr_validate_index() would be invoked to validate the B-tree
structure (the sibling and child links between index pages).
The EXTENDED check will not only count all index records according to the
current read view, but also ensure that any delete-marked records in the
clustered index are waiting for the purge of history, and that all
secondary index records point to a version of the clustered index record
that is waiting for the purge of history. In other words, no index may
contain orphan records. Normal MVCC reads and the non-EXTENDED version
of CHECK TABLE would ignore these orphans.
Unpurged records merely result in warnings (at most one per index),
not errors, and no indexes will be flagged as corrupted due to such
garbage. It will remain possible to SELECT data from such indexes or
tables (which will skip such records) or to rebuild the table to
reclaim some space.
We introduce purge_sys.end_view that will be (almost) a copy of
purge_sys.view at the end of a batch of purging committed transaction
history. It is not an exact copy, because if the size of a purge batch
is limited by innodb_purge_batch_size, some records that
purge_sys.view would allow to be purged will be left over for
subsequent batches.
The purge_sys.view is relevant in the purge of committed transaction
history, to determine if records are safe to remove. The new
purge_sys.end_view is relevant in MVCC operations and in
CHECK TABLE ... EXTENDED. It tells which undo log records are
safe to access (have not been discarded at the end of a purge batch).
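A deliberately oversimplified sketch of the two views, reducing each read
view to a single trx_id watermark (the real ReadView tracks a set of
active transactions): purge_sys.view bounds what may be purged, while
purge_sys.end_view bounds what a completed batch may already have
discarded.

    #include <atomic>
    #include <cstdint>

    using trx_id_t = std::uint64_t;

    std::atomic<trx_id_t> view_low{0};      // purge_sys.view analogue
    std::atomic<trx_id_t> end_view_low{0};  // purge_sys.end_view analogue

    // Purge may remove history whose trx_id is below the purge view.
    static bool view_sees(trx_id_t id)     { return id < view_low.load(); }
    // A completed purge batch may already have discarded undo below this.
    static bool end_view_sees(trx_id_t id) { return id < end_view_low.load(); }
    // Undo records a finished batch could have discarded must not be
    // dereferenced; everything newer is still guaranteed to exist.
    static bool safe_to_access(trx_id_t id) { return !end_view_sees(id); }

    int main()
    {
      view_low = 100;      // purge is allowed to remove history below 100,
      end_view_low = 90;   // but completed batches only cover ids below 90
      return (view_sees(95) && safe_to_access(95)) ? 0 : 1;
    }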
purge_sys.clone_oldest_view<true>(): In trx_lists_init_at_db_start(),
clone the oldest read view similar to purge_sys_t::clone_end_view()
so that CHECK TABLE ... EXTENDED will not report bogus failures between
InnoDB restart and the completed purge of committed transaction history.
purge_sys_t::is_purgeable(): Replaces purge_sys_t::changes_visible()
in the case that purge_sys.latch will not be held by the caller.
Among other things, this guards access to BLOBs. It is not safe to
dereference any BLOBs of a delete-marked purgeable record, because
they may have already been freed.
purge_sys_t::view_guard::view(): Return a reference to purge_sys.view
that will be protected by purge_sys.latch, held by purge_sys_t::view_guard.
purge_sys_t::end_view_guard::view(): Return a reference to
purge_sys.end_view while it is protected by purge_sys.end_latch.
Whenever a thread needs to retrieve an older version of a clustered
index record, it will hold a page latch on the clustered index page
and potentially also on a secondary index page that points to the
clustered index page. If these pages contain purgeable records that
would be accessed by a currently running purge batch, the progress of
the purge batch would be blocked by the page latches. Hence, it is
safe to make a copy of purge_sys.end_view while holding an index page
latch, and consult the copy of the view to determine whether a record
should already have been purged.
btr_validate_index(): Remove a redundant check.
row_check_index_match(): Check if a secondary index record and a
version of a clustered index record match each other.
row_check_index(): Replaces row_scan_index_for_mysql().
Count the records in each index directly, duplicating the relevant
logic from row_search_mvcc(). Initialize check_table_extended_view
for CHECK ... EXTENDED while holding an index leaf page latch.
If we encounter an orphan record, the copy of purge_sys.end_view that
we make is safe for visibility checks, and trx_undo_get_undo_rec() will
check for the safety to access each undo log record. Should that check
fail, we should return DB_MISSING_HISTORY to report a corrupted index.
The EXTENDED check tries to match each secondary index record with
every available clustered index record version, by duplicating the logic
of row_vers_build_for_consistent_read() and invoking
trx_undo_prev_version_build() directly.
Before invoking row_check_index_match() on delete-marked clustered index
record versions, we will consult purge_sys.is_purgeable() in order to
avoid accessing freed BLOBs.
We will always check that the DB_TRX_ID or PAGE_MAX_TRX_ID does not
exceed the global maximum. Orphan secondary index records will be
flagged only if everything up to PAGE_MAX_TRX_ID has been purged.
We warn also about clustered index records whose nonzero DB_TRX_ID
should have been reset in purge or rollback.
trx_set_rw_mode(): Move an assertion from ReadView::set_creator_trx_id().
trx_undo_prev_version_build(): Remove two debug-only parameters,
and return an error code instead of a Boolean.
trx_undo_get_undo_rec(): Return a pointer to the undo log record,
or nullptr if one cannot be retrieved. Instead of consulting the
purge_sys.view, consult the purge_sys.end_view to determine which
records can be accessed.
trx_undo_get_rec_if_purgeable(): A variant of trx_undo_get_undo_rec()
that will consult purge_sys.view instead of purge_sys.end_view.
TRX_UNDO_CHECK_PURGEABILITY: A new parameter to
trx_undo_prev_version_build(), passed by row_vers_old_has_index_entry()
so that purge_sys.view instead of purge_sys.end_view will be consulted
to determine whether a secondary index record may be safely purged.
row_upd_changes_disowned_external(): Remove. This should be more
expensive than briefly latching purge_sys in trx_undo_prev_version_build()
(which may make use of transactional memory).
row_sel_reset_old_vers_heap(): New function, split from
row_sel_build_prev_vers_for_mysql().
row_sel_build_prev_vers_for_mysql(): Reorder some parameters
to simplify the call to row_sel_reset_old_vers_heap().
row_search_for_mysql(): Replaced with direct calls to row_search_mvcc().
sel_node_get_nth_plan(): Define inline in row0sel.h
open_step(): Define at the call site, in simplified form.
sel_node_reset_cursor(): Merged with the only caller open_step().
---
ReadViewBase::check_trx_id_sanity(): Remove.
Let us handle "future" DB_TRX_ID in a more meaningful way:
row_sel_clust_sees(): Return DB_SUCCESS if the record is visible,
DB_SUCCESS_LOCKED_REC if it is invisible, and DB_CORRUPTION if
the DB_TRX_ID is in the future.
row_undo_mod_must_purge(), row_undo_mod_clust(): Silently ignore
corrupted DB_TRX_ID. We are in ROLLBACK, and we should have noticed
that corruption when we were about to modify the record in the first
place (leading us to refuse the operation).
row_vers_build_for_consistent_read(): Return DB_CORRUPTION if
DB_TRX_ID is in the future.
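A minimal sketch of the described contract, again reducing the read view
to a single watermark and the error codes to an enum (illustrative, not
the real row_sel_clust_sees() signature):

    #include <cstdint>

    using trx_id_t = std::uint64_t;

    enum dberr_t { DB_SUCCESS, DB_SUCCESS_LOCKED_REC, DB_CORRUPTION };

    struct read_view
    {
      trx_id_t low_limit;   // ids below this are visible (simplification)
      trx_id_t max_trx_id;  // no valid record may carry an id at or above this
    };

    // Visible, invisible, or corrupted ("future" DB_TRX_ID).
    static dberr_t clust_sees(const read_view &v, trx_id_t db_trx_id)
    {
      if (db_trx_id >= v.max_trx_id)
        return DB_CORRUPTION;
      return db_trx_id < v.low_limit ? DB_SUCCESS : DB_SUCCESS_LOCKED_REC;
    }

    int main()
    {
      read_view v{100, 1000};
      return (clust_sees(v, 50)   == DB_SUCCESS &&
              clust_sees(v, 500)  == DB_SUCCESS_LOCKED_REC &&
              clust_sees(v, 1000) == DB_CORRUPTION) ? 0 : 1;
    }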
Tested by: Matthias Leich
Reviewed by: Vladislav Lesin
btr_page_reorganize_low(): Do not invoke lock_move_reorganize_page()
on a dummy index during change buffer merge. The ibuf.index page
latch that we are holding may block a DDL operation that is waiting
in ibuf_delete_for_discarded_space() while holding exclusive
lock_sys.latch. ibuf_insert_low() would refuse to buffer a change
if any locks exist for the index page.