btr_search_guess_on_hash() would only acquire an index page latch if it
is invoked with ahi_latch=NULL. If it is invoked from
row_sel_try_search_shortcut_for_mysql() with ahi_latch!=NULL, the page
will not be latched, and row_search_mvcc() will get a pointer to the
record, which can be changed by some other transaction before the record
is stored in the result buffer by the row_sel_store_mysql_rec() call.
The ahi_latch argument of btr_cur_search_to_nth_level_func() and
btr_pcur_open_with_no_init_func() is used only for
row_sel_try_search_shortcut_for_mysql().
btr_cur_search_to_nth_level_func(..., ahi_latch != 0, ...) is invoked
only from btr_pcur_open_with_no_init_func(..., ahi_latch != 0, ...),
which, in turn, is invoked only from
row_sel_try_search_shortcut_for_mysql().
I suppose that the separate case with ahi_latch!=0 was intentionally
implemented to protect the row_sel_store_mysql_rec() call in
row_search_mvcc() just after the row_sel_try_search_shortcut_for_mysql()
call. After the ahi_latch was moved from row_search_mvcc() to
row_sel_try_search_shortcut_for_mysql(), there is no need for it at all
if btr_search_guess_on_hash() latches the page unconditionally. And if
btr_search_guess_on_hash() latches the page, any access to the record in
row_sel_try_search_shortcut_for_mysql() after the btr_pcur_open_with_no_init()
call will be protected by the page latch.
The fix is to remove the ahi_latch argument from
btr_pcur_open_with_no_init_func(), btr_cur_search_to_nth_level_func()
and btr_search_guess_on_hash().
There will be no test case, as testing this would require freezing a
SELECT at the point between the row_sel_try_search_shortcut_for_mysql()
and row_sel_store_mysql_rec() calls in row_search_mvcc(), and changing
the record in some other transaction so that row_sel_store_mysql_rec()
stores the changed record in the result buffer. But we cannot do this
with the fix, as the page will be latched during the
btr_search_guess_on_hash() call.
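The invariant that the fix relies on can be illustrated with a
standalone C++ sketch (page_sketch and the functions below are
simplified stand-ins, not the actual InnoDB types): the page latch
taken by the hash-guided search must still be held while the record is
copied into the result buffer.

  // Standalone illustration, not InnoDB code.
  #include <cstring>
  #include <mutex>
  #include <shared_mutex>

  struct page_sketch {
    std::shared_mutex latch;          // stands in for the index page latch
    char rec[16] = "committed";
  };

  // Reader side: the shared latch covers both locating the record and
  // copying it out (cf. row_sel_store_mysql_rec()).
  void read_record(page_sketch &page, char *result, std::size_t len) {
    std::shared_lock<std::shared_mutex> s_latch(page.latch);
    std::memcpy(result, page.rec, len);  // no writer can intervene here
  }

  // Writer side: needs the exclusive latch, so it cannot slip in
  // between the positioning and the copy-out above.
  void modify_record(page_sketch &page, const char *newval, std::size_t len) {
    std::lock_guard<std::shared_mutex> x_latch(page.latch);
    std::memcpy(page.rec, newval, len);
  }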
btr_lift_page_up(): If the leaf page only contains a hidden metadata
record for MDEV-11369 instant ADD COLUMN, convert the table to the
canonical format like we are supposed to do whenever the table
becomes empty.
In commit 0b47c126e3 (MDEV-13542)
a few calls to mtr_t::memo_push() were moved before a write latch
on the page was acquired. This introduced a race condition:
1. is_block_dirtied() returned false to mtr_t::memo_push()
2. buf_page_t::write_complete() was executed, the block was marked clean,
and the page latch was released
3. The page latch was acquired by the caller of mtr_t::memo_push(),
and mtr_t::m_made_dirty was not set even though the block was in
a clean state.
The impact of this race condition is that crash recovery and backups
may fail.
btr_cur_latch_leaves(), btr_store_big_rec_extern_fields(),
btr_free_externally_stored_field(), trx_purge_free_segment():
Acquire the page latch before invoking mtr_t::memo_push().
This fixes the regression caused by MDEV-13542.
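A minimal sketch of the corrected ordering, using hypothetical
simplified types rather than the real mtr_t/buf_page_t interfaces: the
page latch is acquired first, so the dirty-state snapshot that feeds
m_made_dirty can no longer be invalidated by a concurrent write
completion marking the block clean.

  #include <mutex>
  #include <shared_mutex>
  #include <vector>

  struct block_sketch {
    std::shared_mutex latch;   // stands in for the page latch
    bool is_dirty = false;     // stands in for oldest_modification != 0
  };

  struct mtr_sketch {
    std::vector<block_sketch*> memo;
    bool made_dirty = false;

    void x_latch_and_memo_push(block_sketch &b) {
      b.latch.lock();                 // 1. acquire the page latch first
      made_dirty |= !b.is_dirty;      // 2. only then sample the dirty state
      memo.push_back(&b);             // 3. finally register in the memo
    }
  };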
Side note: It would suffice to set mtr_t::m_made_dirty at the time
we set the MTR_MEMO_MODIFY flag for a block. Currently that flag is
unnecessarily set if a mini-transaction acquires a page latch on
a page that is in a clean state, and will not actually modify the block.
This may cause unnecessary acquisitions of log_sys.flush_order_mutex
on mtr_t::commit().
mtr_t::free(): If the block had been exclusively latched in this
mini-transaction, set the m_made_dirty flag so that the flush order mutex
will be acquired during mtr_t::commit(). This should have been part of
commit 4179f93d28 (MDEV-18976).
It was necessary to change mtr_t::free() so that
WriteOPT_PAGE_CHECKSUM::operator() would be able to avoid writing
checksums for freed pages.
The index root page contains the fields BTR_SEG_TOP and BTR_SEG_LEAF
which keep track of allocated pages in the index tree. These fields
are normally protected by an Update latch, so that concurrent read
access to other parts of the page will be possible.
When the index root page is already exclusively latched in the
mini-transaction, we must not try to acquire a lower-grade Update latch.
In fact, when the root page is already X or U latched in the
mini-transaction, there is no point in acquiring another latch.
Moreover, after a U latch was acquired on top of an X-latch,
mtr_t::defer_drop_ahi() would trigger an assertion failure or
lock corruption in block->page.lock.u_x_upgrade() because X locks
already exist on the block.
This problem may have been introduced in
commit 03ca6495df (MDEV-24142).
btr_page_alloc_low(), btr_page_free(): Initially buffer-fix the root page.
If it is already U or X latched, release the buffer-fix. Else, upgrade
the buffer-fix to a U latch.
mtr_t::u_lock_register(): Upgrade a buffer-fix to U latch.
mtr_t::have_u_or_x_latch(): Check if U or X latches are already
registered in the mini-transaction.
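A hypothetical standalone sketch of this decision (the names below are
illustrative, not the mtr_t API): buffer-fix the root first, release
the fix if a U or X latch is already registered, otherwise upgrade the
fix to a U latch.

  #include <unordered_map>

  enum latch_kind { BUF_FIX, U_LATCH, X_LATCH };

  struct mtr_memo_sketch {
    // one entry per latch/fix already registered in this mini-transaction
    std::unordered_multimap<const void*, latch_kind> held;

    // cf. mtr_t::have_u_or_x_latch()
    bool have_u_or_x(const void *block) const {
      auto range = held.equal_range(block);
      for (auto it = range.first; it != range.second; ++it)
        if (it->second != BUF_FIX)
          return true;
      return false;
    }

    // cf. the decision in btr_page_alloc_low()/btr_page_free()
    void latch_root_for_alloc(const void *root) {
      auto fix = held.emplace(root, BUF_FIX);  // initially buffer-fix the root
      if (have_u_or_x(root)) {
        held.erase(fix);      // already U/X latched: release the buffer-fix
        return;
      }
      fix->second = U_LATCH;  // else upgrade the buffer-fix to a U latch
    }
  };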
Reason:
=======
Race condition between btr_search_drop_hash_index() and
btr_search_lazy_free(). One thread resizes the buffer pool
and clears the ahi on all pages in the buffer pool, freeing the
index and table while removing the last reference. At the same time,
another thread accesses index->heap in btr_search_drop_hash_index().
Solution:
=========
Acquire the respective ahi latch before checking index->freed().
btr_search_drop_page_hash_index(): Added a new parameter to indicate
that ahi entries should be dropped only if the index is marked as freed.
btr_search_check_marked_free_index(): Acquire all ahi latches and
return true if the index was freed.
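The fixed ordering, as a standalone sketch with simplified stand-in
types (not the actual btr_search code): the reader holds the ahi part
latch both while it checks the freed flag and while it touches
index->heap, so the lazy-free path cannot free the heap underneath it.

  #include <mutex>

  struct index_sketch {
    bool freed = false;
    void *heap = nullptr;       // stands in for index->heap
  };

  struct ahi_part_sketch {
    std::mutex latch;           // stands in for the ahi part latch
    index_sketch *index = nullptr;
  };

  void drop_hash_entries_if_freed(ahi_part_sketch &part) {
    std::lock_guard<std::mutex> g(part.latch);
    if (!part.index || !part.index->freed)
      return;                   // only drop entries of an index marked freed
    // ... safe to walk part.index->heap here, still under the latch ...
  }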
btr_page_reorganize_low(): Restore mtr->set_log_mode() before returning.
innobase_instant_try(): Do not invoke rec_get_offsets() if
btr_cur_pessimistic_update() failed. The cursor position may be
invalid.
The function rec_get_offsets_func() used to hit ut_error
due to an invalid rec_get_status() value of a
ROW_FORMAT!=REDUNDANT record. This fix is twofold:
We will not only avoid a crash on corruption in this case,
but we will also make more effort to validate each record
every time we are iterating over index page records.
rec_get_offsets_func(): Do not crash on a corrupted record.
page_rec_get_nth(): Return nullptr on error.
page_dir_slot_get_rec_validate(): Like page_dir_slot_get_rec(),
but validate the pointer and return nullptr on error.
page_cur_search_with_match(), page_cur_search_with_match_bytes(),
page_dir_split_slot(), page_cur_move_to_next():
Indicate failure in a return value.
page_cur_search(): Replaced with page_cur_search_with_match().
rec_get_next_ptr_const(), rec_get_next_ptr(): Replaced with
page_rec_get_next_low().
TODO: rtr_page_split_initialize_nodes(), rtr_update_mbr_field(),
and possibly other SPATIAL INDEX functions fail to properly handle
errors.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
Performance tested by: Axel Schwenke
Get rid of BTR_ESTIMATE and btr_cur_t::path_arr.
Before the fix, btr_estimate_n_rows_in_range_low() used two
btr_cur_search_to_nth_level() calls to create two arrays of tree path,
one array per border. It then tried to estimate the number of rows by
diving level by level through the array elements. As the path pages are
unlatched while the arrays are iterated, the tree could be modified, and
the estimation function would call itself until the number of attempts
was exceeded.
After the fix, the estimation happens during the search process. Roughly,
the algorithm is the following: dive to the left page; then, if there are
pages between the left and right ones, read a few pages to the right; if
the right page is reached, fetch it and count the exact number of rows;
otherwise estimate the number of rows and fetch the right page.
The latching order corresponds to WL#6326 rules, i.e.:
(2.1) [same as (1.1)]: Page latches must be acquired in descending order
of tree level.
(2.2) When acquiring a node pointer page latch at level L, we must hold
the left sibling page latch (at level L) or some ancestor latch
(at level>L).
When we dive down a level, the parent page is unlatched only after
the current-level page is latched. When we estimate the number of rows
on some level, we latch the left border, then fetch the next page, and
keep fetching the next page, unlatching the previous page only after the
current page is latched, until the right border is reached. That is, the
left sibling is always latched when we acquire a page latch on the same
level. When we reach the right border, the current page is unlatched, and
then the right border is latched. Following rule (2.2), we can do this
because the right border's parent is latched.
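A simplified, hypothetical sketch of the latch coupling used while
counting to the right on one level (page_sketch and count_to_right are
illustrative, not the real estimation code): the right sibling is
latched before the current page is released, so rules (2.1)/(2.2) above
are never violated.

  #include <shared_mutex>

  struct page_sketch {
    std::shared_mutex latch;
    page_sketch *right = nullptr;   // right sibling on the same level
    unsigned n_recs = 0;
  };

  unsigned long long count_to_right(page_sketch *left,
                                    const page_sketch *right_border,
                                    unsigned max_pages) {
    unsigned long long n = 0;
    left->latch.lock_shared();
    page_sketch *cur = left;
    while (cur->right && cur->right != right_border && max_pages--) {
      page_sketch *next = cur->right;
      next->latch.lock_shared();    // latch the right sibling first ...
      n += cur->n_recs;
      cur->latch.unlock_shared();   // ... only then release the current page
      cur = next;
    }
    n += cur->n_recs;
    cur->latch.unlock_shared();
    // The right border itself is latched only after this; rule (2.2)
    // still holds because its parent page is latched (not modeled here).
    return n;
  }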
btr_root_raise_and_insert(), btr_lift_page_up(),
rtr_page_split_and_insert(): Reset DB_FAIL from a failure to
copy records on a ROW_FORMAT=COMPRESSED page to DB_SUCCESS
before retrying.
This fixes a regression that was introduced by
commit 0b47c126e3 (MDEV-13542).
btr_root_raise_and_insert(): Remove a redundant condition.
btr_page_split_and_insert() will invoke btr_page_split_and_insert()
if needed.
A prominent remaining source of crashes on corrupted index pages
is page directory corruption.
A frequent caller of page_dir_find_owner_slot() is page_rec_get_prev().
Some of those calls can be replaced with simpler logic that is less
prone to fail.
page_dir_find_owner_slot(),
page_rec_get_prev(), page_rec_get_prev_const(),
btr_pcur_move_to_prev(), btr_pcur_move_to_prev_on_page(),
btr_cur_upd_rec_sys(),
page_delete_rec_list_end(),
rtr_page_copy_rec_list_end_no_locks(),
rtr_page_copy_rec_list_start_no_locks(): Return an error code on failure.
fil_space_t::io(), buf_page_get_low(): Use DB_CORRUPTION for
out-of-bounds page reads.
PageBulk::getSplitRec(), PageBulk::copyOut(): Simplify the code.
btr_validate_level(): Prevent some more CHECK TABLE crashes on
corrupted pages.
btr_block_get(), btr_pcur_move_to_next_page(): Implement some checks that
were previously only part of IndexPurge::next().
IndexPurge::next(): Use btr_pcur_move_to_next_page().
Even after commit 0b47c126e3
there are a few ib::fatal() calls in non-debug code
that can be replaced easily.
btr_page_reorganize_low(): On size invariant violation, return
an error code instead of crashing.
btr_check_blob_fil_page_type(): On an invalid page type, report
an error but do not crash.
btr_copy_blob_prefix(): Truncate the output if a page type is invalid.
dict_load_foreign_cols(): On an error, return DB_CORRUPTION instead
of crashing.
fil_space_decrypt_full_crc32(), fil_space_decrypt_for_non_full_checksum():
On error, return DB_DECRYPTION_FAILED instead of crashing.
fil_set_max_space_id_if_bigger(): Replace ib::fatal() with an
equivalent ut_a() assertion.
We will introduce an optional log record OPT_PAGE_CHECKSUM for recording
page checksums, so that more inconsistencies on crash recovery may be
caught.
mtr_t::page_checksum(const buf_page_t&): Write OPT_PAGE_CHECKSUM
(currently not for ROW_FORMAT=COMPRESSED pages).
mtr_t::do_write(): Write OPT_PAGE_CHECKSUM records for all pages
(currently, in debug builds only).
mtr_t::is_logged(): Return whether log should be written.
mtr_t::set_log_mode_sub(const mtr_t&): Set the logging mode of
a sub-mini-transaction when another mini-transaction is holding
latches on some modified pages. When creating or freeing BLOB pages,
we may only write OPT_PAGE_CHECKSUM records in the main mini-transaction,
after all changes have been written to the log.
MTR_LOG_SUB: Log mode for a sub-mini-transaction.
mtr_t::free(): Define non-inline, and invoke MarkFreed.
MarkFreed: For any matching page in the mini-transaction log,
change the first entry to say MTR_MEMO_PAGE_X_MODIFY and any subsequent
entries to MTR_MEMO_PAGE_X_FIX.
FindModified: Simplify a condition. MTR_MEMO_MODIFY can only be set
if MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX are set.
FindBlockX: Consider also MTR_MEMO_PAGE_X_MODIFY.
recv_sys_t::parse(): Store OPT_PAGE_CHECKSUM records.
log_phys_t::apply(): Validate OPT_PAGE_CHECKSUM records.
log_phys_t::page_checksum(): Validate an OPT_PAGE_CHECKSUM record.
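The idea behind OPT_PAGE_CHECKSUM can be sketched as follows
(hypothetical helper names and a toy FNV-1a hash; InnoDB computes its
own checksum over specific byte ranges of the page): the writer logs a
checksum of the page image, and recovery recomputes it after applying
the log records to the page and compares the two values.

  #include <cstddef>
  #include <cstdint>

  std::uint32_t toy_page_checksum(const unsigned char *page, std::size_t size) {
    std::uint32_t h = 2166136261u;           // FNV-1a offset basis
    for (std::size_t i = 0; i < size; i++)
      h = (h ^ page[i]) * 16777619u;
    return h;
  }

  // Recovery side: true if the recovered page matches what the writer saw.
  bool toy_validate_opt_page_checksum(const unsigned char *page,
                                      std::size_t size,
                                      std::uint32_t logged_checksum) {
    return toy_page_checksum(page, size) == logged_checksum;
  }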
Tested by: Matthias Leich
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it also made things worse by blocking attempts to
rescue data from, or rebuild, a partially readable table.
We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.
This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.
We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in the system tablespace in case something
was corrupted.
Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.
buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.
btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.
btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().
btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().
FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.
dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.
trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().
trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.
trx_sys_create_sys_pages(): Merged with trx_sysf_create().
dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.
dir_pathname(): Replaces os_file_make_new_pathname().
row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.
btr_decryption_failed(): Report decryption failures.
dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.
dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.
dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.
dict_table_is_corrupted(): Remove.
dict_index_t::is_btree(): Determine if the index is a valid B-tree.
BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.
UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.
buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().
fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.
btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.
opt_search_plan_for_table(): Simplify the code.
row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.
row_undo_mod_upd_exist_sec(): Simplify the code.
row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().
fut_get_ptr(): Replace with buf_page_get_gen() calls.
buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.
buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.
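A hypothetical caller-side sketch of this error-propagation pattern
(simplified names and signatures, not the actual buf_page_get_gen()
prototype): the fetch returns nullptr and sets an error code, and the
caller passes the error up instead of asserting.

  enum db_err_sketch { DB_OK = 0, DB_CORRUPTION_SK, DB_DECRYPTION_FAILED_SK };

  struct buf_block_sketch;      // opaque stand-in for buf_block_t

  // Stub standing in for a page fetch that may fail.
  buf_block_sketch *fetch_page_sketch(unsigned page_no, db_err_sketch *err) {
    (void)page_no;
    *err = DB_CORRUPTION_SK;    // pretend the read hit a corrupted page
    return nullptr;
  }

  db_err_sketch process_page(unsigned page_no) {
    db_err_sketch err = DB_OK;
    if (buf_block_sketch *block = fetch_page_sketch(page_no, &err)) {
      (void)block;              // ... operate on the latched page here ...
      return DB_OK;
    }
    // No block: the page was corrupted, freed, or could not be read.
    // Propagate the error up the call stack; do not crash.
    return err;
  }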
fseg_page_is_allocated(): Replaces fseg_page_is_free().
fts_drop_common_tables(): Return an error if the transaction
was rolled back.
fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.
fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.
Clean up mtr_t::page_lock()
buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.
buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.
mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().
recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.
recv_recover_page(): Return whether the operation succeeded.
recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.
Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
The types btr_latch_mode and mtr_memo_type_t are partly derived from
rw_lock_type_t. Despite that, some code for converting between them
uses conditions instead of bitwise arithmetic.
Let us define btr_latch_mode in such a way that more conversions to
rw_lock_type_t are possible by a bitwise AND.
Some SPATIAL INDEX code that assumed !(BTR_MODIFY_TREE & BTR_MODIFY_LEAF)
was adjusted.
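An illustrative sketch of the encoding idea with made-up constants (not
the real btr_latch_mode or rw_lock_type_t values): the low bits of each
latch mode carry the lock type, so the conversion is a single bitwise
AND instead of a chain of conditions.

  enum rw_type_sketch { RW_S = 1, RW_X = 2, RW_SX = 4 };

  enum latch_mode_sketch {
    SEARCH_LEAF_SK = RW_S,
    MODIFY_LEAF_SK = RW_X,
    MODIFY_TREE_SK = 8 | RW_X     // tree flag kept in a higher bit
  };

  constexpr unsigned RW_TYPE_MASK = RW_S | RW_X | RW_SX;

  constexpr rw_type_sketch rw_type_of(latch_mode_sketch m) {
    return static_cast<rw_type_sketch>(m & RW_TYPE_MASK);
  }

  // With this encoding, MODIFY_TREE_SK & MODIFY_LEAF_SK is nonzero,
  // which is why code assuming the opposite had to be adjusted.
  static_assert(rw_type_of(MODIFY_TREE_SK) == RW_X, "tree mode implies X");
  static_assert(rw_type_of(SEARCH_LEAF_SK) == RW_S, "leaf search implies S");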
This is based on commit 20ae4816bb
with some adjustments for MDEV-12353.
row_ins_sec_index_entry_low(): If a separate mini-transaction is
needed to adjust the minimum bounding rectangle (MBR) in the parent
page, we must disable redo logging if the table is a temporary table.
For temporary tables, no log is supposed to be written, because
the temporary tablespace will be reinitialized on server restart.
rtr_update_mbr_field(), rtr_merge_and_update_mbr(): Changed the return
type to void and removed unreachable code. In older versions, these
used to return a different value for temporary tables.
page_id_t: Add constexpr to most member functions.
mtr_t::log_write(): Catch log writes to invalid tablespaces
so that the test case would crash without the fix to
row_ins_sec_index_entry_low().
btr_insert_into_right_sibling(): Inherit any gap lock from the
left sibling to the right sibling before inserting the record
to the right sibling and updating the node pointer(s).
lock_update_node_pointer(): Update locks in case a node pointer
will move.
Based on mysql/mysql-server@c7d93c274f
This follows up the previous fix in
commit c3c53926c4 (MDEV-26554).
ha_innobase::delete_table(): Work around the insufficient
metadata locking (MDL) during DML operations by acquiring exclusive
InnoDB table locks on all child tables. Previously, this was only
done on TRUNCATE and ALTER.
ibuf_delete_rec(), btr_cur_optimistic_delete(): Do not invoke
lock_update_delete() during change buffer operations.
The revised trx_t::commit(std::vector<pfs_os_file_t>&) will
hold exclusive lock_sys.latch while invoking fil_delete_tablespace(),
which in turn may invoke ibuf_delete_rec().
dict_index_t::has_locking(): A new predicate, replacing the dummy
!dict_table_is_locking_disabled(index->table). Used for skipping lock
operations during ibuf_delete_rec().
trx_t::commit(std::vector<pfs_os_file_t>&): Release the locks
and remove the table from the cache while holding exclusive
lock_sys.latch.
trx_t::commit_in_memory(): Skip release_locks() if dict_operation holds.
trx_t::commit(): Reset dict_operation before invoking commit_in_memory()
via commit_persist().
lock_release_on_drop(): Release locks while lock_sys.latch is
exclusively locked.
lock_table(): Add a parameter for a pointer to the table.
We must not dereference the table before a lock_sys.latch has
been acquired. If the pointer to the table does not match the table
at that point, the table is invalid and DB_DEADLOCK will be returned.
row_ins_foreign_check_on_constraint(): Improve the checks.
Remove a bogus DB_LOCK_WAIT_TIMEOUT return that was needed
before commit c5fd9aa562 (MDEV-25919).
row_upd_check_references_constraints(),
wsrep_row_upd_check_foreign_constraints(): Simplify checks.
- InnoDB DDL results in `Duplicate entry' if concurrent DML throws a
duplicate key error. The following scenario explains the problem:
connection con1:
ALTER TABLE t1 FORCE;
connection con2:
INSERT INTO t1(pk, uk) VALUES (2, 2), (3, 2);
In connection con2, InnoDB throws the 'DUPLICATE KEY' error because
of the unique index. The ALTER operation will throw the error when
applying the concurrent DML log.
- Inserting the duplicate key for the unique index logs the insert
operation for online ALTER TABLE. When the insertion fails, the
transaction rolls back, which leads to logging of a
delete operation for online ALTER TABLE.
While applying the insert log entries, the ALTER operation
encounters the 'DUPLICATE KEY' error.
- To avoid the above fake duplicate scenario, InnoDB should
not write any log for online ALTER TABLE before the DML transaction
commits (see the sketch after this list).
- The user thread that executes DML can apply the online log if
InnoDB ran out of online log and the index is marked as completed.
Set the online log error if the apply phase encountered any error.
It can also clear the logs of all other indexes and mark the newly
added indexes as corrupted.
- Removed the old online code which was a part of DML operations.
commit_inplace_alter_table(): Applies the online log
for the last batch of the secondary index log and frees
the log for the completed index.
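A toy standalone sketch of the "log only once the DML outcome is known"
principle from the list above (hypothetical types; the real
implementation replays the transaction's undo log records via
UndorecApplier): DML against a table with an active DDL is not pushed
to the online log as it happens, but only once the transaction commits.

  #include <string>
  #include <vector>

  struct online_log_sketch { std::vector<std::string> entries; };

  struct trx_sketch {
    std::vector<std::string> changes;   // what this transaction did
    bool table_has_active_ddl = false;

    void record_dml(const std::string &change) {
      changes.push_back(change);        // nothing reaches the online log yet
    }

    void commit(online_log_sketch &log) {
      if (table_has_active_ddl)
        for (const std::string &c : changes)
          log.entries.push_back(c);     // replayed only for the committed outcome
      changes.clear();
    }

    // In this toy model, rolled-back DML simply never reaches the log.
    void rollback() { changes.clear(); }
  };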
trx_t::apply_online_log: Set to true while writing the undo
log if the modified table has an active DDL.
trx_t::apply_log(): Apply the DML changes to online DDL tables.
dict_table_t::is_active_ddl(): Returns true if the table
has an active DDL.
dict_index_t::online_log_make_dummy(): Assign a dummy value to
the clustered index online log to indicate that the secondary
indexes are being rebuilt.
dict_index_t::online_log_is_dummy(): Check whether the online
log has the dummy value.
ha_innobase_inplace_ctx::log_failure(): Handle the apply log
failure for the online DDL transaction.
row_log_mark_other_online_index_abort(): Clear out all other
online index logs after encountering an error during
row_log_apply().
row_log_get_error(): Get the error that happened during row_log_apply().
row_log_online_op(): Applies the online log if the index is completed
and the online log ran out of memory. Returns false if applying the
log fails.
UndorecApplier: Introduced a class to maintain the undo log
record and the latched undo buffer page, parse the undo log record,
and maintain the undo record type, info bits and update vector.
UndorecApplier::get_old_rec(): Get the correct version of the
clustered index record that was modified by the current undo
log record
UndorecApplier::clear_undo_rec(): Clear the undo log related
information after applying the undo log record
UndorecApplier::log_update(): Handle the update and delete undo
log records and apply them to online indexes.
UndorecApplier::log_insert(): Handle the insert undo log
and apply it to online indexes.
UndorecApplier::is_same(): Check whether the given roll pointer
was generated by the current undo log record information.
trx_t::rollback_low(): Set apply_online_log for the transaction
if the partially rolled-back transaction involves any active DDL.
prepare_inplace_alter_table_dict(): After allocating the online
log, InnoDB creates the fulltext common tables. A fulltext index
does not allow the index build to be online, so the dead code
for online log removal was removed.
Thanks to Marko Mäkelä for providing the initial prototype and
Matthias Leich for testing the issue patiently.
This also fixes MDEV-20198: Instant ALTER TABLE is not crash safe
InnoDB dictionary recovery wrongly used the READ UNCOMMITTED isolation
level, causing some mismatch. For example, if a table was renamed or
replaced in a transaction, according to READ UNCOMMITTED the table might
not exist at all.
We implement READ COMMITTED isolation level for accessing the dictionary
tables SYS_TABLES, SYS_COLUMNS, SYS_INDEXES, SYS_FIELDS, SYS_VIRTUAL,
SYS_FOREIGN, SYS_FOREIGN_COLS. For most of these tables, no secondary
index exists. For the secondary indexes (on SYS_TABLES.ID,
SYS_FOREIGN.FOR_NAME, SYS_FOREIGN.REF_NAME), we will always look up
the primary key in the clustered index and check if the record actually
is a committed version.
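A simplified, hypothetical sketch of that READ COMMITTED check (not
actual InnoDB structures): a secondary-index hit is trusted only after
the primary key has been looked up in the clustered index and the
version found there is confirmed to be committed.

  #include <cstdint>
  #include <functional>

  struct clust_rec_sketch {
    std::uint64_t trx_id;   // DB_TRX_ID of the newest record version
    bool delete_marked;
  };

  // is_committed() would consult the transaction system (trx_sys).
  bool dict_rec_is_committed_version(
      const clust_rec_sketch &rec,
      const std::function<bool(std::uint64_t)> &is_committed) {
    if (!is_committed(rec.trx_id))
      return false;            // uncommitted change: not visible here
    return !rec.delete_marked; // committed delete-marked record: absent
  }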
dict_check_sys_tables(): Recover tablespaces also from delete-marked
committed records, so that if a matching .ibd file exists, it will
be removed by fil_delete_tablespace() when the committed delete-marked
SYS_INDEXES record of the clustered index is purged
in row_purge_remove_clust_if_poss_low().
fil_ibd_open(): Change the Boolean parameter "validate" to a ternary
one, to suppress error messages when the file might not exist.
It is possible that a .ibd file was deleted and the server shut down
before the SYS_INDEXES and SYS_TABLES records were purged. Hence, if
dict_check_sys_tables() finds a committed delete-marked record,
we must not complain if the tablespace file is not found.
On Windows, we must treat ERROR_PATH_NOT_FOUND (directory not found)
in the same way as ERROR_FILE_NOT_FOUND. This fixes a few failures where
a previous test successfully executed DROP DATABASE (and deleted all
files and the directory), but a committed delete-marked SYS_TABLES
record had not been purged before server restart.
dict_getnext_system_low(): Do not filter out delete-marked records.
dict_startscan_system(), dict_getnext_system(): Do filter out
delete-marked records, for accessing the INFORMATION_SCHEMA tables.
dict_sys_tables_rec_read(): Return the DB_TRX_ID of the committed
version of the record. This is needed in dict_load_table_low().
dict_load_foreign_cols(), dict_load_foreign(): Add a parameter for
the current transaction identifier. In some DDL operations, the
FOREIGN KEY constraints are being loaded from the data dictionary
before the DDL transaction has been committed. For SYS_FOREIGN
and SYS_FOREIGN_COLS, we must implement the special case of
READ COMMITTED that the changes of the uncommitted current transaction
are visible.
dict_load_foreign(): Validate the table name. We could find a
SYS_FOREIGN.ID via a committed delete-marked secondary index record
that does not match the REF_NAME or FOR_NAME of the secondary index record.
dict_load_index_low(): Optionally take the table as a parameter,
so that table->def_trx_id can be updated in case of a
committed delete-marked SYS_INDEXES record corresponding
to DROP INDEX, but not corresponding to an index stub of ADD INDEX.
dict_load_indexes(): Do not update table->def_trx_id
in case of delete-marked records.
rec_is_metadata(), rec_offs_make_valid(), rec_get_offsets_func(),
row_build_low(): Relax some assertions. We may now have
!index->is_instant() even if a metadata record is present in the index.
Previously, the recovery of instant ADD/DROP COLUMN assumed
that READ UNCOMMITTED of the data dictionary will be performed.
Now, we will have a READ COMMITTED copy of the data dictionary
cache, and a READ UNCOMMITTED copy of the metadata record.
btr_page_reorganize_low(): Correctly update the FIL_PAGE_TYPE
when rolling back an instant ADD/DROP COLUMN operation.
row_rec_to_index_entry_impl(): Relax some assertions,
and disallow accessing "extra" fields. This fixes the recovery
of a crash during an instant ADD COLUMN after a successful
instant DROP COLUMN, in the test innodb.instant_alter_crash.
Tested by: Matthias Leich
btr_cur_optimistic_insert(): Disregard DEBUG_DBUG injection to
invoke btr_page_reorganize() if the page (and the table) is empty.
Otherwise, an assertion would fail in btr_page_reorganize_low()
because PAGE_MAX_TRX_ID is 0 in an empty secondary index leaf page.