mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-04-25 08:29:58 +02:00

Author	SHA1	Message	Date
Oleksandr Byelkin	6197e6abc4	Merge branch '10.11' into 11.2	2024-08-21 07:58:46 +02:00
Marko Mäkelä	62bfcfd8b2	Merge 10.6 into 10.11	2024-08-14 11:36:52 +03:00
Marko Mäkelä	4f8803c036	MDEV-34678 pthread_mutex_init() without pthread_mutex_destroy() When SUX_LOCK_GENERIC is defined, the srw_mutex, srw_lock, sux_lock are implemented based on pthread_mutex_t and pthread_cond_t. This is the only option for systems that lack a futex-like system call. In the SUX_LOCK_GENERIC mode, if pthread_mutex_init() is allocating some resources that need to be freed by pthread_mutex_destroy(), a memory leak could occur when we are repeatedly invoking pthread_mutex_init() without a pthread_mutex_destroy() in between. pthread_mutex_wrapper::initialized: A debug field to track whether pthread_mutex_init() has been invoked. This also helps find bugs like the one that was fixed by commit `1c8af2ae53` (MDEV-34422); one simply needs to add -DSUX_LOCK_GENERIC to the CMAKE_CXX_FLAGS to catch that particular bug on the initial server bootstrap. buf_block_init(), buf_page_init_for_read(): Invoke block_lock::init() because buf_page_t::init() will no longer do that. buf_page_t::init(): Instead of invoking lock.init(), assert that it has already been invoked (the lock is vacant). add_fts_index(), build_fts_hidden_table(): Explicitly invoke index_lock::init() in order to avoid a pthread_mutex_destroy() invocation on an uninitialized object. srw_lock_debug::destroy(): Invoke readers_lock.destroy(). trx_sys_t::create(): Invoke trx_rseg_t::init() on all rollback segments in order to guarantee a deterministic state for shutdown, even if InnoDB fails to start up. trx_rseg_array_init(), trx_temp_rseg_create(), trx_rseg_create(): Invoke trx_rseg_t::destroy() before trx_rseg_t::init() in order to balance pthread_mutex_init() and pthread_mutex_destroy() calls.	2024-08-14 07:54:15 +03:00
Sergei Golubchik	f9807aadef	Merge branch '10.11' into 11.0	2024-05-12 12:18:28 +02:00
Sergei Golubchik	018d537ec1	Merge branch '10.6' into 10.11	2024-04-22 15:23:10 +02:00
Jan Lindström	cac0fc97cc	MDEV-32974 : Member fails to join due to old seqno in GTID Before MDEV-15158, wsrep xid information was stored in only one place: in the TRX_SYS page. Starting with 10.3, it is not stored there but in the rollback segment header pages, and the latest one is what matters. MDEV-19229 allows the undo tablespaces to be rebuilt when innodb_undo_tablespaces is changed on startup. Previously it was not possible to change that parameter. These changes caused the fact that rollback segment header pages could contain several wsrep xid's stored and when undo tablespaces were rebuilt there was a effort to restore wsrep xid back to rollback segment header page but because there was several of them the latest wsrep xid was overwritten with older one. trx_rseg_read_wsrep_checkpoint trx_rseg_init_wsrep_xid Return true if read xid is wsrep xid, false if not trx_rseg_mem_restore Try to read wsrep xid and if it is found copy it to trx_sys.recovered_wsrep_xid if read xid has larger seqno.	2024-04-11 10:18:20 +03:00
Marko Mäkelä	263932d505	MDEV-33325 Crash in flst_read_addr on corrupted data flst_read_addr(): Remove assertions. Instead, we will check these conditions in the callers and avoid a crash in case of corruption. We will check the conditions more carefully, because the callers know more exact bounds for the page numbers and the byte offsets withing pages. flst_remove(), flst_add_first(), flst_add_last(): Add a parameter for passing fil_space_t::free_limit. None of the lists may point to pages that are beyond the current initialized length of the tablespace. trx_rseg_mem_restore(): Access the first page of the tablespace, so that we will correctly recover rseg->space->free_limit in case some log based recovery is pending. ibuf_remove_free_page(): Only look up the root page once, and validate the last page number. Reviewed by: Debarun Banerjee	2024-04-11 09:58:53 +03:00
Vlad Lesin	722df777ca	MDEV-33757 Get rid of TrxUndoRsegs code TrxUndoRsegs is wrapper for vector of trx_rseg_t*. It has two constructors, both initialize the vector with only one element. And they are used to push transactions rseg(the singular) to purge queue. There is no function to add elements to the vector. The default constructor is used only for declaration of NullElement. The TrxUndoRsegs was introduced in WL#6915 in MySQL 5.7 and. MySQL 5.7 would unnecessarily let the purge of history parse the temporary undo records, and then look up the table (via a global hash table), and only at the point of processing the parsed undo log record determine that the table is a temporary table and the undo record must be thrown away. In MariaDB 10.2 we have two disjoint sets of rollback segments (128 for persistent, 128 for temporary), and purge does not even see the temporary tables. The only reason why temporary tables are visible to other threads is a SQL layer bug (MDEV-17805). purge_sys_t::choose_next_log(): merge the relevant part of TrxUndoRsegsIterator::set_next() to the start of purge_sys_t::choose_next_log(). purge_sys_t::rseg_get_next_history_log(): add a tail call of purge_sys_t::choose_next_log() and adjust the callers, to simplify the control flow further. purge_sys.pq_mutex and purge_sys.purge_queue: make it private by adding some simple accessor function. trx_purge_cleanse_purge_queue(): make it a member of purge_sys_t to have have access to private purge_sys.pq_mutex and purge_sys.purge_queue, simplify the code with using simple array copy and clearing purge queue instead of poping each purge queue element. rseg_t::last_commit_and_offset: exchange trx_no and offset bits to avoid bitwise operations during pushing to/popping from purge queue. Thanks Marko Mäkelä for historical overview of TrxUndoRsegs development. Reviewed by: Marko Mäkelä	2024-04-03 16:01:12 +03:00
Marko Mäkelä	fec2fd6add	Merge 10.11 into 11.0	2024-03-28 10:51:36 +02:00
Marko Mäkelä	788953463d	Merge 10.6 into 10.11 Some fixes related to commit `f838b2d799` and Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row() for system-versioned tables were provided by Nikita Malyavin. This was required by test versioning.rpl,trx_id,row.	2024-03-28 09:16:57 +02:00
Sergei Golubchik	f71d7f2f0f	Merge branch '10.5' into 10.6	2024-03-13 21:02:34 +01:00
Marko Mäkelä	f703e72bd8	Merge 10.4 into 10.5	2024-03-11 10:08:20 +02:00
Daniele Sciascia	648d2da8f2	MDEV-33540 Avoid writes to TRX_SYS page during mariabackup operations Fix a scenario where `mariabackup --prepare` fails with assertion `!m_modifications \|\| !recv_no_log_write' in `mtr_t::commit()`. This happens if the prepare step of the backup encounters a data directory which happens to store wsrep xid position in TRX SYS page (this is no longer the case since 10.3.5). And since MDEV-17458, `trx_rseg_array_init()` handles this case by copying the xid position to rollback segments, before clearing the xid from TRX SYS page. However, this step should be avoided when `trx_rseg_array_init()` is invoked from mariabackup. The relevant code was surrounded by the condition `srv_operation == SRV_OPERATION_NORMAL`. An additional check ensures that we are not trying to copy a xid position which has already zeroed.	2024-03-08 16:02:38 +01:00
Marko Mäkelä	590036b021	Merge 10.11 into 11.0	2023-12-20 16:05:20 +02:00
Marko Mäkelä	2b99e5f7ef	Merge 10.6 into 10.11	2023-12-20 15:58:36 +02:00
Marko Mäkelä	2b01e5103d	Merge 10.5 into 10.6	2023-12-19 18:41:42 +02:00
Sergei Golubchik	8c8bce05d2	Merge branch '10.11' into 11.0	2023-12-19 15:53:18 +01:00
Sergei Golubchik	fd0b47f9d6	Merge branch '10.6' into 10.11	2023-12-18 11:19:04 +01:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Marko Mäkelä	f074223ae7	MDEV-32068 Some calls to buf_read_ahead_linear() seem to be useless The linear read-ahead (enabled by nonzero innodb_read_ahead_threshold) works best if index leaf pages or undo log pages have been allocated on adjacent page numbers. The read-ahead is assumed not to be helpful in other types of page accesses, such as non-leaf index pages. buf_page_get_low(): Do not invoke buf_page_t::set_accessed(), buf_page_make_young_if_needed(), or buf_read_ahead_linear(). We will invoke them in those callers of buf_page_get_gen() or buf_page_get() where it makes sense: the access is not one-time-on-startup and the page and not going to be freed soon. btr_copy_blob_prefix(), btr_pcur_move_to_next_page(), trx_undo_get_prev_rec_from_prev_page(), trx_undo_get_first_rec(), btr_cur_t::search_leaf(), btr_cur_t::open_leaf(): Invoke buf_read_ahead_linear(). We will not invoke linear read-ahead in functions that would essentially allocate or free pages, because pages that are freshly allocated are expected to be initialized by buf_page_create() and not read from the data file. Likewise, freeing pages should not involve accessing any sibling pages, except for freeing singly-linked lists of BLOB pages. We will not invoke read-ahead in btr_cur_t::pessimistic_search_leaf() or in a pessimistic operation of btr_cur_t::open_leaf(), because it is assumed that pessimistic operations should be preceded by optimistic operations, which should already have invoked read-ahead. buf_page_make_young_if_needed(): Invoke also buf_page_t::set_accessed() and return the result. btr_cur_nonleaf_make_young(): Like buf_page_make_young_if_needed(), but do not invoke buf_page_t::set_accessed(). Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-12-05 12:31:29 +02:00
Sergei Golubchik	98a39b0c91	Merge branch '10.4' into 10.5	2023-12-02 01:02:50 +01:00
Kristian Nielsen	9fa718b1a1	Fix mariabackup InnoDB recovered binlog position on server upgrade Before MariaDB 10.3.5, the binlog position was stored in the TRX_SYS page, while after it is stored in rollback segments. There is code to read the legacy position from TRX_SYS to handle upgrades. The problem was if the legacy position happens to compare larger than the position found in rollback segments; in this case, the old TRX_SYS position would incorrectly be preferred over the newer position from rollback segments. Fixed by always preferring a position from rollback segments over a legacy position. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-03 09:13:51 +01:00
Kristian Nielsen	f8f5ed2280	Revert: MDEV-22351 InnoDB may recover wrong information after RESET MASTER This commit can cause the wrong (old) binlog position to be recovered by mariabackup --prepare. It implements that the value of the FIL_PAGE_LSN is compared to determine which binlog position is the last one and should be recoved. However, it is not guaranteed that the FIL_PAGE_LSN order matches the commit order, as is assumed by the code. This is because the page LSN could be modified by an unrelated update of the page after the commit. In one example, the recovery first encountered this in trx_rseg_mem_restore(): lsn=27282754 binlog position (./master-bin.000001, 472908) and then later: lsn=27282699 binlog position (./master-bin.000001, 477164) The last one 477164 is the correct position. However, because the LSN encountered for the first one is higher, that position is recovered instead. This results in too old binlog position, and a newly provisioned slave will start replicating too early and get duplicate key error or similar. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-03 09:13:51 +01:00
Marko Mäkelä	54819192fe	Merge 10.11 into 11.0	2023-04-26 18:50:15 +03:00
Marko Mäkelä	52f6f364d9	Merge 10.10 into 10.11	2023-04-26 18:31:50 +03:00
Marko Mäkelä	204e7225dc	Cleanup: MONITOR_EXISTING trx_undo_slots_used, trx_undo_slots_cached Let us remove explicit updates of MONITOR_NUM_UNDO_SLOT_USED and MONITOR_NUM_UNDO_SLOT_CACHED, and let us compute the rough values from trx_sys.rseg_array[] on demand.	2023-04-21 17:58:18 +03:00
Marko Mäkelä	485a1b1f11	MDEV-30863 Server freeze, all threads in trx_assign_rseg_low() trx_assign_rseg_low(): Simplify the debug check. trx_rseg_t::reinit(): Reset the skip_allocation() flag. This logic was broken in the merge commit `3e2ad0e918` of commit `0de3be8cfd` (that is, innodb_undo_log_truncate=ON would never be "completed"). Tested by: Matthias Leich	2023-04-18 14:54:40 +03:00
Marko Mäkelä	a75cd0a734	MDEV-30671 follow-up: Remove the field TRX_UNDO_NEEDS_PURGE Because downgrades from 11.0 to older MariaDB server are not possible due to the removal of the InnoDB change buffer, there is no need to access the field TRX_UNDO_NEEDS_PURGE anymore.	2023-02-28 13:21:31 +02:00
Marko Mäkelä	7a834d6248	Merge 10.11 into 11.0	2023-02-28 13:14:08 +02:00
Marko Mäkelä	95d51369c9	Merge 10.10 into 10.11	2023-02-28 10:52:42 +02:00
Marko Mäkelä	3e2ad0e918	Merge 10.5 into 10.6	2023-02-27 13:17:35 +02:00
Marko Mäkelä	0de3be8cfd	MDEV-30671 InnoDB undo log truncation fails to wait for purge of history It is not safe to invoke trx_purge_free_segment() or execute innodb_undo_log_truncate=ON before all undo log records in the rollback segment has been processed. A prominent failure that would occur due to premature freeing of undo log pages is that trx_undo_get_undo_rec() would crash when trying to copy an undo log record to fetch the previous version of a record. If trx_undo_get_undo_rec() was not invoked in the unlucky time frame, then the symptom would be that some committed transaction history is never removed. This would be detected by CHECK TABLE...EXTENDED that was impleented in commit `ab0190101b`. Such a garbage collection leak should be possible even when using innodb_undo_log_truncate=OFF, just involving trx_purge_free_segment(). trx_rseg_t::needs_purge: Change the type from Boolean to a transaction identifier, noting the most recent non-purged transaction, or 0 if everything has been purged. On transaction start, we initialize this to 1 more than the transaction start ID. On recovery, the field may be adjusted to the transaction end ID (TRX_UNDO_TRX_NO) if it is larger. The field TRX_UNDO_NEEDS_PURGE becomes write-only; only some debug assertions that would validate the value. The field reflects the old inaccurate Boolean field trx_rseg_t::needs_purge. trx_undo_mem_create_at_db_start(), trx_undo_lists_init(), trx_rseg_mem_restore(): Remove the parameter max_trx_id. Instead, store the maximum in trx_rseg_t::needs_purge, where trx_rseg_array_init() will find it. trx_purge_free_segment(): Contiguously hold a lock on trx_rseg_t to prevent any concurrent allocation of undo log. trx_purge_truncate_rseg_history(): Only invoke trx_purge_free_segment() if the rollback segment is empty and there are no pending transactions associated with it. trx_purge_truncate_history(): Only proceed with innodb_undo_log_truncate=ON if trx_rseg_t::needs_purge indicates that all history has been purged. Tested by: Matthias Leich	2023-02-24 14:24:44 +02:00
Marko Mäkelä	f27e9c8947	MDEV-29694 Remove the InnoDB change buffer The purpose of the change buffer was to reduce random disk access, which could be useful on rotational storage, but maybe less so on solid-state storage. When we wished to (1) insert a record into a non-unique secondary index, (2) delete-mark a secondary index record, (3) delete a secondary index record as part of purge (but not ROLLBACK), and the B-tree leaf page where the record belongs to is not in the buffer pool, we inserted a record into the change buffer B-tree, indexed by the page identifier. When the page was eventually read into the buffer pool, we looked up the change buffer B-tree for any modifications to the page, applied these upon the completion of the read operation. This was called the insert buffer merge. We remove the change buffer, because it has been the source of various hard-to-reproduce corruption bugs, including those fixed in commit `5b9ee8d819` and commit `165564d3c3` but not limited to them. A downgrade will fail with a clear message starting with commit `db14eb16f9` (MDEV-30106). buf_page_t::state: Merge IBUF_EXIST to UNFIXED and WRITE_FIX_IBUF to WRITE_FIX. buf_pool_t::watch[]: Remove. trx_t: Move isolation_level, check_foreigns, check_unique_secondary, bulk_insert into the same bit-field. The only purpose of trx_t::check_unique_secondary is to enable bulk insert into an empty table. It no longer enables insert buffering for UNIQUE INDEX. btr_cur_t::thr: Remove. This field was originally needed for change buffering. Later, its use was extended to cover SPATIAL INDEX. Much of the time, rtr_info::thr holds this field. When it does not, we will add parameters to SPATIAL INDEX specific functions. ibuf_upgrade_needed(): Check if the change buffer needs to be updated. ibuf_upgrade(): Merge and upgrade the change buffer after all redo log has been applied. Free any pages consumed by the change buffer, and zero out the change buffer root page to mark the upgrade completed, and to prevent a downgrade to an earlier version. dict_load_tablespaces(): Renamed from dict_check_tablespaces_and_store_max_id(). This needs to be invoked before ibuf_upgrade(). btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics. The change buffer merge does not need this function anymore. btr_page_alloc(): Renamed from btr_page_alloc_low(). We no longer allocate any change buffer pages. btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics. The change buffer merge does not need this function anymore. row_search_index_entry(), btr_lift_page_up(): Add a parameter thr for the SPATIAL INDEX case. rtr_page_split_and_insert(): Specialized from btr_page_split_and_insert(). rtr_root_raise_and_insert(): Specialized from btr_root_raise_and_insert(). Note: The support for upgrading from the MySQL 3.23 or MySQL 4.0 change buffer format that predates the MySQL 4.1 introduction of the option innodb_file_per_table was removed in MySQL 5.6.5 as part of mysql/mysql-server@69b6241a79 and MariaDB 10.0.11 as part of `1d0f70c2f8`. In the tests innodb.log_upgrade and innodb.log_corruption, we create valid (upgraded) change buffer pages. Tested by: Matthias Leich	2023-01-11 17:59:36 +02:00
Thirunarayanan Balathandayuthapani	baf276e6d4	MDEV-19229 Allow innodb_undo_tablespaces to be changed after database creation trx_sys_t::undo_log_nonempty: Set to true if there are undo logs to rollback and purge. The algorithm for re-creating the undo tablespace when trx_sys_t::undo_log_nonempty is disabled: 1) trx_sys_t::reset_page(): Reset the TRX_SYS page and assign all rollback segment slots from 1..127 to FIL_NULL 2) Free the rollback segment header page of system tablespace for the slots 1..127 3) Update the binlog and WSREP information in system tablespace rollback segment header Step (1), (2) and Step (3) should happen atomically within a single mini-transaction. 4) srv_undo_delete_old_tablespaces(): Delete the old undo tablespaces present in the undo log directory 5) Make checkpoint to get rid of old undo log tablespaces redo logs 6) Assign new start space id for the undo log tablespaces 7) Re-create the specified undo log tablespaces. InnoDB uses same mtr for this one and step (6) 8) Make checkpoint again, so that server or mariabackup can read the undo log tablespace page0 before applying the redo logs srv_undo_tablespaces_reinit(): Recreate the undo log tablespaces. It does reset trx_sys page, delete the old undo tablespaces, update the binlog offset, write set replication checkpoint in system rollback segment page trx_rseg_update_binlog_offset(): Added 2 new parameters to pass binlog file name and binlog offset trx_rseg_array_init(): Return error if the rollback segment slot points to non-existent tablespace srv_undo_tablespaces_init(): Added new parameter mtr to initialize all undo tablespaces trx_assign_rseg_low(): Allow the transaction to use the rollback segment slots(1..127) even if InnoDB failed to change to the requested innodb_undo_tablespaces=0 srv_start(): Override the user specified value of innodb_undo_tablespaces variable with already existing actual undo tablespaces wf_incremental_process(): Detects whether TRX_SYS page has been modified since last backup. If it is then incremental backup fails and throws the information about taking full backup again xb_assign_undo_space_start(): Removed the function. Because undo001 has first undo space id value in page0 Added test case to test the scenario during startup and mariabackup incremental process too. Reviewed-by : Marko Mäkelä Tested-by : Matthias Leich	2022-10-25 11:19:36 +05:30
Marko Mäkelä	ec37906646	MDEV-29321 Percona XtraDB 5.7 can't be upgrade to MariaDB 10.6 or above In MySQL 5.7, rollback segments 1 to 32 are used for temporary tables, which is an unnecessary file format change from MySQL 5.6. This format change was avoided in MariaDB Server by commit `124bae082b` (MDEV-12289). An upgrade from MySQL 5.7 would crash due to dereferencing a null pointer, which is a regression due to commit `0b47c126e3` (MDEV-13542). trx_rseg_t::get(): Return nullptr if no tablespace exists. This is where the upgrade would crash. trx_rseg_mem_restore(): Return DB_TABLESPACE_NOT_FOUND if the undo tablespace does not exist. This is likely dead code.	2022-08-18 16:02:13 +03:00
Marko Mäkelä	0b47c126e3	MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit `177d8b0c12` is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.	2022-06-06 14:03:22 +03:00
Krunal Bauskar	83212632e4	MDEV-27935: Enable performance_schema profiling for trx_rseg_t latch - In 10.6, trx_rseg_t mutex was ported to use latch. As part of this porting profiling of the patch was removed. This patch reenables it given that the said latch continues to occupy the top-slots in the contention list.	2022-02-24 19:48:51 +08:00
Marko Mäkelä	aaef2e1d8c	MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t:🔒 Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.	2021-11-18 17:47:19 +02:00
Marko Mäkelä	d95361107c	Merge 10.5 into 10.6	2021-09-24 14:38:52 +03:00
Marko Mäkelä	88f38661b7	Merge 10.4 into 10.5	2021-09-24 12:14:35 +03:00
Marko Mäkelä	d5bd704f4b	Merge 10.3 into 10.4	2021-09-24 12:11:52 +03:00
Marko Mäkelä	4bfdba2e89	MDEV-26672 innodb_undo_log_truncate may reset transaction ID sequence trx_rseg_header_create(): Add a parameter for the value that is to be written to TRX_RSEG_MAX_TRX_ID. If we omit this write, then the updated test innodb.undo_truncate will fail for the 4k, 8k, 16k page sizes. This was broken ever since commit `947efe17ed` (MDEV-15158) removed the writes of transaction identifiers to the TRX_SYS page. srv_do_purge(): Truncate undo tablespaces also during slow shutdown (innodb_fast_shutdown=0). Thanks to Krunal Bauskar for noticing this problem.	2021-09-24 11:23:37 +03:00
Marko Mäkelä	6e12ebd4a7	MDEV-25062: Reduce trx_rseg_t::mutex contention redo_rseg_mutex, noredo_rseg_mutex: Remove the PERFORMANCE_SCHEMA keys. The rollback segment mutex will be uninstrumented. trx_sys_t: Remove pointer indirection for rseg_array, temp_rseg. Align each element to the cache line. trx_sys_t::rseg_id(): Replaces trx_rseg_t::id. trx_rseg_t::ref: Replaces needs_purge, trx_ref_count, skip_allocation in a single std::atomic<uint32_t>. trx_rseg_t::latch: Replaces trx_rseg_t::mutex. trx_rseg_t::history_size: Replaces trx_sys_t::rseg_history_len trx_sys_t::history_size_approx(): Replaces trx_sys.rseg_history_len in those places where the exact count does not matter. We must not acquire any trx_rseg_t::latch while holding index page latches, because normally the trx_rseg_t::latch is acquired before any page latches. trx_sys_t::history_exists(): Replaces trx_sys.rseg_history_len!=0 with an approximation. We remove some unnecessary trx_rseg_t::latch acquisition around trx_undo_set_state_at_prepare() and trx_undo_set_state_at_finish(). Those operations will only access fields that remain constant after trx_rseg_t::init().	2021-06-23 13:42:11 +03:00
Marko Mäkelä	3a566de22d	Merge 10.5 into 10.6	2021-06-23 09:24:32 +03:00
Marko Mäkelä	344e59904d	Merge 10.4 into 10.5	2021-06-23 08:17:49 +03:00
Marko Mäkelä	09b03ff31b	Merge 10.3 into 10.4	2021-06-23 08:05:27 +03:00
Marko Mäkelä	35a9aaebe2	MDEV-25981 InnoDB upgrade fails trx_undo_mem_create_at_db_start(): Relax too strict upgrade checks that were introduced in commit `e46f76c974` (MDEV-15912). On commit, pages will typically be set to TRX_UNDO_CACHED state. Having the type TRX_UNDO_INSERT in such pages is common and unproblematic; the type would be reset in trx_undo_reuse_cached(). trx_rseg_array_init(): On failure, clean up the rollback segments that were initialized so far, to avoid an assertion failure later during shutdown.	2021-06-22 09:30:25 +03:00
Marko Mäkelä	4dfec8b230	Merge 10.5 into 10.6	2021-06-21 17:49:33 +03:00
Marko Mäkelä	a42c80bd48	Merge 10.4 into 10.5	2021-06-21 14:22:22 +03:00
Marko Mäkelä	d3e4fae797	Merge 10.3 into 10.4	2021-06-21 12:38:25 +03:00

1 2 3

141 commits