mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 12:02:42 +01:00

Author	SHA1	Message	Date
Thirunarayanan Balathandayuthapani	5bb31bc882	MDEV-22230 : Unexpected ER_ERROR_ON_RENAME upon DROP non-existing FOREIGN KEY mysql_prepare_alter_table(): Alter table should check whether the foreign key exists when it is expected to exist and report the error at an early stage. dict_foreign_parse_drop_constraints(): Don't throw an error if the foreign key constraint doesn't exist when IF EXISTS is given in the statement.	2023-11-26 18:46:00 +05:30
Marko Mäkelä	64f44b22d9	MDEV-31574: Assertion failure on REPLACE on ROW_FORMAT=COMPRESSED table btr_cur_update_in_place(): Update the DB_TRX_ID,DB_ROLL_PTR also on the compressed copy of the page. In a test case, a server built with cmake -DWITH_INNODB_EXTRA_DEBUG=ON would crash in page_zip_validate() due to the inconsistency. In a normal debug build, a different assertion would fail, depending on when the uncompressed page was restored from the compressed page. In MariaDB Server 10.5, this bug had already been fixed by commit `b3d02a1fcf` (MDEV-12353).	2023-11-23 15:09:26 +02:00
Marko Mäkelä	d963584d4c	Merge 10.5 into 10.6	2023-11-22 16:56:47 +02:00
Marko Mäkelä	78c9a12c8f	MDEV-32861 InnoDB hangs when running out of I/O slots When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1 and the server is run with the minimum parameters innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs were observed. tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire() will be woken up when the cache previously was empty. buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite batch so that os_aio_wait_until_no_pending_writes() has a chance of returning. Add a Boolean parameter and pass wait_for_reads=false inside buf_page_decrypt_after_read(), because those calls will be executed inside a read completion callback, and therefore os_aio_wait_until_no_pending_reads() would block indefinitely.	2023-11-22 16:54:41 +02:00
Marko Mäkelä	9c5600adde	Merge 10.5 into 10.6	2023-11-21 09:33:06 +02:00
Marko Mäkelä	de31ca6a21	MDEV-32820 Race condition between trx_purge_free_segment() and trx_undo_create() trx_purge_free_segment(): If fseg_free_step_not_header() needs to be called multiple times, acquire an exclusive latch on the rollback segment header page after restarting the mini-transaction so that the rest of this function cannot execute concurrently with trx_undo_create() on the same rollback segment. This fixes a regression that was introduced in commit `c14a39431b` (MDEV-30753). Note: The buffer-fixes that we are holding across the mini-transaction restart will prevent the pages from being evicted from the buffer pool. They may be accessed by other threads or written back to data files while we are not holding exclusive latches. Reviewed by: Vladislav Lesin	2023-11-21 08:53:02 +02:00
Thirunarayanan Balathandayuthapani	84e0c027e0	MDEV-28613 LeakSanitizer caused by I_S query using LIMIT ROWS EXAMINED Problem: ======== - InnoDB fails to free the allocated buffer of stored cursor when information schema query is interrupted. Solution: ========= - In case of error handling, information schema query should free the allocated buffer to store the cursor.	2023-11-21 11:13:43 +05:30
Marko Mäkelä	eb1f8b2919	MDEV-32027 Opening all .ibd files on InnoDB startup can be slow dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES. dict_check_tablespaces_and_store_max_id(): In the normal case (no encryption plugin has been loaded and the change buffer is empty), invoke dict_find_max_space_id() and do not open any .ibd files. If a std::set<uint32_t> has been specified, open the files whose tablespace ID is mentioned. Else, open all data files that are identified by SYS_TABLES records. fil_ibd_open(): Remove a call to os_file_get_last_error() that can report a misleading error, such as EINVAL inside my_realpath() that is not an actual error. This could be invoked when a data file is found but the FSP_SPACE_FLAGS are incorrect, such as is the case for table test.td in ./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags buf_load(): If any tablespaces could not be found, invoke dict_check_tablespaces_and_store_max_id() on the missing tablespaces. dict_load_tablespace(): Try to load the tablespace unless it was found to be futile. This fixes failures related to FTS_*.ibd files for FULLTEXT INDEX. btr_cur_t::search_leaf(): Prevent a crash when the tablespace does not exist. This was caught by the test innodb_fts.fts_concurrent_insert when the change to dict_load_tablespaces() was not present. We modify a few tests to ensure that tables will not be loaded at startup. For some fault injection tests this means that the corrupted tables will not be loaded, because dict_load_tablespace() would perform stricter checks than dict_check_tablespaces_and_store_max_id(). Tested by: Matthias Leich Reviewed by: Thirunarayanan Balathandayuthapani	2023-11-17 15:07:51 +02:00
Marko Mäkelä	9a545eb67c	MDEV-26055: Correct the formula for adaptive flushing This is a 10.5 backport of 10.6 commit `d4265fbde5`. page_cleaner_flush_pages_recommendation(): If dirty_pct is between innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct, scale the effort relative to how close we are to innodb_max_dirty_pages_pct. The previous formula was missing a multiplication by 100.	2023-11-16 17:45:37 +02:00
Marko Mäkelä	a3d0d5fc33	MDEV-26055: Improve adaptive flushing This is a 10.5 backport from 10.6 commit `9593cccf28`. Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0 (not default) and innodb_adaptive_flushing=ON (default). There is also the parameter innodb_adaptive_flushing_lwm (default: 10 per cent of the log capacity). It should enable some adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0. That is not being changed here. This idea was first presented by Inaam Rana several years ago, and I discussed it with Jean-François Gagné at FOSDEM 2023. buf_flush_page_cleaner(): When we are not near the log capacity limit (neither buf_flush_async_lsn nor buf_flush_sync_lsn are set), also try to move clean blocks from the buf_pool.LRU list to buf_pool.free or initiate writes (but not the eviction) of dirty blocks, until the remaining I/O capacity has been consumed. buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify whether dirty least recently used pages (from buf_pool.LRU) should be evicted immediately after they have been written out. Callers outside buf_flush_page_cleaner() will pass evict=true, to retain the existing behaviour. buf_do_LRU_batch(): Add the parameter bool evict. Return counts of evicted and flushed pages. buf_flush_LRU(): Add the parameter bool evict. Assume that the caller holds buf_pool.mutex and will invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list() whose caller must hold buf_pool.mutex and invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have buf_flush_wait_batch_end(). page_cleaner_flush_pages_recommendation(): Avoid some floating-point arithmetics. buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(), buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict". 
buf_free_from_unzip_LRU_list_batch(): Remove the parameter. Only actual page writes will contribute towards the limit. buf_LRU_free_page(): Evict freed pages of temporary tables. buf_pool.done_free: Broadcast whenever a block is freed (and buf_pool.try_LRU_scan is set). buf_pool_t::io_buf_t::reserve(): Retry indefinitely. During the test encryption.innochecksum we easily run out of these buffers for PAGE_COMPRESSED or ENCRYPTED pages. Tested by Matthias Leich and Axel Schwenke	2023-11-16 17:45:18 +02:00
Marko Mäkelä	5a1f821b93	MDEV-31861 Empty INSERT crashes with innodb_force_recovery=6 or innodb_read_only=ON ha_innobase::extra(): Do not invoke log_buffer_flush_to_disk() if high_level_read_only holds. log_buffer_flush_to_disk(): Remove an assertion that duplicates one at the start of log_write_up_to().	2023-11-16 16:57:42 +02:00
Marko Mäkelä	ea6ca01397	MDEV-32757: rollback crash on corruption trx_undo_free_page(): Detect a case of corrupted TRX_UNDO_PAGE_LIST. trx_undo_truncate_end(): Stop attempts to truncate a corrupted log. trx_t::commit_empty(): Add an error message of a corrupted log. Reviewed by: Thirunarayanan Balathandayuthapani	2023-11-15 14:11:38 +02:00
Marko Mäkelä	5dbe7a8c9a	Merge 10.5 into 10.6	2023-11-15 14:11:24 +02:00
Marko Mäkelä	52ca2e65af	Merge 10.5 into 10.6	2023-11-15 14:10:21 +02:00
Marko Mäkelä	a0f02f7438	MDEV-32757 innodb_undo_log_truncate=ON is not crash safe trx_purge_truncate_history(): Do not prematurely mark dirty pages as clean. This will be done in mtr_t::commit_shrink() as part of Shrink::operator()(mtr_memo_slot_t*). Also, register each dirty page only once in the mini-transaction. fsp_page_create(): Adjust and simplify the page creation during undo tablespace truncation. We can directly reuse pages that are already in buf_pool.page_hash. This fixes a regression that was caused by commit `f5794e1dc6` (MDEV-26445). Tested by: Matthias Leich Reviewed by: Thirunarayanan Balathandayuthapani	2023-11-15 12:23:35 +02:00
Marko Mäkelä	c638051d80	MDEV-32798 innodb_fast_shutdown=0 hang after incomplete startup innodb_preshutdown(): Only wait for active transactions to be terminated if InnoDB was started and innodb_force_recovery=3 or larger does not prevent a rollback. This fixes the following: ./mtr --parallel=auto --mysqld=--innodb-fast-shutdown=0 \ innodb.log_file_size innodb.innodb_force_recovery \ innodb.read_only_recovery innodb.read_only_recover_committed \ mariabackup.apply-log-only-incr	2023-11-14 14:35:51 +02:00
Oleksandr Byelkin	4a824c0cf0	Merge branch '10.6' into mariadb-10.6.16	2023-11-14 08:56:16 +01:00
Oleksandr Byelkin	9f83a8822f	Merge branch '10.5' into mariadb-10.5.23	2023-11-14 08:41:23 +01:00
Marko Mäkelä	dec4d0badc	MDEV-32788: Debug build failure with SUX_LOCK_GENERIC Fixes up commit `2027c482de`	2023-11-13 14:35:33 +02:00
Aleksey Midenkov	e53e7cd134	MDEV-20545 Assertion col.vers_sys_end() in dict_index_t::vers_history_row Index values for row_start/row_end were wrongly calculated for inplace ALTER for some layouts of virtual fields. Possible impact: 1. history row is not detected upon build clustered index for inplace ALTER which may lead to duplicate key errors on auto-increment and FTS index add. 2. foreign key constraint may falsely fail. 3. after inplace ALTER before server restart trx-based system versioning can cause server crash or incorrect data written upon UPDATE.	2023-11-10 15:46:14 +03:00
Marko Mäkelä	e0c65784aa	MDEV-32737 innodb.log_file_name fails on Assertion `after_apply \|\| !(blocks).end in recv_sys_t::clear recv_group_scan_log_recs(): Set the debug flag recv_sys.after_apply after actually completing the log scan. In the test, suppress some errors that may be reported when the crash recovery of RENAME TABLE t1 TO t2 is preceded by copying t2.ibd to t1.ibd.	2023-11-09 11:06:17 +02:00
Oleksandr Byelkin	b83c379420	Merge branch '10.5' into 10.6	2023-11-08 15:57:05 +01:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Thirunarayanan Balathandayuthapani	a44869d842	MDEV-31851 Doublewrite recovery fixup recv_dblwr_t::find_page(): Tablespace flags validity should be checked only for page 0.	2023-11-08 12:37:41 +01:00
Thirunarayanan Balathandayuthapani	b52b7b4129	MDEV-31851 Doublewrite recovery fixup recv_dblwr_t::find_page(): Tablespace flags validity should be checked only for page 0.	2023-11-06 19:18:33 +05:30
Marko Mäkelä	1fc2843eee	MDEV-31826: File handle leak on failed IMPORT TABLESPACE fil_space_t::drop(): If the caller is not interested in a detached handle, close it immediately.	2023-11-04 16:04:21 +02:00
Kristian Nielsen	9fa718b1a1	Fix mariabackup InnoDB recovered binlog position on server upgrade Before MariaDB 10.3.5, the binlog position was stored in the TRX_SYS page, while after it is stored in rollback segments. There is code to read the legacy position from TRX_SYS to handle upgrades. The problem was if the legacy position happens to compare larger than the position found in rollback segments; in this case, the old TRX_SYS position would incorrectly be preferred over the newer position from rollback segments. Fixed by always preferring a position from rollback segments over a legacy position. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-03 09:13:51 +01:00
Kristian Nielsen	f8f5ed2280	Revert: MDEV-22351 InnoDB may recover wrong information after RESET MASTER This commit can cause the wrong (old) binlog position to be recovered by mariabackup --prepare. It implements that the value of the FIL_PAGE_LSN is compared to determine which binlog position is the last one and should be recovered. However, it is not guaranteed that the FIL_PAGE_LSN order matches the commit order, as is assumed by the code. This is because the page LSN could be modified by an unrelated update of the page after the commit. In one example, the recovery first encountered this in trx_rseg_mem_restore(): lsn=27282754 binlog position (./master-bin.000001, 472908) and then later: lsn=27282699 binlog position (./master-bin.000001, 477164) The last one 477164 is the correct position. However, because the LSN encountered for the first one is higher, that position is recovered instead. This results in a too old binlog position, and a newly provisioned slave will start replicating too early and get duplicate key error or similar. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-03 09:13:51 +01:00
Marko Mäkelä	bfab4ab000	MDEV-18867 fixup: Remove DBUG injection In commit `75e82f71f1` the code to rename internal tables for FULLTEXT INDEX that had been created on Microsoft Windows using incompatible names was removed. Let us also remove the related fault injection.	2023-11-02 15:27:52 +02:00
Thirunarayanan Balathandayuthapani	b4de67da45	MDEV-32638 MariaDB crashes with foreign_key_checks=0 when changing a column and adding a foreign key at the same time Problem: ======= - InnoDB fails to find the foreign key index for the newly added foreign key relation. This is caused by commit `5f09b53bdb` (MDEV-31086). FIX: === In check_col_is_in_fk_indexes(), while iterating through the newly added foreign key relationship, InnoDB should consider that foreign key relation may not have foreign index when foreign key check is disabled.	2023-11-02 14:33:05 +05:30
Marko Mäkelä	0cc809f91b	MDEV-31826: Memory leak on failed IMPORT TABLESPACE fil_delete_tablespace(): Invoke fil_space_free_low() directly. This fixes up commit `39e3ca8bd2`	2023-10-31 12:48:20 +02:00
Marko Mäkelä	15ae97b1c2	MDEV-32578 row_merge_fts_doc_tokenize() handles parser plugin inconsistently When mysql/mysql-server@0c954c2289 added a plugin interface for FULLTEXT INDEX tokenization to MySQL 5.7, fts_tokenize_ctx::processed_len got a second meaning, which is only partly implemented in row_merge_fts_doc_tokenize(). This inconsistency could cause a crash when using FULLTEXT...WITH PARSER. A test case that would crash MySQL 8.0 when using an n-gram parser and single-character words would fail to crash in MySQL 5.7, because the buf_full condition in row_merge_fts_doc_tokenize() was not met. This change is inspired by mysql/mysql-server@38e9a0779a that appeared in MySQL 5.7.44.	2023-10-27 13:13:49 +03:00
Marko Mäkelä	5b53342a6a	MDEV-32588 InnoDB may hang when running out of buffer pool buf_flush_LRU_list_batch(): Do not skip pages that are actually clean but in buf_pool.flush_list due to the "lazy removal" optimization of commit `22b62edaed`, but try to evict them. After acquiring buf_pool.flush_list_mutex, reread oldest_modification to ensure that the block still remains in buf_pool.flush_list. In addition to server hangs, this bug could also cause InnoDB: Failing assertion: list.count > 0 in invocations of UT_LIST_REMOVE(flush_list, ...). This fixes a regression that was caused by commit `a55b951e60` and possibly made more likely to hit due to commit `aa719b5010`.	2023-10-26 15:10:53 +03:00
Marko Mäkelä	39e3ca8bd2	MDEV-31826 InnoDB may fail to recover after being killed in fil_delete_tablespace() InnoDB was violating the write-ahead-logging protocol when a file was being deleted, like this: 1. fil_delete_tablespace() set the fil_space_t::STOPPING flag 2. The buf_flush_page_cleaner() thread discards some changed pages for this tablespace and advances the log checkpoint a little. 3. The server process is killed before fil_delete_tablespace() wrote a FILE_DELETE record. 4. Recovery will try to apply log to pages of the tablespace, because there was no FILE_DELETE record. This will fail, because some pages that had been modified since the latest checkpoint had not been written by the page cleaner. Page writes must not be stopped before a FILE_DELETE record has been durably written. fil_space_t::drop(): Replaces fil_space_t::check_pending_operations(). Add the parameter detached_handle, and return a tablespace pointer if this thread was the first one to stop I/O on the tablespace. mtr_t::commit_file(): Remove the parameter detached_handle, and move some handling to fil_space_t::drop(). fil_space_t: STOPPING_READS, STOPPING_WRITES: Separate flags for STOPPING. We want to stop reads (and encryption) before stopping page writes. fil_space_t::is_stopping_writes(), fil_space_t::get_for_write(): Special accessors for the write path. fil_space_t::flush_low(): Ignore the STOPPING_READS flag and only stop if STOPPING_WRITES is set, to avoid an infinite loop in fil_flush_file_spaces(), which was occasionally repeated by running the test encryption.create_or_replace. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-10-26 15:07:59 +03:00
Marko Mäkelä	2ba9702163	MDEV-32050: Boost innodb_purge_batch_size on slow shutdown A slow shutdown using the previous default innodb_purge_batch_size=300 could be extremely slow, employing at most a few CPU cores on the average. Let us use the maximum batch size in order to increase throughput. Reviewed by: Vladislav Lesin	2023-10-25 10:21:49 +03:00
Marko Mäkelä	aa719b5010	MDEV-32050: Do not copy undo records in purge Also, default to innodb_purge_batch_size=1000, replacing the old default value of processing 300 undo log pages in a batch. Axel Schwenke found this value to help reduce purge lag without having a significant impact on workload throughput. In purge, we can simply acquire a shared latch on the undo log page (to avoid a race condition like the one that was fixed in commit `b102872ad5`) and retain a buffer-fix after releasing the latch. The buffer-fix will prevent the undo log page from being evicted from the buffer pool. Concurrent modification is prevented by design. Only the purge_coordinator_task (or its accomplice purge_truncation_task) may free the undo log pages, after any purge_worker_task have completed execution. Hence, we do not have to worry about any overwriting or reuse of the undo log records. trx_undo_rec_copy(): Remove. The only remaining caller would have been trx_undo_get_undo_rec_low(), which is where the logic was merged. purge_sys_t::m_initialized: Replaces heap. purge_sys_t::pages: A cache of buffer-fixed pages that have been looked up from buf_pool.page_hash. purge_sys_t::get_page(): Return a buffer-fixed undo page, using the pages cache. trx_purge_t::batch_cleanup(): Renamed from clone_end_view(). Clear the pages cache and clone the end_view at the end of a batch. purge_sys_t::n_pages_handled(): Return pages.size(). This determines if innodb_purge_batch_size was exceeded. purge_sys_t::rseg_get_next_history_log(): Replaces trx_purge_rseg_get_next_history_log(). purge_sys_t::choose_next_log(): Replaces trx_purge_choose_next_log() and trx_purge_read_undo_rec(). purge_sys_t::get_next_rec(): Replaces trx_purge_get_next_rec() and trx_undo_get_next_rec(). purge_sys_t::fetch_next_rec(): Replaces trx_purge_fetch_next_rec() and some use of trx_undo_get_first_rec(). 
trx_purge_attach_undo_recs(): Do not allow purge_sys.n_pages_handled() exceed the innodb_purge_batch_size or ¾ of the buffer pool, whichever is smaller. Reviewed by: Vladislav Lesin Tested by: Matthias Leich and Axel Schwenke	2023-10-25 10:19:17 +03:00
Marko Mäkelä	88733282fb	MDEV-32050: Look up tables in the purge coordinator The InnoDB table lookup in purge worker threads is a bottleneck that can degrade a slow shutdown to utilize less than 2 threads. Let us fix that bottleneck by constructing a local lookup table that does not require any synchronization while the undo log records of the current batch are being processed. TRX_PURGE_TABLE_BUCKETS: The initial number of std::unordered_map hash buckets used during a purge batch. This could avoid some resizing and rehashing in trx_purge_attach_undo_recs(). purge_node_t::tables: A lookup table from table ID to an already looked up and locked table. Replaces many fields. trx_purge_attach_undo_recs(): Look up each table in the purge batch only once. trx_purge(): Close all tables and release MDL at the end of the batch. trx_purge_table_open(), trx_purge_table_acquire(): Open a table in purge and acquire a metadata lock on it. This replaces dict_table_open_on_id<true>() and dict_acquire_mdl_shared(). purge_sys_t::close_and_reopen(): In case of an MDL conflict, close and reopen all tables that are covered by the current purge batch. It may be that some of the tables have been dropped meanwhile and can be ignored. This replaces wait_SYS() and wait_FTS(). row_purge_parse_undo_rec(): Make purge_coordinator_task issue a MDL warrant to any purge_worker_task which might need it when innodb_purge_threads>1. purge_node_t::end(): Clear the MDL warrant. Reviewed by: Vladislav Lesin and Vladislav Vaintroub	2023-10-25 10:08:20 +03:00
Marko Mäkelä	d70a98ae06	MDEV-32050: Revert the throttling of MDEV-26356 purge_coordinator_state::do_purge(): Simply use all innodb_purge_threads, no matter what the LSN age is. During shutdown with innodb_fast_shutdown=0 this code could degrade to using only 1 thread. Also, restore periodical "InnoDB: to purge" messages that were accidentally disabled in commit `80585c9d6f`. Reviewed by: Vladislav Lesin and Vladislav Vaintroub	2023-10-25 09:42:38 +03:00
Marko Mäkelä	2027c482de	MDEV-32050: Hold exclusive purge_sys.rseg->latch longer Let the purge_coordinator_task acquire purge_sys.rseg->latch less frequently and hold it longer at a time. This may throttle concurrent DML and prevent purge lag a little. Remove an unnecessary std::this_thread::yield(), because the trx_purge_attach_undo_recs() is supposed to terminate the scan when running out of undo log records. Ultimately, this will result in purge_coordinator_state::do_purge() and purge_coordinator_callback() returning control to the thread pool. Reviewed by: Vladislav Lesin and Vladislav Vaintroub	2023-10-25 09:38:49 +03:00
Marko Mäkelä	44689eb7d8	MDEV-32050: Improve srv_wake_purge_thread_if_not_active() purge_sys_t::wake_if_not_active(): Replaces srv_wake_purge_thread_if_not_active(). innodb_ddl_recovery_done(): Move the wakeup call to srv_init_purge_tasks(). purge_coordinator_timer: Remove. The srv_master_callback() already invokes purge_sys.wake_if_not_active() once per second. Reviewed by: Vladislav Lesin and Vladislav Vaintroub	2023-10-25 09:38:21 +03:00
Marko Mäkelä	14685b10df	MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency The motivation of introducing the parameter innodb_purge_rseg_truncate_frequency in mysql/mysql-server@28bbd66ea5 and mysql/mysql-server@8fc2120fed seems to have been to avoid stalls due to freeing undo log pages or truncating undo log tablespaces. In MariaDB Server, innodb_undo_log_truncate=ON should be a much lighter operation than in MySQL, because it will not involve any log checkpoint. Another source of performance stalls should be trx_purge_truncate_rseg_history(), which is shrinking the history list by freeing the undo log pages whose undo records have been purged. To alleviate that, we will introduce a purge_truncation_task that will offload this from the purge_coordinator_task. In that way, the next innodb_purge_batch_size pages may be parsed and purged while the pages from the previous batch are being freed and the history list being shrunk. The processing of innodb_undo_log_truncate=ON will still remain the responsibility of the purge_coordinator_task. purge_coordinator_state::count: Remove. We will ignore innodb_purge_rseg_truncate_frequency, and act as if it had been set to 1 (the maximum shrinking frequency). purge_coordinator_state::do_purge(): Invoke an asynchronous task purge_truncation_callback() to free the undo log pages. purge_sys_t::iterator::free_history(): Free those undo log pages that have been processed. This used to be a part of trx_purge_truncate_history(). purge_sys_t::clone_end_view(): Take a new value of purge_sys.head as a parameter, so that it will be updated while holding exclusive purge_sys.latch. This is needed for race-free access to the field in purge_truncation_callback(). Reviewed by: Vladislav Lesin	2023-10-25 09:11:58 +03:00
Marko Mäkelä	21bec97044	MDEV-32050: Clean up online ALTER UndorecApplier::assign_rec(): Remove. We will pass the undo record to UndorecApplier::apply_undo_rec(). There is no need to copy the undo record, because nothing else can write to the undo log pages that belong to an active or incomplete transaction. trx_t::apply_log(): Buffer-fix the undo page across mini-transaction boundary in order to avoid repeated page lookups. Reviewed by: Vladislav Lesin	2023-10-25 08:27:27 +03:00
Marko Mäkelä	9bb5d9fe8b	MDEV-32050: Clean up log parsing purge_node_t, undo_node_t: Change the type of rec_type and cmpl_info to byte, because this data is being extracted from a single byte. UndoRecApplier: Change type and cmpl_info to be of type byte, and move them next to the 16-bit offset field to minimize alignment bloat. row_purge_parse_undo_rec(): Remove some redundant code. Purge will be started by innodb_ddl_recovery_done(), at which point all necessary subsystems will have been initialized. trx_purge_rec_t::undo_rec: Point to const. Reviewed by: Vladislav Lesin	2023-10-25 08:27:08 +03:00
Marko Mäkelä	ea42c4baac	MDEV-32050 preparation: Simplify ROLLBACK undo_node_t::state: Replaced with bool is_temp. row_undo_rec_get(): Do not copy the undo log record. The motivation of the copying was to not hold latches on the undo pages and therefore to avoid deadlocks due to lock order inversion a.k.a. latching order violation: It is not allowed to wait for an index page latch while holding an undo page latch, because MVCC reads would first acquire an index page latch and then an undo page latch. But, in rollback, we do not actually need any latch on our own undo pages. The transaction that is being rolled back is the exclusive owner of its undo log records. They cannot be overwritten by other threads until the rollback is complete. Therefore, a buffer fix will protect the undo log record just fine, by preventing page eviction. We still must initially acquire a shared latch on each undo page, to avoid a race condition like the one that was fixed in commit `b102872ad5`. row_undo_ins_parse_undo_rec(): The first two bytes of the undo log record now are the pointer to the next record within the page, not a length. Reviewed by: Vladislav Lesin	2023-10-25 08:26:34 +03:00
Marko Mäkelä	b78b77e77d	MDEV-32530 Race condition in lock_wait_rpl_report() After acquiring lock_sys.latch, always load trx->lock.wait_lock. It could have been changed by another thread that did lock_rec_move() and released lock_sys.latch right before lock_sys.wr_lock_try() succeeded. This regression was introduced in commit `e039720bf3` (MDEV-32096). Reviewed by: Vladislav Lesin	2023-10-24 14:33:14 +03:00
Alexander Barkov	df72c57d6f	MDEV-30048 Prefix keys for CHAR work differently for MyISAM vs InnoDB Also fixes: MDEV-30050 Inconsistent results of DISTINCT with NOPAD Problem: Key segments for CHAR columns were compared using strnncollsp() for engines MyISAM and Aria. This did not work correctly if the engine applied trailing space compression. Fix: Replacing ha_compare_text() calls with new functions: - ha_compare_char_varying() - ha_compare_char_fixed() - ha_compare_word() - ha_compare_word_prefix() - ha_compare_word_or_prefix() The code branch corresponding to comparison of CHAR column keys (HA_KEYTYPE_TEXT segment type) now uses ha_compare_char_fixed() which calls strnncollsp_nchars(). This patch does not change the behavior for the rest of the code: - comparison of VARCHAR/TEXT column keys (HA_KEYTYPE_VARTEXT1, HA_KEYTYPE_VARTEXT2 segments types) - comparison in the fulltext code	2023-10-24 03:35:48 +04:00
Marko Mäkelä	b21f52ee73	Merge 10.5 into 10.6	2023-10-23 16:43:48 +03:00
Marko Mäkelä	b5e43a1d35	MDEV-32552 Write-ahead logging is broken for freed pages buf_page_free(): Flag the freed page as modified if it is found in the buffer pool. buf_flush_page(): If the page has been freed, ensure that the log for it has been durably written, before removing the page from buf_pool.flush_list. FindBlockX: Find also MTR_MEMO_PAGE_X_MODIFY in order to avoid an occasional failure of innodb.innodb_defrag_concurrent, which involves freeing and reallocating pages in the same mini-transaction. This fixes a regression that was introduced in commit `a35b4ae898` (MDEV-15528). This logic was tested by commenting out the $shutdown_timeout line from a test and running the following: ./mtr --rr innodb.scrub rr replay var/log/mysqld.1.rr/mariadbd-0 A breakpoint in the modified buf_flush_page() was hit, and the FIL_PAGE_LSN of that page had been last modified during the mtr_t::commit() of a mini-transaction where buf_page_free() had been executed on that page.	2023-10-23 16:13:16 +03:00
Thirunarayanan Balathandayuthapani	7d89dcf1ae	MDEV-32527 Server aborts during alter operation when table doesn't have foreign index Problem: ======== InnoDB fails to find the foreign key index for the foreign key relation in the table while iterating the foreign key constraints during alter operation. This is caused by commit `5f09b53bdb` (MDEV-31086). Fix: ==== In check_col_is_in_fk_indexes(), while iterating through the foreign key relationship, InnoDB should consider that foreign key relation may not have foreign index when foreign key check is disabled.	2023-10-20 15:23:22 +05:30
Marko Mäkelä	6991b1c47c	Merge 10.5 into 10.6	2023-10-19 13:50:00 +03:00
Thirunarayanan Balathandayuthapani	85751ed81d	MDEV-31851 After crash recovery, undo tablespace fails to open srv_all_undo_tablespaces_open(): While opening the extra unused undo tablespaces, InnoDB should use ULINT_UNDEFINED instead of SRV_SPACE_ID_UPPER_BOUND.	2023-10-19 15:39:44 +05:30
Thirunarayanan Balathandayuthapani	dbba1bb1c3	MDEV-31851 After crash recovery, undo tablespace fails to open recv_recovery_from_checkpoint_start(): InnoDB should add the redo log block header + trailer size while checking the log sequence number in log file with log sequence number in the system tablespace first page.	2023-10-19 13:12:10 +05:30
Marko Mäkelä	2d6dc65de5	MDEV-32144 fixup In commit `384eb570a6` the debug check was relaxed in trx_undo_header_create(), not in the intended function trx_undo_write_xid().	2023-10-19 08:24:37 +03:00
Marko Mäkelä	cfd1788182	MDEV-32511: Race condition between checkpoint and page write fil_aio_callback(): Invoke fil_node_t::complete_write() before releasing any page latch, so that in case a log checkpoint is executed roughly concurrently with the first write into a file since the previous checkpoint, we will not miss a fdatasync() or fsync() call to make the write durable.	2023-10-18 16:51:04 +03:00
Marko Mäkelä	bf7c6fc20b	MDEV-32511 Assertion !os_aio_pending_writes() failed In MemorySanitizer builds of 10.10 and 10.11, we would rather often have the assertion fail in innodb_init() during mariadb-backup --prepare. The assertion could also fail during InnoDB startup, but less often. Before commit `685d958e38` in 10.8 the log file cleanup after a successfully applied backup is different, and the os_aio_pending_writes() assertion is in srv0start.cc. IORequest::write_complete(): Invoke node->complete_write() before releasing the page latch, so that a log checkpoint that is about to execute concurrently will not miss a fdatasync() or fsync() on the file, in case this was the first write since the last such call. create_log_file(), srv_start(): Replace the debug assertion with a debug check. For all intents and purposes, all writes could have been completed but some write_io_callback() may not have invoked io_slots::release() yet.	2023-10-18 16:33:11 +03:00
Daniel Black	e467e8d8c2	MDEV-30825 innodb_compression_algorithm=0 (none) increments Innodb_num_pages_page_compression_error fil_page_compress_low returns 0 for both innodb_compression_algorithm=0 and where there is compression errors. On the two callers to this function, don't increment the compression errors if the algorithm was none. Reviewed by: Marko Mäkelä	2023-10-18 19:18:50 +11:00
Thirunarayanan Balathandayuthapani	3da5d047b8	MDEV-31851 After crash recovery, undo tablespace fails to open Problem: ======== - InnoDB fails to open undo tablespace when page0 is corrupted and fails to throw error. Solution: ========= - InnoDB throws DB_CORRUPTION error when InnoDB encounters page0 corruption of undo tablespace. - InnoDB restores the page0 of undo tablespace from doublewrite buffer if it encounters page corruption - Moved Datafile::restore_from_doublewrite() to recv_dblwr_t::restore_first_page(). So that undo tablespace and system tablespace can use this function instead of duplicating the code srv_undo_tablespace_open(): Returns 0 if file doesn't exist or ULINT_UNDEFINED if page0 is corrupted.	2023-10-17 18:41:21 +05:30
Thirunarayanan Balathandayuthapani	ee5cadd5c8	MDEV-28122 Optimize table crash while applying online log - InnoDB fails to check the overflow buffer while applying the operation to the table that was rebuilt. This is caused by commit `3cef4f8f0f` (MDEV-515).	2023-10-16 20:17:09 +05:30
Vlad Lesin	18fa00a54c	MDEV-32272 lock_release_on_prepare_try() does not release lock if supremum bit is set along with other bits set in lock's bitmap The error is caused by MDEV-30165 fix with the following commit: `d13a57ae81` There is a logical error in lock_release_on_prepare_try(): if (supremum_bit) lock_rec_unlock_supremum(*cell, lock); else lock_rec_dequeue_from_page(lock, false); There can be other bits set in the lock's bitmap, and the lock type can match the release criteria, but the above logic releases only the supremum bit of the lock. The fix is to release the lock if it matches the release criteria, and otherwise to unlock the supremum if it is locked. There is also a test for the case, which was reported by the QA team. I placed it in separate files, because it requires a debug build. Reviewed by: Marko Mäkelä	2023-10-13 16:29:04 +03:00
Thirunarayanan Balathandayuthapani	cbad0bcd41	MDEV-31098 InnoDB Recovery doesn't display encryption message when no encryption configuration passed - InnoDB fails to report the error when encryption configuration wasn't passed. This patch addresses the issue by adding the error while loading the tablespace and deferring the tablespace creation.	2023-10-13 17:27:27 +05:30
Daniel Black	fbd11d5f29	MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success Review cleanups.	2023-10-13 09:48:57 +11:00
Daniel Black	c79ca7c7ad	MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success There are many filesystem related errors that can occur with MariaBackup. These are already output to stderr with a good description of the error. Many of these are permission or resource (file descriptor) limits where the assertion and resulting core crash doesn't offer developers anything more than the log message. To the user, assertions and core crashes come across as poor error handling. As such we return an error and handle this all the way up the stack.	2023-10-12 21:37:27 +11:00
Thirunarayanan Balathandayuthapani	4045ead9db	MDEV-32337 Assertion `pos < table->n_def' failed in dict_table_get_nth_col While checking for altered column in foreign key constraints, InnoDB fails to ignore virtual columns. This issue is caused by commit `5f09b53bdb4e973e7c7ec2c53a24c98321223f98` (MDEV-31086).	2023-10-12 14:49:27 +05:30
Thirunarayanan Balathandayuthapani	a2312b6fb2	MDEV-32017 Auto-increment no longer works for explicit FTS_DOC_ID - InnoDB should avoid the sync commit operation when there is nothing in fulltext cache. This is caused by commit `1248fe7277` (MDEV-27582)	2023-10-12 14:48:43 +05:30
Marko Mäkelä	f9d471e2d5	Cleanup: Remove innobase_init_vc_templ() This fixes up a merge of commit `4fb8f7d07a` with respect to commit `ea37b14409`.	2023-10-12 09:48:54 +03:00
Marko Mäkelä	3f1a256234	MDEV-31890: Remove COMPILE_FLAGS The cmake configuration step is single-threaded and already consuming too much time. We should not make it worse by adding invocations like MY_CHECK_CXX_COMPILER_FLAG(). Let us prefer something that works on any supported version of GCC (4.8.5 or later) or clang, as well as recent versions of the Intel C compiler. This replaces commit `1fde785315`	2023-10-11 15:59:56 +03:00
Marko Mäkelä	625a150a86	Merge 10.5 into 10.6	2023-10-06 14:34:01 +03:00
Marko Mäkelä	6e9b421f77	MDEV-32364 Server crashes when starting server with high innodb_log_buffer_size log_t::create(): Return whether the initialisation succeeded. It may fail if too large an innodb_log_buffer_size is specified.	2023-10-06 14:16:01 +03:00
Vlad Lesin	96ae37abc5	MDEV-30658 lock_row_lock_current_waits counter in information_schema.innodb_metrics may become negative The MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT monitor should have the MONITOR_DISPLAY_CURRENT flag set in its definition, as it shows the current state and does not accumulate anything. Reviewed by: Marko Mäkelä	2023-10-05 18:27:54 +03:00
Vladislav Vaintroub	e33e2fa949	MDEV-31095 tpool - restrict threadpool concurrency during bufferpool load Add threadpool functionality to restrict concurrency during "batch" periods (where tasks are added in rapid succession). This will throttle thread creation more aggressively than usual, while keeping performance at least on par. One of these cases is bufferpool load, where async read IOs are executed without any throttling. There can be as many as 650K read IOs for loading a 10GB buffer pool. Another one is recovery, where "fake read" IOs are executed. Why are there more threads than we expect? Worker threads are not recognized as idle until they return to the standby list, and to return to that list, they need to acquire the mutex currently held in submit_task(). In those cases, submit_task() has no worker to wake, and would create threads until the default concurrency level (2*ncpus) is satisfied. Only after that would throttling happen.	2023-10-04 17:44:02 +02:00
Daniel Black	ca66a2cbfa	MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success There are many filesystem related errors that can occur with MariaBackup. These are already output to stderr with a good description of the error. Many of these are permission or resource (file descriptor) limits where the assertion and resulting core crash doesn't offer developers anything more than the log message. To the user, assertions and core crashes come across as poor error handling. As such we return an error and handle this all the way up the stack.	2023-09-26 08:55:52 +10:00
Jan Lindström	076df87b4c	MDEV-30217 : Assertion `mode_ == m_local \|\| transaction_.is_streaming()' failed in int wsrep::client_state::bf_abort(wsrep::seqno) Problem was that a brute force (BF) thread requested a conflicting lock and was trying to kill the victim transaction, but this victim was also a brute force thread. However, this victim was not actually holding a conflicting lock; instead, both the brute force transaction and the victim transaction had insert intention locks. We should not kill a brute force victim transaction if the requesting lock does not need to wait. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-09-25 16:38:55 +02:00
Yuchen Pei	6b343de8ef	Merge branch '10.4' into 10.5	2023-09-25 13:06:57 +10:00
Vladislav Vaintroub	1ee0d09a2b	MDEV-32228 speedup opening tablespaces on Windows is_file_on_ssd() is more expensive than it should be. It caches the results by volume name, but still calls GetVolumePathName() every time, which, as procmon shows, opens multiple directories in filesystem hierarchy (db directory, datadir, and all ancestors) The fix is to cache SSD status by volume serial ID, which is cheap to retrieve with GetFileInformationByHandleEx()	2023-09-22 21:07:50 +02:00
Vlad Lesin	d13a57ae81	Merge 10.5 into 10.6.	2023-09-22 15:21:15 +03:00
Vlad Lesin	95730372bd	MDEV-30165 X-lock on supremum for prepared transaction for RR trx_t::set_skip_lock_inheritance() must be invoked at the very beginning of lock_release_on_prepare(). Currently trx_t::set_skip_lock_inheritance() is invoked at the end of lock_release_on_prepare() when lock_sys and trx are released, and there can be a case when locks on prepare are released, but "not inherit gap locks" bit has not yet been set, and page split inherits lock to supremum. Also reset supremum bit and rebuild waiting queue when XA is prepared. Reviewed by: Marko Mäkelä	2023-09-21 20:07:53 +03:00
Marko Mäkelä	52e7016248	Remove dead code This fixes up commit `ed20e5b111` which fixed up the merge commit `202316a38f`	2023-09-20 08:36:30 +03:00
Marko Mäkelä	60b039a864	Merge 10.5 into 10.6	2023-09-20 08:32:04 +03:00
Marko Mäkelä	d58f43f8b4	MDEV-21174 fixup: Remove unused ut_bit_set_nth() This fixes up commit `56f6dab1d0`	2023-09-19 18:02:56 +03:00
Thirunarayanan Balathandayuthapani	2fdacdcd69	MDEV-30802 Assertion `index->is_btree() \|\| index->is_ibuf()' failed in btr_search_guess_on_hash Problem: ======= - There is a race condition between purge and rollback of alter operation. Alter rollback marks the index as corrupted. At the same time, purge is working on the same index and leads to assert failure. This is caused by commit `7c0b9c6020` (MDEV-15250). Solution: ======= - After MDEV-15250, InnoDB logs the operation only at the end of transaction commit and applies the log in ha_innobase::commit_inplace_alter_table() and also via dml thread. So there is no need for purge to work on uncommitted index. The assertion would fail in the test innodb.innodb-index-online when the following call is added to the start of the function row_purge_remove_sec_if_poss_leaf(): if (!index->is_committed()) sleep(5);	2023-09-19 19:04:53 +05:30
Marko Mäkelä	8096139b3a	Merge 10.5 into 10.6	2023-09-19 10:47:26 +03:00
Marko Mäkelä	6c05edfdcd	Merge 10.4 into 10.5	2023-09-19 10:20:09 +03:00
Marko Mäkelä	76b688f100	MDEV-30024 InnoDB: tried to purge non-delete-marked of a virtual column prefix row_vers_vc_matches_cluster(): Invoke dtype_get_at_most_n_mbchars() to extract the correct number of bytes corresponding to the number of characters in a virtual column prefix index, just like we do in row_sel_sec_rec_is_for_clust_rec(). The test case would occasionally reproduce the failure when this fix is not present.	2023-09-19 09:31:34 +03:00
Thirunarayanan Balathandayuthapani	85db6df412	MDEV-32151 InnoDB scrubbing doesn't write zero while freeing the page for temporary tablespace - InnoDB fails to mark the page status as FREED during freeing of page for temporary tablespace. This behaviour affects scrubbing and doesn't write all zeroes in file even though pages are freed. mtr_t::free(): Mark the page as freed for temporary tablespace also	2023-09-18 18:26:07 +05:30
Marko Mäkelä	6a470db552	Merge 10.5 into 10.6	2023-09-14 15:25:53 +03:00
Marko Mäkelä	81e60f1a0a	MDEV-32163 Crash recovery fails after DROP TABLE in system tablespace fseg_free_extent(): After fsp_free_extent() succeeded, properly mark the affected pages as freed. We failed to write FREE_PAGE records. This bug was revealed or caused by commit `e938d7c18f` (MDEV-32028).	2023-09-14 15:17:27 +03:00
Marko Mäkelä	0f9acce3f2	Merge 10.5 into 10.6	2023-09-14 09:01:15 +03:00
Marko Mäkelä	cce76df5cc	Fix cmake -DWITH_INNODB_AHI=OFF This fixes up commit `6cc88c3db1` Thanks to Markus Mäkelä for reporting the build failure.	2023-09-14 08:58:41 +03:00
Marko Mäkelä	d20a4da23d	MDEV-32150 InnoDB reports corruption on 32-bit platforms with ibd files sizes > 4GB buf_read_page_low(): Use 64-bit arithmetics when computing the file byte offset. In other calls to fil_space_t::io() the offset was being computed correctly, for example by buf_page_t::physical_offset().	2023-09-12 15:16:31 +03:00
Marko Mäkelä	736901b443	MDEV-30100 fixup: Remove a failing debug assertion trx_purge_truncate_history(): Remove a debug assertion that had originally been added in commit `0de3be8cfd` (MDEV-30671). In trx_t::commit_empty() we do not have any efficient way to rewind rseg.needs_purge to an accurate value that would satisfy this debug assertion. Note: No correctness property should be violated here. At the point where the debug assertion was located, we had already established that purge_sys.sees(rseg.needs_purge) holds, that is, it is safe to remove everything from rseg.	2023-09-12 12:25:51 +03:00
Marko Mäkelä	3c840ae746	MDEV-26782 fixup: Remove dead code trx_undo_reuse_cached(): Assert that this is being invoked on the persistent rollback segment of the transaction, and remove dead code that was handling cached temporary undo log. This was missed in commit `51e62cb3b3` (MDEV-26782).	2023-09-12 12:03:35 +03:00
Thirunarayanan Balathandayuthapani	a03b8cd0a2	MDEV-32145 Disable read-ahead for temporary tablespace - Lifetime of temporary tables is expected to be short, it would seem to make sense to assume that all temporary tablespace pages will remain in the buffer pool. It doesn't make sense to have read-ahead for pages of temporary tablespace	2023-09-11 18:02:53 +05:30
Marko Mäkelä	cdd2fa7fc5	MDEV-32134 InnoDB hang in buf_flush_wait_LRU_batch_end() buf_flush_page_cleaner(): Before finishing a batch, wake up any threads that are waiting for buf_pool.done_flush_LRU. This should fix a hung shutdown that we observed after SET GLOBAL innodb_buffer_pool_size was executed to shrink the InnoDB buffer pool.	2023-09-11 14:54:50 +03:00
Marko Mäkelä	466d9f5ff3	MDEV-32103 InnoDB ALTER TABLE is not crash-safe Starting with commit `4ff5311dec` log_write_up_to(trx->commit_lsn, true) in DDL operations could end up being a no-op, because trx->commit_lsn would be 0. trx_flush_log_if_needed(): Revert an incorrect attempt to ensure that DDL operations are crash-safe. trx_t::commit(std::vector<pfs_os_file_t> &), ha_innobase::rename_table(): Set trx_t::flush_log_later so that trx_t::commit_in_memory() will retain trx_t::commit_lsn for the final durability call. Tested by: Matthias Leich	2023-09-11 14:54:17 +03:00
Marko Mäkelä	4a8291fc5f	MDEV-30531 Corrupt index(es) on busy table when using FOREIGN KEY lock_wait(): Never return the transient error code DB_LOCK_WAIT. In commit `78a04a4c22` (MDEV-29869) some assignments trx->error_state = DB_SUCCESS were removed, and it was possible that the field was left at its initial value DB_LOCK_WAIT. The test case for this is nondeterministic; without this fix, it would only occasionally fail. Reviewed by: Vladislav Lesin	2023-09-11 14:52:05 +03:00
Marko Mäkelä	e039720bf3	MDEV-32096 Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait lock_sys_t::cancel(trx_t*): Remove, and merge to its only caller innobase_kill_query(). innobase_kill_query(): Before reading trx->lock.wait_lock, do acquire lock_sys.wait_mutex, like we did before commit `e71e613353` (MDEV-24671). In this way, we should not miss a recently started lock wait by the killee transaction. lock_rec_lock(): Add a DEBUG_SYNC "lock_rec" for the test case. lock_wait(): Invoke trx_is_interrupted() before entering the wait, in case innobase_kill_query() was invoked some time earlier and some longer-running operation did not check for interrupts. As suggested by Vladislav Lesin, do not overwrite trx->error_state==DB_INTERRUPTED with DB_SUCCESS. This would avoid a call to trx_is_interrupted() when the test is modified to use the DEBUG_SYNC point lock_wait_start instead of lock_rec. Avoid some redundant loads of trx->lock.wait_lock; cache the value in the local variable wait_lock. Deadlock::check_and_resolve(): Take wait_lock as a parameter and return wait_lock (or -1 or nullptr). We only need to reload trx->lock.wait_lock if lock_sys.wait_mutex had been released and reacquired. trx_t::error_state: Correctly document the data member. trx_lock_t::was_chosen_as_deadlock_victim: Clarify that other threads may set the field (or flags in it) while holding lock_sys.wait_mutex. Thanks to Johannes Baumgarten for reporting the problem and testing the fix, as well as to Kristian Nielsen for suggesting the fix. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-09-11 14:51:02 +03:00
Marko Mäkelä	0dd25f28f7	Merge 10.5 into 10.6	2023-09-11 14:46:39 +03:00
Marko Mäkelä	384eb570a6	MDEV-32144 Debug assertion failure w == MAYBE_NOP in mtr_t::memcpy() trx_undo_write_trx_xid(): Silence the debug assertion by passing a template parameter that causes us to not care that the contents of the page did not actually change and no log record would be written. This debug assertion could fail if XA PREPARE was executed multiple times with the same XID.	2023-09-11 11:48:15 +03:00
Marko Mäkelä	f8f7d9de2c	Merge 10.4 into 10.5	2023-09-11 11:29:31 +03:00
Marko Mäkelä	65c99207e0	MDEV-23841: Memory leak in innodb_monitor_validate() innodb_monitor_validate(): Let item_val_str() allocate the memory in THD, so that it will be available to innodb_monitor_update(). In this way, there is no need to allocate another buffer, and no problem if the call to innodb_monitor_update() is skipped due to an invalid value that is passed to another configuration parameter. There are some other callers to st_mysql_sys_var::val_str() that validate configuration parameters that are related to FULLTEXT INDEX, but they will allocate memory by invoking thd_strmake().	2023-09-11 10:27:21 +03:00
Sergei Golubchik	fba4abf3b9	MDEV-32128 wrong table name in innodb's "row too big" errors	2023-09-08 19:15:33 +02:00
Marko Mäkelä	34c283ba1b	MDEV-32132 DROP INDEX followed by CREATE INDEX may corrupt data ibuf_set_bitmap_for_bulk_load(): Port a bug fix that was made as part of commit `165564d3c3` (MDEV-30009) in MariaDB Server 10.5.19.	2023-09-08 11:28:21 +03:00
Nayana Thorat	961b96a5e0	MDEV-29324 s390x patch srw_lock.cc Fix debug mode build failure on s390x. Replaced builtin_ttest by __builtin_tx_nesting_depth() > 0 as a s390x equivalent version of the expression.	2023-09-07 01:45:49 -07:00
Marko Mäkelä	b0a43818b4	Merge 10.5 into 10.6	2023-09-04 10:15:02 +03:00
Marko Mäkelä	59952b2625	Merge 10.4 into 10.5	2023-09-04 09:40:26 +03:00
Thirunarayanan Balathandayuthapani	d1fca0baab	MDEV-32060 Server aborts when table doesn't have referenced index - Server aborts when table doesn't have referenced index. This is caused by `5f09b53bdb` (MDEV-31086). While iterating the foreign key constraints, we fail to consider that InnoDB doesn't have referenced index for it when foreign key check is disabled.	2023-09-01 17:54:07 +05:30
Marko Mäkelä	2325f8f339	Merge 10.5 into 10.6	2023-08-31 13:01:42 +03:00
Marko Mäkelä	2db5f1b298	MDEV-32049 Deadlock due to log_free_check() in trx_purge_truncate_history() The function log_free_check() is not supposed to be invoked while the caller is holding any InnoDB synchronization objects, such as buffer page latches, tablespace latches, index tree latches, or in this case, rseg->mutex (rseg->latch in 10.6 or later). A hang was reported in 10.6 where several threads were waiting for an rseg->latch that had been exclusively acquired in trx_purge_truncate_history(), which invoked log_free_check() inside trx_purge_truncate_rseg_history(). Because the threads that were waiting for the rseg->latch were holding exclusive latches on some index pages, log_free_check() was unable to advance the checkpoint because those index pages could not be written out. trx_purge_truncate_history(): Invoke log_free_check() before acquiring the rseg->mutex and invoking trx_purge_free_segment(). trx_purge_free_segment(): Do not invoke log_free_check() in order to avoid a deadlock.	2023-08-31 12:14:49 +03:00
Marko Mäkelä	9d1466522e	MDEV-32029 Assertion failures in log_sort_flush_list upon crash recovery In commit `0d175968d1` (MDEV-31354) we only waited that no buf_pool.flush_list writes are in progress. The buf_flush_page_cleaner() thread could still initiate page writes from the buf_pool.LRU list while only holding buf_pool.mutex, not buf_pool.flush_list_mutex. This is something that was changed in commit `a55b951e60` (MDEV-26827). log_sort_flush_list(): Wait for the buf_flush_page_cleaner() thread to be completely idle, including LRU flushing. buf_flush_page_cleaner(): Always broadcast buf_pool.done_flush_list when becoming idle, so that log_sort_flush_list() will be woken up. Also, ensure that buf_pool.n_flush_inc() or buf_pool.flush_list_set_active() has been invoked before any page writes are initiated. buf_flush_try_neighbors(): Release buf_pool.mutex here and not in the callers, to avoid code duplication. Make innodb_flush_neighbors=ON obey the innodb_io_capacity limit.	2023-08-30 14:40:13 +03:00
Marko Mäkelä	31ea201ecc	MDEV-30986 Slow full index scan for I/O bound case buf_page_init_for_read(): Test a condition before acquiring a latch, not while holding it. buf_read_ahead_linear(): Do not use a memory transaction, because it could be too large, leading to frequent retries. Release the hash_lock as early as possible.	2023-08-30 13:20:27 +03:00
Thirunarayanan Balathandayuthapani	e938d7c18f	MDEV-32028 InnoDB scrubbing doesn't write zero while freeing the extent Problem: ======== InnoDB fails to mark the page status as FREED during freeing of an extent of a segment. This behaviour affects scrubbing and doesn't write all zeroes in file even though pages are freed. Solution: ======== InnoDB should mark the page status as FREED before reinitializing the extent descriptor entry.	2023-08-28 20:27:19 +05:30
Dmitry Shulga	1fde785315	MDEV-31890: Compilation failing on MacOS (unknown warning option -Wno-unused-but-set-variable) For the clang compiler, the flag -Wno-unused-but-set-variable was set based on the compiler version. This approach could result in false positive detection of the compiler option, since only the first three groups of digits in the compiler version are taken into account, which could lead to inaccuracy in determining the supported compiler features. The correct way to detect options supported by a compiler is to use the macro MY_CHECK_CXX_COMPILER_FLAG and to check the result variable with the prefix have_CXX__. So, to check whether the compiler supports the option -Wno-unused-but-set-variable, the macro MY_CHECK_CXX_COMPILER_FLAG(-Wno-unused-but-set-variable) should be called and the result variable have_CXX__Wno_unused_but_set_variable tested for its assigned value.	2023-08-28 16:47:00 +07:00
Thirunarayanan Balathandayuthapani	c438284863	MDEV-31835 Remove unnecessary extra HA_EXTRA_IGNORE_INSERT call - HA_EXTRA_IGNORE_INSERT call is being called for every inserted row, and on partitioned tables on every row * every partition. This leads to slowness during load..data operation - Under bulk operation, multiple insert statement error handling will end up emptying the table. This behaviour was introduced by the commit `8ea923f55b` (MDEV-24818). This makes the HA_EXTRA_IGNORE_INSERT call redundant. We can use the same behavior for insert..ignore statement as well. - Removed the extra call HA_EXTRA_IGNORE_INSERT as the solution to improve the performance of load command.	2023-08-25 17:22:17 +05:30
Marko Mäkelä	08a549c33d	Clean up buf_LRU_remove_hashed() buf_LRU_block_remove_hashed(): Test for "not ROW_FORMAT=COMPRESSED" first, because in that case we can assume that an uncompressed page exists. This removes a condition from the likely code branch.	2023-08-25 13:44:59 +03:00
Marko Mäkelä	f7780a8eb8	MDEV-30100: Assertion purge_sys.tail.trx_no <= purge_sys.rseg->last_trx_no() trx_t::commit_empty(): A special case of transaction "commit" when the transaction was actually rolled back or the persistent undo log is empty. In this case, we need to change the undo log header state to TRX_UNDO_CACHED and move the undo log from rseg->undo_list to rseg->undo_cached for fast reuse. Furthermore, unless this is the only undo log record in the page, we will remove the record and rewind TRX_UNDO_PAGE_START, TRX_UNDO_PAGE_FREE, TRX_UNDO_LAST_LOG. We must also ensure that the system-wide transaction identifier will be persisted up to this->id, so that there will not be warnings or errors due to a PAGE_MAX_TRX_ID being too large. We might have modified secondary index pages before being rolled back, and any changes of PAGE_MAX_TRX_ID are never rolled back. Even though it is not going to be written persistently anywhere, we will invoke trx_sys.assign_new_trx_no(this), so that in the test innodb.instant_alter everything will be purged as expected. trx_t::write_serialisation_history(): Renamed from trx_write_serialisation_history(). If there is no undo log, invoke commit_empty(). trx_purge_add_undo_to_history(): Simplify an assertion and remove a comment. This function will not be invoked on an empty undo log anymore. trx_undo_header_create(): Add a debug assertion. trx_undo_mem_create_at_db_start(): Remove a duplicated assignment. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-08-25 13:41:54 +03:00
Marko Mäkelä	4ff5311dec	MDEV-30100 preparation: Simplify InnoDB transaction commit further trx_commit_complete_for_mysql(): Remove some conditions. We will rely on trx_t::commit_lsn. trx_t::must_flush_log_later: Remove. trx_commit_complete_for_mysql() can simply check for trx_t::flush_log_later. trx_t::commit_in_memory(): Set commit_lsn=0 if the log was written. trx_flush_log_if_needed_low(): Renamed to trx_flush_log_if_needed(). Assert that innodb_flush_log_at_trx_commit!=0 was checked by the caller and that the transaction is not in XA PREPARE state. Unconditionally flush the log for data dictionary transactions, to ensure the correct processing of ddl_recovery.log. trx_write_serialisation_history(): Move some code from trx_purge_add_undo_to_history(). trx_prepare(): Invoke log_write_up_to() directly if needed. innobase_commit_ordered_2(): Simplify some conditions. A read-write transaction will always carry nonzero trx_t::id. Let us unconditionally reset mysql_log_file_name, flush_log_later after trx_t::commit() was invoked.	2023-08-25 13:23:21 +03:00
Marko Mäkelä	f4bbea90f1	MDEV-30100 preparation: Simplify InnoDB transaction commit trx_commit_cleanup(): Clean up any temporary undo log. Replaces trx_undo_commit_cleanup() and trx_undo_seg_free(). trx_write_serialisation_history(): Commit the mini-transaction. Do not touch temporary undo logs. Assume that a persistent rollback segment has been assigned. trx_serialise(): Merged into trx_write_serialisation_history(). trx_t::commit_low(): Correct some comments and assertions. trx_t::commit_persist(): Only invoke commit_low() on a mini-transaction if the persistent state needs to change.	2023-08-25 13:16:54 +03:00
Marko Mäkelä	eda75cadea	Merge 10.5 into 10.6	2023-08-24 10:16:24 +03:00
Marko Mäkelä	aeb8eae5c8	Merge 10.4 into 10.5	2023-08-24 10:12:13 +03:00
Marko Mäkelä	02878f128e	MDEV-31813 SET GLOBAL innodb_max_purge_lag_wait hangs if innodb_read_only innodb_max_purge_lag_wait_update(): Return immediately if we are in high_level_read_only mode. srv_wake_purge_thread_if_not_active(): Relax a debug assertion. If srv_read_only_mode holds, purge_sys.enabled() will not hold and this function will do nothing. trx_t::commit_in_memory(): Remove a redundant condition before invoking srv_wake_purge_thread_if_not_active().	2023-08-24 10:08:51 +03:00
Marko Mäkelä	a60462d93e	Remove bogus references to replaced Google contributions In commit `03ca6495df` and commit `ff5d306e29` we forgot to remove some Google copyright notices related to a contribution of using atomic memory access in the old InnoDB mutex_t and rw_lock_t implementation. The copyright notices had been mostly added in commit `c6232c06fa` due to commit `a1bb700fd2`. The following Google contributions remain: * some logic related to the parameter innodb_io_capacity * innodb_encrypt_tables, added in MariaDB Server 10.1	2023-08-21 15:51:16 +03:00
Marko Mäkelä	6cc88c3db1	Clean up buf0buf.inl Let us move some #include directives from buf0buf.inl to the compilation units where they are really used.	2023-08-21 15:51:10 +03:00
Marko Mäkelä	448c2077fb	Merge 10.5 into 10.6	2023-08-21 15:50:31 +03:00
Marko Mäkelä	be5fd3ec35	Remove a stale comment buf_LRU_block_remove_hashed(): Remove a comment that had been added in mysql/mysql-server@aad1c7d0dd and apparently referring to buf_LRU_invalidate_tablespace(), which was later replaced with buf_LRU_flush_or_remove_pages() and ultimately with buf_flush_remove_pages() and buf_flush_list_space(). All that code is covered by buf_pool.mutex. The note about releasing the hash_lock for the buf_pool.page_hash slice would actually apply to the last reference to hash_lock in buf_LRU_free_page(), for the case zip=false (retaining a ROW_FORMAT=COMPRESSED page while discarding the uncompressed one).	2023-08-21 13:28:12 +03:00
Marko Mäkelä	5a8a8fc953	MDEV-31928 Assertion xid ... < 128 failed in trx_undo_write_xid() trx_undo_write_xid(): Correct an off-by-one error in a debug assertion.	2023-08-17 10:31:55 +03:00
Marko Mäkelä	518fe51988	MDEV-31254 InnoDB: Trying to read doublewrite buffer page buf_read_page_low(): Remove an error message that could be triggered by buf_read_ahead_linear() or buf_read_ahead_random(). This is a backport of commit `c9eff1a144` from MariaDB Server 10.5.	2023-08-17 10:31:44 +03:00
Marko Mäkelä	44df6f35aa	MDEV-31875 ROW_FORMAT=COMPRESSED table: InnoDB: ... Only 0 bytes read buf_read_ahead_random(), buf_read_ahead_linear(): Avoid read-ahead of the last page(s) of ROW_FORMAT=COMPRESSED tablespaces that use a page size of 1024 or 2048 bytes. We invoke os_file_set_size() on integer multiples of 4096 bytes in order to be compatible with the requirements of innodb_flush_method=O_DIRECT regardless of the physical block size of the underlying storage. This change must be null-merged to MariaDB Server 10.5 and later. There, out-of-bounds read-ahead should be handled gracefully by simply discarding the buffer page that had been allocated. Tested by: Matthias Leich	2023-08-17 10:31:28 +03:00
Kristian Nielsen	7c9837ce74	Merge 10.4 into 10.5 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 18:02:18 +02:00
Kristian Nielsen	805e0668c9	MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication Remove the exception that InnoDB does not report auto-increment lock waits to the parallel replication. There was an assumption that these waits could not cause conflicts with in-order parallel replication and thus need not be reported. However, this assumption is wrong and it is possible to get conflicts that lead to hangs for the duration of --innodb-lock-wait-timeout. This can be seen with three transactions: 1. T1 is waiting for T3 on an autoinc lock 2. T2 is waiting for T1 to commit 3. T3 is waiting on a normal row lock held by T2 Here, T3 needs to be deadlock killed on the wait by T1. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:40:02 +02:00
Kristian Nielsen	18acbaf416	MDEV-31655: Parallel replication deadlock victim preference code erroneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:39:49 +02:00
Kristian Nielsen	900c4d6920	MDEV-31655: Parallel replication deadlock victim preference code erroneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Note: This should be null-merged to 10.6, as a different fix is needed there due to InnoDB locking code changes. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:35:30 +02:00
Kristian Nielsen	920789e9d4	MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication Remove the exception that InnoDB does not report auto-increment lock waits to the parallel replication. There was an assumption that these waits could not cause conflicts with in-order parallel replication and thus need not be reported. However, this assumption is wrong and it is possible to get conflicts that lead to hangs for the duration of --innodb-lock-wait-timeout. This can be seen with three transactions: 1. T1 is waiting for T3 on an autoinc lock 2. T2 is waiting for T1 to commit 3. T3 is waiting on a normal row lock held by T2 Here, T3 needs to be deadlock killed on the wait by T1. Note: This should be null-merged to 10.6, as a different fix is needed there due to InnoDB lock code changes. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:34:09 +02:00
Marko Mäkelä	e9723c2cbb	MDEV-31473 Wrong information about innodb_checksum_algorithm in information_schema.SYSTEM_VARIABLES MYSQL_SYSVAR_ENUM(checksum_algorithm): Correct the documentation string. Fixes up commit `7a4fbb55b0` (MDEV-25105).	2023-08-14 13:36:17 +03:00
Oleksandr Byelkin	d28d636f57	Merge branch '10.5' into 10.6	2023-08-08 13:20:58 +02:00
Oleksandr Byelkin	8852afe317	Merge branch '10.4' into 10.5	2023-08-08 11:24:42 +02:00
Thirunarayanan Balathandayuthapani	0ede90dd31	MDEV-31869 Server aborts when table does drop column - InnoDB aborts when table is dropping the column. This is caused by `5f09b53bdb` (MDEV-31086). While iterating the altered table fields, we fail to consider the dropped columns.	2023-08-08 13:24:23 +05:30
Oleksandr Byelkin	5ea5291d97	Merge branch '10.5' into 10.6	2023-08-04 07:52:54 +02:00
Sergei Golubchik	ab1191c039	cleanup: key->key_create_info.check_for_duplicate_indexes -> key->old mark old keys in the ALTER TABLE with the `old` flag, not with the `key_create_info.check_for_duplicate_indexes`. This allows old foreign keys to be marked too.	2023-08-01 22:43:16 +02:00
Oleksandr Byelkin	6bf8483cac	Merge branch '10.5' into 10.6	2023-08-01 15:08:52 +02:00
Marko Mäkelä	72928e640e	MDEV-27593: Crashing on I/O error is unhelpful buf_page_t::write_complete(), buf_page_write_complete(), IORequest::write_complete(): Add a parameter for passing an error code. If an error occurred, we will release the io-fix, buffer-fix and page latch but not reset the oldest_modification field. The block would remain in buf_pool.LRU and possibly buf_pool.flush_list, to be written again later, by buf_flush_page_cleaner(). If all page writes start consistently failing, all write threads should eventually hang in log_free_check() because the log checkpoint cannot be advanced to make room in the circular write-ahead-log ib_logfile0. IORequest::read_complete(): Add a parameter for passing an error code. If a read operation fails, we report the error and discard the page, just like we would do if the page checksum was not validated or the page could not be decrypted. This only affects asynchronous reads, due to linear or random read-ahead or crash recovery. When buf_page_get_low() invokes buf_read_page(), that will be a synchronous read, not involving this code. This was tested by randomly injecting errors in write_io_callback() and read_io_callback(), like this: if (!ut_rnd_interval(100)) cb->m_err= 42;	2023-08-01 14:39:29 +03:00
Marko Mäkelä	96cfdb8710	MDEV-31816 fixup: Relax a debug assertion buf_LRU_free_page(): The block may also be in the IBUF_EXIST state when executing the test innodb.innodb_bulk_create_index_debug.	2023-08-01 13:22:16 +03:00
Oleksandr Byelkin	65405308a1	Merge branch '10.4' into 10.5	2023-08-01 11:52:13 +02:00
Marko Mäkelä	d794d3484b	MDEV-31816 buf_LRU_free_page() does not preserve ROW_FORMAT=COMPRESSED block state buf_LRU_free_page(): When we are discarding the uncompressed copy of a ROW_FORMAT=COMPRESSED page, buf_page_t::can_relocate() must have ensured that the block descriptor state is one of FREED, UNFIXED, REINIT. Do not overwrite the state with UNFIXED. We do not want to write back pages that were actually freed, and we want to avoid doublewrite for pages that were (re)initialized by log records written since the latest checkpoint. Last but not least, we do not want crashes like those that commit `dc1bd1802a` (MDEV-31386) was supposed to fix. The test innodb_zip.wl5522_zip should typically cover all 3 states. This bug is a regression due to commit `aaef2e1d8c` (MDEV-27058).	2023-08-01 09:58:15 +03:00
Aleksey Midenkov	69b118a346	Revert "MDEV-30528 Assertion in dtype_get_at_most_n_mbchars" This reverts commit `add0c01bae` Duplicates must be avoided in FTS_DOC_ID_INDEX	2023-07-31 16:57:18 +03:00
Marko Mäkelä	0d175968d1	MDEV-31354 SIGSEGV in log_sort_flush_list() in InnoDB crash recovery log_sort_flush_list(): Wait for any pending page writes to cease before sorting the buf_pool.flush_list. Starting with commit `22b62edaed` (MDEV-25113), it is possible that some buf_page_t::oldest_modification_ that we will be comparing in std::sort() will be updated from some value >2 to 1 while we are holding buf_pool.flush_list_mutex. To catch this type of trouble better in the future, we will clean garbage (pages that have been written out) from buf_pool.flush_list while constructing the array for sorting, and check with debug assertions that all blocks that we are copying from the array to the list will be dirty (requiring a writeback) while we sort and copy the array back to buf_pool.flush_list. This failure was observed by chance exactly once when running the test innodb.recovery_memory. It was never reproduced in the same form afterwards. Unrelated to this change, the test does occasionally reproduce a failure to start up InnoDB due to a corrupted page being read in recovery. The ticket MDEV-31791 was filed for that. Tested by: Matthias Leich	2023-07-28 12:36:45 +03:00
Oleksandr Byelkin	7564be1352	Merge branch '10.4' into 10.5	2023-07-26 16:02:57 +02:00
Marko Mäkelä	b102872ad5	MDEV-31767 InnoDB tables are being flagged as corrupted on an I/O bound server The main problem is that ever since commit `aaef2e1d8c` removed the function buf_wait_for_read(), it is not safe to invoke buf_page_get_low() with RW_NO_LATCH, that is, only buffer-fixing the page. If a page read (or decryption or decompression) is in progress, there would be a race condition when executing consistency checks, and a page would wrongly be flagged as corrupted. Furthermore, if the page is actually corrupted and the initial access to it was with RW_NO_LATCH (only buffer-fixing), the page read handler would likely end up in an infinite loop in buf_pool_t::corrupted_evict(). It is not safe to invoke mtr_t::upgrade_buffer_fix() on a block on which a page latch was not initially acquired in buf_page_get_low(). btr_block_reget(): Remove the constant parameter rw_latch=RW_X_LATCH. btr_block_get(): Assert that RW_NO_LATCH is not being used, and change the parameter type of rw_latch. btr_pcur_move_to_next_page(), innobase_table_is_empty(): Adjust for the parameter type change of btr_block_get(). btr_root_block_get(): If mode==RW_NO_LATCH, do not check the integrity of the page, because it is not safe to do so. btr_page_alloc_low(), btr_page_free(): If the root page latch is not previously held by the mini-transaction, invoke btr_root_block_get() again with the proper latching mode. btr_latch_prev(): Helper function to safely acquire a latch on a preceding sibling page while holding a latch on a B-tree page. To avoid deadlocks, we must not wait for the latch while holding a latch on the current page, because another thread may be waiting for our page latch when moving to the next page from our preceding sibling page. If s_lock_try() or x_lock_try() on the preceding page fails, we must release the current page latch, and wait for the latch on the preceding page as well as the current page, in that order. 
Page splits or merges will be prevented by the parent page latch that we are holding. btr_cur_t::search_leaf(): Make use of btr_latch_prev(). btr_cur_t::open_leaf(): Make use of btr_latch_prev(). Do not invoke mtr_t::upgrade_buffer_fix() (when latch_mode == BTR_MODIFY_TREE), because we will already have acquired all page latches upfront. btr_cur_t::pessimistic_search_leaf(): Do acquire an exclusive index latch before accessing the page. Make use of btr_latch_prev().	2023-07-25 11:40:58 +03:00
Marko Mäkelä	9bb5b25325	MDEV-31120 Duplicate entry allowed into a UNIQUE column row_ins_sec_index_entry_low(): Correct a condition that was inadvertently inverted in commit `89ec4b53ac` (MDEV-29603). We are not supposed to buffer INSERT operations into unique indexes, because duplicate key values would not be checked for. It is only allowed when using unique_checks=0, and in that case the user is supposed to guarantee that there are no duplicates.	2023-07-24 12:29:43 +03:00
Aleksey Midenkov	14cc7e7d6e	MDEV-25644 UPDATE not working properly on transaction precise system versioned table First UPDATE under START TRANSACTION does nothing (nstate= nstate), but anyway generates history. Since update vector is empty we get into (!uvect->n_fields) branch which only adds history row, but does not do update. After that we get current row with wrong (old) row_start value and because of that second UPDATE tries to insert history row again because it sees trx->id != row_start which is the guard to avoid inserting multiple trx_id-based history rows under same transaction (because we have same trx_id and we get duplicate error and this bug demonstrates that). But this try anyway fails because PK is based on row_end which is constant under same transaction, so PK didn't change. The fix moves vers_make_update() to an earlier stage of calc_row_difference(). Therefore it prepares update vector before (!uvect->n_fields) check and never gets into that branch, hence no need to handle versioning inside that condition anymore. Now trx->id and row_start are equal after first UPDATE and we don't try to insert second history row. == Cleanups and improvements == ha_innobase::update_row(): vers_set_fields and vers_ins_row are cleaned up into direct condition check. SQLCOM_ALTER_TABLE check now is not used as this is dead code, assertion is done instead. upd_node->is_delete is set in calc_row_difference() just to keep versioning code as much in one place as possible. vers_make_delete() is still located in row_update_for_mysql() as this is required for ha_innobase::delete_row() as well. row_ins_duplicate_error_in_clust(): Restrict DB_FOREIGN_DUPLICATE_KEY to the better conditions. VERSIONED_DELETE is used specifically to help lower stack to understand what caused current insert. Related to MDEV-29813.	2023-07-20 18:22:31 +03:00
Aleksey Midenkov	add0c01bae	MDEV-30528 Assertion in dtype_get_at_most_n_mbchars 1. Exclude merging history rows into fts index. The check !history_fts && (index->type & DICT_FTS) was just incorrect attempt to avoid history in fts index. 2. Don't check for duplicates for history rows.	2023-07-20 18:22:30 +03:00
Oleksandr Byelkin	f52954ef42	Merge commit '10.4' into 10.5	2023-07-20 11:54:52 +02:00
Vlad Lesin	090a84366a	MDEV-29311 Server Status Innodb_row_lock_time% is reported in seconds Before MDEV-24671, the wait time was derived from my_interval_timer() / 1000 (nanoseconds converted to microseconds, and not microseconds to milliseconds like I must have assumed). The lock_sys.wait_time and lock_sys.wait_time_max are already in milliseconds; we should not divide them by 1000. In MDEV-24738 the millisecond counts lock_sys.wait_time and lock_sys.wait_time_max were changed to a 32-bit type. That would overflow in 49.7 days. Keep using a 64-bit type for those millisecond counters. Reviewed by: Marko Mäkelä	2023-07-10 12:42:46 +03:00
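The 49.7-day figure quoted in MDEV-29311 follows from simple arithmetic on the counter width. The sketch below is only that arithmetic, assuming an unsigned counter that is incremented once per millisecond; `days_until_wrap` and `check_wrap_estimates` are hypothetical helpers, not InnoDB functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: days until an unsigned counter of type T that counts
// milliseconds wraps around to zero.
template <typename T>
double days_until_wrap()
{
  const double ms_per_day = 24.0 * 60.0 * 60.0 * 1000.0;
  // The counter holds max() + 1 distinct values before wrapping.
  const double values = static_cast<double>(std::numeric_limits<T>::max()) + 1.0;
  return values / ms_per_day;
}

// Usage example: 2^32 ms is a little under 50 days, while a 64-bit
// millisecond counter lasts for billions of days.
void check_wrap_estimates()
{
  const double d32 = days_until_wrap<std::uint32_t>();
  assert(d32 > 49.7 && d32 < 49.8);
  assert(days_until_wrap<std::uint64_t>() > 1e9);
}
```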
Monty	99bd226059	MDEV-31558 Add InnoDB engine information to the slow query log The new statistics is enabled by adding the "engine", "innodb" or "full" option to --log-slow-verbosity Example output: # Pages_accessed: 184 Pages_read: 95 Pages_updated: 0 Old_rows_read: 1 # Pages_read_time: 17.0204 Engine_time: 248.1297 Page_read_time is time doing physical reads inside a storage engine. (Writes cannot be tracked as these are usually done in the background). Engine_time is the time spent inside the storage engine for the full duration of the read/write/update calls. It uses the same code as 'analyze statement' for calculating the time spent. The engine statistics is done with a generic interface that should be easy for any engine to use. It can also easily be extended to provide even more statistics. Currently only InnoDB has counters for Pages_% and Undo_% status. Engine_time works for all engines. Implementation details: class ha_handler_stats holds all engine stats. This class is included in handler and THD classes. While a query is running, all statistics is updated in the handler. In close_thread_tables() the statistics is added to the THD. handler::handler_stats is a pointer to where statistics should be collected. This is set to point to handler::active_handler_stats if stats are requested. If not, it is set to 0. handler_stats has also an element, 'active' that is 1 if stats are requested. This is to allow engines to avoid doing any 'if's while updating the statistics. Cloned or partition tables have the pointer set to the base table if status are requested. There is a small performance impact when using --log-slow-verbosity=engine: - All engine calls in 'select' will be timed. - IO calls for InnoDB reads will be timed. - Incrementation of counters are done on local variables and accesses are inline, so these should have very little impact. - Statistics has to be reset for each statement for the THD and each used handler. 
This is only 40 bytes, which should be negligible. - For partition tables we have to loop over all partitions to update the handler_status as part of table_init(). Can be optimized in the future to only do this if log-slow-verbosity changes. For this to work we have to update handler_status for all opened partitions and also for all partitions opened in the future. Other things: - Added options 'engine' and 'full' to log-slow-verbosity. - Some of the new files in the test suite come from Percona server, which has similar status information. - buf_page_optimistic_get(): Do not increment any counter, since we are only validating a pointer, not performing any buf_pool.page_hash lookup. - Added THD argument to save_explain_data_intern(). - Switched arguments for save_explain_.*_data() to have always THD first (generates better code as other functions also have THD first).	2023-07-07 12:53:18 +03:00
Vlad Lesin	1bfd3cc457	MDEV-10962 Deadlock with 3 concurrent DELETEs by unique key PROBLEM: A deadlock was possible when a transaction tried to "upgrade" an already held Record Lock to Next Key Lock. SOLUTION: This patch is based on observations that: (1) a Next Key Lock is equivalent to Record Lock combined with Gap Lock (2) a GAP Lock never has to wait for any other lock In case we request a Next Key Lock, we check if we already own a Record Lock of equal or stronger mode, and if so, then we change the requested lock type to GAP Lock, which we either already have, or can be granted immediately, as GAP locks don't conflict with any other lock types. (We don't consider Insert Intention Locks a Gap Lock in above statements). The reason why we don't upgrade Record Lock to Next Key Lock is the following. Imagine a transaction which does something like this: for each row { request lock in LOCK_X\|LOCK_REC_NOT_GAP mode request lock in LOCK_S mode } If we upgraded lock from Record Lock to Next Key lock, there would be created only two lock_t structs for each page, one for LOCK_X\|LOCK_REC_NOT_GAP mode and one for LOCK_S mode, and then used their bitmaps to mark all records from the same page. The situation would look like this: request lock in LOCK_X\|LOCK_REC_NOT_GAP mode on row 1: // -> creates new lock_t for LOCK_X\|LOCK_REC_NOT_GAP mode and sets bit for // 1 request lock in LOCK_S mode on row 1: // -> notices that we already have LOCK_X\|LOCK_REC_NOT_GAP on the row 1, // so it upgrades it to X request lock in LOCK_X\|LOCK_REC_NOT_GAP mode on row 2: // -> creates a new lock_t for LOCK_X\|LOCK_REC_NOT_GAP mode (because we // don't have any after we've upgraded!) and sets bit for 2 request lock in LOCK_S mode on row 2: // -> notices that we already have LOCK_X\|LOCK_REC_NOT_GAP on the row 2, // so it upgrades it to X ...etc...etc.. Each iteration of the loop creates a new lock_t struct, and in the end we have a lot (one for each record!) 
of LOCK_X locks, each with single bit set in the bitmap. Soon we run out of space for lock_t structs. If we create LOCK_GAP instead of lock upgrading, the above scenario works like the following: // -> creates new lock_t for LOCK_X\|LOCK_REC_NOT_GAP mode and sets bit for // 1 request lock in LOCK_S mode on row 1: // -> notices that we already have LOCK_X\|LOCK_REC_NOT_GAP on the row 1, // so it creates LOCK_S\|LOCK_GAP only and sets bit for 1 request lock in LOCK_X\|LOCK_REC_NOT_GAP mode on row 2: // -> reuses the lock_t for LOCK_X\|LOCK_REC_NOT_GAP by setting bit for 2 request lock in LOCK_S mode on row 2: // -> notices that we already have LOCK_X\|LOCK_REC_NOT_GAP on the row 2, // so it reuses LOCK_S\|LOCK_GAP setting bit for 2 In the end we have just two locks per page, one for each mode: LOCK_X\|LOCK_REC_NOT_GAP and LOCK_S\|LOCK_GAP. Another benefit of this solution is that it avoids not-entirely const-correct, (and otherwise looking risky) "upgrading". The fix was ported from mysql/mysql-server@bfba840dfa mysql/mysql-server@75cefdb1f7 Reviewed by: Marko Mäkelä	2023-07-06 15:06:10 +03:00
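The lock_t growth argument in MDEV-10962 can be made concrete with a toy simulation. This is a sketch under loud assumptions: `Mode`, `Lock`, `set_bit` and the two counting functions are hypothetical, all rows are assumed to be on one page, and a `std::set<int>` stands in for the per-page record bitmap. It does not reproduce InnoDB's real lock_t handling; it only counts how many lock structs each strategy leaves behind for the loop described in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-ins for the lock modes named in the commit message.
enum class Mode { X_REC_NOT_GAP, X_NEXT_KEY, S_GAP };

struct Lock {
  Mode mode;
  std::set<int> rows;  // stands in for the record bitmap of one page
};

// Set the bit for `row` in an existing lock of `mode`, or create a new lock.
void set_bit(std::vector<Lock>& locks, Mode mode, int row)
{
  for (Lock& lock : locks) {
    if (lock.mode == mode) {
      lock.rows.insert(row);
      return;
    }
  }
  locks.push_back(Lock{mode, {row}});
}

// Strategy 1: the LOCK_S request upgrades the held record lock in place.
std::size_t count_with_upgrade(int n_rows)
{
  assert(n_rows >= 0);
  std::vector<Lock> locks;
  for (int row = 1; row <= n_rows; row++) {
    set_bit(locks, Mode::X_REC_NOT_GAP, row);
    for (Lock& lock : locks)
      if (lock.mode == Mode::X_REC_NOT_GAP && lock.rows.count(row))
        lock.mode = Mode::X_NEXT_KEY;  // no X_REC_NOT_GAP lock is left to reuse
  }
  return locks.size();
}

// Strategy 2: the LOCK_S request only adds a gap lock next to the record lock.
std::size_t count_with_gap_lock(int n_rows)
{
  assert(n_rows >= 0);
  std::vector<Lock> locks;
  for (int row = 1; row <= n_rows; row++) {
    set_bit(locks, Mode::X_REC_NOT_GAP, row);
    set_bit(locks, Mode::S_GAP, row);
  }
  return locks.size();
}
```

With 100 rows on the page, `count_with_upgrade(100)` yields one lock struct per row, while `count_with_gap_lock(100)` stays at two, one for each mode.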
Marko Mäkelä	2855bc53bc	Merge 10.5 into 10.6	2023-07-05 16:40:22 +03:00
Marko Mäkelä	bd7908e6ac	MDEV-31568 InnoDB protection against dual processes accessing data insufficient fil_node_open_file_low(): Always acquire an advisory lock on the system tablespace. Originally, we already did this in SysTablespace::open_file(), but SysTablespace::open_or_create() would release those locks when it is closing the file handles. This is a 10.5+ specific follow up to commit `0ee1082bd2` (MDEV-28495). Thanks to Daniel Black for verifying this bug.	2023-07-05 15:15:04 +03:00
Marko Mäkelä	5b62644e68	MDEV-31621 Remove ibuf_read_merge_pages() call from ibuf_insert_low() When InnoDB attempts to buffer a change operation of a secondary index leaf page (to insert, delete-mark or remove a record) and the change buffer is too large, InnoDB used to trigger a change buffer merge that could affect any tables. This could lead to huge variance in system throughput and potentially unpredictable crashes, in case the change buffer was corrupted and a crash occurred while attempting to merge changes to a table that is not being accessed by the current SQL statement. ibuf_insert_low(): Simply return DB_STRONG_FAIL when the maximum size of the change buffer is exceeded. ibuf_contract_after_insert(): Remove. ibuf_get_merge_page_nos_func(): Remove a constant parameter. The function ibuf_contract() will be our only caller, during shutdown with innodb_fast_shutdown=0.	2023-07-05 08:48:37 +03:00
Marko Mäkelä	cb364a78d6	MDEV-31619 dict_stats_persistent_storage_check() may show garbage during --bootstrap dict_stats_persistent_storage_check(): Do not output errmsg if opt_bootstrap holds, because the message buffer would likely be uninitialized.	2023-07-04 15:24:57 +03:00
Marko Mäkelä	f7b8a2c953	MDEV-31607 ER_DUP_KEY in mysql.innodb_table_stats upon RENAME on sequence ha_innobase::delete_table(): Also on DROP SEQUENCE, do try to drop any persistent statistics. They should really not be created for SEQUENCE objects (which internally are 1-row no-rollback tables), but that is how it happened to always work.	2023-07-03 16:47:58 +03:00
Marko Mäkelä	b8088487e4	MDEV-19216 Assertion ...SYS_FOREIGN failed in btr_node_ptr_max_size btr_node_ptr_max_size(): Handle BINARY(0) and VARBINARY(0) as special cases, similar to CHAR(0) and VARCHAR(0).	2023-07-03 16:09:18 +03:00
Marko Mäkelä	dc1bd1802a	MDEV-31386 InnoDB: Failing assertion: page_type == i_s_page_type[page_type].type_value i_s_innodb_buffer_page_get_info(): Correct a condition. After crash recovery, there may be some buffer pool pages in FREED state, containing garbage (invalid data page contents). Let us ignore such pages in the INFORMATION_SCHEMA output. The test innodb.innodb_defragment_fill_factor will be removed, because the queries that it is invoking on information_schema.innodb_buffer_page would start to fail. The defragmentation feature was removed in commit `7ca89af6f8` in MariaDB Server 11.1. Tested by: Matthias Leich	2023-07-03 14:39:29 +03:00
Marko Mäkelä	3d90143859	MDEV-31559 btr_search_hash_table_validate() does not check if CHECK TABLE is killed btr_search_hash_table_validate(), btr_search_validate(): Add the parameter THD for checking if the statement has been killed. Any non-QUICK CHECK TABLE will validate the entire adaptive hash index for all InnoDB tables, which may be extremely slow when running multiple concurrent CHECK TABLE.	2023-06-30 17:07:21 +03:00
Marko Mäkelä	33877cfeae	Fix WITH_UBSAN GCC -Wconversion	2023-06-28 17:07:00 +03:00
Vlad Lesin	687fd6bef5	MDEV-30648 btr_estimate_n_rows_in_range() accesses unfixed, unlatched page The issue is caused by MDEV-30400 fix. There are two cursors in btr_estimate_n_rows_in_range() - p1 and p2, but both share the same mtr. Each cursor contains an mtr savepoint for the previously fetched block, to release it when the current block is fetched. Before MDEV-30400 the block was released with mtr_t::release_block_at_savepoint(), it just unfixed a block and released its page latch. In MDEV-30400 it was replaced with mtr_t::rollback_to_savepoint(), which does the same as the former mtr_t::release_block_at_savepoint(ulint begin, ulint end) but also erases the corresponding slots from mtr memo, which invalidates any stored mtr's memo savepoints, greater or equal to "begin". The idea of the fix is to get rid of savepoints altogether in btr_estimate_n_rows_in_range() and btr_estimate_n_rows_in_range_on_level(). As mtr_t::rollback_to_savepoint() erases elements from mtr_t::m_memo, we know what element of mtr_t::m_memo can be deleted in a given case, so there is no need to store savepoints. See also the following slides for details: https://docs.google.com/presentation/d/1RFYBo7EUhM22ab3GOYctv3j_3yC0vHtBY9auObZec8U Reviewed by: Marko Mäkelä	2023-06-28 11:00:02 +03:00
Thirunarayanan Balathandayuthapani	5f09b53bdb	MDEV-31086 MODIFY COLUMN can break FK constraints, and lead to unrestorable dumps - When foreign_key_checks is disabled, allowing to modify the column which is part of foreign key constraint can lead to refusal of TRUNCATE TABLE, OPTIMIZE TABLE later. So it makes sense to block the column modify operation when foreign key is involved irrespective of the foreign_key_checks variable. Correct way to modify the charset of the column when fk is involved: SET foreign_key_checks=OFF; ALTER TABLE child DROP FOREIGN KEY fk, MODIFY m VARCHAR(200) CHARSET utf8mb4; ALTER TABLE parent MODIFY m VARCHAR(200) CHARSET utf8mb4; ALTER TABLE child ADD CONSTRAINT FOREIGN KEY (m) REFERENCES PARENT(m); SET foreign_key_checks=ON; fk_check_column_changes(): Remove the FOREIGN_KEY_CHECKS while checking the column change for foreign key constraint. This is the partial revert of commit `5f1f2fc0e4` and it changes the behaviour of copy alter algorithm ha_innobase::prepare_inplace_alter_table(): Find the modified column and check whether it is part of existing and newly added foreign key constraint.	2023-06-27 16:58:22 +05:30
Marko Mäkelä	694ce0d08e	Merge 10.5 into 10.6	2023-06-27 13:03:32 +03:00
Marko Mäkelä	84dbd0253d	MDEV-31487: Recovery or backup failure after innodb_undo_log_truncate=ON recv_sys_t::parse(): For undo tablespace truncation mini-transactions, remember the start_lsn instead of the end LSN. This is what we expect after commit `461402a564` (MDEV-30479).	2023-06-27 09:12:38 +03:00
Marko Mäkelä	493083833b	Merge 10.5 into 10.6	2023-06-26 17:11:38 +03:00
Thirunarayanan Balathandayuthapani	bd076d4dff	MDEV-31442 page_cleaner thread aborts while releasing the tablespace - InnoDB shouldn't acquire the tablespace when it is being stopped or closed	2023-06-16 14:58:48 +05:30
Thirunarayanan Balathandayuthapani	841e905f20	MDEV-31442 page_cleaner thread aborts while releasing the tablespace After further I/O on a tablespace has been stopped (for example due to DROP TABLE or an operation that rebuilds a table), page cleaner thread tries to flush the pending writes for the tablespace and releases the tablespace reference even though it was not acquired. fil_space_t::flush(): Don't release the tablespace when it is being stopped and closed Thanks to Marko Mäkelä for suggesting this patch.	2023-06-09 18:15:33 +05:30
Thirunarayanan Balathandayuthapani	bf0a54df34	MDEV-31416 ASAN errors in dict_v_col_t::detach upon adding key to virtual column - InnoDB throws ASAN error while adding the index on virtual column of system versioned table. InnoDB wrongly assumes that virtual column collation type changes, creates new column with different character set. This leads to failure while detaching the column from indexes.	2023-06-08 16:34:45 +05:30
Marko Mäkelä	80585c9d6f	Merge 10.5 into 10.6	2023-06-08 10:42:56 +03:00
Marko Mäkelä	c25b496724	MDEV-31382 SET GLOBAL innodb_undo_log_truncate=ON has no effect on logically empty undo logs innodb_undo_log_truncate_update(): A callback function. If SET GLOBAL innodb_undo_log_truncate=ON, invoke srv_wake_purge_thread_if_not_active(). srv_wake_purge_thread_if_not_active(): If innodb_undo_log_truncate=ON, always wake up the purge subsystem. srv_do_purge(): If the history is empty, invoke trx_purge_truncate_history() in order to free undo log pages. trx_purge_truncate_history(): If head.trx_no==0, consider the cached undo logs to be free. trx_purge(): Remove the parameter "bool truncate" and let the caller invoke trx_purge_truncate_history() directly. Reviewed by: Vladislav Lesin	2023-06-08 09:18:21 +03:00
Marko Mäkelä	3e40f9a7f3	MDEV-31355 innodb_undo_log_truncate=ON fails to wait for purge of enough transaction history purge_sys_t::sees(): Wrapper for view.sees(). trx_purge_truncate_history(): Invoke purge_sys.sees() instead of comparing to head.trx_no, to determine if undo pages can be safely freed. The test innodb.cursor-restore-locking was adjusted by Vladislav Lesin, as was the debug instrumentation in row_purge_del_mark(). Reviewed by: Vladislav Lesin	2023-06-08 09:17:52 +03:00
Oleksandr Byelkin	04f0b955dd	Merge branch '10.6' into 10.6.14	2023-06-07 19:59:52 +02:00
Marko Mäkelä	c93754d45e	MDEV-31234 related cleanup trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Replace some unreachable code with debug assertions. A buffer-fix does prevent pages from being evicted from the buffer pool; see buf_page_t::can_relocate(). Tested by: Matthias Leich	2023-06-05 18:53:20 +02:00
Sergei Golubchik	a42a6fa99b	Merge branch 'bb-10.5-release' into bb-10.6-release	2023-06-05 18:53:02 +02:00
Marko Mäkelä	cf37e44eec	MDEV-31350: Hang in innodb.recovery_memory buf_flush_page_cleaner(): Whenever buf_pool.ran_out(), invoke buf_pool.get_oldest_modification(0) so that all clean blocks will be removed from buf_pool.flush_list and buf_flush_LRU_list_batch() will be able to evict some pages. This fixes a regression that was likely caused by commit `a55b951e60` (MDEV-26827).	2023-06-03 11:12:32 +02:00
Marko Mäkelä	dd298873da	MDEV-31309 Innodb_buffer_pool_read_requests is not updated correctly srv_export_innodb_status(): Update export_vars.innodb_buffer_pool_read_requests as it was done before commit `a55b951e60` (MDEV-26827). If innodb_status_variables[] pointed to a sharded variable, it would only access the first shard.	2023-06-03 11:12:27 +02:00
Marko Mäkelä	89eb6fa8a7	MDEV-31308 InnoDB monitor trx_rseg_history_len was accidentally disabled by default innodb_counter_info[]: Revert a change that was accidentally made in commit `204e7225dc`	2023-06-03 11:12:21 +02:00
Marko Mäkelä	883333a74e	MDEV-31158: Potential hang with ROW_FORMAT=COMPRESSED tables btr_cur_need_opposite_intention(): Check also page_zip_available() so that we will escalate to exclusive index latch when a non-leaf page may have to be split further due to ROW_FORMAT=COMPRESSED page overflow. Tested by: Matthias Leich	2023-06-03 11:12:16 +02:00
Marko Mäkelä	459eb9a686	MDEV-29593 fixup: Avoid a leak if rseg.undo_cached is corrupted trx_purge_truncate_rseg_history(): Avoid a leak similar to the one that was fixed in MDEV-31324, in case a supposedly cached undo log page is not found in the rseg.undo_cached list.	2023-06-03 11:12:11 +02:00
Marko Mäkelä	e89bd39c9b	MDEV-31343 Another server hang with innodb_undo_log_truncate=ON trx_purge_truncate_history(): While waiting for a write-fixed block to become available, simply wait for an exclusive latch on it. Also, simplify the iteration: first check for oldest_modification>2 (to ignore clean pages or pages belonging to the temporary tablespace) and then compare the tablespace identifier. Before releasing buf_pool.flush_list_mutex we will buffer-fix the block of interest. In that way, buf_page_t::can_relocate() will not hold on the block and it must remain in the buffer pool until we have acquired an exclusive latch on it. If the block is still dirty, we will register it with the tablespace truncation mini-transaction; else, we will simply release the latch and buffer-fix and move to the next block. This also reverts commit `c4d7939989` because that fix should no longer be necessary; the wait for an exclusive block latch should allow buf_pool_t::release_freed_page() on the same block to proceed. Tested by: Axel Schwenke, Matthias Leich	2023-06-03 11:12:03 +02:00
Marko Mäkelä	3b4b512d8e	MDEV-31234 fixup: Allow innodb_undo_log_truncate=ON after upgrade trx_purge_truncate_history(): Relax a condition that would prevent undo log truncation if the undo log tablespaces were "contaminated" by the bug that commit `e0084b9d31` fixed. That is, trx_purge_truncate_rseg_history() would have invoked flst_remove() on TRX_RSEG_HISTORY but not reduced TRX_RSEG_HISTORY_SIZE. To avoid any regression with normal operation, we implement this fixup during slow shutdown only. The condition on the history list being empty is necessary: without it, in the test innodb.undo_truncate_recover there may be much fewer than the expected 90,000 calls to row_purge() before the truncation. That is, we would truncate the undo tablespace before actually having processed all undo log records in it. To truncate such "contaminated" or "bloated" undo log tablespaces (when using innodb_undo_tablespaces=2 or more) you can execute the following SQL: BEGIN;INSERT mysql.innodb_table_stats VALUES('','',DEFAULT,0,0,0);ROLLBACK; SET GLOBAL innodb_undo_log_truncate=ON, innodb_fast_shutdown=0; SHUTDOWN; The first line creates a dummy InnoDB transaction, to ensure that there will be some history to be purged during shutdown and that the undo tablespaces will be truncated.	2023-06-03 10:59:53 +02:00
Marko Mäkelä	48d6a5f61b	MDEV-31234 fixup: Free some UNDO pages earlier trx_purge_truncate_rseg_history(): Add a parameter to specify if the entire rollback segment is safe to be freed. If not, we may still be able to invoke trx_undo_truncate_start() and free some pages.	2023-06-03 10:59:47 +02:00
Marko Mäkelä	318012a80a	MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671 trx_purge_truncate_history(): Only call trx_purge_truncate_rseg_history() if the rollback segment is safe to process. This will avoid leaking undo log pages that are not yet ready to be processed. This fixes a regression that was introduced in commit `0de3be8cfd` (MDEV-30671). trx_sys_t::any_active_transactions(): Separately count XA PREPARE transactions. srv_purge_should_exit(): Terminate slow shutdown if the history size does not change and XA PREPARE transactions exist in the system. This will avoid a hang of the test innodb.recovery_shutdown. Tested by: Matthias Leich	2023-06-03 10:59:42 +02:00
Marko Mäkelä	f569e06e03	MDEV-31385 Change buffer stale entries leads to corruption while reusing page buf_page_free(): If buffered changes existed for the page, drop them. Co-developed with Thirunarayanan Balathandayuthapani	2023-06-02 11:06:09 +03:00
Marko Mäkelä	8a86df37ef	MDEV-31088 Server freeze due to innodb_change_buffering A 3-thread deadlock has been frequently observed when using innodb_change_buffering!=none and innodb_file_per_table=0: (1) ibuf_merge_or_delete_for_page() holding an exclusive latch on the block and waiting for an exclusive tablespace latch in fseg_page_is_allocated() (2) btr_free_but_not_root() in fseg_free_step() waiting for an exclusive tablespace latch (3) fsp_alloc_free_page() holding the exclusive tablespace latch and waiting for a latch on the block, which it is reallocating for something else While this was reproduced using innodb_file_per_table=0, this hang should be theoretically possible in .ibd files as well, when the recovery or cleanup of a failed DROP INDEX or ADD INDEX is executing concurrently with something that involves page allocation. ibuf_merge_or_delete_for_page(): Avoid invoking fseg_page_is_allocated() when block==nullptr. The call was redundant in this case, and it could cause deadlocks due to latching order violation. ibuf_read_merge_pages(): Acquire an exclusive tablespace latch before invoking buf_page_get_gen(), which may cause fseg_page_is_allocated() to be invoked in ibuf_merge_or_delete_for_page(). Note: This will not fix all latching order violations in this area! Deadlocks involving ibuf_merge_or_delete_for_page(block!=nullptr) are still possible if the caller is not acquiring an exclusive tablespace latch upfront. This would be the case in any read operation that involves a change buffer merge, such as SELECT, CHECK TABLE, or any DML operation that cannot be buffered in the change buffer.	2023-06-02 10:44:34 +03:00
Marko Mäkelä	548a41c5ec	Merge 10.5 into 10.6	2023-06-01 12:28:40 +03:00
Marko Mäkelä	bb9da13baf	MDEV-31373 innodb_undo_log_truncate=ON recovery results in a corrupted undo log recv_sys_t::apply(): When applying an undo log truncation operation, invoke os_file_truncate() on space->recv_size, which must not be less than the original truncated file size. Alternatively, as pointed out by Thirunarayanan Balathandayuthapani, we could assign space->size = t.pages, so that fil_system_t::extend_to_recv_size() would extend the file back to space->recv_size.	2023-06-01 12:11:18 +03:00
Marko Mäkelä	3aea77edeb	MDEV-31347 fil_ibd_create() may hijack the file handle of an old file fil_space_t::add(): If a file handle was passed, invoke fil_node_t::find_metadata() before releasing fil_system.mutex. The call was moved from fil_ibd_create(). This is a 10.5 version of commit `e3b06156c6` from 10.6.	2023-06-01 09:41:17 +03:00
Thirunarayanan Balathandayuthapani	5919f7b675	MDEV-31264 Purge trying to access freed secondary index page - InnoDB purge tries to access an aborted secondary index and accesses the freed secondary index root page.	2023-05-31 19:07:41 +05:30
Marko Mäkelä	e3b06156c6	MDEV-31347 fil_ibd_create() may hijack the file handle of an old file fil_ibd_create(): Hold fil_system.mutex until fil_node_t::find_metadata() has completed, so that node->handle cannot be closed by a concurrent thread. This race condition was introduced in commit `10dd290b4b` (MDEV-17380). Tested by: Matthias Leich	2023-05-31 15:25:07 +03:00
Marko Mäkelä	eb20e7c900	MDEV-31353 InnoDB recovery hangs after reporting corruption recv_recover_page(): Remove some code which was added in commit `0b47c126e3` with no good reason and which would cause a hang after a corrupted page was reported during crash recovery. Tested by: Matthias Leich	2023-05-31 15:20:54 +03:00
Marko Mäkelä	a6c0a27696	MDEV-31362 recv_sys_t::apply(bool): Assertion `!last_batch || recovered_lsn == scanned_lsn' failed recv_sys_t::apply(): Remove a bogus debug assertion that had been added in commit `f2c17cc9d9` (MDEV-29911). It is perfectly normal that when the server was killed in the middle of writing multiple redo log blocks, the recovery would end such that recv_sys.scanned_lsn will point to the end of the last complete 512-byte log block, but recv_sys.recovered_lsn will be less than that. Also, correct the function comment of recv_sys_t::parse().	2023-05-30 17:21:49 +03:00
Marko Mäkelä	ce547cfc05	MDEV-31350: Hang in innodb.recovery_memory buf_flush_page_cleaner(): Whenever buf_pool.ran_out(), invoke buf_pool.get_oldest_modification(0) so that all clean blocks will be removed from buf_pool.flush_list and buf_flush_LRU_list_batch() will be able to evict some pages. This fixes a regression that was likely caused by commit `a55b951e60` (MDEV-26827).	2023-05-26 16:40:07 +03:00
Marko Mäkelä	7b72fc0a57	MDEV-22739 !cursor->index->is_committed() in row0ins.cc row_ins_sec_index_entry_by_modify(): When noticing a corrupted secondary index on which CREATE INDEX is not in progress, return DB_CORRUPTION instead of intentionally crashing the server. Tested by: Matthias Leich	2023-05-26 16:40:02 +03:00
Marko Mäkelä	e38c075aa0	MDEV-31346 trx_purge_add_undo_to_history() is not optimal trx_undo_set_state_at_finish(): Merge to its only caller, trx_purge_add_undo_to_history(). trx_purge_add_undo_to_history(): Evaluate the condition related to TRX_UNDO_STATE only once. Tested by: Matthias Leich	2023-05-26 16:39:46 +03:00
Marko Mäkelä	db8765500e	MDEV-31343 Another server hang with innodb_undo_log_truncate=ON trx_purge_truncate_history(): While waiting for a write-fixed block to become available, simply wait for an exclusive latch on it. Also, simplify the iteration: first check for oldest_modification>2 (to ignore clean pages or pages belonging to the temporary tablespace) and then compare the tablespace identifier. Before releasing buf_pool.flush_list_mutex we will buffer-fix the block of interest. In that way, buf_page_t::can_relocate() will not hold on the block and it must remain in the buffer pool until we have acquired an exclusive latch on it. If the block is still dirty, we will register it with the tablespace truncation mini-transaction; else, we will simply release the latch and buffer-fix and move to the next block. This also reverts commit `c4d7939989` because that fix should no longer be necessary; the wait for an exclusive block latch should allow buf_pool_t::release_freed_page() on the same block to proceed. Tested by: Axel Schwenke, Matthias Leich	2023-05-26 16:16:10 +03:00
Alexander Barkov	9edb1a5ce3	MDEV-30483 After upgrade to 10.6 from Mysql 5.7 seeing "InnoDB: Column last_update in table mysql.innodb_table_stats is BINARY(4) NOT NULL but should be INT UNSIGNED NOT NULL" Problem: Field_timestampf implementations differ in MySQL and MariaDB: - MariaDB sets the UNSIGNED_FLAG in Field::flags - MySQL does not The reference table structures (defined in table_stats_schema and index_stats_schema) expected the last_update column to have the DATA_UNSIGNED flag, because MariaDB's Field_timestampf has the UNSIGNED_FLAG. It worked fine on pure MariaDB installations. However, if a MariaDB server starts over a MySQL-5.7 data directory during a migration, the last_update column does not have the DATA_UNSIGNED flag, because MySQL's Field_timestampf does not have the UNSIGNED_FLAG. This made InnoDB (after the migration from MySQL) complain into the server error log about the unexpected data type. The actual fix is done in storage/innobase/dict/dict0stats.cc: It removes DATA_UNSIGNED from the prtype_mask member of the reference columns, so now it does not require the underlying columns to have this flag. The rest of the fix is needed for MTR tests. The new data type plugin TYPE_MYSQL_TIMESTAMP implements a slightly modified version of Field_timestampf, which removes the unsigned flag, so it works like MySQL's Field_timestampf. The MTR test ALTERs the data type of the columns table_stats_schema.last_update and index_stats_schema.last_update from TIMESTAMP to TYPE_MYSQL_TIMESTAMP, then makes InnoDB verify the structure of the two statistics tables by creating and populating an InnoDB table t1. Without the fix made in storage/innobase/dict/dict0stats.cc, MTR complains about unexpected warnings in the server error log: [ERROR] InnoDB: Column last_update in table mysql.innodb_table_stats is ... [ERROR] InnoDB: Column last_update in table mysql.innodb_index_stats is ... With the fix made in storage/innobase/dict/dict0stats.cc these warnings go away.	2023-05-25 05:25:39 +04:00
Thirunarayanan Balathandayuthapani	7737f15f87	MDEV-31333 fsp_free_page() fails to move the extent from FSP_FREE_FRAG to FSP_FREE list - This issue was caused by commit `0b47c126e3`. In fsp_free_page(), InnoDB should set XDES_FREE_BIT of the page before moving the extent from FSP_FREE_FRAG to FSP_FREE list.	2023-05-24 15:15:29 +05:30
Marko Mäkelä	b220bb756b	Merge bb-10.6-release into 10.6	2023-05-24 08:37:19 +03:00
Marko Mäkelä	98de15aba1	Merge bb-10.5-release into bb-10.6-release	2023-05-24 08:36:30 +03:00
Marko Mäkelä	383105dae1	Merge bb-10.5-release into 10.5	2023-05-24 08:28:20 +03:00
Marko Mäkelä	c5cf94b2dc	MDEV-31234 fixup: Free some UNDO pages earlier trx_purge_truncate_rseg_history(): Add a parameter to specify if the entire rollback segment is safe to be freed. If not, we may still be able to invoke trx_undo_truncate_start() and free some pages.	2023-05-24 08:25:26 +03:00
Marko Mäkelä	270eeeb523	Merge 10.5 into 10.6	2023-05-23 12:25:39 +03:00
Marko Mäkelä	9c35f9c9c1	MDEV-31234 fixup: Allow innodb_undo_log_truncate=ON after upgrade trx_purge_truncate_history(): Relax a condition that would prevent undo log truncation if the undo log tablespaces were "contaminated" by the bug that commit `e0084b9d31` fixed. That is, trx_purge_truncate_rseg_history() would have invoked flst_remove() on TRX_RSEG_HISTORY but not reduced TRX_RSEG_HISTORY_SIZE. To avoid any regression with normal operation, we implement this fixup during slow shutdown only. The condition on the history list being empty is necessary: without it, in the test innodb.undo_truncate_recover there may be much fewer than the expected 90,000 calls to row_purge() before the truncation. That is, we would truncate the undo tablespace before actually having processed all undo log records in it. To truncate such "contaminated" or "bloated" undo log tablespaces (when using innodb_undo_tablespaces=2 or more) you can execute the following SQL: BEGIN;INSERT mysql.innodb_table_stats VALUES('','',DEFAULT,0,0,0);ROLLBACK; SET GLOBAL innodb_undo_log_truncate=ON, innodb_fast_shutdown=0; SHUTDOWN; The first line creates a dummy InnoDB transaction, to ensure that there will be some history to be purged during shutdown and that the undo tablespaces will be truncated.	2023-05-23 12:20:27 +03:00
Marko Mäkelä	a5ce335ac9	MDEV-29593 fixup: Avoid a leak if rseg.undo_cached is corrupted trx_purge_truncate_rseg_history(): Avoid a leak similar to the one that was fixed in MDEV-31324, in case a supposedly cached undo log page is not found in the rseg.undo_cached list.	2023-05-22 17:10:25 +03:00
Marko Mäkelä	eb2e074494	Merge 10.5 into 10.6	2023-05-22 08:38:21 +03:00
Teemu Ollakka	f307160218	MDEV-29293 MariaDB stuck on starting commit state This commit contains a merge from 10.5-MDEV-29293-squash into 10.6. Although the bug MDEV-29293 was not reproducible with 10.6, the fix contains several improvements for wsrep KILL query and BF abort handling, and addresses the following issues: * MDEV-30307 KILL command issued inside a transaction is problematic for galera replication: This commit will remove KILL TOI replication, so Galera side transaction context is not lost during KILL. * MDEV-21075 KILL QUERY maintains nodes data consistency but breaks GTID sequence: This is fixed as well as KILL does not use TOI, and thus does not change GTID state. * MDEV-30372 Assertion in wsrep-lib state: This was caused by BF abort or KILL when local transaction was in the middle of group commit. This commit disables THD::killed handling during commit, so the problem is avoided. * MDEV-30963 Assertion failure !lock.was_chosen_as_deadlock_victim in trx0trx.h:1065: The assertion happened when the victim was BF aborted via MDL while it was committing. This commit changes MDL BF aborts so that transactions which are committing cannot be BF aborted via MDL. The RQG grammar attached in the issue could not reproduce the crash anymore. Original commit message from 10.5 fix: MDEV-29293 MariaDB stuck on starting commit state The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). 
If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed so that KILL does not interrupt commit processing. Notable changes in this commit: * wsrep client connection's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem arose with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for the next connection. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now a debug-only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Make galera_var_retry_autocommit result more readable by echoing cases and expectations into result. Only one expected result for reap to verify that server returns expected status for query. * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. 
Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_bf_abort_registering to check that registering trx gets BF aborted through MDL. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:42:05 +02:00
Teemu Ollakka	3f59bbeeae	MDEV-29293 MariaDB stuck on starting commit state The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed so that KILL does not interrupt commit processing. Notable changes in this commit: * wsrep client connection's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem arose with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for the next connection. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now a debug-only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. 
Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:39:43 +02:00
Teemu Ollakka	6966d7fe4b	MDEV-29293 MariaDB stuck on starting commit state This is a backport from 10.5. The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed so that KILL does not interrupt commit processing. Notable changes in this commit: * wsrep client connection's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem arose with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for the next connection. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now a debug-only variable and could be excluded from optimized builds. 
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:33:37 +02:00
Vlad Lesin	b54e7b0cea	MDEV-31185 rw_trx_hash_t::find() unpins pins too early rw_trx_hash_t::find() acquires element->mutex, then unpins pins, used for lf_hash element search. After that the "element" can be deallocated and reused by some other thread. If we take a look at the rw_trx_hash_t::insert()->lf_hash_insert()->lf_alloc_new() calls, we will not find any element->mutex acquisition, as it was not initialized yet before its allocation. rw_trx_hash_t::insert() can reuse the chunk, unpinned in rw_trx_hash_t::find(). The scenario is the following: 1. Thread 1 has just executed lf_hash_search() in rw_trx_hash_t::find(), but has not acquired element->mutex yet. 2. Thread 2 has removed the element from the hash table with an rw_trx_hash_t::erase() call. 3. Thread 1 acquired element->mutex and unpinned pin 2 with a lf_hash_search_unpin(pins) call. 4. Some thread purged memory of the element. 5. Thread 3 reused the memory for the element, filled element->id, element->trx. 6. Thread 1 crashes with failed "DBUG_ASSERT(trx_id == trx->id)" assertion. Note that trx_t objects are also reused, see the code around trx_pools for details. The fix is to invoke "lf_hash_search_unpin(pins);" after element->trx is stored in local variable in rw_trx_hash_t::find(). Reviewed by: Nikita Malyavin, Marko Mäkelä.	2023-05-19 15:50:20 +03:00
Marko Mäkelä	d2420669bd	MDEV-31309 Innodb_buffer_pool_read_requests is not updated correctly srv_export_innodb_status(): Update export_vars.innodb_buffer_pool_read_requests as it was done before commit `a55b951e60` (MDEV-26827). If innodb_status_variables[] pointed to a sharded variable, it would only access the first shard.	2023-05-19 15:38:48 +03:00
Marko Mäkelä	df524dc06f	MDEV-31308 InnoDB monitor trx_rseg_history_len was accidentally disabled by default innodb_counter_info[]: Revert a change that was accidentally made in commit `204e7225dc`	2023-05-19 15:29:26 +03:00
Marko Mäkelä	f2c17cc9d9	MDEV-29911 InnoDB recovery and mariadb-backup --prepare fail to report detailed progress This is a 10.6 port of commit `2f9e264781` from MariaDB Server 10.9 that is missing some optimization due to a more complex redo log format and recovery logic (which was simplified in commit `685d958e38`). The progress reporting of InnoDB crash recovery was rather intermittent. Nothing was reported during the single-threaded log record parsing, which could consume minutes when parsing a large log. During log application, there only was progress reporting in background threads that would be invoked on data page read completion. The progress reporting here will be detailed like this: InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799 InnoDB: Read redo log up to LSN=1963895808 InnoDB: Multi-batch recovery needed at LSN 2534560930 InnoDB: Read redo log up to LSN=3312233472 InnoDB: Read redo log up to LSN=1599646720 InnoDB: Read redo log up to LSN=2160831488 InnoDB: To recover: LSN 2806789376/2806819840; 195082 pages InnoDB: To recover: LSN 2806789376/2806819840; 63507 pages InnoDB: Read redo log up to LSN=3195776000 InnoDB: Read redo log up to LSN=3687099392 InnoDB: Read redo log up to LSN=4165315584 InnoDB: To recover: LSN 4374395699/4374440960; 241454 pages InnoDB: To recover: LSN 4374395699/4374440960; 123701 pages InnoDB: Read redo log up to LSN=4508724224 InnoDB: Read redo log up to LSN=5094550528 InnoDB: To recover: 205230 pages The previous messages "Starting a batch to recover" or "Starting a final batch to recover" will be replaced by "To recover: ... pages" messages. If a batch lasts longer than 15 seconds, then there will be progress reports every 15 seconds, showing the number of remaining pages. For the non-final batch, the "To recover:" message includes two end LSN: that of the batch, and of the recovered log. This is the primary measure of progress. The batch will end once the number of pages to recover reaches 0. 
If recovery is possible in a single batch, the output will look like this, with a shorter "To recover:" message that counts only the remaining pages: InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799 InnoDB: Read redo log up to LSN=1984539648 InnoDB: Read redo log up to LSN=2710875136 InnoDB: Read redo log up to LSN=3358895104 InnoDB: Read redo log up to LSN=3965299712 InnoDB: Read redo log up to LSN=4557417472 InnoDB: Read redo log up to LSN=5219527680 InnoDB: To recover: 450915 pages We will also speed up recovery by improving the memory management and implementing multi-threaded recovery of data pages that will not need to be read into the buffer pool ("fake read"). Log application in the "fake read" threads will be protected by an atomic being_recovered field and exclusive buf_page_t::lock. Recovery will reserve for data pages two thirds of the buffer pool, or 256 pages, whichever is smaller. Previously, we could only use at most one third of the buffer pool for buffered log records. This would typically mean that with large buffer pools, recovery unnecessarily consisted of multiple batches. If recovery runs out of memory, it will "roll back" or "rewind" the current mini-transaction. The recv_sys.recovered_lsn and recv_sys.pages will correspond to the "out of memory LSN", at the end of the previous complete mini-transaction. If recovery runs out of memory while executing the final recovery batch, we can simply invoke recv_sys.apply(false) to make room, and resume parsing. If recovery runs out of memory before the final batch, we will scan the redo log to the end and check for any missing or inconsistent files. In this version of the patch, we will throw away any previously buffered recv_sys.pages and rescan the log from the checkpoint onwards. recv_sys_t::pages_it: A cached iterator to recv_sys.pages. recv_sys_t::is_memory_exhausted(): Remove. We will have out-of-memory handling deep inside recv_sys_t::parse(). 
recv_sys_t::rewind(), page_recv_t::recs_t::rewind(): Remove all log starting with a specific LSN. IORequest::write_complete(), IORequest::read_complete(): Replaces fil_aio_callback(). read_io_callback(), write_io_callback(): Replaces io_callback(). IORequest::fake_read_complete(), fake_io_callback(), os_fake_read(): Process a "fake read" request for concurrent recovery. recv_sys_t::apply_batch(): Choose a number of successive pages for a recovery batch. recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a page whose recovery is not in progress. Log application threads will not invoke this; they will only set being_recovered=-1 to indicate that the entry is no longer needed. recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries. recv_sys_t::wait_for_pool(): Wait for some space to become available in the buffer pool. mlog_init_t::mark_ibuf_exist(): Avoid calls to recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low(). Such calls would lead to double locking of recv_sys.mutex, which depending on implementation could cause a deadlock. We will use lower-level calls to look up index pages. buf_LRU_block_remove_hashed(): Disable consistency checks for freed ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage. This fixes an occasional failure of the test innodb.innodb_bulk_create_index_debug. Tested by: Matthias Leich	2023-05-19 15:20:07 +03:00
Vlad Lesin	5422784792	MDEV-31256 fil_node_open_file() releases fil_system.mutex allowing other thread to open its file node There is a window between the mutex_exit(&fil_system.mutex) and mutex_enter(&fil_system.mutex) calls in fil_node_open_file(). During this window another thread can open the node, and the ut_ad(!node->is_open()) assertion in fil_node_open_file_low() can fail. The fix is not to open the node if it was already opened by another thread.	2023-05-19 14:52:47 +03:00
Marko Mäkelä	347e22fbf8	Merge bb-10.6-release into 10.6	2023-05-19 14:23:53 +03:00
Marko Mäkelä	06d555a41a	Merge bb-10.5-release into 10.5	2023-05-19 14:23:04 +03:00
Marko Mäkelä	e5933b99d5	MDEV-31234 related cleanup trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Replace some unreachable code with debug assertions. A buffer-fix does prevent pages from being evicted from the buffer pool; see buf_page_t::can_relocate(). Tested by: Matthias Leich	2023-05-19 12:25:30 +03:00
Marko Mäkelä	37492960f3	Merge 10.5 into 10.6	2023-05-19 12:24:58 +03:00
Marko Mäkelä	e0084b9d31	MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671 trx_purge_truncate_history(): Only call trx_purge_truncate_rseg_history() if the rollback segment is safe to process. This will avoid leaking undo log pages that are not yet ready to be processed. This fixes a regression that was introduced in commit `0de3be8cfd` (MDEV-30671). trx_sys_t::any_active_transactions(): Separately count XA PREPARE transactions. srv_purge_should_exit(): Terminate slow shutdown if the history size does not change and XA PREPARE transactions exist in the system. This will avoid a hang of the test innodb.recovery_shutdown. Tested by: Matthias Leich	2023-05-19 12:19:26 +03:00
Marko Mäkelä	a3e5b5c4db	Merge 10.5 into 10.6	2023-05-15 09:02:32 +03:00
Marko Mäkelä	c9eff1a144	MDEV-31254 InnoDB: Trying to read doublewrite buffer page buf_read_page_low(): Remove an error message and a debug assertion that can be triggered when using innodb_page_size=4k and innodb_file_per_table=0. In that case, buf_read_ahead_linear() may be invoked on page 255, which is one less than the first page of the doublewrite buffer (256).	2023-05-12 15:04:50 +03:00
Marko Mäkelä	477285c8ea	MDEV-31253 Freed data pages are not always being scrubbed fil_space_t::flush_freed(): Renamed from buf_flush_freed_pages(); this is a backport of `aa45850687` from 10.6. Invoke log_write_up_to() on last_freed_lsn, instead of avoiding the operation when the log has not yet been written. A more costly alternative would be that log_checkpoint() would invoke this function on every affected tablespace.	2023-05-12 14:57:14 +03:00
Marko Mäkelä	c271057288	Merge 10.5 into 10.6	2023-05-11 13:27:01 +03:00
Marko Mäkelä	279d0120f5	MDEV-29967 innodb_read_ahead_threshold (linear read-ahead) does not work buf_read_ahead_linear(): Correct some calculations that were broken in commit `b1ab211dee` (MDEV-15053). Thanks to Daniel Black for providing a test case and initial debugging. Tested by: Matthias Leich	2023-05-11 13:21:57 +03:00
Marko Mäkelä	7124911a2c	MDEV-31158: Potential hang with ROW_FORMAT=COMPRESSED tables btr_cur_need_opposite_intention(): Check also page_zip_available() so that we will escalate to exclusive index latch when a non-leaf page may have to be split further due to ROW_FORMAT=COMPRESSED page overflow. Tested by: Matthias Leich	2023-05-11 08:43:00 +03:00
Marko Mäkelä	4a668c1892	MDEV-29401 InnoDB history list length increased in 10.6 compared to 10.5 The InnoDB buffer pool and locking were heavily refactored in MariaDB Server 10.6. Among other things, dict_sys.mutex was removed, and the contended lock_sys.mutex was replaced with a combination of lock_sys.latch and distributed latches in hash tables. Also, a default value was changed to innodb_flush_method=O_DIRECT to improve performance in write-heavy workloads. One thing where an adjustment was missing is around the parameters innodb_max_purge_lag (number of committed transactions waiting to be purged), and innodb_max_purge_lag_delay (maximum number of microseconds to delay a DML operation). purge_coordinator_state::do_purge(): Pass the history_size to trx_purge() and reset srv_dml_needed_delay if the history is empty. Keep executing the loop non-stop as long as srv_dml_needed_delay is set. trx_purge_dml_delay(): Made part of trx_purge(). Set srv_dml_needed_delay=0 when nothing can be purged (!n_pages_handled). row_mysql_delay_if_needed(): Mimic the logic of innodb_max_purge_lag_wait_update(). Reviewed by: Thirunarayanan Balathandayuthapani	2023-04-27 17:11:32 +03:00
Marko Mäkelä	5740638c4c	MDEV-31132 Deadlock between DDL and purge of InnoDB history log_free_check(): Assert that the caller must not hold exclusive lock_sys.latch. This was the case for calls from ibuf_delete_for_discarded_space(). This caused a deadlock with another thread that would be holding a latch on a dirty page that would need to be written so that the checkpoint would advance and log_free_check() could return. That other thread was waiting for a shared lock_sys.latch. fil_delete_tablespace(): Do not invoke ibuf_delete_for_discarded_space() because in DDL operations, we will be holding exclusive lock_sys.latch. trx_t::commit(std::vector<pfs_os_file_t>&), innodb_drop_database(), row_purge_remove_clust_if_poss_low(), row_undo_ins_remove_clust_rec(), row_discard_tablespace_for_mysql(): Invoke ibuf_delete_for_discarded_space() on the deleted tablespaces after releasing all latches.	2023-04-26 12:08:59 +03:00
Marko Mäkelä	d4265fbde5	MDEV-26055: Correct the formula for adaptive flushing page_cleaner_flush_pages_recommendation(): If dirty_pct is between innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct, scale the effort relative to how close we are to innodb_max_dirty_pages_pct. The previous formula was missing a multiplication by 100. Tested by: Axel Schwenke	2023-04-26 11:53:42 +03:00
Marko Mäkelä	c22ab93f8a	MDEV-26827 fixup: Prevent a hang in LRU eviction buf_pool_t::page_cleaner_wakeup(): If for_LRU=true, wake up the page cleaner immediately, also when it is in a timed wait. This avoids an unnecessary delay of up to 1 second.	2023-04-25 15:03:38 +03:00
Marko Mäkelä	818d5e4814	Merge 10.5 into 10.6	2023-04-25 13:10:33 +03:00
Marko Mäkelä	50f3b7d164	MDEV-31124 Innodb_data_written miscounts doublewrites When commit `a5a2ef079c` implemented asynchronous doublewrite, the writes via the doublewrite buffer started to be counted incorrectly, without multiplying them by innodb_page_size. srv_export_innodb_status(): Correctly count the Innodb_data_written. buf_dblwr_t: Remove submitted(), because it is close to written() and only Innodb_data_written was interested in it. According to its name, it should count completed and not submitted writes. Tested by: Axel Schwenke	2023-04-25 12:17:06 +03:00
Oleksandr Byelkin	1d74927c58	Merge branch '10.4' into 10.5	2023-04-24 12:43:47 +02:00
Marko Mäkelä	0976afec88	MDEV-31114 Assertion !...is_waiting() failed in os_aio_wait_until_no_pending_writes() os_aio_wait_until_no_pending_reads(), os_aio_wait_until_no_pending_writes(): Add a Boolean parameter to indicate whether the wait should be declared in the thread pool. buf_flush_wait(): The callers have already declared a wait, so let us avoid doing that again, just call os_aio_wait_until_no_pending_writes(false). buf_flush_wait_flushed(): Do not declare a wait in the rare case that the buf_flush_page_cleaner thread has been shut down already. buf_flush_page_cleaner(), buf_flush_buffer_pool(): In the code that runs during shutdown, do not declare waits. buf_flush_buffer_pool(): Remove a debug assertion that might fail. What really matters here is buf_pool.flush_list.count==0. buf_read_recv_pages(), srv_prepare_to_delete_redo_log_file(): Do not declare waits during InnoDB startup.	2023-04-24 09:57:58 +03:00
Thirunarayanan Balathandayuthapani	2c567b2fa3	MDEV-30996 insert.. select in presence of full text index freezes all other commits at commit time - This patch does the following: git revert --no-commit `673243c893` git revert --no-commit `6c669b9586` git revert --no-commit `bacaf2d4f4` git checkout HEAD mysql-test git revert --no-commit `1fd7d3a9ad` The above commands revert MDEV-29277, MDEV-25581, MDEV-29342. When the binlog is enabled, a transaction takes a long time to perform the sync operation on an InnoDB FTS table. This blocks the commit of other transactions. To avoid this, remove the fulltext sync operation during transaction commit; hence the MDEV-25581 related patches were reverted. We filed MDEV-31105 to avoid the memory consumption problem during the fulltext sync operation.	2023-04-24 11:06:56 +05:30
Alexander Barkov	9f98a2acd7	MDEV-30968 mariadb-backup does not copy Aria logs if aria_log_dir_path is used - `mariadb-backup --backup` was fixed to fetch the value of the @@aria_log_dir_path server variable and copy aria_log* files from @@aria_log_dir_path directory to the backup directory. Absolute and relative (to --datadir) paths are supported. Before this change aria_log* files were copied to the backup only if they were in the default location in @@datadir. - `mariadb-backup --copy-back` now understands a new my.cnf and command line parameter --aria-log-dir-path. `mariadb-backup --copy-back` in the main loop in copy_back() (when copying back from the backup directory to --datadir) was fixed to ignore all aria_log* files. A new function copy_back_aria_logs() was added. It consists of a separate loop copying back aria_log* files from the backup directory to the directory specified in --aria-log-dir-path. Absolute and relative (to --datadir) paths are supported. If --aria-log-dir-path is not specified, aria_log* files are copied to --datadir by default. - The function is_absolute_path() was fixed to understand MTR style paths on Windows with forward slashes, e.g. --aria-log-dir-path=D:/Buildbot/amd64-windows/build/mysql-test/var/...	2023-04-21 19:08:35 +04:00
Marko Mäkelä	51e62cb3b3	MDEV-26782 InnoDB temporary tablespace: reclaiming of free space does not work The motivation of this change is to allow undo pages for temporary tables to be marked free as often as possible, so that we can avoid buf_pool.LRU eviction (and writes) of undo pages that contain data that is no longer needed. For temporary tables, no MVCC or purge of history is needed, and reusing cached undo log pages might not help that much. It is possible that this may cause some performance regression due to more frequent allocation and freeing of undo log pages, but I only measured a performance improvement. trx_write_serialisation_history(): Never cache temporary undo log pages. trx_undo_reuse_cached(): Assert that the rollback segment is persistent. trx_undo_assign_low(): Add template<bool is_temp>. Never invoke trx_undo_reuse_cached() for temporary tables. Tested by: Matthias Leich	2023-04-21 17:58:26 +03:00
Marko Mäkelä	204e7225dc	Cleanup: MONITOR_EXISTING trx_undo_slots_used, trx_undo_slots_cached Let us remove explicit updates of MONITOR_NUM_UNDO_SLOT_USED and MONITOR_NUM_UNDO_SLOT_CACHED, and let us compute the rough values from trx_sys.rseg_array[] on demand.	2023-04-21 17:58:18 +03:00
Marko Mäkelä	86767bcc0f	MDEV-29593 Purge misses a chance to free not-yet-reused undo pages trx_purge_truncate_rseg_history(): If all other conditions for invoking trx_purge_remove_log_hdr() hold, but the state is TRX_UNDO_CACHED instead of TRX_UNDO_TO_PURGE, detach and free it. Tested by: Matthias Leich	2023-04-21 17:58:09 +03:00
Marko Mäkelä	40eff3f868	MDEV-26827 fixup: hangs and !os_aio_pending_writes() assertion failures buf_LRU_get_free_block(): Always wake up the page cleaner if needed before exiting the inner loop. srv_prepare_to_delete_redo_log_file(): Replace a debug assertion with a wait in debug builds. Starting with commit `7e31a8e7fa` the debug assertion ut_ad(!os_aio_pending_writes()) could occasionally fail, while it would hold in core dumps of crashes. The failure can be reproduced more easily by adding a sleep to the write completion callback function, right before releasing to write_slots. srv_start(): Remove a bogus debug assertion ut_ad(!os_aio_pending_writes()) that could fail in mariadb-backup --prepare. In an rr replay trace, we had buf_pool.flush_list.count==0 but write_slots->m_cache.m_pos==1 and buf_page_t::write_complete() was executing u_unlock().	2023-04-21 17:52:47 +03:00
Marko Mäkelä	e55e761eae	MDEV-31084 assert(waiting) failed in TP_connection_generic::wait_end buf_flush_wait_flushed(): Correct the logic for registering a wait around buf_flush_wait() that commit `a091d6ac4e` recently broke. This should be easily repeatable when using a non-default startup parameter: thread-handling=pool-of-threads	2023-04-21 16:49:59 +03:00
Marko Mäkelä	abe4c7bfd6	Merge 10.5 into 10.6	2023-04-21 16:38:22 +03:00
Marko Mäkelä	c6e58a8d17	MDEV-30753 fixup: Unsafe buffer page restoration trx_purge_free_segment(): The buffer-fix only prevents a block from being freed completely from the buffer pool, but it will not prevent the block from being evicted. Recheck the page identifier after acquiring an exclusive page latch. If it has changed, backtrack and invoke buf_page_get_gen() to look up the page normally.	2023-04-21 16:19:39 +03:00
Marko Mäkelä	7e31a8e7fa	MDEV-26827 fixup: Fix os_aio_wait_until_no_pending_writes() io_callback(): Process the request before releasing the write slot. Before commit `a091d6ac4e` when we had a duplicated counter for writes, either ordering was fine. Now, correctness depends on os_aio_wait_until_no_pending_writes().	2023-04-20 14:08:48 +03:00
Marko Mäkelä	27ff972be2	MDEV-26827 fixup: Do not hog buf_pool.mutex buf_flush_LRU_list_batch(): When evicting clean pages, release and reacquire the buf_pool.mutex after every 32 pages. Also, eliminate some conditional branches.	2023-04-19 18:57:18 +03:00
Marko Mäkelä	0cda0e4e15	MDEV-31080 fil_validate() failures during deferred tablespace recovery fil_space_t::create(), fil_space_t::add(): Expect the caller to acquire and release fil_system.mutex. In this way, creating a tablespace and adding the first (usually only) data file will be atomic. recv_sys_t::recover_deferred(): Correctly protect some changes by holding fil_system.mutex. Tested by: Matthias Leich	2023-04-19 18:56:58 +03:00
Marko Mäkelä	78368e5866	MDEV-30863 fixup: Assertion failure when using innodb_undo_tablespaces=0 trx_assign_rseg_low(): Let us restore the debug variable look_for_rollover to avoid assertion failures when a server that was created with multiple undo tablespaces is being started with innodb_undo_tablespaces=0.	2023-04-19 15:52:11 +03:00
Marko Mäkelä	1892f5d8fc	MDEV-30863 fixup: Hang in a debug build trx_assign_rseg_low(): Correct a debug injection condition.	2023-04-19 14:46:49 +03:00


10127 commits