mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-14 19:07:15 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	3c312d247c	MDEV-35190 HASH_SEARCH duplicates effort before HASH_INSERT or HASH_DELETE The HASH_ macros are unnecessarily obfuscating the logic, so we had better replace them. hash_cell_t::search(): Implement most of the HASH_DELETE logic, for a subsequent insert or remove(). hash_cell_t::remove(): Remove an element. hash_cell_t::find(): Implement the HASH_SEARCH logic. xb_filter_hash_free(): Avoid any hash table lookup; just traverse the hash bucket chains and free each element. xb_register_filter_entry(): Search databases_hash only once. rm_if_not_found(): Make use of find_filter_in_hashtable(). dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table(): Define non-inline to avoid unnecessary code duplication. dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache(): Look for duplicate while finding the insert position. dict_table_change_id_in_cache(): Merged to the only caller row_discard_tablespace(). hash_insert(): Helper function of dict_sys_t::resize(). fil_space_t::create(): Look for a duplicate (and crash if found) when searching for the insert position. lock_rec_discard(): Take the hash array cell as a parameter to avoid a duplicated lookup. lock_rec_free_all_from_discard_page(): Remove a parameter. Reviewed by: Debarun Banerjee	2024-11-21 08:59:02 +02:00
Marko Mäkelä	ba69d811fa	MDEV-35409 InnoDB can still hang while running out of buffer pool buf_pool_t::LRU_warn(): Also clear the try_LRU_scan flag, to ensure that need_LRU_eviction() will hold. This should ensure progress when buf_LRU_get_free_block() is expecting buf_flush_page_cleaner() to make some room, even when buf_pool.LRU.count is small. This hang was observed in trx_lists_init_at_db_start() while the last batch of crash recovery was in progress, but it could theoretically be possible also when a large part of the buffer pool is occupied by record locks or the adaptive hash index. Reviewed by: Debarun Banerjee	2024-11-18 08:13:18 +02:00
Marko Mäkelä	7701ccb72d	MDEV-35149 Race condition around SET GLOBAL innodb_lru_scan_depth A debug assertion in buf_LRU_get_free_block() could fail if SET GLOBAL innodb_lru_scan_depth is being executed during a workload that involves allocating buffer pool pages. buf_pool_t::LRU_scan_depth: Replaces srv_LRU_scan_depth. buf_pool_t::flush_neighbors: Replaces srv_flush_neighbors. innodb_buf_pool_update<T>(): Update a parameter of buf_pool while holding buf_pool.mutex.	2024-10-21 10:08:58 +03:00
Marko Mäkelä	bb47e575de	MDEV-34830: LSN in the future is not being treated as serious corruption The invariant of write-ahead logging is that before any change to a page is written to the data file, the corresponding log record must must first have been durably written. On crash recovery, there were some sloppy checks for this. Let us implement accurate checks and flag an inconsistency as a hard error, so that we can avoid further corruption of a corrupted database. For data extraction from the corrupted database, innodb_force_recovery can be used. Before recovery is reading any data pages or invoking buf_dblwr_t::recover() to recover torn pages from the doublewrite buffer, InnoDB will have parsed the log until the final LSN and updated log_sys.lsn to that. So, we can rely on log_sys.lsn at all times. The doublewrite buffer recovery has been refactored in such a way that the recv_sys.dblwr.pages may be consulted while discovering files and their page sizes, but nothing will be written back to data files before buf_dblwr_t::recover() is invoked. A section of the test mariabackup.innodb_redo_overwrite that is parsing some mariadb-backup --backup output has been removed, because that output "redo log block is overwritten" would often be missing in a Microsoft Windows environment as a result of these changes. recv_max_page_lsn, recv_lsn_checks_on: Remove. recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging condition at the end of the recovery. recv_dblwr_t::validate_page(): Keep track of the maximum LSN (if we are checking a non-doublewrite copy of a page) but do not complain LSN being in the future. The doublewrite buffer is a special case, because it will be read early during recovery. Besides, starting with commit `762bcb81b5` the dblwr=true copies of pages may legitimately be "too new". recv_dblwr_t::find_page(): Find a valid page with the smallest FIL_PAGE_LSN that is in the valid range for recovery. recv_dblwr_t::restore_first_page(): Replaced by find_page(). Only buf_dblwr_t::recover() will write to data files. buf_dblwr_t::recover(): Simplify the message output. Do attempt doublewrite recovery on user page read error. Ignore doublewrite pages whose FIL_PAGE_LSN is outside the usable bounds. Previously, we could wrongly recover a too new page from the doublewrite buffer. It is unlikely that this could have lead to an actual error. Write back all recovered pages from the doublewrite buffer here, including for the first page of any tablespace. buf_page_is_corrupted(): Distinguish the return values CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER. buf_page_check_corrupt(): Return the error code DB_CORRUPTION in case the LSN is in the future. Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff in the same way on both 32-bit and 64-bit architectures. Datafile::read_first_page_flags(): Split from read_first_page(). Take a copy of the first page as a parameter. recv_sys_t::free_corrupted_page(): Take the file as a parameter and return whether a message was displayed. This avoids some duplicated and incomplete error messages. buf_page_t::read_complete(): Remove some redundant output and always display the name of the corrupted file. Never return DB_FAIL; use it only in internal error handling. IORequest::read_complete(): Assume that buf_page_t::read_complete() will have reported any error. fil_space_t::set_corrupted(): Return whether this is the first time the tablespace had been flagged as corrupted. Datafile::validate_first_page(), fil_node_open_file_low(), fil_node_open_file(), fil_space_t::read_page0(), fil_node_t::read_page0(): Add a parameter for a copy of the first page, and a parameter to indicate whether the FIL_PAGE_LSN check should be suppressed. Before buf_dblwr_t::recover() is invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the FSP_SPACE_FLAGS and the tablespace ID that may be present in a potentially too new copy of a page. Reviewed by: Debarun Banerjee	2024-10-17 17:24:20 +03:00
mariadb-DebarunBanerjee	35d477dd1d	MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1 The issue is caused by a race between buf_page_create_low getting the page from buffer pool hash and buf_LRU_free_page evicting it from LRU. The issue is introduced in 10.6 by MDEV-27058 commit `aaef2e1d8c` MDEV-27058: Reduce the size of buf_block_t and buf_page_t The solution is buffer fix the page before releasing buffer pool mutex in buf_page_create_low when x_lock_try fails to acquire the page latch.	2024-09-20 20:26:43 +05:30
Marko Mäkelä	a74bea7ba9	MDEV-34879 InnoDB fails to merge the change buffer to ROW_FORMAT=COMPRESSED tables buf_page_t::read_complete(): Fix an incorrect condition that had been added in commit `aaef2e1d8c` (MDEV-27058). Also for compressed-only pages we must remember that buffered changes may exist. buf_read_page(): Correct the function comment; this is for a synchronous and not asynchronous read. Pass the parameter unzip=true to buf_read_page_low(), because each of our callers will be interested in the uncompressed page frame. This will cause the test encryption.innodb-compressed-blob to emit more errors when the correct keys for decrypting the clustered index root page are unavailable. Reviewed by: Debarun Banerjee	2024-09-12 10:52:55 +03:00
Marko Mäkelä	f168050e90	MDEV-34791 fixup: Avoid an infinite loop with ROW_FORMAT=COMPRESSED buf_pool_t::page_fix(): If a change buffer merge may be needed on a ROW_FORMAT=COMPRESSED page that exists in compressed-only format in the buffer pool, go ahead to decompress the block. This fixes an infinite loop. Reviewed by: Debarun Banerjee	2024-09-12 10:52:12 +03:00
Marko Mäkelä	9f0b106631	MDEV-34845 buf_flush_buffer_pool(): Assertion `!os_aio_pending_reads()' failed buf_flush_buffer_pool(): Wait for any pending asynchronous reads to complete. This assertion failed in a run where buf_read_ahead_linear() had been triggered in an SQL statement that was executed right before shutdown. Reviewed by: Debarun Banerjee	2024-09-03 18:22:10 +03:00
Marko Mäkelä	9878238f74	MDEV-34791: Redundant page lookups hurt performance btr_cur_t::search_leaf(): When the index root page is also a leaf page, we may need to upgrade our existing shared root page latch into an exclusive latch. Even if we end up waiting, the root page won't be able to go away while we hold an index()->lock. The index page may be split; that is all. btr_latch_prev(): Acquire the page latch while holding a buffer-fix and an index tree latch. Merge the change buffer if needed. Use buf_pool_t::page_fix() for this special case instead of complicating buf_page_get_low() and buf_page_get_gen(). row_merge_read_clustered_index(): Remove some code that does not seem to be useful. No difference was observed with regard to removing this code when a CREATE INDEX or OPTIMIZE TABLE statement was run concurrently with sysbench oltp_update_index --tables=1 --table_size=1000 --threads=16. buf_pool_t::unzip(): Decompress a ROW_FORMAT=COMPRESSED page. buf_pool_t::page_fix(): Handle also ROW_FORMAT=COMPRESSED pages as well as change buffer merge. Optionally return an error. Add a flag for suppressing a page latch wait and a special return value -1 to indicate that the call would block. This is the preferred way of buffer-fixing blocks. The functions buf_page_get_gen() and buf_page_get_low() are only being invoked with rw_latch=RW_NO_LATCH in operations on SPATIAL INDEX. buf_page_t: Define some static functions for interpreting state(). buf_page_get_zip(), buf_read_page(), buf_read_ahead_random(), buf_read_ahead_linear(): Remove the redundant parameter zip_size. We must look up the tablespace and can invoke fil_space_t::zip_size() on it. buf_page_get_low(): Require mtr!=nullptr. buf_page_get_gen(): Implement some lock downgrading during recovery. ibuf_page_low(): Use buf_pool_t::page_fix() in a debug check. We do wait for a page read here, because otherwise a debug assertion in buf_page_get_low() in the test innodb.ibuf_delete could occasionally fail. PageConverter::operator(): Invoke buf_pool_t::page_fix() in order to possibly evict a block. This allows us to remove some special case code from buf_page_get_low().	2024-09-03 14:15:57 +03:00
Marko Mäkelä	bda40ccb85	MDEV-34803 innodb_lru_flush_size is no longer used In commit `fa8a46eb68` (MDEV-33613) the parameter innodb_lru_flush_size ceased to have any effect. Let us declare the parameter as deprecated and additionally as MARIADB_REMOVED_OPTION, so that there will be a warning written to the error log in case the option is specified in the command line. Let us also do the same for the parameter innodb_purge_rseg_truncate_frequency that was deprecated&ignored earlier in MDEV-32050. Reviewed by: Debarun Banerjee	2024-08-28 07:18:03 +03:00
Marko Mäkelä	b7b9f3ce82	MDEV-34515: Contention between purge and workload In a Sysbench oltp_update_index workload that involves 1 table, a serious contention between the workload and the purge of history was observed. This was the worst when the table contained only 1 record. This turned out to be fixed by setting innodb_purge_batch_size=128, which corresponds to the number of usable persistent rollback segments. When we go above that, there would be contention between row_purge_poss_sec() and the workload, typically on the clustered index page latch, sometimes also on a secondary index page latch. It might be that with smaller batches, trx_sys.history_size() will end up pausing all concurrent transaction start/commit frequently enough so that purge will be able to make some progress, so that there would be less contention on the index page latches between purge and SQL execution. In commit `aa719b5010` (part of MDEV-32050) the interpretation of the parameter innodb_purge_batch_size was slightly changed. It would correspond to the maximum desired size of the purge_sys.pages cache. Before that change, the parameter was referring to a number of undo log pages, but the accounting might have been inaccurate. To avoid a regression, we will reduce the default value to innodb_purge_batch_size=127, which will also be compatible with innodb_undo_tablespaces>1 (which will disable rollback segment 0). Additionally, some logic in the purge and MVCC checks is simplified. The purge tasks will make use of purge_sys.pages when accessing undo log pages to find out if a secondary index record can be removed. If an undo page needs to be looked up in buf_pool.page_hash, we will merely buffer-fix it. This is correct, because the undo pages are append-only in nature. Holding purge_sys.latch or purge_sys.end_latch or the fact that the current thread is executing as a part of an in-progress purge batch will prevent the contents of the undo page from being freed and subsequently reused. The buffer-fix will prevent the page from being evicted form the buffer pool. Thanks to this logic, we can refer to the undo log record directly in the buffer pool page and avoid copying the record. buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful for accessing undo log pages, which are append-only by nature. There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED in that case. purge_sys_t::view_guard::view_guard(): Allow the type of guard to be acquired: end_latch, latch, or no latch (in case we are a purge thread). purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages. purge_sys_t::get_page(): Invoke buf_pool_t::page_fix(). row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe() and row_undo_mod_sec_unsafe(). trx_undo_get_undo_rec(): Merged to trx_undo_prev_version_build(). row_purge_poss_sec(): Add the parameter mtr and remove redundant or unused parameters sec_pcur, sec_mtr, is_tree. We will use the caller's mtr object but release any acquired page latches before returning. btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align(). row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID to be checked against the page in row_purge_remove_sec_if_poss_tree(). If the secondary index page was not changed meanwhile, it will be unnecessary to invoke row_purge_poss_sec() again. trx_undo_prev_version_build(): Access any undo log pages using the caller's mini-transaction object. row_purge_vc_matches_cluster(): Moved to the only compilation unit that needs it. Reviewed by: Debarun Banerjee	2024-08-26 12:23:06 +03:00
Marko Mäkelä	9db2b327d4	MDEV-34759: buf_page_get_low() is unnecessarily acquiring exclusive latch buf_page_ibuf_merge_try(): A new, separate function for invoking ibuf_merge_or_delete_for_page() when needed. Use the already requested page latch for determining if the call is necessary. If it is and if we are currently holding rw_latch==RW_S_LATCH, upgrading to an exclusive latch may involve waiting that another thread acquires and releases a U or X latch on the page. If we have to wait, we must recheck if the call to ibuf_merge_or_delete_for_page() is still needed. If the page turns out to be corrupted, we will release and fail the operation. Finally, the exclusive page latch will be downgraded to the originally requested latch. ssux_lock_impl::rd_u_upgrade_try(): Attempt to upgrade a shared lock to an update lock. sux_lock::s_x_upgrade_try(): Attempt to upgrade a shared lock to exclusive. sux_lock::s_x_upgrade(): Upgrade a shared lock to exclusive. Return whether a wait was elided. ssux_lock_impl::u_rd_downgrade(), sux_lock::u_s_downgrade(): Downgrade an update lock to shared.	2024-08-23 13:27:50 +03:00
Marko Mäkelä	267c0fce56	Merge 10.5 into 10.6	2024-08-15 10:16:46 +03:00
Marko Mäkelä	e40dfcdd89	Fix clang++-19 -Wunused-but-set-variable	2024-08-15 10:13:49 +03:00
Marko Mäkelä	4f8803c036	MDEV-34678 pthread_mutex_init() without pthread_mutex_destroy() When SUX_LOCK_GENERIC is defined, the srw_mutex, srw_lock, sux_lock are implemented based on pthread_mutex_t and pthread_cond_t. This is the only option for systems that lack a futex-like system call. In the SUX_LOCK_GENERIC mode, if pthread_mutex_init() is allocating some resources that need to be freed by pthread_mutex_destroy(), a memory leak could occur when we are repeatedly invoking pthread_mutex_init() without a pthread_mutex_destroy() in between. pthread_mutex_wrapper::initialized: A debug field to track whether pthread_mutex_init() has been invoked. This also helps find bugs like the one that was fixed by commit `1c8af2ae53` (MDEV-34422); one simply needs to add -DSUX_LOCK_GENERIC to the CMAKE_CXX_FLAGS to catch that particular bug on the initial server bootstrap. buf_block_init(), buf_page_init_for_read(): Invoke block_lock::init() because buf_page_t::init() will no longer do that. buf_page_t::init(): Instead of invoking lock.init(), assert that it has already been invoked (the lock is vacant). add_fts_index(), build_fts_hidden_table(): Explicitly invoke index_lock::init() in order to avoid a pthread_mutex_destroy() invocation on an uninitialized object. srw_lock_debug::destroy(): Invoke readers_lock.destroy(). trx_sys_t::create(): Invoke trx_rseg_t::init() on all rollback segments in order to guarantee a deterministic state for shutdown, even if InnoDB fails to start up. trx_rseg_array_init(), trx_temp_rseg_create(), trx_rseg_create(): Invoke trx_rseg_t::destroy() before trx_rseg_t::init() in order to balance pthread_mutex_init() and pthread_mutex_destroy() calls.	2024-08-14 07:54:15 +03:00
Yuchen Pei	f071b7620b	Merge branch '10.5' into 10.6	2024-07-16 15:54:22 +08:00
Thirunarayanan Balathandayuthapani	3662f8ca89	MDEV-34543 Shutdown hangs while freeing the asynchronous i/o slots Problem: ======== - During shutdown, InnoDB tries to free the asynchronous I/O slots and hangs. The reason is that InnoDB disables asynchronous I/O before waiting for pending asynchronous I/O to finish. buf_load(): InnoDB aborts the buffer pool load due to user requested shutdown and doesn't wait for the asynchronous read to get completed. This could lead to debug assertion in buf_flush_buffer_pool() during shutdown Fix: === os_aio_free(): Should wait all read_slots and write_slots to finish before disabling the aio. buf_load(): Should wait for pending read request to complete even though it was aborted.	2024-07-12 17:42:14 +05:30
Sergei Petrunia	513c827041	MDEV-34190: r_engine_stats.pages_read_count is unrealistically low The symptoms were: take a server with no activity and a table that's not in the buffer pool. Run a query that reads the whole table and observe that r_engine_stats.pages_read_count shows about 2% of the table was read. Who reads the rest? The cause was that page prefetching done inside InnoDB was not counted. This counts page prefetch requests made in buf_read_ahead_random() and buf_read_ahead_linear() and makes them visible in: - ANALYZE: r_engine_stats.pages_prefetch_read_count - Slow Query Log: Pages_prefetched: This patch intentionally doesn't attempt to count the time to read the prefetched pages: * there's no obvious place where one can do it * prefetch reads may be done in parallel (right?), it is not clear how to count the time in this case.	2024-07-04 15:24:49 +03:00
mariadb-DebarunBanerjee	73ad436e16	MDEV-34458 wait_for_read in buf_page_get_low hurts performance The performance regression seen while loading BP is caused by the deadlock fix given in MDEV-33543. The area of impact is wider but is more visible when BP is being loaded initially via DMLs. Specifically the response time could be impacted in DML doing pessimistic operation on index(split/merge) and the leaf pages are not found in buffer pool. It is more likely to occur with small BP size. The origin of the issue dates back to MDEV-30400 that introduced btr_cur_t::search_leaf() replacing btr_cur_search_to_nth_level() for leaf page searches. In btr_latch_prev, we use RW_NO_LATCH to get the previous page fixed in BP without latching. When the page is not in BP, we try to acquire and wait for S latch violating the latching order. This deadlock was analyzed in MDEV-33543 and fixed by using the already present wait logic in buf_page_get_gen() instead of waiting for latch. The wait logic is inferior to usual S latch wait and is simply a repeated sleep 100 of micro-sec (The actual sleep time could be more depending on platforms). The bug was seen with "change-buffering" code path and the idea was that this path should be less exercised. The judgement was not correct and the path is actually quite frequent and does impact performance when pages are not in BP and being loaded by DML expanding/shrinking large data. Fix: While trying to get a page with RW_NO_LATCH and we are attempting "out of order" latch, return from buf_page_get_gen immediately instead of waiting and follow the ordered latching path.	2024-07-03 18:08:43 +05:30
Marko Mäkelä	699d38d951	MDEV-34296 extern thread_local is a CPU waste In commit 99bd22605938c42d876194f2ec75b32e658f00f5 (MDEV-31558) we wrongly thought that there would be minimal overhead for accessing a thread-local variable mariadb_stats. It turns out that in C++11, each access to an extern thread_local variable requires conditionally invoking an initialization function. In fact, the initializer expression of mariadb_stats is dynamic, and those calls were actually unavoidable. In C++20, one could declare constinit thread_local variables, but the address of a thread_local variable (&mariadb_dummy_stats) is not a compile-time constant. We did not want to declare mariadb_dummy_stats without thread_local, because then the dummy accesses could lead to cache line contention between threads. mariadb_stats: Declare as __thread or __declspec(thread) so that there will be no dynamic initialization, but zero-initialization. mariadb_dummy_stats: Remove. It is a lesser evil to let the environment perform zero-initialization and check if !mariadb_stats. Reviewed by: Sergei Petrunia	2024-06-06 14:38:42 +03:00
Marko Mäkelä	bc3660925d	MDEV-34307 On startup, [FATAL] InnoDB: Page ... still fixed or dirty buf_pool_invalidate(): Properly wait for os_aio_wait_until_no_pending_writes() to ensure so that there are no pending buf_page_t::write_complete() or buf_page_write_complete() operations. This will avoid a failure of buf_pool.assert_all_freed(). This bug should affect debug builds only. At this point, the buf_pool.flush_list should be clear and all changes should have been written out. The loop around buf_LRU_scan_and_free_block() should have eventually completed and freed all pages as soon as buf_page_t::write_complete() had a chance to release the page latches. It is worth noting that buf_flush_wait() is working as intended. As soon as buf_flush_page_cleaner() invokes buf_pool.get_oldest_modification() it will observe that buf_page_t::write_complete() had assigned oldest_modification_ to 1, and remove such blocks from buf_pool.flush_list. Upon reaching buf_pool.flush_list.count=0 the buf_flush_page_cleaner() will mark itself idle and wake buf_flush_wait() by broadcasting buf_pool.done_flush_list. This regression was introduced in commit `a55b951e60` (MDEV-26827). Reviewed by: Debarun Banerjee	2024-06-06 10:18:42 +03:00
mariadb-DebarunBanerjee	b12c14e3b4	MDEV-34265 Possible hang during IO burst with innodb_flush_sync enabled When checkpoint age goes beyond the sync flush threshold and buf_flush_sync_lsn is set, page cleaner enters into "furious flush" stage to aggressively flush dirty pages from flush list and pull checkpoint LSN above safe margin. In this stage, page cleaner skips doing LRU flush and eviction. In 10.6, all other threads entirely rely on page cleaner to generate free pages. If free pages get over while page cleaner is busy in "furious flush" stage, a session thread could wait for free page in the middle of a min-transaction(mtr) while holding latches on other pages. It, in turn, can prevent page cleaner to flush such pages preventing checkpoint LSN to move forward creating a deadlock situation. Even otherwise, it could create a stall and hang like situation for large BP with plenty of dirty pages to flush before the stage could finish. Fix: During furious flush, check and evict LRU pages after each flush iteration.	2024-06-05 18:11:29 +05:30
Marko Mäkelä	5ba542e9ee	Merge 10.5 into 10.6	2024-05-30 14:27:07 +03:00
mariadb-DebarunBanerjee	b2944adb76	MDEV-34166 Server could hang with BP < 80M under stress BUF_LRU_MIN_LEN (256) is too high value for low buffer pool(BP) size. For example, for BP size lower than 80M and 16 K page size, the limit is more than 5% of total BP and for lowest BP 5M, it is 80% of the BP. Non-data objects like explicit locks could occupy part of the BP pool reducing the pages available for LRU. If LRU reaches minimum limit and if no free pages are available, server would hang with page cleaner not able to free any more pages. Fix: To avoid such hang, we adjust the LRU limit lower than the limit for data objects as checked in buf_LRU_check_size_of_non_data_objects() i.e. one page less than 5% of BP.	2024-05-21 14:13:29 +05:30
mariadb-DebarunBanerjee	2d5cba22a9	MDEV-34167 We fail to terminate transaction early with ER_LOCK_TABLE_FULL when lock memory is growing This regression is introduced in 10.6 by following commit. commit `b6a2472489` MDEV-27891: SIGSEGV in InnoDB buffer pool resize During DML, we check if buffer pool is running out of data pages in buf_pool_t::running_out. Here is 75% of the buffer pool is occupied by non-data pages we rollback the current transaction and exit with ER_LOCK_TABLE_FULL. The integer division (n_chunks_new / 4) becomes zero whenever the total number of chunks are < 4 making the check completely ineffective for such cases. Also the check is inaccurate for larger chunks. Fix-1: Correct the check in buf_pool_t::running_out. Fix-2: While waiting for free page, check for buf_LRU_check_size_of_non_data_objects.	2024-05-21 08:33:44 +05:30
Sergei Golubchik	7b53672c63	Merge branch '10.5' into 10.6	2024-05-08 20:06:00 +02:00
Vladislav Vaintroub	b18259ecf5	Compiling - Fix MSVC compile warnings, on x86	2024-05-03 22:00:56 +02:00
mariadb-DebarunBanerjee	90b95c6149	MDEV-33543 Server hang caused by InnoDB change buffer Issue: When getting a page (buf_page_get_gen) with no latch option (RW_NO_LATCH), the caller is not expected to follow the B-tree latching order. However in buf_page_get_low we try to acquire shared page latch unconditionally to wait for a page that is being loaded by another thread concurrently. In general it could lead to latch order violation and deadlock. Currently it affects the change buffer insert path btr_latch_prev() which tries to load the previous page out of order with RW_NO_LATCH and two concurrent inserts into IBUF tree cause deadlock. This problem is introduced in 10.6 by following commit. commit `9436c778c3` (MDEV-27058) Fix: While trying to latch a page with RW_NO_LATCH, always use the "*lock_try" interface and retry operation on failure after unfixing the page.	2024-05-02 17:07:01 +05:30
Marko Mäkelä	a4cda66e2d	MDEV-33588 buf::Block_hint is a performance hog In so-called optimistic buffer pool lookups, we must not dereference a block descriptor before we have made sure that it is accessible. While buf_pool_t::resize() is running, block descriptors could become invalid. The buf::Block_hint class was essentially duplicating a buf_pool.page_hash lookup that was executed in buf_page_optimistic_get() anyway. For better locality of reference, we had better execute that lookup only once. buf_page_optimistic_fix(): Prepare for buf_page_optimistic_get(). This basically is a simpler version of Buf::Block_hint. buf_page_optimistic_get(): Assume that buf_page_optimistic_fix() has been called and the page identifier verified. Should the block be evicted, the block->modify_clock will be invalidated; we do not need to check the block->page.id() again. It suffices to check the block->modify_clock after acquiring the page latch. btr_pcur_t::old_page_id: Store the expected page identifier for buf_page_optimistic_fix(). btr_pcur_t::block_when_stored: Remove. This was duplicating page_cur_t::block. btr_pcur_optimistic_latch_leaves(): Remove redundant parameters. First, invoke buf_page_optimistic_fix() on the requested page. If needed, acquire a latch on the left page. Finally, acquire a latch on the target page and recheck the block->modify_clock. If the page had been freed while we were not holding a page latch, fall back to the slow path. Validate the FIL_PAGE_PREV after acquiring a latch on the current page. The block->modify_clock is only being incremented when records are deleted or pages reorganized or evicted; it does not guard against concurrent page splits. Reviewed by: Debarun Banerjee	2024-04-09 12:48:01 +03:00
Marko Mäkelä	ccb7a1e9a1	Merge 10.5 into 10.6	2024-03-27 15:00:56 +02:00
Marko Mäkelä	f0590db5c5	MDEV-33591 MONITOR_INC_VALUE_CUMULATIVE is executed regardless of "if" condition MONITOR_INC_VALUE_CUMULATIVE is a multiline macro, so the second statement will be executed always, regardless of "if" condition. These problems first started with commit `b1ab211dee` (MDEV-15053). Thanks to Yury Chaikou from ServiceNow for the report.	2024-03-22 14:57:00 +02:00
Marko Mäkelä	fa8a46eb68	MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool By design, InnoDB has always hung when permanently running out of buffer pool, for example when several threads are waiting to allocate a block, and all of the buffer pool is buffer-fixed by the active threads. The hang that we are fixing here occurs when the buffer pool is only temporarily running out and the situation could be rescued by writing out some dirty pages or evicting some clean pages. buf_LRU_get_free_block(): Simplify the way how we wait for the buf_flush_page_cleaner thread. This fixes occasional hangs of the test encryption.innochecksum that were introduced by commit `a55b951e60` (MDEV-26827). To play it safe, we use a timed wait when waiting for the buf_flush_page_cleaner() thread to perform its job. Should that thread get stuck, we will invoke buf_pool.LRU_warn() in order to display a message that pages could not be freed, and keep trying to wake up the buf_flush_page_cleaner() thread. The INFORMATION_SCHEMA.INNODB_METRICS counters buffer_LRU_single_flush_failure_count and buffer_LRU_get_free_waits will be removed. The latter is represented by buffer_pool_wait_free. Also removed will be the message "InnoDB: Difficult to find free blocks in the buffer pool" because in `d34479dc66` we introduced a more precise message "InnoDB: Could not free any blocks in the buffer pool" in the buf_flush_page_cleaner thread. buf_pool_t::LRU_warn(): Issue the warning message that we could not free any blocks in the buffer pool. This may also be invoked by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears to be stuck. buf_pool_t::n_flush_dec(): Remove. buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec(). buf_flush_LRU_list_batch(): Increment the eviction counter for blocks of temporary, discarded or dropped tablespaces. buf_flush_LRU(): Make static, and remove the constant parameter evict=false. The only caller will be the buf_flush_page_cleaner() thread. IORequest::is_LRU(): Remove. The only case of evicting pages on write completion will be when we are writing out pages of the temporary tablespace. Those pages are not in buf_pool.flush_list, only in buf_pool.LRU. buf_page_t::flush(): Remove the parameter evict. buf_page_t::write_complete(): Change the parameter "bool temporary" to "bool persistent" and add a parameter for an already read state(). Reviewed by: Debarun Banerjee	2024-03-22 14:17:39 +02:00
Marko Mäkelä	50715bd2ed	Merge 10.5 into 10.6	2024-03-18 17:07:32 +02:00
Marko Mäkelä	4592af2e84	Work around missing MSAN instrumentation Let us skip the recently added test main.mysql-interactive if an instrumented ncurses library is not available. In InnoDB, let us work around an uninstrumented libnuma, by declaring that the objects returned by numa_get_mems_allowed() are initialized.	2024-03-18 16:01:58 +02:00
Marko Mäkelä	0772ac1f16	MDEV-33508 Performance regression due to frequent scan of full buf_pool.flush_list buf_flush_page_cleaner(): Remove a loop that had originally been added in commit `9d1466522e` (MDEV-32029) and made redundant by commit `5b53342a6a` (MDEV-32588). Starting with commit `d34479dc66` (MDEV-33053) this loop would cause a significant performance regression in workloads where buf_pool.need_LRU_eviction() constantly holds in buf_flush_page_cleaner(). Thanks to Steve Shaw of Intel for noticing this. Reviewed by: Debarun Banerjee Tested by: Matthias Leich	2024-02-28 12:48:38 +02:00
Marko Mäkelä	691f923906	Merge 10.5 into 10.6	2024-02-13 20:42:59 +02:00
Marko Mäkelä	68d9deb69a	MDEV-33332 SIGSEGV in buf_read_ahead_linear() when bpage is in buf_pool.watch buf_read_ahead_linear(): If buf_pool.watch_is_sentinel(*bpage), do not attempt to read the page frame because the pointer would be null for the elements of buf_pool.watch[]. Hitting this bug requires the use of a non-default value of innodb_change_buffering.	2024-02-13 14:10:44 +02:00
Marko Mäkelä	495e7f1b3d	MDEV-33053 fixup: Correct a condition before a message	2024-01-22 08:24:08 +02:00
Daniel Black	0c23f84d8d	MDEV-32983 cosmetic improvement on path separator near ib_buffer_pool A mix of path separators looks odd. InnoDB: Loading buffer pool(s) from C:\xampp\mysql\data/ib_buffer_pool This was changed in `cf552f5886` Both forward slashes and backward slashes work on Windows. We do not use \\?\ names. So we improve the consistent look of it so it doesn't look like a bug. Normalize, in this case, the path separator to \ for making the filename. Reported thanks to Github user @celestinoxp. Closes: https://github.com/ApacheFriends/xampp-build/issues/33 Reviewed by: Marko Mäkelä and Vladislav Vaintroub	2024-01-22 16:56:00 +11:00
Marko Mäkelä	7e65e3027e	MDEV-33275 buf_flush_LRU(): mysql_mutex_assert_owner(&buf_pool.mutex) failed In commit `a55b951e60` (MDEV-26827) an error was introduced in a rarely executed code path of the buf_flush_page_cleaner() thread. As a result, the function buf_flush_LRU() could be invoked while not holding buf_pool.mutex. Reviewed by: Debarun Banerjee	2024-01-19 12:40:32 +02:00
Marko Mäkelä	d34479dc66	MDEV-33053 InnoDB LRU flushing does not run before running out of buffer pool buf_flush_LRU(): Display a warning if no pages could be evicted and no writes initiated. buf_pool_t::need_LRU_eviction(): Renamed from buf_pool_t::ran_out(). Check if the amount of free pages is smaller than innodb_lru_scan_depth instead of checking if it is 0. buf_flush_page_cleaner(): For the final LRU flush after a checkpoint flush, use a "budget" of innodb_io_capacity_max, like we do in the case when we are not in "furious" checkpoint flushing. Co-developed by: Debarun Banerjee Reviewed by: Debarun Banerjee Tested by: Matthias Leich	2024-01-19 12:40:16 +02:00
Marko Mäkelä	3a96eba25f	Merge 10.5 into 10.6	2024-01-17 13:35:05 +02:00
Thirunarayanan Balathandayuthapani	caad34df54	MDEV-32968 InnoDB fails to restore tablespace first page from doublewrite buffer when page is empty - InnoDB fails to find the space id from the page0 of the tablespace. In that case, InnoDB can use doublewrite buffer to recover the page0 and write into the file. - buf_dblwr_t::init_or_load_pages(): Loads only the pages which are valid.(page lsn >= checkpoint). To do that, InnoDB has to open the redo log before system tablespace, read the latest checkpoint information. recv_dblwr_t::find_first_page(): 1) Iterate the doublewrite buffer pages and find the 0th page 2) Read the tablespace flags, space id from the 0th page. 3) Read the 1st, 2nd and 3rd page from tablespace file and compare the space id with the space id which is stored in doublewrite buffer. 4) If it matches then we can write into the file. 5) Return space which matches the pages from the file. SysTablespace::read_lsn_and_check_flags(): Remove the retry logic for validating the first page. After restoring the first page from doublewrite buffer, assign tablespace flags by reading the first page. recv_recovery_read_max_checkpoint(): Reads the maximum checkpoint information from log file recv_recovery_from_checkpoint_start(): Avoid reading the checkpoint header information from log file Datafile::validate_first_page(): Throw error in case of first page validation fails.	2024-01-15 14:08:27 +05:30
Marko Mäkelä	3613fb2aa8	MDEV-33112 innodb_undo_log_truncate=ON is blocking page write When innodb_undo_log_truncate=ON causes an InnoDB undo tablespace to be truncated, we must guarantee that the undo tablespace will be rebuilt atomically: After mtr_t::commit_shrink() has durably written the mini-transaction that rebuilds the undo tablespace, we must not write any old pages to the tablespace. To guarantee this, in trx_purge_truncate_history() we used to traverse the entire buf_pool.flush_list in order to acquire exclusive latches on all pages for the undo tablespace that reside in the buffer pool, so that those pages cannot be written and will be evicted during mtr_t::commit_shrink(). But, this traversal may interfere with the page writing activity of buf_flush_page_cleaner(). It would be better to lazily discard the old pages of the truncated undo tablespace. fil_space_t::is_being_truncated, fil_space_t::clear_stopping(): Remove. fil_space_t::create_lsn: A new field, identifying the LSN of the latest rebuild of a tablespace. buf_page_t::flush(), buf_flush_try_neighbors(): Evict pages whose FIL_PAGE_LSN is below fil_space_t::create_lsn. mtr_t::commit_shrink(): Update fil_space_t::create_lsn and fil_space_t::size right before the log is durably written and the tablespace file is being truncated. fsp_page_create(), trx_purge_truncate_history(): Simplify the logic. Reviewed by: Thirunarayanan Balathandayuthapani, Vladislav Lesin Performance tested by: Axel Schwenke Correctness tested by: Matthias Leich	2024-01-10 11:53:00 +02:00
Marko Mäkelä	6538a91e94	Merge 10.5 into 10.6	2024-01-08 14:39:56 +02:00
Marko Mäkelä	0b612619ef	MDEV-33098: Fix some instrumentation for innodb.doublewrite_debug buf_flush_page_cleaner(): A continue or break inside DBUG_EXECUTE_IF actually is a no-op. Use an explicit call to _db_keyword_() to actually avoid advancing the checkpoint. buf_flush_list_now_set(): Invoke os_aio_wait_until_no_pending_writes() to ensure that the page write to the system tablespace is completed.	2024-01-08 14:36:54 +02:00
Marko Mäkelä	2b01e5103d	Merge 10.5 into 10.6	2023-12-19 18:41:42 +02:00
Marko Mäkelä	4c2e971841	MDEV-33052 MSAN use-of-uninitialized-value in buf_read_ahead_linear() buf_read_ahead_linear(): Suppress a warning of comparing potentially uninitialized FIL_PAGE_PREV and FIL_PAGE_NEXT fields.	2023-12-18 10:37:11 +02:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Marko Mäkelä	c8346c0bac	MDEV-31939 Adaptive flush recommendation ignores dirty ratio and checkpoint age buf_flush_page_cleaner(): Pass pct_lwm=srv_max_dirty_pages_pct_lwm (innodb_max_dirty_pages_pct_lwm) to page_cleaner_flush_pages_recommendation() unless the dirty page ratio of the buffer pool is below that. Starting with commit `d4265fbde5` we used to always pass pct_lwm=0.0, which was not intended. Reviewed by: Vladislav Vaintroub	2023-12-08 09:31:34 +02:00

1 2 3 4 5 ...

1,437 commits