mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 12:02:42 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	6538a91e94	Merge 10.5 into 10.6	2024-01-08 14:39:56 +02:00
Marko Mäkelä	0b612619ef	MDEV-33098: Fix some instrumentation for innodb.doublewrite_debug buf_flush_page_cleaner(): A continue or break inside DBUG_EXECUTE_IF actually is a no-op. Use an explicit call to _db_keyword_() to actually avoid advancing the checkpoint. buf_flush_list_now_set(): Invoke os_aio_wait_until_no_pending_writes() to ensure that the page write to the system tablespace is completed.	2024-01-08 14:36:54 +02:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Marko Mäkelä	c8346c0bac	MDEV-31939 Adaptive flush recommendation ignores dirty ratio and checkpoint age buf_flush_page_cleaner(): Pass pct_lwm=srv_max_dirty_pages_pct_lwm (innodb_max_dirty_pages_pct_lwm) to page_cleaner_flush_pages_recommendation() unless the dirty page ratio of the buffer pool is below that. Starting with commit `d4265fbde5` we used to always pass pct_lwm=0.0, which was not intended. Reviewed by: Vladislav Vaintroub	2023-12-08 09:31:34 +02:00
Marko Mäkelä	d963584d4c	Merge 10.5 into 10.6	2023-11-22 16:56:47 +02:00
Marko Mäkelä	78c9a12c8f	MDEV-32861 InnoDB hangs when running out of I/O slots When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1 and the server is run with the minimum parameters innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs were observed. tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire() will be woken up when the cache previously was empty. buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite batch so that os_aio_wait_until_no_pending_writes() has a chance of returning. Add a Boolean parameter and pass wait_for_reads=false inside buf_page_decrypt_after_read(), because those calls will be executed inside a read completion callback, and therefore os_aio_wait_until_no_pending_reads() would block indefinitely.	2023-11-22 16:54:41 +02:00
Marko Mäkelä	9a545eb67c	MDEV-26055: Correct the formula for adaptive flushing This is a 10.5 backport of 10.6 commit `d4265fbde5`. page_cleaner_flush_pages_recommendation(): If dirty_pct is between innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct, scale the effort relative to how close we are to innodb_max_dirty_pages_pct. The previous formula was missing a multiplication by 100.	2023-11-16 17:45:37 +02:00
Marko Mäkelä	a3d0d5fc33	MDEV-26055: Improve adaptive flushing This is a 10.5 backport from 10.6 commit `9593cccf28`. Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0 (not default) and innodb_adaptive_flushing=ON (default). There is also the parameter innodb_adaptive_flushing_lwm (default: 10 per cent of the log capacity). It should enable some adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0. That is not being changed here. This idea was first presented by Inaam Rana several years ago, and I discussed it with Jean-François Gagné at FOSDEM 2023. buf_flush_page_cleaner(): When we are not near the log capacity limit (neither buf_flush_async_lsn nor buf_flush_sync_lsn are set), also try to move clean blocks from the buf_pool.LRU list to buf_pool.free or initiate writes (but not the eviction) of dirty blocks, until the remaining I/O capacity has been consumed. buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify whether dirty least recently used pages (from buf_pool.LRU) should be evicted immediately after they have been written out. Callers outside buf_flush_page_cleaner() will pass evict=true, to retain the existing behaviour. buf_do_LRU_batch(): Add the parameter bool evict. Return counts of evicted and flushed pages. buf_flush_LRU(): Add the parameter bool evict. Assume that the caller holds buf_pool.mutex and will invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list() whose caller must hold buf_pool.mutex and invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have buf_flush_wait_batch_end(). page_cleaner_flush_pages_recommendation(): Avoid some floating-point arithmetics. buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(), buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict". buf_free_from_unzip_LRU_list_batch(): Remove the parameter. Only actual page writes will contribute towards the limit. buf_LRU_free_page(): Evict freed pages of temporary tables. buf_pool.done_free: Broadcast whenever a block is freed (and buf_pool.try_LRU_scan is set). buf_pool_t::io_buf_t::reserve(): Retry indefinitely. During the test encryption.innochecksum we easily run out of these buffers for PAGE_COMPRESSED or ENCRYPTED pages. Tested by Matthias Leich and Axel Schwenke	2023-11-16 17:45:18 +02:00
Marko Mäkelä	5b53342a6a	MDEV-32588 InnoDB may hang when running out of buffer pool buf_flush_LRU_list_batch(): Do not skip pages that are actually clean but in buf_pool.flush_list due to the "lazy removal" optimization of commit `22b62edaed`, but try to evict them. After acquiring buf_pool.flush_list_mutex, reread oldest_modification to ensure that the block still remains in buf_pool.flush_list. In addition to server hangs, this bug could also cause InnoDB: Failing assertion: list.count > 0 in invocations of UT_LIST_REMOVE(flush_list, ...). This fixes a regression that was caused by commit `a55b951e60` and possibly made more likely to hit due to commit `aa719b5010`.	2023-10-26 15:10:53 +03:00
Marko Mäkelä	39e3ca8bd2	MDEV-31826 InnoDB may fail to recover after being killed in fil_delete_tablespace() InnoDB was violating the write-ahead-logging protocol when a file was being deleted, like this: 1. fil_delete_tablespace() set the fil_space_t::STOPPING flag 2. The buf_flush_page_cleaner() thread discards some changed pages for this tablespace advances the log checkpoint a little. 3. The server process is killed before fil_delete_tablespace() wrote a FILE_DELETE record. 4. Recovery will try to apply log to pages of the tablespace, because there was no FILE_DELETE record. This will fail, because some pages that had been modified since the latest checkpoint had not been written by the page cleaner. Page writes must not be stopped before a FILE_DELETE record has been durably written. fil_space_t::drop(): Replaces fil_space_t::check_pending_operations(). Add the parameter detached_handle, and return a tablespace pointer if this thread was the first one to stop I/O on the tablespace. mtr_t::commit_file(): Remove the parameter detached_handle, and move some handling to fil_space_t::drop(). fil_space_t: STOPPING_READS, STOPPING_WRITES: Separate flags for STOPPING. We want to stop reads (and encryption) before stopping page writes. fil_space_t::is_stopping_writes(), fil_space_t::get_for_write(): Special accessors for the write path. fil_space_t::flush_low(): Ignore the STOPPING_READS flag and only stop if STOPPING_WRITES is set, to avoid an infinite loop in fil_flush_file_spaces(), which was occasionally repeated by running the test encryption.create_or_replace. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-10-26 15:07:59 +03:00
Marko Mäkelä	b21f52ee73	Merge 10.5 into 10.6	2023-10-23 16:43:48 +03:00
Marko Mäkelä	b5e43a1d35	MDEV-32552 Write-ahead logging is broken for freed pages buf_page_free(): Flag the freed page as modified if it is found in the buffer pool. buf_flush_page(): If the page has been freed, ensure that the log for it has been durably written, before removing the page from buf_pool.flush_list. FindBlockX: Find also MTR_MEMO_PAGE_X_MODIFY in order to avoid an occasional failure of innodb.innodb_defrag_concurrent, which involves freeing and reallocating pages in the same mini-transaction. This fixes a regression that was introduced in commit `a35b4ae898` (MDEV-15528). This logic was tested by commenting out the $shutdown_timeout line from a test and running the following: ./mtr --rr innodb.scrub rr replay var/log/mysqld.1.rr/mariadbd-0 A breakpoint in the modified buf_flush_page() was hit, and the FIL_PAGE_LSN of that page had been last modified during the mtr_t::commit() of a mini-transaction where buf_page_free() had been executed on that page.	2023-10-23 16:13:16 +03:00
Marko Mäkelä	cdd2fa7fc5	MDEV-32134 InnoDB hang in buf_flush_wait_LRU_batch_end() buf_flush_page_cleaner(): Before finishing a batch, wake up any threads that are waiting for buf_pool.done_flush_LRU. This should fix a hung shutdown that we observed after SET GLOBAL innodb_buffer_pool_size started was executed to shrink the InnoDB buffer pool.	2023-09-11 14:54:50 +03:00
Marko Mäkelä	9d1466522e	MDEV-32029 Assertion failures in log_sort_flush_list upon crash recovery In commit `0d175968d1` (MDEV-31354) we only waited that no buf_pool.flush_list writes are in progress. The buf_flush_page_cleaner() thread could still initiate page writes from the buf_pool.LRU list while only holding buf_pool.mutex, not buf_pool.flush_list_mutex. This is something that was changed in commit `a55b951e60` (MDEV-26827). log_sort_flush_list(): Wait for the buf_flush_page_cleaner() thread to be completely idle, including LRU flushing. buf_flush_page_cleaner(): Always broadcast buf_pool.done_flush_list when becoming idle, so that log_sort_flush_list() will be woken up. Also, ensure that buf_pool.n_flush_inc() or buf_pool.flush_list_set_active() has been invoked before any page writes are initiated. buf_flush_try_neighbors(): Release buf_pool.mutex here and not in the callers, to avoid code duplication. Make innodb_flush_neighbors=ON obey the innodb_io_capacity limit.	2023-08-30 14:40:13 +03:00
Marko Mäkelä	72928e640e	MDEV-27593: Crashing on I/O error is unhelpful buf_page_t::write_complete(), buf_page_write_complete(), IORequest::write_complete(): Add a parameter for passing an error code. If an error occurred, we will release the io-fix, buffer-fix and page latch but not reset the oldest_modification field. The block would remain in buf_pool.LRU and possibly buf_pool.flush_list, to be written again later, by buf_flush_page_cleaner(). If all page writes start consistently failing, all write threads should eventually hang in log_free_check() because the log checkpoint cannot be advanced to make room in the circular write-ahead-log ib_logfile0. IORequest::read_complete(): Add a parameter for passing an error code. If a read operation fails, we report the error and discard the page, just like we would do if the page checksum was not validated or the page could not be decrypted. This only affects asynchronous reads, due to linear or random read-ahead or crash recovery. When buf_page_get_low() invokes buf_read_page(), that will be a synchronous read, not involving this code. This was tested by randomly injecting errors in write_io_callback() and read_io_callback(), like this: if (!ut_rnd_interval(100)) cb->m_err= 42;	2023-08-01 14:39:29 +03:00
Marko Mäkelä	ce547cfc05	MDEV-31350: Hang in innodb.recovery_memory buf_flush_page_cleaner(): Whenever buf_pool.ran_out(), invoke buf_pool.get_oldest_modification(0) so that all clean blocks will be removed from buf_pool.flush_list and buf_flush_LRU_list_batch() will be able to evict some pages. This fixes a regression that was likely caused by commit `a55b951e60` (MDEV-26827).	2023-05-26 16:40:07 +03:00
Marko Mäkelä	f2c17cc9d9	MDEV-29911 InnoDB recovery and mariadb-backup --prepare fail to report detailed progress This is a 10.6 port of commit `2f9e264781` from MariaDB Server 10.9 that is missing some optimization due to a more complex redo log format and recovery logic (which was simplified in commit `685d958e38`). The progress reporting of InnoDB crash recovery was rather intermittent. Nothing was reported during the single-threaded log record parsing, which could consume minutes when parsing a large log. During log application, there only was progress reporting in background threads that would be invoked on data page read completion. The progress reporting here will be detailed like this: InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799 InnoDB: Read redo log up to LSN=1963895808 InnoDB: Multi-batch recovery needed at LSN 2534560930 InnoDB: Read redo log up to LSN=3312233472 InnoDB: Read redo log up to LSN=1599646720 InnoDB: Read redo log up to LSN=2160831488 InnoDB: To recover: LSN 2806789376/2806819840; 195082 pages InnoDB: To recover: LSN 2806789376/2806819840; 63507 pages InnoDB: Read redo log up to LSN=3195776000 InnoDB: Read redo log up to LSN=3687099392 InnoDB: Read redo log up to LSN=4165315584 InnoDB: To recover: LSN 4374395699/4374440960; 241454 pages InnoDB: To recover: LSN 4374395699/4374440960; 123701 pages InnoDB: Read redo log up to LSN=4508724224 InnoDB: Read redo log up to LSN=5094550528 InnoDB: To recover: 205230 pages The previous messages "Starting a batch to recover" or "Starting a final batch to recover" will be replaced by "To recover: ... pages" messages. If a batch lasts longer than 15 seconds, then there will be progress reports every 15 seconds, showing the number of remaining pages. For the non-final batch, the "To recover:" message includes two end LSN: that of the batch, and of the recovered log. This is the primary measure of progress. The batch will end once the number of pages to recover reaches 0. If recovery is possible in a single batch, the output will look like this, with a shorter "To recover:" message that counts only the remaining pages: InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799 InnoDB: Read redo log up to LSN=1984539648 InnoDB: Read redo log up to LSN=2710875136 InnoDB: Read redo log up to LSN=3358895104 InnoDB: Read redo log up to LSN=3965299712 InnoDB: Read redo log up to LSN=4557417472 InnoDB: Read redo log up to LSN=5219527680 InnoDB: To recover: 450915 pages We will also speed up recovery by improving the memory management and implementing multi-threaded recovery of data pages that will not need to be read into the buffer pool ("fake read"). Log application in the "fake read" threads will be protected by an atomic being_recovered field and exclusive buf_page_t::lock. Recovery will reserve for data pages two thirds of the buffer pool, or 256 pages, whichever is smaller. Previously, we could only use at most one third of the buffer pool for buffered log records. This would typically mean that with large buffer pools, recovery unnecessary consisted of multiple batches. If recovery runs out of memory, it will "roll back" or "rewind" the current mini-transaction. The recv_sys.recovered_lsn and recv_sys.pages will correspond to the "out of memory LSN", at the end of the previous complete mini-transaction. If recovery runs out of memory while executing the final recovery batch, we can simply invoke recv_sys.apply(false) to make room, and resume parsing. If recovery runs out of memory before the final batch, we will scan the redo log to the end and check for any missing or inconsistent files. In this version of the patch, we will throw away any previously buffered recv_sys.pages and rescan the log from the checkpoint onwards. recv_sys_t::pages_it: A cached iterator to recv_sys.pages. recv_sys_t::is_memory_exhausted(): Remove. We will have out-of-memory handling deep inside recv_sys_t::parse(). recv_sys_t::rewind(), page_recv_t::recs_t::rewind(): Remove all log starting with a specific LSN. IORequest::write_complete(), IORequest::read_complete(): Replaces fil_aio_callback(). read_io_callback(), write_io_callback(): Replaces io_callback(). IORequest::fake_read_complete(), fake_io_callback(), os_fake_read(): Process a "fake read" request for concurrent recovery. recv_sys_t::apply_batch(): Choose a number of successive pages for a recovery batch. recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a page whose recovery is not in progress. Log application threads will not invoke this; they will only set being_recovered=-1 to indicate that the entry is no longer needed. recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries. recv_sys_t::wait_for_pool(): Wait for some space to become available in the buffer pool. mlog_init_t::mark_ibuf_exist(): Avoid calls to recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low(). Such calls would lead to double locking of recv_sys.mutex, which depending on implementation could cause a deadlock. We will use lower-level calls to look up index pages. buf_LRU_block_remove_hashed(): Disable consistency checks for freed ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage. This fixes an occasional failure of the test innodb.innodb_bulk_create_index_debug. Tested by: Matthias Leich	2023-05-19 15:20:07 +03:00
Marko Mäkelä	a3e5b5c4db	Merge 10.5 into 10.6	2023-05-15 09:02:32 +03:00
Marko Mäkelä	477285c8ea	MDEV-31253 Freed data pages are not always being scrubbed fil_space_t::flush_freed(): Renamed from buf_flush_freed_pages(); this is a backport of `aa45850687` from 10.6. Invoke log_write_up_to() on last_freed_lsn, instead of avoiding the operation when the log has not yet been written. A more costly alternative would be that log_checkpoint() would invoke this function on every affected tablespace.	2023-05-12 14:57:14 +03:00
Marko Mäkelä	d4265fbde5	MDEV-26055: Correct the formula for adaptive flushing page_cleaner_flush_pages_recommendation(): If dirty_pct is between innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct, scale the effort relative to how close we are to innodb_max_dirty_pages_pct. The previous formula was missing a multiplication by 100. Tested by: Axel Schwenke	2023-04-26 11:53:42 +03:00
Marko Mäkelä	c22ab93f8a	MDEV-26827 fixup: Prevent a hang in LRU eviction buf_pool_t::page_cleaner_wakeup(): If for_LRU=true, wake up the page cleaner immediately, also when it is in a timed wait. This avoids an unnecessary delay of up to 1 second.	2023-04-25 15:03:38 +03:00
Marko Mäkelä	0976afec88	MDEV-31114 Assertion !...is_waiting() failed in os_aio_wait_until_no_pending_writes() os_aio_wait_until_no_pending_reads(), os_aio_wait_until_pending_writes(): Add a Boolean parameter to indicate whether the wait should be declared in the thread pool. buf_flush_wait(): The callers have already declared a wait, so let us avoid doing that again, just call os_aio_wait_until_pending_writes(false). buf_flush_wait_flushed(): Do not declare a wait in the rare case that the buf_flush_page_cleaner thread has been shut down already. buf_flush_page_cleaner(), buf_flush_buffer_pool(): In the code that runs during shutdown, do not declare waits. buf_flush_buffer_pool(): Remove a debug assertion that might fail. What really matters here is buf_pool.flush_list.count==0. buf_read_recv_pages(), srv_prepare_to_delete_redo_log_file(): Do not declare waits during InnoDB startup.	2023-04-24 09:57:58 +03:00
Marko Mäkelä	e55e761eae	MDEV-31084 assert(waiting) failed in TP_connection_generic::wait_end buf_flush_wait_flushed(): Correct the logic for registering a wait around buf_flush_wait() that commit `a091d6ac4e` recently broke. This should be easily repeatable when using a non-default startup parameter: thread-handling=pool-of-threads	2023-04-21 16:49:59 +03:00
Marko Mäkelä	27ff972be2	MDEV-26827 fixup: Do not hog buf_pool.mutex buf_flush_LRU_list_batch(): When evicting clean pages, release and reacquire the buf_pool.mutex after every 32 pages. Also, eliminate some conditional branches.	2023-04-19 18:57:18 +03:00
Marko Mäkelä	a091d6ac4e	MDEV-26827 fixup: Do not duplicate io_slots::pending_io_count() os_aio_pending_reads_approx(), os_aio_pending_reads(): Replaces buf_pool.n_pend_reads. os_aio_pending_writes(): Replaces buf_dblwr.pending_writes(). buf_dblwr_t::write_cond, buf_dblwr_t::writes_pending: Remove.	2023-04-12 13:49:57 +03:00
Marko Mäkelä	5bada1246d	Merge 10.5 into 10.6	2023-04-11 16:15:19 +03:00
Marko Mäkelä	375991a531	MDEV-26827 fixup for DDL race condition buf_flush_try_neighbors(): Tolerate count<2 in case the tablespace is being dropped.	2023-04-11 14:42:08 +03:00
Oleksandr Byelkin	ac5a534a4c	Merge remote-tracking branch '10.4' into 10.5	2023-03-31 21:32:41 +02:00
Marko Mäkelä	dfa90257f6	MDEV-30936 clang 15.0.7 -fsanitize=memory fails massively handle_slave_io(), handle_slave_sql(), os_thread_exit(): Remove a redundant pthread_exit(nullptr) call, because it would cause SIGSEGV. mysql_print_status(): Add MEM_MAKE_DEFINED() to work around some missing instrumentation around mallinfo2(). que_graph_free_stat_list(): Invoke que_node_get_next(node) before que_graph_free_recursive(node). That is the logical and MSAN_OPTIONS=poison_in_dtor=1 compatible way of freeing memory. ins_node_t::~ins_node_t(): Invoke mem_heap_free(entry_sys_heap). que_graph_free_recursive(): Rely on ins_node_t::~ins_node_t(). fts_t::~fts_t(): Invoke mem_heap_free(fts_heap). fts_free(): Replace with direct calls to fts_t::~fts_t(). The failures in free_root() due to MSAN_OPTIONS=poison_in_dtor=1 will be covered in MDEV-30942.	2023-03-28 11:44:24 +03:00
Marko Mäkelä	32a53a66df	MDEV-26827 fixup: Remove a bogus assertion We can have dirty_blocks=0 when buf_flush_page_cleaner() is being woken up to write out or evict pages from the buf_pool.LRU list.	2023-03-20 10:32:35 +02:00
Marko Mäkelä	a55b951e60	MDEV-26827 Make page flushing even faster For more convenient monitoring of something that could greatly affect the volume of page writes, we add the status variable Innodb_buffer_pool_pages_split that was previously only available via information_schema.innodb_metrics as "innodb_page_splits". This was suggested by Axel Schwenke. buf_flush_page_count: Replaced with buf_pool.stat.n_pages_written. We protect buf_pool.stat (except n_page_gets) with buf_pool.mutex and remove unnecessary export_vars indirection. buf_pool.flush_list_bytes: Moved from buf_pool.stat.flush_list_bytes. Protected by buf_pool.flush_list_mutex. buf_pool_t::page_cleaner_status: Replaces buf_pool_t::n_flush_LRU_, buf_pool_t::n_flush_list_, and buf_pool_t::page_cleaner_is_idle. Protected by buf_pool.flush_list_mutex. We will exclusively broadcast buf_pool.done_flush_list by the buf_flush_page_cleaner thread, and only wait for it when communicating with buf_flush_page_cleaner. There is no need to keep a count of pending writes by the buf_pool.flush_list processing. A single flag suffices for that. Waits for page write completion can be performed by simply waiting on block->page.lock, or by invoking buf_dblwr.wait_for_page_writes(). buf_LRU_block_free_non_file_page(): Broadcast buf_pool.done_free and set buf_pool.try_LRU_scan when freeing a page. This would be executed also as part of buf_page_write_complete(). buf_page_write_complete(): Do not broadcast buf_pool.done_flush_list, and do not acquire buf_pool.mutex unless buf_pool.LRU eviction is needed. Let buf_dblwr count all writes to persistent pages and broadcast a condition variable when no outstanding writes remain. buf_flush_page_cleaner(): Prioritize LRU flushing and eviction right after "furious flushing" (lsn_limit). Simplify the conditions and reduce the hold time of buf_pool.flush_list_mutex. Refuse to shut down or sleep if buf_pool.ran_out(), that is, LRU eviction is needed. buf_pool_t::page_cleaner_wakeup(): Add the optional parameter for_LRU. buf_LRU_get_free_block(): Protect buf_lru_free_blocks_error_printed with buf_pool.mutex. Invoke buf_pool.page_cleaner_wakeup(true) to to ensure that buf_flush_page_cleaner() will process the LRU flush request. buf_do_LRU_batch(), buf_flush_list(), buf_flush_list_space(): Update buf_pool.stat.n_pages_written when submitting writes (while holding buf_pool.mutex), not when completing them. buf_page_t::flush(), buf_flush_discard_page(): Require that the page U-latch be acquired upfront, and remove buf_page_t::ready_for_flush(). buf_pool_t::delete_from_flush_list(): Remove the parameter "bool clear". buf_flush_page(): Count pending page writes via buf_dblwr. buf_flush_try_neighbors(): Take the block of page_id as a parameter. If the tablespace is dropped before our page has been written out, release the page U-latch. buf_pool_invalidate(): Let the caller ensure that there are no outstanding writes. buf_flush_wait_batch_end(false), buf_flush_wait_batch_end_acquiring_mutex(false): Replaced with buf_dblwr.wait_for_page_writes(). buf_flush_wait_LRU_batch_end(): Replaces buf_flush_wait_batch_end(true). buf_flush_list(): Remove some broadcast of buf_pool.done_flush_list. buf_flush_buffer_pool(): Invoke also buf_dblwr.wait_for_page_writes(). buf_pool_t::io_pending(), buf_pool_t::n_flush_list(): Remove. Outstanding writes are reflected by buf_dblwr.pending_writes(). buf_dblwr_t::init(): New function, to initialize the mutex and the condition variables, but not the backing store. buf_dblwr_t::is_created(): Replaces buf_dblwr_t::is_initialised(). buf_dblwr_t::pending_writes(), buf_dblwr_t::writes_pending: Keeps track of writes of persistent data pages. buf_flush_LRU(): Allow calls while LRU flushing may be in progress in another thread. Tested by Matthias Leich (correctness) and Axel Schwenke (performance)	2023-03-16 17:19:58 +02:00
Marko Mäkelä	9593cccf28	MDEV-26055: Improve adaptive flushing Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0 (not default) and innodb_adaptive_flushing=ON (default). There is also the parameter innodb_adaptive_flushing_lwm (default: 10 per cent of the log capacity). It should enable some adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0. That is not being changed here. This idea was first presented by Inaam Rana several years ago, and I discussed it with Jean-François Gagné at FOSDEM 2023. buf_flush_page_cleaner(): When we are not near the log capacity limit (neither buf_flush_async_lsn nor buf_flush_sync_lsn are set), also try to move clean blocks from the buf_pool.LRU list to buf_pool.free or initiate writes (but not the eviction) of dirty blocks, until the remaining I/O capacity has been consumed. buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify whether dirty least recently used pages (from buf_pool.LRU) should be evicted immediately after they have been written out. Callers outside buf_flush_page_cleaner() will pass evict=true, to retain the existing behaviour. buf_do_LRU_batch(): Add the parameter bool evict. Return counts of evicted and flushed pages. buf_flush_LRU(): Add the parameter bool evict. Assume that the caller holds buf_pool.mutex and will invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list() whose caller must hold buf_pool.mutex and invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have buf_flush_wait_batch_end(). page_cleaner_flush_pages_recommendation(): Avoid some floating-point arithmetics. buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(), buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict". buf_free_from_unzip_LRU_list_batch(): Remove the parameter. Only actual page writes will contribute towards the limit. buf_LRU_free_page(): Evict freed pages of temporary tables. buf_pool.done_free: Broadcast whenever a block is freed (and buf_pool.try_LRU_scan is set). buf_pool_t::io_buf_t::reserve(): Retry indefinitely. During the test encryption.innochecksum we easily run out of these buffers for PAGE_COMPRESSED or ENCRYPTED pages. Tested by Matthias Leich and Axel Schwenke	2023-03-16 17:09:08 +02:00
Marko Mäkelä	2b3423c462	Merge 10.3 into 10.4	2023-01-17 18:03:58 +02:00
Marko Mäkelä	489b556947	MDEV-30422 Merge new release of InnoDB 5.7.41 to 10.3 MySQL 5.7.41 includes one InnoDB change mysql/mysql-server@d2d6b2dd00 that seems to be applicable to MariaDB Server 10.3 and 10.4. Even though commit `5b9ee8d819` seems to have fixed sporadic failures on our CI systems, it is theoretically possible that another race condition remained. buf_flush_page_cleaner_coordinator(): In the final loop, wait also for buf_get_n_pending_read_ios() to reach 0. In this way, if a secondary index leaf page was read into the buffer pool and ibuf_merge_or_delete_for_page() modified that page or some change buffer pages, the flush loop would execute until the buffer pool really is in a clean state. This potential data corruption bug does not affect MariaDB Server 10.5 or later, thanks to commit `b42294bc64` which removed change buffer merges that are not explicitly requested.	2023-01-17 17:52:16 +02:00
Marko Mäkelä	ae6ebafd81	Merge 10.5 into 10.6	2022-11-14 15:44:55 +02:00
Marko Mäkelä	e0e096faaa	MDEV-29982 Improve the InnoDB log overwrite error message The InnoDB write-ahead log ib_logfile0 is of fixed size, specified by innodb_log_file_size. If the tail of the log manages to overwrite the head (latest checkpoint) of the log, crash recovery will be broken. Let us clarify the messages about this, including adding a message on the completion of a log checkpoint that notes that the dangerous situation is over. To reproduce the dangerous scenario, we will introduce the debug injection label ib_log_checkpoint_avoid_hard, which will avoid log checkpoints even harder than the previous ib_log_checkpoint_avoid. log_t::overwrite_warned: The first known dangerous log sequence number. Set in log_close() and cleared in log_write_checkpoint_info(), which will output a "Crash recovery was broken" message.	2022-11-14 12:18:03 +02:00
Marko Mäkelä	0c0a569028	Merge 10.3 into 10.4	2022-09-20 12:38:25 +03:00
Marko Mäkelä	c22dff21a5	InnoDB cleanup: Replace UNIV_LINUX, UNIV_SOLARIS, UNIV_AIX Let us use the normal platform-specific preprocessor symbols __linux__, __sun__, _AIX instead of some homebrew ones. The preprocessor symbol UNIV_HPUX must have lost its meaning by `f6deb00a56` (note: the symbol UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).	2022-09-19 12:20:53 +03:00
Brad Smith	f02ca429f7	Revert aligned_alloc() addition from MDEV-28836 As pointed out with MDEV-29308 there are issues with the code as is. MariaDB is built as C++11 / C99. aligned_alloc() is not guarenteed to be exposed when building with any mode other than C++17 / C11. The other *BSD's have their stdlib.h header to expose the function with C+11 anyway, but the issue exists in the C99 code too, the build just does not use -Werror. Linux globally defines _GNU_SOURCE hiding the issue as well.	2022-08-22 09:10:40 +03:00
Marko Mäkelä	0fa19fdebf	MDEV-28836 fixup On GNU/Linux, even though the C11 aligned_alloc() appeared in GNU libc early on, some custom memory allocators did not implement it until recently. For example, before gperftools/gperftools@d406f22853 the free() in tcmalloc would fail to free memory that was returned by aligned_alloc(), because the latter would map to the built-in allocator of libc. The Linux specific memalign() has a similar interface and is safer to use, because it has been available for a longer time. For AddressSanitizer, we will use aligned_alloc() so that the constraint on size can be enforced. buf_tmp_reserve_compression_buf(): When HAVE_ALIGNED_ALLOC holds, round up the size to be an integer multiple of the alignment. pfs_malloc(): In the unit test stub, round up the size to be an integer multiple of the alignment.	2022-06-22 08:23:32 +03:00
Marko Mäkelä	1f1fa7e09c	Merge 10.5 into 10.6	2022-06-14 09:49:47 +03:00
Marko Mäkelä	4849d94fe6	MDEV-28828 SIGSEGV in buf_flush_LRU_list_batch In commit `73fee39ea6` (MDEV-27985) a regression was introduced that would cause bpage=nullptr to be referenced. buf_flush_LRU_list_batch(): Always terminate the loop upon encountering a null pointer.	2022-06-14 09:14:24 +03:00
Marko Mäkelä	892c426371	MDEV-13542: Do not crash on decryption failure fil_page_type_validate(): Remove. This debug check was mostly redundant and added little value to the code paths that deal with page_compressed or encrypted pages. fil_get_page_type_name(): Remove; unused function. fil_space_decrypt(): Return an error if the page is not supposed to be encrypted. It is possible that an unencrypted page contains a nonzero key_version field even though it is not supposed to be encrypted. Previously we would crash in such a situation. buf_page_decrypt_after_read(): Simplify the code. Remove some unnecessary error message about temporary tablespace corruption. This is where we would usually invoke fil_space_decrypt().	2022-06-08 09:48:12 +03:00
Marko Mäkelä	aa45850687	Cleanup: Make fil_space_t::freed_ranges private fil_space_t::is_freed(): Check if a page is in freed_ranges. fil_space_t::flush_freed(): Replaces buf_flush_freed_pages().	2022-06-06 11:55:29 +03:00
Sergei Golubchik	a70a1cf3f4	Merge branch '10.3' into 10.4	2022-05-08 23:03:08 +02:00
Oleksandr Byelkin	9614fde1aa	Merge branch '10.2' into 10.3	2022-05-03 10:59:54 +02:00
Marko Mäkelä	f21a875600	MDEV-28415 ALTER TABLE on a large table hangs InnoDB buf_flush_page(): Never wait for a page latch, even in checkpoint flushing (flush_type == BUF_FLUSH_LIST), to prevent a hang of the page cleaner threads when a large number of pages is latched. In mysql/mysql-server@9542f3015b it was claimed that such a hang only affects CREATE FULLTEXT INDEX. Their fix was to retain buffer-fix but release exclusive latch on non-leaf pages, and subsequently write to those pages while they are not associated with the mini-transaction, which would trip a debug assertion in the MariaDB version of mtr_t::memo_modify_page() and cause potential corruption when using the default MariaDB setting innodb_log_optimize_ddl=OFF. This change essentially backports a small part of commit `7cffb5f6e8` (MDEV-23399) from MariaDB Server 10.5.7.	2022-04-27 07:57:04 +03:00
Marko Mäkelä	e135edec3a	Merge 10.5 into 10.6	2022-04-26 15:21:20 +03:00
Marko Mäkelä	c009ce7dd0	MDEV-27094 Debug builds include useless InnoDB "disabled" options This is a backport of commit `4489a89c71` in order to remove the test innodb.redo_log_during_checkpoint that would cause trouble in the DBUG subsystem invoked by safe_mutex_lock() via log_checkpoint(). Before commit `7cffb5f6e8` these mutexes were of different type. The following options were introduced in commit `2e814d4702` (mariadb-10.2.2) and have little use: innodb_disable_resize_buffer_pool_debug had no effect even in MariaDB 10.2.2 or MySQL 5.7.9. It was introduced in mysql/mysql-server@5c4094cf49 to work around a problem that was fixed in mysql/mysql-server@2957ae4f99 (but the parameter was not removed). innodb_page_cleaner_disabled_debug and innodb_master_thread_disabled_debug are only used by the test innodb.redo_log_during_checkpoint that will be removed as part of this commit. innodb_dict_stats_disabled_debug is only used by that test, and it is redundant because one could simply use innodb_stats_persistent=OFF or the STATS_PERSISTENT=0 attribute of the table in the test to achieve the same effect.	2022-04-22 12:48:40 +03:00
Marko Mäkelä	ff99413804	MDEV-25975: Merge 10.5 into 10.6	2022-04-06 12:45:14 +03:00

1 2 3 4 5 ...

350 commits