mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-29 10:14:19 +01:00

Author	SHA1	Message	Date
Thirunarayanan Balathandayuthapani	647a7232ff	MDEV-30438 innodb.undo_truncate,4k fails when innodb-immediate-scrub-data-uncompressed is enabled - InnoDB fails to clear the freed ranges during truncation of innodb undo log tablespace. During shutdown, InnoDB flushes the freed page ranges and throws the out of bound error. mtr_t::commit_shrink(): clear the freed ranges while doing undo tablespace truncation	2023-01-23 09:55:49 +05:30
Marko Mäkelä	e0e096faaa	MDEV-29982 Improve the InnoDB log overwrite error message The InnoDB write-ahead log ib_logfile0 is of fixed size, specified by innodb_log_file_size. If the tail of the log manages to overwrite the head (latest checkpoint) of the log, crash recovery will be broken. Let us clarify the messages about this, including adding a message on the completion of a log checkpoint that notes that the dangerous situation is over. To reproduce the dangerous scenario, we will introduce the debug injection label ib_log_checkpoint_avoid_hard, which will avoid log checkpoints even harder than the previous ib_log_checkpoint_avoid. log_t::overwrite_warned: The first known dangerous log sequence number. Set in log_close() and cleared in log_write_checkpoint_info(), which will output a "Crash recovery was broken" message.	2022-11-14 12:18:03 +02:00
Marko Mäkelä	7ee612c912	MDEV-21174 fixup: Remove mtr_t::release_page() mtr_t::release_page(): Remove. The function became unused in commit `56f6dab1d0` when the call was replaced with a call to mtr_t::memo_release().	2022-11-10 12:50:44 +02:00
Marko Mäkelä	22f935d6da	MDEV-28731 Race condition on log checkpoint mtr_t::modify(): Set the m_made_dirty flag if needed, so that buf_pool_t::insert_into_flush_list() will be invoked while holding log_sys.flush_order_mutex. This is something that should have been part of commit `b212f1dac2` (MDEV-22107).	2022-06-02 17:33:03 +03:00
Marko Mäkelä	5294695ebd	Clean up mtr_t mtr_t::is_empty(): Replaces mtr_t::get_log() and mtr_t::get_memo(). mtr_t::get_log_size(): Replaces mtr_t::get_log(). mtr_t::print(): Remove, unused function. ReleaseBlocks::ReleaseBlocks(): Remove an unused parameter.	2022-06-02 17:18:00 +03:00
Vlad Lesin	dcb2968f90	MDEV-27557 InnoDB unnecessarily commits mtr during secondary index search to preserve clustered index latching order A new function to release latches up to a savepoint was added to mtr_t. As there is no longer a need to limit the use of the MDEV-20605 fix to locking reads, the limitation is removed.	2022-03-25 09:22:36 +03:00
Marko Mäkelä	fd101daa84	MDEV-27716 mtr_t::commit() acquires log_sys.mutex when writing no log mtr_t::is_block_dirtied(), mtr_t::memo_push(): Never set m_made_dirty for pages of the temporary tablespace. Ever since commit `5eb539555b` we never add those pages to buf_pool.flush_list. mtr_t::commit(): Implement part of mtr_t::prepare_write() here, and avoid acquiring log_sys.mutex if no log is written. During IMPORT TABLESPACE fixup, we do not write log, but we must add pages to buf_pool.flush_list and for that, be prepared to acquire log_sys.flush_order_mutex. mtr_t::do_write(): Replaces mtr_t::prepare_write().	2022-02-09 15:10:10 +02:00
Marko Mäkelä	017d1b867b	MDEV-27476 heap-use-after-free in buf_pool_t::is_block_field() mtr_t::modify(): Remove a debug assertion that had been added in commit `05fa4558e0` (MDEV-22110). The function buf_pool_t::is_uncompressed() is only safe to invoke while holding a buf_pool.page_hash latch so that buf_pool_t::resize() cannot concurrently invoke free() on any chunks.	2022-01-12 12:29:16 +02:00
Marko Mäkelä	f5794e1dc6	MDEV-26445 innodb_undo_log_truncate is unnecessarily slow trx_purge_truncate_history(): Do not force a write of the undo tablespace that is being truncated. Instead, prevent page writes by acquiring an exclusive latch on all dirty pages of the tablespace. fseg_create(): Relax an assertion that could fail if a dirty undo page is being initialized during undo tablespace truncation (and trx_purge_truncate_history() already acquired an exclusive latch on it). fsp_page_create(): If we are truncating a tablespace, try to reuse a page that we may have already latched exclusively (because it was in buf_pool.flush_list). To some extent, this helps the test innodb.undo_truncate,16k to avoid running out of buffer pool. mtr_t::commit_shrink(): Mark as clean all pages that are outside the new bounds of the tablespace, and only add the newly reinitialized pages to the buf_pool.flush_list. buf_page_create(): Do not unnecessarily invoke change buffer merge on undo tablespaces. buf_page_t::clear_oldest_modification(bool temporary): Move some assertions to the caller buf_page_write_complete(). innodb.undo_truncate: Use a bigger innodb_buffer_pool_size=24M. On my system, it would otherwise hang 1 out of 1547 attempts (on the 40th repeat of innodb.undo_truncate,16k). Other page sizes were not affected.	2021-09-24 08:24:03 +03:00
Marko Mäkelä	f5fddae3cb	MDEV-26450: Corruption due to innodb_undo_log_truncate At least since commit `055a3334ad` (MDEV-13564) the undo log truncation in InnoDB did not work correctly. The main issue is that during the execution of trx_purge_truncate_history() some pages of the newly truncated undo tablespace could be discarded. This is improved from commit `1cb218c37c` which was applied to earlier-version branches. fsp_try_extend_data_file(): Apply the peculiar rounding of fil_space_t::size_in_header only to the system tablespace, whose size can be expressed in megabytes in a configuration parameter. Other files may freely grow by a number of pages. fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces, and mention the file name in the error message. mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace: (1) durably write the log (2) release the page latches of the rebuilt tablespace (3) release the mutexes (4) truncate the file (5) release the tablespace latch This is refactored from trx_purge_truncate_history(). log_write_and_flush_prepare(), log_write_and_flush(): New functions to durably write log during mtr_t::commit_shrink().	2021-09-24 08:22:19 +03:00
Vladislav Vaintroub	fc2ec25733	MDEV-26166 replace log_write_up_to(LSN_MAX,...) with log_buffer_flush_to_disk() Also, remove comparison lsn > flush/write lsn, prior to calling log_write_up_to. The checks and early returns are part of this function.	2021-07-16 18:44:58 +02:00
Marko Mäkelä	6441bc614a	MDEV-25113: Introduce a page cleaner mode before 'furious flush' MDEV-23855 changed the way how the page cleaner is signaled by user threads. If a threshold is exceeded, a mini-transaction commit would invoke buf_flush_ahead() in order to initiate page flushing before all writers would eventually grind to halt in log_free_check(), waiting for the checkpoint age to reduce. However, buf_flush_ahead() would always initiate 'furious flushing', making the buf_flush_page_cleaner thread write innodb_io_capacity_max pages per batch, and sleeping no time between batches, until the limit LSN is reached. Because this could saturate the I/O subsystem, system throughput could significantly reduce during these 'furious flushing' spikes. With this change, we introduce a gentler version of flush-ahead, which would write innodb_io_capacity_max pages per second until the 'soft limit' is reached. buf_flush_ahead(): Add a parameter to specify whether furious flushing is requested. buf_flush_async_lsn: Similar to buf_flush_sync_lsn, a limit for the less intrusive flushing. buf_flush_page_cleaner(): Keep working until buf_flush_async_lsn has been reached. log_close(): Suppress a warning message in the event that a new log is being created during startup, when old logs did not exist. Return what type of page cleaning will be needed. mtr_t::finish_write(): Also when m_log.is_small(), invoke log_close(). Return what type of page cleaning will be needed. mtr_t::commit(): Invoke buf_flush_ahead() based on the return value of mtr_t::finish_write().	2021-06-23 19:06:52 +03:00
Marko Mäkelä	e87a8efd32	MDEV-24455 Assertion `!m_freed_space' failed in mtr_t::start In commit `0c23e32d27` (MDEV-24445) we forgot to keep m_freed_space in sync with m_freed_pages in one case.	2020-12-21 10:48:51 +02:00
Marko Mäkelä	0c23e32d27	MDEV-24445 Using innodb_undo_tablespaces corrupts system tablespace In the rewrite of MDEV-8139 (based on MDEV-15528), we introduced a wrong assumption that any persistent tablespace that is not an .ibd file is the system tablespace. This assumption is broken when innodb_undo_tablespaces (files undo001, undo002, ...) are being used. By default, we have innodb_undo_tablespaces=0 (the persistent undo log is being stored in the system tablespace). In MDEV-15528 and MDEV-8139 we rewrote the page scrubbing logic so that it will follow the tried-and-true write-ahead logging protocol, first writing FREE_PAGE records and then in the page flushing, zerofilling or hole-punching freed pages. Unfortunately, the implementation included a wrong assumption that anything that is not in an .ibd file must be the system tablespace. This wrong assumption would cause overwrites of valid data pages in the system tablespace. mtr_t::m_freed_in_system_tablespace: Remove. mtr_t::m_freed_space: The tablespace associated with m_freed_pages. buf_page_free(): Take the tablespace and page number as a parameter, instead of taking a page identifier.	2020-12-18 19:23:34 +02:00
Marko Mäkelä	aa0e380568	MDEV-24348 InnoDB shutdown hang with innodb_flush_sync=0 This hang was caused by MDEV-23855, and we failed to fix it in MDEV-24109 (commit `4cbfdeca84`). When buf_flush_ahead() is invoked soon before server shutdown and the non-default setting innodb_flush_sync=OFF is in effect and the buffer pool contains dirty pages of temporary tables, the page cleaner thread may remain in an infinite loop without completing its work, thus causing the shutdown to hang. buf_flush_page_cleaner(): If the buffer pool contains no unmodified persistent pages, ensure that buf_flush_sync_lsn= 0 will be assigned, so that shutdown will proceed. The test case is not deterministic. On my system, it reproduced the hang with 95% probability when running multiple instances of the test in parallel, and 4% when running single-threaded. Thanks to Eugene Kosov for debugging and testing this.	2020-12-04 14:11:48 +02:00
Marko Mäkelä	6a1e655cb0	Merge 10.4 into 10.5	2020-12-02 18:29:49 +02:00
Marko Mäkelä	589cf8dbf3	Merge 10.3 into 10.4	2020-12-01 19:51:14 +02:00
Marko Mäkelä	81ab9ea63f	Merge 10.2 into 10.3	2020-12-01 14:55:46 +02:00
Marko Mäkelä	ce0cb6a4f6	MDEV-24188 fixup: Correct the FindBlockX predicate FindBlockX::operator(): Return false if an x-latched block is found. Previously, we were incorrectly returning false if the block was in the log, only if not x-latched. It is unknown if this mistake had any visible impact. Often, we would register both MTR_MEMO_BUF_FIX and MTR_MEMO_PAGE_X_FIX for the same block.	2020-11-18 13:52:37 +02:00
Marko Mäkelä	e8f8992801	MDEV-24188: Merge 10.4 into 10.5	2020-11-13 22:06:50 +02:00
Marko Mäkelä	749ecedfec	MDEV-24188: Merge 10.3 into 10.4	2020-11-13 20:45:28 +02:00
Marko Mäkelä	f9f2f37495	MDEV-24188: Merge 10.2 into 10.3	2020-11-13 20:41:48 +02:00
Marko Mäkelä	bb328a2a27	MDEV-24188 Hang in buf_page_create() after reusing a previously freed page The fix of MDEV-23456 (commit `b1009ae5c1`) introduced a livelock between page flushing and a thread that is executing buf_page_create(). buf_page_create(): If the current mini-transaction is holding an exclusive latch on the page, do not attempt to acquire another one, and do not care about any I/O fix. mtr_t::have_x_latch(): Replaces mtr_t::get_fix_count(). dyn_buf_t::for_each_block(const Functor&) const: A new variant. rw_lock_own(): Add a const qualifier. Reviewed by: Thirunarayanan Balathandayuthapani	2020-11-13 20:16:39 +02:00
Marko Mäkelä	7b2bb67113	Merge 10.3 into 10.4	2020-10-29 13:38:38 +02:00
Marko Mäkelä	a8de8f261d	Merge 10.2 into 10.3	2020-10-28 10:01:50 +02:00
Thirunarayanan Balathandayuthapani	bc540b8706	MDEV-23693 Failing assertion: my_atomic_load32_explicit(&lock->lock_word, MY_MEMORY_ORDER_RELAXED) == X_LOCK_DECR InnoDB frees the block lock during buffer pool shrinking when another thread has yet to release the block lock. While shrinking the buffer pool, InnoDB allows the page to be freed unless it is buffer fixed. In some cases, InnoDB releases the latch after unfixing the block. Fix: - InnoDB should unfix the block after releasing the latch. - Add more assertions to check the buffer fix while accessing the page. - Introduce a block_hint structure to store a buf_block_t pointer and allow accessing the buf_block_t pointer only by passing a functor. It returns the original buf_block_t* pointer if it is valid, or nullptr if the pointer has become stale. - Replace buf_block_is_uncompressed() with buf_pool_t::is_block_pointer(). This change is motivated by a change in mysql-5.7.32: mysql/mysql-server@46e60de444 Bug #31036301 ASSERTION FAILURE: SYNC0RW.IC:429:LOCK->LOCK_WORD	2020-10-27 18:30:00 +05:30
Marko Mäkelä	c27e53f459	MDEV-23855: Use normal mutex for log_sys.mutex, log_sys.flush_order_mutex With an unreasonably small innodb_log_file_size, the page cleaner thread would frequently acquire log_sys.flush_order_mutex and spend a significant portion of CPU time spinning on that mutex when determining the checkpoint LSN.	2020-10-26 17:53:55 +02:00
Marko Mäkelä	45ed9dd957	MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD When the maximum configured number of files is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation. fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed. IORequest: Fold more parameters to IORequest::type. fil_space_t::io(): Replaces fil_io(). fil_space_t::flush(): Replaces fil_flush(). OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low(). We will always ignore some errors for background reads. This should reduce fil_system.mutex contention a little. fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called. fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path. xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental. Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	3a9a3be1c6	MDEV-23855: Improve InnoDB log checkpoint performance After MDEV-15053, MDEV-22871, MDEV-23399 shifted the scalability bottleneck, log checkpoints became a new bottleneck. If innodb_io_capacity is set low or innodb_max_dirty_pages_pct_lwm is set high and the workload fits in the buffer pool, the page cleaner thread will perform very little flushing. When we reach the capacity of the circular redo log file ib_logfile0 and must initiate a checkpoint, some 'furious flushing' will be necessary. (If innodb_flush_sync=OFF, then flushing would continue at the innodb_io_capacity rate, and writers would be throttled.) We have the best chance of advancing the checkpoint LSN immediately after a page flush batch has been completed. Hence, it is best to perform checkpoints after every batch in the page cleaner thread, attempting to run once per second. By initiating high-priority flushing in the page cleaner as early as possible, we aim to make the throughput more stable. The function buf_flush_wait_flushed() used to sleep for 10ms, hoping that the page cleaner thread would do something during that time. The observed end result was that a large number of threads that call log_free_check() would end up sleeping while nothing useful is happening. We will revise the design so that in the default innodb_flush_sync=ON mode, buf_flush_wait_flushed() will wake up the page cleaner thread to perform the necessary flushing, and it will wait for a signal from the page cleaner thread. If innodb_io_capacity is set to a low value (causing the page cleaner to throttle its work), a write workload would initially perform well, until the capacity of the circular ib_logfile0 is reached and log_free_check() will trigger checkpoints. At that point, the extra waiting in buf_flush_wait_flushed() will start reducing throughput. 
The page cleaner thread will also initiate log checkpoints after each buf_flush_lists() call, because that is the best point of time for the checkpoint LSN to advance by the maximum amount. Even in 'furious flushing' mode we invoke buf_flush_lists() with innodb_io_capacity_max pages at a time, and at the start of each batch (in the log_flush() callback function that runs in a separate task) we will invoke os_aio_wait_until_no_pending_writes(). This tweak allows the checkpoint to advance in smaller steps and significantly reduces the maximum latency. On an Intel Optane 960 NVMe SSD on Linux, it reduced from 4.6 seconds to 74 milliseconds. On Microsoft Windows with a slower SSD, it reduced from more than 180 seconds to 0.6 seconds. We will make innodb_adaptive_flushing=OFF simply flush innodb_io_capacity per second whenever the dirty proportion of buffer pool pages exceeds innodb_max_dirty_pages_pct_lwm. For innodb_adaptive_flushing=ON we try to make page_cleaner_flush_pages_recommendation() more consistent and predictable: if we are below innodb_adaptive_flushing_lwm, let us flush pages according to the return value of af_get_pct_for_dirty(). innodb_max_dirty_pages_pct_lwm: Revert the change of the default value that was made in MDEV-23399. The value innodb_max_dirty_pages_pct_lwm=0 guarantees that a shutdown of an idle server will be fast. Users might be surprised if normal shutdown suddenly became slower when upgrading within a GA release series. innodb_checkpoint_usec: Remove. The master task will no longer perform periodic log checkpoints. It is the duty of the page cleaner thread. log_sys.max_modified_age: Remove. The current span of the buf_pool.flush_list expressed in LSN only matters for adaptive flushing (outside the 'furious flushing' condition). For the correctness of checkpoints, the only thing that matters is the checkpoint age (log_sys.lsn - log_sys.last_checkpoint_lsn). This run-time constant was also reported as log_max_modified_age_sync. 
log_sys.max_checkpoint_age_async: Remove. This does not serve any purpose, because the checkpoints will now be triggered by the page cleaner thread. We will retain the log_sys.max_checkpoint_age limit for engaging 'furious flushing'. page_cleaner.slot: Remove. It turns out that page_cleaner_slot.flush_list_time was duplicating page_cleaner.slot.flush_time and page_cleaner.slot.flush_list_pass was duplicating page_cleaner.flush_pass. Likewise, there were some redundant monitor counters, because the page cleaner thread no longer performs any buf_pool.LRU flushing, and because there only is one buf_flush_page_cleaner thread. buf_flush_sync_lsn: Protect writes by buf_pool.flush_list_mutex. buf_pool_t::get_oldest_modification(): Add a parameter to specify the return value when no persistent data pages are dirty. Require the caller to hold buf_pool.flush_list_mutex. log_buf_pool_get_oldest_modification(): Take the fall-back LSN as a parameter. All callers will also invoke log_sys.get_lsn(). log_preflush_pool_modified_pages(): Replaced with buf_flush_wait_flushed(). buf_flush_wait_flushed(): Implement two limits. If not enough buffer pool has been flushed, signal the page cleaner (unless innodb_flush_sync=OFF) and wait for the page cleaner to complete. If the page cleaner thread is not running (which can be the case during shutdown), initiate the flush and wait for it directly. buf_flush_ahead(): If innodb_flush_sync=ON (the default), submit a new buf_flush_sync_lsn target for the page cleaner but do not wait for the flushing to finish. log_get_capacity(), log_get_max_modified_age_async(): Remove, to make it easier to see that af_get_pct_for_lsn() is not acquiring any mutexes. page_cleaner_flush_pages_recommendation(): Protect all access to buf_pool.flush_list with buf_pool.flush_list_mutex. Previously there were some race conditions in the calculation. buf_flush_sync_for_checkpoint(): New function to process buf_flush_sync_lsn in the page cleaner thread. 
At the end of each batch, we try to wake up any blocked buf_flush_wait_flushed(). If everything up to buf_flush_sync_lsn has been flushed, we will reset buf_flush_sync_lsn=0. The page cleaner thread will keep 'furious flushing' until the limit is reached. Any threads that are waiting in buf_flush_wait_flushed() will be able to resume as soon as their own limit has been satisfied. buf_flush_page_cleaner: Prioritize buf_flush_sync_lsn and do not sleep as long as it is set. Do not update any page_cleaner statistics for this special mode of operation. In the normal mode (buf_flush_sync_lsn is not set for innodb_flush_sync=ON), try to wake up once per second. No longer check whether srv_inc_activity_count() has been called. After each batch, try to perform a log checkpoint, because the best chances for the checkpoint LSN to advance by the maximum amount are upon completing a flushing batch. log_t: Move buf_free, max_buf_free possibly to the same cache line with log_sys.mutex. log_margin_checkpoint_age(): Simplify the logic, and replace a 0.1-second sleep with a call to buf_flush_wait_flushed() to initiate flushing. Moved to the same compilation unit with the only caller. log_close(): Clean up the calculations. (Should be no functional change.) Return whether flush-ahead is needed. Moved to the same compilation unit with the only caller. mtr_t::finish_write(): Return whether flush-ahead is needed. mtr_t::commit(): Invoke buf_flush_ahead() when needed. Let us avoid external calls in mtr_t::commit() and make the logic easier to follow by having related code in a single compilation unit. Also, we will invoke srv_stats.log_write_requests.inc() only once per mini-transaction commit, while not holding mutexes. log_checkpoint_margin(): Only care about log_sys.max_checkpoint_age. Upon reaching log_sys.max_checkpoint_age where we must wait to prevent the log from getting corrupted, let us wait for at most 1MiB of LSN at a time, before rechecking the condition. 
This should allow writers to proceed even if the redo log capacity has been reached and 'furious flushing' is in progress. We no longer care about log_sys.max_modified_age_sync or log_sys.max_modified_age_async. The log_sys.max_modified_age_sync could be a relic from the time when there was a srv_master_thread that wrote dirty pages to data files. Also, we no longer have any log_sys.max_checkpoint_age_async limit, because log checkpoints will now be triggered by the page cleaner thread upon completing buf_flush_lists(). log_set_capacity(): Simplify the calculations of the limit (no functional change). log_checkpoint_low(): Split from log_checkpoint(). Moved to the same compilation unit with the caller. log_make_checkpoint(): Only wait for everything to be flushed until the current LSN. create_log_file(): After checkpoint, invoke log_write_up_to() to ensure that the FILE_CHECKPOINT record has been written. This avoids ut_ad(!srv_log_file_created) in create_log_file_rename(). srv_start(): Do not call recv_recovery_from_checkpoint_start() if the log has just been created. Set fil_system.space_id_reuse_warned before dict_boot() has been executed, and clear it after recovery has finished. dict_boot(): Initialize fil_system.max_assigned_id. srv_check_activity(): Remove. The activity count is counting transaction commits and therefore mostly interesting for the purge of history. BtrBulk::insert(): Do not explicitly wake up the page cleaner, but do invoke srv_inc_activity_count(), because that counter is still being used in buf_load_throttle_if_needed() for some heuristics. (It might be cleaner to execute buf_load() in the page cleaner thread!) Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	3421223363	Merge 10.4 into 10.5	2020-09-09 16:57:30 +03:00
Marko Mäkelä	66ae50a564	Merge 10.3 into 10.4	2020-09-09 15:00:21 +03:00
Marko Mäkelä	7e07e38cf6	Merge 10.2 into 10.3	2020-09-09 13:06:46 +03:00
Marko Mäkelä	c26eae0cc0	MDEV-23456 fixup: Fix mtr_t::get_fix_count() Before commit `05fa4558e0` (MDEV-22110) we have slot->type == MTR_MEMO_MODIFY that are unrelated to incrementing the buffer-fix count. FindBlock::operator(): In debug builds, skip MTR_MEMO_MODIFY entries. Also, simplify the code a little. This fixes an infinite loop in the tests innodb.innodb_defragment and innodb.innodb_wl6326_big.	2020-09-09 12:01:03 +03:00
Thirunarayanan Balathandayuthapani	b1009ae5c1	MDEV-23456 fil_space_crypt_t::write_page0() is accessing an uninitialized page buf_page_create() is invoked when a page is initialized, so the previous contents of the page are ignored. In a few cases, buf_page_get_gen() is called to fetch the page from the buffer pool. It should take an x-latch on the page. If another thread uses the block or the block I/O state is different from BUF_IO_NONE, then release the mutex and check the state and buffer fix count again. For a compressed page, use the existing free block from the LRU list to create the new page. Retry to fetch the compressed page if it is in the flush list. fseg_create(), fseg_create_general(): Introduce block as a parameter where the segment header is placed. It is used to avoid repetitive x-latching of the same page. Change the assert to check whether the page has SX latch and X latch in all callee functions of buf_page_create(). mtr_t::get_fix_count(): Get the buffer fix count of the given block added by the mtr. FindBlock is added to find the buffer fix count of the given block acquired by the mini-transaction.	2020-09-09 11:58:15 +05:30
Marko Mäkelä	05fa4558e0	MDEV-22110 InnoDB unnecessarily writes unmodified pages At least since commit `6a7be48b1b` InnoDB appears to be invoking buf_flush_note_modification() on pages that were exclusively latched but not modified in a mini-transaction. MTR_MEMO_MODIFY, mtr_t::modify(): Define not only in debug code, but also in release code. We will set the MTR_MEMO_MODIFY flag on the earliest mtr_t::m_memo entry that we find. MTR_LOG_NONE: Only use this mode in cases where the previous mode will be restored before anything is modified in the mini-transaction. MTR_MEMO_PAGE_X_MODIFY, MTR_MEMO_PAGE_SX_MODIFY: The allowed flag combinations that include MTR_MEMO_MODIFY. ReleaseBlocks: Only invoke buf_flush_note_modification() on those buffer pool blocks on which mtr_t::set_modified() and mtr_t::modify() were invoked.	2020-07-28 14:02:11 +03:00
Marko Mäkelä	4d4865de6f	Merge 10.4 into 10.5	2020-07-20 15:55:59 +03:00
Marko Mäkelä	4b959bd8df	Merge 10.3 into 10.4	2020-07-20 15:34:59 +03:00
Marko Mäkelä	acc58fd835	Merge 10.2 into 10.3	2020-07-20 15:11:59 +03:00
Marko Mäkelä	ca9276e37e	Merge 10.1 into 10.2	2020-07-20 14:53:24 +03:00
Marko Mäkelä	57ec42bc32	MDEV-23190 InnoDB data file extension is not crash-safe When InnoDB is extending a data file, it is updating the FSP_SIZE field in the first page of the data file. In commit `8451e09073` (MDEV-11556) we removed a work-around for this bug and made recovery stricter, by making it track changes to FSP_SIZE via redo log records, and extend the data files before any changes are being applied to them. It turns out that the function fsp_fill_free_list() is not crash-safe with respect to this when it is initializing the change buffer bitmap page (page 1, or generally, N*innodb_page_size+1). It uses a separate mini-transaction that is committed (and will be written to the redo log file) before the mini-transaction that actually extended the data file. Hence, recovery can observe a reference to a page that is beyond the current end of the data file. fsp_fill_free_list(): Initialize the change buffer bitmap page in the same mini-transaction. The rest of the changes are fixing a bug that the use of the separate mini-transaction was attempting to work around. Namely, we must ensure that no other thread will access the change buffer bitmap page before our mini-transaction has been committed and all page latches have been released. That is, for read-ahead as well as neighbour flushing, we must avoid accessing pages that might not yet be durably part of the tablespace. fil_space_t::committed_size: The size of the tablespace as persisted by mtr_commit(). fil_space_t::max_page_number_for_io(): Limit the highest page number for I/O batches to committed_size. MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch. mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch. mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK, copy space->size to space->committed_size. In this way, read-ahead or flushing will never be invoked on pages that do not yet exist according to FSP_SIZE.	2020-07-20 14:48:56 +03:00
Monty	0fd89a1a89	Merge remote-tracking branch 'origin/10.4' into 10.5	2020-07-03 23:31:12 +03:00
Marko Mäkelä	1813d92d0c	Merge 10.4 into 10.5	2020-07-02 09:41:44 +03:00
Marko Mäkelä	f347b3e0e6	Merge 10.3 into 10.4	2020-07-02 07:39:33 +03:00
Marko Mäkelä	1df1a63924	Merge 10.2 into 10.3	2020-07-02 06:17:51 +03:00
Marko Mäkelä	c36834c832	MDEV-20377: Make WITH_MSAN more usable MemorySanitizer (clang -fsanitize=memory) requires that all code be compiled with instrumentation enabled. The only exception is the C runtime library. Failure to use instrumented libraries will cause bogus messages about memory being uninitialized. In WITH_MSAN builds, we must avoid calling getservbyname(), because even though it is a standard library function, it is not instrumented, not even in clang 10. Note: Before MariaDB Server 10.5, ./mtr will typically fail due to the old PCRE library, which was updated in MDEV-14024. The following cmake options were tested on 10.5 in commit `94d0bb4dbe`: cmake \ -DCMAKE_C_FLAGS='-march=native -O2' \ -DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \ -DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \ -DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \ -DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \ -DWITH_SAFEMALLOC=OFF \ -DWITH_{ZLIB,SSL,PCRE}=bundled \ -DHAVE_LIBAIO_H=0 \ -DWITH_MSAN=ON MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED() and __msan_unpoison(). MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow(). InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros. ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in functions instead of inline assembler when building WITH_MSAN. This will require at least -msse4.2 when building for IA-32 or AMD64. The inline assembler would not be instrumented, and would thus cause bogus failures.	2020-07-01 17:23:00 +03:00
Thirunarayanan Balathandayuthapani	572e53d8cc	MDEV-22931 mtr_t::mtr_t() allocates some memory mtr_t::m_freed_pages: Renamed from m_freed_ranges and made a pointer indirection. mtr_t::add_freed_offset(): Allocates m_freed_pages. mtr_t::clear_freed_ranges(): Removed. mtr_t::init(): Added debug assertion to check whether m_freed_pages is not yet initialized. btr_page_alloc_low(): Remove #ifdef UNIV_DEBUG_SCRUBBING. mtr_t::commit(): Delete m_freed_pages, reset m_trim_pages and m_freed_in_system_tablespace. fil_space_t::clear_freed_ranges(): Added a comment to explain how undo log tablespaces use it.	2020-06-19 21:12:13 +05:30
Thirunarayanan Balathandayuthapani	d0c69ccab5	MDEV-22911: Fix the valgrind & MSAN instrumentation of MDEV-8139 MEM_GET_VBITS(): Save information about uninitialized data. MEM_SET_VBITS(): Restore information about uninitialized data.	2020-06-16 20:03:35 +05:30
Thirunarayanan Balathandayuthapani	d34cc6b3fd	MDEV-8139: Fix the MSAN instrumentation	2020-06-12 21:57:33 +05:30
Thirunarayanan Balathandayuthapani	c92f7e287f	MDEV-8139 Fix Scrubbing fil_space_t::freed_ranges: Store ranges of freed page numbers. fil_space_t::last_freed_lsn: Store the most recent LSN of freeing a page. fil_space_t::freed_mutex: Protects freed_ranges, last_freed_lsn. fil_space_create(): Initialize the freed_range mutex. fil_space_free_low(): Frees the freed_range mutex. range_set: Ranges of page numbers. buf_page_create(): Removes the page from freed_ranges when page is being reused. btr_free_root(): Remove the PAGE_INDEX_ID invalidation. Because btr_free_root() and dict_drop_index_tree() are executed in the same atomic mini-transaction, there is no need to invalidate the root page. buf_release_freed_page(): Split from buf_flush_freed_page(). Skip any I/O. buf_flush_freed_pages(): Get the freed ranges from the tablespace and write punch-hole or zeroes for the freed ranges. buf_flush_try_neighbors(): Handles the flushing of freed ranges. mtr_t::freed_pages: Variable to store the list of freed pages. mtr_t::add_freed_pages(): To add freed pages. mtr_t::clear_freed_pages(): To clear the freed pages. mtr_t::m_freed_in_system_tablespace: Variable to indicate whether page has been freed in system tablespace. mtr_t::m_trim_pages: Variable to indicate whether the space has been trimmed. mtr_t::commit(): Add the freed page and update the last freed lsn in the tablespace and clear the tablespace freed range if space is trimmed. file_name_t::freed_pages: Store the freed pages during recovery. file_name_t::add_freed_page(), file_name_t::remove_freed_page(): To add and remove freed page during recovery. store_freed_or_init_rec(): Store or remove the freed pages while encountering FREE_PAGE or INIT_PAGE redo log record. recv_init_crash_recovery_spaces(): Add the freed page encountered during recovery to respective tablespace.	2020-06-12 09:17:51 +05:30
Marko Mäkelä	17a7bafec0	MDEV-22110 preparation: Remove mtr_memo_contains macros Let us invoke the debug member functions of mtr_t directly. mtr_t::memo_contains(): Change the parameter type to const rw_lock_t&. This function cannot be invoked on buf_block_t::lock. The function mtr_t::memo_contains_flagged() is intended to be invoked on buf_block_t* or rw_lock_t*, and it along with mtr_t::memo_contains_page_flagged() are the way to check whether a buffer pool page has been latched within a mini-transaction.	2020-06-10 07:50:09 +03:00

185 commits