Mirror of https://github.com/MariaDB/server.git, synced 2025-01-29 10:14:19 +01:00
f2c17cc9d9
This is a 10.6 port of commit 2f9e264781 from MariaDB Server 10.9. The port lacks some optimizations, because 10.6 has a more complex redo log format and recovery logic (which was simplified in commit 685d958e38).

The progress reporting of InnoDB crash recovery was rather intermittent. Nothing was reported during the single-threaded log record parsing, which could take minutes for a large log. During log application, progress was only reported by background threads that were invoked on data page read completion.

The progress reporting will now look like this:

InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799
InnoDB: Read redo log up to LSN=1963895808
InnoDB: Multi-batch recovery needed at LSN 2534560930
InnoDB: Read redo log up to LSN=3312233472
InnoDB: Read redo log up to LSN=1599646720
InnoDB: Read redo log up to LSN=2160831488
InnoDB: To recover: LSN 2806789376/2806819840; 195082 pages
InnoDB: To recover: LSN 2806789376/2806819840; 63507 pages
InnoDB: Read redo log up to LSN=3195776000
InnoDB: Read redo log up to LSN=3687099392
InnoDB: Read redo log up to LSN=4165315584
InnoDB: To recover: LSN 4374395699/4374440960; 241454 pages
InnoDB: To recover: LSN 4374395699/4374440960; 123701 pages
InnoDB: Read redo log up to LSN=4508724224
InnoDB: Read redo log up to LSN=5094550528
InnoDB: To recover: 205230 pages

The previous messages "Starting a batch to recover" and "Starting a final batch to recover" will be replaced by "To recover: ... pages" messages. If a batch lasts longer than 15 seconds, a progress report showing the number of remaining pages will be issued every 15 seconds. For a non-final batch, the "To recover:" message includes two end LSNs: that of the batch and that of the recovered log. This is the primary measure of progress. The batch will end once the number of pages to recover reaches 0.
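A minimal sketch of the reporting rule above, under stated assumptions: format_progress() and should_report() are hypothetical helpers, not InnoDB functions, and the 15-second throttle is modeled with caller-supplied time_t values rather than a real timer. Only the message formats are taken from the sample output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Interval between repeated progress reports, as stated in the text. */
#define REPORT_INTERVAL_SECONDS 15

/* Hypothetical helper: format one progress line into buf.
   A non-final batch shows the batch end LSN and the recovered log end LSN;
   the final (or only) batch shows just the remaining page count.
   Returns the number of characters stored (truncated if buf is small). */
static size_t format_progress(char *buf, size_t size, bool final_batch,
                              uint64_t batch_end_lsn, uint64_t log_end_lsn,
                              size_t pages_left)
{
  assert(buf != NULL && size > 0);
  if (final_batch)
    snprintf(buf, size, "InnoDB: To recover: %zu pages", pages_left);
  else
    snprintf(buf, size, "InnoDB: To recover: LSN %llu/%llu; %zu pages",
             (unsigned long long) batch_end_lsn,
             (unsigned long long) log_end_lsn, pages_left);
  return strlen(buf);
}

/* Hypothetical throttle: returns true and records `now` as the time of the
   latest report when at least REPORT_INTERVAL_SECONDS have passed since
   *last_report; otherwise returns false and leaves *last_report unchanged. */
static bool should_report(time_t now, time_t *last_report)
{
  assert(last_report != NULL);
  if (difftime(now, *last_report) < REPORT_INTERVAL_SECONDS)
    return false;
  *last_report = now;
  return true;
}
```

For example, formatting a non-final batch with batch end LSN 2806789376, log end LSN 2806819840 and 195082 pages reproduces the line "InnoDB: To recover: LSN 2806789376/2806819840; 195082 pages" from the sample above. With a previous report at t=100, should_report() returns false at t=110 and true at t=115.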
If recovery is possible in a single batch, the output will look like this, with a shorter "To recover:" message that counts only the remaining pages:

InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799
InnoDB: Read redo log up to LSN=1984539648
InnoDB: Read redo log up to LSN=2710875136
InnoDB: Read redo log up to LSN=3358895104
InnoDB: Read redo log up to LSN=3965299712
InnoDB: Read redo log up to LSN=4557417472
InnoDB: Read redo log up to LSN=5219527680
InnoDB: To recover: 450915 pages

We will also speed up recovery by improving the memory management and by implementing multi-threaded recovery of data pages that do not need to be read into the buffer pool ("fake read"). Log application in the "fake read" threads will be protected by an atomic being_recovered field and an exclusive buf_page_t::lock.

Recovery will reserve two thirds of the buffer pool, or 256 pages, whichever is smaller, for data pages. Previously, at most one third of the buffer pool could be used for buffered log records. With large buffer pools, this typically meant that recovery unnecessarily consisted of multiple batches.

If recovery runs out of memory, it will "roll back" or "rewind" the current mini-transaction. The recv_sys.recovered_lsn and recv_sys.pages will correspond to the "out of memory LSN", at the end of the previous complete mini-transaction.

If recovery runs out of memory while executing the final recovery batch, we can simply invoke recv_sys.apply(false) to make room, and resume parsing.

If recovery runs out of memory before the final batch, we will scan the redo log to the end and check for any missing or inconsistent files. In this version of the patch, we will throw away any previously buffered recv_sys.pages and rescan the log from the checkpoint onwards.

recv_sys_t::pages_it: A cached iterator to recv_sys.pages.

recv_sys_t::is_memory_exhausted(): Remove. We will have out-of-memory handling deep inside recv_sys_t::parse().
recv_sys_t::rewind(), page_recv_t::recs_t::rewind(): Remove all log records starting at a specific LSN.

IORequest::write_complete(), IORequest::read_complete(): Replace fil_aio_callback().

read_io_callback(), write_io_callback(): Replace io_callback().

IORequest::fake_read_complete(), fake_io_callback(), os_fake_read(): Process a "fake read" request for concurrent recovery.

recv_sys_t::apply_batch(): Choose a number of successive pages for a recovery batch.

recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a page whose recovery is not in progress. Log application threads will not invoke this; they will only set being_recovered=-1 to indicate that the entry is no longer needed.

recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries.

recv_sys_t::wait_for_pool(): Wait for some space to become available in the buffer pool.

mlog_init_t::mark_ibuf_exist(): Avoid calls to recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low(). Such calls would lead to double locking of recv_sys.mutex, which, depending on the implementation, could cause a deadlock. We will use lower-level calls to look up index pages.

buf_LRU_block_remove_hashed(): Disable consistency checks for freed ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage. This fixes an occasional failure of the test innodb.innodb_bulk_create_index_debug.

Tested by: Matthias Leich
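The being_recovered lifecycle and the rewind operation described in this commit message can be illustrated with a small C11 sketch. Everything in it is hypothetical: the types entry_t and rec_t, the value 1 for "being applied", and the functions entry_rewind(), entry_try_claim(), entry_finish() and garbage_collect() are invented for illustration. The real page_recv_t and recv_sys_t differ, and they also rely on recv_sys.mutex and buf_page_t::lock, which are omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical being_recovered states: -1 (entry no longer needed) comes
   from the commit message; 0 and 1 are assumptions of this sketch. */
enum { RECV_IDLE = 0, RECV_BUSY = 1, RECV_DONE = -1 };

/* One buffered log record; records are kept in ascending LSN order. */
typedef struct rec {
  uint64_t lsn;
  struct rec *next;
} rec_t;

/* Hypothetical per-page entry, loosely modeled on page_recv_t. */
typedef struct entry {
  uint32_t page_no;
  rec_t *recs;                /* singly linked, ascending LSN */
  atomic_int being_recovered;
  struct entry *next;         /* next page in the hypothetical map */
} entry_t;

static entry_t *entry_new(uint32_t page_no, entry_t *next)
{
  entry_t *e = calloc(1, sizeof *e);
  if (!e)
    abort();
  e->page_no = page_no;
  atomic_init(&e->being_recovered, RECV_IDLE);
  e->next = next;
  return e;
}

/* Append a record at the end of the page's list. */
static void entry_add(entry_t *e, uint64_t lsn)
{
  rec_t *r = malloc(sizeof *r);
  if (!r)
    abort();
  r->lsn = lsn;
  r->next = NULL;
  rec_t **p = &e->recs;
  while (*p) {
    assert((*p)->lsn <= lsn); /* records must stay in ascending LSN order */
    p = &(*p)->next;
  }
  *p = r;
}

/* Rewind: free every record whose LSN is >= lsn, as when discarding an
   incomplete mini-transaction after running out of memory. */
static void entry_rewind(entry_t *e, uint64_t lsn)
{
  rec_t **p = &e->recs;
  while (*p && (*p)->lsn < lsn)
    p = &(*p)->next;
  for (rec_t *r = *p, *next; r; r = next) {
    next = r->next;
    free(r);
  }
  *p = NULL;
}

/* A recovery thread claims a page by moving it from idle to busy. */
static bool entry_try_claim(entry_t *e)
{
  int expected = RECV_IDLE;
  return atomic_compare_exchange_strong(&e->being_recovered, &expected,
                                        RECV_BUSY);
}

/* After applying the log, the thread only marks the entry as done. */
static void entry_finish(entry_t *e)
{
  atomic_store(&e->being_recovered, RECV_DONE);
}

/* Single-threaded cleanup: unlink and free all entries marked done. */
static void garbage_collect(entry_t **list)
{
  for (entry_t **p = list; *p; ) {
    entry_t *e = *p;
    if (atomic_load(&e->being_recovered) == RECV_DONE) {
      *p = e->next;
      entry_rewind(e, 0);     /* LSN 0: frees every record */
      free(e);
    } else
      p = &e->next;
  }
}
```

The point carried over from the description is the division of labour: in this sketch, a worker claims an entry with a compare-and-swap and, when done, only publishes -1; unlinking and freeing happen later in garbage_collect(), which must run while no worker holds a pointer into the list.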
120 lines
5.7 KiB
C
/*****************************************************************************

Copyright (c) 1995, 2015, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2015, 2021, MariaDB Corporation.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

*****************************************************************************/

/**************************************************//**
@file include/buf0rea.h
The database buffer read

Created 11/5/1995 Heikki Tuuri
*******************************************************/

#ifndef buf0rea_h
#define buf0rea_h

#include "buf0buf.h"

/** High-level function which reads a page asynchronously from a file to the
buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread.
@param page_id page id
@param zip_size ROW_FORMAT=COMPRESSED page size, or 0
@retval DB_SUCCESS if the page was read and is not corrupted
@retval DB_SUCCESS_LOCKED_REC if the page was not read
@retval DB_PAGE_CORRUPTED if the page is corrupted according to its checksum
@retval DB_DECRYPTION_FAILED if the post-encryption checksum matches but
the normal page checksum does not match after decryption
@retval DB_TABLESPACE_DELETED if the tablespace .ibd file is missing */
dberr_t buf_read_page(const page_id_t page_id, ulint zip_size);

/** High-level function which reads a page asynchronously from a file to the
buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread.
@param[in,out] space tablespace
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0 */
void buf_read_page_background(fil_space_t *space, const page_id_t page_id,
                              ulint zip_size)
  MY_ATTRIBUTE((nonnull));

/** Applies a random read-ahead in buf_pool if there are at least a threshold
number of accessed pages from the random read-ahead area. Does not read any
page, not even the one at the position (space, offset), if the read-ahead
mechanism is not activated. NOTE 1: the calling thread may own latches on
pages: to avoid deadlocks this function must be written such that it cannot
end up waiting for these latches! NOTE 2: the calling thread must want
access to the page given: this rule is set to prevent unintended read-aheads
performed by ibuf routines, a situation which could result in a deadlock if
the OS does not support asynchronous i/o.
@param[in] page_id page id of a page which the current thread
                   wants to access
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] ibuf whether we are inside an ibuf routine
@return number of page read requests issued; NOTE that if we read ibuf
pages, it may happen that the page at the given page number does not
get read even if we return a positive value! */
ulint
buf_read_ahead_random(const page_id_t page_id, ulint zip_size, bool ibuf);

/** Applies linear read-ahead if in the buf_pool the page is a border page of
a linear read-ahead area and all the pages in the area have been accessed.
Does not read any page if the read-ahead mechanism is not activated. Note
that the algorithm looks at the 'natural' adjacent successor and
predecessor of the page, which on the leaf level of a B-tree are the next
and previous page in the chain of leaves. To know these, the page specified
in (space, offset) must already be present in the buf_pool. Thus, the
natural way to use this function is to call it when a page in the buf_pool
is accessed the first time, calling this function just after it has been
bufferfixed.
NOTE 1: as this function looks at the natural predecessor and successor
fields on the page, what happens if these are not initialized to any
sensible value? No problem: before applying read-ahead, we check that the
area to read is within the span of the space; if not, read-ahead is not
applied. An uninitialized value may result in a useless read operation, but
only very improbably.
NOTE 2: the calling thread may own latches on pages: to avoid deadlocks this
function must be written such that it cannot end up waiting for these
latches!
NOTE 3: the calling thread must want access to the page given: this rule is
set to prevent unintended read-aheads performed by ibuf routines, a situation
which could result in a deadlock if the OS does not support asynchronous io.
@param[in] page_id page id; see NOTE 3 above
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] ibuf whether we are inside an ibuf routine
@return number of page read requests issued */
ulint
buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf);

/** Schedule a page for recovery.
@param space tablespace
@param page_id page identifier
@param recs log records
@param init page initialization, or nullptr if the page needs to be read */
void buf_read_recover(fil_space_t *space, const page_id_t page_id,
                      page_recv_t &recs, recv_init *init);

/** @name Modes used in read-ahead @{ */
/** read only pages belonging to the insert buffer tree */
#define BUF_READ_IBUF_PAGES_ONLY 131
/** read any page */
#define BUF_READ_ANY_PAGE 132
/* @} */

#endif
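The policies documented for buf_read_ahead_random() and buf_read_ahead_linear() can be illustrated with a simplified, hypothetical model. Assumptions: page access is a plain bool array indexed by page number, the area size and thresholds are ordinary parameters, and the linear case picks the physically adjacent area instead of following the natural predecessor and successor links stored on the page. This is not the code in buf0rea.cc; it only shows which page range each policy would request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page range [low, high) that a read-ahead would request;
   low == high means "request nothing". */
typedef struct {
  uint32_t low, high;
} range_t;

/* Random read-ahead (simplified): if at least `threshold` pages of the
   aligned area containing page_no were accessed, request the whole area,
   clipped to the tablespace size n_pages. */
static range_t read_ahead_random(const bool *accessed, uint32_t n_pages,
                                 uint32_t page_no, uint32_t area,
                                 uint32_t threshold)
{
  assert(area > 0 && page_no < n_pages);
  range_t r = {0, 0};
  uint32_t low = page_no / area * area;
  uint32_t high = low + area < n_pages ? low + area : n_pages;
  uint32_t count = 0;
  for (uint32_t i = low; i < high; i++)
    count += accessed[i];
  if (count >= threshold) {
    r.low = low;
    r.high = high;
  }
  return r;
}

/* Linear read-ahead (simplified): only a border page of its area triggers
   it, and only if at least `threshold` pages of the area were accessed
   (threshold == area models "all pages accessed"). Hitting the last page
   suggests an ascending scan, so the next area is requested; hitting the
   first page suggests a descending scan, so the previous area is requested.
   Areas that would fall outside the tablespace are not requested. */
static range_t read_ahead_linear(const bool *accessed, uint32_t n_pages,
                                 uint32_t page_no, uint32_t area,
                                 uint32_t threshold)
{
  assert(area > 0 && page_no < n_pages);
  range_t r = {0, 0};
  uint32_t low = page_no / area * area;
  uint32_t high = low + area;
  if (high > n_pages)
    return r;                   /* incomplete area at the end: skip */
  if (page_no != low && page_no != high - 1)
    return r;                   /* not a border page */
  uint32_t count = 0;
  for (uint32_t i = low; i < high; i++)
    count += accessed[i];
  if (count < threshold)
    return r;
  if (page_no == high - 1) {    /* ascending: request the next area */
    if (high + area <= n_pages) {
      r.low = high;
      r.high = high + area;
    }
  } else if (low >= area) {     /* descending: request the previous area */
    r.low = low - area;
    r.high = low;
  }
  return r;
}
```

With an area of 64 pages and all of pages 64..127 accessed, a linear read-ahead triggered at page 127 requests pages 128..191, at page 64 it requests pages 0..63, and at the interior page 100 it requests nothing.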