mariadb/storage/innobase/include/buf0lru.h
Marko Mäkelä cc89e5f94f MDEV-29445: Reimplement SET GLOBAL innodb_buffer_pool_size
We deprecate and ignore the parameter innodb_buffer_pool_chunk_size
and let the buffer pool size to be changed in arbitrary 1-megabyte
increments, all the way up to innodb_buffer_pool_size_max,
which must be specified at startup.

If innodb_buffer_pool_size_max is not specified, it will default to
twice the specified innodb_buffer_pool_size.

The buffer pool will be mapped in a contiguous memory area that
will be aligned and partitioned into extents of 8 MiB on 64-bit systems
and 2 MiB on 32-bit systems.

Within an extent, the first few innodb_page_size blocks contain
buf_block_t objects that will cover the page frames in the rest
of the extent. In this way, there is a trivial mapping between
page frames and block descriptors and we do not need any
lookup tables like buf_pool.zip_hash or buf_pool_t::chunk_t::map.

We will always allocate the same number of block descriptors for
an extent, even if we do not need all the buf_block_t in the last
extent in case the innodb_buffer_pool_size is not an integer multiple
of the of extents size.

The minimum innodb_buffer_pool_size is 256*5/4 pages. At the default
innodb_page_size=16k this corresponds to 5 MiB. However, now that the
innodb_buffer_pool_size includes the memory allocated for the block
descriptors, the minimum would be innodb_buffer_pool_size=6m.

Innodb_buffer_pool_resize_status: Remove. We will execute
buf_pool_t::resize() synchronously in the thread that is executing
SET GLOBAL innodb_buffer_pool_size. That operation will run until
it completes, or until a KILL statement is executed, the client
is disconnected, the buf_flush_page_cleaner() thread notices that
we are running out of memory, or the server is shut down.

my_large_virtual_alloc(): A new function, similar to my_large_malloc().
FIXME: On Microsoft Windows, let the caller know if large page allocation
was used. In that case, we must disallow buffer pool resizing.

buf_pool_t::create(), buf_pool_t::chunk_t::create(): Only initialize
the first page descriptor of each chunk.

buf_pool_t::lazy_allocate(): Lazily initialize a previously allocated
page descriptor and increase buf_pool.n_blocks, which must be below
buf_pool.n_blocks_alloc.

buf_pool_t::allocate(): Renamed from buf_LRU_get_free_only().

buf_pool_t::LRU_warned: Changed to Atomic_relaxed<bool>,
only to be modified by the buf_flush_page_cleaner() thread.

buf_pool_t::LRU_shrink(): Check if buffer pool shrinking needs
to process a buffer page.

buf_pool_t::resize(): Always zero out b->page.zip.data.
Failure to do so would cause crashes or corruption in
the test innodb.innodb_buffer_pool_resize due to
duplicated allocation in the buddy system.
Before tarting to shrink the buffer pool, run one batch of
buf_flush_page_cleaner() in order to prevent LRU_warn().
Abort shrinking if the buf_flush_page_cleaner() has LRU_warned.

buf_pool_t::first_to_withdraw: The first block descriptor that is
out of the bounds of the shrunk buffer pool.

buf_pool_t::withdrawn: The list of withdrawn blocks.
If buf_pool_t::resize() is aborted, we must be able to resurrect
the withdrawn blocks in the free list.

buf_pool_t::contains_zip(): Added a parameter for the
number of least significant pointer bits to disregard,
so that we can find any pointers to within a block
that is supposed to be free.

buf_pool_t::get_info(): Replaces buf_stats_get_pool_info().

innodb_init_param(): Refactored. We must first compute
srv_page_size_shift and then determine the valid bounds of
innodb_buffer_pool_size.

buf_buddy_shrink(): Replaces buf_buddy_realloc().
Part of the work is deferred to buf_buddy_condense_free(),
which is being executed when we are not holding any
buf_pool.page_hash latch.

buf_buddy_condense_free(): Do not relocate blocks.

buf_buddy_free_low(): Do not care about buffer pool shrinking.
This will be handled by buf_buddy_shrink() and
buf_buddy_condense_free().

buf_buddy_alloc_zip(): Assert !buf_pool.contains_zip()
when we are allocating from the binary buddy system.
Previously we were asserting this on multiple recursion levels.

buf_buddy_block_free(), buf_buddy_free_low():
Assert !buf_pool.contains_zip().

buf_buddy_alloc_from(): Remove the redundant parameter j.

buf_flush_LRU_list_batch(): Add the parameter shrinking.
If we are shrinking, invoke buf_pool_t::LRU_shrink() to see
if we must keep going.

buf_do_LRU_batch(): Skip buf_free_from_unzip_LRU_list_batch()
if we are shrinking the buffer pool. In that case, we want
to minimize the page relocations and just finish as quickly
as possible.

trx_purge_attach_undo_recs(): Limit purge_sys.n_pages_handled()
in every iteration, in case the buffer pool is being shrunk
in the middle of a purge batch.
2025-02-05 16:12:29 +02:00

195 lines
7.5 KiB
C

/*****************************************************************************
Copyright (c) 1995, 2016, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2017, 2021, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA
*****************************************************************************/
/**************************************************//**
@file include/buf0lru.h
The database buffer pool LRU replacement algorithm
Created 11/5/1995 Heikki Tuuri
*******************************************************/
#pragma once
#include "buf0types.h"
#include "hash0hash.h"
// Forward declaration
struct trx_t;
struct fil_space_t;
/*#######################################################################
These are low-level functions
#########################################################################*/
/** Minimum LRU list length for which the LRU_old pointer is defined */
#define BUF_LRU_OLD_MIN_LEN 512 /* 8 megabytes of 16k pages */
/** Try to free a block. If bpage is a descriptor of a compressed-only
ROW_FORMAT=COMPRESSED page, the buf_page_t object will be freed as well.
The caller must hold buf_pool.mutex.
@param bpage block to be freed
@param zip whether to remove both copies of a ROW_FORMAT=COMPRESSED page
@retval true if freed and buf_pool.mutex may have been temporarily released
@retval false if the page was not freed */
bool buf_LRU_free_page(buf_page_t *bpage, bool zip)
MY_ATTRIBUTE((nonnull));
/** Try to free a replaceable block.
@param limit maximum number of blocks to scan
@return true if found and freed */
bool buf_LRU_scan_and_free_block(ulint limit= ULINT_UNDEFINED);
/** Get a block from the buf_pool.free list.
If the list is empty, blocks will be moved from the end of buf_pool.LRU
to buf_pool.free.
This function is called from a user thread when it needs a clean
block to read in a page. Note that we only ever get a block from
the free list. Even when we flush a page or find a page in LRU scan
we put it to free list to be used.
* iteration 0:
* get a block from the buf_pool.free list
* if buf_pool.try_LRU_scan is set
* scan LRU up to 100 pages to free a clean block
* success:retry the free list
* invoke buf_pool.page_cleaner_wakeup(true) and wait its completion
* subsequent iterations: same as iteration 0 except:
* scan the entire LRU list
@param have_mutex whether buf_pool.mutex is already being held
@return the free control block, in state BUF_BLOCK_MEMORY */
buf_block_t* buf_LRU_get_free_block(bool have_mutex)
MY_ATTRIBUTE((malloc,warn_unused_result));
#define buf_block_alloc() buf_LRU_get_free_block(false)
/** @return whether the unzip_LRU list should be used for evicting a victim
instead of the general LRU list */
bool buf_LRU_evict_from_unzip_LRU();
/** Free a buffer block which does not contain a file page,
while holding buf_pool.mutex.
@param block block to be put to buf_pool.free */
void buf_LRU_block_free_non_file_page(buf_block_t *block);
/******************************************************************//**
Adds a block to the LRU list. Please make sure that the page_size is
already set when invoking the function, so that we can get correct
page_size from the buffer page when adding a block into LRU */
void
buf_LRU_add_block(
/*==============*/
buf_page_t* bpage, /*!< in: control block */
bool old); /*!< in: true if should be put to the old
blocks in the LRU list, else put to the
start; if the LRU list is very short, added to
the start regardless of this parameter */
/** Move a block to the start of the buf_pool.LRU list.
@param bpage buffer pool page */
void buf_page_make_young(buf_page_t *bpage);
/** Flag a page accessed in buf_pool and move it to the start of buf_pool.LRU
if it is too old.
@param bpage buffer pool page
@return whether this is not the first access */
bool buf_page_make_young_if_needed(buf_page_t *bpage);
/******************************************************************//**
Adds a block to the LRU list of decompressed zip pages. */
void
buf_unzip_LRU_add_block(
/*====================*/
buf_block_t* block, /*!< in: control block */
ibool old); /*!< in: TRUE if should be put to the end
of the list, else put to the start */
/** Update buf_pool.LRU_old_ratio.
@param[in] old_pct Reserve this percentage of
the buffer pool for "old" blocks
@param[in] adjust true=adjust the LRU list;
false=just assign buf_pool.LRU_old_ratio
during the initialization of InnoDB
@return updated old_pct */
uint buf_LRU_old_ratio_update(uint old_pct, bool adjust);
/********************************************************************//**
Update the historical stats that we are collecting for LRU eviction
policy at the end of each interval. */
void
buf_LRU_stat_update();
#ifdef UNIV_DEBUG
/** Validate the LRU list. */
void buf_LRU_validate();
#endif /* UNIV_DEBUG */
#if defined UNIV_DEBUG_PRINT || defined UNIV_DEBUG
/** Dump the LRU list to stderr. */
void buf_LRU_print();
#endif /* UNIV_DEBUG_PRINT || UNIV_DEBUG */
/** @name Heuristics for detecting index scan @{ */
/** The denominator of buf_pool.LRU_old_ratio. */
#define BUF_LRU_OLD_RATIO_DIV 1024
/** Maximum value of buf_pool.LRU_old_ratio.
@see buf_LRU_old_adjust_len
@see buf_pool.LRU_old_ratio_update */
#define BUF_LRU_OLD_RATIO_MAX BUF_LRU_OLD_RATIO_DIV
/** Minimum value of buf_pool.LRU_old_ratio.
@see buf_LRU_old_adjust_len
@see buf_pool.LRU_old_ratio_update
The minimum must exceed
(BUF_LRU_OLD_TOLERANCE + 5) * BUF_LRU_OLD_RATIO_DIV / BUF_LRU_OLD_MIN_LEN. */
#define BUF_LRU_OLD_RATIO_MIN 51
#if BUF_LRU_OLD_RATIO_MIN >= BUF_LRU_OLD_RATIO_MAX
# error "BUF_LRU_OLD_RATIO_MIN >= BUF_LRU_OLD_RATIO_MAX"
#endif
#if BUF_LRU_OLD_RATIO_MAX > BUF_LRU_OLD_RATIO_DIV
# error "BUF_LRU_OLD_RATIO_MAX > BUF_LRU_OLD_RATIO_DIV"
#endif
/** Move blocks to "new" LRU list only if the first access was at
least this many milliseconds ago. Not protected by any mutex or latch. */
extern uint buf_LRU_old_threshold_ms;
/* @} */
/** @brief Statistics for selecting the LRU list for eviction.
These statistics are not 'of' LRU but 'for' LRU. We keep count of I/O
and page_zip_decompress() operations. Based on the statistics we decide
if we want to evict from buf_pool.unzip_LRU or buf_pool.LRU. */
struct buf_LRU_stat_t
{
ulint io; /**< Counter of buffer pool I/O operations. */
ulint unzip; /**< Counter of page_zip_decompress operations. */
};
/** Current operation counters. Not protected by any mutex.
Cleared by buf_LRU_stat_update(). */
extern buf_LRU_stat_t buf_LRU_stat_cur;
/** Running sum of past values of buf_LRU_stat_cur.
Updated by buf_LRU_stat_update(). Protected by buf_pool.mutex. */
extern buf_LRU_stat_t buf_LRU_stat_sum;
/********************************************************************//**
Increments the I/O counter in buf_LRU_stat_cur. */
#define buf_LRU_stat_inc_io() buf_LRU_stat_cur.io++
/********************************************************************//**
Increments the page_zip_decompress() counter in buf_LRU_stat_cur. */
#define buf_LRU_stat_inc_unzip() buf_LRU_stat_cur.unzip++