Mirror of https://github.com/MariaDB/server.git, synced 2025-01-30 18:41:56 +01:00
de4030e4d4
This also fixes part of MDEV-29835 Partial server freeze, which is caused by violations of the latching order that was defined in https://dev.mysql.com/worklog/task/?id=6326 (WL#6326: InnoDB: fix index->lock contention). Unless the current thread is holding an exclusive dict_index_t::lock, it must acquire page latches in a strict parent-to-child, left-to-right order. Not all cases of MDEV-29835 are fixed yet. Failure to follow the correct latching order will cause deadlocks of threads due to lock order inversion.

As part of these changes, the BTR_MODIFY_TREE mode is modified so that an Update latch (U a.k.a. SX) will be acquired on the root page, and eXclusive latches (X) will be acquired on all pages leading to the leaf page, as well as on any left and right siblings of the pages along the path. The DEBUG_SYNC test innodb.innodb_wl6326 will be removed, because at the time the DEBUG_SYNC point is hit, the thread is actually holding several page latches that will be blocking a concurrent SELECT statement.

We also remove double bookkeeping that was caused by excessive information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo store information on latched pages, and ensure that mtr_memo_slot_t::object is never a null pointer. The tree_blocks[] and tree_savepoints[] arrays were redundant.

buf_page_get_low(): If innodb_change_buffering_debug=1, do not try to evict blocks while holding a latch on a modified page, to avoid a hang. The test innodb.innodb-change-buffer-recovery will be removed, because change buffering may no longer be forced by debug injection when the change buffer comprises multiple pages. Remove a debug assertion that could fail when innodb_change_buffering_debug=1 fails to evict a page. For other cases, the assertion is redundant, because we already checked that right after the got_block: label. The test innodb.innodb-change-buffering-recovery will be removed, because due to this change, we will be unable to evict the desired page.

mtr_t::lock_register(): Register a change of a page latch on an unmodified buffer-fixed block.

mtr_t::x_latch_at_savepoint(), mtr_t::sx_latch_at_savepoint(): Replaced by the use of mtr_t::upgrade_buffer_fix(), which now also handles RW_S_LATCH.

mtr_t::set_modified(): For temporary tables, invoke buf_page_t::set_modified() here and not in mtr_t::commit(). We will never set the MTR_MEMO_MODIFY flag on other than persistent data pages, nor set mtr_t::m_modifications when temporary data pages are modified.

mtr_t::commit(): Only invoke the buf_flush_note_modification() loop if persistent data pages were modified.

mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo. This avoids many redundant entries in mtr_t::m_memo, as well as redundant calls to buf_page_get_gen() for blocks that had already been looked up in a mini-transaction.

btr_get_latched_root(): Return a pointer to an already latched root page. This replaces btr_root_block_get() in cases where the mini-transaction has already latched the root page.

btr_page_get_parent(): Fetch a parent page that was already latched in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched(). If needed, upgrade the root page U latch to X. This avoids bloating mtr_t::m_memo as well as performing redundant buf_pool.page_hash lookups.

For non-QUICK CHECK TABLE as well as for B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().

btr_cur_search_to_nth_level(): This will only be used for non-leaf (level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be removed altogether, or retained for the case of CHECK TABLE without QUICK.

btr_cur_t::left_block: Remove. btr_pcur_move_backward_from_page() can retrieve the left sibling from the end of mtr_t::m_memo.

btr_cur_t::open_leaf(): Some clean-up.

btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level() for searches to level=0 (the leaf level). We will never release parent page latches before acquiring leaf page latches. If we need to temporarily release the level=1 page latch in the BTR_SEARCH_PREV or BTR_MODIFY_PREV latch_mode, we will reposition the cursor on the child node pointer so that we will land on the correct leaf page.

btr_cur_t::pessimistic_search_leaf(): Implement the new BTR_MODIFY_TREE latching logic in the case that page splits or merges will be needed. The parent pages (and their siblings) should already be latched on the first dive to the leaf and be present in mtr_t::m_memo; there should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost suffices; it must be revised in MDEV-29835, and work-arounds removed for cases where mtr_t::get_already_latched() fails to find a block.

rtr_search_to_nth_level(): A SPATIAL INDEX version of btr_search_to_nth_level() that can search to any level (including the leaf level).

rtr_search_leaf(), rtr_insert_leaf(): Wrappers for rtr_search_to_nth_level().

rtr_search(): Replaces rtr_pcur_open().

rtr_latch_leaves(): Replaces btr_cur_latch_leaves(). Note that unlike in the B-tree code, there is no error handling in case the sibling pages are corrupted.

rtr_cur_restore_position(): Remove an unused constant parameter.

btr_pcur_open_on_user_rec(): Remove the constant parameter mode=PAGE_CUR_GE.

row_ins_clust_index_entry_low(): Use a new mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page when mode!=BTR_MODIFY_TREE, in order to write the PAGE_ROOT_AUTO_INC.

BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.

BTR_CONT_MODIFY_TREE: Note that this is only used by rtr_search_to_nth_level().

btr_pcur_optimistic_latch_leaves(): Replaces btr_cur_optimistic_latch_leaves().

ibuf_delete_rec(): Acquire exclusive ibuf.index->lock in order to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).

btr_blob_log_check_t(): Acquire a U latch on the root page, so that btr_page_alloc() in btr_store_big_rec_extern_fields() will avoid a deadlock.

btr_store_big_rec_extern_fields(): Assert that the root page latch is being held.

Tested by: Matthias Leich
Reviewed by: Vladislav Lesin
275 lines
8.6 KiB
C++
/*****************************************************************************

Copyright (c) 1997, 2015, Oracle and/or its affiliates. All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

*****************************************************************************/

/**************************************************//**
@file include/ibuf0ibuf.ic
Insert buffer

Created 7/19/1997 Heikki Tuuri
*******************************************************/

#include "page0page.h"
#include "page0zip.h"
#include "fsp0types.h"
#include "buf0lru.h"

/** An index page must contain at least srv_page_size /
IBUF_PAGE_SIZE_PER_FREE_SPACE bytes of free space for ibuf to try to
buffer inserts to this page. If there is this much of free space, the
corresponding bits are set in the ibuf bitmap. */
#define IBUF_PAGE_SIZE_PER_FREE_SPACE 32
/***************************************************************//**
Starts an insert buffer mini-transaction. */
UNIV_INLINE
void
ibuf_mtr_start(
/*===========*/
	mtr_t*	mtr)	/*!< out: mini-transaction */
{
	mtr_start(mtr);
	mtr->enter_ibuf();

	if (high_level_read_only || srv_read_only_mode) {
		mtr_set_log_mode(mtr, MTR_LOG_NO_REDO);
	}
}

/***************************************************************//**
Commits an insert buffer mini-transaction. */
UNIV_INLINE
void
ibuf_mtr_commit(
/*============*/
	mtr_t*	mtr)	/*!< in/out: mini-transaction */
{
	ut_ad(mtr->is_inside_ibuf());
	ut_d(mtr->exit_ibuf());

	mtr_commit(mtr);
}
/************************************************************************//**
Sets the free bit of the page in the ibuf bitmap. This is done in a separate
mini-transaction, hence this operation does not restrict further work to only
ibuf bitmap operations, which would result if the latch to the bitmap page
were kept. */
void
ibuf_set_free_bits_func(
/*====================*/
	buf_block_t*	block,	/*!< in: index page of a non-clustered index;
				free bit is reset if page level is 0 */
#ifdef UNIV_IBUF_DEBUG
	ulint		max_val,/*!< in: ULINT_UNDEFINED or a maximum
				value which the bits must have before
				setting; this is for debugging */
#endif /* UNIV_IBUF_DEBUG */
	ulint		val);	/*!< in: value to set: < 4 */
#ifdef UNIV_IBUF_DEBUG
# define ibuf_set_free_bits(b,v,max) ibuf_set_free_bits_func(b,max,v)
#else /* UNIV_IBUF_DEBUG */
# define ibuf_set_free_bits(b,v,max) ibuf_set_free_bits_func(b,v)
#endif /* UNIV_IBUF_DEBUG */

/**********************************************************************//**
A basic partial test if an insert to the insert buffer could be possible and
recommended. */
UNIV_INLINE
ibool
ibuf_should_try(
/*============*/
	dict_index_t*	index,			/*!< in: index where to insert */
	ulint		ignore_sec_unique)	/*!< in: if != 0, we should
						ignore UNIQUE constraint on
						a secondary index when we
						decide */
{
	return(innodb_change_buffering
	       && !(index->type & (DICT_CLUSTERED | DICT_IBUF))
	       && ibuf.max_size != 0
	       && index->table->quiesce == QUIESCE_NONE
	       && (ignore_sec_unique || !dict_index_is_unique(index)));
}
/******************************************************************//**
Returns TRUE if the current OS thread is performing an insert buffer
routine.

For instance, a read-ahead of non-ibuf pages is forbidden by threads
that are executing an insert buffer routine.
@return TRUE if inside an insert buffer routine */
UNIV_INLINE
ibool
ibuf_inside(
/*========*/
	const mtr_t*	mtr)	/*!< in: mini-transaction */
{
	return(mtr->is_inside_ibuf());
}

/** Translates the free space on a page to a value in the ibuf bitmap.
@param[in]	page_size	page size in bytes
@param[in]	max_ins_size	maximum insert size after reorganize for
the page
@return value for ibuf bitmap bits */
UNIV_INLINE
ulint
ibuf_index_page_calc_free_bits(
	ulint	page_size,
	ulint	max_ins_size)
{
	ulint	n;
	ut_ad(ut_is_2pow(page_size));
	ut_ad(page_size > IBUF_PAGE_SIZE_PER_FREE_SPACE);

	n = max_ins_size / (page_size / IBUF_PAGE_SIZE_PER_FREE_SPACE);

	if (n == 3) {
		n = 2;
	}

	if (n > 3) {
		n = 3;
	}

	return(n);
}
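The mapping above can be checked outside the server with a stand-alone copy of the arithmetic. This is a sketch, not InnoDB code: calc_free_bits is a hypothetical name, and IBUF_PAGE_SIZE_PER_FREE_SPACE is hard-coded as 32.

```cpp
#include <cassert>

// Stand-alone copy of the arithmetic in ibuf_index_page_calc_free_bits():
// free space is measured in units of page_size / 32; an intermediate
// value of exactly 3 is rounded down to 2, and the result saturates at 3.
static unsigned long calc_free_bits(unsigned long page_size,
                                    unsigned long max_ins_size)
{
	unsigned long n = max_ins_size / (page_size / 32);

	if (n == 3) {
		n = 2;
	}

	return n > 3 ? 3 : n;
}
```

With a 16384-byte page the unit is 512 bytes: 1000 bytes of free space maps to 1, 2000 bytes gives the intermediate value 3 and is rounded down to 2, and anything from 2048 bytes upwards saturates at 3.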
/*********************************************************************//**
Translates the free space on a compressed page to a value in the ibuf bitmap.
@return value for ibuf bitmap bits */
UNIV_INLINE
ulint
ibuf_index_page_calc_free_zip(
/*==========================*/
	const buf_block_t*	block)	/*!< in: buffer block */
{
	ulint			max_ins_size;
	const page_zip_des_t*	page_zip;
	lint			zip_max_ins;

	ut_ad(block->page.zip.data);

	/* Consider the maximum insert size on the uncompressed page
	without reorganizing the page. We must not assume anything
	about the compression ratio. If zip_max_ins > max_ins_size and
	there is 1/4 garbage on the page, recompression after the
	reorganize could fail, in theory. So, let us guarantee that
	merging a buffered insert to a compressed page will always
	succeed without reorganizing or recompressing the page, just
	by using the page modification log. */
	max_ins_size = page_get_max_insert_size(
		buf_block_get_frame(block), 1);

	page_zip = buf_block_get_page_zip(block);
	zip_max_ins = page_zip_max_ins_size(page_zip,
					    FALSE/* not clustered */);

	if (zip_max_ins < 0) {
		return(0);
	} else if (max_ins_size > (ulint) zip_max_ins) {
		max_ins_size = (ulint) zip_max_ins;
	}

	return(ibuf_index_page_calc_free_bits(block->physical_size(),
					      max_ins_size));
}

/*********************************************************************//**
Translates the free space on a page to a value in the ibuf bitmap.
@return value for ibuf bitmap bits */
UNIV_INLINE
ulint
ibuf_index_page_calc_free(
/*======================*/
	const buf_block_t*	block)	/*!< in: buffer block */
{
	if (!block->page.zip.data) {
		ulint	max_ins_size;

		max_ins_size = page_get_max_insert_size_after_reorganize(
			buf_block_get_frame(block), 1);

		return(ibuf_index_page_calc_free_bits(
				block->physical_size(), max_ins_size));
	} else {
		return(ibuf_index_page_calc_free_zip(block));
	}
}
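The clamp applied in ibuf_index_page_calc_free_zip() above can be sketched in isolation. usable_ins_size is a hypothetical helper, not InnoDB code; it only models how the two limits combine before the result is translated to bitmap bits.

```cpp
#include <cassert>

// Model of the clamp in ibuf_index_page_calc_free_zip(): the insert size
// that may be buffered is limited by both the uncompressed page (without
// reorganization) and the compressed page; a negative compressed limit
// means no buffered insert could be merged, so the result is 0.
static unsigned long usable_ins_size(unsigned long max_ins_size,
                                     long zip_max_ins)
{
	if (zip_max_ins < 0) {
		return 0;
	}
	return max_ins_size > (unsigned long) zip_max_ins
		? (unsigned long) zip_max_ins
		: max_ins_size;
}
```

Taking the minimum of the two limits is what guarantees that a buffered insert can be merged using only the page modification log, without reorganizing or recompressing the page.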
/************************************************************************//**
Updates the free bits of an uncompressed page in the ibuf bitmap if
there is not enough free space on the page any more. This is done in a
separate mini-transaction, hence this operation does not restrict
further work to only ibuf bitmap operations, which would result if the
latch to the bitmap page were kept. NOTE: The free bits in the insert
buffer bitmap must never exceed the free space on a page. It is
unsafe to increment the bits in a separately committed
mini-transaction, because in crash recovery, the free bits could
momentarily be set too high. It is only safe to use this function for
decrementing the free bits. Should more free space become available,
we must not update the free bits here, because that would break crash
recovery. */
UNIV_INLINE
void
ibuf_update_free_bits_if_full(
/*==========================*/
	buf_block_t*	block,	/*!< in: index page to which we have added new
				records; the free bits are updated if the
				index is non-clustered and non-unique and
				the page level is 0, and the page becomes
				fuller */
	ulint		max_ins_size,/*!< in: value of maximum insert size with
				reorganize before the latest operation
				performed to the page */
	ulint		increase)/*!< in: upper limit for the additional space
				used in the latest operation, if known, or
				ULINT_UNDEFINED */
{
	ulint	before;
	ulint	after;

	ut_ad(buf_block_get_page_zip(block) == NULL);

	before = ibuf_index_page_calc_free_bits(
		srv_page_size, max_ins_size);

	if (max_ins_size >= increase) {
		compile_time_assert(ULINT32_UNDEFINED > UNIV_PAGE_SIZE_MAX);
		after = ibuf_index_page_calc_free_bits(
			srv_page_size, max_ins_size - increase);
#ifdef UNIV_IBUF_DEBUG
		ut_a(after <= ibuf_index_page_calc_free(block));
#endif
	} else {
		after = ibuf_index_page_calc_free(block);
	}

	if (after == 0) {
		/* We move the page to the front of the buffer pool LRU list:
		the purpose of this is to prevent those pages to which we
		cannot make inserts using the insert buffer from slipping
		out of the buffer pool */

		buf_page_make_young(&block->page);
	}

	if (before > after) {
		ibuf_set_free_bits(block, after, before);
	}
}
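The decrement-only rule above can be condensed into a stand-alone model. bits_for and maybe_lower_bits are hypothetical names; the real function recomputes the `after` value from the page itself when `increase` is unknown (ULINT_UNDEFINED), whereas this sketch conservatively treats that case as a full page.

```cpp
#include <cassert>

// Same arithmetic as ibuf_index_page_calc_free_bits(), with the unit
// page_size / 32 and saturation at 3.
static unsigned long bits_for(unsigned long page_size, unsigned long free_bytes)
{
	unsigned long n = free_bytes / (page_size / 32);
	if (n == 3) {
		n = 2;
	}
	return n > 3 ? 3 : n;
}

// Condensed model of the decision in ibuf_update_free_bits_if_full():
// recompute the bitmap value before and after an insert, and only ever
// lower it. Raising the free bits in a separately committed
// mini-transaction would be unsafe for crash recovery, because the bits
// could momentarily exceed the actual free space on the page.
static unsigned long maybe_lower_bits(unsigned long page_size,
                                      unsigned long max_ins_size,
                                      unsigned long increase)
{
	unsigned long before = bits_for(page_size, max_ins_size);
	unsigned long after = increase <= max_ins_size
		? bits_for(page_size, max_ins_size - increase)
		: 0;	// unknown increase: conservatively assume full
	return before > after ? after : before;	// never increase the bits
}
```

For a 16384-byte page, inserting 7000 bytes into 8192 bytes of free space lowers the bits from 3 to 2, while a 100-byte insert into 1000 free bytes leaves them at 1.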