mariadb/storage/innobase/include/ibuf0ibuf.inl
Marko Mäkelä de4030e4d4 MDEV-30400 Assertion height == btr_page_get_level(...) on INSERT
This also fixes part of MDEV-29835 Partial server freeze
which is caused by violations of the latching order that was
defined in https://dev.mysql.com/worklog/task/?id=6326
(WL#6326: InnoDB: fix index->lock contention). Unless the
current thread is holding an exclusive dict_index_t::lock,
it must acquire page latches in a strict parent-to-child,
left-to-right order. Not all cases of MDEV-29835 are fixed yet.
Failure to follow the correct latching order will cause deadlocks
of threads due to lock order inversion.

As part of these changes, the BTR_MODIFY_TREE mode is modified
so that an Update latch (U a.k.a. SX) will be acquired on the
root page, and eXclusive latches (X) will be acquired on all pages
leading to the leaf page, as well as any left and right siblings
of the pages along the path. The DEBUG_SYNC test innodb.innodb_wl6326
will be removed, because at the time the DEBUG_SYNC point is hit,
the thread is actually holding several page latches that will be
blocking a concurrent SELECT statement.

We also remove double bookkeeping that was caused due to excessive
information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo
store information of latched pages, and ensure that
mtr_memo_slot_t::object is never a null pointer.
The tree_blocks[] and tree_savepoints[] were redundant.

buf_page_get_low(): If innodb_change_buffering_debug=1, to avoid
a hang, do not try to evict blocks if we are holding a latch on
a modified page. The test innodb.innodb-change-buffer-recovery
will be removed, because change buffering may no longer be forced
by debug injection when the change buffer comprises multiple pages.
Remove a debug assertion that could fail when
innodb_change_buffering_debug=1 fails to evict a page.
For other cases, the assertion is redundant, because we already
checked that right after the got_block: label. The test
innodb.innodb-change-buffering-recovery will be removed, because
due to this change, we will be unable to evict the desired page.

mtr_t::lock_register(): Register a change of a page latch
on an unmodified buffer-fixed block.

mtr_t::x_latch_at_savepoint(), mtr_t::sx_latch_at_savepoint():
Replaced by the use of mtr_t::upgrade_buffer_fix(), which now
also handles RW_S_LATCH.

mtr_t::set_modified(): For temporary tables, invoke
buf_page_t::set_modified() here and not in mtr_t::commit().
We will never set the MTR_MEMO_MODIFY flag on other than
persistent data pages, nor set mtr_t::m_modifications when
temporary data pages are modified.

mtr_t::commit(): Only invoke the buf_flush_note_modification() loop
if persistent data pages were modified.

mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo.
This avoids many redundant entries in mtr_t::m_memo, as well as
redundant calls to buf_page_get_gen() for blocks that had already
been looked up in a mini-transaction.

btr_get_latched_root(): Return a pointer to an already latched root page.
This replaces btr_root_block_get() in cases where the mini-transaction
has already latched the root page.

btr_page_get_parent(): Fetch a parent page that was already latched
in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched().
If needed, upgrade the root page U latch to X.
This avoids bloating mtr_t::m_memo as well as performing redundant
buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for
B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().

btr_cur_search_to_nth_level(): This will only be used for non-leaf
(level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE
or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be
removed altogether, or retained for the case of
CHECK TABLE without QUICK.

btr_cur_t::left_block: Remove. btr_pcur_move_backward_from_page()
can retrieve the left sibling from the end of mtr_t::m_memo.

btr_cur_t::open_leaf(): Some clean-up.

btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level()
for searches to level=0 (the leaf level). We will never release
parent page latches before acquiring leaf page latches. If we need to
temporarily release the level=1 page latch in the BTR_SEARCH_PREV or
BTR_MODIFY_PREV latch_mode, we will reposition the cursor on the
child node pointer so that we will land on the correct leaf page.

btr_cur_t::pessimistic_search_leaf(): Implement new BTR_MODIFY_TREE
latching logic in the case that page splits or merges will be needed.
The parent pages (and their siblings) should already be latched on
the first dive to the leaf and be present in mtr_t::m_memo; there
should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost
suffices; it must be revised in MDEV-29835 and work-arounds removed
for cases where mtr_t::get_already_latched() fails to find a block.

rtr_search_to_nth_level(): A SPATIAL INDEX version of
btr_search_to_nth_level() that can search to any level
(including the leaf level).

rtr_search_leaf(), rtr_insert_leaf(): Wrappers for
rtr_search_to_nth_level().

rtr_search(): Replaces rtr_pcur_open().

rtr_latch_leaves(): Replaces btr_cur_latch_leaves(). Note that unlike
in the B-tree code, there is no error handling in case the sibling
pages are corrupted.

rtr_cur_restore_position(): Remove an unused constant parameter.

btr_pcur_open_on_user_rec(): Remove the constant parameter
mode=PAGE_CUR_GE.

row_ins_clust_index_entry_low(): Use a new
mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page
when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC.

BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.

BTR_CONT_MODIFY_TREE: Note that this is only used by
rtr_search_to_nth_level().

btr_pcur_optimistic_latch_leaves(): Replaces
btr_cur_optimistic_latch_leaves().

ibuf_delete_rec(): Acquire exclusive ibuf.index->lock in order
to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).

btr_blob_log_check_t(): Acquire a U latch on the root page,
so that btr_page_alloc() in btr_store_big_rec_extern_fields()
will avoid a deadlock.

btr_store_big_rec_extern_fields(): Assert that the root page latch
is being held.

Tested by: Matthias Leich
Reviewed by: Vladislav Lesin
2023-01-24 14:09:21 +02:00

275 lines
8.6 KiB
C++

/*****************************************************************************
Copyright (c) 1997, 2015, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA
*****************************************************************************/
/**************************************************//**
@file include/ibuf0ibuf.ic
Insert buffer
Created 7/19/1997 Heikki Tuuri
*******************************************************/
#include "page0page.h"
#include "page0zip.h"
#include "fsp0types.h"
#include "buf0lru.h"
/** An index page must contain at least srv_page_size /
IBUF_PAGE_SIZE_PER_FREE_SPACE bytes of free space for ibuf to try to
buffer inserts to this page. If there is this much of free space, the
corresponding bits are set in the ibuf bitmap. */
#define IBUF_PAGE_SIZE_PER_FREE_SPACE 32
/***************************************************************//**
Starts an insert buffer mini-transaction. */
UNIV_INLINE
void
ibuf_mtr_start(
/*===========*/
mtr_t* mtr) /*!< out: mini-transaction */
{
mtr_start(mtr);
mtr->enter_ibuf();
if (high_level_read_only || srv_read_only_mode) {
mtr_set_log_mode(mtr, MTR_LOG_NO_REDO);
}
}
/***************************************************************//**
Commits an insert buffer mini-transaction. */
UNIV_INLINE
void
ibuf_mtr_commit(
/*============*/
mtr_t* mtr) /*!< in/out: mini-transaction */
{
ut_ad(mtr->is_inside_ibuf());
ut_d(mtr->exit_ibuf());
mtr_commit(mtr);
}
/************************************************************************//**
Sets the free bit of the page in the ibuf bitmap. This is done in a separate
mini-transaction, hence this operation does not restrict further work to only
ibuf bitmap operations, which would result if the latch to the bitmap page
were kept. */
void
ibuf_set_free_bits_func(
/*====================*/
buf_block_t* block, /*!< in: index page of a non-clustered index;
free bit is reset if page level is 0 */
#ifdef UNIV_IBUF_DEBUG
ulint max_val,/*!< in: ULINT_UNDEFINED or a maximum
value which the bits must have before
setting; this is for debugging */
#endif /* UNIV_IBUF_DEBUG */
ulint val); /*!< in: value to set: < 4 */
#ifdef UNIV_IBUF_DEBUG
# define ibuf_set_free_bits(b,v,max) ibuf_set_free_bits_func(b,max,v)
#else /* UNIV_IBUF_DEBUG */
# define ibuf_set_free_bits(b,v,max) ibuf_set_free_bits_func(b,v)
#endif /* UNIV_IBUF_DEBUG */
/**********************************************************************//**
A basic partial test if an insert to the insert buffer could be possible and
recommended. */
UNIV_INLINE
ibool
ibuf_should_try(
/*============*/
dict_index_t* index, /*!< in: index where to insert */
ulint ignore_sec_unique) /*!< in: if != 0, we should
ignore UNIQUE constraint on
a secondary index when we
decide */
{
return(innodb_change_buffering
&& !(index->type & (DICT_CLUSTERED | DICT_IBUF))
&& ibuf.max_size != 0
&& index->table->quiesce == QUIESCE_NONE
&& (ignore_sec_unique || !dict_index_is_unique(index)));
}
/******************************************************************//**
Returns TRUE if the current OS thread is performing an insert buffer
routine.
For instance, a read-ahead of non-ibuf pages is forbidden by threads
that are executing an insert buffer routine.
@return TRUE if inside an insert buffer routine */
UNIV_INLINE
ibool
ibuf_inside(
/*========*/
const mtr_t* mtr) /*!< in: mini-transaction */
{
return(mtr->is_inside_ibuf());
}
/** Translates the free space on a page to a value in the ibuf bitmap.
@param[in] page_size page size in bytes
@param[in] max_ins_size maximum insert size after reorganize for
the page
@return value for ibuf bitmap bits */
UNIV_INLINE
ulint
ibuf_index_page_calc_free_bits(
ulint page_size,
ulint max_ins_size)
{
ulint n;
ut_ad(ut_is_2pow(page_size));
ut_ad(page_size > IBUF_PAGE_SIZE_PER_FREE_SPACE);
n = max_ins_size / (page_size / IBUF_PAGE_SIZE_PER_FREE_SPACE);
if (n == 3) {
n = 2;
}
if (n > 3) {
n = 3;
}
return(n);
}
/*********************************************************************//**
Translates the free space on a compressed page to a value in the ibuf bitmap.
@return value for ibuf bitmap bits */
UNIV_INLINE
ulint
ibuf_index_page_calc_free_zip(
/*==========================*/
const buf_block_t* block) /*!< in: buffer block */
{
ulint max_ins_size;
const page_zip_des_t* page_zip;
lint zip_max_ins;
ut_ad(block->page.zip.data);
/* Consider the maximum insert size on the uncompressed page
without reorganizing the page. We must not assume anything
about the compression ratio. If zip_max_ins > max_ins_size and
there is 1/4 garbage on the page, recompression after the
reorganize could fail, in theory. So, let us guarantee that
merging a buffered insert to a compressed page will always
succeed without reorganizing or recompressing the page, just
by using the page modification log. */
max_ins_size = page_get_max_insert_size(
buf_block_get_frame(block), 1);
page_zip = buf_block_get_page_zip(block);
zip_max_ins = page_zip_max_ins_size(page_zip,
FALSE/* not clustered */);
if (zip_max_ins < 0) {
return(0);
} else if (max_ins_size > (ulint) zip_max_ins) {
max_ins_size = (ulint) zip_max_ins;
}
return(ibuf_index_page_calc_free_bits(block->physical_size(),
max_ins_size));
}
/*********************************************************************//**
Translates the free space on a page to a value in the ibuf bitmap.
@return value for ibuf bitmap bits */
UNIV_INLINE
ulint
ibuf_index_page_calc_free(
/*======================*/
const buf_block_t* block) /*!< in: buffer block */
{
if (!block->page.zip.data) {
ulint max_ins_size;
max_ins_size = page_get_max_insert_size_after_reorganize(
buf_block_get_frame(block), 1);
return(ibuf_index_page_calc_free_bits(
block->physical_size(), max_ins_size));
} else {
return(ibuf_index_page_calc_free_zip(block));
}
}
/************************************************************************//**
Updates the free bits of an uncompressed page in the ibuf bitmap if
there is not enough free on the page any more. This is done in a
separate mini-transaction, hence this operation does not restrict
further work to only ibuf bitmap operations, which would result if the
latch to the bitmap page were kept. NOTE: The free bits in the insert
buffer bitmap must never exceed the free space on a page. It is
unsafe to increment the bits in a separately committed
mini-transaction, because in crash recovery, the free bits could
momentarily be set too high. It is only safe to use this function for
decrementing the free bits. Should more free space become available,
we must not update the free bits here, because that would break crash
recovery. */
UNIV_INLINE
void
ibuf_update_free_bits_if_full(
/*==========================*/
buf_block_t* block, /*!< in: index page to which we have added new
records; the free bits are updated if the
index is non-clustered and non-unique and
the page level is 0, and the page becomes
fuller */
ulint max_ins_size,/*!< in: value of maximum insert size with
reorganize before the latest operation
performed to the page */
ulint increase)/*!< in: upper limit for the additional space
used in the latest operation, if known, or
ULINT_UNDEFINED */
{
ulint before;
ulint after;
ut_ad(buf_block_get_page_zip(block) == NULL);
before = ibuf_index_page_calc_free_bits(
srv_page_size, max_ins_size);
if (max_ins_size >= increase) {
compile_time_assert(ULINT32_UNDEFINED > UNIV_PAGE_SIZE_MAX);
after = ibuf_index_page_calc_free_bits(
srv_page_size, max_ins_size - increase);
#ifdef UNIV_IBUF_DEBUG
ut_a(after <= ibuf_index_page_calc_free(block));
#endif
} else {
after = ibuf_index_page_calc_free(block);
}
if (after == 0) {
/* We move the page to the front of the buffer pool LRU list:
the purpose of this is to prevent those pages to which we
cannot make inserts using the insert buffer from slipping
out of the buffer pool */
buf_page_make_young(&block->page);
}
if (before > after) {
ibuf_set_free_bits(block, after, before);
}
}