mirror of
https://github.com/MariaDB/server.git
synced 2025-01-17 04:22:27 +01:00
de4030e4d4
This also fixes part of MDEV-29835 Partial server freeze, which is caused by violations of the latching order that was defined in https://dev.mysql.com/worklog/task/?id=6326 (WL#6326: InnoDB: fix index->lock contention). Unless the current thread is holding an exclusive dict_index_t::lock, it must acquire page latches in a strict parent-to-child, left-to-right order. Not all cases of MDEV-29835 are fixed yet. Failure to follow the correct latching order will cause threads to deadlock due to lock order inversion.

As part of these changes, the BTR_MODIFY_TREE mode is modified so that an Update latch (U a.k.a. SX) will be acquired on the root page, and eXclusive latches (X) will be acquired on all pages leading to the leaf page, as well as any left and right siblings of the pages along the path. The DEBUG_SYNC test innodb.innodb_wl6326 will be removed, because at the time the DEBUG_SYNC point is hit, the thread is actually holding several page latches that will be blocking a concurrent SELECT statement.

We also remove double bookkeeping that was caused by excessive information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo store information of latched pages, and ensure that mtr_memo_slot_t::object is never a null pointer. The tree_blocks[] and tree_savepoints[] were redundant.

buf_page_get_low(): If innodb_change_buffering_debug=1, to avoid a hang, do not try to evict blocks if we are holding a latch on a modified page. The test innodb.innodb-change-buffer-recovery will be removed, because change buffering may no longer be forced by debug injection when the change buffer comprises multiple pages. Remove a debug assertion that could fail when innodb_change_buffering_debug=1 fails to evict a page. For other cases, the assertion is redundant, because we already checked that right after the got_block: label. The test innodb.innodb-change-buffering-recovery will be removed, because due to this change, we will be unable to evict the desired page.

mtr_t::lock_register(): Register a change of a page latch on an unmodified buffer-fixed block.

mtr_t::x_latch_at_savepoint(), mtr_t::sx_latch_at_savepoint(): Replaced by the use of mtr_t::upgrade_buffer_fix(), which now also handles RW_S_LATCH.

mtr_t::set_modified(): For temporary tables, invoke buf_page_t::set_modified() here and not in mtr_t::commit(). We will never set the MTR_MEMO_MODIFY flag on other than persistent data pages, nor set mtr_t::m_modifications when temporary data pages are modified.

mtr_t::commit(): Only invoke the buf_flush_note_modification() loop if persistent data pages were modified.

mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo. This avoids many redundant entries in mtr_t::m_memo, as well as redundant calls to buf_page_get_gen() for blocks that had already been looked up in a mini-transaction.

btr_get_latched_root(): Return a pointer to an already latched root page. This replaces btr_root_block_get() in cases where the mini-transaction has already latched the root page.

btr_page_get_parent(): Fetch a parent page that was already latched in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched(). If needed, upgrade the root page U latch to X. This avoids bloating mtr_t::m_memo as well as performing redundant buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().

btr_cur_search_to_nth_level(): This will only be used for non-leaf (level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be removed altogether, or retained for the case of CHECK TABLE without QUICK.

btr_cur_t::left_block: Remove. btr_pcur_move_backward_from_page() can retrieve the left sibling from the end of mtr_t::m_memo.

btr_cur_t::open_leaf(): Some clean-up.

btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level() for searches to level=0 (the leaf level). We will never release parent page latches before acquiring leaf page latches. If we need to temporarily release the level=1 page latch in the BTR_SEARCH_PREV or BTR_MODIFY_PREV latch_mode, we will reposition the cursor on the child node pointer so that we will land on the correct leaf page.

btr_cur_t::pessimistic_search_leaf(): Implement new BTR_MODIFY_TREE latching logic in the case that page splits or merges will be needed. The parent pages (and their siblings) should already be latched on the first dive to the leaf and be present in mtr_t::m_memo; there should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost suffices; it must be revised in MDEV-29835 and work-arounds removed for cases where mtr_t::get_already_latched() fails to find a block.

rtr_search_to_nth_level(): A SPATIAL INDEX version of btr_cur_search_to_nth_level() that can search to any level (including the leaf level).

rtr_search_leaf(), rtr_insert_leaf(): Wrappers for rtr_search_to_nth_level().

rtr_search(): Replaces rtr_pcur_open().

rtr_latch_leaves(): Replaces btr_cur_latch_leaves(). Note that unlike in the B-tree code, there is no error handling in case the sibling pages are corrupted.

rtr_cur_restore_position(): Remove an unused constant parameter.

btr_pcur_open_on_user_rec(): Remove the constant parameter mode=PAGE_CUR_GE.

row_ins_clust_index_entry_low(): Use a new mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC.

BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.

BTR_CONT_MODIFY_TREE: Note that this is only used by rtr_search_to_nth_level().

btr_pcur_optimistic_latch_leaves(): Replaces btr_cur_optimistic_latch_leaves().

ibuf_delete_rec(): Acquire exclusive ibuf.index->lock in order to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).

btr_blob_log_check_t(): Acquire a U latch on the root page, so that btr_page_alloc() in btr_store_big_rec_extern_fields() will avoid a deadlock.

btr_store_big_rec_extern_fields(): Assert that the root page latch is being held.

Tested by: Matthias Leich
Reviewed by: Vladislav Lesin
636 lines
20 KiB
C++
/*****************************************************************************

Copyright (c) 2019, 2023, MariaDB Corporation.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

*****************************************************************************/

/**
@file include/mtr0log.h
Mini-transaction log record encoding and decoding
*******************************************************/

#pragma once
#include "mtr0mtr.h"

/** The smallest invalid page identifier for persistent tablespaces */
constexpr page_id_t end_page_id{SRV_SPACE_ID_UPPER_BOUND, 0};

/** The minimum 2-byte integer (0b10xxxxxx xxxxxxxx) */
constexpr uint32_t MIN_2BYTE= 1 << 7;
/** The minimum 3-byte integer (0b110xxxxx xxxxxxxx xxxxxxxx) */
constexpr uint32_t MIN_3BYTE= MIN_2BYTE + (1 << 14);
/** The minimum 4-byte integer (0b1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx) */
constexpr uint32_t MIN_4BYTE= MIN_3BYTE + (1 << 21);
/** Minimum 5-byte integer (0b11110000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx) */
constexpr uint32_t MIN_5BYTE= MIN_4BYTE + (1 << 28);

/** Error from mlog_decode_varint() */
constexpr uint32_t MLOG_DECODE_ERROR= ~0U;

/** Decode the length of a variable-length encoded integer.
@param first  first byte of the encoded integer
@return the length, in bytes */
inline uint8_t mlog_decode_varint_length(byte first)
{
  uint8_t len= 1;
  for (; first & 0x80; len++, first= static_cast<uint8_t>(first << 1));
  return len;
}

/** Decode an integer in a redo log record.
@param log    redo log record buffer
@return the decoded integer
@retval MLOG_DECODE_ERROR on error */
inline uint32_t mlog_decode_varint(const byte* log)
{
  uint32_t i= *log;
  if (i < MIN_2BYTE)
    return i;
  if (i < 0xc0)
    return MIN_2BYTE + ((i & ~0x80) << 8 | log[1]);
  if (i < 0xe0)
    return MIN_3BYTE + ((i & ~0xc0) << 16 | uint32_t{log[1]} << 8 | log[2]);
  if (i < 0xf0)
    return MIN_4BYTE + ((i & ~0xe0) << 24 | uint32_t{log[1]} << 16 |
                        uint32_t{log[2]} << 8 | log[3]);
  if (i == 0xf0)
  {
    i= uint32_t{log[1]} << 24 | uint32_t{log[2]} << 16 |
      uint32_t{log[3]} << 8 | log[4];
    if (i <= ~MIN_5BYTE)
      return MIN_5BYTE + i;
  }
  return MLOG_DECODE_ERROR;
}

/** Encode an integer in a redo log record.
@param log  redo log record buffer
@param i    the integer to encode
@return end of the encoded integer */
inline byte *mlog_encode_varint(byte *log, size_t i)
{
#if defined __GNUC__ && !defined __clang__ && __GNUC__ < 6
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wconversion" /* GCC 4 and 5 need this here */
#endif
  if (i < MIN_2BYTE)
  {
  }
  else if (i < MIN_3BYTE)
  {
    i-= MIN_2BYTE;
    static_assert(MIN_3BYTE - MIN_2BYTE == 1 << 14, "compatibility");
    *log++= 0x80 | static_cast<byte>(i >> 8);
  }
  else if (i < MIN_4BYTE)
  {
    i-= MIN_3BYTE;
    static_assert(MIN_4BYTE - MIN_3BYTE == 1 << 21, "compatibility");
    *log++= 0xc0 | static_cast<byte>(i >> 16);
    goto last2;
  }
  else if (i < MIN_5BYTE)
  {
    i-= MIN_4BYTE;
    static_assert(MIN_5BYTE - MIN_4BYTE == 1 << 28, "compatibility");
    *log++= 0xe0 | static_cast<byte>(i >> 24);
    goto last3;
  }
  else
  {
    ut_ad(i < MLOG_DECODE_ERROR);
    i-= MIN_5BYTE;
    *log++= 0xf0;
    *log++= static_cast<byte>(i >> 24);
last3:
    *log++= static_cast<byte>(i >> 16);
last2:
    *log++= static_cast<byte>(i >> 8);
  }
#if defined __GNUC__ && !defined __clang__ && __GNUC__ < 6
# pragma GCC diagnostic pop
#endif
  *log++= static_cast<byte>(i);
  return log;
}

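/* Illustration only (not part of mtr0log.h): a standalone sketch of the
variable-length integer format handled by mlog_encode_varint() and
mlog_decode_varint(). The number of leading 1 bits in the first byte gives
the number of extra bytes, and each length class starts where the previous
one ends, so every value has exactly one encoding. The names sketch_encode(),
sketch_decode() and sketch_round_trip() and the kMin* constants are
hypothetical stand-ins, and plain uint8_t replaces the InnoDB byte type. */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standalone copies of the MIN_nBYTE thresholds.
constexpr uint32_t kMin2= 1 << 7;
constexpr uint32_t kMin3= kMin2 + (1 << 14);
constexpr uint32_t kMin4= kMin3 + (1 << 21);
constexpr uint32_t kMin5= kMin4 + (1 << 28);

// Encode v into out (room for 5 bytes); return the number of bytes written.
inline size_t sketch_encode(uint8_t *out, uint32_t v)
{
  size_t n;
  if (v < kMin2) n= 1;
  else if (v < kMin3) { v-= kMin2; n= 2; }
  else if (v < kMin4) { v-= kMin3; n= 3; }
  else if (v < kMin5) { v-= kMin4; n= 4; }
  else { v-= kMin5; n= 5; }
  // Length prefix: (n - 1) one-bits followed by a zero bit.
  static const uint8_t prefix[]= {0, 0x80, 0xc0, 0xe0, 0xf0};
  // Store the trailing bytes, least significant byte last.
  for (size_t k= n; k-- > 1; v>>= 8)
    out[k]= static_cast<uint8_t>(v);
  // The remaining high bits share the first byte with the prefix.
  out[0]= n == 5 ? prefix[4] : static_cast<uint8_t>(prefix[n - 1] | v);
  return n;
}

// Decode an integer written by sketch_encode(); ~0U signals an error.
inline uint32_t sketch_decode(const uint8_t *in)
{
  uint32_t i= in[0];
  if (i < 0x80)
    return i;
  if (i < 0xc0)
    return kMin2 + ((i & 0x3f) << 8 | in[1]);
  if (i < 0xe0)
    return kMin3 + ((i & 0x1f) << 16 | uint32_t{in[1]} << 8 | in[2]);
  if (i < 0xf0)
    return kMin4 + ((i & 0x0f) << 24 | uint32_t{in[1]} << 16 |
                    uint32_t{in[2]} << 8 | in[3]);
  if (i == 0xf0)
  {
    i= uint32_t{in[1]} << 24 | uint32_t{in[2]} << 16 |
      uint32_t{in[3]} << 8 | in[4];
    if (i <= ~kMin5)
      return kMin5 + i;
  }
  return ~0U;
}

// Check that v occupies expected_len bytes and survives a round trip.
inline bool sketch_round_trip(uint32_t v, size_t expected_len)
{
  uint8_t buf[5];
  return sketch_encode(buf, v) == expected_len && sketch_decode(buf) == v;
}
```

/* For example, 127 is the largest value that fits in 1 byte, and 128 is
encoded in 2 bytes as 0x80 0x00 because MIN_2BYTE is subtracted first. */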
/** Determine the length of a log record.
@param log  start of log record
@param end  end of the log record buffer
@return the length of the record, in bytes
@retval 0                  if the log extends past the end
@retval MLOG_DECODE_ERROR  if the record is corrupted */
inline uint32_t mlog_decode_len(const byte *log, const byte *end)
{
  ut_ad(log < end);
  uint32_t i= *log;
  if (!i)
    return 0; /* end of mini-transaction */
  if (~i & 15)
    return (i & 15) + 1; /* 1..16 bytes */
  if (UNIV_UNLIKELY(++log == end))
    return 0; /* end of buffer */
  i= *log;
  if (UNIV_LIKELY(i < MIN_2BYTE)) /* 1 additional length byte: 16..143 bytes */
    return 16 + i;
  if (i < 0xc0) /* 2 additional length bytes: 144..16,527 bytes */
  {
    if (UNIV_UNLIKELY(log + 1 == end))
      return 0; /* end of buffer */
    return 16 + MIN_2BYTE + ((i & ~0xc0) << 8 | log[1]);
  }
  if (i < 0xe0) /* 3 additional length bytes: 16528..1065103 bytes */
  {
    if (UNIV_UNLIKELY(log + 2 == end))
      return 0; /* end of buffer */
    return 16 + MIN_3BYTE + ((i & ~0xe0) << 16 |
                             static_cast<uint32_t>(log[1]) << 8 | log[2]);
  }
  /* 1,065,103 bytes per log record ought to be enough for everyone */
  return MLOG_DECODE_ERROR;
}

/** Write 1, 2, 4, or 8 bytes to a file page.
@param[in]      block   file page
@param[in,out]  ptr     pointer in file page
@param[in]      val     value to write
@tparam l       number of bytes to write
@tparam w       write request type
@tparam V       type of val
@return whether any log was written */
template<unsigned l,mtr_t::write_type w,typename V>
inline bool mtr_t::write(const buf_block_t &block, void *ptr, V val)
{
  ut_ad(ut_align_down(ptr, srv_page_size) == block.page.frame);
  static_assert(l == 1 || l == 2 || l == 4 || l == 8, "wrong length");
  byte buf[l];

  switch (l) {
  case 1:
    ut_ad(val == static_cast<byte>(val));
    buf[0]= static_cast<byte>(val);
    break;
  case 2:
    ut_ad(val == static_cast<uint16_t>(val));
    mach_write_to_2(buf, static_cast<uint16_t>(val));
    break;
  case 4:
    ut_ad(val == static_cast<uint32_t>(val));
    mach_write_to_4(buf, static_cast<uint32_t>(val));
    break;
  case 8:
    mach_write_to_8(buf, val);
    break;
  }
  byte *p= static_cast<byte*>(ptr);
  const byte *const end= p + l;
  if (w != FORCED && is_logged())
  {
    const byte *b= buf;
    while (*p++ == *b++)
    {
      if (p == end)
      {
        ut_ad(w == MAYBE_NOP);
        return false;
      }
    }
    p--;
  }
  ::memcpy(ptr, buf, l);
  memcpy_low(block, static_cast<uint16_t>
             (ut_align_offset(p, srv_page_size)), p, end - p);
  return true;
}

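/* Illustration only (not part of mtr0log.h): unless the write is FORCED,
mtr_t::write() skips the leading bytes that already hold the new value and
logs only the remaining suffix. The standalone sketch below shows that idea
on a plain byte array. The names LoggedRange, sketch_common_prefix(),
sketch_write() and sketch_demo() are hypothetical, and no redo log is
actually written. */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Which bytes of a page would have to be covered by a WRITE record.
struct LoggedRange { size_t offset; size_t len; };

// Count the leading bytes of val that the page already contains.
inline size_t sketch_common_prefix(const uint8_t *page, const uint8_t *val,
                                   size_t len)
{
  size_t n= 0;
  while (n < len && page[n] == val[n])
    n++;
  return n;
}

// Store val at page + ofs and report the suffix that actually changed.
// A returned len of 0 means that the write was a no-op.
inline LoggedRange sketch_write(uint8_t *page, size_t ofs, const uint8_t *val,
                                size_t len)
{
  const size_t skip= sketch_common_prefix(page + ofs, val, len);
  std::memcpy(page + ofs, val, len);
  return {ofs + skip, len - skip};
}

// Overwrite the big-endian value 0x00001234 at offset 8 with 0x00005678.
inline LoggedRange sketch_demo()
{
  uint8_t page[16]= {};
  page[10]= 0x12;
  page[11]= 0x34;
  const uint8_t val[4]= {0x00, 0x00, 0x56, 0x78};
  return sketch_write(page, 8, val, 4);
}
```

/* In sketch_demo(), the two most significant bytes are unchanged, so only
the 2 bytes starting at offset 10 would need to be logged. */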
/** Log an initialization of a string of bytes.
@param[in]      b       buffer page
@param[in]      ofs     byte offset from b->frame
@param[in]      len     length of the data to write
@param[in]      val     the data byte to write */
inline void mtr_t::memset(const buf_block_t &b, ulint ofs, ulint len, byte val)
{
  ut_ad(len);
  set_modified(b);
  if (!is_logged())
    return;

  static_assert(MIN_4BYTE > UNIV_PAGE_SIZE_MAX, "consistency");
  size_t lenlen= (len < MIN_2BYTE ? 1 + 1 : len < MIN_3BYTE ? 2 + 1 : 3 + 1);
  byte *l= log_write<MEMSET>(b.page.id(), &b.page, lenlen, true, ofs);
  l= mlog_encode_varint(l, len);
  *l++= val;
  m_log.close(l);
  m_last_offset= static_cast<uint16_t>(ofs + len);
}

/** Initialize a string of bytes.
@param[in,out]  b       buffer page
@param[in]      ofs     byte offset from b->frame
@param[in]      len     length of the data to write
@param[in]      val     the data byte to write */
inline void mtr_t::memset(const buf_block_t *b, ulint ofs, ulint len, byte val)
{
  ut_ad(ofs <= ulint(srv_page_size));
  ut_ad(ofs + len <= ulint(srv_page_size));
  ::memset(ofs + b->page.frame, val, len);
  memset(*b, ofs, len, val);
}

/** Log an initialization of a repeating string of bytes.
@param[in]      b       buffer page
@param[in]      ofs     byte offset from b->frame
@param[in]      len     length of the data to write, in bytes
@param[in]      str     the string to write
@param[in]      size    size of str, in bytes */
inline void mtr_t::memset(const buf_block_t &b, ulint ofs, size_t len,
                          const void *str, size_t size)
{
  ut_ad(size);
  ut_ad(len > size); /* use mtr_t::memcpy() for shorter writes */
  set_modified(b);
  if (!is_logged())
    return;

  static_assert(MIN_4BYTE > UNIV_PAGE_SIZE_MAX, "consistency");
  size_t lenlen= (len < MIN_2BYTE ? 1 : len < MIN_3BYTE ? 2 : 3);
  byte *l= log_write<MEMSET>(b.page.id(), &b.page, lenlen + size, true, ofs);
  l= mlog_encode_varint(l, len);
  ::memcpy(l, str, size);
  l+= size;
  m_log.close(l);
  m_last_offset= static_cast<uint16_t>(ofs + len);
}

/** Initialize a repeating string of bytes.
@param[in,out]  b       buffer page
@param[in]      ofs     byte offset from b->frame
@param[in]      len     length of the data to write, in bytes
@param[in]      str     the string to write
@param[in]      size    size of str, in bytes */
inline void mtr_t::memset(const buf_block_t *b, ulint ofs, size_t len,
                          const void *str, size_t size)
{
  ut_ad(size);
  ut_ad(ofs <= ulint(srv_page_size));
  ut_ad(ofs + len <= ulint(srv_page_size));
  ut_ad(len > size); /* use mtr_t::memcpy() for shorter writes */
  size_t s= 0;
  /* Copy whole repetitions of str, advancing by the pattern size
  (advancing by len would stop after the first copy). */
  while (s + size <= len)
  {
    ::memcpy(ofs + s + b->page.frame, str, size);
    s+= size;
  }
  /* Copy the truncated final repetition, if any. */
  ::memcpy(ofs + s + b->page.frame, str, len - s);
  memset(*b, ofs, len, str, size);
}

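/* Illustration only (not part of mtr0log.h): filling len bytes with a
repeating size-byte pattern means copying as many whole repetitions as fit
and then a truncated final repetition. The standalone sketch below shows
this on a plain buffer; sketch_fill() and sketch_fill_demo() are
hypothetical names, and no page or redo log is involved. */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Fill dst[0..len) with repetitions of pat[0..size); size must be nonzero.
inline void sketch_fill(uint8_t *dst, size_t len, const uint8_t *pat,
                        size_t size)
{
  assert(size);
  size_t s= 0;
  // Whole repetitions: advance by the pattern size each time.
  for (; s + size <= len; s+= size)
    std::memcpy(dst + s, pat, size);
  // Truncated final repetition, when len is not a multiple of size.
  if (s < len)
    std::memcpy(dst + s, pat, len - s);
}

// Fill 7 bytes with the 3-byte pattern "abc".
inline std::string sketch_fill_demo()
{
  uint8_t buf[7];
  const uint8_t pat[3]= {'a', 'b', 'c'};
  sketch_fill(buf, sizeof buf, pat, sizeof pat);
  return std::string(reinterpret_cast<const char*>(buf), sizeof buf);
}
```

/* sketch_fill_demo() yields "abcabca": two whole repetitions followed by
one byte of a third. */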
/** Log a write of a byte string to a page.
The data must already have been written to the page frame.
@param[in]      b       buffer page
@param[in]      offset  byte offset from b->frame
@param[in]      len     length of the data to write */
inline void mtr_t::memcpy(const buf_block_t &b, ulint offset, ulint len)
{
  ut_ad(len);
  ut_ad(offset <= ulint(srv_page_size));
  ut_ad(offset + len <= ulint(srv_page_size));
  memcpy_low(b, uint16_t(offset), &b.page.frame[offset], len);
}

/** Log a write of a byte string to a page.
@param block   page
@param offset  byte offset within page
@param data    data to be written
@param len     length of the data, in bytes */
inline void mtr_t::memcpy_low(const buf_block_t &block, uint16_t offset,
                              const void *data, size_t len)
{
  ut_ad(len);
  set_modified(block);
  if (!is_logged())
    return;
  if (len < mtr_buf_t::MAX_DATA_SIZE - (1 + 3 + 3 + 5 + 5))
  {
    byte *end= log_write<WRITE>(block.page.id(), &block.page, len, true,
                                offset);
    ::memcpy(end, data, len);
    m_log.close(end + len);
  }
  else
  {
    m_log.close(log_write<WRITE>(block.page.id(), &block.page, len, false,
                                 offset));
    m_log.push(static_cast<const byte*>(data), static_cast<uint32_t>(len));
  }
  m_last_offset= static_cast<uint16_t>(offset + len);
}

/** Log that a string of bytes was copied from the same page.
@param[in]      b       buffer page
@param[in]      d       destination offset within the page
@param[in]      s       source offset within the page
@param[in]      len     length of the data to copy */
inline void mtr_t::memmove(const buf_block_t &b, ulint d, ulint s, ulint len)
{
  ut_ad(d >= 8);
  ut_ad(s >= 8);
  ut_ad(len);
  ut_ad(s <= ulint(srv_page_size));
  ut_ad(s + len <= ulint(srv_page_size));
  ut_ad(s != d);
  ut_ad(d <= ulint(srv_page_size));
  ut_ad(d + len <= ulint(srv_page_size));

  set_modified(b);
  if (!is_logged())
    return;
  static_assert(MIN_4BYTE > UNIV_PAGE_SIZE_MAX, "consistency");
  size_t lenlen= (len < MIN_2BYTE ? 1 : len < MIN_3BYTE ? 2 : 3);
  /* The source offset is encoded relative to the destination offset,
  with the sign in the least significant bit. */
  if (s > d)
    s= (s - d) << 1;
  else
    s= (d - s) << 1 | 1;
  /* The source offset 0 is not possible. */
  s-= 1 << 1;
  size_t slen= (s < MIN_2BYTE ? 1 : s < MIN_3BYTE ? 2 : 3);
  byte *l= log_write<MEMMOVE>(b.page.id(), &b.page, lenlen + slen, true, d);
  l= mlog_encode_varint(l, len);
  l= mlog_encode_varint(l, s);
  m_log.close(l);
  m_last_offset= static_cast<uint16_t>(d + len);
}

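/* Illustration only (not part of mtr0log.h): mtr_t::memmove() stores the
source offset as a distance from the destination, with the direction in the
least significant bit, and then subtracts 2 because a distance of 0 cannot
occur. The standalone sketch below reproduces that arithmetic.
sketch_encode_source() follows the code above; sketch_decode_source() is an
assumed inverse written for this example, not the recovery code. Both names
are hypothetical. */

```cpp
#include <cassert>
#include <cstdint>

// Encode source offset s relative to destination offset d (s != d).
inline uint32_t sketch_encode_source(uint32_t d, uint32_t s)
{
  assert(s != d);
  // Bit 0 is clear when the source is after the destination, set otherwise.
  const uint32_t e= s > d ? (s - d) << 1 : (d - s) << 1 | 1;
  // Distance 0 is impossible, so the two smallest codes are reused.
  return e - 2;
}

// Recover the source offset from destination d and the encoded value e.
inline uint32_t sketch_decode_source(uint32_t d, uint32_t e)
{
  e+= 2;
  const uint32_t distance= e >> 1;
  return (e & 1) ? d - distance : d + distance;
}
```

/* For example, with d=100 the sources 101 and 99 are encoded as 0 and 1,
so that nearby copies fit in a single varint byte. */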
/**
Write a log record.
@tparam type   redo log record type
@param id      persistent page identifier
@param bpage   buffer pool page, or nullptr
@param len     number of additional bytes to write
@param alloc   whether to allocate the additional bytes
@param offset  byte offset, or 0 if the record type does not allow one
@return end of mini-transaction log, minus len */
template<byte type>
inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
                              size_t len, bool alloc, size_t offset)
{
  static_assert(!(type & 15) && type != RESERVED &&
                type <= FILE_CHECKPOINT, "invalid type");
  ut_ad(type >= FILE_CREATE || is_named_space(id.space()));
  ut_ad(!bpage || bpage->id() == id);
  ut_ad(id < end_page_id);
  constexpr bool have_len= type != INIT_PAGE && type != FREE_PAGE;
  constexpr bool have_offset= type == WRITE || type == MEMSET ||
    type == MEMMOVE;
  static_assert(!have_offset || have_len, "consistency");
  ut_ad(have_len || len == 0);
  ut_ad(have_len || !alloc);
  ut_ad(have_offset || offset == 0);
  ut_ad(offset + len <= srv_page_size);
  static_assert(MIN_4BYTE >= UNIV_PAGE_SIZE_MAX, "consistency");
  ut_ad(type == FREE_PAGE || type == OPTION || (type == EXTENDED && !bpage) ||
        memo_contains_flagged(bpage, MTR_MEMO_MODIFY));
  size_t max_len;
  if (!have_len)
    max_len= 1 + 5 + 5;
  else if (!have_offset)
    max_len= bpage && m_last == bpage
      ? 1 + 3
      : 1 + 3 + 5 + 5;
  else if (bpage && m_last == bpage && m_last_offset <= offset)
  {
    /* Encode the offset relative from m_last_offset. */
    offset-= m_last_offset;
    max_len= 1 + 3 + 3;
  }
  else
    max_len= 1 + 3 + 5 + 5 + 3;
  byte *const log_ptr= m_log.open(alloc ? max_len + len : max_len);
  byte *end= log_ptr + 1;
  const byte same_page= max_len < 1 + 5 + 5 ? 0x80 : 0;
  if (!same_page)
  {
    end= mlog_encode_varint(end, id.space());
    end= mlog_encode_varint(end, id.page_no());
    m_last= bpage;
  }
  if (have_offset)
  {
    byte* oend= mlog_encode_varint(end, offset);
    if (oend + len > &log_ptr[16])
    {
      len+= oend - log_ptr - 15;
      if (len >= MIN_3BYTE - 1)
        len+= 2;
      else if (len >= MIN_2BYTE)
        len++;

      *log_ptr= type | same_page;
      end= mlog_encode_varint(log_ptr + 1, len);
      if (!same_page)
      {
        end= mlog_encode_varint(end, id.space());
        end= mlog_encode_varint(end, id.page_no());
      }
      end= mlog_encode_varint(end, offset);
      return end;
    }
    else
      end= oend;
  }
  else if (len >= 3 && end + len > &log_ptr[16])
  {
    len+= end - log_ptr - 15;
    if (len >= MIN_3BYTE - 1)
      len+= 2;
    else if (len >= MIN_2BYTE)
      len++;

    end= log_ptr;
    *end++= type | same_page;
    end= mlog_encode_varint(end, len);

    if (!same_page)
    {
      end= mlog_encode_varint(end, id.space());
      end= mlog_encode_varint(end, id.page_no());
    }
    return end;
  }

  ut_ad(end + len >= &log_ptr[1] + !same_page);
  ut_ad(end + len <= &log_ptr[16]);
  ut_ad(end <= &log_ptr[max_len]);
  *log_ptr= type | same_page | static_cast<byte>(end + len - log_ptr - 1);
  ut_ad(*log_ptr & 15);
  return end;
}

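/* Illustration only (not part of mtr0log.h): when a whole record fits in 16
bytes, the last statements of mtr_t::log_write() pack three things into the
first byte: the record type (a multiple of 16), the same-page flag 0x80, and
in the low 4 bits the number of bytes that follow the first byte. The
standalone sketch below composes such a byte. sketch_short_header() is a
hypothetical name, and the type value 0x30 used in the note is chosen
arbitrarily for the example. */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compose the first byte of a record of at most 16 bytes in total.
// type:       record type; the low 4 bits must be clear
// same_page:  whether the page identifier is omitted from the record
// following:  number of bytes after the first byte (1..15)
inline uint8_t sketch_short_header(uint8_t type, bool same_page,
                                   size_t following)
{
  assert(!(type & 15));
  assert(following >= 1 && following <= 15);
  return static_cast<uint8_t>(type | (same_page ? 0x80 : 0) | following);
}
```

/* For example, sketch_short_header(0x30, true, 5) is 0xb5: a 6-byte record
for the same page as the previous record. */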
/** Write a byte string to a page.
@param[in]      b       buffer page
@param[in]      dest    destination within b.frame
@param[in]      str     the data to write
@param[in]      len     length of the data to write
@tparam w       write request type */
template<mtr_t::write_type w>
inline void mtr_t::memcpy(const buf_block_t &b, void *dest, const void *str,
                          ulint len)
{
  ut_ad(ut_align_down(dest, srv_page_size) == b.page.frame);
  char *d= static_cast<char*>(dest);
  const char *s= static_cast<const char*>(str);
  if (w != FORCED && is_logged())
  {
    ut_ad(len);
    const char *const end= d + len;
    while (*d++ == *s++)
    {
      if (d == end)
      {
        ut_ad(w == MAYBE_NOP);
        return;
      }
    }
    s--;
    d--;
    len= static_cast<ulint>(end - d);
  }
  ::memcpy(d, s, len);
  memcpy(b, ut_align_offset(d, srv_page_size), len);
}

/** Write an EXTENDED log record.
@param block  buffer pool page
@param type   extended record subtype; @see mrec_ext_t */
inline void mtr_t::log_write_extended(const buf_block_t &block, byte type)
{
  set_modified(block);
  if (!is_logged())
    return;
  byte *l= log_write<EXTENDED>(block.page.id(), &block.page, 1, true);
  *l++= type;
  m_log.close(l);
  m_last_offset= FIL_PAGE_TYPE;
}

/** Write log for partly initializing a B-tree or R-tree page.
@param block  B-tree or R-tree page
@param comp   false=ROW_FORMAT=REDUNDANT, true=COMPACT or DYNAMIC */
inline void mtr_t::page_create(const buf_block_t &block, bool comp)
{
  static_assert(false == INIT_ROW_FORMAT_REDUNDANT, "encoding");
  static_assert(true == INIT_ROW_FORMAT_DYNAMIC, "encoding");
  log_write_extended(block, comp);
}

/** Write log for deleting a B-tree or R-tree record in ROW_FORMAT=REDUNDANT.
@param block     B-tree or R-tree page
@param prev_rec  byte offset of the predecessor of the record to delete,
                 starting from PAGE_OLD_INFIMUM */
inline void mtr_t::page_delete(const buf_block_t &block, ulint prev_rec)
{
  ut_ad(!block.zip_size());
  ut_ad(prev_rec < block.physical_size());
  set_modified(block);
  if (!is_logged())
    return;
  size_t len= (prev_rec < MIN_2BYTE ? 2 : prev_rec < MIN_3BYTE ? 3 : 4);
  byte *l= log_write<EXTENDED>(block.page.id(), &block.page, len, true);
  ut_d(byte *end= l + len);
  *l++= DELETE_ROW_FORMAT_REDUNDANT;
  l= mlog_encode_varint(l, prev_rec);
  ut_ad(end == l);
  m_log.close(l);
  m_last_offset= FIL_PAGE_TYPE;
}

/** Write log for deleting a COMPACT or DYNAMIC B-tree or R-tree record.
@param block      B-tree or R-tree page
@param prev_rec   byte offset of the predecessor of the record to delete,
                  starting from PAGE_NEW_INFIMUM
@param hdr_size   record header size, excluding REC_N_NEW_EXTRA_BYTES
@param data_size  data payload size, in bytes */
inline void mtr_t::page_delete(const buf_block_t &block, ulint prev_rec,
                               size_t hdr_size, size_t data_size)
{
  ut_ad(!block.zip_size());
  set_modified(block);
  ut_ad(hdr_size < MIN_3BYTE);
  ut_ad(prev_rec < block.physical_size());
  ut_ad(data_size < block.physical_size());
  if (!is_logged())
    return;
  size_t len= prev_rec < MIN_2BYTE ? 2 : prev_rec < MIN_3BYTE ? 3 : 4;
  len+= hdr_size < MIN_2BYTE ? 1 : 2;
  len+= data_size < MIN_2BYTE ? 1 : data_size < MIN_3BYTE ? 2 : 3;
  byte *l= log_write<EXTENDED>(block.page.id(), &block.page, len, true);
  ut_d(byte *end= l + len);
  *l++= DELETE_ROW_FORMAT_DYNAMIC;
  l= mlog_encode_varint(l, prev_rec);
  l= mlog_encode_varint(l, hdr_size);
  l= mlog_encode_varint(l, data_size);
  ut_ad(end == l);
  m_log.close(l);
  m_last_offset= FIL_PAGE_TYPE;
}

/** Write log for initializing an undo log page.
@param block  undo page */
inline void mtr_t::undo_create(const buf_block_t &block)
{
  log_write_extended(block, UNDO_INIT);
}

/** Write log for appending an undo log record.
@param block  undo page
@param data   record within the undo page
@param len    length of the undo record, in bytes */
inline void mtr_t::undo_append(const buf_block_t &block,
                               const void *data, size_t len)
{
  ut_ad(len > 2);
  set_modified(block);
  if (!is_logged())
    return;
  const bool small= len + 1 < mtr_buf_t::MAX_DATA_SIZE - (1 + 3 + 3 + 5 + 5);
  byte *end= log_write<EXTENDED>(block.page.id(), &block.page, len + 1, small);
  if (UNIV_LIKELY(small))
  {
    *end++= UNDO_APPEND;
    ::memcpy(end, data, len);
    m_log.close(end + len);
  }
  else
  {
    m_log.close(end);
    *m_log.push<byte*>(1)= UNDO_APPEND;
    m_log.push(static_cast<const byte*>(data), static_cast<uint32_t>(len));
  }
  m_last_offset= FIL_PAGE_TYPE;
}

/** Trim the end of a tablespace.
@param id  first page identifier that will not be in the file */
inline void mtr_t::trim_pages(const page_id_t id)
{
  if (!is_logged())
    return;
  byte *l= log_write<EXTENDED>(id, nullptr, 1, true);
  *l++= TRIM_PAGES;
  m_log.close(l);
  set_trim_pages();
}