mirror of
https://github.com/MariaDB/server.git
synced 2025-01-31 02:51:44 +01:00
7ae21b18a6
log_t::FORMAT_10_5: physical redo log format tag

log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
If there are multiple records for the same page, those may also
be appended to an existing log_phys_t object if the memory is available.

In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.

When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.

Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.

Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.

MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE
(along with MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).

mtr_buf_t::empty(): Check if the buffer is empty.

mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.

mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.

page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither is ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.

mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always be followed by a subtype byte, because same_page
records must be longer than 1 byte.

trx_undo_page_init(): Combine the writes in a WRITE record.

trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 byte of length and 2 bytes of payload.

flst_write_addr(): Define as a static function. Combine the writes.

flst_zero_both(): Replaces two flst_zero_addr() calls.

flst_init(): Do not inline the function.

fsp_free_seg_inode(): Zerofill the whole inode.

fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV, FIL_PAGE_NEXT
to FIL_NULL when using the physical format.

btr_create(): Assert !page_has_siblings() because
fsp_apply_init_file_page() must have been invoked.

fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.

fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.

recv_recover_page(): Add a fil_space_t* parameter.
After applying log to a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.

buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.

page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.

trx_undo_header_add_space_for_xid(): Remove.

trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().

btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.

page_rec_find_owner_rec(): Merge with the only callers.

page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).

page_cur_insert_rec_zip(): Combine writes to page header fields.

PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.

PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.

mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.

For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.

mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.

In fil_space_crypt_t::write_page0(), we must also log any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().

Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
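
The first-byte layout described in the commit message can be illustrated with a small sketch. Only two facts are taken from the text: the same_page flag is the most significant bit, and the byte carries a length of up to 15. Everything else here is an assumption made for illustration: the split into a 3-bit type and a 4-bit length, the convention that a length of 0 means "a variable-length encoded length follows", and the names rec_head_t, rec_head_encode() and rec_head_decode(), which are hypothetical and not part of the server source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the first byte of a record. Only the same_page
   bit and the 15-byte limit come from the commit message; the rest is
   illustrative:
   bit 7      same_page flag
   bits 6..4  record type
   bits 3..0  record length 1..15, or 0 if a variable-length
              encoded length follows in the next bytes */
enum { SAME_PAGE_FLAG = 0x80, TYPE_MASK = 0x70, LEN_MASK = 0x0f };

typedef struct {
  bool same_page; /* page identifier omitted; offset is relative */
  uint8_t type;   /* hypothetical 3-bit record type code */
  uint8_t len;    /* 1..15, or 0 when the length is encoded separately */
} rec_head_t;

/* Build the first byte of a record from its fields. */
static uint8_t rec_head_encode(rec_head_t h)
{
  assert(h.type < 8);
  assert(h.len < 16);
  return (uint8_t) ((h.same_page ? SAME_PAGE_FLAG : 0)
                    | (h.type << 4) | h.len);
}

/* Split the first byte of a record into its fields. */
static rec_head_t rec_head_decode(uint8_t b)
{
  rec_head_t h;
  h.same_page = (b & SAME_PAGE_FLAG) != 0;
  h.type = (uint8_t) ((b & TYPE_MASK) >> 4);
  h.len = (uint8_t) (b & LEN_MASK);
  return h;
}
```

Under these assumptions, a 5-byte record of type 3 that continues on the same page would start with the byte 0xB5, and decoding that byte yields the same three fields again. A real parser would additionally have to read the variable-length encoded length and page identifier that the commit message mentions; those encodings are not shown here.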
163 lines
5.7 KiB
C
/*****************************************************************************

Copyright (c) 1995, 2014, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2018, 2020, MariaDB Corporation.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

*****************************************************************************/

/******************************************************************//**
@file include/fut0lst.h
File-based list utilities

Created 11/28/1995 Heikki Tuuri
***********************************************************************/

#ifndef fut0lst_h
#define fut0lst_h

#ifdef UNIV_INNOCHECKSUM
# include "fil0fil.h"
#else
#include "fut0fut.h"
#include "mtr0log.h"

/* The C 'types' of base node and list node: these should be used to
write self-documenting code. Of course, the sizeof operator cannot be
applied to these types! */

typedef byte flst_base_node_t;
typedef byte flst_node_t;

#endif /* !UNIV_INNOCHECKSUM */

/* The physical size of a list base node in bytes */
#define FLST_BASE_NODE_SIZE (4 + 2 * FIL_ADDR_SIZE)
/* The physical size of a list node in bytes */
#define FLST_NODE_SIZE (2 * FIL_ADDR_SIZE)

#ifndef UNIV_INNOCHECKSUM
/* We define the field offsets of a node for the list */
#define FLST_PREV 0 /* 6-byte address of the previous list element;
                    the page part of address is FIL_NULL, if no
                    previous element */
#define FLST_NEXT FIL_ADDR_SIZE /* 6-byte address of the next
                    list element; the page part of address
                    is FIL_NULL, if no next element */

/* We define the field offsets of a base node for the list */
#define FLST_LEN 0 /* 32-bit list length field */
#define FLST_FIRST 4 /* 6-byte address of the first element
                    of the list; undefined if empty list */
#define FLST_LAST (4 + FIL_ADDR_SIZE) /* 6-byte address of the
                    last element of the list; undefined
                    if empty list */

/** Initialize a zero-initialized list base node.
@param[in,out] block file page
@param[in] ofs byte offset of the list base node
@param[in,out] mtr mini-transaction */
inline void flst_init(const buf_block_t* block, uint16_t ofs, mtr_t* mtr)
{
  ut_ad(!mach_read_from_2(FLST_LEN + ofs + block->frame));
  ut_ad(!mach_read_from_2(FLST_FIRST + FIL_ADDR_BYTE + ofs + block->frame));
  ut_ad(!mach_read_from_2(FLST_LAST + FIL_ADDR_BYTE + ofs + block->frame));
  compile_time_assert(FIL_NULL == 0xffU * 0x1010101U);
  mtr->memset(block, FLST_FIRST + FIL_ADDR_PAGE + ofs, 4, 0xff);
  mtr->memset(block, FLST_LAST + FIL_ADDR_PAGE + ofs, 4, 0xff);
}

/** Initialize a list base node.
@param[in] block file page
@param[in,out] base base node
@param[in,out] mtr mini-transaction */
void flst_init(const buf_block_t& block, byte *base, mtr_t *mtr)
  MY_ATTRIBUTE((nonnull));

/** Append a file list node to a list.
@param[in,out] base base node block
@param[in] boffset byte offset of the base node
@param[in,out] add block to be added
@param[in] aoffset byte offset of the node to be added
@param[in,out] mtr mini-transaction */
void flst_add_last(buf_block_t *base, uint16_t boffset,
                   buf_block_t *add, uint16_t aoffset, mtr_t *mtr)
  MY_ATTRIBUTE((nonnull));
/** Prepend a file list node to a list.
@param[in,out] base base node block
@param[in] boffset byte offset of the base node
@param[in,out] add block to be added
@param[in] aoffset byte offset of the node to be added
@param[in,out] mtr mini-transaction */
void flst_add_first(buf_block_t *base, uint16_t boffset,
                    buf_block_t *add, uint16_t aoffset, mtr_t *mtr)
  MY_ATTRIBUTE((nonnull));
/** Remove a file list node.
@param[in,out] base base node block
@param[in] boffset byte offset of the base node
@param[in,out] cur block to be removed
@param[in] coffset byte offset of the current record to be removed
@param[in,out] mtr mini-transaction */
void flst_remove(buf_block_t *base, uint16_t boffset,
                 buf_block_t *cur, uint16_t coffset, mtr_t *mtr)
  MY_ATTRIBUTE((nonnull));

/** @return the length of a list */
inline uint32_t flst_get_len(const flst_base_node_t *base)
{
  return mach_read_from_4(base + FLST_LEN);
}

/** @return a file address */
inline fil_addr_t flst_read_addr(const byte *faddr)
{
  fil_addr_t addr= { mach_read_from_4(faddr + FIL_ADDR_PAGE),
                     mach_read_from_2(faddr + FIL_ADDR_BYTE) };
  ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
  ut_a(ut_align_offset(faddr, srv_page_size) >= FIL_PAGE_DATA);
  return addr;
}

/** @return list first node address */
inline fil_addr_t flst_get_first(const flst_base_node_t *base)
{
  return flst_read_addr(base + FLST_FIRST);
}

/** @return list last node address */
inline fil_addr_t flst_get_last(const flst_base_node_t *base)
{
  return flst_read_addr(base + FLST_LAST);
}

/** @return list next node address */
inline fil_addr_t flst_get_next_addr(const flst_node_t* node)
{
  return flst_read_addr(node + FLST_NEXT);
}

/** @return list prev node address */
inline fil_addr_t flst_get_prev_addr(const flst_node_t *node)
{
  return flst_read_addr(node + FLST_PREV);
}

#ifdef UNIV_DEBUG
/** Validate a file-based list. */
void flst_validate(const buf_block_t *base, uint16_t boffset, mtr_t *mtr);
#endif

#endif /* !UNIV_INNOCHECKSUM */

#endif
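
The base node layout that the FLST_LEN, FLST_FIRST and FLST_LAST offsets describe can be exercised outside the server with a small stand-alone sketch. It assumes that FIL_ADDR_PAGE is 0, FIL_ADDR_BYTE is 4 and FIL_ADDR_SIZE is 6 (these are defined outside this header), and that mach_read_from_4() and mach_read_from_2() read most-significant-byte-first values. The names addr_t, read_be4(), read_be2(), read_addr() and list_len() are hypothetical stand-ins, not the server's API, and the assertions that flst_read_addr() performs are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of FIL_ADDR_PAGE, FIL_ADDR_BYTE and FIL_ADDR_SIZE,
   which are defined outside fut0lst.h. */
enum { ADDR_PAGE = 0, ADDR_BYTE = 4, ADDR_SIZE = 6 };
/* Base node field offsets, following FLST_LEN, FLST_FIRST, FLST_LAST. */
enum { BASE_LEN = 0, BASE_FIRST = 4, BASE_LAST = 4 + ADDR_SIZE };

/* Stand-in for fil_addr_t: page number and byte offset within the page. */
typedef struct {
  uint32_t page;
  uint16_t boffset;
} addr_t;

/* Read a 4-byte big-endian integer (assumed to match mach_read_from_4). */
static uint32_t read_be4(const uint8_t *b)
{
  return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16)
    | ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}

/* Read a 2-byte big-endian integer (assumed to match mach_read_from_2). */
static uint16_t read_be2(const uint8_t *b)
{
  return (uint16_t) (((unsigned) b[0] << 8) | b[1]);
}

/* Decode a 6-byte file address: 4-byte page number, 2-byte offset. */
static addr_t read_addr(const uint8_t *faddr)
{
  addr_t a;
  a.page = read_be4(faddr + ADDR_PAGE);
  a.boffset = read_be2(faddr + ADDR_BYTE);
  return a;
}

/* Decode the 32-bit list length stored at the start of a base node. */
static uint32_t list_len(const uint8_t *base)
{
  return read_be4(base + BASE_LEN);
}
```

With these helpers, a 16-byte base node (4 + 2 * 6 bytes, matching FLST_BASE_NODE_SIZE) decodes into a length plus the first and last addresses via read_addr(base + BASE_FIRST) and read_addr(base + BASE_LAST). Four bytes of 0xff in the page part decode to 0xffffffff, which is the value that the compile_time_assert() in flst_init() ties to FIL_NULL.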