MDEV-12353: Change the redo log encoding

log_t::FORMAT_10_5: physical redo log format tag

log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
Multiple records for the same page may also be appended to an
existing log_phys_t object if memory is available.

In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.

When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.
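The header layout above can be sketched as a small decoder. This is a hypothetical illustration based only on the facts stated here (most significant bit = same_page flag, low nibble = length up to 15 bytes, longer records append the remaining length in the following bytes); the continuation scheme and length bias shown are assumptions, not InnoDB's exact encoding:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of decoding a physical-format record header.
// The high-bit continuation scheme and the +15 bias are illustrative
// assumptions only.
struct RecordHeader {
  bool same_page;      // reuse the previous record's page identifier
  uint8_t type;        // record type bits between the flag and the length
  size_t length;       // total record length in bytes
  size_t header_size;  // bytes consumed by this header
};

inline RecordHeader parse_record_header(const uint8_t *p) {
  RecordHeader h;
  h.same_page = (p[0] & 0x80) != 0;   // most significant bit
  h.type = (p[0] >> 4) & 0x7;         // record type bits
  h.length = p[0] & 0xf;              // length of up to 15 bytes
  h.header_size = 1;
  if (h.length == 0) {
    // Longer record: the remaining length follows in a
    // variable-length encoding (assumed high-bit continuation).
    size_t extra = 0;
    while (p[h.header_size] & 0x80)
      extra = (extra << 7) | (p[h.header_size++] & 0x7f);
    extra = (extra << 7) | p[h.header_size++];
    h.length = 15 + extra;  // assumed bias for lengths above 15
  }
  return h;
}
```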

Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.

Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.

MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE (along with
MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).

mtr_buf_t::empty(): Check if the buffer is empty.

mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.

mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.

page_recv_t::last_offset: Reflects mtr_t::m_last_offset.

Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither are ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.
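The last_offset invariant above can be expressed as a small predicate (names here are illustrative, not from the source):

```cpp
#include <cassert>
#include <cstdint>

// During recovery, a last_offset value is valid if it is 0 or greater
// than 8, because the first 8 bytes of a page (checksum and page
// number) are never modified directly by log records. The value 1
// doubles as an internal sentinel meaning "same_page is not allowed
// for the next record". Names are hypothetical.
constexpr uint16_t SAME_PAGE_DISALLOWED = 1;

inline bool last_offset_is_valid(uint16_t offset) {
  return offset == 0 || offset > 8;
}
```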

mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always be followed by a subtype byte, because same_page
records must be longer than 1 byte.

trx_undo_page_init(): Combine the writes in WRITE record.

trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 byte of length and 2 bytes of payload.
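The effect of such a MEMSET record can be sketched as a pattern-repeating fill: the 2-byte payload repeats until the target length (here 4 bytes) is covered. The function name and signature are illustrative, not InnoDB's:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of applying a MEMSET record whose payload is shorter than the
// area being written: the payload repeats until the target length is
// filled, e.g. 4 bytes initialized from a 2-byte pattern.
inline void apply_memset(uint8_t *page, size_t offset, size_t len,
                         const uint8_t *pattern, size_t pattern_len) {
  for (size_t i = 0; i < len; i++)
    page[offset + i] = pattern[i % pattern_len];
}
```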

flst_write_addr(): Define as a static function. Combine the writes.

flst_zero_both(): Replaces two flst_zero_addr() calls.

flst_init(): Do not inline the function.

fsp_free_seg_inode(): Zerofill the whole inode.

fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
to FIL_NULL when using the physical format.

btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
must have been invoked.

fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.

fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.

recv_recover_page(): Add a fil_space_t* parameter.
After applying log records to a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.

buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.

page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.

trx_undo_header_add_space_for_xid(): Remove.

trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().

btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.

page_rec_find_owner_rec(): Merge into its only callers.

page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).
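The MEMMOVE heuristic above can be sketched as follows: before logging the bytes of a newly inserted record, compare them against the preceding record already on the page, and replay a matching run with a MEMMOVE (an in-page copy) instead of literal payload, but only when at least 3 bytes match. This is a simplified illustration, not the actual page_cur_insert_rec_low() logic:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the number of leading bytes of new_rec that match prev_rec
// and are therefore worth covering with a MEMMOVE record; a run
// shorter than 3 bytes is not worthwhile and is logged literally
// (return value 0).
inline size_t memmove_prefix_len(const uint8_t *prev_rec,
                                 const uint8_t *new_rec, size_t len) {
  size_t match = 0;
  while (match < len && prev_rec[match] == new_rec[match])
    match++;
  return match >= 3 ? match : 0;
}
```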

page_cur_insert_rec_zip(): Combine writes to page header fields.

PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.

PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.

mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.

For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.
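The byte-skipping rule above can be sketched for a big-endian multi-byte field: unchanged leading (most significant) bytes are skipped, and only the bytes from the first difference to the end of the field are logged. A minimal illustration (not the actual mtr_t::write() implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index of the first byte that must be written to the log;
// a return value of len means the field is unchanged and nothing needs
// to be logged. Fields such as FSP_SIZE, as noted above, must bypass
// this and always log all of their bytes.
inline size_t first_byte_to_log(const uint8_t *old_val,
                                const uint8_t *new_val, size_t len) {
  size_t i = 0;
  while (i < len && old_val[i] == new_val[i])
    i++;
  return i;
}
```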

mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.

In fil_space_crypt_t::write_page0(), we must also log any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().

Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
Author: Marko Mäkelä
Date:   2020-02-13 19:12:17 +02:00
commit 7ae21b18a6
49 changed files with 3660 additions and 1820 deletions


@@ -590,26 +590,25 @@ std::string filename_to_spacename(const byte *filename, size_t len)
/** Report an operation to create, delete, or rename a file during backup.
@param[in] space_id tablespace identifier
@param[in] flags tablespace flags (NULL if not create)
@param[in] create whether the file is being created
@param[in] name file name (not NUL-terminated)
@param[in] len length of name, in bytes
@param[in] new_name new file name (NULL if not rename)
@param[in] new_len length of new_name, in bytes (0 if NULL) */
static void backup_file_op(ulint space_id, const byte* flags,
static void backup_file_op(ulint space_id, bool create,
const byte* name, ulint len,
const byte* new_name, ulint new_len)
{
ut_ad(!flags || !new_name);
ut_ad(!create || !new_name);
ut_ad(name);
ut_ad(len);
ut_ad(!new_name == !new_len);
pthread_mutex_lock(&backup_mutex);
if (flags) {
if (create) {
ddl_tracker.id_to_name[space_id] = filename_to_spacename(name, len);
msg("DDL tracking : create %zu \"%.*s\": %x",
space_id, int(len), name, mach_read_from_4(flags));
msg("DDL tracking : create %zu \"%.*s\"", space_id, int(len), name);
}
else if (new_name) {
ddl_tracker.id_to_name[space_id] = filename_to_spacename(new_name, new_len);
@@ -632,14 +631,14 @@ static void backup_file_op(ulint space_id, const byte* flags,
We will abort backup in this case.
*/
static void backup_file_op_fail(ulint space_id, const byte* flags,
static void backup_file_op_fail(ulint space_id, bool create,
const byte* name, ulint len,
const byte* new_name, ulint new_len)
{
bool fail;
if (flags) {
msg("DDL tracking : create %zu \"%.*s\": %x",
space_id, int(len), name, mach_read_from_4(flags));
if (create) {
msg("DDL tracking : create %zu \"%.*s\"",
space_id, int(len), name);
std::string spacename = filename_to_spacename(name, len);
fail = !check_if_skip_table(spacename.c_str());
}


@@ -136,7 +136,7 @@ WHERE engine = 'innodb'
AND support IN ('YES', 'DEFAULT', 'ENABLED');
COUNT(*)
1
FOUND 1 /InnoDB: .* started; log sequence number 121397[09]/ in mysqld.1.err
FOUND 1 /InnoDB: .* started; log sequence number 12139[78]\d; transaction id 0/ in mysqld.1.err
# Empty 10.2 redo log
# restart: --innodb-data-home-dir=MYSQLTEST_VARDIR/tmp/log_corruption --innodb-log-group-home-dir=MYSQLTEST_VARDIR/tmp/log_corruption --innodb-force-recovery=5 --innodb-log-file-size=2m
SELECT COUNT(*) FROM INFORMATION_SCHEMA.ENGINES


@@ -1,21 +0,0 @@
# restart
#
# Bug#21801423 INNODB REDO LOG DOES NOT INDICATE WHEN
# FILES ARE CREATED
#
# Bug#21796691 INNODB REDO LOG DOES NOT INDICATE WHEN
# REDO LOGGING IS SKIPPED
#
CREATE TABLE t1 (a INT NOT NULL, b INT UNIQUE) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1,2);
ALTER TABLE t1 ADD PRIMARY KEY(a), LOCK=SHARED, ALGORITHM=INPLACE;
ALTER TABLE t1 DROP INDEX b, ADD INDEX (b), LOCK=SHARED;
# Kill the server
# restart: --debug=d,ib_log
FOUND 2 /scan \d+: multi-log rec MLOG_FILE_CREATE2 len \d+ page \d+:0/ in mysqld.1.err
NOT FOUND /scan \d+: log rec MLOG_INDEX_LOAD/ in mysqld.1.err
CHECK TABLE t1;
Table Op Msg_type Msg_text
test.t1 check status OK
# restart
DROP TABLE t1;


@@ -136,7 +136,7 @@ WHERE engine = 'innodb'
AND support IN ('YES', 'DEFAULT', 'ENABLED');
COUNT(*)
1
FOUND 1 /InnoDB: .* started; log sequence number 121397[09]/ in mysqld.1.err
FOUND 1 /InnoDB: .* started; log sequence number 12139[78]\d; transaction id 0/ in mysqld.1.err
# Empty 10.2 redo log
# restart: --innodb-data-home-dir=MYSQLTEST_VARDIR/tmp/log_corruption --innodb-log-group-home-dir=MYSQLTEST_VARDIR/tmp/log_corruption --innodb-force-recovery=5 --innodb-log-file-size=2m
SELECT COUNT(*) FROM INFORMATION_SCHEMA.ENGINES


@@ -12,7 +12,7 @@ FOUND 1 /InnoDB: Tablespace 4294967280 was not found at .*, but there were no mo
# restart: --debug=d,innodb_log_abort_3,ib_log --innodb-log-files-in-group=2 --innodb-log-file-size=4M
SELECT * FROM t1;
ERROR 42000: Unknown storage engine 'InnoDB'
FOUND 1 /srv_prepare_to_delete_redo_log_files: ib_log: MLOG_CHECKPOINT.* written/ in mysqld.1.err
FOUND 1 /srv_prepare_to_delete_redo_log_files: ib_log: FILE_CHECKPOINT.* written/ in mysqld.1.err
# restart
# restart
DROP TABLE t1;


@@ -1 +0,0 @@
--innodb-log-optimize-ddl


@@ -1,46 +0,0 @@
--source include/have_innodb.inc
--source include/have_debug.inc
# Embedded server does not support crashing
--source include/not_embedded.inc
# start afresh
--source include/restart_mysqld.inc
--echo #
--echo # Bug#21801423 INNODB REDO LOG DOES NOT INDICATE WHEN
--echo # FILES ARE CREATED
--echo #
--echo # Bug#21796691 INNODB REDO LOG DOES NOT INDICATE WHEN
--echo # REDO LOGGING IS SKIPPED
--echo #
--source include/no_checkpoint_start.inc
CREATE TABLE t1 (a INT NOT NULL, b INT UNIQUE) ENGINE=InnoDB;
# MLOG_INDEX_LOAD will not be emitted for empty tables. Insert a row.
INSERT INTO t1 VALUES (1,2);
# We should get two MLOG_INDEX_LOAD for this.
ALTER TABLE t1 ADD PRIMARY KEY(a), LOCK=SHARED, ALGORITHM=INPLACE;
# And one MLOG_INDEX_LOAD for this.
ALTER TABLE t1 DROP INDEX b, ADD INDEX (b), LOCK=SHARED;
--let CLEANUP_IF_CHECKPOINT=DROP TABLE t1;
--source include/no_checkpoint_end.inc
--let $restart_parameters= --debug=d,ib_log
--source include/start_mysqld.inc
let SEARCH_FILE = $MYSQLTEST_VARDIR/log/mysqld.1.err;
# ensure that we have exactly 2 records there.
let SEARCH_PATTERN=scan \d+: multi-log rec MLOG_FILE_CREATE2 len \d+ page \d+:0;
--source include/search_pattern_in_file.inc
# ensure that we have 0 records there.
let SEARCH_PATTERN=scan \d+: log rec MLOG_INDEX_LOAD;
--source include/search_pattern_in_file.inc
CHECK TABLE t1;
# Remove the --debug=d,ib_log setting.
--let $restart_parameters=
--source include/restart_mysqld.inc
DROP TABLE t1;


@@ -424,8 +424,8 @@ AND support IN ('YES', 'DEFAULT', 'ENABLED');
# In encryption.innodb_encrypt_log_corruption, we would convert the
# log to encrypted format. Writing an extra log checkpoint before the
# redo log conversion would advance the LSN by the size of a
# MLOG_CHECKPOINT record (9 bytes).
--let SEARCH_PATTERN= InnoDB: .* started; log sequence number 121397[09]
# FILE_CHECKPOINT record (12 bytes).
--let SEARCH_PATTERN= InnoDB: .* started; log sequence number 12139[78]\d; transaction id 0
--source include/search_pattern_in_file.inc
--echo # Empty 10.2 redo log


@@ -39,7 +39,7 @@ SELECT * FROM t1;
--source include/restart_mysqld.inc
--error ER_UNKNOWN_STORAGE_ENGINE
SELECT * FROM t1;
--let SEARCH_PATTERN= srv_prepare_to_delete_redo_log_files: ib_log: MLOG_CHECKPOINT.* written
--let SEARCH_PATTERN= srv_prepare_to_delete_redo_log_files: ib_log: FILE_CHECKPOINT.* written
--source include/search_pattern_in_file.inc
--let $restart_parameters=


@@ -438,32 +438,33 @@ btr_page_create(
ulint level, /*!< in: the B-tree level of the page */
mtr_t* mtr) /*!< in: mtr */
{
ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
byte *index_id= &block->frame[PAGE_HEADER + PAGE_INDEX_ID];
ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
byte *index_id= my_assume_aligned<2>(PAGE_HEADER + PAGE_INDEX_ID +
block->frame);
if (UNIV_LIKELY_NULL(page_zip)) {
page_create_zip(block, index, level, 0, mtr);
mach_write_to_8(index_id, index->id);
page_zip_write_header(block, index_id, 8, mtr);
} else {
page_create(block, mtr, dict_table_is_comp(index->table));
if (index->is_spatial()) {
static_assert(((FIL_PAGE_INDEX & 0xff00)
| byte(FIL_PAGE_RTREE))
== FIL_PAGE_RTREE, "compatibility");
mtr->write<1>(*block, FIL_PAGE_TYPE + 1 + block->frame,
byte(FIL_PAGE_RTREE));
if (mach_read_from_8(block->frame
+ FIL_RTREE_SPLIT_SEQ_NUM)) {
mtr->memset(block, FIL_RTREE_SPLIT_SEQ_NUM,
8, 0);
}
}
/* Set the level of the new index page */
mtr->write<2,mtr_t::OPT>(*block, PAGE_HEADER + PAGE_LEVEL
+ block->frame, level);
mtr->write<8,mtr_t::OPT>(*block, index_id, index->id);
}
if (UNIV_LIKELY_NULL(page_zip))
{
mach_write_to_8(index_id, index->id);
page_create_zip(block, index, level, 0, mtr);
}
else
{
page_create(block, mtr, dict_table_is_comp(index->table));
if (index->is_spatial())
{
static_assert(((FIL_PAGE_INDEX & 0xff00) | byte(FIL_PAGE_RTREE)) ==
FIL_PAGE_RTREE, "compatibility");
mtr->write<1>(*block, FIL_PAGE_TYPE + 1 + block->frame,
byte(FIL_PAGE_RTREE));
if (mach_read_from_8(block->frame + FIL_RTREE_SPLIT_SEQ_NUM))
mtr->memset(block, FIL_RTREE_SPLIT_SEQ_NUM, 8, 0);
}
/* Set the level of the new index page */
mtr->write<2,mtr_t::OPT>(*block,
my_assume_aligned<2>(PAGE_HEADER + PAGE_LEVEL +
block->frame), level);
mtr->write<8,mtr_t::OPT>(*block, index_id, index->id);
}
}
/**************************************************************//**
@@ -984,14 +985,12 @@ static void btr_free_root(buf_block_t *block, mtr_t *mtr, bool invalidate)
#endif /* UNIV_BTR_DEBUG */
if (invalidate)
{
byte *page_index_id= PAGE_HEADER + PAGE_INDEX_ID + block->frame;
if (UNIV_LIKELY_NULL(block->page.zip.data))
{
mach_write_to_8(page_index_id, BTR_FREED_INDEX_ID);
page_zip_write_header(block, page_index_id, 8, mtr);
}
else
mtr->write<8,mtr_t::OPT>(*block, page_index_id, BTR_FREED_INDEX_ID);
constexpr uint16_t field= PAGE_HEADER + PAGE_INDEX_ID;
byte *page_index_id= my_assume_aligned<2>(field + block->frame);
if (mtr->write<8,mtr_t::OPT>(*block, page_index_id, BTR_FREED_INDEX_ID) &&
UNIV_LIKELY_NULL(block->page.zip.data))
memcpy_aligned<2>(&block->page.zip.data[field], page_index_id, 8);
}
/* Free the entire segment in small steps. */
@@ -1120,16 +1119,17 @@ btr_create(
buf_block_dbg_add_level(block, SYNC_TREE_NODE_NEW);
}
byte* page_index_id = PAGE_HEADER + PAGE_INDEX_ID + block->frame;
ut_ad(!page_has_siblings(block->frame));
constexpr uint16_t field = PAGE_HEADER + PAGE_INDEX_ID;
byte* page_index_id = my_assume_aligned<2>(field + block->frame);
/* Create a new index page on the allocated segment page */
if (UNIV_LIKELY_NULL(block->page.zip.data)) {
page_create_zip(block, index, 0, 0, mtr);
mach_write_to_8(page_index_id, index_id);
page_zip_write_header(block, page_index_id, 8, mtr);
static_assert(FIL_PAGE_PREV % 8 == 0, "alignment");
memset_aligned<8>(FIL_PAGE_PREV + block->page.zip.data,
0xff, 8);
ut_ad(!page_has_siblings(block->page.zip.data));
page_create_zip(block, index, 0, 0, mtr);
} else {
page_create(block, mtr, index->table->not_redundant());
if (index->is_spatial()) {
@@ -1150,11 +1150,6 @@ btr_create(
mtr->write<8,mtr_t::OPT>(*block, page_index_id, index_id);
}
/* Set the next node and previous node fields */
compile_time_assert(FIL_PAGE_NEXT == FIL_PAGE_PREV + 4);
compile_time_assert(FIL_NULL == 0xffffffff);
mtr->memset(block, FIL_PAGE_PREV, 8, 0xff);
/* We reset the free bits for the page in a separate
mini-transaction to allow creation of several trees in the
same mtr, otherwise the latch on a bitmap page would prevent
@@ -1781,6 +1776,49 @@ void btr_set_instant(buf_block_t* root, const dict_index_t& index, mtr_t* mtr)
}
}
/** Reset the table to the canonical format on ROLLBACK of instant ALTER TABLE.
@param[in] index clustered index with instant ALTER TABLE
@param[in] all whether to reset FIL_PAGE_TYPE as well
@param[in,out] mtr mini-transaction */
ATTRIBUTE_COLD
void btr_reset_instant(const dict_index_t &index, bool all, mtr_t *mtr)
{
ut_ad(!index.table->is_temporary());
ut_ad(index.is_primary());
if (buf_block_t *root = btr_root_block_get(&index, RW_SX_LATCH, mtr))
{
byte *page_type= root->frame + FIL_PAGE_TYPE;
if (all)
{
ut_ad(mach_read_from_2(page_type) == FIL_PAGE_TYPE_INSTANT ||
mach_read_from_2(page_type) == FIL_PAGE_INDEX);
mtr->write<2,mtr_t::OPT>(*root, page_type, FIL_PAGE_INDEX);
byte *instant= PAGE_INSTANT + PAGE_HEADER + root->frame;
mtr->write<2,mtr_t::OPT>(*root, instant,
page_ptr_get_direction(instant + 1));
}
else
ut_ad(mach_read_from_2(page_type) == FIL_PAGE_TYPE_INSTANT);
static const byte supremuminfimum[8 + 8] = "supremuminfimum";
uint16_t infimum, supremum;
if (page_is_comp(root->frame))
{
infimum= PAGE_NEW_INFIMUM;
supremum= PAGE_NEW_SUPREMUM;
}
else
{
infimum= PAGE_OLD_INFIMUM;
supremum= PAGE_OLD_SUPREMUM;
}
ut_ad(!memcmp(&root->frame[infimum], supremuminfimum + 8, 8) ==
!memcmp(&root->frame[supremum], supremuminfimum, 8));
mtr->memcpy<mtr_t::OPT>(*root, &root->frame[infimum], supremuminfimum + 8,
8);
mtr->memcpy<mtr_t::OPT>(*root, &root->frame[supremum], supremuminfimum, 8);
}
}
/*************************************************************//**
Makes tree one level higher by splitting the root, and inserts
the tuple. It is assumed that mtr contains an x-latch on the tree.
@@ -1859,16 +1897,13 @@ btr_root_raise_and_insert(
== page_zip_get_size(root_page_zip));
btr_page_create(new_block, new_page_zip, index, level, mtr);
/* Set the next node and previous node fields of new page */
if (!page_has_siblings(new_block->frame)) {
ut_ad(index->is_ibuf());
} else {
if (page_has_siblings(new_block->frame)) {
compile_time_assert(FIL_PAGE_NEXT == FIL_PAGE_PREV + 4);
compile_time_assert(FIL_NULL == 0xffffffff);
static_assert(FIL_PAGE_PREV % 8 == 0, "alignment");
memset_aligned<8>(new_block->frame + FIL_PAGE_PREV, 0xff, 8);
mtr->memset(new_block, FIL_PAGE_PREV, 8, 0xff);
if (UNIV_LIKELY_NULL(new_page_zip)) {
static_assert(FIL_PAGE_PREV % 8 == 0, "alignment");
memset_aligned<8>(new_page_zip->data + FIL_PAGE_PREV,
0xff, 8);
}
@@ -1902,6 +1937,7 @@ btr_root_raise_and_insert(
}
}
constexpr uint16_t max_trx_id = PAGE_HEADER + PAGE_MAX_TRX_ID;
if (dict_index_is_sec_or_ibuf(index)) {
/* In secondary indexes and the change buffer,
PAGE_MAX_TRX_ID can be reset on the root page, because
@@ -1910,11 +1946,12 @@
set PAGE_MAX_TRX_ID on all secondary index pages.) */
byte* p = my_assume_aligned<8>(
PAGE_HEADER + PAGE_MAX_TRX_ID + root->frame);
if (UNIV_LIKELY_NULL(root->page.zip.data)) {
memset_aligned<8>(p, 0, 8);
page_zip_write_header(root, p, 8, mtr);
} else if (mach_read_from_8(p)) {
mtr->memset(root, PAGE_HEADER + PAGE_MAX_TRX_ID, 8, 0);
if (mach_read_from_8(p)) {
mtr->memset(root, max_trx_id, 8, 0);
if (UNIV_LIKELY_NULL(root->page.zip.data)) {
memset_aligned<8>(max_trx_id
+ root->page.zip.data, 0, 8);
}
}
} else {
/* PAGE_ROOT_AUTO_INC is only present in the clustered index
@@ -1922,12 +1959,13 @@
the field PAGE_MAX_TRX_ID for future use. */
byte* p = my_assume_aligned<8>(
PAGE_HEADER + PAGE_MAX_TRX_ID + new_block->frame);
if (UNIV_LIKELY_NULL(new_block->page.zip.data)) {
memset_aligned<8>(p, 0, 8);
page_zip_write_header(new_block, p, 8, mtr);
} else if (mach_read_from_8(p)) {
mtr->memset(new_block, PAGE_HEADER + PAGE_MAX_TRX_ID,
8, 0);
if (mach_read_from_8(p)) {
mtr->memset(new_block, max_trx_id, 8, 0);
if (UNIV_LIKELY_NULL(new_block->page.zip.data)) {
memset_aligned<8>(max_trx_id
+ new_block->page.zip.data,
0, 8);
}
}
}
@@ -2522,37 +2560,15 @@ btr_attach_half_pages(
if (direction == FSP_DOWN) {
ut_ad(lower_block == new_block);
ut_ad(btr_page_get_next(upper_block->frame) == next_page_no);
if (UNIV_UNLIKELY(btr_page_get_prev(lower_block->frame)
== prev_page_no)) {
ut_ad(index->is_ibuf());
} else {
btr_page_set_prev(lower_block, prev_page_no, mtr);
}
btr_page_set_prev(lower_block, prev_page_no, mtr);
} else {
ut_ad(upper_block == new_block);
ut_ad(btr_page_get_prev(lower_block->frame) == prev_page_no);
if (UNIV_UNLIKELY(btr_page_get_next(upper_block->frame)
== next_page_no)) {
ut_ad(index->is_ibuf());
} else {
btr_page_set_next(upper_block, next_page_no, mtr);
}
btr_page_set_next(upper_block, next_page_no, mtr);
}
if (UNIV_UNLIKELY(btr_page_get_next(lower_block->frame)
== upper_block->page.id.page_no())) {
ut_ad(index->is_ibuf());
} else {
btr_page_set_next(lower_block, upper_block->page.id.page_no(),
mtr);
}
if (UNIV_UNLIKELY(btr_page_get_prev(upper_block->frame)
== lower_block->page.id.page_no())) {
ut_ad(index->is_ibuf());
} else {
btr_page_set_prev(upper_block, lower_block->page.id.page_no(),
mtr);
}
btr_page_set_prev(upper_block, lower_block->page.id.page_no(), mtr);
btr_page_set_next(lower_block, upper_block->page.id.page_no(), mtr);
}
/*************************************************************//**
@@ -2838,8 +2854,9 @@ func_start:
return(NULL););
/* 2. Allocate a new page to the index */
const uint16_t page_level = btr_page_get_level(page);
new_block = btr_page_alloc(cursor->index, hint_page_no, direction,
btr_page_get_level(page), mtr, mtr);
page_level, mtr, mtr);
if (!new_block) {
return(NULL);
@@ -2847,10 +2864,16 @@
new_page = buf_block_get_frame(new_block);
new_page_zip = buf_block_get_page_zip(new_block);
if (page_level && UNIV_LIKELY_NULL(new_page_zip)) {
/* ROW_FORMAT=COMPRESSED non-leaf pages are not expected
to contain FIL_NULL in FIL_PAGE_PREV at this stage. */
memset_aligned<4>(new_page + FIL_PAGE_PREV, 0, 4);
}
btr_page_create(new_block, new_page_zip, cursor->index,
btr_page_get_level(page), mtr);
page_level, mtr);
/* Only record the leaf level page splits. */
if (page_is_leaf(page)) {
if (!page_level) {
cursor->index->stat_defrag_n_page_split ++;
cursor->index->stat_defrag_modified_counter ++;
btr_defragment_save_defrag_stats_if_needed(cursor->index);
@@ -2895,6 +2918,7 @@ insert_empty:
/* 4. Do first the modifications in the tree structure */
/* FIXME: write FIL_PAGE_PREV,FIL_PAGE_NEXT in new_block earlier! */
btr_attach_half_pages(flags, cursor->index, block,
first_rec, new_block, direction, mtr);


@@ -82,26 +82,21 @@ PageBulk::init()
new_page = buf_block_get_frame(new_block);
new_page_no = page_get_page_no(new_page);
byte* index_id = PAGE_HEADER + PAGE_INDEX_ID + new_page;
byte* index_id = my_assume_aligned<2>
(PAGE_HEADER + PAGE_INDEX_ID + new_page);
compile_time_assert(FIL_PAGE_NEXT == FIL_PAGE_PREV + 4);
compile_time_assert(FIL_NULL == 0xffffffff);
memset_aligned<8>(new_page + FIL_PAGE_PREV, 0xff, 8);
if (UNIV_LIKELY_NULL(new_block->page.zip.data)) {
mach_write_to_8(index_id, m_index->id);
page_create_zip(new_block, m_index, m_level, 0,
&m_mtr);
static_assert(FIL_PAGE_PREV % 8 == 0, "alignment");
memset_aligned<8>(FIL_PAGE_PREV + new_page, 0xff, 8);
page_zip_write_header(new_block,
FIL_PAGE_PREV + new_page,
8, &m_mtr);
mach_write_to_8(index_id, m_index->id);
page_zip_write_header(new_block, index_id, 8, &m_mtr);
} else {
ut_ad(!m_index->is_spatial());
page_create(new_block, &m_mtr,
m_index->table->not_redundant());
compile_time_assert(FIL_PAGE_NEXT
== FIL_PAGE_PREV + 4);
compile_time_assert(FIL_NULL == 0xffffffff);
m_mtr.memset(new_block, FIL_PAGE_PREV, 8, 0xff);
m_mtr.memset(*new_block, FIL_PAGE_PREV, 8, 0xff);
m_mtr.write<2,mtr_t::OPT>(*new_block,
PAGE_HEADER + PAGE_LEVEL
+ new_page, m_level);
@@ -155,22 +150,25 @@ PageBulk::init()
/** Insert a record in the page.
@tparam fmt the page format
@param[in] rec record
@param[in,out] rec record
@param[in] offsets record offsets */
template<PageBulk::format fmt>
inline void PageBulk::insertPage(const rec_t *rec, offset_t *offsets)
inline void PageBulk::insertPage(rec_t *rec, offset_t *offsets)
{
ut_ad((m_page_zip != nullptr) == (fmt == COMPRESSED));
ut_ad((fmt != REDUNDANT) == m_is_comp);
ut_ad(page_align(m_heap_top) == m_page);
ut_ad(m_heap);
ulint rec_size= rec_offs_size(offsets);
const ulint rec_size= rec_offs_size(offsets);
const ulint extra_size= rec_offs_extra_size(offsets);
ut_ad(page_align(m_heap_top + rec_size) == m_page);
ut_d(const bool is_leaf= page_rec_is_leaf(m_cur_rec));
#ifdef UNIV_DEBUG
/* Check whether records are in order. */
if (!page_rec_is_infimum_low(page_offset(m_cur_rec)))
if (page_offset(m_cur_rec) !=
(fmt == REDUNDANT ? PAGE_OLD_INFIMUM : PAGE_NEW_INFIMUM))
{
const rec_t *old_rec = m_cur_rec;
offset_t *old_offsets= rec_get_offsets(old_rec, m_index, nullptr, is_leaf,
@@ -181,41 +179,126 @@ inline void PageBulk::insertPage(const rec_t *rec, offset_t *offsets)
m_total_data+= rec_size;
#endif /* UNIV_DEBUG */
/* Copy the record payload. */
rec_t *insert_rec= rec_copy(m_heap_top, rec, offsets);
ut_ad(page_align(insert_rec) == m_page);
rec_offs_make_valid(insert_rec, m_index, is_leaf, offsets);
rec_t* const insert_rec= m_heap_top + extra_size;
/* Insert the record in the linked list. */
if (fmt != REDUNDANT)
{
rec_t *next_rec= m_page +
const rec_t *next_rec= m_page +
page_offset(m_cur_rec + mach_read_from_2(m_cur_rec - REC_NEXT));
mach_write_to_2(insert_rec - REC_NEXT,
static_cast<uint16_t>(next_rec - insert_rec));
if (fmt != COMPRESSED)
m_mtr.write<2>(*m_block, m_cur_rec - REC_NEXT,
static_cast<uint16_t>(insert_rec - m_cur_rec));
else
{
mach_write_to_2(m_cur_rec - REC_NEXT,
static_cast<uint16_t>(insert_rec - m_cur_rec));
rec_set_bit_field_1(insert_rec, 0, REC_NEW_N_OWNED, REC_N_OWNED_MASK,
memcpy(m_heap_top, rec - extra_size, rec_size);
}
rec_t * const this_rec= fmt != COMPRESSED
? const_cast<rec_t*>(rec) : insert_rec;
rec_set_bit_field_1(this_rec, 0, REC_NEW_N_OWNED, REC_N_OWNED_MASK,
REC_N_OWNED_SHIFT);
rec_set_bit_field_2(insert_rec, PAGE_HEAP_NO_USER_LOW + m_rec_no,
rec_set_bit_field_2(this_rec, PAGE_HEAP_NO_USER_LOW + m_rec_no,
REC_NEW_HEAP_NO, REC_HEAP_NO_MASK, REC_HEAP_NO_SHIFT);
mach_write_to_2(this_rec - REC_NEXT,
static_cast<uint16_t>(next_rec - insert_rec));
}
else
{
memcpy(insert_rec - REC_NEXT, m_cur_rec - REC_NEXT, 2);
memcpy(const_cast<rec_t*>(rec) - REC_NEXT, m_cur_rec - REC_NEXT, 2);
m_mtr.write<2>(*m_block, m_cur_rec - REC_NEXT, page_offset(insert_rec));
rec_set_bit_field_1(insert_rec, 0, REC_OLD_N_OWNED, REC_N_OWNED_MASK,
REC_N_OWNED_SHIFT);
rec_set_bit_field_2(insert_rec, PAGE_HEAP_NO_USER_LOW + m_rec_no,
rec_set_bit_field_1(const_cast<rec_t*>(rec), 0,
REC_OLD_N_OWNED, REC_N_OWNED_MASK, REC_N_OWNED_SHIFT);
rec_set_bit_field_2(const_cast<rec_t*>(rec),
PAGE_HEAP_NO_USER_LOW + m_rec_no,
REC_OLD_HEAP_NO, REC_HEAP_NO_MASK, REC_HEAP_NO_SHIFT);
}
if (fmt != COMPRESSED)
m_mtr.memcpy(*m_block, page_offset(m_heap_top), rec_offs_size(offsets));
if (fmt == COMPRESSED)
/* We already wrote the record. Log is written in PageBulk::compress(). */;
else if (page_offset(m_cur_rec) ==
(fmt == REDUNDANT ? PAGE_OLD_INFIMUM : PAGE_NEW_INFIMUM))
m_mtr.memcpy(*m_block, m_heap_top, rec - extra_size, rec_size);
else
{
/* Try to copy common prefix from the preceding record. */
const byte *r= rec - extra_size;
const byte * const insert_rec_end= m_heap_top + rec_size;
byte *b= m_heap_top;
/* Skip any unchanged prefix of the record. */
for (; * b == *r; b++, r++);
ut_ad(b < insert_rec_end);
const byte *c= m_cur_rec - (rec - r);
const byte * const c_end= std::min(m_cur_rec + rec_offs_data_size(offsets),
m_heap_top);
/* Try to copy any bytes of the preceding record. */
if (UNIV_LIKELY(c >= m_page && c < c_end))
{
const byte *cm= c;
byte *bm= b;
const byte *rm= r;
for (; cm < c_end && *rm == *cm; cm++, bm++, rm++);
ut_ad(bm <= insert_rec_end);
size_t len= static_cast<size_t>(rm - r);
ut_ad(!memcmp(r, c, len));
if (len > 2)
{
memcpy(b, c, len);
m_mtr.memmove(*m_block, page_offset(b), page_offset(c), len);
c= cm;
b= bm;
r= rm;
}
}
if (c < m_cur_rec)
{
if (!rec_offs_data_size(offsets))
{
no_data:
m_mtr.memcpy<mtr_t::FORCED>(*m_block, b, r, m_cur_rec - c);
goto rec_done;
}
/* Some header bytes differ. Compare the data separately. */
const byte *cd= m_cur_rec;
byte *bd= insert_rec;
const byte *rd= rec;
/* Skip any unchanged prefix of the record. */
for (; *bd == *rd; cd++, bd++, rd++)
if (bd == insert_rec_end)
goto no_data;
/* Try to copy any data bytes of the preceding record. */
const byte *cdm= cd;
const byte *rdm= rd;
for (; cdm < c_end && *rdm == *cdm; cdm++, rdm++)
ut_ad(rdm - rd + bd <= insert_rec_end);
size_t len= static_cast<size_t>(rdm - rd);
ut_ad(!memcmp(rd, cd, len));
if (len > 2)
{
m_mtr.memcpy<mtr_t::FORCED>(*m_block, b, r, m_cur_rec - c);
memcpy(bd, cd, len);
m_mtr.memmove(*m_block, page_offset(bd), page_offset(cd), len);
c= cdm;
b= rdm - rd + bd;
r= rdm;
}
}
if (size_t len= static_cast<size_t>(insert_rec_end - b))
m_mtr.memcpy<mtr_t::FORCED>(*m_block, b, r, len);
}
rec_done:
ut_ad(fmt == COMPRESSED || !memcmp(m_heap_top, rec - extra_size, rec_size));
rec_offs_make_valid(insert_rec, m_index, is_leaf, offsets);
/* Update the member variables. */
ulint slot_size= page_dir_calc_reserved_space(m_rec_no + 1) -
@@ -235,12 +318,25 @@ inline void PageBulk::insertPage(const rec_t *rec, offset_t *offsets)
@param[in] offsets record offsets */
inline void PageBulk::insert(const rec_t *rec, offset_t *offsets)
{
byte rec_hdr[REC_N_OLD_EXTRA_BYTES];
static_assert(REC_N_OLD_EXTRA_BYTES > REC_N_NEW_EXTRA_BYTES, "file format");
if (UNIV_LIKELY_NULL(m_page_zip))
insertPage<COMPRESSED>(rec, offsets);
insertPage<COMPRESSED>(const_cast<rec_t*>(rec), offsets);
else if (m_is_comp)
insertPage<DYNAMIC>(rec, offsets);
{
memcpy(rec_hdr, rec - REC_N_NEW_EXTRA_BYTES, REC_N_NEW_EXTRA_BYTES);
insertPage<DYNAMIC>(const_cast<rec_t*>(rec), offsets);
memcpy(const_cast<rec_t*>(rec) - REC_N_NEW_EXTRA_BYTES, rec_hdr,
REC_N_NEW_EXTRA_BYTES);
}
else
insertPage<REDUNDANT>(rec, offsets);
{
memcpy(rec_hdr, rec - REC_N_OLD_EXTRA_BYTES, REC_N_OLD_EXTRA_BYTES);
insertPage<REDUNDANT>(const_cast<rec_t*>(rec), offsets);
memcpy(const_cast<rec_t*>(rec) - REC_N_OLD_EXTRA_BYTES, rec_hdr,
REC_N_OLD_EXTRA_BYTES);
}
}
/** Set the number of owned records in the uncompressed page of
@@ -283,18 +379,13 @@ inline void PageBulk::finishPage()
if (count == (PAGE_DIR_SLOT_MAX_N_OWNED + 1) / 2)
{
slot-= PAGE_DIR_SLOT_SIZE;
mach_write_to_2(slot, offset);
if (fmt != COMPRESSED)
{
m_mtr.write<2,mtr_t::OPT>(*m_block, slot, offset);
page_rec_set_n_owned<false>(m_block, m_page + offset, count, true,
&m_mtr);
}
else
{
mach_write_to_2(slot, offset);
rec_set_n_owned_zip(m_page + offset, count);
}
count= 0;
}
@@ -321,17 +412,12 @@ inline void PageBulk::finishPage()
else
slot-= PAGE_DIR_SLOT_SIZE;
mach_write_to_2(slot, PAGE_NEW_SUPREMUM);
if (fmt != COMPRESSED)
{
m_mtr.write<2,mtr_t::OPT>(*m_block, slot, PAGE_NEW_SUPREMUM);
page_rec_set_n_owned<false>(m_block, m_page + PAGE_NEW_SUPREMUM,
count + 1, true, &m_mtr);
}
else
{
mach_write_to_2(slot, PAGE_NEW_SUPREMUM);
rec_set_n_owned_zip(m_page + PAGE_NEW_SUPREMUM, count + 1);
}
}
else
{
@@ -347,7 +433,7 @@ inline void PageBulk::finishPage()
if (count == (PAGE_DIR_SLOT_MAX_N_OWNED + 1) / 2)
{
slot-= PAGE_DIR_SLOT_SIZE;
m_mtr.write<2,mtr_t::OPT>(*m_block, slot, page_offset(insert_rec));
mach_write_to_2(slot, page_offset(insert_rec));
page_rec_set_n_owned<false>(m_block, insert_rec, count, false, &m_mtr);
count= 0;
}
@@ -368,31 +454,35 @@ inline void PageBulk::finishPage()
else
slot-= PAGE_DIR_SLOT_SIZE;
m_mtr.write<2,mtr_t::OPT>(*m_block, slot, PAGE_OLD_SUPREMUM);
mach_write_to_2(slot, PAGE_OLD_SUPREMUM);
page_rec_set_n_owned<false>(m_block, m_page + PAGE_OLD_SUPREMUM, count + 1,
false, &m_mtr);
}
ut_ad(!dict_index_is_spatial(m_index));
ut_ad(!m_index->is_spatial());
ut_ad(!page_get_instant(m_page));
ut_ad(!mach_read_from_2(PAGE_HEADER + PAGE_N_DIRECTION + m_page));
if (fmt != COMPRESSED)
{
m_mtr.write<2,mtr_t::OPT>(*m_block,
PAGE_HEADER + PAGE_N_DIR_SLOTS + m_page,
1 + static_cast<ulint>(slot0 - slot) /
PAGE_DIR_SLOT_SIZE);
m_mtr.write<2>(*m_block, PAGE_HEADER + PAGE_HEAP_TOP + m_page,
static_cast<ulint>(m_heap_top - m_page));
m_mtr.write<2>(*m_block, PAGE_HEADER + PAGE_N_HEAP + m_page,
(PAGE_HEAP_NO_USER_LOW + m_rec_no) |
uint16_t{fmt != REDUNDANT} << 15);
m_mtr.write<2>(*m_block, PAGE_HEADER + PAGE_N_RECS + m_page, m_rec_no);
m_mtr.write<2>(*m_block, PAGE_HEADER + PAGE_LAST_INSERT + m_page,
static_cast<ulint>(m_cur_rec - m_page));
m_mtr.write<2>(*m_block, PAGE_HEADER + PAGE_DIRECTION_B - 1 + m_page,
PAGE_RIGHT);
static_assert(PAGE_N_DIR_SLOTS == 0, "compatibility");
alignas(8) byte page_header[PAGE_N_RECS + 2];
mach_write_to_2(page_header + PAGE_N_DIR_SLOTS,
1 + (slot0 - slot) / PAGE_DIR_SLOT_SIZE);
mach_write_to_2(page_header + PAGE_HEAP_TOP, m_heap_top - m_page);
mach_write_to_2(page_header + PAGE_N_HEAP,
(PAGE_HEAP_NO_USER_LOW + m_rec_no) |
uint16_t{fmt != REDUNDANT} << 15);
memset_aligned<2>(page_header + PAGE_FREE, 0, 4);
static_assert(PAGE_GARBAGE == PAGE_FREE + 2, "compatibility");
mach_write_to_2(page_header + PAGE_LAST_INSERT, m_cur_rec - m_page);
mach_write_to_2(page_header + PAGE_DIRECTION_B - 1, PAGE_RIGHT);
mach_write_to_2(page_header + PAGE_N_DIRECTION, m_rec_no);
memcpy_aligned<2>(page_header + PAGE_N_RECS,
page_header + PAGE_N_DIRECTION, 2);
m_mtr.memcpy(*m_block, PAGE_HEADER + m_page, page_header,
sizeof page_header);
m_mtr.memcpy(*m_block, page_offset(slot), slot0 - slot);
}
else
{
@@ -3898,49 +3898,94 @@ static void btr_cur_write_sys(
}
/** Update DB_TRX_ID, DB_ROLL_PTR in a clustered index record.
@param[in,out] block clustered index leaf page
@param[in,out] rec clustered index record
@param[in] index clustered index
@param[in] offsets rec_get_offsets(rec, index)
@param[in] trx transaction
@param[in] roll_ptr DB_ROLL_PTR value
@param[in,out] mtr mini-transaction */
static void btr_cur_upd_rec_sys(buf_block_t *block, rec_t* rec,
dict_index_t* index, const offset_t* offsets,
const trx_t* trx, roll_ptr_t roll_ptr,
mtr_t* mtr)
@param[in,out] block clustered index leaf page
@param[in,out] rec clustered index record
@param[in] index clustered index
@param[in] offsets rec_get_offsets(rec, index)
@param[in] trx transaction
@param[in] roll_ptr DB_ROLL_PTR value
@param[in,out] mtr mini-transaction */
static void btr_cur_upd_rec_sys(buf_block_t *block, rec_t *rec,
dict_index_t *index, const offset_t *offsets,
const trx_t *trx, roll_ptr_t roll_ptr,
mtr_t *mtr)
{
ut_ad(index->is_primary());
ut_ad(rec_offs_validate(rec, index, offsets));
ut_ad(index->is_primary());
ut_ad(rec_offs_validate(rec, index, offsets));
if (UNIV_LIKELY_NULL(block->page.zip.data)) {
page_zip_write_trx_id_and_roll_ptr(block, rec, offsets,
index->db_trx_id(),
trx->id, roll_ptr, mtr);
} else {
ulint offset = index->trx_id_offset;
if (UNIV_LIKELY_NULL(block->page.zip.data))
{
page_zip_write_trx_id_and_roll_ptr(block, rec, offsets, index->db_trx_id(),
trx->id, roll_ptr, mtr);
return;
}
if (!offset) {
offset = row_get_trx_id_offset(index, offsets);
}
ulint offset= index->trx_id_offset;
compile_time_assert(DATA_TRX_ID + 1 == DATA_ROLL_PTR);
if (!offset)
offset= row_get_trx_id_offset(index, offsets);
/* During IMPORT the trx id in the record can be in the
future, if the .ibd file is being imported from another
instance. During IMPORT roll_ptr will be 0. */
ut_ad(roll_ptr == 0
|| lock_check_trx_id_sanity(
trx_read_trx_id(rec + offset),
rec, index, offsets));
compile_time_assert(DATA_TRX_ID + 1 == DATA_ROLL_PTR);
trx_write_trx_id(rec + offset, trx->id);
trx_write_roll_ptr(rec + offset + DATA_TRX_ID_LEN, roll_ptr);
/* MDEV-12353 FIXME: consider emitting MEMMOVE for the
DB_TRX_ID if it is found in the preceding record */
mtr->memcpy(*block, page_offset(rec + offset),
DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN);
}
/* During IMPORT the trx id in the record can be in the future, if
the .ibd file is being imported from another instance. During IMPORT
roll_ptr will be 0. */
ut_ad(roll_ptr == 0 ||
lock_check_trx_id_sanity(trx_read_trx_id(rec + offset),
rec, index, offsets));
byte sys[DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN];
trx_write_trx_id(sys, trx->id);
trx_write_roll_ptr(sys + DATA_TRX_ID_LEN, roll_ptr);
ulint d= 0;
const byte *src= nullptr;
byte *dest= rec + offset;
ulint len= DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN;
if (UNIV_LIKELY(index->trx_id_offset))
{
const rec_t *prev= page_rec_get_prev_const(rec);
if (UNIV_UNLIKELY(prev == rec))
ut_ad(0);
else if (page_rec_is_infimum(prev));
else
for (src= prev + offset; d < DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN; d++)
if (src[d] != sys[d])
break;
if (d > 6 && memcmp(dest, sys, d))
{
/* We save space by replacing a single record
WRITE,page_offset(dest),byte[13]
with two records:
MEMMOVE,page_offset(dest),d(1 byte),offset(1..3 bytes),
WRITE|0x80,0,byte[13-d]
The single WRITE record would be x+13 bytes long, with x>2.
The MEMMOVE record would be up to x+1+3 = x+4 bytes, and the
second WRITE would be 1+1+13-d = 15-d bytes.
The total size is: x+13 versus x+4+15-d = x+19-d bytes.
To save space, we must have d>6, that is, the complete DB_TRX_ID and
the first byte(s) of DB_ROLL_PTR must match the previous record. */
memcpy(dest, src, d);
mtr->memmove(*block, page_offset(dest), page_offset(src), d);
dest+= d;
len-= d;
/* DB_TRX_ID,DB_ROLL_PTR must be unique in each record when
DB_TRX_ID refers to an active transaction. */
ut_ad(len);
}
else
d= 0;
}
if (UNIV_LIKELY(len)) /* extra safety, to avoid corrupting the log */
mtr->memcpy<mtr_t::OPT>(*block, dest, sys + d, len);
}
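The d > 6 threshold in btr_cur_upd_rec_sys() above follows from comparing the encoded sizes of the two logging strategies described in its comment. A minimal sketch of that arithmetic, in plain C++ with illustrative names (the byte counts are taken from the comment in the function, not recomputed from the record encoder):

```cpp
#include <cassert>
#include <cstddef>

// Length of the common prefix of two DB_TRX_ID,DB_ROLL_PTR images.
static size_t common_prefix(const unsigned char *a, const unsigned char *b,
                            size_t n)
{
  size_t d= 0;
  while (d < n && a[d] == b[d])
    d++;
  return d;
}

// Per the comment above: a single WRITE costs x+13 bytes (x>2), while
// MEMMOVE plus a shortened WRITE costs up to x+4 + (15-d) = x+19-d bytes.
// The pair is smaller exactly when d > 6.
static bool memmove_saves_space(size_t d)
{
  return 13 > 19 - d;   // equivalent to d > 6
}

static bool demo()
{
  // 13-byte DB_TRX_ID(6),DB_ROLL_PTR(7) images differing from byte 7 on
  const unsigned char prev[13]= {0,0,0,0,1,2,3,4,5,6,7,8,9};
  const unsigned char cur[13]=  {0,0,0,0,1,2,3,9,9,9,9,9,9};
  size_t d= common_prefix(prev, cur, 13);
  return d == 7 && memmove_saves_space(d) && !memmove_saves_space(6);
}
```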
/*********************************************************************//**
@@ -4400,10 +4445,13 @@ void btr_cur_upd_rec_in_place(rec_t *rec, const dict_index_t *index,
if (UNIV_UNLIKELY(dfield_is_null(&uf->new_val))) {
ut_ad(!rec_offs_nth_sql_null(offsets, n));
ut_ad(!index->table->not_redundant());
mtr->memset(block,
page_offset(rec + rec_get_field_start_offs(
rec, n)),
rec_get_nth_field_size(rec, n), 0);
if (ulint size = rec_get_nth_field_size(rec, n)) {
mtr->memset(
block,
page_offset(rec_get_field_start_offs(
rec, n) + rec),
size, 0);
}
ulint l = rec_get_1byte_offs_flag(rec)
? (n + 1) : (n + 1) * 2;
byte* b = &rec[-REC_N_OLD_EXTRA_BYTES - l];
@@ -4436,7 +4484,10 @@ void btr_cur_upd_rec_in_place(rec_t *rec, const dict_index_t *index,
byte(*b & ~REC_1BYTE_SQL_NULL_MASK));
}
mtr->memcpy(block, page_offset(data), uf->new_val.data, len);
if (len) {
mtr->memcpy<mtr_t::OPT>(*block, data, uf->new_val.data,
len);
}
}
if (UNIV_LIKELY_NULL(block->page.zip.data)) {
@@ -7855,21 +7906,10 @@ btr_store_big_rec_extern_fields(
int err;
page_zip_des_t* blob_page_zip;
/* Write FIL_PAGE_TYPE to the redo log
separately, before logging any other
changes to the block, so that the debug
assertions in
recv_parse_or_apply_log_rec_body() can
be made simpler. Before InnoDB Plugin
1.0.4, the initialization of
FIL_PAGE_TYPE was logged as part of
the mtr_t::memcpy() below. */
mtr.write<2>(*block,
block->frame + FIL_PAGE_TYPE,
prev_page_no == FIL_NULL
? FIL_PAGE_TYPE_ZBLOB
: FIL_PAGE_TYPE_ZBLOB2);
mach_write_to_2(block->frame + FIL_PAGE_TYPE,
prev_page_no == FIL_NULL
? FIL_PAGE_TYPE_ZBLOB
: FIL_PAGE_TYPE_ZBLOB2);
c_stream.next_out = block->frame
+ FIL_PAGE_DATA;
@@ -7886,9 +7926,9 @@ btr_store_big_rec_extern_fields(
compile_time_assert(FIL_NULL == 0xffffffff);
mtr.memset(block, FIL_PAGE_PREV, 8, 0xff);
mtr.memcpy(*block,
FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION,
FIL_PAGE_TYPE,
page_zip_get_size(page_zip)
- FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
- FIL_PAGE_TYPE
- c_stream.avail_out);
/* Zero out the unused part of the page. */
if (c_stream.avail_out) {
@@ -7966,12 +8006,14 @@ next_zip_page:
store_len = extern_len;
}
mtr.memcpy(block,
FIL_PAGE_DATA + BTR_BLOB_HDR_SIZE,
(const byte*)
big_rec_vec->fields[i].data
+ big_rec_vec->fields[i].len
- extern_len, store_len);
mtr.memcpy<mtr_t::OPT>(
*block,
FIL_PAGE_DATA + BTR_BLOB_HDR_SIZE
+ block->frame,
static_cast<const byte*>
(big_rec_vec->fields[i].data)
+ big_rec_vec->fields[i].len
- extern_len, store_len);
mtr.write<4>(*block, BTR_BLOB_HDR_PART_LEN
+ FIL_PAGE_DATA + block->frame,
store_len);
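Several hunks above switch writes to mtr_t::OPT variants or guard them with a length check. The idea, sketched below with hypothetical names rather than the actual InnoDB API, is that a WRITE record is only worth emitting when the payload is nonempty and actually differs from the current page contents:

```cpp
#include <cassert>
#include <cstring>
#include <cstddef>

// Sketch of the OPT-write idea: return the number of payload bytes that
// would be logged; 0 means the record is elided entirely.
static size_t opt_write(unsigned char *page, size_t offset,
                        const unsigned char *data, size_t len)
{
  if (len == 0 || 0 == memcmp(page + offset, data, len))
    return 0;                  // unchanged or empty: no redo record
  memcpy(page + offset, data, len);
  return len;
}

static bool demo()
{
  unsigned char page[16]= {0};
  const unsigned char d[4]= {1,2,3,4};
  return opt_write(page, 4, d, 4) == 4   // first write changes the page
      && opt_write(page, 4, d, 4) == 0   // repeat is a no-op
      && opt_write(page, 0, d, 0) == 0;  // zero length is skipped
}
```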
@@ -5493,7 +5493,7 @@ release_page:
}
if (recv_recovery_is_on()) {
recv_recover_page(bpage);
recv_recover_page(space, bpage);
}
if (uncompressed
@@ -5536,27 +5536,13 @@ release_page:
ut_ad(buf_pool->n_pend_reads > 0);
buf_pool->n_pend_reads--;
buf_pool->stat.n_pages_read++;
ut_ad(!uncompressed || !bpage->zip.data
|| !recv_recovery_is_on()
|| buf_page_can_relocate(bpage));
mutex_exit(block_mutex);
if (uncompressed) {
#if 1 /* MDEV-12353 FIXME: Remove this! */
if (UNIV_LIKELY_NULL(bpage->zip.data)
&& recv_recovery_is_on()) {
rw_lock_x_unlock_gen(
&reinterpret_cast<buf_block_t*>(bpage)
->lock, BUF_IO_READ);
if (!buf_LRU_free_page(bpage, false)) {
ut_ad(!"could not remove");
}
goto func_exit;
}
#endif
rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock,
BUF_IO_READ);
}
mutex_exit(block_mutex);
} else {
/* Write means a flush operation: call the completion
routine in the flush system */
@@ -5590,7 +5576,6 @@ release_page:
DBUG_PRINT("ib_buf", ("%s page %u:%u",
io_type == BUF_IO_READ ? "read" : "wrote",
bpage->id.space(), bpage->id.page_no()));
func_exit:
mutex_exit(&buf_pool->mutex);
return DB_SUCCESS;
}
@@ -418,9 +418,7 @@ void fil_space_crypt_t::write_page0(buf_block_t* block, mtr_t* mtr)
+ fsp_header_get_encryption_offset(block->zip_size());
byte* b = block->frame + offset;
if (memcmp(b, CRYPT_MAGIC, MAGIC_SZ)) {
mtr->memcpy(block, offset, CRYPT_MAGIC, MAGIC_SZ);
}
mtr->memcpy<mtr_t::OPT>(*block, b, CRYPT_MAGIC, MAGIC_SZ);
b += MAGIC_SZ;
byte* const start = b;
@@ -436,6 +434,8 @@ void fil_space_crypt_t::write_page0(buf_block_t* block, mtr_t* mtr)
b += 4;
*b++ = byte(encryption);
ut_ad(b - start == 11 + MY_AES_BLOCK_SIZE);
/* We must log also any unchanged bytes, because recovery will
invoke fil_crypt_parse() based on this log record. */
mtr->memcpy(*block, offset + MAGIC_SZ, b - start);
}
@@ -1817,68 +1817,62 @@ fil_create_directory_for_tablename(
@param space_id tablespace identifier
@param first_page_no first page number in the file
@param path file path
@param new_path new file path for type=MLOG_FILE_RENAME2
@param flags tablespace flags for type=MLOG_FILE_CREATE2 */
inline void mtr_t::log_file_op(mlog_id_t type,
@param new_path new file path for type=FILE_RENAME */
inline void mtr_t::log_file_op(mfile_type_t type,
ulint space_id, ulint first_page_no,
const char *path, const char *new_path,
ulint flags)
const char *path, const char *new_path)
{
ulint len;
ut_ad(first_page_no == 0 || type == FILE_CREATE);
ut_ad((new_path != nullptr) == (type == FILE_RENAME));
ut_ad(!(byte(type) & 15));
ut_ad(first_page_no == 0 || type == MLOG_FILE_CREATE2);
ut_ad(fil_space_t::is_valid_flags(flags, space_id));
/* fil_name_parse() requires that there be at least one path
separator and that the file path end with ".ibd". */
ut_ad(strchr(path, OS_PATH_SEPARATOR) != NULL);
ut_ad(first_page_no /* trimming an undo tablespace */ ||
!strcmp(&path[strlen(path) - strlen(DOT_IBD)], DOT_IBD));
/* fil_name_parse() requires that there be at least one path
separator and that the file path end with ".ibd". */
ut_ad(strchr(path, OS_PATH_SEPARATOR) != NULL);
ut_ad(first_page_no /* trimming an undo tablespace */
|| !strcmp(&path[strlen(path) - strlen(DOT_IBD)], DOT_IBD));
set_modified();
if (m_log_mode != MTR_LOG_ALL)
return;
m_last= nullptr;
set_modified();
if (m_log_mode != MTR_LOG_ALL) {
return;
}
const size_t len= strlen(path);
const size_t new_len= type == FILE_RENAME ? 1 + strlen(new_path) : 0;
ut_ad(len > 0);
byte *const log_ptr= m_log.open(1 + 3/*length*/ + 5/*space_id*/ +
5/*first_page_no*/);
byte *end= log_ptr + 1;
end= mlog_encode_varint(end, space_id);
end= mlog_encode_varint(end, first_page_no);
if (UNIV_LIKELY(end + len + new_len >= &log_ptr[16]))
{
*log_ptr= type;
size_t total_len= len + new_len + end - log_ptr - 15;
if (total_len >= MIN_3BYTE)
total_len+= 2;
else if (total_len >= MIN_2BYTE)
total_len++;
end= mlog_encode_varint(log_ptr + 1, total_len);
end= mlog_encode_varint(end, space_id);
end= mlog_encode_varint(end, first_page_no);
}
else
{
*log_ptr= type | static_cast<byte>(end + len + new_len - &log_ptr[1]);
ut_ad(*log_ptr & 15);
}
byte* log_ptr = log_write_low(type, page_id_t(space_id, first_page_no),
m_log.open(11 + 4 + 2 + 1));
m_log.close(end);
if (type == MLOG_FILE_CREATE2) {
mach_write_to_4(log_ptr, flags);
log_ptr += 4;
}
/* Let us store the strings as null-terminated for easier readability
and handling */
len = strlen(path) + 1;
mach_write_to_2(log_ptr, len);
log_ptr += 2;
m_log.close(log_ptr);
m_log.push(reinterpret_cast<const byte*>(path), uint32_t(len));
switch (type) {
case MLOG_FILE_RENAME2:
ut_ad(strchr(new_path, OS_PATH_SEPARATOR) != NULL);
len = strlen(new_path) + 1;
log_ptr = m_log.open(2 + len);
ut_a(log_ptr);
mach_write_to_2(log_ptr, len);
log_ptr += 2;
m_log.close(log_ptr);
m_log.push(reinterpret_cast<const byte*>(new_path),
uint32_t(len));
break;
case MLOG_FILE_NAME:
case MLOG_FILE_DELETE:
case MLOG_FILE_CREATE2:
break;
default:
ut_ad(0);
}
if (type == FILE_RENAME)
{
ut_ad(strchr(new_path, OS_PATH_SEPARATOR));
m_log.push(reinterpret_cast<const byte*>(path), uint32_t(len + 1));
m_log.push(reinterpret_cast<const byte*>(new_path), uint32_t(new_len));
}
else
m_log.push(reinterpret_cast<const byte*>(path), uint32_t(len));
}
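log_file_op() above packs the record type and its length into the first byte when the total fits in 15 bytes, and otherwise leaves the low nibble zero and appends a variable-length integer. This simplified sketch illustrates the scheme; 0xb0 is an arbitrary example type value, and the varint here uses plain 7-bit continuation groups, not the exact MIN_2BYTE/MIN_3BYTE offset encoding of mlog_encode_varint():

```cpp
#include <cassert>
#include <cstddef>

// First byte: high nibble is the record type, low nibble the remaining
// length when it is 1..15; a zero nibble means a varint length follows.
static unsigned char first_byte(unsigned char type, size_t len)
{
  return (len && len < 16) ? (unsigned char)(type | len) : type;
}

// Encoded size of a simplified 7-bit-per-group varint (illustrative,
// not byte-identical to InnoDB's encoding).
static size_t varint_size(unsigned long v)
{
  size_t n= 1;
  while (v >= 0x80)
  {
    v>>= 7;
    n++;
  }
  return n;
}

static bool demo()
{
  return first_byte(0xb0, 5) == 0xb5    // short record: length in nibble
      && first_byte(0xb0, 20) == 0xb0   // long record: nibble left zero
      && varint_size(0x7f) == 1
      && varint_size(0x80) == 2;
}
```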
/** Write redo log for renaming a file.
@@ -1897,8 +1891,7 @@ fil_name_write_rename_low(
mtr_t* mtr)
{
ut_ad(!is_predefined_tablespace(space_id));
mtr->log_file_op(MLOG_FILE_RENAME2, space_id, first_page_no,
old_name, new_name);
mtr->log_file_op(FILE_RENAME, space_id, first_page_no, old_name, new_name);
}
/** Write redo log for renaming a file.
@@ -1918,7 +1911,7 @@ fil_name_write_rename(
log_write_up_to(mtr.commit_lsn(), true);
}
/** Write MLOG_FILE_NAME for a file.
/** Write FILE_MODIFY for a file.
@param[in] space_id tablespace id
@param[in] first_page_no first page number in the file
@param[in] name tablespace file name
@@ -1931,9 +1924,10 @@ fil_name_write(
const char* name,
mtr_t* mtr)
{
mtr->log_file_op(MLOG_FILE_NAME, space_id, first_page_no, name);
ut_ad(!is_predefined_tablespace(space_id));
mtr->log_file_op(FILE_MODIFY, space_id, first_page_no, name);
}
/** Write MLOG_FILE_NAME for a file.
/** Write FILE_MODIFY for a file.
@param[in] space tablespace
@param[in] first_page_no first page number in the file
@param[in] file tablespace file
@@ -1946,7 +1940,7 @@ fil_name_write(
const fil_node_t* file,
mtr_t* mtr)
{
mtr->log_file_op(MLOG_FILE_NAME, space->id, first_page_no, file->name);
fil_name_write(space->id, first_page_no, file->name, mtr);
}
/** Replay a file rename operation if possible.
@@ -2347,7 +2341,7 @@ fil_delete_tablespace(
mtr_t mtr;
mtr.start();
mtr.log_file_op(MLOG_FILE_DELETE, id, 0, path);
mtr.log_file_op(FILE_DELETE, id, 0, path);
mtr.commit();
/* Even if we got killed shortly after deleting the
tablespace file, the record must have already been
@@ -2429,13 +2423,12 @@ fil_space_t* fil_truncate_prepare(ulint space_id)
/** Write log about an undo tablespace truncate operation. */
void fil_truncate_log(fil_space_t* space, ulint size, mtr_t* mtr)
{
/* Write a MLOG_FILE_CREATE2 record with the new size, so that
recovery and backup will ignore any preceding redo log records
for writing pages that are after the new end of the tablespace. */
ut_ad(UT_LIST_GET_LEN(space->chain) == 1);
const fil_node_t* file = UT_LIST_GET_FIRST(space->chain);
mtr->log_file_op(MLOG_FILE_CREATE2, space->id, size, file->name,
nullptr, space->flags & ~FSP_FLAGS_MEM_MASK);
/* Write a record with the new size, so that recovery and
backup will ignore any preceding redo log records for writing
pages that are after the new end of the tablespace. */
ut_ad(UT_LIST_GET_LEN(space->chain) == 1);
const fil_node_t *file= UT_LIST_GET_FIRST(space->chain);
mtr->log_file_op(FILE_CREATE, space->id, size, file->name);
}
/*******************************************************************//**
@@ -2928,9 +2921,7 @@ err_exit:
false, true);
mtr_t mtr;
mtr.start();
mtr.log_file_op(MLOG_FILE_CREATE2, space_id, 0, node->name,
nullptr, space->flags & ~FSP_FLAGS_MEM_MASK);
fil_name_write(space, 0, node, &mtr);
mtr.log_file_op(FILE_CREATE, space_id, 0, node->name);
mtr.commit();
node->find_metadata(file);
@@ -4561,7 +4552,7 @@ fil_space_validate_for_mtr_commit(
}
#endif /* UNIV_DEBUG */
/** Write a MLOG_FILE_NAME record for a persistent tablespace.
/** Write a FILE_MODIFY record for a persistent tablespace.
@param[in] space tablespace
@param[in,out] mtr mini-transaction */
static
@@ -4591,22 +4582,20 @@ fil_names_dirty(
space->max_lsn = log_sys.lsn;
}
/** Write MLOG_FILE_NAME records when a non-predefined persistent
/** Write FILE_MODIFY records when a non-predefined persistent
tablespace was modified for the first time since the latest
fil_names_clear().
@param[in,out] space tablespace
@param[in,out] mtr mini-transaction */
void
fil_names_dirty_and_write(
fil_space_t* space,
mtr_t* mtr)
@param[in,out] space tablespace */
void fil_names_dirty_and_write(fil_space_t* space)
{
ut_ad(log_mutex_own());
ut_d(fil_space_validate_for_mtr_commit(space));
ut_ad(space->max_lsn == log_sys.lsn);
UT_LIST_ADD_LAST(fil_system.named_spaces, space);
fil_names_write(space, mtr);
mtr_t mtr;
mtr.start();
fil_names_write(space, &mtr);
DBUG_EXECUTE_IF("fil_names_write_bogus",
{
@@ -4614,14 +4603,16 @@ fil_names_dirty_and_write(
os_normalize_path(bogus_name);
fil_name_write(
SRV_SPACE_ID_UPPER_BOUND, 0,
bogus_name, mtr);
bogus_name, &mtr);
});
mtr.commit_files();
}
/** On a log checkpoint, reset fil_names_dirty_and_write() flags
and write out MLOG_FILE_NAME and MLOG_CHECKPOINT if needed.
and write out FILE_MODIFY and FILE_CHECKPOINT if needed.
@param[in] lsn checkpoint LSN
@param[in] do_write whether to always write MLOG_CHECKPOINT
@param[in] do_write whether to always write FILE_CHECKPOINT
@return whether anything was written to the redo log
@retval false if no flags were set and nothing written
@retval true if anything was written to the redo log */
@@ -4631,7 +4622,7 @@ fil_names_clear(
bool do_write)
{
mtr_t mtr;
ulint mtr_checkpoint_size = LOG_CHECKPOINT_FREE_PER_THREAD;
ulint mtr_checkpoint_size = RECV_SCAN_SIZE - 1;
DBUG_EXECUTE_IF(
"increase_mtr_checkpoint_size",
@@ -4650,6 +4641,14 @@ fil_names_clear(
for (fil_space_t* space = UT_LIST_GET_FIRST(fil_system.named_spaces);
space != NULL; ) {
if (mtr.get_log()->size()
+ (3 + 5 + 1) + strlen(space->chain.start->name)
>= mtr_checkpoint_size) {
/* Prevent log parse buffer overflow */
mtr.commit_files();
mtr.start();
}
fil_space_t* next = UT_LIST_GET_NEXT(named_spaces, space);
ut_ad(space->max_lsn > 0);
@@ -4671,19 +4670,6 @@ fil_names_clear(
fil_names_write(space, &mtr);
do_write = true;
const mtr_buf_t* mtr_log = mtr_get_log(&mtr);
/** If the mtr buffer size exceeds the size of
LOG_CHECKPOINT_FREE_PER_THREAD then commit the multi record
mini-transaction, start the new mini-transaction to
avoid the parsing buffer overflow error during recovery. */
if (mtr_log->size() > mtr_checkpoint_size) {
ut_ad(mtr_log->size() < (RECV_PARSING_BUF_SIZE / 2));
mtr.commit_files();
mtr.start();
}
space = next;
}
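The new guard in fil_names_clear() estimates each FILE_MODIFY record as (3 + 5 + 1) + strlen(name) bytes and commits the running mini-transaction before the accumulated log could overflow the recovery parse buffer. A sketch of that batching logic, with illustrative names:

```cpp
#include <cassert>
#include <cstddef>

// Worst-case size of one FILE_MODIFY record: 1 type byte, up to 3 bytes
// of encoded length, up to 5 bytes of encoded tablespace id, plus path.
static size_t file_modify_size(size_t name_len)
{
  return (3 + 5 + 1) + name_len;
}

// Number of mini-transactions needed so that no single one reaches the
// parse-buffer limit, mirroring the loop in fil_names_clear().
static size_t batches_needed(const size_t *name_lens, size_t n, size_t limit)
{
  size_t used= 0, batches= 1;
  for (size_t i= 0; i < n; i++)
  {
    size_t rec= file_modify_size(name_lens[i]);
    if (used + rec >= limit)   // commit and start a new mini-transaction
    {
      batches++;
      used= 0;
    }
    used+= rec;
  }
  return batches;
}

static bool demo()
{
  const size_t lens[3]= {100, 100, 100};    // each record is 109 bytes
  return batches_needed(lens, 3, 250) == 2  // third record starts batch 2
      && batches_needed(lens, 3, 1000) == 1;
}
```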
@@ -476,27 +476,29 @@ xdes_get_offset(
/** Initialize a file page whose prior contents should be ignored.
@param[in,out] block buffer pool block */
void fsp_apply_init_file_page(buf_block_t* block)
void fsp_apply_init_file_page(buf_block_t *block)
{
page_t* page = buf_block_get_frame(block);
memset_aligned<UNIV_PAGE_SIZE_MIN>(block->frame, 0, srv_page_size);
memset(page, 0, srv_page_size);
mach_write_to_4(page + FIL_PAGE_OFFSET, block->page.id.page_no());
mach_write_to_4(page + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID,
block->page.id.space());
if (page_zip_des_t* page_zip= buf_block_get_page_zip(block)) {
memset(page_zip->data, 0, page_zip_get_size(page_zip));
static_assert(FIL_PAGE_OFFSET % 4 == 0, "alignment");
memcpy_aligned<4>(page_zip->data + FIL_PAGE_OFFSET,
page + FIL_PAGE_OFFSET, 4);
static_assert(FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID % 4 == 2,
"not perfect alignment");
memcpy_aligned<2>(page_zip->data
+ FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID,
page + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID, 4);
}
mach_write_to_4(block->frame + FIL_PAGE_OFFSET, block->page.id.page_no());
if (log_sys.is_physical())
memset_aligned<8>(block->frame + FIL_PAGE_PREV, 0xff, 8);
mach_write_to_4(block->frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID,
block->page.id.space());
if (page_zip_des_t* page_zip= buf_block_get_page_zip(block))
{
memset_aligned<UNIV_ZIP_SIZE_MIN>(page_zip->data, 0,
page_zip_get_size(page_zip));
static_assert(FIL_PAGE_OFFSET == 4, "compatibility");
memcpy_aligned<4>(page_zip->data + FIL_PAGE_OFFSET,
block->frame + FIL_PAGE_OFFSET, 4);
if (log_sys.is_physical())
memset_aligned<8>(page_zip->data + FIL_PAGE_PREV, 0xff, 8);
static_assert(FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID % 4 == 2,
"not perfect alignment");
memcpy_aligned<2>(page_zip->data + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID,
block->frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID, 4);
}
}
#ifdef UNIV_DEBUG
@@ -577,8 +579,12 @@ void fsp_header_init(fil_space_t* space, ulint size, mtr_t* mtr)
+ block->frame, space->id);
ut_ad(0 == mach_read_from_4(FSP_HEADER_OFFSET + FSP_NOT_USED
+ block->frame));
mtr->write<4>(*block, FSP_HEADER_OFFSET + FSP_SIZE + block->frame,
size);
/* recv_sys_t::parse() expects to find a WRITE record that
covers all 4 bytes. Therefore, we must specify mtr_t::FORCED
in order to avoid optimizing away any unchanged most
significant bytes of FSP_SIZE. */
mtr->write<4,mtr_t::FORCED>(*block, FSP_HEADER_OFFSET + FSP_SIZE
+ block->frame, size);
ut_ad(0 == mach_read_from_4(FSP_HEADER_OFFSET + FSP_FREE_LIMIT
+ block->frame));
mtr->write<4,mtr_t::OPT>(*block, FSP_HEADER_OFFSET + FSP_SPACE_FLAGS
@@ -636,8 +642,12 @@ fsp_try_extend_data_file_with_pages(
success = fil_space_extend(space, page_no + 1);
/* The size may be less than we wanted if we ran out of disk space. */
mtr->write<4>(*header, FSP_HEADER_OFFSET + FSP_SIZE + header->frame,
space->size);
/* recv_sys_t::parse() expects to find a WRITE record that
covers all 4 bytes. Therefore, we must specify mtr_t::FORCED
in order to avoid optimizing away any unchanged most
significant bytes of FSP_SIZE. */
mtr->write<4,mtr_t::FORCED>(*header, FSP_HEADER_OFFSET + FSP_SIZE
+ header->frame, space->size);
space->size_in_header = space->size;
return(success);
@@ -770,8 +780,12 @@ fsp_try_extend_data_file(fil_space_t *space, buf_block_t *header, mtr_t *mtr)
space->size_in_header = ut_2pow_round(space->size, (1024 * 1024) / ps);
mtr->write<4>(*header, FSP_HEADER_OFFSET + FSP_SIZE + header->frame,
space->size_in_header);
/* recv_sys_t::parse() expects to find a WRITE record that
covers all 4 bytes. Therefore, we must specify mtr_t::FORCED
in order to avoid optimizing away any unchanged most
significant bytes of FSP_SIZE. */
mtr->write<4,mtr_t::FORCED>(*header, FSP_HEADER_OFFSET + FSP_SIZE
+ header->frame, space->size_in_header);
return(size_increase);
}
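The mtr_t::FORCED comment recurs three times above because an optimized write may trim unchanged leading or trailing bytes, while recv_sys_t::parse() expects the FSP_SIZE record to cover all 4 bytes. A sketch of the trimming that FORCED suppresses (names are illustrative):

```cpp
#include <cassert>
#include <cstddef>

// Length of the changed span after trimming equal leading and trailing
// bytes, as an optimized write would log it.
static size_t trimmed_len(const unsigned char *oldv,
                          const unsigned char *newv, size_t n)
{
  size_t begin= 0, end= n;
  while (begin < n && oldv[begin] == newv[begin])
    begin++;
  while (end > begin && oldv[end - 1] == newv[end - 1])
    end--;
  return end - begin;
}

static bool demo()
{
  // Growing a tablespace rarely changes the most significant size bytes.
  const unsigned char old_size[4]= {0, 0, 1, 0};
  const unsigned char new_size[4]= {0, 0, 2, 0};
  // An optimized write would log just 1 byte; FORCED keeps all 4 so the
  // recovery parser sees a WRITE covering the whole FSP_SIZE field.
  return trimmed_len(old_size, new_size, 4) == 1;
}
```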
@@ -1511,8 +1525,7 @@ static void fsp_free_seg_inode(
iblock, FSEG_INODE_PAGE_NODE, mtr);
}
mtr->write<8>(*iblock, inode + FSEG_ID, 0U);
mtr->write<4>(*iblock, inode + FSEG_MAGIC_N, 0xfa051ce3);
mtr->memset(iblock, page_offset(inode) + FSEG_ID, FSEG_INODE_SIZE, 0);
if (ULINT_UNDEFINED
== fsp_seg_inode_page_find_used(iblock->frame, physical_size)) {
@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1995, 2016, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2019, MariaDB Corporation.
Copyright (c) 2019, 2020, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
@@ -28,6 +28,61 @@ Created 11/28/1995 Heikki Tuuri
#include "buf0buf.h"
#include "page0page.h"
/** Write a file address.
@param[in] block file page
@param[in,out] faddr file address location
@param[in] page page number
@param[in] boffset byte offset
@param[in,out] mtr mini-transaction */
static void flst_write_addr(const buf_block_t& block, byte *faddr,
uint32_t page, uint16_t boffset, mtr_t* mtr)
{
ut_ad(mtr->memo_contains_page_flagged(faddr,
MTR_MEMO_PAGE_X_FIX
| MTR_MEMO_PAGE_SX_FIX));
ut_a(page == FIL_NULL || boffset >= FIL_PAGE_DATA);
ut_a(ut_align_offset(faddr, srv_page_size) >= FIL_PAGE_DATA);
static_assert(FIL_ADDR_PAGE == 0, "compatibility");
static_assert(FIL_ADDR_BYTE == 4, "compatibility");
static_assert(FIL_ADDR_SIZE == 6, "compatibility");
const bool same_page= mach_read_from_4(faddr + FIL_ADDR_PAGE) == page;
const bool same_offset= mach_read_from_2(faddr + FIL_ADDR_BYTE) == boffset;
if (same_page)
{
if (!same_offset)
mtr->write<2>(block, faddr + FIL_ADDR_BYTE, boffset);
return;
}
if (same_offset)
mtr->write<4>(block, faddr + FIL_ADDR_PAGE, page);
else
{
alignas(4) byte fil_addr[6];
mach_write_to_4(fil_addr + FIL_ADDR_PAGE, page);
mach_write_to_2(fil_addr + FIL_ADDR_BYTE, boffset);
mtr->memcpy(block, faddr + FIL_ADDR_PAGE, fil_addr, 6);
}
}
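flst_write_addr() above keys its logging on which halves of the 6-byte file address actually change. The decision table can be sketched as follows (illustrative names, not the InnoDB API):

```cpp
#include <cassert>
#include <cstdint>

enum addr_write { WRITE_NONE, WRITE_BYTE2, WRITE_PAGE4, WRITE_FULL6 };

// Choose the smallest redo record that brings a stored FIL_ADDR
// (4-byte page number + 2-byte offset) up to date.
static addr_write plan_write(uint32_t stored_page, uint16_t stored_off,
                             uint32_t page, uint16_t off)
{
  const bool same_page= stored_page == page;
  const bool same_off= stored_off == off;
  if (same_page)
    return same_off ? WRITE_NONE : WRITE_BYTE2;
  return same_off ? WRITE_PAGE4 : WRITE_FULL6;
}

static bool demo()
{
  return plan_write(7, 50, 7, 50) == WRITE_NONE
      && plan_write(7, 50, 7, 90) == WRITE_BYTE2   // offset only
      && plan_write(7, 50, 9, 50) == WRITE_PAGE4   // page number only
      && plan_write(7, 50, 9, 90) == WRITE_FULL6;  // both halves
}
```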
/** Write 2 null file addresses.
@param[in] b file page
@param[in,out] addr file address to be zeroed out
@param[in,out] mtr mini-transaction */
static void flst_zero_both(const buf_block_t& b, byte *addr, mtr_t *mtr)
{
if (mach_read_from_4(addr + FIL_ADDR_PAGE) != FIL_NULL)
mtr->memset(&b, ulint(addr - b.frame) + FIL_ADDR_PAGE, 4, 0xff);
mtr->write<2,mtr_t::OPT>(b, addr + FIL_ADDR_BYTE, 0U);
/* Initialize the other address by (MEMMOVE|0x80,offset,FIL_ADDR_SIZE,source)
which is 4 bytes, or less than FIL_ADDR_SIZE. */
memcpy(addr + FIL_ADDR_SIZE, addr, FIL_ADDR_SIZE);
const uint16_t boffset= page_offset(addr);
mtr->memmove(b, boffset + FIL_ADDR_SIZE, boffset, FIL_ADDR_SIZE);
}
/** Add a node to an empty list. */
static void flst_add_to_empty(buf_block_t *base, uint16_t boffset,
buf_block_t *add, uint16_t aoffset, mtr_t *mtr)
@@ -41,20 +96,22 @@ static void flst_add_to_empty(buf_block_t *base, uint16_t boffset,
ut_ad(mtr_memo_contains_page_flagged(mtr, add->frame,
MTR_MEMO_PAGE_X_FIX |
MTR_MEMO_PAGE_SX_FIX));
fil_addr_t addr= { add->page.id.page_no(), aoffset };
/* Update first and last fields of base node */
flst_write_addr(*base, base->frame + boffset + FLST_FIRST, addr, mtr);
/* MDEV-12353 TODO: use MEMMOVE record */
flst_write_addr(*base, base->frame + boffset + FLST_LAST, addr, mtr);
/* Set prev and next fields of node to add */
flst_zero_addr(*add, add->frame + aoffset + FLST_PREV, mtr);
flst_zero_addr(*add, add->frame + aoffset + FLST_NEXT, mtr);
/* Update len of base node */
ut_ad(!mach_read_from_4(base->frame + boffset + FLST_LEN));
mtr->write<1>(*base, base->frame + boffset + (FLST_LEN + 3), 1U);
/* Update first and last fields of base node */
flst_write_addr(*base, base->frame + boffset + FLST_FIRST,
add->page.id.page_no(), aoffset, mtr);
memcpy(base->frame + boffset + FLST_LAST, base->frame + boffset + FLST_FIRST,
FIL_ADDR_SIZE);
/* Initialize FLST_LAST by (MEMMOVE|0x80,offset,FIL_ADDR_SIZE,source)
which is 4 bytes, or less than FIL_ADDR_SIZE. */
mtr->memmove(*base, boffset + FLST_LAST, boffset + FLST_FIRST,
FIL_ADDR_SIZE);
/* Set prev and next fields of node to add */
static_assert(FLST_NEXT == FLST_PREV + FIL_ADDR_SIZE, "compatibility");
flst_zero_both(*add, add->frame + aoffset + FLST_PREV, mtr);
}
/** Insert a node after another one.
@@ -85,24 +142,27 @@ static void flst_insert_after(buf_block_t *base, uint16_t boffset,
MTR_MEMO_PAGE_X_FIX |
MTR_MEMO_PAGE_SX_FIX));
fil_addr_t cur_addr= { cur->page.id.page_no(), coffset };
fil_addr_t add_addr= { add->page.id.page_no(), aoffset };
fil_addr_t next_addr= flst_get_next_addr(cur->frame + coffset);
flst_write_addr(*add, add->frame + aoffset + FLST_PREV, cur_addr, mtr);
flst_write_addr(*add, add->frame + aoffset + FLST_NEXT, next_addr, mtr);
flst_write_addr(*add, add->frame + aoffset + FLST_PREV,
cur->page.id.page_no(), coffset, mtr);
flst_write_addr(*add, add->frame + aoffset + FLST_NEXT,
next_addr.page, next_addr.boffset, mtr);
if (fil_addr_is_null(next_addr))
flst_write_addr(*base, base->frame + boffset + FLST_LAST, add_addr, mtr);
flst_write_addr(*base, base->frame + boffset + FLST_LAST,
add->page.id.page_no(), aoffset, mtr);
else
{
buf_block_t *block;
flst_node_t *next= fut_get_ptr(add->page.id.space(), add->zip_size(),
next_addr, RW_SX_LATCH, mtr, &block);
flst_write_addr(*block, next + FLST_PREV, add_addr, mtr);
flst_write_addr(*block, next + FLST_PREV,
add->page.id.page_no(), aoffset, mtr);
}
flst_write_addr(*cur, cur->frame + coffset + FLST_NEXT, add_addr, mtr);
flst_write_addr(*cur, cur->frame + coffset + FLST_NEXT,
add->page.id.page_no(), aoffset, mtr);
byte *len= &base->frame[boffset + FLST_LEN];
mtr->write<4>(*base, len, mach_read_from_4(len) + 1);
@@ -136,29 +196,45 @@ static void flst_insert_before(buf_block_t *base, uint16_t boffset,
MTR_MEMO_PAGE_X_FIX |
MTR_MEMO_PAGE_SX_FIX));
fil_addr_t cur_addr= { cur->page.id.page_no(), coffset };
fil_addr_t add_addr= { add->page.id.page_no(), aoffset };
fil_addr_t prev_addr= flst_get_prev_addr(cur->frame + coffset);
flst_write_addr(*add, add->frame + aoffset + FLST_PREV, prev_addr, mtr);
flst_write_addr(*add, add->frame + aoffset + FLST_NEXT, cur_addr, mtr);
flst_write_addr(*add, add->frame + aoffset + FLST_PREV,
prev_addr.page, prev_addr.boffset, mtr);
flst_write_addr(*add, add->frame + aoffset + FLST_NEXT,
cur->page.id.page_no(), coffset, mtr);
if (fil_addr_is_null(prev_addr))
flst_write_addr(*base, base->frame + boffset + FLST_FIRST, add_addr, mtr);
flst_write_addr(*base, base->frame + boffset + FLST_FIRST,
add->page.id.page_no(), aoffset, mtr);
else
{
buf_block_t *block;
flst_node_t *prev= fut_get_ptr(add->page.id.space(), add->zip_size(),
prev_addr, RW_SX_LATCH, mtr, &block);
flst_write_addr(*block, prev + FLST_NEXT, add_addr, mtr);
flst_write_addr(*block, prev + FLST_NEXT,
add->page.id.page_no(), aoffset, mtr);
}
flst_write_addr(*cur, cur->frame + coffset + FLST_PREV, add_addr, mtr);
flst_write_addr(*cur, cur->frame + coffset + FLST_PREV,
add->page.id.page_no(), aoffset, mtr);
byte *len= &base->frame[boffset + FLST_LEN];
mtr->write<4>(*base, len, mach_read_from_4(len) + 1);
}
/** Initialize a list base node.
@param[in] block file page
@param[in,out] base base node
@param[in,out] mtr mini-transaction */
void flst_init(const buf_block_t& block, byte *base, mtr_t *mtr)
{
ut_ad(mtr->memo_contains_page_flagged(base, MTR_MEMO_PAGE_X_FIX |
MTR_MEMO_PAGE_SX_FIX));
mtr->write<4,mtr_t::OPT>(block, base + FLST_LEN, 0U);
static_assert(FLST_LAST == FLST_FIRST + FIL_ADDR_SIZE, "compatibility");
flst_zero_both(block, base + FLST_FIRST, mtr);
}
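flst_zero_both() and flst_add_to_empty() duplicate one 6-byte address into the adjacent slot with memcpy() and then log a MEMMOVE, because per the comments above a (MEMMOVE|0x80, offset, length, source) record takes about 4 bytes, less than re-logging FIL_ADDR_SIZE bytes of payload. A sketch of replaying such a record on a page buffer:

```cpp
#include <cassert>
#include <cstring>
#include <cstddef>

// Apply a MEMMOVE record: copy len bytes within the same page.
static void apply_memmove(unsigned char *page, size_t dst, size_t src,
                          size_t len)
{
  memmove(page + dst, page + src, len);
}

static bool demo()
{
  enum { FIL_ADDR_SIZE= 6 };
  unsigned char page[64]= {0};
  const unsigned char addr[FIL_ADDR_SIZE]= {0, 0, 0, 7, 0, 50};
  memcpy(page + 16, addr, FIL_ADDR_SIZE);          // first address written
  apply_memmove(page, 16 + FIL_ADDR_SIZE, 16,      // second one replayed
                FIL_ADDR_SIZE);                    // from a short MEMMOVE
  return 0 == memcmp(page + 16, page + 16 + FIL_ADDR_SIZE, FIL_ADDR_SIZE);
}
```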
/** Append a file list node to a list.
@param[in,out] base base node block
@param[in] boffset byte offset of the base node
@@ -251,7 +327,8 @@ void flst_remove(buf_block_t *base, uint16_t boffset,
const fil_addr_t next_addr= flst_get_next_addr(cur->frame + coffset);
if (fil_addr_is_null(prev_addr))
flst_write_addr(*base, base->frame + boffset + FLST_FIRST, next_addr, mtr);
flst_write_addr(*base, base->frame + boffset + FLST_FIRST,
next_addr.page, next_addr.boffset, mtr);
else
{
buf_block_t *block= cur;
@@ -259,11 +336,13 @@ void flst_remove(buf_block_t *base, uint16_t boffset,
? cur->frame + prev_addr.boffset
: fut_get_ptr(cur->page.id.space(), cur->zip_size(), prev_addr,
RW_SX_LATCH, mtr, &block);
flst_write_addr(*block, prev + FLST_NEXT, next_addr, mtr);
flst_write_addr(*block, prev + FLST_NEXT,
next_addr.page, next_addr.boffset, mtr);
}
if (fil_addr_is_null(next_addr))
flst_write_addr(*base, base->frame + boffset + FLST_LAST, prev_addr, mtr);
flst_write_addr(*base, base->frame + boffset + FLST_LAST,
prev_addr.page, prev_addr.boffset, mtr);
else
{
buf_block_t *block= cur;
@@ -271,7 +350,8 @@ void flst_remove(buf_block_t *base, uint16_t boffset,
? cur->frame + next_addr.boffset
: fut_get_ptr(cur->page.id.space(), cur->zip_size(), next_addr,
RW_SX_LATCH, mtr, &block);
flst_write_addr(*block, next + FLST_PREV, prev_addr, mtr);
flst_write_addr(*block, next + FLST_PREV,
prev_addr.page, prev_addr.boffset, mtr);
}
byte *len= &base->frame[boffset + FLST_LEN];

@@ -300,8 +300,9 @@ rtr_update_mbr_field(
memcpy(rec, node_ptr->fields[0].data, DATA_MBR_LEN);
page_zip_write_rec(block, rec, index, offsets, 0, mtr);
} else {
mtr->memcpy(block, page_offset(rec),
node_ptr->fields[0].data, DATA_MBR_LEN);
mtr->memcpy<mtr_t::OPT>(*block, rec,
node_ptr->fields[0].data,
DATA_MBR_LEN);
}
if (cursor2) {
@@ -895,7 +896,6 @@ rtr_page_split_and_insert(
rtr_split_node_t* cur_split_node;
rtr_split_node_t* end_split_node;
double* buf_pos;
ulint page_level;
node_seq_t current_ssn;
node_seq_t next_ssn;
buf_block_t* root_block;
@@ -926,7 +926,6 @@ func_start:
block = btr_cur_get_block(cursor);
page = buf_block_get_frame(block);
page_zip = buf_block_get_page_zip(block);
page_level = btr_page_get_level(page);
current_ssn = page_get_ssn_id(page);
ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
@@ -971,9 +970,19 @@ func_start:
/* Allocate a new page to the index */
hint_page_no = page_no + 1;
const uint16_t page_level = btr_page_get_level(page);
new_block = btr_page_alloc(cursor->index, hint_page_no, FSP_UP,
page_level, mtr, mtr);
if (!new_block) {
return NULL;
}
new_page_zip = buf_block_get_page_zip(new_block);
if (page_level && UNIV_LIKELY_NULL(new_page_zip)) {
/* ROW_FORMAT=COMPRESSED non-leaf pages are not expected
to contain FIL_NULL in FIL_PAGE_PREV at this stage. */
memset_aligned<4>(new_block->frame + FIL_PAGE_PREV, 0, 4);
}
btr_page_create(new_block, new_page_zip, cursor->index,
page_level, mtr);

@@ -18585,7 +18585,7 @@ checkpoint_now_set(THD*, st_mysql_sys_var*, void*, const void* save)
mysql_mutex_unlock(&LOCK_global_system_variables);
while (log_sys.last_checkpoint_lsn
+ SIZE_OF_MLOG_CHECKPOINT
+ SIZE_OF_FILE_CHECKPOINT
+ (log_sys.append_on_checkpoint != NULL
? log_sys.append_on_checkpoint->size() : 0)
< log_sys.lsn) {

@@ -10890,19 +10890,14 @@ ha_innobase::commit_inplace_alter_table(
ut_ad(trx->has_logged());
if (mtr.get_log()->size() > 0) {
ut_ad(*mtr.get_log()->front()->begin()
== MLOG_FILE_RENAME2);
/* Append the MLOG_FILE_RENAME2
ut_ad((*mtr.get_log()->front()->begin()
& 0xf0) == FILE_RENAME);
/* Append the FILE_RENAME
records on checkpoint, as a separate
mini-transaction before the one that
contains the MLOG_CHECKPOINT marker. */
static const byte multi
= MLOG_MULTI_REC_END;
contains the FILE_CHECKPOINT marker. */
mtr.get_log()->for_each_block(logs);
logs.m_buf.push(&multi, sizeof multi);
logs.m_buf.push(field_ref_zero, 1);
log_append_on_checkpoint(&logs.m_buf);
}

@@ -276,23 +276,15 @@ btr_page_get_index_id(
/*==================*/
const page_t* page) /*!< in: index page */
MY_ATTRIBUTE((warn_unused_result));
/********************************************************//**
Gets the node level field in an index page.
@param[in] page index page
@return level, leaf level == 0 */
UNIV_INLINE
ulint
btr_page_get_level(const page_t* page)
/** Read the B-tree or R-tree PAGE_LEVEL.
@param page B-tree or R-tree page
@return number of child page links to reach the leaf level
@retval 0 for leaf pages */
inline uint16_t btr_page_get_level(const page_t *page)
{
ulint level;
ut_ad(page);
level = mach_read_from_2(page + PAGE_HEADER + PAGE_LEVEL);
ut_ad(level <= BTR_MAX_NODE_LEVEL);
return(level);
uint16_t level = mach_read_from_2(page + PAGE_HEADER + PAGE_LEVEL);
ut_ad(level <= BTR_MAX_NODE_LEVEL);
return level;
} MY_ATTRIBUTE((warn_unused_result))
/** Read FIL_PAGE_NEXT.
@@ -403,6 +395,13 @@ btr_write_autoinc(dict_index_t* index, ib_uint64_t autoinc, bool reset = false)
@param[in,out] mtr mini-transaction */
void btr_set_instant(buf_block_t* root, const dict_index_t& index, mtr_t* mtr);
/** Reset the table to the canonical format on ROLLBACK of instant ALTER TABLE.
@param[in] index clustered index with instant ALTER TABLE
@param[in] all whether to reset FIL_PAGE_TYPE as well
@param[in,out] mtr mini-transaction */
ATTRIBUTE_COLD __attribute__((nonnull))
void btr_reset_instant(const dict_index_t &index, bool all, mtr_t *mtr);
/*************************************************************//**
Makes tree one level higher by splitting the root, and inserts
the tuple. It is assumed that mtr contains an x-latch on the tree.

@@ -49,16 +49,11 @@ inline
void btr_page_set_level(buf_block_t *block, ulint level, mtr_t *mtr)
{
ut_ad(level <= BTR_MAX_NODE_LEVEL);
byte *page_level= PAGE_HEADER + PAGE_LEVEL + block->frame;
if (UNIV_LIKELY_NULL(block->page.zip.data))
{
mach_write_to_2(page_level, level);
page_zip_write_header(block, page_level, 2, mtr);
}
else
mtr->write<2,mtr_t::OPT>(*block, page_level, level);
constexpr uint16_t field= PAGE_HEADER + PAGE_LEVEL;
byte *b= my_assume_aligned<2>(&block->frame[field]);
if (mtr->write<2,mtr_t::OPT>(*block, b, level) &&
UNIV_LIKELY_NULL(block->page.zip.data))
memcpy_aligned<2>(&block->page.zip.data[field], b, 2);
}
/** Set FIL_PAGE_NEXT.
@@ -67,14 +62,11 @@ void btr_page_set_level(buf_block_t *block, ulint level, mtr_t *mtr)
@param[in,out] mtr mini-transaction */
inline void btr_page_set_next(buf_block_t *block, ulint next, mtr_t *mtr)
{
byte *fil_page_next= block->frame + FIL_PAGE_NEXT;
if (UNIV_LIKELY_NULL(block->page.zip.data))
{
mach_write_to_4(fil_page_next, next);
page_zip_write_header(block, fil_page_next, 4, mtr);
}
else
mtr->write<4>(*block, fil_page_next, next);
constexpr uint16_t field= FIL_PAGE_NEXT;
byte *b= my_assume_aligned<4>(&block->frame[field]);
if (mtr->write<4,mtr_t::OPT>(*block, b, next) &&
UNIV_LIKELY_NULL(block->page.zip.data))
memcpy_aligned<4>(&block->page.zip.data[field], b, 4);
}
/** Set FIL_PAGE_PREV.
@@ -83,14 +75,11 @@ inline void btr_page_set_next(buf_block_t *block, ulint next, mtr_t *mtr)
@param[in,out] mtr mini-transaction */
inline void btr_page_set_prev(buf_block_t *block, ulint prev, mtr_t *mtr)
{
byte *fil_page_prev= block->frame + FIL_PAGE_PREV;
if (UNIV_LIKELY_NULL(block->page.zip.data))
{
mach_write_to_4(fil_page_prev, prev);
page_zip_write_header(block, fil_page_prev, 4, mtr);
}
else
mtr->write<4>(*block, fil_page_prev, prev);
constexpr uint16_t field= FIL_PAGE_PREV;
byte *b= my_assume_aligned<4>(&block->frame[field]);
if (mtr->write<4,mtr_t::OPT>(*block, b, prev) &&
UNIV_LIKELY_NULL(block->page.zip.data))
memcpy_aligned<4>(&block->page.zip.data[field], b, 4);
}
/**************************************************************//**

@@ -109,10 +109,9 @@ private:
template<format> inline void finishPage();
/** Insert a record in the page.
@tparam format the page format
@param[in] rec record
@param[in,out] rec record
@param[in] offsets record offsets */
template<format> inline void insertPage(const rec_t* rec,
offset_t* offsets);
template<format> inline void insertPage(rec_t* rec, offset_t* offsets);
public:
/** Mark end of insertion to the page. Scan all records to set page

@@ -382,6 +382,9 @@ public:
return(m_heap == NULL);
}
/** @return whether the buffer is empty */
bool empty() const { return !back()->m_used; }
private:
// Disable copying
mtr_buf_t(const mtr_buf_t&);

@@ -149,7 +149,7 @@ struct fil_space_t
rw_lock_t latch; /*!< latch protecting the file space storage
allocation */
UT_LIST_NODE_T(fil_space_t) named_spaces;
/*!< list of spaces for which MLOG_FILE_NAME
/*!< list of spaces for which FILE_MODIFY
records have been issued */
/** Checks whether this tablespace is in a list of unflushed tablespaces.
@return true if in a list */
@@ -641,13 +641,6 @@ extern const char* dot_ext[];
but in the MySQL Embedded Server Library and mysqlbackup it is not the default
directory, and we must set the base file path explicitly */
extern const char* fil_path_to_mysql_datadir;
/* Space address data type; this is intended to be used when
addresses accurate to a byte are stored in file pages. If the page part
of the address is FIL_NULL, the address is considered undefined. */
typedef byte fil_faddr_t; /*!< 'type' definition in C: an address
stored in a file page is a string of bytes */
#else
# include "univ.i"
#endif /* !UNIV_INNOCHECKSUM */
@@ -951,7 +944,7 @@ public:
/*!< list of all file spaces */
UT_LIST_BASE_NODE_T(fil_space_t) named_spaces;
/*!< list of all file spaces
for which a MLOG_FILE_NAME
for which a FILE_MODIFY
record has been written since
the latest redo log checkpoint.
Protected only by log_sys.mutex. */
@@ -1531,26 +1524,18 @@ void
fil_names_dirty(
fil_space_t* space);
/** Write MLOG_FILE_NAME records when a non-predefined persistent
/** Write FILE_MODIFY records when a non-predefined persistent
tablespace was modified for the first time since the latest
fil_names_clear().
@param[in,out] space tablespace
@param[in,out] mtr mini-transaction */
void
fil_names_dirty_and_write(
fil_space_t* space,
mtr_t* mtr);
@param[in,out] space tablespace */
void fil_names_dirty_and_write(fil_space_t* space);
/** Write MLOG_FILE_NAME records if a persistent tablespace was modified
/** Write FILE_MODIFY records if a persistent tablespace was modified
for the first time since the latest fil_names_clear().
@param[in,out] space tablespace
@param[in,out] mtr mini-transaction
@return whether any MLOG_FILE_NAME record was written */
inline MY_ATTRIBUTE((warn_unused_result))
bool
fil_names_write_if_was_clean(
fil_space_t* space,
mtr_t* mtr)
@return whether any FILE_MODIFY record was written */
inline bool fil_names_write_if_was_clean(fil_space_t* space)
{
ut_ad(log_mutex_own());
@@ -1563,7 +1548,7 @@ fil_names_write_if_was_clean(
space->max_lsn = log_sys.lsn;
if (was_clean) {
fil_names_dirty_and_write(space, mtr);
fil_names_dirty_and_write(space);
}
return(was_clean);
@@ -1588,9 +1573,9 @@ inline void fil_space_open_if_needed(fil_space_t* space)
}
/** On a log checkpoint, reset fil_names_dirty_and_write() flags
and write out MLOG_FILE_NAME and MLOG_CHECKPOINT if needed.
and write out FILE_MODIFY and FILE_CHECKPOINT if needed.
@param[in] lsn checkpoint LSN
@param[in] do_write whether to always write MLOG_CHECKPOINT
@param[in] do_write whether to always write FILE_CHECKPOINT
@return whether anything was written to the redo log
@retval false if no flags were set and nothing written
@retval true if anything was written to the redo log */

@@ -612,7 +612,7 @@ inline bool fsp_descr_page(const page_id_t page_id, ulint physical_size)
/** Initialize a file page whose prior contents should be ignored.
@param[in,out] block buffer pool block */
void fsp_apply_init_file_page(buf_block_t* block);
void fsp_apply_init_file_page(buf_block_t *block);
/** Initialize a file page.
@param[in] space tablespace

@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1995, 2014, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2018, 2019, MariaDB Corporation.
Copyright (c) 2018, 2020, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
@@ -78,47 +78,12 @@ inline void flst_init(const buf_block_t* block, uint16_t ofs, mtr_t* mtr)
mtr->memset(block, FLST_LAST + FIL_ADDR_PAGE + ofs, 4, 0xff);
}
/** Write a null file address.
@param[in] b file page
@param[in,out] addr file address to be zeroed out
@param[in,out] mtr mini-transaction */
inline void flst_zero_addr(const buf_block_t& b, fil_faddr_t *addr, mtr_t *mtr)
{
if (mach_read_from_4(addr + FIL_ADDR_PAGE) != FIL_NULL)
mtr->memset(&b, ulint(addr - b.frame) + FIL_ADDR_PAGE, 4, 0xff);
mtr->write<2,mtr_t::OPT>(b, addr + FIL_ADDR_BYTE, 0U);
}
/** Write a file address.
@param[in] block file page
@param[in,out] faddr file address location
@param[in] addr file address to be written out
@param[in,out] mtr mini-transaction */
inline void flst_write_addr(const buf_block_t& block, fil_faddr_t *faddr,
fil_addr_t addr, mtr_t* mtr)
{
ut_ad(mtr->memo_contains_page_flagged(faddr,
MTR_MEMO_PAGE_X_FIX
| MTR_MEMO_PAGE_SX_FIX));
ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
ut_a(ut_align_offset(faddr, srv_page_size) >= FIL_PAGE_DATA);
mtr->write<4,mtr_t::OPT>(block, faddr + FIL_ADDR_PAGE, addr.page);
mtr->write<2,mtr_t::OPT>(block, faddr + FIL_ADDR_BYTE, addr.boffset);
}
/** Initialize a list base node.
@param[in] block file page
@param[in,out] base base node
@param[in,out] mtr mini-transaction */
inline void flst_init(const buf_block_t& block, byte *base, mtr_t *mtr)
{
ut_ad(mtr->memo_contains_page_flagged(base, MTR_MEMO_PAGE_X_FIX |
MTR_MEMO_PAGE_SX_FIX));
mtr->write<4,mtr_t::OPT>(block, base + FLST_LEN, 0U);
flst_zero_addr(block, base + FLST_FIRST, mtr);
flst_zero_addr(block, base + FLST_LAST, mtr);
}
void flst_init(const buf_block_t& block, byte *base, mtr_t *mtr)
MY_ATTRIBUTE((nonnull));
/** Append a file list node to a list.
@param[in,out] base base node block
@@ -155,7 +120,7 @@ inline uint32_t flst_get_len(const flst_base_node_t *base)
}
/** @return a file address */
inline fil_addr_t flst_read_addr(const fil_faddr_t *faddr)
inline fil_addr_t flst_read_addr(const byte *faddr)
{
fil_addr_t addr= { mach_read_from_4(faddr + FIL_ADDR_PAGE),
mach_read_from_2(faddr + FIL_ADDR_BYTE) };

@@ -2,7 +2,7 @@
Copyright (c) 1995, 2017, Oracle and/or its affiliates. All rights reserved.
Copyright (c) 2009, Google Inc.
Copyright (c) 2017, 2019, MariaDB Corporation.
Copyright (c) 2017, 2020, MariaDB Corporation.
Portions of this file contain modifications contributed and copyrighted by
Google, Inc. Those modifications are gratefully acknowledged and are described
@@ -206,7 +206,7 @@ logs_empty_and_mark_files_at_shutdown(void);
@param[in] header 0 or LOG_CHECKPOINT_1 or LOG_CHECKPOINT_2 */
void log_header_read(ulint header);
/** Write checkpoint info to the log header and invoke log_mutex_exit().
@param[in] end_lsn start LSN of the MLOG_CHECKPOINT mini-transaction */
@param[in] end_lsn start LSN of the FILE_CHECKPOINT mini-transaction */
void log_write_checkpoint_info(lsn_t end_lsn);
/** Set extra data to be written to the redo log during checkpoint.
@@ -499,6 +499,10 @@ struct log_t{
static constexpr uint32_t FORMAT_ENCRYPTED = 1U << 31;
/** The MariaDB 10.4.0 log format (only with innodb_encrypt_log=ON) */
static constexpr uint32_t FORMAT_ENC_10_4 = FORMAT_10_4 | FORMAT_ENCRYPTED;
/** The MariaDB 10.5 physical redo log format */
static constexpr uint32_t FORMAT_10_5 = 0x50485953;
/** The MariaDB 10.5 physical format (only with innodb_encrypt_log=ON) */
static constexpr uint32_t FORMAT_ENC_10_5 = FORMAT_10_5 | FORMAT_ENCRYPTED;
MY_ALIGNED(CACHE_LINE_SIZE)
lsn_t lsn; /*!< log sequence number */
@@ -548,7 +552,7 @@ struct log_t{
struct files {
/** number of files */
ulint n_files;
/** format of the redo log: e.g., FORMAT_10_4 */
/** format of the redo log: e.g., FORMAT_10_5 */
uint32_t format;
/** redo log subformat: 0 with separately logged TRUNCATE,
2 with fully redo-logged TRUNCATE (1 in MariaDB 10.2) */
@@ -586,6 +590,9 @@ struct log_t{
/** @return whether the redo log is encrypted */
bool is_encrypted() const { return format & FORMAT_ENCRYPTED; }
/** @return whether the redo log is in the physical format */
bool is_physical() const
{ return (format & ~FORMAT_ENCRYPTED) == FORMAT_10_5; }
/** @return capacity in bytes */
lsn_t capacity() const{ return (file_size - LOG_FILE_HDR_SIZE) * n_files; }
/** Calculate the offset of a log sequence number.
@@ -718,6 +725,8 @@ public:
/** @return whether the redo log is encrypted */
bool is_encrypted() const { return(log.is_encrypted()); }
/** @return whether the redo log is in the physical format */
bool is_physical() const { return log.is_physical(); }
bool is_initialised() const { return m_initialised; }

@@ -48,8 +48,10 @@ recv_find_max_checkpoint(ulint* max_field)
MY_ATTRIBUTE((nonnull, warn_unused_result));
/** Apply any buffered redo log to a page that was just read from a data file.
@param[in,out] space tablespace
@param[in,out] bpage buffer pool page */
ATTRIBUTE_COLD void recv_recover_page(buf_page_t* bpage);
ATTRIBUTE_COLD void recv_recover_page(fil_space_t* space, buf_page_t* bpage)
MY_ATTRIBUTE((nonnull));
/** Start recovering from a redo log checkpoint.
@see recv_recovery_from_checkpoint_finish
@@ -102,24 +104,21 @@ to wait merging to file pages.
@param[in] checkpoint_lsn the LSN of the latest checkpoint
@param[in] store whether to store page operations
@param[in] apply whether to apply the records
@return whether MLOG_CHECKPOINT record was seen the first time,
or corruption was noticed */
bool recv_parse_log_recs(
lsn_t checkpoint_lsn,
store_t* store,
bool apply);
@return whether MLOG_CHECKPOINT or FILE_CHECKPOINT record
was seen the first time, or corruption was noticed */
bool recv_parse_log_recs(lsn_t checkpoint_lsn, store_t *store, bool apply);
/** Moves the parsing buffer data left to the buffer start */
void recv_sys_justify_left_parsing_buf();
/** Report an operation to create, delete, or rename a file during backup.
@param[in] space_id tablespace identifier
@param[in] flags tablespace flags (NULL if not create)
@param[in] create whether the file is being created
@param[in] name file name (not NUL-terminated)
@param[in] len length of name, in bytes
@param[in] new_name new file name (NULL if not rename)
@param[in] new_len length of new_name, in bytes (0 if NULL) */
extern void (*log_file_op)(ulint space_id, const byte* flags,
extern void (*log_file_op)(ulint space_id, bool create,
const byte* name, ulint len,
const byte* new_name, ulint new_len);
@@ -134,7 +133,10 @@ struct log_rec_t
/** next record */
log_rec_t *next;
/** mtr_t::commit_lsn() of the mini-transaction */
const lsn_t lsn;
lsn_t lsn;
protected:
void set_lsn(lsn_t end_lsn) { ut_ad(lsn <= end_lsn); lsn= end_lsn; }
};
struct recv_dblwr_t {
@@ -171,13 +173,17 @@ struct page_recv_t
/** log records are being applied on the page */
RECV_BEING_PROCESSED
} state= RECV_NOT_PROCESSED;
/** Latest written byte offset when applying the log records.
@see mtr_t::m_last_offset */
uint16_t last_offset= 1;
/** log records for a page */
class recs_t
{
/** The first log record */
log_rec_t *head= NULL;
log_rec_t *head= nullptr;
/** The last log record */
log_rec_t *tail= NULL;
log_rec_t *tail= nullptr;
friend struct page_recv_t;
public:
/** Append a redo log snippet for the page
@param recs log snippet */
@@ -190,12 +196,10 @@ struct page_recv_t
tail= recs;
}
/** Trim old log records for a page.
@param start_lsn oldest log sequence number to preserve
@return whether all the log for the page was trimmed */
inline bool trim(lsn_t start_lsn);
/** @return the last log snippet */
const log_rec_t* last() const { return tail; }
/** @return the last log snippet */
log_rec_t* last() { return tail; }
class iterator
{
@@ -213,6 +217,10 @@ struct page_recv_t
inline void clear();
} log;
/** Trim old log records for a page.
@param start_lsn oldest log sequence number to preserve
@return whether all the log for the page was trimmed */
inline bool trim(lsn_t start_lsn);
/** Ignore any earlier redo log records for this page. */
inline void will_not_read();
/** @return whether the log records for the page are being processed */
@@ -288,7 +296,7 @@ struct recv_sys_t{
(indexed by page_id_t::space() - srv_undo_space_id_start) */
struct trunc
{
/** log sequence number of MLOG_FILE_CREATE2, or 0 if none */
/** log sequence number of FILE_CREATE, or 0 if none */
lsn_t lsn;
/** truncated size of the tablespace, or 0 if not truncated */
unsigned pages;
@@ -342,8 +350,25 @@ public:
const byte* body, const byte* rec_end, lsn_t lsn,
lsn_t end_lsn);
/** Clear a fully processed set of stored redo log records. */
inline void clear();
/** Register a redo log snippet for a page.
@param page_id page identifier
@param start_lsn start LSN of the mini-transaction
@param lsn @see mtr_t::commit_lsn()
@param l redo log snippet @see log_t::FORMAT_10_5
@param len length of l, in bytes */
inline void add(const page_id_t page_id, lsn_t start_lsn, lsn_t lsn,
const byte *l, size_t len);
/** Parse and register one mini-transaction in log_t::FORMAT_10_5.
@param checkpoint_lsn the log sequence number of the latest checkpoint
@param store whether to store the records
@param apply whether to apply file-level log records
@return whether FILE_CHECKPOINT record was seen the first time,
or corruption was noticed */
inline bool parse(lsn_t checkpoint_lsn, store_t store, bool apply);
/** Clear a fully processed set of stored redo log records. */
inline void clear();
/** Determine whether redo log recovery progress should be reported.
@param[in] time the current time
@@ -362,19 +387,15 @@ public:
/** The alloc() memory alignment, in bytes */
static constexpr size_t ALIGNMENT= sizeof(size_t);
/** Get the memory block for storing recv_t and redo log data
@param[in] len length of the data to be stored
@param[in] store_recv whether to store recv_t object
/** Allocate memory for log_rec_t
@param len allocation size, in bytes
@return pointer to len bytes of memory (never NULL) */
inline byte *alloc(size_t len, bool store_recv= false);
inline void *alloc(size_t len, bool store_recv= false);
/** Free a redo log snippet.
@param data buffer returned by alloc() */
inline void free(const void *data);
/** @return the free length of the latest alloc() block, in bytes */
inline size_t get_free_len() const;
/** Remove records for a corrupted page.
This function should only be called when innodb_force_recovery is set.
@param page_id corrupted page identifier */

@@ -33,82 +33,478 @@ Created 12/7/1995 Heikki Tuuri
// Forward declaration
struct dict_index_t;
/** The minimum 2-byte integer (0b10xxxxxx xxxxxxxx) */
constexpr uint32_t MIN_2BYTE= 1 << 7;
/** The minimum 3-byte integer (0b110xxxxx xxxxxxxx xxxxxxxx) */
constexpr uint32_t MIN_3BYTE= MIN_2BYTE + (1 << 14);
/** The minimum 4-byte integer (0b1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx) */
constexpr uint32_t MIN_4BYTE= MIN_3BYTE + (1 << 21);
/** Minimum 5-byte integer (0b11110000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx) */
constexpr uint32_t MIN_5BYTE= MIN_4BYTE + (1 << 28);
/** Error from mlog_decode_varint() */
constexpr uint32_t MLOG_DECODE_ERROR= ~0U;
/** Decode the length of a variable-length encoded integer.
@param first first byte of the encoded integer
@return the length, in bytes */
inline uint8_t mlog_decode_varint_length(byte first)
{
uint8_t len= 1;
for (; first & 0x80; len++, first<<= 1);
return len;
}
/** Decode an integer in a redo log record.
@param log redo log record buffer
@return the decoded integer
@retval MLOG_DECODE_ERROR on error */
inline uint32_t mlog_decode_varint(const byte* log)
{
uint32_t i= *log;
if (i < MIN_2BYTE)
return i;
if (i < 0xc0)
return MIN_2BYTE + ((i & ~0x80) << 8 | log[1]);
if (i < 0xe0)
return MIN_3BYTE + ((i & ~0xc0) << 16 | uint32_t{log[1]} << 8 | log[2]);
if (i < 0xf0)
return MIN_4BYTE + ((i & ~0xe0) << 24 | uint32_t{log[1]} << 16 |
uint32_t{log[2]} << 8 | log[3]);
if (i == 0xf0)
{
i= uint32_t{log[1]} << 24 | uint32_t{log[2]} << 16 |
uint32_t{log[3]} << 8 | log[4];
if (i <= ~MIN_5BYTE)
return MIN_5BYTE + i;
}
return MLOG_DECODE_ERROR;
}
/** Encode an integer in a redo log record.
@param log redo log record buffer
@param i the integer to encode
@return end of the encoded integer */
inline byte *mlog_encode_varint(byte *log, size_t i)
{
if (i < MIN_2BYTE)
{
}
else if (i < MIN_3BYTE)
{
i-= MIN_2BYTE;
static_assert(MIN_3BYTE - MIN_2BYTE == 1 << 14, "compatibility");
*log++= 0x80 | static_cast<byte>(i >> 8);
}
else if (i < MIN_4BYTE)
{
i-= MIN_3BYTE;
static_assert(MIN_4BYTE - MIN_3BYTE == 1 << 21, "compatibility");
*log++= 0xc0 | static_cast<byte>(i >> 16);
goto last2;
}
else if (i < MIN_5BYTE)
{
i-= MIN_4BYTE;
static_assert(MIN_5BYTE - MIN_4BYTE == 1 << 28, "compatibility");
*log++= 0xe0 | static_cast<byte>(i >> 24);
goto last3;
}
else
{
ut_ad(i < MLOG_DECODE_ERROR);
i-= MIN_5BYTE;
*log++= 0xf0;
*log++= static_cast<byte>(i >> 24);
last3:
*log++= static_cast<byte>(i >> 16);
last2:
*log++= static_cast<byte>(i >> 8);
}
*log++= static_cast<byte>(i);
return log;
}
/** Determine the length of a log record.
@param log start of log record
@param end end of the log record buffer
@return the length of the record, in bytes
@retval 0 if the log extends past the end
@retval MLOG_DECODE_ERROR if the record is corrupted */
inline uint32_t mlog_decode_len(const byte *log, const byte *end)
{
ut_ad(log < end);
uint32_t i= *log;
if (!i)
return 0; /* end of mini-transaction */
if (~i & 15)
return (i & 15) + 1; /* 1..16 bytes */
if (UNIV_UNLIKELY(++log == end))
return 0; /* end of buffer */
i= *log;
if (UNIV_LIKELY(i < MIN_2BYTE)) /* 1 additional length byte: 16..143 bytes */
return 16 + i;
if (i < 0xc0) /* 2 additional length bytes: 144..16,527 bytes */
{
if (UNIV_UNLIKELY(log + 1 == end))
return 0; /* end of buffer */
return 16 + MIN_2BYTE + ((i & ~0xc0) << 8 | log[1]);
}
if (i < 0xe0) /* 3 additional length bytes: 16528..1065103 bytes */
{
if (UNIV_UNLIKELY(log + 2 == end))
return 0; /* end of buffer */
return 16 + MIN_3BYTE + ((i & ~0xe0) << 16 |
static_cast<uint32_t>(log[1]) << 8 | log[2]);
}
/* 1,065,103 bytes per log record ought to be enough for everyone */
return MLOG_DECODE_ERROR;
}
/** Write 1, 2, 4, or 8 bytes to a file page.
@param[in] block file page
@param[in,out] ptr pointer in file page
@param[in] val value to write
@tparam l number of bytes to write
@tparam w write request type
@tparam V type of val */
@tparam V type of val
@return whether any log was written */
template<unsigned l,mtr_t::write_type w,typename V>
inline void mtr_t::write(const buf_block_t &block, byte *ptr, V val)
inline bool mtr_t::write(const buf_block_t &block, void *ptr, V val)
{
ut_ad(ut_align_down(ptr, srv_page_size) == block.frame);
ut_ad(m_log_mode == MTR_LOG_NONE || m_log_mode == MTR_LOG_NO_REDO ||
!block.page.zip.data ||
/* written by fil_crypt_rotate_page() or innodb_make_page_dirty()? */
(w == FORCED && l == 1 && ptr == &block.frame[FIL_PAGE_SPACE_ID]) ||
mach_read_from_2(block.frame + FIL_PAGE_TYPE) <= FIL_PAGE_TYPE_ZBLOB2);
static_assert(l == 1 || l == 2 || l == 4 || l == 8, "wrong length");
byte buf[l];
switch (l) {
case 1:
if (w == OPT && mach_read_from_1(ptr) == val) return;
ut_ad(w != NORMAL || mach_read_from_1(ptr) != val);
ut_ad(val == static_cast<byte>(val));
*ptr= static_cast<byte>(val);
buf[0]= static_cast<byte>(val);
break;
case 2:
ut_ad(val == static_cast<uint16_t>(val));
if (w == OPT && mach_read_from_2(ptr) == val) return;
ut_ad(w != NORMAL || mach_read_from_2(ptr) != val);
mach_write_to_2(ptr, static_cast<uint16_t>(val));
mach_write_to_2(buf, static_cast<uint16_t>(val));
break;
case 4:
ut_ad(val == static_cast<uint32_t>(val));
if (w == OPT && mach_read_from_4(ptr) == val) return;
ut_ad(w != NORMAL || mach_read_from_4(ptr) != val);
mach_write_to_4(ptr, static_cast<uint32_t>(val));
mach_write_to_4(buf, static_cast<uint32_t>(val));
break;
case 8:
if (w == OPT && mach_read_from_8(ptr) == val) return;
ut_ad(w != NORMAL || mach_read_from_8(ptr) != val);
mach_write_to_8(ptr, val);
mach_write_to_8(buf, val);
break;
}
byte *p= static_cast<byte*>(ptr);
const byte *const end= p + l;
if (w != FORCED && m_log_mode == MTR_LOG_ALL)
{
const byte *b= buf;
while (*p++ == *b++)
{
if (p == end)
{
ut_ad(w == OPT);
return false;
}
}
p--;
}
::memcpy(ptr, buf, l);
memcpy_low(block.page, static_cast<uint16_t>
(ut_align_offset(p, srv_page_size)), p, end - p);
return true;
}
/** Log an initialization of a string of bytes.
@param[in] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write
@param[in] val the data byte to write */
inline void mtr_t::memset(const buf_block_t &b, ulint ofs, ulint len, byte val)
{
ut_ad(len);
set_modified();
if (m_log_mode != MTR_LOG_ALL)
return;
byte *log_ptr= m_log.open(11 + 2 + (l == 8 ? 9 : 5));
if (l == 8)
log_write(block, ptr, static_cast<mlog_id_t>(l), log_ptr, uint64_t{val});
else
log_write(block, ptr, static_cast<mlog_id_t>(l), log_ptr,
static_cast<uint32_t>(val));
static_assert(MIN_4BYTE > UNIV_PAGE_SIZE_MAX, "consistency");
size_t lenlen= (len < MIN_2BYTE ? 1 + 1 : len < MIN_3BYTE ? 2 + 1 : 3 + 1);
byte *l= log_write<MEMSET>(b.page.id, &b.page, lenlen, true, ofs);
l= mlog_encode_varint(l, len);
*l++= val;
m_log.close(l);
m_last_offset= static_cast<uint16_t>(ofs + len);
}
/** Write a byte string to a page.
/** Initialize a string of bytes.
@param[in,out] b buffer page
@param[in] ofs byte offset from block->frame
@param[in] len length of the data to write
@param[in] val the data byte to write */
inline void mtr_t::memset(const buf_block_t *b, ulint ofs, ulint len, byte val)
{
ut_ad(ofs <= ulint(srv_page_size));
ut_ad(ofs + len <= ulint(srv_page_size));
::memset(ofs + b->frame, val, len);
memset(*b, ofs, len, val);
}
/** Log an initialization of a repeating string of bytes.
@param[in] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write, in bytes
@param[in] str the string to write
@param[in] size size of str, in bytes */
inline void mtr_t::memset(const buf_block_t &b, ulint ofs, size_t len,
const void *str, size_t size)
{
ut_ad(size);
ut_ad(len > size); /* use mtr_t::memcpy() for shorter writes */
set_modified();
if (m_log_mode != MTR_LOG_ALL)
return;
static_assert(MIN_4BYTE > UNIV_PAGE_SIZE_MAX, "consistency");
size_t lenlen= (len < MIN_2BYTE ? 1 : len < MIN_3BYTE ? 2 : 3);
byte *l= log_write<MEMSET>(b.page.id, &b.page, lenlen + size, true, ofs);
l= mlog_encode_varint(l, len);
::memcpy(l, str, size);
l+= size;
m_log.close(l);
m_last_offset= static_cast<uint16_t>(ofs + len);
}
/** Initialize a repeating string of bytes.
@param[in,out] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write, in bytes
@param[in] str the string to write
@param[in] size size of str, in bytes */
inline void mtr_t::memset(const buf_block_t *b, ulint ofs, size_t len,
const void *str, size_t size)
{
ut_ad(ofs <= ulint(srv_page_size));
ut_ad(ofs + len <= ulint(srv_page_size));
ut_ad(len > size); /* use mtr_t::memcpy() for shorter writes */
size_t s= 0;
while (s + size < len)
{
::memcpy(ofs + s + b->frame, str, size);
s+= size;
}
::memcpy(ofs + s + b->frame, str, len - s);
memset(*b, ofs, len, str, size);
}
/** Log a write of a byte string to a page.
@param[in] b buffer page
@param[in] offset byte offset from b->frame
@param[in] str the data to write
@param[in] len length of the data to write */
inline
void mtr_t::memcpy(buf_block_t *b, ulint offset, const void *str, ulint len)
inline void mtr_t::memcpy(const buf_block_t &b, ulint offset, ulint len)
{
::memcpy(b->frame + offset, str, len);
memcpy(*b, offset, len);
ut_ad(len);
ut_ad(offset <= ulint(srv_page_size));
ut_ad(offset + len <= ulint(srv_page_size));
memcpy_low(b.page, uint16_t(offset), &b.frame[offset], len);
}
/** Log a write of a byte string to a page.
@param id page identifier
@param offset byte offset within page
@param data data to be written
@param len length of the data, in bytes */
inline void mtr_t::memcpy_low(const buf_page_t &bpage, uint16_t offset,
const void *data, size_t len)
{
ut_ad(len);
set_modified();
if (m_log_mode != MTR_LOG_ALL)
return;
if (len < mtr_buf_t::MAX_DATA_SIZE - (1 + 3 + 3 + 5 + 5))
{
byte *end= log_write<WRITE>(bpage.id, &bpage, len, true, offset);
::memcpy(end, data, len);
m_log.close(end + len);
}
else
{
m_log.close(log_write<WRITE>(bpage.id, &bpage, len, false, offset));
m_log.push(static_cast<const byte*>(data), static_cast<uint32_t>(len));
}
m_last_offset= static_cast<uint16_t>(offset + len);
}
/** Log that a string of bytes was copied from the same page.
@param[in] b buffer page
@param[in] d destination offset within the page
@param[in] s source offset within the page
@param[in] len length of the data to copy */
inline void mtr_t::memmove(const buf_block_t &b, ulint d, ulint s, ulint len)
{
ut_ad(d >= 8);
ut_ad(s >= 8);
ut_ad(len);
ut_ad(s <= ulint(srv_page_size));
ut_ad(s + len <= ulint(srv_page_size));
ut_ad(s != d);
ut_ad(d <= ulint(srv_page_size));
ut_ad(d + len <= ulint(srv_page_size));
set_modified();
if (m_log_mode != MTR_LOG_ALL)
return;
static_assert(MIN_4BYTE > UNIV_PAGE_SIZE_MAX, "consistency");
size_t lenlen= (len < MIN_2BYTE ? 1 : len < MIN_3BYTE ? 2 : 3);
/* The source offset is encoded relative to the destination offset,
with the sign in the least significant bit. */
if (s > d)
s= (s - d) << 1;
else
s= (d - s) << 1 | 1;
/* The source offset 0 is not possible. */
s-= 1 << 1;
size_t slen= (s < MIN_2BYTE ? 1 : s < MIN_3BYTE ? 2 : 3);
byte *l= log_write<MEMMOVE>(b.page.id, &b.page, lenlen + slen, true, d);
l= mlog_encode_varint(l, len);
l= mlog_encode_varint(l, s);
m_log.close(l);
m_last_offset= static_cast<uint16_t>(d + len);
}
/**
Write a log record.
@tparam type redo log record type
@param id persistent page identifier
@param bpage buffer pool page, or nullptr
@param len number of additional bytes to write
@param alloc whether to allocate the additional bytes
@param offset byte offset, or 0 if the record type does not allow one
@return end of mini-transaction log, minus len */
template<byte type>
inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
size_t len, bool alloc, size_t offset)
{
static_assert(!(type & 15) && type != RESERVED && type != OPTION &&
type <= FILE_CHECKPOINT, "invalid type");
ut_ad(type >= FILE_CREATE || is_named_space(id.space()));
ut_ad(!bpage || bpage->id == id);
constexpr bool have_len= type != INIT_PAGE && type != FREE_PAGE;
constexpr bool have_offset= type == WRITE || type == MEMSET ||
type == MEMMOVE;
static_assert(!have_offset || have_len, "consistency");
ut_ad(have_len || len == 0);
ut_ad(have_len || !alloc);
ut_ad(have_offset || offset == 0);
ut_ad(offset + len <= srv_page_size);
static_assert(MIN_4BYTE >= UNIV_PAGE_SIZE_MAX, "consistency");
size_t max_len;
if (!have_len)
max_len= 1 + 5 + 5;
else if (!have_offset)
max_len= m_last == bpage
? 1 + 3
: 1 + 3 + 5 + 5;
else if (m_last == bpage && m_last_offset <= offset)
{
/* Encode the offset relative from m_last_offset. */
offset-= m_last_offset;
max_len= 1 + 3 + 3;
}
else
max_len= 1 + 3 + 5 + 5 + 3;
byte *const log_ptr= m_log.open(alloc ? max_len + len : max_len);
byte *end= log_ptr + 1;
const byte same_page= max_len < 1 + 5 + 5 ? 0x80 : 0;
if (!same_page)
{
end= mlog_encode_varint(end, id.space());
end= mlog_encode_varint(end, id.page_no());
m_last= bpage;
}
if (have_offset)
{
byte* oend= mlog_encode_varint(end, offset);
if (oend + len > &log_ptr[16])
{
len+= oend - log_ptr - 15;
if (len >= MIN_3BYTE)
len+= 2;
else if (len >= MIN_2BYTE)
len++;
*log_ptr= type | same_page;
end= mlog_encode_varint(log_ptr + 1, len);
if (!same_page)
{
end= mlog_encode_varint(end, id.space());
end= mlog_encode_varint(end, id.page_no());
}
end= mlog_encode_varint(end, offset);
return end;
}
else
end= oend;
}
else if (len >= 3 && end + len > &log_ptr[16])
{
len+= end - log_ptr - 16;
if (len >= MIN_3BYTE)
len+= 2;
else if (len >= MIN_2BYTE)
len++;
end= log_ptr;
*end++= type | same_page;
end= mlog_encode_varint(end, len);
if (!same_page)
{
end= mlog_encode_varint(end, id.space());
end= mlog_encode_varint(end, id.page_no());
}
return end;
}
ut_ad(end + len >= &log_ptr[1] + !same_page);
ut_ad(end + len <= &log_ptr[16]);
ut_ad(end <= &log_ptr[max_len]);
*log_ptr= type | same_page | static_cast<byte>(end + len - log_ptr - 1);
ut_ad(*log_ptr & 15);
return end;
}
/** Write a byte string to a page.
@param[in,out] b ROW_FORMAT=COMPRESSED index page
@param[in] ofs byte offset from b->zip.data
@param[in] b buffer page
@param[in] dest destination within b.frame
@param[in] str the data to write
@param[in] len length of the data to write */
inline
void mtr_t::zmemcpy(buf_page_t *b, ulint offset, const void *str, ulint len)
@param[in] len length of the data to write
@tparam w write request type */
template<mtr_t::write_type w>
inline void mtr_t::memcpy(const buf_block_t &b, void *dest, const void *str,
ulint len)
{
::memcpy(b->zip.data + offset, str, len);
zmemcpy(*b, offset, len);
ut_ad(ut_align_down(dest, srv_page_size) == b.frame);
char *d= static_cast<char*>(dest);
const char *s= static_cast<const char*>(str);
if (w != FORCED && m_log_mode == MTR_LOG_ALL)
{
ut_ad(len);
const char *const end= d + len;
while (*d++ == *s++)
{
if (d == end)
{
ut_ad(w == OPT);
return;
}
}
s--;
d--;
len= static_cast<ulint>(end - d);
}
::memcpy(d, s, len);
memcpy(b, ut_align_offset(d, srv_page_size), len);
}
/** Initialize an entire page.
@@ -121,13 +517,37 @@ inline void mtr_t::init(buf_block_t *b)
return;
}
m_log.close(log_write_low(MLOG_INIT_FILE_PAGE2, b->page.id, m_log.open(11)));
m_log.close(log_write<INIT_PAGE>(b->page.id, &b->page));
m_last_offset= FIL_PAGE_TYPE;
b->page.init_on_flush= true;
}
/** Free a page.
@param id page identifier */
inline void mtr_t::free(const page_id_t id)
{
if (m_log_mode == MTR_LOG_ALL)
m_log.close(log_write<FREE_PAGE>(id, nullptr));
}
/** Partly initialize a B-tree page.
@param block B-tree page
@param comp false=ROW_FORMAT=REDUNDANT, true=COMPACT or DYNAMIC */
inline void mtr_t::page_create(const buf_block_t &block, bool comp)
{
set_modified();
if (m_log_mode != MTR_LOG_ALL)
return;
byte *l= log_write<INIT_INDEX_PAGE>(block.page.id, &block.page, 1, true);
*l++= comp;
m_log.close(l);
m_last_offset= FIL_PAGE_TYPE;
}
/********************************************************//**
Parses an initial log record written by mtr_t::log_write_low().
Parses an initial log record written by mlog_write_initial_log_record_low().
@return parsed record end, NULL if not a complete record */
ATTRIBUTE_COLD /* only used when crash-upgrading */
const byte*
mlog_parse_initial_log_record(
/*==========================*/


@@ -129,7 +129,7 @@ struct mtr_t {
/** Commit a mini-transaction that did not modify any pages,
but generated some redo log on a higher level, such as
MLOG_FILE_NAME records and an optional MLOG_CHECKPOINT marker.
FILE_MODIFY records and an optional FILE_CHECKPOINT marker.
The caller must invoke log_mutex_enter() and log_mutex_exit().
This is to be used at log_checkpoint().
@param checkpoint_lsn the log sequence number of a checkpoint, or 0 */
@@ -171,7 +171,7 @@ struct mtr_t {
inline mtr_log_t set_log_mode(mtr_log_t mode);
/** Copy the tablespaces associated with the mini-transaction
(needed for generating MLOG_FILE_NAME records)
(needed for generating FILE_MODIFY records)
@param[in] mtr mini-transaction that may modify
the same set of tablespaces as this one */
void set_spaces(const mtr_t& mtr)
@@ -184,7 +184,7 @@ struct mtr_t {
}
/** Set the tablespace associated with the mini-transaction
(needed for generating a MLOG_FILE_NAME record)
(needed for generating a FILE_MODIFY record)
@param[in] space_id user or system tablespace ID
@return the tablespace */
fil_space_t* set_named_space_id(ulint space_id)
@@ -203,7 +203,7 @@ struct mtr_t {
}
/** Set the tablespace associated with the mini-transaction
(needed for generating a MLOG_FILE_NAME record)
(needed for generating a FILE_MODIFY record)
@param[in] space user or system tablespace */
void set_named_space(fil_space_t* space)
{
@@ -216,12 +216,12 @@ struct mtr_t {
#ifdef UNIV_DEBUG
/** Check the tablespace associated with the mini-transaction
(needed for generating a MLOG_FILE_NAME record)
(needed for generating a FILE_MODIFY record)
@param[in] space tablespace
@return whether the mini-transaction is associated with the space */
bool is_named_space(ulint space) const;
/** Check the tablespace associated with the mini-transaction
(needed for generating a MLOG_FILE_NAME record)
(needed for generating a FILE_MODIFY record)
@param[in] space tablespace
@return whether the mini-transaction is associated with the space */
bool is_named_space(const fil_space_t* space) const;
@@ -407,136 +407,124 @@ struct mtr_t {
@param[in] val value to write
@tparam l number of bytes to write
@tparam w write request type
@tparam V type of val */
@tparam V type of val
@return whether any log was written */
template<unsigned l,write_type w= NORMAL,typename V>
inline void write(const buf_block_t &block, byte *ptr, V val)
inline bool write(const buf_block_t &block, void *ptr, V val)
MY_ATTRIBUTE((nonnull));
/** Log a write of a byte string to a page.
@param[in] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write */
void memcpy(const buf_block_t &b, ulint ofs, ulint len);
inline void memcpy(const buf_block_t &b, ulint ofs, ulint len);
/** Write a byte string to a page.
@param[in,out] b buffer page
@param[in] offset byte offset from b->frame
@param[in] dest destination within b.frame
@param[in] str the data to write
@param[in] len length of the data to write */
inline void memcpy(buf_block_t *b, ulint offset, const void *str, ulint len);
@param[in] len length of the data to write
@tparam w write request type */
template<write_type w= NORMAL>
inline void memcpy(const buf_block_t &b, void *dest, const void *str,
ulint len);
/** Write a byte string to a ROW_FORMAT=COMPRESSED page.
/** Log a write of a byte string to a ROW_FORMAT=COMPRESSED page.
@param[in] b ROW_FORMAT=COMPRESSED index page
@param[in] ofs byte offset from b.zip.data
@param[in] offset byte offset from b.zip.data
@param[in] len length of the data to write */
void zmemcpy(const buf_page_t &b, ulint offset, ulint len);
inline void zmemcpy(const buf_page_t &b, ulint offset, ulint len);
/** Write a byte string to a ROW_FORMAT=COMPRESSED page.
@param[in,out] b ROW_FORMAT=COMPRESSED index page
@param[in] ofs byte offset from b->zip.data
@param[in] dest destination within b.zip.data
@param[in] str the data to write
@param[in] len length of the data to write */
inline void zmemcpy(buf_page_t *b, ulint offset, const void *str, ulint len);
@param[in] len length of the data to write
@tparam w write request type */
template<write_type w= NORMAL>
inline void zmemcpy(const buf_page_t &b, void *dest, const void *str,
ulint len);
/** Log an initialization of a string of bytes.
@param[in] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write
@param[in] val the data byte to write */
inline void memset(const buf_block_t &b, ulint ofs, ulint len, byte val);
/** Initialize a string of bytes.
@param[in,out] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write
@param[in] val the data byte to write */
void memset(const buf_block_t* b, ulint ofs, ulint len, byte val);
inline void memset(const buf_block_t *b, ulint ofs, ulint len, byte val);
/** Log an initialization of a repeating string of bytes.
@param[in] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write, in bytes
@param[in] str the string to write
@param[in] size size of str, in bytes */
inline void memset(const buf_block_t &b, ulint ofs, size_t len,
const void *str, size_t size);
/** Initialize a repeating string of bytes.
@param[in,out] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write, in bytes
@param[in] str the string to write
@param[in] size size of str, in bytes */
inline void memset(const buf_block_t *b, ulint ofs, size_t len,
const void *str, size_t size);
/** Log that a string of bytes was copied from the same page.
@param[in] b buffer page
@param[in] d destination offset within the page
@param[in] s source offset within the page
@param[in] len length of the data to copy */
inline void memmove(const buf_block_t &b, ulint d, ulint s, ulint len);
/** Initialize an entire page.
@param[in,out] b buffer page */
void init(buf_block_t *b);
/** Free a page.
@param id page identifier */
void free(const page_id_t id) { log_page_write(id, MLOG_INIT_FREE_PAGE); }
inline void free(const page_id_t id);
/** Partly initialize a B-tree page.
@param id page identifier
@param block B-tree page
@param comp false=ROW_FORMAT=REDUNDANT, true=COMPACT or DYNAMIC */
void page_create(const page_id_t id, bool comp)
{
set_modified();
log_page_write(id, comp ? MLOG_COMP_PAGE_CREATE : MLOG_PAGE_CREATE);
}
inline void page_create(const buf_block_t &block, bool comp);
/** Write a log record about a file operation.
@param type file operation
@param space_id tablespace identifier
@param first_page_no first page number in the file
@param path file path
@param new_path new file path for type=MLOG_FILE_RENAME2
@param flags tablespace flags for type=MLOG_FILE_CREATE2 */
inline void log_file_op(mlog_id_t type, ulint space_id, ulint first_page_no,
const char *path,
const char *new_path= nullptr, ulint flags= 0);
@param new_path new file path for type=FILE_RENAME */
inline void log_file_op(mfile_type_t type, ulint space_id,
ulint first_page_no, const char *path,
const char *new_path= nullptr);
private:
/**
Write a complex page operation.
@param id page identifier
@param type type of operation */
void log_page_write(const page_id_t id, mlog_id_t type)
{
ut_ad(type == MLOG_INIT_FREE_PAGE || type == MLOG_COMP_PAGE_CREATE ||
type == MLOG_PAGE_CREATE);
if (m_log_mode == MTR_LOG_ALL)
m_log.close(log_write_low(type, id, m_log.open(11)));
}
/** Log a write of a byte string to a page.
@param b buffer page
@param offset byte offset within page
@param data data to be written
@param len length of the data, in bytes */
inline void memcpy_low(const buf_page_t &bpage, uint16_t offset,
const void *data, size_t len);
/**
Write a log record.
@param type redo log record type
@tparam type redo log record type
@param id persistent page identifier
@param l current end of mini-transaction log
@return new end of mini-transaction log */
inline byte *log_write_low(mlog_id_t type, const page_id_t id, byte *l)
{
ut_ad(type <= MLOG_BIGGEST_TYPE);
ut_ad(type == MLOG_FILE_NAME || type == MLOG_FILE_DELETE ||
type == MLOG_FILE_CREATE2 || type == MLOG_FILE_RENAME2 ||
is_named_space(id.space()));
*l++= type;
l+= mach_write_compressed(l, id.space());
l+= mach_write_compressed(l, id.page_no());
++m_n_log_recs;
return l;
}
/**
Write a log record for writing 1, 2, 4, or 8 bytes.
@param[in] type number of bytes to write
@param[in] block file page
@param[in] ptr pointer within block.frame
@param[in,out] l log record buffer
@return new end of mini-transaction log */
byte *log_write_low(mlog_id_t type, const buf_block_t &block,
const byte *ptr, byte *l);
/**
Write a log record for writing 1, 2, or 4 bytes.
@param[in] block file page
@param[in,out] ptr pointer in file page
@param[in] l number of bytes to write
@param[in,out] log_ptr log record buffer
@param[in] val value to write */
void log_write(const buf_block_t &block, byte *ptr, mlog_id_t l,
byte *log_ptr, uint32_t val)
MY_ATTRIBUTE((nonnull));
/**
Write a log record for writing 8 bytes.
@param[in] block file page
@param[in,out] ptr pointer in file page
@param[in] l number of bytes to write (8)
@param[in,out] log_ptr log record buffer
@param[in] val value to write */
void log_write(const buf_block_t &block, byte *ptr, mlog_id_t l,
byte *log_ptr, uint64_t val)
MY_ATTRIBUTE((nonnull));
@param bpage buffer pool page, or nullptr
@param len number of additional bytes to write
@param alloc whether to allocate the additional bytes
@param offset byte offset, or 0 if the record type does not allow one
@return end of mini-transaction log, minus len */
template<byte type>
inline byte *log_write(const page_id_t id, const buf_page_t *bpage,
size_t len= 0, bool alloc= false, size_t offset= 0);
/** Prepare to write the mini-transaction log to the redo log buffer.
@return number of bytes to write in finish_write() */
@@ -563,6 +551,11 @@ private:
bool m_commit= false;
#endif
/** The page of the most recent m_log record written, or NULL */
const buf_page_t* m_last;
/** The current byte offset in m_last, or 0 */
uint16_t m_last_offset;
/** specifies which operations should be logged; default MTR_LOG_ALL */
uint16_t m_log_mode:2;
@@ -576,8 +569,6 @@ private:
to suppress some read-ahead operations, @see ibuf_inside() */
uint16_t m_inside_ibuf:1;
/** number of m_log records */
uint16_t m_n_log_recs:11;
#ifdef UNIV_DEBUG
/** Persistent user tablespace associated with the
mini-transaction, or 0 (TRX_SYS_SPACE) if none yet */


@@ -204,7 +204,7 @@ mtr_t::set_log_mode(mtr_log_t mode)
case MTR_LOG_ALL:
/* MTR_LOG_NO_REDO can only be set before generating
any redo log records. */
ut_ad(mode != MTR_LOG_NO_REDO || m_n_log_recs == 0);
ut_ad(mode != MTR_LOG_NO_REDO || m_log.empty());
m_log_mode = mode;
return(old_mode);
}


@@ -29,6 +29,8 @@ Created 11/26/1995 Heikki Tuuri
#ifndef UNIV_INNOCHECKSUM
#include "sync0rw.h"
#else
#include "univ.i"
#endif /* UNIV_INNOCHECKSUM */
struct mtr_t;
@@ -47,6 +49,233 @@ enum mtr_log_t {
MTR_LOG_NO_REDO
};
/*
A mini-transaction is a stream of records that is always terminated by
a NUL byte. The first byte of a mini-transaction record is never NUL,
but NUL bytes can occur within mini-transaction records. The first
bytes of each record will explicitly encode the length of the record.
NUL bytes also act as padding in log blocks; that is, there can be
multiple successive NUL bytes between mini-transactions in a redo log
block.
The first byte of a record contains the record type, flags, and part
of the length. For longer records, one or more subsequent bytes encode
the remaining length; short records do not need them.
Bit 7 of the first byte of a redo log record is the same_page flag.
If same_page=1, the record is referring to the same page as the
previous record. Records that do not refer to data pages but to file
operations are identified by setting the same_page=1 in the very first
record(s) of the mini-transaction. A mini-transaction record that
carries same_page=0 must only be followed by page-oriented records.
Bits 6..4 of the first byte of a redo log record identify the redo log
type. The following record types refer to data pages:
FREE_PAGE (0): corresponds to MLOG_INIT_FREE_PAGE
INIT_PAGE (1): corresponds to MLOG_INIT_FILE_PAGE2
INIT_INDEX_PAGE (2): initialize a B-tree or R-tree page
WRITE (3): replaces MLOG_nBYTES, MLOG_WRITE_STRING, MLOG_ZIP_*
MEMSET (4): extends the 10.4 MLOG_MEMSET record
MEMMOVE (5): copy data within the page (avoids logging redundant data)
RESERVED (6): reserved for future use; a subtype code
(encoded immediately after the length) would be written
to reserve code space for further extensions
OPTION (7): optional record that may be ignored; a subtype code
(encoded immediately after the length) would distinguish actual
usage, such as:
* MDEV-18976 page checksum record
* binlog record
* SQL statement (at the start of statement)
Bits 3..0 indicate the redo log record length, excluding the first
byte, but including additional length bytes and any other bytes,
such as the optional tablespace identifier and page number.
Values 1..15 represent lengths of 1 to 15 bytes. The special value 0
indicates that 1 to 3 length bytes will follow, encoding the total
length (including those length bytes) minus 16.
Additional length bytes, present only when the length nibble is 0:
0xxxxxxx for 0 to 127 (total: 16 to 143 bytes)
10xxxxxx xxxxxxxx for 128 to 16511 (total: 144 to 16527)
110xxxxx xxxxxxxx xxxxxxxx for 16512 to 2113663 (total: 16528 to 2113679)
111xxxxx reserved (the record, and hence the file, is treated as corrupted)
If same_page=0, the tablespace identifier and page number will use
similar 1-to-5-byte variable-length encoding:
0xxxxxxx for 0 to 127
10xxxxxx xxxxxxxx for 128 to 16,511
110xxxxx xxxxxxxx xxxxxxxx for 16,512 to 2,113,663
1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx for 2,113,664 to 270,549,119
11110xxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx for 270,549,120 to 34,630,287,487
11111xxx reserved (corrupted record)
Note: Some 5-byte values are reserved, because the tablespace identifier
and page number can only be up to 4,294,967,295.
If same_page=1 is set in a record that follows a same_page=0 record
in a mini-transaction, the tablespace identifier and page number
fields will be omitted.
(For some file-oriented records (if same_page=1 for the first records
of a mini-transaction), we will write tablespace identifier using the
same 1-to-5-byte encoding. TBD: describe the exact format of
file-oriented records. With MDEV-14425, we could write file-level log
records to a separate file, not interleaved with page-level redo log
at all. We could reserve the file ib_logfile0 for checkpoint information
and for file-level redo log records.)
For FREE_PAGE or INIT_PAGE, if same_page=1, the record will be treated
as corrupted (or reserved for future extension). The type code must
be followed by 1+1 to 5+5 bytes (to encode the tablespace identifier
and page number). If the record length does not match the encoded
lengths of the tablespace identifier and page number, the record will
be treated as corrupted. This allows future expansion of the format.
If there is a FREE_PAGE record in a mini-transaction, it must be the
only record for that page in the mini-transaction. If there is an
INIT_PAGE record for a page in a mini-transaction, it must be the
first record for that page in the mini-transaction.
An INIT_INDEX_PAGE must be followed by 1+1 to 5+5 bytes for the page
identifier (unless the same_page flag is set) and a subtype code:
0 for ROW_FORMAT=REDUNDANT and 1 for ROW_FORMAT=COMPACT or DYNAMIC.
For WRITE, MEMSET, MEMMOVE, the next 1 to 3 bytes are the byte offset
on the page, relative to the previous offset. If same_page=0, the
"previous offset" is 0. If same_page=1, the "previous offset" is where
the previous operation ended (FIL_PAGE_TYPE for INIT_PAGE or INIT_INDEX_PAGE).
0xxxxxxx for 0 to 127
10xxxxxx xxxxxxxx for 128 to 16,511
110xxxxx xxxxxxxx xxxxxxxx for 16,512 to 2,113,663
111xxxxx reserved (corrupted record)
If the sum of the "previous offset" and the current offset exceeds the
page size, the record is treated as corrupted. Negative relative offsets
cannot be written. Instead, a record with same_page=0 can be written.
For MEMSET and MEMMOVE, the target length will follow, encoded in 1 to
3 bytes. If the length+offset exceeds the page size, the record will
be treated as corrupted.
For MEMMOVE, the source offset will follow, encoded in 1 to 3 bytes,
relative to the current offset. The offset 0 is not possible, and
the sign bit is the least significant bit. That is,
+x is encoded as (x-1)<<1 (+1,+2,+3,... is 0,2,4,...) and
-x is encoded as (x-1)<<1|1 (-1,-2,-3,... is 1,3,5,...).
The source offset must be within the page size, or else the record
will be treated as corrupted.
For MEMSET or WRITE, the byte(s) to be written will follow. For
MEMSET, it usually is a single byte, but it could also be a multi-byte
string, which would be copied over and over until the target length is
reached. The length of the remaining bytes is implied by the length
bytes at the start of the record.
For MEMMOVE, if any bytes follow, the record is treated as corrupted
(future expansion).
As mentioned at the start of this comment, the type byte 0 would be
special, marking the end of a mini-transaction. We could use the
corresponding value 0x80 (with same_page=1) for something special,
such as a future extension when more type codes are needed, or for
encoding rarely needed redo log records.
Examples:
INIT_PAGE could be logged as 0x12 0x34 0x56, meaning "type code 1
(INIT_PAGE), 2 bytes to follow", "tablespace ID 0x34", "page number 0x56".
The first byte must be between 0x12 and 0x1a, and the total length of
the record must match the lengths of the encoded tablespace ID and
page number.
WRITE could be logged as 0x36 0x40 0x57 0x60 0x12 0x34 0x56, meaning
"type code 3 (WRITE), 6 bytes to follow", "tablespace ID 0x40",
"page number 0x57", "byte offset 0x60", data 0x12,0x34,0x56.
A subsequent WRITE to the same page could be logged as 0xb5 0x7f 0x23
0x34 0x56 0x78, meaning "same page, type code 3 (WRITE), 5 bytes to
follow", "byte offset 0x7f" relative to the previous end 0x60+3,
bytes 0x23,0x34,0x56,0x78.
The end of the mini-transaction would be indicated by a NUL byte.
*/
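To make the 1-to-5-byte variable-length encoding tabulated above concrete, here is a standalone sketch. The function name `encode_varint` and this exact structure are illustrative only; the encoder in the patch is `mlog_encode_varint`, whose implementation is not shown here. Each longer form encodes the value minus the smallest value that needs that many bytes, which is why the ranges in the table are contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative encoder for the 1-to-5-byte format described above.
// Returns the number of bytes written to buf.
static size_t encode_varint(uint8_t *buf, uint64_t v)
{
  if (v < 128)                                      // 0xxxxxxx
  {
    buf[0]= uint8_t(v);
    return 1;
  }
  if ((v-= 128) < uint64_t{1} << 14)                // 10xxxxxx xxxxxxxx
  {
    buf[0]= uint8_t(0x80 | v >> 8);
    buf[1]= uint8_t(v);
    return 2;
  }
  if ((v-= uint64_t{1} << 14) < uint64_t{1} << 21)  // 110xxxxx + 2 bytes
  {
    buf[0]= uint8_t(0xc0 | v >> 16);
    buf[1]= uint8_t(v >> 8);
    buf[2]= uint8_t(v);
    return 3;
  }
  if ((v-= uint64_t{1} << 21) < uint64_t{1} << 28)  // 1110xxxx + 3 bytes
  {
    buf[0]= uint8_t(0xe0 | v >> 24);
    buf[1]= uint8_t(v >> 16);
    buf[2]= uint8_t(v >> 8);
    buf[3]= uint8_t(v);
    return 4;
  }
  v-= uint64_t{1} << 28;                            // 11110xxx + 4 bytes
  assert(v < uint64_t{1} << 35);                    // 35 payload bits
  buf[0]= uint8_t(0xf0 | v >> 32);
  buf[1]= uint8_t(v >> 24);
  buf[2]= uint8_t(v >> 16);
  buf[3]= uint8_t(v >> 8);
  buf[4]= uint8_t(v);
  return 5;
}
```

For example, 127 still fits in one byte, 128 becomes 0x80 0x00, and 16512 becomes 0xc0 0x00 0x00, matching the range boundaries in the table above.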
/** Redo log record types. These bit patterns (3 bits) will be written
to the redo log file, so the existing codes or their interpretation on
crash recovery must not be changed. */
enum mrec_type_t
{
/** Free a page. On recovery, it is unnecessary to read the page.
The next record for the page (if any) must be INIT_PAGE or
INIT_INDEX_PAGE. After this record has been written, the page may be
overwritten with zeros, or discarded or trimmed. */
FREE_PAGE = 0,
/** Zero-initialize a page. The current byte offset (for subsequent
records) will be reset to FIL_PAGE_TYPE. */
INIT_PAGE = 0x10,
/** Like INIT_PAGE, but initializing a B-tree or R-tree index page,
including writing the "infimum" and "supremum" pseudo-records. The
current byte offset will be reset to FIL_PAGE_TYPE. The
type code is followed by a subtype byte to specify the ROW_FORMAT:
0 for ROW_FORMAT=REDUNDANT, 1 for ROW_FORMAT=COMPACT or DYNAMIC. */
INIT_INDEX_PAGE = 0x20,
/** Write a string of bytes. Followed by the byte offset (unsigned,
relative to the current byte offset, encoded in 1 to 3 bytes) and
the bytes to write (at least one). The current byte offset will be
set after the last byte written. */
WRITE = 0x30,
/** Like WRITE, but before the bytes to write, the data_length-1
(encoded in 1 to 3 bytes) will be encoded, and it must be more
than the length of the following data bytes to write.
The data byte(s) will be repeatedly copied to the output until
the data_length is reached. */
MEMSET = 0x40,
/** Like MEMSET, but instead of the bytes to write, a source byte
offset (signed, nonzero, relative to the target byte offset, encoded
in 1 to 3 bytes, with the sign bit in the least significant bit)
will be written.
That is, +x is encoded as (x-1)<<1 (+1,+2,+3,... is 0,2,4,...)
and -x is encoded as (x-1)<<1|1 (-1,-2,-3,... is 1,3,5,...).
The source offset and data_length must be within the page size, or
else the record will be treated as corrupted. The data will be
copied from the page as it was at the start of the
mini-transaction. */
MEMMOVE = 0x50,
/** Reserved for future use. */
RESERVED = 0x60,
/** Optional record that may be ignored in crash recovery.
A subtype code will be encoded immediately after the length.
Possible subtypes would include an MDEV-18976 page checksum record,
a binlog record, or an SQL statement. */
OPTION = 0x70
};
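The MEMMOVE source-offset encoding above (sign in the least significant bit, offset 0 impossible) can be illustrated with a small pair of helpers. The names are hypothetical; the patch performs this computation inline in mtr_t::memmove rather than through dedicated functions.

```cpp
#include <cassert>

// Hypothetical helpers for the MEMMOVE relative source offset:
// +x is encoded as (x-1)<<1 and -x as (x-1)<<1|1, so the sign
// lives in bit 0 and the impossible offset 0 wastes no code point.
static unsigned encode_source_offset(int x)  // precondition: x != 0
{
  return x > 0 ? unsigned(x - 1) << 1 : (unsigned(-x - 1) << 1) | 1;
}

static int decode_source_offset(unsigned e)
{
  const int magnitude= int(e >> 1) + 1;
  return (e & 1) ? -magnitude : magnitude;
}
```

So +1,+2,+3,… map to 0,2,4,… and -1,-2,-3,… map to 1,3,5,…, exactly the sequences given in the comment above.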
/** Redo log record types for file-level operations. These bit
patterns will be written to redo log files, so the existing codes or
their interpretation on crash recovery must not be changed. */
enum mfile_type_t
{
/** Create a file. Followed by tablespace ID and the file name. */
FILE_CREATE = 0x80,
/** Delete a file. Followed by tablespace ID and the file name. */
FILE_DELETE = 0x90,
/** Rename a file. Followed by tablespace ID and the old file name,
NUL, and the new file name. */
FILE_RENAME = 0xa0,
/** Modify a file. Followed by tablespace ID and the file name. */
FILE_MODIFY = 0xb0,
#if 1 /* MDEV-14425 FIXME: Remove this! */
/** End-of-checkpoint marker. Followed by 2 dummy bytes of page identifier,
8 bytes of LSN, and padded with a NUL; @see SIZE_OF_FILE_CHECKPOINT. */
FILE_CHECKPOINT = 0xf0
#endif
};
#if 1 /* MDEV-14425 FIXME: Remove this! */
/** Size of a FILE_CHECKPOINT record, including the trailing byte to
terminate the mini-transaction. */
constexpr byte SIZE_OF_FILE_CHECKPOINT= 3/*type,page_id*/ + 8/*LSN*/ + 1;
#endif
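The arithmetic behind SIZE_OF_FILE_CHECKPOINT can be made concrete with a sketch of the 12-byte record. The layout here is inferred from the comments above; byte-level details such as the zeroed dummy page identifier and big-endian LSN are assumptions for illustration, not code copied from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: type byte (FILE_CHECKPOINT | payload length),
// two zero bytes for the dummy page identifier, an 8-byte LSN in
// big-endian order, and the NUL that ends the mini-transaction.
static void make_file_checkpoint(uint8_t (&rec)[12], uint64_t lsn)
{
  rec[0]= 0xf0 | 10;           // FILE_CHECKPOINT; 10 more record bytes
  rec[1]= rec[2]= 0;           // dummy tablespace ID and page number
  for (int i= 0; i < 8; i++)   // LSN, most significant byte first
    rec[3 + i]= uint8_t(lsn >> (56 - 8 * i));
  rec[11]= 0;                  // NUL terminates the mini-transaction
}
```

That accounts for the 3/*type,page_id*/ + 8/*LSN*/ + 1 bytes in the constant above.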
/** @name Log item types
The log items are declared 'byte' so that the compiler can warn if val
and type parameters are switched in a call to mlog_write. NOTE!
@@ -120,9 +349,6 @@ enum mlog_id_t {
/** initialize an ibuf bitmap page (used in MariaDB 10.2 and 10.3) */
MLOG_IBUF_BITMAP_INIT = 27,
/** MDEV-12353 WIP: write to a ROW_FORMAT=COMPRESSED page */
MLOG_ZIP_WRITE_STRING = 29,
/** write a string to a page */
MLOG_WRITE_STRING = 30,


@@ -178,7 +178,7 @@ the first record in the list of records. */
#define PAGE_DIR FIL_PAGE_DATA_END
/* We define a slot in the page directory as two bytes */
#define PAGE_DIR_SLOT_SIZE 2
constexpr uint16_t PAGE_DIR_SLOT_SIZE= 2;
/* The offset of the physically lower end of the directory, counted from
page end, when the page is empty */
@@ -840,15 +840,6 @@ page_rec_is_second_last(
const page_t* page) /*!< in: page */
MY_ATTRIBUTE((warn_unused_result));
/***************************************************************//**
Looks for the record which owns the given record.
@return the owner record */
UNIV_INLINE
rec_t*
page_rec_find_owner_rec(
/*====================*/
rec_t* rec); /*!< in: the physical record */
/************************************************************//**
Returns the maximum combined size of records which can be inserted on top
of record heap.
@@ -924,7 +915,7 @@ page_get_instant(const page_t* page);
@param[in,out] block buffer block
@param[in,out] mtr mini-transaction
@param[in] comp set unless ROW_FORMAT=REDUNDANT */
void page_create(buf_block_t* block, mtr_t* mtr, bool comp);
void page_create(buf_block_t *block, mtr_t *mtr, bool comp);
/**********************************************************//**
Create a compressed B-tree index page. */
void


@@ -89,17 +89,14 @@ page_set_ssn_id(
node_seq_t ssn_id, /*!< in: transaction id */
mtr_t* mtr) /*!< in/out: mini-transaction */
{
ut_ad(!mtr || mtr_memo_contains_flagged(mtr, block,
MTR_MEMO_PAGE_SX_FIX
| MTR_MEMO_PAGE_X_FIX));
byte* ssn = block->frame + FIL_RTREE_SPLIT_SEQ_NUM;
if (UNIV_LIKELY_NULL(page_zip)) {
mach_write_to_8(ssn, ssn_id);
page_zip_write_header(block, ssn, 8, mtr);
} else {
mtr->write<8,mtr_t::OPT>(*block, ssn, ssn_id);
}
ut_ad(mtr_memo_contains_flagged(mtr, block,
MTR_MEMO_PAGE_SX_FIX | MTR_MEMO_PAGE_X_FIX));
ut_ad(!page_zip || page_zip == &block->page.zip);
constexpr uint16_t field= FIL_RTREE_SPLIT_SEQ_NUM;
byte *b= my_assume_aligned<2>(&block->frame[field]);
if (mtr->write<8,mtr_t::OPT>(*block, b, ssn_id) &&
UNIV_LIKELY_NULL(page_zip))
memcpy_aligned<2>(&page_zip->data[field], b, 8);
}
#endif /* !UNIV_INNOCHECKSUM */
@@ -133,15 +130,11 @@ Reset PAGE_LAST_INSERT.
@param[in,out] mtr mini-transaction */
inline void page_header_reset_last_insert(buf_block_t *block, mtr_t *mtr)
{
byte *b= &block->frame[PAGE_HEADER + PAGE_LAST_INSERT];
if (UNIV_LIKELY_NULL(block->page.zip.data))
{
mach_write_to_2(b, 0);
page_zip_write_header(block, b, 2, mtr);
}
else
mtr->write<2,mtr_t::OPT>(*block, b, 0U);
constexpr uint16_t field= PAGE_HEADER + PAGE_LAST_INSERT;
byte *b= my_assume_aligned<2>(&block->frame[field]);
if (mtr->write<2,mtr_t::OPT>(*block, b, 0U) &&
UNIV_LIKELY_NULL(block->page.zip.data))
memcpy_aligned<2>(&block->page.zip.data[field], b, 2);
}
/***************************************************************//**
@@ -576,30 +569,6 @@ page_rec_get_prev(
return((rec_t*) page_rec_get_prev_const(rec));
}
/***************************************************************//**
Looks for the record which owns the given record.
@return the owner record */
UNIV_INLINE
rec_t*
page_rec_find_owner_rec(
/*====================*/
rec_t* rec) /*!< in: the physical record */
{
ut_ad(page_rec_check(rec));
if (page_rec_is_comp(rec)) {
while (rec_get_n_owned_new(rec) == 0) {
rec = page_rec_get_next(rec);
}
} else {
while (rec_get_n_owned_old(rec) == 0) {
rec = page_rec_get_next(rec);
}
}
return(rec);
}
/**********************************************************//**
Returns the base extra size of a physical record. This is the
size of the fixed header, independent of the record size.


@@ -230,19 +230,6 @@ page_zip_available(
the heap */
MY_ATTRIBUTE((warn_unused_result));
/**********************************************************************//**
Write data to the uncompressed header portion of a page. The data must
already have been written to the uncompressed page. */
UNIV_INLINE
void
page_zip_write_header(
/*==================*/
buf_block_t* block, /*!< in/out: compressed page */
const byte* str, /*!< in: address on the uncompressed page */
ulint length, /*!< in: length of the data */
mtr_t* mtr) /*!< in/out: mini-transaction */
MY_ATTRIBUTE((nonnull));
/** Write an entire record to the ROW_FORMAT=COMPRESSED page.
The data must already have been written to the uncompressed page.
@param[in,out] block ROW_FORMAT=COMPRESSED page
@@ -342,17 +329,14 @@ page_zip_parse_write_trx_id(
page_zip_des_t* page_zip)
MY_ATTRIBUTE((nonnull(1,2), warn_unused_result));
/**********************************************************************//**
Write the "deleted" flag of a record on a compressed page. The flag must
already have been written on the uncompressed page. */
void
page_zip_rec_set_deleted(
/*=====================*/
buf_block_t* block, /*!< in/out: ROW_FORMAT=COMPRESSED page */
const byte* rec, /*!< in: record on the uncompressed page */
ulint flag, /*!< in: the deleted flag (nonzero=TRUE) */
mtr_t* mtr) /*!< in,out: mini-transaction */
MY_ATTRIBUTE((nonnull));
/** Modify the delete-mark flag of a ROW_FORMAT=COMPRESSED record.
@param[in,out] block buffer block
@param[in,out] rec record on a physical index page
@param[in] flag the value of the delete-mark flag
@param[in,out] mtr mini-transaction */
void page_zip_rec_set_deleted(buf_block_t *block, rec_t *rec, bool flag,
mtr_t *mtr)
MY_ATTRIBUTE((nonnull));
/**********************************************************************//**
Insert a record to the dense page directory. */
@ -360,8 +344,8 @@ void
page_zip_dir_insert(
/*================*/
page_cur_t* cursor, /*!< in/out: page cursor */
const byte* free_rec,/*!< in: record from which rec was
allocated, or NULL */
uint16_t free_rec,/*!< in: record from which rec was
allocated, or 0 */
byte* rec, /*!< in: record to insert */
mtr_t* mtr) /*!< in/out: mini-transaction */
MY_ATTRIBUTE((nonnull(1,3,4)));


@ -25,10 +25,7 @@ Compressed page interface
Created June 2005 by Marko Makela
*******************************************************/
#include "page0zip.h"
#include "mtr0log.h"
#include "page0page.h"
#include "srv0srv.h"
/* The format of compressed pages is as follows.
@ -319,29 +316,6 @@ page_zip_des_init(
memset(page_zip, 0, sizeof *page_zip);
}
/**********************************************************************//**
Write data to the uncompressed header portion of a page. The data must
already have been written to the uncompressed page.
However, the data portion of the uncompressed page may differ from
the compressed page when a record is being inserted in
page_cur_insert_rec_zip(). */
UNIV_INLINE
void
page_zip_write_header(
/*==================*/
buf_block_t* block, /*!< in/out: compressed page */
const byte* str, /*!< in: address on the uncompressed page */
ulint length, /*!< in: length of the data */
mtr_t* mtr) /*!< in/out: mini-transaction */
{
ut_ad(page_align(str) == block->frame);
const uint16_t pos = page_offset(str);
ut_ad(pos < PAGE_DATA);
ut_ad(pos + length < PAGE_DATA);
mtr->zmemcpy(&block->page, pos, str, length);
}
/**********************************************************************//**
Reset the counters used for filling
INFORMATION_SCHEMA.innodb_cmp_per_index. */


@ -717,7 +717,7 @@ void log_t::files::create(ulint n_files)
ut_ad(log_sys.is_initialised());
this->n_files= n_files;
format= srv_encrypt_log ? log_t::FORMAT_ENC_10_4 : log_t::FORMAT_10_4;
format= srv_encrypt_log ? log_t::FORMAT_ENC_10_5 : log_t::FORMAT_10_5;
subformat= 2;
file_size= srv_log_file_size;
lsn= LOG_START_LSN;
@ -745,8 +745,8 @@ log_file_header_flush(
ut_ad(log_write_mutex_own());
ut_ad(!recv_no_log_write);
ut_a(nth_file < log_sys.log.n_files);
ut_ad(log_sys.log.format == log_t::FORMAT_10_4
|| log_sys.log.format == log_t::FORMAT_ENC_10_4);
ut_ad(log_sys.log.format == log_t::FORMAT_10_5
|| log_sys.log.format == log_t::FORMAT_ENC_10_5);
// man 2 open suggests aligning this buffer to 512 bytes for O_DIRECT
MY_ALIGNED(OS_FILE_LOG_BLOCK_SIZE)
@ -1273,14 +1273,14 @@ void log_header_read(ulint header)
}
/** Write checkpoint info to the log header and invoke log_mutex_exit().
@param[in] end_lsn start LSN of the MLOG_CHECKPOINT mini-transaction */
@param[in] end_lsn start LSN of the FILE_CHECKPOINT mini-transaction */
void log_write_checkpoint_info(lsn_t end_lsn)
{
ut_ad(log_mutex_own());
ut_ad(!srv_read_only_mode);
ut_ad(end_lsn == 0 || end_lsn >= log_sys.next_checkpoint_lsn);
ut_ad(end_lsn <= log_sys.lsn);
ut_ad(end_lsn + SIZE_OF_MLOG_CHECKPOINT <= log_sys.lsn
ut_ad(end_lsn + SIZE_OF_FILE_CHECKPOINT <= log_sys.lsn
|| srv_shutdown_state != SRV_SHUTDOWN_NONE);
DBUG_PRINT("ib_log", ("checkpoint " UINT64PF " at " LSN_PF
@ -1415,23 +1415,23 @@ bool log_checkpoint()
ut_ad(oldest_lsn >= log_sys.last_checkpoint_lsn);
if (oldest_lsn
> log_sys.last_checkpoint_lsn + SIZE_OF_MLOG_CHECKPOINT) {
> log_sys.last_checkpoint_lsn + SIZE_OF_FILE_CHECKPOINT) {
/* Some log has been written since the previous checkpoint. */
} else if (srv_shutdown_state != SRV_SHUTDOWN_NONE) {
/* MariaDB 10.3 startup expects the redo log file to be
/* MariaDB startup expects the redo log file to be
logically empty (not even containing a MLOG_CHECKPOINT record)
after a clean shutdown. Perform an extra checkpoint at
shutdown. */
} else {
/* Do nothing, because nothing was logged (other than
a MLOG_CHECKPOINT marker) since the previous checkpoint. */
a FILE_CHECKPOINT marker) since the previous checkpoint. */
log_mutex_exit();
return(true);
}
/* Repeat the MLOG_FILE_NAME records after the checkpoint, in
/* Repeat the FILE_MODIFY records after the checkpoint, in
case some log records between the checkpoint and log_sys.lsn
need them. Finally, write a MLOG_CHECKPOINT marker. Redo log
apply expects to see a MLOG_CHECKPOINT after the checkpoint,
need them. Finally, write a FILE_CHECKPOINT marker. Redo log
apply expects to see a FILE_CHECKPOINT after the checkpoint,
except on clean shutdown, where the log will be empty after
the checkpoint.
It is important that we write out the redo log before any
@ -1446,7 +1446,7 @@ bool log_checkpoint()
|| flush_lsn != end_lsn;
if (fil_names_clear(flush_lsn, do_write)) {
ut_ad(log_sys.lsn >= end_lsn + SIZE_OF_MLOG_CHECKPOINT);
ut_ad(log_sys.lsn >= end_lsn + SIZE_OF_FILE_CHECKPOINT);
flush_lsn = log_sys.lsn;
}
@ -1794,7 +1794,9 @@ wait_suspend_loop:
lsn = log_sys.lsn;
const bool lsn_changed = lsn != log_sys.last_checkpoint_lsn;
const bool lsn_changed = lsn != log_sys.last_checkpoint_lsn
&& lsn != log_sys.last_checkpoint_lsn
+ SIZE_OF_FILE_CHECKPOINT;
ut_ad(lsn >= log_sys.last_checkpoint_lsn);
log_mutex_exit();
@ -1956,7 +1958,7 @@ void
log_pad_current_log_block(void)
/*===========================*/
{
byte b = MLOG_DUMMY_RECORD;
byte b = 0;
ulint pad_length;
ulint i;
lsn_t lsn;

(File diff suppressed because it is too large.)


@ -26,15 +26,14 @@ Created 12/7/1995 Heikki Tuuri
#include "mtr0log.h"
#include "buf0buf.h"
#include "dict0dict.h"
#include "dict0mem.h"
#include "log0recv.h"
#include "page0page.h"
#include "buf0dblwr.h"
#include "dict0boot.h"
/********************************************************//**
Parses an initial log record written by mtr_t::write_low().
Parses an initial log record written by mlog_write_initial_log_record_low().
@return parsed record end, NULL if not a complete record */
ATTRIBUTE_COLD /* only used when crash-upgrading */
const byte*
mlog_parse_initial_log_record(
/*==========================*/
@ -196,112 +195,6 @@ mlog_parse_nbytes(
return const_cast<byte*>(ptr);
}
/**
Write a log record for writing 1, 2, 4, or 8 bytes.
@param[in] type number of bytes to write
@param[in] block file page
@param[in] ptr pointer within block.frame
@param[in,out] l log record buffer
@return new end of mini-transaction log */
byte *mtr_t::log_write_low(mlog_id_t type, const buf_block_t &block,
const byte *ptr, byte *l)
{
ut_ad(type == MLOG_1BYTE || type == MLOG_2BYTES || type == MLOG_4BYTES ||
type == MLOG_8BYTES);
ut_ad(block.page.state == BUF_BLOCK_FILE_PAGE);
ut_ad(ptr >= block.frame + FIL_PAGE_OFFSET);
ut_ad(ptr + unsigned(type) <=
&block.frame[srv_page_size - FIL_PAGE_DATA_END]);
l= log_write_low(type, block.page.id, l);
mach_write_to_2(l, page_offset(ptr));
return l + 2;
}
/**
Write a log record for writing 1, 2, or 4 bytes.
@param[in] block file page
@param[in,out] ptr pointer in file page
@param[in] l number of bytes to write
@param[in,out] log_ptr log record buffer
@param[in] val value to write */
void mtr_t::log_write(const buf_block_t &block, byte *ptr, mlog_id_t l,
byte *log_ptr, uint32_t val)
{
ut_ad(l == MLOG_1BYTE || l == MLOG_2BYTES || l == MLOG_4BYTES);
log_ptr= log_write_low(l, block, ptr, log_ptr);
log_ptr+= mach_write_compressed(log_ptr, val);
m_log.close(log_ptr);
}
/**
Write a log record for writing 8 bytes.
@param[in] block file page
@param[in,out] ptr pointer in file page
@param[in] l number of bytes to write
@param[in,out] log_ptr log record buffer
@param[in] val value to write */
void mtr_t::log_write(const buf_block_t &block, byte *ptr, mlog_id_t l,
byte *log_ptr, uint64_t val)
{
ut_ad(l == MLOG_8BYTES);
log_ptr= log_write_low(l, block, ptr, log_ptr);
log_ptr+= mach_u64_write_compressed(log_ptr, val);
m_log.close(log_ptr);
}
/** Log a write of a byte string to a page.
@param[in] b buffer page
@param[in] ofs byte offset from b->frame
@param[in] len length of the data to write */
void mtr_t::memcpy(const buf_block_t &b, ulint ofs, ulint len)
{
ut_ad(len);
ut_ad(ofs <= ulint(srv_page_size));
ut_ad(ofs + len <= ulint(srv_page_size));
set_modified();
if (m_log_mode != MTR_LOG_ALL)
{
ut_ad(m_log_mode == MTR_LOG_NONE || m_log_mode == MTR_LOG_NO_REDO);
return;
}
ut_ad(ofs + len < PAGE_DATA || !b.page.zip.data ||
mach_read_from_2(b.frame + FIL_PAGE_TYPE) <= FIL_PAGE_TYPE_ZBLOB2);
byte *l= log_write_low(MLOG_WRITE_STRING, b.page.id, m_log.open(11 + 2 + 2));
mach_write_to_2(l, ofs);
mach_write_to_2(l + 2, len);
m_log.close(l + 4);
m_log.push(b.frame + ofs, static_cast<uint32_t>(len));
}
/** Write a byte string to a ROW_FORMAT=COMPRESSED page.
@param[in] b ROW_FORMAT=COMPRESSED index page
@param[in] ofs byte offset from b.zip.data
@param[in] len length of the data to write */
void mtr_t::zmemcpy(const buf_page_t &b, ulint offset, ulint len)
{
ut_ad(page_zip_simple_validate(&b.zip));
ut_ad(len);
ut_ad(offset + len <= page_zip_get_size(&b.zip));
ut_ad(mach_read_from_2(b.zip.data + FIL_PAGE_TYPE) == FIL_PAGE_INDEX ||
mach_read_from_2(b.zip.data + FIL_PAGE_TYPE) == FIL_PAGE_RTREE);
set_modified();
if (m_log_mode != MTR_LOG_ALL)
{
ut_ad(m_log_mode == MTR_LOG_NONE || m_log_mode == MTR_LOG_NO_REDO);
return;
}
byte *l= log_write_low(MLOG_ZIP_WRITE_STRING, b.id, m_log.open(11 + 2 + 2));
mach_write_to_2(l, offset);
mach_write_to_2(l + 2, len);
m_log.close(l + 4);
m_log.push(b.zip.data + offset, static_cast<uint32_t>(len));
}
/********************************************************//**
Parses a log record written by mtr_t::memcpy().
@return parsed record end, NULL if not a complete record */
@ -353,34 +246,6 @@ mlog_parse_string(
return(ptr + len);
}
/** Initialize a string of bytes.
@param[in,out] b buffer page
@param[in] ofs byte offset from block->frame
@param[in] len length of the data to write
@param[in] val the data byte to write */
void mtr_t::memset(const buf_block_t* b, ulint ofs, ulint len, byte val)
{
ut_ad(len);
ut_ad(ofs <= ulint(srv_page_size));
ut_ad(ofs + len <= ulint(srv_page_size));
ut_ad(ofs + len < PAGE_DATA || !b->page.zip.data ||
mach_read_from_2(b->frame + FIL_PAGE_TYPE) <= FIL_PAGE_TYPE_ZBLOB2);
::memset(ofs + b->frame, val, len);
set_modified();
if (m_log_mode != MTR_LOG_ALL)
{
ut_ad(m_log_mode == MTR_LOG_NONE || m_log_mode == MTR_LOG_NO_REDO);
return;
}
byte *l= log_write_low(MLOG_MEMSET, b->page.id, m_log.open(11 + 2 + 2 + 1));
mach_write_to_2(l, ofs);
mach_write_to_2(l + 2, len);
l[4]= val;
m_log.close(l + 5);
}
/********************************************************//**
Parses a log record written by mlog_open_and_write_index.
@return parsed record end, NULL if not a complete record */


@ -378,13 +378,15 @@ void mtr_t::start()
ut_d(m_start= true);
ut_d(m_commit= false);
m_last= nullptr;
m_last_offset= 0;
new(&m_memo) mtr_buf_t();
new(&m_log) mtr_buf_t();
m_made_dirty= false;
m_inside_ibuf= false;
m_modifications= false;
m_n_log_recs= 0;
m_log_mode= MTR_LOG_ALL;
ut_d(m_user_space_id= TRX_SYS_SPACE);
m_user_space= nullptr;
@ -411,7 +413,7 @@ void mtr_t::commit()
ut_ad(!m_modifications || !recv_no_log_write);
ut_ad(!m_modifications || m_log_mode != MTR_LOG_NONE);
if (m_modifications && (m_n_log_recs || m_log_mode == MTR_LOG_NO_REDO))
if (m_modifications && (m_log_mode == MTR_LOG_NO_REDO || !m_log.empty()))
{
ut_ad(!srv_read_only_mode || m_log_mode == MTR_LOG_NO_REDO);
@ -445,7 +447,7 @@ void mtr_t::commit()
/** Commit a mini-transaction that did not modify any pages,
but generated some redo log on a higher level, such as
MLOG_FILE_NAME records and an optional MLOG_CHECKPOINT marker.
FILE_MODIFY records and an optional FILE_CHECKPOINT marker.
The caller must invoke log_mutex_enter() and log_mutex_exit().
This is to be used at log_checkpoint().
@param[in] checkpoint_lsn log checkpoint LSN, or 0 */
@ -458,23 +460,16 @@ void mtr_t::commit_files(lsn_t checkpoint_lsn)
ut_ad(!m_made_dirty);
ut_ad(m_memo.size() == 0);
ut_ad(!srv_read_only_mode);
ut_ad(checkpoint_lsn || m_n_log_recs > 1);
switch (m_n_log_recs) {
case 0:
break;
case 1:
*m_log.front()->begin() |= MLOG_SINGLE_REC_FLAG;
break;
default:
*m_log.push<byte*>(1) = MLOG_MULTI_REC_END;
}
if (checkpoint_lsn) {
byte* ptr = m_log.push<byte*>(SIZE_OF_MLOG_CHECKPOINT);
compile_time_assert(SIZE_OF_MLOG_CHECKPOINT == 1 + 8);
*ptr = MLOG_CHECKPOINT;
mach_write_to_8(ptr + 1, checkpoint_lsn);
byte* ptr = m_log.push<byte*>(SIZE_OF_FILE_CHECKPOINT);
compile_time_assert(SIZE_OF_FILE_CHECKPOINT == 3 + 8 + 1);
*ptr = FILE_CHECKPOINT | (SIZE_OF_FILE_CHECKPOINT - 2);
::memset(ptr + 1, 0, 2);
mach_write_to_8(ptr + 3, checkpoint_lsn);
ptr[3 + 8] = 0;
} else {
*m_log.push<byte*>(1) = 0;
}
finish_write(m_log.size());
@ -482,14 +477,14 @@ void mtr_t::commit_files(lsn_t checkpoint_lsn)
if (checkpoint_lsn) {
DBUG_PRINT("ib_log",
("MLOG_CHECKPOINT(" LSN_PF ") written at " LSN_PF,
("FILE_CHECKPOINT(" LSN_PF ") written at " LSN_PF,
checkpoint_lsn, log_sys.lsn));
}
}
#ifdef UNIV_DEBUG
/** Check if a tablespace is associated with the mini-transaction
(needed for generating a MLOG_FILE_NAME record)
(needed for generating a FILE_MODIFY record)
@param[in] space tablespace
@return whether the mini-transaction is associated with the space */
bool
@ -510,7 +505,7 @@ mtr_t::is_named_space(ulint space) const
return(false);
}
/** Check if a tablespace is associated with the mini-transaction
(needed for generating a MLOG_FILE_NAME record)
(needed for generating a FILE_MODIFY record)
@param[in] space tablespace
@return whether the mini-transaction is associated with the space */
bool mtr_t::is_named_space(const fil_space_t* space) const
@ -618,53 +613,32 @@ inline ulint mtr_t::prepare_write()
}
ulint len = m_log.size();
ulint n_recs = m_n_log_recs;
ut_ad(len > 0);
ut_ad(n_recs > 0);
if (len > srv_log_buffer_size / 2) {
log_buffer_extend(ulong((len + 1) * 2));
}
ut_ad(m_n_log_recs == n_recs);
fil_space_t* space = m_user_space;
if (space != NULL && is_predefined_tablespace(space->id)) {
/* Omit MLOG_FILE_NAME for predefined tablespaces. */
/* Omit FILE_MODIFY for predefined tablespaces. */
space = NULL;
}
log_mutex_enter();
if (fil_names_write_if_was_clean(space, this)) {
/* This mini-transaction was the first one to modify
this tablespace since the latest checkpoint, so
some MLOG_FILE_NAME records were appended to m_log. */
ut_ad(m_n_log_recs > n_recs);
*m_log.push<byte*>(1) = MLOG_MULTI_REC_END;
if (fil_names_write_if_was_clean(space)) {
len = m_log.size();
} else {
/* This was not the first time of dirtying a
tablespace since the latest checkpoint. */
ut_ad(n_recs == m_n_log_recs);
if (n_recs <= 1) {
ut_ad(n_recs == 1);
/* Flag the single log record as the
only record in this mini-transaction. */
*m_log.front()->begin() |= MLOG_SINGLE_REC_FLAG;
} else {
/* Because this mini-transaction comprises
multiple log records, append MLOG_MULTI_REC_END
at the end. */
*m_log.push<byte*>(1) = MLOG_MULTI_REC_END;
len++;
}
ut_ad(len == m_log.size());
}
*m_log.push<byte*>(1) = 0;
len++;
/* check and attempt a checkpoint if exceeding capacity */
log_margin_checkpoint_age(len);

(File diff suppressed because it is too large.)


@ -198,17 +198,15 @@ page_set_max_trx_id(
mtr_t* mtr) /*!< in/out: mini-transaction, or NULL */
{
ut_ad(!mtr || mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
ut_ad(!page_zip || page_zip == &block->page.zip);
static_assert((PAGE_HEADER + PAGE_MAX_TRX_ID) % 8 == 0, "alignment");
byte *max_trx_id= my_assume_aligned<8>(PAGE_MAX_TRX_ID +
PAGE_HEADER + block->frame);
mtr->write<8>(*block, max_trx_id, trx_id);
if (UNIV_LIKELY_NULL(page_zip))
{
mach_write_to_8(max_trx_id, trx_id);
page_zip_write_header(block, max_trx_id, 8, mtr);
}
else
mtr->write<8>(*block, max_trx_id, trx_id);
memcpy_aligned<8>(&page_zip->data[PAGE_MAX_TRX_ID + PAGE_HEADER],
max_trx_id, 8);
}
/** Persist the AUTO_INCREMENT value on a clustered index root page.
@ -229,17 +227,16 @@ page_set_autoinc(
ut_ad(mtr->memo_contains_flagged(block, MTR_MEMO_PAGE_X_FIX |
MTR_MEMO_PAGE_SX_FIX));
byte *field= PAGE_HEADER + PAGE_ROOT_AUTO_INC + block->frame;
byte *field= my_assume_aligned<8>(PAGE_HEADER + PAGE_ROOT_AUTO_INC +
block->frame);
ib_uint64_t old= mach_read_from_8(field);
if (old == autoinc || (old > autoinc && !reset))
/* nothing to update */;
else if (UNIV_LIKELY_NULL(block->page.zip.data))
{
mach_write_to_8(field, autoinc);
page_zip_write_header(block, field, 8, mtr);
}
else
mtr->write<8>(*block, field, autoinc);
return; /* nothing to update */
mtr->write<8>(*block, field, autoinc);
if (UNIV_LIKELY_NULL(block->page.zip.data))
memcpy_aligned<8>(PAGE_HEADER + PAGE_ROOT_AUTO_INC + block->page.zip.data,
field, 8);
}
/** The page infimum and supremum of an empty page in ROW_FORMAT=REDUNDANT */
@ -327,11 +324,11 @@ void page_create_low(const buf_block_t* block, bool comp)
@param[in,out] block buffer block
@param[in,out] mtr mini-transaction
@param[in] comp set unless ROW_FORMAT=REDUNDANT */
void page_create(buf_block_t* block, mtr_t* mtr, bool comp)
void page_create(buf_block_t *block, mtr_t *mtr, bool comp)
{
mtr->page_create(block->page.id, comp);
buf_block_modify_clock_inc(block);
page_create_low(block, comp);
mtr->page_create(*block, comp);
buf_block_modify_clock_inc(block);
page_create_low(block, comp);
}
/**********************************************************//**
@ -961,14 +958,15 @@ delete_all:
buf_block_modify_clock_inc(block);
const bool is_leaf = page_is_leaf(block->frame);
byte* last_insert = my_assume_aligned<2>(PAGE_LAST_INSERT + PAGE_HEADER
+ block->frame);
mtr->write<2,mtr_t::OPT>(*block, my_assume_aligned<2>
(PAGE_LAST_INSERT + PAGE_HEADER
+ block->frame), 0U);
if (UNIV_LIKELY_NULL(page_zip)) {
ut_ad(page_is_comp(block->frame));
memset(last_insert, 0, 2);
page_zip_write_header(block, last_insert, 2, mtr);
memset_aligned<2>(PAGE_LAST_INSERT + PAGE_HEADER
+ page_zip->data, 0, 2);
do {
page_cur_t cur;
@ -990,8 +988,6 @@ delete_all:
return;
}
mtr->write<2,mtr_t::OPT>(*block, last_insert, 0U);
prev_rec = page_rec_get_prev(rec);
last_rec = page_rec_get_prev(page_get_supremum_rec(block->frame));


@ -361,6 +361,54 @@ page_zip_dir_get(
- PAGE_ZIP_DIR_SLOT_SIZE * (slot + 1)));
}
/** Write a byte string to a ROW_FORMAT=COMPRESSED page.
@param[in] b ROW_FORMAT=COMPRESSED index page
@param[in] offset byte offset from b.zip.data
@param[in] len length of the data to write */
inline void mtr_t::zmemcpy(const buf_page_t &b, ulint offset, ulint len)
{
ut_ad(mach_read_from_2(b.zip.data + FIL_PAGE_TYPE) == FIL_PAGE_INDEX ||
mach_read_from_2(b.zip.data + FIL_PAGE_TYPE) == FIL_PAGE_RTREE);
ut_ad(page_zip_simple_validate(&b.zip));
ut_ad(offset + len <= page_zip_get_size(&b.zip));
memcpy_low(b, static_cast<uint16_t>(offset), &b.zip.data[offset], len);
m_last_offset= static_cast<uint16_t>(offset + len);
}
/** Write a byte string to a ROW_FORMAT=COMPRESSED page.
@param[in,out] b ROW_FORMAT=COMPRESSED index page
@param[in] dest destination within b.zip.data
@param[in] str the data to write
@param[in] len length of the data to write
@tparam w write request type */
template<mtr_t::write_type w>
inline void mtr_t::zmemcpy(const buf_page_t &b, void *dest, const void *str,
ulint len)
{
byte *d= static_cast<byte*>(dest);
const byte *s= static_cast<const byte*>(str);
ut_ad(d >= b.zip.data + FIL_PAGE_OFFSET);
if (w != FORCED)
{
ut_ad(len);
const byte *const end= d + len;
while (*d++ == *s++)
{
if (d == end)
{
ut_ad(w == OPT);
return;
}
}
s--;
d--;
len= static_cast<ulint>(end - d);
}
::memcpy(d, s, len);
zmemcpy(b, d - b.zip.data, len);
}
/** Write redo log for compressing a ROW_FORMAT=COMPRESSED index page.
@param[in,out] block ROW_FORMAT=COMPRESSED index page
@param[in] index the index that the block belongs to
@ -3545,9 +3593,9 @@ page_zip_write_rec_ext(
byte* ext_start = ext_end
- n_ext * FIELD_REF_SIZE;
memmove(ext_start, ext_end, len);
/* TODO: write MEMMOVE record */
mtr->zmemcpy(block->page, ext_start
- page_zip->data, len);
mtr->memmove(*block,
ext_start - page_zip->data,
ext_end - page_zip->data, len);
}
}
@ -3783,8 +3831,8 @@ void page_zip_write_rec(buf_block_t *block, const byte *rec,
/* Copy the node pointer to the uncompressed area. */
byte* node_ptr = storage - REC_NODE_PTR_SIZE * (heap_no - 1);
mtr->zmemcpy(&block->page, node_ptr - page_zip->data,
rec + len, REC_NODE_PTR_SIZE);
mtr->zmemcpy<mtr_t::OPT>(block->page, node_ptr,
rec + len, REC_NODE_PTR_SIZE);
}
ut_a(!*data);
@ -3917,8 +3965,8 @@ page_zip_write_blob_ptr(
externs -= (blob_no + 1) * BTR_EXTERN_FIELD_REF_SIZE;
field += len - BTR_EXTERN_FIELD_REF_SIZE;
mtr->zmemcpy(&block->page, ulint(externs - page_zip->data),
field, BTR_EXTERN_FIELD_REF_SIZE);
mtr->zmemcpy<mtr_t::OPT>(block->page, externs, field,
BTR_EXTERN_FIELD_REF_SIZE);
#ifdef UNIV_ZIP_DEBUG
ut_a(page_zip_validate(page_zip, page, index));
@ -4040,8 +4088,7 @@ page_zip_write_node_ptr(
#endif /* UNIV_DEBUG || UNIV_ZIP_DEBUG */
compile_time_assert(REC_NODE_PTR_SIZE == 4);
mach_write_to_4(field, ptr);
mtr->zmemcpy(&block->page, ulint(storage - page_zip->data),
field, REC_NODE_PTR_SIZE);
mtr->zmemcpy(block->page, storage, field, REC_NODE_PTR_SIZE);
}
/** Write the DB_TRX_ID,DB_ROLL_PTR into a clustered index leaf page record.
@ -4062,9 +4109,6 @@ page_zip_write_trx_id_and_roll_ptr(
roll_ptr_t roll_ptr,
mtr_t* mtr)
{
byte* field;
byte* storage;
ulint len;
page_zip_des_t* const page_zip = &block->page.zip;
ut_d(const page_t* const page = block->frame);
@ -4084,12 +4128,13 @@ page_zip_write_trx_id_and_roll_ptr(
UNIV_MEM_ASSERT_RW(page_zip->data, page_zip_get_size(page_zip));
constexpr ulint sys_len = DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN;
storage = page_zip_dir_start(page_zip)
- (rec_get_heap_no_new(rec) - 1)
* sys_len;
const ulint heap_no = rec_get_heap_no_new(rec);
ut_ad(heap_no >= PAGE_HEAP_NO_USER_LOW);
byte* storage = page_zip_dir_start(page_zip) - (heap_no - 1) * sys_len;
compile_time_assert(DATA_TRX_ID + 1 == DATA_ROLL_PTR);
field = rec_get_nth_field(rec, offsets, trx_id_col, &len);
ulint len;
byte* field = rec_get_nth_field(rec, offsets, trx_id_col, &len);
ut_ad(len == DATA_TRX_ID_LEN);
ut_ad(field + DATA_TRX_ID_LEN
== rec_get_nth_field(rec, offsets, trx_id_col + 1, &len));
@ -4101,8 +4146,47 @@ page_zip_write_trx_id_and_roll_ptr(
mach_write_to_6(field, trx_id);
compile_time_assert(DATA_ROLL_PTR_LEN == 7);
mach_write_to_7(field + DATA_TRX_ID_LEN, roll_ptr);
mtr->zmemcpy(&block->page, ulint(storage - page_zip->data),
field, sys_len);
len = 0;
if (heap_no > PAGE_HEAP_NO_USER_LOW) {
byte* prev = storage + sys_len;
for (; len < sys_len && prev[len] == field[len]; len++);
if (len > 4) {
/* We save space by replacing a single record
WRITE,offset(storage),byte[13]
with up to two records:
MEMMOVE,offset(storage),len(1 byte),+13(1 byte),
WRITE|0x80,0,byte[13-len]
The single WRITE record would be x+13 bytes long (x>2).
The MEMMOVE record would be x+1+1 = x+2 bytes, and
the second WRITE would be 1+1+13-len = 15-len bytes.
The total size is: x+13 versus x+2+15-len = x+17-len.
To save space, we must have len>4. */
memcpy(storage, prev, len);
mtr->memmove(*block, ulint(storage - page_zip->data),
ulint(storage - page_zip->data) + sys_len,
len);
storage += len;
field += len;
if (UNIV_LIKELY(len < sys_len)) {
goto write;
}
} else {
len = 0;
goto write;
}
} else {
write:
mtr->zmemcpy<mtr_t::OPT>(block->page, storage, field,
sys_len - len);
}
#if defined UNIV_DEBUG || defined UNIV_ZIP_DEBUG
ut_a(!memcmp(storage - len, field - len, sys_len));
#endif /* UNIV_DEBUG || UNIV_ZIP_DEBUG */
UNIV_MEM_ASSERT_RW(rec, rec_offs_data_size(offsets));
UNIV_MEM_ASSERT_RW(rec - rec_offs_extra_size(offsets),
@ -4222,9 +4306,8 @@ page_zip_clear_rec(
memset(field, 0, REC_NODE_PTR_SIZE);
storage -= (heap_no - 1) * REC_NODE_PTR_SIZE;
clear_page_zip:
/* TODO: write MEMSET record */
memset(storage, 0, len);
mtr->zmemcpy(block->page, storage - page_zip->data, len);
mtr->memset(*block, storage - page_zip->data, len, 0);
} else if (index->is_clust()) {
/* Clear trx_id and roll_ptr. On the compressed page,
there is an array of these fields immediately before the
@ -4265,33 +4348,24 @@ clear_page_zip:
}
}
/**********************************************************************//**
Write the "deleted" flag of a record on a compressed page. The flag must
already have been written on the uncompressed page. */
void
page_zip_rec_set_deleted(
/*=====================*/
buf_block_t* block, /*!< in/out: ROW_FORMAT=COMPRESSED page */
const byte* rec, /*!< in: record on the uncompressed page */
ulint flag, /*!< in: the deleted flag (nonzero=TRUE) */
mtr_t* mtr) /*!< in,out: mini-transaction */
/** Modify the delete-mark flag of a ROW_FORMAT=COMPRESSED record.
@param[in,out] block buffer block
@param[in,out] rec record on a physical index page
@param[in] flag the value of the delete-mark flag
@param[in,out] mtr mini-transaction */
void page_zip_rec_set_deleted(buf_block_t *block, rec_t *rec, bool flag,
mtr_t *mtr)
{
ut_ad(page_align(rec) == block->frame);
page_zip_des_t* const page_zip = &block->page.zip;
byte* slot = page_zip_dir_find(&block->page.zip, page_offset(rec));
ut_a(slot);
UNIV_MEM_ASSERT_RW(page_zip->data, page_zip_get_size(page_zip));
byte b = *slot;
if (flag) {
b |= (PAGE_ZIP_DIR_SLOT_DEL >> 8);
} else {
b &= ~(PAGE_ZIP_DIR_SLOT_DEL >> 8);
}
if (b != *slot) {
mtr->zmemcpy(&block->page, slot - page_zip->data, &b, 1);
}
ut_ad(page_align(rec) == block->frame);
byte *slot= page_zip_dir_find(&block->page.zip, page_offset(rec));
byte b= *slot;
if (flag)
b|= (PAGE_ZIP_DIR_SLOT_DEL >> 8);
else
b&= ~(PAGE_ZIP_DIR_SLOT_DEL >> 8);
mtr->zmemcpy<mtr_t::OPT>(block->page, slot, &b, 1);
#ifdef UNIV_ZIP_DEBUG
ut_a(page_zip_validate(page_zip, page_align(rec), NULL));
ut_a(page_zip_validate(&block->page.zip, block->frame, nullptr));
#endif /* UNIV_ZIP_DEBUG */
}
@ -4306,20 +4380,16 @@ page_zip_rec_set_owned(
ulint flag, /*!< in: the owned flag (nonzero=TRUE) */
mtr_t* mtr) /*!< in/out: mini-transaction */
{
ut_ad(page_align(rec) == block->frame);
page_zip_des_t* const page_zip = &block->page.zip;
byte* slot = page_zip_dir_find(page_zip, page_offset(rec));
ut_a(slot);
UNIV_MEM_ASSERT_RW(page_zip->data, page_zip_get_size(page_zip));
byte b = *slot;
if (flag) {
b |= (PAGE_ZIP_DIR_SLOT_OWNED >> 8);
} else {
b &= ~(PAGE_ZIP_DIR_SLOT_OWNED >> 8);
}
if (b != *slot) {
mtr->zmemcpy(&block->page, slot - page_zip->data, &b, 1);
}
ut_ad(page_align(rec) == block->frame);
page_zip_des_t *const page_zip= &block->page.zip;
byte *slot= page_zip_dir_find(page_zip, page_offset(rec));
UNIV_MEM_ASSERT_RW(page_zip->data, page_zip_get_size(page_zip));
byte b= *slot;
if (flag)
b|= (PAGE_ZIP_DIR_SLOT_OWNED >> 8);
else
b&= ~(PAGE_ZIP_DIR_SLOT_OWNED >> 8);
mtr->zmemcpy<mtr_t::OPT>(block->page, slot, &b, 1);
}
/**********************************************************************//**
@ -4328,8 +4398,8 @@ void
page_zip_dir_insert(
/*================*/
page_cur_t* cursor, /*!< in/out: page cursor */
const byte* free_rec,/*!< in: record from which rec was
allocated, or NULL */
uint16_t free_rec,/*!< in: record from which rec was
allocated, or 0 */
byte* rec, /*!< in: record to insert */
mtr_t* mtr) /*!< in/out: mini-transaction */
{
@ -4371,7 +4441,7 @@ page_zip_dir_insert(
n_dense = page_dir_get_n_heap(page_zip->data)
- (PAGE_HEAP_NO_USER_LOW + 1U);
if (UNIV_LIKELY_NULL(free_rec)) {
if (UNIV_UNLIKELY(free_rec)) {
/* The record was allocated from the free list.
Shift the dense directory only up to that slot.
Note that in this case, n_dense is actually
@ -4379,8 +4449,8 @@ page_zip_dir_insert(
did not increment n_heap. */
ut_ad(rec_get_heap_no_new(rec) < n_dense + 1
+ PAGE_HEAP_NO_USER_LOW);
ut_ad(rec >= free_rec);
slot_free = page_zip_dir_find(page_zip, page_offset(free_rec));
ut_ad(page_offset(rec) >= free_rec);
slot_free = page_zip_dir_find(page_zip, free_rec);
ut_ad(slot_free);
slot_free += PAGE_ZIP_DIR_SLOT_SIZE;
} else {
@ -4394,17 +4464,20 @@ page_zip_dir_insert(
- PAGE_ZIP_DIR_SLOT_SIZE * n_dense;
}
const ulint slot_len = ulint(slot_rec - slot_free);
/* Shift the dense directory to allocate place for rec. */
memmove_aligned<2>(slot_free - PAGE_ZIP_DIR_SLOT_SIZE, slot_free,
slot_len);
if (const ulint slot_len = ulint(slot_rec - slot_free)) {
/* Shift the dense directory to allocate place for rec. */
memmove_aligned<2>(slot_free - PAGE_ZIP_DIR_SLOT_SIZE,
slot_free, slot_len);
mtr->memmove(*cursor->block, (slot_free - page_zip->data)
- PAGE_ZIP_DIR_SLOT_SIZE,
slot_free - page_zip->data, slot_len);
}
/* Write the entry for the inserted record.
The "owned" and "deleted" flags must be zero. */
mach_write_to_2(slot_rec - PAGE_ZIP_DIR_SLOT_SIZE, page_offset(rec));
/* TODO: issue MEMMOVE record to reduce log volume */
mtr->zmemcpy(cursor->block->page, slot_free - PAGE_ZIP_DIR_SLOT_SIZE
- page_zip->data, PAGE_ZIP_DIR_SLOT_SIZE + slot_len);
mtr->zmemcpy(cursor->block->page, slot_rec - page_zip->data
- PAGE_ZIP_DIR_SLOT_SIZE, PAGE_ZIP_DIR_SLOT_SIZE);
}
/** Shift the dense page directory and the array of BLOB pointers
@ -4434,12 +4507,13 @@ void page_zip_dir_delete(buf_block_t *block, byte *rec,
free ? static_cast<uint16_t>(free - rec) : 0);
byte *page_free= my_assume_aligned<2>(PAGE_FREE + PAGE_HEADER +
block->frame);
mach_write_to_2(page_free, page_offset(rec));
mtr->write<2>(*block, page_free, page_offset(rec));
byte *garbage= my_assume_aligned<2>(PAGE_GARBAGE + PAGE_HEADER +
block->frame);
mach_write_to_2(garbage, rec_offs_size(offsets) + mach_read_from_2(garbage));
mtr->write<2>(*block, garbage, rec_offs_size(offsets) +
mach_read_from_2(garbage));
compile_time_assert(PAGE_GARBAGE == PAGE_FREE + 2);
page_zip_write_header(block, page_free, 4, mtr);
memcpy_aligned<4>(PAGE_FREE + PAGE_HEADER + page_zip->data, page_free, 4);
byte *slot_rec= page_zip_dir_find(page_zip, page_offset(rec));
ut_a(slot_rec);
uint16_t n_recs= page_get_n_recs(block->frame);
@ -4448,8 +4522,9 @@ void page_zip_dir_delete(buf_block_t *block, byte *rec,
/* This could not be done before page_zip_dir_find(). */
byte *page_n_recs= my_assume_aligned<2>(PAGE_N_RECS + PAGE_HEADER +
block->frame);
mach_write_to_2(page_n_recs, n_recs - 1);
page_zip_write_header(block, page_n_recs, 2, mtr);
mtr->write<2>(*block, page_n_recs, n_recs - 1U);
memcpy_aligned<2>(PAGE_N_RECS + PAGE_HEADER + page_zip->data, page_n_recs,
2);
byte *slot_free;
@ -4468,16 +4543,17 @@ void page_zip_dir_delete(buf_block_t *block, byte *rec,
const ulint slot_len= slot_rec > slot_free ? ulint(slot_rec - slot_free) : 0;
if (slot_len)
/* MDEV-12353 TODO: issue MEMMOVE record */
{
memmove_aligned<2>(slot_free + PAGE_ZIP_DIR_SLOT_SIZE, slot_free,
slot_len);
mtr->memmove(*block, (slot_free - page_zip->data) + PAGE_ZIP_DIR_SLOT_SIZE,
slot_free - page_zip->data, slot_len);
}
/* Write the entry for the deleted record.
The "owned" and "deleted" flags will be cleared. */
mach_write_to_2(slot_free, page_offset(rec));
mtr->zmemcpy(block->page, slot_free - page_zip->data,
slot_len + PAGE_ZIP_DIR_SLOT_SIZE);
mtr->zmemcpy(block->page, slot_free - page_zip->data, 2);
if (const ulint n_ext= rec_offs_n_extern(offsets))
{
@ -4491,18 +4567,18 @@ void page_zip_dir_delete(buf_block_t *block, byte *rec,
byte *externs= page_zip->data + page_zip_get_size(page_zip) -
(page_dir_get_n_heap(block->frame) - PAGE_HEAP_NO_USER_LOW) *
PAGE_ZIP_CLUST_LEAF_SLOT_SIZE;
byte *ext_end= externs - page_zip->n_blobs * FIELD_REF_SIZE;
/* Shift and zero fill the array. */
memmove(ext_end + n_ext * FIELD_REF_SIZE, ext_end,
ulint(page_zip->n_blobs - n_ext - blob_no) *
BTR_EXTERN_FIELD_REF_SIZE);
if (const ulint ext_len= ulint(page_zip->n_blobs - n_ext - blob_no) *
BTR_EXTERN_FIELD_REF_SIZE)
{
memmove(ext_end + n_ext * FIELD_REF_SIZE, ext_end, ext_len);
mtr->memmove(*block, (ext_end - page_zip->data) + n_ext * FIELD_REF_SIZE,
ext_end - page_zip->data, ext_len);
}
memset(ext_end, 0, n_ext * FIELD_REF_SIZE);
/* TODO: use MEMMOVE and MEMSET records to reduce volume */
const ulint ext_len= ulint(page_zip->n_blobs - blob_no) * FIELD_REF_SIZE;
mtr->zmemcpy(block->page, ext_end - page_zip->data, ext_len);
mtr->memset(*block, ext_end - page_zip->data, n_ext * FIELD_REF_SIZE, 0);
page_zip->n_blobs -= static_cast<unsigned>(n_ext);
}


@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1997, 2017, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2017, 2019, MariaDB Corporation.
Copyright (c) 2017, 2020, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
@@ -206,32 +206,7 @@ func_exit:
if (err == DB_SUCCESS && node->rec_type == TRX_UNDO_INSERT_METADATA) {
/* When rolling back the very first instant ADD COLUMN
operation, reset the root page to the basic state. */
ut_ad(!index->table->is_temporary());
if (buf_block_t* root = btr_root_block_get(index, RW_SX_LATCH,
&mtr)) {
byte* page_type = root->frame + FIL_PAGE_TYPE;
ut_ad(mach_read_from_2(page_type)
== FIL_PAGE_TYPE_INSTANT
|| mach_read_from_2(page_type)
== FIL_PAGE_INDEX);
mtr.write<2,mtr_t::OPT>(*root, page_type,
FIL_PAGE_INDEX);
byte* instant = PAGE_INSTANT + PAGE_HEADER
+ root->frame;
mtr.write<2,mtr_t::OPT>(
*root, instant,
page_ptr_get_direction(instant + 1));
rec_t* infimum = page_get_infimum_rec(root->frame);
rec_t* supremum = page_get_supremum_rec(root->frame);
static const byte str[8 + 8] = "supremuminfimum";
if (memcmp(infimum, str + 8, 8)
|| memcmp(supremum, str, 8)) {
mtr.memcpy(root, page_offset(infimum),
str + 8, 8);
mtr.memcpy(root, page_offset(supremum),
str, 8);
}
}
btr_reset_instant(*index, true, &mtr);
}
btr_pcur_commit_specify_mtr(&node->pcur, &mtr);


@@ -148,37 +148,12 @@ row_undo_mod_clust_low(
ut_a(!dummy_big_rec);
static const byte
INFIMUM[8] = {'i','n','f','i','m','u','m',0},
SUPREMUM[8] = {'s','u','p','r','e','m','u','m'};
if (err == DB_SUCCESS
&& node->ref == &trx_undo_metadata
&& btr_cur_get_index(btr_cur)->table->instant
&& node->update->info_bits == REC_INFO_METADATA_ADD) {
if (buf_block_t* root = btr_root_block_get(
btr_cur_get_index(btr_cur), RW_SX_LATCH,
mtr)) {
uint16_t infimum, supremum;
if (page_is_comp(root->frame)) {
infimum = PAGE_NEW_INFIMUM;
supremum = PAGE_NEW_SUPREMUM;
} else {
infimum = PAGE_OLD_INFIMUM;
supremum = PAGE_OLD_SUPREMUM;
}
ut_ad(!memcmp(root->frame + infimum,
INFIMUM, 8)
== !memcmp(root->frame + supremum,
SUPREMUM, 8));
if (memcmp(root->frame + infimum, INFIMUM, 8)) {
mtr->memcpy(root, infimum, INFIMUM, 8);
mtr->memcpy(root, supremum, SUPREMUM,
8);
}
}
btr_reset_instant(*btr_cur_get_index(btr_cur), false,
mtr);
}
}


@@ -1083,7 +1083,7 @@ srv_prepare_to_delete_redo_log_files(
ib::info info;
if (srv_log_file_size == 0
|| (log_sys.log.format & ~log_t::FORMAT_ENCRYPTED)
!= log_t::FORMAT_10_4) {
!= log_t::FORMAT_10_5) {
info << "Upgrading redo log: ";
} else if (n_files != srv_n_log_files
|| srv_log_file_size
@@ -1829,8 +1829,8 @@ files_checked:
&& srv_n_log_files_found == srv_n_log_files
&& log_sys.log.format
== (srv_encrypt_log
? log_t::FORMAT_ENC_10_4
: log_t::FORMAT_10_4)
? log_t::FORMAT_ENC_10_5
: log_t::FORMAT_10_5)
&& log_sys.log.subformat == 2) {
/* No need to add or remove encryption,
upgrade, downgrade, or resize. */


@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1996, 2016, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2017, 2019, MariaDB Corporation.
Copyright (c) 2017, 2020, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
@@ -72,8 +72,8 @@ trx_rseg_write_wsrep_checkpoint(
const ulint xid_length = static_cast<ulint>(xid->gtrid_length
+ xid->bqual_length);
mtr->memcpy(rseg_header, TRX_RSEG + TRX_RSEG_WSREP_XID_DATA,
xid->data, xid_length);
mtr->memcpy(*rseg_header, TRX_RSEG + TRX_RSEG_WSREP_XID_DATA
+ rseg_header->frame, xid->data, xid_length);
if (UNIV_LIKELY(xid_length < XIDDATASIZE)) {
mtr->memset(rseg_header,
TRX_RSEG + TRX_RSEG_WSREP_XID_DATA + xid_length,
@@ -738,9 +738,9 @@ void trx_rseg_update_binlog_offset(buf_block_t *rseg_header, const trx_t *trx,
+ rseg_header->frame,
trx->mysql_log_offset);
if (memcmp(trx->mysql_log_file_name, TRX_RSEG + TRX_RSEG_BINLOG_NAME
+ rseg_header->frame, len)) {
mtr->memcpy(rseg_header, TRX_RSEG + TRX_RSEG_BINLOG_NAME,
trx->mysql_log_file_name, len);
void* name = TRX_RSEG + TRX_RSEG_BINLOG_NAME + rseg_header->frame;
if (memcmp(trx->mysql_log_file_name, name, len)) {
mtr->memcpy(*rseg_header, name, trx->mysql_log_file_name, len);
}
}


@@ -390,12 +390,11 @@ static void trx_undo_page_init(const buf_block_t *undo_block, mtr_t *mtr)
compile_time_assert(TRX_UNDO_PAGE_START == 2);
compile_time_assert(TRX_UNDO_PAGE_NODE == TRX_UNDO_PAGE_FREE + 2);
/* MDEV-12353 FIXME: write minimal number of bytes in the new encoding */
mtr->write<4>(*undo_block, TRX_UNDO_PAGE_HDR + undo_block->frame,
TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_HDR_SIZE);
mtr->write<2>(*undo_block, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_FREE +
undo_block->frame,
TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_HDR_SIZE);
alignas(4) byte hdr[6];
mach_write_to_4(hdr, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_HDR_SIZE);
memcpy_aligned<2>(hdr + 4, hdr + 2, 2);
static_assert(TRX_UNDO_PAGE_FREE == 4, "compatibility");
mtr->memcpy(*undo_block, undo_block->frame + TRX_UNDO_PAGE_HDR, hdr, 6);
}
/** Look for a free slot for an undo log segment.
@@ -501,41 +500,63 @@ trx_undo_seg_create(fil_space_t *space, buf_block_t *rseg_hdr, ulint *id,
static uint16_t trx_undo_header_create(buf_block_t *undo_page, trx_id_t trx_id,
mtr_t* mtr)
{
const uint16_t free= mach_read_from_2(TRX_UNDO_PAGE_HDR +
TRX_UNDO_PAGE_FREE + undo_page->frame);
const uint16_t new_free= free + TRX_UNDO_LOG_OLD_HDR_SIZE;
/* Reset the TRX_UNDO_PAGE_TYPE in case this page is being
repurposed after upgrading to MariaDB 10.3. */
byte *undo_type= my_assume_aligned<2>
(TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE + undo_page->frame);
ut_ad(mach_read_from_2(undo_type) <= TRX_UNDO_UPDATE);
mtr->write<2,mtr_t::OPT>(*undo_page, undo_type, 0U);
byte *start= my_assume_aligned<4>(TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_START +
undo_page->frame);
const uint16_t free= mach_read_from_2(start + 2);
static_assert(TRX_UNDO_PAGE_START + 2 == TRX_UNDO_PAGE_FREE,
"compatibility");
ut_a(free + TRX_UNDO_LOG_XA_HDR_SIZE < srv_page_size - 100);
mtr->write<2>(*undo_page, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_START +
undo_page->frame, new_free);
/* MDEV-12353 TODO: use MEMMOVE record */
mtr->write<2>(*undo_page, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_FREE +
undo_page->frame, new_free);
mtr->write<2>(*undo_page, TRX_UNDO_SEG_HDR + TRX_UNDO_STATE +
undo_page->frame, TRX_UNDO_ACTIVE);
mtr->write<2,mtr_t::OPT>(*undo_page, free + TRX_UNDO_NEEDS_PURGE +
undo_page->frame, 1U);
mtr->write<8>(*undo_page, free + TRX_UNDO_TRX_ID + undo_page->frame, trx_id);
mtr->write<2,mtr_t::OPT>(*undo_page, free + TRX_UNDO_LOG_START +
undo_page->frame, new_free);
mtr->memset(undo_page, free + TRX_UNDO_XID_EXISTS,
TRX_UNDO_LOG_OLD_HDR_SIZE - TRX_UNDO_XID_EXISTS, 0);
if (uint16_t prev_log= mach_read_from_2(TRX_UNDO_SEG_HDR +
TRX_UNDO_LAST_LOG +
undo_page->frame))
{
mach_write_to_2(start, free + TRX_UNDO_LOG_XA_HDR_SIZE);
/* A WRITE of 2 bytes is never longer than a MEMMOVE.
So, WRITE 2+2 bytes is better than WRITE+MEMMOVE.
But, a MEMSET will only be 1+2 bytes, that is, 1 byte shorter! */
memcpy_aligned<2>(start + 2, start, 2);
mtr->memset(*undo_page, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_START, 4,
start, 2);
uint16_t prev_log= mach_read_from_2(TRX_UNDO_SEG_HDR + TRX_UNDO_LAST_LOG +
undo_page->frame);
alignas(4) byte buf[4];
mach_write_to_2(buf, TRX_UNDO_ACTIVE);
mach_write_to_2(buf + 2, free);
static_assert(TRX_UNDO_STATE + 2 == TRX_UNDO_LAST_LOG, "compatibility");
static_assert(!((TRX_UNDO_SEG_HDR + TRX_UNDO_STATE) % 4), "alignment");
mtr->memcpy(*undo_page, my_assume_aligned<4>
(TRX_UNDO_SEG_HDR + TRX_UNDO_STATE + undo_page->frame),
buf, 4);
if (prev_log)
mtr->write<2>(*undo_page, prev_log + TRX_UNDO_NEXT_LOG + undo_page->frame,
free);
mtr->write<2>(*undo_page, free + TRX_UNDO_PREV_LOG + undo_page->frame,
prev_log);
mtr->write<8>(*undo_page, free + TRX_UNDO_TRX_ID + undo_page->frame, trx_id);
/* Write TRX_UNDO_NEEDS_PURGE=1 and TRX_UNDO_LOG_START. */
mach_write_to_2(buf, 1);
memcpy_aligned<2>(buf + 2, start, 2);
static_assert(TRX_UNDO_NEEDS_PURGE + 2 == TRX_UNDO_LOG_START,
"compatibility");
mtr->memcpy(*undo_page, free + TRX_UNDO_NEEDS_PURGE + undo_page->frame,
buf, 4);
/* Initialize all fields from TRX_UNDO_XID_EXISTS to TRX_UNDO_HISTORY_NODE. */
if (prev_log)
{
mtr->memset(undo_page, free + TRX_UNDO_XID_EXISTS,
TRX_UNDO_PREV_LOG - TRX_UNDO_XID_EXISTS, 0);
mtr->write<2,mtr_t::OPT>(*undo_page, free + TRX_UNDO_PREV_LOG +
undo_page->frame, prev_log);
static_assert(TRX_UNDO_PREV_LOG + 2 == TRX_UNDO_HISTORY_NODE,
"compatibility");
mtr->memset(undo_page, free + TRX_UNDO_HISTORY_NODE, FLST_NODE_SIZE, 0);
static_assert(TRX_UNDO_LOG_OLD_HDR_SIZE == TRX_UNDO_HISTORY_NODE +
FLST_NODE_SIZE, "compatibility");
}
mtr->write<2>(*undo_page, TRX_UNDO_SEG_HDR + TRX_UNDO_LAST_LOG +
undo_page->frame, free);
else
mtr->memset(undo_page, free + TRX_UNDO_XID_EXISTS,
TRX_UNDO_LOG_OLD_HDR_SIZE - TRX_UNDO_XID_EXISTS, 0);
return free;
}
@@ -563,7 +584,8 @@ static void trx_undo_write_xid(buf_block_t *block, uint16_t offset,
static_cast<uint32_t>(xid.bqual_length));
const ulint xid_length= static_cast<ulint>(xid.gtrid_length
+ xid.bqual_length);
mtr->memcpy(block, offset + TRX_UNDO_XA_XID, xid.data, xid_length);
mtr->memcpy(*block, &block->frame[offset + TRX_UNDO_XA_XID],
xid.data, xid_length);
if (UNIV_LIKELY(xid_length < XIDDATASIZE))
mtr->memset(block, offset + TRX_UNDO_XA_XID + xid_length,
XIDDATASIZE - xid_length, 0);
@@ -587,29 +609,6 @@ trx_undo_read_xid(const trx_ulogf_t* log_hdr, XID* xid)
memcpy(xid->data, log_hdr + TRX_UNDO_XA_XID, XIDDATASIZE);
}
/** Add space for the XA XID after an undo log old-style header.
@param[in,out] block undo page
@param[in] offset offset of the undo log header
@param[in,out] mtr mini-transaction */
static void trx_undo_header_add_space_for_xid(buf_block_t *block, ulint offset,
mtr_t *mtr)
{
uint16_t free= mach_read_from_2(TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_FREE +
block->frame);
/* free is now the end offset of the old style undo log header */
ut_a(free == offset + TRX_UNDO_LOG_OLD_HDR_SIZE);
free += TRX_UNDO_LOG_XA_HDR_SIZE - TRX_UNDO_LOG_OLD_HDR_SIZE;
/* Add space for a XID after the header, update the free offset
fields on the undo log page and in the undo log header */
mtr->write<2>(*block, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_START + block->frame,
free);
/* MDEV-12353 TODO: use MEMMOVE record */
mtr->write<2>(*block, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_FREE + block->frame,
free);
mtr->write<2>(*block, offset + TRX_UNDO_LOG_START + block->frame, free);
}
/** Parse the redo log entry of an undo log page header create.
@param[in] ptr redo log record
@param[in] end_ptr end of log buffer
@@ -1133,8 +1132,6 @@ trx_undo_create(trx_t* trx, trx_rseg_t* rseg, trx_undo_t** undo,
uint16_t offset = trx_undo_header_create(block, trx->id, mtr);
trx_undo_header_add_space_for_xid(block, offset, mtr);
*undo = trx_undo_mem_create(rseg, id, trx->id, trx->xid,
block->page.id.page_no(), offset);
if (*undo == NULL) {
@@ -1204,17 +1201,6 @@ trx_undo_reuse_cached(trx_t* trx, trx_rseg_t* rseg, trx_undo_t** pundo,
*pundo = undo;
uint16_t offset = trx_undo_header_create(block, trx->id, mtr);
/* Reset the TRX_UNDO_PAGE_TYPE in case this page is being
repurposed after upgrading to MariaDB 10.3. */
if (ut_d(ulint type =) UNIV_UNLIKELY(
mach_read_from_2(TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE
+ block->frame))) {
ut_ad(type == TRX_UNDO_INSERT || type == TRX_UNDO_UPDATE);
mtr->write<2>(*block, TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE
+ block->frame, 0U);
}
trx_undo_header_add_space_for_xid(block, offset, mtr);
trx_undo_mem_init_for_reuse(undo, trx->id, trx->xid, offset);