* Remove those tests that will not be supported on that release.
* Make sure that correct tests are disabled and have MDEVs
* Sort test names
This should not be merged upwards.
There is no reason for the dummy index object dict_ind_redundant
to exist any more. It was only being passed to btr_create().
btr_create(): If !index, assume that a ROW_FORMAT=REDUNDANT
table is being created.
We could pass ibuf.index, dict_sys.sys_tables->indexes.start
and so on, if those objects had been initialized before the
function btr_create() is called.
recv_sys_t opened redo log files along with log_sys_t. That's why I
removed file sharing logic from InnoDB
in 9ef2d29ff4
But it was actually used to ensure that only one MariaDB instance
will touch the same InnoDB files.
os0file.cc: revert some changes done previously
mapped_file_t::map(): now has arguments read_only, nvme
file_io::open(): now has argument read_only
class file_os_io: make final
log_file_t::open(): now has argument read_only
We introduce an EXTENDED log record for appending an undo log record
to an undo log page. This is equivalent to the MLOG_UNDO_INSERT record
that was removed in commit f802c989ec,
only using more compact encoding.
mtr_t::log_write(): Fix a bug that affects longer log
record writes in the !same_page && !have_offset case.
Similar code is already implemented for the have_offset code path.
The bug was unobservable before we started to write longer
EXTENDED records. All !have_offset records (FREE_PAGE, INIT_PAGE,
EXTENDED) that were written so far are short, and we never write
RESERVED or OPTION records.
mtr_t::undo_append(): Write an UNDO_APPEND record.
log_phys_t::undo_append(): Apply an UNDO_APPEND record.
trx_undo_page_set_next_prev_and_add(),
trx_undo_page_report_modify(),
trx_undo_page_report_rename():
Invoke mtr_t::undo_append() instead of emitting WRITE records.
We introduce an EXTENDED log record for initializing an undo log page.
The size of the record will be 2 bytes plus the optional page identifier.
The entire undo page will be initialized, except the space that is
already reserved for TRX_UNDO_SEG_HDR in trx_undo_seg_create().
mtr_t::undo_create(): Write the UNDO_INIT record.
trx_undo_page_init(): Initialize the undo page corresponding to the
UNDO_INIT record. Unlike the former MLOG_UNDO_INIT record, we will
initialize almost the entire page, including initializing the
TRX_UNDO_PAGE_NODE to an empty list node, so that the subsequent call
to flst_init() will avoid writing log for the undo page.
Now there can be only one log file instead of several which
logically work as a single file.
Possible names of redo log files: ib_logfile0,
ib_logfile101 (for just created one)
innodb_log_fiels_in_group: value of this variable is not used
by InnoDB. Possible values are still 1..100, to not break upgrade
LOG_FILE_NAME: add constant of value "ib_logfile0"
LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile"
get_log_file_path(): convenience function that returns full
path of a redo log file
SRV_N_LOG_FILES_MAX: removed
srv_n_log_files: we can't remove this for compatibility reasons,
but now server doesn't use this variable
log_sys_t::file::fd: now just one, not std::vector
log_sys_t::log_capacity: removed word 'group'
find_and_check_log_file(): part of logic from huge srv_start()
moved here
recv_sys_t::files: file descriptors of redo log files.
There can be several of those in case we're upgrading
from older MariaDB version.
recv_sys_t::remove_extra_log_files: whether to remove
ib_logfile{1,2,3...} after successfull upgrade.
recv_sys_t::read(): open if needed and read from one
of several log files
recv_sys_t::files_size(): open if needed and return files count
redo_file_sizes_are_correct(): check that redo log files
sizes are equal. Just to log an error for a user.
Corresponding check was moved from srv0start.cc
namespace deprecated: put all deprecated variables here to
prevent usage of it by us, developers
We plan use the redo log record main type code 0x20 for
InnoDB specific index page operations.
mrec_type_t: Rename INIT_INDEX_PAGE to EXTENDED.
mrec_ext_t: The EXTENDED subtypes.
This is a non-functional change: the redo log record encoding
that was introduced in commit 7ae21b18a6
is not affected.
btr_page_reorganize_low(): Log only the changed data in the page.
TODO: Do not copy the entire changed payload to the redo log.
Emit a combination of MEMMOVE and WRITE records to reduce the log volume.
commit 08ba388713 of MDEV-12353
introduced an incorrect assumption, which was documented by
the failing assertion.
After instant ADD COLUMN, we can have a null (and in-place) UPDATE
of NULL to NULL. No data needs to be written for such updates.
For ROW_FORMAT=REDUNDANT, we reserve space for the NULL values,
and to be compatible with existing behaviour, we will zerofill
the unused data bytes when updating to NULL value.
trx_purge_free_segment(): In some cases (observed when running
the test innodb_zip.wl5522_debug_zip), there is no change to
the TRX_UNDO_NEEDS_PURGE field. Add mtr_t::OPT to disable a debug check.
The bogus debug check was introduced in
commit 56f6dab1d0.
We add FIXME comments and some sketch code for the following cases:
It is possible to write considerably less log for ROW_FORMAT=COMPRESSED
pages. For now, we will delete the records one by one.
It is also possible to treat 'deleting the last records' as a special
case that would involve shrinking PAGE_HEAP_TOP. That should reduce
the need of reorganizing pages.
page_mem_free(): When deleting the very last record of the page,
even if the record did not fully utilize all bytes in a
former PAGE_FREE record, truncate the PAGE_HEAP_TOP and reduce
PAGE_GARBAGE by the saved amount.
fsp_page_create(): Always initialize the page. The logic to
avoid initialization was made redundant and should have been removed
in mysql/mysql-server@ce0a1e85e2
(MySQL 5.7.5).
btr_store_big_rec_extern_fields(): Remove the redundant initialization
of FIL_PAGE_PREV and FIL_PAGE_NEXT. An INIT_PAGE record will have
been written already. Only write the ROW_FORMAT=COMPRESSED page payload
from FIL_PAGE_DATA onwards. We were unnecessarily writing from
FIL_PAGE_TYPE onwards, which caused an assertion failure on recovery:
recv_sys_t::alloc(size_t): Assertion 'len <= srv_page_size' failed
when running the following tests:
./mtr --no-reorder innodb_zip.blob,4k innodb_zip.bug56680,4k
trx_rseg_write_wsrep_checkpoint(): Add missing mtr_t::OPT,
and avoid an unnecessary call to mtr_t::memset().
This addresses a debug assertion failure in wsrep_info.plugin.
page_update_max_trx_id(), page_delete_rec_list_end(): Remove conditions
on recv_recovery_is_on(). These conditions should have been removed in
or before commit f8a9f90667
(removing the support for crash-upgrade).
The physical redo log based recovery will not call such high-level code.
page_mem_free(): When deleting the last record of a page,
do not add it to the PAGE_FREE list, but instead truncate the
PAGE_HEAP_TOP. Modify the page header fields by writing fewer
records.
page_cur_delete_rec(): Let page_mem_free() reset the PAGE_LAST_INSERT.
page_header_reset_last_insert(): Issue memset(), not memcpy(), for
the ROW_FORMAT=COMPRESSED page.
Optionally use libpmem for InnoDB redo log writing.
When server is built -DWITH_PMEM=ON InnoDB tries to detect
that redo log is located on persistent memory storage and
uses faster file access method.
When server is built with -DWITH_PMEM=OFF preprocessor is
used to ensure that no slowdown will present due to allocations
and virtual function calls. So, we don't slow down server
in a common case.
mapped_file_t: an map file, unmap file and returns mapped memory buffer
file_io: abstraction around memory mapped files and file descriptors.
Allows writing, reading and flushing to files.
file_io::writes_are_durable(): notable method of a class.
When it returns true writes are flushed immediately.
file_os_io: file descriptor based file access. Depends on a global state
like srv_read_only_mode
file_pmem_io: file access via libpmem
This is a collaboration work with Sergey Vojtovich
This is a fixup for commit 7ae21b18a6.
It turns out that even if we in the future made LSN
count mini-transactions instead of bytes, we will need
both start LSN and end LSN, which must exactly match
between mtr_t::commit() and log_phys_t::apply().
log_rec_t::lsn: Restore the const qualifier.
log_phys_t::append(): Remove the lsn parameter. Both the start
and end LSN must remain unchanged. We can only append log from
the same mini-transaction to a single log record snippet.
If we combined the log from mini-transactions A and B, it could
happen that the FIL_PAGE_LSN of the page is somewhere between
A.start_lsn and B.start_lsn. In that case, also the log of B
would be wrongly skipped.
recv_sys_t::add(): Assert that if the start LSN matches, also
the end LSN will match.
log_t::FORMAT_10_5: physical redo log format tag
log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
If there are multiple records for the same page, also those
may be appended to an existing log_phys_t object if the memory
is available.
In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.
When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.
Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.
Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.
MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE (along with
MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).
mtr_buf_t::empty(): Check if the buffer is empty.
mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.
mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.
page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither are ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.
mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always followed by a subtype byte, because same_page
records must be longer than 1 byte.
trx_undo_page_init(): Combine the writes in WRITE record.
trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 bytes of length and 2 bytes of payload.
flst_write_addr(): Define as a static function. Combine the writes.
flst_zero_both(): Replaces two flst_zero_addr() calls.
flst_init(): Do not inline the function.
fsp_free_seg_inode(): Zerofill the whole inode.
fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
to FIL_NULL when using the physical format.
btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
must have been invoked.
fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.
fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.
recv_recover_page(): Add a fil_space_t* parameter.
After applying log to the a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.
buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.
page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.
trx_undo_header_add_space_for_xid(): Remove.
trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().
btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.
page_rec_find_owner_rec(): Merge with the only callers.
page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).
page_cur_insert_rec_zip(): Combine writes to page header fields.
PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.
PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.
mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.
For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.
mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.
In fil_space_crypt_t::write_page0(), we must log also any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().
Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
NOTE: This may break crash-upgrade from a dataset that was
created with innodb_log_optimize_ddl=ON. Also due to
ROW_FORMAT=COMPRESSED pages, it will be easiest to disallow
crash-upgrade.
It would be more robust to disable the MDEV-12699 logic when
crash-upgrading from old redo log format.
log_optimized_ddl_op: Remove.
fil_space_t::enable_lsn, file_name_t::enable_lsn: Remove.
ddl_tracker_t::optimized_ddl: Remove.
TODO: Remove ddl_tracker
mtr_t::log_write_low(): Replaces mlog_write_initial_log_record_low().
mtr_t::log_file_op(): Replaces fil_op_write_log().
mtr_t::free(): Write MLOG_INIT_FREE_PAGE.
mtr_t::init(): Write MLOG_INIT_FILE_PAGE2.
mtr_t::page_create(): Write record about the partial initialization
of an index page.
mlog_catenate_ulint(), mlog_catenate_string(),
mlog_open(), mlog_close(): Remove.
Pass buf_block_t* to all functions that write redo log.
Specifically, replace the parameters page,page_zip
with buf_block_t* block in page_zip_ functions.
page_mem_alloc_free(), page_dir_set_n_heap(), page_ptr_set_direction():
Merge with the callers.
page_direction_reset(), page_direction_increment(),
page_zip_dir_insert(), page_zip_write_rec_ext(), page_zip_write_rec():
Add the parameter mtr, and write log.
PageBulk::insert(), PageBulk::finish(): Write log for all changes.
page_cur_rec_insert(), page_cur_insert_rec_write_log(),
page_cur_insert_rec_write_log(): Remove.
page_rec_set_next(), page_header_set_field(), page_header_set_ptr():
Remove. Use lower-level operations with or without logging.
page_zip_dir_add_slot(): Move to the same compilation unit with
its only caller, page_cur_insert_rec_zip().
page_cur_insert_rec_zip(): Mark pieces of code that must be skipped
once this task is completed.
btr_defragment_chunk(): Before starting a mini-transaction that
is writing (a lot), invoke log_free_check(). This should allow
the test innodb.innodb_defrag_concurrent to pass with the
mtr default_mysqld.cnf setting of innodb_log_file_size=10M.
MLOG_BUF_MARGIN: Remove.
page_zip_compress_write_log(): Write MLOG_INIT_FILE_PAGE2
and MLOG_ZIP_WRITE_STRING records instead of MLOG_ZIP_PAGE_COMPRESS.
This depends on the changes to buf_page_io_complete() and friends
in the parent commit.
No longer write the following redo log records:
MLOG_COMP_LIST_END_DELETE, MLOG_LIST_END_DELETE,
MLOG_COMP_LIST_START_DELETE, MLOG_LIST_START_DELETE,
MLOG_REC_DELETE,MLOG_COMP_REC_DELETE.
Each individual deleted record will be logged separately
using physical log records.
page_dir_slot_set_n_owned(),
page_zip_rec_set_owned(), page_zip_dir_delete(), page_zip_clear_rec():
Add the parameter mtr, and write redo log.
page_dir_slot_set_rec(): Remove. Replaced with lower-level operations
that write redo log when necessary.
page_rec_set_n_owned(): Replaces rec_set_n_owned_old(),
rec_set_n_owned_new().
rec_set_heap_no(): Replaces rec_set_heap_no_old(), rec_set_heap_no_new().
page_mem_free(), page_dir_split_slot(), page_dir_balance_slot():
Add the parameter mtr.
page_dir_set_n_slots(): Merge with the caller page_dir_split_slot().
page_dir_slot_set_rec(): Merge with the callers page_dir_split_slot()
and page_dir_balance_slot().
page_cur_insert_rec_low(), page_cur_insert_rec_zip():
Suppress the logging of lower-level operations.
page_cur_delete_rec_write_log(): Remove.
page_cur_delete_rec(): Do not tolerate mtr=NULL.
rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_comp():
Replace rec_set_heap_no_old() and rec_set_heap_no_new() with direct
access that does not involve redo logging.
mtr_t::memcpy(): Do allow non-redo-logged writes to uncompressed pages
of ROW_FORMAT=COMPRESSED pages.
buf_page_io_complete(): Evict the uncompressed page of
a ROW_FORMAT=COMPRESSED page after recovery. Because we no longer
write logical log records for deleting index records, but instead
write physical records that may refer directly to the compressed
page frame of a ROW_FORMAT=COMPRESSED page, and because on recovery
we will only apply the changes to the ROW_FORMAT=COMPRESSED page,
the uncompressed page frame can be stale until page_zip_decompress()
is executed.
recv_parse_or_apply_log_rec_body(): After applying MLOG_ZIP_WRITE_STRING,
ensure that the FIL_PAGE_TYPE of the uncompressed page matches the
compressed page, because buf_flush_init_for_writing() assumes that
field to be valid.
mlog_init_t::mark_ibuf_exist(): Invoke page_zip_decompress(), because
the uncompressed page after buf_page_create() is not necessarily
up to date.
buf_LRU_block_remove_hashed(): Bypass a page_zip_validate() check
during redo log apply.
recv_apply_hashed_log_recs(): Invoke mlog_init.mark_ibuf_exist()
also for the last batch, to ensure that page_zip_decompress() will
be called for freshly initialized pages.
page_create(): Create normal B-tree pages. Callers that create
R-tree pages will set FIL_PAGE_TYPE and reset the split
sequence number afterwards.
The creation of ROW_FORMAT=COMPRESSED pages is unaffected;
they will be logged as compressed page images.
page_create_low(): Take const buf_block_t* as a parameter.
Let the callers invoke buf_block_modify_clock_inc().
btr_cur_upd_rec_sys(): Replaces row_upd_rec_sys_fields() and
implements redo logging.
row_upd_rec_sys_fields_in_recovery(): Remove, and merge to the
only remaining caller btr_cur_parse_update_in_place().
btr_cur_del_mark_set_clust_rec_log(),
btr_cur_del_mark_set_sec_rec_log(),
btr_cur_set_deleted_flag_for_ibuf():
Remove, and replace with btr_rec_set_deleted<bool>().
page_zip_rec_set_deleted(): Add the parameter mtr, and write a
MLOG_ZIP_WRITE_STRING record to the log.
Log the low-level operations for ROW_FORMAT=COMPRESSED index pages
using a new record, MLOG_ZIP_WRITE_STRING. We will still use
MLOG_1BYTE,..., MLOG_8BYTES or MLOG_WRITE_STRING for operations
on other than index pages (such as the page allocation bitmap pages).
We will stop writing the record MLOG_ZIP_PAGE_COMPRESS later, after
replacing all MLOG_REC_ and MLOG_COMP_REC_ that update index pages.