log_t::FORMAT_10_5: physical redo log format tag
log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
If there are multiple records for the same page, also those
may be appended to an existing log_phys_t object if the memory
is available.
In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.
When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.
Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.
Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.
MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE (along with
MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).
mtr_buf_t::empty(): Check if the buffer is empty.
mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.
mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.
page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither are ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.
mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always followed by a subtype byte, because same_page
records must be longer than 1 byte.
trx_undo_page_init(): Combine the writes in WRITE record.
trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 bytes of length and 2 bytes of payload.
flst_write_addr(): Define as a static function. Combine the writes.
flst_zero_both(): Replaces two flst_zero_addr() calls.
flst_init(): Do not inline the function.
fsp_free_seg_inode(): Zerofill the whole inode.
fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
to FIL_NULL when using the physical format.
btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
must have been invoked.
fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.
fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.
recv_recover_page(): Add a fil_space_t* parameter.
After applying log to the a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.
buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.
page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.
trx_undo_header_add_space_for_xid(): Remove.
trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().
btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.
page_rec_find_owner_rec(): Merge with the only callers.
page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).
page_cur_insert_rec_zip(): Combine writes to page header fields.
PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.
PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.
mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.
For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.
mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.
In fil_space_crypt_t::write_page0(), we must log also any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().
Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
NOTE: This may break crash-upgrade from a dataset that was
created with innodb_log_optimize_ddl=ON. Also due to
ROW_FORMAT=COMPRESSED pages, it will be easiest to disallow
crash-upgrade.
It would be more robust to disable the MDEV-12699 logic when
crash-upgrading from old redo log format.
log_optimized_ddl_op: Remove.
fil_space_t::enable_lsn, file_name_t::enable_lsn: Remove.
ddl_tracker_t::optimized_ddl: Remove.
TODO: Remove ddl_tracker
mtr_t::log_write_low(): Replaces mlog_write_initial_log_record_low().
mtr_t::log_file_op(): Replaces fil_op_write_log().
mtr_t::free(): Write MLOG_INIT_FREE_PAGE.
mtr_t::init(): Write MLOG_INIT_FILE_PAGE2.
mtr_t::page_create(): Write record about the partial initialization
of an index page.
mlog_catenate_ulint(), mlog_catenate_string(),
mlog_open(), mlog_close(): Remove.
Pass buf_block_t* to all functions that write redo log.
Specifically, replace the parameters page,page_zip
with buf_block_t* block in page_zip_ functions.
page_mem_alloc_free(), page_dir_set_n_heap(), page_ptr_set_direction():
Merge with the callers.
page_direction_reset(), page_direction_increment(),
page_zip_dir_insert(), page_zip_write_rec_ext(), page_zip_write_rec():
Add the parameter mtr, and write log.
PageBulk::insert(), PageBulk::finish(): Write log for all changes.
page_cur_rec_insert(), page_cur_insert_rec_write_log(),
page_cur_insert_rec_write_log(): Remove.
page_rec_set_next(), page_header_set_field(), page_header_set_ptr():
Remove. Use lower-level operations with or without logging.
page_zip_dir_add_slot(): Move to the same compilation unit with
its only caller, page_cur_insert_rec_zip().
page_cur_insert_rec_zip(): Mark pieces of code that must be skipped
once this task is completed.
btr_defragment_chunk(): Before starting a mini-transaction that
is writing (a lot), invoke log_free_check(). This should allow
the test innodb.innodb_defrag_concurrent to pass with the
mtr default_mysqld.cnf setting of innodb_log_file_size=10M.
MLOG_BUF_MARGIN: Remove.
page_zip_compress_write_log(): Write MLOG_INIT_FILE_PAGE2
and MLOG_ZIP_WRITE_STRING records instead of MLOG_ZIP_PAGE_COMPRESS.
This depends on the changes to buf_page_io_complete() and friends
in the parent commit.
No longer write the following redo log records:
MLOG_COMP_LIST_END_DELETE, MLOG_LIST_END_DELETE,
MLOG_COMP_LIST_START_DELETE, MLOG_LIST_START_DELETE,
MLOG_REC_DELETE,MLOG_COMP_REC_DELETE.
Each individual deleted record will be logged separately
using physical log records.
page_dir_slot_set_n_owned(),
page_zip_rec_set_owned(), page_zip_dir_delete(), page_zip_clear_rec():
Add the parameter mtr, and write redo log.
page_dir_slot_set_rec(): Remove. Replaced with lower-level operations
that write redo log when necessary.
page_rec_set_n_owned(): Replaces rec_set_n_owned_old(),
rec_set_n_owned_new().
rec_set_heap_no(): Replaces rec_set_heap_no_old(), rec_set_heap_no_new().
page_mem_free(), page_dir_split_slot(), page_dir_balance_slot():
Add the parameter mtr.
page_dir_set_n_slots(): Merge with the caller page_dir_split_slot().
page_dir_slot_set_rec(): Merge with the callers page_dir_split_slot()
and page_dir_balance_slot().
page_cur_insert_rec_low(), page_cur_insert_rec_zip():
Suppress the logging of lower-level operations.
page_cur_delete_rec_write_log(): Remove.
page_cur_delete_rec(): Do not tolerate mtr=NULL.
rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_comp():
Replace rec_set_heap_no_old() and rec_set_heap_no_new() with direct
access that does not involve redo logging.
mtr_t::memcpy(): Do allow non-redo-logged writes to uncompressed pages
of ROW_FORMAT=COMPRESSED pages.
buf_page_io_complete(): Evict the uncompressed page of
a ROW_FORMAT=COMPRESSED page after recovery. Because we no longer
write logical log records for deleting index records, but instead
write physical records that may refer directly to the compressed
page frame of a ROW_FORMAT=COMPRESSED page, and because on recovery
we will only apply the changes to the ROW_FORMAT=COMPRESSED page,
the uncompressed page frame can be stale until page_zip_decompress()
is executed.
recv_parse_or_apply_log_rec_body(): After applying MLOG_ZIP_WRITE_STRING,
ensure that the FIL_PAGE_TYPE of the uncompressed page matches the
compressed page, because buf_flush_init_for_writing() assumes that
field to be valid.
mlog_init_t::mark_ibuf_exist(): Invoke page_zip_decompress(), because
the uncompressed page after buf_page_create() is not necessarily
up to date.
buf_LRU_block_remove_hashed(): Bypass a page_zip_validate() check
during redo log apply.
recv_apply_hashed_log_recs(): Invoke mlog_init.mark_ibuf_exist()
also for the last batch, to ensure that page_zip_decompress() will
be called for freshly initialized pages.
page_create(): Create normal B-tree pages. Callers that create
R-tree pages will set FIL_PAGE_TYPE and reset the split
sequence number afterwards.
The creation of ROW_FORMAT=COMPRESSED pages is unaffected;
they will be logged as compressed page images.
page_create_low(): Take const buf_block_t* as a parameter.
Let the callers invoke buf_block_modify_clock_inc().
btr_cur_upd_rec_sys(): Replaces row_upd_rec_sys_fields() and
implements redo logging.
row_upd_rec_sys_fields_in_recovery(): Remove, and merge to the
only remaining caller btr_cur_parse_update_in_place().
btr_cur_del_mark_set_clust_rec_log(),
btr_cur_del_mark_set_sec_rec_log(),
btr_cur_set_deleted_flag_for_ibuf():
Remove, and replace with btr_rec_set_deleted<bool>().
page_zip_rec_set_deleted(): Add the parameter mtr, and write a
MLOG_ZIP_WRITE_STRING record to the log.
Log the low-level operations for ROW_FORMAT=COMPRESSED index pages
using a new record, MLOG_ZIP_WRITE_STRING. We will still use
MLOG_1BYTE,..., MLOG_8BYTES or MLOG_WRITE_STRING for operations
on other than index pages (such as the page allocation bitmap pages).
We will stop writing the record MLOG_ZIP_PAGE_COMPRESS later, after
replacing all MLOG_REC_ and MLOG_COMP_REC_ that update index pages.
Log page reorganize as a series of insert operations.
This will make the redo log volume proportional to the page payload size.
btr_page_reorganize_low(): Add template <bool recovery=false>
btr_page_reorganize_block(): Remove the parameter 'bool recovery'
Instead of writing the high-level redo log records
MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED
write log for each individual insert of a record.
page_copy_rec_list_end_to_created_page(): Remove.
This will improve the fill factor of some pages.
Adjust some tests accordingly.
PageBulk::init(), PageBulk::finish(): Avoid setting bogus limits
to PAGE_HEAP_TOP and PAGE_N_DIR_SLOTS. Avoid accessor functions
that would enforce these limits before the correct ones are set
at the end of PageBulk::finish().
page_zip_reorganize(): Restore the page on failure.
In callers, omit now-redundant calls to page_zip_decompress().
btr_page_reorganize_low(): Define in static scope only, and
remove the z_level parameter. Assert that ROW_FORMAT is not COMPRESSED.
btr_page_reorganize_block(), btr_page_reorganize(): Invoke
page_zip_reorganize() for ROW_FORMAT=COMPRESSED.
page_zip_compress_write_log_no_data(): Remove.
We no longer write the MLOG_ZIP_PAGE_COMPRESS_NO_DATA record.
Instead, we will write MLOG_ZIP_PAGE_COMPRESS records.
REMOVED OPTIONS / SYSTEM VARIABLES
* mysqld removed options will not stop the server from starting. They
will be silently accepted, as if --loose flag is set. Removed options
prefixes will not interfere with existing options prefixes. They are
consumed last.
* mysqld removed options will not show in --help.
* mysqld system variables will be removed according to the deprecation
& removal timeline and not function within the client interface once
removed.
DEPRECTATED OPTIONS / SYSTEM VARIABLES
* mysqld deprecated options will issue a warning to the user.
* mysqld deprecated options will still be visible in --help.
* mysqld system variables will be removed in the next GA version that is
past EOL of the version that deprecated the variable.
* deprecated options / variables will not be used anywhere in the server
code the moment they are deprecated. At most, they will act as aliases.
The advantage of this policy is that it ensures upgrades will always
allow the user to start the server, even when upgrading from a very old
version. It is still possible for user applications to break when
upgrading, as system variables set via the client interface will return
errors. However, this will happen after a long time, with lots of
warnings between versions. The expected timeline is ~ 5 years until a
deprecated variable dissapears from the server.
Remove usage of deprecated variable storage_engine. It was deprecated in 5.5 but
it never issued a deprecation warning. Make it issue a warning in 10.5.1.
Replaced with default_storage_engine.
For compatibility with diagnostic software, let us
return a dummy buffer pool identifier 0 and restore
the columns that were initially deleted in
commit 1a6f708ec5:
information_schema.innodb_buffer_page.pool_id
information_schema.innodb_buffer_page_lru.pool_id
information_schema.innodb_buffer_pool_stats.pool_id
information_schema.innodb_cmpmem.buffer_pool_instance
information_schema.innodb_cmpmem_reset.buffer_pool_instance
Thanks to Vladislav Vaintroub for pointing this out.
Our benchmarking efforts indicate that the reasons for splitting the
buf_pool in commit c18084f71b
have mostly gone away, possibly as a result of
mysql/mysql-server@ce6109ebfd
or similar work.
Only in one write-heavy benchmark where the working set size is
ten times the buffer pool size, the buf_pool->mutex would be
less contended with 4 buffer pool instances than with 1 instance,
in buf_page_io_complete(). That contention could be alleviated
further by making more use of std::atomic and by splitting
buf_pool_t::mutex further (MDEV-15053).
We will deprecate and ignore the following parameters:
innodb_buffer_pool_instances
innodb_page_cleaners
There will be only one buffer pool and one page cleaner task.
In a number of INFORMATION_SCHEMA views, columns that indicated
the buffer pool instance will be removed:
information_schema.innodb_buffer_page.pool_id
information_schema.innodb_buffer_page_lru.pool_id
information_schema.innodb_buffer_pool_stats.pool_id
information_schema.innodb_cmpmem.buffer_pool_instance
information_schema.innodb_cmpmem_reset.buffer_pool_instance
During native table rebuild or index creation, InnoDB used to skip
redo logging and write MLOG_INDEX_LOAD records to inform crash recovery
and Mariabackup of the gaps in redo log. This is fragile and prohibits
some optimizations, such as skipping the doublewrite buffer for
newly (re)initialized pages (MDEV-19738).
row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD
records any more. Instead, we write full redo log.
FlushObserver: Remove.
fseg_free_page_func(): Remove the parameter log. Redo logging
cannot be disabled.
fil_space_t::redo_skipped_count: Remove.
We cannot remove buf_block_t::skip_flush_check, because PageBulk
will temporarily generate invalid B-tree pages in the buffer pool.
Let us define page_id_t as a thin wrapper of uint64_t so that
the comparison operators can be simplified. This is a follow-up
to the original commit 14be814380.
The comparison operator for recv_sys.pages.emplace() turned out to be
a busy spot in a recovery benchmark. That data structure was introduced
in MDEV-19586 in commit 177a571e01.
The linear scan of recv_sys_t::blocks() in recv_sys_t::free()
turns out to dominate the execution time in crash recovery.
Let us scan the much shorter buf_pool->chunks lists instead.
Introduced a new wsrep_strict_ddl configuration variable in which
Galera checks storage engine of the effected table. If table is not
InnoDB (only storage engine currently fully supporting Galera
replication) DDL-statement will return error code:
ER_GALERA_REPLICATION_NOT_SUPPORTED
eng "DDL-statement is forbidden as table storage engine does not support Galera replication"
However, when wsrep_replicate_myisam=ON we allow DDL-statements to
MyISAM tables. If effected table is allowed storage engine Galera
will run normal TOI.
This new setting should be for now set globally on all
nodes in a cluster. When this setting is set following DDL-clauses
accessing tables not supporting Galera replication are refused:
* CREATE TABLE (e.g. CREATE TABLE t1(a int) engine=Aria
* ALTER TABLE
* TRUNCATE TABLE
* CREATE VIEW
* CREATE TRIGGER
* CREATE INDEX
* DROP INDEX
* RENAME TABLE
* DROP TABLE
Statements on PROCEDURE, EVENT, FUNCTION are allowed as effected
tables are known only at execution. Furthermore, USER, ROLE, SERVER,
DATABASE statements are also allowed as they do not really have
effected table.
ANding of the range built from inferred NOT NULL conditions and the range
built from other conditions used in WHERE/ON clauses may produce an
IMPOSSIBLE range. The code of MDEV-15777 did not take into account this
possibility.