page_zip_reorganize(): Restore the page on failure.
In callers, omit now-redundant calls to page_zip_decompress().
btr_page_reorganize_low(): Define in static scope only, and
remove the z_level parameter. Assert that ROW_FORMAT is not COMPRESSED.
btr_page_reorganize_block(), btr_page_reorganize(): Invoke
page_zip_reorganize() for ROW_FORMAT=COMPRESSED.
page_zip_compress_write_log_no_data(): Remove.
We no longer write the MLOG_ZIP_PAGE_COMPRESS_NO_DATA record.
Instead, we will write MLOG_ZIP_PAGE_COMPRESS records.
Our benchmarking efforts indicate that the reasons for splitting the
buf_pool in commit c18084f71b
have mostly gone away, possibly as a result of
mysql/mysql-server@ce6109ebfd
or similar work.
Only in one write-heavy benchmark where the working set size is
ten times the buffer pool size, the buf_pool->mutex would be
less contended with 4 buffer pool instances than with 1 instance,
in buf_page_io_complete(). That contention could be alleviated
further by making more use of std::atomic and by splitting
buf_pool_t::mutex further (MDEV-15053).
We will deprecate and ignore the following parameters:
innodb_buffer_pool_instances
innodb_page_cleaners
There will be only one buffer pool and one page cleaner task.
In a number of INFORMATION_SCHEMA views, columns that indicated
the buffer pool instance will be removed:
information_schema.innodb_buffer_page.pool_id
information_schema.innodb_buffer_page_lru.pool_id
information_schema.innodb_buffer_pool_stats.pool_id
information_schema.innodb_cmpmem.buffer_pool_instance
information_schema.innodb_cmpmem_reset.buffer_pool_instance
During native table rebuild or index creation, InnoDB used to skip
redo logging and write MLOG_INDEX_LOAD records to inform crash recovery
and Mariabackup of the gaps in redo log. This is fragile and prohibits
some optimizations, such as skipping the doublewrite buffer for
newly (re)initialized pages (MDEV-19738).
row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD
records any more. Instead, we write full redo log.
FlushObserver: Remove.
fseg_free_page_func(): Remove the parameter log. Redo logging
cannot be disabled.
fil_space_t::redo_skipped_count: Remove.
We cannot remove buf_block_t::skip_flush_check, because PageBulk
will temporarily generate invalid B-tree pages in the buffer pool.
cmake -DWITH_INNODB_EXTRA_DEBUG:BOOL=ON
was broken ever since commit 8777458a6e
(MDEV-6076 Persistent AUTO_INCREMENT for InnoDB).
There is a race condition between page reads that call
page_zip_validate() (while holding clustered index root page S-latch)
and writes that update PAGE_ROOT_AUTO_INC
(with buf_block_t::lock SX-latch, compatible with S-latch).
page_zip_validate_low(): Skip the PAGE_ROOT_AUTO_INC field on
clustered index root pages in order to avoid false positives.
I found that memcpy_aligned was used incorrectly at redo log and decided to put
assertions in aligned functions. And found even more incorrect cases.
Given the amount discovered of bugs, I left assertions to prevent future bugs.
my_assume_aligned(): instead of MY_ASSUME_ALIGNED macro
page_set_autoinc(): Check if any change would take place,
and omit the mtr_t::OPT parameter. This should avoid
unnecessarily redo log writes for ROW_FORMAT=COMPRESSED pages.
In commit befde6e97e
a bogus condition was added, aiming to avoid debug assertion failures
during ALTER TABLE...IMPORT TABLESPACE.
page_cur_delete_rec(): Remove the bogus condition, and replace the
use of buf_block_modify_clock_inc() with a lower-level operation
that skips the debug checks that would fail during IMPORT.
Pass buf_block_t* to more functions that write redo log.
page_zip_write_node_ptr(), page_zip_write_blob_ptr(),
page_zip_compress_write_log_no_data():
Take buf_block_t* as parameter, and do not tolerate mtr=NULL.
page_zip_compress(): Do not tolerate mtr=NULL.
page_zip_dir_insert(): Take page_cur_t* as parameter.
mlog_write_initial_log_record(): Remove. This function was unused.
RecIterator::remove(): Remove the redundant page_zip parameter.
PageConverter::m_page_zip_ptr: Remove.
page_cur_delete_rec(): Do not tolerate mtr=NULL.
page_delete_rec(): Merge with the only caller, RecIterator::remove().
RecIterator::m_mtr: New data member: a dummy mini-transaction.
offset_t: this is a type which represents one record offset.
It's unsigned short int.
a lot of functions: replace ulint with offset_t
btr_pcur_restore_position_func(),
page_validate(),
row_ins_scan_sec_index_for_duplicate(),
row_upd_clust_rec_by_insert_inherit_func(),
row_vers_impl_x_locked_low(),
trx_undo_prev_version_build():
allocate record offsets on the stack instead of waiting for rec_get_offsets()
to allocate it from mem_heap_t. So, reducing memory allocations.
RECORD_OFFSET, INDEX_OFFSET:
now it's less convenient to store pointers in offset_t*
array. One pointer occupies now several offset_t. And those constant are start
indexes into array to places where to store pointer values
REC_OFFS_HEADER_SIZE: adjusted for the new reality
REC_OFFS_NORMAL_SIZE:
increase size from 100 to 300 which means less heap allocations.
And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which
is smaller than previous 800 bytes.
REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality
rem0rec.h, rem0rec.ic, rem0rec.cc:
various arguments, return values and local variables types were changed to
fix numerous integer conversions issues.
enum field_type_t:
offset types concept was introduces which replaces old offset flags stuff.
Like in earlier version, 2 upper bits are used to store offset type.
And this enum represents those types.
REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed
get_type(), set_type(), get_value(), combine():
these are convenience functions to work with offsets and it's types
rec_offs_base()[0]:
still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL
rec_offs_base()[i]:
these have type offset_t now. Two upper bits contains type.
ut_rnd_interval(): Remove the first parameter, which was mostly
passed as 0. Implement as a simple wrapper around ut_rnd_gen().
Trivially return 0 if the size of the interval is smaller than 2.
ut_rnd_ulint_counter, ut_rnd_gen_next_ulint(), ut_rnd_gen_ulint(): Remove.
ut_rnd_set_seed(): Unused function; remove.
ut_rnd_gen(): Renamed from page_cur_lcg_prng().
ut_rnd_current: The internal state of ut_rnd_gen().
page_cur_open_on_rnd_user_rec(): Replace linear search with
page_rec_get_nth().
Before commit 90c52e5291 introduced
aligned_malloc(), InnoDB always used a pattern of over-allocating
memory and invoking ut_align() to guarantee the desired alignment.
It is cleaner to invoke aligned_malloc() and aligned_free() directly.
ut_align(): Remove. In assertions, ut_align_down() can be used instead.
mtr_t::write(): Replaces mlog_write_ulint(), mlog_write_ull().
Optimize away writes if the page contents does not change,
except when a dummy write has been explicitly requested.
Because the member function template takes a block descriptor as a
parameter, it is possible to introduce better consistency checks.
Due to this, the code for handling file-based lists, undo logs
and user transactions was refactored to pass around buf_block_t.
page_create_write_log(), mlog_write_initial_log_record():
Merge to the only caller, and use
mlog_write_initial_log_record_low() for writing the log record.
page_zip_compress(), page_zip_decompress_low(), page_zip_reorganize():
Make use of memcpy_aligned() and memset_aligned(), so that some
operations can be translated more efficiently.
No difference was observed for AMD64 on GCC 9.2.1. But, such hints could
enable more efficient code on less common instruction set architectures,
such as 32-bit ARM, POWER or MIPS.
Except for fil_name_process(), which invokes os_normalize_path(),
the redo log record parser will not modify the redo log records.
Add const qualifiers accordingly.
btr_page_get_split_rec_to_left(): Assert that in the leftmost leaf page,
the metadata record exists if and only if index->is_instant().
page_validate(): Correct the wording of a message.
rec_init_offsets(): Assert that whenever a record is in "instant ALTER"
format, index->is_instant() must hold.
In MDEV-11369 (instant ADD COLUMN) in MariaDB Server 10.3,
we introduced the hidden metadata record that must be the
first record in the clustered index if and only if
index->is_instant() holds.
To catch MDEV-19783, in
commit ed0793e096 and
commit 99dc40d6ac
we added some assertions to find cases where
the metadata record is missing while it should not be, or a
record exists when it should not. Those assertions were invalid
when traversing the PAGE_FREE list. That list can contain anything;
we must only be able to determine the successor and the size of
each garbage record in it.
page_validate(), page_simple_validate_old(), page_simple_validate_new():
Do not invoke page_rec_get_next_const() for traversing the PAGE_FREE
list, but instead use a lower-level accessor that does not attempt to
validate the REC_INFO_MIN_REC_FLAG.
page_copy_rec_list_end_no_locks(),
page_copy_rec_list_start(), page_delete_rec_list_start():
Add assertions.
btr_page_get_split_rec_to_left(): Remove a redundant return value,
and make the output parameter the return value.
btr_page_get_split_rec_to_right(), btr_page_split_and_insert(): Clean up.
btr_cur_pessimistic_delete(): code changed in a way that allows
to put more REC_INFO_MIN_REC_FLAG assertions inside btr_set_min_rec_mark().
Without that change tests innodb.innodb-table-online,
innodb.temp_table_savepoint and innodb_zip.prefix_index_liftedlimit fail.
Removed basically duplicated page_zip_validate() calls
which fails because of temporary(!) invariant violation.
That fixed innodb_zip.wl5522_debug_zip and
innodb_zip.prefix_index_liftedlimit
The bug affects MariaDB Server 10.3 or later, but it makes sense
to improve CHECK TABLE in earlier versions already.
page_validate(): Check REC_INFO_MIN_REC_FLAG in the records.
This allows CHECK TABLE to catch more bugs.
Until now, InnoDB inefficiently compared the aligned fields
FIL_PAGE_PREV, FIL_PAGE_NEXT to the byte-order-agnostic value FIL_NULL.
This is a backport of 32170f8c6d
from MariaDB Server 10.3.
The crash scenario is as follows:
(1) A non-empty table exists.
(2) MDEV-15562 instant ADD/DROP/reorder has been invoked.
(3) Some purgeable undo log exists for the table.
(4) The table becomes empty, containing not even any delete-marked records,
only containing the hidden metadata record that was added in (2).
(5) An instant ADD/DROP/reorder column is executed, and the table
is emptied and the (2) metadata removed.
(6) Purge processes an undo log record from (3), which will refer to
a non-existent clustered index field, because the metadata that
was created in (2) was remoeved in (5).
We fix this by adjusting step (5) so that we will never remove the
MDEV-15562-style metadata record. Removing the MDEV-11369 metadata
record (instant ADD COLUMN to the end of the table) is completely
fine at any time when the table becomes empty, because
dict_index_t::n_fields will remain unchanged.
innobase_instant_try(): Never remove the MDEV-15562 metadata record.
page_cur_delete_rec(): Do not reset FIL_PAGE_TYPE when the
MDEV-15562 metadata record is being removed as part of
btr_cur_pessimistic_update() invoked by innobase_instant_try().
In MariaDB 10.4.0, commit 09af00cbde
removed the crash-upgrade logic for the MariaDB 10.2
innodb_safe_truncate=OFF TRUNCATE TABLE (which was the only option
between MariaDB 10.2.2 and 10.2.18), but failed to adjust some
comments and code.
buf_page_io_complete(): Remove a bogus comment about TRUNCATE.
dict_recreate_index_tree(): Unused function; remove.
fil_space_t::stop_new_ops: Clarify the comment.
fil_space_acquire_low(): Remove a bogus comment about TRUNCATE.
fil_check_pending_ops(), fil_check_pending_io(): Adjust a warning message.
This code is only invoked as part of DISCARD TABLESPACE or DROP TABLE.
DROP TABLE is internally used as part of ALTER TABLE, OPTIMIZE TABLE,
or TRUNCATE TABLE.
RemoteDatafile::create_link_file(): Clarify a comment.
ibuf_delete_for_discarded_space(): Clarify the function comment.
dict_table_x_lock_indexes(), dict_table_x_unlock_indexes():
Merge with the only remaining caller, row_quiesce_set_state().
page_create_zip(): Remove a bogus comment about TRUNCATE.
page_mem_free(): Define in the same file with the only caller
page_cur_delete_rec().
page_dir_slot_set_rec(): Add const qualifier to a parameter.
page_dir_delete_slot(): Merge with the only caller page_dir_balance_slot().
page_dir_add_slot(): Merge with the only caller page_dir_split_slot().
page_dir_split_slot(), page_dir_balance_slot(): Define in the
same compilation unit with the callers, and simplify the code.