Remove unnecessary buf_pool_t:: qualifiers. In comments,
replace buf_pool::mutex with buf_pool.mutex.
Remove an outdated comment about a planned buffer pool resizing feature.
It is already implemented in MariaDB 10.2.2 (and MySQL 5.7.9).
fil_delete_tablespace(): Remove the unused parameter drop_ahi,
and add the parameter if_exists=false. We want to suppress
error messages if we know that the tablespace has been discarded.
dict_table_rename_in_cache(): Pass the new parameter to
fil_delete_tablespace(), that is, do not complain about
missing tablespace if the tablespace has been discarded.
row_make_new_pathname(): Declare as static.
row_drop_table_for_mysql(): Tolerate !table->data_dir_path
when the tablespace has been discarded.
row_rename_table_for_mysql(): Skip part of the RENAME TABLE
when fil_space_get_first_path() returns NULL.
Thanks to MDEV-15058, there is only one InnoDB buffer pool.
Allocating buf_pool statically removes one level of pointer indirection
and makes code more readable, and removes the awkward initialization of
some buf_pool members.
While doing this, we will also declare some buf_pool_t data members
private and replace some functions with member functions. This
mostly affects buffer pool resizing.
This is not aiming to be a complete rewrite of buf_pool_t to
a proper class. Most of the buffer pool interface, such as
buf_page_get_gen(), will remain in the C programming style
for now.
buf_pool_t::withdrawing: Replaces buf_pool_withdrawing.
buf_pool_t::withdraw_clock_: Replaces buf_withdraw_clock.
buf_pool_t::create(): Replaces buf_pool_init().
buf_pool_t::close(): Replaces buf_pool_free().
buf_pool_t::will_be_withdrawn(): Replaces buf_block_will_be_withdrawn(),
buf_frame_will_be_withdrawn().
buf_pool_t::clear_hash_index(): Replaces buf_pool_clear_hash_index().
buf_pool_t::get_n_pages(): Replaces buf_pool_get_n_pages().
buf_pool_t::validate(): Replaces buf_validate().
buf_pool_t::print(): Replaces buf_print().
buf_pool_t::block_from_ahi(): Replaces buf_block_from_ahi().
buf_pool_t::is_block_field(): Replaces buf_pointer_is_block_field().
buf_pool_t::is_block_mutex(): Replaces buf_pool_is_block_mutex().
buf_pool_t::is_block_lock(): Replaces buf_pool_is_block_lock().
buf_pool_t::is_obsolete(): Replaces buf_pool_is_obsolete().
buf_pool_t::io_buf: Make default-constructible.
buf_pool_t::io_buf::create(): Delayed 'constructor'
buf_pool_t::io_buf::close(): Early 'destructor'
HazardPointer: Make default-constructible. Define all member functions
inline, also for derived classes.
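The pattern of a statically allocated object with a delayed 'constructor' and an early 'destructor' can be sketched as follows. This is only an illustration: pool_sketch_t, pool_sketch, sketch_page_size and is_initialised() are hypothetical names, and the real buf_pool_t has far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical page size used only by this sketch.
constexpr std::size_t sketch_page_size = 4096;

// Hypothetical stand-in for buf_pool_t; not the InnoDB definition.
class pool_sketch_t {
public:
  // Delayed 'constructor': the object is default-constructed statically,
  // and the memory is only acquired here.
  bool create(std::size_t n_pages) {
    assert(!frames_);
    frames_ = static_cast<unsigned char*>(
        std::calloc(n_pages, sketch_page_size));
    if (!frames_) return false;
    n_pages_ = n_pages;
    return true;
  }

  // Early 'destructor': releases the memory before static destruction.
  void close() {
    std::free(frames_);
    frames_ = nullptr;
    n_pages_ = 0;
  }

  std::size_t get_n_pages() const { return n_pages_; }
  bool is_initialised() const { return frames_ != nullptr; }

private:
  unsigned char* frames_ = nullptr;
  std::size_t n_pages_ = 0;
};

// One statically allocated instance: callers write
// pool_sketch.get_n_pages() instead of dereferencing a pointer.
pool_sketch_t pool_sketch;
```

With this layout, a caller accesses pool_sketch.get_n_pages() directly, which is the readability gain that the commit message attributes to removing one level of pointer indirection.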
- This issue is caused by MDEV-19176 (bba59abb03).
- The problem is a miscalculation of the available memory during
recovery if innodb_buffer_pool_instances > 1.
- Ignore the buffer pool instance while calculating available_memory.
- Remove the recv_n_pool_free_frames variable and use buf_pool_get_n_pages()
instead.
log_t::has_encryption_key_rotation(): checks whether
key rotation is supported.
In a subsequent redo log format version, this key rotation
may be broken again.
Write the log header just once, when the file is created, instead of
writing it on every log file wrap-around.
log_t::file::write_header_durable(): Writes the log header.
log_write_buf(): No longer writes the log header.
Starting with commit 1a6f708ec5
the function buf_pool_get_dirty_pages_count() is only used
in a debug check. It was dead code for non-debug builds.
buf_flush_dirty_pages(): Perform the debug check inline,
and replace the assertion
ut_ad(first || buf_pool_get_dirty_pages_count(id) == 0);
with another one that is executed while holding the mutexes:
ut_ad(id != bpage->id.space());
All tablespace metadata is buffered in fil_system. There is an LRU
mechanism, but that only controls the opening and closing of
fil_node_t::handle.
It is much more efficient and less error-prone to access data file names
by looking up the fil_space_t object rather than by essentially joining
each row with an access to SYS_DATAFILES via the InnoDB internal SQL parser.
dict_get_first_path(): Declare static. The function may only be needed
when loading or updating the data dictionary. Also, change a condition
in order to avoid a bogus GCC 10 -Wstringop-overflow warning for
mem_strdupl() about len==ULINT_UNDEFINED.
i_s_sys_tablespaces_fill_table(): Do not access other InnoDB internal
dictionary tables than SYS_TABLESPACES.
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.
Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.
GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
about x|=y even when x and y are compatible types that are narrower
than int. Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.
In GCC 6 and later, the warning for assigning wider values to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
cases.
The bitwise negation operator appears to promote short integers
to a wider type, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(~1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.
This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
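The three kinds of rewrite described above might look as follows. This is a minimal sketch, not code from the patch: flags_t and the helper function names are hypothetical, and byte is assumed to be a typedef for unsigned char.

```cpp
#include <cassert>

typedef unsigned char byte;

// Hypothetical structure with a 3-bit bitfield.
struct flags_t {
  unsigned state : 3;
};

// Instead of x |= y on types narrower than int, spell out the truncation.
inline byte set_bits(byte x, byte y) {
  x = static_cast<byte>(x | y);
  return x;
}

// ~1 has type int. The constructor-style cast byte(~1) truncates the
// constant without the static_cast that MSVC is reported to reject.
inline byte clear_lowest_bit(byte x) {
  return static_cast<byte>(x & byte(~1));
}

// Masking with the exact bitmask of the bitfield (3 bits, hence 7)
// documents that only the low bits are kept.
inline void set_state(flags_t& f, unsigned value) {
  f.state = value & 7;
}
```

For example, set_state(f, 13) stores 5, because only the three low-order bits of 13 fit in the bitfield.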
The extension of the record comparison functions for SPATIAL INDEX in
mysql/mysql-server@b66ad511b6
was suboptimal for multiple reasons:
Some functions used unnecessary temporary variables of the int type,
instead of the more appropriate size_t, causing type mismatch.
Many functions unnecessarily required rec_get_offsets() to be
computed, or a parameter for length, although the size of the
minimum bounding rectangle (MBR) is hard-coded as
SPDIMS * 2 * sizeof(double), or 32 bytes.
In InnoDB SPATIAL INDEX records, there always is a 32-byte key
followed by either a 4-byte child page number or the PRIMARY KEY value.
The length parameters were not properly validated.
The function cmp_geometry_field() was making an incorrect attempt
at checking that the lengths are at least sizeof(double) (8 bytes),
even though the function is accessing up to 32 bytes in both MBR.
Functions that are called from only one compilation unit are defined
in another compilation unit, making the code harder to follow and
potentially slower to execute.
cmp_dtuple_rec_with_gis(): FIXME: Correct the debug assertion
and possibly the function TABLE_SHARE::init_from_binary_frm_image()
or related code, which causes an unexpected length of
DATA_MBR_LEN + 2 bytes to be passed to this function.
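Because the MBR size is a compile-time constant, a comparison helper needs neither rec_get_offsets() nor a length parameter. The following sketch illustrates the idea only. mbr_contains_sketch() is a hypothetical function, and it assumes that each dimension is stored as a (min, max) pair of doubles in native byte order, which may differ from the actual storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t SPDIMS = 2;
constexpr std::size_t DATA_MBR_LEN = SPDIMS * 2 * sizeof(double);
static_assert(DATA_MBR_LEN == 32, "an MBR key is expected to be 32 bytes");

// Hypothetical helper: returns true if MBR a contains MBR b.
// Both pointers must refer to exactly DATA_MBR_LEN bytes; no length
// parameter is needed because the size is fixed.
inline bool mbr_contains_sketch(const unsigned char* a,
                                const unsigned char* b) {
  double da[SPDIMS * 2], db[SPDIMS * 2];
  // memcpy avoids unaligned access and type punning.
  std::memcpy(da, a, DATA_MBR_LEN);
  std::memcpy(db, b, DATA_MBR_LEN);
  for (std::size_t d = 0; d < SPDIMS; d++) {
    // Assumed layout: da[2 * d] is the minimum, da[2 * d + 1] the maximum.
    if (db[2 * d] < da[2 * d] || db[2 * d + 1] > da[2 * d + 1])
      return false;
  }
  return true;
}
```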
buf_flush_freed_page(): Reformat in the common style, and
simplify some code. Prefer to request all information from
smaller data structures (buf_page_t) than from fil_space_t
or the global variable srv_immediate_scrub_data_uncompressed.
SysTablespace::open_or_create(): Assert that the temporary
tablespace will not be created in page_compressed format, so that
buf_flush_freed_page() can avoid checking that on every call.
IORequest: Remove duplicated constructors, and do not explicitly
declare a default constructor.
The following parameters are deprecated:
innodb-background-scrub-data-uncompressed
innodb-background-scrub-data-compressed
innodb-background-scrub-data-interval
innodb-background-scrub-data-check-interval
Removed the scrubbing code completely (btr0scrub.h, btr0scrub.cc).
Removed the information_schema.innodb_tablespaces_scrubbing table.
Removed the scrubbing logic from fil_crypt_thread().
When an InnoDB data file page is freed, its contents become garbage,
and any storage allocated in the data file is wasted. During flushing,
InnoDB initializes the page with zeros if scrubbing is enabled. If the
tablespace is compressed, InnoDB should punch a hole; otherwise, the
flushing of the freed page is skipped.
buf_page_t:
- Replaced the variables file_page_was_freed and init_on_flush in buf_page_t
with a status enum variable.
- Changed all debug assertions on file_page_was_freed to DBUG_ASSERT
on buf_page_t::status.
Removed buf_page_set_file_page_was_freed(),
buf_page_reset_file_page_was_freed().
buf_page_free(): Newly added function that takes an X-lock on the page
before marking the status as FREED, so that the InnoDB flush handler can
avoid a concurrent flush of the freed page. Also, while flushing the page,
InnoDB makes sure that the redo log that frees the page has also been written
to disk. Currently, this function only marks the page as FREED if
it is in the buffer pool.
buf_flush_freed_page(): Newly added function that writes zeros
asynchronously if innodb_immediate_scrub_data_uncompressed is enabled,
and punches a hole in the file synchronously if page_compressed is enabled.
Resets the io_fix to NORMAL. Releases the block from the flush list and
the associated mutex before writing zeros or punching a hole in the file.
buf_flush_page(): Removed the unnecessary usage of the temporary
variable "flush".
fil_io(): Introduce a new parameter called punch_hole. It allows fil_io()
to punch a hole in the file at the given offset.
buf_page_create(): Let the callers assign buf_page_t::status.
Every caller should eventually invoke mtr_t::init().
fsp_page_create(): Remove the unused mtr_t parameter.
In all other callers of buf_page_create() except fsp_page_create(),
before invoking mtr_t::init(), invoke
mtr_t::sx_latch_at_savepoint() or mtr_t::x_latch_at_savepoint().
mtr_t::init(): Initialize buf_page_t::status also for the temporary
tablespace (when redo logging is disabled), to avoid assertion failures.
create_log_file(): Delete all old redo log files at the point where they
were deleted (after the crash injection point innodb_log_abort_6)
before commit 9ef2d29ff4
deprecated and ignored the setting innodb_log_files_in_group.
log_crypt_101_read_checkpoint(), log_crypt_101_read_block():
Declare as ATTRIBUTE_COLD. These are only used when
checking that a MariaDB 10.1 encrypted redo log is clean.
log_block_calc_checksum_format_0(): Define in the only
compilation unit where it is needed. This is only used
when reading the checkpoint information from redo logs
before MariaDB 10.2.2.
crypt_info_t: Declare the byte arrays directly with alignas().
log_crypt(): Use memcpy_aligned instead of reinterpret_cast
on integers.
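The combination of alignas() on a byte array and a memcpy-based read can be sketched as follows. The struct and field names are hypothetical, and plain std::memcpy stands in for memcpy_aligned, whose exact interface is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for crypt_info_t: the byte array itself carries
// the alignment, so no wrapper union or integer array is needed.
struct crypt_info_sketch_t {
  alignas(4) unsigned char nonce[16];
};

// Read the first 32-bit word without reinterpret_cast. Copying through
// memcpy is well-defined regardless of aliasing rules, and compilers
// typically turn it into a single load when the alignment is known.
inline std::uint32_t read_first_word(const crypt_info_sketch_t& info) {
  std::uint32_t w;
  std::memcpy(&w, info.nonce, sizeof w);
  return w;
}
```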
Some fields were protected by log_sys.mutex, which adds quite some
overhead for readers. Some readers were submitting dirty reads.
log_t::lsn: Declare private and atomic. Add wrappers get_lsn()
and set_lsn() that will use relaxed memory access. Many accesses
to log_sys.lsn are still protected by log_sys.mutex; we avoid the
mutex for some readers.
log_t::flushed_to_disk_lsn: Declare private and atomic, and move
to the same cache line with log_t::lsn.
log_t::buf_free: Declare as size_t, and move to the same cache line
with log_t::lsn.
log_t::check_flush_or_checkpoint_: Declare private and atomic,
and move to the same cache line with log_t::lsn.
log_get_lsn(): Define as an alias of log_sys.get_lsn().
log_get_lsn_nowait(), log_peek_lsn(): Remove.
log_get_flush_lsn(): Define as an alias of log_sys.get_flush_lsn().
log_t::initiate_write(): Replaces log_buffer_sync_in_background().
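A minimal sketch of the layout described above, assuming a 64-byte cache line: the private atomic fields sit next to each other, and get_lsn() and set_lsn() use relaxed memory access. The names log_sketch_t, log_sketch and log_get_lsn_sketch() are hypothetical, and the real log_t contains many more members.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint64_t lsn_t;

// Hypothetical stand-in for log_t; the 64-byte cache line is an assumption.
class log_sketch_t {
  // Fields that are accessed together share one cache line.
  alignas(64) std::atomic<lsn_t> lsn_{0};
  std::atomic<lsn_t> flushed_to_disk_lsn_{0};
  std::size_t buf_free_ = 0;
  std::atomic<bool> check_flush_or_checkpoint_{false};

public:
  // Relaxed access: readers get some recent value without taking a mutex.
  lsn_t get_lsn() const { return lsn_.load(std::memory_order_relaxed); }
  void set_lsn(lsn_t lsn) { lsn_.store(lsn, std::memory_order_relaxed); }
};

log_sketch_t log_sketch;

// Free function kept as a thin alias of the member function.
inline lsn_t log_get_lsn_sketch() { return log_sketch.get_lsn(); }
```

A relaxed load only guarantees that the value itself is read atomically. Writers that must keep several fields consistent with each other would still hold the mutex, as the commit message notes.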
O_DSYNC is faster than O_SYNC because it syncs as little as needed
(for example, no timestamp changes).
This change is similar to the change from fsync() to fdatasync() in MDEV-21382.
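On a POSIX system, the difference comes down to one flag passed to open(). The helper below is a hypothetical illustration, not code from the patch, and it only builds on platforms whose <fcntl.h> defines both flags.

```cpp
#include <cassert>
#include <fcntl.h>

// Hypothetical helper: choose the open() flags for a log file.
// O_DSYNC makes each write wait for the data (and the metadata needed to
// read it back) to reach storage, while O_SYNC also waits for other
// metadata such as timestamps.
inline int log_file_open_flags_sketch(bool data_sync_only) {
  int flags = O_WRONLY | O_CREAT;
  flags |= data_sync_only ? O_DSYNC : O_SYNC;
  return flags;
}
```

The result would be passed as the second argument of open(), together with a file mode because O_CREAT is set.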
The ut_crc32() function uses a hard-coded initial CRC-32C value of 0.
Replace it with ut_crc32_low(), which allows the initial checksum value
to be specified, and provide an inlined compatibility wrapper ut_crc32().
Also, remove non-inlined wrapper functions on ARMv8 and POWER8,
and remove dead code (the generic implementation) on POWER8.
Note: The original AMD64 instruction set architecture in 2003 only
included SSE2. The CRC-32C instructions are part of the SSE4.2
instruction set extension for IA-32 and AMD64, with first processors
released in November 2007 (using the AMD Barcelona microarchitecture)
and November 2008 (Intel Nehalem microarchitecture). It might be safe
to assume that SSE4.2 is available on all currently used AMD64 based
systems, but we are not taking that step yet.
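The split between a function that takes the initial checksum and an inlined wrapper that passes 0 can be sketched with a portable bitwise CRC-32C. The function names are hypothetical, and the convention of inverting the value on entry and exit is an assumption of this sketch. The actual implementations use hardware instructions where available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// 'crc' is the checksum of the data processed so far, or 0 at the start.
inline std::uint32_t crc32c_low_sketch(std::uint32_t crc, const void* buf,
                                       std::size_t len) {
  const unsigned char* p = static_cast<const unsigned char*>(buf);
  crc = ~crc;  // undo the final inversion of the previous call
  while (len--) {
    crc ^= *p++;
    for (int i = 0; i < 8; i++)
      crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78U : 0U);
  }
  return ~crc;
}

// Compatibility wrapper with the hard-coded initial value of 0.
inline std::uint32_t crc32c_sketch(const void* buf, std::size_t len) {
  return crc32c_low_sketch(0, buf, len);
}
```

Passing the result of one call as the initial value of the next lets a caller checksum a buffer in pieces: checksumming "1234" and then continuing with "56789" yields the same value as checksumming "123456789" in one call.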
The configuration parameter innodb_scrub_log never really worked, as
reported in MDEV-13019 and MDEV-18370.
Because MDEV-14425 is changing the redo log format, the innodb_scrub_log
feature would have to be adjusted for it. Due to the known problems,
it is easier to remove the feature for now, and to ignore and deprecate
the parameters.
If old log contents should be kept secret, then enabling innodb_encrypt_log
or setting a smaller innodb_log_file_size could help.
Compute MONITOR_LSN_CHECKPOINT_AGE on demand in
srv_mon_process_existing_counter().
This allows us to remove the overhead of MONITOR_SET
calls for the counter.
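Computing a counter on demand means that the hot path only maintains the underlying values, and the reader derives the metric when it is queried. A minimal sketch follows, assuming that the checkpoint age is the difference between the current LSN and the last checkpoint LSN. The struct and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the relevant parts of the log state.
struct log_state_sketch {
  std::atomic<std::uint64_t> lsn{0};
  std::atomic<std::uint64_t> last_checkpoint_lsn{0};
};

// Called only when the monitor counter is read; nothing is updated on
// the write path.
inline std::uint64_t checkpoint_age_on_demand(const log_state_sketch& log) {
  // Read the checkpoint first, so that a concurrently advancing LSN
  // cannot make the difference negative.
  const std::uint64_t checkpoint =
      log.last_checkpoint_lsn.load(std::memory_order_relaxed);
  const std::uint64_t lsn = log.lsn.load(std::memory_order_relaxed);
  return lsn >= checkpoint ? lsn - checkpoint : 0;
}
```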
The function log_header_read() was only used during server startup,
and it will mostly be used only for reading checkpoint information
from pre-MDEV-14425 format redo log files.
Let us replace the function with more direct calls, so that
it is clearer what is going on. It is not strictly necessary to
hold any mutex during this operation, and because there will be
only a limited number of operations during early server startup,
it is not necessary to increment any I/O counters.
Simplify the logging of ALTER TABLE operations, by making use of the
TRX_UNDO_RENAME_TABLE undo log record that was introduced in
commit 0bc36758ba.
commit_try_rebuild(): Invoke row_rename_table_for_mysql() and
actually rename the files before committing the transaction.
fil_mtr_rename_log(), commit_cache_rebuild(),
log_append_on_checkpoint(), row_merge_rename_tables_dict(): Remove.
mtr_buf_copy_t, log_t::append_on_checkpoint: Remove.
row_rename_table_for_mysql(): If !use_fk, ignore missing foreign
keys. Remove a call to dict_table_rename_in_cache(), because
trx_rollback_to_savepoint() should invoke the function if needed.
For undo log truncation, commit 055a3334ad
repurposed the MLOG_FILE_CREATE2 record with a nonzero page size
to indicate that an undo tablespace will be shrunk in size.
In commit 7ae21b18a6 the
MLOG_FILE_CREATE2 record was replaced by a FILE_CREATE record.
Now that the redo log encoding was changed, there is no actual need
to write a file name in the log record; it suffices to write the
page identifier of the first page that is not part of the file.
This TRIM_PAGES record could allow us to shrink any data files in the
future. For now, it will be limited to undo tablespaces.
mtr_t::log_file_op(): Remove the parameter first_page_no, because
it would always be 0 for file operations.
mtr_t::trim_pages(): Replaces fil_truncate_log().
mtr_t::log_write(): Avoid same_page encoding if !bpage&&!m_last.
fil_op_replay_rename(): Remove the constant parameter first_page_no=0.
Introduce special synchronization primitive group_commit_lock
for more efficient synchronization of redo log writing and flushing.
The goal is to reduce CPU consumption on log_write_up_to, to reduce
the spurious wakeups, and improve the throughput in write-intensive
benchmarks.
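The idea of a group commit lock can be illustrated with a much simplified sketch: a caller that needs the log written up to some value either finds that another thread has already done the work (EXPIRED) or becomes the leader (ACQUIRED) and does the work on behalf of everyone waiting. The class below is a hypothetical illustration built on a mutex and a condition variable. It wakes all waiters on release, which is exactly the kind of spurious wakeup that a tuned implementation would try to avoid.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified sketch; not the actual group_commit_lock.
class group_commit_sketch {
public:
  enum result { ACQUIRED, EXPIRED };

  // Wait until either the work up to 'num' has been completed by someone
  // else (EXPIRED), or the caller becomes the leader (ACQUIRED).
  result acquire(std::uint64_t num) {
    std::unique_lock<std::mutex> lk(mutex_);
    for (;;) {
      if (num <= done_) return EXPIRED;
      if (!busy_) {
        busy_ = true;
        return ACQUIRED;
      }
      cond_.wait(lk);
    }
  }

  // Called by the leader after completing the work up to 'num'.
  void release(std::uint64_t num) {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      assert(busy_);
      if (num > done_) done_ = num;
      busy_ = false;
    }
    cond_.notify_all();
  }

  std::uint64_t value() {
    std::lock_guard<std::mutex> lk(mutex_);
    return done_;
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::uint64_t done_ = 0;  // highest value completed so far
  bool busy_ = false;       // true while a leader is working
};
```

A leader may release with a higher value than it requested, for example if it wrote everything that was pending at the time. Waiters whose requests are covered by that value then return EXPIRED without doing any work.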
Inserting a record into an index page involves updating multiple
fields in the page header as well as updating the next-record links
and potentially updating fields related to the sparse page directory.
Let us cover the insert operations by higher-level log records, to avoid
'redundant' logging about the writes.
The code for applying the high-level log records will check the
consistency of the page thoroughly, to avoid crashes during recovery.
We will refuse to replay the inserts if any inconsistency is detected.
With innodb_force_recovery=1, recovery will continue, but the affected
pages may be more inconsistent if some changes were omitted.
mrec_ext_t: Introduce the EXTENDED record subtypes
INSERT_HEAP_REDUNDANT, INSERT_REUSE_REDUNDANT,
INSERT_HEAP_DYNAMIC, INSERT_REUSE_DYNAMIC.
The record will explicitly identify the page type and whether
the space will be allocated from PAGE_HEAP_TOP or reused from
the PAGE_FREE list. It will also tell how many bytes to copy
from the preceding record header and payload, and how to
initialize the rest of the record header and payload.
mtr_t::page_insert(): Write the high-level log records.
log_phys_t::apply(): Parse the high-level log records.
page_apply_insert_redundant(), page_apply_insert_dynamic():
Apply the high-level log records.
page_dir_split_slot(): Introduce a variant that does not write log
nor deal with ROW_FORMAT=COMPRESSED pages.
page_mem_alloc_heap(): Remove the mtr_t parameter.
page_cur_insert_rec_low(): Write log only via mtr_t::page_insert().
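The idea of a high-level insert record that copies a prefix from the preceding record, and of an apply step that refuses inconsistent input, can be sketched as follows. This is a simplified model over byte strings with hypothetical type and function names. The real records also carry header fields and operate on index pages.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Subtype names as listed in the commit message; the enum type itself
// is a hypothetical stand-in.
enum insert_kind_sketch {
  INSERT_HEAP_REDUNDANT,
  INSERT_REUSE_REDUNDANT,
  INSERT_HEAP_DYNAMIC,
  INSERT_REUSE_DYNAMIC
};

// Hypothetical in-memory form of a high-level insert log record.
struct insert_log_sketch {
  insert_kind_sketch kind;
  std::size_t copy_len;  // bytes to copy from the preceding record
  std::string rest;      // bytes that differ from the preceding record
};

// Logging side: store only what cannot be copied from 'prev'.
inline insert_log_sketch make_insert_log(insert_kind_sketch kind,
                                         const std::string& prev,
                                         const std::string& rec) {
  std::size_t n = 0;
  while (n < prev.size() && n < rec.size() && prev[n] == rec[n]) n++;
  return insert_log_sketch{kind, n, rec.substr(n)};
}

// Recovery side: validate before applying, and refuse to replay the
// insert if the log record is inconsistent with the page contents.
inline bool apply_insert_log(const insert_log_sketch& log,
                             const std::string& prev, std::string& out) {
  if (log.copy_len > prev.size()) return false;
  out = prev.substr(0, log.copy_len) + log.rest;
  return true;
}
```

If the preceding record found during recovery is shorter than copy_len, apply_insert_log() returns false rather than reading out of bounds, mirroring the refusal to replay inserts when an inconsistency is detected.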