Provide some statistics about asynchronous IO reads and writes:
- number of pending operations
- number of completion callbacks that are currently being executed
- number of completion callbacks that are currently queued
(due to restriction on number of IO threads)
- total number of IOs finished
- total time to wait for free IO slot
- total number of completions that were queued.
Also revert tpool InnoDB perfschema instrumentation (MDEV-31048)
That instrumentation of cache mutex did not bring any revelation (
the mutex is taken for a couple of instructions), and made it impossible
to use tpool outside of the server (e.g in mariadbimport/dump)
The parameter innodb_undo_log_truncate=ON enables a multi-phased logic:
1. Any "producers" (new starting transactions) are prohibited
from using the rollback segments that reside in the undo tablespace.
2. Any transactions that use any of the rollback segments must be
committed or aborted.
3. The purge of committed transaction history must process all the
rollback segments.
4. The undo tablespace is truncated and rebuilt.
5. The rollback segments are re-enabled for new transactions.
There was one flaw in this logic: The first step was not being invoked
as often as it could be, and therefore innodb_undo_log_truncate=ON
would have no chance to work during a heavy write workload.
Independent of innodb_undo_log_truncate, even after
commit 86767bcc0f
we are missing some chances to free processed undo log pages.
If we prohibited the creation of new transactions in one busy
rollback segment at a time, we would be eventually guaranteed
to be able to free such pages.
purge_sys_t::skipped_rseg: The current candidate rollback segment
for shrinking the history independent of innodb_undo_log_truncate.
purge_sys_t::iterator::free_history_rseg(): Renamed from
trx_purge_truncate_rseg_history(). Implement the logic
around purge_sys.m_skipped_rseg.
purge_sys_t::truncate_undo_space: Renamed from truncate.
purge_sys.truncate_undo_space.last: Changed the type to integer
to get rid of some pointer dereferencing and conditional branches.
purge_sys_t::truncating_tablespace(), purge_sys_t::undo_truncate_try():
Refactored from trx_purge_truncate_history().
Set purge_sys.truncate_undo_space.current if applicable,
or return an already set purge_sys.truncate_undo_space.current.
purge_coordinator_state::do_purge(): Invoke
purge_sys_t::truncating_tablespace() as part of the normal work loop,
to implement innodb_undo_log_truncate=ON as often as possible.
trx_purge_truncate_rseg_history(): Remove a redundant parameter.
trx_undo_truncate_start(): Replace dead code with a debug assertion.
Correctness tested by: Matthias Leich
Performance tested by: Axel Schwenke
Reviewed by: Debarun Banerjee
- InnoDB fails to find the space id from the page0 of
the tablespace. In that case, InnoDB can use
doublewrite buffer to recover the page0 and write
into the file.
- buf_dblwr_t::init_or_load_pages(): Loads only the pages
which are valid.(page lsn >= checkpoint). To do that,
InnoDB has to open the redo log before system
tablespace, read the latest checkpoint information.
recv_dblwr_t::find_first_page():
1) Iterate the doublewrite buffer pages and find the 0th page
2) Read the tablespace flags, space id from the 0th page.
3) Read the 1st, 2nd and 3rd page from tablespace file and
compare the space id with the space id which is stored
in doublewrite buffer.
4) If it matches then we can write into the file.
5) Return space which matches the pages from the file.
SysTablespace::read_lsn_and_check_flags(): Remove the
retry logic for validating the first page. After
restoring the first page from doublewrite buffer,
assign tablespace flags by reading the first page.
recv_recovery_read_max_checkpoint(): Reads the maximum
checkpoint information from log file
recv_recovery_from_checkpoint_start(): Avoid reading
the checkpoint header information from log file
Datafile::validate_first_page(): Throw error in case
of first page validation fails.
srv_start(): Move a read only mode startup tweak from
innodb_init_params() to the correct location. Also if
innodb_force_recovery=6 we will disable the doublewrite buffer,
because InnoDB must run in read-only mode to prevent further corruption.
This change only affects debug checks. Whenever srv_read_only_mode holds,
the buf_pool.flush_list will be empty, that is, there will be no writes
of persistent InnoDB data pages.
Reviewed by: Thirunarayanan Balathandayuthapani
srv_export_innodb_status(): Update
export_vars.innodb_buffer_pool_read_requests
with buf_pool.stat.n_page_gets. This is caused due
to incorrect merge commit 44c9008ba6
buf_page_get_low(): Rename to buf_page_get_gen(), and assume that no
crash recovery is needed.
recv_sys_t::recover(): Replaces the old buf_page_get_gen(). Read a page
while crash recovery is in progress.
trx_rseg_get_n_undo_tablespaces(), ibuf_upgrade_needed():
Invoke recv_sys.recover() instead of buf_page_get_gen().
dict_boot(): Invoke recv_sys.recover() instead of buf_page_get_gen().
Do not load the system tables.
srv_start(): Load the system tables and the undo logs after all redo log
has been applied in recv_sys.apply(true) and we can safely invoke the
regular buf_page_get_gen().
dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.
dict_check_tablespaces_and_store_max_id(): In the normal case
(no encryption plugin has been loaded and the change buffer is empty),
invoke dict_find_max_space_id() and do not open any .ibd files.
If a std::set<uint32_t> has been specified, open the files whose
tablespace ID is mentioned. Else, open all data files that are identified
by SYS_TABLES records.
fil_ibd_open(): Remove a call to os_file_get_last_error() that can
report a misleading error, such as EINVAL inside my_realpath() that is
not an actual error. This could be invoked when a data file is found
but the FSP_SPACE_FLAGS are incorrect, such as is the case for
table test.td in
./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags
buf_load(): If any tablespaces could not be found, invoke
dict_check_tablespaces_and_store_max_id() on the missing tablespaces.
dict_load_tablespace(): Try to load the tablespace unless it was found
to be futile. This fixes failures related to FTS_*.ibd files for
FULLTEXT INDEX.
btr_cur_t::search_leaf(): Prevent a crash when the tablespace
does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
when the change to dict_load_tablespaces() was not present.
We modify a few tests to ensure that tables will not be loaded at startup.
For some fault injection tests this means that the corrupted tables
will not be loaded, because dict_load_tablespace() would perform stricter
checks than dict_check_tablespaces_and_store_max_id().
Tested by: Matthias Leich
Reviewed by: Thirunarayanan Balathandayuthapani
innodb_preshutdown(): Only wait for active transactions to be terminated
if InnoDB was started and innodb_force_recovery=3 or larger does not
prevent a rollback.
This fixes the following:
./mtr --parallel=auto --mysqld=--innodb-fast-shutdown=0 \
innodb.log_file_size innodb.innodb_force_recovery \
innodb.read_only_recovery innodb.read_only_recover_committed \
mariabackup.apply-log-only-incr
A slow shutdown using the previous default innodb_purge_batch_size=300
could be extremely slow, employing at most a few CPU cores on the average.
Let us use the maximum batch size in order to increase throughput.
Reviewed by: Vladislav Lesin
The InnoDB table lookup in purge worker threads is a bottleneck that can
degrade a slow shutdown to utilize less than 2 threads. Let us fix that
bottleneck by constructing a local lookup table that does not require any
synchronization while the undo log records of the current batch
are being processed.
TRX_PURGE_TABLE_BUCKETS: The initial number of std::unordered_map
hash buckets used during a purge batch. This could avoid some
resizing and rehashing in trx_purge_attach_undo_recs().
purge_node_t::tables: A lookup table from table ID to an already
looked up and locked table. Replaces many fields.
trx_purge_attach_undo_recs(): Look up each table in the purge batch
only once.
trx_purge(): Close all tables and release MDL at the end of the batch.
trx_purge_table_open(), trx_purge_table_acquire(): Open a table in purge
and acquire a metadata lock on it. This replaces
dict_table_open_on_id<true>() and dict_acquire_mdl_shared().
purge_sys_t::close_and_reopen(): In case of an MDL conflict, close and
reopen all tables that are covered by the current purge batch.
It may be that some of the tables have been dropped meanwhile and can
be ignored. This replaces wait_SYS() and wait_FTS().
row_purge_parse_undo_rec(): Make purge_coordinator_task issue a
MDL warrant to any purge_worker_task which might need it
when innodb_purge_threads>1.
purge_node_t::end(): Clear the MDL warrant.
Reviewed by: Vladislav Lesin and Vladislav Vaintroub
purge_coordinator_state::do_purge(): Simply use all innodb_purge_threads,
no matter what the LSN age is. During shutdown with innodb_fast_shutdown=0
this code could degrade to using only 1 thread.
Also, restore periodical "InnoDB: to purge" messages that were
accidentally disabled in commit 80585c9d6f.
Reviewed by: Vladislav Lesin and Vladislav Vaintroub
purge_sys_t::wake_if_not_active(): Replaces
srv_wake_purge_thread_if_not_active().
innodb_ddl_recovery_done(): Move the wakeup call to
srv_init_purge_tasks().
purge_coordinator_timer: Remove. The srv_master_callback() already
invokes purge_sys.wake_if_not_active() once per second.
Reviewed by: Vladislav Lesin and Vladislav Vaintroub
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.
Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.
The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.
purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).
purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.
purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().
purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().
Reviewed by: Vladislav Lesin
srv_all_undo_tablespaces_open(): While opening the extra unused
undo tablespaces, InnoDB should use ULINT_UNDEFINED instead of
SRV_SPACE_ID_UPPER_BOUND.