thd_destructor_proxy(): Ensure that purge actually exits,
like the logic should have been ever since MDEV-14080.
srv_purge_shutdown(): A new function to wait for the
purge coordinator to exit. Before exiting, the
purge coordinator will ensure that all purge workers have exited.
srv_purge_coordinator_thread(): Wait for all purge worker threads
to actually exit. An analysis of a core dump of a hung 10.3 server
revealed that one srv_worker_thread did not exit, even though the
purge coordinator had exited. This caused kill_server_thread and
mysqld_main to wait indefinitely. The main InnoDB shutdown was
never called, because unireg_end() was never called.
Thanks to Sergey Vojtovich for feedback and many ideas.
purge_state_t: Remove. The states are replaced with
purge_sys_t::enabled() and purge_sys_t::paused() as follows:
PURGE_STATE_INIT, PURGE_STATE_EXIT, PURGE_STATE_DISABLED: !enabled().
PURGE_STATE_RUN, PURGE_STATE_STOP: paused() distinguishes these.
purge_sys_t::m_paused: Renamed from purge_sys_t::n_stop.
Protected by atomic memory access only, not purge_sys_t::latch.
purge_sys_t::m_enabled: An atomically updated Boolean that
replaces purge_sys_t::state.
purge_sys_t:🏃 Remove, because it duplicates
srv_sys.n_threads_active[SRV_PURGE].
purge_sys_t::running(): Accessor for srv_sys.n_threads_active[SRV_PURGE].
purge_sys_t::stop(): Renamed from trx_purge_stop().
purge_sys_t::resume(): Renamed from trx_purge_run().
Do not acquire latch; solely rely on atomics.
purge_sys_t::is_initialised(), purge_sys_t::m_initialised: Remove.
purge_sys_t::create(), purge_sys_t::close(): Instead of invoking
is_initialised(), check whether event is NULL.
purge_sys_t::event: Move before latch, so that fields that are
protected by latch can reside on the same cache line with latch.
srv_start_wait_for_purge_to_start(): Merge to the only caller srv_start().
trx_purge_stop(): Release purge_sys->latch before attempting to
wake up the purge threads, so that they can actually wake up.
This is a regression of commit a13a636c74
which attempted to fix MDEV-11802 by ensuring that srv_purge_wakeup()
will actually wait until all purge threads wake up.
Due to the purge_sys->latch, the threads might never wake up,
because some purge threads could end up waiting for purge_sys->latch
in trx_undo_prev_version_build() while holding dict_operation_lock
in shared mode. This in turn would block any DDL operations, the
InnoDB master thread, and InnoDB shutdown.
Problem:
=======
InnoDB master thread encounters the shutdown state as SRV_SHUTDOWN_FLUSH_PHASE
when innodb_force_recovery >=2 and slow scheduling of master thread during
shutdown.
Fix:
====
There is no need for master thread itself if innodb_force_recovery >=2.
Don't create the master thread if innodb_force_recovery >= 2
Remove unused InnoDB function parameters and functions.
i_s_sys_virtual_fill_table(): Do not allocate heap memory.
mtr_is_block_fix(): Replace with mtr_memo_contains().
mtr_is_page_fix(): Replace with mtr_memo_contains_page().
Implement innodb_flush_method as an enum parameter in Mariabackup,
instead of ignoring the option and hard-wiring it to a default value.
xb0xb.h: Remove. Only xtrabackup.cc refers to the enum parameters.
innodb_flush_method_names[], innodb_flush_method_typelib[]:
Define as non-static, so that mariabackup can share the definitions.
srv_file_flush_method: Change the type to ulong, to match the
assignment in init_one_value() and handle_options() in mariabackup.
There is only one log_sys and only one log_sys.log.
log_t::files::create(): Replaces log_init().
log_t::files::close(): Replaces log_group_close(), log_group_close_all().
fil_close_log_files(): if (free) log_sys.log_close();
The callers that passed free=true used to call log_group_close_all().
log_header_read(): Replaces log_group_header_read().
log_t::files::file_header_bufs_ptr: Use a single allocation.
log_t::files::file_header_bufs[]: Statically allocate the pointers.
log_t::files::set_fields(): Replaces log_group_set_fields().
log_t::files::calc_lsn_offset(): Replaces log_group_calc_lsn_offset().
Simplify the computation by using fewer variables.
log_t::files::read_log_seg(): Replaces log_group_read_log_seg().
log_sys_t::complete_checkpoint(): Replaces log_io_complete_checkpoint().
fil_aio_wait(): Move the logic from log_io_complete().
There is only one redo log subsystem in InnoDB. Allocate the object
statically, to avoid unnecessary dereferencing of the pointer.
log_t::create(): Renamed from log_sys_init().
log_t::close(): Renamed from log_shutdown().
log_t::checkpoint_buf_ptr: Remove. Allocate log_t::checkpoint_buf
statically.
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".
innodb_change_buffering: Declare as ENUM, not STRING.
innodb_flush_method: Declare as ENUM, not STRING.
innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.
LOG_BUFFER_SIZE: Remove.
SysTablespace::normalize_size(): Renamed from normalize().
innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.
innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.
srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.
SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.
xb_normalize_init_values(): Merge to innodb_init_param().
MariaDB uses HAVE_LZO, not HAVE_LZO1X (which was never defined).
Also, the variable srv_lzo_disabled was never defined or read
(only declared and assigned to, in unreachable code).
file IO, rather than int.
On Windows, it is suboptimal to depend on C runtime, as it has limited
number of file descriptors. This change eliminates
os_file_read_no_error_handling_int_fd(), os_file_write_int_fd(),
OS_FILE_FROM_FD() macro.
srv_purge_should_exit(): Remove the parameter n_purged.
If we happened to have n_purged==0 while some transaction was still
active, and then that transaction was added to the history list,
we were prematurely stopping the purge. It is more appropriate to
first check for trx_sys.any_active_transactions() == 0
(this count can only decrease during shutdown) and then for
trx_sys.history_size() == 0 (that count typically decreases, but
can increase when any remaining active transactions are committed
or rolled back).
innodb.dml_purge: Remove a server restart, and explicitly wait for
purge, and use FLUSH TABLE FOR EXPORT to read the file contents.
This will make the test run faster, easier to debug, and also
allow it to run with --embedded. This might also help repeat
MDEV-11802 better. The issue MDEV-13603 remains will remain tested
by innodb.table_flags.
purge_sys_t::n_submitted: Document that it is only accessed by
srv_purge_coordinator_thread.
purge_sys_t::n_completed: Exclusively use my_atomic access.
srv_task_execute(): Simplify the code.
srv_purge_coordinator_thread(): Test the cheaper condition first.
trx_purge(): Atomically access purge_sys.n_completed.
Remove some code duplication.
trx_purge_wait_for_workers_to_complete(): Atomically access
purge_sys.n_completed. Remove an unnecessary local variable.
trx_purge_stop(): Remove a redundant assignment.
buf_flush_remove(): Disable the output for now, because we
certainly do not want this after every page flush on shutdown.
It must be rate-limited somehow. There already is a timeout
extension for waiting the page cleaner to exit in
logs_empty_and_mark_files_at_shutdown().
log_write_up_to(): Use correct format.
srv_purge_should_exit(): Move the timeout extension to the
appropriate place, from one of the callers.
Use systemd EXTEND_TIMEOUT_USEC to advise systemd of progress
Move towards progress measures rather than pure time based measures.
Progress reporting at numberious shutdown/startup locations incuding:
* For innodb_fast_shutdown=0 trx_roll_must_shutdown() for rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge))
* For purging history, srv_do_purge
Thanks Marko for feedback and suggestions.
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup to shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.
dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.
There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.
There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.
FilSpace: Remove. Only few calls to fil_space_acquire() will remain,
and gradually they should be removed.
mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.
fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.
fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.
dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.
dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.
dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.
fil_ibd_open(): Return the tablespace.
fil_space_t::set_imported(): Replaces fil_space_set_imported().
truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.
row_truncate_prepare(): Merge to its only caller.
row_drop_table_from_cache(): Assert that the table is persistent.
dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.
row_import_update_discarded_flag(): Remove a constant parameter.
Add fil_system_t::sys_space, fil_system_t::temp_space.
These will replace lookups for TRX_SYS_SPACE or SRV_TMP_SPACE_ID.
mtr_t::m_undo_space, mtr_t::m_sys_space: Remove.
mtr_t::set_sys_modified(): Remove.
fil_space_get_type(), fil_space_get_n_reserved_extents(): Remove.
fsp_header_get_tablespace_size(), fsp_header_inc_size():
Merge to the only caller, innobase_start_or_create_for_mysql().
ha_innobase::check_if_supported_inplace_alter(): Only check for
high_level_read_only. Do not unnecessarily refuse
ALTER TABLE...ALGORITHM=INPLACE if innodb_force_recovery was
specified as 1, 2, or 3.
innobase_start_or_create_for_mysql(): Block all writes from SQL
if the system tablespace was initialized with 'newraw'.
There is only one lock_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
lock_sys_t::create(): The deferred constructor.
lock_sys_t::close(): The early destructor.
There is only one purge_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
purge_sys_t::create(): The deferred constructor.
purge_sys_t::close(): The early destructor.
undo::Truncate::create(): The deferred constructor.
Because purge_sys.undo_trunc is constructed before the start-up
parameters are parsed, the normal constructor would copy a
wrong value of srv_purge_rseg_truncate_frequency.
TrxUndoRsegsIterator: Do not forward-declare an inline constructor,
because the static construction of purge_sys.rseg_iter would not have
access to it.
trx_purge(): Remove the parameter limit or batch_size, which is
always passed as srv_purge_batch_size.
trx_purge_attach_undo_recs(): Remove the parameters purge_sys, batch_size.
Refer to srv_purge_batch_size.
trx_purge_wait_for_workers_to_complete(): Remove the parameter purge_sys.
Add innodb debug system variable, innodb_buffer_pool_load_pages_abort, to test
the behaviour of innodb_buffer_pool_load_incomplete.
(innodb_buufer_pool_dump_abort_loads.test)
This is based on a prototype by
Thirunarayanan Balathandayuthapani <thiru@mariadb.com>.
Binlog and Galera write-set replication information was written into
TRX_SYS page on each commit. Instead of writing to the TRX_SYS during
normal operation, InnoDB can make use of rollback segment header pages,
which are already being written to during a commit.
The following list of fields in rollback segment header page are added:
TRX_RSEG_BINLOG_OFFSET
TRX_RSEG_BINLOG_NAME (NUL-terminated; empty name = not present)
TRX_RSEG_WSREP_XID_FORMAT (0=not present; 1=present)
TRX_RSEG_WSREP_XID_GTRID
TRX_RSEG_WSREP_XID_BQUAL
TRX_RSEG_WSREP_XID_DATA
trx_sys_t: Introduce the fields
recovered_binlog_filename, recovered_binlog_offset, recovered_wsrep_xid.
To facilitate upgrade from older mysql or mariaDB versions, we will read
the information in TRX_SYS page. It will be overridden by the
information that we find in rollback segment header pages.
Mariabackup --prepare will read the metadata from the rollback
segment header pages via trx_rseg_array_init(). It will still
not read any undo log pages or recover any transactions.
trx_sys_t::rseg_history_len: Make private, and clarify the
documentation.
trx_sys_t::history_size(): Read rseg_history_len.
trx_sys_t::history_insert(), trx_sys_t::history_remove(),
trx_sys_t::history_add(): Update rseg_history_len.