trx_roll_must_shutdown(): During the rollback of recovered transactions,
report progress and check if the rollback should be interrupted because
of a pending shutdown.
trx_roll_max_undo_no, trx_roll_progress_printed_pct: Remove, along with
the messages that were interleaved with other messages.
row_undo_step(), trx_rollback_active(): Abort the rollback of a
recovered ordinary transaction if fast shutdown has been initiated.
trx_rollback_resurrected(): Convert an aborted-rollback transaction
into a fake XA PREPARE transaction, so that fast shutdown can proceed.
trx_rollback_resurrected(): If shutdown was initiated, fake all
remaining active transactions to XA PREPARE state, so that shutdown
can proceed. Also, make the parameter "all" an output that will be
assigned to FALSE in this case.
trx_rollback_or_clean_recovered(): Remove the shutdown check
(it was moved to trx_rollback_resurrected()).
trx_undo_free_prepared(): Relax assertions.
row_quiesce_table_start(), row_quiesce_table_complete():
Use the more appropriate predicate srv_undo_sources for skipping
purge control. (This change alone is insufficient; it is possible
that this predicate will change during the call to trx_purge_stop()
or trx_purge_run().)
trx_purge_stop(), trx_purge_run(): Tolerate PURGE_STATE_EXIT.
It is entirely possible for shutdown to be initiated soon after the
statement FLUSH TABLES FOR EXPORT has been submitted for execution.
srv_purge_coordinator_thread(): Ensure that the wait for purge_sys->event
in trx_purge_stop() will terminate when the coordinator thread exits.
ha_print_info(): Remove.
srv_printf_innodb_monitor(): Do not acquire btr_search_latches[].
Add the equivalent functionality that was part of the non-debug
version of ha_print_info().
When the transaction isolation level is SERIALIZABLE, or when
a locking read is performed in the REPEATABLE READ isolation level,
InnoDB must lock delete-marked records in order to prevent another
transaction from inserting something.
However, at READ UNCOMMITTED or READ COMMITTED isolation level or
when the parameter innodb_locks_unsafe_for_binlog is set, the
repeatability of the reads does not matter, and there is no need
to lock delete-marked records.
row_search_mvcc(): Skip locks on delete-marked committed records upfront,
instead of invoking row_unlock_for_mysql() afterwards. The unlocking
never worked for secondary index records.
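The rule above can be condensed into a small predicate. This is only a
minimal sketch, not the code in row_search_mvcc(): the Isolation enum,
the function name must_lock_delete_marked() and its parameters are
hypothetical, and the real function considers many more conditions.

```cpp
#include <cassert>

// Hypothetical stand-in for the transaction isolation levels.
enum class Isolation {
    ReadUncommitted,
    ReadCommitted,
    RepeatableRead,
    Serializable
};

// Decide up front whether a delete-marked committed record needs a lock,
// instead of locking it first and unlocking it afterwards.
// locks_unsafe_for_binlog models the innodb_locks_unsafe_for_binlog setting.
bool must_lock_delete_marked(Isolation iso, bool locking_read,
                             bool locks_unsafe_for_binlog)
{
    if (locks_unsafe_for_binlog) {
        return false;  // repeatability of reads does not matter
    }
    switch (iso) {
    case Isolation::Serializable:
        return true;          // must prevent inserts by other transactions
    case Isolation::RepeatableRead:
        return locking_read;  // only locking reads need the lock
    default:
        return false;         // READ UNCOMMITTED, READ COMMITTED
    }
}
```

Deciding before the lock is requested avoids depending on a later
unlock step, which per the note above never worked for secondary
index records.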
dict_stats_rename_table(): After DB_LOCK_WAIT_TIMEOUT
or DB_DUPLICATE_KEY, reset the trx->error_state before retrying.
Also, properly treat DB_DEADLOCK as a hard error.
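The retry rule for dict_stats_rename_table() can be sketched as follows.
This is an illustration under stated assumptions, not the InnoDB code:
Err and Trx are hypothetical stand-ins for dberr_t and trx_t, and
rename_with_retry() and its max_attempts parameter are invented here.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for a few dberr_t values and for trx_t.
enum class Err { Success, LockWaitTimeout, DuplicateKey, Deadlock };

struct Trx {
    Err error_state = Err::Success;
};

// Run op up to max_attempts times. A lock wait timeout or a duplicate key
// is treated as transient: the stale error state is cleared and the
// operation is retried. Anything else, including a deadlock, ends the
// loop immediately and is returned to the caller.
Err rename_with_retry(Trx& trx, const std::function<Err(Trx&)>& op,
                      int max_attempts)
{
    Err err = Err::Success;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        err = op(trx);
        if (err != Err::LockWaitTimeout && err != Err::DuplicateKey) {
            break;  // success or hard error
        }
        trx.error_state = Err::Success;  // reset before retrying
    }
    return err;
}
```

Without the reset, a reused transaction object would carry the error
of the failed attempt into the next one.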
The assertion failure was caused by
MDEV-14511 Use fewer transactions for updating InnoDB persistent statistics.
We are reusing a transaction object after commit, and sometimes,
even after a successful operation, the trx_t::error_state may be
something other than DB_SUCCESS. Reset the field when needed.
This is the 10.1 version, where no merge error exists.
wsrep_on_check
New check function. Galera can't be enabled
if innodb-lock-schedule-algorithm=VATS.
innobase_kill_query
In a Galera asynchronous kill, we could already own the lock mutex.
innobase_init
If the Variance-Aware Transaction Scheduling (VATS) algorithm is
used on Galera, we refuse to start InnoDB.
Changed innodb-lock-schedule-algorithm to a read-only parameter,
as it was designed to be.
lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
lock_rec_lock_slow,
lock_table_other_has_incompatible,
lock_rec_insert_check_and_lock
Change the pointer to the conflicting lock to a normal pointer, because
the lock that it points to could be changed later.
Starting with MySQL 5.7 (or MariaDB 10.2.2) InnoDB no longer contains
the "table monitor" or "tablespace monitor". The conditions on
srv_print_innodb_tablespace_monitor, srv_print_innodb_table_monitor
never hold. So, the code was dead.
Also, remove a bogus reference to dict_print(), which used to implement
the InnoDB table monitor.
Relax memory barrier for lock_word.
rw_lock_lock_word_decr() - used to acquire rw-lock, thus we only need to issue
ACQUIRE when we succeed locking.
rw_lock_x_lock_func_nowait() - same as above, but used to attempt to acquire
X-lock.
rw_lock_s_unlock_func() - used to release S-lock, RELEASE is what we need here.
rw_lock_x_unlock_func() - used to release X-lock. Ideally we'd need only RELEASE
here, but due to mess with waiters (they must be loaded after lock_word is
stored) we have to issue both ACQUIRE and RELEASE.
rw_lock_sx_unlock_func() - same as above, but used to release SX-lock.
rw_lock_s_lock_spin(), rw_lock_x_lock_func(), rw_lock_sx_lock_func() -
fetch-and-store to waiters has to issue only ACQUIRE memory barrier, so that
waiters are stored before lock_word is loaded.
Note that there is a violation of the RELEASE-ACQUIRE protocol here,
because we do on lock:
my_atomic_fas32_explicit((int32*) &lock->waiters, 1, MY_MEMORY_ORDER_ACQUIRE);
my_atomic_load32_explicit(&lock->lock_word, MY_MEMORY_ORDER_RELAXED);
on unlock:
my_atomic_add32_explicit(&lock->lock_word, X_LOCK_DECR, MY_MEMORY_ORDER_ACQ_REL);
my_atomic_load32_explicit((int32*) &lock->waiters, MY_MEMORY_ORDER_RELAXED);
That is, we in effect synchronize ACQUIRE on lock_word with ACQUIRE on waiters.
This was already the case before this patch. A simple fix may have a negative
performance impact. A proper fix requires refactoring of lock_word.
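For readers more used to C++11 atomics, the two sequences quoted above
can be transliterated as follows. This is a sketch under assumptions: it
treats my_atomic_fas32_explicit() as an exchange and
my_atomic_add32_explicit() as a fetch-and-add, the value chosen for
X_LOCK_DECR is an assumption, RwLockSketch and both function names are
hypothetical, and the code mirrors the quoted memory orders without
being a usable rw-lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed value; only used here to model "X-lock released".
constexpr int32_t X_LOCK_DECR = 0x20000000;

// Hypothetical stand-in for the two rw_lock_t fields discussed above.
struct RwLockSketch {
    std::atomic<int32_t> lock_word{X_LOCK_DECR};
    std::atomic<int32_t> waiters{0};
};

// Lock side: announce a waiter (ACQUIRE), then re-read lock_word (RELAXED).
// Returns true if the lock looks free again (simplified check).
bool announce_waiter_and_recheck(RwLockSketch& lock)
{
    lock.waiters.exchange(1, std::memory_order_acquire);
    return lock.lock_word.load(std::memory_order_relaxed) > 0;
}

// Unlock side: give back the X-lock (ACQ_REL), then read waiters (RELAXED).
// Returns true if some thread has to be woken up.
bool x_unlock_and_check_waiters(RwLockSketch& lock)
{
    lock.lock_word.fetch_add(X_LOCK_DECR, std::memory_order_acq_rel);
    return lock.waiters.load(std::memory_order_relaxed) != 0;
}
```

Each side writes one variable and then reads the other, which is why
the ordering between the store and the following load matters on both
paths.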
Relax memory barrier for waiters: these 2 stores must be completed before
os_event_set() finishes. This is guaranteed by RELEASE barrier issued by
mutex.exit() of os_event_set().
Remove the volatile modifier from waiters: volatile is not meant for
inter-thread communication; use appropriate atomic operations instead.
Changed waiters to int32_t, a my_atomic-friendly type.
Allow DROP TABLE `#mysql50##sql-...._.` to drop tables that were
being rebuilt by ALGORITHM=INPLACE.
NOTE: If the server is killed after the table-rebuilding ALGORITHM=INPLACE
commits inside InnoDB but before the .frm file has been replaced, then
the recovery will involve something else than DROP TABLE.
NOTE: If the server is killed after a true in-place ALTER TABLE commits
inside InnoDB but before the .frm file has been replaced, then we
are really out of luck. To properly handle that situation, we would
need a transactional mysql.ddl_fixup table that directs recovery to
rename or remove files.
prepare_inplace_alter_table_dict(): Use the altered_table->s->table_name
for generating the new_table_name.
table_name_t::part_suffix: The start of the partition name suffix.
table_name_t::dbend(): Return the end of the schema name.
table_name_t::dblen(): Return the length of the schema name, in bytes.
table_name_t::basename(): Return the name without the schema name.
table_name_t::part(): Return the partition name, or NULL if none.
row_drop_table_for_mysql(): Assert for #sql, not #sql-ib.
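The table_name_t accessors listed above can be pictured with a small
stand-alone sketch. It assumes that names have the form
"schema/table" with an optional "#P#partition" suffix; the struct
TableNameSketch, the exact suffix string and the member layout are
assumptions made for illustration, not the InnoDB definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumed start of the partition name suffix.
static const char part_suffix[] = "#P#";

// Hypothetical simplified stand-in for table_name_t.
struct TableNameSketch {
    const char* m_name;  // "schema/table" or "schema/table#P#partition"

    // End of the schema name: points at the '/' separator.
    const char* dbend() const
    {
        const char* sep = std::strchr(m_name, '/');
        assert(sep != nullptr);
        return sep;
    }

    // Length of the schema name, in bytes.
    std::size_t dblen() const
    {
        return static_cast<std::size_t>(dbend() - m_name);
    }

    // Name without the schema name.
    const char* basename() const { return dbend() + 1; }

    // Partition name suffix, or a null pointer if there is none.
    const char* part() const
    {
        return std::strstr(basename(), part_suffix);
    }
};
```

For example, with the name "test/t1#P#p0" this sketch yields a schema
length of 4, the base name "t1#P#p0" and the partition suffix "#P#p0".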
fseg_alloc_free_page_low(): Remove a bogus and redundant assertion about
fil_space_t::purpose. The debug function fsp_space_modify_check()
is asserting something similar, but more accurately.
When logging ROW_T_INSERT or ROW_T_UPDATE records, we did not normalize
the DB_TRX_ID of the current transaction into 0 if the current transaction
had started (modifying other tables) before the ALTER TABLE started.
MDEV-13654 introduced this normalization for ROW_T_DELETE
and for all operations with ADD PRIMARY KEY, in row_log_table_get_pk().
The problem was a merge error from MySQL wsrep, i.e. Galera.
wsrep_on_check
New check function. Galera can't be enabled
if innodb-lock-schedule-algorithm=VATS.
innobase_kill_query
In a Galera asynchronous kill, we could already own the lock mutex.
innobase_init
If the Variance-Aware Transaction Scheduling (VATS) algorithm is
used on Galera, we fall back to First-Come-First-Served (FCFS)
with a notice to the user.
Changed innodb-lock-schedule-algorithm to a read-only parameter,
as it was designed to be.
lock_reset_lock_and_trx_wait
Use ib::hex() to print out transaction ID.
lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
RecLock::add_to_waitq,
lock_rec_lock_slow,
lock_table_other_has_incompatible,
lock_rec_insert_check_and_lock,
lock_prdt_other_has_conflicting
Change the pointer to the conflicting lock to a normal pointer, because
the lock that it points to could be changed later.
RecLock::create
The conflicting lock pointer is moved to the last parameter, with a
default value of NULL. The conflicting transaction could
be selected as a victim in Galera if the requesting transaction
is a BF (brute force) transaction. In this case the contents
of the conflicting lock will be changed. Use ib::hex() to print
transaction IDs.
Introduce the debug flag trx_t::persistent_stats to suppress the
assertion for the updates of persistent statistics during fast
shutdown.
dict_stats_exec_sql(): Do execute the statement even though shutdown
has been initiated.
dict_stats_exec_sql(): Expect the caller to always provide a transaction.
Remove some redundant assertions. The caller must hold dict_sys->mutex,
but holding dict_operation_lock is only necessary for accessing
data dictionary tables, which we are not accessing.
dict_stats_save_index_stat(): Acquire dict_sys->mutex
for invoking dict_stats_exec_sql().
dict_stats_save(), dict_stats_update_for_index(), dict_stats_update(),
dict_stats_drop_index(), dict_stats_delete_from_table_stats(),
dict_stats_delete_from_index_stats(), dict_stats_drop_table(),
dict_stats_rename_in_table_stats(), dict_stats_rename_in_index_stats(),
dict_stats_rename_table(): Use a single caller-provided
transaction that is started and committed or rolled back by the caller.
dict_stats_process_entry_from_recalc_pool(): Let the caller provide
a transaction object.
ha_innobase::open(): Pass a transaction to dict_stats_init().
ha_innobase::create(), ha_innobase::discard_or_import_tablespace():
Pass a transaction to dict_stats_update().
ha_innobase::rename_table(): Pass a transaction to
dict_stats_rename_table(). We do not use the same transaction
as the one that updated the data dictionary tables, because
we already released the dict_operation_lock. (FIXME: there is
a race condition; a lock wait on SYS_* tables could occur
in another DDL transaction until the data dictionary transaction
is committed.)
ha_innobase::info_low(): Pass a transaction to dict_stats_update()
when calculating persistent statistics.
alter_stats_norebuild(), alter_stats_rebuild(): Update the
persistent statistics as well. In this way, a single transaction
will be used for updating the statistics of a whole table, even
for partitioned tables.
ha_innobase::commit_inplace_alter_table(): Drop statistics for
all partitions when adding or dropping virtual columns, so that
the statistics will be recalculated on the next handler::open().
This is a refactored version of Oracle Bug#22469660 fix.
RecLock::add_to_waitq(), lock_table_enqueue_waiting():
Do not allow a lock wait to occur for updating statistics
in a data dictionary transaction, such as DROP TABLE. Instead,
return the previously unused error code DB_QUE_THR_SUSPENDED.
row_merge_lock_table(), row_mysql_lock_table(): Remove dead code
for handling DB_QUE_THR_SUSPENDED.
row_drop_table_for_mysql(), row_truncate_table_for_mysql():
Drop the statistics as part of the data dictionary transaction.
After TRUNCATE TABLE, the statistics will be recalculated on
subsequent ha_innobase::open(), similar to how the logic after
the above-mentioned Oracle Bug#22469660 fix in
ha_innobase::commit_inplace_alter_table() works.
btr_defragment_thread(): Use a single transaction object for
updating defragmentation statistics.
dict_stats_save_defrag_stats(),
dict_stats_process_entry_from_defrag_pool(),
dict_defrag_process_entries_from_defrag_pool(),
dict_stats_save_defrag_summary():
Add a parameter for the transaction.
dict_stats_empty_table(): Make public. This will be called by
row_truncate_table_for_mysql() after dropping persistent statistics,
to clear the memory-based statistics as well.
Silence the error log output that was introduced in MySQL 5.7
(MariaDB 10.2.2) if log_warnings=2 or less.
We should still figure out what these messages really indicate
and how to solve the problems.
pc_sleep_if_needed(): Add a parameter for the current time,
so that there will be fewer successive calls to ut_time_ms()
with no I/O between them.
buf_flush_page_cleaner_coordinator(): Exit the first loop
whenever shutdown has been requested. At the start of the loop,
call ut_time_ms() only once. Do not display the message if
log_warnings=2 or less.
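The idea behind the new pc_sleep_if_needed() parameter can be sketched
as a pure function of two timestamps. This is a minimal illustration,
not the InnoDB function: sleep_ms_needed(), its parameter names and the
one-second cap are assumptions, and the real code waits on an event
instead of returning a duration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Return how long to sleep, in milliseconds, until next_loop_time.
// The caller samples the clock once and passes the result as cur_time,
// so this helper does not need to read the clock again.
uint64_t sleep_ms_needed(uint64_t next_loop_time, uint64_t cur_time)
{
    if (next_loop_time > cur_time) {
        // Cap the interval so that a wrapped or bogus timestamp
        // cannot cause a very long sleep.
        return std::min<uint64_t>(1000, next_loop_time - cur_time);
    }
    return 0;  // the deadline has already passed
}
```

A caller that has just sampled the clock for its own bookkeeping can
pass the same value here, which saves one clock read per loop
iteration.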