* Disallow setting wsrep_on = 1 if wsrep_provider is unset. Also, move
wsrep_on_basic from sys_vars to wsrep suite: this test now requires
to run with wsrep_provider set
* Disallow setting @@session.wsrep_on = 1 when @@global.wsrep_on = 0
* Handle the case where a new connection turns @@global.wsrep_on from
off to on. In this case we would miss a call to wsrep_open, causing
unexpected states in wsrep::client_state (causing assertions).
* Disable wsrep.MDEV-22443 because it is no longer possible to enable
wsrep_on, if server is started with wsrep_provider='none'
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data
If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.
The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim
This flag is set both when BF is called for from innodb and by KILL command.
Either path of victim killing will bail out if victim's wsrep_killed is already
set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It is also good error logging, to see who is reponsible for the abort.
A new test case was added in galera.galera_bf_kill_debug.test for scenario where
wsrep applier thread and manual KILL command try to kill same idle victim
The hang can happen between a lock connection issuing KILL CONNECTION for a victim,
which is in committing phase.
There happens two resource deadlockwhere killer is holding victim's
LOCK_thd_data and requires trx mutex for the victim.
The victim, otoh, holds his own trx mutex, but requires LOCK_thd_data
in wsrep_commit_ordered(). Hence a classic two thread deadlock happens.
The fix in this commit changes innodb commit so that wsrep_commit_ordered()
is not called while holding trx mutex. With this, wsrep patch commit time mutex
locking does not violate the locking protocol of KILL command
(i.e. LOCK_thd_data -> trx mutex)
Also, a new test case has been added in galera.galera_bf_kill.test for scenario
where a client connection is killed in committting phase.
Problem was that trx->lock.was_chosen_as_wsrep_victim variable was
not set back to false after it was set true.
wsrep_thd_bf_abort
Add assertions for correct mutex status and take necessary
mutexes before calling thd->awake_no_mutex().
innobase_rollback_trx()
Reset trx->lock.was_chosen_as_wsrep_victim
wsrep_abort_slave_trx()
Removed unused function.
wsrep_innobase_kill_one_trx()
Added function comment, removed unnecessary parameters
and added debug assertions to enforce correct usage. Added
more debug output to help out on error analysis.
wsrep_abort_transaction()
Added debug assertions and removed unused variables.
trx0trx.h
Removed assert_trx_is_free macro and replaced it with
assert_freed() member function.
trx_create()
Use above assert_free() and initialize wsrep variables.
trx_free()
Use assert_free()
trx_t::commit_in_memory()
Reset lock.was_chosen_as_wsrep_victim
trx_rollback_for_mysql()
Reset trx->lock.was_chosen_as_wsrep_victim
Add test case galera_bf_kill
When connections go to same node and deadlock happens, BF abort should
not happen for victim thread. Fixed by guarding
`wsrep_handle_SR_rollback()` so that is called only for SR transactions.
Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Daniele Sciascia <daniele.sciascia@galeracluster.com>
This patch contains two fixes:
* wsrep_handle_mdl_conflict(): handle the case where SR transaction
is in aborting state. Previously, a BF-BF conflict was reported, and
the process would abort.
* wsrep_thd_bf_abort(): do not restore thread vars after calling
wsrep_bf_abort(). Thread vars are already restored in wsrep-lib if
necessary. This also removes the assumption that the caller of
wsrep_thd_bf_abort() is the given bf_thd, which is not the case.
Also in this patch:
* Remove unnecessary check for active victim transaction in
wsrep_thd_bf_abort(): the exact same check is performed later in
wsrep_bf_abort().
* Make wsrep_thd_bf_abort() and wsrep_log_thd() const-correct.
* Change signature of wsrep_abort_thd() to take THD pointers instead
of void pointers.
The original crash happened when async replication IO thread was updating mysql.gtid_slave_pos table. Operations on this table should remain node local, but it appears that protection (THD::wsrep_ignore_table flag) to prevent wsrep replication for this table mas missing for innodb write_row() and update_row().
It was somewhat difficult to reproduce the issue, because mtr seems to create the affected table mysql.gtid_log_pos as of Aria engine type, and Aria engine operations will not be replicated anyhow. It looks, though, that in release installation, mysql.gtid_slave_pos table is of InnoDB engine.
It was possible to trigger somewhat related problem by running test galera.galera_as_slave_gtid with configuration: gtid_pos_auto_engines=InnoDB. However, this test mode, causes earlier crash when replication background thread creates aditional table: mysql.gtid_slave_pos_InnoDB, and this table create triggered wsrep TOI replication, which also failed for assertion. Actually, async replication IO and background threads should not replicate anything to cluster.
This pull request contains new test galera.galera_as_slave_gtid_auto_engine, which basically just runs galera.galera_as_slave_gtid with configuration of gtid_pos_auto_engines=InnoDB.
Test galera.galera_as_slave_gtid is also modified for better code reuse.
Actual fix for MDEV-21096 is in storage/innobase/handler/ha_innodb.cc, where THD::wsrep_ignore_table flag is now honored before wsrep key population.
There is additional fix in sql/service_wsrep.cc where async replication IO and background threads are marked as non-local. This fences these threads out of wsrep replication altogether. Note that this change, actually makes the use of THD::wsrep_ignore-table redundant. We may want to refactor THD::wsrep_ignore_table out in the future, if there is no other use case for it in sight.
MariaDB 10.4 was crashing when thread-handling was set to
pool-of-threads and wsrep was enabled.
There were two apparent reasons for the crash:
- Connection handling in threadpool_common.cc was missing calls to
control wsrep client state.
- Thread specific storage which contains thread variables (THR_KEY_mysys)
was not handled appropriately by wsrep patch when pool-of-threads
was configured.
This patch addresses the above issues in the following way:
- Wsrep client state open/close was moved in thd_prepare_connection() and
end_connection() to have common handling for one-thread-per-connection
and pool-of-threads.
- Thread local storage handling in wsrep patch was reworked by introducing
set of wsrep_xxx_threadvars() calls which replace calls to
THD store_globals()/reset_globals() and deal with thread handling
specifics internally.
Wsrep-lib was updated to version which relaxes internal concurrency
related sanity checks.
Rollback code from wsrep_rollback_process() was extracted to separate calls
for better readability.
Post rollback thread was removed as it was completely unused.
* MDEV-16509 Improve wsrep commit performance with binlog disabled
Release commit order critical section early after trx_commit_low() if
binlog is not transaction coordinator. In order to avoid two phase commit,
binlog_hton is not registered for THD during IO_CACHE population.
Implemented a test which verifies that the transactions release
commit order early.
This optimization will change behavior during recovery as the commit
is not two phase when binlog is off. Fixed and recorded wsrep-recover-v25
and wsrep-recover to match the behavior.
* MDEV-18730 Ordering for wsrep binlog group commit
Previously out of order execution was allowed for wsrep commits.
Established proper ordering by populating wait_for_commit
for every wsrep THD and making group commit leader to wait for
prior commits before proceeding to trx_group_commit_leader().
* MDEV-18730 Added a test case to verify correct commit ordering
* MDEV-16509, MDEV-18730 Review fixes
Use WSREP_EMULATE_BINLOG() macro to decide if the binlog_hton
should be registered. Whitespace/syntax fixes and cleanups.
* MDEV-16509 Require binlog for galera_var_innodb_disallow_writes test
If the commit to InnoDB is done in one phase, the native InnoDB behavior
is that the transaction is committed in memory before it is persisted to
disk. This means that the innodb_disallow_writes=ON may not prevent
transaction to become visible to other readers before commit is completely
over. On the other hand, if the commit is two phase (as it is with binlog),
the transaction will be blocked in prepare phase.
Fixed the test to use binlog, which enforces two phase commit, which
in turn makes commit to block before the changes become visible to
other connections. This guarantees that the test produces expected
result.
The InnoDB DeadlockChecker::check_and_resolve() was missing a
call to wsrep_handle_SR_rollback() in the case when the
transaction running deadlock detection was chosen as victim.
Refined wsrep_handle_SR_rollback() to skip store_globals() calls
if the transaction was BF aborting itself.
Made mysql-wsrep-features#165 more deterministic by waiting until
the update is in progress before sending next update.