An assertion
`server_state_.rollback_mode() == wsrep::server_state::rm_async`
fired in before_command() when
- thread-handling was set to pool-of-threads and
- a BF abort happened between client session calls to
wait_rollback_complete_and_acquire_ownership() and before_command().
This commit introduces a test case to reproduce the crash and
updates wsrep-lib submodule to fixed version.
The sporadic test hangs happen because of mutex dealock between innodb
background threads and two test connection executions.
The test sets variable innodb_disallow_writes, which blocks all writes
to filesyste. The test logic is to execute an INSERT, which should hang
because of filesytstem writes are blocked, and through another session
verify by SELECT that this hanging happens. The SELECT session will then
release innodb_disallow_writes blocking.
However, filesystem write blocking affects also innodb background threads
and they may hang while keeping some other resources locked.
As an example, in one test hang situation, buffer pool access was blocked.
And, if buffer pool is blocked, the test connections will be blocked as well,
and the SELECT session will not be able to continue to release the
innodb_disallow_writes.
The fix in this commit is refactoring of the test logic.
The test will now set first innodb_disallow_writes blocking, and then record
a hash of data directory's filesystem contents. This works as checksum of the
state of data on the datadirectory.
Then some SQL load is tried on both nodes, these sessions will be blocking
due to frozen file system state. The test will have a short sleep to allow
innodb background threads to loop and possibly encounter innodb_disallow_writes
blocking as well.
After the sleep, the test will record file system checksun for the second time,
and then release the innodb_disallow-writes blocking.
Finally, the two checksums are compared, they should be identical to verify that
nothing was written on datadirectory during the test execution.
The checksum is implemented by md5sum hash over all files found in datadirectory
by find command. all these file hashes are hashed together by one more md5sum.
The test therefore depends on md5sum and find. find may work differently with some
OS distributions, e.g. freebsd may be problematic.
galera_toi_truncate test launches a long term INSERT statement in node 2,
and then submits an offending TRUNCATE through node 1. The idea is that
the replicated TRUNCATE will conflict with INSERT in node 2, and force
the INSERT to abort.
The test first issues --send INSERT in node 2, and then switches to node 1
to launch --send TRUNCATE. As the INSERT is launched asynchronously by --send,
it may happen that INSERT has not yet started to process, before the TRUNCATE
is replicated. The net effect may be that TRUCATE processes to completion in node 2,
and only after that INSERT starts to execute. As the INSERT is very long query,
it will last longer than mtr test suite max test time, the test will fail for timeout.
The fix in this commit uses another connection in node 2, to wait until the INSERT
has started to process in node 2. TRUNCATE in node 1, will be submitted in node 1
after this wait condition.
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data
If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.
The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim
This flag is set both when BF is called for from innodb and by KILL command.
Either path of victim killing will bail out if victim's wsrep_killed is already
set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It is also good error logging, to see who is reponsible for the abort.
A new test case was added in galera.galera_bf_kill_debug.test for scenario where
wsrep applier thread and manual KILL command try to kill same idle victim
When a transaction fails in certification phase, it has connsumed one GTID, but as
transaction must rollback, it will not go for commit ordering, and because of this
also the wsrep XID checkpointing can happen out of order.
This PR will make the thread, which has failed for certiication failure to wait for its
commit order turn for checkpointing wsrep IXD in innodb rollback segment.
There is a specific test for wsrep XID checkpointing ordering in mtr test:
mysql-wsrep-bugs-607, which is added in this PR.
Test galera_slave_replay depends also on this fix, as the second test phase
may also assert for bad wsrep XID checkpointing order.
galera_slave_replay.test had also other problems, which caused the test to
fail immediately, thse are now fixes in this PR as well.
The galera.galera_parallel_autoinc_manytrx mtr test opens and runs test
scenario through 3 connections to node 1 and one connection to node 2.
In the test initialization phase, the test creates two tables 't1' and 'ten'
and then creates a stored procedure 'p1' to operate on these tables.
These 3 create DDL statements are issued through same connection to node 1.
In the next test phase, the mtr script uses send command to launch the call
for the p1 stored procedure through all 3 connections to node 1 and through
one connection to node 2. As the mtr send command is asynchronous,
this test phase is non blocking and fast operation.
Now, if the replication between nodes is slow, it may happen that the
initialization phase DDL statements have not been received or have not been
fully applied in node 2. Therefore there is no guarantee that the test tables
and the stored procedure have been created in node 2. Yet, the test is trying
to call p1 in node 2.
In the failure case error logs, there is error message
"MTR failed: query 'reap' failed: 1305: PROCEDURE test.p1 does not exist"
The reap command through connection to node 2, is the first place where test
execution may observe that test tables and/or stored procedure are not yet
created in node 2.
The fix in this commit adds a wait condition in connection to node 2, to wait
until the stored procedure is created before calling the stored procedure.
The wait is implemented by looking in information_schema.routines for the p1
stored procedure.
MDEV-22140 galera.galera_drop_database MTR failed: InnoDB: MySQL is trying to drop database `fts`.`` though there are still open handles
Add wait conditions to wait that all operations are done in both
nodes.
The galera.galera_parallel_autoinc_manytrx mtr test opens and runs test
scenario through 3 connections to node 1 and one connection to node 2.
In the test initialization phase, the test creates two tables 't1' and 'ten'
and then creates a stored procedure 'p1' to operate on these tables.
These 3 create DDL statements are issued through same connection to node 1.
In the next test phase, the mtr script uses send command to launch the call
for the p1 stored procedure through all 3 connections to node 1 and through
one connection to node 2. As the mtr send command is asynchronous,
this test phase is non blocking and fast operation.
Now, if the replication between nodes is slow, it may happen that the
initialization phase DDL statements have not been received or have not been
fully applied in node 2. Therefore there is no guarantee that the test tables
and the stored procedure have been created in node 2. Yet, the test is trying
to call p1 in node 2.
In the failure case error logs, there is error message
"MTR failed: query 'reap' failed: 1305: PROCEDURE test.p1 does not exist"
The reap command through connection to node 2, is the first place where test
execution may observe that test tables and/or stored procedure are not yet
created in node 2.
The fix in this commit adds a wait condition in connection to node 2, to wait
until the stored procedure is created before calling the stored procedure.
The wait is implemented by looking in information_schema.routines for the p1
stored procedure.
Backported the support for aborting and replaying stored procedure and fix for trigger
key assigments from 10.4 version.
Backported also two mtr tests: wsrep_sp_bf_abort and MDEV-20225
The test was changing wsrep_on option in node_3, which is native
MariaDB server (i.e. not a cluster node). Native NariaDB server
should not manipulate wsrep replication state, this problem is fixed.
galera.galera_slave_replay test phase 2 will cause certification failure
for async slave SQL handler thread. This certification failure is now
monitored and required to happen in the test.
The test phase 2, generates scenario, where async slave SQL handler faces
certification failure and galera slave applier is paused when this happens.
This makes the test vulnerable for anomaly described in MDEV-22632.
Therefore the fix in this commit depends on MDEV-22632, and should be merged
after the fix for MDEV-22632.
The hang can happen between a lock connection issuing KILL CONNECTION for a victim,
which is in committing phase.
There happens two resource deadlockwhere killer is holding victim's
LOCK_thd_data and requires trx mutex for the victim.
The victim, otoh, holds his own trx mutex, but requires LOCK_thd_data
in wsrep_commit_ordered(). Hence a classic two thread deadlock happens.
The fix in this commit changes innodb commit so that wsrep_commit_ordered()
is not called while holding trx mutex. With this, wsrep patch commit time mutex
locking does not violate the locking protocol of KILL command
(i.e. LOCK_thd_data -> trx mutex)
Also, a new test case has been added in galera.galera_bf_kill.test for scenario
where a client connection is killed in committting phase.
Problem was that trx->lock.was_chosen_as_wsrep_victim variable was
not set back to false after it was set true.
wsrep_thd_bf_abort
Add assertions for correct mutex status and take necessary
mutexes before calling thd->awake_no_mutex().
innobase_rollback_trx()
Reset trx->lock.was_chosen_as_wsrep_victim
wsrep_abort_slave_trx()
Removed unused function.
wsrep_innobase_kill_one_trx()
Added function comment, removed unnecessary parameters
and added debug assertions to enforce correct usage. Added
more debug output to help out on error analysis.
wsrep_abort_transaction()
Added debug assertions and removed unused variables.
trx0trx.h
Removed assert_trx_is_free macro and replaced it with
assert_freed() member function.
trx_create()
Use above assert_free() and initialize wsrep variables.
trx_free()
Use assert_free()
trx_t::commit_in_memory()
Reset lock.was_chosen_as_wsrep_victim
trx_rollback_for_mysql()
Reset trx->lock.was_chosen_as_wsrep_victim
Add test case galera_bf_kill