wsrep_plugin_init(), wsrep_plugin_deinit(): Remove these dummy functions
in order to fix an error that would be flagged by cmake -DWITH_UBSAN=ON
when using clang.
wsrep_show_ready(), wsrep_show_bf_aborts(): Correct the signature.
The reason for this change are the following:
- If we call set_killed() from one thread to kill another thread with
a message, there may be concurrent usage of the MEM_ROOT which is
not supported (this could cause memory corruption).
We do not currently have code that does this, but the API allows this
and it is better to be fix the issue before it happens.
- The per thread memory tracking does not work if one thread uses
another threads MEM_ROOT.
- set_killed() can be called if a MEM_ROOT allocation fails. In this case
it is not good to try to allocate more memory from potentially the same
MEM_ROOT.
Fix is to use my_malloc() instead of mem_root for killed messages.
This commit contains a merge from 10.5-MDEV-29293-squash
into 10.6.
Although the bug MDEV-29293 was not reproducible with 10.6,
the fix contains several improvements for wsrep KILL query and
BF abort handling, and addresses the following issues:
* MDEV-30307 KILL command issued inside a transaction is
problematic for galera replication:
This commit will remove KILL TOI replication, so Galera side
transaction context is not lost during KILL.
* MDEV-21075 KILL QUERY maintains nodes data consistency but
breaks GTID sequence: This is fixed as well as KILL does not
use TOI, and thus does not change GTID state.
* MDEV-30372 Assertion in wsrep-lib state: This was caused by
BF abort or KILL when local transaction was in the middle
of group commit. This commit disables THD::killed handling
during commit, so the problem is avoided.
* MDEV-30963 Assertion failure !lock.was_chosen_as_deadlock_victim
in trx0trx.h:1065: The assertion happened when the victim was
BF aborted via MDL while it was committing. This commit changes
MDL BF aborts so that transactions which are committing cannot
be BF aborted via MDL. The RQG grammar attached in the issue
could not reproduce the crash anymore.
Original commit message from 10.5 fix:
MDEV-29293 MariaDB stuck on starting commit state
The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
* Applier has innodb side global lock mutex and victim trx mutex.
* KILL is calling innobase_kill_query, and is blocked by innodb
global lock mutex.
* Applier is in wsrep_innobase_kill_one_trx and is blocked by
victim's LOCK_thd_kill.
The fix in this commit removes the TOI replication of KILL command
and makes KILL execution less intrusive operation. Aborting the
victim happens now by using awake_no_mutex() and ha_abort_transaction().
If the KILL happens when the transaction is committing, the
KILL operation is postponed to happen after the statement
has completed in order to avoid KILL to interrupt commit
processing.
Notable changes in this commit:
* wsrep client connections's error state may remain sticky after
client connection is closed. This error message will then pop
up for the next client session issuing first SQL statement.
This problem raised with test galera.galera_bf_kill.
The fix is to reset wsrep client error state, before a THD is
reused for next connetion.
* Release THD locks in wsrep_abort_transaction when locking
innodb mutexes. This guarantees same locking order as with applier
BF aborting.
* BF abort from MDL was changed to do BF abort on server/wsrep-lib
side first, and only then do the BF abort on InnoDB side. This
removes the need to call back from InnoDB for BF aborts which originate
from MDL and simplifies the locking.
* Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
The manipulation of the wsrep_aborter can be done solely on
server side. Moreover, it is now debug only variable and
could be excluded from optimized builds.
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
fine grained locking for SR BF abort which may require locking
of victim LOCK_thd_kill. Added explicit call for
wsrep_thd_kill_LOCK/UNLOCK where appropriate.
* Wsrep-lib was updated to version which allows external
locking for BF abort calls.
Changes to MTR tests:
* Disable galera_bf_abort_group_commit. This test is going to
be removed (MDEV-30855).
* Make galera_var_retry_autocommit result more readable by echoing
cases and expectations into result. Only one expected result for
reap to verify that server returns expected status for query.
* Record galera_gcache_recover_manytrx as result file was incomplete.
Trivial change.
* Make galera_create_table_as_select more deterministic:
Wait until CTAS execution has reached MDL wait for multi-master
conflict case. Expected error from multi-master conflict is
ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
wsrep transaction when it is waiting for MDL, query gets interrupted
instead of BF aborted. This should be addressed in separate task.
* A new test galera_bf_abort_registering to check that registering trx gets
BF aborted through MDL.
* A new test galera_kill_group_commit to verify correct behavior
when KILL is executed while the transaction is committing.
Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
* Applier has innodb side global lock mutex and victim trx mutex.
* KILL is calling innobase_kill_query, and is blocked by innodb
global lock mutex.
* Applier is in wsrep_innobase_kill_one_trx and is blocked by
victim's LOCK_thd_kill.
The fix in this commit removes the TOI replication of KILL command
and makes KILL execution less intrusive operation. Aborting the
victim happens now by using awake_no_mutex() and ha_abort_transaction().
If the KILL happens when the transaction is committing, the
KILL operation is postponed to happen after the statement
has completed in order to avoid KILL to interrupt commit
processing.
Notable changes in this commit:
* wsrep client connections's error state may remain sticky after
client connection is closed. This error message will then pop
up for the next client session issuing first SQL statement.
This problem raised with test galera.galera_bf_kill.
The fix is to reset wsrep client error state, before a THD is
reused for next connetion.
* Release THD locks in wsrep_abort_transaction when locking
innodb mutexes. This guarantees same locking order as with applier
BF aborting.
* BF abort from MDL was changed to do BF abort on server/wsrep-lib
side first, and only then do the BF abort on InnoDB side. This
removes the need to call back from InnoDB for BF aborts which originate
from MDL and simplifies the locking.
* Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
The manipulation of the wsrep_aborter can be done solely on
server side. Moreover, it is now debug only variable and
could be excluded from optimized builds.
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
fine grained locking for SR BF abort which may require locking
of victim LOCK_thd_kill. Added explicit call for
wsrep_thd_kill_LOCK/UNLOCK where appropriate.
* Wsrep-lib was updated to version which allows external
locking for BF abort calls.
Changes to MTR tests:
* Disable galera_bf_abort_group_commit. This test is going to
be removed (MDEV-30855).
* Record galera_gcache_recover_manytrx as result file was incomplete.
Trivial change.
* Make galera_create_table_as_select more deterministic:
Wait until CTAS execution has reached MDL wait for multi-master
conflict case. Expected error from multi-master conflict is
ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
wsrep transaction when it is waiting for MDL, query gets interrupted
instead of BF aborted. This should be addressed in separate task.
* A new test galera_kill_group_commit to verify correct behavior
when KILL is executed while the transaction is committing.
Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This is a backport from 10.5.
The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
* Applier has innodb side global lock mutex and victim trx mutex.
* KILL is calling innobase_kill_query, and is blocked by innodb
global lock mutex.
* Applier is in wsrep_innobase_kill_one_trx and is blocked by
victim's LOCK_thd_kill.
The fix in this commit removes the TOI replication of KILL command
and makes KILL execution less intrusive operation. Aborting the
victim happens now by using awake_no_mutex() and ha_abort_transaction().
If the KILL happens when the transaction is committing, the
KILL operation is postponed to happen after the statement
has completed in order to avoid KILL to interrupt commit
processing.
Notable changes in this commit:
* wsrep client connections's error state may remain sticky after
client connection is closed. This error message will then pop
up for the next client session issuing first SQL statement.
This problem raised with test galera.galera_bf_kill.
The fix is to reset wsrep client error state, before a THD is
reused for next connetion.
* Release THD locks in wsrep_abort_transaction when locking
innodb mutexes. This guarantees same locking order as with applier
BF aborting.
* BF abort from MDL was changed to do BF abort on server/wsrep-lib
side first, and only then do the BF abort on InnoDB side. This
removes the need to call back from InnoDB for BF aborts which originate
from MDL and simplifies the locking.
* Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
The manipulation of the wsrep_aborter can be done solely on
server side. Moreover, it is now debug only variable and
could be excluded from optimized builds.
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
fine grained locking for SR BF abort which may require locking
of victim LOCK_thd_kill. Added explicit call for
wsrep_thd_kill_LOCK/UNLOCK where appropriate.
* Wsrep-lib was updated to version which allows external
locking for BF abort calls.
Changes to MTR tests:
* Disable galera_bf_abort_group_commit. This test is going to
be removed (MDEV-30855).
* Record galera_gcache_recover_manytrx as result file was incomplete.
Trivial change.
* Make galera_create_table_as_select more deterministic:
Wait until CTAS execution has reached MDL wait for multi-master
conflict case. Expected error from multi-master conflict is
ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
wsrep transaction when it is waiting for MDL, query gets interrupted
instead of BF aborted. This should be addressed in separate task.
* A new test galera_kill_group_commit to verify correct behavior
when KILL is executed while the transaction is committing.
Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit changes backup execution (namely the block ddl phase),
so that node is not paused from cluster. Instead, the following
backup execution is declared as vulnerable for possible cluster
level conflicts, especially with DDL statement applying.
With this, the mariabackup execution may be aborted, if DDL
statements happen during backup execution. This abortable
backup execution is optional feature and may be
enabled/disabled by wsrep_mode: BF_ABORT_MARIABACKUP.
Note that old style node desync and pause, despite of
WSREP_MODE_BF_MARIABACKUP is needed if node is operating as
SST donor.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Cluster conflict victim's THD is marked with wsrep_aborter.
THD::wsrep_aorter holds the thread ID of the hight priority tread,
which is currently carrying out BF aborting for this victim.
However, the BF abort operation is not always successful,
and in such case the wsrep_aborter mark should be removed.
In the old code, this wsrep_aborter resetting did not happen,
and this could lead to a situation where the sticky wsrep_aborter
mark prevents any further attempt to BF abort this transaction.
This commit fixes this issue, and resets wsrep_aborter after
unsuccesful BF abort attempt.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Test MDEV-26575 fails when it runs after MDEV-25389. This is because
the latter simulates a failure while an applier thread is
created in `start_wsrep_THD()`. The failure was not handled correctly
and would not cleanup the created THD from the global
`server_threads`. A subsequent shutdown would hang and eventually fail
trying to close this THD.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
There are separate flags DBUG_OFF for disabling the DBUG facility
and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility.
Let us allow debug builds without DEBUG_SYNC.
Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to
define ENABLED_DEBUG_SYNC.
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
In a rebase of the merge, two preceding commits were accidentally reverted:
commit 112b23969a (MDEV-26308)
commit ac2857a5fb (MDEV-25717)
Thanks to Daniele Sciascia for noticing this.
Victim threads which are in currently in process of aborting or already
aborted should be skipped for another kill process.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
After the merging of MDEV-24915, 10.6 branch has regressions with handling of
concurrent write load against two or more cluster nodes. These regressions may
surface as cluster hanging, node crashes or data inconsistency. With some test
scenarios, the only visible symptom could be that the BF victim aborting happens
only by innodb lock wait timeout expiration. This would result only to poor
performance (by default 50 sec hang for each BF conflict), and could be somewhat
difficult to diagnose.
This pull request has following fixes to handle concurrent write load from
multiple nodes:
In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen
that victim has advanced into pre commit state. This was fixed by choosing
victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.
Victim transaction may be in several different states at the time of detected
lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim
may advance further before the actual BF aborting takes place. The BF aborting
in MDEV-24915 did not wake the victim, if it was in the state of waiting for
some other lock (than the one that was blocking the high priority thread).
This anomaly caused the innodb lock wait timeout expiration delays and poor
performance symptom. To fix this, lock_wait_wsrep_kill() now looks if
victim is in lock waiting state, and uses lock_cancel_waiting_and_release()
to cancel this lock wait.
wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib),
and starts a new transaction if there was no active transaction before.
Due to late BF aborting, the victim may have e.g. failed in certification
and is already aborting or has aborted at this stage. This has caused
problems in testing where BF aborter tries to BF abort himself.
The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting
or has aborted. Victim may not have started transaction yet in wsrep context,
but it may have acquired MDL locks (due to DDL execution), and this has
caused BF conflict. Such case does not require aborting in wsrep or
replication provider state.
BF aborting could cause BF-BF conflict scenario, if victim was already aborted
and changed to replayer having high priority as well. This BF-BF conflict
scenario is now avoided in lock_wait_wsrep() where we now check if blocking
lock holder is also high priority and is ordered before, caller should wait
for the lock in this situation.
The natural innodb deadlock resolving algorithm could pick BF thread as
deadlock victim. This is fixed by giving max weigh to BF threads in
Deadlock::report().
MDEV-24341 has changed excution paths in do_command() and this affects BF
aborted victim execution. This PR fixes one assert in do_command():
DBUG_ASSERT(!thd->async_state.pending_ops())
Which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.
With these fixes, long term highly conflicting write load could be run against
to node cluster. If binlogging is configured, log_slave_updates should be
also set.
Incorrect processing of an auto-incrementing field in the
WSREP-related code during applying transactions results in
a duplicate key being created. This is due to the fact that
at the beginning of the write_row() and update_row() functions,
the values of the auto-increment parameters are used, which
are read from the parameters of the current thread, but further
along the code other values are used, which are read from global
variables (when applying a transaction). This can happen when
the cluster configuration has changed while applying a transaction
(for example in the high_priority_service mode for Galera 4).
Further during IST processing duplicating key is detected, and
processing of the DB_DUPLICATE_KEY return code (inside innodb,
in the write_row() handler) results in a call to the
wsrep_thd_self_abort() function.
wsrep_cluster_address_update() causes LOCK_wsrep_slave_threads
to be locked under LOCK_wsrep_cluster_config, while normally
the order should be the opposite.
Fix: don't protect @@wsrep_cluster_address value with the
LOCK_wsrep_cluster_config, LOCK_global_system_variables is enough.
Only protect wsrep reinitialization with the LOCK_wsrep_cluster_config.
And make it use a local copy of the global @@wsrep_cluster_address.
Also, introduce a helper function that checks whether
wsrep_cluster_address is set and also asserts that it can be safely
read by the caller.
Some DML operations on tables having unique secondary keys cause scanning
in the secondary index, for instance to find potential unique key violations
in the seconday index. This scanning may involve GAP locking in the index.
As this locking happens also when applying replication events in high priority
applier threads, there is a probabality for lock conflicts between two wsrep
high priority threads.
This PR avoids lock conflicts of high priority wsrep threads, which do
secondary index scanning e.g. for duplicate key detection.
The actual fix is the patch in sql_class.cc:thd_need_ordering_with(), where
we allow relaxed GAP locking protocol between wsrep high priority threads.
wsrep high priority threads (replication appliers, replayers and TOI processors)
are ordered by the replication provider, and they will not need serializability
support gained by secondary index GAP locks.
PR contains also a mtr test, which exercises a scenario where two replication
applier threads have a false positive conflict in GAP of unique secondary index.
The conflicting local committing transaction has to replay, and the test verifies
also that the replaying phase will not conflict with the latter repllication applier.
Commit also contains new test scenario for galera.galera_UK_conflict.test,
where replayer starts applying after a slave applier thread, with later seqno,
has advanced to commit phase. The applier and replayer have false positive GAP
lock conflict on secondary unique index, and replayer should ignore this.
This test scenario caused crash with earlier version in this PR, and to fix this,
the secondary index uniquenes checking has been relaxed even further.
Now innodb trx_t structure has new member: bool wsrep_UK_scan, which is set to
true, when high priority thread is performing unique secondary index scanning.
The member trx_t::wsrep_UK_scan is defined inside WITH_WSREP directive, to make
it possible to prepare a MariaDB build where this additional trx_t member is
not present and is not used in the code base. trx->wsrep_UK_scan is set to true
only for the duration of function call for: lock_rec_lock() trx->wsrep_UK_scan
is used only in lock_rec_has_to_wait() function to relax the need to wait if
wsrep_UK_scan is set and conflicting transaction is also high priority.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
If log_slave_updates==OFF, wsrep applier threads used to be configured
with option: thd->variables.option_bits&= ~(OPTION_BIN_LOG);
(i.e. like sql_log_bin=ON). And this was regardless of log-bin configuration.
With this, having configuration of: --log-bin && --log-slave-updates=OFF,
local threads used binlogging, but applier threads did not. And further:
local threads went through binlog group commit, while applier threads did
direct commits. This resulted in situation, where applier threads entered
earlier in wsrep XID checkpointing, and could sync their wsrep XID out of order.
Later local thread commit would see that higher seqno was already checkpointed,
and fire an assert because of this.
As a fix, applier threads are now forced to enable binlogging regardless of
log-slave-updates configuration.
This PR comes with new mtr test: galera.MDEV-24327, which causes a scenario
where applier transaction is applied and committed while earlier local transaction
is parked before commit order monitor enter. A buggy mariadb versoin would fail
for assertion because of wsrep XID checkpoint order violation.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
The reason for the failure is that
thd->mdl_context.release_transactional_locks()
was called after commit & rollback even in cases where the current
transaction is still active.
For 10.2, 10.3 and 10.4 the fix is simple:
- Replace all calls to thd->mdl_context.release_transactional_locks() with
thd->release_transactional_locks(). The thd function will only call
the mdl_context function if there are no active transactional locks.
In 10.6 we will better fix where we will change the return value for
some trans_xxx() functions to indicate if transaction did close the
transaction or not. This will avoid the need of the indirect call.
Other things:
- trans_xa_commit() and trans_xa_rollback() will automatically
call release_transactional_locks() if the transaction is closed.
- We can't do that for the other functions as the caller of many of these
are doing additional work (like close_thread_tables) before calling
release_transactional_locks().
- Added missing abort_result_set() and missing DBUG_RETURN in
select_create::send_eof()
- Fixed wrong indentation in injector::transaction::commit()
Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue
and move condition to correct callers. Add a function to report
BF lock waits and assert if incorrect BF-BF lock wait happens.
wsrep_report_bf_lock_wait
Add a new function to report BF lock wait.
wsrep_assert_no_bf_bf_wait
Add a new function to check do we have a
BF-BF wait and if we have report this case
and assert as it is a bug.
lock_rec_has_to_wait
Use new wsrep_assert_bf_wait to check BF-BF wait.
lock_rec_create_low
lock_table_create
Use new function to report BF lock waits.
lock_rec_insert_by_trx_age
lock_grant_and_move_on_page
lock_grant_and_move_on_rec
Assert that trx is not Galera as VATS is not compatible
with Galera.
lock_rec_add_to_queue
If there is conflicting lock in a queue make sure that
transaction is BF.
lock_rec_has_to_wait_in_queue
Remove incorrect BF handling. If there is conflicting
locks in a queue all transactions must wait.
lock_rec_dequeue_from_page
lock_rec_unlock
If there is conflicting lock make sure it is not
BF-BF case.
lock_rec_queue_validate
Add Galera record locking rules comment and use
new function to report BF lock waits.
All attempts to reproduce the original assertion have been
failed. Therefore, there is no test case on this commit.