mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-30 02:30:06 +01:00

Author	SHA1	Message	Date
Daniele Sciascia	eb4123eefc	More fixes to variable wsrep_on * Disallow setting wsrep_on = 1 if wsrep_provider is unset. Also, move wsrep_on_basic from sys_vars to wsrep suite: this test now requires to run with wsrep_provider set * Disallow setting @@session.wsrep_on = 1 when @@global.wsrep_on = 0 * Handle the case where a new connection turns @@global.wsrep_on from off to on. In this case we would miss a call to wsrep_open, causing unexpected states in wsrep::client_state (causing assertions). * Disable wsrep.MDEV-22443 because it is no longer possible to enable wsrep_on, if server is started with wsrep_provider='none' Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-04-20 08:24:14 +03:00
Sergei Golubchik	eac8341df4	MDEV-23328 Server hang due to Galera lock conflict resolution adaptation of `29bbcac0ee` for 10.4	2021-02-12 18:17:06 +01:00
Sergei Golubchik	00a313ecf3	Merge branch 'bb-10.3-release' into bb-10.4-release Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution" was null-merged. 10.4 version of the fix is coming up separately	2021-02-12 17:44:22 +01:00
Marko Mäkelä	3a423088ac	Merge 10.3 into 10.4	2020-09-21 12:29:00 +03:00
sjaakola	5a7794d3a8	MDEV-21910 Deadlock between BF abort and manual KILL command When high priority replication slave applier encounters lock conflict in innodb, it will force the conflicting lock holder transaction (victim) to rollback. This is a must in multi-master sychronous replication model to avoid cluster lock-up. This high priority victim abort (aka "brute force" (BF) abort), is started from innodb lock manager while holding the victim's transaction's (trx) mutex. Depending on the execution state of the victim transaction, it may happen that the BF abort will call for THD::awake() to wake up the victim transaction for the rollback. Now, if BF abort requires THD::awake() to be called, then the applier thread executed locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data If, at the same time another DBMS super user issues KILL command to abort the same victim, it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex. These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking deadlock may occur. The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim This flag is set both when BF is called for from innodb and by KILL command. Either path of victim killing will bail out if victim's wsrep_killed is already set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter records the aborter THD's ID. This is needed to preserve the right to kill the victim from different locations for the same aborter thread. It is also good error logging, to see who is reponsible for the abort. A new test case was added in galera.galera_bf_kill_debug.test for scenario where wsrep applier thread and manual KILL command try to kill same idle victim	2020-06-26 09:56:23 +03:00
sjaakola	1af6e92f0b	MDEV-22666 galera.MW-328A hang The hang can happen between a lock connection issuing KILL CONNECTION for a victim, which is in committing phase. There happens two resource deadlockwhere killer is holding victim's LOCK_thd_data and requires trx mutex for the victim. The victim, otoh, holds his own trx mutex, but requires LOCK_thd_data in wsrep_commit_ordered(). Hence a classic two thread deadlock happens. The fix in this commit changes innodb commit so that wsrep_commit_ordered() is not called while holding trx mutex. With this, wsrep patch commit time mutex locking does not violate the locking protocol of KILL command (i.e. LOCK_thd_data -> trx mutex) Also, a new test case has been added in galera.galera_bf_kill.test for scenario where a client connection is killed in committting phase.	2020-05-25 19:30:23 +03:00
Jan Lindström	523d67a272	MDEV-22494 : Galera assertion lock_sys.mutex.is_owned() at lock_trx_handle_wait_low Problem was that trx->lock.was_chosen_as_wsrep_victim variable was not set back to false after it was set true. wsrep_thd_bf_abort Add assertions for correct mutex status and take necessary mutexes before calling thd->awake_no_mutex(). innobase_rollback_trx() Reset trx->lock.was_chosen_as_wsrep_victim wsrep_abort_slave_trx() Removed unused function. wsrep_innobase_kill_one_trx() Added function comment, removed unnecessary parameters and added debug assertions to enforce correct usage. Added more debug output to help out on error analysis. wsrep_abort_transaction() Added debug assertions and removed unused variables. trx0trx.h Removed assert_trx_is_free macro and replaced it with assert_freed() member function. trx_create() Use above assert_free() and initialize wsrep variables. trx_free() Use assert_free() trx_t::commit_in_memory() Reset lock.was_chosen_as_wsrep_victim trx_rollback_for_mysql() Reset trx->lock.was_chosen_as_wsrep_victim Add test case galera_bf_kill	2020-05-15 09:04:02 +03:00
mkaruza	d87c16be79	MDEV-20616: MariaDB-Galera 10.4.8 \| Transaction aborted \| Sig 6 Shutdown When connections go to same node and deadlock happens, BF abort should not happen for victim thread. Fixed by guarding `wsrep_handle_SR_rollback()` so that is called only for SR transactions. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Daniele Sciascia <daniele.sciascia@galeracluster.com>	2020-03-03 10:29:45 +01:00
Daniele Sciascia	aab6cefe8d	MDEV-20848 Fixes for MTR test galera_sr.GCF-1060 (#1421 ) This patch contains two fixes: * wsrep_handle_mdl_conflict(): handle the case where SR transaction is in aborting state. Previously, a BF-BF conflict was reported, and the process would abort. * wsrep_thd_bf_abort(): do not restore thread vars after calling wsrep_bf_abort(). Thread vars are already restored in wsrep-lib if necessary. This also removes the assumption that the caller of wsrep_thd_bf_abort() is the given bf_thd, which is not the case. Also in this patch: * Remove unnecessary check for active victim transaction in wsrep_thd_bf_abort(): the exact same check is performed later in wsrep_bf_abort(). * Make wsrep_thd_bf_abort() and wsrep_log_thd() const-correct. * Change signature of wsrep_abort_thd() to take THD pointers instead of void pointers.	2019-12-04 09:21:14 +02:00
seppo	4111a53079	MDEV-21096 async slave crash with gtid_log_pos table access (#1413 ) The original crash happened when async replication IO thread was updating mysql.gtid_slave_pos table. Operations on this table should remain node local, but it appears that protection (THD::wsrep_ignore_table flag) to prevent wsrep replication for this table mas missing for innodb write_row() and update_row(). It was somewhat difficult to reproduce the issue, because mtr seems to create the affected table mysql.gtid_log_pos as of Aria engine type, and Aria engine operations will not be replicated anyhow. It looks, though, that in release installation, mysql.gtid_slave_pos table is of InnoDB engine. It was possible to trigger somewhat related problem by running test galera.galera_as_slave_gtid with configuration: gtid_pos_auto_engines=InnoDB. However, this test mode, causes earlier crash when replication background thread creates aditional table: mysql.gtid_slave_pos_InnoDB, and this table create triggered wsrep TOI replication, which also failed for assertion. Actually, async replication IO and background threads should not replicate anything to cluster. This pull request contains new test galera.galera_as_slave_gtid_auto_engine, which basically just runs galera.galera_as_slave_gtid with configuration of gtid_pos_auto_engines=InnoDB. Test galera.galera_as_slave_gtid is also modified for better code reuse. Actual fix for MDEV-21096 is in storage/innobase/handler/ha_innodb.cc, where THD::wsrep_ignore_table flag is now honored before wsrep key population. There is additional fix in sql/service_wsrep.cc where async replication IO and background threads are marked as non-local. This fences these threads out of wsrep replication altogether. Note that this change, actually makes the use of THD::wsrep_ignore-table redundant. We may want to refactor THD::wsrep_ignore_table out in the future, if there is no other use case for it in sight.	2019-11-25 11:19:33 +02:00
Marko Mäkelä	ec40980ddd	Merge 10.3 into 10.4	2019-11-01 15:23:18 +02:00
Teemu Ollakka	9487e0b259	MDEV-19826 10.4 seems to crash with "pool-of-threads" (#1370 ) MariaDB 10.4 was crashing when thread-handling was set to pool-of-threads and wsrep was enabled. There were two apparent reasons for the crash: - Connection handling in threadpool_common.cc was missing calls to control wsrep client state. - Thread specific storage which contains thread variables (THR_KEY_mysys) was not handled appropriately by wsrep patch when pool-of-threads was configured. This patch addresses the above issues in the following way: - Wsrep client state open/close was moved in thd_prepare_connection() and end_connection() to have common handling for one-thread-per-connection and pool-of-threads. - Thread local storage handling in wsrep patch was reworked by introducing set of wsrep_xxx_threadvars() calls which replace calls to THD store_globals()/reset_globals() and deal with thread handling specifics internally. Wsrep-lib was updated to version which relaxes internal concurrency related sanity checks. Rollback code from wsrep_rollback_process() was extracted to separate calls for better readability. Post rollback thread was removed as it was completely unused.	2019-08-30 08:42:24 +03:00
Marko Mäkelä	b896f60a73	Fix -Wformat and -Wnonnull-compare for WSREP	2019-04-03 09:20:21 +03:00
Teemu Ollakka	1ef50a34ec	10.4 wsrep group commit fixes (#1224 ) * MDEV-16509 Improve wsrep commit performance with binlog disabled Release commit order critical section early after trx_commit_low() if binlog is not transaction coordinator. In order to avoid two phase commit, binlog_hton is not registered for THD during IO_CACHE population. Implemented a test which verifies that the transactions release commit order early. This optimization will change behavior during recovery as the commit is not two phase when binlog is off. Fixed and recorded wsrep-recover-v25 and wsrep-recover to match the behavior. * MDEV-18730 Ordering for wsrep binlog group commit Previously out of order execution was allowed for wsrep commits. Established proper ordering by populating wait_for_commit for every wsrep THD and making group commit leader to wait for prior commits before proceeding to trx_group_commit_leader(). * MDEV-18730 Added a test case to verify correct commit ordering * MDEV-16509, MDEV-18730 Review fixes Use WSREP_EMULATE_BINLOG() macro to decide if the binlog_hton should be registered. Whitespace/syntax fixes and cleanups. * MDEV-16509 Require binlog for galera_var_innodb_disallow_writes test If the commit to InnoDB is done in one phase, the native InnoDB behavior is that the transaction is committed in memory before it is persisted to disk. This means that the innodb_disallow_writes=ON may not prevent transaction to become visible to other readers before commit is completely over. On the other hand, if the commit is two phase (as it is with binlog), the transaction will be blocked in prepare phase. Fixed the test to use binlog, which enforces two phase commit, which in turn makes commit to block before the changes become visible to other connections. This guarantees that the test produces expected result.	2019-03-15 07:09:13 +02:00
Teemu Ollakka	6edfeb82fd	Fixes to streaming replication BF aborts The InnoDB DeadlockChecker::check_and_resolve() was missing a call to wsrep_handle_SR_rollback() in the case when the transaction running deadlock detection was chosen as victim. Refined wsrep_handle_SR_rollback() to skip store_globals() calls if the transaction was BF aborting itself. Made mysql-wsrep-features#165 more deterministic by waiting until the update is in progress before sending next update.	2019-02-25 08:31:52 +02:00
Brave Galera Crew	36a2a185fe	Galera4	2019-01-23 15:30:00 +04:00

16 commits