mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-16 03:47:17 +02:00

Author	SHA1	Message	Date
Jan Lindström	468e56bfde	Add missing includes.	2020-07-24 19:25:32 +03:00
Teemu Ollakka	1e2a4ed7ed	MDEV-21718 Assertion in wsrep::client_state::before_command(). An assertion `server_state_.rollback_mode() == wsrep::server_state::rm_async` fired in before_command() when - thread-handling was set to pool-of-threads and - a BF abort happened between client session calls to wait_rollback_complete_and_acquire_ownership() and before_command(). This commit introduces a test case to reproduce the crash and updates wsrep-lib submodule to fixed version.	2020-07-24 13:26:21 +03:00
Jan Lindström	134a6a8d2f	Silence unnecessary warning.	2020-07-24 12:05:39 +03:00
sjaakola	95132ade6d	MDEV-20928 mtr test galera.galera_var_innodb_disallow_writes test failure The sporadic test hangs happen because of mutex deadlock between innodb background threads and two test connection executions. The test sets variable innodb_disallow_writes, which blocks all writes to filesystem. The test logic is to execute an INSERT, which should hang because filesystem writes are blocked, and through another session verify by SELECT that this hanging happens. The SELECT session will then release innodb_disallow_writes blocking. However, filesystem write blocking affects also innodb background threads and they may hang while keeping some other resources locked. As an example, in one test hang situation, buffer pool access was blocked. And, if buffer pool is blocked, the test connections will be blocked as well, and the SELECT session will not be able to continue to release the innodb_disallow_writes. The fix in this commit is refactoring of the test logic. The test will now set first innodb_disallow_writes blocking, and then record a hash of data directory's filesystem contents. This works as checksum of the state of data on the datadirectory. Then some SQL load is tried on both nodes, these sessions will be blocking due to frozen file system state. The test will have a short sleep to allow innodb background threads to loop and possibly encounter innodb_disallow_writes blocking as well. After the sleep, the test will record file system checksum for the second time, and then release the innodb_disallow_writes blocking. Finally, the two checksums are compared, they should be identical to verify that nothing was written on datadirectory during the test execution. The checksum is implemented by md5sum hash over all files found in datadirectory by find command. All these file hashes are hashed together by one more md5sum. The test therefore depends on md5sum and find. find may work differently with some OS distributions, e.g. freebsd may be problematic.	2020-07-24 12:05:39 +03:00
mkaruza	4b4372af6a	MDEV-22458: Server with WSREP hangs after INSERT, wrong usage of mutex 'LOCK_thd_data' and 'share->intern_lock' / 'lock->mutex' Add `find_thread_by_id_with_thd_data_lock` which will be used only when killing thread. This version needs to take `thd->LOCK_thd_data` lock.	2020-07-24 12:05:39 +03:00
Jan Lindström	8c7f7bae47	Fix regex on test.	2020-07-22 08:48:14 +03:00
Julius Goryavsky	956f21c3b0	Merge remote-tracking branch 'origin/bb-10.4-MDEV-21910' into 10.4	2020-07-16 13:03:29 +02:00
Julius Goryavsky	df1846aeea	Merge branch '10.4-MDEV-18838' of https://github.com/codership/mariadb-server into 10.4-MDEV-22966	2020-07-14 09:36:38 +02:00
Julius Goryavsky	1bf863a91a	Merge branch '10.4-MDEV-22222' of https://github.com/codership/mariadb-server into 10.4-MDEV-22222	2020-07-03 16:17:59 +02:00
MikkoJaakola	7b8319f3f1	MDEV-22966 - Hang on galera_toi_truncate test case galera_toi_truncate test launches a long term INSERT statement in node 2, and then submits an offending TRUNCATE through node 1. The idea is that the replicated TRUNCATE will conflict with INSERT in node 2, and force the INSERT to abort. The test first issues --send INSERT in node 2, and then switches to node 1 to launch --send TRUNCATE. As the INSERT is launched asynchronously by --send, it may happen that INSERT has not yet started to process, before the TRUNCATE is replicated. The net effect may be that TRUNCATE processes to completion in node 2, and only after that INSERT starts to execute. As the INSERT is a very long query, it will last longer than mtr test suite max test time, and the test will fail for timeout. The fix in this commit uses another connection in node 2, to wait until the INSERT has started to process in node 2. TRUNCATE will be submitted in node 1 only after this wait condition.	2020-07-02 10:59:00 +03:00
Marko Mäkelä	f347b3e0e6	Merge 10.3 into 10.4	2020-07-02 07:39:33 +03:00
Marko Mäkelä	1df1a63924	Merge 10.2 into 10.3	2020-07-02 06:17:51 +03:00
Marko Mäkelä	ea2bc974dc	Merge 10.1 into 10.2	2020-07-01 12:03:55 +03:00
Julius Goryavsky	8e8f9671cb	MDEV-21773: added missing include file to mtr tests	2020-06-30 14:03:22 +02:00
mkaruza	2b8b7394a1	MDEV-22222: Assertion `state() == s_executing || state() == s_preparing || state() == s_prepared || state() == s_must_abort || state() == s_aborting || state() == s_cert_failed || state() == s_must_replay' failed in wsrep::transaction::before_rollback LOCK TABLE will do implicit commit, we need to properly handle transaction after commit.	2020-06-28 23:07:41 +02:00
sjaakola	5a7794d3a8	MDEV-21910 Deadlock between BF abort and manual KILL command When high priority replication slave applier encounters lock conflict in innodb, it will force the conflicting lock holder transaction (victim) to rollback. This is a must in multi-master synchronous replication model to avoid cluster lock-up. This high priority victim abort (aka "brute force" (BF) abort), is started from innodb lock manager while holding the victim's transaction's (trx) mutex. Depending on the execution state of the victim transaction, it may happen that the BF abort will call for THD::awake() to wake up the victim transaction for the rollback. Now, if BF abort requires THD::awake() to be called, then the applier thread executed locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data. If, at the same time another DBMS super user issues KILL command to abort the same victim, it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex. These two locking protocols acquire mutexes in opposite order, hence unresolvable mutex locking deadlock may occur. The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim. This flag is set both when BF is called for from innodb and by KILL command. Either path of victim killing will bail out if victim's wsrep_killed is already set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter records the aborter THD's ID. This is needed to preserve the right to kill the victim from different locations for the same aborter thread. It is also good error logging, to see who is responsible for the abort. A new test case was added in galera.galera_bf_kill_debug.test for scenario where wsrep applier thread and manual KILL command try to kill same idle victim	2020-06-26 09:56:23 +03:00
Julius Goryavsky	141b390d82	Merge branch '10.4-MDEV-22729-2' into 10.4	2020-06-25 13:06:51 +02:00
Jan Lindström	bffa8264aa	Stabilize galera_var_cluster_conf_id test case.	2020-06-24 17:16:38 +03:00
sjaakola	33de71c2f8	MDEV-22632 wsrep XID checkpointing can happen out of order for certification failure When a transaction fails in certification phase, it has consumed one GTID, but as transaction must rollback, it will not go for commit ordering, and because of this also the wsrep XID checkpointing can happen out of order. This PR will make the thread, which has failed for certification failure to wait for its commit order turn for checkpointing wsrep XID in innodb rollback segment. There is a specific test for wsrep XID checkpointing ordering in mtr test: mysql-wsrep-bugs-607, which is added in this PR. Test galera_slave_replay depends also on this fix, as the second test phase may also assert for bad wsrep XID checkpointing order. galera_slave_replay.test had also other problems, which caused the test to fail immediately, these are now fixed in this PR as well.	2020-06-24 17:16:38 +03:00
Jan Lindström	9fb8d87d2d	Test fixes.	2020-06-24 09:38:54 +03:00
Julius Goryavsky	7bd11fb46f	MDEV-22729: additional changes after merge	2020-06-23 12:56:08 +02:00
Jan Lindström	eba9189777	Test case cleanups.	2020-06-23 07:46:35 +03:00
MikkoJaakola	51c8289ed6	MDEV-21759 galera.galera_parallel_autoinc_manytrx sporadic failures. The galera.galera_parallel_autoinc_manytrx mtr test opens and runs test scenario through 3 connections to node 1 and one connection to node 2. In the test initialization phase, the test creates two tables 't1' and 'ten' and then creates a stored procedure 'p1' to operate on these tables. These 3 create DDL statements are issued through same connection to node 1. In the next test phase, the mtr script uses send command to launch the call for the p1 stored procedure through all 3 connections to node 1 and through one connection to node 2. As the mtr send command is asynchronous, this test phase is non blocking and fast operation. Now, if the replication between nodes is slow, it may happen that the initialization phase DDL statements have not been received or have not been fully applied in node 2. Therefore there is no guarantee that the test tables and the stored procedure have been created in node 2. Yet, the test is trying to call p1 in node 2. In the failure case error logs, there is error message "MTR failed: query 'reap' failed: 1305: PROCEDURE test.p1 does not exist" The reap command through connection to node 2, is the first place where test execution may observe that test tables and/or stored procedure are not yet created in node 2. The fix in this commit adds a wait condition in connection to node 2, to wait until the stored procedure is created before calling the stored procedure. The wait is implemented by looking in information_schema.routines for the p1 stored procedure.	2020-06-23 07:46:35 +03:00
Jan Lindström	5d7e067cce	MDEV-22125 : galera.galera_drop_multi MTR failed: InnoDB: MySQL is trying to drop database `fts`.`` though there are still open handles MDEV-22140 galera.galera_drop_database MTR failed: InnoDB: MySQL is trying to drop database `fts`.`` though there are still open handles Add wait conditions to wait that all operations are done in both nodes.	2020-06-23 07:46:35 +03:00
Jan Lindström	319886eca7	MDEV-20928 : Galera test failure on galera.galera_var_innodb_disallow_writes: Result length mismatch Add wait_conditions to force desired execution.	2020-06-23 07:46:35 +03:00
Jan Lindström	b80b52394d	Test case cleanups.	2020-06-22 13:25:25 +03:00
Julius Goryavsky	4b4e77db64	Merge branch '10.4-MDEV-22729' of https://github.com/codership/mariadb-server into 10.4-MDEV-22729-2	2020-06-19 18:01:15 +02:00
MikkoJaakola	0128e13e62	MDEV-21759 galera.galera_parallel_autoinc_manytrx sporadic failures. The galera.galera_parallel_autoinc_manytrx mtr test opens and runs test scenario through 3 connections to node 1 and one connection to node 2. In the test initialization phase, the test creates two tables 't1' and 'ten' and then creates a stored procedure 'p1' to operate on these tables. These 3 create DDL statements are issued through same connection to node 1. In the next test phase, the mtr script uses send command to launch the call for the p1 stored procedure through all 3 connections to node 1 and through one connection to node 2. As the mtr send command is asynchronous, this test phase is non blocking and fast operation. Now, if the replication between nodes is slow, it may happen that the initialization phase DDL statements have not been received or have not been fully applied in node 2. Therefore there is no guarantee that the test tables and the stored procedure have been created in node 2. Yet, the test is trying to call p1 in node 2. In the failure case error logs, there is error message "MTR failed: query 'reap' failed: 1305: PROCEDURE test.p1 does not exist" The reap command through connection to node 2, is the first place where test execution may observe that test tables and/or stored procedure are not yet created in node 2. The fix in this commit adds a wait condition in connection to node 2, to wait until the stored procedure is created before calling the stored procedure. The wait is implemented by looking in information_schema.routines for the p1 stored procedure.	2020-06-16 11:43:31 +03:00
Jan Lindström	7710f28eec	Add missing include as test requires galera debug library	2020-06-15 09:29:17 +03:00
Marko Mäkelä	b3e395a13e	Merge 10.2 into 10.3	2020-06-06 18:50:25 +03:00
Julius Goryavsky	5f55f69e4a	Merge 10.1 into 10.2	2020-06-05 18:32:37 +02:00
Julius Goryavsky	3f019d1771	Added missing include files to check for debug_sync	2020-06-03 15:34:44 +02:00
sjaakola	8ec0e9111a	MDEV-22763 backporting MDEV-20225 fix into 10.1 Backported the support for aborting and replaying stored procedure and fix for trigger key assignments from 10.4 version. Backported also two mtr tests: wsrep_sp_bf_abort and MDEV-20225	2020-06-03 15:34:44 +02:00
sjaakola	ccec6b887b	MDEV-22729 fixes for galera.galera_slave_replay test The test was changing wsrep_on option in node_3, which is native MariaDB server (i.e. not a cluster node). Native MariaDB server should not manipulate wsrep replication state, this problem is fixed. galera.galera_slave_replay test phase 2 will cause certification failure for async slave SQL handler thread. This certification failure is now monitored and required to happen in the test. The test phase 2, generates scenario, where async slave SQL handler faces certification failure and galera slave applier is paused when this happens. This makes the test vulnerable for anomaly described in MDEV-22632. Therefore the fix in this commit depends on MDEV-22632, and should be merged after the fix for MDEV-22632.	2020-05-27 21:21:24 +03:00
Julius Goryavsky	e04999c460	Forgotten include files were added to check the necessary conditions for running the test	2020-05-26 14:01:13 +02:00
sjaakola	1af6e92f0b	MDEV-22666 galera.MW-328A hang The hang can happen between a lock connection issuing KILL CONNECTION for a victim, which is in committing phase. There happens a two-resource deadlock where the killer is holding victim's LOCK_thd_data and requires trx mutex for the victim. The victim, otoh, holds his own trx mutex, but requires LOCK_thd_data in wsrep_commit_ordered(). Hence a classic two thread deadlock happens. The fix in this commit changes innodb commit so that wsrep_commit_ordered() is not called while holding trx mutex. With this, wsrep patch commit time mutex locking does not violate the locking protocol of KILL command (i.e. LOCK_thd_data -> trx mutex) Also, a new test case has been added in galera.galera_bf_kill.test for scenario where a client connection is killed in committing phase.	2020-05-25 19:30:23 +03:00
Marko Mäkelä	d8dc3c72b6	Merge 10.3 into 10.4	2020-05-20 12:25:23 +03:00
Marko Mäkelä	f4f0ef3e37	Merge 10.2 into 10.3	2020-05-20 11:41:51 +03:00
Jan Lindström	ad0f85bcd2	MDEV-18838 : galera.galera_toi_truncate: Test failure: mysqltest: query 'reap' succeeded - should have failed with errno 1213 Test cleanup.	2020-05-20 09:34:50 +03:00
Jan Lindström	fde94b4cd6	MDEV-21483 : Galera MTR tests failed: galera.MW-328A galera.MW-328B Enable tests with additional galera output to find out actual reason for test failures.	2020-05-18 14:21:12 +03:00
Jan Lindström	523d67a272	MDEV-22494 : Galera assertion lock_sys.mutex.is_owned() at lock_trx_handle_wait_low Problem was that trx->lock.was_chosen_as_wsrep_victim variable was not set back to false after it was set true. wsrep_thd_bf_abort Add assertions for correct mutex status and take necessary mutexes before calling thd->awake_no_mutex(). innobase_rollback_trx() Reset trx->lock.was_chosen_as_wsrep_victim wsrep_abort_slave_trx() Removed unused function. wsrep_innobase_kill_one_trx() Added function comment, removed unnecessary parameters and added debug assertions to enforce correct usage. Added more debug output to help out on error analysis. wsrep_abort_transaction() Added debug assertions and removed unused variables. trx0trx.h Removed assert_trx_is_free macro and replaced it with assert_freed() member function. trx_create() Use above assert_free() and initialize wsrep variables. trx_free() Use assert_free() trx_t::commit_in_memory() Reset lock.was_chosen_as_wsrep_victim trx_rollback_for_mysql() Reset trx->lock.was_chosen_as_wsrep_victim Add test case galera_bf_kill	2020-05-15 09:04:02 +03:00
Marko Mäkelä	38f6c47f8a	Merge 10.3 into 10.4	2020-05-13 12:52:57 +03:00
Marko Mäkelä	15fa70b840	Merge 10.2 into 10.3	2020-05-13 11:45:05 +03:00
Jan Lindström	748fb55093	MDEV-21483 : Galera MTR tests failed: galera.MW-328A galera.MW-328B Enable tests with additional galera output to find out actual reason for test failures.	2020-05-08 11:35:15 +03:00
Jan Lindström	a878344ee5	MDEV-21421 : Galera test sporadic failure on galera.galera_as_slave_gtid_myisam: Result length mismatch Add wait_condition so that drop table has time to replicate to Galera cluster.	2020-05-08 09:16:37 +03:00
Jan Lindström	40d0b64167	MDEV-21421 : Galera test sporadic failure on galera.galera_as_slave_gtid_myisam: Result length mismatch Add wait_condition so that drop table has time to replicate to Galera cluster.	2020-05-08 09:13:47 +03:00
Jan Lindström	057a700a2a	MDEV-22466 : Galera missing .test or .result files Add missing .test and .result files.	2020-05-07 14:23:33 +03:00
Jan Lindström	e6301d8f67	MDEV-21515 : Galera test sporadic failure on galera.galera_wsrep_new_cluster: Result content mismatch Test starts two servers and we do not know order they really start, thus wsrep_local_index can be 1 or 2.	2020-05-06 17:32:08 +03:00
Marko Mäkelä	2c3c851d2c	Merge 10.3 into 10.4	2020-05-05 20:33:10 +03:00
Jan Lindström	37a01aceca	MDEV-21489 : wsrep_cluster_conf_id has wrong value Do not show exact value as it depends on the order of test execution. Instead use # for correct values and ERROR for incorrect.	2020-05-05 09:48:03 +03:00
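
Commit 95132ade6d above describes its data directory checksum as an md5sum over every file found by find, with the per-file hashes then hashed once more. The following is a minimal Python sketch of that idea, not the test's actual shell code. The function name datadir_checksum, the sorted traversal order, and the "digest plus relative path" listing format are illustrative assumptions and do not come from the commit.

```python
import hashlib
import os


def datadir_checksum(root):
    """Return one MD5 hex digest summarising every regular file under root.

    Hypothetical helper: mirrors the "hash each file, then hash the list of
    hashes" scheme described in the commit message, nothing more.
    """
    per_file = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(path, "rb") as fh:
                # Read in chunks so large data files are not loaded at once.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            rel = os.path.relpath(path, root)
            per_file.append(f"{digest.hexdigest()}  {rel}\n")
    # Second-level hash over all per-file hashes.
    combined = hashlib.md5("".join(per_file).encode("utf-8"))
    return combined.hexdigest()
```

A test following the commit's logic would call such a helper once after blocking writes and once before releasing the block, then compare the two digests; equal digests suggest nothing under the directory changed in between.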

733 commits