mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-28 01:34:17 +01:00

Author	SHA1	Message	Date
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Sergei Golubchik	98a39b0c91	Merge branch '10.4' into 10.5	2023-12-02 01:02:50 +01:00
Monty	2447172afb	Ensure that process "State" is properly cleaned after query execution In some cases "SHOW PROCESSLIST" could show "Reset for next command" as State, even if the previous query had finished properly. Fixed by clearing State after end of command and also setting the State for the "Connect" command. Other things: - Changed usage of 'thd->set_command(COM_SLEEP)' to 'thd->mark_connection_idle()'. - Changed thread_state_info() to return "" instead of NULL. This is just a safety measurement and in line with the logic of the rest of the function.	2023-11-07 10:07:30 +02:00
Marko Mäkelä	3fee1b4471	Merge 10.5 into 10.6	2023-08-15 11:21:34 +03:00
Marko Mäkelä	599c4d9a40	Merge 10.4 into 10.5	2023-08-15 11:10:27 +03:00
Jan Lindström	277968aa4c	MDEV-31413 : Node has been dropped from the cluster on Startup / Shutdown with async replica There was two related problems: (1) Galera node that is defined as a slave to async MariaDB master at restart might do SST (state stransfer) and part of that it will copy mysql.gtid_slave_pos table. Problem is that updates on that table are not replicated on a cluster. Therefore, table from donor that is not slave is copied and joiner looses gtid position it was and start executing events from wrong position of the binlog. This incorrect position could break replication and causes node to be dropped and requiring user action. (2) Slave sql thread might start executing events before galera is ready (wsrep_ready=ON) and that could also cause node to be dropped from the cluster. In this fix we enable replication of mysql.gtid_slave_pos table on a cluster. In this way all nodes in a cluster will know gtid slave position and even after SST joiner knows correct gtid position to start. Furthermore, we wait galera to be ready before slave sql thread executes any events to prevent too early execution. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-08-08 03:25:56 +02:00
Teemu Ollakka	f307160218	MDEV-29293 MariaDB stuck on starting commit state This commit contains a merge from 10.5-MDEV-29293-squash into 10.6. Although the bug MDEV-29293 was not reproducible with 10.6, the fix contains several improvements for wsrep KILL query and BF abort handling, and addresses the following issues: * MDEV-30307 KILL command issued inside a transaction is problematic for galera replication: This commit will remove KILL TOI replication, so Galera side transaction context is not lost during KILL. * MDEV-21075 KILL QUERY maintains nodes data consistency but breaks GTID sequence: This is fixed as well as KILL does not use TOI, and thus does not change GTID state. * MDEV-30372 Assertion in wsrep-lib state: This was caused by BF abort or KILL when local transaction was in the middle of group commit. This commit disables THD::killed handling during commit, so the problem is avoided. * MDEV-30963 Assertion failure !lock.was_chosen_as_deadlock_victim in trx0trx.h:1065: The assertion happened when the victim was BF aborted via MDL while it was committing. This commit changes MDL BF aborts so that transactions which are committing cannot be BF aborted via MDL. The RQG grammar attached in the issue could not reproduce the crash anymore. Original commit message from 10.5 fix: MDEV-29293 MariaDB stuck on starting commit state The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed in order to avoid KILL to interrupt commit processing. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for next connetion. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Make galera_var_retry_autocommit result more readable by echoing cases and expectations into result. Only one expected result for reap to verify that server returns expected status for query. * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_bf_abort_registering to check that registering trx gets BF aborted through MDL. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:42:05 +02:00
Teemu Ollakka	3f59bbeeae	MDEV-29293 MariaDB stuck on starting commit state The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed in order to avoid KILL to interrupt commit processing. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for next connetion. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:39:43 +02:00
Teemu Ollakka	6966d7fe4b	MDEV-29293 MariaDB stuck on starting commit state This is a backport from 10.5. The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed in order to avoid KILL to interrupt commit processing. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for next connetion. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:33:37 +02:00
Marko Mäkelä	829e8111c7	Merge 10.5 into 10.6	2022-09-26 14:34:43 +03:00
Marko Mäkelä	6286a05d80	Merge 10.4 into 10.5	2022-09-26 13:34:38 +03:00
Marko Mäkelä	3c92050d1c	Fix build without either ENABLED_DEBUG_SYNC or DBUG_OFF There are separate flags DBUG_OFF for disabling the DBUG facility and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility. Let us allow debug builds without DEBUG_SYNC. Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to define ENABLED_DEBUG_SYNC.	2022-09-23 17:37:52 +03:00
Jan Lindström	d5bc05798f	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 20:38:11 +02:00
Jan Lindström	aa7ca987db	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 09:52:40 +03:00
Vicențiu Ciorbaru	7c33ecb665	Merge remote-tracking branch 'upstream/10.4' into 10.5	2021-09-10 17:16:18 +03:00
Daniele Sciascia	c3707691c2	MDEV-25718 Assertion `transaction.is_streaming()' failed * Update wsrep-lib which contains the fix * Add deterministic test case that reproduces the assertion Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-09-06 08:16:06 +03:00
Sergei Golubchik	25d9d2e37f	Merge branch 'bb-10.4-release' into bb-10.5-release	2021-02-15 16:43:15 +01:00
Sergei Golubchik	eac8341df4	MDEV-23328 Server hang due to Galera lock conflict resolution adaptation of `29bbcac0ee` for 10.4	2021-02-12 18:17:06 +01:00
Marko Mäkelä	8de233af81	Merge 10.4 into 10.5	2021-01-11 16:29:51 +02:00
Alexey Yurchenko	033f8d13ce	Update wsrep-lib (new logger interface) Ensure consistent use of logging macros in wsrep-related code Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-01-07 17:41:21 +02:00
Marko Mäkelä	6a1e655cb0	Merge 10.4 into 10.5	2020-12-02 18:29:49 +02:00
Marko Mäkelä	24ec8eaf66	MDEV-15532 after-merge fixes from Monty The Galera tests were massively failing with debug assertions.	2020-12-02 16:16:29 +02:00
Marko Mäkelä	1813d92d0c	Merge 10.4 into 10.5	2020-07-02 09:41:44 +03:00
sjaakola	33de71c2f8	MDEV-22632 wsrep XID checkpointing can happen out of order for certification failure When a transaction fails in certification phase, it has connsumed one GTID, but as transaction must rollback, it will not go for commit ordering, and because of this also the wsrep XID checkpointing can happen out of order. This PR will make the thread, which has failed for certiication failure to wait for its commit order turn for checkpointing wsrep IXD in innodb rollback segment. There is a specific test for wsrep XID checkpointing ordering in mtr test: mysql-wsrep-bugs-607, which is added in this PR. Test galera_slave_replay depends also on this fix, as the second test phase may also assert for bad wsrep XID checkpointing order. galera_slave_replay.test had also other problems, which caused the test to fail immediately, thse are now fixes in this PR as well.	2020-06-24 17:16:38 +03:00
Marko Mäkelä	4337a3b5f9	Merge 10.4 into 10.5	2020-05-04 18:43:00 +03:00
Jan Lindström	d002ec2c87	MDEV-21489 : wsrep_cluster_conf_id has wrong value Variable was not updated after initialization.	2020-05-04 11:06:40 +03:00
mkaruza	41bc736871	Galera GTID support Support for galera GTID consistency thru cluster. All nodes in cluster should have same GTID for replicated events which are originating from cluster. Cluster originating commands need to contain sequential WSREP GTID seqno Ignore manual setting of gtid_seq_no=X. In master-slave scenario where master is non galera node replicated GTID is replicated and is preserved in all nodes. To have this - domain_id, server_id and seqnos should be same on all nodes. Node which bootstraps the cluster, to achieve this, sends domain_id and server_id to other nodes and this combination is used to write GTID for events that are replicated inside cluster. Cluster nodes that are executing non replicated events are going to have different GTID than replicated ones, difference will be visible in domain part of gtid. With wsrep_gtid_domain_id you can set domain_id for WSREP cluster. Functions WSREP_LAST_WRITTEN_GTID, WSREP_LAST_SEEN_GTID and WSREP_SYNC_WAIT_UPTO_GTID now works with "native" GTID format. Fixed galera tests to reflect this chances. Add variable to manually update WSREP GTID seqno in cluster Add variable to manipulate and change WSREP GTID seqno. Next command originating from cluster and on same thread will have set seqno and cluster should change their internal counter to it's value. Behavior is same as using @@gtid_seq_no for non WSREP transaction.	2020-01-29 15:06:06 +02:00
Daniele Sciascia	2d4b6571ec	Wsrep position not updated in InnoDB after certification failures (#1432 ) A certification failure followed by a clean shutdown would cause an inconsistency between the sequence number stored in innodb and the sequence number stored in provider. This happened both in the case of local certification failure, and in the case where dummy writeset is applied. The fix consists of: - updating wsrep position after dummy writeset is delivered in `Wsrep_high_priority_service::log_dummy_write_set()` - updating wsrep position while releasing commit order in wsrep-lib side Added two tests which stress the situation where a server is shutdown after a certification failure.	2020-01-14 07:33:02 +02:00
Daniele Sciascia	72a5a4f1d5	MDEV-20780 Fixes for failures on galera_sr_ddl_master (#1425 ) Test galera_sr_ddl_master would sometimes fail due to leftover streaming replication fragments. Rollbacker thread would attempt to open streaming_log table to remove the fragments, but would fail in check_stack_overrun(). Ultimately the check_stack_overrun() failure was caused by rollbacker missing to switch the victim's THD thread stack to rollbacker's thread stack. Also in this patch: - Remove duplicate functionality in rollbacker helper functions, and extract rollbacker fragment removal into function wsrep_remove_streaming_fragments() - Reuse open_for_write() in wsrep_schema::remove_fragments - Partially revert changes to galera_sr_ddl_master test from commit `44a11a7c08`. Removed unnecessary wait condition and isolation level setting	2019-12-11 14:08:06 +02:00
Teemu Ollakka	9487e0b259	MDEV-19826 10.4 seems to crash with "pool-of-threads" (#1370 ) MariaDB 10.4 was crashing when thread-handling was set to pool-of-threads and wsrep was enabled. There were two apparent reasons for the crash: - Connection handling in threadpool_common.cc was missing calls to control wsrep client state. - Thread specific storage which contains thread variables (THR_KEY_mysys) was not handled appropriately by wsrep patch when pool-of-threads was configured. This patch addresses the above issues in the following way: - Wsrep client state open/close was moved in thd_prepare_connection() and end_connection() to have common handling for one-thread-per-connection and pool-of-threads. - Thread local storage handling in wsrep patch was reworked by introducing set of wsrep_xxx_threadvars() calls which replace calls to THD store_globals()/reset_globals() and deal with thread handling specifics internally. Wsrep-lib was updated to version which relaxes internal concurrency related sanity checks. Rollback code from wsrep_rollback_process() was extracted to separate calls for better readability. Post rollback thread was removed as it was completely unused.	2019-08-30 08:42:24 +03:00
Marko Mäkelä	d3dcec5d65	Merge 10.3 into 10.4	2019-05-05 15:06:44 +03:00
Teemu Ollakka	b4c75f685b	MDEV-18480 Backwards compatibility in log_view() Galera versions below 4.x do not generate unique sequence number for view events. Take this into account when writing the SE checkpoint to avoid debug assertion in InnoDB.	2019-02-18 08:26:20 +02:00
Brave Galera Crew	36a2a185fe	Galera4	2019-01-23 15:30:00 +04:00

33 commits