mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 12:02:42 +01:00

Author	SHA1	Message	Date
Oleksandr Byelkin	99b370e023	Merge branch '11.2' into 11.4	2024-05-21 19:38:51 +02:00
Julius Goryavsky	b88c20ce1b	Merge branch 10.4 into 10.5	2024-05-06 13:55:42 +02:00
mkaruza	136358036d	MDEV-18590: galera.versioning_trx_id: Test failure: mysqltest: Result content mismatch Replicated events have time associated with them from originating node which will be used for commit timestamp. Associated time can be set in past before event is even applied. For WSREP replication we don't need to use time information from event. Addressed review comments: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-04-27 18:40:58 +02:00
Sergei Golubchik	c154aafe1a	Merge remote-tracking branch '11.3' into 11.4	2023-12-21 15:40:55 +01:00
Marko Mäkelä	4ae105a37d	Merge 10.4 into 10.5	2023-12-18 08:59:07 +02:00
Daniele Sciascia	61daac54d6	MDEV-27806 GTIDs diverge in Galera cluster after CTAS Add OPTION_GTID_BEGIN to applying side thread. This is needed to avoid intermediate commits when CREATE TABLE AS SELECT is applied, causing one more GTID to be consumed with respect to executing node. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-12-12 05:55:34 +01:00
Jan Lindström	99a3fe54d9	MDEV-31273: Precompute binlog checksums After `b8f9f796` Format_description_log_event constructor requires binlog checksum algorithm or BINLOG_CHECKSUM_ALG_OFF for Galera also. Thanks to Kristian Nielsen <knielsen@knielsen-hq.org> for pointing a fix.	2023-11-16 08:04:42 +02:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Sergei Golubchik	b00fd50fd8	MDEV-32541 Assertion `!(thd->server_status & (1U \| 8192U))' failed in MDL_context::release_transactional_locks SERVER_STATUS_IN_TRANS_READONLY should never be set without SERVER_STATUS_IN_TRANS. They're set together, must be removed together.	2023-10-23 17:40:03 +02:00
Teemu Ollakka	f307160218	MDEV-29293 MariaDB stuck on starting commit state This commit contains a merge from 10.5-MDEV-29293-squash into 10.6. Although the bug MDEV-29293 was not reproducible with 10.6, the fix contains several improvements for wsrep KILL query and BF abort handling, and addresses the following issues: * MDEV-30307 KILL command issued inside a transaction is problematic for galera replication: This commit will remove KILL TOI replication, so Galera side transaction context is not lost during KILL. * MDEV-21075 KILL QUERY maintains nodes data consistency but breaks GTID sequence: This is fixed as well as KILL does not use TOI, and thus does not change GTID state. * MDEV-30372 Assertion in wsrep-lib state: This was caused by BF abort or KILL when local transaction was in the middle of group commit. This commit disables THD::killed handling during commit, so the problem is avoided. * MDEV-30963 Assertion failure !lock.was_chosen_as_deadlock_victim in trx0trx.h:1065: The assertion happened when the victim was BF aborted via MDL while it was committing. This commit changes MDL BF aborts so that transactions which are committing cannot be BF aborted via MDL. The RQG grammar attached in the issue could not reproduce the crash anymore. Original commit message from 10.5 fix: MDEV-29293 MariaDB stuck on starting commit state The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed in order to avoid KILL to interrupt commit processing. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for next connetion. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Make galera_var_retry_autocommit result more readable by echoing cases and expectations into result. Only one expected result for reap to verify that server returns expected status for query. * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_bf_abort_registering to check that registering trx gets BF aborted through MDL. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:42:05 +02:00
Teemu Ollakka	3f59bbeeae	MDEV-29293 MariaDB stuck on starting commit state The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed in order to avoid KILL to interrupt commit processing. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for next connetion. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:39:43 +02:00
Teemu Ollakka	6966d7fe4b	MDEV-29293 MariaDB stuck on starting commit state This is a backport from 10.5. The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed in order to avoid KILL to interrupt commit processing. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for next connetion. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:33:37 +02:00
Oleksandr Byelkin	1d74927c58	Merge branch '10.4' into 10.5	2023-04-24 12:43:47 +02:00
Daniele Sciascia	feeeacc4d7	MDEV-30955 Explicit locks released too early in rollback path Assertion `thd->mdl_context.is_lock_owner()` fires when a client is disconnected, while transaction and and a table is opened through `HANDLER` interface. Reason for the assertion is that when a connection closes, its ongoing transaction is eventually rolled back in `Wsrep_client_state::bf_rollback()`. This method also releases explicit which are expected to survive beyond the transaction lifetime. This patch also removes calls to `mysql_ull_cleanup()`. User level locks are not supported in combination with Galera, making these calls unnecessary.	2023-04-18 13:57:59 +02:00
Oleksandr Byelkin	ac5a534a4c	Merge remote-tracking branch '10.4' into 10.5	2023-03-31 21:32:41 +02:00
Teemu Ollakka	cadc3efcdd	MDEV-27317 wsrep_checkpoint order violation due to certification failure With binlogs enabled, debug assertion ut_ad(xid_seqno > wsrep_seqno) fired in trx_rseg_update_wsrep_checkpoint() when an applier thread synced the seqno out of order for write set which had failed certification. This was caused by releasing commit order too early when binlogs were on, allowing group commit to run in parallel and commit following transactions too early. Fixed by extending the commit order critical section to cover call to wsrep_set_SE_checkpoint() also when binlogs are on. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-03-31 12:41:24 +02:00
Jan Lindström	015fb54d45	MDEV-25037 : SIGSEGV in MDL_lock::hog_lock_types_bitmap We should not call mdl_context.release_explicit_locks() in Wsrep_client_service::bf_rollback() if client is quiting because it will be done again in THD::cleanup(). Note that problem with GET_LOCK() / RELEASE_LOCK() will be fixed on MDEV-30473.	2023-01-27 08:38:27 +02:00
Jan Lindström	179c283372	Merge branch 10.4 into 10.5	2023-01-14 08:25:57 +02:00
sjaakola	cd97523dcf	MDEV-30317 Transaction savepoint may cause failure in galera replaying Created mtr test for reproducing the crash Developed actual fix for the issue. Setting THD::system_thread_info.rpl_sql_info for replayer thread, same way as it is handled for appliers. Recorded test result, with the fix Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2023-01-13 13:11:03 +02:00
Marko Mäkelä	6286a05d80	Merge 10.4 into 10.5	2022-09-26 13:34:38 +03:00
Marko Mäkelä	3c92050d1c	Fix build without either ENABLED_DEBUG_SYNC or DBUG_OFF There are separate flags DBUG_OFF for disabling the DBUG facility and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility. Let us allow debug builds without DEBUG_SYNC. Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to define ENABLED_DEBUG_SYNC.	2022-09-23 17:37:52 +03:00
Jan Lindström	ba987a46c9	Merge 10.4 into 10.5	2022-09-05 13:28:56 +03:00
Daniele Sciascia	2917bd0d2c	Reduce compilation dependencies on wsrep_mysqld.h Making changes to wsrep_mysqld.h causes large parts of server code to be recompiled. The reason is that wsrep_mysqld.h is included by sql_class.h, even tough very little of wsrep_mysqld.h is needed in sql_class.h. This commit introduces a new header file, wsrep_on.h, which is meant to be included from sql_class.h, and contains only macros and variable declarations used to determine whether wsrep is enabled. Also, header wsrep.h should only contain definitions that are also used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and WSREP_SYNC_WAIT macros to wsrep_mysqld.h. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-08-31 11:05:23 +03:00
Marko Mäkelä	d62b0368ca	Merge 10.4 into 10.5	2022-03-29 12:59:18 +03:00
mkaruza	507030c492	MDEV-27713 Crash after a conflict of applier thread with stored procedure call by event scheduler When thread is BF aborted by high priority service, ULL (user level locks need to be removed and released). Calling directly release of lock for MDL_EXPLICIT type doesn't clear also `thd->ull_hash`. Method `mysql_ull_cleanup` will properly clear all information about ULL locks for thread. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-03-18 08:30:26 +02:00
Marko Mäkelä	87ff4ba7c8	Merge 10.4 into 10.5	2021-08-26 08:46:57 +03:00
Marko Mäkelä	15b691b7bd	After-merge fix `f84e28c119` In a rebase of the merge, two preceding commits were accidentally reverted: commit `112b23969a` (MDEV-26308) commit `ac2857a5fb` (MDEV-25717) Thanks to Daniele Sciascia for noticing this.	2021-08-25 17:35:44 +03:00
Marko Mäkelä	f84e28c119	Merge 10.3 into 10.4	2021-08-18 16:51:52 +03:00
Daniele Sciascia	ac2857a5fb	MDEV-25717 Assertion `owning_thread_id_ == wsrep::this_thread::get_id()' A test case to reproduce the issue. The actual fix is in galera library. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-08-18 12:28:11 +03:00
Leandro Pacheco	112b23969a	MDEV-26308 : Galera test failure on galera.galera_split_brain Contains following fixes: * allow TOI commands to timeout while trying to acquire TOI with override lock_wait_timeout with a LONG_TIMEOUT only after succesfully entering TOI * only ignore lock_wait_timeout on TOI * fix galera_split_brain test as TOI operation now returns ER_LOCK_WAIT_TIMEOUT after lock_wait_timeout * explicitly test for TOI Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-08-18 08:57:33 +03:00
Marko Mäkelä	190a8312f5	Merge 10.4 into 10.5	2021-03-18 15:07:01 +02:00
Jan Lindström	5511c21b41	MDEV-23851 BF-BF Conflict issue because of UK GAP locks Add missing dbug sync point and correct the test case.	2021-03-17 08:35:38 +02:00
Oleksandr Byelkin	02e7bff882	Merge commit '10.4' into 10.5	2021-01-06 10:53:00 +01:00
sjaakola	41a961d85c	MDEV-24327 wsrep XID checkpointing order violation with cert failures Handling of write sets, which fail in certification happens differently than with write sets which pass certification. When certification fails, the write set applying can be skipped and applier needs only to take care of wsrep XID checkpointing. With current implementation, this can rush ahead of wsrep XID checkpointing of successful write sets. The fix in this PR registers wsrep XID checkpointing of certification failure cases in group commit, which guarantees that XID ceckpointing order is synchronized with real committing transactions. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2020-12-17 09:14:49 +02:00
sjaakola	87fa6d2c5c	MDEV-24327 wsrep XID checkpointing order with log_slave_updates=OFF If log_slave_updates==OFF, wsrep applier threads used to be configured with option: thd->variables.option_bits&= ~(OPTION_BIN_LOG); (i.e. like sql_log_bin=ON). And this was regardless of log-bin configuration. With this, having configuration of: --log-bin && --log-slave-updates=OFF, local threads used binlogging, but applier threads did not. And further: local threads went through binlog group commit, while applier threads did direct commits. This resulted in situation, where applier threads entered earlier in wsrep XID checkpointing, and could sync their wsrep XID out of order. Later local thread commit would see that higher seqno was already checkpointed, and fire an assert because of this. As a fix, applier threads are now forced to enable binlogging regardless of log-slave-updates configuration. This PR comes with new mtr test: galera.MDEV-24327, which causes a scenario where applier transaction is applied and committed while earlier local transaction is parked before commit order monitor enter. A buggy mariadb versoin would fail for assertion because of wsrep XID checkpoint order violation.	2020-12-17 08:12:58 +02:00
Marko Mäkelä	6a1e655cb0	Merge 10.4 into 10.5	2020-12-02 18:29:49 +02:00
Marko Mäkelä	24ec8eaf66	MDEV-15532 after-merge fixes from Monty The Galera tests were massively failing with debug assertions.	2020-12-02 16:16:29 +02:00
Marko Mäkelä	bb284e3fdb	Merge 10.4 into 10.5	2020-08-27 09:34:23 +03:00
Jan Lindström	571764c04f	MDEV-23584 : Galera assertion failure "thd->transaction.stmt.is_empty() at transaction.cc:69 If statement still contains changes it must be committed before actual transaction is committed. This assertion was found using randgen and happens only on applier. No repeatable test case found.	2020-08-26 13:34:51 +03:00
Marko Mäkelä	50a11f396a	Merge 10.4 into 10.5	2020-08-01 14:42:51 +03:00
Teemu Ollakka	3d76af0814	MDEV-22998 Free thd->mem_root at applier commit or rollback.	2020-07-24 19:33:45 +03:00
Monty	b3179b7e32	Cleaned up compile failure if DEBUG_SYNC is disabled in DEBUG builds	2020-06-14 19:39:43 +03:00
Julius Goryavsky	3ccd6766d0	Fixed compilation error in DCMAKE_BUILD_TYPE=mysql_release mode when WSREP enabled	2020-06-10 03:51:49 +02:00
Marko Mäkelä	ccc06931c3	Merge 10.4 into 10.5	2020-04-08 10:36:41 +03:00
Daniele Sciascia	bdcecfa22c	MDEV-22021: Galera database could get inconsistent with rollback to savepoint When binlog is disabled, WSREP will not behave correctly when SAVEPOINT ROLLBACK is executed and we will not rollback transaction.	2020-03-31 14:18:21 +03:00
mkaruza	41bc736871	Galera GTID support Support for galera GTID consistency thru cluster. All nodes in cluster should have same GTID for replicated events which are originating from cluster. Cluster originating commands need to contain sequential WSREP GTID seqno Ignore manual setting of gtid_seq_no=X. In master-slave scenario where master is non galera node replicated GTID is replicated and is preserved in all nodes. To have this - domain_id, server_id and seqnos should be same on all nodes. Node which bootstraps the cluster, to achieve this, sends domain_id and server_id to other nodes and this combination is used to write GTID for events that are replicated inside cluster. Cluster nodes that are executing non replicated events are going to have different GTID than replicated ones, difference will be visible in domain part of gtid. With wsrep_gtid_domain_id you can set domain_id for WSREP cluster. Functions WSREP_LAST_WRITTEN_GTID, WSREP_LAST_SEEN_GTID and WSREP_SYNC_WAIT_UPTO_GTID now works with "native" GTID format. Fixed galera tests to reflect this chances. Add variable to manually update WSREP GTID seqno in cluster Add variable to manipulate and change WSREP GTID seqno. Next command originating from cluster and on same thread will have set seqno and cluster should change their internal counter to it's value. Behavior is same as using @@gtid_seq_no for non WSREP transaction.	2020-01-29 15:06:06 +02:00
Marko Mäkelä	ded128aa9b	Merge 10.4 into 10.5	2020-01-20 16:48:56 +02:00
Daniele Sciascia	2d4b6571ec	Wsrep position not updated in InnoDB after certification failures (#1432 ) A certification failure followed by a clean shutdown would cause an inconsistency between the sequence number stored in innodb and the sequence number stored in provider. This happened both in the case of local certification failure, and in the case where dummy writeset is applied. The fix consists of: - updating wsrep position after dummy writeset is delivered in `Wsrep_high_priority_service::log_dummy_write_set()` - updating wsrep position while releasing commit order in wsrep-lib side Added two tests which stress the situation where a server is shutdown after a certification failure.	2020-01-14 07:33:02 +02:00
Marko Mäkelä	ca8c3be47d	Merge 10.4 into 10.5	2020-01-03 16:15:40 +02:00
Teemu Ollakka	13b3d7f1f1	MDEV-20793 Assertion failed after replay. Assertion failed in wsrep-lib after transaction replay which failed due to conflict in certification. - Implemented reproducible test case MDEV-20793 to reproduce the crash. - Fixed wsrep-lib to deal with certification error during replay.	2019-12-31 11:46:55 +02:00

1 2

67 commits