mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-17 12:32:27 +01:00

Author	SHA1	Message	Date
Sergei Golubchik	e78ce63291	MDEV-17711 Assertion `arena_for_set_stmt== 0' failed in LEX::set_arena_for_set_stmt upon SET STATEMENT restore SET STATEMENT variables between statements in a multi-statement	2023-09-06 22:38:41 +02:00
Daniel Black	fbc157ab33	MDEV-31545 GCC 13 -Wdangling-pointer in execute_show_status() The pointer was used deep in the call path. Resolve this by setting the pointer to NULL at the end of the function. Tested with gcc-13.3.1 (fc38) The warning disable `38fe266ea9` can be reverted in 10.6+ on merge.	2023-07-19 11:20:29 +10:00
Teemu Ollakka	6966d7fe4b	MDEV-29293 MariaDB stuck on starting commit state This is a backport from 10.5. The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data. * Applier has innodb side global lock mutex and victim trx mutex. * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex. * Applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill. The fix in this commit removes the TOI replication of KILL command and makes KILL execution less intrusive operation. Aborting the victim happens now by using awake_no_mutex() and ha_abort_transaction(). If the KILL happens when the transaction is committing, the KILL operation is postponed to happen after the statement has completed in order to avoid KILL interrupting commit processing. Notable changes in this commit: * wsrep client connection's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem arose with test galera.galera_bf_kill. The fix is to reset wsrep client error state, before a THD is reused for next connection. * Release THD locks in wsrep_abort_transaction when locking innodb mutexes. This guarantees same locking order as with applier BF aborting. * BF abort from MDL was changed to do BF abort on server/wsrep-lib side first, and only then do the BF abort on InnoDB side. This removes the need to call back from InnoDB for BF aborts which originate from MDL and simplifies the locking. * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h. The manipulation of the wsrep_aborter can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. 
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. * Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: * Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). * Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. * Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. * A new test galera_kill_group_commit to verify correct behavior when KILL is executed while the transaction is committing. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com> Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-22 00:33:37 +02:00
Jan Lindström	28eaf66e18	MDEV-30388 : Assertion `!wsrep_has_changes(thd) || (thd->lex->sql_command == SQLCOM_CREATE_TABLE && !thd->is_current_stmt_binlog_format_row()) || thd->wsrep_cs().transaction().state() == wsrep::transaction::s_aborted' failed Problem for Galera is the fact that sequences are not really transactional. Sequence operation is committed immediately in sql_sequence.cc and later Galera could find out that we have changes but actual statement is not there anymore. Therefore, we must make some restrictions on what kind of sequences Galera can support. (1) Galera cluster supports only sequences implemented by InnoDB storage engine. This is because Galera replication supports currently only InnoDB. (2) We do not allow LOCK TABLE on sequence object and we do not allow sequence creation under LOCK TABLE, instead lock is released and we issue warning. (3) We allow sequences with NOCACHE definition or with INCREMENT BY 0 CACHE=n definition. This makes sure that sequence values are unique across Galera cluster. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-05-11 14:34:03 +02:00
Daniele Sciascia	feeeacc4d7	MDEV-30955 Explicit locks released too early in rollback path Assertion `thd->mdl_context.is_lock_owner()` fires when a client is disconnected while a transaction is active and a table is opened through the `HANDLER` interface. Reason for the assertion is that when a connection closes, its ongoing transaction is eventually rolled back in `Wsrep_client_state::bf_rollback()`. This method also releases explicit locks, which are expected to survive beyond the transaction lifetime. This patch also removes calls to `mysql_ull_cleanup()`. User level locks are not supported in combination with Galera, making these calls unnecessary.	2023-04-18 13:57:59 +02:00
Igor Babaev	f33fc2fae5	MDEV-30539 EXPLAIN EXTENDED: no message with queries for DML statements EXPLAIN EXTENDED for an UPDATE/DELETE/INSERT/REPLACE statement did not produce the warning containing the text representation of the query obtained after the optimization phase. Such warning was produced for SELECT statements, but not for DML statements. The patch fixes this defect of EXPLAIN EXTENDED for DML statements.	2023-03-25 12:36:59 -07:00
Sergei Golubchik	a777a8a6a3	KILL USER and missing privileges note that `KILL USER foo` should not fail with ER_KILL_DENIED_ERROR when SHOW PROCESSLIST doesn't show connections of that user. Because no connections exist or because the caller has no PROCESS - doesn't matter. also, fix the error message to make sense ("You are not owner of thread <current connection id>" is ridiculous)	2023-02-21 23:22:56 +01:00
Daniel Black	762fe015c1	MDEV-30558: ER_KILL_{,QUERY_}DENIED_ERROR - normalize id type The error string from ER_KILL_QUERY_DENIED_ERROR took a different type to ER_KILL_DENIED_ERROR for the thread id. This shows up in differences on 32 big endian arches like powerpc (Deb notation). Normalize the passing of the THD->id to its real type of my_thread_id, and cast to (long long) on output. As such normalize the ER_KILL_QUERY_DENIED_ERROR to that convention too. Note for upwards merge, convert the type to %lld on new translations of ER_KILL_QUERY_DENIED_ERROR.	2023-02-07 19:28:18 +11:00
Oleksandr Byelkin	a977054ee0	Merge branch '10.3' into 10.4	2023-01-28 18:22:55 +01:00
Igor Babaev	ea270178b0	MDEV-30052 Crash with a query containing nested WINDOW clauses Use SELECT_LEX to save lists for ORDER BY and GROUP BY before parsing WINDOW clauses / specifications. This is needed for proper parsing of a nested WINDOW clause when a WINDOW clause is used in a subquery contained in another WINDOW clause. Fix assignment of empty SQL_I_List to another one (in case of empty list next should point to first).	2023-01-20 09:07:02 +01:00
Alexander Barkov	284ac6f2b7	MDEV-27653 long uniques don't work with unicode collations	2023-01-19 20:33:03 +04:00
Marko Mäkelä	fb0808c450	Merge 10.3 into 10.4	2023-01-03 16:10:02 +02:00
Sergei Golubchik	ca23558a05	--skip-name-resolve=0 didn't work custom code in `case OPT_SKIP_RESOLVE` was overriding the correct value from handle_options().	2023-01-02 00:04:03 +01:00
Marko Mäkelä	f97f6955bd	Merge 10.3 into 10.4	2022-12-14 06:20:04 +02:00
Daniel Black	697dbd15e0	MDEV-21187: log_slow_filter="" logs queries not using indexes Consistent with MDEV-4206, an empty log_slow_filter still means no explicit filtering. Since `21518ab2e4` however the log_queries_not_using_indexes became stored in the same variable. As we need to test for the absence of log_queries_not_using_indexes in the SERVER_QUERY_NO_INDEX_USED part of log_slow_statement, the empty criteria resulted in an always true to log queries not using indexes if log_slow_filter was set to empty. Adjusted the log_slow.test for MDEV-4206 as slow_log_query has been global and session for a while and it was relying on the MDEV-21187 buggy behavior to detect a slow query. Reviewer: Monty	2022-12-14 10:15:32 +11:00
Daniele Sciascia	c2fc5266ad	MDEV-29880 Galera test failure on GCF-336 Fix `wsrep_table_accessible_when_detached()` so that commands that access no tables are rejected while a node is disconnected from a cluster. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-11-29 07:02:35 +02:00
Marko Mäkelä	667d3fbbb5	Merge 10.3 into 10.4	2022-10-25 10:04:37 +03:00
Anel Husakovic	64f822c142	MDEV-28455: CREATE TEMPORARY TABLES privilege is insufficient for SHOW COLUMNS =========== Problem ============= - `show columns` is not working for temporary tables, even though there is enough privilege `create temporary tables`. =========== Solution ============= - Append `TMP_TABLE_ACLS` privilege when running `show columns` for temp tables. - Additionally `check_access()` for database only once, not for each field =========== Additionally ============= - Update comments for function `check_table_access` arguments Reviewed by: <vicentiu@mariadb.org>	2022-10-18 10:25:55 +03:00
Sergei Golubchik	5f26f50020	typo fixed, followup for `3fe55fa8be`	2022-10-07 15:24:02 +02:00
Sergei Golubchik	3fe55fa8be	CREATE ... VALUES ... didn't require INSERT privilege	2022-10-07 14:41:03 +02:00
Sergei Golubchik	d4f6d2f08f	Merge branch '10.3' into 10.4	2022-10-01 23:07:26 +02:00
Anel Husakovic	1f51d6c0f6	MDEV-28548: ER_TABLEACCESS_DENIED_ERROR is missing information about DB - Added missing information about database of corresponding table for various types of commands - Update some typos - Reviewed by: <vicentiu@mariadb.org>	2022-09-30 08:48:57 +02:00
Marko Mäkelä	3c92050d1c	Fix build without either ENABLED_DEBUG_SYNC or DBUG_OFF There are separate flags DBUG_OFF for disabling the DBUG facility and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility. Let us allow debug builds without DEBUG_SYNC. Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to define ENABLED_DEBUG_SYNC.	2022-09-23 17:37:52 +03:00
Oleksandr Byelkin	3bb36e9495	Merge branch '10.3' into 10.4	2022-07-27 11:02:57 +02:00
Sergei Golubchik	bc4098582b	MDEV-29074 GET_BIT variables crash in SET STATEMENT	2022-07-26 14:42:32 +02:00
Oleksandr Byelkin	92a3280998	table_count was present twice in one class of LEX. Remove table_count from Query_tables_list (not used, moved to MYSQL_LOCK). Rename table_count from LEX to avoid mixing it with other counters of tables.	2022-07-14 09:46:06 +02:00
Jan Lindström	4dffa7b5c5	MDEV-28546 : Possible to write/update with read_only=ON and not a SUPER privilege Function wsrep_read_only_option was already removed in commit `d54bc3c0d1` because it could cause race condition on variable opt_readonly so that value OFF can become permanent. Removed function again and added test case. Note that writes to TEMPORARY tables are still allowed when read_only=ON.	2022-05-17 10:28:21 +03:00
Sergei Golubchik	a70a1cf3f4	Merge branch '10.3' into 10.4	2022-05-08 23:03:08 +02:00
Sergei Golubchik	6f741eb6e4	Merge branch '10.2' into 10.3	2022-05-07 11:48:15 +02:00
Sergei Petrunia	ba4927e520	MDEV-19398: Assertion `item1->type() == Item::FIELD_ITEM ... Window Functions code tries to minimize the number of times it needs to sort the select's resultset by finding "compatible" OVER (PARTITION BY ... ORDER BY ...) clauses. This employs compare_order_elements(). That function assumed that the order expressions are Item_field-derived objects (that refer to a temp.table). But this is not always the case: one can construct queries where order expressions are arbitrary item expressions. Add handling for such expressions: sort them according to the window specification they appeared in. This means we cannot detect that two compatible PARTITION BY clauses that use expressions can share the sorting step. But at least we won't crash.	2022-05-04 15:47:45 +03:00
Igor Babaev	39feab3cd3	MDEV-26412 Server crash in Item_field::fix_outer_field for INSERT SELECT IF an INSERT/REPLACE SELECT statement contained an ON expression in the top level select and this expression used a subquery with a column reference that could not be resolved then an attempt to resolve this reference as an outer reference caused a crash of the server. This happened because the outer context field in the Name_resolution_context structure was not set to NULL for such references. Rather it pointed to the first element in the select_stack. Note that starting from 10.4 we cannot use the SELECT_LEX::outer_select() method when parsing a SELECT construct. Approved by Oleksandr Byelkin <sanja@mariadb.com>	2022-04-27 08:23:01 -07:00
Daniele Sciascia	11e5aba792	MDEV-26575 Crash on shutdown after starting an XA transaction Disallow XA when Galera library is loaded. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-04-20 10:41:19 +03:00
Marko Mäkelä	f089f8d95e	MDEV-23328 fixup: sign mismatch in format strings kill_one_thread(): Fix integer sign mismatch in some format strings. Some of this was introduced in commit `5c230b21bf`	2022-04-06 08:59:41 +03:00
Daniele Sciascia	c63eab2c68	MDEV-28055: Galera ps-protocol fixes * Fix test galera.MW-44 to make it work with --ps-protocol * Skip test galera.MW-328C under --ps-protocol This test relies on wsrep_retry_autocommit, which has no effect under ps-protocol. * Return WSREP related errors on COM_STMT_PREPARE commands Change wsrep_command_no_result() to allow sending back errors when a statement is prepared. For example, to handle deadlock error due to BF aborted transaction during prepare. * Add sync waiting before statement prepare When a statement is prepared, tables used in the statement may be opened and checked for existence. Because of that, some tests (for example galera_create_table_as_select) that CREATE a table in one node and then SELECT from the same table in another node may result in errors due to non existing table. To make tests behave similarly under normal and PS protocol, we add a call to sync wait before preparing statements that would sync wait during normal execution. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-03-18 08:30:26 +02:00
sjaakola	97582f1c06	MDEV-27649 PS conflict handling causing node crash Handling BF abort for prepared statement execution so that EXECUTE processing will continue until parameter setup is complete, before BF abort bails out the statement execution. THD class has new boolean member: wsrep_delayed_BF_abort, which is set if BF abort is observed in do_command() right after reading client's packet, and if the client has sent PS execute command. In such case, the deadlock error is not returned immediately back to client, but the PS execution will be started. However, the PS execution loop, will now check if wsrep_delayed_BF_abort is set, and stop the PS execution after the type information has been assigned for the PS. With this, the PS protocol type information, which is present in the first PS EXECUTE command, is not lost even if the first PS EXECUTE command was marked to abort. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-03-18 08:30:25 +02:00
Daniel Black	069139a549	Merge 10.3 to 10.4 extra2_read_len resolved by keeping the implementation in sql/table.cc and exposing it for use by ha_partition.cc Remove identical implementation in unireg.h (ref: `bfed2c7d57`)	2022-03-16 16:39:10 +11:00
Daniel Black	6a2d88c132	Merge 10.2 to 10.3	2022-03-16 12:51:22 +11:00
Daniel Black	57dbe8785d	MDEV-23915 ER_KILL_DENIED_ERROR not passed a thread id (part 2) Per Marko's comment in JIRA, sql_kill is passing the thread id as long long. We change the format of the error messages to match, and cast the thread id to long long in sql_kill_user.	2022-03-16 09:37:45 +11:00
Daniel Black	99837c61a6	MDEV-23915 ER_KILL_DENIED_ERROR not passed a thread id The 10.5 test error main.grant_kill showed up an incorrect thread id on a big endian architecture. The cause of this is the sql_kill_user function assumed the error was ER_OUT_OF_RESOURCES, when the actual error was ER_KILL_DENIED_ERROR. ER_KILL_DENIED_ERROR as an error message requires a thread id to be passed as unsigned long, however a user/host was passed. ER_OUT_OF_RESOURCES doesn't even take a user/host, despite the optimistic comment. We remove this being passed as an argument to the function so that when MDEV-21978 is implemented one less compiler format warning is generated (which would have caught this error sooner). Thanks Otto for reporting and Marko for analysis.	2022-03-16 09:37:45 +11:00
Oleksandr Byelkin	bb46b79c8c	Fix mutex order according to a new sequence.	2021-11-02 13:09:35 +01:00
Jan Lindström	eab7f5d8bc	MDEV-23328 Server hang due to Galera lock conflict resolution * Fix error handling NULL-pointer reference * Add mtr-suppression on galera_ssl_upgrade	2021-11-02 10:08:54 +02:00
Jan Lindström	db64924454	MDEV-23328 Server hang due to Galera lock conflict resolution * Fix error handling NULL-pointer reference * Add mtr-suppression on galera_ssl_upgrade	2021-11-02 07:23:40 +02:00
Jan Lindström	e571eaae9f	MDEV-23328 Server hang due to Galera lock conflict resolution Use better error message when KILL fails even in case TOI fails.	2021-11-02 07:20:30 +02:00
Jan Lindström	ea239034de	MDEV-23328 Server hang due to Galera lock conflict resolution * Fix error handling NULL-pointer reference * Add mtr-suppression on galera_ssl_upgrade	2021-11-01 13:07:55 +02:00
sjaakola	157b3a637f	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potential mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. 
This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 10:00:17 +03:00
sjaakola	5c230b21bf	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potential mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. 
This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 09:52:52 +03:00
Jan Lindström	aa7ca987db	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 09:52:40 +03:00
sjaakola	db50ea3ad3	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potential mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. 
This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 07:57:18 +03:00
Marko Mäkelä	f59f5c4a10	Revert MDEV-25114 Revert `88a4be75a5` and `9d97f92feb`, which had been prematurely pushed by accident.	2021-09-24 16:21:20 +03:00
sjaakola	88a4be75a5	MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL) This patch is the plan D variant for fixing potential mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This patch also fixes mutex locking order and unprotected THD member accesses on bf aborting case. We try to hold THD::LOCK_thd_data during bf aborting. Only case where it is not possible is at wsrep_abort_transaction before call wsrep_innobase_kill_one_trx where we take InnoDB mutexes first and then THD::LOCK_thd_data. This will also fix possible race condition during close_connection and while wsrep is disconnecting connections. Added wsrep_bf_kill_debug test case Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-09-24 09:47:31 +03:00
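
The MDEV-23328 entries above describe two code paths that take the same three mutexes in different orders, which is the classic precondition for a deadlock. The following is a minimal sketch of how such an inversion can be detected from recorded acquisition orders. It is not MariaDB code: the function names `lock_order_edges` and `find_cycle` and the constants `APPLIER_PATH` and `KILL_PATH` are hypothetical, and only the mutex names and their two orders are taken from the commit message.

```python
from collections import defaultdict

# Acquisition orders quoted in the MDEV-23328 commit message.
# The constant names are illustrative, not identifiers from the server.
APPLIER_PATH = ["lock_sys->mutex", "victim_trx->mutex",
                "victim_thread->LOCK_thd_data"]
KILL_PATH = ["victim_thread->LOCK_thd_data", "lock_sys->mutex",
             "victim_trx->mutex"]


def lock_order_edges(paths):
    """Build a graph where edge A -> B means some path takes B while holding A."""
    graph = defaultdict(set)
    for order in paths.values():
        for i, held in enumerate(order):
            for later in order[i + 1:]:
                graph[held].add(later)
    return graph


def find_cycle(graph):
    """Return a list of locks forming a cycle, or None if the order is consistent."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GREY:
                # Back edge: nxt is already on the current DFS path.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    graph = lock_order_edges({"applier": APPLIER_PATH, "kill": KILL_PATH})
    cycle = find_cycle(graph)
    print(" -> ".join(cycle) if cycle else "lock order is consistent")
```

A cycle in this graph means two threads can each hold a lock the other needs. The commits listed above take different routes out of that situation: the MDEV-23328 patches isolate KILL by replicating it as a TOI operation, while the later MDEV-29293 backport removes that replication and releases THD locks before taking the InnoDB mutexes.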

7569 commits