Commit graph

348 commits

Author SHA1 Message Date
sjaakola
5c230b21bf MDEV-23328 Server hang due to Galera lock conflict resolution
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is

          wsrep_thd_LOCK()
          wsrep_kill_victim()
          lock_rec_other_has_conflicting()
          lock_clust_rec_read_check_and_lock()
          row_search_mvcc()
          ha_innobase::index_read()
          ha_innobase::rnd_pos()
          handler::ha_rnd_pos()
          handler::rnd_pos_by_record()
          handler::ha_rnd_pos_by_record()
          Rows_log_event::find_row()
          Update_rows_log_event::do_exec_row()
          Rows_log_event::do_apply_event()
          Log_event::apply_event()
          wsrep_apply_events()

and mutexes are taken in the order

          lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

          innobase_kill_query()
          kill_handlerton()
          plugin_foreach_with_mask()
          ha_kill_query()
          THD::awake()
          kill_one_thread()

        and mutexes are

          victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.

In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.

TOI replication is used, in this approach,  purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.

This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-10-29 09:52:52 +03:00
mkaruza
a75813d467 MDEV-22708 Assertion `!mysql_bin_log.is_open() || thd.is_current_stmt_binlog_format_row()' failed in Delayed_insert::handle_inserts and in Diagnostics_area::set_eof_status
Function `upgrade_lock_type` should check global binlog_format variable
instead of thread one.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-10-06 07:20:02 +03:00
Jan Lindström
d3b35598fc MDEV-26053 : TRUNCATE on table with Foreign Key Constraint no longer replicated to other nodes
Problem was that there was extra condition !thd->lex->no_write_to_binlog
before call to begin TOI. It seems that this variable is not initialized.
TRUNCATE does not support [NO_WRITE_TO_BINLOG | LOCAL] keywords, thus
we should not check this condition. All this was hidden in a macro,
so I decided to remove those macros that were used only a few places
with actual function calls.
2021-09-17 07:18:37 +03:00
Marko Mäkelä
15b691b7bd After-merge fix f84e28c119
In a rebase of the merge, two preceding commits were accidentally reverted:
commit 112b23969a (MDEV-26308)
commit ac2857a5fb (MDEV-25717)

Thanks to Daniele Sciascia for noticing this.
2021-08-25 17:35:44 +03:00
Marko Mäkelä
f84e28c119 Merge 10.3 into 10.4 2021-08-18 16:51:52 +03:00
Leandro Pacheco
112b23969a MDEV-26308 : Galera test failure on galera.galera_split_brain
Contains following fixes:

* allow TOI commands to timeout while trying to acquire TOI with
override lock_wait_timeout with a LONG_TIMEOUT only after
succesfully entering TOI
* only ignore lock_wait_timeout on TOI
* fix galera_split_brain test as TOI operation now returns ER_LOCK_WAIT_TIMEOUT after lock_wait_timeout
* explicitly test for TOI

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-08-18 08:57:33 +03:00
Leandro Pacheco
2b84e1c966 MDEV-23080: desync and pause node on BACKUP STAGE BLOCK_DDL
make BACKUP STAGE behave as FTWRL, desyncing and pausing the node
to prevent BF threads (appliers) from interfering with blocking stages.
This is needed because BF threads don't respect BACKUP MDL locks.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-07-27 08:11:41 +03:00
Sergei Golubchik
b34cafe9d9 cleanup: move thread_count to THD_count::value()
because the name was misleading, it counts not threads, but THDs,
and as THD_count is the only way to increment/decrement it, it
could as well be declared inside THD_count.
2021-07-24 12:37:50 +02:00
mkaruza
206d630ea0 MDEV-22227 Assertion `state_ == s_exec' failed in wsrep::client_state::start_transaction
Removed redundant code for BF abort transaction in `thr_lock.cc`.

TOI operations will ignore provided lock_wait_timeout and use `LONG_TIMEOUT`
until operation is finished.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-04-28 11:11:01 +03:00
mkaruza
c3b016efde MDEV-22668: "Flush SSL" command doesn't reload wsrep cert
Trigger `socket.ssl_reload` when FLUSH SSL is issued. To triger reloading
of certificate, key and CA, files needs to be physically changed.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-04-15 08:50:01 +03:00
Daniele Sciascia
915983e1cc MDEV-25226 Assertion when wsrep_on set OFF with SR transaction
This patch makes the following changes around variable wsrep_on:

1) Variable wsrep_on can no longer be updated from a session that has
an active transaction running. The original behavior allowed cases
like this:

     BEGIN;
     INSERT INTO t1 VALUES (1);
     SET SESSION wsrep_on = OFF;
     INSERT INTO t1 VALUES (2);
     COMMIT;

With regular transactions this would result in no replication
events (not even value 1). With streaming replication it would be
unnecessarily complex to achieve the same behavior. In the above
example, it would be possible for value 1 to be already replicated if
it happened to fill a separate fragment, while value 2 wouldn't.

2) Global variable wsrep_on no longer affects current sessions, only
subsequent ones. This is to avoid a similar case to the above, just
using just by using global wsrep_on instead session wsrep_on:

      --connection conn_1
      BEGIN;
      INSERT INTO t1 VALUES(1);

      --connection conn_2
      SET GLOBAL wsrep_on = OFF;

      --connection conn_1
      INSERT INTO t1 VALUES(2);
      COMMIT;

The above example results in the transaction to be replicated, as
global wsrep_on will only affect the session wsrep_on of new
connections.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-04-05 09:10:23 +03:00
Marko Mäkelä
d6d3d9ae2f Merge 10.2 into 10.3 2021-03-31 08:01:03 +03:00
Jan Lindström
d217a925b2 MDEV-24923 : Port selected Galera conflict resolution changes from 10.6
Add condition on trx->state == TRX_STATE_COMMITTED_IN_MEMORY in order to
avoid unnecessary work. If a transaction has already been committed or
rolled back, it will release its locks in lock_release() and let
the waiting thread(s) continue execution.

Let BF wait on lock_rec_has_to_wait and if necessary other BF
is replayed.

wsrep_trx_order_before
  If BF is not even replicated yet then they are ordered
  correctly.

bg_wsrep_kill_trx
  Make sure victim_trx is found and check also its state. If
  state is TRX_STATE_COMMITTED_IN_MEMORY transaction is
  already committed or rolled back and will release it locks
  soon.

wsrep_assert_no_bf_bf_wait
  Transaction requesting new record lock should be TRX_STATE_ACTIVE
  Conflicting transaction can be in states TRX_STATE_ACTIVE,
  TRX_STATE_COMMITTED_IN_MEMORY or in TRX_STATE_PREPARED.
  If conflicting transaction is already committed in memory or
  prepared we should wait. When transaction is committed in memory we
  held trx mutex, but not lock_sys->mutex. Therefore, we
  could end here before transaction has time to do lock_release()
  that is protected with lock_sys->mutex.

lock_rec_has_to_wait
  We very well can let bf to wait normally as other BF will be
  replayed in case of conflict. For debug builds we will do
  additional sanity checks to catch unsupported bf wait if any.

wsrep_kill_victim
  Check is victim already in TRX_STATE_COMMITTED_IN_MEMORY state and
  if it is we can return.

lock_rec_dequeue_from_page
lock_rec_unlock
  Remove unnecessary wsrep_assert_no_bf_bf_wait function calls.
  We can very well let BF wait here.
2021-03-30 08:58:10 +03:00
Sergei Golubchik
34fcd726a6 Merge branch 'bb-10.4-release' into 10.4 2021-02-23 00:08:56 +01:00
Jan Lindström
a5bcec727b MDEV-24865 : Server crashes when truncate mysql user table
For truncate we try to find out possible foreign key tables
using open_tables. However, table_list was not cleaned up
properly and there was no error handling. Fixed by cleaning
table_list and adding proper error handling.
2021-02-16 08:46:14 +02:00
Sergei Golubchik
2696538723 updating @@wsrep_cluster_address deadlocks
wsrep_cluster_address_update() causes LOCK_wsrep_slave_threads
to be locked under LOCK_wsrep_cluster_config, while normally
the order should be the opposite.

Fix: don't protect @@wsrep_cluster_address value with the
LOCK_wsrep_cluster_config, LOCK_global_system_variables is enough.

Only protect wsrep reinitialization with the LOCK_wsrep_cluster_config.
And make it use a local copy of the global @@wsrep_cluster_address.

Also, introduce a helper function that checks whether
wsrep_cluster_address is set and also asserts that it can be safely
read by the caller.
2021-02-14 23:18:42 +01:00
Sergei Golubchik
259a1902a0 cleanup: THD::abort_current_cond_wait()
* reuse the loop in THD::abort_current_cond_wait, don't duplicate it
* find_thread_by_id should return whatever it has found, it's the
  caller's task not to kill COM_DAEMON (if the caller's a killer)

and other minor changes
2021-02-12 18:05:34 +01:00
Sergei Golubchik
37e24970cb merge 2021-02-02 12:43:17 +01:00
Sergei Golubchik
2676c9aad7 galera fixes related to THD::LOCK_thd_kill
Since 2017 (c2118a08b1) THD::awake() no longer requires LOCK_thd_data.
It uses LOCK_thd_kill, and this latter mutex is used to prevent
a thread of dying, not LOCK_thd_data as before.
2021-02-02 10:02:17 +01:00
Sergei Golubchik
29bbcac0ee MDEV-23328 Server hang due to Galera lock conflict resolution
mutex order violation here.
when wsrep bf thread kills a conflicting trx, the stack is

  wsrep_thd_LOCK()
  wsrep_kill_victim()
  lock_rec_other_has_conflicting()
  lock_clust_rec_read_check_and_lock()
  row_search_mvcc()
  ha_innobase::index_read()
  ha_innobase::rnd_pos()
  handler::ha_rnd_pos()
  handler::rnd_pos_by_record()
  handler::ha_rnd_pos_by_record()
  Rows_log_event::find_row()
  Update_rows_log_event::do_exec_row()
  Rows_log_event::do_apply_event()
  Log_event::apply_event()
  wsrep_apply_events()

and mutexes are taken in the order

  lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

  innobase_kill_query()
  kill_handlerton()
  plugin_foreach_with_mask()
  ha_kill_query()
  THD::awake()
  kill_one_thread()

and mutexes are

  victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

To fix the mutex order violation we kill the victim thd asynchronously,
from the manager thread
2021-01-24 11:35:55 +01:00
Alexey Yurchenko
033f8d13ce Update wsrep-lib (new logger interface)
Ensure consistent use of logging macros in wsrep-related code

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-01-07 17:41:21 +02:00
mkaruza
b79b3ff655 MDEV-23468: inline_mysql_socket_send: Assertion `mysql_socket.fd != -1' failed on shutdown
Closing remaining threads in `wsrep_close_client_connections` should also
check `thd_is_connection_alive` for thd before closing connection. Assert is
happening when thread already doing shutdown, but still not removed from threads
list.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-12-21 11:24:24 +02:00
Marko Mäkelä
24ec8eaf66 MDEV-15532 after-merge fixes from Monty
The Galera tests were massively failing with debug assertions.
2020-12-02 16:16:29 +02:00
sjaakola
2fbcddbeaf MDEV-24119 MDL BF-BF Conflict caused by TRUNCATE TABLE
A follow-up fix, for original fix for MDEV-21577, which did not handle well
temporary tables.

OPTIMIZE and REPAIR TABLE statements can take a list of tables as argument,
and some of the tables may be temporary. Proper handling of temporary tables
is to skip them and continue working on the real tables. The bad version, skipped all tables,
if a single temporary table was in the argument list. And this resulted so that FK parent
tables were not scnanned for the remaining real tables. Some mtr tests, using OPTIMIZE or REPAIR
for temporary tables caused regressions bacause of this, e.g. galera.galera_optimize_analyze_multi

The fix in this PR opens temporary and real tables first, and in the table handling loop skips
temporary tables, FK parent scanning is done only for real tables.

The test has new scenario for OPTIMIZE and REPAIR issued for two tables of which one is temporary table.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-11-11 17:46:50 +02:00
sjaakola
4d6c661144 MDEV-21577 MDL BF-BF conflict
Some DDL statements appear to acquire MDL locks for a table referenced by
foreign key constraint from the actual affected table of the DDL statement.
OPTIMIZE, REPAIR and ALTER TABLE belong to this class of DDL statements.

Earlier MariaDB version did not take this in consideration, and appended
only affected table in the certification key list in write set.
Because of missing certification information, it could happen that e.g.
OPTIMIZE table for FK child table could be allowed to apply in parallel
with DML operating on the foreign key parent table, and this could lead to
unhandled MDL lock conflicts between two high priority appliers (BF).

The fix in this patch, changes the TOI replication for OPTIMIZE, REPAIR and
ALTER TABLE statements so that before the execution of respective DDL
statement, there is foreign key parent search round. This FK parent search
contains following steps:
* open and lock the affected table (with permissive shared locks)
* iterate over foreign key contstraints and collect and array of Fk parent
  table names
* close all tables open for the THD and release MDL locks
* do the actual TOI replication with the affected table and FK parent
  table names as key values

The patch contains also new mtr test for verifying that the above mentioned
DDL statements replicate without problems when operating on FK child table.
The mtr test scenario #1, which can be used to check if some other DDL
(on top of OPTIMIZE, REPAIR and ALTER) could cause similar excessive FK
parent table locking.

Reviewed-by: Aleksey Midenkov <aleksey.midenkov@mariadb.com>
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-11-03 19:40:06 +02:00
Marko Mäkelä
c9cf6b13f6 Merge 10.3 into 10.4 2020-09-03 15:53:38 +03:00
Jan Lindström
33ae1616e0 MDEV-21578 : CREATE OR REPLACE TRIGGER in Galera cluster not replicating
In 10.3 OR REPLACE trigger option is part of create_info.
2020-09-03 14:10:42 +03:00
Marko Mäkelä
c3752cef3c Merge 10.2 into 10.3 2020-09-03 09:26:54 +03:00
Jan Lindström
c710c450e3 MDEV-21578 : CREATE OR REPLACE TRIGGER in Galera cluster not replicating
While doing TOI buffer OR REPLACE option was not added to replicated
string.
2020-08-28 16:40:12 +03:00
Daniele Sciascia
09dd06f14a MDEV-22443 wsrep::runtime_error on START TRANSACTION
This happens with global wsrep_on disabled and local wsrep_on enabled.
The fix consists in avoiding sync wait when global wsrep_on is
disabled.
2020-08-19 13:12:00 +03:00
Marko Mäkelä
b3e395a13e Merge 10.2 into 10.3 2020-06-06 18:50:25 +03:00
Julius Goryavsky
5f55f69e4a Merge 10.1 into 10.2 2020-06-05 18:32:37 +02:00
sjaakola
8ec0e9111a MDEV-22763 backporting MDEV-20225 fix into 10.1
Backported the support for aborting and replaying stored procedure and fix for trigger
key assigments from 10.4 version.
Backported also two mtr tests: wsrep_sp_bf_abort and MDEV-20225
2020-06-03 15:34:44 +02:00
Marko Mäkelä
2c39f69d34 MDEV-22203: WSREP_ON is unnecessarily expensive WITH_WSREP=OFF
If the server is compiled WITH_WSREP=OFF, we should avoid evaluating
conditions on a global variable that is constant.

WSREP_ON_: Renamed from WSREP_ON. Defined only WITH_WSREP=ON.

WSREP_ON: Defined as unlikely(WSREP_ON_).

wsrep_on(): Defined as WSREP_ON && wsrep_service->wsrep_on_func().

The reason why we have wsrep_on() at all is that the macro WSREP(thd)
depends on the definition of THD, and that is intentionally an opaque
data type for InnoDB. So, we cannot avoid invoking wsrep_on(), but
we can evaluate the less expensive condition WSREP_ON before calling
the function.
2020-04-24 15:25:39 +03:00
Jan Lindström
93475aff8d MDEV-22203: WSREP_ON is unnecessarily expensive to evaluate
Replaced WSREP_ON macro by single global variable WSREP_ON
that is then updated at server statup and on wsrep_on and
wsrep_provider update functions.
2020-04-24 13:12:46 +03:00
Marko Mäkelä
e52a36d37b Merge 10.1 into 10.2 2020-04-22 13:50:53 +03:00
Marko Mäkelä
7198c6ab2d MDEV-22271 Excessive stack memory usage due to WSREP_LOG
Several tests that involve stored procedures fail on 10.4 kvm-asan
(clang 10) due to stack overrun. The main contributor to this stack
overrun is mysql_execute_command(), which is invoked recursively
during stored procedure execution.

Rebuilding with cmake -DWITH_WSREP=OFF shrunk the stack frame size
of mysql_execute_command() by more than 10 kilobytes in a
WITH_ASAN=ON, CMAKE_BUILD_TYPE=Debug build. The culprit
turned out to be the macro WSREP_LOG, which is allocating a
separate 1KiB buffer for every occurrence.

We replace the macro with a function, so that the stack will be
allocated only when the function is actually invoked. In this way,
no stack space will be wasted by default (when WSREP and Galera
are disabled).

This backports commit b6c5657ef2
from MariaDB 10.3.1.

Without ASAN, compilers can be smarter and optimize the stack usage.
The original commit message mentions that 1KiB was saved on GCC 5.4,
and 4KiB on Mac OS X Lion, which presumably uses a clang-based compiler.
2020-04-17 10:54:56 +03:00
Teemu Ollakka
c79051e587 MDEV-22271 Excessive stack memory usage due to WSREP_LOG
- Made WSREP_LOG a function and moved the body out of header.
- Reduced the stack allocated buffer size and implemented
  reprint into dynamically allocated buffer if stack buffer is not
  large enough to hold the message.
2020-04-17 10:46:09 +03:00
mkaruza
edc3899d97 MDEV-22051: Protocol::end_statement(): Assertion `0' failed on Galera node upon DDL attempt with conflicting lock
If FTWRL is issued, DDL statements should report error back to user before
TOI is started.
2020-04-08 16:42:18 +03:00
Marko Mäkelä
1a9b6c4c7f Merge 10.2 into 10.3 2020-03-30 11:12:56 +03:00
seppo
5918b17004
MDEV-21473 conflicts with async slave BF aborting (#1475)
If async slave thread (slave SQL handler), becomes a BF victim, it may occasionally happen that rollbacker thread is used to carry out the rollback instead of the async slave thread.
This can happen, if async slave thread has flagged "idle" state when BF thread tries to figure out how to kill the victim.
The issue was possible to test by using a galera cluster as slave for external master, and issuing high load of conflicting writes through async replication and directly against galera cluster nodes.
However, a deterministic mtr test for the "conflict window" has not yet been worked on.

The fix, in this patch makes sure that async slave thread state is never set to IDLE. This prevents the rollbacker thread to intervene.
The wsrep_query_state change was refactored to happen by dedicated function to make controlling the idle state change in one place.
2020-03-24 11:01:42 +02:00
Oleksandr Byelkin
ceda5f724f Merge branch '10.2' into 10.3 2020-01-24 14:16:20 +01:00
Oleksandr Byelkin
f2ccfcaca1 Merge branch '10.1' into 10.2 2020-01-24 13:46:49 +01:00
Jan Lindström
8a931e4d16 MDEV-17571 : Make systemd timeout behavior more compatible with long Galera SSTs
This is 10.4 version.

Idea is to create monitor thread for both donor and joiner that will
periodically if needed extend systemd timeout while SST is being
processed. In 10.4 actual SST is executed by running SST script
and exchanging messages on pipe using blocking fgets. This fix
starts monitoring thread before SST script is started and
we stop monitoring thread when SST has been completed.
2020-01-22 16:55:59 +02:00
Julius Goryavsky
578b6ba02a MDEV-19457: sys_vars.wsrep_provider_basic failed in buildbot
If the initialization of the wsrep provider failed, in some
cases the internal variable wrep_inited indicating that the
initialization has already been completed is still set to
"1", which then leads to confusion in the initialization
status. To solve the problem, we should set this variable
to "1" only if the wsrep provider initialization really
completed successfully.

An earlier issue has already been fixed for branch 10.4,
and this patch contains a fix for earlier versions (where
Galera 3.x is used).
2020-01-20 12:14:26 +01:00
Jan Lindström
088de81d96 MDEV-21335 : Galera test failure on suite wsrep
Problem was that wsrep_on was OFF.

This is 10.4 version.
2019-12-18 08:22:07 +02:00
Teemu Ollakka
67e063eb94 Update wsrep-lib. (#1426)
This commit updates the wsrep-lib. The changes are a cleanup in
client_state TOI processing and stub methods for future extensions.
2019-12-16 07:50:15 +02:00
Marko Mäkelä
0a20e5ab77 Merge 10.2 into 10.3 2019-12-12 14:41:51 +02:00
Oleksandr Byelkin
a15234bf4b Merge branch '10.3' into 10.4 2019-12-09 15:09:41 +01:00
Jan Lindström
59e14b9684 MDEV-21189: Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang
Found two bugs

(1) have_committing_connections was missing mutex unlock on one
exit case. As this function is called on a loop it caused mutex
lock when we already owned the mutex. This could cause hang.

(2) wsrep_RSU_begin did set up error code when partition to
be dropped could not be MDL-locked because of concurrent
operations but wrong error code was propagated to upper layer
causing error to be ignored. This could have also caused
the hang.
2019-12-09 08:14:39 +02:00