Commit graph

276 commits

Author SHA1 Message Date
Kristian Nielsen
7372ecc396 Restore the THD state correctly in parallel replication
If both do_gco_wait() and do_ftwrl_wait() had to wait, the state was not restored correctly.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 09:22:00 +01:00
Kristian Nielsen
0166c89e02 Merge 10.5 -> 10.6
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 09:20:36 +01:00
Teemu Ollakka
a2575a0703 MDEV-35465 Async replication stops working on Galera async replica node when parallel replication is enabled
Parallel slave failed to retry in retry_event_group() with error

    WSREP: Parallel slave worker failed at wsrep_before_command() hook

Fix wsrep transaction cleanup/restart in retry_event_group() to properly
clean up previous transaction by calling wsrep_after_statement().
Also move call to reset error after call to wsrep_after_statement()
to make sure that it remains effective.

Add a MTR test galera_as_slave_parallel_retry to reproduce the error
when the fix is not present.

Other issues which were detected when testing with sysbench:

Check if parallel slave is killed for retry before waiting for prior
commits in THD::wsrep_parallel_slave_wait_for_prior_commit(). This
is required with slave-parallel-mode=optimistic to avoid deadlock
when a slave later in commit order manages to reach prepare phase
before a lock conflict is detected.

Suppress wsrep applier specific warning for slave threads.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-12-03 15:05:32 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Monty
bddbef3573 MDEV-34533 asan error about stack overflow when writing record in Aria
The problem was that when using clang + asan, we do not get a correct value
for the thread stack as some local variables are not allocated at the
normal stack.

It looks like that for example clang 18.1.3, when compiling with
-O2 -fsanitize=addressan it puts local variables and things allocated by
alloca() in other areas than on the stack.

The following code shows the issue

Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
    (connect=0x5080000027b8,
    put_in_cache=<optimized out>) at sql/sql_connect.cc:1399

THD *thd;
1399      thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0

The address of thd is 24M away from the stack pointer

(gdb) info reg
...
rsp            0x7fffef4e7bc0      0x7fffef4e7bc0
...
r13            0x7fffedee7060      140737185214560

r13 is pointing to the address of the thd. Probably some kind of
"local stack" used by the sanitizer

I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects was stored in a local heap,
not on the stack.

To solve this issue in a portable way, I have added two functions:

my_get_stack_pointer() returns the address of the current stack pointer.
The code is using asm instructions for intel 32/64 bit, powerpc,
arm 32/64 bit and sparc 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we are using _AddressOfReturnAddress()

As a fallback for other compilers/arch we use the address of a local
variable.

my_get_stack_bounds() that will return the address of the base stack
and stack size using pthread_attr_getstack() or NtCurrentTed() with
fallback to using the address of a local variable and user provided
stack size.

Server changes are:

- Moving setting of thread_stack to THD::store_globals() using
  my_get_stack_bounds().
- Removing setting of thd->thread_stack, except in functions that
  allocates a lot on the stack before calling store_globals().  When
  using estimates for stack start, we reduce stack_size with
  MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
  before calling store_globals().

I also added a unittest, stack_allocation-t, to verify the new code.

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2024-10-16 17:24:46 +03:00
Marko Mäkelä
48becffd07 Merge 10.5 into 10.6 2024-08-27 08:52:10 +03:00
Kristian Nielsen
b4c2e23954 MDEV-34696: do_gco_wait() completes too early on InnoDB dict stats updates
Before doing mark_start_commit(), check that there is no pending deadlock
kill. If there is a pending kill, we won't commit (we will abort, roll back,
and retry). Then we should not mark the commit as started, since that could
potentially make the following GCO start too early, before we completed the
commit after the retry.

This condition could trigger in some corner cases, where InnoDB would take
temporarily table/row locks that are released again immediately, not held
until the transaction commits. This happens with dict_stats updates and
possibly auto-increment locks.

Such locks can be passed to thd_rpl_deadlock_check() and cause a deadlock
kill to be scheduled in the background. But since the blocking locks are
held only temporarily, they can be released before the background kill
happens. This way, the kill can be delayed until after mark_start_commit()
has been called. Thus we need to check the synchronous indication
rgi->killed_for_retry, not just the asynchroneous thd->killed.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-08-26 14:39:24 +02:00
Marko Mäkelä
5ba542e9ee Merge 10.5 into 10.6 2024-05-30 14:27:07 +03:00
Robin Newhouse
dc38d8ea80 Minimize unsafe C functions with safe_strcpy()
Similar to .
567b681 introduced safe_strcpy() to minimize the use of C with
potentially unsafe memory overflow with strcpy() whose use is
discouraged.
Replace instances of strcpy() with safe_strcpy() where possible, limited
here to files in the `sql/` directory.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
2024-05-17 13:33:16 +01:00
Kristian Nielsen
596921dab8 MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure
Clear any pending deadlock kill after completing XA PREPARE, and before
updating the mysql.gtid_slave_pos table in a separate transaction.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-05-02 21:07:51 +02:00
Marko Mäkelä
829cb1a49c Merge 10.5 into 10.6 2024-04-17 14:14:58 +03:00
Kristian Nielsen
16aa4b5f59 Merge from 10.4 to 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-15 17:46:49 +02:00
Kristian Nielsen
d90a2b44ad MDEV-33668: More precise dependency tracking of XA XID in parallel replication
Keep track of each recently active XID, recording which worker it was queued
on. If an XID might still be active, choose the same worker to queue event
groups that refer to the same XID to avoid conflicts.

Otherwise, schedule the XID freely in the next round-robin slot.

This way, XA PREPARE can normally be scheduled without restrictions (unless
duplicate XID transactions come close together). This improves scheduling
and parallelism over the old method, where the worker thread to schedule XA
PREPARE on was fixed based on a hash value of the XID.

XA COMMIT will normally be scheduled on the same worker as XA PREPARE, but
can be a different one if the XA PREPARE is far back in the event history.

Testcase and code for trimming dynamic array due to Andrei.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-09 11:42:34 +03:00
Kristian Nielsen
f9ecaa87ce MDEV-33668: Refactor parallel replication round-robin scheduling to use explicit FIFO
This is a preparatory patch to facilitate the next commit to improve
the scheduling of XA transactions in parallel replication.

When choosing the scheduling bucket for the next event group in
rpl_parallel_entry::choose_thread(), use an explicit FIFO for the
round-robin selection instead of a simple cyclic counter i := (i+1) % N.

This allows to schedule XA COMMIT/ROLLBACK dependencies explicitly without
changing the round-robin scheduling of other event groups.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-09 11:42:34 +03:00
Kristian Nielsen
0a6f46965a MDEV-33475: --gtid-ignore-duplicate can double-apply event in case of parallel replication retry
When rolling back and retrying a transaction in parallel replication, don't
release the domain ownership (for --gtid-ignore-duplicates) as part of the
rollback. Otherwise another master connection could grab the ownership and
double-apply the transaction in parallel with the retry.

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-13 16:59:10 +01:00
Monty
9a132d423a MDEV-33620 Improve times and states in show processlist for replication
This will makes it easier to find out what replication workers are
doing and what they are waiting for.

Things changed in processlist:
- Slave_SQL time was not consistent. Now time for state "Slave has
  read all relay log; waiting for more updates" shows how long it has
  waited for getting the next event.
- Slave_worker threads did often show "Closing tables" for a long
  time.  Now the state is reverted to the previous state after
  "Closing tables" is done.
- Commit and Rollback states where not shown for replication (and some
  other threads). Now Commit and Rollback states are always shown and
  the state is reverted to previous state when the Commit/Rollback
  have finished.

Code changes:
- Added thd->set_time_for_next_stage() for parallel replication when
  when starting to wait for prior transactions to commit, group commit,
  and FTWRL and for free space in thread pool.
  Before we reset the time only after the above events.
- Moved THD_STAGE_INFO(stage_rollback) and THD_STAGE_INFO(stage_commit)
  from sql_parse.cc to transaction.cc to ensure this is done for
  all commits and not only 'normal connection queries'.

Test case changes:
- close_thread_tables() reverting stage to previous stage caused the
  counter in performance_schema to be increased. In many case it is
  the 'sql/starting' stage that was effected.
- We only change to "Commit" stage if there is a need for a commit.
  This caused some "Commit" stages to disapper from perfschema reports.

TODO in 11.#:
- Slave_IO always showes "Waiting for master to send event" and the time is
  from SLAVE START. We should in 11.# change this to be the time since
  reading the last event.
2024-03-08 15:23:17 +02:00
Marko Mäkelä
2b01e5103d Merge 10.5 into 10.6 2023-12-19 18:41:42 +02:00
Marko Mäkelä
12995559f9 Merge 10.4 into 10.5 2023-12-19 18:30:58 +02:00
Kristian Nielsen
eaa4968fc5 MDEV-10653: Fix segfault in SHOW MASTER STATUS with NULL inuse_relaylog
The previous patch for MDEV-10653 changes the rpl_parallel::workers_idle()
function to use Relay_log_info::last_inuse_relaylog to check for idle
workers. But the code was missing a NULL check. Also, there was one place
during SQL slave thread start which was missing mutex synchronisation when
updating inuse_relaylog.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-12-19 12:08:54 +01:00
Kristian Nielsen
1cbba45e6e Attempt to fix rare race in test for MDEV-8031
The error-injection inject_mdev8031 simulates a deadlock kill in a specific
place, by setting killed_for_retry to RETRY_KILL_KILLED directly. If a real
deadlock kill triggers at the same time, it is possible for the thread to
complete its transaction retry and set rgi_slave to NULL before the real
readlock kill can complete in the background. This will cause a segfault
due to null-pointer access.

Fix by changing the error injection to do a real background deadlock kill,
which ensures that the thread will wait for any pending background kills to
complete.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-12-19 12:08:53 +01:00
Marko Mäkelä
4ae105a37d Merge 10.4 into 10.5 2023-12-18 08:59:07 +02:00
Brandon Nesterenko
8dad51481b MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave
AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in
buildbot with timeout in include

A replication parallel worker thread can deadlock with another
connection running SHOW SLAVE STATUS. That is, if the replication
worker thread is in do_gco_wait() and is killed, it will already
hold the LOCK_parallel_entry, and during error reporting, try to
grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in
reverse order. It will initially grab the err_lock, and then try to
grab LOCK_parallel_entry. This leads to a deadlock when both threads
have grabbed their first lock without the second.

This patch implements the MDEV-31894 proposed fix to optimize the
workers_idle() check to compare the last in-use relay log’s
queued_count==dequeued_count for idleness. This removes the need for
workers_idle() to grab LOCK_parallel_entry, as these values are
atomically updated.

Huge thanks to Kristian Nielsen for diagnosing the problem!

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2023-12-11 07:45:23 -07:00
Marko Mäkelä
0f9acce3f2 Merge 10.5 into 10.6 2023-09-14 09:01:15 +03:00
sjaakola
a3cbc44b24 MDEV-31833 replication breaks when using optimistic replication and replica is a galera node
MariaDB async replication SQL thread was stopped for any failure
in applying of replication events and error message logged for the failure
was: "Node has dropped from cluster". The assumption was that event applying
failure is always due to node dropping out.
With optimistic parallel replication, event applying can fail for natural
reasons and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped with first
applying failure.

To support optimistic parallel replication retrying logic this commit will
now skip replication slave abort, if node remains in cluster (wsrep_ready==ON)
and replication is configured for optimistic or aggressive retry logic.

During the development of this fix, galera.galera_as_slave_nonprim test showed
some problems. The test was analyzed, and it appears to need some attention.
One excessive sleep command was removed in this commit, but it will need more
fixes still to be fully deterministic. After this commit galera_as_slave_nonprim
is successful, though.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-09-12 02:37:30 +02:00
Marko Mäkelä
0dd25f28f7 Merge 10.5 into 10.6 2023-09-11 14:46:39 +03:00
Marko Mäkelä
f8f7d9de2c Merge 10.4 into 10.5 2023-09-11 11:29:31 +03:00
Kristian Nielsen
e937a64d46 MDEV-10356: rpl.rpl_parallel_temptable failure due to incorrect commit optimization of temptables
The problem was that parallel replication of temporary tables using
statement-based binlogging could overlap the COMMIT in one thread with a DML
or DROP TEMPORARY TABLE in another thread using the same temporary table.
Temporary tables are not safe for concurrent access, so this caused
reference to freed memory and possibly other nastiness.

The fix is to disable the optimisation with overlapping commits of one
transaction with the start of a later transaction, when temporary tables are
in use. Then the following event groups will be blocked from starting until
the one using temporary tables is completed.

This also fixes occasional test failures of rpl.rpl_parallel_temptable seen
in Buildbot.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-09-07 14:40:05 +02:00
Marko Mäkelä
448c2077fb Merge 10.5 into 10.6 2023-08-21 15:50:31 +03:00
Marko Mäkelä
5895a3622b Merge 10.4 into 10.5 2023-08-17 10:33:36 +03:00
Kristian Nielsen
34e8585437 MDEV-29974: Missed kill waiting for worker queues to drain
When the SQL driver thread goes to wait for room in the parallel slave
worker queue, there was a race where a kill at the right moment could
be ignored and the wait proceed uninterrupted by the kill.

Fix by moving the THD::check_killed() to occur _after_ doing ENTER_COND().

This bug was seen as sporadic failure of the testcase rpl.rpl_parallel
(rpl.rpl_parallel_gco_wait_kill since 10.5), with "Slave stopped with
wrong error code".

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-16 14:07:06 +02:00
Kristian Nielsen
7c9837ce74 Merge 10.4 into 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 18:02:18 +02:00
Kristian Nielsen
18acbaf416 MDEV-31655: Parallel replication deadlock victim preference code errorneously removed
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.

Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:

MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master

Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:39:49 +02:00
Kristian Nielsen
900c4d6920 MDEV-31655: Parallel replication deadlock victim preference code errorneously removed
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.

Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:

MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master

Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB locking code changes.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:35:30 +02:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
7564be1352 Merge branch '10.4' into 10.5 2023-07-26 16:02:57 +02:00
Oleksandr Byelkin
f52954ef42 Merge commit '10.4' into 10.5 2023-07-20 11:54:52 +02:00
Kristian Nielsen
08585b0949 MDEV-31509: Lost data with FTWRL and STOP SLAVE
The largest_started_sub_id needs to be set under LOCK_parallel_entry
together with testing stop_sub_id. However, in-between was the logic for
do_ftwrl_wait(), which temporarily releases the mutex. This could lead to
inconsistent stopping amongst worker threads and lost data.

Fix by moving all the stop-related logic out from unrelated do_gco_wait()
and do_ftwrl_wait() and into its own function do_stop_handling().

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-12 09:41:32 +02:00
Kristian Nielsen
5d61442c85 MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica
The problem is that when a worker thread is (user) killed in
wait_for_prior_commit, the event group may complete out-of-order since the
wait for prior commit was aborted by the kill.

This fix ensures that event groups will always complete in-order, even
in the error case. This is done in finish_event_group() by doing an
extra wait_for_prior_commit(), if necessary, that ignores kills.

This fix supersedes the fix for MDEV-30780, so the earlier fix for
that is reverted in this patch.

Also fix that an error from wait_for_prior_commit() inside
finish_event_group() would not signal the error to
wakeup_subsequent_commits().

Based on earlier work by Brandon Nesterenko and Andrei Elkin, with
some changes to simplify the semantics of wait_for_prior_commit() and
make the code more robust to future changes.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-12 09:41:32 +02:00
Kristian Nielsen
a8ea6627a4 MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica
The problem was an incorrect unmark_start_commit() in
signal_error_to_sql_driver_thread(). If an event group gets an error, this
unmark could run after the following GCO started, and the subsequent
re-marking could access de-allocated GCO.

The offending unmark_start_commit() looks obviously incorrect, and the fix
is to just remove it. It was introduced in the MDEV-8302 patch, the commit
message of which suggests it was added there solely to satisfy an assertion
in ha_rollback_trans(). So update this assertion instead to not trigger for
event groups that experienced an error (rgi->worker_error). When an error
occurs in an event group, all following event groups are skipped anyway, so
the unmark should never be needed in this case.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-12 09:41:32 +02:00
Kristian Nielsen
60bec1d54d MDEV-13915: STOP SLAVE takes very long time on a busy system
At STOP SLAVE, worker threads will continue applying event groups until the
end of the current GCO before stopping. This is a left-over from when only
conservative mode was available. In optimistic and aggressive mode, often
_all_ queued event will be in the same GCO, and slave stop will be
needlessly delayed.

This patch instead records at STOP SLAVE time the latest (highest sub_id)
event group that has started. Then worker threads will continue to apply
event groups up to that event group, but skip any following. The result is
that each worker thread will complete its currently running event group, and
then the slave will stop.

If the slave is caught up, and STOP SLAVE is run in the middle of an event
group that is already executing in a worker thread, then that event group
will be rolled back and the slave stop immediately, as normal.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-12 09:41:32 +02:00
Brandon Nesterenko
8ed88e3455 Revert "MDEV-13915: STOP SLAVE takes very long time on a busy system"
This reverts commit 0a99d457b3
because it should go into only 10.5+
2023-06-06 08:11:38 -06:00
Brandon Nesterenko
0a99d457b3 MDEV-13915: STOP SLAVE takes very long time on a busy system
The problem is that a parallel replica would not immediately stop
running/queued transactions when issued STOP SLAVE. That is, it
allowed the current group of transactions to run, and sometimes the
transactions which belong to the next group could be started and run
through commit after STOP SLAVE was issued too, if the last group
had started committing. This would lead to long periods to wait for
all waiting transactions to finish.

This patch updates a parallel replica to try and abort immediately
and roll-back any ongoing transactions. The exception to this is any
transactions which are non-transactional (e.g. those modifying
sequences or non-transactional tables), and any prior transactions,
will be run to completion.

The specifics are as follows:

 1. A new stage was added to SHOW PROCESSLIST output for the SQL
Thread when it is waiting for a replica thread to either rollback or
finish its transaction before stopping. This stage presents as
“Waiting for worker thread to stop”

 2. Worker threads which error or are killed no longer perform GCO
cleanup if there is a concurrently running prior transaction. This
is because a worker thread scheduled to run in a future GCO could be
killed and incorrectly perform cleanup of the active GCO.

 3. Refined cases when the FL_TRANSACTIONAL flag is added to GTID
binlog events to disallow adding it to transactions which modify
both transactional and non-transactional engines when the binlogging
configuration allow the modifications to exist in the same event,
i.e. when using binlog_direct_non_trans_update == 0 and
binlog_format == statement.

 4. A few existing MTR tests relied on the completion of certain
transactions after issuing STOP SLAVE, and were re-recorded
(potentially with added synchronizations) under the new rollback
behavior.

Reviewed By
===========
Andrei Elkin <andrei.elkin@mariadb.com>
2023-06-05 10:03:06 -06:00
Oleksandr Byelkin
de703a2b21 Merge branch '10.4' into 10.4.29 release 2023-05-11 09:07:45 +02:00
Oleksandr Byelkin
043d69bbcc Merge branch '10.5' into 10.6 2023-05-03 09:51:25 +02:00
Monty
4f7317579e Fixed "Trying to lock uninitialized mutex' in parallel replication
The problem was that mutex_init() was called after the worker was
put into the domain_hash, which allowed other threads to access it
before mutex was initialized.
2023-05-02 23:43:07 +03:00
Oleksandr Byelkin
edf8ce5b97 Merge branch 'bb-10.4-release' into bb-10.5-release 2023-05-02 13:54:54 +02:00
Oleksandr Byelkin
edd0b03e60 Merge branch '10.3' into 10.4 2023-05-02 10:09:27 +02:00
Andrei
55a53949be MDEV-29621: Replica stopped by locks on sequence
When using binlog_row_image=FULL with sequence table inserts, a
replica can deadlock because it treats full inserts in a sequence as DDL
statements by getting an exclusive lock on the sequence table. It
has been observed that with parallel replication, this exclusive
lock on the sequence table can lead to a deadlock where one
transaction has the exclusive lock and is waiting on a prior
transaction to commit, whereas this prior transaction is waiting on
the MDL lock.

This fix for this is on the master side, to raise FL_DDL
flag on the GTID of a full binlog_row_image write of a sequence table.
This forces the slave to execute the statement serially so a deadlock
cannot happen.

A test verifies the deadlock also to prove it happen on the OLD (pre-fixes)
slave.

OLD (buggy master) -replication-> NEW (fixed slave) is provided.
As the pre-fixes master's full row-image may represent both
SELECT NEXT VALUE and INSERT, the parallel slave pessimistically
waits for the prior transaction to have committed before to take on the
critical part of the second (like INSERT in the test) event execution.
The waiting exploits a parallel slave's retry mechanism which is
controlled by `@@global.slave_transaction_retries`.

Note that in order to avoid any persistent 'Deadlock found' 2013 error
in OLD -> NEW, `slave_transaction_retries` may need to be set to a
higher than the default value.
START-SLAVE is an effective work-around if this still happens.
2023-04-27 21:55:45 +03:00
Marko Mäkelä
bb1d1dc846 Merge 10.5 into 10.6 2023-04-27 09:48:27 +03:00
Marko Mäkelä
902c622215 Merge 10.4 into 10.5 2023-04-27 09:39:53 +03:00