Commit graph

4220 commits

Author SHA1 Message Date
Marko Mäkelä
98dbe3bfaf Merge 10.5 into 10.6 2025-01-20 09:57:37 +02:00
Brandon Nesterenko
d8c841d0d4 MDEV-35096: History is stored in different partitions on different nodes when using SYSTEM VERSION
Row-injection updates don’t correctly set the historical partition
for tables with system versioning and system_time partitions. This
results in inconsistencies between the master and slave when
replicating transactions that target such tables (i.e. the primary
server would correctly distribute archived rows amongst its
partitions, whereas the replica would have all archived rows in a
single partition). The function
partition_info::vers_set_hist_part(THD*) is used to set the
partition; however, its initial check for
vers_require_hist_part(THD*) returns false, bypassing the rest of
the function (which sets up the partition to use). This is because
the actual check uses the LEX sql_command (via
LEX::vers_history_generating()) to determine if the command is valid
to generate history. Row injections don’t have sql_commands though.

This patch provides a fix which extends the check in
vers_history_generating() to additionally allow row injections to be
history generating (via the function LEX::is_stmt_row_injection()).

Special thanks to Jan Lindstrom <jan.lindstrom@galeracluster.com>
for his work in reproducing the bug, and providing an initial test
case.

Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Aleksey Midenkov <midenok@mariadb.com>
2025-01-13 15:59:07 -07:00
Marko Mäkelä
b251cb6a4f Merge 10.5 into 10.6 2025-01-08 08:48:21 +02:00
Sergei Golubchik
9508a44c37 enforce no trailing \n in Diagnostic_area messages
that is in my_error(), push_warning(), etc
2025-01-07 16:31:39 +01:00
Sergei Golubchik
6abbfdef7a sporadic failures of binlog_encryption.rpl_parallel_gco_wait_kill
CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'

An sql thread can reach the "Slave has read all relay log" state
and then start reading relay log again. Let's use a more generic
pattern to retrieve the sql thread ID even if it's not
in the "read all relay log" state.
2025-01-05 16:40:12 +02:00
Monty
88d9348dfc Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
Monty
87ee1e75bc MDEV-35643 Add support for MySQL 8.0 binlog events
MDEV-29533 Crash when MariaDB is replica of MySQL 8.0

MySQL 8.0 has added the following new events in the MySQL binary log

PARTIAL_UPDATE_ROWS_EVENT
TRANSACTION_PAYLOAD_EVENT
HEARTBEAT_LOG_EVENT_V2

- PARTIAL_UPDATE_ROWS_EVENT is used by MySQL to generate update
  statements using JSON_SET, JSON_REPLACE and JSON_REMOVE to make
  update of JSON columns more efficient.  These events can be
  disabled by setting 'binlog-row-value-options=""'
- TRANSACTION_PAYLOAD_EVENT is used by MySQL to signal that a
  row event is compressed. It an be disably by setting
  'binlog_transaction_compression=0'.
- HEARTBEAT_LOG_EVENT_V2 is written to the binary log many times
  per seconds. It can be ignored by the server.

What this patch does:

- If PARTIAL_UPDATE_ROWS_EVENT or TRANSACTION_PAYLOAD_EVENT is found,
  the server will stop with an error message of how to disable the
  MySQL server to generate such events.
- HEARTBEAT_LOG_EVENT_V2 events are ignored.
- mariadb-binlog will write the name of the new events.
- mariadb-binlog will stop if PARTIAL_UPDATE_ROWS_EVENT or
  TRANSACTION_PAYLOAD_EVENT is found, unless --force is given.
- Fixes a crash in mariadb-binlog if a character set unknown to
  MariaDB is found. (MDEV-29533)

From Kristian Nielsen:
- Add test case for MySQL 8.0 to MariaDB replication and fixed a
  a small typo in post_header_len initialization.

Reviewer: knielsen@mariadb.org
2025-01-05 16:40:11 +02:00
Yuchen Pei
671f80c738
Merge branch '10.5' into 10.6 2024-12-17 11:06:09 +11:00
Andrei Elkin
bc6121819c MDEV-35098 rpl.rpl_mysqldump_gtid_slave_pos fails in buildbot
The test turns out to be senstive to @@global.gtid_cleanup_batch_size.
With a rather small default value of the latter
SELECTing from mysql.gtid_slave_pos may not be deterministic: tests
that run before may increase a pending for automitic deletion batch.

The test is refined to set its own value for the batch size which
is virtually unreachable.

Thanks to Kristian Nielsen for the analysis.
2024-12-16 19:43:41 +02:00
Marko Mäkelä
ddd7d5d8e3 MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2024-12-12 18:02:00 +02:00
Kristian Nielsen
d959acbbf8 MDEV-34049: Parallel access to temptable in different domain_id in parallel replication
Disallow changing @@gtid_domain_id while a temporary table is open in
STATEMENT or MIXED binlog mode. Otherwise, a slave may try to replicate
events refering to the same temporary table in parallel, using domain-based
out-of-order parallel replication. This is not valid, temporary tables are
only available for use within a single thread at a time.

One concrete consequence seen from this bug was a ROLLBACK on an
InnoDB temporary table running in one domain in parallel with DROP
TEMPORARY TABLE in another domain, causing an assertion inside InnoDB:
InnoDB: Failing assertion: table->get_ref_count() == 0 in
dict_sys_t::remove.

Use an existing error code that's somewhat close to the real issue
(ER_INSIDE_TRANSACTION_PREVENTS_SWITCH_GTID_DOMAIN_ID_SEQ_NO), to not add a
new error code in a GA release. When this is merged to the next GA release,
we could optionally introduce a new and more precise error code for an
attempt to change the domain_id while temporary tables are open.

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 09:22:00 +01:00
Kristian Nielsen
0166c89e02 Merge 10.5 -> 10.6
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 09:20:36 +01:00
Kristian Nielsen
b4fde50b1f MDEV-5798: Wrong errorcode for missing partition after TRUNCATE PARTITION
The partitioning error handling code was looking at
thd->lex->alter_info.partition_flags in non-alter-table cases, in which cases
the value is stale and contains whatever was set by any earlier ALTER TABLE.
This could cause the wrong error code to be generated, which then in some cases
can cause replication to break with "different errorcode" error.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 08:17:35 +01:00
Brandon Nesterenko
a06d81ff3f MDEV-35477: rpl_semi_sync_no_missed_ack_after_add_slave fails after MDEV-35109
MTR test rpl_semi_sync_no_missed_ack_after_add_slave fails on
buildbot after the preparatory commit for MDEV-35109 (5290fa043b)
which changed a sleep to a debug_sync point. The problem is that the
debug_sync point would time-out on a slave while waiting to enter
the logic to send an ACK reply. More specifically, where the test
config is a primary with two replicas, and the test waits on one of
the replicas to start sending an ACK, if the other replica was able
to receive the event and respond with an ACK before the binlog dump
thread of the timing-out server would prepare to send event, it
wouldn't set the SEMI_SYNC_NEED_ACK flag, and the replica wouldn't
even try to respond with an ACK.

Fix is to use debug_sync for both replicas such that both replicas
are held before sending their ack, so one can’t temporarily disable
semi-sync for the other before it receives the transaction.
2024-11-21 11:30:25 -07:00
Brandon Nesterenko
716ed2ce22 MDEV-35350: Consolidate MTR wait_for_pattern_in_file.inc and SEARCH_WAIT in search_pattern_in_file.inc
Replace wait_for_pattern_in_file.inc and all of its uses
to use search_pattern_in_file.inc with SEARCH_WAIT.

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Sergei Golubchik <serg@mariadb.org>
2024-11-07 13:25:58 -07:00
Vladislav Vaintroub
faf9e755ba MDEV-35109 fix test case
rpl_semi_sync_after_sync_coord_consistency fails on release compilation
2024-11-05 22:38:55 +01:00
Brandon Nesterenko
b07258a0d5 MDEV-35109: Semi-sync Replication stalling Primary using wait point=AFTER_SYNC
For a primary configured with wait_point=AFTER_SYNC, if two threads
T1 (binlogging through MYSQL_BIN_LOG::write()) and T2 were
binlogging at the same time, T1 could accidentally wait for its
semi-sync ACK using the binlog coordinates of T2. Prior to
MDEV-33551, this only resulted in delayed transactions, because all
transactions shared the same condition variable for ACK signaling.
However, with the MDEV-33551 changes, each thread has its own
condition variable to signal. So T1 could wait indefinitely when
either:
  1) T1's ACK is received but not T2's when T1 goes into
wait_after_sync(), because the ACK receiver thread has already
notified about the T1 ACK, but T1 was _actually_ waiting on T2's
ACK, and therefore tries to wait (in vain).

  2) T1 goes to wait_after_sync() before any ACKs have arrived. When
T1's ACK comes in, T1 is woken up; however, sees it needs to wait
more (because it was actually waiting on T2's ACK), and goes to wait
again (this time, in vain).

Note that the actual cause of T1 waiting on T2's binlog coordinates
is when MYSQL_BIN_LOG::write() would call
Repl_semisync_master::wait_after_sync(), the binlog offset parameter
was read as the end of MYSQL_BIN_LOG::log_file, which is shared
among transactions. So if T2 had updated the binary log _after_ T1
had released LOCK_log, but not yet invoked wait_after_sync(), it
would use the end of the binary log file as the binlog offset, which
was that of T2 (or any future transaction).

The fix in this patch ensures consistency between the binary log
coordinates a transaction uses between report_binlog_update() and
wait_after_sync().

Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-11-04 10:45:58 -07:00
Brandon Nesterenko
5290fa043b MDEV-35109 PREP: simulate_delay_semisync_slave_reply use debug_sync
This is a preparatory commit for MDEV-35109 to make its
testing code cleaner (and harden other tests too).

The DEBUG_DBUG point simulate_delay_semisync_slave_reply
up to this patch used my_sleep() to delay an ACK response,
but sleeps are prone to test failures on machines that
run tests when already having a heavy load (e.g. on
buildbot).

This patch changes this DEBUG_DBUG sleep to use DEBUG_SYNC
to coordinate exactly when a slave should send its reply,
which is safer and faster.

As DEBUG_SYNC can't be used while a server is shutting
down, to synchronize threads with SHUTDOWN WAIT FOR SLAVES
logic, we use and extend wait_for_pattern_in_file.inc to
wait for an informational error message in the logic to
indicate that the shutdown process has reached the
intended state (i.e. indicating that the shutdown has
been delayed to await semi-sync ACKs). Specifically, the
extensions are as follows:

 1. wait_for_pattern_in_file.inc is extended with parameter
    wait_for_pattern_count as a number that indicates the
    number of times a pattern should occur in the file before
    return control back to the calling script.

 2. search_for_pattern_in_file.inc is extended with parameter
    SEARCH_ABORT_IS_SUCCESS to inverse the error/success
    logic, so the SEARCH_ABORT condition can be used to
    indicate success, rather than error.
2024-11-04 10:45:58 -07:00
Monty
066f920484 MDEV-35110 Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions
This is an extension of MDEV-30423 "Deadlock on Replica during BACKUP
STAGE BLOCK_COMMIT on XA transactions"

The original commit in MDEV-30423 was not complete as some usage in XA of
MDL_BACKUP_COMMIT locks did not set thd->backup_commit_lock.
This is required to be set when using parallel replication.

Fixed by ensuring that all usage of BACKUP_COMMIT lock i XA is uniform and
all sets thd->backup_commit_lock. I also changed all locks to be
MDL_EXPLICIT to keep also that part uniform.

A regression test is added.
2024-10-28 13:29:21 +02:00
Brandon Nesterenko
1ed30e08af MDEV-34122: Assertion `entry' failed in Active_tranx::assert_thd_is_waiter
If semi-sync is switched off then on while a transaction is
in-between binlogging and waiting for an ACK, the semi-sync state of
the transaction is removed, leading to a debug assertion that
indicates the transaction tried to wait, but cannot receive an ACK
signal. More specifically, when semi-sync is switched off, the
Active_tranx list is cleared (where a transaction adds an entry to
this list during binlogging), and each entry in this list saves the
thread which will wait for an ACK, and the thread has the COND
variable to signal to wake itself. So if the entry is lost, the
Ack_receiver thread won’t be able to find the thread to wake up when
an ACK comes in

The fix is to ensure that the entry exists before awaiting the ACK,
and if there is no entry, skip the wait. In debug builds, an
informative message is written explaining that the transaction is
skipping its wait. Additional debug-build only logic is added to
ensure that the cause of the missing entry is due to semi-sync being
turned off and on

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-21 15:35:54 -06:00
Sergei Golubchik
3a1cf2c85b MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
Sergei Golubchik
5ebda30ccc Revert "MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2"
This reverts commit 8ae462a220.
2024-10-16 13:23:47 +02:00
Kristian Nielsen
8ae462a220 MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2
Implement variable legacy_xa_rollback_at_disconnect to support
backwards compatibility for applications that rely on the pre-10.5
behavior for connection disconnect, which is to rollback the
transaction (in violation of the XA specification).

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-16 10:18:36 +02:00
Marko Mäkelä
7e0afb1c73 Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
Lena Startseva
0a5e4a0191 MDEV-31005: Make working cursor-protocol
Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"

Fix for v.10.5
2024-09-18 18:39:26 +07:00
Brandon Nesterenko
68938d2b42 MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail
The failing test case validates Seconds_Behind_Master for a delayed
slave, while STOP SLAVE is executed during a delay. The test fixes
initially added to the test (commit b04c857596) added a table lock
to ensure a transaction could not finish before validating the
Seconds_Behind_Master field after SLAVE START, but did not address a
possibility that the transaction could finish before running the
STOP SLAVE command, which invalidates the validations for the rest
of the test case. Specifically, this would result in 1) a timeout in
“Waiting for table metadata lock” on the replica, which expects the
transaction to retry after slave restart and hit a lock conflict on
the locked tables (added in b04c857596), and 2) that
Seconds_Behind_Master should have increased, but did not.

The failure can be reproduced by synchronizing the slave to the master
before the MDEV-32265 echo statement (i.e. before the SLAVE STOP).

This patch fixes the test by adding a mechanism to use DEBUG_SYNC to
synchronize a MASTER_DELAY, rather than continually increase the
duration of the delay each time the test fails on buildbot. This is
to ensure that on slow machines, a delay does not pass before the
test gets a chance to validate results. Additionally, it decreases
overall test time because the test can continue immediately after
validation, thereby bypassing the remainder of a full delay for each
transaction.
2024-09-17 06:29:20 -06:00
Marko Mäkelä
48becffd07 Merge 10.5 into 10.6 2024-08-27 08:52:10 +03:00
Kristian Nielsen
8642453ce6 Fix sporadic failure of test case rpl.rpl_start_stop_slave
The test was expecting the I/O thread to be in a specific state, but thread
scheduling may cause it to not yet have reached that state. So just have a
loop that waits for the expected state to occur.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-08-26 14:39:24 +02:00
Kristian Nielsen
214e6c5b3d Fix sporadic failure of test case rpl.rpl_old_master
Remove the test for MDEV-14528. This is supposed to test that parallel
replication from pre-10.0 master will update Seconds_Behind_Master. But
after MDEV-12179 the SQL thread is blocked from even beginning to fetch
events from the relay log due to FLUSH TABLES WITH READ LOCK, so the test
case is no longer testing what is was intended to. And pre-10.0 versions are
long since out of support, so does not seem worthwhile to try to rewrite the
test to work another way.

The root cause of the test failure is MDEV-34778. Briefly, depending on
exact timing during slave stop, the rli->sql_thread_caught_up flag may end
up with different value. If it ends up as "true", this causes
Seconds_Behind_Master to be 0 during next slave start; and this caused test
case timeout as the test was waiting for Seconds_Behind_Master to become
non-zero.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-08-26 14:39:24 +02:00
Kristian Nielsen
7dc4ea5649 Fix sporadic test failure in rpl.rpl_create_drop_event
Depending on timing, an extra event run could start just when the event
scheduler is shut down and delay running until after the table has been
dropped; this would cause the test to fail with a "table does not exist"
error in the log.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-08-26 14:39:24 +02:00
Kristian Nielsen
33854d7324 Restore skiping rpl.rpl_mdev6020 under Valgrind
(Revert a change done by mistake when XtraDB was removed.)

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-08-26 14:39:24 +02:00
Oleksandr Byelkin
8f020508c8 Merge branch '10.5' into 10.6 2024-08-03 09:04:24 +02:00
Brandon Nesterenko
001608de7e MDEV-15393: Fix rpl_mysqldump_gtid_slave_pos
The slave would try to sync_with_master_gtid.inc,
but the master never actually saved its gtid position
so the test would move on too quickly.
2024-07-31 14:17:46 -06:00
Oleksandr Byelkin
a938503cfb Merge branch '10.5' into 10.6 2024-07-20 08:12:42 +02:00
Andrei
b8f92ade57 MDEV-15393 gtid_slave_pos duplicate key errors after mysqldump restore
When mysqldump is run to dump the `mysql` system database, it generates
INSERT statements into the table `mysql.gtid_slave_pos`.
After running the backup script
those inserts did not produce the expected gtid state on slave. In
particular the maximum of mysql.gtid_slave_pos.sub_id did not make
into
   rpl_global_gtid_slave_state.last_sub_id

an in-memory object that is supposed to match the current state of the
table. And that was regardless of whether --gtid option was specified
or not. Later when the backup recipient server starts as slave
in *non-gtid* mode this desychronization may lead to a duplicate key
error.

This effect is corrected for --gtid mode mysqldump/mariadb-dump only
as the following.  The fixes ensure the insert block of the dump
script is followed with a "summing-up" SET @global.gtid_slave_pos
assignment.

For the implemenation part, note a deferred print-out of
SET-gtid_slave_pos and associated comments is prefered over relocating
of the entire blocks if (opt_master,slave_data &&
do_show_master,slave_status) ...  because of compatiblity
concern. Namely an error inside do_show_*() is handled in the new code
the same way, as early as, as before.

A regression test can be run in how-to-reproduce mode as well.
One affected mtr test observed.
rpl_mysqldump_slave.result "mismatch" shows now the new deferring print
of SET-gtid_slave_pos policy in action.
2024-07-19 21:44:12 +03:00
Oleksandr Byelkin
9af2caca33 Merge branch '10.5' into 10.6 2024-07-18 16:25:33 +02:00
Brandon Nesterenko
a061ae1079 MDEV-33921: Fix rpl_xa_empty_transaction.test
The test was missing a save_master_gtid.inc on the master,
leading to the slave thinking it was in sync after executing
sync_with_master_gtid.inc, despite not having executed the
latest transaction. This skipped transaction, XA COMMIT,
was supposed to error-to-be-ignored because its XID could not
be found, but be thrown out because the replication filters
would filter out the target database. However, if the slave
was able to stop before executing the transaction, then
the replication filer is reset (to empty), and when the
slave is later restarted, that transactions error would
no longer be ignored.

Additionally, as the test cases added in MDEV-33921 rely
on GTID synchronization, the test cases now force
master_use_gtid=slave_pos for consistency
2024-07-17 16:38:26 -06:00
Sergei Golubchik
d60f5c11ea MDEV-34318 mariadb-dump SQL syntax error with MAX_STATEMENT_TIME against Percona MySQL server
protect MariaDB conditional comments from a bug
in Percona MySQL comment parser
2024-07-17 21:25:40 +02:00
Yuchen Pei
f071b7620b
Merge branch '10.5' into 10.6 2024-07-16 15:54:22 +08:00
Daniel Black
e8bcc4e455 MDEV-34568 rpl.rpl_mdev12179 - correct for Windows
Simplify in an attempt to avoid:

mysqltest: At line 275: File already exist: on the write_file
lines.

Using write_line as that's what a lot of other tests
do for writing small bits to a expect file.

Review thanks Valdislav Vaintroub
2024-07-12 12:55:28 +02:00
Brandon Nesterenko
ea9869504d MDEV-33921: Replication breaks when filtering two-phase XA transactions
There are two problems.

First, replication fails when XA transactions are used where the
slave has replicate_do_db set and the client has touched a different
database when running DML such as inserts. This is because XA
commands are not treated as keywords, and are thereby not exempt
from the replication filter. The effect of this is that during an XA
transaction, if its logged “use db” from the master is filtered out
by the replication filter, then XA END will be ignored, yet its
corresponding XA PREPARE will be executed in an invalid state,
thereby breaking replication.

Second, if the slave replicates an XA transaction which results in
an empty transaction, the XA START through XA PREPARE first phase of
the transaction won’t be binlogged, yet the XA COMMIT will be
binlogged. This will break replication in chain configurations.

The first problem is fixed by treating XA commands in
Query_log_event as keywords, thus allowing them to bypass the
replication filter. Note that Query_log_event::is_trans_keyword() is
changed to accept a new parameter to define its mode, to either
check for XA commands or regular transaction commands, but not both.
In addition, mysqlbinlog is adapted to use this mode so its
--database filter does not remove XA commands from its output.

The second problem fixed by overwriting the XA state in the XID
cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to
rollback the transaction and skip its binlogging. If the xid cache
is cleared before an XA transaction receives its completion command
(e.g. on server shutdown), then before reporting ER_XAER_NOTA when
the completion command is executed, the filter is first checked if
the database is ignored, and if so, the error is ignored.

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-07-10 14:37:39 -06:00
Julius Goryavsky
4026f04425 Merge branch 10.5 into 10.6 2024-07-09 11:56:47 +02:00
Brandon Nesterenko
744580d5a7 MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary
The IO thread can report error code 2013 into the error log when it
is stopped during the initial connection process to the primary, as
well as when trying to read an event. However, because the IO thread
is being stopped, its connection to the primary is force-killed by
the signaling thread (see THD::awake_no_mutex()), and thereby these
connection errors should be ignored.

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-07-08 10:39:17 -06:00
Alexander Barkov
e56040fee8 Merge remote-tracking branch 'origin/10.5' into 10.6 2024-07-08 18:59:04 +04:00
Brandon Nesterenko
eb4458e993 MDEV-33465: an option to enable semisync recovery
The current semi-sync binlog fail-over recovery process uses
rpl_semi_sync_slave_enabled==TRUE as its condition to truncate a
primary server’s binlog, as it is anticipating the server to re-join
a replication topology as a replica. However, for servers configured
with both rpl_semi_sync_master_enabled=1 and
rpl_semi_sync_slave_enabled=1, if a primary is just re-started (i.e.
retaining its role as master), it can truncate its binlog to drop
transactions which its replica(s) has already received and executed.
If this happens, when the replica reconnects, its gtid_slave_pos can
be ahead of the recovered primary’s gtid_binlog_pos, resulting in an
error state where the replica’s state is ahead of the primary’s.

This patch changes the condition for semi-sync recovery to truncate
the binlog to instead use the configuration variable
--init-rpl-role, when set to SLAVE. This allows for both
rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled to be
set for a primary that is restarted, and no transactions will be
lost, so long as --init-rpl-role is not set to SLAVE.

Reviewed By:
============
Sergei Golubchik <serg@mariadb.com>
2024-07-05 19:53:57 -06:00
Brandon Nesterenko
cbc1898e82 MDEV-25607: Auto-generated DELETE from HEAP table can break replication
The special logic used by the memory storage engine
to keep slaves in sync with the master on a restart can
break replication. In particular, after a restart, the
master writes DELETE statements in the binlog for
each MEMORY-based table so the slave can empty its
data. If the DELETE is not executable, e.g. due to
invalid triggers, the slave will error and fail, whereas
the master will never see the problem.

Instead of DELETE statements, use TRUNCATE to
keep slaves in-sync with the master, thereby bypassing
triggers.

Reviewed By:
===========
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-07-05 12:00:09 -06:00
Marko Mäkelä
0076eb3d4e Merge 10.5 into 10.6 2024-06-24 13:09:47 +03:00
Monty
3541bd63f0 MDEV-33582 Add more warnings to be able to better diagnose network issues
Changed the logged messages from errors to warnings
Also changed 'remain' to 'read_length' in the warning to make it more readable.
2024-06-20 09:53:01 +03:00
Brandon Nesterenko
6cab2f75fe MDEV-23857: replication master password length
After MDEV-4013, the maximum length of replication passwords was extended to
96 ASCII characters. After a restart, however, slaves only read the first 41
characters of MASTER_PASSWORD from the master.info file. This lead to slaves
unable to reconnect to the master after a restart.

After a slave restart, if a master.info file is detected, use the full
allowable length of the password rather than 41 characters.

Reviewed By:
============
Sergei Golubchik <serg@mariadb.com>
2024-06-18 07:21:18 -06:00
Brandon Nesterenko
fcd21d3e40 MDEV-34355: rpl.rpl_semi_sync_no_missed_ack_after_add_slave ‘server_3 should have sent…’
The problem is that the test could query the status variable
Rpl_semi_sync_slave_send_ack before the slave actually updated it.
This would result in an immediate --die assertion killing the rest
of the test. The bottom of this commit message has a small patch
that can be applied to reproduce the test failure.

This patch fixes the test failure by waiting for the variable to be
updated before querying its value.

diff --git a/sql/semisync_slave.cc b/sql/semisync_slave.cc
index 9ddd4c5c8d7..60538079fce 100644
--- a/sql/semisync_slave.cc
+++ b/sql/semisync_slave.cc
@@ -303,7 +303,10 @@ int Repl_semi_sync_slave::slave_reply(Master_info *mi)
     reply_res= DBUG_EVALUATE_IF("semislave_failed_net_flush", 1,
                                 net_flush(net));
     if (!reply_res)
+    {
+      sleep(1);
       rpl_semi_sync_slave_send_ack++;
+    }
   }
   DBUG_RETURN(reply_res);
 }
2024-06-10 12:27:20 -06:00