mariadb/mysql-test/suite
Brandon Nesterenko b07258a0d5 MDEV-35109: Semi-sync Replication stalling Primary using wait point=AFTER_SYNC
For a primary configured with wait_point=AFTER_SYNC, if two threads
T1 (binlogging through MYSQL_BIN_LOG::write()) and T2 were
binlogging at the same time, T1 could accidentally wait for its
semi-sync ACK using the binlog coordinates of T2. Prior to
MDEV-33551, this only resulted in delayed transactions, because all
transactions shared the same condition variable for ACK signaling.
However, with the MDEV-33551 changes, each thread has its own
condition variable to signal. So T1 could wait indefinitely when
either:
  1) T1's ACK has been received, but not T2's, when T1 enters
wait_after_sync(). The ACK receiver thread has already signaled
T1's ACK, but T1 is _actually_ waiting on T2's ACK, and therefore
goes to wait (in vain).

  2) T1 enters wait_after_sync() before any ACKs have arrived. When
T1's ACK comes in, T1 is woken up, but sees that it needs to wait
longer (because it is actually waiting on T2's ACK), and goes back to
waiting (this time, in vain).
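
The two scenarios can be sketched with a small single-threaded toy
model. This is only an illustration under stated assumptions:
ToyAckReceiver, report(), on_ack() and would_stall() are hypothetical
names, not MariaDB code, and the model assumes that each thread is
signaled exactly once, when an ACK covering the offset it reported
arrives.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical toy model of per-thread ACK signaling; not MariaDB code.
struct ToyAckReceiver {
  std::uint64_t acked_up_to = 0;           // highest offset ACKed so far
  std::map<int, std::uint64_t> pending;    // thread id -> reported offset
  std::map<int, int> signals;              // thread id -> wake-ups delivered

  // A thread registers the offset it reported (once per thread here).
  void report(int tid, std::uint64_t pos) {
    assert(pending.count(tid) == 0);
    pending[tid] = pos;
  }

  // An ACK arrives: signal and forget every thread whose reported
  // offset is now covered. Each thread is signaled at most once.
  void on_ack(std::uint64_t pos) {
    if (pos > acked_up_to) acked_up_to = pos;
    for (auto it = pending.begin(); it != pending.end();) {
      if (it->second <= acked_up_to) {
        ++signals[it->first];
        it = pending.erase(it);
      } else {
        ++it;
      }
    }
  }

  // True if a thread about to wait for `wait_pos` would never be woken:
  // the target is not yet ACKed and no further signal is owed to it.
  bool would_stall(int tid, std::uint64_t wait_pos) const {
    return wait_pos > acked_up_to && pending.count(tid) == 0;
  }
};
```

In this model, if T1 reports offset 100, T2 reports 200, and only the
ACK for 100 has arrived, then would_stall(1, 200) is true while
would_stall(1, 100) is false: T1 has already consumed its only signal,
so waiting on T2's offset cannot end.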

Note that T1 ended up waiting on T2's binlog coordinates because,
when MYSQL_BIN_LOG::write() called
Repl_semisync_master::wait_after_sync(), the binlog offset parameter
was read as the end of MYSQL_BIN_LOG::log_file, which is shared
among transactions. So if T2 updated the binary log _after_ T1
had released LOCK_log, but before T1 invoked wait_after_sync(), T1
would use the end of the binary log file as its binlog offset, which
was that of T2 (or any later transaction).

The fix in this patch ensures that a transaction uses the same binary
log coordinates in both report_binlog_update() and
wait_after_sync().
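
A minimal sketch of the difference, assuming a toy log rather than the
real classes: ToyBinlog, append(), shared_end(), racy_t1() and
consistent_t1() are hypothetical names introduced for illustration and
do not reflect the actual patch. The second writer is simulated inline
so the interleaving is deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the shared binary log; not MYSQL_BIN_LOG.
class ToyBinlog {
 public:
  // Appends `len` bytes and returns the end offset of *this* write,
  // captured while the lock is still held.
  std::uint64_t append(std::uint64_t len) {
    assert(len > 0);
    std::lock_guard<std::mutex> guard(lock_);
    end_ += len;
    return end_;
  }

  // Current end of the shared log; may already include later writers.
  std::uint64_t shared_end() const {
    std::lock_guard<std::mutex> guard(lock_);
    return end_;
  }

 private:
  mutable std::mutex lock_;
  std::uint64_t end_ = 0;
};

// The offset a transaction reported and the offset it then waits on.
struct Coordinates {
  std::uint64_t reported;
  std::uint64_t waited_on;
};

// Racy pattern: the wait offset is re-read from the shared log after
// the lock was released, so it can belong to a later writer.
Coordinates racy_t1(ToyBinlog& log) {
  std::uint64_t reported = log.append(100);  // T1 writes
  log.append(50);                            // T2 writes in the gap
  return {reported, log.shared_end()};
}

// Consistent pattern: the offset captured at write time is reused for
// both the report and the wait.
Coordinates consistent_t1(ToyBinlog& log) {
  std::uint64_t mine = log.append(100);      // T1 writes
  log.append(50);                            // T2 writes in the gap
  return {mine, mine};
}
```

With a fresh ToyBinlog, racy_t1() reports offset 100 but waits on 150
(T2's end offset), while consistent_t1() reports and waits on 100.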

Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-11-04 10:45:58 -07:00
archive Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
atomic
binlog Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
binlog_encryption Merge branch '10.5' into 10.6 2024-07-18 16:25:33 +02:00
client
compat MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
csv Backporting bugs fixes fixed by MDEV-31340 from 11.5 2024-05-21 14:58:01 +04:00
encryption MDEV-34830: LSN in the future is not being treated as serious corruption 2024-10-17 17:24:20 +03:00
engines Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
federated Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
funcs_1 MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
funcs_2
galera Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
galera_3nodes Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
galera_3nodes_sr Merge branch 10.5 into 10.6 2024-07-09 11:56:47 +02:00
galera_sr MDEV-34836: TOI on parent table must BF abort SR in progress on a child 2024-09-24 11:14:01 +02:00
gcol Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
handler MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
heap
innodb Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
innodb_fts Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
innodb_gis Merge branch '10.5' into 10.6 2024-10-15 16:00:44 +11:00
innodb_i_s
innodb_zip MDEV-34830: LSN in the future is not being treated as serious corruption 2024-10-17 17:24:20 +03:00
jp
json MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
large_tests fix failing large_tests.maria_recover_encrypted 2024-04-22 17:22:11 +02:00
maria MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
mariabackup MDEV-34830: LSN in the future is not being treated as serious corruption 2024-10-17 17:24:20 +03:00
mtr/t
mtr2
multi_source Merge 10.5 into 10.6 2024-04-17 14:14:58 +03:00
optimizer_unfixed_bugs
parts Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
perfschema Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
perfschema_stress
period Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
plugins MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
roles Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
rpl MDEV-35109: Semi-sync Replication stalling Primary using wait point=AFTER_SYNC 2024-11-04 10:45:58 -07:00
s3 MDEV-34867 engine S3 cause 500 error for huawei buckets 2024-09-11 16:15:37 +03:00
sql_sequence Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
storage_engine
stress MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1 2024-09-20 20:26:43 +05:30
sys_vars MDEV-34690 lock_rec_unlock_unmodified() causes deadlock 2024-10-23 12:36:17 +03:00
sysschema Merge 10.5 into 10.6 2024-03-12 09:19:57 +02:00
unit
vcol Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
versioning MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
wsrep Merge branch '10.5' into '10.6' 2024-09-01 06:51:25 +02:00