MDEV-33500: rpl.rpl_parallel_sbm can fail on slow machines, e.g. MSAN/Valgrind builders

In an addition to test rpl.rpl_parallel_sbm added by MDEV-32265, the
test uses sleep statements alone to test Seconds_Behind_Master with
delayed replication. On slow running machines, the test can pass the
intended MASTER_DELAY duration and Seconds_Behind_Master can become
0, when the test expects the transaction to still be actively in a
delaying state.

This can be consistently reproduced by adding a sleep statement
before the call to

--let = query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1)

to sleep past the delay end point.

This patch fixes this by locking the table which the delayed
transaction targets so Second_Behind_Master cannot be updated before
the test reads it for validation.
This commit is contained in:
Brandon Nesterenko 2024-02-20 08:16:15 -07:00
parent 1bbbb66e46
commit b04c857596
2 changed files with 17 additions and 0 deletions

View file

@ -27,12 +27,19 @@ connection slave;
# delaying a transaction; then when the reciprocal START SLAVE occurs,
# if the event is still to be delayed, SBM should resume accordingly
include/stop_slave.inc
# Lock t1 on slave to ensure the event can't finish (and thereby update
# Seconds_Behind_Master) so slow running servers don't accidentally
# catch up to the master before checking SBM.
connection server_2;
LOCK TABLES t1 WRITE;
include/start_slave.inc
connection slave;
# Waiting for replica to resume the delay for the transaction
# Sleeping 1s to increment SBM
# Ensuring Seconds_Behind_Master increases after sleeping..
# ..done
connection server_2;
UNLOCK TABLES;
include/sync_with_master_gtid.inc
#
# Pt 2) If the worker threads have not entered an idle state, ensure

View file

@ -67,6 +67,13 @@ if (`SELECT $sbm_trx1_arrive > ($seconds_since_idling + 1)`)
--echo # if the event is still to be delayed, SBM should resume accordingly
--source include/stop_slave.inc
--echo # Lock t1 on slave to ensure the event can't finish (and thereby update
--echo # Seconds_Behind_Master) so slow running servers don't accidentally
--echo # catch up to the master before checking SBM.
--connection server_2
LOCK TABLES t1 WRITE;
--source include/start_slave.inc
--connection slave
@ -86,6 +93,9 @@ if (`SELECT $sbm_trx1_after_1s_sleep <= $sbm_trx1_arrive`)
}
--echo # ..done
--connection server_2
UNLOCK TABLES;
--source include/sync_with_master_gtid.inc
--echo #