MDEV-32168: Postpush fix for rpl_domain_id_filter_master_crash

While a replica may be reading events from the
primary, the primary is killed. Left to its own
devices, the IO thread may or may not stop in
error, depending on what it is doing when its
connection to the primary is killed (e.g. a
failed read results in an error, whereas if the
IO thread is idly waiting for events when the
connection dies, it will enter into a reconnect
loop and reconnect). MDEV-32168 changed the test
to always wait for the reconnect, thus breaking
the error case, as the IO thread would be stopped
at a time of expecting it to be running.

The fix is to manually stop/start the IO thread
to ensure it is in a consistent state.

Note that rpl_domain_id_filter_master_crash.test
will need additional changes after fixing MDEV-33268

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
This commit is contained in:
Brandon Nesterenko 2024-01-17 07:14:11 -07:00
parent 3cd8875145
commit 01ca57ec16
2 changed files with 21 additions and 4 deletions

View file

@ -38,8 +38,9 @@ connection master;
include/rpl_start_server.inc [server_number=1]
# Master has restarted successfully
connection slave;
include/wait_for_slave_io_to_start.inc
include/wait_for_slave_sql_to_start.inc
include/stop_slave_sql.inc
include/stop_slave_io.inc
include/start_slave.inc
select * from ti;
a
1

View file

@ -67,10 +67,26 @@ connection master;
save_master_pos;
--connection slave
# Left to its own devices, the IO thread may or may not stop in error,
# depending on what it is doing when its connection to the primary is killed
# (e.g. a failed read results in an error, whereas if the IO thread is idly
# waiting for events when the connection dies, it will enter into a reconnect
# loop and reconnect). So we manually stop/start the IO thread to ensure it is
# in a consistent state
#
# FIXME: We shouldn't need to stop/start the SQL thread here, but due to
# MDEV-33268, we have to. So after fixing 33268, this should only stop/start
# the IO thread. Note the SQL thread must be stopped first due to an invalid
# DBUG_ASSERT in the IO thread's stop logic that depends on the state of the
# SQL thread (also reported and to be fixed in the same ticket).
#
--source include/stop_slave_sql.inc
--let rpl_allow_error=1
--source include/wait_for_slave_io_to_start.inc
--source include/stop_slave_io.inc
--let rpl_allow_error=
--source include/wait_for_slave_sql_to_start.inc
--source include/start_slave.inc
sync_with_master;
select * from ti;
select * from tm;