mariadb/mysql-test/suite/rpl/r/rpl_shutdown_sighup.result
Brandon Nesterenko 952ab9a596 MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown
The signal handler thread can use various different runtime
resources when processing a SIGHUP (e.g. master-info information)
due to calling into reload_acl_and_cache(). Currently, the shutdown
process waits for the termination of the signal thread after
performing cleanup. However, this could cause resources actively
used by the signal handler to be freed while reload_acl_and_cache()
is processing.

The specific resource that caused MDEV-30260 is a race condition for
the hostname_cache, such that mysqld would delete it in
clean_up()::hostname_cache_free(), before the signal handler would
use it in reload_acl_and_cache()::hostname_cache_refresh().

Another similar resource is the active_mi/master_info_index. There
was a race between its deletion by the main thread in end_slave(),
and their usage by the Signal Handler as a part of
Master_info_index::flush_all_relay_logs.read(active_mi) in
reload_acl_and_cache().

This patch fixes these race conditions by relocating where server
shutdown waits for the signal handler to die until after
server-level threads have been killed (i.e., as a last step of
close_connections()). With respect to the hostname_cache, active_mi
and master_info_cache, this ensures that they cannot be destroyed
while the signal handler is still active, and potentially using
them.

Additionally:

 1) This requires that Events memory is still in place for SIGHUP
handling's mysql_print_status(). So event deinitialization is moved
into clean_up(), but the event scheduler still needs to be stopped
in close_connections() at the same spot.

 2) The function kill_server_thread is no longer used, so it is
deleted

 3) The timeout to wait for the death of the signal thread was not
consistent with the comment. The comment mentioned up to 10 seconds,
whereas it was actually 0.01s. The code has been fixed to wait up to
10 seconds.

 4) A warning has been added if the signal handler thread fails to
exit in time.

 5) Added pthread_join() to end of wait_for_signal_thread_to_end()
if it hadn't ended in 10s with a warning. Note this also removes
the pthread_detached attribute from the signal_thread to allow
for the pthread_join().

Reviewed By:
===========
Vladislav Vaintroub <wlad@mariadb.com>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-04-09 14:25:13 -06:00

50 lines
1.8 KiB
Text

include/master-slave.inc
[connection master]
connection slave;
set statement sql_log_bin=0 for call mtr.add_suppression("Signal handler thread did not exit in a timely manner");
#
# Main test
connection master;
create table t1 (a int);
insert into t1 values (1);
include/save_master_gtid.inc
connection slave;
include/sync_with_master_gtid.inc
set @@global.debug_dbug= "+d,hold_sighup_log_refresh";
# Waiting for sighup to reach reload_acl_and_cache..
set debug_sync="now wait_for in_reload_acl_and_cache";
# Signalling signal handler to proceed to sleep before REFRESH_HOSTS
set debug_sync="now signal refresh_logs";
# Starting shutdown (note this will take 3+ seconds due to DBUG my_sleep in reload_acl_and_cache)
shutdown;
connection server_2;
connection slave;
include/assert_grep.inc [Ensure Mariadbd did not segfault when shutting down]
connection master;
connection slave;
#
# Error testcase to ensure an error message is shown if the signal
# takes longer than the timeout while processing the SIGHUP
connection slave;
set @@global.debug_dbug= "+d,force_sighup_processing_timeout";
set @@global.debug_dbug= "+d,hold_sighup_log_refresh";
connection master;
insert into t1 values (1);
include/save_master_gtid.inc
connection slave;
include/sync_with_master_gtid.inc
# Waiting for sighup to reach reload_acl_and_cache..
set debug_sync="now wait_for in_reload_acl_and_cache";
# Signalling signal handler to proceed to sleep before REFRESH_HOSTS
set debug_sync="now signal refresh_logs";
# Starting shutdown (note this will take 3+ seconds due to DBUG my_sleep in reload_acl_and_cache)
shutdown;
connection server_2;
connection slave;
include/assert_grep.inc [Ensure warning is issued that signal handler thread is still processing]
#
# Cleanup
connection master;
drop table t1;
include/rpl_end.inc
# End of rpl_shutdown_sighup.test