mariadb/mysql-test/suite/rpl
Brandon Nesterenko 952ab9a596 MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown
The signal handler thread can use various different runtime
resources when processing a SIGHUP (e.g. master-info information)
due to calling into reload_acl_and_cache(). Currently, the shutdown
process waits for the termination of the signal thread after
performing cleanup. However, this could cause resources actively
used by the signal handler to be freed while reload_acl_and_cache()
is processing.

The specific resource that caused MDEV-30260 is a race condition for
the hostname_cache, such that mysqld would delete it in
clean_up()::hostname_cache_free(), before the signal handler would
use it in reload_acl_and_cache()::hostname_cache_refresh().

Another similar resource is the active_mi/master_info_index. There
was a race between its deletion by the main thread in end_slave(),
and their usage by the Signal Handler as a part of
Master_info_index::flush_all_relay_logs.read(active_mi) in
reload_acl_and_cache().

This patch fixes these race conditions by relocating where server
shutdown waits for the signal handler to die until after
server-level threads have been killed (i.e., as a last step of
close_connections()). With respect to the hostname_cache, active_mi
and master_info_cache, this ensures that they cannot be destroyed
while the signal handler is still active, and potentially using
them.

Additionally:

 1) This requires that Events memory is still in place for SIGHUP
handling's mysql_print_status(). So event deinitialization is moved
into clean_up(), but the event scheduler still needs to be stopped
in close_connections() at the same spot.

 2) The function kill_server_thread is no longer used, so it is
deleted

 3) The timeout to wait for the death of the signal thread was not
consistent with the comment. The comment mentioned up to 10 seconds,
whereas it was actually 0.01s. The code has been fixed to wait up to
10 seconds.

 4) A warning has been added if the signal handler thread fails to
exit in time.

 5) Added pthread_join() to end of wait_for_signal_thread_to_end()
if it hadn't ended in 10s with a warning. Note this also removes
the pthread_detached attribute from the signal_thread to allow
for the pthread_join().

Reviewed By:
===========
Vladislav Vaintroub <wlad@mariadb.com>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-04-09 14:25:13 -06:00
..
extension mtr: use env for perl 2020-06-23 03:24:46 +02:00
include MDEV-29816 rpl.rpl_parallel_29322 occasionally fails in BB 2023-12-11 12:02:58 +01:00
r MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown 2024-04-09 14:25:13 -06:00
t MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown 2024-04-09 14:25:13 -06:00
disabled.def MDEV-17390: re-neable rpl_semi_sync_after_sync test 2022-06-17 19:38:43 +03:00
my.cnf Remove duplicated default client include from replication my.cnf 2023-09-14 12:56:41 +02:00
README
rpl_1slave_base.cnf

How to run.
===========

./mysql-test-run.pl --suite=rpl --mysqld=--binlog-format=mixed