mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-30 18:41:56 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	7e0afb1c73	Merge 10.5 into 10.6	2024-10-03 09:31:39 +03:00
Lena Startseva	0a5e4a0191	MDEV-31005: Make working cursor-protocol Updated tests: cases with bugs or which cannot be run with the cursor-protocol were excluded with "--disable_cursor_protocol"/"--enable_cursor_protocol" Fix for v.10.5	2024-09-18 18:39:26 +07:00
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after SLAVE START, but did not address a possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the SLAVE STOP). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increase the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Marko Mäkelä	48becffd07	Merge 10.5 into 10.6	2024-08-27 08:52:10 +03:00
Kristian Nielsen	8642453ce6	Fix sporadic failure of test case rpl.rpl_start_stop_slave The test was expecting the I/O thread to be in a specific state, but thread scheduling may cause it to not yet have reached that state. So just have a loop that waits for the expected state to occur. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	214e6c5b3d	Fix sporadic failure of test case rpl.rpl_old_master Remove the test for MDEV-14528. This is supposed to test that parallel replication from pre-10.0 master will update Seconds_Behind_Master. But after MDEV-12179 the SQL thread is blocked from even beginning to fetch events from the relay log due to FLUSH TABLES WITH READ LOCK, so the test case is no longer testing what is was intended to. And pre-10.0 versions are long since out of support, so does not seem worthwhile to try to rewrite the test to work another way. The root cause of the test failure is MDEV-34778. Briefly, depending on exact timing during slave stop, the rli->sql_thread_caught_up flag may end up with different value. If it ends up as "true", this causes Seconds_Behind_Master to be 0 during next slave start; and this caused test case timeout as the test was waiting for Seconds_Behind_Master to become non-zero. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	7dc4ea5649	Fix sporadic test failure in rpl.rpl_create_drop_event Depending on timing, an extra event run could start just when the event scheduler is shut down and delay running until after the table has been dropped; this would cause the test to fail with a "table does not exist" error in the log. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	33854d7324	Restore skiping rpl.rpl_mdev6020 under Valgrind (Revert a change done by mistake when XtraDB was removed.) Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Oleksandr Byelkin	8f020508c8	Merge branch '10.5' into 10.6	2024-08-03 09:04:24 +02:00
Brandon Nesterenko	001608de7e	MDEV-15393: Fix rpl_mysqldump_gtid_slave_pos The slave would try to sync_with_master_gtid.inc, but the master never actually saved its gtid position so the test would move on too quickly.	2024-07-31 14:17:46 -06:00
Oleksandr Byelkin	a938503cfb	Merge branch '10.5' into 10.6	2024-07-20 08:12:42 +02:00
Andrei	b8f92ade57	MDEV-15393 gtid_slave_pos duplicate key errors after mysqldump restore When mysqldump is run to dump the `mysql` system database, it generates INSERT statements into the table `mysql.gtid_slave_pos`. After running the backup script those inserts did not produce the expected gtid state on slave. In particular the maximum of mysql.gtid_slave_pos.sub_id did not make into rpl_global_gtid_slave_state.last_sub_id an in-memory object that is supposed to match the current state of the table. And that was regardless of whether --gtid option was specified or not. Later when the backup recipient server starts as slave in non-gtid mode this desychronization may lead to a duplicate key error. This effect is corrected for --gtid mode mysqldump/mariadb-dump only as the following. The fixes ensure the insert block of the dump script is followed with a "summing-up" SET @global.gtid_slave_pos assignment. For the implemenation part, note a deferred print-out of SET-gtid_slave_pos and associated comments is prefered over relocating of the entire blocks if (opt_master,slave_data && do_show_master,slave_status) ... because of compatiblity concern. Namely an error inside do_show_*() is handled in the new code the same way, as early as, as before. A regression test can be run in how-to-reproduce mode as well. One affected mtr test observed. rpl_mysqldump_slave.result "mismatch" shows now the new deferring print of SET-gtid_slave_pos policy in action.	2024-07-19 21:44:12 +03:00
Oleksandr Byelkin	9af2caca33	Merge branch '10.5' into 10.6	2024-07-18 16:25:33 +02:00
Brandon Nesterenko	a061ae1079	MDEV-33921: Fix rpl_xa_empty_transaction.test The test was missing a save_master_gtid.inc on the master, leading to the slave thinking it was in sync after executing sync_with_master_gtid.inc, despite not having executed the latest transaction. This skipped transaction, XA COMMIT, was supposed to error-to-be-ignored because its XID could not be found, but be thrown out because the replication filters would filter out the target database. However, if the slave was able to stop before executing the transaction, then the replication filer is reset (to empty), and when the slave is later restarted, that transactions error would no longer be ignored. Additionally, as the test cases added in MDEV-33921 rely on GTID synchronization, the test cases now force master_use_gtid=slave_pos for consistency	2024-07-17 16:38:26 -06:00
Sergei Golubchik	d60f5c11ea	MDEV-34318 mariadb-dump SQL syntax error with MAX_STATEMENT_TIME against Percona MySQL server protect MariaDB conditional comments from a bug in Percona MySQL comment parser	2024-07-17 21:25:40 +02:00
Yuchen Pei	f071b7620b	Merge branch '10.5' into 10.6	2024-07-16 15:54:22 +08:00
Daniel Black	e8bcc4e455	MDEV-34568 rpl.rpl_mdev12179 - correct for Windows Simplify in an attempt to avoid: mysqltest: At line 275: File already exist: on the write_file lines. Using write_line as that's what a lot of other tests do for writing small bits to a expect file. Review thanks Valdislav Vaintroub	2024-07-12 12:55:28 +02:00
Brandon Nesterenko	ea9869504d	MDEV-33921: Replication breaks when filtering two-phase XA transactions There are two problems. First, replication fails when XA transactions are used where the slave has replicate_do_db set and the client has touched a different database when running DML such as inserts. This is because XA commands are not treated as keywords, and are thereby not exempt from the replication filter. The effect of this is that during an XA transaction, if its logged “use db” from the master is filtered out by the replication filter, then XA END will be ignored, yet its corresponding XA PREPARE will be executed in an invalid state, thereby breaking replication. Second, if the slave replicates an XA transaction which results in an empty transaction, the XA START through XA PREPARE first phase of the transaction won’t be binlogged, yet the XA COMMIT will be binlogged. This will break replication in chain configurations. The first problem is fixed by treating XA commands in Query_log_event as keywords, thus allowing them to bypass the replication filter. Note that Query_log_event::is_trans_keyword() is changed to accept a new parameter to define its mode, to either check for XA commands or regular transaction commands, but not both. In addition, mysqlbinlog is adapted to use this mode so its --database filter does not remove XA commands from its output. The second problem fixed by overwriting the XA state in the XID cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to rollback the transaction and skip its binlogging. If the xid cache is cleared before an XA transaction receives its completion command (e.g. on server shutdown), then before reporting ER_XAER_NOTA when the completion command is executed, the filter is first checked if the database is ignored, and if so, the error is ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-10 14:37:39 -06:00
Julius Goryavsky	4026f04425	Merge branch 10.5 into 10.6	2024-07-09 11:56:47 +02:00
Brandon Nesterenko	744580d5a7	MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary The IO thread can report error code 2013 into the error log when it is stopped during the initial connection process to the primary, as well as when trying to read an event. However, because the IO thread is being stopped, its connection to the primary is force-killed by the signaling thread (see THD::awake_no_mutex()), and thereby these connection errors should be ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-08 10:39:17 -06:00
Alexander Barkov	e56040fee8	Merge remote-tracking branch 'origin/10.5' into 10.6	2024-07-08 18:59:04 +04:00
Brandon Nesterenko	eb4458e993	MDEV-33465: an option to enable semisync recovery The current semi-sync binlog fail-over recovery process uses rpl_semi_sync_slave_enabled==TRUE as its condition to truncate a primary server’s binlog, as it is anticipating the server to re-join a replication topology as a replica. However, for servers configured with both rpl_semi_sync_master_enabled=1 and rpl_semi_sync_slave_enabled=1, if a primary is just re-started (i.e. retaining its role as master), it can truncate its binlog to drop transactions which its replica(s) has already received and executed. If this happens, when the replica reconnects, its gtid_slave_pos can be ahead of the recovered primary’s gtid_binlog_pos, resulting in an error state where the replica’s state is ahead of the primary’s. This patch changes the condition for semi-sync recovery to truncate the binlog to instead use the configuration variable --init-rpl-role, when set to SLAVE. This allows for both rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled to be set for a primary that is restarted, and no transactions will be lost, so long as --init-rpl-role is not set to SLAVE. Reviewed By: ============ Sergei Golubchik <serg@mariadb.com>	2024-07-05 19:53:57 -06:00
Brandon Nesterenko	cbc1898e82	MDEV-25607: Auto-generated DELETE from HEAP table can break replication The special logic used by the memory storage engine to keep slaves in sync with the master on a restart can break replication. In particular, after a restart, the master writes DELETE statements in the binlog for each MEMORY-based table so the slave can empty its data. If the DELETE is not executable, e.g. due to invalid triggers, the slave will error and fail, whereas the master will never see the problem. Instead of DELETE statements, use TRUNCATE to keep slaves in-sync with the master, thereby bypassing triggers. Reviewed By: =========== Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-05 12:00:09 -06:00
Marko Mäkelä	0076eb3d4e	Merge 10.5 into 10.6	2024-06-24 13:09:47 +03:00
Monty	3541bd63f0	MDEV-33582 Add more warnings to be able to better diagnose network issues Changed the logged messages from errors to warnings Also changed 'remain' to 'read_length' in the warning to make it more readable.	2024-06-20 09:53:01 +03:00
Brandon Nesterenko	6cab2f75fe	MDEV-23857: replication master password length After MDEV-4013, the maximum length of replication passwords was extended to 96 ASCII characters. After a restart, however, slaves only read the first 41 characters of MASTER_PASSWORD from the master.info file. This lead to slaves unable to reconnect to the master after a restart. After a slave restart, if a master.info file is detected, use the full allowable length of the password rather than 41 characters. Reviewed By: ============ Sergei Golubchik <serg@mariadb.com>	2024-06-18 07:21:18 -06:00
Brandon Nesterenko	fcd21d3e40	MDEV-34355: rpl.rpl_semi_sync_no_missed_ack_after_add_slave ‘server_3 should have sent…’ The problem is that the test could query the status variable Rpl_semi_sync_slave_send_ack before the slave actually updated it. This would result in an immediate --die assertion killing the rest of the test. The bottom of this commit message has a small patch that can be applied to reproduce the test failure. This patch fixes the test failure by waiting for the variable to be updated before querying its value. diff --git a/sql/semisync_slave.cc b/sql/semisync_slave.cc index 9ddd4c5c8d7..60538079fce 100644 --- a/sql/semisync_slave.cc +++ b/sql/semisync_slave.cc @@ -303,7 +303,10 @@ int Repl_semi_sync_slave::slave_reply(Master_info *mi) reply_res= DBUG_EVALUATE_IF("semislave_failed_net_flush", 1, net_flush(net)); if (!reply_res) + { + sleep(1); rpl_semi_sync_slave_send_ack++; + } } DBUG_RETURN(reply_res); }	2024-06-10 12:27:20 -06:00
Sergei Golubchik	7b53672c63	Merge branch '10.5' into 10.6	2024-05-08 20:06:00 +02:00
Sergei Golubchik	7a789e2027	sporadic failures of rpl.rpl_parallel_sbm the test waits for the event to get stuck on MASTER_DELAY, but on a slow/overloaded slave the event might pass MASTER_DELAY before the test starts waiting. Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY), the event cannot avoid that,	2024-05-05 21:37:07 +02:00
Sergei Golubchik	cea083af9f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-05-05 21:37:07 +02:00
Kristian Nielsen	4b4db4a8e5	MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure Refinement of the original patch. Move the code to reset the kill up into the parent class Xid_apply_log_event, to also fix the similar issue for XA COMMIT. Increase the number of slave retries in the test case rpl.rpl_parallel_multi_domain_xa to fix some sporadic failures. The test generates massive amounts of conflicting transactions in multiple independent domains, which can cause multiple rollback+retry for a transaction as it conflicts with transactions in other domains one-by-one. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-05-05 19:01:56 +02:00
Kristian Nielsen	e365877bae	MDEV-33798: ROW base optimistic deadlock with concurrent writes on same table One case is conflicting transactions T1 and T2 with different domain id, in optimistic parallel replication in non-GTID mode. Then T2 will wait_for_prior_commit on T1; and if T1 got a row lock wait on T2 it would hang, as different domains caused the deadlock kill to be skipped in thd_rpl_deadlock_check(). More generally, if we have transactions T1 and T2 in one domain/master connection, and independent transactions U in another, then we can still deadlock like this: T1 row low wait on U U row lock wait on T2 T2 wait_for_prior_commit on T1 This commit enforces the deadlock kill in these cases. If the waited-for transaction is speculatively applied, then it will be deadlock killed in case of a conflict, even if the two transactions are in different domains or master connections. Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-05-02 21:07:45 +02:00
Sergei Golubchik	c1f3eff53f	Merge branch '10.5' into 10.6	2024-04-29 10:08:58 +02:00
Kristian Nielsen	553a4d6271	MDEV-33602: Sporadic test failure in rpl.rpl_gtid_stop_start The test could fail with a duplicate key error because switching to non-GTID mode could start at the wrong old-style position. The position could be wrong when the previous GTID connect was stopped before receiving the fake GTID list event which gives the old-style position corresponding to the GTID connected position. Work-around by injecting an extra event and syncing the slave before switching to non-GTID mode. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-25 11:00:45 +02:00
Sergei Golubchik	e83d92ee5e	sporadic failures of rpl.rpl_semi_sync_fail_over in the $case=2 - it's wrong to kill after the first binlog EOF, because that might happen between INSERT(4) and INSERT(5). So, wait for the slave to acknowledge INSERT(5) before killing the master, that is, both connection threads must pass repl_semisync_master.wait_after_sync()	2024-04-21 22:54:52 +02:00
Sergei Golubchik	6242783f24	rpl.rpl_semi_sync_fail_over improve debugability	2024-04-21 14:03:26 +02:00
Sergei Golubchik	a4b6409ff6	sporadic failures of binlog_encryption.rpl_parallel_slave_bgc_kill do CHANGE MASTER before sync_with_master to have the slave in a predictable fully synced state before the next test	2024-04-21 01:17:31 +02:00
Sergei Golubchik	d8368ae289	Merge '10.5' into 10.6	2024-04-20 14:47:26 +02:00
Kristian Nielsen	0c249ad718	MDEV-30232: rpl.rpl_gtid_crash fails sporadically in BB The root cause of the failure is a bug in the Linux network stack: https://lore.kernel.org/netdev/87sf0ldk41.fsf@urd.knielsen-hq.org/T/#u If the slave does a connect(2) at the exact same time that kill -9 of the master process closes the listening socket, the FIN or RST packet is lost in the kernel, and the slave ends up timing out waiting for the initial communication from the server. This timeout defaults to --slave-net-timeout=120, which causes include/master_gtid_wait.inc to time out first and fail the test. Work-around this problem by reducing the --slave-net-timeout for this test case. If this problem turns up in other tests, we can consider reducing the default value for all tests. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-20 13:41:08 +02:00
Marko Mäkelä	bb2e125d07	Merge 10.5 into 10.6 This excludes commit `040069f4ba` because it is specific to innodb_sync_debug, which had been removed in commit `ff5d306e29`.	2024-04-18 07:14:56 +03:00
Brandon Nesterenko	0ad52e4d6a	MDEV-27512: Assertion !thd->transaction_rollback_request failed in rows_event_stmt_cleanup If replicating an event in ROW format, and InnoDB detects a deadlock while searching for a row, the row event will error and rollback in InnoDB and indicate that the binlog cache also needs to be cleared, i.e. by marking thd->transaction_rollback_request. In the normal case, this will trigger an error in Rows_log_event::do_apply_event() and cause a rollback. During the Rows_log_event::do_apply_event() cleanup of a successful event application, there is a DBUG_ASSERT in log_event_server.cc::rows_event_stmt_cleanup(), which sets the expectation that thd->transaction_rollback_request cannot be set because the general rollback (i.e. not the InnoDB rollback) should have happened already. However, if the replica is configured to skip deadlock errors, the rows event logic will clear the error and continue on, as if no error happened. This results in thd->transaction_rollback_request being set while in rows_event_stmt_cleanup(), thereby triggering the assertion. This patch fixes this in the following ways: 1) The assertion is invalid, and thereby removed. 2) The rollback case is forced in rows_event_stmt_cleanup() if transaction_rollback_request is set. Note the differing behavior between transactions which are skipped due to deadlock errors and other errors. When a transaction is skipped due to an ignored deadlock error, the entire transaction is rolled back and skipped (though note MDEV-33930 which allows statements in the same transaction after the deadlock-inducing one to commit). When a transaction is skipped due to ignoring a different error, only the erroring statements are rolled-back and skipped - the rest of the transaction will execute as normal. The effect of this can be seen in the test results. The added test case to rpl_skip_error.test shows that only statements which are ignored due to non-deadlock errors are ignored in larger transactions. A diff between rpl_temporary_error2_skip_all.result and rpl_temporary_error2.result shows that all statements in the errored transaction are rolled back (diff pasted below): : diff rpl_temporary_error2.result rpl_temporary_error2_skip_all.result 49c49 < 2 1 --- > 2 NULL 51c51 < 4 1 --- > 4 NULL 53c53 < * There will be two rows in t2 due to the retry. --- > * There will be one row in t2 because the ignored deadlock does not retry. 57d56 < 1 59c58 < 1 --- > 0 Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2024-04-17 11:14:21 -06:00
Vladislav Vaintroub	061adae9a2	MDEV-16944 Fix file sharing issues on Windows in mysqltest On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to conflicting share modes between processes accessing the same file can result in CreateFile failures. mysys' my_open() already incorporates a workaround by implementing wait/retry logic on Windows. But this does not help if files are opened using shell redirection like mysqltest traditionally did it, i.e via --echo exec "some text" > output_file In such cases, it is cmd.exe, that opens the output_file, and it won't do any sharing-violation retries. This commit addresses the issue by introducing a new built-in command, 'write_line', in mysqltest. This new command serves as a brief alternative to 'write_file', with a single line output, that also resolves variables like "exec" would. Internally, this command will use my_open(), and therefore retry-on-error logic. Hopefully this will eliminate the very sporadic "can't open file because it is used by another process" error on CI.	2024-04-17 16:52:37 +02:00
Marko Mäkelä	829cb1a49c	Merge 10.5 into 10.6	2024-04-17 14:14:58 +03:00
Daniel Black	ea810b04cb	MDEV-30676 rpl.parallel_backup* tests sometimes fail Raise innodb_lock_wait_timeout from 1 to 5	2024-04-15 15:45:03 +10:00
Brandon Nesterenko	a6aecbb036	MDEV-10684: rpl.rpl_domain_id_filter_restart fails in buildbot The test failure in rpl.rpl_domain_id_filter_restart is caused by MDEV-33887. That is, the test uses master_pos_wait() (called indirectly by sync_slave_with_master) to try and wait for the replica to catch up to the master. However, the waited on transaction is ignored by the configured CHANGE MASTER TO IGNORE_DOMAIN_IDS=() As MDEV-33887 reports, due to the IO thread updating the binlog coordinates and the SQL thread updating the GTID state, if the replica is stopped in-between these updates, the replica state will be inconsistent. That is, the test expects that the GTID state will be updated, so upon restart, the replica will be up-to-date. However, if the replica is stopped before the SQL thread updates its GTID state, then upon restart, the replica will fetch the previously ignored event, which is no longer ignored upon restart, and execute it. This leads to the sporadic extra row in t2. This patch changes master_pos_wait() to use master_gtid_wait() to ensure the replica state is consistent with the master state.	2024-04-11 09:49:20 -06:00
Sergei Golubchik	340d93a8cc	cleanup: rpl.rpl_semi_sync_shutdown_await_ack avoid using multiple files with the same functionality.	2024-04-11 15:28:55 +02:00
Sergei Golubchik	41296a07c8	Merge branch '10.5' into 10.6	2024-04-11 13:58:22 +02:00
Sergei Golubchik	2d2172a5cf	sporadic failures of rpl.rpl_semi_sync_master_shutdown increase the MASTER_CONNECT_RETRY time under valgrind, otherwise the slave gives up retrying before the master is ready also, cosmetic cleanup of rpl_semi_sync_master_shutdown.test	2024-04-10 19:38:39 +02:00
Andrei	0da1653f1b	MDEV-31779 Server crash in Rows_log_event::update_sequence upon replaying binary log The crash at running mysqlbinlog on a SEQUENCE containing binlog file was caused MDEV-29621 fixes that did not check which of the slave or binlog applier executes a block introduced there. The block is meaningful only for the parallel slave applier, so it's safe to fix this bug with identified the actual applier and skipping the block when it's the mysqlbinlog one.	2024-04-10 19:31:39 +03:00
Brandon Nesterenko	952ab9a596	MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown The signal handler thread can use various different runtime resources when processing a SIGHUP (e.g. master-info information) due to calling into reload_acl_and_cache(). Currently, the shutdown process waits for the termination of the signal thread after performing cleanup. However, this could cause resources actively used by the signal handler to be freed while reload_acl_and_cache() is processing. The specific resource that caused MDEV-30260 is a race condition for the hostname_cache, such that mysqld would delete it in clean_up()::hostname_cache_free(), before the signal handler would use it in reload_acl_and_cache()::hostname_cache_refresh(). Another similar resource is the active_mi/master_info_index. There was a race between its deletion by the main thread in end_slave(), and their usage by the Signal Handler as a part of Master_info_index::flush_all_relay_logs.read(active_mi) in reload_acl_and_cache(). This patch fixes these race conditions by relocating where server shutdown waits for the signal handler to die until after server-level threads have been killed (i.e., as a last step of close_connections()). With respect to the hostname_cache, active_mi and master_info_cache, this ensures that they cannot be destroyed while the signal handler is still active, and potentially using them. Additionally: 1) This requires that Events memory is still in place for SIGHUP handling's mysql_print_status(). So event deinitialization is moved into clean_up(), but the event scheduler still needs to be stopped in close_connections() at the same spot. 2) The function kill_server_thread is no longer used, so it is deleted 3) The timeout to wait for the death of the signal thread was not consistent with the comment. The comment mentioned up to 10 seconds, whereas it was actually 0.01s. The code has been fixed to wait up to 10 seconds. 4) A warning has been added if the signal handler thread fails to exit in time. 5) Added pthread_join() to end of wait_for_signal_thread_to_end() if it hadn't ended in 10s with a warning. Note this also removes the pthread_detached attribute from the signal_thread to allow for the pthread_join(). Reviewed By: =========== Vladislav Vaintroub <wlad@mariadb.com> Andrei Elkin <andrei.elkin@mariadb.com>	2024-04-09 14:25:13 -06:00

1 2 3 4 5 ...

4197 commits