mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-04-14 19:25:34 +02:00

Author	SHA1	Message	Date
Sergei Golubchik	9508a44c37	enforce no trailing \n in Diagnostic_area messages that is in my_error(), push_warning(), etc	2025-01-07 16:31:39 +01:00
Andrei Elkin	bc6121819c	MDEV-35098 rpl.rpl_mysqldump_gtid_slave_pos fails in buildbot The test turns out to be senstive to @@global.gtid_cleanup_batch_size. With a rather small default value of the latter SELECTing from mysql.gtid_slave_pos may not be deterministic: tests that run before may increase a pending for automitic deletion batch. The test is refined to set its own value for the batch size which is virtually unreachable. Thanks to Kristian Nielsen for the analysis.	2024-12-16 19:43:41 +02:00
Kristian Nielsen	b4fde50b1f	MDEV-5798: Wrong errorcode for missing partition after TRUNCATE PARTITION The partitioning error handling code was looking at thd->lex->alter_info.partition_flags in non-alter-table cases, in which cases the value is stale and contains whatever was set by any earlier ALTER TABLE. This could cause the wrong error code to be generated, which then in some cases can cause replication to break with "different errorcode" error. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-12-05 08:17:35 +01:00
Sergei Golubchik	5ebda30ccc	Revert "MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2" This reverts commit `8ae462a220`.	2024-10-16 13:23:47 +02:00
Kristian Nielsen	8ae462a220	MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2 Implement variable legacy_xa_rollback_at_disconnect to support backwards compatibility for applications that rely on the pre-10.5 behavior for connection disconnect, which is to rollback the transaction (in violation of the XA specification). Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-10-16 10:18:36 +02:00
Lena Startseva	0a5e4a0191	MDEV-31005: Make working cursor-protocol Updated tests: cases with bugs or which cannot be run with the cursor-protocol were excluded with "--disable_cursor_protocol"/"--enable_cursor_protocol" Fix for v.10.5	2024-09-18 18:39:26 +07:00
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after SLAVE START, but did not address a possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the SLAVE STOP). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increase the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Kristian Nielsen	8642453ce6	Fix sporadic failure of test case rpl.rpl_start_stop_slave The test was expecting the I/O thread to be in a specific state, but thread scheduling may cause it to not yet have reached that state. So just have a loop that waits for the expected state to occur. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	214e6c5b3d	Fix sporadic failure of test case rpl.rpl_old_master Remove the test for MDEV-14528. This is supposed to test that parallel replication from pre-10.0 master will update Seconds_Behind_Master. But after MDEV-12179 the SQL thread is blocked from even beginning to fetch events from the relay log due to FLUSH TABLES WITH READ LOCK, so the test case is no longer testing what is was intended to. And pre-10.0 versions are long since out of support, so does not seem worthwhile to try to rewrite the test to work another way. The root cause of the test failure is MDEV-34778. Briefly, depending on exact timing during slave stop, the rli->sql_thread_caught_up flag may end up with different value. If it ends up as "true", this causes Seconds_Behind_Master to be 0 during next slave start; and this caused test case timeout as the test was waiting for Seconds_Behind_Master to become non-zero. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	7dc4ea5649	Fix sporadic test failure in rpl.rpl_create_drop_event Depending on timing, an extra event run could start just when the event scheduler is shut down and delay running until after the table has been dropped; this would cause the test to fail with a "table does not exist" error in the log. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	33854d7324	Restore skiping rpl.rpl_mdev6020 under Valgrind (Revert a change done by mistake when XtraDB was removed.) Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Brandon Nesterenko	001608de7e	MDEV-15393: Fix rpl_mysqldump_gtid_slave_pos The slave would try to sync_with_master_gtid.inc, but the master never actually saved its gtid position so the test would move on too quickly.	2024-07-31 14:17:46 -06:00
Andrei	b8f92ade57	MDEV-15393 gtid_slave_pos duplicate key errors after mysqldump restore When mysqldump is run to dump the `mysql` system database, it generates INSERT statements into the table `mysql.gtid_slave_pos`. After running the backup script those inserts did not produce the expected gtid state on slave. In particular the maximum of mysql.gtid_slave_pos.sub_id did not make into rpl_global_gtid_slave_state.last_sub_id an in-memory object that is supposed to match the current state of the table. And that was regardless of whether --gtid option was specified or not. Later when the backup recipient server starts as slave in non-gtid mode this desychronization may lead to a duplicate key error. This effect is corrected for --gtid mode mysqldump/mariadb-dump only as the following. The fixes ensure the insert block of the dump script is followed with a "summing-up" SET @global.gtid_slave_pos assignment. For the implemenation part, note a deferred print-out of SET-gtid_slave_pos and associated comments is prefered over relocating of the entire blocks if (opt_master,slave_data && do_show_master,slave_status) ... because of compatiblity concern. Namely an error inside do_show_*() is handled in the new code the same way, as early as, as before. A regression test can be run in how-to-reproduce mode as well. One affected mtr test observed. rpl_mysqldump_slave.result "mismatch" shows now the new deferring print of SET-gtid_slave_pos policy in action.	2024-07-19 21:44:12 +03:00
Brandon Nesterenko	a061ae1079	MDEV-33921: Fix rpl_xa_empty_transaction.test The test was missing a save_master_gtid.inc on the master, leading to the slave thinking it was in sync after executing sync_with_master_gtid.inc, despite not having executed the latest transaction. This skipped transaction, XA COMMIT, was supposed to error-to-be-ignored because its XID could not be found, but be thrown out because the replication filters would filter out the target database. However, if the slave was able to stop before executing the transaction, then the replication filer is reset (to empty), and when the slave is later restarted, that transactions error would no longer be ignored. Additionally, as the test cases added in MDEV-33921 rely on GTID synchronization, the test cases now force master_use_gtid=slave_pos for consistency	2024-07-17 16:38:26 -06:00
Sergei Golubchik	d60f5c11ea	MDEV-34318 mariadb-dump SQL syntax error with MAX_STATEMENT_TIME against Percona MySQL server protect MariaDB conditional comments from a bug in Percona MySQL comment parser	2024-07-17 21:25:40 +02:00
Daniel Black	e8bcc4e455	MDEV-34568 rpl.rpl_mdev12179 - correct for Windows Simplify in an attempt to avoid: mysqltest: At line 275: File already exist: on the write_file lines. Using write_line as that's what a lot of other tests do for writing small bits to a expect file. Review thanks Valdislav Vaintroub	2024-07-12 12:55:28 +02:00
Brandon Nesterenko	ea9869504d	MDEV-33921: Replication breaks when filtering two-phase XA transactions There are two problems. First, replication fails when XA transactions are used where the slave has replicate_do_db set and the client has touched a different database when running DML such as inserts. This is because XA commands are not treated as keywords, and are thereby not exempt from the replication filter. The effect of this is that during an XA transaction, if its logged “use db” from the master is filtered out by the replication filter, then XA END will be ignored, yet its corresponding XA PREPARE will be executed in an invalid state, thereby breaking replication. Second, if the slave replicates an XA transaction which results in an empty transaction, the XA START through XA PREPARE first phase of the transaction won’t be binlogged, yet the XA COMMIT will be binlogged. This will break replication in chain configurations. The first problem is fixed by treating XA commands in Query_log_event as keywords, thus allowing them to bypass the replication filter. Note that Query_log_event::is_trans_keyword() is changed to accept a new parameter to define its mode, to either check for XA commands or regular transaction commands, but not both. In addition, mysqlbinlog is adapted to use this mode so its --database filter does not remove XA commands from its output. The second problem fixed by overwriting the XA state in the XID cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to rollback the transaction and skip its binlogging. If the xid cache is cleared before an XA transaction receives its completion command (e.g. on server shutdown), then before reporting ER_XAER_NOTA when the completion command is executed, the filter is first checked if the database is ignored, and if so, the error is ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-10 14:37:39 -06:00
Brandon Nesterenko	744580d5a7	MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary The IO thread can report error code 2013 into the error log when it is stopped during the initial connection process to the primary, as well as when trying to read an event. However, because the IO thread is being stopped, its connection to the primary is force-killed by the signaling thread (see THD::awake_no_mutex()), and thereby these connection errors should be ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-08 10:39:17 -06:00
Brandon Nesterenko	cbc1898e82	MDEV-25607: Auto-generated DELETE from HEAP table can break replication The special logic used by the memory storage engine to keep slaves in sync with the master on a restart can break replication. In particular, after a restart, the master writes DELETE statements in the binlog for each MEMORY-based table so the slave can empty its data. If the DELETE is not executable, e.g. due to invalid triggers, the slave will error and fail, whereas the master will never see the problem. Instead of DELETE statements, use TRUNCATE to keep slaves in-sync with the master, thereby bypassing triggers. Reviewed By: =========== Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-05 12:00:09 -06:00
Brandon Nesterenko	6cab2f75fe	MDEV-23857: replication master password length After MDEV-4013, the maximum length of replication passwords was extended to 96 ASCII characters. After a restart, however, slaves only read the first 41 characters of MASTER_PASSWORD from the master.info file. This lead to slaves unable to reconnect to the master after a restart. After a slave restart, if a master.info file is detected, use the full allowable length of the password rather than 41 characters. Reviewed By: ============ Sergei Golubchik <serg@mariadb.com>	2024-06-18 07:21:18 -06:00
Sergei Golubchik	7a789e2027	sporadic failures of rpl.rpl_parallel_sbm the test waits for the event to get stuck on MASTER_DELAY, but on a slow/overloaded slave the event might pass MASTER_DELAY before the test starts waiting. Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY), the event cannot avoid that,	2024-05-05 21:37:07 +02:00
Sergei Golubchik	cea083af9f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-05-05 21:37:07 +02:00
Kristian Nielsen	553a4d6271	MDEV-33602: Sporadic test failure in rpl.rpl_gtid_stop_start The test could fail with a duplicate key error because switching to non-GTID mode could start at the wrong old-style position. The position could be wrong when the previous GTID connect was stopped before receiving the fake GTID list event which gives the old-style position corresponding to the GTID connected position. Work-around by injecting an extra event and syncing the slave before switching to non-GTID mode. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-25 11:00:45 +02:00
Kristian Nielsen	0c249ad718	MDEV-30232: rpl.rpl_gtid_crash fails sporadically in BB The root cause of the failure is a bug in the Linux network stack: https://lore.kernel.org/netdev/87sf0ldk41.fsf@urd.knielsen-hq.org/T/#u If the slave does a connect(2) at the exact same time that kill -9 of the master process closes the listening socket, the FIN or RST packet is lost in the kernel, and the slave ends up timing out waiting for the initial communication from the server. This timeout defaults to --slave-net-timeout=120, which causes include/master_gtid_wait.inc to time out first and fail the test. Work-around this problem by reducing the --slave-net-timeout for this test case. If this problem turns up in other tests, we can consider reducing the default value for all tests. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-20 13:41:08 +02:00
Brandon Nesterenko	0ad52e4d6a	MDEV-27512: Assertion !thd->transaction_rollback_request failed in rows_event_stmt_cleanup If replicating an event in ROW format, and InnoDB detects a deadlock while searching for a row, the row event will error and rollback in InnoDB and indicate that the binlog cache also needs to be cleared, i.e. by marking thd->transaction_rollback_request. In the normal case, this will trigger an error in Rows_log_event::do_apply_event() and cause a rollback. During the Rows_log_event::do_apply_event() cleanup of a successful event application, there is a DBUG_ASSERT in log_event_server.cc::rows_event_stmt_cleanup(), which sets the expectation that thd->transaction_rollback_request cannot be set because the general rollback (i.e. not the InnoDB rollback) should have happened already. However, if the replica is configured to skip deadlock errors, the rows event logic will clear the error and continue on, as if no error happened. This results in thd->transaction_rollback_request being set while in rows_event_stmt_cleanup(), thereby triggering the assertion. This patch fixes this in the following ways: 1) The assertion is invalid, and thereby removed. 2) The rollback case is forced in rows_event_stmt_cleanup() if transaction_rollback_request is set. Note the differing behavior between transactions which are skipped due to deadlock errors and other errors. When a transaction is skipped due to an ignored deadlock error, the entire transaction is rolled back and skipped (though note MDEV-33930 which allows statements in the same transaction after the deadlock-inducing one to commit). When a transaction is skipped due to ignoring a different error, only the erroring statements are rolled-back and skipped - the rest of the transaction will execute as normal. The effect of this can be seen in the test results. The added test case to rpl_skip_error.test shows that only statements which are ignored due to non-deadlock errors are ignored in larger transactions. A diff between rpl_temporary_error2_skip_all.result and rpl_temporary_error2.result shows that all statements in the errored transaction are rolled back (diff pasted below): : diff rpl_temporary_error2.result rpl_temporary_error2_skip_all.result 49c49 < 2 1 --- > 2 NULL 51c51 < 4 1 --- > 4 NULL 53c53 < * There will be two rows in t2 due to the retry. --- > * There will be one row in t2 because the ignored deadlock does not retry. 57d56 < 1 59c58 < 1 --- > 0 Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2024-04-17 11:14:21 -06:00
Vladislav Vaintroub	061adae9a2	MDEV-16944 Fix file sharing issues on Windows in mysqltest On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to conflicting share modes between processes accessing the same file can result in CreateFile failures. mysys' my_open() already incorporates a workaround by implementing wait/retry logic on Windows. But this does not help if files are opened using shell redirection like mysqltest traditionally did it, i.e via --echo exec "some text" > output_file In such cases, it is cmd.exe, that opens the output_file, and it won't do any sharing-violation retries. This commit addresses the issue by introducing a new built-in command, 'write_line', in mysqltest. This new command serves as a brief alternative to 'write_file', with a single line output, that also resolves variables like "exec" would. Internally, this command will use my_open(), and therefore retry-on-error logic. Hopefully this will eliminate the very sporadic "can't open file because it is used by another process" error on CI.	2024-04-17 16:52:37 +02:00
Daniel Black	ea810b04cb	MDEV-30676 rpl.parallel_backup* tests sometimes fail Raise innodb_lock_wait_timeout from 1 to 5	2024-04-15 15:45:03 +10:00
Brandon Nesterenko	a6aecbb036	MDEV-10684: rpl.rpl_domain_id_filter_restart fails in buildbot The test failure in rpl.rpl_domain_id_filter_restart is caused by MDEV-33887. That is, the test uses master_pos_wait() (called indirectly by sync_slave_with_master) to try and wait for the replica to catch up to the master. However, the waited on transaction is ignored by the configured CHANGE MASTER TO IGNORE_DOMAIN_IDS=() As MDEV-33887 reports, due to the IO thread updating the binlog coordinates and the SQL thread updating the GTID state, if the replica is stopped in-between these updates, the replica state will be inconsistent. That is, the test expects that the GTID state will be updated, so upon restart, the replica will be up-to-date. However, if the replica is stopped before the SQL thread updates its GTID state, then upon restart, the replica will fetch the previously ignored event, which is no longer ignored upon restart, and execute it. This leads to the sporadic extra row in t2. This patch changes master_pos_wait() to use master_gtid_wait() to ensure the replica state is consistent with the master state.	2024-04-11 09:49:20 -06:00
Sergei Golubchik	2d2172a5cf	sporadic failures of rpl.rpl_semi_sync_master_shutdown increase the MASTER_CONNECT_RETRY time under valgrind, otherwise the slave gives up retrying before the master is ready also, cosmetic cleanup of rpl_semi_sync_master_shutdown.test	2024-04-10 19:38:39 +02:00
Andrei	0da1653f1b	MDEV-31779 Server crash in Rows_log_event::update_sequence upon replaying binary log The crash at running mysqlbinlog on a SEQUENCE containing binlog file was caused MDEV-29621 fixes that did not check which of the slave or binlog applier executes a block introduced there. The block is meaningful only for the parallel slave applier, so it's safe to fix this bug with identified the actual applier and skipping the block when it's the mysqlbinlog one.	2024-04-10 19:31:39 +03:00
Brandon Nesterenko	952ab9a596	MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown The signal handler thread can use various different runtime resources when processing a SIGHUP (e.g. master-info information) due to calling into reload_acl_and_cache(). Currently, the shutdown process waits for the termination of the signal thread after performing cleanup. However, this could cause resources actively used by the signal handler to be freed while reload_acl_and_cache() is processing. The specific resource that caused MDEV-30260 is a race condition for the hostname_cache, such that mysqld would delete it in clean_up()::hostname_cache_free(), before the signal handler would use it in reload_acl_and_cache()::hostname_cache_refresh(). Another similar resource is the active_mi/master_info_index. There was a race between its deletion by the main thread in end_slave(), and their usage by the Signal Handler as a part of Master_info_index::flush_all_relay_logs.read(active_mi) in reload_acl_and_cache(). This patch fixes these race conditions by relocating where server shutdown waits for the signal handler to die until after server-level threads have been killed (i.e., as a last step of close_connections()). With respect to the hostname_cache, active_mi and master_info_cache, this ensures that they cannot be destroyed while the signal handler is still active, and potentially using them. Additionally: 1) This requires that Events memory is still in place for SIGHUP handling's mysql_print_status(). So event deinitialization is moved into clean_up(), but the event scheduler still needs to be stopped in close_connections() at the same spot. 2) The function kill_server_thread is no longer used, so it is deleted 3) The timeout to wait for the death of the signal thread was not consistent with the comment. The comment mentioned up to 10 seconds, whereas it was actually 0.01s. The code has been fixed to wait up to 10 seconds. 4) A warning has been added if the signal handler thread fails to exit in time. 5) Added pthread_join() to end of wait_for_signal_thread_to_end() if it hadn't ended in 10s with a warning. Note this also removes the pthread_detached attribute from the signal_thread to allow for the pthread_join(). Reviewed By: =========== Vladislav Vaintroub <wlad@mariadb.com> Andrei Elkin <andrei.elkin@mariadb.com>	2024-04-09 14:25:13 -06:00
Kristian Nielsen	fb774eb1eb	Fix occasional test failure of rpl.rpl_parallel_stop_slave Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-03-15 15:04:09 +01:00
Marko Mäkelä	f703e72bd8	Merge 10.4 into 10.5	2024-03-11 10:08:20 +02:00
Brandon Nesterenko	b04c857596	MDEV-33500: rpl.rpl_parallel_sbm can fail on slow machines, e.g. MSAN/Valgrind builders In an addition to test rpl.rpl_parallel_sbm added by MDEV-32265, the test uses sleep statements alone to test Seconds_Behind_Master with delayed replication. On slow running machines, the test can pass the intended MASTER_DELAY duration and Seconds_Behind_Master can become 0, when the test expects the transaction to still be actively in a delaying state. This can be consistently reproduced by adding a sleep statement before the call to --let = query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1) to sleep past the delay end point. This patch fixes this by locking the table which the delayed transaction targets so Second_Behind_Master cannot be updated before the test reads it for validation.	2024-02-20 08:19:18 -07:00
Kristian Nielsen	c73c6aea63	MDEV-33426: Aria temptables wrong thread-specific memory accounting in slave thread Aria temporary tables account allocated memory as specific to the current THD. But this fails for slave threads, where the temporary tables need to be detached from any specific THD. Introduce a new flag to mark temporary tables in replication as "global", and use that inside Aria to not account memory allocations as thread specific for such tables. Based on original suggestion by Monty. Reviewed-by: Monty <monty@mariadb.org> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-02-16 12:48:30 +01:00
Alexey Botchkov	85517f609a	MDEV-33393 audit plugin do not report user did the action.. The '<replication_slave>' user is assigned to the slave replication thread so this name appears in the auditing logs.	2024-02-14 00:02:29 +04:00
Brandon Nesterenko	03d1346e7f	MDEV-29369: rpl.rpl_semi_sync_shutdown_await_ack fails regularly with Result content mismatch This test was prone to failures for a few reasons, summarized below: 1) MDEV-32168 introduced “only_running_threads=1” to slave_stop.inc, which allowed the stop logic to bypass an attempting-to-reconnect IO thread. That is, the IO thread could realize the master shutdown in `read_event()`, and thereby call into `try_to_reconnect()`. This would leave the IO thread up when the test expected it to be stopped. Fixed by explicitly stopping the IO thread and allowing an error state, as the above case would lead to errno 2003. 2) On slow systems (or those running profiling tools, e.g. MSAN), the waiting-for-ack transaction can complete before the system processes the `SHUTDOWN WAIT FOR ALL SLAVES`. There was shutdown preparation logic in-between the transaction and shutdown itself, which contributes to this problem. This patch also moves this preparation logic before the transaction, so there is less to do in-between the calls. 3) Changed work-around for MDEV-28141 to use debug_sync instead of sleep delay, as it was still possible to hit the bug on very slow systems. 4) Masked MTR variable reset with disable/enable query log Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-02-12 05:48:18 -07:00
Brandon Nesterenko	ee89558312	MDEV-14357: rpl.rpl_domain_id_filter_io_crash failed in buildbot with wrong result A race condition with the SQL thread, where depending on if it was killed before or after it had executed the fake/generated IGN_GTIDS Gtid_list_log_event, may or may not update gtid_slave_pos with the position of the ignored events. Then, the slave would be restarted while resetting IGNORE_DOMAIN_IDS to be empty, which would result in the slave requesting different starting locations, depending on whether or not gtid_slave_pos was updated. And, because previously ignored events could now be requested and executed (no longer ignored), their presence would fail the test. This patch fixes this in two ways. First, to use GTID positions for synchronization rather than binlog file positions. Then second, to synchronize the SQL thread’s gtid_slave_pos with the ignored events before killing the SQL thread. To consistently reproduce the test failure, the following patch can be applied: diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index f51f5b7deec..de62233acff 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -3686,6 +3686,12 @@ Gtid_list_log_event::do_apply_event(rpl_group_info rgi) void hton= NULL; uint32 i; + sleep(1); + if (rli->sql_driver_thd->killed \|\| rli->abort_slave) + { + return 0; + } + Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-02-12 05:48:18 -07:00
Marko Mäkelä	8ec12e0d6d	Merge 10.4 into 10.5	2024-02-12 11:38:13 +02:00
Sergei Golubchik	01f6abd1d4	Merge branch '10.4' into 10.5	2024-01-31 17:32:53 +01:00
Monty	57ffcd686f	MDEV-21472: ALTER TABLE ... ANALYZE PARTITION ... with EITS reads and locks all rows This was fixed in 10.2 in 2020 but merging the code to 10.3 caused the bug to come back.	2024-01-30 09:19:01 +02:00
Brandon Nesterenko	c75905cacb	MDEV-33327: rpl_seconds_behind_master_spike Sensitive to IO Thread Stop Position rpl.rpl_seconds_behind_master_spike uses the DEBUG_SYNC mechanism to count how many format descriptor events (FDEs) have been executed, to attempt to pause on a specific relay log FDE after executing transactions. However, depending on when the IO thread is stopped, it can send an extra FDE before sending the transactions, forcing the test to pause before executing any transactions, resulting in a table not existing, that is attempted to be read for COUNT. This patch fixes this by no longer counting FDEs, but rather by programmatically waiting until the SQL thread has executed the transaction and then automatically activating the DEBUG_SYNC point to trigger at the next relay log FDE.	2024-01-30 06:58:44 +01:00
Brandon Nesterenko	e4f221a5f2	MDEV-33327: rpl_seconds_behind_master_spike Sensitive to IO Thread Stop Position rpl.rpl_seconds_behind_master_spike uses the DEBUG_SYNC mechanism to count how many format descriptor events (FDEs) have been executed, to attempt to pause on a specific relay log FDE after executing transactions. However, depending on when the IO thread is stopped, it can send an extra FDE before sending the transactions, forcing the test to pause before executing any transactions, resulting in a table not existing, that is attempted to be read for COUNT. This patch fixes this by no longer counting FDEs, but rather by programmatically waiting until the SQL thread has executed the transaction and then automatically activating the DEBUG_SYNC point to trigger at the next relay log FDE.	2024-01-29 15:17:57 -07:00
Brandon Nesterenko	112eb14f7e	MDEV-27850: rpl_seconds_behind_master_spike debug_sync fix A debug_sync signal could remain for the SQL thread that should have begun a wait_for upon seeing a GTID event, but would instead see the old signal and continue on without waiting. This broke an "idle" condition in SHOW SLAVE STATUS which should have automatically negated Seconds_Behind_Master. Instead, because the SQL thread had already processed the GTID event, it set sql_thread_caught_up to false, and thereby calculated the value of Seconds_behind_master, when the test expected 0. This patch fixes this by resetting the debug_sync state before creating a new transaction which sends a GTID event to the replica	2024-01-26 11:43:34 -07:00
Brandon Nesterenko	01ca57ec16	MDEV-32168: Postpush fix for rpl_domain_id_filter_master_crash While a replica may be reading events from the primary, the primary is killed. Left to its own devices, the IO thread may or may not stop in error, depending on what it is doing when its connection to the primary is killed (e.g. a failed read results in an error, whereas if the IO thread is idly waiting for events when the connection dies, it will enter into a reconnect loop and reconnect). MDEV-32168 changed the test to always wait for the reconnect, thus breaking the error case, as the IO thread would be stopped at a time of expecting it to be running. The fix is to manually stop/start the IO thread to ensure it is in a consistent state. Note that rpl_domain_id_filter_master_crash.test will need additional changes after fixing MDEV-33268 Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-01-22 07:30:52 -07:00
Alexander Barkov	fa3171df08	MDEV-27666 User variable not parsed as geometry variable in geometry function Adding GEOMETRY type user variables.	2024-01-16 18:53:23 +04:00
Marko Mäkelä	12995559f9	Merge 10.4 into 10.5	2023-12-19 18:30:58 +02:00
Kristian Nielsen	a204ce2788	MDEV-33045: Server crashes in Item_func_binlog_gtid_pos::val_str / Binary_string::c_ptr_safe Item::val_str() sets the Item::null_value flag, so call it before checking the flag, not after. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-12-19 12:08:53 +01:00
Marko Mäkelä	4ae105a37d	Merge 10.4 into 10.5	2023-12-18 08:59:07 +02:00
Brandon Nesterenko	8dad51481b	MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in buildbot with timeout in include A replication parallel worker thread can deadlock with another connection running SHOW SLAVE STATUS. That is, if the replication worker thread is in do_gco_wait() and is killed, it will already hold the LOCK_parallel_entry, and during error reporting, try to grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in reverse order. It will initially grab the err_lock, and then try to grab LOCK_parallel_entry. This leads to a deadlock when both threads have grabbed their first lock without the second. This patch implements the MDEV-31894 proposed fix to optimize the workers_idle() check to compare the last in-use relay log’s queued_count==dequeued_count for idleness. This removes the need for workers_idle() to grab LOCK_parallel_entry, as these values are atomically updated. Huge thanks to Kristian Nielsen for diagnosing the problem! Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2023-12-11 07:45:23 -07:00

1 2 3 4 5 ...

4074 commits