mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	4ae105a37d	Merge 10.4 into 10.5	2023-12-18 08:59:07 +02:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Brandon Nesterenko	8dad51481b	MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in buildbot with timeout in include A replication parallel worker thread can deadlock with another connection running SHOW SLAVE STATUS. That is, if the replication worker thread is in do_gco_wait() and is killed, it will already hold the LOCK_parallel_entry, and during error reporting, try to grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in reverse order. It will initially grab the err_lock, and then try to grab LOCK_parallel_entry. This leads to a deadlock when both threads have grabbed their first lock without the second. This patch implements the MDEV-31894 proposed fix to optimize the workers_idle() check to compare the last in-use relay log’s queued_count==dequeued_count for idleness. This removes the need for workers_idle() to grab LOCK_parallel_entry, as these values are atomically updated. Huge thanks to Kristian Nielsen for diagnosing the problem! Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2023-12-11 07:45:23 -07:00
Kristian Nielsen	5ca63b2b8b	MDEV-26632: multi source replication filters breaking GTID semantic Add a test case that demonstrates a working setup as described in MDEV-26632. This requires --gtid-ignore-duplicates=1 and --gtid-strict-mode=0. In A->B->C, B filters some (but not all) events from A. C is promoted to create A->C->B, and the current GTID position in B contains a GTID from A that is not present in C (due to filtering). Demonstrate that B can still connect with GTID to C, starting at the "hole" in the binlog stream on C originating from A. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-12-11 12:04:49 +01:00
Kristian Nielsen	da9ffca908	MDEV-29816 rpl.rpl_parallel_29322 occasionally fails in BB Make sure the old binlog dump thread is not still running when manipulating binlog files; otherwise there is a small chance it will see an invalid partial file and report an I/O error. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-12-11 12:02:58 +01:00
Sergei Golubchik	98a39b0c91	Merge branch '10.4' into 10.5	2023-12-02 01:02:50 +01:00
Marko Mäkelä	b3a628c7d4	Merge 10.5 into 10.6	2023-11-30 10:45:01 +02:00
Kristian Nielsen	ea4bcb9d98	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc Fix some random test failures following MDEV-32168 push. Don't blindly set $rpl_only_running_threads in many places. Instead explicitly stop only the IO or SQL thread, as appropriate. Setting it interfered with rpl_end.inc in some cases. Rather than clearing it afterwards, better to not set it at all when it is not needed, removing ambiguity in the test about the state of the replication threads. Don't fail the test if include/stop_slave_io.inc finds an error in the IO thread after stop. Such errors can be simply because slave stop happened in the middle of the IO thread's initial communication with the master. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-28 19:10:42 +01:00
Monty	387b92df97	Remove deprecation from mariadbd --debug --debug is supported by almost all our other binaries and we should keep it also in the server to keep option names similar.	2023-11-28 16:33:22 +02:00
Kristian Nielsen	36680b648a	MDEV-20523: rpl.create_or_replace_mix, rpl.create_or_replace_statement failed in buildbot with wrong result Wait for the disconnect of the other connection to complete, before running SHOW BINLOG EVENTS. Otherwise the DROP TEMPORARY TABLE that is binlogged during disconnect may not have appeared yet depending on thread scheduling. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-17 19:44:11 +01:00
Kristian Nielsen	0258ad545a	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc Fix wrong change to rpl.rpl_shutdown_wait_slaves. After shutting down the master, slaves may or may not succeed in reconnecting depending on the timing on their reconnect relative to master restart. So don't assume all IO threads will be running, just restart any slave that needs it. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-17 19:44:11 +01:00
Kristian Nielsen	7e394d0b4a	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc Fix sporadic test failure in rpl.rpl_ssl1. The test incorrectly did a STOP SLAVE too early, which could race with the expected 'Access denied' error. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-17 19:44:11 +01:00
Kristian Nielsen	30ec1b3e78	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc Fix sporadic test failures in rpl.rpl_set_statement_default_master and rpl.rpl_slave_load_tmpdir_not_exist. A race between START and STOP SLAVE could leave an error condition that causes test failure after MDEV-32168. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-17 19:44:11 +01:00
Kristian Nielsen	17430d94d7	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc Test rpl.show_status_stop_slave_race-7126 now fails sporadically because it is expected to sometimes (but not always) leave an error condition after slave stop. Fix by explicitly allowing the error condition in this case. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-17 19:44:11 +01:00
Kristian Nielsen	d95fa7e332	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc Fix a start/stop race that causes occasional test failure after the more strict error check of MDEV-32168. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-17 19:44:11 +01:00
Anel Husakovic	a7d186a17d	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc - Reviewer: <knielsen@knielsen-hq.org> <brandon.nesterenko@mariadb.com> <andrei.elkin@mariadb.com>	2023-11-16 10:41:11 +01:00
Kristian Nielsen	64a743fc81	MDEV-16951: binlog_encryption.rpl_checksum failed in buildbot with wrong result Wait for the binlog checkpoint event to fix non-determinism in the testcase. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-15 11:40:05 +01:00
Kristian Nielsen	73a38b68dc	MDEV-11018: rpl.rpl_mariadb_slave_capability fails sporadically in buildbot The test was missing a wait_for_binlog_checkpoint.inc, making it non-deterministic Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-14 17:12:59 +01:00
Andrei	d6872f9cbb	MDEV-32365: post-fixes to rpl_semi_sync_slave_reply_fail	2023-11-09 14:36:46 +02:00
Oleksandr Byelkin	b83c379420	Merge branch '10.5' into 10.6	2023-11-08 15:57:05 +01:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Monty	e5a5573f78	rpl.rpl_invoked_features fails sporadically with "Duplicate key error" The reason was that Event e11 was re-executed before "ALTER EVENT e11 DISABLE" had been executed. Fixed by increasing re-schedule time Other things: - Removed double accounting of 'execution_count'. It was incremented in top->mark_last_executed(thd) that was executed a few lines earlier.	2023-11-03 11:42:52 +02:00
Brandon Nesterenko	80ea3590de	MDEV-32655: rpl_semi_sync_slave_compressed_protocol.test assert_only_after is wrong The MTR test rpl.rpl_semi_sync_slave_compressed_protocol scans the log file to ensure there is no magic number error. It attempts to only scan the log files of the current test; however, the variable which controls this, , is initialized incorrectly, and it thereby scans the entire log file, which includes output from prior tests. This causes it to fail if a test which expects this error runs previously on the same worker. This patch fixes the assert_only_after so the test only scans through its own log contents.	2023-11-01 09:10:17 -06:00
Brandon Nesterenko	c341743e83	MDEV-32651: Lost Debug_sync signal in rpl_sql_thd_start_errno_cleared The test rpl.rpl_sql_thd_start_errno_cleared can lose a debug_sync signal, as there is a RESET immediately following a SIGNAL. When the signal is lost, the sql_thread is stuck in a WAIT_FOR clause until it times out, resulting in long test times (albeit still successful). This patch extends the test to ensure the debug_sync signal was received before issuing the RESET	2023-11-01 07:35:07 -06:00
Kristian Nielsen	6fa69ad747	MDEV-27436: binlog corruption (/tmp no space left on device at the same moment) This commit fixes several bugs in error handling around disk full when writing the statement/transaction binlog caches: 1. If the error occurs during a non-transactional statement, the code attempts to binlog the partially executed statement (as it cannot roll back). The stmt_cache->error was still set from the disk full error. This caused MYSQL_BIN_LOG::write_cache() to get an error while trying to read the cache to copy it to the binlog. This was then wrongly interpreted as a disk full error writing to the binlog file. As a result, a partial event group containing just a GTID event (no query or commit) was binlogged. Fixed by checking if an error is set in the statement cache, and if so binlog an INCIDENT event instead of a corrupt event group, as for other errors. 2. For LOAD DATA LOCAL INFILE, if a disk full error occurred while writing to the statement cache, the code would attempt to abort and read-and-discard any remaining data sent by the client. The discard code would however continue trying to write data to the statement cache, and wrongly interpret another disk full error as end-of-file from the client. This left the client connection with extra data which corrupts the communication for the next command, as well as again causing a corrupt/incomplete event to be binlogged. Fixed by restoring the default read function before reading any remaining data from the client connection. Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-10-31 11:48:00 +01:00
Andrei	728bca44e8	MDEV-32593 Assertion failure upon CREATE SEQUENCE The assert conditions recently added by MDEV-32593 are corrected.	2023-10-27 12:26:34 +03:00
Andrei	9c43343213	MDEV-32365 detailize the semisync replication magic number error Semisync ack (master side) receiver thread is made to report details of faced errors. In case of 'magic byte' error, a hexdump of the received packet is always (level) NOTEd into the error log. In other cases an exact server level error is printed out as a warning (as it may not be critical) under log_warnings > 2. An MTR test is added for the magic byte error. For others existing mtr tests cover that, provided log_warnings > 2 is set.	2023-10-26 20:24:44 +03:00
Brandon Nesterenko	c5f776e9fa	MDEV-32265: seconds_behind_master is inaccurate for Delayed replication If a replica is actively delaying a transaction when restarted (STOP SLAVE/START SLAVE), when the sql thread is back up, Seconds_Behind_Master will present as 0 until the configured MASTER_DELAY has passed. That is, before the restart, last_master_timestamp is updated to the timestamp of the delayed event. Then after the restart, the negation of sql_thread_caught_up is skipped because the timestamp of the event has already been used for the last_master_timestamp, and their update is grouped together in the same conditional block. This patch fixes this by separating the negation of sql_thread_caught_up out of the timestamp-dependent block, so it is called any time an idle parallel slave queues an event to a worker. Note that sql_thread_caught_up is still left in the check for internal events, as SBM should remain idle in such case to not "magically" begin incrementing. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2023-10-23 14:25:03 -06:00
Andrei	1fe4a71b67	MDEV-31792 Assertion fails in MDL_context::acquire_lock upon parallel replication of CREATE SEQUENCE The assert's reason was in missed FL_DDL flagging of CREATE-or-REPLACE Query event. MDEV-27365 fixes covered only the non-pre-existing table execution branch so did not see a possibility of implicit commit in the middle of execution in a rollback branch when the being CREATEd sequence table is actually replaced. The pre-existing table branch cleared the DDL modification flag so the query lost FL_DDL in binlog and its parallel execution on slave may have ended up with the assert to indicate the query is raced by a following in binlog order event. Fixed with applying the MDEV-27365 pattern. An mtr test is added to cover the rollback situation. The description test [ pass ] with a generous number of mtr parallel retries.	2023-10-23 15:39:51 +03:00
Alexander Barkov	6400b199ac	MDEV-32249 strings/ctype-ucs2.c:2336: my_vsnprintf_utf32: Assertion `(n % 4) == 0' failed in my_vsnprintf_utf32 on INSERT The crash inside my_vsnprintf_utf32() happened correctly, because the caller methods: Field_string::sql_rpl_type() Field_varstring::sql_rpl_type() mis-used the charset library and sent pure ASCII data to the virtual function snprintf() of a utf32 CHARSET_INFO. It was wrong to use Field::charset() in sql_rpl_type(). We're printing the metadata (the data type) here, not the column data. The string containing the data type of a CHAR/VARCHAR column is a pure ASCII string. Fixing to use res->charset() to print, like all virtual implementations of sql_type() do. Review was done by Andrei Elkin. Thanks to Andrei for proposing MTR test improvements.	2023-10-11 22:39:36 +04:00
Marko Mäkelä	625a150a86	Merge 10.5 into 10.6	2023-10-06 14:34:01 +03:00
Yuchen Pei	6b343de8ef	Merge branch '10.4' into 10.5	2023-09-25 13:06:57 +10:00
Oleksandr Byelkin	2bf291ba59	MDEV-30820 slow log Rows_examined out of range Fix row counters to be able to get any possible value.	2023-09-22 12:10:38 +02:00
Yuchen Pei	b70d8fbf18	Merge branch '10.5' into 10.6	2023-09-15 12:12:46 +10:00
Yuchen Pei	e95e9a221f	Merge branch '10.4' into 10.5	2023-09-15 12:04:44 +10:00
Marko Mäkelä	6a470db552	Merge 10.5 into 10.6	2023-09-14 15:25:53 +03:00
Anel Husakovic	b1ab4ec4e2	Remove duplicated default client include from replication my.cnf - `default_client` is included already in `rpl_1slave_base.cnf`, so remove it from `my.cnf` - Remove option group for `mysqld` server and add comment on how to override specific settings for specific server - Reviewer: <brandon.nesterenko@mariadb.com>	2023-09-14 12:56:41 +02:00
Yuchen Pei	cb1965bd9d	Merge branch '10.4' into 10.5	2023-09-14 16:30:11 +10:00
Marko Mäkelä	0f9acce3f2	Merge 10.5 into 10.6	2023-09-14 09:01:15 +03:00
Brandon Nesterenko	1407f99963	MDEV-31177: SHOW SLAVE STATUS Last_SQL_Errno Race Condition on Errored Slave Restart The SQL thread and a user connection executing SHOW SLAVE STATUS have a race condition on Last_SQL_Errno, such that a slave which previously errored and stopped, on its next start, SHOW SLAVE STATUS can show that the SQL Thread is running while the previous error is also showing. The fix is to move when the last error is cleared when the SQL thread starts to occur before setting the status of Slave_SQL_Running. Thanks to Kristian Nielsen for his work diagnosing the problem! Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com> Kristian Nielsen <knielsen@knielsen-hq.org>	2023-09-13 12:01:47 -06:00
Brandon Nesterenko	7de0c7b569	MDEV-31038: rpl.rpl_xa_prepare_gtid_fail clean up - Removed commented out and unused lines. - Updated test to reference true failure of timeout rather than deadlock - Switched save variables from MTR to user - Forced relay-log purge to not potentially re-execute an already prepared transaction	2023-09-13 10:59:26 -06:00
Kristian Nielsen	7c9837ce74	Merge 10.4 into 10.5 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 18:02:18 +02:00
Kristian Nielsen	805e0668c9	MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication Remove the exception that InnoDB does not report auto-increment locks waits to the parallel replication. There was an assumption that these waits could not cause conflicts with in-order parallel replication and thus need not be reported. However, this assumption is wrong and it is possible to get conflicts that lead to hangs for the duration of --innodb-lock-wait-timeout. This can be seen with three transactions: 1. T1 is waiting for T3 on an autoinc lock 2. T2 is waiting for T1 to commit 3. T3 is waiting on a normal row lock held by T2 Here, T3 needs to be deadlock killed on the wait by T1. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:40:02 +02:00
Kristian Nielsen	18acbaf416	MDEV-31655: Parallel replication deadlock victim preference code erroneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:39:49 +02:00
Kristian Nielsen	900c4d6920	MDEV-31655: Parallel replication deadlock victim preference code erroneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Note: This should be null-merged to 10.6, as a different fix is needed there due to InnoDB locking code changes. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:35:30 +02:00
Kristian Nielsen	920789e9d4	MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication Remove the exception that InnoDB does not report auto-increment locks waits to the parallel replication. There was an assumption that these waits could not cause conflicts with in-order parallel replication and thus need not be reported. However, this assumption is wrong and it is possible to get conflicts that lead to hangs for the duration of --innodb-lock-wait-timeout. This can be seen with three transactions: 1. T1 is waiting for T3 on an autoinc lock 2. T2 is waiting for T1 to commit 3. T3 is waiting on a normal row lock held by T2 Here, T3 needs to be deadlock killed on the wait by T1. Note: This should be null-merged to 10.6, as a different fix is needed there due to InnoDB lock code changes. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:34:09 +02:00
Marko Mäkelä	3fee1b4471	Merge 10.5 into 10.6	2023-08-15 11:21:34 +03:00
Marko Mäkelä	599c4d9a40	Merge 10.4 into 10.5	2023-08-15 11:10:27 +03:00
Kristian Nielsen	b2e312b055	MDEV-23021: rpl.rpl_parallel_optimistic_until fails in Buildbot The test case accessed slave-relay-bin.000003 without waiting for the IO thread to write it first. If the IO thread was slow, this could fail. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-10 19:52:25 +02:00
Oleksandr Byelkin	6bf8483cac	Merge branch '10.5' into 10.6	2023-08-01 15:08:52 +02:00

4162 commits