Commit graph

17796 commits

Author SHA1 Message Date
Sergei Golubchik
349ca2be74 mtr: remove innodb combinations
dead code for about 10 years
2024-05-05 21:37:08 +02:00
Sergei Golubchik
df6899b30b bugfix: mysqld --safe-mode crashes 2024-05-05 21:37:08 +02:00
Sergei Golubchik
7a789e2027 sporadic failures of rpl.rpl_parallel_sbm
the test waits for the event to get stuck on MASTER_DELAY,
but on a slow/overloaded slave the event might pass MASTER_DELAY
before the test starts waiting.

Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY),
the event cannot avoid that.
2024-05-05 21:37:07 +02:00
Sergei Golubchik
cea083af9f cleanup: use THD_STAGE_INFO, not thd_proc_info
and put master-slave.inc *last* in the series of includes
2024-05-05 21:37:07 +02:00
Sergei Golubchik
cb7c99674e sporadic failure of perfschema.func_file_io
--- func_file_io.result
+++ func_file_io.reject
@@ -134,7 +134,7 @@
 Variable_name	Value
 Performance_schema_accounts_lost	0
 Performance_schema_cond_classes_lost	0
-Performance_schema_cond_instances_lost	0
+Performance_schema_cond_instances_lost	5
 Performance_schema_digest_lost	0
 Performance_schema_file_classes_lost	0
 Performance_schema_file_handles_lost	0
2024-05-05 21:37:07 +02:00
Kristian Nielsen
4b4db4a8e5 MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure
Refinement of the original patch.

Move the code to reset the kill up into the parent class
Xid_apply_log_event, to also fix the similar issue for XA COMMIT.

Increase the number of slave retries in the test case
rpl.rpl_parallel_multi_domain_xa to fix some sporadic failures. The test
generates massive amounts of conflicting transactions in multiple
independent domains, which can cause multiple rollback+retry for a
transaction as it conflicts with transactions in other domains one-by-one.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-05-05 19:01:56 +02:00
Sergei Golubchik
3ee6f69d49 sporadic failures of binlog_encryption.rpl_parallel_gco_wait_kill
CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'

An sql thread can reach the "Slave has read all relay log" state
and then start reading relay log again. Let's use a more generic
pattern to retrieve the sql thread ID even if it's not
in the "read all relay log" state.
2024-05-02 22:14:19 +02:00
Kristian Nielsen
e365877bae MDEV-33798: ROW base optimistic deadlock with concurrent writes on same table
One case is conflicting transactions T1 and T2 with different domain id, in
optimistic parallel replication in non-GTID mode. Then T2 will
wait_for_prior_commit on T1; and if T1 got a row lock wait on T2 it would
hang, as different domains caused the deadlock kill to be skipped in
thd_rpl_deadlock_check().

More generally, if we have transactions T1 and T2 in one domain/master
connection, and independent transactions U in another, then we can
still deadlock like this:

  T1 row lock wait on U
  U row lock wait on T2
  T2 wait_for_prior_commit on T1

This commit enforces the deadlock kill in these cases. If the waited-for
transaction is speculatively applied, then it will be deadlock killed in
case of a conflict, even if the two transactions are in different domains
or master connections.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-05-02 21:07:45 +02:00
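The three-way wait described in the MDEV-33798 message above can be pictured as a cycle in a wait-for graph. The following is only an illustrative Python sketch, not the server's thd_rpl_deadlock_check() logic. The function name find_wait_cycle and the dictionary representation are assumptions made for this example, and it assumes each transaction waits for at most one other.

```python
def find_wait_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start; return the cycle or None.

    waits_for maps a transaction name to the single transaction it is
    waiting for (hypothetical representation, for illustration only).
    """
    seen = []
    node = start
    while node in waits_for:
        if node in seen:
            # We came back to a transaction already on the path: deadlock.
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for[node]
    # The chain ended at a transaction that is not waiting: no deadlock.
    return None


# The scenario from the commit message:
#   T1 row lock wait on U, U row lock wait on T2,
#   T2 wait_for_prior_commit on T1
waits = {"T1": "U", "U": "T2", "T2": "T1"}
cycle = find_wait_cycle(waits, "T1")
```

Here the cycle spans two domains, which is the case where, per the message, the deadlock kill was previously skipped.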
Sergei Golubchik
dba9d19249 atomic.alter_table test is too slow for MSAN 2024-04-30 21:59:38 +02:00
Thirunarayanan Balathandayuthapani
156761db3b MDEV-31161 Assertion failures upon adding a too long key to table with COMPRESSED row
Problem:
=======
During a non-rebuild online ALTER operation, InnoDB sets the
clustered index online log to a dummy log. Concurrent DML can use
this to identify whether the table is undergoing online DDL.
InnoDB fails to reset the dummy log of the clustered index if
an error happens during the prepare phase.

Solution:
========
Reset the InnoDB clustered index online log in case of error during
prepare phase.
2024-04-30 20:40:29 +05:30
Sergei Golubchik
b663c935a4 don't use normal diffs in *.rdiff files
they aren't robust enough and can easily apply incorrectly

(this fixes the failure of innodb.insert_into_empty,4k after the merge)
2024-04-30 16:57:07 +02:00
Sergei Golubchik
0aae11ac28 Merge branch '10.6' into 10.11 2024-04-30 16:56:49 +02:00
Thirunarayanan Balathandayuthapani
f378e76434 MDEV-33980 mariadb-backup --backup is missing retry logic for undo tablespaces
Problem:
========
- Currently mariabackup has to reread pages in case they are
modified concurrently by the server. But while reading the undo
tablespace, mariabackup failed to reread the page in case of
error.

Fix:
===
Mariabackup --backup functionality should have retry logic
while reading the undo tablespaces.
2024-04-30 16:15:26 +05:30
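A retry loop of the kind the MDEV-33980 message asks for could look roughly like this. This is a hypothetical Python sketch, not mariabackup code: read_page, is_consistent and max_attempts are invented names, and the real tool's page validation and retry limit are not described in the message.

```python
def read_page_with_retry(read_page, is_consistent, max_attempts=10):
    """Reread a page until it passes the consistency check.

    read_page:     callable returning the page bytes (hypothetical)
    is_consistent: callable deciding whether the page was read while the
                   server was not modifying it (hypothetical)
    Returns (page, attempts_used); raises IOError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        page = read_page()
        if is_consistent(page):
            return page, attempt
    raise IOError("page still inconsistent after %d attempts" % max_attempts)


# Simulate two torn reads followed by a good one.
reads = iter([b"torn", b"torn", b"good"])
page, attempts = read_page_with_retry(lambda: next(reads),
                                      lambda p: p == b"good")
```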
Andrei
ae03374f29 MDEV-34030 rpl.rpl_using_gtid_default can fail in (BB) mtr
The test's header does not strictly follow the correct order
of checks done by mtr at test start, which may lead to an error. E.g.

./mtr --mysqld=--binlog-format=row rpl.rpl_using_gtid_default

leads to
At line 175: query 'SET GLOBAL gtid_slave_pos= ""' failed: ER_SLAVE_MUST_STOP (1198): This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first

Fixed to require the binlog format first in the test header.
2024-04-30 12:40:50 +03:00
Andrei
6a63204c36 MDEV-34029 rpl.rpl_heartbeat can fail when (BB) mtr reorders tests
rpl.rpl_heartbeat turned out to lack the standard include/master-slave
header, which made it fail potentially in BB, and actually with manual
mtr, as it may have used a previous slave GTID state.

Fixed with installing the standard rpl suite header/footer in the
test file.
2024-04-30 12:40:50 +03:00
Rucha Deodhar
d7df63e1c9 MDEV-19487: JSON_TYPE doesnt detect the type of String Values
(returns NULL) and for Date/DateTime returns "INTEGER"

Analysis:
When the first character of the JSON is scanned, it is a number. Based on
that, INTEGER is returned.
Fix:
Scan rest of the json before returning the final result to ensure json is
valid in the first place in order to have a valid type.
2024-04-29 22:32:17 +05:30
Thirunarayanan Balathandayuthapani
a586b6dbc8 MDEV-22855 Assertion `!field->prefix_len || field->fixed_len == field->prefix_len' failed in btr_node_ptr_max_size
Problem:
========
- InnoDB wrongly calculates the record size in
btr_node_ptr_max_size() when the prefix index of
the column has to be stored externally.

Fix:
====
- InnoDB should add the maximum field size to
record size when the field is a fixed length one.
2024-04-29 16:42:26 +05:30
Alexander Barkov
c6e3fe29d4 MDEV-30646 View created via JSON_ARRAYAGG returns incorrect json object
Backporting add782a13e from 10.6, this fixes the problem.
2024-04-29 13:47:45 +04:00
Sergei Golubchik
c1f3eff53f Merge branch '10.5' into 10.6 2024-04-29 10:08:58 +02:00
Alexander Barkov
dc25d600ee MDEV-21058 CREATE TABLE with generated column and RLIKE results in sigabrt
Regexp_processor_pcre::fix_owner() called Regexp_processor_pcre::compile(),
which could fail on the regex syntax error in the pattern and put
an error into the diagnostics area. However, the callers:
  - Item_func_regex::fix_length_and_dec()
  - Item_func_regexp_instr::fix_length_and_dec()
still returned "false" in such cases, which made the code
crash later inside Diagnostics_area::set_ok_status().

Fix:

- Change the return type of fix_owner() from "void" to "bool"
  and return "true" whenever an error is put to the DA
  (e.g. on the syntax error in the pattern).
- Fixing fix_length_and_dec() of the mentioned Item_func_xxx
  classes to return "true" if fix_owner() returned "true".
2024-04-29 11:08:07 +04:00
mkaruza
136358036d MDEV-18590: galera.versioning_trx_id: Test failure: mysqltest: Result content mismatch
Replicated events have time associated with them from originating
node which will be used for commit timestamp. Associated time can
be set in past before event is even applied.

For WSREP replication we don't need to use time information from
event.

Addressed review comments:
	  Jan Lindström <jan.lindstrom@galeracluster.com>

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-04-27 18:40:58 +02:00
Jan Lindström
1532f12058 MDEV-33898 : Galera test failure on galera.MW-369
Tests using MW-369.inc sometimes hung after
signaling two debug sync points inside a Galera
library. Replaced Galera library sync point
with server code sync point when possible and
added more wait_conditions to make sure we are
in correct state.

Tests affected: MW-369, MW-402, MDEV-27276, and
mysql-wsrep#332.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-04-26 20:21:44 +02:00
Julius Goryavsky
288ea9e146 galera SST scripts: parsing CN in certificates
This commit contains a fix for the code that extracts and parses
the CN (common name, domain name) record from certificates using
the openssl utility. This code is also made common to the rsync
and mariabackup scripts. There is also some systematization of
the use of 'printf' and 'echo' builtins/utilities.
2024-04-26 20:21:44 +02:00
Sergei Golubchik
7ff649315e sporadic failures of rpl.rpl_parallel_multi_domain_xa
it's a slow test, the slave needs to catch up, reading >1500
transactions. A default MASTER_GTID_WAIT() timeout in
sync_with_master_gtid.inc is 120 seconds, which might not be
enough for a slow/overloaded slave.

Let's wait forever or until ./mtr --testcase-timeout,
whatever comes first.
2024-04-26 14:24:32 +02:00
Jan Lindström
b3e531a3cc MDEV-33896 : Galera test failure on galera_3nodes.MDEV-29171
Based on logs we might start SST before donor has reached
Primary state. Because this test shuts down all nodes we
need to make sure when we start nodes that previous nodes
have reached Primary state and joined the cluster.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-04-25 16:32:06 +02:00
Sergei Golubchik
9e92582024 sporadic failures of rpl.rpl_parallel_sbm
the test waits for the event to get stuck on MASTER_DELAY,
but on a slow/overloaded slave the event might pass MASTER_DELAY
before the test starts waiting.

Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY),
the event cannot avoid that.
2024-04-25 12:47:23 +02:00
Marko Mäkelä
0936c13809 MDEV-33993 Possible server hang on DROP INDEX or RENAME INDEX
commit_try_norebuild(): Add the parameter statistics_exist,
similar to commit_try_rebuild(). If the InnoDB statistics tables
did not exist, we will not attempt to update statistics later on
during the transaction.

Thanks to Matthias Leich for originally reproducing this scenario.
2024-04-25 13:44:10 +03:00
Kristian Nielsen
553a4d6271 MDEV-33602: Sporadic test failure in rpl.rpl_gtid_stop_start
The test could fail with a duplicate key error because switching to non-GTID
mode could start at the wrong old-style position. The position could be
wrong when the previous GTID connect was stopped before receiving the fake
GTID list event which gives the old-style position corresponding to the GTID
connected position.

Work-around by injecting an extra event and syncing the slave before
switching to non-GTID mode.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-25 11:00:45 +02:00
Thirunarayanan Balathandayuthapani
8c8b7da017 MDEV-33979 Disallow bulk insert operation during partition update statement
Problem:
========
- A partition update operation enables bulk insert for the
transaction while moving the row between partitions. This leads
to a debug assertion failure while removing the row from one
of the partitions.

Solution:
========
- Disallow the bulk insert operation for non-insert operations
on a partitioned table.
2024-04-25 10:50:34 +05:30
Sergei Golubchik
9cf718859f cleanup: use THD_STAGE_INFO, not thd_proc_info
and put master-slave.inc *last* in the series of includes
2024-04-24 22:08:52 +02:00
Sergei Golubchik
7d5e08de6b MDEV-20157 perfschema.stage_mdl_function failed in buildbot with wrong result
MDL wait consists of short 1 second waits (this is not configurable)
repeated until lock_wait_timeout is reached. The stage is changed
to Waiting and back every second. To have predictable result in the
test the query should filter all sequences of X, "Waiting for MDL", X,
leaving just X.
2024-04-24 18:09:58 +02:00
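The filtering rule from the MDEV-20157 message above (every sequence X, "Waiting for MDL", X collapses to a single X) can be sketched in a few lines. The test itself does this in a query; the Python below is only an illustration of the rule, and the function name collapse_mdl_waits and the literal stage string are assumptions for this example.

```python
WAITING = "Waiting for MDL"  # placeholder stage name taken from the message


def collapse_mdl_waits(stages, waiting=WAITING):
    """Collapse each run X, waiting, X into a single X.

    A waiting stage that is not surrounded by the same stage on both
    sides is kept as-is.
    """
    out = []
    for stage in stages:
        if len(out) >= 2 and out[-1] == waiting and out[-2] == stage:
            # We returned to the stage seen before the wait:
            # drop the wait, and do not append X a second time.
            out.pop()
        else:
            out.append(stage)
    return out


# The stage flips to waiting and back once per second, however long the wait.
observed = ["A", WAITING, "A", WAITING, "A", "B"]
filtered = collapse_mdl_waits(observed)
```

Because repeated one-second waits all fold back into the same X, the filtered sequence no longer depends on how many seconds the lock wait lasted.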
Sergei Golubchik
259394aed7 disable mariabackup.incremental_encrypted,64k on 32bit
it allocates 1GB of memory, it causes failures in CI
2024-04-24 18:09:20 +02:00
Sergei Golubchik
e2f95ebbcb fix galera_3nodes.galera_gtid_consistency to work with nc
like other galera tests do
2024-04-24 18:09:20 +02:00
Brandon Nesterenko
8c7992165b MDEV-33672: 10.11 Fix for Two Phase Alter Flags
Extends 89c907bd4f to account for
binlog_two_phase_alter flags in a Gtid log event. I.e., if the
FL_COMMIT_ALTER_E1 or FL_ROLLBACK_ALTER_E2 flags are set in the
event flags, yet the length of the event is too short to hold
the value, then set the event as invalid
2024-04-24 13:19:36 +02:00
Thirunarayanan Balathandayuthapani
0c55d854fe MDEV-33334 mariadb-backup fails to preserve innodb_encrypt_tables
Problem:
========
mariabackup --prepare fails to write the pages in encrypted format.
This issue happens only for default encrypted table when
innodb_encrypt_tables variable is enabled.

Fix:
====
backup process should write the value of innodb_encrypt_tables
variable in configuration file. prepare should enable the
variable based on configuration file.
2024-04-24 16:27:31 +05:30
Thirunarayanan Balathandayuthapani
c3460e6904 MDEV-33970 Assertion `!m.first->second.is_bulk_insert()' failed in trx_undo_report_row_operation()
In case of partition insert, InnoDB fails to end the bulk insert
for one of the partitions. This leads to a bulk insert operation for
the subsequent delete statement.

trx_t::bulk_insert_apply_for_table(): Irrespective of bulk insert
value, InnoDB should end the bulk insert for the table.
2024-04-23 16:26:02 +05:30
Sergei Golubchik
e73181112f MDEV-16944 fix galera tests
followup for 061adae9a2
2024-04-23 10:55:35 +02:00
Sergei Golubchik
f243c73788 sporadic failures of rpl.rpl_rewrite_db_sys_vars
first stop the slave, then run commands on the master that are
supposed to fail on the slave, then start the slave.

if you swap first two steps, the slave might get and execute those
commands before it's stopped, which will fail the test.

also, improve debuggability
2024-04-22 21:02:11 +02:00
Sergei Golubchik
a74846354e fix failing large_tests.maria_recover_encrypted
update results
2024-04-22 18:38:39 +02:00
Sergei Golubchik
466bc8f7e0 fix failing large_tests.maria_recover_encrypted
update results
2024-04-22 17:22:11 +02:00
Sergei Golubchik
018d537ec1 Merge branch '10.6' into 10.11 2024-04-22 15:23:10 +02:00
Sergei Golubchik
75488a57f2 archive.archive and main.mysqlbinlog_row_compressed
fixes for zlib-ng
2024-04-22 00:14:03 +02:00
Sergei Golubchik
e83d92ee5e sporadic failures of rpl.rpl_semi_sync_fail_over
in the $case=2 - it's wrong to kill after the first binlog EOF,
because that might happen between INSERT(4) and INSERT(5).

So, wait for the slave to acknowledge INSERT(5) before killing
the master, that is, both connection threads must pass
repl_semisync_master.wait_after_sync()
2024-04-21 22:54:52 +02:00
Sergei Golubchik
6242783f24 rpl.rpl_semi_sync_fail_over improve debuggability 2024-04-21 14:03:26 +02:00
Sergei Golubchik
a4b6409ff6 sporadic failures of binlog_encryption.rpl_parallel_slave_bgc_kill
do CHANGE MASTER before sync_with_master to have the slave
in a predictable fully synced state before the next test
2024-04-21 01:17:31 +02:00
Sergei Golubchik
d8368ae289 Merge '10.5' into 10.6 2024-04-20 14:47:26 +02:00
Kristian Nielsen
0c249ad718 MDEV-30232: rpl.rpl_gtid_crash fails sporadically in BB
The root cause of the failure is a bug in the Linux network stack:

  https://lore.kernel.org/netdev/87sf0ldk41.fsf@urd.knielsen-hq.org/T/#u

If the slave does a connect(2) at the exact same time that kill -9 of the
master process closes the listening socket, the FIN or RST packet is lost in
the kernel, and the slave ends up timing out waiting for the initial
communication from the server. This timeout defaults to
--slave-net-timeout=120, which causes include/master_gtid_wait.inc to time
out first and fail the test.

Work around this problem by reducing the --slave-net-timeout for this test
case. If this problem turns up in other tests, we can consider reducing the
default value for all tests.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-20 13:41:08 +02:00
Sergei Golubchik
4a2e03453a MDEV-33952 galera_create_table_as_select fails sporadically
disable until fixed
2024-04-19 22:09:41 +02:00
Thirunarayanan Balathandayuthapani
8a3755cc29 MDEV-33934 Assertion `!check_foreigns' failed in
bulk_insert_apply_for_table(dict_table_t*)

This issue is caused by
commit 188c5da72a (MDEV-32453).

trx_t::bulk_insert_apply_for_table(): Remove the assertions on
check_unique_secondary and check_foreigns. InnoDB can
apply the bulk insert operation even after disabling
the check_foreigns and check_unique_secondary variables.
2024-04-19 11:05:44 +05:30
Marko Mäkelä
bb2e125d07 Merge 10.5 into 10.6
This excludes commit 040069f4ba
because it is specific to innodb_sync_debug, which had been removed
in commit ff5d306e29.
2024-04-18 07:14:56 +03:00