The test waits for the event to get stuck on MASTER_DELAY,
but on a slow/overloaded slave the event might pass MASTER_DELAY
before the test starts waiting.
Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY);
the event cannot avoid that.
Refinement of the original patch.
Move the code to reset the kill up into the parent class
Xid_apply_log_event, to also fix the similar issue for XA COMMIT.
Increase the number of slave retries in the test case
rpl.rpl_parallel_multi_domain_xa to fix some sporadic failures. The test
generates massive amounts of conflicting transactions in multiple
independent domains, which can cause multiple rollback+retry for a
transaction as it conflicts with transactions in other domains one-by-one.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'
An sql thread can reach the "Slave has read all relay log" state
and then start reading relay log again. Let's use a more generic
pattern to retrieve the sql thread ID even if it's not
in the "read all relay log" state.
One case is conflicting transactions T1 and T2 with different domain id, in
optimistic parallel replication in non-GTID mode. Then T2 will
wait_for_prior_commit on T1; and if T1 got a row lock wait on T2 it would
hang, as different domains caused the deadlock kill to be skipped in
thd_rpl_deadlock_check().
More generally, if we have transactions T1 and T2 in one domain/master
connection, and independent transactions U in another, then we can
still deadlock like this:
T1 row lock wait on U
U row lock wait on T2
T2 wait_for_prior_commit on T1
This commit enforces the deadlock kill in these cases. If the waited-for
transaction is speculatively applied, then it will be deadlock killed in
case of a conflict, even if the two transactions are in different domains
or master connections.
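
The three-way wait described above can be pictured as a cycle in a wait-for
graph. The following is a minimal illustrative sketch in Python, not the
server's thd_rpl_deadlock_check() logic; the function name find_wait_cycle
and the assumption that each transaction waits for at most one other are
hypothetical simplifications.

```python
def find_wait_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None.

    `waits_for` maps a transaction name to the single transaction it is
    waiting on (an assumption made to keep the sketch short).
    """
    path = []
    seen = {}
    node = start
    while node is not None and node not in seen:
        seen[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended, so nobody waits in a circle
    return path[seen[node]:]


# The scenario from the text: T1 and T2 in one domain, U in another.
waits_for = {
    "T1": "U",   # T1 row lock wait on U
    "U": "T2",   # U row lock wait on T2
    "T2": "T1",  # T2 wait_for_prior_commit on T1
}
cycle = find_wait_cycle(waits_for, "T1")
```

Because the cycle passes through U, which is in another domain, a check that
only looks at transactions within one domain never sees it as a whole.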
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Problem:
=======
During an InnoDB non-rebuild online ALTER operation, InnoDB sets a
dummy log as the clustered index online log. This can be used by
concurrent DML to identify whether the table is undergoing online DDL.
InnoDB fails to reset the dummy log of the clustered index if an
error happens during the prepare phase.
Solution:
========
Reset the InnoDB clustered index online log in case of error during
prepare phase.
Problem:
========
- Currently mariabackup has to reread pages in case they are
modified concurrently by the server. But while reading the undo
tablespace, mariabackup fails to reread the page in case of
error.
Fix:
===
Mariabackup --backup functionality should have retry logic
while reading the undo tablespaces.
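
A retry loop of this kind can be sketched as follows. This is a generic
Python illustration, not mariabackup's code: read_page_with_retry, the
read_page and is_valid callbacks, and the max_attempts limit are all
hypothetical names chosen for the example.

```python
def read_page_with_retry(read_page, is_valid, max_attempts=10):
    """Reread a page until it validates or the attempts run out.

    read_page: callable returning the page bytes (hypothetical).
    is_valid:  callable checking the page, e.g. its checksum (hypothetical).
    Returns the page and the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        page = read_page()
        if is_valid(page):
            return page, attempt
    raise IOError("page still invalid after %d attempts" % max_attempts)


# Simulate a page that is caught mid-write twice before a clean read.
reads = iter([b"torn", b"torn", b"good"])
page, attempts = read_page_with_retry(
    lambda: next(reads),
    lambda p: p == b"good",
)
```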
The test's header is not written to strictly follow the correct order
of checks by mtr at test start, which may lead to an error. E.g.
./mtr --mysqld=--binlog-format=row rpl.rpl_using_gtid_default
leads to
At line 175: query 'SET GLOBAL gtid_slave_pos= ""' failed: ER_SLAVE_MUST_STOP (1198): This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first
Fixed to require the binlog format first in the test header.
rpl.rpl_heartbeat turns out to miss a standard include/master-slave
header, which made it fail potentially in BB and actually with manual
mtr, as it may have used a previous slave GTID state.
Fixed by installing the standard rpl suite header/footer in the
test file.
(returns NULL) and for Date/DateTime returns "INTEGER"
Analysis:
When the first character of the JSON is scanned, it is a number. Based on
that, INTEGER is returned.
Fix:
Scan the rest of the JSON before returning the final result, to ensure the
JSON is valid in the first place and therefore has a valid type.
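
The difference between deciding on the first character and validating the
whole value can be sketched in Python. This is only an illustration using
Python's json module, not the server's JSON scanner; naive_type and
checked_type are hypothetical helper names, and the date-like input is an
assumed example.

```python
import json


def naive_type(text):
    """Decide the type from the first character only (the buggy approach)."""
    first = text.lstrip()[:1]
    if first.isdigit() or first == "-":
        return "INTEGER"
    return None


def checked_type(text):
    """Parse the whole value first; return None when it is not valid JSON."""
    try:
        value = json.loads(text)
    except ValueError:
        return None
    if isinstance(value, bool):  # bool must be tested before int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        return "ARRAY"
    if isinstance(value, dict):
        return "OBJECT"
    return "NULL"


date_like = "2019-01-01"  # starts with a digit but is not valid JSON
```

Here naive_type(date_like) reports "INTEGER", while checked_type(date_like)
rejects the input because the text after the leading digits is not valid.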
Problem:
========
- InnoDB wrongly calculates the record size in
btr_node_ptr_max_size() when prefix index of
the column has to be stored externally.
Fix:
====
- InnoDB should add the maximum field size to
record size when the field is a fixed length one.
Regexp_processor_pcre::fix_owner() called Regexp_processor_pcre::compile(),
which could fail on the regex syntax error in the pattern and put
an error into the diagnostics area. However, the callers:
- Item_func_regex::fix_length_and_dec()
- Item_func_regexp_instr::fix_length_and_dec()
still returned "false" in such cases, which made the code
crash later inside Diagnostics_area::set_ok_status().
Fix:
- Change the return type of fix_owner() from "void" to "bool"
and return "true" whenever an error is put to the DA
(e.g. on the syntax error in the pattern).
- Fix fix_length_and_dec() of the mentioned Item_func_xxx
classes to return "true" if fix_owner() returned "true".
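
The control flow of the fix can be mirrored in a short Python sketch. It
imitates only the "true means error" convention and the propagation to the
caller; the RegexpProcessor class and its error attribute are hypothetical
stand-ins, not the server's Regexp_processor_pcre.

```python
import re


class RegexpProcessor:
    """Hypothetical stand-in that follows the 'True means error' convention."""

    def __init__(self):
        self.pattern = None
        self.error = None  # plays the role of the diagnostics area

    def compile(self, pattern):
        try:
            self.pattern = re.compile(pattern)
        except re.error as exc:
            self.error = str(exc)
            return True  # error was recorded
        return False

    def fix_owner(self, pattern):
        # Previously the result of compile() was dropped here.
        return self.compile(pattern)


def fix_length_and_dec(processor, pattern):
    """Return True if an error was raised, so the caller stops early."""
    if processor.fix_owner(pattern):
        return True
    return False
```

With a broken pattern such as "(", fix_length_and_dec() now reports the
failure instead of letting execution continue with an error already set.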
Replicated events carry a time from the originating node, which will
be used for the commit timestamp. The associated time can be set in
the past, before the event is even applied.
For WSREP replication we don't need to use the time information from
the event.
Addressed review comments:
Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Tests using MW-369.inc sometimes hung after
signaling two debug sync points inside the Galera
library. Replaced the Galera library sync point
with a server code sync point when possible and
added more wait_conditions to make sure we are
in the correct state.
Tests affected: MW-369, MW-402, MDEV-27276, and
mysql-wsrep#332.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit contains a fix for the code that extracts and parses
the CN (common name, domain name) record from certificates using
the openssl utility. This code is also made common to the rsync
and mariabackup scripts. There is also some systematization of
the use of 'printf' and 'echo' builtins/utilities.
It's a slow test: the slave needs to catch up, reading >1500
transactions. The default MASTER_GTID_WAIT() timeout in
sync_with_master_gtid.inc is 120 seconds, which might not be
enough for a slow/overloaded slave.
Let's wait forever or until ./mtr --testcase-timeout,
whichever comes first.
Based on logs, we might start SST before the donor has reached
Primary state. Because this test shuts down all nodes, we
need to make sure when we start nodes that previous nodes
have reached Primary state and joined the cluster.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
commit_try_norebuild(): Add the parameter statistics_exist,
similar to commit_try_rebuild(). If the InnoDB statistics tables
did not exist, we will not attempt to update statistics later on
during the transaction.
Thanks to Matthias Leich for originally reproducing this scenario.
The test could fail with a duplicate key error because switching to non-GTID
mode could start at the wrong old-style position. The position could be
wrong when the previous GTID connect was stopped before receiving the fake
GTID list event which gives the old-style position corresponding to the GTID
connected position.
Work around this by injecting an extra event and syncing the slave before
switching to non-GTID mode.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Problem:
========
- A partition update operation enables bulk insert for the
transaction while moving the row between partitions. This leads
to a debug assertion failure while removing the row from one
of the partitions.
Solution:
========
- Disallow the bulk insert operation for non-insert operations
on a partitioned table.
MDL wait consists of short 1 second waits (this is not configurable)
repeated until lock_wait_timeout is reached. The stage is changed
to Waiting and back every second. To have predictable result in the
test the query should filter all sequences of X, "Waiting for MDL", X,
leaving just X.
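
The filtering described above can be sketched in Python. This only
illustrates the idea; the test itself does the filtering in its query, and
the function name collapse_mdl_waits and the list representation of stages
are hypothetical.

```python
def collapse_mdl_waits(stages, wait="Waiting for MDL"):
    """Reduce every run X, wait, X (possibly repeated) to a single X.

    `stages` is a list of stage names in the order they were observed.
    The wait label is taken from the text above and can be overridden.
    """
    out = []
    i = 0
    while i < len(stages):
        follows_same_stage = (
            out
            and stages[i] == wait
            and out[-1] != wait
            and i + 1 < len(stages)
            and stages[i + 1] == out[-1]
        )
        if follows_same_stage:
            i += 2  # drop the wait and the repeated X after it
            continue
        out.append(stages[i])
        i += 1
    return out


observed = ["X", "Waiting for MDL", "X", "Waiting for MDL", "X", "Y"]
filtered = collapse_mdl_waits(observed)
```

A wait that sits between two different stages is left alone, since only the
one-second flip to Waiting and back is unpredictable.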
Extends 89c907bd4f to account for
binlog_two_phase_alter flags in a Gtid log event. I.e., if the
FL_COMMIT_ALTER_E1 or FL_ROLLBACK_ALTER_E2 flags are set in the
event flags, yet the length of the event is too short to hold
the value, then set the event as invalid.
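
The shape of such a check can be sketched in Python. The bit values, the
size of the extra value, and the function name gtid_event_is_valid are
hypothetical and chosen only for illustration; they are not the server's
actual event layout.

```python
# Hypothetical bit values and field size, for illustration only.
FL_COMMIT_ALTER_E1 = 0x04
FL_ROLLBACK_ALTER_E2 = 0x08
EXTRA_VALUE_LEN = 8


def gtid_event_is_valid(flags_extra, event_len, value_offset):
    """Return False when a flag promises a value the event cannot hold.

    flags_extra:  the extra flags byte read from the event.
    event_len:    number of bytes available in the event.
    value_offset: where the extra value would start (assumed known).
    """
    needs_value = flags_extra & (FL_COMMIT_ALTER_E1 | FL_ROLLBACK_ALTER_E2)
    if needs_value and event_len < value_offset + EXTRA_VALUE_LEN:
        return False  # flag is set, but the event is too short
    return True
```

An event without either flag is not affected by the length check.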
Problem:
========
mariabackup --prepare fails to write the pages in encrypted format.
This issue happens only for tables with default encryption when the
innodb_encrypt_tables variable is enabled.
Fix:
====
The backup process should write the value of the innodb_encrypt_tables
variable to the configuration file. Prepare should enable the
variable based on the configuration file.
In case of a partition insert, InnoDB fails to end the bulk insert
for one of the partitions. This leads to a bulk insert operation for
the subsequent delete statement.
trx_t::bulk_insert_apply_for_table(): Irrespective of the bulk insert
value, InnoDB should end the bulk insert for the table.
First stop the slave, then run commands on the master that are
supposed to fail on the slave, then start the slave.
If you swap the first two steps, the slave might get and execute those
commands before it's stopped, which will fail the test.
Also, improve debuggability.
In $case=2, it's wrong to kill after the first binlog EOF,
because that might happen between INSERT(4) and INSERT(5).
So, wait for the slave to acknowledge INSERT(5) before killing
the master; that is, both connection threads must pass
repl_semisync_master.wait_after_sync().
The root cause of the failure is a bug in the Linux network stack:
https://lore.kernel.org/netdev/87sf0ldk41.fsf@urd.knielsen-hq.org/T/#u
If the slave does a connect(2) at the exact same time that kill -9 of the
master process closes the listening socket, the FIN or RST packet is lost in
the kernel, and the slave ends up timing out waiting for the initial
communication from the server. This timeout defaults to
--slave-net-timeout=120, which causes include/master_gtid_wait.inc to time
out first and fail the test.
Work around this problem by reducing the --slave-net-timeout for this test
case. If this problem turns up in other tests, we can consider reducing the
default value for all tests.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
bulk_insert_apply_for_table(dict_table_t*)
This issue is caused by
commit 188c5da72a (MDEV-32453).
trx_t::bulk_insert_apply_for_table(): Remove the assertions on
check_unique_secondary and check_foreigns. InnoDB can
apply the bulk insert operation even after the
check_foreigns and check_unique_secondary variables are disabled.