Commit graph

206 commits

Author SHA1 Message Date
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
7d4077cc11 Merge 10.5 into 10.6 2024-11-29 12:37:46 +02:00
Jan Lindström
f39217da0c MDEV-35473 : Sporadic failures in the galera_3nodes.galera_evs_suspect_timeout mtr test
Remove unnecessary sleep and replace it with proper wait_conditions.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-11-28 01:02:35 +01:00
Thirunarayanan Balathandayuthapani
074831ec61 Merge branch 10.5 into 10.6 2024-11-08 18:17:15 +05:30
Julius Goryavsky
db68eb69f9 MDEV-35344: post-fix correction for other galera tests 2024-11-06 04:59:10 +01:00
Julius Goryavsky
f176248d4b Merge branch '10.6' into '10.11' 2024-09-17 06:23:10 +02:00
Julius Goryavsky
80fff4c6b1 Merge branch '10.5' into '10.6' 2024-09-16 16:39:59 +02:00
Julius Goryavsky
7ee0e60bbb galera mtr tests: minor fixes to make tests more reliable 2024-09-15 05:05:03 +02:00
Julius Goryavsky
b3cc952916 galera tests: updated .result for galera_gtid_2_cluster test 2024-09-03 07:21:43 +02:00
Julius Goryavsky
d058be62b8 Merge branch '10.6' into '10.11' 2024-09-02 03:49:03 +02:00
Julius Goryavsky
bac0804d81 Merge branch '10.5' into '10.6' 2024-09-01 06:51:25 +02:00
Jan Lindström
dd64f29d6b MDEV-33897 : Galera test failure on galera_3nodes.galera_gtid_consistency
Based on logs SST was started before donor reached
Primaty state. Add wait_conditions to make sure that
nodes reach Primary state before starting next node.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-09-01 03:01:37 +02:00
Alexey Yurchenko
731a5aba0b Use only MySQL code for TOI error vote
For TOI events specifically we have a situation where in case of the
same error different nodes may generate different messages. This may
be for two reasons:
 - different locale setting between the current client session and
   server default (we can reasonably require server locales to be
   identical on all nodes, but user can change message locale for the
   session)
 - non-deterministic course of STATEMENT execution e.g. for ALTER TABLE

On the other hand we may reasonably expect TOI event failures since
they are executed after replication, so we must ensure that voting is
consistent. For that purpose error codes should be sufficiently unique
and deterministic for TOI event failures as DDLs normally deal with
a single object, so we can merely use MySQL error codes to vote on.

Notice that this problem does not happen with regular transactional
writesets, since the originator node will always vote success and
replica nodes are assumed to have the same global locale setting.
As such different error messages indicate different errors even if
the error code is the same (e.g. ER_DUP_KEY can happen on different
rows tables).

Use only MySQL error code (without the error message) for error voting
in case of TOI event failure.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-09-01 02:58:27 +02:00
Marko Mäkelä
62bfcfd8b2 Merge 10.6 into 10.11 2024-08-14 11:36:52 +03:00
Marko Mäkelä
757c368139 Merge 10.5 into 10.6 2024-08-14 10:56:11 +03:00
Jan Lindström
71f289e5d1 MDEV-25614 : Galera test failure on GCF-354
Modified node config with longer timeouts for suspect,
inactive, install and wait_prim timeout. Increased
node_1 weight to keep it primary component when
other nodes are voted out.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-08-04 17:59:13 +02:00
Jan Lindström
cb80ef93a9 MDEV-32778 : galera_ssl_reload failed with warning message
Fixed used configuration and added suppression for warning
message. Test case changes only.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-08-04 17:54:05 +02:00
Oleksandr Byelkin
0fe39d368a Merge branch '10.6' into 10.11 2024-07-22 15:14:50 +02:00
Julius Goryavsky
4026f04425 Merge branch 10.5 into 10.6 2024-07-09 11:56:47 +02:00
Julius Goryavsky
d0a2d4e755 galera mtr tests: correction of inaccuracies in warnings suppressions 2024-07-08 23:36:21 +02:00
Marko Mäkelä
27a3366663 Merge 10.6 into 10.11 2024-06-27 10:26:09 +03:00
Jan Lindström
ee974ca5e0 MDEV-31658 : Deadlock found when trying to get lock during applying
Problem was that there was two non-conflicting local idle
transactions in node_1 that both inserted a key to primary key.
Then two transactions from other nodes inserted also
a key to primary key so that insert from node_2 conflicted
one of the local transactions in node_1 so that there would
be duplicate key if both are committed. For this insert
from other node tries to acquire S-lock for this record
and because this insert is high priority brute force (BF)
transaction it will kill idle local transaction.

Concurrently, second insert from node_3 conflicts the second
idle insert transaction in node_1. Again, it tries to acquire
S-lock for this record and kills idle local transaction.

At this point we have two non-conflicting high priority
transactions holding S-lock on different records in node_1.
For example like this: rec s-lock-node2-rec s-lock-node3-rec rec.

Because these high priority BF-transactions do not wait
each other insert from node3 that has later seqno compared
to insert from node2 can continue. It will try to acquire
insert intention for record it tries to insert (to avoid
duplicate key to be inserted by local transaction). Hower,
it will note that there is conflicting S-lock in same gap
between records. This will lead deadlock error as we have
defined that BF-transactions may not wait for record lock
but we can't kill conflicting BF-transaction because
it has lower seqno and it should commit first.

BF-transactions are executed concurrently because their
values to primary key are different i.e. they do not
conflict.

Galera certification will make sure that inserts from
other nodes i.e these high priority BF-transactions
can't insert duplicate keys. Local transactions naturally
can but they will be killed when BF-transaction
acquires required record locks.

Therefore, we can allow situation where there is conflicting
S-lock and insert intention lock regardless of their seqno
order and let both continue with no wait. This will lead
to situation where we need to allow BF-transaction
to wait when lock_rec_has_to_wait_in_queue is called
because this function is also called from
lock_rec_queue_validate and because lock is waiting
there would be assertion in ut_a(lock->is_gap()
|| lock_rec_has_to_wait_in_queue(cell, lock));

lock_wait_wsrep_kill
  Add debug sync points for BF-transactions killing
  local transaction.

wsrep_assert_no_bf_bf_wait
  Print also requested lock information

lock_rec_has_to_wait
  Add function to handle wsrep transaction lock wait
  cases.

lock_rec_has_to_wait_wsrep
  New function to handle wsrep transaction lock wait
  exceptions.

lock_rec_has_to_wait_in_queue
  Remove wsrep exception, in this function all
  conflicting locks need to wait in queue.
  Conflicts between BF and local transactions
  are handled in lock_wait.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-06-19 14:09:11 +02:00
Marko Mäkelä
b81d717387 Merge 10.6 into 10.11 2024-06-11 12:50:10 +03:00
Marko Mäkelä
a687cf8661 Merge 10.5 into 10.6 2024-06-07 10:03:51 +03:00
Denis Protivensky
a4838721a2 MDEV-32633: Fix Galera cluster <-> native replication interaction
GTID events are applied without a running server transaction,
we need to set next transaction ID for Wsrep transaction.

The whole Galera cluster now has a single GTID value (including
the server ID throughout the cluster), fix the config accordingly.

Add force restart so that repeated MTR test execution prints
consistent GTID values, otherwise they would have been recovered
from the previous run.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-06-03 09:48:13 +02:00
Denis Protivensky
0cc9b49751 MDEV-32633: Fix Galera cluster <-> native replication interaction
It's possible to establish Galera multi-cluster setups connected
through the native replication when every Galera cluster is configured
to have a separate domain ID.
For this setup to work, we need to replace domain ID values in generated
GTID events when they are written at transaction commit to the values
configured by Wsrep replication.

At the same time, it's possible that the GTID event already contains
a correct domain ID if it comes through the native replication from
another Galera cluster.
In this case, when such an event is applied either through a native
replication slave thread or through Wsrep applier, we write GTID event
on transaction start and avoid writing it during transaction commit.

The code contained multiple problems that were fixed:
- applying GTID events didn't work because it's applied without a
running server transaction and Wsrep transaction was not started
- GTID event generation on transaction start didn't contain proper
"standalone" and "is_transactional" flags that the original applied
GTID event contained
- condition determining that GTID event is written on transaction start
to avoid writing it on commit relied on the fact that the GTID event
is the first found in transaction/statement caches, which wasn't the
case and resulted in duplicate GTID events written
- instead of relying on the caches to find a GTID event, a simple check
is introduced that follows the exact rules for checking if event is
written at transaction start as described above
- the test case is improved to check that exact GTID events are
applied after two Galera clusters have synced.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-06-03 09:48:13 +02:00
Sergei Golubchik
0aae11ac28 Merge branch '10.6' into 10.11 2024-04-30 16:56:49 +02:00
Sergei Golubchik
c1f3eff53f Merge branch '10.5' into 10.6 2024-04-29 10:08:58 +02:00
Jan Lindström
b3e531a3cc MDEV-33896 : Galera test failure on galera_3nodes.MDEV-29171
Based on logs we might start SST before donor has reached
Primary state. Because this test shutdowns all nodes we
need to make sure when we start nodes that previous nodes
have reached Primary state and joined the cluster.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-04-25 16:32:06 +02:00
Jan Lindström
baec63e304 MDEV-33787 : Fix Galera test failures on 10.11 2024-04-03 10:04:40 +03:00
Marko Mäkelä
64cce8d5bf Merge 10.6 into 10.11 2024-02-14 16:12:53 +02:00
Marko Mäkelä
691f923906 Merge 10.5 into 10.6 2024-02-13 20:42:59 +02:00
Marko Mäkelä
8ec12e0d6d Merge 10.4 into 10.5 2024-02-12 11:38:13 +02:00
Jan Lindström
5b4456b38a MDEV-33036 : Galera test case galera_3nodes.galera_ist_gcache_rollover has warning
Correct used configuration and force server restarts before test
case. Add wait condition instead of sleep to verify that
all expected nodes are back to cluster.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-01-30 00:26:13 +01:00
Marko Mäkelä
bdf65893dd Merge 10.6 into 10.11 2024-01-03 15:37:57 +02:00
Marko Mäkelä
8bd5a3de7f Merge 10.5 into 10.6 2024-01-03 14:24:47 +02:00
Marko Mäkelä
3a3a4f044f Merge 10.4 into 10.5 2024-01-03 12:07:51 +02:00
Marko Mäkelä
e23c695250 Merge 10.5 into 10.6 2024-01-02 17:37:58 +02:00
sjaakola
c89f769f24 MDEV-31905 GTID inconsistency
This commit fixes GTID inconsistency which was injected by mariabackup SST.
Donor node now writes new info file: donor_galera_info, which is streamed
along the mariabackup donation to the joiner node. The donor_galera_info
file contains both GTID and gtid domain_id, and joiner will use these to
initialize the GTID state.

Commit has new mtr test case: galera_3nodes.galera_gtid_consistency, which
exercises potentially harmful mariabackup SST scenarios. The test has also
scenario with IST joining.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-12-22 00:10:23 +01:00
Jan Lindström
1dc6ded8b1 MDEV-20485 : Galera test failure on galera.galera_var_node_address
Loopback interface might not be configured, thus do not test it.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-12-21 18:45:57 +01:00
Marko Mäkelä
2b8dc7668a Merge 10.6 into 10.11 2023-12-21 13:19:17 +02:00
Marko Mäkelä
a81a138aab Merge 10.5 into 10.6 2023-12-21 12:58:11 +02:00
Jan Lindström
cfaab614ea MDEV-24481 : galera_3nodes.galera_vote_rejoin_mysqldump MTR failed: mysql_shutdown failed
Improve test case to wait until cluster membership is
correct.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-12-20 15:29:47 +01:00
Oleksandr Byelkin
ced243a099 Merge branch '10.9' into 10.10 2023-08-05 20:34:09 +02:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
f52954ef42 Merge commit '10.4' into 10.5 2023-07-20 11:54:52 +02:00
Oleksandr Byelkin
45087dd0b3 Merge branch '10.9' into 10.10 2023-01-18 16:45:59 +01:00
Oleksandr Byelkin
795ff0daf0 Merge branch '10.6' into 10.7 2023-01-18 16:36:13 +01:00
Marko Mäkelä
a8c5635cf1 Merge 10.5 into 10.6 2023-01-17 20:02:29 +02:00
Daniele Sciascia
9ec475c376 MDEV-29171 changing the value of wsrep_gtid_domain_id with full cluster restart fails on some nodes
Fix `wsrep_init_gtid()` to avoid overwriting the domain id received
during state transfer.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2023-01-17 14:08:28 +02:00