Based on the logs, SST was started before the donor reached the
Primary state. Add wait_conditions to make sure that nodes reach
the Primary state before starting the next node.
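A minimal sketch of the wait_condition pattern used in Galera MTR tests;
the exact placement in the test file is an assumption:

    --let $wait_condition = SELECT VARIABLE_VALUE = 'Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status'
    --source include/wait_condition.inc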
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
For TOI events specifically, we have a situation where, in case of the
same error, different nodes may generate different messages. This may
be for two reasons:
- a different locale setting between the current client session and
the server default (we can reasonably require server locales to be
identical on all nodes, but a user can change the message locale for
the session)
- a non-deterministic course of STATEMENT execution, e.g. for ALTER TABLE
On the other hand, we may reasonably expect TOI event failures, since
they are executed after replication, so we must ensure that voting is
consistent. For that purpose error codes should be sufficiently unique
and deterministic for TOI event failures, as DDLs normally deal with
a single object, so we can simply use MySQL error codes to vote on.
Notice that this problem does not happen with regular transactional
writesets, since the originator node will always vote success and
replica nodes are assumed to have the same global locale setting.
As such, different error messages indicate different errors even if
the error code is the same (e.g. ER_DUP_KEY can happen on different
rows or tables).
Use only the MySQL error code (without the error message) for error
voting in case of a TOI event failure.
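A hypothetical illustration of why the error code is the deterministic
part (table, column and locale below are made up for the example):

    # On one node the session message locale is changed; the failing TOI DDL
    # still yields the same error code everywhere, only the text differs.
    SET lc_messages = 'fr_FR';
    ALTER TABLE t1 DROP COLUMN no_such_column;  # ER_CANT_DROP_FIELD_OR_KEY (1091) on every node

Voting on the numeric code (1091) therefore stays consistent across nodes.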
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Modified the node config with longer timeouts for the suspect,
inactive, install and wait_prim timeouts. Increased
node_1's weight to keep it in the primary component when
other nodes are voted out.
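A sketch of the kind of provider options involved; the concrete timeout
values used by the test config are an assumption here:

    # node_1 section of the test .cnf (illustrative values only)
    wsrep_provider_options='pc.weight=3;evs.suspect_timeout=PT60S;evs.inactive_timeout=PT120S;evs.install_timeout=PT120S;pc.wait_prim_timeout=PT60S'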
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Fixed the used configuration and added a suppression for a warning
message. Test case changes only.
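The suppression follows the usual MTR pattern; the actual message text
suppressed by this change is not reproduced here, so the pattern below
is only a placeholder:

    # suppress an expected WSREP warning so MTR does not fail the test on it
    CALL mtr.add_suppression("WSREP: .*");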
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem was that there were two non-conflicting idle local
transactions in node_1 that had both inserted a key into the primary key.
Then two transactions from other nodes also inserted keys into the
primary key, such that the insert from node_2 conflicted with one of
the local transactions in node_1, so that there would be a duplicate
key if both were committed. For this, the insert from the other node
tries to acquire an S-lock on the record, and because this insert is
a high-priority brute force (BF) transaction, it kills the idle
local transaction.
Concurrently, the second insert, from node_3, conflicts with the second
idle insert transaction in node_1. Again, it tries to acquire an
S-lock on that record and kills the idle local transaction.
At this point we have two non-conflicting high-priority
transactions holding S-locks on different records in node_1,
for example like this: rec, s-lock-node2-rec, s-lock-node3-rec, rec.
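A sketch of the shape of this scenario in SQL (table name and key values
are illustrative, and the real test also needs debug sync points to keep
the timing deterministic):

    CREATE TABLE t1 (f1 INT PRIMARY KEY) ENGINE=InnoDB;
    # node_1, connection A (left idle, not committed):
    START TRANSACTION;
    INSERT INTO t1 VALUES (1);
    # node_1, connection B (left idle, not committed):
    START TRANSACTION;
    INSERT INTO t1 VALUES (5);
    # node_2 then executes: INSERT INTO t1 VALUES (1);  (BF, conflicts with A)
    # node_3 then executes: INSERT INTO t1 VALUES (5);  (BF, conflicts with B)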
Because these high-priority BF-transactions do not wait for
each other, the insert from node_3, which has a later seqno compared
to the insert from node_2, can continue. It will try to acquire an
insert intention lock for the record it tries to insert (to avoid
a duplicate key being inserted by a local transaction). However,
it will note that there is a conflicting S-lock in the same gap
between records. This leads to a deadlock error, as we have
defined that BF-transactions may not wait for record locks,
but we can't kill the conflicting BF-transaction either, because
it has a lower seqno and should commit first.
The BF-transactions are executed concurrently because their
primary key values are different, i.e. they do not
conflict.
Galera certification will make sure that inserts from
other nodes, i.e. these high-priority BF-transactions,
can't insert duplicate keys. Local transactions naturally
can, but they will be killed when the BF-transaction
acquires the required record locks.
Therefore, we can allow a situation where there is a conflicting
S-lock and insert intention lock, regardless of their seqno
order, and let both continue with no wait. This leads
to a situation where we need to allow a BF-transaction
to wait when lock_rec_has_to_wait_in_queue is called,
because this function is also called from
lock_rec_queue_validate, and because the lock is waiting
there would otherwise be an assertion failure in
ut_a(lock->is_gap() || lock_rec_has_to_wait_in_queue(cell, lock));
lock_wait_wsrep_kill
  Add debug sync points for BF-transactions killing a
  local transaction.
wsrep_assert_no_bf_bf_wait
  Also print the requested lock information.
lock_rec_has_to_wait
  Add a call to a function that handles wsrep transaction
  lock wait cases.
lock_rec_has_to_wait_wsrep
  New function to handle wsrep transaction lock wait
  exceptions.
lock_rec_has_to_wait_in_queue
  Remove the wsrep exception; in this function all
  conflicting locks need to wait in the queue.
  Conflicts between BF and local transactions
  are handled in lock_wait.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
GTID events are applied without a running server transaction, so
we need to set the next transaction ID for the Wsrep transaction.
The whole Galera cluster now has a single GTID value (including
the server ID throughout the cluster), so fix the config accordingly.
Add a force restart so that repeated MTR test execution prints
consistent GTID values; otherwise they would have been recovered
from the previous run.
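One common way to request such a restart in MTR is the force_restart
include; whether this exact include and placement were used here is an
assumption:

    # request a server restart so GTID state is not carried over between runs
    --source include/force_restart.inc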
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
It's possible to establish Galera multi-cluster setups connected
through native replication, where every Galera cluster is configured
to have a separate domain ID.
For this setup to work, we need to replace the domain ID values in
generated GTID events, when they are written at transaction commit,
with the values configured by Wsrep replication.
At the same time, it's possible that the GTID event already contains
the correct domain ID if it comes through native replication from
another Galera cluster.
In this case, when such an event is applied either through a native
replication slave thread or through the Wsrep applier, we write the GTID
event at transaction start and avoid writing it during transaction commit.
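A minimal sketch of the assumed topology (all names and values
illustrative): two Galera clusters with distinct Wsrep GTID domains,
connected by native replication from cluster 1 into cluster 2:

    # cluster 1 nodes (.cnf):  wsrep_gtid_mode=ON, wsrep_gtid_domain_id=1
    # cluster 2 nodes (.cnf):  wsrep_gtid_mode=ON, wsrep_gtid_domain_id=2
    # on one node of cluster 2, set up native replication from cluster 1:
    CHANGE MASTER TO MASTER_HOST='cluster1_node', MASTER_USE_GTID=current_pos;
    START SLAVE;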
The code contained multiple problems that were fixed:
- applying GTID events didn't work because they are applied without a
running server transaction and the Wsrep transaction was not started
- GTID event generation at transaction start didn't set the proper
"standalone" and "is_transactional" flags that the original applied
GTID event contained
- the condition determining that the GTID event is written at transaction
start (to avoid writing it at commit) relied on the GTID event being
the first one found in the transaction/statement caches, which wasn't
the case and resulted in duplicate GTID events being written
- instead of relying on the caches to find a GTID event, a simple check
is introduced that follows the exact rules, described above, for deciding
whether the event is written at transaction start
- the test case is improved to check that the exact GTID events are
applied after the two Galera clusters have synced.
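The kind of check the improved test relies on can be sketched as follows;
the exact assertions in the test file are an assumption:

    # after both clusters have synced, compare the GTID state on nodes of
    # both clusters; each cluster keeps its own domain in the state
    SELECT @@gtid_binlog_state;
    SELECT @@gtid_slave_pos;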
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Based on the logs, we might start SST before the donor has reached
the Primary state. Because this test shuts down all nodes, we
need to make sure, when we start the nodes, that the previously
started nodes have reached the Primary state and joined the cluster.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Correct the used configuration and force server restarts before the test
case. Add a wait condition instead of a sleep to verify that
all expected nodes are back in the cluster.
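For example, a wait on the cluster size instead of a fixed sleep (the
expected node count below is illustrative):

    --let $wait_condition = SELECT VARIABLE_VALUE = 3 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size'
    --source include/wait_condition.inc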
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit fixes a GTID inconsistency which was injected by mariabackup SST.
The donor node now writes a new info file, donor_galera_info, which is streamed
along with the mariabackup donation to the joiner node. The donor_galera_info
file contains both the GTID and the gtid domain_id, and the joiner uses these
to initialize its GTID state.
The commit has a new MTR test case, galera_3nodes.galera_gtid_consistency, which
exercises potentially harmful mariabackup SST scenarios. The test also has a
scenario with IST joining.
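A sketch of the consistency condition the test is after (the exact
assertions in the test file are an assumption): after an SST or IST join,
the joiner should report the same GTID state as the donor, e.g.

    # run on both donor and joiner and compare the results
    SELECT @@gtid_binlog_pos, @@gtid_domain_id;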
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>