If mariabackup with backup locks is used on SST we do not
pause and desync galera provider at all. If WSREP_MODE_BF_MARIABACKUP
case provider is paused and desync at BLOCK_COMMIT phase. In
other cases provider is paused and desync at BLOCK_DDL phase.
After two concurrent FTWRL/UNLOCK TABLES, the node stays in paused state
and the following CREATE TABLE fails with
ER_UNKNOWN_COM_ERROR (1047): Aborting TOI: Replication paused on
node for FTWRL/BACKUP STAGE.
The cause is the use of global `wsrep_locked_seqno` to determine
if the node should be resumed on UNLOCK TABLES. In some executions
the `wsrep_locked_seqno` is cleared by the first UNLOCK TABLES
after the second FTWRL gets past `make_global_read_lock_block_commit()`.
As a fix, use `thd->wsrep_desynced_backup_stage` to determine
if the thread should resume the node on UNLOCK TABLES.
Add MTR test galera.galera_ftwrl_concurrent to reproduce the
race. The test contains also cases for BACKUP STAGE which
uses similar mechanism for desyncing and pausing the node.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Revert the old work-around for buggy fdatasync() on Linux ext3. This bug was
fixed in Linux > 10 years ago back to kernel version at least 3.0.
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Problem was incorrect condition when node should have
resumed and resync at backup_end. Simplified condition
to fix the problem and added missing test case for
this wsrep_mode = BF_ABORT_MARIABACKUP.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit changes backup execution (namely the block ddl phase),
so that node is not paused from cluster. Instead, the following
backup execution is declared as vulnerable for possible cluster
level conflicts, especially with DDL statement applying.
With this, the mariabackup execution may be aborted, if DDL
statements happen during backup execution. This abortable
backup execution is optional feature and may be
enabled/disabled by wsrep_mode: BF_ABORT_MARIABACKUP.
Note that old style node desync and pause, despite of
WSREP_MODE_BF_MARIABACKUP is needed if node is operating as
SST donor.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
MENT-328 wrongly assumed that the backup failed because of warnings from
mariabackup about not found files. This is normal (and the error message
should be deleted).
randgen failed because mariabackup didn't retry BACKUP STAGE BLOCK DDL
if it failed with a deadlock.
To simplify things, I implemented the retry loop in the server as
this particular deadlock should be quickly resolved.
This adds following new thread states:
* waiting to execute in isolation - DDL is waiting to execute in TOI mode.
* waiting for TOI DDL - some other statement is waiting for DDL to complete.
* waiting for flow control - some statement is paused while flow control is in effect.
* waiting for certification - the transaction is being certified.
make BACKUP STAGE behave as FTWRL, desyncing and pausing the node
to prevent BF threads (appliers) from interfering with blocking stages.
This is needed because BF threads don't respect BACKUP MDL locks.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
The issue was that we sent two different signals to different threads
after each other. The DEBUG_SYNC functionality cannot handle this (as
the signal is stored in a global variable) and the first one can get
lost.
Fixed by using the same signal for both threads.
MDEV-20945: BACKUP UNLOCK + FTWRL assertion failure | SIGSEGV in I_P_List
from MDL_context::release_lock on INSERT w/ BACKUP LOCK (on optimized
builds) | Assertion `ticket->m_duration == MDL_EXPLICIT' failed
BACKUP LOCK behavior is modified so it won't be used wrong:
- BACKUP LOCK should commit any active transactions.
- BACKUP LOCK should not be allowed in stored procedures.
- When BACKUP LOCK is active, don't allow any DDL's for that connection.
- FTWRL is forbidden on the same connection while BACKUP LOCK is active.
Reviewed-by: monty@mariadb.com
It was mistakenly used by tdc_start_shutdown() to make sure TABLE_SHARE
gets evicted from table definition cache when it becomes unused. However
same effect is achieved by resetting tdc_size and tc_size.
Part of MDEV-17882 - Cleanup refresh version
The problem was two fault:
- flush_tables() wrongly gave errors when failing to open read only tables
- backup_block_ddl() didn't properly ignores errors from flush_tables()
The test case for this will be pushed in 10.5 as the test involves
S3 tables.
Part of MDEV-5336 Implement LOCK FOR BACKUP
- Changed check of Global_only_lock to also include BACKUP lock.
- We store latest MDL_BACKUP_DDL lock in thd->mdl_backup_ticket to be able
to downgrade lock during copy_data_between_tables()