Problem:
=======
When the semisync master is crashed and restarted as slave it could
recover transactions that former slaves may never have seen.
A known method existed to clear out all prepared transactions
with --tc-heuristic-recover=rollback does not care to adjust
binlog accordingly.
Fix:
===
The binlog-based recovery is made to concern of the slave semisync role of
post-crash restarted server.
No changes in behavior is done to the "normal" binloggging server
and the semisync master.
When the restarted server is configured with
--rpl-semi-sync-slave-enabled=1
the refined recovery attempts to roll back prepared transactions
and truncate binlog accordingly.
In case of a partially committed (that is committed at least
in one of the engine participants) such transaction gets committed.
It's guaranteed no (partially as well) committed transactions
exist beyond the truncate position.
In case there exists a non-transactional replication event
(being in a way a committed transaction) past the
computed truncate position the recovery ends with an error.
As after master crash and failover to slave, the demoted-to-slave
ex-master must be ready to face and accept its own (generated by)
events, without generally necessary --replicate-same-server-id.
So the acceptance conditions are relaxed for the semisync slave
to accept own events without that option.
While gtid_strict_mode ON ensures no duplicate transaction can be
(re-)executed the master_use_gtid=none slave has to be
configured with --replicate-same-server-id.
*NOTE* for reviewers.
This patch does not handle the user XA which is done
in next git commit.
The new option --log-innodb-page-corruption is introduced.
When this option is set, backup is not interrupted if innodb corrupted
page is detected. Instead it logs all found corrupted pages in
innodb_corrupted_pages file in backup directory and finishes with error.
For incremental backup corrupted pages are also copied to .delta file,
because we can't do LSN check for such pages during backup,
innodb_corrupted_pages will also be created in incremental backup
directory.
During --prepare, corrupted pages list is read from the file just after
redo log is applied, and each page from the list is checked if it is allocated
in it's tablespace or not. If it is not allocated, then it is zeroed out,
flushed to the tablespace and removed from the list. If all pages are removed
from the list, then --prepare is finished successfully and
innodb_corrupted_pages file is removed from backup directory. Otherwise
--prepare is finished with error message and innodb_corrupted_pages contains
the list of the pages, which are detected as corrupted during backup, and are
allocated in their tablespaces, what means backup directory contains corrupted
innodb pages, and backup can not be considered as consistent.
For incremental --prepare corrupted pages from .delta files are applied
to the base backup, innodb_corrupted_pages is read from both base in
incremental directories, and the same action is proceded for corrupted
pages list as for full --prepare. innodb_corrupted_pages file is
modified or removed only in base directory.
If DDL happens during backup, it is also processed at the end of backup
to have correct tablespace names in innodb_corrupted_pages.
Currently, running mtr with an incorrect (for example, new or
obsolete) version of wsrep_provider (for example, with the 26
version of libgalera_smm.so) leads to the failure of tests in
several suites with vague error diagnostics.
As for the galera_3nodes suite, the mtr also does not effectively
check all the prerequisites after merge with MDEV-18426 fixes.
For example, tests that using mariabackup do not check for presence
of ss and socat/nc. This is due to improper handling of relative
paths in mtr scripts.
In addition, some tests in different suites can be run without
setting the environment variables such as MTR_GALERA_TFMT, XBSTREAM,
and so on.
To eliminate all these issues, this patch makes the following changes:
1. Added auxiliary wsrep_mtr_check utility (which located in the
mysql-test/lib/My/SafeProcess subdirectory), which compares the
versions of the wsrep API that used by the server and by the wsrep
provider library, and it does this comparison safely, without
accessing the API if the versions do not match.
2. All checks related to the presence of mariabackup and utilities
that necessary for its operation transferred from the local directories
of different mtr suites (from the suite.pm files) to the main suite.pm
file. This not only reduces the amount of code and eliminates duplication
of identical code fragments, but also avoids problems due to the inability
of mtr to consider relative paths to include files when checking skip
combinations.
3. Setting the values of auxiliary environment variables that
are necessary for Galera, SST scripts and mariabackup (to work
properly) is moved to the main mysql-test-run.pl script, so as
not to duplicate this code in different suites, and to avoid
partial corrections of the same errors for different suites
(while other suites remain uncorrected).
4. Fixed duplication of the have_file_key_management.inc and
have_filekeymanagement.inc files between different suites,
these checks are also transferred to the top level.
5. Added garbd presence check and garbd path variable.
https://jira.mariadb.org/browse/MDEV-18565
Currently, running mtr with an incorrect (for example, new or
obsolete) version of wsrep_provider (for example, with the 26
version of libgalera_smm.so) leads to the failure of tests in
several suites with vague error diagnostics.
As for the galera_3nodes suite, the mtr also does not effectively
check all the prerequisites after merge with MDEV-18426 fixes.
For example, tests that using mariabackup do not check for presence
of ss and socat/nc. This is due to improper handling of relative
paths in mtr scripts.
In addition, some tests in different suites can be run without
setting the environment variables such as MTR_GALERA_TFMT, XBSTREAM,
and so on.
To eliminate all these issues, this patch makes the following changes:
1. Added auxiliary wsrep_mtr_check utility (which located in the
mysql-test/lib/My/SafeProcess subdirectory), which compares the
versions of the wsrep API that used by the server and by the wsrep
provider library, and it does this comparison safely, without
accessing the API if the versions do not match.
2. All checks related to the presence of mariabackup and utilities
that necessary for its operation transferred from the local directories
of different mtr suites (from the suite.pm files) to the main suite.pm
file. This not only reduces the amount of code and eliminates duplication
of identical code fragments, but also avoids problems due to the inability
of mtr to consider relative paths to include files when checking skip
combinations.
3. Setting the values of auxiliary environment variables that
are necessary for Galera, SST scripts and mariabackup (to work
properly) is moved to the main mysql-test-run.pl script, so as
not to duplicate this code in different suites, and to avoid
partial corrections of the same errors for different suites
(while other suites remain uncorrected).
4. Fixed duplication of the have_file_key_management.inc and
have_filekeymanagement.inc files between different suites,
these checks are also transferred to the top level.
5. Added garbd presence check and garbd path variable.
https://jira.mariadb.org/browse/MDEV-18565
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.
The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.
Some help was provided by Vladislav Vaintroub.
Parameters have been cleaned up and aligned with those of MariaDB 10.2.
The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.
By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash-safety for --prepare is needed, double buffering
can be enabled.
The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.
Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.
We disable any undo log processing in --prepare.
Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in prepare.
This means that the --export option must be disabled for now.
This also means that the following options are redundant
and have been removed:
xtrabackup --apply-log-only
innobackupex --redo-only
In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)
In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.
FIXME: Enable the --export option.
FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.
FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.
Throttling only works with when creating backup. Attempt to use it with
--copy-back results in crash, since throttle events are not initialized.
Thus, ignore throttling unless --backup is given.