log_group_read_log_seg(): Always return false when returning
before reading end_lsn.
xtrabackup_copy_logfile(): On error, indicate whether
a corrupt log record was encountered.
Only xtrabackup_copy_logfile() in Mariabackup cared about the
return value of the function. InnoDB crash recovery was not
affected by this bug.
The variable is obsolete.
In mariabackup --backup, encryption plugin loading code sets the value
this parameter to the same as in server.
In mariabackup --prepare, no new redo log is generated,
and xtrabackup_logfile is removed after it anyway.
Fix one more bug in "DDL redo" phase in prepare
If table was renamed, and then new table was created with the old name,
prepare can be confused, and .ibd can end up with wrong name.
Fix the order of how DDL fixup is applied , once again - ".new" files
should be processed after renames.
If, during backup
1) Innodb table is dropped (after being copied to backup) and then
2) Before backup finished, another Innodb table is renamed, and new name
is the name of the dropped table in 1)
then, --prepare fails with assertion, as DDL fixup code in prepare
did not handle this specific case.
The fix is to process drops before renames, in prepare DDL-"redo" phase.
If an encrypted table is created during backup, then
mariabackup --backup could wrongly fail.
This caused a failure of the test mariabackup.huge_lsn once on buildbot.
This is due to the way how InnoDB creates .ibd files. It would first
write a dummy page 0 with no encryption information. Due to this,
xb_fil_cur_open() could wrongly interpret that the table is not encrypted.
Subsequently, page_is_corrupted() would compare the computed page
checksum to the wrong checksum. (There are both "before" and "after"
checksums for encrypted pages.)
To work around this problem, we introduce a Boolean option
--backup-encrypted that is enabled by default. With this option,
Mariabackup will assume that a nonzero key_version implies that the
page is encrypted. We need this option in order to be able to copy
encrypted tables from MariaDB 10.1 or 10.2, because unencrypted pages
that were originally created before MySQL 5.1.48 could contain nonzero
garbage in the fields that were repurposed for encryption.
Later, MDEV-18128 would clean up the way how .ibd files are created,
to remove the need for this option.
page_is_corrupted(): Add missing const qualifiers, and do not check
space->crypt_data unless --skip-backup-encrypted has been specified.
xb_fil_cur_read(): After a failed page read, output a page dump.
would not hide more interesting information, like invalid memory accesses.
some "leaks" are expected
- partly this is due to weird options parsing, that runs twice, and
does not free memory after the first run.
- also we do not mind to exit() whenever it makes sense, without full
cleanup.
- Refactor code to isolate page validation in page_is_corrupted() function.
- Introduce --extended-validation parameter(default OFF) for mariabackup
--backup to enable decryption of encrypted uncompressed pages during
backup.
- mariabackup would still always check checksum on encrypted data,
it is needed to detect partially written pages.
ported privilege checking from xtrabackup.
Now, mariabackup would terminate early if either RELOAD or PROCESS privilege
is not held, not at the very end of backup
The behavior can be disabled with nre setting --check-privileges=0.
Also , --no-lock does not need all of these privileges, since it skips
FTWRL and SHOW ENGINE STATUS INNODB.
After validating the post-encryption checksum on an encrypted page,
Mariabackup should decrypt the page and validate the pre-encryption
checksum as well. This should reduce the probability of accepting
invalid pages as valid ones.
This is a backport and refactoring of a patch that was
originally written by Thirunarayanan Balathandayuthapani
for the 10.2 branch.
fil_space_t::add(): Replaces fil_node_create(), fil_node_create_low().
Let the caller pass fil_node_t::handle, to avoid having to close and
re-open files.
fil_node_t::read_page0(): Refactored from fil_node_open_file().
Read the first page of a data file.
fil_node_open_file(): Open the file only once.
srv_undo_tablespace_open(): Set the file handle for the opened
undo tablespace. This should ensure that ut_ad(file->is_open())
no longer fails in recv_add_trim().
xtrabackup_backup_func(): Remove some dead code.
xb_fil_cur_open(): Open files only if needed. Undo tablespaces
should already have been opened.
Simplify, and make it work with system tablespace outside of
innodb data home.
Also, do not reread TRX_SYS page in endless loop,
if it appears to be corrupted.
Use finite number of attempts.
if custom undo tablespace is defined
- In case of multiple undo tablespace, mariabackup have to open system
tablespace to find the list of undo tablespace present in TRX_SYS page.
For opening system tablespace, mariabackup should fetch the file name
from already initialized system tablespace object.
This amends commit 4dc20ff687.
Starting with MariaDB 10.2, InnoDB defines
typedef size_t ulint;
The standard format for size_t uses the z modifier, for example,
"%zu" as in the macro ULINTPF.
"%lu" is wrong for size_t, because sizeof(unsigned long) can be
something else than sizeof(size_t). On Windows, the former would
always be 4 bytes, while size_t would be 4 or 8 bytes.
On Unix, it is compiled-in datadir value.
On Windows, the directory is ..\data, relative to directory
mariabackup.exe
server uses the same logic to determine datadir.
A crash-downgrade of a RENAME (or TRUNCATE or table-rebuilding
ALTER TABLE or OPTIMIZE TABLE) operation to an earlier 10.2 version
would trigger a debug assertion failure during rollback,
in trx_roll_pop_top_rec_of_trx(). In a non-debug build, the
TRX_UNDO_RENAME_TABLE record would be misinterpreted as an
update_undo log record, and typically the file name would be
interpreted as DB_TRX_ID,DB_ROLL_PTR,PRIMARY KEY. If a matching
record would be found, row_undo_mod() would hit ut_error in
switch (node->rec_type). Typically, ut_a(table2 == NULL) would
fail when opening the table from SQL.
Because of this, we prevent a crash-downgrade to earlier MariaDB 10.2
versions by changing the InnoDB redo log format identifier to the
10.3 identifier, and by introducing a subformat identifier so that
10.2 can continue to refuse crash-downgrade from 10.3 or later.
After a clean shutdown, a downgrade to MariaDB 10.2.13 or later would
still be possible thanks to MDEV-14909. A downgrade to older 10.2
versions is only possible after removing the log files (not recommended).
LOG_HEADER_FORMAT_CURRENT: Change to 103 (originally the 10.3 format).
log_group_t: Add subformat. For 10.2, we will use subformat 1,
and will refuse crash recovery from any other subformat of the
10.3 format, that is, a genuine 10.3 redo log.
recv_find_max_checkpoint(): Allow startup after clean shutdown
from a future LOG_HEADER_FORMAT_10_4 (unencrypted only).
We cannot handle the encrypted 10.4 redo log block format,
which was introduced in MDEV-12041. Allow crash recovery from
the original 10.2 format as well as the new format.
In Mariabackup --backup, do not allow any startup from 10.3 or 10.4
redo logs.
recv_recovery_from_checkpoint_start(): Skip redo log apply for
clean 10.3 redo log, but not for the new 10.2 redo log
(10.3 format, subformat 1).
srv_prepare_to_delete_redo_log_files(): On format or subformat
mismatch, set srv_log_file_size = 0, so that we will display the
correct message.
innobase_start_or_create_for_mysql(): Check for format or subformat
mismatch.
xtrabackup_backup_func(): Remove debug assertions that were made
redundant by the code changes in recv_find_max_checkpoint().
Implement undo tablespace truncation via normal redo logging.
Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name,
CREATE, and DROP.
Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2
is killed before the DROP operation is committed. If MariaDB Server 10.2
is killed during TRUNCATE, it is also possible that the old table
was renamed to #sql-ib*.ibd but the data dictionary will refer to the
table using the original name.
In MariaDB Server 10.3, RENAME inside InnoDB is transactional,
and #sql-* tables will be dropped on startup. So, this new TRUNCATE
will be fully crash-safe in 10.3.
ha_mroonga::wrapper_truncate(): Pass table options to the underlying
storage engine, now that ha_innobase::truncate() will need them.
rpl_slave_state::truncate_state_table(): Before truncating
mysql.gtid_slave_pos, evict any cached table handles from
the table definition cache, so that there will be no stale
references to the old table after truncating.
== TRUNCATE TABLE ==
WL#6501 in MySQL 5.7 introduced separate log files for implementing
atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB
undo and redo log. Some convoluted logic was added to the InnoDB
crash recovery, and some extra synchronization (including a redo log
checkpoint) was introduced to make this work. This synchronization
has caused performance problems and race conditions, and the extra
log files cannot be copied or applied by external backup programs.
In order to support crash-upgrade from MariaDB 10.2, we will keep
the logic for parsing and applying the extra log files, but we will
no longer generate those files in TRUNCATE TABLE.
A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE
(with full redo and undo logging and proper rollback). This will
be implemented in MDEV-14717.
ha_innobase::truncate(): Invoke RENAME, create(), delete_table().
Because RENAME cannot be fully rolled back before MariaDB 10.3
due to missing undo logging, add some explicit rename-back in
case the operation fails.
ha_innobase::delete(): Introduce a variant that takes sqlcom as
a parameter. In TRUNCATE TABLE, we do not want to touch any
FOREIGN KEY constraints.
ha_innobase::create(): Add the parameters file_per_table, trx.
In TRUNCATE, the new table must be created in the same transaction
that renames the old table.
create_table_info_t::create_table_info_t(): Add the parameters
file_per_table, trx.
row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.
row_drop_table_after_create_fail(): New function, wrapping
row_drop_table_for_mysql().
dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(),
fil_prepare_for_truncate(), fil_reinit_space_header_for_table(),
row_truncate_table_for_mysql(), TruncateLogger,
row_truncate_prepare(), row_truncate_rollback(),
row_truncate_complete(), row_truncate_fts(),
row_truncate_update_system_tables(),
row_truncate_foreign_key_checks(), row_truncate_sanity_checks():
Remove.
row_upd_check_references_constraints(): Remove a check for
TRUNCATE, now that the table is no longer truncated in place.
The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some
race-condition like scenarios. The test innodb-innodb.truncate does
not use any synchronization.
We add a redo log subformat to indicate backup-friendly format.
MariaDB 10.4 will remove support for the old TRUNCATE logging,
so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve
limitations.
== Undo tablespace truncation ==
MySQL 5.7 implements undo tablespace truncation. It is only
possible when innodb_undo_tablespaces is set to at least 2.
The logging is implemented similar to the WL#6501 TRUNCATE,
that is, using separate log files and a redo log checkpoint.
We can simply implement undo tablespace truncation within
a single mini-transaction that reinitializes the undo log
tablespace file. Unfortunately, due to the redo log format
of some operations, currently, the total redo log written by
undo tablespace truncation will be more than the combined size
of the truncated undo tablespace. It should be acceptable
to have a little more than 1 megabyte of log in a single
mini-transaction. This will be fixed in MDEV-17138 in
MariaDB Server 10.4.
recv_sys_t: Add truncated_undo_spaces[] to remember for which undo
tablespaces a MLOG_FILE_CREATE2 record was seen.
namespace undo: Remove some unnecessary declarations.
fil_space_t::is_being_truncated: Document that this flag now
only applies to undo tablespaces. Remove some references.
fil_space_t::is_stopping(): Do not refer to is_being_truncated.
This check is for tablespaces of tables. Potentially used
tablespaces are never truncated any more.
buf_dblwr_process(): Suppress the out-of-bounds warning
for undo tablespaces.
fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero
page number (new size of the tablespace in pages) to inform
crash recovery that the undo tablespace size has been reduced.
fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2
can be written for undo tablespaces (without .ibd file suffix)
for a nonzero page number.
os_file_truncate(): Add the parameter allow_shrink=false
so that undo tablespaces can actually be shrunk using this function.
fil_name_parse(): For undo tablespace truncation,
buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].
recv_read_in_area(): Avoid reading pages for which no redo log
records remain buffered, after recv_addr_trim() removed them.
trx_rseg_header_create(): Add a FIXME comment that we could write
much less redo log.
trx_undo_truncate_tablespace(): Reinitialize the undo tablespace
in a single mini-transaction, which will be flushed to the redo log
before the file size is trimmed.
recv_addr_trim(): Discard any redo logs for pages that were
logged after the new end of a file, before the truncation LSN.
If the rec_list becomes empty, reduce n_addrs. After removing
any affected records, actually truncate the file.
recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before
applying any log records. The undo tablespace files must be open
at this point.
buf_flush_or_remove_pages(), buf_flush_dirty_pages(),
buf_LRU_flush_or_remove_pages(): Add a parameter for specifying
the number of the first page to flush or remove (default 0).
trx_purge_initiate_truncate(): Remove the log checkpoints, the
extra logging, and some unnecessary crash points. Merge the code
from trx_undo_truncate_tablespace(). First, flush all to-be-discarded
pages (beyond the new end of the file), then trim the space->size
to make the page allocation deterministic. At the only remaining
crash injection point, flush the redo log, so that the recovery
can be tested.
Remove plugin-load option from mariabackup. It does not needed to be an
option (we only need to store the plugin-load value during backup phase,
and reuse the same value during --prepare).
Fix is to read plugin-load from backup-my.cnf during prepare.
The MySQL 5.7 TRUNCATE TABLE is inherently incompatible
with hot backup, because it is creating and deleting a separate
log file, and it is not writing redo log for all changes of the
InnoDB data dictionary tables. Refuse to create a corrupted backup
if the unsafe form of TRUNCATE was executed.
Note: Undo log tablespace truncation cannot be detected easily.
Also it is incompatible with backup, for similar reasons.
xtrabackup_backup_func(): "Subscribe to" the log events before
the first invocation of xtrabackup_copy_logfile().
recv_parse_or_apply_log_rec_body(): If the function pointer
log_truncate is set, invoke it to report MLOG_TRUNCATE.
aws_key_management needs current directory to be datadir during
initalization, it scans current directory for encrypted keys.
Fix is to ensure, that plugin initialization in mariabackup happens
after the call to my_setwd(mysql_real_data_home).
srv_print_verbose_log: Introduce the value 2 to refer to
mariabackup --verbose.
recv_recover_page(), recv_parse_log_recs(): Add output for
mariabackup --verbose.
Commit dc9c555415 moved the final phase of
the redo log copying to the background thread. This would sometimes cause
too little redo log to be copied at the end of the backup. We would only
guarantee copying up to the latest redo log checkpoint. This would produce
a consistent backup, but it could refer to a too old point of time.
xtrabackup_copy_log(), xtrabackup_copy_logfile(): Add the parameter 'last'.
xtrabackup_backup_low(): Copy any remaining part of the log after the
backup threads have terminated.
Since MariaDB Server 10.2.2 (and MySQL 5.7), the default value of
innodb_checksum_algorithm is crc32 (CRC-32C), not the inefficient "innodb"
checksum. Change Mariabackup to use the same default, so that checksum
validation (when using the default algorithm on the server) will take less
time during mariabackup --backup. Also, mariabackup --prepare should be
a little faster, and the server should read backups faster, because the
page checksums would only be validated against CRC-32C.
log_copying_thread(): Keep copying redo log until the end has been
reached. (Previously, we would stop copying as soon as
the first batch of xtrabackup_copy_logfile() returned.)
log_copying: Remove. Use log_copying_running instead.
copy_logfile: Remove. Log copying will now only be invoked from
2 places: from xtrabackup_backup_func() for the initial batch,
and from log_copying_thread() until all of the log has been read.
Use the global variable metadata_to_lsn for determining if the
final part of the log is being copied.
xtrabackup_copy_log(): Add diagnostic messages for terminating
the copying. These messages should be dead code, because
log_group_read_log_seg() should be checking for the same.
xtrabackup_copy_logfile(): Correct the retrying logic.
If anything was successfully read, process the portion that
was read. On failure, let the caller close dst_log_file.
io_watching_thread(): Stop throttling during the last phase
of copying the log (metadata_to_lsn!=0). The final copying
of the log will now be performed in log_copying_thread().
stop_backup_threads(): Clean up the message about stopping
the log copying thread.
xtrabackup_backup_low(): Read metadata_to_lsn from the latest
checkpoint header page, even if it is the first page.
Let the log_copying_thread take care of copying all of
the redo log.
CIFS does not like O_DIRECT flag (it is set successfully, but pread would
fail).
The fix is not to use O_DIRECT, there is not need for it.
posix_fadvise() was used already that should prevent buffer cache
pollution on Linux.
As recommended by documentation of posix_fadvise(), we'll also fsync()
tablespaces after a batch of writes.
- recovered_lsn shouldn't be initialized during xtrabackup_copy_logfile().
If partial redo log read during the end of xtrabackup_copy_logfile() then
recovered_lsn will be different from scanned_lsn. Re-initialization of
recovered_lsn could lead to partial read again. It is a regression of
MDEV-14545
concurrently.
There is a deadlock between
C1 mariabackup's connection that holds MDL locks
C2 Online ALTER TABLE that wants to have MDL exclusively
and tries to upgrade its mdl lock.
C3 another mariabackup's connection that does FLUSH TABLES (or FTWRL)
C3 waits waits for C2, which waits for C1, which waits for C3,
thus the deadlock.
MDL locks cannot be released until FLUSH succeeds, because
otherwise it would allow ALTER to sneak in, causing backup to abort and
breaking lock-ddl-per-table's promise.
The fix here workarounds the deadlock, by killing connections in
"Waiting for metadata lock" status (i.e ALTER). This killing continues
until FTWRL succeeds.
Killing connections is skipped in case --no-locks parameter
was passed to backup, because there won't be a FLUSH.
For the reference,in Percona's xtrabackup --lock-ddl-per-connection
silently implies --no-lock ie FLUSH is always skipped there.
A rather large part of fix is introducing DBUG capability to start
a query the new connection at the right moment of backup
compensating somewhat for mariabackup' lack of send_query or DBUG_SYNC.
Revert the dead code for MySQL 5.7 multi-master replication (GCS),
also known as
WL#6835: InnoDB: GCS Replication: Deterministic Deadlock Handling
(High Prio Transactions in InnoDB).
Also, make innodb_lock_schedule_algorithm=vats skip SPATIAL INDEX,
because the code does not seem to be compatible with them.
Add FIXME comments to some SPATIAL INDEX locking code. It looks
like Galera write-set replication might not work with SPATIAL INDEX.
The merge only covered 10.1 up to
commit 4d248974e0.
Actually merge the changes up to
commit 0a534348c7.
Also, remove the unused InnoDB field trx_t::abort_type.
Problem:
=======
Mariabackup exits during prepare phase if it encounters
MLOG_INDEX_LOAD redo log record. MLOG_INDEX_LOAD record
informs Mariabackup that the backup cannot be completed based
on the redo log scan, because some information is purposely
omitted due to bulk index creation in ALTER TABLE.
Solution:
========
Detect the MLOG_INDEX_LOAD redo record during backup phase and
exit the mariabackup with the proper error message.
When Mariabackup gets a bad read of the first page of the system
tablespace file, it would inappropriately try to apply the doublewrite
buffer and write changes back to the data file (to the source file)!
This is very wrong and must be prevented.
The correct action would be to retry reading the system tablespace
as well as any other files whose first page was read incorrectly.
Fixing this was not attempted.
xb_load_tablespaces(): Shorten a bogus message to be more relevant.
The message can be displayed by --backup or --prepare.
xtrabackup_backup_func(), os_file_write_func(): Add a missing space
to a message.
Datafile::restore_from_doublewrite(): Do not even attempt the
operation in Mariabackup.
recv_init_crash_recovery_spaces(): Do not attempt to restore the
doublewrite buffer in Mariabackup (--prepare or --export), because
all pages should have been copied correctly in --backup already,
and because --backup should ignore the doublewrite buffer.
SysTablespace::read_lsn_and_check_flags(): Do not attempt to initialize
the doublewrite buffer in Mariabackup.
innodb_make_page_dirty(): Correct the bounds check.
Datafile::read_first_page(): Correct the name of the parameter.
Solve 3 way deadlock between plugin_initialiaze(), THD::init() and
mysql_sys_var_char().
The deadlock exists because of the lock order inversion between
LOCK_global_system_variables mutex and LOCK_system_variables_hash
read-write lock-
In this case, it is enough to change LOCK_system_variables_hash to prefer
reads to fix the deadlock, i.e change it to mysql_prlock_t
FULLTEXT auxiliary tables
Change the logic to take mdl lock on all tables before tablespaces are
copied, rather than lock every single tablespace just before it is
copied.
This bug affects both writing and reading encrypted redo log in
MariaDB 10.1, starting from version 10.1.3 which added support for
innodb_encrypt_log. That is, InnoDB crash recovery and Mariabackup
will sometimes fail when innodb_encrypt_log is used.
MariaDB 10.2 or Mariabackup 10.2 or later versions are not affected.
log_block_get_start_lsn(): Remove. This function would cause trouble if
a log segment that is being read is crossing a 32-bit boundary of the LSN,
because this function does not allow the most significant 32 bits of the
LSN to change.
log_blocks_crypt(), log_encrypt_before_write(), log_decrypt_after_read():
Add the parameter "lsn" for the start LSN of the block.
log_blocks_encrypt(): Remove (unused function).
find_type_or_exit() client helper did exit(1) on error, exit(1) moved to
clients.
mysql_read_default_options() did exit(1) on error, error is passed through and
handled now.
my_str_malloc_default() did exit(1) on error, replaced my_str_ allocator
functions with normal my_malloc()/my_realloc()/my_free().
sql_connect.cc did many exit(1) on hash initialisation failure. Removed error
check since my_hash_init() never fails.
my_malloc() did exit(1) on error. Replaced with abort().
my_load_defaults() did exit(1) on error, replaced with return 2.
my_load_defaults() still does exit(0) when invoked with --print-defaults.
When Mariabackup is invoked on an instance that uses a multi-file
InnoDB system tablespace, it may fail to other files of the system
tablespace than the first one.
This was revealed by the MDEV-14447 test case.
The offending code is assuming that the first page of each data file
is page 0. But, in multi-file system tablespaces that is not the case.
xb_fil_cur_open(): Instead of re-reading the first page of the file,
rely on the fil_space_t metadata that already exists in memory.
xb_get_space_flags(): Remove.
for multi-file innodb_data_file_path.
Use fil_extend_space_to_desired_size() to correctly extend system
tablespace. Make sure to get tablespace size from the first tablespace
part.