MySQL 5.7 introduced the class page_size_t and increased the size of
buffer pool page descriptors by introducing this object to them.
Maybe the intention of this exercise was to prepare for a future
where the buffer pool could accommodate multiple page sizes.
But that future never arrived, not even in MySQL 8.0. It is much
easier to manage a pool of a single page size, and typically all
storage devices of an InnoDB instance benefit from using the same
page size.
Let us remove page_size_t from MariaDB Server. This will make it
easier to remove support for ROW_FORMAT=COMPRESSED (or make it a
compile-time option) in the future, just by removing various
occurrences of zip_size.
If an encrypted table is created during backup, then
mariabackup --backup could wrongly fail.
This caused a failure of the test mariabackup.huge_lsn once on buildbot.
This is due to the way how InnoDB creates .ibd files. It would first
write a dummy page 0 with no encryption information. Due to this,
xb_fil_cur_open() could wrongly interpret that the table is not encrypted.
Subsequently, page_is_corrupted() would compare the computed page
checksum to the wrong checksum. (There are both "before" and "after"
checksums for encrypted pages.)
To work around this problem, we introduce a Boolean option
--backup-encrypted that is enabled by default. With this option,
Mariabackup will assume that a nonzero key_version implies that the
page is encrypted. We need this option in order to be able to copy
encrypted tables from MariaDB 10.1 or 10.2, because unencrypted pages
that were originally created before MySQL 5.1.48 could contain nonzero
garbage in the fields that were repurposed for encryption.
Later, MDEV-18128 would clean up the way how .ibd files are created,
to remove the need for this option.
page_is_corrupted(): Add missing const qualifiers, and do not check
space->crypt_data unless --skip-backup-encrypted has been specified.
xb_fil_cur_read(): After a failed page read, output a page dump.
- Refactor code to isolate page validation in page_is_corrupted() function.
- Introduce --extended-validation parameter(default OFF) for mariabackup
--backup to enable decryption of encrypted uncompressed pages during
backup.
- mariabackup would still always check checksum on encrypted data,
it is needed to detect partially written pages.
Write a test case that computes valid crc32 checksums for
an encrypted page, but zeroes out the payload area, so
that the checksum after decryption fails.
xb_fil_cur_read(): Validate the page number before trying
any checksum calculation or decrypting or decompression.
Also, skip zero-filled pages. For page_compressed pages,
ensure that the FIL_PAGE_TYPE was changed. Also, reject
FIL_PAGE_PAGE_COMPRESSED_ENCRYPTED if no decryption was attempted.
Problem:
=======
Mariabackup seems to fail to verify the pages of compressed tables.
The reason is that both fil_space_verify_crypt_checksum() and
buf_page_is_corrupted() will skip the validation for compressed pages.
Fix:
====
Mariabackup should call fil_page_decompress() for compressed and encrypted
compressed page. After that, call buf_page_is_corrupted() to
check the page corruption.
The initial fix only covered a part of Mariabackup.
This fix hardens InnoDB and XtraDB in a similar way, in order
to reduce the probability of mistaking a corrupted encrypted page
for a valid unencrypted one.
This is based on work by Thirunarayanan Balathandayuthapani.
fil_space_verify_crypt_checksum(): Assert that key_version!=0.
Let the callers guarantee that. Now that we have this assertion,
we also know that buf_page_is_zeroes() cannot hold.
Also, remove all diagnostic output and related parameters,
and let the relevant callers emit such messages.
Last but not least, validate the post-encryption checksum
according to the innodb_checksum_algorithm (only accepting
one checksum for the strict variants), and no longer
try to validate the page as if it was unencrypted.
buf_page_is_zeroes(): Move to the compilation unit of the only callers,
and declare static.
xb_fil_cur_read(), buf_page_check_corrupt(): Add a condition before
calling fil_space_verify_crypt_checksum(). This is a non-functional
change.
buf_dblwr_process(): Validate the page only as encrypted or unencrypted,
but not both.
After validating the post-encryption checksum on an encrypted page,
Mariabackup should decrypt the page and validate the pre-encryption
checksum as well. This should reduce the probability of accepting
invalid pages as valid ones.
This is a backport and refactoring of a patch that was
originally written by Thirunarayanan Balathandayuthapani
for the 10.2 branch.
fil_space_t::add(): Replaces fil_node_create(), fil_node_create_low().
Let the caller pass fil_node_t::handle, to avoid having to close and
re-open files.
fil_node_t::read_page0(): Refactored from fil_node_open_file().
Read the first page of a data file.
fil_node_open_file(): Open the file only once.
srv_undo_tablespace_open(): Set the file handle for the opened
undo tablespace. This should ensure that ut_ad(file->is_open())
no longer fails in recv_add_trim().
xtrabackup_backup_func(): Remove some dead code.
xb_fil_cur_open(): Open files only if needed. Undo tablespaces
should already have been opened.
extra/mariabackup/fil_cur.cc:361:42: warning: format specifies type 'unsigned long' but the argument has type 'ib_int64_t' (aka 'long long') [-Wformat]
extra/mariabackup/fil_cur.cc:376:9: warning: format specifies type 'unsigned long' but the argument has type 'ib_int64_t' (aka 'long long') [-Wformat]
sql/handler.cc:6196:45: warning: format specifies type 'unsigned long' but the argument has type 'wsrep_trx_id_t' (aka 'unsigned long long') [-Wformat]
sql/log.cc:1681:16: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
sql/log.cc:1687:16: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
sql/wsrep_sst.cc:1388:86: warning: format specifies type 'long' but the argument has type 'wsrep_seqno_t' (aka 'long long') [-Wformat]
sql/wsrep_sst.cc:232:86: warning: format specifies type 'long' but the argument has type 'wsrep_seqno_t' (aka 'long long') [-Wformat]
storage/connect/filamdbf.cpp:450:47: warning: format specifies type 'short' but the argument has type 'int' [-Wformat]
storage/connect/filamdbf.cpp:970:47: warning: format specifies type 'short' but the argument has type 'int' [-Wformat]
storage/connect/inihandl.cpp:197:16: warning: address of array 'key->name' will always evaluate to 'true' [-Wpointer-bool-conversion]
storage/innobase/btr/btr0scrub.cc:151:17: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/buf/buf0buf.cc:5085:8: warning: nonnull parameter 'bpage' will evaluate to 'true' on first encounter [-Wpointer-bool-conversion]
storage/innobase/fil/fil0crypt.cc:2454:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/handler/ha_innodb.cc:18685:7: warning: format specifies type 'unsigned long' but the argument has type 'wsrep_trx_id_t' (aka 'unsigned long long') [-Wformat]
storage/innobase/row/row0mysql.cc:3319:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/row/row0mysql.cc:3327:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/maria/ma_norec.c:35:10: warning: implicit conversion from 'int' to 'my_bool' (aka 'char') changes value from 131 to -125 [-Wconstant-conversion]
storage/maria/ma_norec.c:42:10: warning: implicit conversion from 'int' to 'my_bool' (aka 'char') changes value from 131 to -125 [-Wconstant-conversion]
storage/maria/ma_test2.c:1009:12: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
storage/maria/ma_test2.c:1010:12: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
storage/mroonga/ha_mroonga.cpp:9189:44: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
storage/mroonga/vendor/groonga/lib/expr.c:4987:22: warning: comparison of constant -1 with expression of type 'grn_operator' is always false [-Wtautological-constant-out-of-range-compare]
storage/xtradb/btr/btr0scrub.cc:151:17: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/buf/buf0buf.cc:5047:8: warning: nonnull parameter 'bpage' will evaluate to 'true' on first encounter [-Wpointer-bool-conversion]
storage/xtradb/fil/fil0crypt.cc:2454:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/row/row0mysql.cc:3324:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/row/row0mysql.cc:3332:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
unittest/sql/mf_iocache-t.cc:120:35: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
unittest/sql/mf_iocache-t.cc:96:35: note: expanded from macro 'INFO_TAIL'
Implement innodb_flush_method as an enum parameter in Mariabackup,
instead of ignoring the option and hard-wiring it to a default value.
xb0xb.h: Remove. Only xtrabackup.cc refers to the enum parameters.
innodb_flush_method_names[], innodb_flush_method_typelib[]:
Define as non-static, so that mariabackup can share the definitions.
srv_file_flush_method: Change the type to ulong, to match the
assignment in init_one_value() and handle_options() in mariabackup.
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup to shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.
dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.
There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.
There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.
FilSpace: Remove. Only few calls to fil_space_acquire() will remain,
and gradually they should be removed.
mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.
fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.
fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.
dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.
dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.
dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.
fil_ibd_open(): Return the tablespace.
fil_space_t::set_imported(): Replaces fil_space_set_imported().
truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.
row_truncate_prepare(): Merge to its only caller.
row_drop_table_from_cache(): Assert that the table is persistent.
dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.
row_import_update_discarded_flag(): Remove a constant parameter.
When Mariabackup is invoked on an instance that uses a multi-file
InnoDB system tablespace, it may fail to other files of the system
tablespace than the first one.
This was revealed by the MDEV-14447 test case.
The offending code is assuming that the first page of each data file
is page 0. But, in multi-file system tablespaces that is not the case.
xb_fil_cur_open(): Instead of re-reading the first page of the file,
rely on the fil_space_t metadata that already exists in memory.
xb_get_space_flags(): Remove.
- Simplified use_trans_cache() to return at once if is_transactional is set
- Indentation and spelling errors fixed
- Don't call signal_update() if update_binlog_end_pos() is called as the
function already calls signal_update()
- Removed not used function wait_for_update_bin_log(), which would cause
errors if ever used.
- Simplified handler::clone() by always allocating 'ref' in ha_open(). To do
this I added an optional MEM_ROOT argument to ha_open() to be used when
allocating 'ref'
- Changed arguments to get_system_var() from LEX_CSTRING to LEX_CSTRING*
- Added THD as argument to create_select_for_variable(). Changed also char*
argument to LEX_CSTRING to avoid strlen() call.
- Change calls to append() to use LEX_CSTRING
The fix broke mariabackup --prepare --incremental.
The restore of an incremental backup starts up (parts of) InnoDB twice.
First, all data files are discovered for applying .delta files. Then,
after the .delta files have been applied, InnoDB will be restarted
more completely, so that the redo log records will be applied via the
buffer pool.
During the first startup, the buffer pool is not initialized, and thus
trx_rseg_get_n_undo_tablespaces() must not be invoked. The apply of
the .delta files will currently assume that the --innodb-undo-tablespaces
option correctly specifies the number of undo tablespace files, just
like --backup does.
The second InnoDB startup of --prepare for applying the redo log will
properly invoke trx_rseg_get_n_undo_tablespaces().
enum srv_operation_mode: Add SRV_OPERATION_RESTORE_DELTA for
distinguishing the apply of .delta files from SRV_OPERATION_RESTORE.
srv_undo_tablespaces_init(): In mariabackup --prepare --incremental,
in the initial SRV_OPERATION_RESTORE_DELTA phase, do not invoke
trx_rseg_get_n_undo_tablespaces() because the buffer pool or the
redo logs are not available. Instead, blindly rely on the parameter
--innodb-undo-tablespaces.
When using innodb_page_size=16k, InnoDB tables
that were created in MariaDB 10.1.0 to 10.1.20 with
PAGE_COMPRESSED=1 and
PAGE_COMPRESSION_LEVEL=2 or PAGE_COMPRESSION_LEVEL=3
would fail to load.
fsp_flags_is_valid(): When using innodb_page_size=16k, use a
more strict check for .ibd files, with the assumption that
nobody would try to use different-page-size files.
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.
The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.
Some help was provided by Vladislav Vaintroub.
Parameters have been cleaned up and aligned with those of MariaDB 10.2.
The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.
By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash-safety for --prepare is needed, double buffering
can be enabled.
The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.
Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.
We disable any undo log processing in --prepare.
Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in prepare.
This means that the --export option must be disabled for now.
This also means that the following options are redundant
and have been removed:
xtrabackup --apply-log-only
innobackupex --redo-only
In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)
In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.
FIXME: Enable the --export option.
FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.
FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.