This patch fixes an issue with launching mariabackup during SST
(when used with Galera), when during bootstrap mariabackup receives
the "--innodb" option, which is incorrectly interpreted as shortcut
for "--innodb-force-recovery". This patch does not require separate
test for mtr, as the problem is visible in general testing on
buildbot.
During the prepare phase of restoring backups, "mariabackup" does
not seem to allow (or recognize) the option "innodb_force_recovery"
for the embedded InnoDB server instance that it starts.
If page corruption observed during page recovery, the prepare step
fails. While this is indeed the correct behavior ideally, allowing
this option to be set in case of emergencies might be useful when
the current backup is the only copy available. Some error messages
during "--prepare" suggest to set "innodb_force_recovery" to 1:
[ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
For backwards compatibility, "mariabackup --innobackupex --apply-log"
should also have this option.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
innobase_space_shutdown(): Remove. We want this step to be executed
before the message "InnoDB: Shutdown completed; log sequence number "
is output by innodb_shutdown(). It used to be executed after that step.
innodb_shutdown(): Duplicate the code that used to live in
innobase_space_shutdown().
innobase_init_abort(): Merge with innobase_space_shutdown().
The new option --log-innodb-page-corruption is introduced.
When this option is set, backup is not interrupted if innodb corrupted
page is detected. Instead it logs all found corrupted pages in
innodb_corrupted_pages file in backup directory and finishes with error.
For incremental backup corrupted pages are also copied to .delta file,
because we can't do LSN check for such pages during backup,
innodb_corrupted_pages will also be created in incremental backup
directory.
During --prepare, corrupted pages list is read from the file just after
redo log is applied, and each page from the list is checked if it is allocated
in it's tablespace or not. If it is not allocated, then it is zeroed out,
flushed to the tablespace and removed from the list. If all pages are removed
from the list, then --prepare is finished successfully and
innodb_corrupted_pages file is removed from backup directory. Otherwise
--prepare is finished with error message and innodb_corrupted_pages contains
the list of the pages, which are detected as corrupted during backup, and are
allocated in their tablespaces, what means backup directory contains corrupted
innodb pages, and backup can not be considered as consistent.
For incremental --prepare corrupted pages from .delta files are applied
to the base backup, innodb_corrupted_pages is read from both base in
incremental directories, and the same action is proceded for corrupted
pages list as for full --prepare. innodb_corrupted_pages file is
modified or removed only in base directory.
If DDL happens during backup, it is also processed at the end of backup
to have correct tablespace names in innodb_corrupted_pages.
mariabackup deallocated uninitialized
write_filt_ctxt.u.wf_incremental_ctxt in xtrabackup_copy_datafile() when
some table should be skipped due to parsed DDL redo log record.
The problem:
When incremental backup is taken, delta files are created for innodb tables
which are marked as new tables during innodb ddl tracking. When such
tablespace is tried to be opened during prepare in
xb_delta_open_matching_space(), it is "created", i.e.
xb_space_create_file() is invoked, instead of opening, even if
a tablespace with the same name exists in the base backup directory.
xb_space_create_file() writes page 0 header the tablespace.
This header does not contain crypt data, as mariabackup does not have
any information about crypt data in delta file metadata for
tablespaces.
After delta file is applied, recovery process is started. As the
sequence of recovery for different pages is not defined, there can be
the situation when crypt data redo log event is executed after some
other page is read for recovery. When some page is read for recovery, it's
decrypted using crypt data stored in tablespace header in page 0, if
there is no crypt data, the page is not decryped and does not pass corruption
test.
This causes error for incremental backup --prepare for encrypted
tablespaces.
The error is not stable because crypt data redo log event updates crypt
data on page 0, and recovery for different pages can be executed in
undefined order.
The fix:
When delta file is created, the corresponding write filter copies only
the pages which LSN is greater then some incremental LSN. When new file
is created during incremental backup, the LSN of all it's pages must be
greater then incremental LSN, so there is no need to create delta for
such table, we can just copy it completely.
The fix is to copy the whole file which was tracked during incremental backup
with innodb ddl tracker, and copy it to base directory during --prepare
instead of delta applying.
There is also DBUG_EXECUTE_IF() in innodb code to avoid writing redo log
record for crypt data updating on page 0 to make the test case stable.
Note:
The issue is not reproducible in 10.5 as optimized DDL's are deprecated
in 10.5. But the fix is still useful because it allows to decrease
data copy size during backup, as delta file contains some extra info.
The test case should be removed for 10.5 as it will always pass.
To fix this, it is necessary to add an option to exclude the
database with the name "lost+found" from processing (the database
name will be checked by the check_if_skip_database_by_path() or
by the check_if_skip_database() function, and as a result
"lost+found" will be skipped).
In addition, it is necessary to slightly modify the verification
logic in the check_if_skip_database() function.
Also added a new test galera_sst_mariabackup_lost_found.test
MDEV-13318 introduced a condition to Mariabackup that can cause it to
hang if the server goes idle after writing a log block that has no
payload after the 12-byte header. Normal recovery in log0recv.cc would
allow blocks with exactly 12 bytes of length, and only reject blocks
where the length field is shorter than that.
log_group_read_log_seg() returns error when:
1) Calculated log block number does not correspond to read log block
number. This can be caused by:
a) Garbage or an incompletely written log block. We can exclude this
case by checking log block checksum if it's enabled(see innodb-log-checksums,
encrypted log block contains checksum always).
b) The log block is overwritten. In this case checksum will be correct and
read log block number will be greater then requested one.
2) When log block length is wrong. In this case recv_sys->found_corrupt_log
is set.
3) When redo log block checksum is wrong. In this case innodb code
writes messages to error log with the following prefix: "Invalid log
block checksum."
The fix processes all the cases above.
In commit fe39d02f51 (MDEV-20638)
we removed some wake-up signaling of the master thread that should
have been there, to ensure a steady log checkpointing workload.
Common sense suggests that the commit omitted some necessary calls
to srv_inc_activity_count(). But, an attempt to add the call to
trx_flush_log_if_needed_low() as well as to reinstate the function
innobase_active_small() did not restore the performance for the
case where sync_binlog=1 is set.
Therefore, we will revert the entire commit in MariaDB Server 10.2.
In MariaDB Server 10.5, adding a srv_inc_activity_count() call to
trx_flush_log_if_needed_low() did restore the performance, so we
will not revert MDEV-20638 across all versions.
Regretfully, the parameter innodb_log_checksums was introduced
in MySQL 5.7.9 (the first GA release of that series) by
mysql/mysql-server@af0acedd88
which partly replaced a parameter that had been introduced in 5.7.8
mysql/mysql-server@22ba38218e
as innodb_log_checksum_algorithm.
Given that the CRC-32C operations are accelerated on many processor
implementations (AMD64 with SSE4.2; since MDEV-22669 also on IA-32
with SSE4.2, POWER 8 and later, ARMv8 with some extensions)
and by lookup tables when only generic SISD instructions are available,
there should be no valid reason to disable checksums.
In MariaDB 10.5.2, as a preparation for MDEV-12353, MDEV-19543 deprecated
and ignored the parameter innodb_log_checksums altogether. This should
imply that after a clean shutdown with innodb_log_checksums=OFF one
cannot upgrade to MariaDB Server 10.5 at all.
Due to these problems, let us deprecate the parameter innodb_log_checksums
and honor it only during server startup.
The command SET GLOBAL innodb_log_checksums will always set the
parameter to ON.
- Due to commit fe95cb2e40 (MDEV-16125),
InnoDB master thread does not need to call srv_resume_thread()
and therefore there is no need to wake up the thread.
Due to the above patch, InnoDB should remove the following dead code.
srv_check_activity(): Makes the parameter as in,out and returns the
recent activity value
innobase_active_small(): Removed
srv_active_wake_master_thread(): Removed
srv_wake_master_thread(): Removed
srv_active_wake_master_thread_low(): Removed
Simplify srv_master_thread() and remove switch cases, added the assert.
Replace srv_wake_master_thread() with srv_inc_activity_count()
INNOBASE_WAKE_INTERVAL: Removed
error is logged
The fix is to set flag in ib::error::~error() and check it in
mariabackup.
ib::error::error() is replaced with ib::warn::warn() in
AIO::linux_create_io_ctx() because of two reasons:
1) if we leave it as is, then mariabackup MTR tests will fail with --mem
option, because Linux AIO can not be used on tmpfs,
2) when Linux AIO can not be initialized, InnoDB falls back to simulated
AIO, so such sutiation is not fatal error, it should be treated as warning.
config.
The solution is to read the system variable value on startup and to fill
databases_exclude_hash.
xb_load_list_string() became non-static and was reformatted. The system
variable value is read and processed in get_mysql_vars(), which was also
reformatted.
was restored.
Optionally rollback prepared XA's on "mariabackup --prepare".
The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for
slaves.
Setting "streamfmt=mbstream" in the "[sst]" section causes SST to fail
because the format automatically switches to 'tar' by default (insead
of mbstream).
To fix this, we need to add mbstream to the list of valid values for
the format, making it synonymous with xbstream. This must be done both
in the SST script and when parsing the options of the corresponding
utilities.
- Moved the recv_sys->heap memory condition inside recv_parse_log_recs().
So that, InnoDB can mark the status as STORE_NO earlier.
- InnoDB uses one third of buffer pool chunk size for reading the redo
log records. In that case, we can avoid the scenario where buffer ran
out of memory issue during recovery.
copy thread)
mariabackup hangs waiting until innodb redo log thread read log till certain
LSN, and it waits under FTWRL. If there is redo log read error in the thread,
it is finished, and main thread knows nothing about it, what leads to hanging.
As it hangs under FTWRL, slave threads on server side can be blocked due
to MDL lock conflict.
The fix is to finish mariabackup with error message on innodb redo log read
failure.
dict_table_rename_in_cache(): Use strcpy() instead of strncpy(),
because they are known to be equivalent in this case (the length
of old_name was already validated).
mariabackup: Invoke strncpy() with one less than the buffer size,
and explicitly add NUL as the last byte of the buffer.
When "--export" mariabackup option is used, mariabackup starts the server in
bootstrap mode to generate *.cfg files for the certain innodb tables.
The started instance of the server reads options from the file, pointed
out in "--defaults-file" mariabackup option.
If the server uses the same config file as mariabackup, and binlog is
switched on in that config file, then "mariabackup --prepare --export"
will create binary log files in the server's binary log directory, what
can cause issues.
The fix is to add "--skip-log-bin" in mysld options when the server is
started to generate *.cfg files.
The general reason why innodb redo log file is limited by 512G is that
log_block_convert_lsn_to_no() returns value limited by 1G. But there is no
need to have unique log block numbers in log group. The fix removes 512G
limit and limits log group size by
(uint32_t maximum value) * (minimum page size), which, in turns, can be
removed if fil_io() is no longer used for innodb redo log io.
In some cases it's possible that InnoDB redo log file header is re-written so,
that checkpoint lsn and checkpoint lsn offset are updated, but checkpoint
number stays the same. The fix is to re-read redo log header if at least one
of those three parametes is changed at backup start.
Repeat the logic of log_group_checkpoint() on choosing InnoDB checkpoint info
field on backup start. This does not influence backup correctness, but
simplifies bugs analysis.
If required privilege is missing, dump the output from "SHOW GRANTS"
into mariabackup log.
This will help troubleshooting, and make the bug reproducible.
It's a micro optimization. On most platforms CPUs has instructions to
compare with 0 fast. DB_SUCCESS is the most popular outcome of functions
and this patch optimized code like (err == DB_SUCCESS)
BtrBulk::finish(): bogus assertion fixed
fil_node_t::read_page0(): corrected usage of os_file_read()
que_eval_sql(): bugus assertion removed. Apparently it checked that
the field was assigned after having been zero-initialized at
object creation.
It turns out that the return type of os_file_read_func() was changed
in mysql/mysql-server@98909cefbc (MySQL 5.7)
from ibool to dberr_t. The reviewer (if there was any) failed to
point out that because of future merges, it could be a bad idea to
change the return type of a function without changing the function name.
This change was applied to MariaDB 10.2.2 in
commit 2e814d4702 but the
MariaDB-specific code was not fully adjusted accordingly,
e.g. in fil_node_open_file(). Essentially, code like
!os_file_read(...) became dead code in MariaDB and later
in Mariabackup 10.2, and we could be dealing with an uninitialized
buffer after a failed page read.
os_mem_alloc_large(): Invoke the macro ut_2pow_round() with the
correct argument type.
innobase_large_page_size, innobase_use_large_pages,
os_use_large_pages, os_large_page_size: Remove.
Simply refer to opt_large_page_size, my_use_large_pages.
xtrabackup_backup_func(): If the log checkpoint header changed
since we last read it, search for the most recent checkpoint again.
Otherwise, we could corrupt the backup of the redo log, because the
least significant bits of checkpoint_lsn_start would not match
log_sys->log.lsn.
The recv_sys data structures are accessed not only from the thread
that executes InnoDB plugin initialization, but also from the
InnoDB I/O threads, which can invoke recv_recover_page().
Assert that sufficient concurrency control is in place.
Some code was accessing recv_sys data structures without
holding recv_sys->mutex.
recv_recover_page(bpage): Refactor the call from buf_page_io_complete()
into a separate function that performs necessary steps. The
main thread was unnecessarily releasing and reacquiring recv_sys->mutex.
recv_recover_page(block,mtr,recv_addr): Pass more parameters from
the caller. Avoid redundant lookups and computations. Eliminate some
redundant variables.
recv_get_fil_addr_struct(): Assert that recv_sys->mutex is being held.
That was not always the case!
recv_scan_log_recs(): Acquire recv_sys->mutex for the whole duration
of the function. (While we are scanning and buffering redo log records,
no pages can be read in.)
recv_read_in_area(): Properly protect access with recv_sys->mutex.
recv_apply_hashed_log_recs(): Check recv_addr->state only once,
and continuously hold recv_sys->mutex. The mutex will be released
and reacquired inside recv_recover_page() and recv_read_in_area(),
allowing concurrent processing by buf_page_io_complete() in I/O threads.