MemorySanitizer (clang -fsanitize=memory) requires that all code
be compiled with instrumentation enabled. The only exception is the
C runtime library. Failure to use instrumented libraries will cause
bogus messages about memory being uninitialized.
In WITH_MSAN builds, we must avoid calling getservbyname(),
because even though it is a standard library function, it is
not instrumented, not even in clang 10.
Note: Before MariaDB Server 10.5, ./mtr will typically fail
due to the old PCRE library, which was updated in MDEV-14024.
The following cmake options were tested on 10.5
in commit 94d0bb4dbe:
cmake \
-DCMAKE_C_FLAGS='-march=native -O2' \
-DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \
-DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \
-DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF \
-DWITH_{ZLIB,SSL,PCRE}=bundled \
-DHAVE_LIBAIO_H=0 \
-DWITH_MSAN=ON
MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED()
and __msan_unpoison().
MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for
VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow().
InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros.
ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in
functions instead of inline assembler when building WITH_MSAN.
This will require at least -msse4.2 when building for IA-32 or AMD64.
The inline assembler would not be instrumented, and would thus cause
bogus failures.
The background drop table queue in InnoDB is a work-around for
cases where the SQL layer is requesting DDL on tables on which
transactional locks exist.
One such case are XA transactions. Our test case exploits the
fact that the recovery of XA PREPARE transactions will
only resurrect InnoDB table locks, but not MDL that should
block any concurrent DDL.
srv_shutdown_t: Introduce the srv_shutdown_state=SRV_SHUTDOWN_INITIATED
for the initial part of shutdown, to wait for the background drop
table queue to be emptied.
srv_shutdown_bg_undo_sources(): Assign
srv_shutdown_state=SRV_SHUTDOWN_INITIATED
before waiting for the background drop table queue to be emptied.
row_drop_tables_for_mysql_in_background(): On slow shutdown, if
no active transactions exist (excluding ones that are in
XA PREPARE state), skip any tables on which locks exist.
row_drop_table_for_mysql(): Do not unnecessarily attempt to
drop InnoDB persistent statistics for tables that have
already been added to the background drop table queue.
row_mysql_close(): Relax an assertion, and free all memory
even if innodb_force_recovery=2 would prevent the background
drop table queue from being emptied.
Introduce a new ATTRIBUTE_NOINLINE to
ib::logger member functions, and add UNIV_UNLIKELY hints to callers.
Also, remove some crash reporting output. If needed, the
information will be available using debugging tools.
Furthermore, remove some fts_enable_diag_print output that included
indexed words in raw form. The code seemed to assume that words are
NUL-terminated byte strings. It is not clear whether a NUL terminator
is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
typically contain many NUL bytes.
was restored.
Optionally rollback prepared XA's on "mariabackup --prepare".
The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for
slaves.
redo log during recovery
- InnoDB unnecessarily reads the page even though it has fully initialized
buffered redo log records. Allow the page initialization redo log to
apply for the page in buf_page_get_gen() during recovery.
- Renamed buf_page_get_gen() to buf_page_get_low()
- Newly added buf_page_get_gen() will check for buffered redo log for
the particular page id during recovery
- Added new function buf_page_mtr_lock() which basically latches the page
for the given latch type.
- recv_recovery_create_page() is inline function which creates a page
if it has page initialization redo log records.
- This issue is caused by MDEV-19176
(bba59abb03).
- Problem is that there is miscalculation of available memory during
recovery if innodb_buffer_pool_instances > 1.
- Ignore the buffer pool instance while calculating available_memory
- Removed recv_n_pool_free_frames variable and use buf_pool_get_n_pages()
instead.
- Moved the recv_sys->heap memory condition inside recv_parse_log_recs().
So that, InnoDB can mark the status as STORE_NO earlier.
- InnoDB uses one third of buffer pool chunk size for reading the redo
log records. In that case, we can avoid the scenario where buffer ran
out of memory issue during recovery.
The test encryption.innodb-redo-badkey was accidentally disabled
until commit 23657a2101 enabled
it recently. Once it was enabled, it started failing randomly.
recv_recover_corrupt_page(): Do not assume that any redo log exists
for the page. A page may be unnecessarily read by read-ahead.
When noting the corruption, reset recv_addr->state to RECV_PROCESSED,
so that even if the same page is re-read again, we will only
decrement recv_sys->n_addrs once.
The general reason why innodb redo log file is limited by 512G is that
log_block_convert_lsn_to_no() returns value limited by 1G. But there is no
need to have unique log block numbers in log group. The fix removes 512G
limit and limits log group size by
(uint32_t maximum value) * (minimum page size), which, in turns, can be
removed if fil_io() is no longer used for innodb redo log io.
- Introduce a new variable called innodb_encrypt_temporary_tables which is
a boolean variable. It decides whether to encrypt the temporary tablespace.
- Encrypts the temporary tablespace based on full checksum format.
- Introduced a new counter to track encrypted and decrypted temporary
tablespace pages.
- Warnings issued if temporary table creation has conflict value with
innodb_encrypt_temporary_tables
- Added a new test case which reads and writes the pages from/to temporary
tablespace.
- If InnoDB encounters garbage or incomplete written log block during
recovery then don't throw the error. Treat it as end of the log.
- This kind of incomplete or empty block can be result of killing
InnoDB when writing the redo log.
- Don't apply redo log for the corrupted page when innodb_force_recovery > 0.
- Allow the table to be dropped when index root page is
corrupted when innodb_force_recovery > 0.
log_buffer_extend(): Do not write to disk. Just allocate new bigger
buffer and copy contents of old one to it. Do not acquire write_mutex.
log_t::is_extending: Removed as unneeded now.
LOG_BUFFER_SIZE: Removed to make the dependence on srv_log_buffer_size
visible.
Some places didn't match the previous rules, making the Floor
address wrong.
Additional sed rules:
sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
log_checkpoint(), log_make_checkpoint_at(): Remove the parameter
write_always. It seems that the primary purpose of this parameter
was to ensure in the function recv_reset_logs() that both checkpoint
header pages will be overwritten, when the function is called from
the never-enabled function recv_recovery_from_archive_start().
create_log_files(): Merge recv_reset_logs() to its only caller.
Debug instrumentation: Prefer to flush the redo log, instead of
triggering a redo log checkpoint.
page_header_set_field(): Disable a debug assertion that will
always fail due to MDEV-19344, now that we no longer initiate
a redo log checkpoint before an injected crash.
In recv_reset_logs() there used to be two calls to
log_make_checkpoint_at(). The apparent purpose of this was
to ensure that both InnoDB redo log checkpoint header pages
will be initialized or overwritten.
The second call was removed (without any explanation) in MySQL 5.6.3:
mysql/mysql-server@4ca37968da
In MySQL 5.6.8 WL#6494, starting with
mysql/mysql-server@00a0ba8ad9
the function recv_reset_logs() was not only invoked during
InnoDB data file initialization, but also during a regular
startup when the redo log is being resized.
mysql/mysql-server@45e9167983
in MySQL 5.7.2 removed the UNIV_LOG_ARCHIVE code, but still
did not remove the parameter write_always.
InnoDB crash recovery used to read every data page for which
redo log exists. This is unnecessary for those pages that are
initialized by the redo log. If a newly created page is corrupted,
recovery could unnecessarily fail. It would suffice to reinitialize
the page based on the redo log records.
To add insult to injury, InnoDB crash recovery could hang if it
encountered a corrupted page. We will fix also that problem.
InnoDB would normally refuse to start up if it encounters a
corrupted page on recovery, but that can be overridden by
setting innodb_force_recovery=1.
Data pages are completely initialized by the records
MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
which notifies that a page has been freed and its contents
can be discarded (filled with zeroes).
The record MLOG_INDEX_LOAD notifies that redo logging has
been re-enabled after being disabled. We can avoid loading
the page if all buffered redo log records predate the
MLOG_INDEX_LOAD record.
For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
records were written before commit aa3f7a107c.
Hence, we will skip these optimizations for tables whose
name starts with FTS_.
This is joint work with Thirunarayanan Balathandayuthapani.
fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
latest recovered MLOG_INDEX_LOAD record for a tablespace.
mlog_init: Page initialization operations discovered during
redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
and should be removed in MDEV-19176.
recv_addr_state: Add the new state RECV_WILL_NOT_READ to
indicate that according to mlog_init, the page will be
initialized based on redo log record contents.
recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
as page initialization. This works around bugs in the crash
recovery of ROW_FORMAT=COMPRESSED tables.
recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record
by resetting the state to RECV_NOT_PROCESSED and by updating
the fil_name_t::enable_lsn.
recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn
to fil_space_t::enable_lsn.
recv_recover_page(): Add the parameter init_lsn, to ignore
any log records that precede the page initialization.
Add DBUG output about skipped operations.
buf_page_create(): Initialize FIL_PAGE_LSN, so that
recv_recover_page() will not wrongly skip applying
the page-initialization record due to the field containing
some newer LSN as a leftover from a different page.
Do not invoke ibuf_merge_or_delete_for_page() during
crash recovery.
recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
Note if a corrupted page was found during recovery.
After invoking buf_page_create(), do invoke
ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
in the last recovery batch.
ibuf_merge_or_delete_for_page(): Relax a debug assertion.
innobase_start_or_create_for_mysql(): Abort startup if
a corrupted page was found during recovery. Corrupted pages
will not be flagged if innodb_force_recovery is set.
However, the recv_sys->found_corrupt_fs flag can be set
regardless of innodb_force_recovery if file names are found
to be incorrect (for example, multiple files with the same
tablespace ID).
The regression that was reported in MDEV-19212 occurred due to use
of macros that did not ensure that the arguments have compatible
types.
ut_2pow_remainder(), ut_2pow_round(), ut_calc_align(): Define as
inline function templates.
UT_CALC_ALIGN(): Define as a macro, because this is used in
compile_time_assert(). Only starting with C++11 (MariaDB 10.4)
we could define the inline functions as constexpr.
The recv_sys data structures are accessed not only from the thread
that executes InnoDB plugin initialization, but also from the
InnoDB I/O threads, which can invoke recv_recover_page().
Assert that sufficient concurrency control is in place.
Some code was accessing recv_sys data structures without
holding recv_sys->mutex.
recv_recover_page(bpage): Refactor the call from buf_page_io_complete()
into a separate function that performs necessary steps. The
main thread was unnecessarily releasing and reacquiring recv_sys->mutex.
recv_recover_page(block,mtr,recv_addr): Pass more parameters from
the caller. Avoid redundant lookups and computations. Eliminate some
redundant variables.
recv_get_fil_addr_struct(): Assert that recv_sys->mutex is being held.
That was not always the case!
recv_scan_log_recs(): Acquire recv_sys->mutex for the whole duration
of the function. (While we are scanning and buffering redo log records,
no pages can be read in.)
recv_read_in_area(): Properly protect access with recv_sys->mutex.
recv_apply_hashed_log_recs(): Check recv_addr->state only once,
and continuously hold recv_sys->mutex. The mutex will be released
and reacquired inside recv_recover_page() and recv_read_in_area(),
allowing concurrent processing by buf_page_io_complete() in I/O threads.
The record MLOG_INDEX_LOAD is supposed to be written to indicate that
some page modifications bypassed redo logging, and that redo logging
is now re-enabled. It was not written for fulltext indexes during
ALTER TABLE.
row_merge_write_redo(): Declare globally. Assert that the index
is neither a spatial nor fulltext index.
recv_mlog_index_load(): Observe a MLOG_INDEX_LOAD operation.
recv_parse_log_recs(): Handle MLOG_INDEX_LOAD also in multi-record
mini-transactions. Because of this omission, we should keep writing
MLOG_INDEX_LOAD in single-record mini-transactions, because older
versions of Mariabackup would fail.
row_fts_merge_insert(): Write MLOG_INDEX_LOAD for the auxiliary
tables of fulltext indexes.
The record MLOG_ZIP_PAGE_COMPRESS is similar to MLOG_INIT_FILE_PAGE2
that it contains all the information needed to initialize the page.
Like for the other record, do initialize the entire page on recovery.
The page_size argument to buf_page_get_gen() only matters when the
page is going to be loaded into the buffer pool. Allow callers to
pass a dummy parameter when using BUF_GET_IF_IN_POOL (which would
return NULL if the block is not in the buffer pool).
Normally, InnoDB is not in the process of executing crash recovery.
Provide a hint to the compiler that the recovery-related code paths
are rarely executed.
On Microsoft Windows, InnoDB writes the path separator \ to the
redo log file, while on all other platforms, / is being used.
fil_name_parse(): Normalize the parsed path separators to the
native format. This allows backups or data sets to be portable
between Windows and other systems.
recv_parse_log_recs(): Do not compare type if ptr==end_ptr
(we have reached the end of the redo log parsing buffer),
because it will not have been correctly initialized in that case.
log_group_read_log_seg(): Always return false when returning
before reading end_lsn.
xtrabackup_copy_logfile(): On error, indicate whether
a corrupt log record was encountered.
Only xtrabackup_copy_logfile() in Mariabackup cared about the
return value of the function. InnoDB crash recovery was not
affected by this bug.
Problem:
=======
Mariabackup incremental prepare creates new tablespace when it encounter
new tablespace. It sets the intial size as FIL_IBD_FILE_INITIAL_SIZE (4).
But while applying redo log, it tries to access 5th page and then
it leads to out of tablespace error.
Fix:
===
While parsing the redo log record, track FSP_SIZE in recv_spaces for the
respective space id. Assign the recv_size for the tablespace when it
is loaded. Extend the tablespace depends on recv_size while applying
the redo log record.
recv_addr_trim(): Do not try to detach the hash bucket, because
the code for doing that does not always work.
recv_apply_hashed_log_recs(): Do not attempt to read pages for which
there exist no redo log records.
Rename the 10.2-specific configuration option innodb_unsafe_truncate
to innodb_safe_truncate, and invert its value.
The default (for now) is innodb_safe_truncate=OFF, to avoid
disrupting users with an undo and redo log format change within
a Generally Available (GA) release series.
While MariaDB Server 10.2 is not really guaranteed to be compatible
with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format
change that could be present in XtraBackup, but was reverted from
MariaDB in MDEV-12289), we do not want to disrupt users who have
deployed xtrabackup and MariaDB Server 10.2 in their environments.
With this change, MariaDB 10.2 will continue to use the backup-unsafe
TRUNCATE TABLE code, so that neither the undo log nor the redo log
formats will change in an incompatible way.
Undo tablespace truncation will keep using the redo log only. Recovery
or backup with old code will fail to shrink the undo tablespace files,
but the contents will be recovered just fine.
In the MariaDB Server 10.2 series only, we introduce the configuration
parameter innodb_unsafe_truncate and make it ON by default. To allow
MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE
operations, use loose_innodb_unsafe_truncate=OFF.
MariaDB Server 10.3.10 and later releases will always use the
backup-safe TRUNCATE TABLE, and this parameter will not be
added there.
recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables()
unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan
tables if RENAME operations are not transactional within InnoDB.
LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT.
log_init(), log_group_file_header_flush(),
srv_prepare_to_delete_redo_log_files(),
innobase_start_or_create_for_mysql(): Choose the redo log format
and subformat based on the value of innodb_unsafe_truncate.
This is a regression caused by
commit 73af8af094
(MDEV-15325 Incomplete validation of missing tablespace during recovery).
If the recv_sys->addr_hash hash table ran out of memory, we would
have to do crash recovery in multiple passes. If some tablespaces were
missing, after the MDEV-15325 fix we would rescan the remaining redo log.
But, we could incorrectly reset the "rescan" flag. Because of this, we
would fail to apply some of the oldest redo log records to the data files.
(The recv_sys->addr_hash would only contain records from the latest
redo log scan batch.)
Fix:
After checking for missing tablespaces, reset the flag rescan=true,
so that all redo log records will be re-read and applied.