The test encryption.innodb-redo-nokeys did not actually test
recovery without valid keys, because due to the setting
innodb_encrypt_tables, InnoDB refused to start up at all,
without even attempting any crash recovery.
fil_ibd_load(): If the encryption key is not available,
refuse to load the file.
- Purge thread is trying to access the table name when other thread
does rename of the table name. It leads to heap use after free error
by purge thread. purge thread should check whether the table name
is temporary while holding dict_sys.mutex.
The check on the SQL command type, in ha_spider::external_lock() is deleted
by e954d9de. This resulted in the wrong call of spider_internal_start_trx()
(and thus Ha_trx_info::register_ha()).
I reverted the check and refactored ha_spider::external_lock().
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.
During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:
sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.
sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.
The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.
Reviewed by: Jan Lindström
The issue is caused by 59a0236da4 commit.
The initial intention of the commit was to speed up
"mariabackup --prepare".
The call stack of binlog position reading is the following:
▾ trx_rseg_mem_restore
▾ trx_rseg_array_init
▾ trx_lists_init_at_db_start
▸ srv_start
Both trx_lists_init_at_db_start() and trx_rseg_mem_restore() contain
special cases for srv_operation == SRV_OPERATION_RESTORE condition, and
on this condition only rseg headers are read to parse binlog position.
Performance impact is not so big.
The solution is to revert 59a0236da4.
ib_id_t is a uint64. On AIX this isn't a long long unsigned and to
prevent the compile warnings and potential wrong type, the UINT64PFx
defination is corrected.
As INT64PF is unused (last use, xtradb in 10.2), it is removed to
remove the confusion that INT64PF and UINT64PFx would be different
types otherwise.
This was noticed as part of verifying
MDEV-28186 "crash on startup after crash while regular use"
but is probably not related to the users issue.
Still good to have it fixed
mariabackup does not load dictionary during backup, but it loads
tablespaces, that is why fil_system.max_assigned_id is not set with
dict_check_tablespaces_and_store_max_id(). There is no sense to issue the
warning during backup.
The comparison on the checkpoint age (number of log bytes
written since the previous checkpoint) is inaccurate, because
the previous FILE_CHECKPOINT record could span two 512-byte
log blocks, which will cause the LSN to increase by the size of the
log block header and footer.
We will still generate a redudant checkpoint if the previous
checkpoint wrote some FILE_MODIFY records before the FILE_CHECKPOINT
record.
Only one checkpoint may be in progress at a time.
The counter log_sys.n_pending_checkpoint_writes
was being protected by log_sys.mutex.
Let us replace it with the Boolean log_sys.checkpoint_pending.
srv_start(): Set srv_startup_is_before_trx_rollback_phase before
starting the buf_flush_page_cleaner() thread, so that it will not
invoke log_checkpoint() before the log file has been created.
This race condition was reproduced with https://rr-project.org.
This fixes up commit 15efb7ed48
buf_pool_t::watch_unset(): Reorder some code so that
no warning will be emitted in CMAKE_BUILD_TYPE=RelWithDebInfo.
It is unclear why invoking watch_is_sentinel() before
buf_fix_count() would make the warning disappear.
In commit 437da7bc54 (MDEV-19534),
the default value of the global variable srv_checksum_algorithm
in innochecksum was changed from SRV_CHECKSUM_ALGORITHM_INNODB
to implied 0 (innodb_checksum_algorithm=crc32). As a result,
the function buf_page_is_corrupted() would by default invoke
buf_calc_page_crc32() in innochecksum, and crc32_inited would hold.
This would cause "innochecksum" to fail on a particular page.
The actual problem is older, introduced in 2011 in
mysql/mysql-server@17e497bdb7
(MySQL 5.6.3). It should affect the validation of pages of old
data files that were written with innodb_checksum_algorithm=innodb.
When using innodb_checksum_algorithm=crc32 (the default setting
since MariaDB Server 10.2), some valid pages would be rejected
only because exactly one of the two checksum fields accidentally
matches the innodb_checksum_algorithm=crc32 value.
buf_page_is_corrupted(): Simplify the logic of non-strict
checksum validation, by always invoking buf_calc_page_crc32().
Remove a bogus condition that if only one of the checksum fields
contains the value returned by buf_calc_page_crc32(), the page
is corrupted.
New function to release latches till savepoint was added in mtr_t. As
there is no longer need to limit MDEV-20605 fix usage for locking reads
only, the limitation is removed.
In commit ab38b7511b
an added "goto err" would seemingly cause a read of
an uninitialized variable old_info if errpos>=5.
However, because we would have errpos=0 at that point,
there was no real error.
buf_flush_freed_pages(): Assert that neither buf_pool.mutex
nor buf_pool.flush_list_mutex are held. Simplify the loops.
Return the tablespace and the number of pages written or punched.
buf_flush_LRU_list_batch(), buf_do_flush_list_batch():
Release buf_pool.mutex before invoking buf_flush_space().
buf_flush_list_space(): Acquire the mutexes only after invoking
buf_flush_freed_pages().
Reviewed by: Thirunarayanan Balathandayuthapani
fil_space_t::try_to_close(): Tolerate a tablespace that has no
data files attached. The function fil_ibd_create() initially
creates and attaches a tablespace with no files, and invokes
fil_space_t::add() later.
fil_node_open_file(): After releasing and reacquiring fil_system.mutex,
check if the file was already opened by another thread. This avoids
an assertion failure !node->is_open() in fil_node_open_file_low().
These failures were reproduced with the test
innodb.table_definition_cache_debug and the fix of MDEV-27985.
- InnoDB fails to skip newly created column while checking for
change column when table is in redundant row format. This issue
is caused the MDEV-18035 (ccb1acbd3c)
For some reason, the tests of the MemorySanitizer build on 10.5 failed
with both clang 13 and clang 14 with SIGSEGV. On 10.6 where it worked
better, some more places to work around were identified.
The MemorySanitizer implementation in clang includes some built-in
instrumentation (interceptors) for GNU libc. In GNU libc 2.33, the
interface to the stat() family of functions was changed. Until the
MemorySanitizer interceptors are adjusted, any MSAN code builds
will act as if that the stat() family of functions failed to initialize
the struct stat.
A fix was applied in
https://reviews.llvm.org/rG4e1a6c07052b466a2a1cd0c3ff150e4e89a6d87a
but it fails to cover the 64-bit variants of the calls.
For now, let us work around the MemorySanitizer bug by defining
and using the macro MSAN_STAT_WORKAROUND().
As btrfs showed, a partial read of data in AIO /O_DIRECT circumstances can
really confuse MariaDB.
Filipe Manana (SuSE)[1] showed how database programmers can assume
O_DIRECT is all or nothing.
While a fix was done in the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we do only get a partial read/write.
Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.
[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a
Also spell synchronously correctly in other files.
The first step for deprecating innodb_autoinc_lock_mode(see MDEV-27844) is:
- to switch statement binlog format to ROW if binlog format is MIXED and
the statement changes autoincremented fields
- issue warnings if innodb_autoinc_lock_mode == 2 and binlog format is
STATEMENT