LATCH_ID_OS_AIO_READ_MUTEX,
LATCH_ID_OS_AIO_WRITE_MUTEX,
LATCH_ID_OS_AIO_LOG_MUTEX,
LATCH_ID_OS_AIO_IBUF_MUTEX,
LATCH_ID_OS_AIO_SYNC_MUTEX: Remove. The tpool is not instrumented.
lock_set_timeout_event(): Remove.
srv_sys_mutex_key, srv_sys_t::mutex, SYNC_THREADS: Remove.
srv_slot_t::suspended: Remove. We only ever assigned true to this
data member, so it is redundant.
ib_wqueue_wait(), ib_wqueue_timedwait(): Remove.
os_thread_join(): Remove.
os_thread_create(), os_thread_exit(): Remove redundant parameters.
These were missed in commit 5e62b6a5e0.
Before closing file handles, we really must wait until there are no
pending os_file_flush(). This was missed in
commit b1ab211dee (MDEV-15053)
where we changed the following:
fil_node_close_to_free(): Wait for n_pending==0. Because we no longer
do an extra lookup of the tablespace between fil_io() and the
completion of the operation, we must give fil_node_t::complete_io() a
chance to decrement the counter.
The test encryption.create_or_replace would occasionally fail with
a warning message from fil_check_pending_ops().
fil_crypt_find_space_to_rotate(): While waiting for available
I/O capacity, check fil_space_t::is_stopping() and release a
handle if necessary.
fil_space_crypt_close_tablespace(): Wake up the waiters in
fil_crypt_find_space_to_rotate().
Problem:
======
Marking a tablespace as page_compressed does not rebuild the table.
It only changes FSP_SPACE_FLAGS.
During recovery:
1) InnoDB encounters a FILE_CREATE redo log record and opens the
tablespace with the old FSP_SPACE_FLAGS value.
2) Only the parsing of the redo log has finished. Now InnoDB tries to
load the table. If the existing tablespace flags do not match
the table flags, InnoDB should read page 0. But
fsp_flags_try_adjust() skips the page read for the full_crc32 format.
3) After that, InnoDB tries to open the clustered index, and
page validation fails.
Fix:
===
While parsing the redo log record, track FSP_SPACE_FLAGS in
recv_spaces for the respective space id. Assign the flags for
the tablespace when it is loaded.
recv_parse_set_size_and_flags(): Parse the redo log to set the
tablespace recovery size and flags.
fil_space_set_recv_size_and_flags(): Replaces
fil_space_set_recv_size(). Set the recovery size and flags of
the tablespace.
Introduce a flags variable in file_name_t to keep the tablespace
flags that we encountered while parsing the redo log.
is_flags_full_crc32_equal(), is_flags_non_full_crc32_equal(): Rename
the variables page_ssize and space_page_ssize to fcrc32_psize and
non_fcrc32_psize.
fts_drop_orphaned_tables() takes a long time to remove the orphaned
FTS tables. In order to reduce the time, do the following:
- Traverse fil_system.space_list and construct a set of
(table_id,index_id) for all FTS_*.ibd tablespaces.
- Traverse the SYS_INDEXES table and remove from the above
collection every entry that exists there.
- The elements remaining in the collection can be considered
orphaned FTS tables. Construct the table name from
(table_id,index_id) and invoke fts_drop_tables().
- Remove the DICT_TF2_FTS_AUX_HEX_NAME flag usage from upgrade.
- is_aux_table() in dict_table_t checks whether the given name
is that of an FTS auxiliary table.
fts_space_set_t is a structure that stores a set of parent table id
and index id pairs.
- Remove unused FTS functions in fts0fts.cc.
- Remove the fulltext index from the row_format_redundant test case,
because the test deals with the condition that SYS_TABLES has a
corrupted entry while a valid entry exists in SYS_INDEXES.
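The orphan detection steps above can be sketched as a set difference. This is only an illustration under simplifying assumptions: the type fts_aux_id and the function find_orphans() are hypothetical names, the inputs are plain vectors rather than fil_system.space_list and SYS_INDEXES, and every auxiliary tablespace is assumed to carry both a table id and an index id.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical key: (parent table_id, index_id) parsed from an FTS_*.ibd name.
using fts_aux_id = std::pair<uint64_t, uint64_t>;

// Return the ids that have a tablespace file but no SYS_INDEXES record.
std::set<fts_aux_id> find_orphans(
    const std::vector<fts_aux_id>& fts_tablespaces,
    const std::vector<fts_aux_id>& sys_indexes_rows)
{
  // Step 1: collect the ids of all FTS auxiliary tablespaces.
  std::set<fts_aux_id> candidates(fts_tablespaces.begin(),
                                  fts_tablespaces.end());
  // Step 2: drop every id that is still referenced by the dictionary.
  for (const fts_aux_id& row : sys_indexes_rows)
    candidates.erase(row);
  // Step 3: whatever remains is considered orphaned.
  return candidates;
}
```

A single pass over each input replaces a per-table dictionary lookup, which is where the time saving would come from in this model.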
commit de942c9f61 (MDEV-15983)
introduced a race condition that we inadequately fixed in
commit 93b69825ad (MDEV-16169).
Because fil_space_t::release() and fil_space_t::acquire() are
not protected by fil_system.mutex like their predecessors,
it is possible that stop_new_ops was set between the time
a thread checked fil_space_t::is_stopping() and invoked
fil_space_t::acquire().
In an execution trace, this happened in fil_system_t::keyrotate_next(),
causing an assertion failure in fil_delete_tablespace()
in the other thread that sought to stop new operations.
We fix this bug by merging the flag fil_space_t::stop_new_ops
and the reference count fil_space_t::n_pending_ops into a
single word that is only being accessed by atomic memory operations.
fil_space_t::set_stopping(): Accessor for changing the state of
the former stop_new_ops flag.
fil_space_t::acquire(): Return whether the acquisition succeeded.
It would fail between set_stopping(true) and set_stopping(false).
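A minimal sketch of the idea, assuming that the most significant bit of the word holds the former stop_new_ops flag and the remaining bits hold the reference count. The class name space_ref, the bit layout, the memory orderings and the pending() accessor are illustrative assumptions, not the actual fil_space_t definition.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the relevant part of fil_space_t.
class space_ref
{
  // Assumed layout: top bit = "stopping" flag, low 31 bits = pending ops.
  static constexpr uint32_t STOPPING = 1U << 31;
  std::atomic<uint32_t> word{0};
public:
  // Register a pending operation. Fails while the stopping flag is set.
  bool acquire()
  {
    uint32_t old = word.load(std::memory_order_relaxed);
    do
    {
      if (old & STOPPING)
        return false;
      // On failure, compare_exchange_weak reloads old and we re-check.
    }
    while (!word.compare_exchange_weak(old, old + 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed));
    return true;
  }

  // Unregister a pending operation.
  void release()
  {
    uint32_t old = word.fetch_sub(1, std::memory_order_release);
    assert(old & ~STOPPING); // the count must have been nonzero
    (void) old;
  }

  void set_stopping(bool stopping)
  {
    if (stopping)
      word.fetch_or(STOPPING, std::memory_order_relaxed);
    else
      word.fetch_and(~STOPPING, std::memory_order_relaxed);
  }

  bool is_stopping() const
  { return (word.load(std::memory_order_relaxed) & STOPPING) != 0; }

  uint32_t pending() const
  { return word.load(std::memory_order_relaxed) & ~STOPPING; }
};
```

Because the flag check and the increment happen in one compare-and-swap, there is no window in which another thread can set the flag between the check and the acquisition.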
A leak of the contents of fil_system.ssd that was introduced in
commit 10dd290b4b (MDEV-17380)
was caught by implementing SAFEMALLOC instrumentation of operator new.
I did not try to find out how to make AddressSanitizer or Valgrind
detect it.
fil_system_t::close(): Clear fil_system.ssd.
The leak was identified and a fix suggested by Michael Widenius
and Vicențiu Ciorbaru.
Ever since MDEV-15053 changed something in the page flushing,
we are occasionally observing pending I/O for a data file that
is about to be deleted.
fil_check_pending_io(): Change the Warning to a note.
This message was already made less frequent in
commit dcc0baf540 (10.5.4)
and commit 65f831d17c (10.3.24, 10.4.14).
fil_page_decompress(): Remove a rather useless debug check.
We should have test coverage for reading page_compressed pages
from files, either due to buffer pool page eviction or due to
server restarts.
A similar check was removed from fil_space_encrypt() in
commit 0b36c27e0c (MDEV-20307).
fil_system_t::keyrotate_next(): If space && space->is_in_rotation_list
does not hold, iterate from the start of the list.
In debug builds, we would typically have hit SIGSEGV because the
iterator would have wrapped a null pointer. It might also be that
we are dereferencing a stale pointer.
There is no test case, because the encryption is very nondeterministic
in nature, due to the use of background threads.
This scenario can be hit by setting the following:
SET GLOBAL innodb_encryption_threads=5;
SET GLOBAL innodb_encryption_rotate_key_age=0;
The test encryption.create_or_replace would occasionally fail,
because some fil_space_t::n_pending_ops would never be decremented.
fil_crypt_find_space_to_rotate(): If rotate_thread_t::should_shutdown()
holds due to innodb_encryption_threads having been reduced, do
release the reference.
fil_space_remove_from_keyrotation(), fil_space_next(): Declare the
functions static, simplify a little, and define in the same compilation
unit with the only caller, fil_crypt_find_space_to_rotate().
fil_crypt_key_mutex: Remove (unused).
- This issue is caused by commit a5584b13d1
(MDEV-15528), which added os_file_punch_hole() to fil_io()
but failed to handle a failure of os_file_punch_hole(). InnoDB should
handle the DB_IO_NO_PUNCH_HOLE error and silently transform it to
DB_SUCCESS. InnoDB should also set the punch hole flag correctly when
the tablespace is loaded.
fil_node_t::read_page0(): Set the punch hole flag when the tablespace
is loaded.
fil_io(): Handle the DB_IO_NO_PUNCH_HOLE error.
buf_flush_free_pages(): Check the punch hole condition earlier, using
the tablespace punch hole flag.
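The error handling described above might look roughly like the following sketch. The enum is a cut-down stand-in for the real dberr_t, and space_flags_model and handle_write_result() are hypothetical names. Clearing the flag on failure is an assumption made for the illustration; the text only states that the error is turned into DB_SUCCESS.

```cpp
#include <cassert>

// Simplified stand-in; the real dberr_t has many more values.
enum dberr_t { DB_SUCCESS, DB_IO_ERROR, DB_IO_NO_PUNCH_HOLE };

// Hypothetical model of the per-tablespace punch hole flag.
struct space_flags_model
{
  bool punch_hole; // assumed to be set when the tablespace is loaded
};

// Hypothetical post-processing of the result of a page write.
dberr_t handle_write_result(space_flags_model& space, dberr_t err)
{
  if (err == DB_IO_NO_PUNCH_HOLE)
  {
    // Assumption: remember that hole punching is unavailable, so that
    // later writes do not attempt it again.
    space.punch_hole = false;
    // The page itself was written; only the hole punching failed.
    return DB_SUCCESS;
  }
  return err; // all other results are passed through unchanged
}
```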
InnoDB should replace the FSP_FLAGS_HAS_PAGE_COMPRESSION check with
fil_space_t::is_compressed(). fil_space_t::is_compressed() covers
both the full_crc32 format and the non-full_crc32 format.
When InnoDB is extending a data file, it is updating the FSP_SIZE
field in the first page of the data file.
In commit 8451e09073 (MDEV-11556)
we removed a work-around for this bug and made recovery stricter,
by making it track changes to FSP_SIZE via redo log records, and
extend the data files before any changes are being applied to them.
It turns out that the function fsp_fill_free_list() is not crash-safe
with respect to this when it is initializing the change buffer bitmap
page (page 1, or generally, N*innodb_page_size+1). It uses a separate
mini-transaction that is committed (and will be written to the redo
log file) before the mini-transaction that actually extended the data
file. Hence, recovery can observe a reference to a page that is
beyond the current end of the data file.
fsp_fill_free_list(): Initialize the change buffer bitmap page in
the same mini-transaction.
The rest of the changes are fixing a bug that the use of the separate
mini-transaction was attempting to work around. Namely, we must ensure
that no other thread will access the change buffer bitmap page before
our mini-transaction has been committed and all page latches have been
released.
That is, for read-ahead as well as neighbour flushing, we must avoid
accessing pages that might not yet be durably part of the tablespace.
fil_space_t::committed_size: The size of the tablespace
as persisted by mtr_commit().
fil_space_t::max_page_number_for_io(): Limit the highest page
number for I/O batches to committed_size.
MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch.
mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch.
mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK,
copy space->size to space->committed_size. In this way, read-ahead
or flushing will never be invoked on pages that do not yet exist
according to FSP_SIZE.
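The relationship between size and committed_size can be modelled as follows. This is a simplified sketch: space_model, extend() and release_x_latch() are hypothetical names standing in for the mini-transaction machinery, and the exact semantics of max_page_number_for_io() (here: clamp a requested upper bound to committed_size) are an assumption.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the two size fields of a tablespace.
struct space_model
{
  uint32_t size = 0;                       // changed while holding the X-latch
  std::atomic<uint32_t> committed_size{0}; // published when the latch is released

  // Called inside a mini-transaction that holds the tablespace X-latch.
  void extend(uint32_t new_size) { size = new_size; }

  // Models the release of MTR_MEMO_SPACE_X_LOCK at mini-transaction commit.
  void release_x_latch()
  { committed_size.store(size, std::memory_order_release); }

  // Clamp an I/O batch bound so that it never exceeds the committed size.
  uint32_t max_page_number_for_io(uint32_t n) const
  { return std::min(n, committed_size.load(std::memory_order_acquire)); }
};
```

In this model, read-ahead or neighbour flushing that consults max_page_number_for_io() cannot reach pages that were added by a mini-transaction that has not yet been committed.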
In fsp_path_to_space_name(), we would access a byte right before
the start of the string, tripping AddressSanitizer.
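As an illustration of how such an out-of-bounds read can be avoided, here is a sketch that derives a "database/table" name from a file path using only index-based searches, so that no byte before the start of the string is ever examined. The function path_to_space_name() is hypothetical and uses std::string; it is not the actual InnoDB function, and the exact name format is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: "./test/t1.ibd" -> "test/t1".
std::string path_to_space_name(const std::string& path)
{
  const std::string::size_type npos = std::string::npos;
  const std::string::size_type last_slash = path.rfind('/');
  const std::string::size_type dot = path.rfind('.');

  // Strip the extension only if the dot belongs to the file name.
  std::string::size_type end = path.size();
  if (dot != npos && (last_slash == npos || dot > last_slash))
    end = dot;

  // Find the start of the directory component that precedes the file name.
  std::string::size_type start = 0;
  if (last_slash == 0)
    start = 1; // nothing precedes the slash; do not search before index 0
  else if (last_slash != npos)
  {
    const std::string::size_type prev = path.rfind('/', last_slash - 1);
    if (prev != npos)
      start = prev + 1;
  }
  return path.substr(start, end - start);
}
```

The explicit last_slash == 0 case is the point of the sketch: it is the situation in which a pointer-based backward scan would step before the first character.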
This reverts commit d87006a1c1
and commit a7634281aa.
This version is not optimized yet. It could have bugs because I didn't
check it with unit tests. Also, std::char_traits is not really supported,
so for now it is not possible to create, for example, a case-insensitive
string_view.
fil_path_to_space_name(): Renamed, moved to another file,
and refactored to use string_view.
- Some of the bug fixes are backports from 10.5!
- The fix in innobase/fil/fil0fil.cc is just a backport to get fewer
error messages in mysqld.1.err when running with valgrind.
- Renamed HAVE_valgrind_or_MSAN to HAVE_valgrind
fil_validate() is called very frequently in debug builds. I can even
see it in a profiler.
This patch reduces execution time of fil_validate() from
8948ns
8367ns
8650ns
8906ns
8448ns
to
260ns
232ns
403ns
275ns
169ns
in my environment.
The trick is a faster fil_space_t iteration. The hash table
is typically initialized with a size of 50,000, and looping through
it is slow. It is slower than iterating over the exact number of
fil_space_t objects, which is typically less than ten.
Only debug builds are affected.
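The difference between the two iteration strategies can be sketched as follows. The struct space and both functions are hypothetical simplifications; the real code iterates fil_system structures and validates more than a file count.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical, minimal model of a tablespace object.
struct space
{
  unsigned id;
  unsigned n_files;
  space* hash_next; // next element in the same hash bucket
};

// Slow variant: visits every bucket, including the many empty ones.
std::size_t count_files_via_hash(const std::vector<space*>& buckets)
{
  std::size_t n = 0;
  for (space* head : buckets)
    for (space* s = head; s; s = s->hash_next)
      n += s->n_files;
  return n;
}

// Fast variant: visits only the objects that actually exist.
std::size_t count_files_via_list(const std::list<space>& spaces)
{
  std::size_t n = 0;
  for (const space& s : spaces)
    n += s.n_files;
  return n;
}
```

With 50,000 buckets and fewer than ten objects, the first function performs tens of thousands of loop iterations that find nothing, while the second performs fewer than ten.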
This issue is pretty much the same as MDEV-20213.
The fix is similar to:
3c238ac51c52c4abbff2
Check::validate(): fix a debug assertion
SysTablespace::open_or_create(): protect assigning to a shared
variable with a mutex
Check::validate(): Relax a debug assertion. The TRX_SYS_SPACE
fil_space_t can be created and become visible to this assertion before
fil_system.sys_space is initialized.