ha_innobase::defragment_table(): Skip corrupted indexes and
FULLTEXT INDEX. In InnoDB, FULLTEXT INDEX is implemented with
auxiliary tables. We will not defragment them on OPTIMIZE TABLE.
Also, some MDEV-11738/MDEV-11581 post-push fixes.
In MariaDB 10.1, there is no fil_space_t::is_being_truncated field,
and the predicates fil_space_t::stop_new_ops and fil_space_t::is_stopping()
are interchangeable. I requested the fil_space_t::is_stopping() to be added
in the review, but some added checks for fil_space_t::stop_new_ops were
not replaced with calls to fil_space_t::is_stopping().
buf_page_decrypt_after_read(): In this low-level I/O operation, we must
look up the tablespace if it exists, even though future I/O operations
have been blocked on it due to a pending DDL operation, such as DROP TABLE
or TRUNCATE TABLE or other table-rebuilding operations (ALTER, OPTIMIZE).
Pass a parameter to fil_space_acquire_low() telling that we are performing
a low-level I/O operation and the fil_space_t::is_stopping() status should
be ignored.
MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off
Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.
This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).
fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()
fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.
btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
Use fil_space_acquire()/release() to access fil_space_t.
buf_page_decrypt_after_read():
Use fil_space_get_crypt_data() because at this point
we might not yet have read page 0.
fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().
fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.
fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.
check_table_options()
Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*
i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.
In the 10.1 InnoDB Plugin, a call os_event_free(buf_flush_event) was
misplaced. The event could be signalled by rollback of resurrected
transactions while shutdown was in progress. This bug was caught
by cmake -DWITH_ASAN testing. This call was only present in the
10.1 InnoDB Plugin, not in other versions, or in XtraDB.
That said, the bug affects all InnoDB versions. Shutdown assumes the
cessation of any page-dirtying activity, including the activity of
the background rollback thread. InnoDB only waited for the background
rollback to finish as part of a slow shutdown (innodb_fast_shutdown=0).
The default is a clean shutdown (innodb_fast_shutdown=1). In a scenario
where InnoDB is killed, restarted, and shut down soon enough, the data
files could become corrupted.
logs_empty_and_mark_files_at_shutdown(): Wait for the
rollback to finish, except if innodb_fast_shutdown=2
(crash-like shutdown) was requested.
trx_rollback_or_clean_recovered(): Before choosing the next
recovered transaction to roll back, terminate early if non-slow
shutdown was initiated. Roll back everything on slow shutdown
(innodb_fast_shutdown=0).
srv_innodb_monitor_mutex: Declare as static, because the mutex
is only used within one module.
After each call to os_event_free(), ensure that the freed event
is not reachable via global variables, by setting the relevant
variables to NULL.
fil_parse_write_crypt_data(): Correct the comparison operator.
This was broken in commit 498f4a825b
which removed a signed/unsigned mismatch in these comparisons.
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:
recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).
Report progress also via systemd using sd_notifyf().
Also, implement MDEV-11027 a little differently from 5.5:
recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.
recv_apply_hashed_log_recs(): Rename the parameter to last_batch.
Provide more useful progress reporting of crash recovery.
recv_sys_t::progress_time: The time of the last report.
recv_scan_print_counter: Remove.
log_group_read_log_seg(): After after each I/O request,
report progress if needed.
recv_apply_hashed_log_recs(): At the start of each batch,
if there are pages to be recovered, issue a message.
This is a non-functional change.
On a related note, the calls fil_system_enter() and fil_system_exit()
are often used in an unsafe manner. The fix of MDEV-11738 should
introduce fil_space_acquire() and remove potential race conditions.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
buf_dump(): Correct the printf format passed to buf_dump_status()
to match the argument types.
Revert the changes to storage/xtradb. XtraDB is not being compiled
for 10.2. The unused copy that we have in the 10.2 branch is only
getting merges from 10.1.
Disable the test sys_vars.innodb_buffer_pool_dump_pct_function
because it is unstable on buildbot.
If page_compression (introduced in MariaDB Server 10.1) is enabled,
the logical action is to not preallocate space to the data files,
but to only logically extend the files with zeroes.
fil_create_new_single_table_tablespace(): Create smaller files for
ROW_FORMAT=COMPRESSED tables, but adhere to the minimum file size of
4*innodb_page_size.
fil_space_extend_must_retry(), os_file_set_size(): On Windows,
use SetFileInformationByHandle() and FILE_END_OF_FILE_INFO,
which depends on bumping _WIN32_WINNT to 0x0600.
FIXME: The files are not yet set up as sparse, so
this will currently end up physically extending (preallocating)
the files, wasting storage for unused pages.
os_file_set_size(): Add the parameter "bool sparse=false" to declare
that the file is to be extended logically, instead of being preallocated.
The only caller with sparse=true is
fil_create_new_single_table_tablespace().
(The system tablespace cannot be created with page_compression.)
fil_space_extend_must_retry(), os_file_set_size(): Outside Windows,
use ftruncate() to extend files that are supposed to be sparse.
On systems where ftruncate() is limited to files less than 4GiB
(if there are any), fil_space_extend_must_retry() retains the
old logic of physically extending the file.
fil_extend_space_to_desired_size(): Use a proper type cast when
computing start_offset for the posix_fallocate() call on 32-bit systems
(where sizeof(ulint) < sizeof(os_offset_t)). This could affect 32-bit
systems when extending files that are at least 4 MiB long.
This bug existed in MariaDB 10.0 before MDEV-11520. In MariaDB 10.1
it had been fixed in MDEV-11556.
a large memory buffer on Windows
fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate(), and pass the correct arguments to
posix_fallocate().
On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)
I tested that the file extension works properly with a multi-file
system tablespace, both with --innodb-use-fallocate and
--skip-innodb-use-fallocate (the default):
./mtr \
--mysqld=--innodb-use-fallocate \
--mysqld=--innodb-autoextend-increment=1 \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &
ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
(several samples while running the test)
Before the MDEV-11520 fixes, fil_extend_space_to_desired_size()
in MariaDB Server 5.5 incorrectly passed the desired file size as the
third argument to posix_fallocate(), even though the length of the
extension should have been passed. This looks like a regression
that was introduced in the 5.5 version of MDEV-5746.
Remove the unused variable desired_size.
Also, correct the expression for the posix_fallocate() start_offset,
and actually test that it works with a multi-file system tablespace.
Before MDEV-11520, the expression was wrong in both innodb_plugin and
xtradb, in different ways.
The start_offset formula was tested with the following:
./mtr --big-test --mysqld=--innodb-use-fallocate \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &
ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
a large memory buffer on Windows
fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate().
On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)
For innodb_plugin, port the XtraDB fix for MySQL Bug#56433
(introducing fil_system->file_extend_mutex). The bug was
fixed differently in MySQL 5.6 (and MariaDB Server 10.0).
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.