Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
Remove the unused variable innodb_calling_exit which was added
in MySQL 5.7 in an attempt to avoid failures in other threads when
an I/O thread calls exit().
Since MDEV-9282, InnoDB or XtraDB MariaDB Server is not calling
exit() at all, and the variable innodb_calling_exit was never set.
This is a non-functional change.
On a related note, the calls fil_system_enter() and fil_system_exit()
are often used in an unsafe manner. The fix of MDEV-11738 should
introduce fil_space_acquire() and remove potential race conditions.
The InnoDB adaptive hash index is sometimes degrading the performance of
InnoDB, and it is sometimes disabled to get more consistent performance.
We should have a compile-time option to disable the adaptive hash index.
Let us introduce two options:
OPTION(WITH_INNODB_AHI "Include innodb_adaptive_hash_index" ON)
OPTION(WITH_INNODB_ROOT_GUESS "Cache index root block descriptors" ON)
where WITH_INNODB_AHI always implies WITH_INNODB_ROOT_GUESS.
As part of this change, the misleadingly named function
trx_search_latch_release_if_reserved(trx) will be replaced with the macro
trx_assert_no_search_latch(trx) that will be empty unless
BTR_CUR_HASH_ADAPT is defined (cmake -DWITH_INNODB_AHI=ON).
We will also remove the unused column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_TIMEOUT.
In MariaDB Server 10.1, it used to reflect the value of
trx_t::search_latch_timeout which could be adjusted during
row_search_for_mysql(). In 10.2, there is no such field.
Other than the removal of the unused column TRX_ADAPTIVE_HASH_TIMEOUT,
this is an almost non-functional change to the server when using the
default build options.
Some tests are adjusted so that they will work with both
-DWITH_INNODB_AHI=ON and -DWITH_INNODB_AHI=OFF. The test
innodb.innodb_monitor has been renamed to innodb.monitor
in order to track MySQL 5.7, and the duplicate tests
sys_vars.innodb_monitor_* are removed.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
This fixes MySQL Bug#80788 in MariaDB 10.2.5.
When I made the InnoDB crash recovery more robust by implementing
WL#7142, I also introduced an extra redo log scan pass that can be
shortened.
This fix will slightly extend the InnoDB redo log format that I
introduced in MySQL 5.7.9 by writing the start LSN of the MLOG_CHECKPOINT
mini-transaction to the end of the log checkpoint page, so that recovery
can jump straight to it without scanning all the preceding redo log.
LOG_CHECKPOINT_END_LSN: At the end of the checkpoint page, the start LSN
of the MLOG_CHECKPOINT mini-transaction. Previously, these bytes were
written as 0.
log_write_checkpoint_info(), log_group_checkpoint(): Add the parameter
end_lsn for writing LOG_CHECKPOINT_END_LSN.
log_checkpoint(): Remember the LSN at which the MLOG_CHECKPOINT
mini-transaction is starting (or at which the redo log ends on
shutdown).
recv_init_crash_recovery(): Remove.
recv_group_scan_log_recs(): Add the parameter checkpoint_lsn.
recv_recovery_from_checkpoint_start(): Read LOG_CHECKPOINT_END_LSN
and if it is set, start the first scan from it instead of the
checkpoint LSN. Improve some messages and remove bogus assertions.
recv_parse_log_recs(): Do not skip DBUG_PRINT("ib_log") for some
file-level redo log records.
recv_parse_or_apply_log_rec_body(): If we have not parsed all redo
log between the checkpoint and the corresponding MLOG_CHECKPOINT
record, defer the check for MLOG_FILE_DELETE or MLOG_FILE_NAME records
to recv_init_crash_recovery_spaces().
recv_init_crash_recovery_spaces(): Refuse recovery if
MLOG_FILE_NAME or MLOG_FILE_DELETE records are missing.
MDEV-7618 introduced configuration parameter innodb_instrument_semaphores
in MariaDB Server 10.1. The parameter seems to only affect the rw-lock
X-latch acquisition. Extra fields are added to rw_lock_t to remember one
X-latch holder or waiter. These fields are not being consulted or reported
anywhere. This is basically only adding code bloat.
If the intention is to debug hangs or deadlocks, we have better tools for
that in the debug server, and for the non-debug server, core dumps can
reveal a lot. For example, the mini-transaction memo records the
currently held buffer block or index rw-locks, to be released at
mtr_t::commit().
The configuration parameter innodb_instrument_semaphores will be
deprecated in 10.2.5 and removed in 10.3.0.
rw_lock_t: Remove the members lock_name, file_name, line, thread_id
which did not affect any output.
to tables in the system tablespace
This is a regression caused by MDEV-11585, which accidentally
changed Tablespace::is_undo_tablespace() in an incorrect way,
causing the InnoDB system tablespace to be reported as a dedicated
undo tablespace, for which the change buffer is not applicable.
Tablespace::is_undo_tablespace(): Remove. There were only 2
calls from the function buf_page_io_complete(). Replace those
calls as appropriate.
Also, merge changes to tablespace import/export tests from
MySQL 5.7, and clean up the tests a little further, allowing
them to be run with any innodb_page_size.
Remove duplicated error injection instrumentation for the
import/export tests. In MySQL 5.7, the error injection label
buf_page_is_corrupt_failure was renamed to
buf_page_import_corrupt_failure.
fil_space_extend_must_retry(): Correct a debug assertion
(tablespaces can be extended during IMPORT), and remove a
TODO comment about compressed temporary tables that was
already addressed in MDEV-11816.
dict_build_tablespace_for_table(): Correct a comment that
no longer holds after MDEV-11816, and assert that
ROW_FORMAT=COMPRESSED can only be used in .ibd files.
buf_dump(): Correct the printf format passed to buf_dump_status()
to match the argument types.
Revert the changes to storage/xtradb. XtraDB is not being compiled
for 10.2. The unused copy that we have in the 10.2 branch is only
getting merges from 10.1.
Disable the test sys_vars.innodb_buffer_pool_dump_pct_function
because it is unstable on buildbot.
If page_compression (introduced in MariaDB Server 10.1) is enabled,
the logical action is to not preallocate space to the data files,
but to only logically extend the files with zeroes.
fil_create_new_single_table_tablespace(): Create smaller files for
ROW_FORMAT=COMPRESSED tables, but adhere to the minimum file size of
4*innodb_page_size.
fil_space_extend_must_retry(), os_file_set_size(): On Windows,
use SetFileInformationByHandle() and FILE_END_OF_FILE_INFO,
which depends on bumping _WIN32_WINNT to 0x0600.
FIXME: The files are not yet set up as sparse, so
this will currently end up physically extending (preallocating)
the files, wasting storage for unused pages.
os_file_set_size(): Add the parameter "bool sparse=false" to declare
that the file is to be extended logically, instead of being preallocated.
The only caller with sparse=true is
fil_create_new_single_table_tablespace().
(The system tablespace cannot be created with page_compression.)
fil_space_extend_must_retry(), os_file_set_size(): Outside Windows,
use ftruncate() to extend files that are supposed to be sparse.
On systems where ftruncate() is limited to files less than 4GiB
(if there are any), fil_space_extend_must_retry() retains the
old logic of physically extending the file.
fil_extend_space_to_desired_size(): Use a proper type cast when
computing start_offset for the posix_fallocate() call on 32-bit systems
(where sizeof(ulint) < sizeof(os_offset_t)). This could affect 32-bit
systems when extending files that are at least 4 MiB long.
This bug existed in MariaDB 10.0 before MDEV-11520. In MariaDB 10.1
it had been fixed in MDEV-11556.
a large memory buffer on Windows
fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate(), and pass the correct arguments to
posix_fallocate().
On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)
I tested that the file extension works properly with a multi-file
system tablespace, both with --innodb-use-fallocate and
--skip-innodb-use-fallocate (the default):
./mtr \
--mysqld=--innodb-use-fallocate \
--mysqld=--innodb-autoextend-increment=1 \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &
ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
(several samples while running the test)
Before the MDEV-11520 fixes, fil_extend_space_to_desired_size()
in MariaDB Server 5.5 incorrectly passed the desired file size as the
third argument to posix_fallocate(), even though the length of the
extension should have been passed. This looks like a regression
that was introduced in the 5.5 version of MDEV-5746.
Remove the unused variable desired_size.
Also, correct the expression for the posix_fallocate() start_offset,
and actually test that it works with a multi-file system tablespace.
Before MDEV-11520, the expression was wrong in both innodb_plugin and
xtradb, in different ways.
The start_offset formula was tested with the following:
./mtr --big-test --mysqld=--innodb-use-fallocate \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &
ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
a large memory buffer on Windows
fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate().
On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)
For innodb_plugin, port the XtraDB fix for MySQL Bug#56433
(introducing fil_system->file_extend_mutex). The bug was
fixed differently in MySQL 5.6 (and MariaDB Server 10.0).
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
log_write_flush_to_disk_low(): Invoke log_mutex_enter() at the end, to
avoid race conditions when changing the system state. (No potential
race condition existed before MySQL 5.7.)
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
fil_space_extend_must_retry(): When innodb_use_fallocate=ON,
initialize pages_added = size - space->size so that posix_fallocate()
will actually attempt to extend the file, instead of keeping the same size.
This is a regression from MDEV-11556 which refactored
the InnoDB data file extension.
The test innodb.log_file_size_checkpoint was originally added to
MySQL 5.7 by me in a bug fix, to fix the interaction of WL#6494
(redo log resizing, introduced in MySQL 5.6) and WL#7142
(data file discovery based on MLOG_FILE_NAME records,
introduced in MySQL 5.7):
commit 70f9ef4e1220827132b50275ca7272f2bcca1864
Author: Marko Mäkelä <marko.makela@oracle.com>
Date: Wed May 21 13:31:29 2014 +0300
Bug#18755095 REDO LOG SIZE CHANGE AFTER CRASH RESULTS IN CHECKPOINT AGE
ERROR MESSAGE
This is a regression from fixing
Bug#18730524 REPEATED KILL+RESTART FAILS DUE TO MISSING MLOG_FILE_NAME
RECORD
innobase_start_or_create_for_mysql(): Invoke fil_names_clear() before
creating the "checkpoint" when changing redo log files.
Approved by Jimmy Yang on IM.
The relevant part of the test is that fil_names_clear() is invoked to
emit an MLOG_CHECKPOINT record before the redo log files are deleted.
In case the server is killed before ib_logfile0 has been deleted,
the old (not-yet-resized) redo log will be treated as valid. We do not
need to create a large number of tables for that.
I introduced the rec_printer object in MySQL to pretty-print raw InnoDB
records and index tuples in diagnostic messages. These objects are being
constructed unconditionally, even though the DBUG_PRINT is not enabled.
The unnecessary work is avoided by simply passing rec_printer(…).str()
to the DBUG_LOG macro that was introduced in MDEV-11713.
dict_init_free(): Make global, and move the call from
dict_close() to srv_free(), because this is initialized
earlier than dict_sys.
innobase_space_shutdown(): Do not leak srv_allow_writes_event.
Write only one encryption key to the checkpoint page.
Use 4 bytes of nonce. Encrypt more of each redo log block,
only skipping the 4-byte field LOG_BLOCK_HDR_NO which the
initialization vector is derived from.
Issue notes, not warning messages for rewriting the redo log files.
recv_recovery_from_checkpoint_finish(): Do not generate any redo log,
because we must avoid that before rewriting the redo log files, or
otherwise a crash during a redo log rewrite (removing or adding
encryption) may end up making the database unrecoverable.
Instead, do these tasks in innobase_start_or_create_for_mysql().
Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some
unreachable code and duplicated error messages for log corruption.
LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo
log format.
log_group_t::is_encrypted(), log_t::is_encrypted(): Determine
if the redo log is in encrypted format.
recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED.
srv_prepare_to_delete_redo_log_files(): Display NOTE messages about
adding or removing encryption. Do not issue warnings for redo log
resizing any more.
innobase_start_or_create_for_mysql(): Rebuild the redo logs also when
the encryption changes.
innodb_log_checksums_func_update(): Always use the CRC-32C checksum
if innodb_encrypt_log. If needed, issue a warning
that innodb_encrypt_log implies innodb_log_checksums.
log_group_write_buf(): Compute the checksum on the encrypted
block contents, so that transmission errors or incomplete blocks can be
detected without decrypting.
Rewrite most of the redo log encryption code. Only remember one
encryption key at a time (but remember up to 5 when upgrading from the
MariaDB 10.1 format.)
The InnoDB redo log consists of a list of files that logically form
a bigger file, as if the individual files were concatenated together.
The first file will always be written on redo log checkpoint, because
the two checkpoint pages are at the start of the single logical
redo log file.
There is no technical reason why InnoDB requires at least 2 files
to exist. Let us reduce the minimum number to 1. In that way,
restoring from backups will become easier, since InnoDB can directly
deal with a single backed-up redo log file.
Ever since MDEV-5800 enabled indexed virtual columns for InnoDB,
the InnoDB shutdown relied on close_connections() that would set
thd->killed for the InnoDB purge threads. Alas, the embedded server
shutdown is not invoking close_connections(), and thus InnoDB purge
threads fail to initiate shutdown, causing a hang.
innodb_inited: Remove. Use srv_was_started instead.
innobase_fast_shutdown: Remove. Use srv_fast_shutdown instead.
srv_running: Renamed from thd_destructor_myvar, and made global.
The value NULL means that shutdown was requested or the purge threads
should not be running because of innodb_read_only_mode=1.
innobase_init(): Set srv_was_started after ensuring that srv_running
was initialized. (In innodb_read_only mode, the purge threads are not
started and we do not care if srv_running==NULL.)
innobase_start_or_create_for_mysql(): Do not set srv_was_started.
Let it be set by the only caller innobase_init().
srv_purge_should_exit(): Check also srv_was_started and srv_running
when evaluating thd->killed.
EINVAL means that the filesystem doesn't support posix_fallocate().
There were two places where this error was issued, one checked for
EINVAL, the other did not. This commit fixed the other place
to also check for EINVAL.
Also, remove the space after the REFMAN to get the valid url
with no space in the middle.
Also don't say "Make sure the file system supports this function."
when posix_fallocate() fails, because this message is only shown
when the filesystem does support this function.
move TABLE::key_read into handler. Because in index merge and DS-MRR
there can be many handlers per table, and some of them use
key read while others don't. "keyread" is really per handler,
not per TABLE property.
Oracle introduced a Memcached plugin interface to the InnoDB
storage engine in MySQL 5.6. That interface is essentially a
fork of Memcached development snapshot 1.6.0-beta1 of an old
development branch 'engine-pu'.
To my knowledge, there have not been any updates to the Memcached code
between MySQL 5.6 and 5.7; only bug fixes and extensions related to
the Oracle modifications.
The Memcached plugin is not part of the MariaDB Server. Therefore it
does not make sense to include the InnoDB interfaces for the Memcached
plugin, or to have any related configuration parameters:
innodb_api_bk_commit_interval
innodb_api_disable_rowlock
innodb_api_enable_binlog
innodb_api_enable_mdl
innodb_api_trx_level
Removing this code in one commit makes it possible to easily restore
it, in case it turns out to be needed later.
log_crypt_101_read_checkpoint(): Read the encryption information
from a MariaDB 10.1 checkpoint page.
log_crypt_101_read_block(): Attempt to decrypt a MariaDB 10.1
redo log page.
recv_log_format_0_recover(): Only attempt decryption on checksum
mismatch. NOTE: With the MariaDB 10.1 innodb_encrypt_log format,
we can actually determine from the cleartext portion of the redo log
whether the redo log is empty. We do not really have to decrypt the
redo log here, if we did not want to determine if the checksum is valid.
innodb_shutdown(), trx_sys_close(): Startup may be aborted between
purge_sys and trx_sys creation. Therefore, purge_sys must be freed
independently of trx_sys.
innobase_start_or_create_for_mysql(): Remember to free purge_queue if
it was not yet attached to purge_sys.