a large memory buffer on Windows
fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate().
On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)
For innodb_plugin, port the XtraDB fix for MySQL Bug#56433
(introducing fil_system->file_extend_mutex). The bug was
fixed differently in MySQL 5.6 (and MariaDB Server 10.0).
- Get the suite to work with dynamically-linked plugin (ha_rocksdb.so)
- Due to the push to keep everything MyRocks-related in storage/rocksdb,
there is no mysql-test/include/have_rocksdb.* anymore.
Make a copy of storage/rocksdb/mysql-test/rocksdb/include/have_rocksdb*,
hopefully these files wont be changed [often].
- Maria-fication of rocksdb_persistent_cache_path test.
This change should have been a part of
Merge 'merge-myrocks' into 'bb-10.2-mariarocks'
Merged cset:
Copy of
commit d1bb19b8f751875472211312c8e810143a7ba4b6
We probably should make submodule info a part of the mergetree process.
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
log_write_flush_to_disk_low(): Invoke log_mutex_enter() at the end, to
avoid race conditions when changing the system state. (No potential
race condition existed before MySQL 5.7.)
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
Merged cset:
Copy of
commit d1bb19b8f751875472211312c8e810143a7ba4b6
Author: Manuel Ung <mung@fb.com>
Date: Fri Feb 3 11:50:34 2017 -0800
...
Add cardinality stats to information schema
Test suite parameters for 'rocksdb' test suite were disabled in order
to get mysqld to start at all when ha_rocksdb is a dynamic plugin.
A lot of tests depend on these parameters being enabled, though. Put
them back by using the loose- form.
fil_space_extend_must_retry(): When innodb_use_fallocate=ON,
initialize pages_added = size - space->size so that posix_fallocate()
will actually attempt to extend the file, instead of keeping the same size.
This is a regression from MDEV-11556 which refactored
the InnoDB data file extension.
The test innodb.log_file_size_checkpoint was originally added to
MySQL 5.7 by me in a bug fix, to fix the interaction of WL#6494
(redo log resizing, introduced in MySQL 5.6) and WL#7142
(data file discovery based on MLOG_FILE_NAME records,
introduced in MySQL 5.7):
commit 70f9ef4e1220827132b50275ca7272f2bcca1864
Author: Marko Mäkelä <marko.makela@oracle.com>
Date: Wed May 21 13:31:29 2014 +0300
Bug#18755095 REDO LOG SIZE CHANGE AFTER CRASH RESULTS IN CHECKPOINT AGE
ERROR MESSAGE
This is a regression from fixing
Bug#18730524 REPEATED KILL+RESTART FAILS DUE TO MISSING MLOG_FILE_NAME
RECORD
innobase_start_or_create_for_mysql(): Invoke fil_names_clear() before
creating the "checkpoint" when changing redo log files.
Approved by Jimmy Yang on IM.
The relevant part of the test is that fil_names_clear() is invoked to
emit an MLOG_CHECKPOINT record before the redo log files are deleted.
In case the server is killed before ib_logfile0 has been deleted,
the old (not-yet-resized) redo log will be treated as valid. We do not
need to create a large number of tables for that.
I introduced the rec_printer object in MySQL to pretty-print raw InnoDB
records and index tuples in diagnostic messages. These objects are being
constructed unconditionally, even though the DBUG_PRINT is not enabled.
The unnecessary work is avoided by simply passing rec_printer(…).str()
to the DBUG_LOG macro that was introduced in MDEV-11713.
dict_init_free(): Make global, and move the call from
dict_close() to srv_free(), because this is initialized
earlier than dict_sys.
innobase_space_shutdown(): Do not leak srv_allow_writes_event.
Write only one encryption key to the checkpoint page.
Use 4 bytes of nonce. Encrypt more of each redo log block,
only skipping the 4-byte field LOG_BLOCK_HDR_NO which the
initialization vector is derived from.
Issue notes, not warning messages for rewriting the redo log files.
recv_recovery_from_checkpoint_finish(): Do not generate any redo log,
because we must avoid that before rewriting the redo log files, or
otherwise a crash during a redo log rewrite (removing or adding
encryption) may end up making the database unrecoverable.
Instead, do these tasks in innobase_start_or_create_for_mysql().
Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some
unreachable code and duplicated error messages for log corruption.
LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo
log format.
log_group_t::is_encrypted(), log_t::is_encrypted(): Determine
if the redo log is in encrypted format.
recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED.
srv_prepare_to_delete_redo_log_files(): Display NOTE messages about
adding or removing encryption. Do not issue warnings for redo log
resizing any more.
innobase_start_or_create_for_mysql(): Rebuild the redo logs also when
the encryption changes.
innodb_log_checksums_func_update(): Always use the CRC-32C checksum
if innodb_encrypt_log. If needed, issue a warning
that innodb_encrypt_log implies innodb_log_checksums.
log_group_write_buf(): Compute the checksum on the encrypted
block contents, so that transmission errors or incomplete blocks can be
detected without decrypting.
Rewrite most of the redo log encryption code. Only remember one
encryption key at a time (but remember up to 5 when upgrading from the
MariaDB 10.1 format.)
The InnoDB redo log consists of a list of files that logically form
a bigger file, as if the individual files were concatenated together.
The first file will always be written on redo log checkpoint, because
the two checkpoint pages are at the start of the single logical
redo log file.
There is no technical reason why InnoDB requires at least 2 files
to exist. Let us reduce the minimum number to 1. In that way,
restoring from backups will become easier, since InnoDB can directly
deal with a single backed-up redo log file.
Ever since MDEV-5800 enabled indexed virtual columns for InnoDB,
the InnoDB shutdown relied on close_connections() that would set
thd->killed for the InnoDB purge threads. Alas, the embedded server
shutdown is not invoking close_connections(), and thus InnoDB purge
threads fail to initiate shutdown, causing a hang.
innodb_inited: Remove. Use srv_was_started instead.
innobase_fast_shutdown: Remove. Use srv_fast_shutdown instead.
srv_running: Renamed from thd_destructor_myvar, and made global.
The value NULL means that shutdown was requested or the purge threads
should not be running because of innodb_read_only_mode=1.
innobase_init(): Set srv_was_started after ensuring that srv_running
was initialized. (In innodb_read_only mode, the purge threads are not
started and we do not care if srv_running==NULL.)
innobase_start_or_create_for_mysql(): Do not set srv_was_started.
Let it be set by the only caller innobase_init().
srv_purge_should_exit(): Check also srv_was_started and srv_running
when evaluating thd->killed.
EINVAL means that the filesystem doesn't support posix_fallocate().
There were two places where this error was issued, one checked for
EINVAL, the other did not. This commit fixed the other place
to also check for EINVAL.
Also, remove the space after the REFMAN to get the valid url
with no space in the middle.
Also don't say "Make sure the file system supports this function."
when posix_fallocate() fails, because this message is only shown
when the filesystem does support this function.
move TABLE::key_read into handler. Because in index merge and DS-MRR
there can be many handlers per table, and some of them use
key read while others don't. "keyread" is really per handler,
not per TABLE property.
Oracle introduced a Memcached plugin interface to the InnoDB
storage engine in MySQL 5.6. That interface is essentially a
fork of Memcached development snapshot 1.6.0-beta1 of an old
development branch 'engine-pu'.
To my knowledge, there have not been any updates to the Memcached code
between MySQL 5.6 and 5.7; only bug fixes and extensions related to
the Oracle modifications.
The Memcached plugin is not part of the MariaDB Server. Therefore it
does not make sense to include the InnoDB interfaces for the Memcached
plugin, or to have any related configuration parameters:
innodb_api_bk_commit_interval
innodb_api_disable_rowlock
innodb_api_enable_binlog
innodb_api_enable_mdl
innodb_api_trx_level
Removing this code in one commit makes it possible to easily restore
it, in case it turns out to be needed later.
log_crypt_101_read_checkpoint(): Read the encryption information
from a MariaDB 10.1 checkpoint page.
log_crypt_101_read_block(): Attempt to decrypt a MariaDB 10.1
redo log page.
recv_log_format_0_recover(): Only attempt decryption on checksum
mismatch. NOTE: With the MariaDB 10.1 innodb_encrypt_log format,
we can actually determine from the cleartext portion of the redo log
whether the redo log is empty. We do not really have to decrypt the
redo log here, if we did not want to determine if the checksum is valid.
innodb_shutdown(), trx_sys_close(): Startup may be aborted between
purge_sys and trx_sys creation. Therefore, purge_sys must be freed
independently of trx_sys.
innobase_start_or_create_for_mysql(): Remember to free purge_queue if
it was not yet attached to purge_sys.
This change add WITH_ROCKSDB_{LZ4,BZIP2,ZSTD,snappy} CMake variables
that can be set to ON/OFF/AUTO.
If variable has default value AUTO, rocksdb links with corresponding
compression library. OFF disables compiling/linking with specific compression
library, ON forces compiling with it (cmake would throw error if library
is not available)
Support for ZLIB is added unconditionally, as it is always there.
A proper InnoDB shutdown after aborted startup was introduced
in commit 81b7fe9d38.
Also related to this is MDEV-11985, making read-only shutdown more robust.
If startup was aborted, there may exist recovered transactions that were
not rolled back. Relax the assertions accordingly.
buf_page_is_checksum_valid_crc32()
buf_page_is_checksum_valid_innodb()
buf_page_is_checksum_valid_none():
Use ULINTPF instead of %lu and %u for ib_uint32_t
fil_space_verify_crypt_checksum():
Check that page is really empty if checksum and
LSN are zero.
fil_space_verify_crypt_checksum():
Correct the comment to be more agurate.
buf0buf.h:
Remove unnecessary is_corrupt variable from
buf_page_t structure.
recv_writer_thread(): Do not assign recv_writer_thread_active=true
in order to avoid a race condition with
recv_recovery_from_checkpoint_finish().
recv_init_crash_recovery(): Assign recv_writer_thread_active=true
before creating recv_writer_thread.