Alias the InnoDB ulint and lint data types to size_t and ssize_t,
which are the standard names for the machine-word-width data types.
Correspondingly, define ULINTPF as "%zu" and introduce ULINTPFx as "%zx".
In this way, better compiler warnings for type mismatch are possible.
Furthermore, use PRIu64 for that 64-bit format, and define
the feature macro __STDC_FORMAT_MACROS to enable it on Red Hat systems.
Fix some errors in error messages, and replace some error messages
with assertions.
Most notably, an IMPORT TABLESPACE error message in InnoDB was
displaying the number of columns instead of the mismatching flags.
InnoDB defines some functions that are not called at all.
Other functions are called, but only from the same compilation unit.
Remove some function declarations and definitions, and add 'static'
keywords. Some symbols must be kept for separately compiled tools,
such as innochecksum.
Also, remove empty .ic files that were not removed by my MySQL commit.
Problem:
InnoDB used to support a compilation mode that allowed to choose
whether the function definitions in .ic files are to be inlined or not.
This stopped making sense when InnoDB moved to C++ in MySQL 5.6
(and ha_innodb.cc started to #include .ic files), and more so in
MySQL 5.7 when inline methods and functions were introduced
in .h files.
Solution:
Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from
all files, assuming that the symbols are never defined.
Remove the files fut0fut.cc and ut0byte.cc which only mattered when
UNIV_NONINL was defined.
dict_create_or_check_foreign_constraint_tables(): Change the warning
about the foreign key metadata table creation to a note.
Remove messages after metadata table creation. If the creation fails,
startup will abort with a message. Normally the creation succeeds on
bootstrap, and the messages would only be noise.
Remove the related suppressions from the tests.
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:
recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).
Report progress also via systemd using sd_notifyf().
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
The InnoDB adaptive hash index is sometimes degrading the performance of
InnoDB, and it is sometimes disabled to get more consistent performance.
We should have a compile-time option to disable the adaptive hash index.
Let us introduce two options:
OPTION(WITH_INNODB_AHI "Include innodb_adaptive_hash_index" ON)
OPTION(WITH_INNODB_ROOT_GUESS "Cache index root block descriptors" ON)
where WITH_INNODB_AHI always implies WITH_INNODB_ROOT_GUESS.
As part of this change, the misleadingly named function
trx_search_latch_release_if_reserved(trx) will be replaced with the macro
trx_assert_no_search_latch(trx) that will be empty unless
BTR_CUR_HASH_ADAPT is defined (cmake -DWITH_INNODB_AHI=ON).
We will also remove the unused column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_TIMEOUT.
In MariaDB Server 10.1, it used to reflect the value of
trx_t::search_latch_timeout which could be adjusted during
row_search_for_mysql(). In 10.2, there is no such field.
Other than the removal of the unused column TRX_ADAPTIVE_HASH_TIMEOUT,
this is an almost non-functional change to the server when using the
default build options.
Some tests are adjusted so that they will work with both
-DWITH_INNODB_AHI=ON and -DWITH_INNODB_AHI=OFF. The test
innodb.innodb_monitor has been renamed to innodb.monitor
in order to track MySQL 5.7, and the duplicate tests
sys_vars.innodb_monitor_* are removed.
to tables in the system tablespace
This is a regression caused by MDEV-11585, which accidentally
changed Tablespace::is_undo_tablespace() in an incorrect way,
causing the InnoDB system tablespace to be reported as a dedicated
undo tablespace, for which the change buffer is not applicable.
Tablespace::is_undo_tablespace(): Remove. There were only 2
calls from the function buf_page_io_complete(). Replace those
calls as appropriate.
Also, merge changes to tablespace import/export tests from
MySQL 5.7, and clean up the tests a little further, allowing
them to be run with any innodb_page_size.
Remove duplicated error injection instrumentation for the
import/export tests. In MySQL 5.7, the error injection label
buf_page_is_corrupt_failure was renamed to
buf_page_import_corrupt_failure.
fil_space_extend_must_retry(): Correct a debug assertion
(tablespaces can be extended during IMPORT), and remove a
TODO comment about compressed temporary tables that was
already addressed in MDEV-11816.
dict_build_tablespace_for_table(): Correct a comment that
no longer holds after MDEV-11816, and assert that
ROW_FORMAT=COMPRESSED can only be used in .ibd files.
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
log_write_flush_to_disk_low(): Invoke log_mutex_enter() at the end, to
avoid race conditions when changing the system state. (No potential
race condition existed before MySQL 5.7.)
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
dict_init_free(): Make global, and move the call from
dict_close() to srv_free(), because this is initialized
earlier than dict_sys.
innobase_space_shutdown(): Do not leak srv_allow_writes_event.
Remove the dependency on unzip. Instead, generate the InnoDB files
with perl.
log_block_checksum_is_ok(): Correct the error message.
recv_scan_log_recs(): Remove the duplicated error message for
log block checksum mismatch.
innobase_start_or_create_for_mysql(): If the server is in read-only
mode or if innodb_force_recovery>=3, do not try to modify the system
tablespace. (If the doublewrite buffer or the non-core system tables
do not exist, do not try to create them.)
innodb_shutdown(): Relax a debug assertion. If the system tablespace
did not contain a doublewrite buffer and if we started up in
innodb_read_only mode or with innodb_force_recovery>=3, it will not
be created.
dict_create_or_check_sys_tablespace(): Set the flag
srv_sys_tablespaces_open when the tables exist.
Both dict_foreign_find_index and dict_foreign_qualify_index
did not consider virtual columns as possible foreign key
columns and there was assertion to disable virtual columns.
Fixed by also looking referencing and referenced column
from virtual columns if needed.
Most notably, this includes MDEV-11623, which includes a fix and
an upgrade procedure for the InnoDB file format incompatibility
that is present in MariaDB Server 10.1.0 through 10.1.20.
In other words, this merge should address
MDEV-11202 InnoDB 10.1 -> 10.2 migration does not work
MySQL 5.7 allows temporary tables to be created in ROW_FORMAT=COMPRESSED.
The usefulness of this is questionable. WL#7899 in MySQL 8.0.0
prevents the creation of such compressed tables, so that all InnoDB
temporary tables will be located inside the predefined
InnoDB temporary tablespace.
Pick up and adjust some tests from MySQL 5.7 and 8.0.
dict_tf_to_fsp_flags(): Remove the parameter is_temp.
fsp_flags_init(): Remove the parameter is_temporary.
row_mysql_drop_temp_tables(): Remove. There cannot be any temporary
tables in InnoDB. (This never removed #sql* tables in the datadir
which were created by DDL.)
dict_table_t::dir_path_of_temp_table: Remove.
create_table_info_t::m_temp_path: Remove.
create_table_info_t::create_options_are_invalid(): Do not allow
ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE for temporary tables.
create_table_info_t::innobase_table_flags(): Do not unnecessarily
prevent CREATE TEMPORARY TABLE with SPATIAL INDEX.
(MySQL 5.7 does allow this.)
fil_space_belongs_in_lru(): The only FIL_TYPE_TEMPORARY tablespace
is never subjected to closing least-recently-used files.
MySQL 5.7 introduced partial support for user-created shared tablespaces
(for example, import and export are not supported).
MariaDB Server does not support tablespaces at this point of time.
Let us remove most InnoDB code and data structures that is related
to shared tablespaces.
MariaDB will likely never support MySQL-style encryption for
InnoDB, because we cannot link with the Oracle encryption plugin.
This is preparation for merging MDEV-11623.
MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K
The storage format of FSP_SPACE_FLAGS was accidentally broken
already in MariaDB 10.1.0. This fix is bringing the format in
line with other MySQL and MariaDB release series.
Please refer to the comments that were added to fsp0fsp.h
for details.
This is an INCOMPATIBLE CHANGE that affects users of
page_compression and non-default innodb_page_size. Upgrading
to this release will correct the flags in the data files.
If you want to downgrade to earlier MariaDB 10.1.x, please refer
to the test innodb.101_compatibility how to reset the
FSP_SPACE_FLAGS in the files.
NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret
uncompressed data files with innodb_page_size=4k or 64k as
compressed innodb_page_size=16k files, and then probably fail
when trying to access the pages. See the comments in the
function fsp_flags_convert_from_101() for detailed analysis.
Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16.
In this way, compressed innodb_page_size=16k tablespaces will not
be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20.
Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the
dict_table_t::flags when the table is available, in
fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace().
During crash recovery, fil_load_single_table_tablespace() will use
innodb_compression_level for the PAGE_COMPRESSION_LEVEL.
FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags
that are not to be written to FSP_SPACE_FLAGS. Currently, these will
include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR.
Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support
one innodb_page_size for the whole instance.
When creating a dummy tablespace for the redo log, use
fil_space_t::flags=0. The flags are never written to the redo log files.
Remove many FSP_FLAGS_SET_ macros.
dict_tf_verify_flags(): Remove. This is basically only duplicating
the logic of dict_tf_to_fsp_flags(), used in a debug assertion.
fil_space_t::mark: Remove. This flag was not used for anything.
fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter
mark_space, and add a parameter for table flags. Check that
fil_space_t::flags match the table flags, and adjust the (memory-only)
flags based on the table flags.
fil_node_open_file(): Remove some redundant or unreachable conditions,
do not use stderr for output, and avoid unnecessary server aborts.
fil_user_tablespace_restore_page(): Convert the flags, so that the
correct page_size will be used when restoring a page from the
doublewrite buffer.
fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove.
It suffices to have fil_space_is_page_compressed().
FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL,
FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not
exist in the FSP_SPACE_FLAGS but only in memory.
fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS
in page 0. Called by fil_open_single_table_tablespace(),
fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql()
except if --innodb-read-only is active.
fsp_flags_is_valid(ulint): Reimplement from the scratch, with
accurate comments. Do not display any details of detected
inconsistencies, because the output could be confusing when
dealing with MariaDB 10.1.x data files.
fsp_flags_convert_from_101(ulint): Convert flags from buggy
MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags
cannot be in MariaDB 10.1.x format.
fsp_flags_match(): Check the flags when probing files.
Implemented based on fsp_flags_is_valid()
and fsp_flags_convert_from_101().
dict_check_tablespaces_and_store_max_id(): Do not access the
page after committing the mini-transaction.
IMPORT TABLESPACE fixes:
AbstractCallback::init(): Convert the flags.
FetchIndexRootPages::operator(): Check that the tablespace flags match the
table flags. Do not attempt to convert tablespace flags to table flags,
because the conversion would necessarily be lossy.
PageConverter::update_header(): Write back the correct flags.
This takes care of the flags in IMPORT TABLESPACE.
- Atomic writes are enabled by default
- Automatically detect if device supports atomic write and use it if
atomic writes are enabled
- Remove ATOMIC WRITE options from CREATE TABLE
- Atomic write is a device option, not a table options as the table may
crash if the media changes
- Add support for SHANNON SSD cards
InnoDB shutdown failed to properly take fil_crypt_thread() into account.
The encryption threads were signalled to shut down together with other
non-critical tasks. This could be much too early in case of slow shutdown,
which could need minutes to complete the purge. Furthermore, InnoDB
failed to wait for the fil_crypt_thread() to actually exit before
proceeding to the final steps of shutdown, causing the race conditions.
Furthermore, the log_scrub_thread() was shut down way too early.
Also it should remain until the SRV_SHUTDOWN_FLUSH_PHASE.
fil_crypt_threads_end(): Remove. This would cause the threads to
be terminated way too early.
srv_buf_dump_thread_active, srv_dict_stats_thread_active,
lock_sys->timeout_thread_active, log_scrub_thread_active,
srv_monitor_active, srv_error_monitor_active: Remove a race condition
between startup and shutdown, by setting these in the startup thread
that creates threads, not in each created thread. In this way, once the
flag is cleared, it will remain cleared during shutdown.
srv_n_fil_crypt_threads_started, fil_crypt_threads_event: Declare in
global rather than static scope.
log_scrub_event, srv_log_scrub_thread_active, log_scrub_thread():
Declare in static rather than global scope. Let these be created by
log_init() and freed by log_shutdown().
rotate_thread_t::should_shutdown(): Do not shut down before the
SRV_SHUTDOWN_FLUSH_PHASE.
srv_any_background_threads_are_active(): Remove. These checks now
exist in logs_empty_and_mark_files_at_shutdown().
logs_empty_and_mark_files_at_shutdown(): Shut down the threads in
the proper order. Keep fil_crypt_thread() and log_scrub_thread() alive
until SRV_SHUTDOWN_FLUSH_PHASE, and check that they actually terminate.
The InnoDB source code contains quite a few references to a closed-source
hot backup tool which was originally called InnoDB Hot Backup (ibbackup)
and later incorporated in MySQL Enterprise Backup.
The open source backup tool XtraBackup uses the full database for recovery.
So, the references to UNIV_HOTBACKUP are only cluttering the source code.
Essentially revert MDEV-6759, which addressed a double free of memory
by removing the freeing altogether, introducing the memory leaks.
No double free was observed when running the test suite -DWITH_ASAN.
Replace some mem_heap_free(foreign->heap) with dict_foreign_free(foreign)
so that the calls can be located and instrumented more easily when needed.
MySQL 5.7 supports only one shared temporary tablespace.
MariaDB 10.2 does not support any other shared InnoDB tablespaces than
the two predefined tablespaces: the persistent InnoDB system tablespace
(default file name ibdata1) and the temporary tablespace
(default file name ibtmp1).
InnoDB is unnecessarily allocating a tablespace ID for the predefined
temporary tablespace on every startup, and it is in several places
testing whether a tablespace ID matches this dynamically generated ID.
We should use a compile-time constant to reduce code size and to avoid
unnecessary updates to the DICT_HDR page at every startup.
Using a hard-coded tablespace ID will should make it easier to remove the
TEMPORARY flag from FSP_SPACE_FLAGS in MDEV-11202.
Essentially revert MDEV-6759, which addressed a double free of memory
by removing the freeing altogether, introducing the memory leaks.
No double free was observed when running the test suite -DWITH_ASAN.
Replace some mem_heap_free(foreign->heap) with dict_foreign_free(foreign)
so that the calls can be located and instrumented more easily when needed.
This should be functionally equivalent to WL#6204 in MySQL 8.0.0, with
the notable difference that the file format changes are limited to
repurposing a previously unused data field in B-tree pages.
For persistent InnoDB tables, write the last used AUTO_INCREMENT
value to the root page of the clustered index, in the previously
unused (0) PAGE_MAX_TRX_ID field, now aliased as PAGE_ROOT_AUTO_INC.
Unlike some other previously unused InnoDB data fields, this one was
actually always zero-initialized, at least since MySQL 3.23.49.
The writes to PAGE_ROOT_AUTO_INC are protected by SX or X latch on the
root page. The SX latch will allow concurrent read access to the root
page. (The field PAGE_ROOT_AUTO_INC will only be read on the
first-time call to ha_innobase::open() from the SQL layer. The
PAGE_ROOT_AUTO_INC can only be updated when executing SQL, so
read/write races are not possible.)
During INSERT, the PAGE_ROOT_AUTO_INC is updated by the low-level
function btr_cur_search_to_nth_level(), adding no extra page
access. [Adaptive hash index lookup will be disabled during INSERT.]
If some rare UPDATE modifies an AUTO_INCREMENT column, the
PAGE_ROOT_AUTO_INC will be adjusted in a separate mini-transaction in
ha_innobase::update_row().
When a page is reorganized, we have to preserve the PAGE_ROOT_AUTO_INC
field.
During ALTER TABLE, the initial AUTO_INCREMENT value will be copied
from the table. ALGORITHM=COPY and online log apply in LOCK=NONE will
update PAGE_ROOT_AUTO_INC in real time.
innodb_col_no(): Determine the dict_table_t::cols[] element index
corresponding to a Field of a non-virtual column.
(The MySQL 5.7 implementation of virtual columns breaks the 1:1
relationship between Field::field_index and dict_table_t::cols[].
Virtual columns are omitted from dict_table_t::cols[]. Therefore,
we must translate the field_index of AUTO_INCREMENT columns into
an index of dict_table_t::cols[].)
Upgrade from old data files:
By default, the AUTO_INCREMENT sequence in old data files would appear
to be reset, because PAGE_MAX_TRX_ID or PAGE_ROOT_AUTO_INC would contain
the value 0 in each clustered index page. In new data files,
PAGE_ROOT_AUTO_INC can only be 0 if the table is empty or does not contain
any AUTO_INCREMENT column.
For backward compatibility, we use the old method of
SELECT MAX(auto_increment_column) for initializing the sequence.
btr_read_autoinc(): Read the AUTO_INCREMENT sequence from a new-format
data file.
btr_read_autoinc_with_fallback(): A variant of btr_read_autoinc()
that will resort to reading MAX(auto_increment_column) for data files
that did not use AUTO_INCREMENT yet. It was manually tested that during
the execution of innodb.autoinc_persist the compatibility logic is
not activated (for new files, PAGE_ROOT_AUTO_INC is never 0 in nonempty
clustered index root pages).
initialize_auto_increment(): Replaces
ha_innobase::innobase_initialize_autoinc(). This initializes
the AUTO_INCREMENT metadata. Only called from ha_innobase::open().
ha_innobase::info_low(): Do not try to lazily initialize
dict_table_t::autoinc. It must already have been initialized by
ha_innobase::open() or ha_innobase::create().
Note: The adjustments to class ha_innopart were not tested, because
the source code (native InnoDB partitioning) is not being compiled.
* remnant of 5.6, does not exist in 5.7. bad merge?
* also remove dict_table_get_col_name_for_mysql(), it was only
used when index_field_t::col_name was not NULL
WL#7682 in MySQL 5.7 introduced the possibility to create light-weight
temporary tables in InnoDB. These are called 'intrinsic temporary tables'
in InnoDB, and in MySQL 5.7, they can be created by the optimizer for
sorting or buffering data in query processing.
In MariaDB 10.2, the optimizer temporary tables cannot be created in
InnoDB, so we should remove the dead code and related data structures.
In InnoDB and XtraDB functions that declare pointer parameters as nonnull,
remove nullness checks, because GCC would optimize them away anyway.
Use #ifdef instead of #if when checking for a configuration flag.
Clang says that left shifts of negative values are undefined.
So, use ~0U instead of ~0 in a number of macros.
Some functions that were defined as UNIV_INLINE were declared as
UNIV_INTERN. Consistently use the same type of linkage.
ibuf_merge_or_delete_for_page() could pass bitmap_page=NULL to
buf_page_print(), conflicting with the __attribute__((nonnull)).
Replaced InnoDB atomic operations with server atomic operations.
Moved INNODB_RW_LOCKS_USE_ATOMICS - it is always defined (code won't compile
otherwise).
NOTE: InnoDB uses thread identifiers as a target for atomic operations.
Thread identifiers should be considered opaque: any attempt to use a
thread ID other than in pthreads calls is nonportable and can lead to
unspecified results.
(Fixing both InnoDB and XtraDB)
Re-opening a TABLE object (after e.g. FLUSH TABLES or open table cache
eviction) causes ha_innobase to call
dict_stats_update(DICT_STATS_FETCH_ONLY_IF_NOT_IN_MEMORY).
Inside this call, the following is done:
dict_stats_empty_table(table);
dict_stats_copy(table, t);
On the other hand, commands like UPDATE make this call to get the "rows in
table" statistics in table->stats.records:
ha_innobase->info(HA_STATUS_VARIABLE|HA_STATUS_NO_LOCK)
note the HA_STATUS_NO_LOCK parameter. It means, no locks are taken by
::info() If the ::info() call happens between dict_stats_empty_table
and dict_stats_copy calls, the UPDATE's optimizer will get an estimate
of table->stats.records=1, which causes it to pick a full table scan,
which in turn will take a lot of row locks and cause other bad
consequences.