To avoid potential race conditions between concurrent access to
dict_table_t::freed_indexes, let us consistently use
dict_table_t::autoinc_mutex.
dict_table_remove_from_cache_low(): To avoid extensive hold time
of table->autoinc_mutex, unconditionally free the FTS data structures.
Problem:
=======
The last AHI page for two indexes of an dropped table is being
freed at the same time by two threads. One thread frees the
table heap and other thread tries to access table heap again.
It leads to asan failure in btr_search_lazy_free().
Solution:
========
InnoDB uses autoinc_mutex to avoid the race condition
in btr_search_lazy_free()
InnoDB startup hangs if a DDL transaction needs to be
rolled back and a recovered transaction on statistics
tables exists. In that case, InnoDB should rollback
the transaction which holds locks on innodb_table_stats
or innodb_index_stats during trx_rollback_or_clean_recovered().
Problem:
========
InnoDB fails to clean the index stub if it fails to add the
virtual index which contains new virtual column. But it clears
the newly virtual column from index in clear_added_indexes()
during inplace_alter_table. On commit, InnoDB evicts and
reload the table. In case of rollback, it doesn't happen.
InnoDB clears the ABORTED index while opening the table
or doing the DDL. In the mean time, InnoDB can access
the dropped virtual index columns while creating prebuilt
or rollback of concurrent DML.
Solution:
==========
(1) InnoDB should maintain newly added virtual column while
rollbacking the newly added virtual index.
(2) InnoDB must not defer the index removal
if the alter table is executed with LOCK=EXCLUSIVE.
(3) For LOCK=SHARED, InnoDB should check whether the table
has any other transaction lock other than alter transaction
before deferring the index stub.
Replaced has_new_v_col with dict_add_vcol_info in dict_index_t to
indicate whether the index has any new virtual column.
dict_index_t::has_new_v_col(): Returns whether the index has
newly added virtual column, it doesn't say which columns are
newly added virtual column
ha_innobase_inplace_ctx::is_new_vcol(): Return whether the
given column is added as a part of the current alter.
ha_innobase_inplace_ctx::clean_new_vcol_index(): Copy the newly
added virtual column to new_vcol_info in dict_index_t. Replace
the column in the index fields with virtual column stored
in new_vcol_info.
dict_index_t::assign_new_v_col(): Store the number of virtual
column added in index as a part of alter table.
dict_index_t::get_n_new_vcol(): Get the number of newly added
virtual column
dict_index_t::assign_drop_v_col(): Allocate the memory for
adding new virtual column in new_vcol_info.
dict_index_t::add_drop_v_col(): Add the newly added virtual
column in new_vcol_info.
dict_table_t::has_lock_for_other_trx(): Whether the table has
any other transaction lock than given transaction.
row_merge_drop_indexes(): Add parameter alter_trx and check
whether the table has any other lock than alter transaction.
When a table has been evicted from dict_sys and reloaded internally by
InnoDB for FOREIGN KEY processing, statistics may not be initialized,
but nevertheless row_update_cascade_for_mysql() could invoke
dict_stats_update_if_needed(). In that case, we cannot really update
the statistics. For tables that have STATS_PERSISTENT=1 and
STATS_AUTO_RECALC=1, ANALYZE TABLE might have to be executed later.
dict_stats_update_if_needed(): Replace the assertion with
a conditional early return.
The last_updated column of innodb_table_stats and innodb_index_stats
hasn't been DATA_FIXBINARY for many years.
Innodb represents TIMESTAMP as INT of length 4. Let's test it with this
and stop hiding the result in mysql_upgrade test.
Reviewer: Marko
This is a fixup patch for MDEV-23991 afc9d00c66
We really should read result.n_leaf_pages, which was set previously.
Analysis and fix was provided by Jukka Santala. Thanks!
Reviewed by: Marko Mäkelä
The nonnull attribute is not applicable to parameters that are
passed by reference, at least not in the Intel compiler.
Let us remove the reference indirection, which was only there
so that the pointer could be assigned to NULL, and let the
callers perform that task.
row_log_allocate(): Fix a bug in out-of-memory error handling
that would leave a pointer to freed memory.
As noted in commit 0b66d3f70d,
MariaDB does not support CREATE TABLESPACE for InnoDB.
Hence, some code that was added in
commit fec844aca8
and originally in
mysql/mysql-server@c71dd213bd
is unused in MariaDB and should be removed.
Patch removes dict_index_t::stats_latch. Table/index statistics now
protected with dict_sys->mutex. That way statistics computation can
happen in parallel in several threads and dict_sys->mutex will be locked
only for a short period of time.
This patch is a joint work with Marko Mäkelä
dict_index_t:🔒 make mutable which allows to pass const pointer
when only lock is touched in an object
btr_height_get()
btr_get_size(): make index argument const for better type safety
btr_estimate_number_of_different_key_vals(): now returns computed values
instead of setting fields in dict_index_t directly
remove everything related to dict_index_t::stats_latch
dict_stats_index_set_n_diff(): now returns computed values instead
of setting fields in dict_index_t directly
dict_stats_analyze_index(): now returns computed values instead
of setting fields in dict_index_t directly
Reviewed by: Marko Mäkelä
This issue is caused by MDEV-22456 ad6171b91c. Fix involves the backported version of 10.4 patch
MDEV-22778 5f2628d1ee and few parts of
MDEV-17441 (e9a5f288f2).
dict_table_t::stats_latch_created: Removed
dict_table_t::stats_latch: make value member and always lock it for
simplicity even for stats cloned table.
zip_pad_info_t::mutex_created: Removed
zip_pad_info_t::mutex: make member value instead of pointer
os0once.h: Removed
dict_table_remove_from_cache_low(): Ensure that fts_free() is always
called, even if dict_mem_table_free() is deferred until
btr_search_lazy_free().
InnoDB would always zip_pad_info_t::mutex and
dict_table_t::autoinc_mutex, even for tables are not in
ROW_FORMAT=COMPRESSED nor include any AUTO_INCREMENT column.
Marking of deletion of row in fts index happens twice in
self-referential foreign key relation. So while performing
referential checks of foreign key, InnoDB can avoid updating
of fts index if the foreign key has self-referential relationship.
Reviewed-by: Marko Mäkelä
dict_load_table_low(): Copy the 'discarded' flag to file_unreadable.
This allows to avoid a potentially harmful call to dict_stats_init()
in ha_innobase::open().
After DISCARD TABLESPACE, the tablespace of a table will no longer
exist, and dict_get_and_save_data_dir_path() would invoke
dict_get_first_path() to read an entry from SYS_DATAFILES.
For some reason, DISCARD TABLESPACE would not to remove the entry
from there.
dict_get_and_save_data_dir_path(): If the tablespace has been
discarded, do not bother trying to read the name.
Side note: The tables SYS_TABLESPACES and SYS_DATAFILES are
redundant and subject to removal in MDEV-22343.
dict_foreign_qualify_index(): Reject corrupted or garbage indexes.
For index stubs that are created on virtual columns, no
dict_field_t::col would be assign. Instead, the entire table
definition would be reloaded on a successful operation.
buf_page_create() is invoked when page is initialized. So that
previous contents of the page ignored. In few cases, it calls
buf_page_get_gen() is called to fetch the page from buffer pool.
It should take x-latch on the page. If other thread uses the block
or block io state is different from BUF_IO_NONE then release the
mutex and check the state and buffer fix count again. For compressed
page, use the existing free block from LRU list to create new page.
Retry to fetch the compressed page if it is in flush list
fseg_create(), fseg_create_general(): Introduce block as a parameter
where segment header is placed. It is used to avoid repetitive
x-latch on the same page
Change the assert to check whether the page has SX latch and
X latch in all callee function of buf_page_create()
mtr_t::get_fix_count(): Get the buffer fix count of the given
block added by the mtr
FindBlock is added to find the buffer fix count of the given
block acquired by the mini-transaction
Problem:
========
dict_load_table_one() doesn't handle the scenario where clustered
index page is FIL_NULL when DICT_ERR_IGNORE_INDEX_ROOT mode
is set.
Fix:
====
InnoDB should set the file_unreadable when it can't find the
clustered index root page.
MemorySanitizer (clang -fsanitize=memory) requires that all code
be compiled with instrumentation enabled. The only exception is the
C runtime library. Failure to use instrumented libraries will cause
bogus messages about memory being uninitialized.
In WITH_MSAN builds, we must avoid calling getservbyname(),
because even though it is a standard library function, it is
not instrumented, not even in clang 10.
Note: Before MariaDB Server 10.5, ./mtr will typically fail
due to the old PCRE library, which was updated in MDEV-14024.
The following cmake options were tested on 10.5
in commit 94d0bb4dbe:
cmake \
-DCMAKE_C_FLAGS='-march=native -O2' \
-DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \
-DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \
-DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF \
-DWITH_{ZLIB,SSL,PCRE}=bundled \
-DHAVE_LIBAIO_H=0 \
-DWITH_MSAN=ON
MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED()
and __msan_unpoison().
MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for
VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow().
InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros.
ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in
functions instead of inline assembler when building WITH_MSAN.
This will require at least -msse4.2 when building for IA-32 or AMD64.
The inline assembler would not be instrumented, and would thus cause
bogus failures.
We are supposed to commit and restart the mini-transaction
between records. There is no point to store and restore the
persistent cursor position otherwise.
If buf_page_optimistic_get() is patched to always fail, the
debug build would fail to start up due to trying to re-acquire
an already S-latched block.
This bug (which should not have visible impact to users, because
the code is only executed during startup, while no other threads
are accessing B-trees or causing pages to be evicted from the
buffer pool) was caught as part of a debugging effort for
something else.
The debugging approach was: Make buf_page_optimistic_get()
always return FALSE, and add ut_a(block->lock.lock_word == X_LOCK_DECR)
to both buf_LRU_get_free_only() and buf_LRU_block_free_non_file_page().
This would catch misuse of the buffer pool. If it were not for
buf_page_optimistic_get(), no buf_block_t::lock of any BUF_BLOCK_NOT_USED
block would ever be acquired.
Introduce a new ATTRIBUTE_NOINLINE to
ib::logger member functions, and add UNIV_UNLIKELY hints to callers.
Also, remove some crash reporting output. If needed, the
information will be available using debugging tools.
Furthermore, remove some fts_enable_diag_print output that included
indexed words in raw form. The code seemed to assume that words are
NUL-terminated byte strings. It is not clear whether a NUL terminator
is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
typically contain many NUL bytes.
This is a regression due to MDEV-16376
commit 8dc70c862b.
To make dict_index_t::detach_columns() idempotent,
we cleared dict_index_t::n_fields. But, this could
cause trouble with purge after a secondary index
creation failed (not even involving virtual columns).
A better way is to clear the dict_field_t::col pointers
that point to virtual columns that are being freed
due to aborting index creation on an index that depends
on a virtual column.
Note: the v_cols[] of an existing dict_table_t object will
never be modified. If any virtual columns are added or removed,
ha_innobase::commit_inplace_alter_table() would invoke
dict_table_remove_from_cache() and reload the table to dict_sys.
Index creation is a special case where the dict_index_t points
to virtual columns that do not yet exist in dict_table_t.
If the InnoDB buffer pool contains many pages for a table or index
that is being dropped or rebuilt, and if many of such pages are
pointed to by the adaptive hash index, dropping the adaptive hash index
may consume a lot of time.
The time-consuming operation of dropping the adaptive hash index entries
is being executed while the InnoDB data dictionary cache dict_sys is
exclusively locked.
It is not actually necessary to drop all adaptive hash index entries
at the time a table or index is being dropped or rebuilt. We can let
the LRU replacement policy of the buffer pool take care of this gradually.
For this to work, we must detach the dict_table_t and dict_index_t
objects from the main dict_sys cache, and once the last
adaptive hash index entry for the detached table is removed
(when the garbage page is evicted from the buffer pool) we can free
the dict_table_t and dict_index_t object.
Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE
skip both the buffer pool eviction and the drop of the adaptive hash index.
We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE.
We can remove the eviction from DROP TABLE. We must retain the eviction
in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the
discarded table is being re-imported with the same tablespace identifier,
the fresh data from the imported tablespace will replace any stale pages
in the buffer pool.
rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can
no longer be interrupted inside InnoDB.
fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(),
fseg_free_page_low(), fseg_free_extent(): Remove the parameter
that specifies whether the adaptive hash index should be dropped.
btr_search_lazy_free(): Lazily free an index when the last
reference to it is dropped from the adaptive hash index.
buf_pool_clear_hash_index(): Declare static, and move to the
same compilation unit with the bulk of the adaptive hash index
code.
dict_index_t::clone(), dict_index_t::clone_if_needed():
Clone an index that is being rebuilt while adaptive hash index
entries exist. The original index will be inserted into
dict_table_t::freed_indexes and dict_index_t::set_freed()
will be called.
dict_index_t::set_freed(), dict_index_t::freed(): Note that
or check whether the index has been freed. We will use the
impossible page number 1 to denote this condition.
dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().
dict_index_t::detach_columns(): Move the assignment n_fields=0
to ha_innobase_inplace_ctx::clear_added_indexes().
We must have access to the columns when freeing the
adaptive hash index. Note: dict_table_t::v_cols[] will remain
valid. If virtual columns are dropped or added, the table
definition will be reloaded in ha_innobase::commit_inplace_alter_table().
buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.
We will also reduce the number of btr_get_search_latch() calls
and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT
in order to benefit cmake -DWITH_INNODB_AHI=OFF.
innobase_get_charset(), innobase_get_stmt_safe(): Remove.
It is more efficient and readable to invoke thd_charset()
and thd_query_safe() directly, without a non-inlined wrapper function.
As part of the SPATIAL INDEX implementation in InnoDB,
dict_index_t was expanded by a rtr_ssn_t field. There are only
3 operations for this field, all protected by rtr_ssn_t::mutex:
* btr_cur_search_to_nth_level() stores the least significant 32 bits
of the 64-bit value that is stored in the index root page.
(This would better be done when the table is opened for the
very first time.)
* rtr_get_new_ssn_id() increments the value by 1.
* rtr_get_current_ssn_id() reads the current value.
All these operations can be implemented equally safely by using
atomic memory access operations.
FOREIGN_KEY_CHECKS is disabled
- Referenced index can be null While renaming the referenced column name.
In that case, rename the referenced column name in dict_foreign_t and
find the equivalent referenced index.
dict_stats_update_if_needed(): Replace the parameter THD*
with const trx_t& so that trx_t::is_wsrep() can be invoked
instead of the more expensive wsrep_on().
Replace also other occurrences of wsrep_on() with trx_t::is_wsrep().
FOREIGN_KEY_CHECKS is disabled
- dict_foreign_find_index() can return NULL if InnoDB already dropped
the foreign index when FOREIGN_KEY_CHECKS is disabled.
fil_delete_tablespace(): Remove the unused parameter drop_ahi,
and add the parameter if_exists=false. We want to suppress
error messages if we know that the tablespace has been discarded.
dict_table_rename_in_cache(): Pass the new parameter to
fil_delete_tablespace(), that is, do not complain about
missing tablespace if the tablespace has been discarded.
row_make_new_pathname(): Declare as static.
row_drop_table_for_mysql(): Tolerate !table->data_dir_path
when the tablespace has been discarded.
row_rename_table_for_mysql(): Skip part of the RENAME TABLE
when fil_space_get_first_path() returns NULL.
All tablespace metadata is buffered in fil_system. There is a LRU
mechanism, but that only controls the opening and closing of
fil_node_t::handle.
It is much more efficient and less error-prone to access data file names
by looking up the fil_space_t object rather than by essentially joining
each row with an access to SYS_DATAFILES via the InnoDB internal SQL parser.
dict_get_first_path(): Declare static. The function may only be needed
when loading or updating the data dictionary. Also, change a condition
in order to avoid a bogus GCC 10 -Wstringop-overflow warning for
mem_strdupl() about len==ULINT_UNDEFINED.
i_s_sys_tablespaces_fill_table(): Do not access other InnoDB internal
dictionary tables than SYS_TABLESPACES.
MySQL 5.7.29 includes the following fix:
Bug #30287668 INNODB: A LONG SEMAPHORE WAIT
mysql/mysql-server@5cdbb22b51
There is no test case. It seems that the problem could occur when
a spatial index is large and peculiar enough so that multiple R-tree
leaf pages will have the exactly same maximum bounding rectangle (MBR).
The commit message suggests that the hang can occur when R-tree
non-leaf pages are being merged, which should only be possible
during transaction rollback or the purge of transaction history,
when the R-tree index is at least 2 levels high and very many records
are being deleted. The message says that a comparison result that two
spatial index node pointer records are equal will cause an infinite loop
in rtr_page_copy_rec_list_end_no_locks(). Hence, we must include the
child page number in the comparison to be consistent with
mysql/mysql-server@2e11fe0e15.
We fix this bug in a simpler way, involving fewer code changes.
cmp_rec_rec(): Renamed from cmp_rec_rec_with_match().
Assert that rec2 always resides in an index page.
Treat non-leaf spatial index pages specially.
offset_t: this is a type which represents one record offset.
It's unsigned short int.
a lot of functions: replace ulint with offset_t
btr_pcur_restore_position_func(),
page_validate(),
row_ins_scan_sec_index_for_duplicate(),
row_upd_clust_rec_by_insert_inherit_func(),
row_vers_impl_x_locked_low(),
trx_undo_prev_version_build():
allocate record offsets on the stack instead of waiting for rec_get_offsets()
to allocate it from mem_heap_t. So, reducing memory allocations.
RECORD_OFFSET, INDEX_OFFSET:
now it's less convenient to store pointers in offset_t*
array. One pointer occupies now several offset_t. And those constant are start
indexes into array to places where to store pointer values
REC_OFFS_HEADER_SIZE: adjusted for the new reality
REC_OFFS_NORMAL_SIZE:
increase size from 100 to 300 which means less heap allocations.
And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which
is smaller than previous 800 bytes.
REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality
rem0rec.h, rem0rec.ic, rem0rec.cc:
various arguments, return values and local variables types were changed to
fix numerous integer conversions issues.
enum field_type_t:
offset types concept was introduces which replaces old offset flags stuff.
Like in earlier version, 2 upper bits are used to store offset type.
And this enum represents those types.
REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed
get_type(), set_type(), get_value(), combine():
these are convenience functions to work with offsets and it's types
rec_offs_base()[0]:
still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL
rec_offs_base()[i]:
these have type offset_t now. Two upper bits contains type.
ut_rnd_interval(): Remove the first parameter, which was mostly
passed as 0. Implement as a simple wrapper around ut_rnd_gen().
Trivially return 0 if the size of the interval is smaller than 2.
ut_rnd_ulint_counter, ut_rnd_gen_next_ulint(), ut_rnd_gen_ulint(): Remove.
buf_read_ibuf_merge_pages(): Discard any page numbers that are
outside the current bounds of the tablespace, by invoking the
function ibuf_delete_recs() that was introduced in MDEV-20934.
This could avoid an infinite change buffer merge loop on
innodb_fast_shutdown=0, because normally the change buffer merge
would only be attempted if a page was successfully loaded into
the buffer pool.
dict_drop_index_tree(): Add the parameter trx_t*.
To prevent the DROP TABLE crash, do not invoke btr_free_if_exists()
if the entire .ibd file will be dropped. Thus, we will avoid a crash
if the BTR_SEG_LEAF or BTR_SEG_TOP of the index is corrupted,
and we will also avoid unnecessarily accessing the to-be-dropped
tablespace via the buffer pool.
In MariaDB 10.2, we disable the DROP TABLE fix if innodb_safe_truncate=0,
because the backup-unsafe MySQL 5.7 WL#6501 form of TRUNCATE TABLE
requires that the individual pages be freed inside the tablespace.
Move row size check to early CREATE/ALTER TABLE phase. Stop checking
on table open.
dict_index_add_to_cache(): remove parameter 'strict', stop checking row size
dict_index_t::record_size_info_t: this is a result of row size check operation
create_table_info_t::row_size_is_acceptable(): performs row size check.
Issues error or warning. Writes first overflow field to InnoDB log.
create_table_info_t::create_table(): add row size check
dict_index_t::record_size_info(): this is a refactored version
of dict_index_t::rec_potentially_too_big(). New version doesn't change global
state of a program but return all interesting info. And it's callers who
decide how to handle row size overflow.
dict_index_t::rec_potentially_too_big(): removed
page_rec_write_field(): Remove.
dict_create_index_tree_step(): If the SYS_INDEXES.PAGE does not change,
do not update it in the data dictionary. Typically, all index page numbers
would be unchanged before and after IMPORT TABLESPACE, except if some
secondary indexes were created after loading some data.
btr_root_fseg_adjust_on_import(): Remove the redundant mtr_t* parameter.
Redo logging is disabled during the page adjustments that IMPORT TABLESPACE
is performing.
dict_index_add_to_cache(): Make the 'index' a reference to a pointer,
so that the caller will avoid the expensive call to
dict_index_get_if_in_cache_low().
InnoDB: Assertion failure in file .../dict/dict0dict.cc line ...
InnoDB: Failing assertion: table->can_be_evicted
This fixes a regression that was caused by the fix of MDEV-20621
(commit a41d429765).
MySQL 5.6 (and MariaDB 10.0) introduced eviction of tables from
the InnoDB data dictionary cache. Tables that are connected to
FOREIGN KEY constraints or FULLTEXT INDEX are exempt of the eviction.
With the problematic change, a table that would already be exempt
from eviction due to FOREIGN KEY would cause the problem if there
also was a FULLTEXT INDEX defined on it.
dict_load_table(): Only prevent eviction if table->can_be_evicted holds.
dict_table_rename_in_cache(): Use strcpy() instead of strncpy(),
because they are known to be equivalent in this case (the length
of old_name was already validated).
mariabackup: Invoke strncpy() with one less than the buffer size,
and explicitly add NUL as the last byte of the buffer.