In commit fe39d02f51 (MDEV-20638)
we removed some wake-up signaling of the master thread that should
have been there, to ensure a steady log checkpointing workload.
Common sense suggests that the commit omitted some necessary calls
to srv_inc_activity_count(). But, an attempt to add the call to
trx_flush_log_if_needed_low() as well as to reinstate the function
innobase_active_small() did not restore the performance for the
case where sync_binlog=1 is set.
Therefore, we will revert the entire commit in MariaDB Server 10.2.
In MariaDB Server 10.5, adding a srv_inc_activity_count() call to
trx_flush_log_if_needed_low() did restore the performance, so we
will not revert MDEV-20638 across all versions.
InnoDB transaction rollback includes an unnecessary work-around for
a data corruption bug that was fixed by me in MySQL 5.6.12
mysql/mysql-server@935ba09d52
and ported to MariaDB 10.0.8 by
commit c291ddfdf7
in 2013 and 2014, respectively.
By acquiring and releasing dict_operation_lock in shared mode,
row_undo() hopes to prevent the table from being dropped while
the undo log record is being rolled back. But, thanks to mentioned fix,
debug assertions (that we are adding) show that the rollback is
protected by transactional locks (table IX lock, in addition to
implicit or explicit exclusive locks on the records that had been modified).
Because row_drop_table_for_mysql() would invoke
row_add_table_to_background_drop_list() if any locks exist on the table,
the mere existence of locks (which is guaranteed during ROLLBACK) is
enough to protect the table from disappearing. Hence, acquiring and
releasing dict_operation_lock for every row that is being rolled back is
unnecessary.
row_undo(): Remove the unnecessary acquisition and release of
dict_operation_lock.
Note: row_add_table_to_background_drop_list() is mostly working around
bugs outside InnoDB:
MDEV-21175 (insufficient MDL protection of FOREIGN KEY operations)
MDEV-21602 (incorrect error handling of CREATE TABLE...SELECT).
Problem:
=======
InnoDB drops the column which has foreign key relations on it. So it
tries to load the foreign key during rename process of copy algorithm
even though the foreign_key_check is disabled.
Solution:
========
During alter copy algorithm, InnoDB ignores the error while loading
the foreign key constraint if foreign key check is disabled. It
should throw the warning about failure of the foreign key constraint
when foreign key check is disabled.
row_vers_impl_x_locked_low(): clust_offsets may point to memory
that is allocated by mem_heap_alloc() and may have been freed.
For initializing clust_offsets, try to use the stack-allocated
buffer instead of a pointer that may point to freed memory.
This fixes a regression that was introduced in
commit f0aa073f2b (MDEV-20950).
The fix of MDEV-13654 (commit ff81faf670)
wrongly caused ADD PRIMARY KEY to ignore duplicate PRIMARY KEY values
caused by concurrent DML transactions that had been started before the
ALTER TABLE operation (but did not access the table before the ALTER TABLE
started).
row_ins_duplicate_online(): Always report a duplicate key error
if DB_TRX_ID had been reset (it belongs to a transaction that had
started before the ALTER TABLE operation).
- Due to commit fe95cb2e40 (MDEV-16125),
InnoDB master thread does not need to call srv_resume_thread()
and therefore there is no need to wake up the thread.
Due to the above patch, InnoDB should remove the following dead code.
srv_check_activity(): Makes the parameter as in,out and returns the
recent activity value
innobase_active_small(): Removed
srv_active_wake_master_thread(): Removed
srv_wake_master_thread(): Removed
srv_active_wake_master_thread_low(): Removed
Simplify srv_master_thread() and remove switch cases, added the assert.
Replace srv_wake_master_thread() with srv_inc_activity_count()
INNOBASE_WAKE_INTERVAL: Removed
Fix stale virtual field value in 4 cases: when virtual field depends
on row_start/row_end in timestamp/trx_id versioned table. row_start
dep is recalculated in vers_update_fields() (SQL and InnoDB
layer). row_end dep is recalculated on history row insert.
make_versioned_helper() appended new update field unconditionally
while it should check if this field already exists in update vector.
Misc renames to conform versioning prefix. vers_update_fields() name
conforms with sql layer TABLE::vers_update_fields().
In AddressSanitizer, we only want memory poisoning to happen
in connection with custom memory allocation or freeing.
The primary use of MEM_UNDEFINED is for declaring memory uninitialized
in Valgrind or MemorySanitizer. We do not want MEM_UNDEFINED to
have the unwanted side effect that AddressSanitizer would no longer
be able to complain about accessing unallocated memory.
MEM_UNDEFINED(): Define as no-op for AddressSanitizer.
MEM_MAKE_ADDRESSABLE(): Define as MEM_UNDEFINED() or
ASAN_UNPOISON_MEMORY_REGION().
MEM_CHECK_ADDRESSABLE(): Wrap also __asan_region_is_poisoned().
- Some of the bug fixes are backports from 10.5!
- The fix in innobase/fil/fil0fil.cc is just a backport to get less
error messages in mysqld.1.err when running with valgrind.
- Renamed HAVE_valgrind_or_MSAN to HAVE_valgrind
MemorySanitizer (clang -fsanitize=memory) requires that all code
be compiled with instrumentation enabled. The only exception is the
C runtime library. Failure to use instrumented libraries will cause
bogus messages about memory being uninitialized.
In WITH_MSAN builds, we must avoid calling getservbyname(),
because even though it is a standard library function, it is
not instrumented, not even in clang 10.
Note: Before MariaDB Server 10.5, ./mtr will typically fail
due to the old PCRE library, which was updated in MDEV-14024.
The following cmake options were tested on 10.5
in commit 94d0bb4dbe:
cmake \
-DCMAKE_C_FLAGS='-march=native -O2' \
-DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \
-DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \
-DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF \
-DWITH_{ZLIB,SSL,PCRE}=bundled \
-DHAVE_LIBAIO_H=0 \
-DWITH_MSAN=ON
MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED()
and __msan_unpoison().
MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for
VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow().
InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros.
ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in
functions instead of inline assembler when building WITH_MSAN.
This will require at least -msse4.2 when building for IA-32 or AMD64.
The inline assembler would not be instrumented, and would thus cause
bogus failures.
Problem:
========
- InnoDB clears the fts resource when last FTS index is being dropped
if the table has user defined FTS_DOC_ID. While creating the new fts
index, InnoDB expects to have FTS resources.
Fix:
===
fts_drop_index(): Removed the fts resource clear.
fts_clear_all(): Clear the fts resource when there are no new fts
index to be added.
commit_cache_norebuild(), row_merge_drop_indexes():
Tries to call fts resource after removing associated fts index
from table object
The background drop table queue in InnoDB is a work-around for
cases where the SQL layer is requesting DDL on tables on which
transactional locks exist.
One such case are XA transactions. Our test case exploits the
fact that the recovery of XA PREPARE transactions will
only resurrect InnoDB table locks, but not MDL that should
block any concurrent DDL.
srv_shutdown_t: Introduce the srv_shutdown_state=SRV_SHUTDOWN_INITIATED
for the initial part of shutdown, to wait for the background drop
table queue to be emptied.
srv_shutdown_bg_undo_sources(): Assign
srv_shutdown_state=SRV_SHUTDOWN_INITIATED
before waiting for the background drop table queue to be emptied.
row_drop_tables_for_mysql_in_background(): On slow shutdown, if
no active transactions exist (excluding ones that are in
XA PREPARE state), skip any tables on which locks exist.
row_drop_table_for_mysql(): Do not unnecessarily attempt to
drop InnoDB persistent statistics for tables that have
already been added to the background drop table queue.
row_mysql_close(): Relax an assertion, and free all memory
even if innodb_force_recovery=2 would prevent the background
drop table queue from being emptied.
Introduce a new ATTRIBUTE_NOINLINE to
ib::logger member functions, and add UNIV_UNLIKELY hints to callers.
Also, remove some crash reporting output. If needed, the
information will be available using debugging tools.
Furthermore, remove some fts_enable_diag_print output that included
indexed words in raw form. The code seemed to assume that words are
NUL-terminated byte strings. It is not clear whether a NUL terminator
is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
typically contain many NUL bytes.
- During column reorder table rebuild, rollback of insert fails.
Reason is that InnoDB tries to fetch the column position from
new clustered index and it exceeds default column value tuple fields.
So InnoDB should use the table column position while searching for
defaults column value.
ins_node_create_entry_list(): Create dummy empty tuples for
corrupted or incomplete indexes, to avoid dereferencing a NULL
dict_field_t::col pointer in row_build_index_entry_low().
This issue was found by a crash in the test gcol.innodb_virtual_basic
when merging the fix to 10.5.
This is a regression due to the cleanup
commit 12f804acfa.
row_sel_open_pcur(): Remove the unnecessary parameter.
It suffices for us to acquire the adaptive hash index latch
only when btr_search_guess_on_hash() is called by
btr_cur_search_to_nth_level_func(), in
btr_pcur_open_with_no_init().
This code seems to be a relic from the times when there was
only one btr_search_latch, which was held in shared mode
for longer periods of time. Another relic of that era was
removed in commit e5980bf1b1.
This clean-up was missed when the btr_search_latch was split in
mysql/mysql-server/commit@ab17ab91ce18a47bb6c5c49e4dc0505ad488a448
(MySQL 5.7.8).
Problem:
=======
- During alter rebuild, document read from old table is tokenzied
parallelly by innodb_ft_sort_pll_degree threads and stores it
in respective merge files. While doing the parallel merge, InnoDB
wrongly skips the root level selection of merging buffer records.
So it leads to insertion of merge records in non-ascending order.
Solution:
==========
Build selection tree for the root level also. So that root of
selection tree can always contain sorted buffer.
If the InnoDB buffer pool contains many pages for a table or index
that is being dropped or rebuilt, and if many of such pages are
pointed to by the adaptive hash index, dropping the adaptive hash index
may consume a lot of time.
The time-consuming operation of dropping the adaptive hash index entries
is being executed while the InnoDB data dictionary cache dict_sys is
exclusively locked.
It is not actually necessary to drop all adaptive hash index entries
at the time a table or index is being dropped or rebuilt. We can let
the LRU replacement policy of the buffer pool take care of this gradually.
For this to work, we must detach the dict_table_t and dict_index_t
objects from the main dict_sys cache, and once the last
adaptive hash index entry for the detached table is removed
(when the garbage page is evicted from the buffer pool) we can free
the dict_table_t and dict_index_t object.
Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE
skip both the buffer pool eviction and the drop of the adaptive hash index.
We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE.
We can remove the eviction from DROP TABLE. We must retain the eviction
in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the
discarded table is being re-imported with the same tablespace identifier,
the fresh data from the imported tablespace will replace any stale pages
in the buffer pool.
rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can
no longer be interrupted inside InnoDB.
fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(),
fseg_free_page_low(), fseg_free_extent(): Remove the parameter
that specifies whether the adaptive hash index should be dropped.
btr_search_lazy_free(): Lazily free an index when the last
reference to it is dropped from the adaptive hash index.
buf_pool_clear_hash_index(): Declare static, and move to the
same compilation unit with the bulk of the adaptive hash index
code.
dict_index_t::clone(), dict_index_t::clone_if_needed():
Clone an index that is being rebuilt while adaptive hash index
entries exist. The original index will be inserted into
dict_table_t::freed_indexes and dict_index_t::set_freed()
will be called.
dict_index_t::set_freed(), dict_index_t::freed(): Note that
or check whether the index has been freed. We will use the
impossible page number 1 to denote this condition.
dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().
dict_index_t::detach_columns(): Move the assignment n_fields=0
to ha_innobase_inplace_ctx::clear_added_indexes().
We must have access to the columns when freeing the
adaptive hash index. Note: dict_table_t::v_cols[] will remain
valid. If virtual columns are dropped or added, the table
definition will be reloaded in ha_innobase::commit_inplace_alter_table().
buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.
We will also reduce the number of btr_get_search_latch() calls
and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT
in order to benefit cmake -DWITH_INNODB_AHI=OFF.
This essentially reverts commit b393e2cb0c.
The leak might have been fixed, but because the
DEBUG_SYNC instrumentation for InnoDB purge threads was reverted
in 10.5 commit 5e62b6a5e0
as part of introducing a thread pool, it is easiest to revert
the entire change.
During the UPDATE of PRIMARY KEY columns, we may miscalculate the
size of the clustered index record.
row_upd_clust_rec_by_insert(): Pass the total number of off-page columns,
which may include such columns that were inherited from the record
and not created as part of the UPDATE operation.
This is based on
mysql/mysql-server@490c45e8c8
which is a follow-up to
mysql/mysql-server@1fa475b85d
which we filed and fixed as MDEV-21511.
No test case was provided by Oracle.
dict_stats_update_if_needed(): Replace the parameter THD*
with const trx_t& so that trx_t::is_wsrep() can be invoked
instead of the more expensive wsrep_on().
Replace also other occurrences of wsrep_on() with trx_t::is_wsrep().
The function wsrep_on() was being called rather frequently
in InnoDB and XtraDB. Let us cache it in trx_t and invoke
trx_t::is_wsrep() instead.
innobase_trx_init(): Cache trx->wsrep = wsrep_on(thd).
ha_innobase::write_row(): Replace many repeated calls to current_thd,
and test the cheapest condition first.
row_vers_vc_matches_cluster(): Remove the parameter in_purge,
which was always passed as in_purge=true.
This parameter became constant in
mysql/mysql-server@1dec14d346
and it always was constant in MariaDB starting from the
introduction of the function in
commit 2e814d4702 (MariaDB 10.2.2).
row_prebuilt_free(): Do not attempt to drop orphan indexes
that might have been left behind by a failed ADD UNIQUE INDEX.
This avoids the execution of unwanted transactions during shutdown.
After MDEV-12353, the consistency check that I originally added for
commit 1b9fe0bbac
(InnoDB Plugin for MySQL 5.1) started randomly failing.
It turns out that the IMPORT TABLESPACE code was always incorrect:
it did not update the (redundantly stored) tablespace ID
in index tree root pages. It only does that for page headers
and BLOB pointers.
PageConverter::update_index_page(): Update the tablespace ID
in the BTR_SEG_TOP and BTR_SEG_LEAF of index root pages.
This is a backport of commit b8b3edff13.
Improve the test that was imported and adapted for MariaDB in
commit fb217449dc.
row_undo_step(): Move the DEBUG_SYNC point from trx_rollback_for_mysql().
This DEBUG_SYNC point is executed after rolling back one row.
trx_rollback_for_mysql(): Clarify the comments that describe the scenario,
and remove the DEBUG_SYNC point.
If the statement "if (trx->has_logged_persistent())" and its body are
removed from trx_rollback_for_mysql(), then the test
innodb.xa_recovery_debug will fail because the transaction would still
exist in the XA PREPARE state. If we allow the XA COMMIT statement
to succeed in the test, we would observe an incorrect state of the
XA transaction where the table would contain row (1,NULL). Depending
on whether the XA transaction was committed, the table should either
be empty or contain the record (1,1). The intermediate state of
(1,NULL) should never be observed after completed recovery.
Making a linked list of dtuple_t is needed only for inserting
records. It's better to store tuples in a non-intrusive
container to not affect all other use cases of dtuple_t
dtuple_t::tuple_list: removed, it was 2 * sizeof(void*) bytes
ins_node_t::entry_list: now it's std::vector<dtuple_t*>
ins_node_t::entry: now it's std::vector<dtuple_t*>::iterator
DBUG_EXECUTE_IF("row_ins_skip_sec": this dead code removed