Introduce a new ATTRIBUTE_NOINLINE to
ib::logger member functions, and add UNIV_UNLIKELY hints to callers.
Also, remove some crash reporting output. If needed, the
information will be available using debugging tools.
Furthermore, remove some fts_enable_diag_print output that included
indexed words in raw form. The code seemed to assume that words are
NUL-terminated byte strings. It is not clear whether a NUL terminator
is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
typically contain many NUL bytes.
Problem:
========
During buffer pool resizing, InnoDB recreates the dictionary hash
tables. Dictionary hash table reuses the heap of AHI hash tables.
It leads to memory corruption.
Fix:
====
- While disabling AHI, free the heap and AHI hash tables. Recreate the
AHI hash tables and assign new heap when AHI is enabled.
- btr_blob_free() access invalid page if page was reallocated during
buffer poolresizing. So btr_blob_free() should get the page from
buf_pool instead of using existing block.
- btr_search_enabled and block->index should be checked after
acquiring the btr_search_sys latch
- Moved the buffer_pool_scan debug sync to earlier before accessing the
btr_search_sys latches to avoid the hang of truncate_purge_debug
test case
- srv_printf_innodb_monitor() should acquire btr_search_sys latches
before AHI hash tables.
namespace intrusive: removed
split class into two: ilist<T> and sized_ilist<T> which has a size field.
ilist<T> no more NULLify pointers to bring a slignly better performance.
As a consequence, fil_space_t::is_in_unflushed_spaces and
fil_space_t::is_in_rotation_list boolean members are needed now.
There was a race condition where the connection of the
victim of a KILL statement is disconnected while the
KILL statement is executing.
As a side effect of this fix, we will make XA PREPARE
transactions immune to KILL statements.
Starting with MariaDB 10.2, we have a pool of trx_t objects.
trx_free() would only free memory to the pool. We poison the
contents of freed objects in the pool in order to catch misuse.
trx_free(): Unpoison also trx->mysql_thd and trx->state.
This is to counter the poisoning of *trx in trx_pools->mem_free().
Unpoison only on AddressSanitizer or Valgrind, but not on MemorySanitizer.
Pool: Unpoison allocated objects only on AddressSanitizer or
Valgrind, but not on MemorySanitizer.
innobase_kill_query(): Properly protect trx, acquiring also
trx_sys_t::mutex and checking trx_t::mysql_thd and trx_t::state.
This is a regression due to MDEV-16376
commit 8dc70c862b.
To make dict_index_t::detach_columns() idempotent,
we cleared dict_index_t::n_fields. But, this could
cause trouble with purge after a secondary index
creation failed (not even involving virtual columns).
A better way is to clear the dict_field_t::col pointers
that point to virtual columns that are being freed
due to aborting index creation on an index that depends
on a virtual column.
Note: the v_cols[] of an existing dict_table_t object will
never be modified. If any virtual columns are added or removed,
ha_innobase::commit_inplace_alter_table() would invoke
dict_table_remove_from_cache() and reload the table to dict_sys.
Index creation is a special case where the dict_index_t points
to virtual columns that do not yet exist in dict_table_t.
error is logged
The fix is to set flag in ib::error::~error() and check it in
mariabackup.
ib::error::error() is replaced with ib::warn::warn() in
AIO::linux_create_io_ctx() because of two reasons:
1) if we leave it as is, then mariabackup MTR tests will fail with --mem
option, because Linux AIO can not be used on tmpfs,
2) when Linux AIO can not be initialized, InnoDB falls back to simulated
AIO, so such sutiation is not fatal error, it should be treated as warning.
If the InnoDB buffer pool contains many pages for a table or index
that is being dropped or rebuilt, and if many of such pages are
pointed to by the adaptive hash index, dropping the adaptive hash index
may consume a lot of time.
The time-consuming operation of dropping the adaptive hash index entries
is being executed while the InnoDB data dictionary cache dict_sys is
exclusively locked.
It is not actually necessary to drop all adaptive hash index entries
at the time a table or index is being dropped or rebuilt. We can let
the LRU replacement policy of the buffer pool take care of this gradually.
For this to work, we must detach the dict_table_t and dict_index_t
objects from the main dict_sys cache, and once the last
adaptive hash index entry for the detached table is removed
(when the garbage page is evicted from the buffer pool) we can free
the dict_table_t and dict_index_t object.
Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE
skip both the buffer pool eviction and the drop of the adaptive hash index.
We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE.
We can remove the eviction from DROP TABLE. We must retain the eviction
in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the
discarded table is being re-imported with the same tablespace identifier,
the fresh data from the imported tablespace will replace any stale pages
in the buffer pool.
rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can
no longer be interrupted inside InnoDB.
fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(),
fseg_free_page_low(), fseg_free_extent(): Remove the parameter
that specifies whether the adaptive hash index should be dropped.
btr_search_lazy_free(): Lazily free an index when the last
reference to it is dropped from the adaptive hash index.
buf_pool_clear_hash_index(): Declare static, and move to the
same compilation unit with the bulk of the adaptive hash index
code.
dict_index_t::clone(), dict_index_t::clone_if_needed():
Clone an index that is being rebuilt while adaptive hash index
entries exist. The original index will be inserted into
dict_table_t::freed_indexes and dict_index_t::set_freed()
will be called.
dict_index_t::set_freed(), dict_index_t::freed(): Note that
or check whether the index has been freed. We will use the
impossible page number 1 to denote this condition.
dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().
dict_index_t::detach_columns(): Move the assignment n_fields=0
to ha_innobase_inplace_ctx::clear_added_indexes().
We must have access to the columns when freeing the
adaptive hash index. Note: dict_table_t::v_cols[] will remain
valid. If virtual columns are dropped or added, the table
definition will be reloaded in ha_innobase::commit_inplace_alter_table().
buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.
We will also reduce the number of btr_get_search_latch() calls
and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT
in order to benefit cmake -DWITH_INNODB_AHI=OFF.
This essentially reverts commit b393e2cb0c.
The leak might have been fixed, but because the
DEBUG_SYNC instrumentation for InnoDB purge threads was reverted
in 10.5 commit 5e62b6a5e0
as part of introducing a thread pool, it is easiest to revert
the entire change.
innobase_get_charset(), innobase_get_stmt_safe(): Remove.
It is more efficient and readable to invoke thd_charset()
and thd_query_safe() directly, without a non-inlined wrapper function.
As part of the SPATIAL INDEX implementation in InnoDB,
dict_index_t was expanded by a rtr_ssn_t field. There are only
3 operations for this field, all protected by rtr_ssn_t::mutex:
* btr_cur_search_to_nth_level() stores the least significant 32 bits
of the 64-bit value that is stored in the index root page.
(This would better be done when the table is opened for the
very first time.)
* rtr_get_new_ssn_id() increments the value by 1.
* rtr_get_current_ssn_id() reads the current value.
All these operations can be implemented equally safely by using
atomic memory access operations.
The test was incompatible with ./mtr --repeat=2 until
commit 2d6719d7ee
fixed that.
It turns out that the failing assertion that we disabled in
commit 3db94d2403
is bogus and can fail when the change buffer is emptied
during the last batch of crash recovery. The reason for this
is the condition around the page_create_empty() call in
page_cur_delete_rec(). The condition was removed in MariaDB
Server 10.5 as part of MDEV-12353, in
commit 7ae21b18a6 and
commit f8a9f90667.
The bug that the assertion aimed to catch is MDEV-22497, which
was fixed in commit 26aab96ecf.
This is a partial backport of
commit 5e7e7153b4 from 10.4.
assert_trx_is_free(): Assert !is_wsrep().
trx_init(): Do not initialize trx->wsrep, because it must have been
initialized already.
trx_commit_in_memory(): Invoke wsrep_commit_ordered(). This call
was being skipped, because the transaction object had already been
freed to the pool.
trx_rollback_for_mysql(), innobase_commit_low(),
innobase_rollback_trx(): Always reset trx->wsrep.
Several MYSQL_SYSVAR_STR parameters that employ both a validate
function callback fail to copy the string for saving the
validated value. The affected variables include the following:
innodb_ft_aux_table
innodb_ft_server_stopword_table
innodb_ft_user_stopword_table
innodb_buffer_pool_filename
The test case is an enhanced version of
mysql/mysql-server@0b0c30641f
and the code changes are inspired by their fixes.
We are also importing and adjusting the test innodb_fts.stopword
to get coverage for the variable innodb_ft_user_stopword_table.
buf_dump(), buf_load(): Protect srv_buf_dump_filename with
LOCK_global_system_variables.
fts_load_user_stopword(): Minor cleanup
fts_load_stopword(): Remove the parameter global_stopword_table.
innobase_fts_load_stopword(): Protect innodb_server_stopword_table
against concurrent SET GLOBAL.
dict_stats_update_if_needed(): Replace the parameter THD*
with const trx_t& so that trx_t::is_wsrep() can be invoked
instead of the more expensive wsrep_on().
Replace also other occurrences of wsrep_on() with trx_t::is_wsrep().
The function wsrep_on() was being called rather frequently
in InnoDB and XtraDB. Let us cache it in trx_t and invoke
trx_t::is_wsrep() instead.
innobase_trx_init(): Cache trx->wsrep = wsrep_on(thd).
ha_innobase::write_row(): Replace many repeated calls to current_thd,
and test the cheapest condition first.
was restored.
Optionally rollback prepared XA's on "mariabackup --prepare".
The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for
slaves.
redo log during recovery
- InnoDB unnecessarily reads the page even though it has fully initialized
buffered redo log records. Allow the page initialization redo log to
apply for the page in buf_page_get_gen() during recovery.
- Renamed buf_page_get_gen() to buf_page_get_low()
- Newly added buf_page_get_gen() will check for buffered redo log for
the particular page id during recovery
- Added new function buf_page_mtr_lock() which basically latches the page
for the given latch type.
- recv_recovery_create_page() is inline function which creates a page
if it has page initialization redo log records.
Making a linked list of dtuple_t is needed only for inserting
records. It's better to store tuples in a non-intrusive
container to not affect all other use cases of dtuple_t
dtuple_t::tuple_list: removed, it was 2 * sizeof(void*) bytes
ins_node_t::entry_list: now it's std::vector<dtuple_t*>
ins_node_t::entry: now it's std::vector<dtuple_t*>::iterator
DBUG_EXECUTE_IF("row_ins_skip_sec": this dead code removed
move span.h to a proper place to make it available for the whole server
Reformat it.
Constuctors from a contigous container are fixed
to use cont.data() instead of cont.begin()
span<>::index_type is replaced with span<>::size_type
In my micro-benchmarks memcmp(4196) 3 times faster than old
implementation. Also, it's generally better to use as less
reinterpret_casts<> as possible.
buf_is_zeroes(): renamed from buf_page_is_zeroes() and
argument changed to span<> for convenience.
st_::span<T>::const_iterator: fixed
page_zip-verify_checksum(): make argument byte* instead of void*
fil_delete_tablespace(): Remove the unused parameter drop_ahi,
and add the parameter if_exists=false. We want to suppress
error messages if we know that the tablespace has been discarded.
dict_table_rename_in_cache(): Pass the new parameter to
fil_delete_tablespace(), that is, do not complain about
missing tablespace if the tablespace has been discarded.
row_make_new_pathname(): Declare as static.
row_drop_table_for_mysql(): Tolerate !table->data_dir_path
when the tablespace has been discarded.
row_rename_table_for_mysql(): Skip part of the RENAME TABLE
when fil_space_get_first_path() returns NULL.
- This issue is caused by MDEV-19176
(bba59abb03).
- Problem is that there is miscalculation of available memory during
recovery if innodb_buffer_pool_instances > 1.
- Ignore the buffer pool instance while calculating available_memory
- Removed recv_n_pool_free_frames variable and use buf_pool_get_n_pages()
instead.
All tablespace metadata is buffered in fil_system. There is a LRU
mechanism, but that only controls the opening and closing of
fil_node_t::handle.
It is much more efficient and less error-prone to access data file names
by looking up the fil_space_t object rather than by essentially joining
each row with an access to SYS_DATAFILES via the InnoDB internal SQL parser.
dict_get_first_path(): Declare static. The function may only be needed
when loading or updating the data dictionary. Also, change a condition
in order to avoid a bogus GCC 10 -Wstringop-overflow warning for
mem_strdupl() about len==ULINT_UNDEFINED.
i_s_sys_tablespaces_fill_table(): Do not access other InnoDB internal
dictionary tables than SYS_TABLESPACES.
- n_ext value may be less than dtuple_get_n_ext(dtuple) when PK is being
updated and new record inherits the externally stored fields from
delete mark old record.
The only change is a change of the version number.
In MySQL 5.6.46, the copyright comments in a number of files were changed
in mysql/mysql-server@f1a006ece7
but there was no functional change to InnoDB code.
This was also reflected by XtraDB. We are not changing the copyright
comments in MariaDB Server for now.
Between MySQL 5.6.46 and 5.6.47, InnoDB was not changed at all.
Actually, we had forgotten to update the InnoDB version number to
5.6.46. With this change, we are updating InnoDB
from 5.6.45 to 5.6.47 and XtraDB from 5.6.45-86.1 to 5.6.46-86.2.
MySQL 5.7.29 includes the following fix:
Bug #30287668 INNODB: A LONG SEMAPHORE WAIT
mysql/mysql-server@5cdbb22b51
There is no test case. It seems that the problem could occur when
a spatial index is large and peculiar enough so that multiple R-tree
leaf pages will have the exactly same maximum bounding rectangle (MBR).
The commit message suggests that the hang can occur when R-tree
non-leaf pages are being merged, which should only be possible
during transaction rollback or the purge of transaction history,
when the R-tree index is at least 2 levels high and very many records
are being deleted. The message says that a comparison result that two
spatial index node pointer records are equal will cause an infinite loop
in rtr_page_copy_rec_list_end_no_locks(). Hence, we must include the
child page number in the comparison to be consistent with
mysql/mysql-server@2e11fe0e15.
We fix this bug in a simpler way, involving fewer code changes.
cmp_rec_rec(): Renamed from cmp_rec_rec_with_match().
Assert that rec2 always resides in an index page.
Treat non-leaf spatial index pages specially.
Now that we will be invoking dtuple_get_n_ext() instead of
letting btr_push_update_extern_fields() update an already
calculated value, it is unnecessary to calculate the n_ext
upfront.
row_rec_to_index_entry(), row_rec_to_index_entry_low():
Remove the output parameter n_ext.
During update, rollback, or MVCC read, we may miscalculate
the number of off-page columns, and thus the size of the
clustered index record. The function btr_push_update_extern_fields()
is mostly redundant, because the off-page columns would also be
moved by row_upd_index_replace_new_col_val(), which is invoked
via row_upd_index_replace_new_col_vals().
btr_push_update_extern_fields(): Remove.
This is based on
mysql/mysql-server@1fa475b85d
which refines a fix for a recovery bug fix
mysql/mysql-server@ce0a1e85e2
in MySQL 5.7.5.
No test case was provided by Oracle.
Some of the changed code is being covered by the existing test
innodb.blob-crash.
WL#6326 in MariaDB 10.2.2 introduced a potential hang on purge or rollback
when an index tree is being shrunk by multiple levels.
This fix is based on
mysql/mysql-server@f2c5852630
with the main difference that our version of the test case uses
DEBUG_SYNC instrumentation on ROLLBACK, not on purge.
btr_cur_will_modify_tree(): Simplify the check further.
This is the actual bug fix.
row_undo_mod_remove_clust_low(), row_undo_mod_clust(): Add DEBUG_SYNC
instrumentation for the test case.
By default (innodb_strict_mode=ON), InnoDB attempts to guarantee
at DDL time that any INSERT to the table can succeed.
MDEV-19292 recently revised the "row size too large" check in InnoDB.
The check still is somewhat inaccurate;
that should be addressed in MDEV-20194.
Note: If a table contains multiple long string columns so that each column
is part of a column prefix index, then an UPDATE that attempts to modify
all those columns at once may fail, because the undo log record might
not fit in a single undo log page (of innodb_page_size). In the worst case,
the undo log record would grow by about 3KiB of for each updated column.
The DDL-time check (since the InnoDB Plugin for MySQL 5.1) is optional
in the sense that when the maximum B-tree record size or undo log
record size would be exceeded, the DML operation will fail and the
transaction will be properly rolled back.
create_table_info_t::row_size_is_acceptable(): Add the parameter
'bool strict' so that innodb_strict_mode=ON can be overridden during
TRUNCATE, OPTIMIZE and ALTER TABLE...FORCE (when the storage format
is not changing).
create_table_info_t::create_table(): Perform a sloppy check for
TRUNCATE TABLE (create_fk=false).
prepare_inplace_alter_table_dict(): Perform a sloppy check for
simple operations.
trx_is_strict(): Remove. The function became unused in
commit 98694ab0cb (MDEV-20949).
Features:
* STL-like interface
* Fast modification: no branches on insertion or deletion
* Fast iteration: one pointer dereference and one pointer comparison
* Your class can be a part of several lists
Modeled after std::list<T> but currently has fewer methods (not complete yet)
For even more performance it's possible to customize list with templates so
it won't have size counter variable or won't NULLify unlinked node.
How existing lists differ?
No existing lists support STL-like interface.
I_List:
* slower iteration (one more branch on iteration)
* element can't be a part of two lists simultaneously
I_P_List:
* slower modification (branches, except for the fastest push_back() case)
* slower iteration (one more branch on iteration)
UT_LIST_BASE_NODE_T:
* slower modification (branches)
Three UT_LISTs were replaced: two in fil_system_t and one in dyn_buf_t.
- Moved the recv_sys->heap memory condition inside recv_parse_log_recs().
So that, InnoDB can mark the status as STORE_NO earlier.
- InnoDB uses one third of buffer pool chunk size for reading the redo
log records. In that case, we can avoid the scenario where buffer ran
out of memory issue during recovery.