Apply the changes to InnoDB and XtraDB that had been
inadvertently skipped in the merge
commit ae476868a5
That merge failure sabotaged part of MDEV-20127:
>Revert a problematic auto_increment_increment 'fix' from 2014.
>This involves replacing the MDEV-8827 fix and in 10.1,
>removing some WSREP instrumentation.
The code changes were re-merged manually by executing the following:
# Get the parent of the problematic merge.
git checkout ae476868a5394041a00e75a29c7d45917e8dfae8^
# Perform the merge again.
git merge ae476868a5394041a00e75a29c7d45917e8dfae8^2
# Get the conflict resolution from that merge.
git checkout ae476868a5 .
# Note: Any changes to these files were removed (empty diff)!
git diff HEAD storage/{innobase,xtradb}/handler/ha_innodb.cc
# Apply the code changes:
git diff cf40393471b10ca68cc1d2804c22ab9203900978^2..MERGE_HEAD \
storage/{innobase,xtradb}/handler/ha_innodb.cc|
patch -p1
A lock wait can happen on a secondary index when doing FK checks for wsrep.
We should just return the error to the upper layer, and the applier will
retry the operation when needed.
ut_strlcpy(): Replace with the standard function strncpy().
ut_strlcpy_rev(): Define in the same compilation unit where
the only caller resides. Avoid unnecessary definition
in non-debug builds.
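One caveat when a strlcpy-style helper is replaced with strncpy() is that strncpy() does not NUL-terminate the destination on truncation. The following is a minimal sketch of that point only; copy_truncated() is a hypothetical name and is not how the InnoDB callers are necessarily written:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of InnoDB): copy src into dst, whose
// capacity is size bytes, and always leave dst NUL-terminated if size > 0.
inline void copy_truncated(char* dst, const char* src, std::size_t size)
{
  assert(dst != nullptr && src != nullptr);
  if (size == 0) {
    return;                           // no room even for the terminator
  }
  std::strncpy(dst, src, size - 1);   // copies at most size-1 bytes
  dst[size - 1] = '\0';               // strncpy() does not do this on truncation
}
```

With a 4-byte buffer, copying "abcdef" through this helper leaves "abc" in the buffer.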
TODO: do not use fil_* functions for redo log files.
log_t::checkpoint_lock: remove this lock which was used to wait for
async I/O completion.
checkpoint_lock_key, checkpoint_lock: remove now unneeded globals
log_write_checkpoint_info(): remove sync argument because all checkpoint
writes are synchronous now
log_group_checkpoint(): merge with the only caller
log_complete_checkpoint(): merge with the only caller
log_t::complete_checkpoint(): remove by merging with the only caller.
mem_heap_dup(): Avoid mem_heap_alloc() and memcpy() of data=NULL, len=0.
trx_undo_report_insert_virtual(), trx_undo_page_report_insert(),
trx_undo_page_report_modify(): Avoid memcpy(ptr, NULL, 0).
dfield_data_is_binary_equal(): Correctly handle data=NULL, len=0.
rec_init_offsets_temp(): Do allow def_val=NULL in the interface.
This clean-up was motivated by WITH_UBSAN, and no bug related to this
was observed in the wild. It should be noted that undefined behaviour
such as memcpy(ptr, NULL, 0) could allow compilers to perform unsafe
optimizations, as was the case in
commit fc168c3a5e (MDEV-15587).
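The pattern behind these changes can be sketched as follows. This is only an illustration under stated assumptions: dup_or_null() is a hypothetical name, and malloc() stands in for InnoDB's mem_heap_alloc().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a duplicating allocator. It never passes a
// null pointer to memcpy(), which would be undefined behaviour even
// when the length is 0.
inline void* dup_or_null(const void* data, std::size_t len)
{
  assert(data != nullptr || len == 0); // NULL is only valid with len=0
  if (len == 0) {
    return nullptr;                    // skip allocation and memcpy()
  }
  void* copy = std::malloc(len);
  if (copy != nullptr) {
    std::memcpy(copy, data, len);      // both pointers are valid here
  }
  return copy;
}
```

The caller owns the returned buffer and releases it with free() in this sketch; a heap-based allocator would release it together with the heap.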
Ignore GetDiskFreeSpace() errors in os_file_get_status_win32
The call is only used to calculate filesystem block size, and this in
turn is only shown in information_schema.sys_tablespaces.FS_BLOCK_SIZE.
There is no other use of this field, and it does not affect any InnoDB
functionality.
Historically, InnoDB split the redo log into at least 2 files.
MDEV-12061 allowed the minimum to be innodb_log_files_in_group=1,
but it kept the default at innodb_log_files_in_group=2.
Because performance seems to be slightly better with only one log file,
and because implementing an append-only variant of the log would require
a single file, let us define the default to be 1, and have
innodb_log_file_size=96M, to retain the same default total size.
- fts_optimize_thread() uses the dict_table_t object instead of the table id,
so that it does not need to acquire dict_sys->mutex. This removes the
hang on dict_sys->mutex between fts_optimize_thread() and other threads.
- in_queue indicates whether the table is in fts_optimize_queue. It
is protected by fts_optimize_wq->mutex to avoid any race condition.
- fts_optimize_init() adds the fts table to the fts_optimize_wq.
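The idea of a per-table in_queue flag that is protected by the queue mutex can be sketched as follows. The types table_t and work_queue_t are hypothetical simplifications; the real dict_table_t and fts_optimize_wq are different.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical simplified table object.
struct table_t {
  bool in_queue = false;   // protected by work_queue_t::mutex
};

// Hypothetical work queue holding table pointers instead of table ids.
struct work_queue_t {
  std::mutex mutex;
  std::deque<table_t*> items;

  // Enqueue the table unless it is already queued. Returns true if added.
  bool add(table_t* table) {
    assert(table != nullptr);
    std::lock_guard<std::mutex> guard(mutex);
    if (table->in_queue) {
      return false;
    }
    table->in_queue = true;
    items.push_back(table);
    return true;
  }

  // Dequeue the oldest table, or return nullptr if the queue is empty.
  table_t* pop() {
    std::lock_guard<std::mutex> guard(mutex);
    if (items.empty()) {
      return nullptr;
    }
    table_t* table = items.front();
    items.pop_front();
    table->in_queue = false;
    return table;
  }
};
```

Because the flag is read and written only while the queue mutex is held, a table cannot be enqueued twice, and the consumer never needs a lookup by table id.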
InnoDB stores the value synced_doc_id + 1 in the FTS_CONFIG table. When
reading it back from the FTS_CONFIG table after a restart, InnoDB must
subtract 1 from the stored value to get the actual synced
doc id value.
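The off-by-one convention can be sketched as a pair of helpers. The names to_stored() and from_stored() are hypothetical, and treating a stored 0 as "nothing synced yet" is an assumption of this sketch, not something stated above.

```cpp
#include <cstdint>

typedef std::uint64_t doc_id_t;

// Value to persist: one greater than the last synced doc id.
inline doc_id_t to_stored(doc_id_t synced_doc_id)
{
  return synced_doc_id + 1;
}

// Value to use after a restart: undo the +1 applied when storing.
// Assumption: a stored 0 means that no doc id has been synced.
inline doc_id_t from_stored(doc_id_t stored)
{
  return stored > 0 ? stored - 1 : 0;
}
```

Reading without the subtraction would make the recovered doc id drift upwards by one on every restart.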
Based on the performance testing that was conducted in MDEV-17492,
the InnoDB adaptive hash index could only help performance in specific,
almost-read-only workloads. It could slow down all kinds of workloads
(especially DROP TABLE, TRUNCATE TABLE, ALTER TABLE, or DROP INDEX
operations), and it can become corrupted, causing crashes (such as
MDEV-18815, MDEV-20203) and possibly data corruption. Furthermore,
the adaptive hash index consumes space from the InnoDB buffer pool,
which could hurt performance when the working set would almost fit
in the buffer pool.
Given all this, it is best to disable the adaptive hash index by default.
To diagnose a hang in slow shutdown (innodb_fast_shutdown=0),
let us introduce a Boolean startup option in debug builds
that will cause the contents of the InnoDB change buffer
to be dumped to the server error log at startup.
Reduce the scope of some variables, remove a goto and a redundant
assertion.
For B-tree secondary indexes, this function can remove a delete-marked
purgeable record, in case a row rollback of the INSERT was initiated
due to an error in an earlier secondary index.
The BtrBulk class, which was introduced in MySQL 5.7, is by design
the exclusive writer to an index. It is therefore unnecessary to
acquire the dict_index_t::lock in that code.
Holding the dict_index_t::lock would unnecessarily block other threads
(SQL connections and the InnoDB purge threads) from buffering concurrent
modifications to being-created secondary indexes.
This fix is motivated by a change in MySQL 5.7.28:
Bug #29008298 MYSQLD CRASHES ITSELF WHEN CREATING INDEX
mysql/mysql-server@f9fb96c20f
PageBulk::init(), PageBulk::latch(): Never acquire m_index->lock.
PageBulk::storeExt(): Remove some pointer indirection, and improve
a debug assertion that seems to prove that some code is redundant.
BtrBulk::pageCommit(): Assert that m_index->lock is not being held.
btr_blob_log_check_t: Do not acquire m_index->lock if
m_op == BTR_STORE_INSERT_BULK. Add UNIV_UNLIKELY hints around
that condition.
btr_store_big_rec_extern_fields(): Allow index->lock not to be held
while op == BTR_STORE_INSERT_BULK. Add UNIV_UNLIKELY hints around
that condition.
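The shape of that condition can be sketched as follows. SKETCH_UNLIKELY, blob_op and needs_index_lock() are hypothetical stand-ins for UNIV_UNLIKELY, the real operation enum and the real lock handling; the macro assumes a GCC-compatible compiler for the branch hint.

```cpp
// Branch-prediction hint in the spirit of UNIV_UNLIKELY (an assumption
// of this sketch; the real macro definition may differ).
#if defined(__GNUC__)
# define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
# define SKETCH_UNLIKELY(cond) (cond)
#endif

// Hypothetical subset of the BLOB store operations.
enum blob_op { STORE_INSERT, STORE_UPDATE, STORE_INSERT_BULK };

// Whether the index lock has to be held while storing the BLOB.
inline bool needs_index_lock(blob_op op)
{
  if (SKETCH_UNLIKELY(op == STORE_INSERT_BULK)) {
    return false;   // the bulk loader is the exclusive writer to the index
  }
  return true;
}
```

The hint only tells the compiler that bulk loading is the rare path; it does not change the result of the condition.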
In the original MDEV-16210 case, a versioned DELETE from a referenced
table was wrongly allowed when the reference was by a non-primary key.
InnoDB UPDATE has an optimization for new rows that do not change their
clustered index position. In this case InnoDB doesn't update all
secondary indexes and misses the one holding the referenced key. The fix
was to disable this optimization for versioned DELETE. In case of
versioned DELETE we forcibly update all secondary indexes and therefore
check them for constraints.
But the above fix raised another problem with versioned DELETE on the
foreign table side. When there was no corresponding record in the
referenced table (an illegal foreign reference can be created with "set
foreign_key_checks=off"), there was a spurious constraint check (because
a versioned DELETE is actually an UPDATE) and hence the operation failed
with a constraint error.
MDEV-16210 tried to fix the above problem by checking the foreign table
instead of the referenced table, and that was at the very least illegal.
The constraint check is done by row_ins_check_foreign_constraint() no
matter what kind of table is checked, referenced or foreign
(controlled by the check_ref argument).
The referenced table is checked by row_upd_check_references_constraints().
The foreign table is checked by row_ins_check_foreign_constraints().
The current fix rolls back the wrong fix for the above problem and
disables the referenced table check for DELETE on the foreign side by
introducing a `check_foreign` argument which, when set to *false*, skips
the row_ins_check_foreign_constraints() call.
fil_crypt_rotate_page(): Skip the key rotation for pages that carry 0
in FIL_PAGE_TYPE. This avoids not only unnecessary writes, but also
failures of the recently added debug assertion in
buf_flush_init_for_writing() that the FIL_PAGE_TYPE should be nonzero.
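The check itself amounts to reading the 2-byte page type from the page header. A minimal sketch, assuming that the field is at byte offset 24 and stored in big-endian order, and using the hypothetical names read_be16() and skip_key_rotation():

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed byte offset of the 2-byte page type field in the page header.
const std::size_t FIL_PAGE_TYPE = 24;

// Read a 2-byte big-endian value, in the spirit of mach_read_from_2().
inline std::uint16_t read_be16(const unsigned char* p)
{
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical predicate: should key rotation skip this page frame?
inline bool skip_key_rotation(const unsigned char* frame)
{
  assert(frame != nullptr);
  return read_be16(frame + FIL_PAGE_TYPE) == 0;
}
```

A page that is skipped this way is never handed to the flush code, so the nonzero FIL_PAGE_TYPE assertion is not reached for it.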
Note: the debug assertion can fail if the file was originally created
before MySQL 5.5. In old InnoDB versions, FIL_PAGE_TYPE was only
initialized for B-tree pages, to FIL_PAGE_INDEX. For any other pages,
the field could be garbage, including FIL_PAGE_INDEX. In MariaDB 10.2
and later, buf_flush_init_for_writing() would initialize the
FIL_PAGE_TYPE on such old pages, but only after passing the debug
assertion that insists that pages have a nonzero FIL_PAGE_TYPE.
Thus, the debug assertion at the start of buf_flush_init_for_writing()
can fail when upgrading from very old debug files. This assertion is
only present in debug builds, not release builds.
Old InnoDB/XtraDB versions only initialized FIL_PAGE_TYPE for
B-tree pages (to FIL_PAGE_INDEX), and left it uninitialized
(possibly containing FIL_PAGE_INDEX) for others. In MySQL
or MariaDB 5.5, the field is initialized on almost all pages,
but still not all of them.
In MariaDB 10.2 and later, buf_flush_init_for_writing() would
initialize the FIL_PAGE_TYPE on such old pages, but only after
passing the debug assertion that we are now removing from 10.1.
There, we will be able to modify fil_crypt_rotate_page() so
that it will skip the key rotation for pages that contain 0
in FIL_PAGE_TYPE.
In MariaDB 10.1, there is no logic that would initialize
FIL_PAGE_TYPE on data pages in old data files after an update.
So, encryption key rotation may routinely cause page flushes
on pages that contain 0 in FIL_PAGE_TYPE.
rec_init_offsets(): Relax the assertion that was added in
commit 01f45becd1
to catch ROW_FORMAT=REDUNDANT records that have fewer fields
than expected.
This assertion would fail when accessing the records of the
built-in InnoDB table SYS_INDEXES. The column MERGE_THRESHOLD
had been effectively instantly added in MariaDB Server 10.2
(and MySQL 5.7), but is_instant() does not hold for that index.
Relax the assertion, so that it will not fail in this case.
The assertion that was added in
commit c0c003beb4
to augment the fix of MDEV-20805 turns out to be invalid when
innodb_immediate_scrub_data_uncompressed is enabled.
In this mode, fsp_init_file_page() will be invoked on data pages
that have been freed, causing writes of almost-all-zero pages.
btr_page_free(): Adjust the comment.
buf_flush_init_for_writing(): Disable the assertion with a note
that it should be re-enabled in MDEV-15528.
This is another follow-up fix to
commit b393e2cb0c
which turned out to be still broken.
Replace the C++11 keyword 'constexpr' with #define.
debug_sync_t::str: Remove the zero-length array.
Replace sync->str with reinterpret_cast<char*>(&sync[1]).
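The replacement of a zero-length array member with pointer arithmetic past the end of the struct can be sketched as follows. sync_point_t and its helpers are hypothetical; the real debug_sync_t has other members and its own allocation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical header struct. The string bytes live right after it in the
// same allocation, so no zero-length array member is needed.
struct sync_point_t {
  sync_point_t* next;
};

// Address of the first byte past the header, where the string is stored.
inline char* sync_point_str(sync_point_t* sync)
{
  return reinterpret_cast<char*>(&sync[1]);
}

// Allocate the header and the NUL-terminated name in one block.
inline sync_point_t* sync_point_create(const char* name)
{
  assert(name != nullptr);
  const std::size_t len = std::strlen(name) + 1;
  void* mem = std::malloc(sizeof(sync_point_t) + len);
  if (mem == nullptr) {
    return nullptr;
  }
  sync_point_t* sync = static_cast<sync_point_t*>(mem);
  sync->next = nullptr;
  std::memcpy(sync_point_str(sync), name, len);
  return sync;
}
```

Zero-length arrays are a compiler extension, while &sync[1] is a standard way to form the address one past the header object.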
Remove unused variables and type mismatch that was introduced
in commit b393e2cb0c.
Also, fix a typo in the documentation of the parameter, and
update the test.
We will remove the InnoDB background operation of merging buffered
changes to secondary index leaf pages. Changes will only be merged as a
result of an operation that accesses a secondary index leaf page,
such as a SQL statement that performs a lookup via that index
or that modifies the index. Also ROLLBACK and some background operations,
such as purging the history of committed transactions or computing
index cardinality statistics, can cause a change buffer merge.
Encryption key rotation will not perform change buffer merge.
The motivation of this change is to simplify the I/O logic and to
allow crash recovery to happen in the background (MDEV-14481).
We also hope that this will reduce the number of "mystery" crashes
due to corrupted data. Because change buffer merge will typically
take place as a result of executing SQL statements, there should be
a clearer connection between the crash and the SQL statements that
were executed when the server crashed.
In many cases, a slight performance improvement was observed.
This is joint work with Thirunarayanan Balathandayuthapani
and was tested by Axel Schwenke and Matthias Leich.
The InnoDB monitor counter innodb_ibuf_merge_usec will be removed.
On slow shutdown (innodb_fast_shutdown=0), we will continue to
merge all buffered changes (and purge all undo log history).
Two InnoDB configuration parameters will be changed as follows:
innodb_disable_background_merge: Removed.
This parameter existed only in debug builds.
All change buffer merges will use synchronous reads.
innodb_force_recovery will be changed as follows:
* innodb_force_recovery=4 will be the same as innodb_force_recovery=3
(the change buffer merge cannot be disabled; it can only happen as
a result of an operation that accesses a secondary index leaf page).
The option used to be capable of corrupting secondary index leaf pages.
Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'.
* innodb_force_recovery=5 (which essentially hard-wires
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED)
becomes safe to use. Bogus data can be returned to SQL, but
persistent InnoDB data files will not be corrupted further.
* innodb_force_recovery=6 (ignore the redo log files)
will be the only option that can potentially cause
persistent corruption of InnoDB data files.
Code changes:
buf_page_t::ibuf_exist: New flag, to indicate whether buffered
changes exist for a buffer pool page. Pages with pending changes
can be returned by buf_page_get_gen(). Previously, the changes
were always merged inside buf_page_get_gen() if needed.
ibuf_page_exists(const buf_page_t&): Check if buffered changes
exist for an X-latched or read-fixed page.
buf_page_get_gen(): Add the parameter allow_ibuf_merge=false.
All callers that know that they may be accessing a secondary index
leaf page must pass this parameter as allow_ibuf_merge=true,
unless it does not matter for that caller whether all buffered
changes have been applied. Assert that whenever allow_ibuf_merge
holds, the page actually is a leaf page. Attempt change buffer
merge only to secondary B-tree index leaf pages.
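The calling convention can be sketched with a default parameter. Everything here is a hypothetical simplification: page_t, merge_buffered_changes() and get_page() only mimic the roles of buf_page_t, the change buffer merge and buf_page_get_gen().

```cpp
#include <cassert>

// Hypothetical simplified page descriptor.
struct page_t {
  bool is_leaf;
  bool is_secondary_index;
  bool ibuf_exist;   // buffered changes are pending for this page
};

// Stand-in for applying the buffered changes to the page.
inline void merge_buffered_changes(page_t& page)
{
  page.ibuf_exist = false;
}

// Sketch of a page lookup that merges buffered changes only on request.
inline page_t& get_page(page_t& page, bool allow_ibuf_merge = false)
{
  if (allow_ibuf_merge) {
    assert(page.is_leaf);   // a merge may only be requested for leaf pages
    if (page.is_secondary_index && page.ibuf_exist) {
      merge_buffered_changes(page);
    }
  }
  return page;              // may still carry pending changes
}
```

A caller that omits the argument gets the page as it is, possibly with ibuf_exist still set; a caller that passes true gets a page with all buffered changes applied.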
btr_block_get(): Add parameter 'bool merge'.
All callers of btr_block_get() should know whether the page could be
a secondary index leaf page. If it is not, we should avoid consulting
the change buffer bitmap to even consider a merge. This is the main
interface to requesting index pages from the buffer pool.
ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace
buf_page_get_known_nowait() with much simpler logic, because
it is now guaranteed that the block is x-latched or read-fixed.
mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge().
On crash recovery, we will no longer merge any buffered changes
for the pages that we read into the buffer pool during the last batch
of applying log records.
buf_page_get_gen_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove.
btr_search_guess_on_hash(): Merge buf_page_get_gen_known_nowait()
to its only remaining caller.
buf_page_make_young_if_needed(): Define as an inline function.
Add the parameter buf_pool.
buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the
parameter buf_pool.
fil_space_validate_for_mtr_commit(): Remove a bogus comment
about background merge of the change buffer.
btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(),
btr_cur_open_at_index_side_func(): Use narrower data types and scopes.
ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages().
Merge the change buffer by invoking buf_page_get_gen().
buf_flush_init_for_writing(): Assert that FIL_PAGE_TYPE is set
except when creating a new data file with a dummy first page.
buf_dblwr_create(): Ensure that FIL_PAGE_TYPE on all pages
will be initialized. Reset buf_dblwr_being_created at the end.
In the function recv_parse_or_apply_log_rec_body() there are debug checks
for validating the state of the page when redo log records are being
applied. Most notably, FIL_PAGE_TYPE should be set before anything else
is being written to the page.
ibuf_add_free_page(): Set FIL_PAGE_TYPE before performing any other changes.
btr_page_get_split_rec_to_left(): Assert that in the leftmost leaf page,
if the metadata record exists, index->is_instant() must hold.
The assertion of commit 01f45becd1
could fail during innobase_instant_try().
btr_page_get_split_rec_to_left(): Assert that in the leftmost leaf page,
the metadata record exists if and only if index->is_instant().
page_validate(): Correct the wording of a message.
rec_init_offsets(): Assert that whenever a record is in "instant ALTER"
format, index->is_instant() must hold.