If there're multiple row versions in InnoDB, reading one row from PK
may have O(N) complexity and reading from secondary keys may have
O(N^2) complexity.
The problem occurs when there are many pending versions of the same
row, meaning that the primary key is the same, but a secondary key is
different. The slowdown occurs when the secondary index is
traversed. This patch creates a helper class for the function
row_sel_get_clust_rec_for_mysql() which can remember and re-use
cached_clust_rec & cached_old_vers so that rec_get_offsets() does not
need to be called over and over for the clustered record.
Corrections by Kevin Lewis <kevin.lewis@oracle.com>
MDEV-20341 Unstable innodb.innodb_bug14704286
Removed test that tested the ability of interrupting long query which
is not long anymore.
MDEV-17614 flags INSERT…ON DUPLICATE KEY UPDATE unsafe for statement-based
replication when there are multiple unique indexes. This correctly fixes
something whose attempted fix in MySQL 5.7
in mysql/mysql-server@c93b0d9a97
caused lock conflicts. That change was reverted in MySQL 5.7.26
in mysql/mysql-server@066b6fdd43
(with a substantial amount of other changes).
In MDEV-17073 we already disabled the unfortunate MySQL change when
statement-based replication was not being used. Now, thanks to MDEV-17614,
we can actually remove the change altogether.
This reverts commit 8a346f31b9 (MDEV-17073)
and mysql/mysql-server@c93b0d9a97 while
keeping the test cases.
fts_sync(): Remove the constant parameter has_dict=false.
fts_sync_table(): Remove the constant parameter has_dict=false,
and the redundant parameter unlock_cache = !wait.
Make wait=true the default parameter.
1) Whenever purge thread tries to remove the secondary virtual index
entry, purge thread acquires metadata lock for the table and release
dict_operation_lock. After that, it retries the secondary index
deletion if MDL acquired successfully.
2) Inside row_vers_old_has_index_entry(), Change the safe_to_purge
to unsafe_to_purge goto statement. So it can be more appropriate to
return true if it is unsafe_to_purge.
3) Previously, row_vers_old_has_index_entry() returns false if InnoDB
fetched the MDL on the table for the first time. This check(two cases)
should checked only during purge thread. In row_purge_poss_sec(), again
InnoDB checks whether the MDL fetched for the first time. If it is then
InnoDB retry the secondary index deletion logic. So in that case,
InnoDB have to clean up the memory used inside row_vers_old_has_index_entry()
and shouldn't care about return value.
This is a regression due to MDEV-16515 that affects some versions in
the MariaDB 10.1 server series starting with 10.1.35, and possibly
all versions starting with 10.2.17, 10.3.8, and 10.4.0.
The idea of MDEV-16515 is to allow DROP TABLE to be interrupted,
in case it was stuck due to some concurrent activity. We already
made some cases of internal DROP TABLE immune to kill in MDEV-18237,
MDEV-16647, MDEV-17470. We must include the cleanup of
CREATE TABLE...SELECT in the list of such internal DROP TABLE.
ha_innobase::delete_table(): Pass create_failed=true if the current
SQL statement is CREATE, so that the table will be dropped.
row_drop_table_for_mysql(): If create_failed=true, do not allow
the operation to be interrupted.
In row_ins_foreign_check_on_constraint(), clustered index record is being passed to wsrep_append_foreign_key() after releasing the latch. If a record has been changed by other thread in the meantime then it could lead to a crash when
wsrep_rec_get_foreign_key () tries to access the record.
row_ins_foreign_check_on_constraint
Use cascade->pcur->old_rec instead of clust_rec.
row_ins_check_foreign_constraint
Add missing error printout.
- Introduce a new variable called innodb_encrypt_temporary_tables which is
a boolean variable. It decides whether to encrypt the temporary tablespace.
- Encrypts the temporary tablespace based on full checksum format.
- Introduced a new counter to track encrypted and decrypted temporary
tablespace pages.
- Warnings issued if temporary table creation has conflict value with
innodb_encrypt_temporary_tables
- Added a new test case which reads and writes the pages from/to temporary
tablespace.
Problem:
=========
One of the purge thread access the corrupted page and tries to remove from
LRU list. In the mean time, other purge threads are waiting for same page
in buf_wait_for_read(). Assertion(buf_fix_count == 0) fails for the
purge thread which tries to remove the page from LRU list.
Solution:
========
- Set the page id as FIL_NULL to indicate the page is corrupted before
removing the block from LRU list. Acquire hash lock for the particular
page id and wait for the other threads to release buf_fix_count
for the block.
- Added the error check for btr_cur_open() in row_search_on_row_ref().
Some I/O functions and macros that are declared in os0file.h used to
return a Boolean status code (nonzero on success). In MySQL 5.7, they
were changed to return dberr_t instead. Alas, in MariaDB Server 10.2,
some uses of functions were not adjusted to the changed return value.
Until MDEV-19231, the valid values of dberr_t were always nonzero.
This means that some code that was incorrectly checking for a zero
return value from the functions would never detect a failure.
After MDEV-19231, some tests for ALTER ONLINE TABLE would fail with
cmake -DPLUGIN_PERFSCHEMA=NO. It turned out that the wrappers
pfs_os_file_read_no_error_handling_int_fd_func() and
pfs_os_file_write_int_fd_func() were wrongly returning
bool instead of dberr_t. Also the callers of these functions were
wrongly expecting bool (nonzero on success) instead of dberr_t.
This mistake had been made when the addition of these functions was
merged from MySQL 5.6.36 and 5.7.18 into MariaDB Server 10.2.7.
This fix also reverts commit 40becbc3c7
which attempted to work around the problem.
Problem:
=======
fil_iterate() writes imported tablespace page0 as it is to discarded
tablespace. Space id wasn't even changed. While opening the tablespace,
tablespace fails with space id mismatch error.
Fix:
====
fil_iterate() copies the page0 with discarded space id to imported
tablespace.
os_file_write_func() and os_file_read_no_error_handling_func() returned
different result values depending on if UNIV_PFS_IO was defined or not.
Other things:
- Added some comments about return values for some functions
row_search_mvcc(): Duplicate the logic of btr_pcur_move_to_next()
so that an infinite loop can be avoided when advancing to the next
page fails due to a corrupted page.
In MySQL 5.7.8 an extra level of pointer indirection was added to
dict_operation_lock and some other rw_lock_t without solid justification,
in mysql/mysql-server@52720f1772.
Let us revert that change and remove the rather useless rw_lock_t
constructor and destructor and the magic_n field. In this way,
some unnecessary pointer dereferences and heap allocation will be avoided
and debugging might be a little easier.
Some places didn't match the previous rules, making the Floor
address wrong.
Additional sed rules:
sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
row_purge_upd_exist_or_extern_func(): Check for node->vcol_op_failed()
after row_purge_remove_sec_if_poss(), like row_purge_del_mark() did.
This avoids us dereferencing the node->table=NULL pointer.
The test case, submitted by Elena Stepanova, is not deterministic and
does not repeat the bug on 10.2. With the added loop, for me, it reliably
crashes 10.3 without the fix. I was unable to create a deterministic
test case for either 10.2 or 10.3.
Reviewed by Thirunarayanan Balathandayuthapani
fts_table_t::parent: Remove the redundant field. Refer to
table->name.m_name instead.
fts_update_sync_doc_id(), fts_update_next_doc_id(): Remove
the redundant parameter table_name.
fts_get_table_name_prefix(): Access the dict_table_t::name.
FIXME: Ensure that this access is always covered by
dict_sys->mutex.
The accessor dtuple_get_nth_v_field() was defined differently between
debug and release builds in MySQL 5.7.8 in
mysql/mysql-server@c47e1751b7
and a debug assertion to document or enforce the questionable assumption
tuple->v_fields == &tuple->fields[tuple->n_fields] was missing.
This was apparently no problem until MDEV-11369 introduced instant
ADD COLUMN to MariaDB Server 10.3. With that work present, in one
test case, trx_undo_report_insert_virtual() could in release builds
fetch the wrong value for a virtual column.
We replace many of the dtuple_t accessors with const-preserving
inline functions, and fix missing or misleadingly applied const
qualifiers accordingly.
log_checkpoint(), log_make_checkpoint_at(): Remove the parameter
write_always. It seems that the primary purpose of this parameter
was to ensure in the function recv_reset_logs() that both checkpoint
header pages will be overwritten, when the function is called from
the never-enabled function recv_recovery_from_archive_start().
create_log_files(): Merge recv_reset_logs() to its only caller.
Debug instrumentation: Prefer to flush the redo log, instead of
triggering a redo log checkpoint.
page_header_set_field(): Disable a debug assertion that will
always fail due to MDEV-19344, now that we no longer initiate
a redo log checkpoint before an injected crash.
In recv_reset_logs() there used to be two calls to
log_make_checkpoint_at(). The apparent purpose of this was
to ensure that both InnoDB redo log checkpoint header pages
will be initialized or overwritten.
The second call was removed (without any explanation) in MySQL 5.6.3:
mysql/mysql-server@4ca37968da
In MySQL 5.6.8 WL#6494, starting with
mysql/mysql-server@00a0ba8ad9
the function recv_reset_logs() was not only invoked during
InnoDB data file initialization, but also during a regular
startup when the redo log is being resized.
mysql/mysql-server@45e9167983
in MySQL 5.7.2 removed the UNIV_LOG_ARCHIVE code, but still
did not remove the parameter write_always.
Similar to what was done in commit aa3f7a107c
for FULLTEXT INDEX, we must ensure that MLOG_INDEX_LOAD records will always
be written if redo logging was disabled.
row_merge_build_indexes(): Invoke row_merge_write_redo() also when
online operation is not being executed or an error occurs.
In case of an error, invoke flush_observer->interrupted() so that
the pages will not be flushed but merely evicted from the buffer pool.
Before resuming redo logging, it is crucial for the correctness of
mariabackup and InnoDB crash recovery to flush or evict all affected pages
and to write MLOG_INDEX_LOAD records.
This is a follow-up to MDEV-18733. As part of that fix, we made
dict_check_sys_tables() skip tables that would be dropped by
row_mysql_drop_garbage_tables().
DICT_ERR_IGNORE_DROP: A new mode where the file should not be attempted
to be opened.
dict_load_tablespace(): Do not try to load the tablespace if
DICT_ERR_IGNORE_DROP has been specified.
row_mysql_drop_garbage_tables(): Pass the DICT_ERR_IGNORE_DROP mode.
fil_space_for_table_exists_in_mem(): Remove a parameter.
The only caller that passed print_error_if_does_not_exist=true
was row_drop_single_table_tablespace().
The record MLOG_INDEX_LOAD is supposed to be written to indicate that
some page modifications bypassed redo logging, and that redo logging
is now re-enabled. It was not written for fulltext indexes during
ALTER TABLE.
row_merge_write_redo(): Declare globally. Assert that the index
is neither a spatial nor fulltext index.
recv_mlog_index_load(): Observe a MLOG_INDEX_LOAD operation.
recv_parse_log_recs(): Handle MLOG_INDEX_LOAD also in multi-record
mini-transactions. Because of this omission, we should keep writing
MLOG_INDEX_LOAD in single-record mini-transactions, because older
versions of Mariabackup would fail.
row_fts_merge_insert(): Write MLOG_INDEX_LOAD for the auxiliary
tables of fulltext indexes.
This reverts commit 21b2fada7a
and commit 81d71ee6b2.
The MDEV-18464 change introduces a few data race issues. Contrary to
the documentation, the field trx_t::victim is not always being protected
by lock_sys_t::mutex and trx_t::mutex. Most importantly, it seems
that KILL QUERY could wrongly avoid acquiring both mutexes when
invoking lock_trx_handle_wait_low(), in case another thread had
already set trx->victim=true.
We also revert MDEV-12009, because it should depend on the MDEV-18464
fix being present.
1) Avoid writing of MLOG_INDEX_LOAD redo log record during inplace
alter table when the table is empty and also for spatial index.
2) Avoid creation of temporary merge file for spatial index during
index creation process.
Pushed the decision for innodb transaction and system
locking down to lock0lock.cc level. With this,
we can avoid releasing these mutexes for executions
where these mutexes were acquired upfront.
This patch will also fix BF aborting of native threads, e.g.
threads which have declared wsrep_on=OFF. Earlier, we have
used, for innodb trx locks, was_chosen_as_deadlock_victim
flag, for marking inodb transactions, which are victims for
wsrep BF abort. With native threads (wsrep_on==OFF), re-using
was_chosen_as_deadlock_victim flag may lead to inteference
with real deadlock, and to deal with this, the patch has added new
flag for marking wsrep BF aborts only: victim=true
Similar way if replication decides to abort one of the threads
we mark victim by: victim=true
innobase_kill_query
Remove lock sys and trx mutex handling.
wsrep_innobase_kill_one_trx
Mark victim trx with victim=true
trx0trx.h
Remove trx_abort_t type and abort type variable from
trx struct. Add victim variable to trx.
wsrep_kill_victim
Remove abort_type
lock_report_waiters_to_mysql
Take also trx mutex and mark trx as a victim for
replication abort.
lock_trx_handle_wait_low
New low level function to check whether the transaction
has already been rolled back because it was selected as
a deadlock victim, or if it has to wait then cancel
the wait lock.
lock_trx_handle_wait
If transaction is not marked as victim take lock sys
and trx mutex before calling lock_trx_handle_wait_low
and release them after that.
row_search_for_mysql
Remove lock sys and trx mutex taking and releasing.
trx_rollback_to_savepoint_for_mysql_low
trx_commit_in_memory
Clean up victim variable.
now we can afford it. Fix -Werror errors. Note:
* old gcc is bad at detecting uninit variables, disable it.
* time_t is int or long, cast it for printf's
The predicate page_is_root(), which was added in MariaDB Server 10.2.2,
is based on a wrong assumption.
Under some circumstances, InnoDB can transform B-trees into a degenerate
state where a non-leaf page has no sibling pages. Because of this,
we cannot assume that a page that has no siblings is the root page.
This bug will be tracked as MDEV-19022.
Because of the bug that may affect many InnoDB data files, we must remove
and replace the wrong predicate. Using the wrong predicate can cause
corruption. A leaf page is not allowed to be empty except if it is the
root page, and the entire table is empty.
row_ins_foreign_fill_virtual(): Construct update->old_vrow
with ROW_COPY_DATA instead of ROW_COPY_POINTERS. With the latter,
the object would be pointing to a buffer pool page frame. That page
frame can become stale and invalid as soon as
row_ins_foreign_check_on_constraint() invokes mtr_t::commit().
Most of the time, the pointer target is not going to be overwritten
by anything, and everything appears to work correctly.
Buffer pool page replacement is highly unlikely, and any pessimistic
operation that would overwrite the old location of the record is only
slightly more likely. It is not known whether there is an actual bug.
This came up while diagnosing MDEV-18879 in MariaDB 10.3.
row_drop_tables_for_mysql_in_background(): Copy the table name
before closing the table handle, to avoid heap-use-after-free if
another thread succeeds in dropping the table before
row_drop_table_for_mysql_in_background() completes the table name lookup.
dict_mem_create_temporary_tablename(): With innodb_safe_truncate=ON
(the default), generate a simple, unique, collision-free table name
using only the id, no pseudorandom component. This is safe, because
on startup, we will drop any #sql tables that might exist in InnoDB.
This is a backport from 10.3. It should have been backported already
as part of backporting MDEV-14717,MDEV-14585 which were prerequisites
for the MDEV-13564 backup-friendly TRUNCATE TABLE.
This seems to reduce the chance of table creation failures in
ha_innobase::truncate().
ha_innobase::truncate(): Do not invoke close(), but instead
mimic it, so that we can restore to the original table handle
in case opening the truncated copy of the table failed.