The InnoDB srv_stats counters
n_rows_updated, n_rows_deleted, n_rows_inserted, and n_rows_read
duplicate the
Handler_update, Handler_delete, Handler_write, and Handler_read_* counters.
Updating these counters is not free, especially because some counters
are further split to distinguish the rare case of modifying tables
in the system schema.
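A hedged sketch of the duplication (illustrative code only; the actual
counter implementation is more elaborate): each row write bumps both a
cheap session-local handler counter and a shared global counter that is
additionally split by schema.

  #include <atomic>

  std::atomic<unsigned long long> n_rows_inserted{0};        // srv_stats style
  std::atomic<unsigned long long> n_system_rows_inserted{0}; // the split case
  thread_local unsigned long long handler_write = 0;         // Handler_write style

  void count_insert(bool system_schema)
  {
    ++handler_write;                          // thread-local, cheap
    (system_schema ? n_system_rows_inserted : n_rows_inserted)
        .fetch_add(1, std::memory_order_relaxed);  // shared, not free
  }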
srv_printf_innodb_monitor(): Only display an ADAPTIVE HASH INDEX
section if the adaptive hash index is enabled.
ibuf_print(): Only display an INSERT BUFFER section if the
change buffer is not empty.
Problem:
========
InnoDB DDL fails to remove the newly added table or index from the
dictionary, and the index stub from the table cache, if the ALTER
transaction encounters a DEADLOCK error in the commit phase.
Solution:
========
Restart the ALTER TABLE transaction if it encounters a DEADLOCK in the
commit phase, so that the removal of the index stubs and of the index
and table from the dictionary can be done in
rollback_inplace_alter_table().
- Added an assertion in rollback_inplace_alter_table() to indicate that
the online log for the old table should not exist.
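A minimal sketch of the restart-on-DEADLOCK idea (simulated, with
hypothetical names; the real control flow lives in the inplace ALTER
code):

  #include <cstdio>

  enum class db_err { SUCCESS, DEADLOCK };

  // Simulated commit step: deadlocks on the first attempt only.
  static db_err commit_try()
  {
    static bool first = true;
    if (first) { first = false; return db_err::DEADLOCK; }
    return db_err::SUCCESS;
  }

  int main()
  {
    while (commit_try() == db_err::DEADLOCK)
      std::puts("DB_DEADLOCK in commit phase: roll back so that"
                " rollback_inplace_alter_table() can remove the stubs,"
                " then restart the ALTER transaction");
    std::puts("ALTER committed");
  }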
In commit 1bd681c8b3 (MDEV-25506 part 3)
the way DDL transactions delete files was rewritten.
Only files that are actually attached to InnoDB tablespaces would be
deleted, and only after the DDL transaction was durably committed.
After a failed ALTER TABLE...IMPORT TABLESPACE, any data files that
the user might have moved to the data directory will not be attached
to the InnoDB data dictionary. Therefore, DROP TABLE would not
attempt to delete those files, and a subsequent CREATE TABLE would
fail. The logic was that the user who created the files outside the
DBMS is still the owner of those files, and InnoDB should not delete
those files because an "ownership transfer" (IMPORT TABLESPACE) was
not successfully completed.
However, not deleting those detached files could surprise users.
ha_innobase::delete_table(): Even if no tablespace exists, try to
delete any files that might match the table name.
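A hedged sketch of the fallback (hypothetical helper; the actual code
works through the InnoDB file APIs):

  #include <filesystem>
  #include <string>

  // After a failed IMPORT TABLESPACE no tablespace object exists, but
  // stray files matching the table name may remain; remove them so a
  // later CREATE TABLE cannot collide with them.
  void delete_detached_files(const std::filesystem::path &dbdir,
                             const std::string &table)
  {
    for (const char *ext : {".ibd", ".cfg"})
      std::filesystem::remove(dbdir / (table + ext)); // no-op if absent
  }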
Reviewed by: Thirunarayanan Balathandayuthapani
btr_insert_into_right_sibling(): Inherit any gap lock from the
left sibling to the right sibling before inserting the record
into the right sibling and updating the node pointer(s).
lock_update_node_pointer(): Update locks in case a node pointer
will move.
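A toy model of the lock relocation (illustrative only; InnoDB keys
record locks by page and heap number in lock_sys):

  #include <cstdint>
  #include <map>
  #include <utility>

  using rec_id = std::pair<uint32_t, uint16_t>; // (page no, heap no)
  std::multimap<rec_id, int> record_locks;      // position -> owning trx

  // When a record (such as a node pointer) moves, re-key its locks so
  // that the gap they protect stays covered at the new position.
  void move_locks(const rec_id &from, const rec_id &to)
  {
    if (from == to)
      return;
    auto range = record_locks.equal_range(from);
    for (auto it = range.first; it != range.second; ++it)
      record_locks.emplace(to, it->second);     // inherit at new position
    record_locks.erase(from);                   // drop the stale entries
  }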
Based on mysql/mysql-server@c7d93c274f
This follows up the previous fix in
commit c3c53926c4 (MDEV-26554).
ha_innobase::delete_table(): Work around the insufficient
metadata locking (MDL) during DML operations by acquiring exclusive
InnoDB table locks on all child tables. Previously, this was only
done on TRUNCATE and ALTER.
ibuf_delete_rec(), btr_cur_optimistic_delete(): Do not invoke
lock_update_delete() during change buffer operations.
The revised trx_t::commit(std::vector<pfs_os_file_t>&) will
hold exclusive lock_sys.latch while invoking fil_delete_tablespace(),
which in turn may invoke ibuf_delete_rec().
dict_index_t::has_locking(): A new predicate, replacing the dummy
!dict_table_is_locking_disabled(index->table). Used for skipping lock
operations during ibuf_delete_rec().
trx_t::commit(std::vector<pfs_os_file_t>&): Release the locks
and remove the table from the cache while holding exclusive
lock_sys.latch.
trx_t::commit_in_memory(): Skip release_locks() if dict_operation holds.
trx_t::commit(): Reset dict_operation before invoking commit_in_memory()
via commit_persist().
lock_release_on_drop(): Release locks while lock_sys.latch is
exclusively locked.
lock_table(): Add a parameter for a pointer to the table.
We must not dereference the table before the lock_sys.latch has
been acquired. If the pointer to the table does not match the table
at that point, the table is invalid and DB_DEADLOCK will be returned.
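A sketch of the revised contract (simplified stand-in types):

  #include <mutex>

  struct dict_table_stub { /* stand-in for dict_table_t */ };

  enum class db_err { SUCCESS, DEADLOCK };

  std::mutex lock_sys_latch;                // stand-in for lock_sys.latch
  dict_table_stub *cached_table = nullptr;  // what the cache holds now

  // The caller passes the pointer it saw earlier; it is dereferenced
  // only under the latch, and if the cache no longer holds that
  // pointer, the table is invalid and DB_DEADLOCK is returned.
  db_err lock_table_sketch(dict_table_stub *expected)
  {
    std::lock_guard<std::mutex> g(lock_sys_latch);
    if (cached_table != expected)
      return db_err::DEADLOCK;    // table was dropped concurrently
    // ... safe to dereference *expected and enqueue the lock here ...
    return db_err::SUCCESS;
  }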
row_ins_foreign_check_on_constraint(): Improve the checks.
Remove a bogus DB_LOCK_WAIT_TIMEOUT return that was needed
before commit c5fd9aa562 (MDEV-25919).
row_upd_check_references_constraints(),
wsrep_row_upd_check_foreign_constraints(): Simplify checks.
- InnoDB should avoid the bulk insert operation when the table has an
active DDL, because bulk insert writes only one undo log record
(TRX_UNDO_EMPTY), while the logging of concurrent DML happens at
commit time by parsing undo log records to get the values and the
operation.
- Removed ROW_T_EMPTY, ROW_OP_EMPTY and their associated functions,
and also the test case that tried to log ROW_OP_EMPTY
when the table has an active DDL.
- InnoDB DDL results in a `Duplicate entry' error if a concurrent DML
transaction throws a duplicate key error. The following scenario
explains the problem:
connection con1:
ALTER TABLE t1 FORCE;
connection con2:
INSERT INTO t1(pk, uk) VALUES (2, 2), (3, 2);
In connection con2, InnoDB throws the 'DUPLICATE KEY' error because
of the unique index. The ALTER operation will throw the error when
applying the concurrent DML log.
- Inserting a duplicate key into the unique index logs the insert
operation for online ALTER TABLE. When the insertion fails,
the transaction rolls back, which leads to logging a
delete operation for online ALTER TABLE.
While applying the insert log entries, the ALTER operation
encounters the 'DUPLICATE KEY' error.
- To avoid this fake duplicate scenario, InnoDB should
not write any log for online ALTER TABLE before the DML transaction
commits; see the sketch after this list.
- A user thread that executes DML can apply the online log if
InnoDB ran out of online log space and the index is marked as
completed. Set the online log error if the apply phase encountered
any error. The thread can also clear the logs of all other indexes
and mark the newly added indexes as corrupted.
- Removed the old online code that was part of DML operations.
commit_inplace_alter_table(): Applies the online log
for the last batch of the secondary index log and frees
the log for the completed index.
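A toy model of the deferral mentioned above (the real implementation
re-reads the changes from the undo log at commit; names are
illustrative):

  #include <string>
  #include <vector>

  struct trx_sketch
  {
    std::vector<std::string> pending;  // this transaction's row changes

    void log_dml(std::string change)   // during DML: keep it private
    { pending.push_back(std::move(change)); }

    void commit(std::vector<std::string> &online_log)
    {                                  // only now reach the ALTER log
      for (auto &c : pending)
        online_log.push_back(std::move(c));
      pending.clear();
    }

    void rollback()                    // a failed insert never reaches
    { pending.clear(); }               // the log: no fake duplicates
  };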
trx_t::apply_online_log: Set to true while writing the undo
log if the modified table has active DDL
trx_t::apply_log(): Apply the DML changes to online DDL tables
dict_table_t::is_active_ddl(): Returns true if the table
has an active DDL
dict_index_t::online_log_make_dummy(): Assign a dummy value
to the clustered index online log to indicate that the secondary
indexes are being rebuilt.
dict_index_t::online_log_is_dummy(): Check whether the online
log holds the dummy value.
ha_innobase_inplace_ctx::log_failure(): Handle the apply log
failure for online DDL transaction
row_log_mark_other_online_index_abort(): Clear out all other
online index log after encountering the error during
row_log_apply()
row_log_get_error(): Get the error that happened during row_log_apply()
row_log_online_op(): Applies the online log if the index is
completed and the log ran out of memory. Returns false if applying
the log fails.
UndorecApplier: Introduced a class that maintains the undo log
record and the latched undo buffer page, parses the undo log record,
and keeps track of the undo record type, info bits and update vector
UndorecApplier::get_old_rec(): Get the correct version of the
clustered index record that was modified by the current undo
log record
UndorecApplier::clear_undo_rec(): Clear the undo log related
information after applying the undo log record
UndorecApplier::log_update(): Handle the update and delete undo
log records and apply them to the online indexes
UndorecApplier::log_insert(): Handle the insert undo log record
and apply it to the online indexes
UndorecApplier::is_same(): Check whether the given roll pointer
is generated by the current undo log record information
trx_t::rollback_low(): Set apply_online_log for the transaction
if the partially rolled back transaction modified any active DDL table
prepare_inplace_alter_table_dict(): After allocating the online
log, InnoDB creates the fulltext common tables. A fulltext index
does not allow the build to be online, so the dead code for
online log removal was removed.
Thanks to Marko Mäkelä for providing the initial prototype and
Matthias Leich for testing the issue patiently.
It suffices to test compression with one record. Restarting the
server is not really needed; we are exercising the log based recovery
in other tests, such as mariabackup.page_compression_level.
This is a backport of commit 4489a89c71
in order to remove the test innodb.redo_log_during_checkpoint
that would cause trouble in the DBUG subsystem invoked by
safe_mutex_lock() via log_checkpoint(). Before
commit 7cffb5f6e8
these mutexes were of a different type.
The following options were introduced in
commit 2e814d4702 (mariadb-10.2.2)
and have little use:
innodb_disable_resize_buffer_pool_debug had no effect even in
MariaDB 10.2.2 or MySQL 5.7.9. It was introduced in
mysql/mysql-server@5c4094cf49
to work around a problem that was fixed in
mysql/mysql-server@2957ae4f99
(but the parameter was not removed).
innodb_page_cleaner_disabled_debug and innodb_master_thread_disabled_debug
are only used by the test innodb.redo_log_during_checkpoint
that will be removed as part of this commit.
innodb_dict_stats_disabled_debug is only used by that test,
and it is redundant because one could simply use
innodb_stats_persistent=OFF or the STATS_PERSISTENT=0 attribute
of the table in the test to achieve the same effect.
Ever since commit 685d958e38
some Perl code in the test mariabackup.huge_lsn is writing names of
non-existing files to the InnoDB redo log and testing the recovery.
We do not need any debug instrumentation to duplicate that test.
The following conditions had to be addressed:
1) InnoDB fails to include the offset of the node pointer field
in a non-leaf record for the redundant row format.
2) If a fixed-length field has only a prefix length, then
calculate the field's maximum size as the prefix length.
- Added a test case to test (2) and to check the maximum number of
fields that can exist in the index.
Starting with 10.3, an assertion would fail on the rollback of
a recovered incomplete transaction if a table definition violates
a FOREIGN KEY constraint.
DICT_ERR_IGNORE_RECOVER_LOCK: Include also DICT_ERR_IGNORE_FK_NOKEY
so that trx_resurrect_table_locks() will be able to load
table definitions and resurrect IX locks. Previously, if the
FOREIGN KEY constraints of a table were incomplete, the table
would fail to load until rollback, and in 10.3 or later an assertion
would fail that the rollback was not protected by a table IX lock.
Thanks to commit 9de2e60d74 there
will be no problem enforcing subsequent FOREIGN KEY operations
even though a table with an invalid REFERENCES clause was loaded.
- A trigger statement initiates an UPDATE statement after a bulk
insert operation, which leads to disabling the bulk operation.
During commit, InnoDB expects the transaction to be in bulk insert
mode before applying the bulk insert operation. The InnoDB
transaction should apply all bulk insert operations before disabling
the bulk insert operation.
buf_page_t::set_state(): Relax a debug assertion. It is fine to update
a read-fixed block descriptor to be both read-fixed and buffer-fixed.
buf_pool_t::watch_unset(): Fix some incorrect logic that was implemented
in commit e9e6db9355.
Thanks to Elena Stepanova for the test case.
- The InnoDB bulk insert operation fails to roll back when it detects
a DB_DUPLICATE_KEY error. This leads to orphaned records in primary
indexes. A subsequent update/delete operation assumes that the record
exists in the secondary index, and that leads to a failure.
- After MDEV-24621, InnoDB buffers the bulk insert operation
for all indexes except spatial ones. But that requires a
primary key lookup, which fails. So InnoDB should avoid
bulk insert when the table has a spatial index.
This also fixes MDEV-20198: Instant ALTER TABLE is not crash safe
InnoDB dictionary recovery wrongly used the READ UNCOMMITTED isolation
level, causing some mismatch. For example, if a table was renamed or
replaced in a transaction, according to READ UNCOMMITTED the table might
not exist at all.
We implement READ COMMITTED isolation level for accessing the dictionary
tables SYS_TABLES, SYS_COLUMNS, SYS_INDEXES, SYS_FIELDS, SYS_VIRTUAL,
SYS_FOREIGN, SYS_FOREIGN_COLS. For most of these tables, no secondary
index exists. For the secondary indexes (on SYS_TABLES.ID,
SYS_FOREIGN.FOR_NAME, SYS_FOREIGN.REF_NAME), we will always look up
the primary key in the clustered index and check if the record actually
is a committed version.
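A toy model of the lookup rule (illustrative containers, not the
actual dictionary code):

  #include <cstdint>
  #include <map>
  #include <string>

  struct clust_rec { std::string name; bool committed; };

  std::map<uint64_t, clust_rec> clust;        // primary key -> record
  std::multimap<std::string, uint64_t> sec;   // secondary: name -> key

  // An entry found via the secondary index is trusted only after the
  // clustered index record it points to is confirmed to be a
  // committed version that still matches the key.
  const clust_rec *lookup_committed(const std::string &name)
  {
    auto range = sec.equal_range(name);
    for (auto it = range.first; it != range.second; ++it)
    {
      auto c = clust.find(it->second);
      if (c != clust.end() && c->second.committed
          && c->second.name == name)  // the secondary entry may be stale
        return &c->second;
    }
    return nullptr;
  }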
dict_check_sys_tables(): Recover tablespaces also from delete-marked
committed records, so that if a matching .ibd file exists, it will
be removed by fil_delete_tablespace() when the committed delete-marked
SYS_INDEXES record of the clustered index is purged
in row_purge_remove_clust_if_poss_low().
fil_ibd_open(): Change the Boolean parameter "validate" to a ternary
one, to suppress error messages when the file might not exist.
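A sketch of the bool-to-ternary change (enumerator names invented):

  #include <cstdio>

  enum class fil_validate { none, normal, maybe_missing };

  static bool file_exists(const char *path)
  {
    if (std::FILE *f = std::fopen(path, "rb"))
    { std::fclose(f); return true; }
    return false;
  }

  bool ibd_open_sketch(const char *path, fil_validate mode)
  {
    if (file_exists(path))
      return true;
    if (mode != fil_validate::maybe_missing)        // stay silent when
      std::fprintf(stderr, "cannot open %s\n", path); // absence is OK
    return false;
  }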
It is possible that a .ibd file was deleted and the server shut down
before the SYS_INDEXES and SYS_TABLES records were purged. Hence, if
dict_check_sys_tables() finds a committed delete-marked record,
we must not complain if the tablespace file is not found.
On Windows, we must treat ERROR_PATH_NOT_FOUND (directory not found)
in the same way as ERROR_FILE_NOT_FOUND. This fixes a few failures where
a previous test successfully executed DROP DATABASE (and deleted all
files and the directory), but a committed delete-marked SYS_TABLES
record had not been purged before server restart.
dict_getnext_system_low(): Do not filter out delete-marked records.
dict_startscan_system(), dict_getnext_system(): Do filter out
delete-marked records, for accessing the INFORMATION_SCHEMA tables.
dict_sys_tables_rec_read(): Return the DB_TRX_ID of the committed
version of the record. This is needed in dict_load_table_low().
dict_load_foreign_cols(), dict_load_foreign(): Add a parameter for
the current transaction identifier. In some DDL operations, the
FOREIGN KEY constraints are being loaded from the data dictionary
before the DDL transaction has been committed. For SYS_FOREIGN
and SYS_FOREIGN_COLS, we must implement the special case of
READ COMMITTED that the changes of the uncommitted current transaction
are visible.
dict_load_foreign(): Validate the table name. We could find a
SYS_FOREIGN.ID via a committed delete-marked secondary index record
that does not match the REF_NAME or FOR_NAME of the secondary index record.
dict_load_index_low(): Optionally take the table as a parameter,
so that table->def_trx_id can be updated in case of a
committed delete-marked SYS_INDEXES record corresponding
to DROP INDEX, but not corresponding to an index stub of ADD INDEX.
dict_load_indexes(): Do not update table->def_trx_id
in case of delete-marked records.
rec_is_metadata(), rec_offs_make_valid(), rec_get_offsets_func(),
row_build_low(): Relax some assertions. We may now have
!index->is_instant() even if a metadata record is present in the index.
Previously, the recovery of instant ADD/DROP COLUMN assumed
that READ UNCOMMITTED of the data dictionary will be performed.
Now, we will have a READ COMMITTED copy of the data dictionary
cache, and a READ UNCOMMITTED copy of the metadata record.
btr_page_reorganize_low(): Correctly update the FIL_PAGE_TYPE
when rolling back an instant ADD/DROP COLUMN operation.
row_rec_to_index_entry_impl(): Relax some assertions,
and disallow accessing "extra" fields. This fixes the recovery
of a crash during an instant ADD COLUMN after a successful
instant DROP COLUMN, in the test innodb.instant_alter_crash.
Tested by: Matthias Leich
A few regression tests invoke heavy flushing of the buffer pool
and may trigger warnings that tablespaces could not be deleted
because of pending writes. Those warnings are to be expected
during the execution of such tests.
The warnings are also frequently seen with Valgrind or MemorySanitizer.
For those, the global suppression in have_innodb.inc does the trick.
- InnoDB should check whether the bulk transaction id is set to its
own transaction id before starting a bulk insert operation.
- When a bulk insert failure happens, InnoDB should set the error
information of the transaction.
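A toy model of both checks (illustrative names and types):

  #include <atomic>
  #include <cstdint>

  struct table_sketch { std::atomic<uint64_t> bulk_trx_id{0}; };
  struct trx_error    { uint64_t id; bool failed = false; };

  // Start bulk insert only if this transaction owns the bulk state;
  // on failure, record the error on the transaction.
  bool try_bulk_insert(table_sketch &t, trx_error &trx, bool insert_ok)
  {
    if (t.bulk_trx_id.load(std::memory_order_acquire) != trx.id)
      return false;                 // not our bulk operation
    if (!insert_ok)
    {
      trx.failed = true;            // set the error info of the trx
      return false;
    }
    return true;
  }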