MDEV-14511 tried to avoid some consistency problems related to InnoDB
persistent statistics. The persistent statistics are being written by
an InnoDB internal SQL interpreter that requires the InnoDB data dictionary
cache to be locked.
Before MDEV-14511, the statistics were written during DDL in separate
transactions, which could unnecessarily reduce performance (each commit
would require a redo log flush) and break atomicity, because the statistics
would be updated separately from the dictionary transaction.
However, because it is unacceptable to hold the InnoDB data dictionary
cache locked while suspending the execution for waiting for a
transactional lock (in the mysql.innodb_index_stats or
mysql.innodb_table_stats tables) to be released, any lock conflict
was immediately be reported as "lock wait timeout".
To fix MDEV-14941, an attempt to reduce these lock conflicts by acquiring
transactional locks on the user tables in both the statistics and DDL
operations was made, but it would still not entirely prevent lock conflicts
on the mysql.innodb_index_stats and mysql.innodb_table_stats tables.
Fixing the remaining problems would require a change that is too intrusive
for a GA release series, such as MariaDB 10.2.
Thefefore, we revert the change MDEV-14511. To silence the
MDEV-13201 assertion, we use the pre-existing flag trx_t::internal.
dict_foreign_find_index(): Ignore incompletely created indexes.
After a failed ADD UNIQUE INDEX, an incompletely created index
could be left behind until the next ALTER TABLE statement.
InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2
redo log records during table-rebuilding ALGORITHM=INPLACE operations.
We must write the records for any .ibd file renames, so that the
operations are crash-safe.
If InnoDB is killed during a RENAME TABLE operation, it can happen that
the transaction for updating the data dictionary will be rolled back.
But, nothing will roll back the renaming of the .ibd file
(the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter,
the renaming of the dict_table_t::name in the dict_sys cache. We introduce
the undo log record TRX_UNDO_RENAME_TABLE to fix this.
fil_space_for_table_exists_in_mem(): Remove the parameters
adjust_space, table_id and some code that was trying to work around
these deficiencies.
fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record.
dict_table_rename_in_cache(): Invoke fil_name_write_rename().
trx_undo_rec_copy(): Set the first 2 bytes to the length of the
copied undo log record.
trx_undo_page_report_rename(), trx_undo_report_rename():
Write a TRX_UNDO_RENAME_TABLE record with the old table name.
row_rename_table_for_mysql(): Invoke trx_undo_report_rename()
before modifying any data dictionary tables.
row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE
by invoking dict_table_rename_in_cache(), which will take care
of both renaming the table and the file.
dict_stats_rename_table(): After DB_LOCK_WAIT_TIMEOUT
or DB_DUPLICATE_KEY, reset the trx->error_state before retrying.
Also, properly treat DB_DEADLOCK as a hard error.
The assertion failure was caused by
MDEV-14511 Use fewer transactions for updating InnoDB persistent statistics
We are reusing a transaction object after commit, and sometimes,
even after a successful operation, the trx_t::error_state may be
something else than DB_SUCCESS. Reset the field when needed.
Starting with MySQL 5.7 (or MariaDB 10.2.2) InnoDB no longer contains
the "table monitor" or "tablespace monitor". The conditions on
srv_print_innodb_tablespace_monitor, srv_print_innodb_table_monitor
never hold. So, the code was dead.
Also, remove a bogus reference to dict_print(), which used to implement
the InnoDB table monitor.
Allow DROP TABLE `#mysql50##sql-...._.` to drop tables that were
being rebuilt by ALGORITHM=INPLACE
NOTE: If the server is killed after the table-rebuilding ALGORITHM=INPLACE
commits inside InnoDB but before the .frm file has been replaced, then
the recovery will involve something else than DROP TABLE.
NOTE: If the server is killed in a true inplace ALTER TABLE commits
inside InnoDB but before the .frm file has been replaced, then we
are really out of luck. To properly handle that situation, we would
need a transactional mysql.ddl_fixup table that directs recovery to
rename or remove files.
prepare_inplace_alter_table_dict(): Use the altered_table->s->table_name
for generating the new_table_name.
table_name_t::part_suffix: The start of the partition name suffix.
table_name_t::dbend(): Return the end of the schema name.
table_name_t::dblen(): Return the length of the schema name, in bytes.
table_name_t::basename(): Return the name without the schema name.
table_name_t::part(): Return the partition name, or NULL if none.
row_drop_table_for_mysql(): Assert for #sql, not #sql-ib.
Introduce the debug flag trx_t::persistent_stats to suppress the
assertion for the updates of persistent statistics during fast
shutdown.
dict_stats_exec_sql(): Do execute the statement even though shutdown
has been initiated.
dict_stats_exec_sql(): Expect the caller to always provide a transaction.
Remove some redundant assertions. The caller must hold dict_sys->mutex,
but holding dict_operation_lock is only necessary for accessing
data dictionary tables, which we are not accessing.
dict_stats_save_index_stat(): Acquire dict_sys->mutex
for invoking dict_stats_exec_sql().
dict_stats_save(), dict_stats_update_for_index(), dict_stats_update(),
dict_stats_drop_index(), dict_stats_delete_from_table_stats(),
dict_stats_delete_from_index_stats(), dict_stats_drop_table(),
dict_stats_rename_in_table_stats(), dict_stats_rename_in_index_stats(),
dict_stats_rename_table(): Use a single caller-provided
transaction that is started and committed or rolled back by the caller.
dict_stats_process_entry_from_recalc_pool(): Let the caller provide
a transaction object.
ha_innobase::open(): Pass a transaction to dict_stats_init().
ha_innobase::create(), ha_innobase::discard_or_import_tablespace():
Pass a transaction to dict_stats_update().
ha_innobase::rename_table(): Pass a transaction to
dict_stats_rename_table(). We do not use the same transaction
as the one that updated the data dictionary tables, because
we already released the dict_operation_lock. (FIXME: there is
a race condition; a lock wait on SYS_* tables could occur
in another DDL transaction until the data dictionary transaction
is committed.)
ha_innobase::info_low(): Pass a transaction to dict_stats_update()
when calculating persistent statistics.
alter_stats_norebuild(), alter_stats_rebuild(): Update the
persistent statistics as well. In this way, a single transaction
will be used for updating the statistics of a whole table, even
for partitioned tables.
ha_innobase::commit_inplace_alter_table(): Drop statistics for
all partitions when adding or dropping virtual columns, so that
the statistics will be recalculated on the next handler::open().
This is a refactored version of Oracle Bug#22469660 fix.
RecLock::add_to_waitq(), lock_table_enqueue_waiting():
Do not allow a lock wait to occur for updating statistics
in a data dictionary transaction, such as DROP TABLE. Instead,
return the previously unused error code DB_QUE_THR_SUSPENDED.
row_merge_lock_table(), row_mysql_lock_table(): Remove dead code
for handling DB_QUE_THR_SUSPENDED.
row_drop_table_for_mysql(), row_truncate_table_for_mysql():
Drop the statistics as part of the data dictionary transaction.
After TRUNCATE TABLE, the statistics will be recalculated on
subsequent ha_innobase::open(), similar to how the logic after
the above-mentioned Oracle Bug#22469660 fix in
ha_innobase::commit_inplace_alter_table() works.
btr_defragment_thread(): Use a single transaction object for
updating defragmentation statistics.
dict_stats_save_defrag_stats(), dict_stats_save_defrag_stats(),
dict_stats_process_entry_from_defrag_pool(),
dict_defrag_process_entries_from_defrag_pool(),
dict_stats_save_defrag_summary(), dict_stats_save_defrag_stats():
Add a parameter for the transaction.
dict_stats_empty_table(): Make public. This will be called by
row_truncate_table_for_mysql() after dropping persistent statistics,
to clear the memory-based statistics as well.
dict_stats_exec_sql(): Refuse the operation if shutdown has been
initiated.
The real fix would be to update the persistent statistics as part
of the data dictionary transactions. To do this, we should move the
storage of InnoDB persistent statistics to the InnoDB data files,
and maybe also remove the InnoDB data dictionary.
With a big buffer pool that contains many data pages,
DISCARD TABLESPACE took a long time, because it would scan the
entire buffer pool to remove any pages that belong to the tablespace.
With a large buffer pool, this would take a lot of time, especially
when the table-to-discard is empty.
The minimum amount of work that DISCARD TABLESPACE must do is to
remove the pages of the to-be-discarded table from the
buf_pool->flush_list because any writes to the data file must be
prevented before the file is deleted.
If DISCARD TABLESPACE does not evict the pages from the buffer pool,
then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD,
not-yet-evicted pages from being mistaken for pages of the imported
tablespace.
It would not be a useful fix to simply move the buffer pool scan to
the IMPORT TABLESPACE step. What we can do is to actively evict those
pages that could be mistaken for imported pages. In this way, when
importing a small table into a big buffer pool, the import should
still run relatively fast.
Import is bypassing the buffer pool when reading pages for the
adjustment phase. In the adjustment phase, if a page exists in
the buffer pool, we could replace it with the page from the imported
file. Unfortunately I did not get this to work properly, so instead
we will simply evict any matching page from the buffer pool.
buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode
where the requested page will be evicted if it is found. There
must be no unwritten changes for the page.
buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write
to file is desired, and use a separate parameter bool drop_ahi.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Replace buf_remove_t.
buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove.
PageConverter::m_mtr: A dummy mini-transaction buffer
PageConverter::PageConverter(): Complete the member initialization list.
PageConverter::operator()(): Evict any 'shadow' pages from the
buffer pool so that pre-existing (garbage) pages cannot be mistaken
for pages that exist in the being-imported file.
row_discard_tablespace(): Remove a bogus comment that seems to
refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.
There are two bugs related to failed ADD INDEX and
the InnoDB table cache eviction.
dict_table_close(): Try dropping failed ADD INDEX when releasing
the last table handle, not when releasing the last-but-one.
dict_table_remove_from_cache_low(): Do not invoke
row_merge_drop_indexes() after freeing all index metadata.
Instead, directly invoke row_merge_drop_indexes_dict() to
remove the metadata from the persistent data dictionary
and to free the index pages.
The fix 716f97e271
was inadvertently reverted in
commit 2e814d4702
"Merge InnoDB 5.7 from mysql-5.7.9".
Reapply the fix, because the test of the bug would fail
after merging MDEV-13838, which replaced an earlier incorrect
bug fix with a correct one.
Reverted incorrect changes done on MDEV-7367 and MDEV-9469. Fixes properly
also related bugs:
MDEV-13668: InnoDB unnecessarily rebuilds table when renaming a column and adding index
MDEV-9469: 'Incorrect key file' on ALTER TABLE
MDEV-9548: Alter table (renaming and adding index) fails with "Incorrect key file for table"
MDEV-10535: ALTER TABLE causes standalone/wsrep cluster crash
MDEV-13640: ALTER TABLE CHANGE and ADD INDEX on auto_increment column fails with "Incorrect key file for table..."
Root cause for all these bugs is the fact that MariaDB .frm file
can contain virtual columns but InnoDB dictionary does not and
previous fixes were incorrect or unnecessarily forced table
rebuilt. In index creation key_part->fieldnr can be bigger than
number of columns in InnoDB data dictionary. We need to skip not
stored fields when calculating correct column number for InnoDB
data dictionary.
dict_table_get_col_name_for_mysql
Remove
innobase_match_index_columns
Revert incorrect change done on MDEV-7367
innobase_need_rebuild
Remove unnecessary rebuild force when column is renamed.
innobase_create_index_field_def
Calculate InnoDB column number correctly and remove
unnecessary column name set.
innobase_create_index_def, innobase_create_key_defs
Remove unneeded fields parameter. Revert unneeded memset.
prepare_inplace_alter_table_dict
Remove unneeded col_names parameter
index_field_t
Remove unneeded col_name member.
row_merge_create_index
Remove unneeded col_names parameter and resolution.
Effected tests:
innodb-alter-table : Add test case for MDEV-13668
innodb-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
innodb-wl5980-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
MariaDB 10.1 introduced non-indexed virtual columns for InnoDB tables.
When MySQL 5.7 introduced virtual columns in InnoDB tables, it also
introduced the table SYS_VIRTUAL that stores metadata on virtual
columns. This table does not initially exist in data files that were
imported from 10.1. So, we do not always have virtual column metadata
inside InnoDB.
dict_index_contains_col_or_prefix(): In the clustered index records,
all non-virtual columns are present and no virtual columns are present.
ha_innobase::build_template(): In the clustered index, do not
include virtual columns in the query template. The SQL layer is
supposed to compute the virtual column values when needed.
The function dict_disable_redo_if_temporary() was supposed to
disable redo logging for temporary tables. It was invoked
unnecessarily for two read-only operations:
row_undo_search_clust_to_pcur() and
dict_stats_update_transient_for_index().
When a table is not temporary and not in the system tablespace,
the tablespace should be flagged for MLOG_FILE_NAME logging.
We do not need this overhead for temporary tables. Therefore,
either mtr_t::set_log_mode() or mtr_t::set_named_space() should
be invoked.
dict_table_t::is_temporary(): Determine if a table is temporary.
dict_table_is_temporary(): Redefined as a macro wrapper for
dict_table_t::is_temporary().
dict_disable_redo_if_temporary(): Remove.