This patch changes the main name of the 3-byte character set from utf8 to
utf8mb3. A new old_mode flag, UTF8_IS_UTF8MB3, is added and set by default,
so that utf8 means utf8mb3. If it is not set, utf8 means utf8mb4.
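The alias rule above can be sketched as follows. This is only an illustration: resolve_charset_name() is a hypothetical helper, not a function of the server, and it compares names exactly for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the server): maps a user-supplied
// character set name to the name that the server would use internally.
// utf8_is_utf8mb3 stands in for the UTF8_IS_UTF8MB3 bit of old_mode.
std::string resolve_charset_name(const std::string &name,
                                 bool utf8_is_utf8mb3)
{
  assert(!name.empty());
  if (name == "utf8")
    return utf8_is_utf8mb3 ? "utf8mb3" : "utf8mb4";
  return name;  // every other name is passed through unchanged
}
```

With the flag set, resolve_charset_name("utf8", true) yields "utf8mb3"; with the flag cleared, the same name yields "utf8mb4".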
The implementation of handlerton::drop_database in InnoDB is
unnecessarily complex. The minimal implementation should check
that no conflicting locks or references exist on the tables,
delete all table metadata in a single transaction, and finally
delete the tablespaces.
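A minimal sketch of that three-step order, using an in-memory map in place of the data dictionary. All types and names here (TableMeta, Dictionary, this drop_database() signature) are assumptions made for the illustration and do not exist in InnoDB.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins; none of these types exist in InnoDB.
struct TableMeta
{
  uint32_t space_id;  // tablespace that stores the table
  unsigned n_locks;   // conflicting locks or references on the table
};

// Keyed by "database/table", so one database is a contiguous key range.
using Dictionary = std::map<std::string, TableMeta>;

// Returns false and changes nothing if any table in the database is still
// locked or referenced. Otherwise removes all metadata of the database in
// one step and reports the tablespaces that the caller should delete last.
bool drop_database(Dictionary &dict, const std::string &db,
                   std::vector<uint32_t> &spaces_to_delete)
{
  const std::string prefix = db + "/";
  const Dictionary::iterator first = dict.lower_bound(prefix);
  Dictionary::iterator last = first;
  std::vector<uint32_t> spaces;

  // Step 1: check every table of the database for conflicts.
  for (; last != dict.end() &&
         last->first.compare(0, prefix.size(), prefix) == 0; ++last)
  {
    if (last->second.n_locks)
      return false;
    spaces.push_back(last->second.space_id);
  }

  // Step 2: delete all metadata at once (models the single transaction).
  dict.erase(first, last);

  // Step 3: only now hand out the tablespaces for deletion.
  spaces_to_delete = spaces;
  return true;
}
```

The point of the ordering is that a conflict is detected before anything is modified, and that no file is removed until the metadata change has been decided.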
Note: DROP DATABASE will delete each individual table that the
SQL layer knows about, one table per transaction.
The handlerton::drop_database is basically a final cleanup step
for removing any garbage that could have been left behind
in InnoDB due to some bug, or not having atomic DDL in the past.
hash_node_t: Remove. Use the proper data type name in pointers.
dict_drop_index_tree(): Do not take the table as a parameter.
Instead, return the tablespace ID if the tablespace should be dropped
(we are dropping a clustered index tree).
fil_delete_tablespace(), fil_system_t::detach(): Return a single
detached file handle. Multi-file tablespaces cannot be deleted
via this interface.
ha_innobase::delete_table(): Remove a work-around for non-atomic DDL
and do not try to drop tables with similar-looking name.
innodb_drop_database(): Complete rewrite.
innobase_drop_database(), dict_get_first_table_name_in_db(),
row_drop_database_for_mysql(), drop_all_foreign_keys_in_db(): Remove.
row_purge_remove_clust_if_poss_low(), row_undo_ins_remove_clust_rec():
If the tablespace is to be deleted, try to evict the table definition
from the cache. Failing that, set dict_table_t::space to nullptr.
lock_release_on_rollback(): On the rollback of CREATE TABLE, release all
locks that the transaction had on the table, to avoid heap-use-after-free.
trx_t::will_lock: Changed the type to bool.
trx_t::is_autocommit_non_locking(): Replaces
trx_is_autocommit_non_locking().
trx_is_ac_nl_ro(): Remove (replaced with equivalent assertion expressions).
assert_trx_nonlocking_or_in_list(): Remove.
Replaced with at least as strict checks in each place.
check_trx_state(): Moved to a static function; partially replaced with
individual debug assertions implementing equivalent or stricter checks.
ha_innobase::index_read(): If an autocommit non-locking transaction was
already started, refuse to access a SPATIAL INDEX.
Once a non-locking autocommit transaction has started, it must remain
in that mode (not acquire any locks).
This should fix one cause of the assertion failure that would occur in
DeadlockChecker::check_and_resolve() under heavy load, presumably
due to concurrent execution of trx_commit_in_memory().
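The rule for ha_innobase::index_read() can be sketched as below. The Trx struct is a simplified, hypothetical stand-in for trx_t, and may_read_index() is an invented name; the sketch also assumes that reading a SPATIAL INDEX needs locks, which is what the text above implies.

```cpp
// Simplified, hypothetical stand-in for a transaction object. The member
// names follow the text above; the struct itself is not InnoDB code.
struct Trx
{
  bool started = false;      // has the transaction been started?
  bool auto_commit = false;  // is it an autocommit transaction?
  bool will_lock = false;    // did it announce that it will acquire locks?

  bool is_autocommit_non_locking() const
  {
    return auto_commit && !will_lock;
  }
};

enum class IndexKind { BTree, Spatial };

// Assumption of this sketch: a SPATIAL INDEX read would have to acquire
// locks. A transaction that already started in non-locking autocommit mode
// must stay in that mode, so the access is refused instead.
bool may_read_index(const Trx &trx, IndexKind kind)
{
  if (kind == IndexKind::Spatial && trx.started &&
      trx.is_autocommit_non_locking())
    return false;
  return true;
}
```

A transaction that declared will_lock before it started is unaffected, as is one that has not started yet and can still choose its mode.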
During data file creation, InnoDB holds the dict_sys mutex, tries to
write page 0 of the file, and flushes the file. This not only causes
unnecessary contention but also deviates from the write-ahead
logging protocol.
The clean sequence of operations is that we first start a dictionary
transaction and write SYS_TABLES and SYS_INDEXES records that identify
the tablespace. Then, we durably write a FILE_CREATE record to the
write-ahead log and create the file.
Recovery should not unnecessarily insist that the first page of each
data file that is referred to by the redo log is valid. It must be
enough that page 0 of the tablespace can be initialized based on the
redo log contents.
We introduce a new data structure deferred_spaces that keeps track
of corrupted-looking files during recovery. The data structure holds
the last LSN of a FILE_ record referring to the data file, the
tablespace identifier, and the last known file name.
Two scenarios can happen during recovery:
i) Sufficient memory: InnoDB can reconstruct the
tablespace after parsing all redo log records.
ii) Insufficient memory (multiple apply phases): InnoDB should
store the redo log records of the deferred tablespace even though
the tablespace is not present. InnoDB should start constructing
the tablespace when it first encounters the deferred tablespace
id.
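The bookkeeping described for deferred_spaces could look roughly like this. It is a sketch under assumptions: the class name, the member names, and the rule that a record with an older LSN does not overwrite a newer one are illustrative choices, not the actual implementation.

```cpp
#include <cstdint>
#include <map>
#include <string>

typedef uint64_t lsn_t;

// What the text says is remembered for each corrupted-looking file.
struct DeferredSpace
{
  lsn_t lsn;              // last LSN of a FILE_ record for this file
  std::string file_name;  // last known file name
};

// Hypothetical container keyed by tablespace identifier.
class DeferredSpaces
{
  std::map<uint32_t, DeferredSpace> spaces_;

public:
  // Record a FILE_ record. A record with an older LSN than the stored one
  // is ignored (an assumption of this sketch).
  void add(uint32_t space_id, lsn_t lsn, const std::string &file_name)
  {
    DeferredSpace &d = spaces_[space_id];  // new entries start at LSN 0
    if (lsn >= d.lsn)
    {
      d.lsn = lsn;
      d.file_name = file_name;
    }
  }

  // Returns nullptr if the tablespace is not deferred.
  const DeferredSpace *find(uint32_t space_id) const
  {
    auto it = spaces_.find(space_id);
    return it == spaces_.end() ? nullptr : &it->second;
  }

  // Forget a tablespace, for example once it has been reconstructed.
  bool remove(uint32_t space_id) { return spaces_.erase(space_id) != 0; }
};
```

In the multiple-apply-phase scenario, a lookup such as find() is what would let the parser decide to keep redo log records for a tablespace that does not exist yet.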
Mariabackup copies the zero-filled .ibd file in backup_fix_ddl() as a
file with the .new extension. The Mariabackup test case flushes pages
when it deals with a DDL operation during backup.
fil_ibd_create(): Remove the write of page 0 and the flushing of the file.
fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has a
zero-filled page 0.
Datafile: Clean up the error handling, and do not report errors
if we are in the middle of recovery. The caller will check
Datafile::m_defer.
fil_node_t::deferred: Indicates whether the tablespace loading was
deferred during recovery.
FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that the tablespace
file cannot be loaded yet.
recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to
initialize fil_space_t based on buffered metadata and records to
initialize page 0. Ignore the flags in fil_name_t, because they are
intentionally invalid.
fil_name_process(): Update deferred_spaces.
recv_sys_t::parse(): Store the redo log record if the tablespace id
is present in deferred_spaces.
recv_sys_t::recover_low(): Recover the first page of
the tablespace even though the tablespace instance is not
present.
recv_sys_t::apply(): Initialize the deferred tablespace
before applying the deferred tablespace records.
recv_validate_tablespace(): Skip the validation for deferred_spaces.
recv_rename_files(): Moved and revised from recv_sys_t::apply().
Do not attempt to rename the file if a deferred-recovery tablespace
is associated with the name.
recv_recovery_from_checkpoint_start(): Invoke recv_rename_files()
and initialize all deferred tablespaces before applying redo log.
fil_node_t::read_page0(): Skip page 0 validation if the tablespace
is deferred.
buf_page_create_deferred(): A variant of buf_page_create() for when
the fil_space_t is not available yet.
This is joint work with Thirunarayanan Balathandayuthapani,
who implemented an initial prototype.
The counter metadata_table_reference_count was not updated consistently
ever since mysql-server@65c0af9a1dedae43b63797134aff6b32304ced52
or commit 2e814d4702
introduced dict_table_t::release().
The counter metadata_table_handles_closed was being incremented
unconditionally in ha_innobase::close(), while the corresponding
counter metadata_table_handles_opened would be incremented in
ha_innobase::open() if the function returned early due to an error.
MONITOR_TRX_ACTIVE: Remove. The count is not being updated consistently,
and it would also include read-only transactions that are otherwise
fully invisible to any other threads.
If it later turns out that a reliable count of active transactions
is needed, it can be exposed via a different interface.
trx_commit_for_mysql(): If the transaction was not started, return
immediately.
row_drop_table_after_create_fail(): Remove. This function was only used
during InnoDB initialization (or upgrade) when creating a system table
failed. Rollback will clean up a failed CREATE just fine, by invoking
dict_drop_index_tree().
Problem:
=======
Test fails with 3 different symptoms
connection slave;
Assertion text: 'Last_Seen_Transaction should show .'
Assertion condition: '"0-1-1" = ""'
Assertion condition, interpolated: '"0-1-1" = ""'
Assertion result: '0'
connection slave;
Assertion text: 'Value returned by SSS and PS table for Last_Error_Number
should be same.'
Assertion condition: '"1146" = "0"'
Assertion condition, interpolated: '"1146" = "0"'
Assertion result: '0'
connection slave;
Assertion text: 'Value returned by PS table for worker_idle_time should be
>= 1'
Assertion condition: '"0" >= "1"'
Assertion condition, interpolated: '"0" >= "1"'
Assertion result: '0'
Fix1:
====
The performance schema table's Last_Seen_Transaction is compared with 'SELECT
gtid_slave_pos'. Since DDLs are not transactional, changes to the user table
and the gtid_slave_pos table are not guaranteed to be synchronous. To fix the
issue, the Gtid_IO_Pos value from the SHOW SLAVE STATUS command will be used
to verify the correctness of the performance schema specific
Last_Seen_Transaction.
Fix2:
====
On error, worker thread information is stored as part of the backup pool.
Access to this backup pool should be protected by the 'LOCK_rpl_thread_pool'
mutex so that a simultaneous START SLAVE cannot destroy the backup pool while
it is being queried by the performance schema.
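The locking rule in Fix2 can be sketched with a standard mutex. The class and member names below are hypothetical; lock_ merely stands in for LOCK_rpl_thread_pool, and the real pool holds far more state.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of what is kept about a worker after an error.
struct WorkerInfo
{
  int last_error_number;
  std::string last_seen_transaction;
};

// Hypothetical pool: every access to backup_ takes the same mutex, so a
// reader can never observe the backup while it is being destroyed.
class ThreadPool
{
  mutable std::mutex lock_;  // stands in for LOCK_rpl_thread_pool
  std::vector<WorkerInfo> backup_;

public:
  // Called on error: keep the worker information for later queries.
  void save_on_error(std::vector<WorkerInfo> workers)
  {
    std::lock_guard<std::mutex> guard(lock_);
    backup_ = std::move(workers);
  }

  // Called when the slave is started again: the backup is discarded.
  void destroy_backup()
  {
    std::lock_guard<std::mutex> guard(lock_);
    backup_.clear();
  }

  // Called by a query: copy the data out while holding the mutex.
  std::vector<WorkerInfo> snapshot() const
  {
    std::lock_guard<std::mutex> guard(lock_);
    return backup_;
  }
};
```

Returning a copy from snapshot() keeps the critical section short and means the caller never holds a pointer into storage that destroy_backup() may release.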
Fix3:
====
When the performance schema table is queried while a worker is waiting for
events, at present it just returns the difference between current_time and
start_time. This is incorrect. It should be worker_idle_time +
(current_time - start_time).
For example, suppose a worker thread was idle for 10 seconds and then got
events to process. Upon completion it goes back to the idle state; if the pfs
table is now queried, it should return the current idle time plus
worker_idle_time.
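The formula in Fix3 can be written out as a small sketch. WorkerIdleClock and its members are hypothetical names: accumulated_idle plays the role of worker_idle_time and idle_start the role of start_time.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical bookkeeping for one worker thread.
struct WorkerIdleClock
{
  time_t accumulated_idle = 0;  // worker_idle_time: finished idle periods
  time_t idle_start = 0;        // start_time of the current idle period
  bool idle = false;

  // The worker starts waiting for events.
  void start_idle(time_t now)
  {
    idle = true;
    idle_start = now;
  }

  // The worker got events: fold the finished period into the total.
  void stop_idle(time_t now)
  {
    assert(idle);
    accumulated_idle += now - idle_start;
    idle = false;
  }

  // Value to report when the table is queried at time `now`:
  // worker_idle_time + (current_time - start_time) while idle,
  // and just worker_idle_time while the worker is busy.
  time_t idle_time(time_t now) const
  {
    return idle ? accumulated_idle + (now - idle_start) : accumulated_idle;
  }
};
```

Following the example in the text: a worker that was idle for 10 seconds, processed events, and has now been idle again for 3 seconds reports 13 seconds, not 3.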
This reverts commit 72fa9dabad
but doesn't recover deleted jars - they still exist in
mysql-test/connect/std_data, no need to have them twice.
It also removes a redundant copy of JavaWrappers.jar.
dict_drop_index_tree(): Even if SYS_INDEXES.PAGE contains the
special value FIL_NULL, the tablespace identified by SYS_INDEXES.SPACE
may exist and may need to be dropped. This would definitely be the case
if the server had been killed right after a FILE_CREATE record was
persistently written during CREATE TABLE, but before the transaction
was committed.
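The decision described here can be sketched as a pure function. Everything below is an assumption made for illustration: the struct names, the use of 0 for "nothing to drop", and the rule that tablespace 0 (taken to be the shared system tablespace) is never dropped. FIL_NULL mirrors the name used in the text.

```cpp
#include <cstdint>

// Special page number meaning "no page"; named after the text above.
constexpr uint32_t FIL_NULL = 0xFFFFFFFFU;

// Hypothetical, reduced view of a SYS_INDEXES record.
struct SysIndexesRec
{
  uint32_t space;   // SYS_INDEXES.SPACE
  uint32_t page;    // SYS_INDEXES.PAGE (root page, or FIL_NULL)
  bool clustered;   // is this the clustered index of its table?
};

// What the caller should do for this record.
struct DropPlan
{
  bool free_tree;          // there is an index tree to free
  uint32_t space_to_drop;  // tablespace to drop, or 0 for none
};

DropPlan plan_drop_index_tree(const SysIndexesRec &rec)
{
  DropPlan plan;
  // A tree can be freed only if a root page was ever assigned.
  plan.free_tree = rec.page != FIL_NULL;
  // The tablespace decision does not depend on PAGE: the file may already
  // exist even though no root page was written before the server died.
  // Assumption: space 0 is the system tablespace and is never dropped.
  plan.space_to_drop = (rec.clustered && rec.space != 0) ? rec.space : 0;
  return plan;
}
```

The case the text calls out is PAGE equal to FIL_NULL on a clustered index record: there is no tree to free, yet the tablespace is still reported for dropping.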
btr_free_if_exists(): Simplify the interface, to avoid repeated
tablespace lookup.
One more scenario is known to be broken: If the server is killed
during DROP TABLE (or table-rebuilding ALTER TABLE) right after a
FILE_DELETE record has been persistently written but before the
file was deleted, then we could end up recovering no tablespace
at all, and failing to delete the file, in either of fil_name_process()
or dict_drop_index_tree().
Thanks to Elena Stepanova for providing "rr replay" and data directories
of these scenarios.
Make DDL operations that involve FULLTEXT INDEX atomic.
In particular, we must drop the internal FTS_ tables in the same
DDL transaction with ALTER TABLE.
Remove all references to fts_drop_orphaned_tables().
row_merge_drop_temp_indexes(): Drop also the internal FTS_ tables
that are associated with index stubs that were created in
prepare_inplace_alter_table_dict() for
CREATE FULLTEXT INDEX before the server was killed.
fts_clear_all(): Remove the fts_drop_tables() call. It has to be
executed before the transaction is committed!
dict_load_indexes(): Do not load any metadata for index stubs
that had been created by prepare_inplace_alter_table_dict().
fts_create_one_common_table(), fts_create_common_tables(),
fts_create_one_index_table(), fts_create_index_tables():
Remove redundant error handling. The tables will be dropped
just fine by dict_drop_index_tree().
commit_try_norebuild(): Also drop the FTS_ tables when dropping
FULLTEXT INDEX.
The changes to the test case innodb_fts.crash_recovery have been
extensively tested. The non-debug server will be killed while
the 3 ALTER TABLE statements are in any phase of execution. With the debug
server, DEBUG_SYNC should make the test deterministic.
Before we create an InnoDB data file, we must have persistently
started a DDL transaction and written a record in SYS_INDEXES
as well as a FILE_CREATE record for creating the file.
In that way, if InnoDB is killed before the DDL transaction is
committed, the rollback will be able to delete the file in
dict_drop_index_tree().
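The ordering argument can be checked with a toy model. The names below are hypothetical and the model reduces each step to a flag; it only shows that, with this order, a kill at any point leaves a state that rollback can clean up.

```cpp
// Hypothetical flags for what has become persistent so far.
struct CreateState
{
  bool sys_indexes_written = false;  // DDL transaction wrote SYS_INDEXES
  bool file_create_logged = false;   // FILE_CREATE record is durable
  bool file_exists = false;          // the data file was created
  bool header_initialized = false;   // page 0 was initialized
};

// Runs the steps in the order described above and stops after
// `steps_completed` of them, as if the server had been killed there.
CreateState run_create(int steps_completed)
{
  CreateState s;
  if (steps_completed >= 1) s.sys_indexes_written = true;
  if (steps_completed >= 2) s.file_create_logged = true;
  if (steps_completed >= 3) s.file_exists = true;
  if (steps_completed >= 4) s.header_initialized = true;
  return s;
}

// Rollback can find and delete the file only if the SYS_INDEXES record and
// the FILE_CREATE record were written before the file came into existence.
bool rollback_can_clean_up(const CreateState &s)
{
  return !s.file_exists ||
         (s.sys_indexes_written && s.file_create_logged);
}
```

Creating the file before logging anything, by contrast, produces a state with file_exists set and nothing else, which rollback_can_clean_up() rejects: an orphan file that no log record refers to.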
dict_build_table_def_step(): Do not create the tablespace.
At this point, we have not written any log, not even for
inserting the SYS_TABLES record.
dict_create_sys_indexes_tuple(): Relax an assertion to tolerate
a missing tablespace before the first index has been created in
dict_create_index_step().
dict_build_index_def_step(): Relax the dict_table_open_on_name()
parameter, because no tablespace may be available yet.
tab_create_graph_create(), row_create_table_for_mysql(), tab_node_t:
Remove key_id, mode.
ind_create_graph_create(), row_create_index_for_mysql(), ind_node_t:
Add key_id, mode.
dict_create_index_space(): New function, to create the tablespace
during clustered index creation.
dict_create_index_step(): After the SYS_INDEXES record has been
written, invoke dict_create_index_space() to create the tablespace
if needed.
fil_ibd_create(): Before creating the file, persistently write a
FILE_CREATE record. This will also ensure that an incomplete DDL
transaction will be recovered. After creating the file, invoke
fsp_header_init().
InnoDB used to support at most one CREATE TABLE or DROP TABLE
per transaction. This caused complications for DDL operations on
partitioned tables (where each partition is treated as a separate
table by InnoDB) and FULLTEXT INDEX (where each index is maintained
in a number of internal InnoDB tables).
dict_drop_index_tree(): Extend the MDEV-24589 logic and treat
the purge or rollback of SYS_INDEXES records of clustered indexes
specially: by dropping the tablespace if it exists. This is the only
form of recovery that we will need.
trx_undo_ddl_type: Document the DDL undo log record types better.
trx_t::dict_operation: Change the type to bool.
trx_t::ddl: Remove.
trx_t::table_id, trx_undo_t::table_id: Remove.
dict_build_table_def_step(): Remove trx_t::table_id logging.
dict_table_close_and_drop(), row_merge_drop_table(): Remove.
row_merge_lock_table(): Merged to the only callers, which can
call lock_table_for_trx() directly.
fts_aux_table_t, fts_aux_id, fts_space_set_t: Remove.
fts_drop_orphaned_tables(): Remove.
row_merge_rename_index_to_drop(): Remove. Thanks to MDEV-24589,
we can simply delete the to-be-dropped indexes from SYS_INDEXES,
while still being able to roll back the operation.
ha_innobase_inplace_ctx: Make a few data members const.
Preallocate trx.
prepare_inplace_alter_table_dict(): Simplify the logic. Let the
normal rollback take care of some cleanup.
row_undo_ins_remove_clust_rec(): Simplify the parsing of SYS_COLUMNS.
trx_rollback_active(): Remove the special DROP TABLE logic.
trx_undo_mem_create_at_db_start(), trx_undo_reuse_cached():
Always write TRX_UNDO_TABLE_ID as 0.
after previous error upon multi-RENAME
- InnoDB fails to rename the foreign key constraint while
rolling back the rename operation. In that case, InnoDB should
rename the FK constraint too.
This happens during repair when a temporary table is opened
with HA_OPEN_COPY, which resets 'share->born_transactional', which
the encryption code did not like.
Fixed by resetting just share->now_transactional.
InnoDB tries to fetch the deleted doc ids for a discarded
tablespace. In i_s_fts_deleted_generic_fill(), InnoDB needs
to check whether the table is discarded before fetching
the deleted doc ids.
xdes_get_descriptor_with_space_hdr(): Use the correct mode
BUF_GET_POSSIBLY_FREED also when the tablespace is larger
than innodb_page_size pages. This function could be called by
fseg_free_step().
fsp_alloc_seg_inode(): For completeness (and for improved robustness
in case of a corrupted tablespace), use BUF_GET_POSSIBLY_FREED.
With this, the entire compilation unit fsp0fsp.cc will use that mode.
fil_ibd_load(): Remove a message that is basically saying that
everything works as expected. The other "Ignoring data file" message
about the presence of an extraneous file will be retained
(and expected by the test innodb.log_file_name).
In commit 54e2e70194
we relaxed a debug assertion in the POSIX version of
os_file_rename_func() only. Let us relax it also on Windows,
so that the test innodb.truncate_crash will pass.
In commit 91599701d0 (MDEV-25312),
some recovery code for TRUNCATE TABLE was broken,
causing a regression in a case where the undo log for a RENAME TABLE
operation had been durably written but the tablespace had not been
renamed yet.
row_rename_table_for_mysql(): Add a DEBUG_SYNC point for the
test case, and simplify the logic and trim the error messages.
fil_space_t::rename(): Simplify the operation. Merge the necessary
part of fil_rename_tablespace_check(). If there is no change to
the file name, do nothing.
dict_table_t::rename_tablespace(): Refactored from
dict_table_rename_in_cache().
row_undo_ins_parse_undo_rec(): On rolling back TRX_UNDO_RENAME_TABLE,
invoke dict_table_t::rename_tablespace() even if the table name matches.
os_file_rename_func(): Temporarily relax an assertion that would
fail during the recovery in the test innodb.truncate_crash.