This is a complete rewrite of DROP TABLE, also as part of other DDL,
such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE.
The background DROP TABLE queue hack is removed.
If a transaction needs to drop and create a table by the same name
(like TRUNCATE TABLE does), it must first rename the table to an
internal #sql-ib name. No committed version of the data dictionary
will include any #sql-ib tables, because whenever a transaction
renames a table to a #sql-ib name, it will also drop that table.
Either the rename will be rolled back, or the drop will be committed.
Data files will be unlinked after the transaction has been committed
and a FILE_RENAME record has been durably written. The file will
actually be deleted when the detached file handle returned by
fil_delete_tablespace() will be closed, after the latches have been
released. It is possible that a purge of the delete of the SYS_INDEXES
record for the clustered index will execute fil_delete_tablespace()
concurrently with the DDL transaction. In that case, the thread that
arrives later will wait for the other thread to finish.
HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag.
ha_innobase::truncate() now requires that all other references to
the table be released in advance. This was implemented by Monty.
ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected,
we will "hijack" the current transaction, drop the table in
the current transaction and commit the current transaction.
This essentially fixes MDEV-21602. There is a FIXME comment about
making the check less failure-prone.
ha_innobase::truncate(), ha_innobase::delete_table():
Implement a fast path for temporary tables. We will no longer allow
temporary tables to use the adaptive hash index.
dict_table_t::mdl_name: The original table name for the purpose of
acquiring MDL in purge, to prevent a race condition between a
DDL transaction that is dropping a table, and purge processing
undo log records of DML that had executed before the DDL operation.
For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the
dict_table_t::mdl_name will differ from dict_table_t::name.
dict_table_t::parse_name(): Use mdl_name instead of name.
dict_table_rename_in_cache(): Update mdl_name.
For the internal FTS_ tables of FULLTEXT INDEX, purge would
acquire MDL on the FTS_ table name, but not on the main table,
and therefore it would be able to run concurrently with a
DDL transaction that is dropping the table. Previously, the
DROP TABLE queue hack prevented a race between purge and DDL.
For now, we introduce purge_sys.stop_FTS() to prevent purge from
opening any table, while a DDL transaction that may drop FTS_
tables is in progress. The function fts_lock_table(), which will
be invoked before the dictionary is locked, will wait for
purge to release any table handles.
trx_t::drop_table_statistics(): Drop statistics for the table.
This replaces dict_stats_drop_index(). We will drop or rename
persistent statistics atomically as part of DDL transactions.
On lock conflict for dropping statistics, we will fail instantly
with DB_LOCK_WAIT_TIMEOUT, because we will be holding the
exclusive data dictionary latch.
trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory().
Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT
in addition to DB_DUPLICATE_KEY. The call to fts_commit() is
entirely misplaced here and may obviously break the consistency
of transactions that affect FULLTEXT INDEX. It needs to be fixed
separately.
dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175).
The counter was a work-around for missing meta-data locking (MDL)
on the SQL layer, and not really needed in MariaDB.
ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28.
HA_ERR_TABLE_IN_FK_CHECK: Remove.
row_ins_check_foreign_constraints(): Do not acquire
dict_sys.latch either. The SQL-layer MDL will protect us.
This was reviewed by Thirunarayanan Balathandayuthapani
and tested by Matthias Leich.
Many InnoDB data dictionary cache operations require that the
table name be copied so that it will be NUL terminated.
(For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.)
dict_table_t::is_garbage_name(): Check if a name belongs to
the background drop table queue.
dict_check_if_system_table_exists(): Remove.
dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables
SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup.
dict_sys_t::create_or_check_sys_tables(): Replaces
dict_create_or_check_foreign_constraint_tables() and
dict_create_or_check_sys_virtual().
dict_sys_t::load_table(): Replaces dict_table_get_low()
and dict_load_table().
dict_sys_t::find_table(): Renamed from get_table().
dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded
tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist.
trx_t::has_stats_table_lock(): Moved to dict0stats.cc.
Some error messages will now report table names in the internal
databasename/tablename format, instead of `databasename`.`tablename`.
Problem:
======
Making the tablespace as page_compressed doesn't do table rebuild.
It does change only the FSP_SPACE_FLAGS.
During recovery:
1) InnoDB encounters FILE_CREATE redo log and opens the tablespace
with old FSP_SPACE_FLAGS value.
2) Only parsing of redo log has been finished. Now InnoDB tries to
load the table. If the existing tablespace flags doesn't match
with table flags then InnoDB should read page0. But in
fsp_flags_try_adjust(), skips the page read for full_crc32 format.
3) After that, InnoDB tries to open the clustered index and it
leads to failure of page validation.
Fix:
===
While parsing the redo log record, track FSP_SPACE_FLAGS in
recv_spaces for the respective space id. Assign the flags for
the tablespace when it is loaded.
recv_parse_set_size_and_flags(): Parse the redo log to set the
tablespace recovery size and flags.
fil_space_set_recv_size_and_flags(): Changed from
fil_space_set_recv_size(). To set the recovery size and flags of
the tablespace.
Introduce flags variable in file_name_t to maintain the tablespace
flag which we encountered during parsing of redo log.
is_flags_full_crc32_equal(), is_flags_non_full_crc32_equal(): Rename
the variable page_ssize and space_page_ssize with fcrc32_psize and
non_fcrc32_psize.
MariaDB data-at-rest encryption (innodb_encrypt_tables)
had repurposed the same unused data field that was repurposed
in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN)
field of SPATIAL INDEX. Because of this, MariaDB was unable to
support encryption on SPATIAL INDEX pages.
Furthermore, InnoDB page checksums skipped some bytes, and there
are multiple variations and checksum algorithms. By default,
InnoDB accepts all variations of all algorithms that ever existed.
This unnecessarily weakens the page checksums.
We hereby introduce two more innodb_checksum_algorithm variants
(full_crc32, strict_full_crc32) that are special in a way:
When either setting is active, newly created data files will
carry a flag (fil_space_t::full_crc32()) that indicates that
all pages of the file will use a full CRC-32C checksum over the
entire page contents (excluding the bytes where the checksum
is stored, at the very end of the page). Such files will always
use that checksum, no matter what the parameter
innodb_checksum_algorithm is assigned to.
For old files, the old checksum algorithms will continue to be
used. The value strict_full_crc32 will be equivalent to strict_crc32
and the value full_crc32 will be equivalent to crc32.
ROW_FORMAT=COMPRESSED tables will only use the old format.
These tables do not support new features, such as larger
innodb_page_size or instant ADD/DROP COLUMN. They may be
deprecated in the future. We do not want an unnecessary
file format change for them.
The new full_crc32() format also cleans up the MariaDB tablespace
flags. We will reserve flags to store the page_compressed
compression algorithm, and to store the compressed payload length,
so that checksum can be computed over the compressed (and
possibly encrypted) stream and can be validated without
decrypting or decompressing the page.
In the full_crc32 format, there no longer are separate before-encryption
and after-encryption checksums for pages. The single checksum is
computed on the page contents that is written to the file.
We do not make the new algorithm the default for two reasons.
First, MariaDB 10.4.2 was a beta release, and the default values
of parameters should not change after beta. Second, we did not
yet implement the full_crc32 format for page_compressed pages.
This will be fixed in MDEV-18644.
This is joint work with Marko Mäkelä.
In tests that directly write InnoDB data file pages,
compute the innodb_checksum_algorithm=crc32 checksums,
instead of writing the 0xdeadbeef value used by
innodb_checksum_algorithm=none. In this way, these tests
will not cause failures when executing
./mtr --mysqld=--loose-innodb-checksum-algorithm=strict_crc32
- This is a regression of commit b26e603aeb. While dropping
the incompletely created table, InnoDB shouldn't consider that operation as non-atomic one.
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
row_drop_table_for_mysql(): Avoid accessing non-existing dictionary tables.
dict_create_or_check_foreign_constraint_tables(): Add debug instrumentation
for creating and dropping a table before the creation of any non-core
dictionary tables.
trx_purge_add_update_undo_to_history(): Adjust a debug assertion, so that
it will not fail due to the test instrumentation.
When code from MySQL 5.7.9 was merged to MariaDB 10.2.2
in commit 2e814d4702
an assignment validate=true was inadvertently added to the function
dict_check_sys_tables().
This causes InnoDB to open every single .ibd file on startup, even
when no crash recovery was needed.
Simply removing the assignment would make some tests fail. We do the
best to retain almost the same level of inconsistency detection.
In the test innodb.table_flags, access to one of the tables will not
be blocked despite inconsistent flags.
dict_check_sys_tables(): Remove the problematic assignment, and skip
validation in normal startup.
dict_load_table_one(): If the .ibd file cannot be opened, mark the
table as corrupted and unreadable.
fil_node_open_file(): Validate FSP_SPACE_FLAGS with the expected
flags. If reading the tablespace fails, invalidate node->handle
instead of letting it remain stale. This bug was caught by a
fil_validate() assertion failure.
fsp_flags_try_adjust(): If the tablespace file is invalid, do nothing.
1. Reverts "Tests: disabled TRT for some IB tests [#302]"
6d78496aee
2. Removes setting TRANSACTION_REGISTRY=0 in mysqldump
--system-versioning-transaction-registry now is OFF by default.
This commit should be reverted back if the default will change.
Tests affected: mysqldump mysqldump-max openssl_1
As they fail on TRT schema check:
innodb.log_file
innodb.table_flags
innodb.row_format_redundant
encryption.innodb_encrypt_log_corruption
encryption.innodb_first_page
This should have been part of MDEV-12288.
trx_undo_t::del_marks: Remove.
Purge needs to process all undo log records in order to
reset the DB_TRX_ID. Before MDEV-12288, it sufficed to only delete
the purgeable delete-marked records, and it ignore other undo log.
trx_rseg_t::needs_purge: Renamed from trx_rseg_t::last_del_marks.
Indicates whether a rollback segment needs to be processed by purge.
TRX_UNDO_NEEDS_PURGE: Renamed from TRX_UNDO_DEL_MARKS.
Indicates whether a rollback segment needs to be processed by purge.
This will be 1 until trx_purge_free_segment() has been invoked.
row_purge_record_func(): Set the is_insert flag for TRX_UNDO_INSERT_REC,
so that the DB_ROLL_PTR will match in row_purge_reset_trx_id().
trx_purge_fetch_next_rec(): Add a comment about row_purge_record_func()
going to set the is_insert flag.
trx_purge_read_undo_rec(): Always attempt to read the undo log record.
trx_purge_get_next_rec(): Do not skip any undo log records.
Even when no clustered index record is going to be removed,
we may want to reset some DB_TRX_ID,DB_ROLL_PTR.
trx_undo_rec_get_cmpl_info(), trx_undo_rec_get_extern_storage(): Remove.
trx_purge_add_undo_to_history(): Set the TRX_UNDO_NEEDS_PURGE flag
so that the resetting will work on undo logs that were originally
created before MDEV-12288 (MariaDB 10.3.1).
trx_undo_roll_ptr_is_insert(), trx_purge_free_segment(): Cleanup
(should be no functional change).
sql_sequence.read_only: Show that the sequence can be read in
both read-only and read-write mode, and that the sequence remains
accessible after a server restart.
innodb.table_flags: Adjust the test case. Due to the MDEV-12873 fix
in 10.2, the corrupted flags for table test.td would be converted,
and a tablespace flag mismatch will occur when trying to open the file.
Cover innodb.table_flags with the new innodb_page_size.combinations
32k and 64k.
dict_sys_tables_type_validate(): Remove an assertion that made a
check in the function redundant. Remove the excessive output to
the error log, as the invalid SYS_TABLES.TYPE value is already being
output.
Add a test case for corrupting SYS_TABLES.TYPE,
and for ROW_FORMAT=REDUNDANT, the unused field SYS_TABLES.MIX_LEN
that must be ignored (InnoDB before MySQL 5.5 wrote uninitialized
garbage to this column).
MariaDB 10.0 appears to validate the SYS_TABLES.TYPE properly.
This is a test-only change.