Move row size check to early CREATE/ALTER TABLE phase. Stop checking
on table open.
dict_index_add_to_cache(): remove parameter 'strict', stop checking row size
dict_index_t::record_size_info_t: this is a result of row size check operation
create_table_info_t::row_size_is_acceptable(): performs row size check.
Issues error or warning. Writes first overflow field to InnoDB log.
create_table_info_t::create_table(): add row size check
dict_index_t::record_size_info(): this is a refactored version
of dict_index_t::rec_potentially_too_big(). New version doesn't change global
state of a program but return all interesting info. And it's callers who
decide how to handle row size overflow.
dict_index_t::rec_potentially_too_big(): removed
dict_index_add_to_cache(): Make the 'index' a reference to a pointer,
so that the caller will avoid the expensive call to
dict_index_get_if_in_cache_low().
Problem:
=======
During dropping of fts index, InnoDB waits for fts_optimize_remove_table()
and it holds dict_sys->mutex and dict_operaiton_lock even though the
table id is not present in the queue. But fts_optimize_thread does wait
for dict_sys->mutex to process the unrelated table id from the slot.
Solution:
========
Whenever table is added to fts_optimize_wq, update the fts_status
of in-memory fts subsystem to TABLE_IN_QUEUE. Whenever drop index
wants to remove table from the queue, it can check the fts_status
to decide whether it should send the MSG_DELETE_TABLE to the queue.
Removed the following functions because these are all deadcode.
dict_table_wait_for_bg_threads_to_exit(),
fts_wait_for_background_thread_to_start(),fts_start_shutdown(), fts_shudown().
fkerr_t: Errors for the foreign key checks. Replaces ulint,
which used #define that looked like dberr_t literals.
wsrep_dict_foreign_find_index(): Remove. Use
dict_foreign_find_index() instead, with default parameters.
dict_foreign_push_index_error(): Do not add redundant quotes
around quoted table names.
In MySQL 5.7.8 an extra level of pointer indirection was added to
dict_operation_lock and some other rw_lock_t without solid justification,
in mysql/mysql-server@52720f1772.
Let us revert that change and remove the rather useless rw_lock_t
constructor and destructor and the magic_n field. In this way,
some unnecessary pointer dereferences and heap allocation will be avoided
and debugging might be a little easier.
Some places didn't match the previous rules, making the Floor
address wrong.
Additional sed rules:
sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
In MariaDB, InnoDB tables will always contain DATA_N_SYS_COLS = 3
columns, 2 or 3 of which are present in the clustered index.
We remove the predicate that was added in MySQL 5.7 as part of WL#7682.
It turned out that ha_innobase::truncate() would prematurely
commit the transaction already before the completion of the
ha_innobase::create(). All of this must be atomic.
innodb.truncate_crash: Use the correct DEBUG_SYNC point, and
tolerate non-truncation of the table, because the redo log
for the TRUNCATE transaction commit might be flushed due to
some InnoDB background activity.
dict_build_tablespace_for_table(): Merge to the function
dict_build_table_def_step().
dict_build_table_def_step(): If a table is being created during
an already started data dictionary transaction (such as TRUNCATE),
persistently write the table_id to the undo log header before
creating any file. In this way, the recovery of TRUNCATE will be
able to delete the new file before rolling back the rename of
the original table.
dict_table_rename_in_cache(): Add the parameter replace_new_file,
used as part of rolling back a TRUNCATE operation.
fil_rename_tablespace_check(): Add the parameter replace_new.
If the parameter is set and a file identified by new_path exists,
remove a possible tablespace and also the file.
create_table_info_t::create_table_def(): Remove some debug assertions
that no longer hold. During TRUNCATE, the transaction will already
have been started (and performed a rename operation) before the
table is created. Also, remove a call to dict_build_tablespace_for_table().
create_table_info_t::create_table(): Add the parameter create_fk=true.
During TRUNCATE TABLE, do not add FOREIGN KEY constraints to the
InnoDB data dictionary, because they will also not be removed.
row_table_add_foreign_constraints(): If trx=NULL, do not modify
the InnoDB data dictionary, but only load the FOREIGN KEY constraints
from the data dictionary.
ha_innobase::create(): Lock the InnoDB data dictionary cache only
if no transaction was passed by the caller. Unlock it in any case.
innobase_rename_table(): Add the parameter commit = true.
If !commit, do not lock or unlock the data dictionary cache.
ha_innobase::truncate(): Lock the data dictionary before invoking
rename or create, and let ha_innobase::create() unlock it and
also commit or roll back the transaction.
trx_undo_mark_as_dict(): Renamed from trx_undo_mark_as_dict_operation()
and declared global instead of static.
row_undo_ins_parse_undo_rec(): If table_id is set, this must
be rolling back the rename operation in TRUNCATE TABLE, and
therefore replace_new_file=true.
This is a backport of commit 0bc36758ba
and commit 9eb3fcc9fb.
InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2
redo log records during table-rebuilding ALGORITHM=INPLACE operations.
We must write the records for any .ibd file renames, so that the
operations are crash-safe.
If InnoDB is killed during a RENAME TABLE operation, it can happen that
the transaction for updating the data dictionary will be rolled back.
But, nothing will roll back the renaming of the .ibd file
(the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter,
the renaming of the dict_table_t::name in the dict_sys cache. We introduce
the undo log record TRX_UNDO_RENAME_TABLE to fix this.
fil_space_for_table_exists_in_mem(): Remove the parameters
adjust_space, table_id and some code that was trying to work around
these deficiencies.
fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record.
dict_table_rename_in_cache(): Invoke fil_name_write_rename().
trx_undo_rec_copy(): Set the first 2 bytes to the length of the
copied undo log record.
trx_undo_page_report_rename(), trx_undo_report_rename():
Write a TRX_UNDO_RENAME_TABLE record with the old table name.
row_rename_table_for_mysql(): Invoke trx_undo_report_rename()
before modifying any data dictionary tables.
row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE
by invoking dict_table_rename_in_cache(), which will take care
of both renaming the table and the file.
ha_innobase::truncate(): Remove a work-around.
When MySQL 5.7.1 introduced WL#6326 to reduce contention on the
non-leaf levels of B-trees, it introduced a new rw-lock mode SX
(not conflicting with S, but conflicting with SX and X) and
new rules to go with it.
A thread that is holding an dict_index_t::lock aka index->lock
in SX mode is permitted to acquire non-leaf buf_block_t::lock
aka block->lock X or SX mode, in monotonically descending order.
That is, once the thread has acquired a block->lock, it is not
allowed to acquire a lock on its parent or grandparent pages.
Such arbitrary-order access is only allowed when the thread
acquired the index->lock in X mode upfront.
A customer encountered a repeatable hang when loading a dump into
InnoDB while using multiple innodb_purge_threads (default: 4).
The dump makes very heavy use of FOREIGN KEY constraints.
By luck, it happened so that two purge worker threads (srv_worker_thread)
deadlocked with each other. Both were operating on the index FOR_REF
of the InnoDB internal table SYS_FOREIGN. One of them was legitimately
holding index->lock S-latch and the root block->lock S-latch. The other
had acquired index->lock SX-latch, root block->lock SX-latch, and a bunch
of other latches, including the fil_space_t::latch for freeing some blocks
and some leaf page latches. This other thread was inside 2 nested calls
to btr_compress() and it was trying to reacquire the root block->lock
in X mode, violating the WL#6326 protocol.
This violation led to a deadlock, because while S is compatible with SX
and a thread can upgrade an SX lock to X when there are no conflicting
requests, in this case there was a conflicting S lock held by the other
purge worker thread.
During this deadlock, both threads are holding dict_operation_lock S-latch,
which would block any subsequent DDL statements, such as CREATE TABLE.
The tables SYS_FOREIGN and SYS_FOREIGN_COLS are special in that they
define key columns of the type VARCHAR(0), created using the InnoDB
internal SQL parser. Because InnoDB does not internally enforce the
maximum length of columns, it would happily write more than 0 bytes
to these columns. This caused a miscalculation of node_ptr_max_size.
btr_cur_will_modify_tree(): Clean up some code. (No functional change.)
btr_node_ptr_max_size(): Renamed from dict_index_node_ptr_max_size().
Use a more realistic maximum size for SYS_FOREIGN and SYS_FOREIGN_COLS.
btr_cur_pessimistic_delete(): Refrain from merging pages if it is
not safe.
This work is based on the following MySQL 5.7.23 fix:
commit 58dcf0b4a4165ed59de94a9a1e7d8c954f733726
Author: Aakanksha Verma <aakanksha.verma@oracle.com>
Date: Wed May 9 18:54:03 2018 +0530
BUG#26225783 MYSQL CRASH ON CREATE TABLE (REPRODUCEABLE) -> INNODB: A
LONG SEMAPHORE WAIT
After a failed ADD INDEX, dict_index_remove_from_cache_low()
could iterate the index fields and dereference a freed virtual
column object when trying to remove the index from the v_indexes
of the virtual column.
This regression was caused by a merge of
MDEV-16119 InnoDB lock->index refers to a freed object.
ha_innobase_inplace_ctx::clear_added_indexes(): Detach the
indexes of uncommitted indexes from virtual columns, so that
the iteration in dict_index_remove_from_cache_low() can be avoided.
ha_innobase::prepare_inplace_alter_table(): Ignore uncommitted
corrupted indexes when rejecting ALTER TABLE. (This minor bug was
revealed by the extension of the test case.)
dict_index_t::detach_columns(): Detach an index from virtual columns.
Invoked by both dict_index_remove_from_cache_low() and
ha_innobase_inplace_ctx::clear_added_indexes().
dict_col_t::detach(const dict_index_t& index): Detach an index from
a column.
dict_col_t::is_virtual(): Replaces dict_col_is_virtual().
dict_index_t::has_virtual(): Replaces dict_index_has_virtual().
PROBLEM
=======
When add of virtual index fails with DB_TOO_BIG_RECORD , the virtual
index being freed isn't removed from the list of indexes a virtual
column(which is part of the index). This while the undo log is read
could fetch a wrong value during rollback and cause the assertion
reported in the bug particularly.
FIX
===
Added a function that is called when the virtual index being freed would
allow the index be removed from the index list of virtual column which
was a field of that index.
Reviwed By: Jimmy Yang<Jimmy.Yang@oracle.com>
RB: 18528
If creating a secondary index fails (typically, ADD UNIQUE INDEX fails
due to duplicate key), it is possible that concurrently running UPDATE
or DELETE will access the index stub and hit the debug assertion.
It does not make any sense to keep updating an uncommitted index whose
creation has failed.
dict_index_t::is_corrupted(): Replaces dict_index_is_corrupted().
Also take online_status into account.
Replace some calls to dict_index_is_clust() with calls to
dict_index_t::is_primary().
InnoDB limited the maximum number of bytes per character to 4.
But, the filename character set that was introduced in MySQL 5.1
uses up to 5 bytes per character.
To allow InnoDB tables to be created with wider characters, let
us split the mbminmaxlen fields into mbminlen, mbmaxlen, and increase
the limit to 7 bytes per character. This will increase the payload size
of dtype_t and dict_col_t by one bit. The storage size will be unchanged
(54 bits and 77 bits will use the same number of bytes as the
previous sizes 53 and 76 bits).
Reverted incorrect changes done on MDEV-7367 and MDEV-9469. Fixes properly
also related bugs:
MDEV-13668: InnoDB unnecessarily rebuilds table when renaming a column and adding index
MDEV-9469: 'Incorrect key file' on ALTER TABLE
MDEV-9548: Alter table (renaming and adding index) fails with "Incorrect key file for table"
MDEV-10535: ALTER TABLE causes standalone/wsrep cluster crash
MDEV-13640: ALTER TABLE CHANGE and ADD INDEX on auto_increment column fails with "Incorrect key file for table..."
Root cause for all these bugs is the fact that MariaDB .frm file
can contain virtual columns but InnoDB dictionary does not and
previous fixes were incorrect or unnecessarily forced table
rebuilt. In index creation key_part->fieldnr can be bigger than
number of columns in InnoDB data dictionary. We need to skip not
stored fields when calculating correct column number for InnoDB
data dictionary.
dict_table_get_col_name_for_mysql
Remove
innobase_match_index_columns
Revert incorrect change done on MDEV-7367
innobase_need_rebuild
Remove unnecessary rebuild force when column is renamed.
innobase_create_index_field_def
Calculate InnoDB column number correctly and remove
unnecessary column name set.
innobase_create_index_def, innobase_create_key_defs
Remove unneeded fields parameter. Revert unneeded memset.
prepare_inplace_alter_table_dict
Remove unneeded col_names parameter
index_field_t
Remove unneeded col_name member.
row_merge_create_index
Remove unneeded col_names parameter and resolution.
Effected tests:
innodb-alter-table : Add test case for MDEV-13668
innodb-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
innodb-wl5980-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
The function dict_disable_redo_if_temporary() was supposed to
disable redo logging for temporary tables. It was invoked
unnecessarily for two read-only operations:
row_undo_search_clust_to_pcur() and
dict_stats_update_transient_for_index().
When a table is not temporary and not in the system tablespace,
the tablespace should be flagged for MLOG_FILE_NAME logging.
We do not need this overhead for temporary tables. Therefore,
either mtr_t::set_log_mode() or mtr_t::set_named_space() should
be invoked.
dict_table_t::is_temporary(): Determine if a table is temporary.
dict_table_is_temporary(): Redefined as a macro wrapper for
dict_table_t::is_temporary().
dict_disable_redo_if_temporary(): Remove.
The field dict_table_t::big_rows was only used for determining if
the adaptive hash index should be used when the internal InnoDB SQL
parser is used. That parser is only used for modifying the InnoDB
data dictionary, updating persistent tables, and for fulltext indexes.
This fixes a regression that only affects debug builds, caused by
commit 48192f963a which is necessary
preparation for MDEV-11369 instant ADD COLUMN. (Although that is a
10.3 task, to ease merges between 10.2 and 10.3, this change that
improves debug checks was pushed to 10.2 already.)
Unlike btr_pcur_restore_position(), rtr_cur_restore_position()
can create a search tuple out of a non-leaf page record. So,
we must pass 'bool leaf' parameter to dict_index_build_data_tuple().
dict_index_build_data_tuple(): Add a debug-only parameter 'bool leaf'.
rec_copy_prefix_to_dtuple(): Make the parameter debug-only.
row_sel_get_clust_rec_for_mysql(): In the debug code for spatial index,
remove an unnecessary call to buf_page_get_gen(), and use the already
latched block directly.
ATTRIBUTE_NORETURN is supported on all platforms (MSVS and GCC-like).
It declares that a function will not return; instead, the thread or
the whole process will terminate.
ATTRIBUTE_COLD is supported starting with GCC 4.3. It declares that
a function is supposed to be executed rarely. Rarely used error-handling
functions and functions that emit messages to the error log should be
tagged such.
Problem was that dict_sys->size tries to maintain used memory
occupied by the data dictionary table and index objects.
However at least on table objects table->heap size can increase
between when table object is inserted to dict_sys and when
it is removed from dict_sys causing inconsistency on amount
of memory added to and removed from dict_sys->size variable.
Removed unnecessary dict_sys:size variable as it is really
used only for status output.
Introduced dict_sys_get_size function to calculate memory
occupied by the data dictionary table and index objects
that is then used on show engine innodb output.
dict_table_add_to_cache(),
dict_table_rename_in_cache(),
dict_table_remove_from_cache_low(),
dict_index_remove_from_cache_low(),
Remove size calculation.
srv_printf_innodb_monitor(): Use dict_sys_get_size function to
get dictionary memory allocated.
xtradb_internal_hash_tables_fill_table(): Use dict_sys_get_size
function to get dictionary memory allocated.
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.
TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):
2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
i=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
#0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
#1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
#2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
#3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
#4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
#5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
#6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.
This patch should also address following test failures and
bugs:
MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.
Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().
Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.
btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.
buf_page_get_zip(): Issue error if page read fails.
buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.
buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.
buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
Issue error if page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()
dict_stats_update_transient_for_index(),
dict_stats_update_transient()
Do not continue if table decryption failed or table
is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.
fil_parse_write_crypt_data():
Check that key read from redo log entry is found from
encryption plugin and if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.
Fixed error code on innodb.innodb test.
Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2. Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.
Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.
fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.
Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.
Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.
fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.
recv_apply_hashed_log_recs() does not return anything.
InnoDB defines some functions that are not called at all.
Other functions are called, but only from the same compilation unit.
Remove some function declarations and definitions, and add 'static'
keywords. Some symbols must be kept for separately compiled tools,
such as innochecksum.
Also, remove empty .ic files that were not removed by my MySQL commit.
Problem:
InnoDB used to support a compilation mode that allowed to choose
whether the function definitions in .ic files are to be inlined or not.
This stopped making sense when InnoDB moved to C++ in MySQL 5.6
(and ha_innodb.cc started to #include .ic files), and more so in
MySQL 5.7 when inline methods and functions were introduced
in .h files.
Solution:
Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from
all files, assuming that the symbols are never defined.
Remove the files fut0fut.cc and ut0byte.cc which only mattered when
UNIV_NONINL was defined.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
dict_init_free(): Make global, and move the call from
dict_close() to srv_free(), because this is initialized
earlier than dict_sys.
innobase_space_shutdown(): Do not leak srv_allow_writes_event.
Most notably, this includes MDEV-11623, which includes a fix and
an upgrade procedure for the InnoDB file format incompatibility
that is present in MariaDB Server 10.1.0 through 10.1.20.
In other words, this merge should address
MDEV-11202 InnoDB 10.1 -> 10.2 migration does not work
MySQL 5.7 allows temporary tables to be created in ROW_FORMAT=COMPRESSED.
The usefulness of this is questionable. WL#7899 in MySQL 8.0.0
prevents the creation of such compressed tables, so that all InnoDB
temporary tables will be located inside the predefined
InnoDB temporary tablespace.
Pick up and adjust some tests from MySQL 5.7 and 8.0.
dict_tf_to_fsp_flags(): Remove the parameter is_temp.
fsp_flags_init(): Remove the parameter is_temporary.
row_mysql_drop_temp_tables(): Remove. There cannot be any temporary
tables in InnoDB. (This never removed #sql* tables in the datadir
which were created by DDL.)
dict_table_t::dir_path_of_temp_table: Remove.
create_table_info_t::m_temp_path: Remove.
create_table_info_t::create_options_are_invalid(): Do not allow
ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE for temporary tables.
create_table_info_t::innobase_table_flags(): Do not unnecessarily
prevent CREATE TEMPORARY TABLE with SPATIAL INDEX.
(MySQL 5.7 does allow this.)
fil_space_belongs_in_lru(): The only FIL_TYPE_TEMPORARY tablespace
is never subjected to closing least-recently-used files.
MySQL 5.7 introduced partial support for user-created shared tablespaces
(for example, import and export are not supported).
MariaDB Server does not support tablespaces at this point of time.
Let us remove most InnoDB code and data structures that is related
to shared tablespaces.
MariaDB will likely never support MySQL-style encryption for
InnoDB, because we cannot link with the Oracle encryption plugin.
This is preparation for merging MDEV-11623.
- Atomic writes are enabled by default
- Automatically detect if device supports atomic write and use it if
atomic writes are enabled
- Remove ATOMIC WRITE options from CREATE TABLE
- Atomic write is a device option, not a table options as the table may
crash if the media changes
- Add support for SHANNON SSD cards
The InnoDB source code contains quite a few references to a closed-source
hot backup tool which was originally called InnoDB Hot Backup (ibbackup)
and later incorporated in MySQL Enterprise Backup.
The open source backup tool XtraBackup uses the full database for recovery.
So, the references to UNIV_HOTBACKUP are only cluttering the source code.
This should be functionally equivalent to WL#6204 in MySQL 8.0.0, with
the notable difference that the file format changes are limited to
repurposing a previously unused data field in B-tree pages.
For persistent InnoDB tables, write the last used AUTO_INCREMENT
value to the root page of the clustered index, in the previously
unused (0) PAGE_MAX_TRX_ID field, now aliased as PAGE_ROOT_AUTO_INC.
Unlike some other previously unused InnoDB data fields, this one was
actually always zero-initialized, at least since MySQL 3.23.49.
The writes to PAGE_ROOT_AUTO_INC are protected by SX or X latch on the
root page. The SX latch will allow concurrent read access to the root
page. (The field PAGE_ROOT_AUTO_INC will only be read on the
first-time call to ha_innobase::open() from the SQL layer. The
PAGE_ROOT_AUTO_INC can only be updated when executing SQL, so
read/write races are not possible.)
During INSERT, the PAGE_ROOT_AUTO_INC is updated by the low-level
function btr_cur_search_to_nth_level(), adding no extra page
access. [Adaptive hash index lookup will be disabled during INSERT.]
If some rare UPDATE modifies an AUTO_INCREMENT column, the
PAGE_ROOT_AUTO_INC will be adjusted in a separate mini-transaction in
ha_innobase::update_row().
When a page is reorganized, we have to preserve the PAGE_ROOT_AUTO_INC
field.
During ALTER TABLE, the initial AUTO_INCREMENT value will be copied
from the table. ALGORITHM=COPY and online log apply in LOCK=NONE will
update PAGE_ROOT_AUTO_INC in real time.
innodb_col_no(): Determine the dict_table_t::cols[] element index
corresponding to a Field of a non-virtual column.
(The MySQL 5.7 implementation of virtual columns breaks the 1:1
relationship between Field::field_index and dict_table_t::cols[].
Virtual columns are omitted from dict_table_t::cols[]. Therefore,
we must translate the field_index of AUTO_INCREMENT columns into
an index of dict_table_t::cols[].)
Upgrade from old data files:
By default, the AUTO_INCREMENT sequence in old data files would appear
to be reset, because PAGE_MAX_TRX_ID or PAGE_ROOT_AUTO_INC would contain
the value 0 in each clustered index page. In new data files,
PAGE_ROOT_AUTO_INC can only be 0 if the table is empty or does not contain
any AUTO_INCREMENT column.
For backward compatibility, we use the old method of
SELECT MAX(auto_increment_column) for initializing the sequence.
btr_read_autoinc(): Read the AUTO_INCREMENT sequence from a new-format
data file.
btr_read_autoinc_with_fallback(): A variant of btr_read_autoinc()
that will resort to reading MAX(auto_increment_column) for data files
that did not use AUTO_INCREMENT yet. It was manually tested that during
the execution of innodb.autoinc_persist the compatibility logic is
not activated (for new files, PAGE_ROOT_AUTO_INC is never 0 in nonempty
clustered index root pages).
initialize_auto_increment(): Replaces
ha_innobase::innobase_initialize_autoinc(). This initializes
the AUTO_INCREMENT metadata. Only called from ha_innobase::open().
ha_innobase::info_low(): Do not try to lazily initialize
dict_table_t::autoinc. It must already have been initialized by
ha_innobase::open() or ha_innobase::create().
Note: The adjustments to class ha_innopart were not tested, because
the source code (native InnoDB partitioning) is not being compiled.