btr_search_guess_on_hash() would only acquire an index page latch if it
is invoked with ahi_latch=NULL. If it is invoked from
row_sel_try_search_shortcut_for_mysql() with ahi_latch!=NULL, the page
will not be latched, and row_search_mvcc() will get a pointer to the
record, which can be changed by some other transaction before the record
is stored in the result buffer by the row_sel_store_mysql_rec() call.
The ahi_latch argument of btr_cur_search_to_nth_level_func() and
btr_pcur_open_with_no_init_func() is used only for
row_sel_try_search_shortcut_for_mysql().
btr_cur_search_to_nth_level_func(..., ahi_latch != 0, ...) is invoked
only from btr_pcur_open_with_no_init_func(..., ahi_latch != 0, ...),
which, in turn, is invoked only from
row_sel_try_search_shortcut_for_mysql().
I suppose that the separate case with ahi_latch!=0 was intentionally
implemented to protect the row_sel_store_mysql_rec() call in
row_search_mvcc() just after the row_sel_try_search_shortcut_for_mysql()
call. After the ahi_latch was moved from row_search_mvcc() to
row_sel_try_search_shortcut_for_mysql(), there is no need for it at all
if btr_search_guess_on_hash() latches the page unconditionally. And if
btr_search_guess_on_hash() latches the page, any access to the record in
row_sel_try_search_shortcut_for_mysql() after the btr_pcur_open_with_no_init()
call will be protected by the page latch.
The fix is to remove the ahi_latch argument from
btr_pcur_open_with_no_init_func(), btr_cur_search_to_nth_level_func()
and btr_search_guess_on_hash().
There will be no test case, as testing this would require freezing a
SELECT at the point between the row_sel_try_search_shortcut_for_mysql()
and row_sel_store_mysql_rec() calls in row_search_mvcc(), and changing
the record in some other transaction so that row_sel_store_mysql_rec()
stores the changed record in the result buffer. But we cannot do this
with the fix, as the page will be latched during the
btr_search_guess_on_hash() call.
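The invariant that the fix relies on can be illustrated with a
standalone C++ sketch (page_sketch and the functions below are
simplified stand-ins, not the actual InnoDB types): the page latch
taken by the hash-guided search must still be held while the record is
copied into the result buffer.

  // Standalone illustration, not InnoDB code.
  #include <cstring>
  #include <mutex>
  #include <shared_mutex>

  struct page_sketch {
    std::shared_mutex latch;          // stands in for the index page latch
    char rec[16] = "committed";
  };

  // Reader side: the shared latch covers both locating the record and
  // copying it out (cf. row_sel_store_mysql_rec()).
  void read_record(page_sketch &page, char *result, std::size_t len) {
    std::shared_lock<std::shared_mutex> s_latch(page.latch);
    std::memcpy(result, page.rec, len);  // no writer can intervene here
  }

  // Writer side: needs the exclusive latch, so it cannot slip in
  // between the positioning and the copy-out above.
  void modify_record(page_sketch &page, const char *newval, std::size_t len) {
    std::lock_guard<std::shared_mutex> x_latch(page.latch);
    std::memcpy(page.rec, newval, len);
  }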
btr_lift_page_up(): If the leaf page only contains a hidden metadata
record for MDEV-11369 instant ADD COLUMN, convert the table to the
canonical format like we are supposed to do whenever the table
becomes empty.
In commit 0b47c126e3 (MDEV-13542)
a few calls to mtr_t::memo_push() were moved before a write latch
on the page was acquired. This introduced a race condition:
1. is_block_dirtied() returned false to mtr_t::memo_push()
2. buf_page_t::write_complete() was executed, the block was marked clean,
and the page latch was released
3. The page latch was acquired by the caller of mtr_t::memo_push(),
and mtr_t::m_made_dirty was not set even though the block was in
a clean state.
The impact of this race condition is that crash recovery and backups
may fail.
btr_cur_latch_leaves(), btr_store_big_rec_extern_fields(),
btr_free_externally_stored_field(), trx_purge_free_segment():
Acquire the page latch before invoking mtr_t::memo_push().
This fixes the regression caused by MDEV-13542.
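A minimal sketch of the corrected ordering, using hypothetical
simplified types rather than the real mtr_t/buf_page_t interfaces: the
page latch is acquired first, so the dirty-state snapshot that feeds
m_made_dirty can no longer be invalidated by a concurrent write
completion marking the block clean.

  #include <mutex>
  #include <shared_mutex>
  #include <vector>

  struct block_sketch {
    std::shared_mutex latch;   // stands in for the page latch
    bool is_dirty = false;     // stands in for oldest_modification != 0
  };

  struct mtr_sketch {
    std::vector<block_sketch*> memo;
    bool made_dirty = false;

    void x_latch_and_memo_push(block_sketch &b) {
      b.latch.lock();                 // 1. acquire the page latch first
      made_dirty |= !b.is_dirty;      // 2. only then sample the dirty state
      memo.push_back(&b);             // 3. finally register in the memo
    }
  };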
Side note: It would suffice to set mtr_t::m_made_dirty at the time
we set the MTR_MEMO_MODIFY flag for a block. Currently that flag is
unnecessarily set if a mini-transaction acquires a page latch on
a page that is in a clean state, and will not actually modify the block.
This may cause unnecessary acquisitions of log_sys.flush_order_mutex
on mtr_t::commit().
mtr_t::free(): If the block had been exclusively latched in this
mini-transaction, set the m_made_dirty flag so that the flush order mutex
will be acquired during mtr_t::commit(). This should have been part of
commit 4179f93d28 (MDEV-18976).
It was necessary to change mtr_t::free() so that
WriteOPT_PAGE_CHECKSUM::operator() would be able to avoid writing
checksums for freed pages.
The index root page contains the fields BTR_SEG_TOP and BTR_SEG_LEAF
which keep track of allocated pages in the index tree. These fields
are normally protected by an Update latch, so that concurrent read
access to other parts of the page will be possible.
When the index root page is already exclusively latched in the
mini-transaction, we must not try to acquire a lower-grade Update latch.
In fact, when the root page is already X or U latched in the
mini-transaction, there is no point in acquiring another latch.
Moreover, after a U latch was acquired on top of an X-latch,
mtr_t::defer_drop_ahi() would trigger an assertion failure or
lock corruption in block->page.lock.u_x_upgrade() because X locks
already exist on the block.
This problem may have been introduced in
commit 03ca6495df (MDEV-24142).
btr_page_alloc_low(), btr_page_free(): Initially buffer-fix the root page.
If it is already U or X latched, release the buffer-fix. Else, upgrade
the buffer-fix to a U latch.
mtr_t::u_lock_register(): Upgrade a buffer-fix to U latch.
mtr_t::have_u_or_x_latch(): Check if U or X latches are already
registered in the mini-transaction.
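A hypothetical standalone sketch of this decision (the names below are
illustrative, not the mtr_t API): buffer-fix the root first, release
the fix if a U or X latch is already registered, otherwise upgrade the
fix to a U latch.

  #include <unordered_map>

  enum latch_kind { BUF_FIX, U_LATCH, X_LATCH };

  struct mtr_memo_sketch {
    // one entry per latch/fix already registered in this mini-transaction
    std::unordered_multimap<const void*, latch_kind> held;

    // cf. mtr_t::have_u_or_x_latch()
    bool have_u_or_x(const void *block) const {
      auto range = held.equal_range(block);
      for (auto it = range.first; it != range.second; ++it)
        if (it->second != BUF_FIX)
          return true;
      return false;
    }

    // cf. the decision in btr_page_alloc_low()/btr_page_free()
    void latch_root_for_alloc(const void *root) {
      auto fix = held.emplace(root, BUF_FIX);  // initially buffer-fix the root
      if (have_u_or_x(root)) {
        held.erase(fix);      // already U/X latched: release the buffer-fix
        return;
      }
      fix->second = U_LATCH;  // else upgrade the buffer-fix to a U latch
    }
  };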
Reason:
=======
Race condition between btr_search_drop_hash_index() and
btr_search_lazy_free(). One thread resizes the buffer pool
and clears the ahi on all pages in the buffer pool, freeing the
index and table while removing the last reference. At the same time,
another thread accesses index->heap in btr_search_drop_hash_index().
Solution:
=========
Acquire the respective ahi latch before checking index->freed().
btr_search_drop_page_hash_index(): Added a new parameter to indicate
that ahi entries should be dropped only if the index is marked as freed.
btr_search_check_marked_free_index(): Acquire all ahi latches and
return true if the index was freed.
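The fixed ordering, as a standalone sketch with simplified stand-in
types (not the actual btr_search code): the reader holds the ahi part
latch both while it checks the freed flag and while it touches
index->heap, so the lazy-free path cannot free the heap underneath it.

  #include <mutex>

  struct index_sketch {
    bool freed = false;
    void *heap = nullptr;       // stands in for index->heap
  };

  struct ahi_part_sketch {
    std::mutex latch;           // stands in for the ahi part latch
    index_sketch *index = nullptr;
  };

  void drop_hash_entries_if_freed(ahi_part_sketch &part) {
    std::lock_guard<std::mutex> g(part.latch);
    if (!part.index || !part.index->freed)
      return;                   // only drop entries of an index marked freed
    // ... safe to walk part.index->heap here, still under the latch ...
  }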
btr_page_reorganize_low(): Restore mtr->set_log_mode() before returning.
innobase_instant_try(): Do not invoke rec_get_offsets() if
btr_cur_pessimistic_update() failed. The cursor position may be
invalid.
The function rec_get_offsets_func() used to hit ut_error
due to an invalid rec_get_status() value of a
ROW_FORMAT!=REDUNDANT record. This fix is twofold:
We will not only avoid a crash on corruption in this case,
but we will also make more effort to validate each record
every time we are iterating over index page records.
rec_get_offsets_func(): Do not crash on a corrupted record.
page_rec_get_nth(): Return nullptr on error.
page_dir_slot_get_rec_validate(): Like page_dir_slot_get_rec(),
but validate the pointer and return nullptr on error.
page_cur_search_with_match(), page_cur_search_with_match_bytes(),
page_dir_split_slot(), page_cur_move_to_next():
Indicate failure in a return value.
page_cur_search(): Replaced with page_cur_search_with_match().
rec_get_next_ptr_const(), rec_get_next_ptr(): Replaced with
page_rec_get_next_low().
TODO: rtr_page_split_initialize_nodes(), rtr_update_mbr_field(),
and possibly other SPATIAL INDEX functions fail to properly handle
errors.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
Performance tested by: Axel Schwenke
Get rid of BTR_ESTIMATE and btr_cur_t::path_arr.
Before the fix, btr_estimate_n_rows_in_range_low() used two
btr_cur_search_to_nth_level() calls to create two arrays of tree path,
one array per border. It then tried to estimate the number of rows by
diving level by level through the array elements. As the path pages are
unlatched while the arrays are iterated, the tree could be modified, and
the estimation function would call itself until the number of attempts
was exceeded.
After the fix, the estimation happens during the search process. Roughly,
the algorithm is the following: dive to the left page; then, if there are
pages between the left and right ones, read a few pages to the right; if
the right page is reached, fetch it and count the exact number of rows;
otherwise estimate the number of rows and fetch the right page.
The latching order corresponds to WL#6326 rules, i.e.:
(2.1) [same as (1.1)]: Page latches must be acquired in descending order
of tree level.
(2.2) When acquiring a node pointer page latch at level L, we must hold
the left sibling page latch (at level L) or some ancestor latch
(at level>L).
When we dive down a level, the parent page is unlatched only after
the current-level page is latched. When we estimate the number of rows
on some level, we latch the left border, then fetch the next page, and
keep fetching the next page, unlatching the previous page only after the
current page is latched, until the right border is reached. That is, the
left sibling is always latched when we acquire a page latch on the same
level. When we reach the right border, the current page is unlatched, and
then the right border is latched. Following rule (2.2), we can do this
because the right border's parent is latched.
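A simplified, hypothetical sketch of the latch coupling used while
counting to the right on one level (page_sketch and count_to_right are
illustrative, not the real estimation code): the right sibling is
latched before the current page is released, so rules (2.1)/(2.2) above
are never violated.

  #include <shared_mutex>

  struct page_sketch {
    std::shared_mutex latch;
    page_sketch *right = nullptr;   // right sibling on the same level
    unsigned n_recs = 0;
  };

  unsigned long long count_to_right(page_sketch *left,
                                    const page_sketch *right_border,
                                    unsigned max_pages) {
    unsigned long long n = 0;
    left->latch.lock_shared();
    page_sketch *cur = left;
    while (cur->right && cur->right != right_border && max_pages--) {
      page_sketch *next = cur->right;
      next->latch.lock_shared();    // latch the right sibling first ...
      n += cur->n_recs;
      cur->latch.unlock_shared();   // ... only then release the current page
      cur = next;
    }
    n += cur->n_recs;
    cur->latch.unlock_shared();
    // The right border itself is latched only after this; rule (2.2)
    // still holds because its parent page is latched (not modeled here).
    return n;
  }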
btr_root_raise_and_insert(), btr_lift_page_up(),
rtr_page_split_and_insert(): Reset DB_FAIL from a failure to
copy records on a ROW_FORMAT=COMPRESSED page to DB_SUCCESS
before retrying.
This fixes a regression that was introduced by
commit 0b47c126e3 (MDEV-13542).
btr_root_raise_and_insert(): Remove a redundant condition.
btr_page_split_and_insert() will invoke btr_page_split_and_insert()
if needed.
A prominent remaining source of crashes on corrupted index pages
is page directory corruption.
A frequent caller of page_dir_find_owner_slot() is page_rec_get_prev().
Some of those calls can be replaced with simpler logic that is less
prone to fail.
page_dir_find_owner_slot(),
page_rec_get_prev(), page_rec_get_prev_const(),
btr_pcur_move_to_prev(), btr_pcur_move_to_prev_on_page(),
btr_cur_upd_rec_sys(),
page_delete_rec_list_end(),
rtr_page_copy_rec_list_end_no_locks(),
rtr_page_copy_rec_list_start_no_locks(): Return an error code on failure.
fil_space_t::io(), buf_page_get_low(): Use DB_CORRUPTION for
out-of-bounds page reads.
PageBulk::getSplitRec(), PageBulk::copyOut(): Simplify the code.
btr_validate_level(): Prevent some more CHECK TABLE crashes on
corrupted pages.
btr_block_get(), btr_pcur_move_to_next_page(): Implement some checks that
were previously only part of IndexPurge::next().
IndexPurge::next(): Use btr_pcur_move_to_next_page().
Even after commit 0b47c126e3
there are a few ib::fatal() calls in non-debug code
that can be replaced easily.
btr_page_reorganize_low(): On size invariant violation, return
an error code instead of crashing.
btr_check_blob_fil_page_type(): On an invalid page type, report
an error but do not crash.
btr_copy_blob_prefix(): Truncate the output if a page type is invalid.
dict_load_foreign_cols(): On an error, return DB_CORRUPTION instead
of crashing.
fil_space_decrypt_full_crc32(), fil_space_decrypt_for_non_full_checksum():
On error, return DB_DECRYPTION_FAILED instead of crashing.
fil_set_max_space_id_if_bigger(): Replace ib::fatal() with an
equivalent ut_a() assertion.
We will introduce an optional log record OPT_PAGE_CHECKSUM for recording
page checksums, so that more inconsistencies on crash recovery may be
caught.
mtr_t::page_checksum(const buf_page_t&): Write OPT_PAGE_CHECKSUM
(currently not for ROW_FORMAT=COMPRESSED pages).
mtr_t::do_write(): Write OPT_PAGE_CHECKSUM records for all pages
(currently, in debug builds only).
mtr_t::is_logged(): Return whether log should be written.
mtr_t::set_log_mode_sub(const mtr_t&): Set the logging mode of
a sub-mini-transaction when another mini-transaction is holding
latches on some modified pages. When creating or freeing BLOB pages,
we may only write OPT_PAGE_CHECKSUM records in the main mini-transaction,
after all changes have been written to the log.
MTR_LOG_SUB: Log mode for a sub-mini-transaction.
mtr_t::free(): Define non-inline, and invoke MarkFreed.
MarkFreed: For any matching page in the mini-transaction log,
change the first entry to say MTR_MEMO_PAGE_X_MODIFY and any subsequent
entries to MTR_MEMO_PAGE_X_FIX.
FindModified: Simplify a condition. MTR_MEMO_MODIFY can only be set
if MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX are set.
FindBlockX: Consider also MTR_MEMO_PAGE_X_MODIFY.
recv_sys_t::parse(): Store OPT_PAGE_CHECKSUM records.
log_phys_t::apply(): Validate OPT_PAGE_CHECKSUM records.
log_phys_t::page_checksum(): Validate an OPT_PAGE_CHECKSUM record.
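The idea behind OPT_PAGE_CHECKSUM can be sketched as follows
(hypothetical helper names and a toy FNV-1a hash; InnoDB computes its
own checksum over specific byte ranges of the page): the writer logs a
checksum of the page image, and recovery recomputes it after applying
the log records to the page and compares the two values.

  #include <cstddef>
  #include <cstdint>

  std::uint32_t toy_page_checksum(const unsigned char *page, std::size_t size) {
    std::uint32_t h = 2166136261u;           // FNV-1a offset basis
    for (std::size_t i = 0; i < size; i++)
      h = (h ^ page[i]) * 16777619u;
    return h;
  }

  // Recovery side: true if the recovered page matches what the writer saw.
  bool toy_validate_opt_page_checksum(const unsigned char *page,
                                      std::size_t size,
                                      std::uint32_t logged_checksum) {
    return toy_page_checksum(page, size) == logged_checksum;
  }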
Tested by: Matthias Leich
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it also made things worse by blocking attempts to
rescue data from, or rebuild, a partially readable table.
We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.
This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.
We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in the system tablespace in case something
was corrupted.
Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.
buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.
btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.
btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().
btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().
FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.
dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.
trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().
trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.
trx_sys_create_sys_pages(): Merged with trx_sysf_create().
dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.
dir_pathname(): Replaces os_file_make_new_pathname().
row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.
btr_decryption_failed(): Report decryption failures.
dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.
dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.
dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.
dict_table_is_corrupted(): Remove.
dict_index_t::is_btree(): Determine if the index is a valid B-tree.
BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.
UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.
buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().
fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.
btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.
opt_search_plan_for_table(): Simplify the code.
row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.
row_undo_mod_upd_exist_sec(): Simplify the code.
row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().
fut_get_ptr(): Replace with buf_page_get_gen() calls.
buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.
buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.
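A hypothetical caller-side sketch of this error-propagation pattern
(simplified names and signatures, not the actual buf_page_get_gen()
prototype): the fetch returns nullptr and sets an error code, and the
caller passes the error up instead of asserting.

  enum db_err_sketch { DB_OK = 0, DB_CORRUPTION_SK, DB_DECRYPTION_FAILED_SK };

  struct buf_block_sketch;      // opaque stand-in for buf_block_t

  // Stub standing in for a page fetch that may fail.
  buf_block_sketch *fetch_page_sketch(unsigned page_no, db_err_sketch *err) {
    (void)page_no;
    *err = DB_CORRUPTION_SK;    // pretend the read hit a corrupted page
    return nullptr;
  }

  db_err_sketch process_page(unsigned page_no) {
    db_err_sketch err = DB_OK;
    if (buf_block_sketch *block = fetch_page_sketch(page_no, &err)) {
      (void)block;              // ... operate on the latched page here ...
      return DB_OK;
    }
    // No block: the page was corrupted, freed, or could not be read.
    // Propagate the error up the call stack; do not crash.
    return err;
  }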
fseg_page_is_allocated(): Replaces fseg_page_is_free().
fts_drop_common_tables(): Return an error if the transaction
was rolled back.
fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.
fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.
Clean up mtr_t::page_lock()
buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.
buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.
mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().
recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.
recv_recover_page(): Return whether the operation succeeded.
recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.
Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
The types btr_latch_mode and mtr_memo_type_t are partly derived from
rw_lock_type_t. Despite that, some code for converting between them
uses conditions instead of bitwise arithmetic.
Let us define btr_latch_mode in such a way that more conversions to
rw_lock_type_t are possible by a bitwise AND.
Some SPATIAL INDEX code that assumed !(BTR_MODIFY_TREE & BTR_MODIFY_LEAF)
was adjusted.
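An illustrative sketch of the encoding idea with made-up constants (not
the real btr_latch_mode or rw_lock_type_t values): the low bits of each
latch mode carry the lock type, so the conversion is a single bitwise
AND instead of a chain of conditions.

  enum rw_type_sketch { RW_S = 1, RW_X = 2, RW_SX = 4 };

  enum latch_mode_sketch {
    SEARCH_LEAF_SK = RW_S,
    MODIFY_LEAF_SK = RW_X,
    MODIFY_TREE_SK = 8 | RW_X     // tree flag kept in a higher bit
  };

  constexpr unsigned RW_TYPE_MASK = RW_S | RW_X | RW_SX;

  constexpr rw_type_sketch rw_type_of(latch_mode_sketch m) {
    return static_cast<rw_type_sketch>(m & RW_TYPE_MASK);
  }

  // With this encoding, MODIFY_TREE_SK & MODIFY_LEAF_SK is nonzero,
  // which is why code assuming the opposite had to be adjusted.
  static_assert(rw_type_of(MODIFY_TREE_SK) == RW_X, "tree mode implies X");
  static_assert(rw_type_of(SEARCH_LEAF_SK) == RW_S, "leaf search implies S");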
This is based on commit 20ae4816bb
with some adjustments for MDEV-12353.
row_ins_sec_index_entry_low(): If a separate mini-transaction is
needed to adjust the minimum bounding rectangle (MBR) in the parent
page, we must disable redo logging if the table is a temporary table.
For temporary tables, no log is supposed to be written, because
the temporary tablespace will be reinitialized on server restart.
rtr_update_mbr_field(), rtr_merge_and_update_mbr(): Changed the return
type to void and removed unreachable code. In older versions, these
used to return a different value for temporary tables.
page_id_t: Add constexpr to most member functions.
mtr_t::log_write(): Catch log writes to invalid tablespaces
so that the test case would crash without the fix to
row_ins_sec_index_entry_low().
btr_insert_into_right_sibling(): Inherit any gap lock from the
left sibling to the right sibling before inserting the record
to the right sibling and updating the node pointer(s).
lock_update_node_pointer(): Update locks in case a node pointer
will move.
Based on mysql/mysql-server@c7d93c274f
This follows up the previous fix in
commit c3c53926c4 (MDEV-26554).
ha_innobase::delete_table(): Work around the insufficient
metadata locking (MDL) during DML operations by acquiring exclusive
InnoDB table locks on all child tables. Previously, this was only
done on TRUNCATE and ALTER.
ibuf_delete_rec(), btr_cur_optimistic_delete(): Do not invoke
lock_update_delete() during change buffer operations.
The revised trx_t::commit(std::vector<pfs_os_file_t>&) will
hold exclusive lock_sys.latch while invoking fil_delete_tablespace(),
which in turn may invoke ibuf_delete_rec().
dict_index_t::has_locking(): A new predicate, replacing the dummy
!dict_table_is_locking_disabled(index->table). Used for skipping lock
operations during ibuf_delete_rec().
trx_t::commit(std::vector<pfs_os_file_t>&): Release the locks
and remove the table from the cache while holding exclusive
lock_sys.latch.
trx_t::commit_in_memory(): Skip release_locks() if dict_operation holds.
trx_t::commit(): Reset dict_operation before invoking commit_in_memory()
via commit_persist().
lock_release_on_drop(): Release locks while lock_sys.latch is
exclusively locked.
lock_table(): Add a parameter for a pointer to the table.
We must not dereference the table before a lock_sys.latch has
been acquired. If the pointer to the table does not match the table
at that point, the table is invalid and DB_DEADLOCK will be returned.
row_ins_foreign_check_on_constraint(): Improve the checks.
Remove a bogus DB_LOCK_WAIT_TIMEOUT return that was needed
before commit c5fd9aa562 (MDEV-25919).
row_upd_check_references_constraints(),
wsrep_row_upd_check_foreign_constraints(): Simplify checks.
- InnoDB DDL results in `Duplicate entry' if concurrent DML throws a
duplicate key error. The following scenario explains the problem:
connection con1:
ALTER TABLE t1 FORCE;
connection con2:
INSERT INTO t1(pk, uk) VALUES (2, 2), (3, 2);
In connection con2, InnoDB throws the 'DUPLICATE KEY' error because
of the unique index. The ALTER operation will throw the error when
applying the concurrent DML log.
- Inserting the duplicate key for the unique index logs the insert
operation for online ALTER TABLE. When the insertion fails, the
transaction rolls back, which leads to logging of a
delete operation for online ALTER TABLE.
While applying the insert log entries, the ALTER operation
encounters the 'DUPLICATE KEY' error.
- To avoid the above fake duplicate scenario, InnoDB should
not write any log for online ALTER TABLE before the DML transaction
commits (see the sketch after this list).
- The user thread that executes DML can apply the online log if
InnoDB ran out of online log and the index is marked as completed.
Set the online log error if the apply phase encountered any error.
It can also clear the logs of all other indexes and mark the newly
added indexes as corrupted.
- Removed the old online code which was a part of DML operations.
commit_inplace_alter_table(): Applies the online log
for the last batch of the secondary index log and frees
the log for the completed index.
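A toy standalone sketch of the "log only once the DML outcome is known"
principle from the list above (hypothetical types; the real
implementation replays the transaction's undo log records via
UndorecApplier): DML against a table with an active DDL is not pushed
to the online log as it happens, but only once the transaction commits.

  #include <string>
  #include <vector>

  struct online_log_sketch { std::vector<std::string> entries; };

  struct trx_sketch {
    std::vector<std::string> changes;   // what this transaction did
    bool table_has_active_ddl = false;

    void record_dml(const std::string &change) {
      changes.push_back(change);        // nothing reaches the online log yet
    }

    void commit(online_log_sketch &log) {
      if (table_has_active_ddl)
        for (const std::string &c : changes)
          log.entries.push_back(c);     // replayed only for the committed outcome
      changes.clear();
    }

    // In this toy model, rolled-back DML simply never reaches the log.
    void rollback() { changes.clear(); }
  };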
trx_t::apply_online_log: Set to true while writing the undo
log if the modified table has an active DDL.
trx_t::apply_log(): Apply the DML changes to online DDL tables.
dict_table_t::is_active_ddl(): Returns true if the table
has an active DDL.
dict_index_t::online_log_make_dummy(): Assign a dummy value to
the clustered index online log to indicate that the secondary
indexes are being rebuilt.
dict_index_t::online_log_is_dummy(): Check whether the online
log has the dummy value.
ha_innobase_inplace_ctx::log_failure(): Handle the apply log
failure for the online DDL transaction.
row_log_mark_other_online_index_abort(): Clear out all other
online index logs after encountering an error during
row_log_apply().
row_log_get_error(): Get the error that happened during row_log_apply().
row_log_online_op(): Applies the online log if the index is completed
and the online log ran out of memory. Returns false if applying the
log fails.
UndorecApplier: Introduced a class to maintain the undo log
record and the latched undo buffer page, parse the undo log record,
and maintain the undo record type, info bits and update vector.
UndorecApplier::get_old_rec(): Get the correct version of the
clustered index record that was modified by the current undo
log record
UndorecApplier::clear_undo_rec(): Clear the undo log related
information after applying the undo log record
UndorecApplier::log_update(): Handle the update and delete undo
log records and apply them to online indexes.
UndorecApplier::log_insert(): Handle the insert undo log
and apply it to online indexes.
UndorecApplier::is_same(): Check whether the given roll pointer
was generated by the current undo log record information.
trx_t::rollback_low(): Set apply_online_log for the transaction
if the partially rolled-back transaction involves any active DDL.
prepare_inplace_alter_table_dict(): After allocating the online
log, InnoDB creates the fulltext common tables. A fulltext index
does not allow the index build to be online, so the dead code
for online log removal was removed.
Thanks to Marko Mäkelä for providing the initial prototype and
Matthias Leich for testing the issue patiently.
This also fixes MDEV-20198: Instant ALTER TABLE is not crash safe
InnoDB dictionary recovery wrongly used the READ UNCOMMITTED isolation
level, causing some mismatch. For example, if a table was renamed or
replaced in a transaction, according to READ UNCOMMITTED the table might
not exist at all.
We implement READ COMMITTED isolation level for accessing the dictionary
tables SYS_TABLES, SYS_COLUMNS, SYS_INDEXES, SYS_FIELDS, SYS_VIRTUAL,
SYS_FOREIGN, SYS_FOREIGN_COLS. For most of these tables, no secondary
index exists. For the secondary indexes (on SYS_TABLES.ID,
SYS_FOREIGN.FOR_NAME, SYS_FOREIGN.REF_NAME), we will always look up
the primary key in the clustered index and check if the record actually
is a committed version.
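A simplified, hypothetical sketch of that READ COMMITTED check (not
actual InnoDB structures): a secondary-index hit is trusted only after
the primary key has been looked up in the clustered index and the
version found there is confirmed to be committed.

  #include <cstdint>
  #include <functional>

  struct clust_rec_sketch {
    std::uint64_t trx_id;   // DB_TRX_ID of the newest record version
    bool delete_marked;
  };

  // is_committed() would consult the transaction system (trx_sys).
  bool dict_rec_is_committed_version(
      const clust_rec_sketch &rec,
      const std::function<bool(std::uint64_t)> &is_committed) {
    if (!is_committed(rec.trx_id))
      return false;            // uncommitted change: not visible here
    return !rec.delete_marked; // committed delete-marked record: absent
  }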
dict_check_sys_tables(): Recover tablespaces also from delete-marked
committed records, so that if a matching .ibd file exists, it will
be removed by fil_delete_tablespace() when the committed delete-marked
SYS_INDEXES record of the clustered index is purged
in row_purge_remove_clust_if_poss_low().
fil_ibd_open(): Change the Boolean parameter "validate" to a ternary
one, to suppress error messages when the file might not exist.
It is possible that a .ibd file was deleted and the server shut down
before the SYS_INDEXES and SYS_TABLES records were purged. Hence, if
dict_check_sys_tables() finds a committed delete-marked record,
we must not complain if the tablespace file is not found.
On Windows, we must treat ERROR_PATH_NOT_FOUND (directory not found)
in the same way as ERROR_FILE_NOT_FOUND. This fixes a few failures where
a previous test successfully executed DROP DATABASE (and deleted all
files and the directory), but a committed delete-marked SYS_TABLES
record had not been purged before server restart.
dict_getnext_system_low(): Do not filter out delete-marked records.
dict_startscan_system(), dict_getnext_system(): Do filter out
delete-marked records, for accessing the INFORMATION_SCHEMA tables.
dict_sys_tables_rec_read(): Return the DB_TRX_ID of the committed
version of the record. This is needed in dict_load_table_low().
dict_load_foreign_cols(), dict_load_foreign(): Add a parameter for
the current transaction identifier. In some DDL operations, the
FOREIGN KEY constraints are being loaded from the data dictionary
before the DDL transaction has been committed. For SYS_FOREIGN
and SYS_FOREIGN_COLS, we must implement the special case of
READ COMMITTED that the changes of the uncommitted current transaction
are visible.
dict_load_foreign(): Validate the table name. We could find a
SYS_FOREIGN.ID via a committed delete-marked secondary index record
that does not match the REF_NAME or FOR_NAME of the secondary index record.
dict_load_index_low(): Optionally take the table as a parameter,
so that table->def_trx_id can be updated in case of a
committed delete-marked SYS_INDEXES record corresponding
to DROP INDEX, but not corresponding to an index stub of ADD INDEX.
dict_load_indexes(): Do not update table->def_trx_id
in case of delete-marked records.
rec_is_metadata(), rec_offs_make_valid(), rec_get_offsets_func(),
row_build_low(): Relax some assertions. We may now have
!index->is_instant() even if a metadata record is present in the index.
Previously, the recovery of instant ADD/DROP COLUMN assumed
that READ UNCOMMITTED of the data dictionary will be performed.
Now, we will have a READ COMMITTED copy of the data dictionary
cache, and a READ UNCOMMITTED copy of the metadata record.
btr_page_reorganize_low(): Correctly update the FIL_PAGE_TYPE
when rolling back an instant ADD/DROP COLUMN operation.
row_rec_to_index_entry_impl(): Relax some assertions,
and disallow accessing "extra" fields. This fixes the recovery
of a crash during an instant ADD COLUMN after a successful
instant DROP COLUMN, in the test innodb.instant_alter_crash.
Tested by: Matthias Leich
btr_cur_optimistic_insert(): Disregard DEBUG_DBUG injection to
invoke btr_page_reorganize() if the page (and the table) is empty.
Otherwise, an assertion would fail in btr_page_reorganize_low()
because PAGE_MAX_TRX_ID is 0 in an empty secondary index leaf page.