Commit graph

24 commits

Author SHA1 Message Date
Marko Mäkelä
0b47c126e3 MDEV-13542: Crashing on corrupted page is unhelpful
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it is making things worse by blocking attempts to
rescue data from or rebuild a partially readable table.

We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.

This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.

We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in system tablespace in case something
was corrupted.

Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.

buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.

btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.

btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().

btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().

FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.

dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.

trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().

trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.

trx_sys_create_sys_pages(): Merged with trx_sysf_create().

dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.

dir_pathname(): Replaces os_file_make_new_pathname().

row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.

btr_decryption_failed(): Report decryption failures

dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.

dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.

dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.

dict_table_is_corrupted(): Remove.

dict_index_t::is_btree(): Determine if the index is a valid B-tree.

BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.

UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.

buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().

fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.

btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.

opt_search_plan_for_table(): Simplify the code.

row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.

row_undo_mod_upd_exist_sec(): Simplify the code.

row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().

fut_get_ptr(): Replace with buf_page_get_gen() calls.

buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.

buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.

fseg_page_is_allocated(): Replaces fseg_page_is_free().

fts_drop_common_tables(): Return an error if the transaction
was rolled back.

fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.

fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.

Clean up mtr_t::page_lock()

buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.

buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.

mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().

recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.

recv_recover_page(): Return whether the operation succeeded.

recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.

Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
Marko Mäkelä
ac028ec5d8 MDEV-24142: Remove the LatchDebug interface to rw-locks
The latching order checks for rw-locks have not caught many bugs
in the past few years and they are greatly complicating the code.

Last time the debug checks were useful was in
commit 59caf2c3c1 (MDEV-13485).

The B-tree hang MDEV-14637 was not caught by LatchDebug,
because the granularity of the checks is not sufficient
to distinguish the levels of non-leaf B-tree pages.

The interface was already made dead code by the grandparent
commit 03ca6495df.
2020-12-03 15:27:50 +02:00
Marko Mäkelä
0bee3b8d65 Cleanup: Use Atomic_relaxed for dict_sys.row_id 2020-11-24 15:43:13 +02:00
Marko Mäkelä
5407117a59 MDEV-22343 Remove SYS_TABLESPACES and SYS_DATAFILES
The InnoDB internal tables SYS_TABLESPACES and SYS_DATAFILES as well as the
INFORMATION_SCHEMA views INNODB_SYS_TABLESPACES and INNODB_SYS_DATAFILES
were introduced in MySQL 5.6 for no good reason in
mysql/mysql-server/commit/e9255a22ef16d612a8076bc0b34002bc5a784627
when the InnoDB support for the DATA DIRECTORY attribute was introduced.

The file system should be the authoritative source of information on files.
Storing information about file system paths in the file system (symlinks,
or even the .isl files that were unfortunately chosen as the solution) is
sufficient. If information is additionally stored in some hidden tables
inside the InnoDB system tablespace, everything unnecessarily becomes
more complicated, because more copies of data mean more opportunity
for the copies to be out of sync, and because modifying the data in
the system tablespace in the desired way might not be possible at all
without modifying the InnoDB source code. So, the copy in the system
tablespace basically is a redundant, non-authoritative source of
information.

We will stop creating or accessing the system tables SYS_TABLESPACES
and SYS_DATAFILES.

We will also remove the view
INFORMATION_SCHEMA.INNODB_SYS_DATAFILES along with SYS_DATAFILES.

The view
INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES will be repurposed
to directly reflect fil_system.space_list. The column
PAGE_SIZE, which would always contain the value of
the GLOBAL read-only variable innodb_page_size, is
removed. The column ZIP_PAGE_SIZE, which would actually
contain the physical page size of a page, is renamed to
PAGE_SIZE. Finally, a new column FILENAME is added, as a
replacement of SYS_DATAFILES.PATH.

This will also
address MDEV-21801 (files that were created before upgrading
to MySQL 5.6 or MariaDB 10.0 or later were never registered
in SYS_TABLESPACES or SYS_DATAFILES) and
MDEV-21801 (information about the system tablespace is not stored
in SYS_TABLESPACES or SYS_DATAFILES).
2020-11-11 11:15:11 +02:00
Marko Mäkelä
ba573c4736 MDEV-21133 follow-up: More my_assume_aligned hints
fsp0pagecompress.h: Remove.
Invoke fil_page_get_type() and FSP_FLAGS_GET_PAGE_COMPRESSION_LEVEL
directly.

log_block_get_flush_bit(), log_block_set_flush_bit():
Access the byte directly.

dict_sys_read_row_id(): Remove (unused function).
2020-05-07 12:25:00 +03:00
Marko Mäkelä
56f6dab1d0 MDEV-21174: Replace mlog_write_ulint() with mtr_t::write()
mtr_t::write(): Replaces mlog_write_ulint(), mlog_write_ull().
Optimize away writes if the page contents does not change,
except when a dummy write has been explicitly requested.

Because the member function template takes a block descriptor as a
parameter, it is possible to introduce better consistency checks.
Due to this, the code for handling file-based lists, undo logs
and user transactions was refactored to pass around buf_block_t.
2019-12-03 11:05:18 +02:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
c0ac0b8860 Update FSF address 2019-05-11 19:25:02 +03:00
Marko Mäkelä
b374246730 Merge 10.3 into 10.4 2018-11-30 11:05:46 +02:00
Marko Mäkelä
0abd2766b1 Merge 10.2 into 10.3
Also, related to MDEV-15522, MDEV-17304, MDEV-17835,
remove the Galera xtrabackup tests, because xtrabackup never worked
with MariaDB Server 10.3 due to InnoDB redo log format changes.
2018-11-30 09:38:56 +02:00
Marko Mäkelä
447e493179 Remove some unnecessary InnoDB #include 2018-11-29 12:53:44 +02:00
Marko Mäkelä
4be0855cf5 MDEV-17794 Do not assign persistent ID for temporary tables
InnoDB in MySQL 5.7 introduced two new parameters to the function
dict_hdr_get_new_id(), to allow redo logging to be disabled when
assigning identifiers to temporary tables or during the
backup-unfriendly TRUNCATE TABLE that was replaced in MariaDB
by MDEV-13564.

Now that MariaDB 10.4.0 removed the crash recovery code for the
backup-unfriendly TRUNCATE, we can revert dict_hdr_get_new_id()
to be used only for persistent data structures.

dict_table_assign_new_id(): Remove. This was a simple 2-line function
that was called from few places.

dict_table_open_on_id_low(): Declare in the only file where it
is called.

dict_sys_t::temp_id_hash: A separate lookup table for temporary tables.
Table names will be in the common dict_sys_t::table_hash.

dict_sys_t::get_temporary_table_id(): Assign a temporary table ID.

dict_sys_t::get_table(): Look up a persistent table.

dict_sys_t::get_temporary_table(): Look up a temporary table.

dict_sys_t::temp_table_id: The sequence of temporary table identifiers.
Starts from DICT_HDR_FIRST_ID, so that we can continue to simply compare
dict_table_t::id to a few constants for the persistent hard-coded
data dictionary tables.

undo_node_t::state: Distinguish temporary and persistent tables.

lock_check_dict_lock(), lock_get_table_id(): Assert that there cannot
be locks on temporary tables.

row_rec_to_index_entry_impl(): Assert that there cannot be metadata
records on temporary tables.

row_undo_ins_parse_undo_rec(): Distinguish temporary and persistent tables.
Move some assertions from the only caller. Return whether the table was
found.

row_undo_ins(): Add some assertions.

row_undo_mod_clust(), row_undo_mod(): Do not assign node->state.
Let row_undo() do that.

row_undo_mod_parse_undo_rec(): Distinguish temporary and persistent tables.
Move some assertions from the only caller. Return whether the table was
found.

row_undo_try_truncate(): Renamed and simplified from trx_roll_try_truncate().

row_undo_rec_get(): Replaces trx_roll_pop_top_rec_of_trx() and
trx_roll_pop_top_rec(). Fetch an undo log record, and assign undo->state
accordingly.

trx_undo_truncate_end(): Acquire the rseg->mutex only for the minimum
required duration, and release it between mini-transactions.
2018-11-22 15:42:52 +02:00
Eugene Kosov
90c809a61d IB: VTQ cleanups [#305]
* btr_pcur_move_to_prev_user_rec() removed
* usage of ut_usectime() removed
* other VTQ-related leftovers
2017-11-17 11:25:50 +03:00
Aleksey Midenkov
33085349e9 IB, SQL: removed VTQ, added TRT on SQL layer [closes #305] 2017-11-15 00:22:10 +03:00
Aleksey Midenkov
d8d7251019 System Versioning pre0.12
Merge remote-tracking branch 'origin/archive/2017-10-17' into 10.3
2017-11-07 00:37:49 +03:00
Marko Mäkelä
c65fdcf7c8 MDEV-14062 Upgrade from 10.0 fails on assertion during ALTER TABLE
The fix in MDEV-14022 was incomplete. We must adjust some more code
for the fact that SYS_INDEXES records may lack the column MERGE_THRESHOLD
that was "instantly" added in MySQL 5.7 and MariaDB 10.2.2.

dict_boot(): Hard-core SYS_INDEXES.MERGE_THRESHOLD as DEFAULT NULL.

dict_index_t::instant_field_value(), rec_offs_make_valid():
Tolerate the missing SYS_INDEXES.MERGE_THRESHOLD column.
2017-10-13 21:59:15 +03:00
Aleksey Midenkov
d54d36c45e IB, SQL: (0.4) COMMIT_ID-based ordering of transactions
IB:
* removed CONCURR_TRX from VTQ;
* new fields in VTQ: COMMIT_ID, ISO_LEVEL.

SQL:
* renamed BEGIN_TS, COMMIT_TS to VTQ_BEGIN_TS, VTQ_COMMIT_TS;
* new functions: VTQ_COMMIT_ID, VTQ_ISO_LEVEL, VTQ_TRX_ID, VTQ_TRX_SEES, VTQ_TRX_SEES_EQ;
* versioned SELECT for IB uses VTQ_TRX_SEES, VTQ_TRX_SEES_EQ.

Closes #71
2017-05-05 20:36:17 +03:00
Aleksey Midenkov
84e1971128 IB: 0.2 part I
* SYS_VTQ internal InnoDB table;
* I_S.INNODB_SYS_VTQ table;
* vers_notify_vtq(): add record to SYS_VTQ on versioned DML;
* SYS_VTQ columns filled: TRX_ID, BEGIN_TS.
2017-05-05 20:36:07 +03:00
Marko Mäkelä
4e1116b2c6 MDEV-12271 Port MySQL 8.0 Bug#23150562 REMOVE UNIV_MUST_NOT_INLINE AND UNIV_NONINL
Also, remove empty .ic files that were not removed by my MySQL commit.

Problem:
InnoDB used to support a compilation mode that allowed to choose
whether the function definitions in .ic files are to be inlined or not.
This stopped making sense when InnoDB moved to C++ in MySQL 5.6
(and ha_innodb.cc started to #include .ic files), and more so in
MySQL 5.7 when inline methods and functions were introduced
in .h files.

Solution:
Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from
all files, assuming that the symbols are never defined.
Remove the files fut0fut.cc and ut0byte.cc which only mattered when
UNIV_NONINL was defined.
2017-03-17 12:42:07 +02:00
Jan Lindström
2e814d4702 Merge InnoDB 5.7 from mysql-5.7.9.
Contains also

MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7

	The failure happened because 5.7 has changed the signature of
	the bool handler::primary_key_is_clustered() const
	virtual function ("const" was added). InnoDB was using the old
	signature which caused the function not to be used.

MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7

	Fixed mutexing problem on lock_trx_handle_wait. Note that
	rpl_parallel and rpl_optimistic_parallel tests still
	fail.

MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
  Reason: incorrect merge

MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
  Reason: incorrect merge
2016-09-02 13:22:28 +03:00
Sergei Golubchik
720e04ff67 5.6.31 2016-06-21 14:21:03 +02:00
Sergei Golubchik
6d06fbbd1d move to storage/innobase 2015-05-04 19:17:21 +02:00
Renamed from include/dict0boot.h (Browse further)