/*****************************************************************************
Copyright (c) 1996, 2016, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2018, 2022, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA
*****************************************************************************/
/**************************************************//**
@file include/dict0boot.h
Data dictionary creation and booting
Created 4/18/1996 Heikki Tuuri
*******************************************************/
#ifndef dict0boot_h
#define dict0boot_h
#include "mtr0mtr.h"
#include "mtr0log.h"
#include "ut0byte.h"
#include "buf0buf.h"
#include "dict0dict.h"
/**********************************************************************//**
Assigns new table, index, or space ids through the non-NULL out-parameters. */
void
dict_hdr_get_new_id(
/*================*/
table_id_t* table_id, /*!< out: table id
(not assigned if NULL) */
index_id_t* index_id, /*!< out: index id
(not assigned if NULL) */
ulint* space_id); /*!< out: space id
(not assigned if NULL) */
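As the parameter comments state, each out-parameter is optional: passing NULL means that kind of identifier is not assigned. The following is a minimal in-memory model of that contract, assuming the header keeps a "latest assigned" counter per kind, as the DICT_HDR_TABLE_ID and DICT_HDR_INDEX_ID comments describe. `DictHeaderModel` is hypothetical; it simply increments each requested counter and ignores how InnoDB persists these values.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the InnoDB integer typedefs.
typedef uint64_t table_id_t;
typedef uint64_t index_id_t;
typedef unsigned long ulint;

// Hypothetical model: each field holds the latest assigned identifier.
struct DictHeaderModel
{
  table_id_t last_table_id= 0;
  index_id_t last_index_id= 0;
  ulint last_space_id= 0;

  // Assign a fresh identifier through every non-null out-parameter.
  // A null pointer means "not requested", so that counter stays unchanged.
  void get_new_id(table_id_t *table_id, index_id_t *index_id,
                  ulint *space_id)
  {
    if (table_id)
      *table_id= ++last_table_id;
    if (index_id)
      *index_id= ++last_index_id;
    if (space_id)
      *space_id= ++last_space_id;
  }
};
```

For example, a caller creating a table together with its clustered index could request a table id and an index id in one call, while adding an index later would pass NULL for the table id.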
/** Update dict_sys.row_id in the dictionary header file page. */
void dict_hdr_flush_row_id(row_id_t id);
/** @return A new value for GEN_CLUST_INDEX(DB_ROW_ID) */
inline row_id_t dict_sys_t::get_new_row_id()
{
row_id_t id= row_id.fetch_add(1);
if (!(id % ROW_ID_WRITE_MARGIN))
dict_hdr_flush_row_id(id);
return id;
}
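`get_new_row_id()` hands out DB_ROW_ID values from an atomic counter but only writes to the header page when the value is a multiple of ROW_ID_WRITE_MARGIN. The sketch below reproduces that pattern with a counting stub in place of `dict_hdr_flush_row_id()`. The margin of 256 is an assumed value for illustration, and `RowIdAllocatorModel` is hypothetical.

```cpp
#include <atomic>
#include <cstdint>

typedef uint64_t row_id_t;

// Assumed margin for illustration only.
constexpr row_id_t ROW_ID_WRITE_MARGIN_MODEL= 256;

struct RowIdAllocatorModel
{
  std::atomic<row_id_t> row_id{0};
  // Stub state standing in for dict_hdr_flush_row_id(); it is not
  // synchronized, so this model should only be exercised from one thread.
  row_id_t last_flushed= 0;
  unsigned flushes= 0;

  void flush_row_id(row_id_t id) { last_flushed= id; flushes++; }

  row_id_t get_new_row_id()
  {
    // fetch_add() returns the value before the increment.
    row_id_t id= row_id.fetch_add(1);
    // Persist only every ROW_ID_WRITE_MARGIN_MODEL-th value.
    if (!(id % ROW_ID_WRITE_MARGIN_MODEL))
      flush_row_id(id);
    return id;
  }
};
```

The trade-off is fewer header-page writes in exchange for a stale persisted value: the stored row id can lag the values already handed out by less than one margin, so after a restart the counter has to be advanced past the stored value to avoid reusing a DB_ROW_ID.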
/** Ensure that row_id is not smaller than id, on IMPORT TABLESPACE */
inline void dict_sys_t::update_row_id(row_id_t id)
{
row_id_t sys_id= row_id;
while (id >= sys_id)
{
if (!row_id.compare_exchange_strong(sys_id, id))
continue;
if (!(id % ROW_ID_WRITE_MARGIN))
dict_hdr_flush_row_id(id);
break;
}
}
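`update_row_id()` may only move the counter forward, even while other threads allocate row ids concurrently. A plain store could move it backwards if another thread advanced it in between, so the code uses a compare-and-swap loop: on failure, `compare_exchange_strong()` reloads the current value into `sys_id`, and the loop condition is checked again. Below is a standalone model of the same loop; `RowIdBumperModel` and the margin of 256 are assumptions, and the header flush is replaced by a counter.

```cpp
#include <atomic>
#include <cstdint>

typedef uint64_t row_id_t;

// Assumed margin for illustration only.
constexpr row_id_t ROW_ID_WRITE_MARGIN_MODEL= 256;

struct RowIdBumperModel
{
  std::atomic<row_id_t> row_id{0};
  unsigned flushes= 0;  // stands in for dict_hdr_flush_row_id() calls

  // Raise row_id to id unless the counter is already larger.
  void update_row_id(row_id_t id)
  {
    row_id_t sys_id= row_id;
    while (id >= sys_id)
    {
      // On failure, sys_id receives the current value of row_id.
      if (!row_id.compare_exchange_strong(sys_id, id))
        continue;
      if (!(id % ROW_ID_WRITE_MARGIN_MODEL))
        flushes++;
      break;
    }
  }
};
```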
/**********************************************************************//**
Writes a row id to a record or other 6-byte stored form. */
inline void dict_sys_write_row_id(byte *field, row_id_t row_id)
{
static_assert(DATA_ROW_ID_LEN == 6, "compatibility");
mach_write_to_6(field, row_id);
}
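`dict_sys_write_row_id()` stores a row id in the 6 bytes reserved for DATA_ROW_ID. A self-contained equivalent follows, assuming the most-significant-byte-first (big-endian) order that InnoDB's `mach_write_to_*` helpers use for on-disk integers. `write_to_6()` and `read_from_6()` are hypothetical names, not the InnoDB functions.

```cpp
#include <cstdint>

typedef unsigned char byte;

// Store the low 48 bits of n, most significant byte first.
inline void write_to_6(byte *b, uint64_t n)
{
  for (int i= 5; i >= 0; i--)
  {
    b[i]= static_cast<byte>(n & 0xFF);
    n>>= 8;
  }
}

// Inverse of write_to_6().
inline uint64_t read_from_6(const byte *b)
{
  uint64_t n= 0;
  for (int i= 0; i < 6; i++)
    n= (n << 8) | b[i];
  return n;
}
```

Because only 48 bits are stored, a row id must stay below 2^48; this sketch silently drops any higher bits.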
/*****************************************************************//**
Initializes the data dictionary memory structures when the database is
started. This function is also called when the data dictionary is created.
@return DB_SUCCESS or error code. */
dberr_t
dict_boot(void)
/*===========*/
MY_ATTRIBUTE((warn_unused_result));
/*****************************************************************//**
Creates and initializes the data dictionary at the server bootstrap.
@return DB_SUCCESS or error code. */
dberr_t
dict_create(void)
/*=============*/
MY_ATTRIBUTE((warn_unused_result));
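Both `dict_boot()` and `dict_create()` report failure through a `dberr_t` return value marked `warn_unused_result`, so on compilers that support the attribute, a caller that ignores the result gets a warning. The sketch below shows the intended calling pattern with the standard C++17 `[[nodiscard]]` attribute, which serves the same purpose. The two-value `dberr_t` enum and the functions in namespace `model` are hypothetical stand-ins, not the InnoDB definitions.

```cpp
namespace model
{
// Hypothetical two-value subset of InnoDB's dberr_t. Compare against
// DB_SUCCESS explicitly rather than relying on its numeric value.
enum dberr_t { DB_SUCCESS, DB_CORRUPTION };

// Stand-in for dict_boot(): fails if the dictionary header is unreadable.
[[nodiscard]] dberr_t dict_boot(bool header_readable)
{
  return header_readable ? DB_SUCCESS : DB_CORRUPTION;
}

// A caller propagates the error instead of continuing (or crashing).
[[nodiscard]] dberr_t start_up(bool header_readable)
{
  dberr_t err= dict_boot(header_readable);
  if (err != DB_SUCCESS)
    return err;
  // Further start-up steps would follow here.
  return DB_SUCCESS;
}
}
```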
/*********************************************************************//**
Check if a table id belongs to a system table.
@return true if the table id belongs to a system table. */
inline bool dict_is_sys_table(table_id_t id) { return id < DICT_HDR_FIRST_ID; }
/* Space id and page no where the dictionary header resides */
#define DICT_HDR_SPACE 0 /* the SYSTEM tablespace */
#define DICT_HDR_PAGE_NO FSP_DICT_HDR_PAGE_NO
/* The ids for the basic system tables and their indexes */
#define DICT_TABLES_ID 1
#define DICT_COLUMNS_ID 2
#define DICT_INDEXES_ID dict_index_t::DICT_INDEXES_ID /* 3 */
#define DICT_FIELDS_ID 4
/* The following is a secondary index on SYS_TABLES */
#define DICT_TABLE_IDS_ID 5
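`dict_is_sys_table()` relies on the hard-coded system table and index IDs above staying below DICT_HDR_FIRST_ID, the lower bound for identifiers assigned at run time. A compile-time check of that invariant could look like the following. DICT_HDR_FIRST_ID is defined outside this header, so the value 10 used here is an assumption, and the `*_model` names are hypothetical.

```cpp
#include <cstdint>

typedef uint64_t table_id_t;

// Assumed value; the real DICT_HDR_FIRST_ID is defined in another header.
constexpr table_id_t DICT_HDR_FIRST_ID_MODEL= 10;

// Same test as dict_is_sys_table(), against the assumed constant.
constexpr bool is_sys_table_model(table_id_t id)
{
  return id < DICT_HDR_FIRST_ID_MODEL;
}

// Hard-coded IDs copied from the #defines above (DICT_INDEXES_ID is 3).
constexpr table_id_t hard_coded_ids[]= {1, 2, 3, 4, 5};

constexpr bool all_hard_coded_ids_are_system()
{
  for (table_id_t id : hard_coded_ids)
    if (!is_sys_table_model(id))
      return false;
  return true;
}

static_assert(all_hard_coded_ids_are_system(),
              "system IDs must stay below DICT_HDR_FIRST_ID");
```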
/* The offset of the dictionary header on the page */
#define DICT_HDR FSEG_PAGE_DATA
/*-------------------------------------------------------------*/
/* Dictionary header offsets */
#define DICT_HDR_ROW_ID 0 /* The latest assigned row id */
#define DICT_HDR_TABLE_ID 8 /* The latest assigned table id */
#define DICT_HDR_INDEX_ID 16 /* The latest assigned index id */
#define DICT_HDR_MAX_SPACE_ID 24 /* The latest assigned space id, or 0 */
#define DICT_HDR_MIX_ID_LOW 28 /* Obsolete, always DICT_HDR_FIRST_ID */
#define DICT_HDR_TABLES 32 /* Root of SYS_TABLES clust index */
#define DICT_HDR_TABLE_IDS 36 /* Root of SYS_TABLE_IDS sec index */
#define DICT_HDR_COLUMNS 40 /* Root of SYS_COLUMNS clust index */
#define DICT_HDR_INDEXES 44 /* Root of SYS_INDEXES clust index */
#define DICT_HDR_FIELDS 48 /* Root of SYS_FIELDS clust index */
#define DICT_HDR_FSEG_HEADER 56 /* Segment header for the tablespace
segment into which the dictionary
header is created */
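The offsets above are relative to DICT_HDR, the start of the page body, and their spacing implies 8-byte fields for the row, table and index id counters and 4-byte fields for the space id and the root page numbers. The sketch below writes and reads such fields in a page buffer. The value 38 for FSEG_PAGE_DATA and the 16 KiB page size are assumptions (both are defined outside this header), and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

typedef unsigned char byte;

// Assumed values: FSEG_PAGE_DATA (== FIL_PAGE_DATA) and the page size.
constexpr unsigned DICT_HDR_MODEL= 38;
constexpr unsigned PAGE_SIZE_MODEL= 16384;

// Offsets copied from the #defines above.
constexpr unsigned HDR_TABLE_ID= 8;      // 8-byte counter
constexpr unsigned HDR_MAX_SPACE_ID= 24; // 4-byte value
constexpr unsigned HDR_TABLES= 32;       // 4-byte root page number

// Big-endian write of a len-byte unsigned integer.
inline void write_be(byte *b, unsigned len, uint64_t n)
{
  for (unsigned i= len; i-- > 0; n>>= 8)
    b[i]= static_cast<byte>(n & 0xFF);
}

// Big-endian read of a len-byte unsigned integer.
inline uint64_t read_be(const byte *b, unsigned len)
{
  uint64_t n= 0;
  for (unsigned i= 0; i < len; i++)
    n= (n << 8) | b[i];
  return n;
}

// Access a dictionary header field at DICT_HDR + offset.
inline void hdr_write(std::vector<byte> &page, unsigned offset,
                      unsigned len, uint64_t n)
{
  write_be(&page[DICT_HDR_MODEL + offset], len, n);
}

inline uint64_t hdr_read(const std::vector<byte> &page, unsigned offset,
                         unsigned len)
{
  return read_be(&page[DICT_HDR_MODEL + offset], len);
}
```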
/*-------------------------------------------------------------*/
/* The columns in SYS_TABLES */
enum dict_col_sys_tables_enum {
DICT_COL__SYS_TABLES__NAME = 0,
DICT_COL__SYS_TABLES__ID = 1,
DICT_COL__SYS_TABLES__N_COLS = 2,
DICT_COL__SYS_TABLES__TYPE = 3,
DICT_COL__SYS_TABLES__MIX_ID = 4,
DICT_COL__SYS_TABLES__MIX_LEN = 5,
DICT_COL__SYS_TABLES__CLUSTER_ID = 6,
DICT_COL__SYS_TABLES__SPACE = 7,
DICT_NUM_COLS__SYS_TABLES = 8
};
/* The field numbers in the SYS_TABLES clustered index */
enum dict_fld_sys_tables_enum {
DICT_FLD__SYS_TABLES__NAME = 0,
DICT_FLD__SYS_TABLES__DB_TRX_ID = 1,
DICT_FLD__SYS_TABLES__DB_ROLL_PTR = 2,
DICT_FLD__SYS_TABLES__ID = 3,
DICT_FLD__SYS_TABLES__N_COLS = 4,
DICT_FLD__SYS_TABLES__TYPE = 5,
DICT_FLD__SYS_TABLES__MIX_ID = 6,
DICT_FLD__SYS_TABLES__MIX_LEN = 7,
DICT_FLD__SYS_TABLES__CLUSTER_ID = 8,
DICT_FLD__SYS_TABLES__SPACE = 9,
DICT_NUM_FIELDS__SYS_TABLES = 10
};
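Comparing this enum with dict_col_sys_tables_enum shows the clustered-index layout: the primary-key columns come first (only NAME for SYS_TABLES), then the system columns DB_TRX_ID and DB_ROLL_PTR, then the remaining columns in order. The same pattern holds for the other SYS_* tables below, which is also why every DICT_NUM_FIELDS__ constant equals its DICT_NUM_COLS__ counterpart plus 2. The helper `col_to_clust_field()` is hypothetical and only encodes that observation.

```cpp
// Number of system columns (DB_TRX_ID, DB_ROLL_PTR) that follow the
// primary key in these clustered index records.
constexpr unsigned N_SYS_FIELDS_AFTER_KEY= 2;

// Map a column number to its field number in the clustered index,
// where n_uniq is the number of primary-key columns.
constexpr unsigned col_to_clust_field(unsigned col, unsigned n_uniq)
{
  return col < n_uniq ? col : col + N_SYS_FIELDS_AFTER_KEY;
}

// SYS_TABLES (key: NAME): ID is column 1 and field 3; SPACE is 7 and 9.
static_assert(col_to_clust_field(1, 1) == 3, "SYS_TABLES.ID");
static_assert(col_to_clust_field(7, 1) == 9, "SYS_TABLES.SPACE");
// SYS_COLUMNS (key: TABLE_ID, POS): NAME is column 2 and field 4.
static_assert(col_to_clust_field(2, 2) == 4, "SYS_COLUMNS.NAME");
```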
/* The field numbers in the SYS_TABLE_IDS index */
enum dict_fld_sys_table_ids_enum {
DICT_FLD__SYS_TABLE_IDS__ID = 0,
DICT_FLD__SYS_TABLE_IDS__NAME = 1,
DICT_NUM_FIELDS__SYS_TABLE_IDS = 2
};
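SYS_TABLE_IDS is a secondary index on SYS_TABLES.ID, and like any InnoDB secondary index its records end with the clustered-index key, here NAME. Finding a table by id is therefore a two-step lookup: search SYS_TABLE_IDS for the id, then use the NAME it yields to search the SYS_TABLES clustered index. The first step is modeled below with a sorted vector; `SysTableIdsEntry` and `find_name_by_id()` are hypothetical, and the table names in the test are made up.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One SYS_TABLE_IDS record: field 0 is ID, field 1 is NAME (the PK).
struct SysTableIdsEntry
{
  uint64_t id;
  std::string name;
};

// Binary-search the entries (kept sorted by id, like the index) and
// return the NAME needed for the clustered-index lookup in SYS_TABLES.
std::optional<std::string>
find_name_by_id(const std::vector<SysTableIdsEntry> &index, uint64_t id)
{
  auto it= std::lower_bound(index.begin(), index.end(), id,
                            [](const SysTableIdsEntry &e, uint64_t key)
                            { return e.id < key; });
  if (it == index.end() || it->id != id)
    return std::nullopt;
  return it->name;
}
```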
/* The columns in SYS_COLUMNS */
enum dict_col_sys_columns_enum {
DICT_COL__SYS_COLUMNS__TABLE_ID = 0,
DICT_COL__SYS_COLUMNS__POS = 1,
DICT_COL__SYS_COLUMNS__NAME = 2,
DICT_COL__SYS_COLUMNS__MTYPE = 3,
DICT_COL__SYS_COLUMNS__PRTYPE = 4,
DICT_COL__SYS_COLUMNS__LEN = 5,
DICT_COL__SYS_COLUMNS__PREC = 6,
DICT_NUM_COLS__SYS_COLUMNS = 7
};
/* The field numbers in the SYS_COLUMNS clustered index */
enum dict_fld_sys_columns_enum {
DICT_FLD__SYS_COLUMNS__TABLE_ID = 0,
DICT_FLD__SYS_COLUMNS__POS = 1,
DICT_FLD__SYS_COLUMNS__DB_TRX_ID = 2,
DICT_FLD__SYS_COLUMNS__DB_ROLL_PTR = 3,
DICT_FLD__SYS_COLUMNS__NAME = 4,
DICT_FLD__SYS_COLUMNS__MTYPE = 5,
DICT_FLD__SYS_COLUMNS__PRTYPE = 6,
DICT_FLD__SYS_COLUMNS__LEN = 7,
DICT_FLD__SYS_COLUMNS__PREC = 8,
DICT_NUM_FIELDS__SYS_COLUMNS = 9
};
/* The columns in SYS_INDEXES */
enum dict_col_sys_indexes_enum {
DICT_COL__SYS_INDEXES__TABLE_ID = 0,
DICT_COL__SYS_INDEXES__ID = 1,
DICT_COL__SYS_INDEXES__NAME = 2,
DICT_COL__SYS_INDEXES__N_FIELDS = 3,
DICT_COL__SYS_INDEXES__TYPE = 4,
DICT_COL__SYS_INDEXES__SPACE = 5,
DICT_COL__SYS_INDEXES__PAGE_NO = 6,
DICT_COL__SYS_INDEXES__MERGE_THRESHOLD = 7,
DICT_NUM_COLS__SYS_INDEXES = 8
};
/* The field numbers in the SYS_INDEXES clustered index */
enum dict_fld_sys_indexes_enum {
DICT_FLD__SYS_INDEXES__TABLE_ID = 0,
DICT_FLD__SYS_INDEXES__ID = 1,
DICT_FLD__SYS_INDEXES__DB_TRX_ID = 2,
DICT_FLD__SYS_INDEXES__DB_ROLL_PTR = 3,
DICT_FLD__SYS_INDEXES__NAME = 4,
DICT_FLD__SYS_INDEXES__N_FIELDS = 5,
DICT_FLD__SYS_INDEXES__TYPE = 6,
DICT_FLD__SYS_INDEXES__SPACE = 7,
DICT_FLD__SYS_INDEXES__PAGE_NO = 8,
DICT_FLD__SYS_INDEXES__MERGE_THRESHOLD = 9,
DICT_NUM_FIELDS__SYS_INDEXES = 10
};
/* The columns in SYS_FIELDS */
enum dict_col_sys_fields_enum {
DICT_COL__SYS_FIELDS__INDEX_ID = 0,
DICT_COL__SYS_FIELDS__POS = 1,
DICT_COL__SYS_FIELDS__COL_NAME = 2,
DICT_NUM_COLS__SYS_FIELDS = 3
};
/* The field numbers in the SYS_FIELDS clustered index */
enum dict_fld_sys_fields_enum {
DICT_FLD__SYS_FIELDS__INDEX_ID = 0,
DICT_FLD__SYS_FIELDS__POS = 1,
DICT_FLD__SYS_FIELDS__DB_TRX_ID = 2,
DICT_FLD__SYS_FIELDS__DB_ROLL_PTR = 3,
DICT_FLD__SYS_FIELDS__COL_NAME = 4,
DICT_NUM_FIELDS__SYS_FIELDS = 5
};
/* The columns in SYS_FOREIGN */
enum dict_col_sys_foreign_enum {
DICT_COL__SYS_FOREIGN__ID = 0,
DICT_COL__SYS_FOREIGN__FOR_NAME = 1,
DICT_COL__SYS_FOREIGN__REF_NAME = 2,
DICT_COL__SYS_FOREIGN__N_COLS = 3,
DICT_NUM_COLS__SYS_FOREIGN = 4
};
/* The field numbers in the SYS_FOREIGN clustered index */
enum dict_fld_sys_foreign_enum {
DICT_FLD__SYS_FOREIGN__ID = 0,
DICT_FLD__SYS_FOREIGN__DB_TRX_ID = 1,
DICT_FLD__SYS_FOREIGN__DB_ROLL_PTR = 2,
DICT_FLD__SYS_FOREIGN__FOR_NAME = 3,
DICT_FLD__SYS_FOREIGN__REF_NAME = 4,
DICT_FLD__SYS_FOREIGN__N_COLS = 5,
DICT_NUM_FIELDS__SYS_FOREIGN = 6
};
/* The field numbers in the SYS_FOREIGN_FOR_NAME secondary index */
enum dict_fld_sys_foreign_for_name_enum {
DICT_FLD__SYS_FOREIGN_FOR_NAME__NAME = 0,
DICT_FLD__SYS_FOREIGN_FOR_NAME__ID = 1,
DICT_NUM_FIELDS__SYS_FOREIGN_FOR_NAME = 2
};
/* The columns in SYS_FOREIGN_COLS */
enum dict_col_sys_foreign_cols_enum {
DICT_COL__SYS_FOREIGN_COLS__ID = 0,
DICT_COL__SYS_FOREIGN_COLS__POS = 1,
DICT_COL__SYS_FOREIGN_COLS__FOR_COL_NAME = 2,
DICT_COL__SYS_FOREIGN_COLS__REF_COL_NAME = 3,
DICT_NUM_COLS__SYS_FOREIGN_COLS = 4
};
/* The field numbers in the SYS_FOREIGN_COLS clustered index */
enum dict_fld_sys_foreign_cols_enum {
DICT_FLD__SYS_FOREIGN_COLS__ID = 0,
DICT_FLD__SYS_FOREIGN_COLS__POS = 1,
DICT_FLD__SYS_FOREIGN_COLS__DB_TRX_ID = 2,
DICT_FLD__SYS_FOREIGN_COLS__DB_ROLL_PTR = 3,
DICT_FLD__SYS_FOREIGN_COLS__FOR_COL_NAME = 4,
DICT_FLD__SYS_FOREIGN_COLS__REF_COL_NAME = 5,
DICT_NUM_FIELDS__SYS_FOREIGN_COLS = 6
};
/* The columns in SYS_VIRTUAL */
enum dict_col_sys_virtual_enum {
DICT_COL__SYS_VIRTUAL__TABLE_ID = 0,
DICT_COL__SYS_VIRTUAL__POS = 1,
DICT_COL__SYS_VIRTUAL__BASE_POS = 2,
DICT_NUM_COLS__SYS_VIRTUAL = 3
};
/* The field numbers in the SYS_VIRTUAL clustered index */
enum dict_fld_sys_virtual_enum {
DICT_FLD__SYS_VIRTUAL__TABLE_ID = 0,
DICT_FLD__SYS_VIRTUAL__POS = 1,
DICT_FLD__SYS_VIRTUAL__BASE_POS = 2,
DICT_FLD__SYS_VIRTUAL__DB_TRX_ID = 3,
DICT_FLD__SYS_VIRTUAL__DB_ROLL_PTR = 4,
DICT_NUM_FIELDS__SYS_VIRTUAL = 5
};
/* A number of the columns above occur in multiple tables. These are the
lengths of those fields. */
#define DICT_FLD_LEN_SPACE 4
#define DICT_FLD_LEN_FLAGS 4
#endif
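DICT_FLD_LEN_SPACE and DICT_FLD_LEN_FLAGS can be used by code that parses the SYS_* records to reject a SPACE or flags field whose stored length is not 4 bytes before decoding it. A minimal sketch of such a check follows; `read_space_field()` is a hypothetical helper, and the big-endian decoding is an assumption based on InnoDB's usual on-disk integer format.

```cpp
#include <cstdint>
#include <optional>

typedef unsigned char byte;

// Copied from the #define above.
constexpr unsigned DICT_FLD_LEN_SPACE_MODEL= 4;

// Decode a SPACE field, or return nothing if its length is wrong
// (for example, a truncated or corrupted field).
std::optional<uint32_t> read_space_field(const byte *field, unsigned len)
{
  if (len != DICT_FLD_LEN_SPACE_MODEL)
    return std::nullopt;
  return uint32_t{field[0]} << 24 | uint32_t{field[1]} << 16 |
         uint32_t{field[2]} << 8 | uint32_t{field[3]};
}
```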