mariadb/storage/innobase/include/fsp0fsp.h

769 lines
28 KiB
C
Raw Normal View History

/*****************************************************************************
2016-06-21 14:21:03 +02:00
Copyright (c) 1995, 2016, Oracle and/or its affiliates. All Rights Reserved.
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
Copyright (c) 2013, 2022, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA
*****************************************************************************/
/**************************************************//**
@file include/fsp0fsp.h
File space management
Created 12/18/1995 Heikki Tuuri
*******************************************************/
#ifndef fsp0fsp_h
#define fsp0fsp_h
#include "assume_aligned.h"
#include "fsp0types.h"
#include "fut0lst.h"
#include "ut0byte.h"
#ifndef UNIV_INNOCHECKSUM
#include "mtr0mtr.h"
#include "page0types.h"
#include "rem0types.h"
#else
# include "mach0data.h"
#endif /* !UNIV_INNOCHECKSUM */
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
/** @return the PAGE_SSIZE flags for the current innodb_page_size */
#define FSP_FLAGS_PAGE_SSIZE() \
((srv_page_size == UNIV_PAGE_SIZE_ORIG) ? \
0U : (srv_page_size_shift - UNIV_ZIP_SIZE_SHIFT_MIN + 1) \
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
<< FSP_FLAGS_POS_PAGE_SSIZE)
MDEV-12026: Implement innodb_checksum_algorithm=full_crc32 MariaDB data-at-rest encryption (innodb_encrypt_tables) had repurposed the same unused data field that was repurposed in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN) field of SPATIAL INDEX. Because of this, MariaDB was unable to support encryption on SPATIAL INDEX pages. Furthermore, InnoDB page checksums skipped some bytes, and there are multiple variations and checksum algorithms. By default, InnoDB accepts all variations of all algorithms that ever existed. This unnecessarily weakens the page checksums. We hereby introduce two more innodb_checksum_algorithm variants (full_crc32, strict_full_crc32) that are special in a way: When either setting is active, newly created data files will carry a flag (fil_space_t::full_crc32()) that indicates that all pages of the file will use a full CRC-32C checksum over the entire page contents (excluding the bytes where the checksum is stored, at the very end of the page). Such files will always use that checksum, no matter what the parameter innodb_checksum_algorithm is assigned to. For old files, the old checksum algorithms will continue to be used. The value strict_full_crc32 will be equivalent to strict_crc32 and the value full_crc32 will be equivalent to crc32. ROW_FORMAT=COMPRESSED tables will only use the old format. These tables do not support new features, such as larger innodb_page_size or instant ADD/DROP COLUMN. They may be deprecated in the future. We do not want an unnecessary file format change for them. The new full_crc32() format also cleans up the MariaDB tablespace flags. We will reserve flags to store the page_compressed compression algorithm, and to store the compressed payload length, so that checksum can be computed over the compressed (and possibly encrypted) stream and can be validated without decrypting or decompressing the page. In the full_crc32 format, there no longer are separate before-encryption and after-encryption checksums for pages. The single checksum is computed on the page contents that is written to the file. We do not make the new algorithm the default for two reasons. First, MariaDB 10.4.2 was a beta release, and the default values of parameters should not change after beta. Second, we did not yet implement the full_crc32 format for page_compressed pages. This will be fixed in MDEV-18644. This is joint work with Marko Mäkelä.
2019-02-19 21:00:00 +02:00
/** @return the PAGE_SSIZE flags for the current innodb_page_size in
full checksum format */
#define FSP_FLAGS_FCRC32_PAGE_SSIZE() \
((srv_page_size_shift - UNIV_ZIP_SIZE_SHIFT_MIN + 1) \
<< FSP_FLAGS_FCRC32_POS_PAGE_SSIZE)
/* @defgroup Compatibility macros for MariaDB 10.1.0 through 10.1.20;
see the table in fsp0types.h @{ */
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
/** Zero relative shift position of the PAGE_COMPRESSION field */
#define FSP_FLAGS_POS_PAGE_COMPRESSION_MARIADB101 \
(FSP_FLAGS_POS_ATOMIC_BLOBS \
+ FSP_FLAGS_WIDTH_ATOMIC_BLOBS)
/** Zero relative shift position of the PAGE_COMPRESSION_LEVEL field */
#define FSP_FLAGS_POS_PAGE_COMPRESSION_LEVEL_MARIADB101 \
(FSP_FLAGS_POS_PAGE_COMPRESSION_MARIADB101 + 1)
/** Zero relative shift position of the ATOMIC_WRITES field */
#define FSP_FLAGS_POS_ATOMIC_WRITES_MARIADB101 \
(FSP_FLAGS_POS_PAGE_COMPRESSION_LEVEL_MARIADB101 + 4)
/** Zero relative shift position of the PAGE_SSIZE field */
#define FSP_FLAGS_POS_PAGE_SSIZE_MARIADB101 \
(FSP_FLAGS_POS_ATOMIC_WRITES_MARIADB101 + 2)
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
/** Bit mask of the PAGE_COMPRESSION field */
#define FSP_FLAGS_MASK_PAGE_COMPRESSION_MARIADB101 \
(1U << FSP_FLAGS_POS_PAGE_COMPRESSION_MARIADB101)
/** Bit mask of the PAGE_COMPRESSION_LEVEL field */
#define FSP_FLAGS_MASK_PAGE_COMPRESSION_LEVEL_MARIADB101 \
(15U << FSP_FLAGS_POS_PAGE_COMPRESSION_LEVEL_MARIADB101)
/** Bit mask of the ATOMIC_WRITES field */
#define FSP_FLAGS_MASK_ATOMIC_WRITES_MARIADB101 \
(3U << FSP_FLAGS_POS_ATOMIC_WRITES_MARIADB101)
/** Bit mask of the PAGE_SSIZE field */
#define FSP_FLAGS_MASK_PAGE_SSIZE_MARIADB101 \
(15U << FSP_FLAGS_POS_PAGE_SSIZE_MARIADB101)
2014-12-22 16:53:17 +02:00
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
/** Return the value of the PAGE_COMPRESSION field */
#define FSP_FLAGS_GET_PAGE_COMPRESSION_MARIADB101(flags) \
((flags & FSP_FLAGS_MASK_PAGE_COMPRESSION_MARIADB101) \
>> FSP_FLAGS_POS_PAGE_COMPRESSION_MARIADB101)
/** Return the value of the PAGE_COMPRESSION_LEVEL field */
#define FSP_FLAGS_GET_PAGE_COMPRESSION_LEVEL_MARIADB101(flags) \
((flags & FSP_FLAGS_MASK_PAGE_COMPRESSION_LEVEL_MARIADB101) \
>> FSP_FLAGS_POS_PAGE_COMPRESSION_LEVEL_MARIADB101)
/** Return the value of the PAGE_SSIZE field */
#define FSP_FLAGS_GET_PAGE_SSIZE_MARIADB101(flags) \
((flags & FSP_FLAGS_MASK_PAGE_SSIZE_MARIADB101) \
>> FSP_FLAGS_POS_PAGE_SSIZE_MARIADB101)
/* @} */
/* @defgroup Tablespace Header Constants (moved from fsp0fsp.c) @{ */
/** Offset of the space header within a file page */
#define FSP_HEADER_OFFSET FIL_PAGE_DATA
/* The data structures in files are defined just as byte strings in C */
typedef byte xdes_t;
/* SPACE HEADER
============
File space header data structure: this data structure is contained in the
first page of a space. The space for this header is reserved in every extent
descriptor page, but used only in the first. */
/*-------------------------------------*/
#define FSP_SPACE_ID 0 /* space id */
#define FSP_NOT_USED 4 /* this field contained a value up to
which we know that the modifications
in the database have been flushed to
the file space; not used now */
#define FSP_SIZE 8 /* Current size of the space in
pages */
#define FSP_FREE_LIMIT 12 /* Minimum page number for which the
free list has not been initialized:
the pages >= this limit are, by
definition, free; note that in a
single-table tablespace where size
< 64 pages, this number is 64, i.e.,
we have initialized the space
about the first extent, but have not
physically allocated those pages to the
file */
#define FSP_SPACE_FLAGS 16 /* fsp_space_t.flags, similar to
dict_table_t::flags */
#define FSP_FRAG_N_USED 20 /* number of used pages in the
FSP_FREE_FRAG list */
#define FSP_FREE 24 /* list of free extents */
#define FSP_FREE_FRAG (24 + FLST_BASE_NODE_SIZE)
/* list of partially free extents not
belonging to any segment */
#define FSP_FULL_FRAG (24 + 2 * FLST_BASE_NODE_SIZE)
/* list of full extents not belonging
to any segment */
#define FSP_SEG_ID (24 + 3 * FLST_BASE_NODE_SIZE)
/* 8 bytes which give the first unused
segment id */
#define FSP_SEG_INODES_FULL (32 + 3 * FLST_BASE_NODE_SIZE)
/* list of pages containing segment
headers, where all the segment inode
slots are reserved */
#define FSP_SEG_INODES_FREE (32 + 4 * FLST_BASE_NODE_SIZE)
/* list of pages containing segment
headers, where not all the segment
header slots are reserved */
/*-------------------------------------*/
/* File space header size */
#define FSP_HEADER_SIZE (32 + 5 * FLST_BASE_NODE_SIZE)
#define FSP_FREE_ADD 4 /* this many free extents are added
to the free list from above
FSP_FREE_LIMIT at a time */
/* @} */
/* @defgroup File Segment Inode Constants (moved from fsp0fsp.c) @{ */
/* FILE SEGMENT INODE
==================
Segment inode which is created for each segment in a tablespace. NOTE: in
purge we assume that a segment having only one currently used page can be
freed in a few steps, so that the freeing cannot fill the file buffer with
bufferfixed file pages. */
typedef byte fseg_inode_t;
#define FSEG_INODE_PAGE_NODE FSEG_PAGE_DATA
/* the list node for linking
segment inode pages */
#define FSEG_ARR_OFFSET (FSEG_PAGE_DATA + FLST_NODE_SIZE)
/*-------------------------------------*/
#define FSEG_ID 0 /* 8 bytes of segment id: if this is 0,
it means that the header is unused */
#define FSEG_NOT_FULL_N_USED 8
/* number of used segment pages in
the FSEG_NOT_FULL list */
#define FSEG_FREE 12
/* list of free extents of this
segment */
#define FSEG_NOT_FULL (12 + FLST_BASE_NODE_SIZE)
/* list of partially free extents */
#define FSEG_FULL (12 + 2 * FLST_BASE_NODE_SIZE)
/* list of full extents */
#define FSEG_MAGIC_N (12 + 3 * FLST_BASE_NODE_SIZE)
/* magic number used in debugging */
#define FSEG_FRAG_ARR (16 + 3 * FLST_BASE_NODE_SIZE)
/* array of individual pages
belonging to this segment in fsp
fragment extent lists */
#define FSEG_FRAG_ARR_N_SLOTS (FSP_EXTENT_SIZE / 2)
/* number of slots in the array for
the fragment pages */
#define FSEG_FRAG_SLOT_SIZE 4 /* a fragment page slot contains its
page number within space, FIL_NULL
means that the slot is not in use */
/*-------------------------------------*/
#define FSEG_INODE_SIZE \
(16 + 3 * FLST_BASE_NODE_SIZE \
+ FSEG_FRAG_ARR_N_SLOTS * FSEG_FRAG_SLOT_SIZE)
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
static constexpr byte FSEG_MAGIC_N_BYTES[4]={0x05,0xd6,0x69,0xd2};
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
#define FSEG_FILLFACTOR 8 /* If the number of unused but reserved
pages in a segment is less than
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
reserved pages / FSEG_FILLFACTOR,
and there are
at least FSEG_FRAG_LIMIT used pages,
then we allow a new empty extent to
be added to the segment in
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
fseg_alloc_free_page_general().
Otherwise, we
use unused pages of the segment. */
#define FSEG_FRAG_LIMIT FSEG_FRAG_ARR_N_SLOTS
/* If the segment has >= this many
used pages, it may be expanded by
allocating extents to the segment;
until that only individual fragment
pages are allocated from the space */
#define FSEG_FREE_LIST_LIMIT 40 /* If the reserved size of a segment
is at least this many extents, we
allow extents to be put to the free
list of the extent: at most
FSEG_FREE_LIST_MAX_LEN many */
#define FSEG_FREE_LIST_MAX_LEN 4
/* @} */
/* @defgroup Extent Descriptor Constants (moved from fsp0fsp.c) @{ */
/* EXTENT DESCRIPTOR
=================
File extent descriptor data structure: contains bits to tell which pages in
the extent are free and which contain old tuple version to clean. */
/*-------------------------------------*/
#define XDES_ID 0 /* The identifier of the segment
to which this extent belongs */
#define XDES_FLST_NODE 8 /* The list node data structure
for the descriptors */
#define XDES_STATE (FLST_NODE_SIZE + 8)
/* contains state information
of the extent */
#define XDES_BITMAP (FLST_NODE_SIZE + 12)
/* Descriptor bitmap of the pages
in the extent */
/*-------------------------------------*/
#define XDES_BITS_PER_PAGE 2 /* How many bits are there per page */
#define XDES_FREE_BIT 0 /* Index of the bit which tells if
the page is free */
#define XDES_CLEAN_BIT 1 /* NOTE: currently not used!
Index of the bit which tells if
there are old versions of tuples
on the page */
/* States of a descriptor */
#define XDES_FREE 1 /* extent is in free list of space */
#define XDES_FREE_FRAG 2 /* extent is in free fragment list of
space */
#define XDES_FULL_FRAG 3 /* extent is in full fragment list of
space */
#define XDES_FSEG 4 /* extent belongs to a segment */
/** File extent data structure size in bytes. */
#define XDES_SIZE \
(XDES_BITMAP \
+ UT_BITS_IN_BYTES(FSP_EXTENT_SIZE * XDES_BITS_PER_PAGE))
/** File extent data structure size in bytes for MAX page size. */
#define XDES_SIZE_MAX \
(XDES_BITMAP \
+ UT_BITS_IN_BYTES(FSP_EXTENT_SIZE_MAX * XDES_BITS_PER_PAGE))
/** File extent data structure size in bytes for MIN page size. */
#define XDES_SIZE_MIN \
(XDES_BITMAP \
+ UT_BITS_IN_BYTES(FSP_EXTENT_SIZE_MIN * XDES_BITS_PER_PAGE))
/** Offset of the descriptor array on a descriptor page */
#define XDES_ARR_OFFSET (FSP_HEADER_OFFSET + FSP_HEADER_SIZE)
/**
Determine if a page is marked free.
@param[in] descr extent descriptor
@param[in] offset page offset within extent
@return whether the page is free */
inline bool xdes_is_free(const xdes_t *descr, ulint offset)
{
ut_ad(offset < FSP_EXTENT_SIZE);
ulint index= XDES_FREE_BIT + XDES_BITS_PER_PAGE * offset;
return ut_bit_get_nth(descr[XDES_BITMAP + (index >> 3)], index & 7);
}
#ifndef UNIV_INNOCHECKSUM
/* @} */
/** Read a tablespace header field.
@param[in] page first page of a tablespace
@param[in] field the header field
@return the contents of the header field */
inline uint32_t fsp_header_get_field(const page_t* page, ulint field)
{
return mach_read_from_4(FSP_HEADER_OFFSET + field +
my_assume_aligned<UNIV_ZIP_SIZE_MIN>(page));
}
/** Read the flags from the tablespace header page.
@param[in] page first page of a tablespace
@return the contents of FSP_SPACE_FLAGS */
inline uint32_t fsp_header_get_flags(const page_t *page)
{
return fsp_header_get_field(page, FSP_SPACE_FLAGS);
}
MDEV-11679 Remove redundant function fsp_header_get_crypt_offset() fsp_header_get_crypt_offset(): Remove. xdes_arr_size(): Remove. fsp_header_get_encryption_offset(): Make this an inline function. The correctness of this change was ensured with the following patch that ensures that the two functions returned the same value, only differing by FSP_HEADER_OFFSET (38 bytes): diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc index f2a4c6bf218..e96c788b7df 100644 --- a/storage/innobase/fsp/fsp0fsp.cc +++ b/storage/innobase/fsp/fsp0fsp.cc @@ -850,6 +850,7 @@ fsp_parse_init_file_page( return(ptr); } +static ulint fsp_header_get_encryption_offset(const page_size_t&); /**********************************************************************//** Initializes the fsp system. */ void @@ -868,6 +869,31 @@ fsp_init(void) #endif /* Does nothing at the moment */ + + for (ulint sz = 4096; sz <= 65536; sz *= 2) { + ulint m; + if (sz <= 16384) { + for (ulint ph = 1024; ph <= sz; ph *= 2) { + const page_size_t ps(ph, sz, true); + ulint maria = fsp_header_get_crypt_offset(ps, &m), + oracle = fsp_header_get_encryption_offset(ps); + if (maria != oracle + 38) { + ib::error() << "zip size mismatch: " + << maria << "!=" << oracle + << "(" << ph <<","<<sz<<")" + << m; + } + } + } + const page_size_t p(sz, sz, false); + ulint maria = fsp_header_get_crypt_offset(p, &m), + oracle = fsp_header_get_encryption_offset(p); + if (maria != oracle + 38) { + ib::error() << "size mismatch: " + << maria << "!=" << oracle + << "(" <<sz<<")" << m; + } + } } /**********************************************************************//**
2016-12-28 17:13:14 +02:00
/** Get the byte offset of encryption information in page 0.
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
MDEV-11679 Remove redundant function fsp_header_get_crypt_offset() fsp_header_get_crypt_offset(): Remove. xdes_arr_size(): Remove. fsp_header_get_encryption_offset(): Make this an inline function. The correctness of this change was ensured with the following patch that ensures that the two functions returned the same value, only differing by FSP_HEADER_OFFSET (38 bytes): diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc index f2a4c6bf218..e96c788b7df 100644 --- a/storage/innobase/fsp/fsp0fsp.cc +++ b/storage/innobase/fsp/fsp0fsp.cc @@ -850,6 +850,7 @@ fsp_parse_init_file_page( return(ptr); } +static ulint fsp_header_get_encryption_offset(const page_size_t&); /**********************************************************************//** Initializes the fsp system. */ void @@ -868,6 +869,31 @@ fsp_init(void) #endif /* Does nothing at the moment */ + + for (ulint sz = 4096; sz <= 65536; sz *= 2) { + ulint m; + if (sz <= 16384) { + for (ulint ph = 1024; ph <= sz; ph *= 2) { + const page_size_t ps(ph, sz, true); + ulint maria = fsp_header_get_crypt_offset(ps, &m), + oracle = fsp_header_get_encryption_offset(ps); + if (maria != oracle + 38) { + ib::error() << "zip size mismatch: " + << maria << "!=" << oracle + << "(" << ph <<","<<sz<<")" + << m; + } + } + } + const page_size_t p(sz, sz, false); + ulint maria = fsp_header_get_crypt_offset(p, &m), + oracle = fsp_header_get_encryption_offset(p); + if (maria != oracle + 38) { + ib::error() << "size mismatch: " + << maria << "!=" << oracle + << "(" <<sz<<")" << m; + } + } } /**********************************************************************//**
2016-12-28 17:13:14 +02:00
@return byte offset relative to FSP_HEADER_OFFSET */
inline MY_ATTRIBUTE((pure, warn_unused_result))
ulint fsp_header_get_encryption_offset(ulint zip_size)
MDEV-11679 Remove redundant function fsp_header_get_crypt_offset() fsp_header_get_crypt_offset(): Remove. xdes_arr_size(): Remove. fsp_header_get_encryption_offset(): Make this an inline function. The correctness of this change was ensured with the following patch that ensures that the two functions returned the same value, only differing by FSP_HEADER_OFFSET (38 bytes): diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc index f2a4c6bf218..e96c788b7df 100644 --- a/storage/innobase/fsp/fsp0fsp.cc +++ b/storage/innobase/fsp/fsp0fsp.cc @@ -850,6 +850,7 @@ fsp_parse_init_file_page( return(ptr); } +static ulint fsp_header_get_encryption_offset(const page_size_t&); /**********************************************************************//** Initializes the fsp system. */ void @@ -868,6 +869,31 @@ fsp_init(void) #endif /* Does nothing at the moment */ + + for (ulint sz = 4096; sz <= 65536; sz *= 2) { + ulint m; + if (sz <= 16384) { + for (ulint ph = 1024; ph <= sz; ph *= 2) { + const page_size_t ps(ph, sz, true); + ulint maria = fsp_header_get_crypt_offset(ps, &m), + oracle = fsp_header_get_encryption_offset(ps); + if (maria != oracle + 38) { + ib::error() << "zip size mismatch: " + << maria << "!=" << oracle + << "(" << ph <<","<<sz<<")" + << m; + } + } + } + const page_size_t p(sz, sz, false); + ulint maria = fsp_header_get_crypt_offset(p, &m), + oracle = fsp_header_get_encryption_offset(p); + if (maria != oracle + 38) { + ib::error() << "size mismatch: " + << maria << "!=" << oracle + << "(" <<sz<<")" << m; + } + } } /**********************************************************************//**
2016-12-28 17:13:14 +02:00
{
return zip_size
? XDES_ARR_OFFSET + XDES_SIZE * zip_size / FSP_EXTENT_SIZE
: XDES_ARR_OFFSET + (XDES_SIZE << srv_page_size_shift)
/ FSP_EXTENT_SIZE;
MDEV-11679 Remove redundant function fsp_header_get_crypt_offset() fsp_header_get_crypt_offset(): Remove. xdes_arr_size(): Remove. fsp_header_get_encryption_offset(): Make this an inline function. The correctness of this change was ensured with the following patch that ensures that the two functions returned the same value, only differing by FSP_HEADER_OFFSET (38 bytes): diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc index f2a4c6bf218..e96c788b7df 100644 --- a/storage/innobase/fsp/fsp0fsp.cc +++ b/storage/innobase/fsp/fsp0fsp.cc @@ -850,6 +850,7 @@ fsp_parse_init_file_page( return(ptr); } +static ulint fsp_header_get_encryption_offset(const page_size_t&); /**********************************************************************//** Initializes the fsp system. */ void @@ -868,6 +869,31 @@ fsp_init(void) #endif /* Does nothing at the moment */ + + for (ulint sz = 4096; sz <= 65536; sz *= 2) { + ulint m; + if (sz <= 16384) { + for (ulint ph = 1024; ph <= sz; ph *= 2) { + const page_size_t ps(ph, sz, true); + ulint maria = fsp_header_get_crypt_offset(ps, &m), + oracle = fsp_header_get_encryption_offset(ps); + if (maria != oracle + 38) { + ib::error() << "zip size mismatch: " + << maria << "!=" << oracle + << "(" << ph <<","<<sz<<")" + << m; + } + } + } + const page_size_t p(sz, sz, false); + ulint maria = fsp_header_get_crypt_offset(p, &m), + oracle = fsp_header_get_encryption_offset(p); + if (maria != oracle + 38) { + ib::error() << "size mismatch: " + << maria << "!=" << oracle + << "(" <<sz<<")" << m; + } + } } /**********************************************************************//**
2016-12-28 17:13:14 +02:00
}
/** Check the encryption key from the first page of a tablespace.
@param[in] fsp_flags tablespace flags
@param[in] page first page of a tablespace
@return true if success */
bool
fsp_header_check_encryption_key(
ulint fsp_flags,
page_t* page);
/** Initialize a tablespace header.
@param[in,out] space tablespace
@param[in] size current size in blocks
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
@param[in,out] mtr mini-transaction
@return error code */
dberr_t fsp_header_init(fil_space_t *space, uint32_t size, mtr_t *mtr)
MY_ATTRIBUTE((nonnull, warn_unused_result));
2020-09-09 13:06:46 +03:00
/** Create a new segment.
@param space tablespace
@param byte_offset byte offset of the created segment header
@param mtr mini-transaction
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
@param err error code
2020-09-09 13:06:46 +03:00
@param has_done_reservation whether fsp_reserve_free_extents() was invoked
@param block block where segment header is placed,
or NULL to allocate an additional page for that
@return the block where the segment header is placed, x-latched
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
@retval nullptr if could not create segment */
buf_block_t*
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
fseg_create(fil_space_t *space, ulint byte_offset, mtr_t *mtr, dberr_t *err,
bool has_done_reservation= false, buf_block_t *block= nullptr)
MY_ATTRIBUTE((nonnull(1,3,4), warn_unused_result));
MDEV-12266: Change dict_table_t::space to fil_space_t* InnoDB always keeps all tablespaces in the fil_system cache. The fil_system.LRU is only for closing file handles; the fil_space_t and fil_node_t for all data files will remain in main memory. Between startup to shutdown, they can only be created and removed by DDL statements. Therefore, we can let dict_table_t::space point directly to the fil_space_t. dict_table_t::space_id: A numeric tablespace ID for the corner cases where we do not have a tablespace. The most prominent examples are ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file. There are a few functional differences; most notably: (1) DROP TABLE will delete matching .ibd and .cfg files, even if they were not attached to the data dictionary. (2) Some error messages will report file names instead of numeric IDs. There still are many functions that use numeric tablespace IDs instead of fil_space_t*, and many functions could be converted to fil_space_t member functions. Also, Tablespace and Datafile should be merged with fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use fil_space_t& instead of a numeric ID, and after moving to a single buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to fil_space_t::page_hash. FilSpace: Remove. Only few calls to fil_space_acquire() will remain, and gradually they should be removed. mtr_t::set_named_space_id(ulint): Renamed from set_named_space(), to prevent accidental calls to this slower function. Very few callers remain. fseg_create(), fsp_reserve_free_extents(): Take fil_space_t* as a parameter instead of a space_id. fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(), fil_name_write_rename(), fil_rename_tablespace(). Mariabackup passes the parameter log=false; InnoDB passes log=true. dict_mem_table_create(): Take fil_space_t* instead of space_id as parameter. dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter 'status' with 'bool cached'. dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name. fil_ibd_open(): Return the tablespace. fil_space_t::set_imported(): Replaces fil_space_set_imported(). truncate_t: Change many member function parameters to fil_space_t*, and remove page_size parameters. row_truncate_prepare(): Merge to its only caller. row_drop_table_from_cache(): Assert that the table is persistent. dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL if the tablespace has been discarded. row_import_update_discarded_flag(): Remove a constant parameter.
2018-03-27 16:31:10 +03:00
/** Calculate the number of pages reserved by a segment,
and how many pages are currently used.
@param[in] block buffer block containing the file segment header
@param[in] header file segment header
@param[out] used number of pages that are used (not more than reserved)
@param[in,out] mtr mini-transaction
@return number of reserved pages */
ulint fseg_n_reserved_pages(const buf_block_t &block,
const fseg_header_t *header, ulint *used,
mtr_t *mtr)
MY_ATTRIBUTE((nonnull));
/**********************************************************************//**
Allocates a single free page from a segment. This function implements
the intelligent allocation strategy which tries to minimize file space
fragmentation.
@retval NULL if no page could be allocated */
2012-02-17 11:52:51 +02:00
buf_block_t*
fseg_alloc_free_page_general(
/*=========================*/
2012-02-17 11:52:51 +02:00
fseg_header_t* seg_header,/*!< in/out: segment header */
uint32_t hint, /*!< in: hint of which page would be
2012-02-17 11:52:51 +02:00
desirable */
byte direction,/*!< in: if the new page is needed because
of an index page split, and records are
inserted there in order, into which
direction they go alphabetically: FSP_DOWN,
FSP_UP, FSP_NO_DIR */
bool has_done_reservation, /*!< in: true if the caller has
already done the reservation for the page
with fsp_reserve_free_extents, then there
is no need to do the check for this individual
page */
2012-02-17 11:52:51 +02:00
mtr_t* mtr, /*!< in/out: mini-transaction */
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
mtr_t* init_mtr,/*!< in/out: mtr or another mini-transaction
in which the page should be initialized. */
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
dberr_t* err) /*!< out: error code */
2016-06-21 14:21:03 +02:00
MY_ATTRIBUTE((warn_unused_result, nonnull));
/** Reserves free pages from a tablespace. All mini-transactions which may
use several pages from the tablespace should call this function beforehand
and reserve enough free extents so that they certainly will be able
to do their operation, like a B-tree page split, fully. Reservations
MDEV-12266: Change dict_table_t::space to fil_space_t* InnoDB always keeps all tablespaces in the fil_system cache. The fil_system.LRU is only for closing file handles; the fil_space_t and fil_node_t for all data files will remain in main memory. Between startup to shutdown, they can only be created and removed by DDL statements. Therefore, we can let dict_table_t::space point directly to the fil_space_t. dict_table_t::space_id: A numeric tablespace ID for the corner cases where we do not have a tablespace. The most prominent examples are ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file. There are a few functional differences; most notably: (1) DROP TABLE will delete matching .ibd and .cfg files, even if they were not attached to the data dictionary. (2) Some error messages will report file names instead of numeric IDs. There still are many functions that use numeric tablespace IDs instead of fil_space_t*, and many functions could be converted to fil_space_t member functions. Also, Tablespace and Datafile should be merged with fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use fil_space_t& instead of a numeric ID, and after moving to a single buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to fil_space_t::page_hash. FilSpace: Remove. Only few calls to fil_space_acquire() will remain, and gradually they should be removed. mtr_t::set_named_space_id(ulint): Renamed from set_named_space(), to prevent accidental calls to this slower function. Very few callers remain. fseg_create(), fsp_reserve_free_extents(): Take fil_space_t* as a parameter instead of a space_id. fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(), fil_name_write_rename(), fil_rename_tablespace(). Mariabackup passes the parameter log=false; InnoDB passes log=true. dict_mem_table_create(): Take fil_space_t* instead of space_id as parameter. dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter 'status' with 'bool cached'. dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name. fil_ibd_open(): Return the tablespace. fil_space_t::set_imported(): Replaces fil_space_set_imported(). truncate_t: Change many member function parameters to fil_space_t*, and remove page_size parameters. row_truncate_prepare(): Merge to its only caller. row_drop_table_from_cache(): Assert that the table is persistent. dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL if the tablespace has been discarded. row_import_update_discarded_flag(): Remove a constant parameter.
2018-03-27 16:31:10 +03:00
must be released with function fil_space_t::release_free_extents()!
The alloc_type below has the following meaning: FSP_NORMAL means an
operation which will probably result in more space usage, like an
insert in a B-tree; FSP_UNDO means allocation to undo logs: if we are
deleting rows, then this allocation will in the long run result in
less space usage (after a purge); FSP_CLEANING means allocation done
in a physical record delete (like in a purge) or other cleaning operation
which will result in less space usage in the long run. We prefer the latter
two types of allocation: when space is scarce, FSP_NORMAL allocations
will not succeed, but the latter two allocations will succeed, if possible.
The purpose is to avoid dead end where the database is full but the
user cannot free any space because these freeing operations temporarily
reserve some space.
Single-table tablespaces whose size is < FSP_EXTENT_SIZE pages are a special
case. In this function we would liberally reserve several extents for
every page split or merge in a B-tree. But we do not want to waste disk space
if the table only occupies < FSP_EXTENT_SIZE pages. That is why we apply
different rules in that special case, just ensuring that there are n_pages
free pages available.
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
@param[out] n_reserved number of extents actually reserved; if we
return true and the tablespace size is <
FSP_EXTENT_SIZE pages, then this can be 0,
otherwise it is n_ext
@param[in,out] space tablespace
@param[in] n_ext number of extents to reserve
@param[in] alloc_type page reservation type (FSP_BLOB, etc)
@param[in,out] mtr the mini transaction
@param[out] err error code
@param[in] n_pages for small tablespaces (tablespace size is
less than FSP_EXTENT_SIZE), number of free
pages to reserve.
@return error code
@retval DB_SUCCESS if we were able to make the reservation */
dberr_t
fsp_reserve_free_extents(
uint32_t* n_reserved,
MDEV-12266: Change dict_table_t::space to fil_space_t* InnoDB always keeps all tablespaces in the fil_system cache. The fil_system.LRU is only for closing file handles; the fil_space_t and fil_node_t for all data files will remain in main memory. Between startup to shutdown, they can only be created and removed by DDL statements. Therefore, we can let dict_table_t::space point directly to the fil_space_t. dict_table_t::space_id: A numeric tablespace ID for the corner cases where we do not have a tablespace. The most prominent examples are ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file. There are a few functional differences; most notably: (1) DROP TABLE will delete matching .ibd and .cfg files, even if they were not attached to the data dictionary. (2) Some error messages will report file names instead of numeric IDs. There still are many functions that use numeric tablespace IDs instead of fil_space_t*, and many functions could be converted to fil_space_t member functions. Also, Tablespace and Datafile should be merged with fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use fil_space_t& instead of a numeric ID, and after moving to a single buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to fil_space_t::page_hash. FilSpace: Remove. Only few calls to fil_space_acquire() will remain, and gradually they should be removed. mtr_t::set_named_space_id(ulint): Renamed from set_named_space(), to prevent accidental calls to this slower function. Very few callers remain. fseg_create(), fsp_reserve_free_extents(): Take fil_space_t* as a parameter instead of a space_id. fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(), fil_name_write_rename(), fil_rename_tablespace(). Mariabackup passes the parameter log=false; InnoDB passes log=true. dict_mem_table_create(): Take fil_space_t* instead of space_id as parameter. dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter 'status' with 'bool cached'. dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name. fil_ibd_open(): Return the tablespace. fil_space_t::set_imported(): Replaces fil_space_set_imported(). truncate_t: Change many member function parameters to fil_space_t*, and remove page_size parameters. row_truncate_prepare(): Merge to its only caller. row_drop_table_from_cache(): Assert that the table is persistent. dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL if the tablespace has been discarded. row_import_update_discarded_flag(): Remove a constant parameter.
2018-03-27 16:31:10 +03:00
fil_space_t* space,
uint32_t n_ext,
fsp_reserve_t alloc_type,
mtr_t* mtr,
uint32_t n_pages = 2);
/** Free a page in a file segment.
@param[in,out] seg_header file segment header
@param[in,out] space tablespace
@param[in] offset page number
@param[in,out] mtr mini-transaction
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
@param[in] have_latch whether space->x_lock() was already called
@return error code */
dberr_t
MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time. The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked. It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object. Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool. rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB. fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped. btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index. buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code. dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called. dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition. dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count(). dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table(). buf_page_mtr_lock(): Drop a stale adaptive hash index if needed. We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.
2020-05-15 17:10:59 +03:00
fseg_free_page(
fseg_header_t* seg_header,
fil_space_t* space,
uint32_t offset,
mtr_t* mtr,
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
bool have_latch = false)
2016-06-21 14:21:03 +02:00
MY_ATTRIBUTE((nonnull, warn_unused_result));
MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
/** Determine whether a page is allocated.
@param space tablespace
@param page page number
@return error code
@retval DB_SUCCESS if the page is marked as free
@retval DB_SUCCESS_LOCKED_REC if the page is marked as allocated */
dberr_t fseg_page_is_allocated(fil_space_t *space, unsigned page)
MY_ATTRIBUTE((nonnull, warn_unused_result));
/** Frees part of a segment. This function can be used to free
a segment by repeatedly calling this function in different
mini-transactions. Doing the freeing in a single mini-transaction
might result in too big a mini-transaction.
@param header segment header; NOTE: if the header resides on first
page of the frag list of the segment, this pointer
becomes obsolete after the last freeing step
@param mtr mini-transaction
@param ahi Drop the adaptive hash index
MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time. The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked. It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object. Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool. rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB. fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped. btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index. buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code. dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called. dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition. dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count(). dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table(). buf_page_mtr_lock(): Drop a stale adaptive hash index if needed. We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.
2020-05-15 17:10:59 +03:00
@return whether the freeing was completed */
bool
MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time. The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked. It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object. Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool. rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB. fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped. btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index. buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code. dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called. dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition. dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count(). dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table(). buf_page_mtr_lock(): Drop a stale adaptive hash index if needed. We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.
2020-05-15 17:10:59 +03:00
fseg_free_step(
fseg_header_t* header,
mtr_t* mtr
#ifdef BTR_CUR_HASH_ADAPT
,bool ahi=false
#endif /* BTR_CUR_HASH_ADAPT */
)
MY_ATTRIBUTE((warn_unused_result));
/** Frees part of a segment. Differs from fseg_free_step because
this function leaves the header page unfreed.
@param header segment header which must reside on the first
fragment page of the segment
@param mtr mini-transaction
@param ahi drop the adaptive hash index
MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time. The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked. It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object. Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool. rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB. fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped. btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index. buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code. dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called. dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition. dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count(). dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table(). buf_page_mtr_lock(): Drop a stale adaptive hash index if needed. We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.
2020-05-15 17:10:59 +03:00
@return whether the freeing was completed, except for the header page */
bool
MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time. The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked. It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object. Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool. rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB. fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped. btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index. buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code. dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called. dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition. dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count(). dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table(). buf_page_mtr_lock(): Drop a stale adaptive hash index if needed. We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.
2020-05-15 17:10:59 +03:00
fseg_free_step_not_header(
fseg_header_t* header,
mtr_t* mtr
#ifdef BTR_CUR_HASH_ADAPT
,bool ahi=false
#endif /* BTR_CUR_HASH_ADAPT */
)
MY_ATTRIBUTE((warn_unused_result));
2018-10-29 11:26:23 +02:00
/** Reset the page type.
Data files created before MySQL 5.1.48 may contain garbage in FIL_PAGE_TYPE.
In MySQL 3.23.53, only undo log pages and index pages were tagged.
Any other pages were written with uninitialized bytes in FIL_PAGE_TYPE.
@param[in] block block with invalid FIL_PAGE_TYPE
@param[in] type expected page type
@param[in,out] mtr mini-transaction */
ATTRIBUTE_COLD
void fil_block_reset_type(const buf_block_t& block, ulint type, mtr_t* mtr);
/** Check (and if needed, reset) the page type.
Data files created before MySQL 5.1.48 may contain
garbage in the FIL_PAGE_TYPE field.
In MySQL 3.23.53, only undo log pages and index pages were tagged.
Any other pages were written with uninitialized bytes in FIL_PAGE_TYPE.
@param[in] page_id page number
@param[in,out] page page with possibly invalid FIL_PAGE_TYPE
@param[in] type expected page type
@param[in,out] mtr mini-transaction */
inline void
fil_block_check_type(
const buf_block_t& block,
ulint type,
mtr_t* mtr)
{
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
if (UNIV_UNLIKELY(type != fil_page_get_type(block.page.frame)))
fil_block_reset_type(block, type, mtr);
2018-10-29 11:26:23 +02:00
}
/** Checks if a page address is an extent descriptor page address.
@param[in] page_id page id
@param[in] physical_size page size
@return whether a descriptor page */
inline bool fsp_descr_page(const page_id_t page_id, ulint physical_size)
{
return (page_id.page_no() & (physical_size - 1)) == FSP_XDES_OFFSET;
}
/** Initialize a file page whose prior contents should be ignored.
@param[in,out] block buffer pool block */
MDEV-12353: Change the redo log encoding log_t::FORMAT_10_5: physical redo log format tag log_phys_t: Buffered records in the physical format. The log record bytes will follow the last data field, making use of alignment padding that would otherwise be wasted. If there are multiple records for the same page, also those may be appended to an existing log_phys_t object if the memory is available. In the physical format, the first byte of a record identifies the record and its length (up to 15 bytes). For longer records, the immediately following bytes will encode the remaining length in a variable-length encoding. Usually, a variable-length-encoded page identifier will follow, followed by optional payload, whose length is included in the initially encoded total record length. When a mini-transaction is updating multiple fields in a page, it can avoid repeating the tablespace identifier and page number by setting the same_page flag (most significant bit) in the first byte of the log record. The byte offset of the record will be relative to where the previous record for that page ended. Until MDEV-14425 introduces a separate file-level log for redo log checkpoints and file operations, we will write the file-level records in the page-level redo log file. The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT) will be removed in MDEV-14425, and one sequential scan of the page recovery log will suffice. Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags. If the information is needed, it can be parsed from WRITE records that modify FSP_SPACE_FLAGS. MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily as part of this work, before being replaced with WRITE (along with MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES). mtr_buf_t::empty(): Check if the buffer is empty. mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty. mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record, for the same_page encoding. page_recv_t::last_offset: Reflects mtr_t::m_last_offset. Valid values for last_offset during recovery should be 0 or above 8. (The first 8 bytes of a page are the checksum and the page number, and neither are ever updated directly by log records.) Internally, the special value 1 indicates that the same_page form will not be allowed for the subsequent record. mtr_t::page_create(): Take the block descriptor as parameter, so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE record will always followed by a subtype byte, because same_page records must be longer than 1 byte. trx_undo_page_init(): Combine the writes in WRITE record. trx_undo_header_create(): Write 4 bytes using a special MEMSET record that includes 1 bytes of length and 2 bytes of payload. flst_write_addr(): Define as a static function. Combine the writes. flst_zero_both(): Replaces two flst_zero_addr() calls. flst_init(): Do not inline the function. fsp_free_seg_inode(): Zerofill the whole inode. fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT to FIL_NULL when using the physical format. btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page() must have been invoked. fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE. fil_names_dirty_and_write(): Remove the parameter mtr. Write the records using a separate mini-transaction object, because any FILE_ records must be at the start of a mini-transaction log. recv_recover_page(): Add a fil_space_t* parameter. After applying log to the a ROW_FORMAT=COMPRESSED page, invoke buf_zip_decompress() to restore the uncompressed page. buf_page_io_complete(): Remove the temporary hack to discard the uncompressed page of a ROW_FORMAT=COMPRESSED page. page_zip_write_header(): Remove. Use mtr_t::write() or mtr_t::memset() instead, and update the compressed page frame separately. trx_undo_header_add_space_for_xid(): Remove. trx_undo_seg_create(): Perform the changes that were previously made by trx_undo_header_add_space_for_xid(). btr_reset_instant(): New function: Reset the table to MariaDB 10.2 or 10.3 format when rolling back an instant ALTER TABLE operation. page_rec_find_owner_rec(): Merge with the only callers. page_cur_insert_rec_low(): Combine writes by using a local buffer. MEMMOVE data from the preceding record whenever feasible (copying at least 3 bytes). page_cur_insert_rec_zip(): Combine writes to page header fields. PageBulk::insertPage(): Issue MEMMOVE records to copy a matching part from the preceding record. PageBulk::finishPage(): Combine the writes to the page header and to the sparse page directory slots. mtr_t::write(): Only log the least significant (last) bytes of multi-byte fields that actually differ. For updating FSP_SIZE, we must always write all 4 bytes to the redo log, so that the fil_space_set_recv_size() logic in recv_sys_t::parse() will work. mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument instead of a numeric offset to the page frame. Only log the last bytes of multi-byte fields that actually differ. In fil_space_crypt_t::write_page0(), we must log also any unchanged bytes, so that recovery will recognize the record and invoke fil_crypt_parse(). Future work: MDEV-21724 Optimize page_cur_insert_rec_low() redo logging MDEV-21725 Optimize btr_page_reorganize_low() redo logging MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
2020-02-13 19:12:17 +02:00
void fsp_apply_init_file_page(buf_block_t *block);
/** Initialize a file page.
@param[in] space tablespace
@param[in,out] block file page
@param[in,out] mtr mini-transaction */
inline void fsp_init_file_page(
#ifdef UNIV_DEBUG
const fil_space_t* space,
#endif
buf_block_t* block, mtr_t* mtr)
{
ut_d(space->modify_check(*mtr));
MDEV-15053 Reduce buf_pool_t::mutex contention User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0 and will no longer report the PAGE_STATE value READY_FOR_USE. We will remove some fields from buf_page_t and move much code to member functions of buf_pool_t and buf_page_t, so that the access rules of data members can be enforced consistently. Evicting or adding pages in buf_pool.LRU will remain covered by buf_pool.mutex. Evicting or adding pages in buf_pool.page_hash will remain covered by both buf_pool.mutex and the buf_pool.page_hash X-latch. After this fix, buf_pool.page_hash lookups can entirely avoid acquiring buf_pool.mutex, only relying on buf_pool.hash_lock_get() S-latch. Similarly, buf_flush_check_neighbors() can will rely solely on buf_pool.mutex, no buf_pool.page_hash latch at all. The buf_pool.mutex is rather contended in I/O heavy benchmarks, especially when the workload does not fit in the buffer pool. The first attempt to alleviate the contention was the buf_pool_t::mutex split in commit 4ed7082eefe56b3e97e0edefb3df76dd7ef5e858 which introduced buf_block_t::mutex, which we are now removing. Later, multiple instances of buf_pool_t were introduced in commit c18084f71b02ea707c6461353e6cfc15d7553bc6 and recently removed by us in commit 1a6f708ec594ac0ae2dd30db926ab07b100fa24b (MDEV-15058). UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool related debugging in otherwise non-debug builds has not been used for years. Instead, we have been using UNIV_DEBUG, which is enabled in CMAKE_BUILD_TYPE=Debug. buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on std::atomic and the buf_pool.page_hash latches, and in some cases depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before. We must always release buf_block_t::lock before invoking unfix() or io_unfix(), to prevent a glitch where a block that was added to the buf_pool.free list would apper X-latched. See commit c5883debd6ef440a037011c11873b396923e93c5 how this glitch was finally caught in a debug environment. We move some buf_pool_t::page_hash specific code from the ha and hash modules to buf_pool, for improved readability. buf_pool_t::close(): Assert that all blocks are clean, except on aborted startup or crash-like shutdown. buf_pool_t::validate(): No longer attempt to validate n_flush[] against the number of BUF_IO_WRITE fixed blocks, because buf_page_t::flush_type no longer exists. buf_pool_t::watch_set(): Replaces buf_pool_watch_set(). Reduce mutex contention by separating the buf_pool.watch[] allocation and the insert into buf_pool.page_hash. buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a buf_pool.page_hash latch. Replaces and extends buf_page_hash_lock_s_confirm() and buf_page_hash_lock_x_confirm(). buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES. buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads: Use Atomic_counter. buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out(). buf_pool_t::LRU_remove(): Remove a block from the LRU list and return its predecessor. Incorporates buf_LRU_adjust_hp(), which was removed. buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(), for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by BTR_DELETE_OP (purge), which is never invoked on temporary tables. buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments. buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition. buf_LRU_free_page(): Clarify the function comment. buf_flush_check_neighbor(), buf_flush_check_neighbors(): Rewrite the construction of the page hash range. We will hold the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64) consecutive lookups of buf_pool.page_hash. buf_flush_page_and_try_neighbors(): Remove. Merge to its only callers, and remove redundant operations in buf_flush_LRU_list_batch(). buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite. Do not acquire buf_pool.mutex, and iterate directly with page_id_t. ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined and avoids any loops. fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove. buf_flush_page(): Add a fil_space_t* parameter. Minimize the buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated atomically with the io_fix, and we will protect most buf_block_t fields with buf_block_t::lock. The function buf_flush_write_block_low() is removed and merged here. buf_page_init_for_read(): Use static linkage. Initialize the newly allocated block and acquire the exclusive buf_block_t::lock while not holding any mutex. IORequest::IORequest(): Remove the body. We only need to invoke set_punch_hole() in buf_flush_page() and nowhere else. buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type. This field is only used during a fil_io() call. That function already takes IORequest as a parameter, so we had better introduce for the rarely changing field. buf_block_t::init(): Replaces buf_page_init(). buf_page_t::init(): Replaces buf_page_init_low(). buf_block_t::initialise(): Initialise many fields, but keep the buf_page_t::state(). Both buf_pool_t::validate() and buf_page_optimistic_get() requires that buf_page_t::in_file() be protected atomically with buf_page_t::in_page_hash and buf_page_t::in_LRU_list. buf_page_optimistic_get(): Now that buf_block_t::mutex no longer exists, we must check buf_page_t::io_fix() after acquiring the buf_pool.page_hash lock, to detect whether buf_page_init_for_read() has been initiated. We will also check the io_fix() before acquiring hash_lock in order to avoid unnecessary computation. The field buf_block_t::modify_clock (protected by buf_block_t::lock) allows buf_page_optimistic_get() to validate the block. buf_page_t::real_size: Remove. It was only used while flushing pages of page_compressed tables. buf_page_encrypt(): Add an output parameter that allows us ot eliminate buf_page_t::real_size. Replace a condition with debug assertion. buf_page_should_punch_hole(): Remove. buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch(). Add the parameter size (to replace buf_page_t::real_size). buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page(). Add the parameter size (to replace buf_page_t::real_size). fil_system_t::detach(): Replaces fil_space_detach(). Ensure that fil_validate() will not be violated even if fil_system.mutex is released and reacquired. fil_node_t::complete_io(): Renamed from fil_node_complete_io(). fil_node_t::close_to_free(): Replaces fil_node_close_to_free(). Avoid invoking fil_node_t::close() because fil_system.n_open has already been decremented in fil_space_t::detach(). BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY. BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE, and distinguish dirty pages by buf_page_t::oldest_modification(). BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead. This state was only being used for buf_page_t that are in buf_pool.watch. buf_pool_t::watch[]: Remove pointer indirection. buf_page_t::in_flush_list: Remove. It was set if and only if buf_page_t::oldest_modification() is nonzero. buf_page_decrypt_after_read(), buf_corrupt_page_release(), buf_page_check_corrupt(): Change the const fil_space_t* parameter to const fil_node_t& so that we can report the correct file name. buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function. buf_page_io_complete(): Split to buf_page_read_complete() and buf_page_write_complete(). buf_dblwr_t::in_use: Remove. buf_dblwr_t::buf_block_array: Add IORequest::flush_t. buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of os_aio_wait_until_no_pending_writes(). buf_flush_write_complete(): Declare static, not global. Add the parameter IORequest::flush_t. buf_flush_freed_page(): Simplify the code. recv_sys_t::flush_lru: Renamed from flush_type and changed to bool. fil_read(), fil_write(): Replaced with direct use of fil_io(). fil_buffering_disabled(): Remove. Check srv_file_flush_method directly. fil_mutex_enter_and_prepare_for_io(): Return the resolved fil_space_t* to avoid a duplicated lookup in the caller. fil_report_invalid_page_access(): Clean up the parameters. fil_io(): Return fil_io_t, which comprises fil_node_t and error code. Always invoke fil_space_t::acquire_for_io() and let either the sync=true caller or fil_aio_callback() invoke fil_space_t::release_for_io(). fil_aio_callback(): Rewrite to replace buf_page_io_complete(). fil_check_pending_operations(): Remove a parameter, and remove some redundant lookups. fil_node_close_to_free(): Wait for n_pending==0. Because we no longer do an extra lookup of the tablespace between fil_io() and the completion of the operation, we must give fil_node_t::complete_io() a chance to decrement the counter. fil_close_tablespace(): Remove unused parameter trx, and document that this is only invoked during the error handling of IMPORT TABLESPACE. row_import_discard_changes(): Merged with the only caller, row_import_cleanup(). Do not lock up the data dictionary while invoking fil_close_tablespace(). logs_empty_and_mark_files_at_shutdown(): Do not invoke fil_close_all_files(), to avoid a !needs_flush assertion failure on fil_node_t::close(). innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files(). fil_close_all_files(): Invoke fil_flush_file_spaces() to ensure proper durability. thread_pool::unbind(): Fix a crash that would occur on Windows after srv_thread_pool->disable_aio() and os_file_close(). This fix was submitted by Vladislav Vaintroub. Thanks to Matthias Leich and Axel Schwenke for extensive testing, Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.
2020-06-05 12:35:46 +03:00
ut_ad(space->id == block->page.id().space());
fsp_apply_init_file_page(block);
mtr->init(block);
}
MDEV-14795 InnoDB system tablespace cannot be shrunk - Introduce the option :autoshrink attribute to be added to innodb_data_file_path variable to allow the shrinking of system tablespace during startup process. Steps for shrinking the system tablespace: 1) Find the last used extent in system tablespace by iterating through the BITMAP in extent descriptor pages 2) If the last used extent is lesser than user specified size then set desired target size to user specified size. 3) Store the page contents of "to be modified" extent descriptor pages, latches the "to be modified" extent descriptor pages and check for buffer pool memory availability 4) Make checkpoint to flush all pages in buffer pool, so that pages in flush list doesn't have to use doublewrite buffer and disable doublewrite buffer during shrinking process 5) Update the FSP_SIZE and FSP_FREE_LIMIT in header page 6) Remove the "to be truncated" pages from FSP_FREE and FSP_FREE_FRAG list 7) Reset the bitmap in the last descriptor pages for the "to be truncated" pages. 8) In case of multiple files, calculate the truncated last file size and do the truncation in last file 9) Check whether mini-transaction log size doesn't exceed the minimum value of innodb_log_buffer_size which is 2MB. In that case, replace the modified buffer pool pages with the page old content. 11) Commit the mini-transaction for shrinking the tablespace and enable/disable the doublewrite buffer depends on user specified value. recv_sys_t::apply(): Handle the truncation of system tablespace only if the recovered tablespace size is lesser than actual existing size.
2023-08-01 15:44:14 +05:30
/** Truncate the system tablespace */
void fsp_system_tablespace_truncate();
/** Truncate the temporary tablespace */
void fsp_shrink_temp_space();
#ifndef UNIV_DEBUG
# define fsp_init_file_page(space, block, mtr) fsp_init_file_page(block, mtr)
#endif
#ifdef UNIV_BTR_PRINT
/*******************************************************************//**
Writes info of a segment. */
void
fseg_print(
/*=======*/
fseg_header_t* header, /*!< in: segment header */
mtr_t* mtr); /*!< in/out: mini-transaction */
#endif /* UNIV_BTR_PRINT */
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
/** Convert FSP_SPACE_FLAGS from the buggy MariaDB 10.1.0..10.1.20 format.
@param[in] flags the contents of FSP_SPACE_FLAGS
@return the flags corrected from the buggy MariaDB 10.1 format
@retval UINT32_MAX if the flags are not in the buggy 10.1 format */
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
MY_ATTRIBUTE((warn_unused_result, const))
inline uint32_t fsp_flags_convert_from_101(uint32_t flags)
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
{
DBUG_EXECUTE_IF("fsp_flags_is_valid_failure", return UINT32_MAX;);
MDEV-12026: Implement innodb_checksum_algorithm=full_crc32 MariaDB data-at-rest encryption (innodb_encrypt_tables) had repurposed the same unused data field that was repurposed in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN) field of SPATIAL INDEX. Because of this, MariaDB was unable to support encryption on SPATIAL INDEX pages. Furthermore, InnoDB page checksums skipped some bytes, and there are multiple variations and checksum algorithms. By default, InnoDB accepts all variations of all algorithms that ever existed. This unnecessarily weakens the page checksums. We hereby introduce two more innodb_checksum_algorithm variants (full_crc32, strict_full_crc32) that are special in a way: When either setting is active, newly created data files will carry a flag (fil_space_t::full_crc32()) that indicates that all pages of the file will use a full CRC-32C checksum over the entire page contents (excluding the bytes where the checksum is stored, at the very end of the page). Such files will always use that checksum, no matter what the parameter innodb_checksum_algorithm is assigned to. For old files, the old checksum algorithms will continue to be used. The value strict_full_crc32 will be equivalent to strict_crc32 and the value full_crc32 will be equivalent to crc32. ROW_FORMAT=COMPRESSED tables will only use the old format. These tables do not support new features, such as larger innodb_page_size or instant ADD/DROP COLUMN. They may be deprecated in the future. We do not want an unnecessary file format change for them. The new full_crc32() format also cleans up the MariaDB tablespace flags. We will reserve flags to store the page_compressed compression algorithm, and to store the compressed payload length, so that checksum can be computed over the compressed (and possibly encrypted) stream and can be validated without decrypting or decompressing the page. In the full_crc32 format, there no longer are separate before-encryption and after-encryption checksums for pages. The single checksum is computed on the page contents that is written to the file. We do not make the new algorithm the default for two reasons. First, MariaDB 10.4.2 was a beta release, and the default values of parameters should not change after beta. Second, we did not yet implement the full_crc32 format for page_compressed pages. This will be fixed in MDEV-18644. This is joint work with Marko Mäkelä.
2019-02-19 21:00:00 +02:00
if (flags == 0 || fil_space_t::full_crc32(flags)) {
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
return(flags);
}
if (flags >> 18) {
/* The most significant FSP_SPACE_FLAGS bit that was ever set
by MariaDB 10.1.0 to 10.1.20 was bit 17 (misplaced DATA_DIR flag).
The flags must be less than 1<<18 in order to be valid. */
return UINT32_MAX;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
}
if ((flags & (FSP_FLAGS_MASK_POST_ANTELOPE | FSP_FLAGS_MASK_ATOMIC_BLOBS))
== FSP_FLAGS_MASK_ATOMIC_BLOBS) {
/* If the "atomic blobs" flag (indicating
ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED) flag
is set, then the "post Antelope" (ROW_FORMAT!=REDUNDANT) flag
must also be set. */
return UINT32_MAX;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
}
/* Bits 6..10 denote compression in MariaDB 10.1.0 to 10.1.20.
They must be either 0b00000 or 0b00011 through 0b10011.
In correct versions, these bits would be
0bd0sss where d is the DATA_DIR flag (garbage bit) and
sss is the PAGE_SSIZE (3, 4, 6, or 7).
NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret
uncompressed data files with innodb_page_size=4k or 64k as
compressed innodb_page_size=16k files. Below is an exhaustive
state space analysis.
-0by1zzz: impossible (the bit 4 must be clean; see above)
-0b101xx: DATA_DIR, innodb_page_size>4k: invalid (COMPRESSION_LEVEL>9)
+0bx0011: innodb_page_size=4k:
!!! Misinterpreted as COMPRESSION_LEVEL=9 or 1, COMPRESSION=1.
-0bx0010: impossible, because sss must be 0b011 or 0b1xx
-0bx0001: impossible, because sss must be 0b011 or 0b1xx
-0b10000: DATA_DIR, innodb_page_size=16:
invalid (COMPRESSION_LEVEL=8 but COMPRESSION=0)
+0b00111: no DATA_DIR, innodb_page_size=64k:
!!! Misinterpreted as COMPRESSION_LEVEL=3, COMPRESSION=1.
-0b00101: impossible, because sss must be 0 for 16k, not 0b101
-0b001x0: no DATA_DIR, innodb_page_size=32k or 8k:
invalid (COMPRESSION_LEVEL=3 but COMPRESSION=0)
+0b00000: innodb_page_size=16k (looks like COMPRESSION=0)
??? Could actually be compressed; see PAGE_SSIZE below */
const uint32_t level = FSP_FLAGS_GET_PAGE_COMPRESSION_LEVEL_MARIADB101(
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
flags);
if (FSP_FLAGS_GET_PAGE_COMPRESSION_MARIADB101(flags) != (level != 0)
|| level > 9) {
/* The compression flags are not in the buggy MariaDB
10.1 format. */
return UINT32_MAX;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
}
if (!(~flags & FSP_FLAGS_MASK_ATOMIC_WRITES_MARIADB101)) {
/* The ATOMIC_WRITES flags cannot be 0b11.
(The bits 11..12 should actually never be 0b11,
because in MySQL they would be SHARED|TEMPORARY.) */
return UINT32_MAX;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
}
/* Bits 13..16 are the wrong position for PAGE_SSIZE, and they
should contain one of the values 3,4,6,7, that is, be of the form
2018-06-11 10:33:09 +03:00
0b0011 or 0b01xx (except 0b0101).
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
In correct versions, these bits should be 0bc0se
where c is the MariaDB COMPRESSED flag
and e is the MySQL 5.7 ENCRYPTION flag
and s is the MySQL 8.0 SDI flag. MariaDB can only support s=0, e=0.
Compressed innodb_page_size=16k tables with correct FSP_SPACE_FLAGS
will be properly rejected by older MariaDB 10.1.x because they
would read as PAGE_SSIZE>=8 which is not valid. */
const uint32_t ssize = FSP_FLAGS_GET_PAGE_SSIZE_MARIADB101(flags);
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
if (ssize == 1 || ssize == 2 || ssize == 5 || ssize & 8) {
/* the page_size is not between 4k and 64k;
16k should be encoded as 0, not 5 */
return UINT32_MAX;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
}
const uint32_t zssize = FSP_FLAGS_GET_ZIP_SSIZE(flags);
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
if (zssize == 0) {
/* not ROW_FORMAT=COMPRESSED */
} else if (zssize > (ssize ? ssize : 5)) {
/* invalid KEY_BLOCK_SIZE */
return UINT32_MAX;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
} else if (~flags & (FSP_FLAGS_MASK_POST_ANTELOPE
| FSP_FLAGS_MASK_ATOMIC_BLOBS)) {
/* both these flags should be set for
ROW_FORMAT=COMPRESSED */
return UINT32_MAX;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
}
flags = ((flags & 0x3f) | ssize << FSP_FLAGS_POS_PAGE_SSIZE
| FSP_FLAGS_GET_PAGE_COMPRESSION_MARIADB101(flags)
<< FSP_FLAGS_POS_PAGE_COMPRESSION);
MDEV-12026: Implement innodb_checksum_algorithm=full_crc32 MariaDB data-at-rest encryption (innodb_encrypt_tables) had repurposed the same unused data field that was repurposed in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN) field of SPATIAL INDEX. Because of this, MariaDB was unable to support encryption on SPATIAL INDEX pages. Furthermore, InnoDB page checksums skipped some bytes, and there are multiple variations and checksum algorithms. By default, InnoDB accepts all variations of all algorithms that ever existed. This unnecessarily weakens the page checksums. We hereby introduce two more innodb_checksum_algorithm variants (full_crc32, strict_full_crc32) that are special in a way: When either setting is active, newly created data files will carry a flag (fil_space_t::full_crc32()) that indicates that all pages of the file will use a full CRC-32C checksum over the entire page contents (excluding the bytes where the checksum is stored, at the very end of the page). Such files will always use that checksum, no matter what the parameter innodb_checksum_algorithm is assigned to. For old files, the old checksum algorithms will continue to be used. The value strict_full_crc32 will be equivalent to strict_crc32 and the value full_crc32 will be equivalent to crc32. ROW_FORMAT=COMPRESSED tables will only use the old format. These tables do not support new features, such as larger innodb_page_size or instant ADD/DROP COLUMN. They may be deprecated in the future. We do not want an unnecessary file format change for them. The new full_crc32() format also cleans up the MariaDB tablespace flags. We will reserve flags to store the page_compressed compression algorithm, and to store the compressed payload length, so that checksum can be computed over the compressed (and possibly encrypted) stream and can be validated without decrypting or decompressing the page. In the full_crc32 format, there no longer are separate before-encryption and after-encryption checksums for pages. The single checksum is computed on the page contents that is written to the file. We do not make the new algorithm the default for two reasons. First, MariaDB 10.4.2 was a beta release, and the default values of parameters should not change after beta. Second, we did not yet implement the full_crc32 format for page_compressed pages. This will be fixed in MDEV-18644. This is joint work with Marko Mäkelä.
2019-02-19 21:00:00 +02:00
ut_ad(fil_space_t::is_valid_flags(flags, false));
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
return(flags);
}
/** Compare tablespace flags.
@param[in] expected expected flags from dict_tf_to_fsp_flags()
@param[in] actual flags read from FSP_SPACE_FLAGS
@return whether the flags match */
MY_ATTRIBUTE((warn_unused_result))
inline bool fsp_flags_match(uint32_t expected, uint32_t actual)
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
{
expected&= ~FSP_FLAGS_MEM_MASK;
ut_ad(fil_space_t::is_valid_flags(expected, false));
return actual == expected || fsp_flags_convert_from_101(actual) == expected;
MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K The storage format of FSP_SPACE_FLAGS was accidentally broken already in MariaDB 10.1.0. This fix is bringing the format in line with other MySQL and MariaDB release series. Please refer to the comments that were added to fsp0fsp.h for details. This is an INCOMPATIBLE CHANGE that affects users of page_compression and non-default innodb_page_size. Upgrading to this release will correct the flags in the data files. If you want to downgrade to earlier MariaDB 10.1.x, please refer to the test innodb.101_compatibility how to reset the FSP_SPACE_FLAGS in the files. NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret uncompressed data files with innodb_page_size=4k or 64k as compressed innodb_page_size=16k files, and then probably fail when trying to access the pages. See the comments in the function fsp_flags_convert_from_101() for detailed analysis. Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16. In this way, compressed innodb_page_size=16k tablespaces will not be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20. Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the dict_table_t::flags when the table is available, in fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace(). During crash recovery, fil_load_single_table_tablespace() will use innodb_compression_level for the PAGE_COMPRESSION_LEVEL. FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags that are not to be written to FSP_SPACE_FLAGS. Currently, these will include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR. Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support one innodb_page_size for the whole instance. When creating a dummy tablespace for the redo log, use fil_space_t::flags=0. The flags are never written to the redo log files. Remove many FSP_FLAGS_SET_ macros. dict_tf_verify_flags(): Remove. This is basically only duplicating the logic of dict_tf_to_fsp_flags(), used in a debug assertion. fil_space_t::mark: Remove. This flag was not used for anything. fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter mark_space, and add a parameter for table flags. Check that fil_space_t::flags match the table flags, and adjust the (memory-only) flags based on the table flags. fil_node_open_file(): Remove some redundant or unreachable conditions, do not use stderr for output, and avoid unnecessary server aborts. fil_user_tablespace_restore_page(): Convert the flags, so that the correct page_size will be used when restoring a page from the doublewrite buffer. fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove. It suffices to have fil_space_is_page_compressed(). FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL, FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not exist in the FSP_SPACE_FLAGS but only in memory. fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS in page 0. Called by fil_open_single_table_tablespace(), fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql() except if --innodb-read-only is active. fsp_flags_is_valid(ulint): Reimplement from the scratch, with accurate comments. Do not display any details of detected inconsistencies, because the output could be confusing when dealing with MariaDB 10.1.x data files. fsp_flags_convert_from_101(ulint): Convert flags from buggy MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags cannot be in MariaDB 10.1.x format. fsp_flags_match(): Check the flags when probing files. Implemented based on fsp_flags_is_valid() and fsp_flags_convert_from_101(). dict_check_tablespaces_and_store_max_id(): Do not access the page after committing the mini-transaction. IMPORT TABLESPACE fixes: AbstractCallback::init(): Convert the flags. FetchIndexRootPages::operator(): Check that the tablespace flags match the table flags. Do not attempt to convert tablespace flags to table flags, because the conversion would necessarily be lossy. PageConverter::update_header(): Write back the correct flags. This takes care of the flags in IMPORT TABLESPACE.
2017-01-14 00:13:16 +02:00
}
/** Determine if FSP_SPACE_FLAGS are from an incompatible MySQL format.
@param flags the contents of FSP_SPACE_FLAGS
@return MySQL flags shifted.
@retval 0, if not a MySQL incompatible format. */
MY_ATTRIBUTE((warn_unused_result, const))
2022-12-13 18:01:49 +02:00
inline uint32_t fsp_flags_is_incompatible_mysql(uint32_t flags)
{
/*
MySQL-8.0 SDI flag (bit 14),
or MySQL 5.7 Encyption flag (bit 13)
*/
return flags >> 13 & 3;
}
/** Determine the descriptor index within a descriptor page.
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] offset page offset
@return descriptor index */
inline ulint xdes_calc_descriptor_index(ulint zip_size, ulint offset)
{
2019-04-08 21:58:18 +03:00
return ut_2pow_remainder<ulint>(offset,
zip_size ? zip_size : srv_page_size)
/ FSP_EXTENT_SIZE;
}
/** Determine the descriptor page number for a page.
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] offset page offset
@return descriptor page offset */
inline uint32_t xdes_calc_descriptor_page(ulint zip_size, uint32_t offset)
{
compile_time_assert(UNIV_PAGE_SIZE_MAX > XDES_ARR_OFFSET
+ (UNIV_PAGE_SIZE_MAX / FSP_EXTENT_SIZE_MAX)
* XDES_SIZE_MAX);
compile_time_assert(UNIV_PAGE_SIZE_MIN > XDES_ARR_OFFSET
+ (UNIV_PAGE_SIZE_MIN / FSP_EXTENT_SIZE_MIN)
* XDES_SIZE_MIN);
ut_ad(srv_page_size > XDES_ARR_OFFSET
+ (srv_page_size / FSP_EXTENT_SIZE)
* XDES_SIZE);
ut_ad(UNIV_ZIP_SIZE_MIN > XDES_ARR_OFFSET
+ (UNIV_ZIP_SIZE_MIN / FSP_EXTENT_SIZE)
* XDES_SIZE);
ut_ad(!zip_size
|| zip_size > XDES_ARR_OFFSET
+ (zip_size / FSP_EXTENT_SIZE) * XDES_SIZE);
return ut_2pow_round(offset,
uint32_t(zip_size ? zip_size : srv_page_size));
}
#endif /* UNIV_INNOCHECKSUM */
#endif