The function fsp_header_get_space_id() returns ulint instead of
uint32_t, only to be able to complain that the two adjacent
tablespace ID fields in the page differ. Remove the function,
and merge the check to the callers.
Also, make some more use of aligned_malloc().
InnoDB startup was discovering undo tablespaces in a dirty way.
It was reading a possibly stale copy of the TRX_SYS page before
processing any redo log records.
srv_start(): Do not call buf_pool_invalidate(). Invoke
trx_rseg_get_n_undo_tablespaces() after the recovery has been initiated.
recv_recovery_from_checkpoint_start(): Assert that the buffer pool is
empty. This used to be guaranteed by the buf_pool_invalidate() call.
trx_rseg_get_n_undo_tablespaces(): Move to the calling compilation unit,
and reimplement in a simpler way.
srv_undo_tablespace_create(): Remove the constant parameter
size=SRV_UNDO_TABLESPACE_SIZE_IN_PAGES.
srv_undo_tablespace_open(): Reimplement in a cleaner way, with
more robust error handling.
srv_all_undo_tablespaces_open(): Split from srv_undo_tablespaces_init().
srv_undo_tablespaces_init(): Read all "undo001","undo002" tablespace
files directly, without consulting the TRX_SYS page via calling
trx_rseg_get_n_undo_tablespaces().
This is joint work with Thirunarayanan Balathandayuthapani.
In commit af5947f433
the function btr_discard_page() is invoking btr_set_min_rec_mark()
with the wrong buf_block_t* object. node_ptr is on merge_block,
not block.
btr_discard_page(): Remove the variables merge_page, page, and
always refer to block->frame or merge_block->frame instead.
Also, limit the scope of node_ptr and avoid duplicated conditions.
btr_set_min_rec_mark(): Add a template parameter, so that the
caller can specify whether the page is supposed to have a left sibling.
Otherwise, the assertion (which was introduced in the same commit)
would fail in btr_discard_page().
mtr_t::memcpy(): Replaces mlog_write_string(), mlog_log_string().
The buf_block_t is passed a parameter, so that
mlog_write_initial_log_record_low() can be used instead of
mlog_write_initial_log_record_fast().
fil_space_crypt_t::write_page0(): Remove the fil_space_t* parameter.
Passing buf_block_t helps us avoid calling
mlog_write_initial_log_record_fast() and page_get_page_no(),
and allows us to implement more debug checks, such as
that on ROW_FORMAT=COMPRESSED index pages, only the page header
may be modified by MLOG_MEMSET records.
fseg_n_reserved_pages(): Add a buf_block_t parameter.
mtr_t::write(): Replaces mlog_write_ulint(), mlog_write_ull().
Optimize away writes if the page contents does not change,
except when a dummy write has been explicitly requested.
Because the member function template takes a block descriptor as a
parameter, it is possible to introduce better consistency checks.
Due to this, the code for handling file-based lists, undo logs
and user transactions was refactored to pass around buf_block_t.
page_create_write_log(), mlog_write_initial_log_record():
Merge to the only caller, and use
mlog_write_initial_log_record_low() for writing the log record.
btr_set_min_rec_mark(): Write MLOG_1BYTE instead of
MLOG_REC_MIN_MARK or MLOG_COMP_REC_MIN_MARK.
On ROW_FORMAT=COMPRESSED pages, the minimum record flag is not stored
at all. The flag is computed for the uncompressed page by
page_zip_decompress(). Hence, nothing needs to be logged for
ROW_FORMAT=COMPRESSED tables for this operation.
To facilitate crash-upgrade and hot backup from older versions,
we will retain the code to parse and apply the old log record types
MLOG_REC_MIN_MARK and MLOG_COMP_REC_MIN_MARK.
The MLOG_FILE_WRITE_CRYPT_DATA record was completely redundant.
It can be replaced with a single MLOG_WRITE_STRING record.
To facilitate upgrade from older versions, we will retain
fil_parse_write_crypt_data().
fil_crypt_parse(): Recover fil_space_crypt_t::write_page0().
fil_space_crypt_t::write_page0(): Write everything in a single
MLOG_WRITE_STRING for easy parsing.
fil_space_crypt_t::page0_offset: Remove.
The function get_wkb_of_default_point() should never have been
added, and the cleanup in commit 56ff6f1b0b
should have removed it.
The unnecessary code was added in
mysql/mysql-server@0a27b72171
and mostly disabled in
mysql/mysql-server@8f11af7734.
In MariaDB, the types DATA_POINT and DATA_VAR_POINT are never used.
Instead, DATA_GEOMETRY continues to be used since 10.2.2, introduced by
mysql/mysql-server@673bad7c7e
in MySQL 5.7.1.
Before commit c0f47a4a58 (MDEV-12026)
introduced the innodb_checksum_algorithm=full_crc32 format,
it was impossible to tell if InnoDB data files contained garbage in
the FIL_PAGE_TYPE header field (and possibly other fields).
This is because before commit 3926673ce7
in MySQL 5.1.48, InnoDB would write uninitialized data to some fields,
and because there was no way to tell with which InnoDB version a data
file was created.
If fil_space_t::full_crc32() holds, the data file cannot contain
uninitialized garbage or invalid FIL_PAGE_TYPE, and thus
fil_block_check_type() should not be invoked to correct anything.
Introduce memcpy_aligned<N>(), memcmp_aligned<N>(), memset_aligned<N>()
and use them for accessing InnoDB page header fields that are known
to be aligned.
MY_ASSUME_ALIGNED(): Wrapper for the GCC/clang __builtin_assume_aligned().
Nothing similar seems to exist in Microsoft Visual Studio, and the
C++20 std::assume_aligned is not available to us yet.
Explicitly specified alignment guarantees allow compilers to generate
faster code on platforms with strict alignment rules, instead of
emitting calls to potentially unaligned memcpy(), memcmp(), or memset().
We must relax too strict debug assertions. For latin1_swedish_ci,
mtype=DATA_CHAR or mtype=DATA_VARCHAR will be used instead of
mtype=DATA_MYSQL or mtype=DATA_VARMYSQL. Likewise, some changes of
dtype_get_charset_coll() do not affect the data type encoding,
but only any indexes that are defined on the column.
Charset::same_encoding(): Check whether two charset-collations have
the same character set encoding.
dict_col_t::same_encoding(): Check whether two character columns
have the same character set encoding.
dict_col_t::same_type(): Check whether two columns have a compatible
data type encoding.
dict_col_t::same_format(), dict_table_t::instant_column(): Do not
compare mtype or the charset-collation of prtype directly.
Rely on dict_col_t::same_type() instead.
dtype_get_charset_coll(): Narrow the return type to uint16_t.
This is a refined version of a fix that was developed by
Thirunarayanan Balathandayuthapani.
At each mini-transaction commit, the log sequence number of the
mini-transaction must be written to each modified page, so that
it will be available in the FIL_PAGE_LSN field when the page is
being read in crash recovery.
InnoDB was unnecessarily allocating redundant storage for the
field, in buf_page_t::newest_modification. Let us access
FIL_PAGE_LSN directly.
Furthermore, on ALTER TABLE...IMPORT TABLESPACE, let us write
0 to FIL_PAGE_LSN instead of using log_sys.lsn.
buf_flush_init_for_writing(), buf_flush_update_zip_checksum(),
fil_encrypt_buf_for_full_crc32(), fil_encrypt_buf(),
fil_space_encrypt(): Remove the parameter lsn.
buf_page_get_newest_modification(): Merge with the only caller.
buf_tmp_reserve_compression_buf(), buf_tmp_page_encrypt(),
buf_page_encrypt(): Define static in the same compilation unit
with the only caller.
PageConverter::m_current_lsn: Remove. Write 0 to FIL_PAGE_LSN
on ALTER TABLE...IMPORT TABLESPACE.
Currently InnoDB uses internal parser for adding foreign keys. Remove
internal parser and use data parsed by SQL parser (sql_yacc) for
adding foreign keys.
- create_table_info_t::create_foreign_keys() replacement for
dict_create_foreign_constraints_low();
- Pass constraint name via Foreign_key object.
Temporary until MDEV-20865:
- Pass alter_info as part of create_info.
btr_cur_instant_init_low(): Accurately parse the metadata record
header for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT. CHAR columns
used to be unnecessarily written as nonempty strings of bytes.
buf_read_ibuf_merge_pages(): Discard any page numbers that are
outside the current bounds of the tablespace, by invoking the
function ibuf_delete_recs() that was introduced in MDEV-20934.
This could avoid an infinite change buffer merge loop on
innodb_fast_shutdown=0, because normally the change buffer merge
would only be attempted if a page was successfully loaded into
the buffer pool.
dict_drop_index_tree(): Add the parameter trx_t*.
To prevent the DROP TABLE crash, do not invoke btr_free_if_exists()
if the entire .ibd file will be dropped. Thus, we will avoid a crash
if the BTR_SEG_LEAF or BTR_SEG_TOP of the index is corrupted,
and we will also avoid unnecessarily accessing the to-be-dropped
tablespace via the buffer pool.
In MariaDB 10.2, we disable the DROP TABLE fix if innodb_safe_truncate=0,
because the backup-unsafe MySQL 5.7 WL#6501 form of TRUNCATE TABLE
requires that the individual pages be freed inside the tablespace.
When commit 09af00cbde
removed the crash-upgrade logic of old TRUNCATE TABLE
from MariaDB 10.2 and 10.3, it actually made the return
value of dict_drop_index_tree() redundant.
After MDEV-11556, not even crash recovery should attempt to access
non-existing pages. But, buf_load() is not validating its input
and must thus be able to ignore missing pages, so that is why
buf_read_page_background() does that.
Almost all threads have gone
- the "ticking" threads, that sleep a while then do some work)
(srv_monitor_thread, srv_error_monitor_thread, srv_master_thread)
were replaced with timers. Some timers are periodic,
e.g the "master" timer.
- The btr_defragment_thread is also replaced by a timer , which
reschedules it self when current defragment "item" needs throttling
- the buf_resize_thread and buf_dump_threads are substitutes with tasks
Ditto with page cleaner workers.
- purge workers threads are not tasks as well, and purge cleaner
coordinator is a combination of a task and timer.
- All AIO is outsourced to tpool, Innodb just calls thread_pool::submit_io()
and provides the callback.
- The srv_slot_t was removed, and innodb_debug_sync used in purge
is currently not working, and needs reimplementation.
Apart from page latches (buf_block_t::lock), mini-transactions
are keeping track of at most one dict_index_t::lock and
fil_space_t::latch at a time, and in a rare case, purge_sys.latch.
Let us introduce interfaces for acquiring an index latch
or a tablespace latch.
In a later version, we may want to introduce mtr_t members
for holding a latched dict_index_t* and fil_space_t*,
and replace the remaining use of mtr_t::m_memo
with std::set<buf_block_t*> or with a map<buf_block_t*,byte*>
pointing to log records.
In the test innodb.instant_alter,4k we would be flagging an error
for too large row size. That error was previously only being reported
if the table was being rebuilt. Thus, this merge is fixing a small
omission in MDEV-11369 (instant ADD COLUMN).
Move row size check to early CREATE/ALTER TABLE phase. Stop checking
on table open.
dict_index_add_to_cache(): remove parameter 'strict', stop checking row size
dict_index_t::record_size_info_t: this is a result of row size check operation
create_table_info_t::row_size_is_acceptable(): performs row size check.
Issues error or warning. Writes first overflow field to InnoDB log.
create_table_info_t::create_table(): add row size check
dict_index_t::record_size_info(): this is a refactored version
of dict_index_t::rec_potentially_too_big(). New version doesn't change global
state of a program but return all interesting info. And it's callers who
decide how to handle row size overflow.
dict_index_t::rec_potentially_too_big(): removed