mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-17 04:22:27 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	ab0190101b	MDEV-24402: InnoDB CHECK TABLE ... EXTENDED Until now, the attribute EXTENDED of CHECK TABLE was ignored by InnoDB, and InnoDB only counted the records in each index according to the current read view. Unless the attribute QUICK was specified, the function btr_validate_index() would be invoked to validate the B-tree structure (the sibling and child links between index pages). The EXTENDED check will not only count all index records according to the current read view, but also ensure that any delete-marked records in the clustered index are waiting for the purge of history, and that all secondary index records point to a version of the clustered index record that is waiting for the purge of history. In other words, no index may contain orphan records. Normal MVCC reads and the non-EXTENDED version of CHECK TABLE would ignore these orphans. Unpurged records merely result in warnings (at most one per index), not errors, and no indexes will be flagged as corrupted due to such garbage. It will remain possible to SELECT data from such indexes or tables (which will skip such records) or to rebuild the table to reclaim some space. We introduce purge_sys.end_view that will be (almost) a copy of purge_sys.view at the end of a batch of purging committed transaction history. It is not an exact copy, because if the size of a purge batch is limited by innodb_purge_batch_size, some records that purge_sys.view would allow to be purged will be left over for subsequent batches. The purge_sys.view is relevant in the purge of committed transaction history, to determine if records are safe to remove. The new purge_sys.end_view is relevant in MVCC operations and in CHECK TABLE ... EXTENDED. It tells which undo log records are safe to access (have not been discarded at the end of a purge batch). purge_sys.clone_oldest_view<true>(): In trx_lists_init_at_db_start(), clone the oldest read view similar to purge_sys_t::clone_end_view() so that CHECK TABLE ... 
EXTENDED will not report bogus failures between InnoDB restart and the completed purge of committed transaction history. purge_sys_t::is_purgeable(): Replaces purge_sys_t::changes_visible() in the case that purge_sys.latch will not be held by the caller. Among other things, this guards access to BLOBs. It is not safe to dereference any BLOBs of a delete-marked purgeable record, because they may have already been freed. purge_sys_t::view_guard::view(): Return a reference to purge_sys.view that will be protected by purge_sys.latch, held by purge_sys_t::view_guard. purge_sys_t::end_view_guard::view(): Return a reference to purge_sys.end_view while it is protected by purge_sys.end_latch. Whenever a thread needs to retrieve an older version of a clustered index record, it will hold a page latch on the clustered index page and potentially also on a secondary index page that points to the clustered index page. If these pages contain purgeable records that would be accessed by a currently running purge batch, the progress of the purge batch would be blocked by the page latches. Hence, it is safe to make a copy of purge_sys.end_view while holding an index page latch, and consult the copy of the view to determine whether a record should already have been purged. btr_validate_index(): Remove a redundant check. row_check_index_match(): Check if a secondary index record and a version of a clustered index record match each other. row_check_index(): Replaces row_scan_index_for_mysql(). Count the records in each index directly, duplicating the relevant logic from row_search_mvcc(). Initialize check_table_extended_view for CHECK ... EXTENDED while holding an index leaf page latch. If we encounter an orphan record, the copy of purge_sys.end_view that we make is safe for visibility checks, and trx_undo_get_undo_rec() will check for the safety to access each undo log record. Should that check fail, we should return DB_MISSING_HISTORY to report a corrupted index. 
The EXTENDED check tries to match each secondary index record with every available clustered index record version, by duplicating the logic of row_vers_build_for_consistent_read() and invoking trx_undo_prev_version_build() directly. Before invoking row_check_index_match() on delete-marked clustered index record versions, we will consult purge_sys.is_purgeable() in order to avoid accessing freed BLOBs. We will always check that the DB_TRX_ID or PAGE_MAX_TRX_ID does not exceed the global maximum. Orphan secondary index records will be flagged only if everything up to PAGE_MAX_TRX_ID has been purged. We warn also about clustered index records whose nonzero DB_TRX_ID should have been reset in purge or rollback. trx_set_rw_mode(): Move an assertion from ReadView::set_creator_trx_id(). trx_undo_prev_version_build(): Remove two debug-only parameters, and return an error code instead of a Boolean. trx_undo_get_undo_rec(): Return a pointer to the undo log record, or nullptr if one cannot be retrieved. Instead of consulting the purge_sys.view, consult the purge_sys.end_view to determine which records can be accessed. trx_undo_get_rec_if_purgeable(): A variant of trx_undo_get_undo_rec() that will consult purge_sys.view instead of purge_sys.end_view. TRX_UNDO_CHECK_PURGEABILITY: A new parameter to trx_undo_prev_version_build(), passed by row_vers_old_has_index_entry() so that purge_sys.view instead of purge_sys.end_view will be consulted to determine whether a secondary index record may be safely purged. row_upd_changes_disowned_external(): Remove. This should be more expensive than briefly latching purge_sys in trx_undo_prev_version_build() (which may make use of transactional memory). row_sel_reset_old_vers_heap(): New function, split from row_sel_build_prev_vers_for_mysql(). row_sel_build_prev_vers_for_mysql(): Reorder some parameters to simplify the call to row_sel_reset_old_vers_heap(). row_search_for_mysql(): Replaced with direct calls to row_search_mvcc(). 
sel_node_get_nth_plan(): Define inline in row0sel.h. open_step(): Define at the call site, in simplified form. sel_node_reset_cursor(): Merged with the only caller open_step(). --- ReadViewBase::check_trx_id_sanity(): Remove. Let us handle "future" DB_TRX_ID in a more meaningful way: row_sel_clust_sees(): Return DB_SUCCESS if the record is visible, DB_SUCCESS_LOCKED_REC if it is invisible, and DB_CORRUPTION if the DB_TRX_ID is in the future. row_undo_mod_must_purge(), row_undo_mod_clust(): Silently ignore corrupted DB_TRX_ID. We are in ROLLBACK, and we should have noticed that corruption when we were about to modify the record in the first place (leading us to refuse the operation). row_vers_build_for_consistent_read(): Return DB_CORRUPTION if DB_TRX_ID is in the future. Tested by: Matthias Leich Reviewed by: Vladislav Lesin	2022-10-21 10:02:54 +03:00
Marko Mäkelä	6dc157f8a6	Merge 10.5 into 10.6	2022-10-06 09:22:39 +03:00
Marko Mäkelä	de078e060e	Merge 10.4 into 10.5	2022-10-06 08:29:56 +03:00
Marko Mäkelä	1562b2c20b	MDEV-29666 InnoDB fails to purge secondary index records when indexed virtual columns exist row_purge_get_partial(): Replaces trx_undo_rec_get_partial_row(). Also copy the purge_node_t::ref to the purge_node_t::row. In this way, the clustered index key fields will always be available, even if thanks to commit `d384ead0f0` (MDEV-14799) they would no longer be repeated in the remaining part of the undo log record.	2022-10-05 09:30:33 +03:00
Marko Mäkelä	6f4d0659dd	MDEV-22388 Corrupted undo log record leads to server crash trx_undo_rec_copy(): Return nullptr if the undo record is corrupted. trx_undo_rec_get_undo_no(): Define inline with the declaration. trx_purge_dummy_rec: Replaced with a -1 pointer. row_undo_rec_get(), UndorecApplier::apply_undo_rec(): Check if trx_undo_rec_copy() returned nullptr. trx_purge_get_next_rec(): Return nullptr upon encountering any corruption, to signal the end of purge.	2022-06-22 10:04:28 +03:00
Marko Mäkelä	0b47c126e3	MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit `177d8b0c12` is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. 
trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures. dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures; instead, error codes will be returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. 
row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and err=DB_CORRUPTION if the page is marked as freed. For modes other than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL, this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock(). buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.	2022-06-06 14:03:22 +03:00
Thirunarayanan Balathandayuthapani	0f717d03b9	MDEV-28443: MDEV-15250 causes latch order violation Problem: Index page latches must be acquired before undo page latches. In trx_t::apply_log(), InnoDB acquired the undo log page latch before an index page latch. Solution: In trx_t::apply_log(), InnoDB should copy the undo log record and release the undo log block before applying it on online indexes.	2022-04-29 12:33:02 +03:00
Thirunarayanan Balathandayuthapani	4b80c11f52	MDEV-15250 UPSERT during ALTER TABLE results in 'Duplicate entry' error for alter - InnoDB DDL results in 'Duplicate entry' if concurrent DML throws a duplicate key error. The following scenario explains the problem: connection con1: ALTER TABLE t1 FORCE; connection con2: INSERT INTO t1(pk, uk) VALUES (2, 2), (3, 2); In connection con2, InnoDB throws the 'DUPLICATE KEY' error because of the unique index. The alter operation will throw the error when applying the concurrent DML log. - Inserting the duplicate key into the unique index logs the insert operation for online ALTER TABLE. When the insertion fails, the transaction is rolled back, which leads to logging a delete operation for online ALTER TABLE. While applying the insert log entries, the alter operation encounters the 'DUPLICATE KEY' error. - To avoid the above fake duplicate scenario, InnoDB should not write any log for online ALTER TABLE before the DML transaction commits. - A user thread that performs DML can apply the online log if InnoDB ran out of online log and the index is marked as completed. Set the online log error if the apply phase encountered any error. It can also clear the logs of all other indexes and mark the newly added indexes as corrupted. - Removed the old online code that was part of DML operations. commit_inplace_alter_table(): Applies the online log for the last batch of secondary index log and frees the log for the completed index. trx_t::apply_online_log: Set to true while writing the undo log if the modified table has an active DDL. trx_t::apply_log(): Apply the DML changes to online DDL tables. dict_table_t::is_active_ddl(): Returns true if the table has an active DDL. dict_index_t::online_log_make_dummy(): Assign a dummy value to the clustered index online log to indicate that the secondary indexes are being rebuilt. 
dict_index_t::online_log_is_dummy(): Check whether the online log has the dummy value. ha_innobase_inplace_ctx::log_failure(): Handle the apply log failure for an online DDL transaction. row_log_mark_other_online_index_abort(): Clear out the logs of all other online indexes after encountering an error during row_log_apply(). row_log_get_error(): Get the error that happened during row_log_apply(). row_log_online_op(): Applies the online log if the index is completed and ran out of memory. Returns false if applying the log fails. UndorecApplier: Introduced a class to maintain the undo log record and the latched undo buffer page, parse the undo log record, and maintain the undo record type, info bits and update vector. UndorecApplier::get_old_rec(): Get the correct version of the clustered index record that was modified by the current undo log record. UndorecApplier::clear_undo_rec(): Clear the undo log related information after applying the undo log record. UndorecApplier::log_update(): Handle the update and delete undo log records and apply them on online indexes. UndorecApplier::log_insert(): Handle the insert undo log and apply it on online indexes. UndorecApplier::is_same(): Check whether the given roll pointer was generated by the current undo log record information. trx_t::rollback_low(): Set apply_online_log for the transaction when a partially rolled-back transaction has any active DDL. prepare_inplace_alter_table_dict(): After allocating the online log, InnoDB creates the fulltext common tables. A fulltext index cannot be built online, so the dead code for online log removal was removed. Thanks to Marko Mäkelä for providing the initial prototype and Matthias Leich for testing the issue patiently.	2022-04-25 18:52:19 +05:30
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Oleksandr Byelkin	41a163ac5c	Merge branch '10.2' into 10.3	2022-01-29 15:41:05 +01:00
Vladislav Vaintroub	47e18af906	MDEV-27494 Rename .ic files to .inl	2022-01-17 16:41:51 +01:00
Marko Mäkelä	52aac131e3	MDEV-18518 Multi-table CREATE and DROP transactions for InnoDB InnoDB used to support at most one CREATE TABLE or DROP TABLE per transaction. This caused complications for DDL operations on partitioned tables (where each partition is treated as a separate table by InnoDB) and FULLTEXT INDEX (where each index is maintained in a number of internal InnoDB tables). dict_drop_index_tree(): Extend the MDEV-24589 logic and treat the purge or rollback of SYS_INDEXES records of clustered indexes specially: by dropping the tablespace if it exists. This is the only form of recovery that we will need. trx_undo_ddl_type: Document the DDL undo log record types better. trx_t::dict_operation: Change the type to bool. trx_t::ddl: Remove. trx_t::table_id, trx_undo_t::table_id: Remove. dict_build_table_def_step(): Remove trx_t::table_id logging. dict_table_close_and_drop(), row_merge_drop_table(): Remove. row_merge_lock_table(): Merged to the only callers, which can call lock_table_for_trx() directly. fts_aux_table_t, fts_aux_id, fts_space_set_t: Remove. fts_drop_orphaned_tables(): Remove. row_merge_rename_index_to_drop(): Remove. Thanks to MDEV-24589, we can simply delete the to-be-dropped indexes from SYS_INDEXES, while still being able to roll back the operation. ha_innobase_inplace_ctx: Make a few data members const. Preallocate trx. prepare_inplace_alter_table_dict(): Simplify the logic. Let the normal rollback take care of some cleanup. row_undo_ins_remove_clust_rec(): Simplify the parsing of SYS_COLUMNS. trx_rollback_active(): Remove the special DROP TABLE logic. trx_undo_mem_create_at_db_start(), trx_undo_reuse_cached(): Always write TRX_UNDO_TABLE_ID as 0.	2021-05-04 13:48:55 +03:00
Marko Mäkelä	3cef4f8f0f	MDEV-515 Reduce InnoDB undo logging for insert into empty table We implement an idea that was suggested by Michael 'Monty' Widenius in October 2017: When InnoDB is inserting into an empty table or partition, we can write a single undo log record TRX_UNDO_EMPTY, which will cause ROLLBACK to clear the table. For this to work, the insert into an empty table or partition must be covered by an exclusive table lock that will be held until the transaction has been committed or rolled back, or the INSERT operation has been rolled back (and the table is empty again), in lock_table_x_unlock(). Clustered index records that are covered by the TRX_UNDO_EMPTY record will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot be distinguished from what MDEV-12288 leaves behind after purging the history of row-logged operations. Concurrent non-locking reads must be adjusted: If the read view was created before the INSERT into an empty table, then we must continue to imagine that the table is empty, and not try to read any records. If the read view was created after the INSERT was committed, then all records must be visible normally. To implement this, we introduce the field dict_table_t::bulk_trx_id. This special handling only applies to the very first INSERT statement of a transaction for the empty table or partition. If a subsequent statement in the transaction is modifying the initially empty table again, we must enable row-level undo logging, so that we will be able to roll back to the start of the statement in case of an error (such as duplicate key). INSERT IGNORE will continue to use row-level logging and locking, because implementing it would require the ability to roll back the latest row. Since the undo log that we write only allows us to roll back the entire statement, we cannot support INSERT IGNORE. We will introduce a handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage engines that INSERT IGNORE is being executed. 
In many test cases, we add an extra record to the table, so that during the 'interesting' part of the test, row-level locking and logging will be used. Replicas will continue to use row-level logging and locking until MDEV-24622 has been addressed. Likewise, this optimization will be disabled in Galera cluster until MDEV-24623 enables it. dict_table_t::bulk_trx_id: The latest active or committed transaction that initiated an insert into an empty table or partition. Protected by exclusive table lock and a clustered index leaf page latch. ins_node_t::bulk_insert: Whether bulk insert was initiated. trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert). Unlike earlier, this collection will cover also temporary tables. trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(), is_bulk_insert(), was_bulk_insert(). trx_undo_report_row_operation(): Before accessing any undo log pages, invoke trx->mod_tables.emplace() in order to determine whether undo logging was disabled, or whether this is the first INSERT and we are supposed to write a TRX_UNDO_EMPTY record. row_ins_clust_index_entry_low(): If we are inserting into an empty clustered index leaf page, set the ins_node_t::bulk_insert flag for the subsequent trx_undo_report_row_operation() call. lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock(): Remove the redundant parameter 'flags' that can be checked in the caller. btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation(). trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT), ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that the next statement will not be covered by table-level undo logging. ReadView::changes_visible(trx_id_t) const: New accessor for the case where the trx_id_t is not read from a potentially corrupted index page but directly from the memory. In this case, we can skip a sanity check. 
row_sel(), row_sel_try_search_shortcut(), row_search_mvcc(), row_sel_try_search_shortcut_for_mysql(), row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id. row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees(). lock_sec_rec_cons_read_sees(): Replaced with lower-level code. btr_root_page_init(): Refactored from btr_create(). dict_index_t::clear(), dict_table_t::clear(): Empty an index or table, for the ROLLBACK of an INSERT operation. ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT into an empty table. This is joint work with Thirunarayanan Balathandayuthapani, who created a working prototype. Thanks to Matthias Leich for extensive testing.	2021-01-25 18:41:27 +02:00
Marko Mäkelä	7bcaa541aa	Merge 10.4 into 10.5	2020-05-05 21:16:22 +03:00
Oleksandr Byelkin	7fb73ed143	Merge branch '10.2' into 10.3	2020-05-04 16:47:11 +02:00
Daniel Black	ba2061da52	MDEV-21595: innodb offset_t rename to rec_offs thanks to: perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)	2020-04-29 12:02:47 +03:00
Marko Mäkelä	f224525204	MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC The -Wconversion in GCC seems to be stricter than in clang. GCC at least since version 4.4.7 issues truncation warnings for assignments to bitfields, while clang 10 appears to only issue warnings when the sizes in bytes rounded to the nearest integer powers of 2 are different. Before GCC 10.0.0, -Wconversion required more casts and would not allow some operations, such as x<<=1 or x+=1 on a data type that is narrower than int. GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining about x|=y even when x and y are compatible types that are narrower than int. Hence, we must rewrite some x|=y as x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion. In GCC 6 and later, the warning for assigning wider values to bitfields that are narrower than 8, 16, or 32 bits can be suppressed by applying a bitwise & with the exact bitmask of the bitfield. For older GCC, we must disable -Wconversion for GCC 4 or 5 in such cases. The bitwise negation operator appears to promote short integers to a wider type, and hence we must add explicit truncation casts around them. Microsoft Visual C does not allow a static_cast to truncate a constant, such as static_cast<byte>(1) truncating int. Hence, we will use the constructor-style cast byte(~1) for such cases. This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0, clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019) on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).	2020-03-12 19:46:41 +02:00
Marko Mäkelä	574d8b2940	MDEV-21907: Fix most clang -Wconversion in InnoDB Declare innodb_purge_threads as 4-byte integer (UINT) instead of 4-or-8-byte (ULONG) and adjust the documentation string.	2020-03-11 08:29:48 +02:00
Marko Mäkelä	f8a9f90667	MDEV-12353: Remove support for crash-upgrade We tighten some assertions regarding dict_index_t::is_dummy and crash recovery, now that redo log processing will no longer create dummy objects.	2020-02-13 19:13:45 +02:00
Marko Mäkelä	f802c989ec	MDEV-12353: Replace MLOG_UNDO_INSERT trx_undof_page_add_undo_rec_log(): Remove. trx_undo_page_set_next_prev_and_add(), trx_undo_page_report_modify(), trx_undo_page_report_rename(): Write lower-level redo log records.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	737b701786	MDEV-12353: Remove trx_undo_erase_page_end() MariaDB stopped writing the record MLOG_UNDO_ERASE_END in commit `0fd3def284` (10.3.3). Merge trx_undo_erase_page_end() with its callers.	2020-02-13 18:19:13 +02:00
Marko Mäkelä	28c89b7151	Merge 10.4 into 10.5	2019-12-16 07:47:17 +02:00
Marko Mäkelä	3466b47b0d	Merge 10.2 into 10.3	2019-12-13 10:08:57 +02:00
Eugene Kosov	f0aa073f2b	MDEV-20950 Reduce size of record offsets offset_t: this is a type which represents one record offset. It is an unsigned short int. A lot of functions: replace ulint with offset_t. btr_pcur_restore_position_func(), page_validate(), row_ins_scan_sec_index_for_duplicate(), row_upd_clust_rec_by_insert_inherit_func(), row_vers_impl_x_locked_low(), trx_undo_prev_version_build(): allocate record offsets on the stack instead of waiting for rec_get_offsets() to allocate them from mem_heap_t, thus reducing memory allocations. RECORD_OFFSET, INDEX_OFFSET: now it is less convenient to store pointers in an offset_t* array, because one pointer now occupies several offset_t. These constants are the start indexes of the array slots where pointer values are stored. REC_OFFS_HEADER_SIZE: adjusted for the new reality. REC_OFFS_NORMAL_SIZE: increase size from 100 to 300, which means fewer heap allocations. sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) is now 600 bytes, which is smaller than the previous 800 bytes. REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality. rem0rec.h, rem0rec.ic, rem0rec.cc: the types of various arguments, return values and local variables were changed to fix numerous integer conversion issues. enum field_type_t: introduces the concept of offset types, which replaces the old offset flags. As in the earlier version, the 2 upper bits are used to store the offset type, and this enum represents those types. REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed. get_type(), set_type(), get_value(), combine(): convenience functions for working with offsets and their types. rec_offs_base()[0]: still uses the old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL. rec_offs_base()[i]: these have type offset_t now. The two upper bits contain the type.	2019-12-13 00:26:50 +07:00
Marko Mäkelä	ea37b14409	MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks This is joint work with Thirunarayanan Balathandayuthapani. The MDL interface between InnoDB and the rest of the server (in storage/innobase/dict/dict0dict.cc and in include/) is my work, while most everything else is Thiru's. The collection of InnoDB persistent statistics and the defragmentation were not refactored to use MDL. They will keep relying on lower-level interlocking with fil_check_pending_operations(). The purge of transaction history and the background operations on fulltext indexes will use MDL. We will revert commit `2c4844c9e7` (MDEV-17813) because thanks to MDL, purge cannot conflict with DDL operations anymore. For a similar reason, we will remove the MDEV-16222 test case from gcol.innodb_virtual_debug_purge. Purge is essentially replacing all use of the global dict_sys.latch with MDL. Purge will skip the undo log records for tables whose names start with #sql-ib or #sql2. Theoretically, such tables might be renamed back to visible table names if TRUNCATE fails to create a new table, or the final rename in ALTER TABLE...ALGORITHM=COPY fails. In that case, purge could permanently leave some garbage in the table. Such garbage will be tolerated; the table would not be considered corrupted. To avoid repeated MDL releases and acquisitions, trx_purge_attach_undo_recs() will sort undo log records by table_id, and purge_node_t will keep the MDL and table handle open for multiple successive undo log records. get_purge_table(): A new accessor, used during the purge of history for indexed virtual columns. This interface should ideally not exist at all. thd_mdl_context(): Accessor of THD::mdl_context. Wrapped in a new thd_mdl_service. dict_get_db_name_len(): Define inline. dict_acquire_mdl_shared(): Acquire explicit shared MDL on a table name if needed. dict_table_open_on_id(): Return MDL_ticket, if requested. dict_table_close(): Release MDL ticket, if requested. 
dict_fts_index_syncing(), dict_index_t::index_fts_syncing: Remove. row_drop_table_for_mysql() no longer needs to check these, because MDL guarantees that a fulltext index sync will not be in progress while MDL_EXCLUSIVE is protecting a DDL operation. dict_table_t::parse_name(): Parse the table name for acquiring MDL. purge_node_t::undo_recs: Change the type to std::list<trx_purge_rec_t*> (different container, and storing also roll_ptr). purge_node_t: Add mdl_ticket, last_table_id, purge_thd, mdl_hold_recs for acquiring MDL and for keeping the table open across multiple undo log records. purge_vcol_info_t, row_purge_store_vsec_cur(), row_purge_restore_vsec_cur(): Remove. We will acquire the MDL earlier. purge_sys_t::heap: Added, for reading undo log records. fts_sync_during_ddl(): Invoked during ALGORITHM=INPLACE operations to ensure that fts_sync_table() will not conflict with MDL_EXCLUSIVE. Uses fts_t::sync_message for bookkeeping.	2019-12-10 15:42:50 +02:00
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	c0ac0b8860	Update FSF address	2019-05-11 19:25:02 +03:00
Marko Mäkelä	b6f4cccd19	Merge 10.2 into 10.3	2019-05-03 20:14:09 +03:00
Marko Mäkelä	ce195987c3	MDEV-19385: Inconsistent definition of dtuple_get_nth_v_field() The accessor dtuple_get_nth_v_field() was defined differently between debug and release builds in MySQL 5.7.8 (mysql/mysql-server@c47e1751b7), and a debug assertion to document or enforce the questionable assumption tuple->v_fields == &tuple->fields[tuple->n_fields] was missing. This apparently caused no problems until MDEV-11369 introduced instant ADD COLUMN to MariaDB Server 10.3. With that work present, in one test case, trx_undo_report_insert_virtual() could fetch the wrong value for a virtual column in release builds. We replace many of the dtuple_t accessors with const-preserving inline functions, and fix missing or misleadingly applied const qualifiers accordingly.	2019-05-03 20:02:50 +03:00
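The hazard behind MDEV-19385 can be illustrated with simplified types. In this hypothetical sketch, `field_sketch` and `tuple_sketch` stand in for dfield_t and dtuple_t, and all function names are invented. One accessor silently assumes that the virtual fields follow the ordinary fields in memory; the other always indexes `v_fields` and comes as a const/non-const pair, so a const tuple yields a const field. It is not the MariaDB implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified stand-ins for dfield_t / dtuple_t.
struct field_sketch { int value; };

struct tuple_sketch {
  std::size_t n_fields;    // number of ordinary fields
  field_sketch* fields;    // ordinary fields
  std::size_t n_v_fields;  // number of virtual-column fields
  field_sketch* v_fields;  // virtual-column fields, anywhere in memory
};

// Fragile: only correct if v_fields == &fields[n_fields]. Otherwise it
// reads some other field (or past the end of the fields array).
inline field_sketch* nth_v_field_assuming_contiguous(tuple_sketch* t,
                                                     std::size_t n) {
  return &t->fields[t->n_fields + n];
}

// Const-preserving accessors that always index v_fields.
inline const field_sketch* get_nth_v_field(const tuple_sketch* t,
                                           std::size_t n) {
  assert(n < t->n_v_fields);
  return &t->v_fields[n];
}

inline field_sketch* get_nth_v_field(tuple_sketch* t, std::size_t n) {
  // Reuse the const version; the tuple itself is non-const here, so
  // casting away const on the result is safe.
  return const_cast<field_sketch*>(
      get_nth_v_field(static_cast<const tuple_sketch*>(t), n));
}
```

If a tuple's `fields` array has spare slots beyond `n_fields` and its virtual fields live elsewhere, the contiguous variant returns one of those spare slots, while `get_nth_v_field()` returns the actual virtual field.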
Marko Mäkelä	0abd2766b1	Merge 10.2 into 10.3 Also, related to MDEV-15522, MDEV-17304, MDEV-17835, remove the Galera xtrabackup tests, because xtrabackup never worked with MariaDB Server 10.3 due to InnoDB redo log format changes.	2018-11-30 09:38:56 +02:00
Marko Mäkelä	447e493179	Remove some unnecessary InnoDB #include	2018-11-29 12:53:44 +02:00
Marko Mäkelä	df563e0c03	Merge 10.2 into 10.3 main.derived_cond_pushdown: Move all 10.3 tests to the end, trim trailing white space, and add an "End of 10.3 tests" marker. Add --sorted_result to tests where the ordering is not deterministic. main.win_percentile: Add --sorted_result to tests where the ordering is no longer deterministic.	2018-11-06 09:40:39 +02:00
Marko Mäkelä	abcd09c95a	mtr_t::start(): Remove unused parameters The parameters bool sync=true, bool read_only=false of mtr_t::start() were added in `eca5b0fc17` (MySQL 5.7.3). The parameter read_only was never used anywhere. The parameter sync was only copied around, and would be returned by the unused function mtr_t::is_async(). We do not need this dead code in MariaDB.	2018-11-01 10:48:56 +02:00
Marko Mäkelä	28ae79650d	Terminology: 'metadata' not 'default rec' This follows up to commit `755187c853`. TRX_UNDO_INSERT_METADATA: Renamed from TRX_UNDO_INSERT_DEFAULT. trx_undo_metadata: Renamed from trx_undo_default_rec.	2018-09-19 09:12:58 +03:00
Marko Mäkelä	cf2a4426a2	MDEV-14717 RENAME TABLE in InnoDB is not crash-safe This is a backport of commit `0bc36758ba` and commit `9eb3fcc9fb`. InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2 redo log records during table-rebuilding ALGORITHM=INPLACE operations. We must write the records for any .ibd file renames, so that the operations are crash-safe. If InnoDB is killed during a RENAME TABLE operation, it can happen that the transaction for updating the data dictionary will be rolled back. But, nothing will roll back the renaming of the .ibd file (the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter, the renaming of the dict_table_t::name in the dict_sys cache. We introduce the undo log record TRX_UNDO_RENAME_TABLE to fix this. fil_space_for_table_exists_in_mem(): Remove the parameters adjust_space, table_id and some code that was trying to work around these deficiencies. fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record. dict_table_rename_in_cache(): Invoke fil_name_write_rename(). trx_undo_rec_copy(): Set the first 2 bytes to the length of the copied undo log record. trx_undo_page_report_rename(), trx_undo_report_rename(): Write a TRX_UNDO_RENAME_TABLE record with the old table name. row_rename_table_for_mysql(): Invoke trx_undo_report_rename() before modifying any data dictionary tables. row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE by invoking dict_table_rename_in_cache(), which will take care of both renaming the table and the file. ha_innobase::truncate(): Remove a work-around.	2018-09-07 22:10:02 +03:00
Marko Mäkelä	76c62bc69c	MDEV-15914: Restore MLOG_UNDO_INSERT trx_undof_page_add_undo_rec_log(): Write the MLOG_UNDO_INSERT record instead of the equivalent MLOG_2BYTES and MLOG_WRITE_STRING. This essentially reverts commit `9ee8917dfd`. In MariaDB 10.3, I attempted to simplify the crash recovery code by making use of lower-level redo log records. It turns out that we must keep the redo log parsing code in order to allow crash-upgrade from older MariaDB versions (MDEV-14848). Now, it further turns out that the InnoDB redo log record format is suboptimal for logging multiple changes to a single page. This simple change to the redo logging of the undo log significantly affects INSERT and UPDATE performance. Essentially, we wrote (space_id,page_number,MLOG_2BYTES,2 bytes) followed by (space_id,page_number,MLOG_WRITE_STRING,N+4 bytes) instead of the previously written (space_id,page_number,MLOG_UNDO_INSERT,N+2 bytes). The added redo log volume caused a single-threaded INSERT (without innodb_adaptive_hash_index) of 1,000,000 rows to take 11 seconds instead of 9 seconds, and a subsequent UPDATE of 30,000,000 rows to take 64 seconds instead of 58 seconds. If we omitted all redo logging for the undo log, the INSERT would take only 4 seconds.	2018-04-26 22:53:33 +03:00
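The per-record cost of the MLOG_2BYTES plus MLOG_WRITE_STRING encoding can be worked out from the tuples in the message above. This is a back-of-the-envelope reading, not a statement of the exact on-disk format: it treats the last element of each tuple as the payload size and uses an assumed symbol H for the per-record header (type byte plus the encoded space_id and page_number), whose exact size varies.

```latex
\begin{aligned}
B_{\text{old}} &= H + (N + 2) \\
B_{\text{new}} &= (H + 2) + (H + N + 4) = 2H + N + 6 \\
B_{\text{new}} - B_{\text{old}} &= H + 4 \quad \text{bytes per undo log record} \\[4pt]
\tfrac{11\,\text{s}}{9\,\text{s}} &\approx 1.22 \quad (\text{INSERT about } 22\%\ \text{slower}) \\
\tfrac{64\,\text{s}}{58\,\text{s}} &\approx 1.10 \quad (\text{UPDATE about } 10\%\ \text{slower})
\end{aligned}
```

Under this reading, the split encoding repeats the page-addressing header and adds 4 payload bytes for every undo log record written, which is consistent with the measured slowdown being larger for the INSERT workload.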
Marko Mäkelä	145ae15a33	Merge bb-10.2-ext into 10.3	2018-01-04 09:22:59 +02:00
Marko Mäkelä	acd2862e65	MDEV-14848 MariaDB 10.3 refuses InnoDB crash-upgrade from MariaDB 10.2 While the redo log format was changed in MariaDB 10.3.2 and 10.3.3 due to MDEV-12288 and MDEV-11369, it should be technically possible to upgrade from a crashed MariaDB 10.2 instance. On a related note, it should be possible for Mariabackup 10.3 to create a backup from a running MariaDB Server 10.2. mlog_id_t: Put back the 10.2 specific redo log record types MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT, MLOG_UNDO_HDR_REUSE. trx_undo_parse_add_undo_rec(): Parse or apply MLOG_UNDO_INSERT. trx_undo_erase_page_end(): Apply MLOG_UNDO_ERASE_END. trx_undo_parse_page_init(): Parse or apply MLOG_UNDO_INIT. trx_undo_parse_page_header_reuse(): Parse or apply MLOG_UNDO_HDR_REUSE. recv_log_recover_10_2(): Remove. Always parse the redo log from 10.2. recv_find_max_checkpoint(), recv_recovery_from_checkpoint_start(): Always parse the redo log from MariaDB 10.2. recv_parse_or_apply_log_rec_body(): Parse or apply MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT. srv_prepare_to_delete_redo_log_files(), innobase_start_or_create_for_mysql(): Upgrade from a previous (supported) redo log format.	2018-01-03 19:08:50 +02:00
Marko Mäkelä	f7fd6ace18	Merge 10.2 into bb-10.2-ext	2018-01-03 15:48:47 +02:00
Marko Mäkelä	d361401bc2	Merge 10.1 into 10.2, with some MDEV-14799 fixups trx_undo_page_report_modify(): For SPATIAL INDEX, keep logging updated off-page columns twice, so that the minimum bounding rectangle (MBR) will be logged. Avoiding the redundant logging would require larger changes to the undo log format. row_build_index_entry_low(): Handle SPATIAL_UNKNOWN more robustly, by refusing to purge the record from the spatial index. This code can be reached when processing old undo logs from 10.2.10 or 10.2.11 (the releases affected by MDEV-14799, which was a regression from MDEV-14051).	2018-01-03 11:56:24 +02:00
Marko Mäkelä	51e4650ed0	Merge 5.5 into 10.0	2018-01-02 21:52:46 +02:00
Marko Mäkelä	d384ead0f0	MDEV-14799 After UPDATE of indexed columns, old values will not be purged from secondary indexes This is a regression caused by MDEV-14051 'Undo log record is too big.' Purge in the secondary index is wrongly skipped in row_purge_upd_exist_or_extern() because node->row alone does not contain all indexed columns. trx_undo_rec_get_partial_row(): Add a parameter for node->update so that the updated columns will be copied from the initial part of the undo log record.	2018-01-02 19:11:10 +02:00
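The MDEV-14799 fix can be pictured as filling the gaps in a partial row from the update vector before secondary-index entries are built. The sketch below is a hypothetical model only: `RowSketch`, `UpdFieldSketch`, `merge_update_into_row()` and `covers_index()` are invented names, columns are plain strings keyed by column number, and it does not reproduce the undo log record layout or the exact semantics of trx_undo_rec_get_partial_row().

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using ColNo = unsigned;
// Hypothetical partial row: column number -> value, holding only the
// columns that were logged in the "partial row" part of the record.
using RowSketch = std::map<ColNo, std::string>;

// Hypothetical update-vector entry: an updated column and its logged value.
struct UpdFieldSketch {
  ColNo col;
  std::string value;
};
using UpdateSketch = std::vector<UpdFieldSketch>;

// Copy the updated columns from the update vector into the row, so that
// the row used for secondary-index purge is no longer missing them.
RowSketch merge_update_into_row(RowSketch row, const UpdateSketch& update) {
  for (const UpdFieldSketch& f : update) row[f.col] = f.value;
  return row;
}

// True if every indexed column has a value in the row.
bool covers_index(const RowSketch& row, const std::set<ColNo>& indexed) {
  for (ColNo c : indexed)
    if (row.find(c) == row.end()) return false;
  return true;
}
```

If column 2 is indexed but appears only in the update vector, `covers_index()` fails on the partial row alone (the situation in which purge was wrongly skipped) and succeeds after the merge.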
Monty	fbab79c9b8	Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext Conflicts: cmake/make_dist.cmake.in mysql-test/r/func_json.result mysql-test/r/ps.result mysql-test/t/func_json.test mysql-test/t/ps.test sql/item_cmpfunc.h	2018-01-01 19:39:59 +02:00
Vicențiu Ciorbaru	985d2d393c	Merge remote-tracking branch 'origin/10.1' into 10.2	2017-12-22 12:23:39 +02:00
Marko Mäkelä	2534b5cb99	Merge bb-10.2-ext into 10.3	2017-12-20 22:37:24 +02:00
Marko Mäkelä	0bc36758ba	MDEV-14717 RENAME TABLE in InnoDB is not crash-safe InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2 redo log records during table-rebuilding ALGORITHM=INPLACE operations. We must write the records for any .ibd file renames, so that the operations are crash-safe. If InnoDB is killed during a RENAME TABLE operation, it can happen that the transaction for updating the data dictionary will be rolled back. But, nothing will roll back the renaming of the .ibd file (the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter, the renaming of the dict_table_t::name in the dict_sys cache. We introduce the undo log record TRX_UNDO_RENAME_TABLE to fix this. fil_space_for_table_exists_in_mem(): Remove the parameters adjust_space, table_id and some code that was trying to work around these deficiencies. fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record. dict_table_rename_in_cache(): Invoke fil_name_write_rename(). trx_undo_rec_copy(): Set the first 2 bytes to the length of the copied undo log record. trx_undo_page_report_rename(), trx_undo_report_rename(): Write a TRX_UNDO_RENAME_TABLE record with the old table name. row_rename_table_for_mysql(): Invoke trx_undo_report_rename() before modifying any data dictionary tables. row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE by invoking dict_table_rename_in_cache(), which will take care of both renaming the table and the file.	2017-12-20 22:21:03 +02:00
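The rollback idea behind TRX_UNDO_RENAME_TABLE (log the old name before changing anything, then restore it on rollback) can be modelled in memory. This is a hypothetical sketch: `DictCacheSketch`, `RenameUndoSketch` and `TrxSketch` are invented names, the map stands in for the dict_sys cache, and the sketch does not model the MLOG_FILE_RENAME2 redo log record, the .ibd file, or crash recovery itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory dictionary cache: table_id -> table name.
using DictCacheSketch = std::map<unsigned, std::string>;

// Hypothetical undo record: the table and its name before the rename.
struct RenameUndoSketch {
  unsigned table_id;
  std::string old_name;
};

struct TrxSketch {
  std::vector<RenameUndoSketch> undo_log;

  void rename(DictCacheSketch& cache, unsigned id,
              const std::string& new_name) {
    // Write the undo record before modifying anything.
    undo_log.push_back({id, cache.at(id)});
    // A real implementation would also rename the data file here.
    cache.at(id) = new_name;
  }

  void rollback(DictCacheSketch& cache) {
    // Apply undo records newest-first, restoring the old names.
    for (auto it = undo_log.rbegin(); it != undo_log.rend(); ++it)
      cache.at(it->table_id) = it->old_name;
    undo_log.clear();
  }
};
```

After renaming table 10 from `test/t1` to `test/t2` and then to `test/t3` within one transaction, rolling back restores `test/t1`. Applying the records in reverse order is what makes chained renames come out right.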
Marko Mäkelä	f7f5c710e4	Correct a function comment The comment became stale in commit `9f57e595b4` which removed the parameter "flags".	2017-12-20 09:21:08 +02:00
Marko Mäkelä	0fd3def284	Remove MLOG_UNDO_ERASE_END	2017-12-19 15:36:36 +02:00

77 commits