The accessor dtuple_get_nth_v_field() was defined differently between
debug and release builds in MySQL 5.7.8 in
mysql/mysql-server@c47e1751b7
and a debug assertion to document or enforce the questionable assumption
tuple->v_fields == &tuple->fields[tuple->n_fields] was missing.
This was apparently no problem until MDEV-11369 introduced instant
ADD COLUMN to MariaDB Server 10.3. With that work present, in one
test case, trx_undo_report_insert_virtual() could in release builds
fetch the wrong value for a virtual column.
We replace many of the dtuple_t accessors with const-preserving
inline functions, and fix missing or misleadingly applied const
qualifiers accordingly.
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
The parameters bool sync=true, bool read_only=false of mtr_t::start()
were added in
eca5b0fc17
(MySQL 5.7.3).
The parameter read_only was never used anywhere.
The parameter sync was only copied around, and would be returned
by the unused function mtr_t::is_async().
We do not need this dead code in MariaDB.
Rename the 10.2-specific configuration option innodb_unsafe_truncate
to innodb_safe_truncate, and invert its value.
The default (for now) is innodb_safe_truncate=OFF, to avoid
disrupting users with an undo and redo log format change within
a Generally Available (GA) release series.
While MariaDB Server 10.2 is not really guaranteed to be compatible
with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format
change that could be present in XtraBackup, but was reverted from
MariaDB in MDEV-12289), we do not want to disrupt users who have
deployed xtrabackup and MariaDB Server 10.2 in their environments.
With this change, MariaDB 10.2 will continue to use the backup-unsafe
TRUNCATE TABLE code, so that neither the undo log nor the redo log
formats will change in an incompatible way.
Undo tablespace truncation will keep using the redo log only. Recovery
or backup with old code will fail to shrink the undo tablespace files,
but the contents will be recovered just fine.
In the MariaDB Server 10.2 series only, we introduce the configuration
parameter innodb_unsafe_truncate and make it ON by default. To allow
MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE
operations, use loose_innodb_unsafe_truncate=OFF.
MariaDB Server 10.3.10 and later releases will always use the
backup-safe TRUNCATE TABLE, and this parameter will not be
added there.
recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables()
unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan
tables if RENAME operations are not transactional within InnoDB.
LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT.
log_init(), log_group_file_header_flush(),
srv_prepare_to_delete_redo_log_files(),
innobase_start_or_create_for_mysql(): Choose the redo log format
and subformat based on the value of innodb_unsafe_truncate.
This follows up to commit 755187c853.
TRX_UNDO_INSERT_METADATA: Renamed from TRX_UNDO_INSERT_DEFAULT
trx_undo_metadata: Renamed from trx_undo_default_rec
For instant ALTER TABLE, we store a hidden metadata record at the
start of the clustered index, to indicate how the format of the
records differs from the latest table definition.
The term 'default row' is too specific, because it applies to
instant ADD COLUMN only, and we will be supporting more classes
of instant ALTER TABLE later on. For instant ADD COLUMN, we
store the initial default values in the metadata record.
This is a backport of commit 0bc36758ba
and commit 9eb3fcc9fb.
InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2
redo log records during table-rebuilding ALGORITHM=INPLACE operations.
We must write the records for any .ibd file renames, so that the
operations are crash-safe.
If InnoDB is killed during a RENAME TABLE operation, it can happen that
the transaction for updating the data dictionary will be rolled back.
But, nothing will roll back the renaming of the .ibd file
(the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter,
the renaming of the dict_table_t::name in the dict_sys cache. We introduce
the undo log record TRX_UNDO_RENAME_TABLE to fix this.
fil_space_for_table_exists_in_mem(): Remove the parameters
adjust_space, table_id and some code that was trying to work around
these deficiencies.
fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record.
dict_table_rename_in_cache(): Invoke fil_name_write_rename().
trx_undo_rec_copy(): Set the first 2 bytes to the length of the
copied undo log record.
trx_undo_page_report_rename(), trx_undo_report_rename():
Write a TRX_UNDO_RENAME_TABLE record with the old table name.
row_rename_table_for_mysql(): Invoke trx_undo_report_rename()
before modifying any data dictionary tables.
row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE
by invoking dict_table_rename_in_cache(), which will take care
of both renaming the table and the file.
ha_innobase::truncate(): Remove a work-around.
Remove unused InnoDB function parameters and functions.
i_s_sys_virtual_fill_table(): Do not allocate heap memory.
mtr_is_block_fix(): Replace with mtr_memo_contains().
mtr_is_page_fix(): Replace with mtr_memo_contains_page().
trx_undof_page_add_undo_rec_log(): Write the MLOG_UNDO_INSERT
record instead of the equivalent MLOG_2BYTES and MLOG_WRITE_STRING.
This essentially reverts commit 9ee8917dfd.
In MariaDB 10.3, I attempted to simplify the crash recovery code
by making use of lower-level redo log records. It turns out that
we must keep the redo log parsing code in order to allow crash-upgrade
from older MariaDB versions (MDEV-14848).
Now, it further turns out that the InnoDB redo log record format is
suboptimal for logging multiple changes to a single page. This simple
change to the redo logging of undo log significantly affects the
INSERT and UPDATE performance.
Essentially, we wrote
(space_id,page_number,MLOG_2BYTES,2 bytes)
(space_id,page_number,MLOG_WRITE_STRING,N+4 bytes)
instead of the previously written
(space_id,page_number,MLOG_UNDO_INSERT,N+2 bytes)
The added redo log volume caused a single-threaded INSERT
(without innodb_adaptive_hash_index) of
1,000,000 rows to consume 11 seconds instead of 9 seconds,
and a subsequent UPDATE of 30,000,000 rows to consume 64 seconds
instead of 58 seconds. If we omitted all redo logging for the
undo log, the INSERT would consume only 4 seconds.
The trx_t::undo_mutex covered both some main-memory data structures
(trx_undo_t) and access to undo pages. The trx_undo_t is only
accessed by the thread that is associated with a running transaction.
Likewise, each transaction has its private set of undo pages.
The thread that is associated with an active transaction may
lock multiple undo pages concurrently, but no other thread may
lock multiple pages of a foreign transaction.
Concurrent access to the undo logs of an active transaction is possible,
but trx_undo_get_undo_rec_low() only locks one undo page at a time,
without ever holding any undo_mutex.
It seems that the trx_t::undo_mutex would have been necessary if
multi-threaded execution or rollback of a single transaction
had been implemented in InnoDB.
Problem:
=======
InnoDB cleans all temporary undo logs during commit. During rollback
of secondary index entry, InnoDB tries to build the previous version
of clustered index. It leads to access of freed undo page during
previous transaction commit and it leads to undo log corruption.
Solution:
=========
During rollback, temporary undo logs should not try to build
the previous version of the record.
The rollback of the modification of a pre-existing record
should involve a purge-like operation. Before MDEV-12288
the only purge-like operation was the removal of a
delete-marked record.
After MDEV-12288, any rollback of updating an existing record
must reset the DB_TRX_ID column when it is no longer visible
in the purge read view.
row_vers_must_preserve_del_marked(): Remove. It is cleaner to
perform the check directly in row0umod.cc.
row_trx_id_offset(): Auxiliary function to retrieve the byte
offset of DB_TRX_ID in a clustered index leaf page record.
row_undo_mod_must_purge(): Determine if a record should be purged.
row_undo_mod_clust(): For temporary tables, skip the purge checks.
When rolling back an update so that the original record was not
delete-marked, reset DB_TRX_ID if the history is no longer visible.
trx_rsegf_get(), trx_undo_get_first_rec(): Change the parameter to
fil_space_t* so that fewer callers need to be adjusted.
trx_undo_free_page(): Remove the redundant parameter 'space'.
Some fields in system-versioned table may be unversioned.
SQL layer marks unversioned.
And this patch makes InnoDB mark unversioned too because of two reasons:
1) by default fields are versioned
2) most of fields are expected to be versioned
dtype_t::vers_sys_field(): fixed return true on row_start/row_end
dict_col_t::vers_sys_field(): fixed return true on row_start/row_end
There is only one purge_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
purge_sys_t::create(): The deferred constructor.
purge_sys_t::close(): The early destructor.
undo::Truncate::create(): The deferred constructor.
Because purge_sys.undo_trunc is constructed before the start-up
parameters are parsed, the normal constructor would copy a
wrong value of srv_purge_rseg_truncate_frequency.
TrxUndoRsegsIterator: Do not forward-declare an inline constructor,
because the static construction of purge_sys.rseg_iter would not have
access to it.
Rollback attempted to dereference DB_ROLL_PTR=0, which cannot possibly
be a valid undo log pointer. A safe canonical value would be
roll_ptr_t(1) << ROLL_PTR_INSERT_FLAG_POS
which is what was chosen in MDEV-12288.
This bug was reproduced in 10.3 only. Potentially, the problem could
have been introduced by MDEV-11415, which suppresses undo logging for
ALGORITHM=COPY operations. In those operations, we should actually
have written the safe value of DB_ROLL_PTR instead of writing 0.
However, the test in commit 5421e3aee7
demonstrates that access to the rebuilt table by earlier-started
transactions should actually have been refused with ER_TABLE_DEF_CHANGED.
btr_cur_ins_lock_and_undo(): When undo logging is disabled, use the
safe value of DB_ROLL_PTR.
btr_cur_optimistic_insert(): Validate the DB_TRX_ID,DB_ROLL_PTR before
inserting into a clustered index leaf page.
ins_node_t::sys_buf[]: Replaces row_id_buf and trx_id_buf and some
heap usage.
row_ins_alloc_sys_fields(): Initialize ins_node_t::sys_buf[].
trx_undo_page_report_modify(): Assert that the DB_ROLL_PTR is not 0.
trx_undo_get_undo_rec_low(): Assert that the roll_ptr is valid before
trying to dereference it.
dict_index_t::is_primary(): Check if the index is the primary key.
Rollback attempted to dereference DB_ROLL_PTR=0, which cannot possibly
be a valid undo log pointer. A safer canonical value would be
roll_ptr_t(1) << ROLL_PTR_INSERT_FLAG_POS
which is what was chosen in MDEV-12288, corresponding to reset_trx_id.
No deterministic test case for the bug was found. The simplest test
cases may be related to MDEV-11415, which suppresses undo logging for
ALGORITHM=COPY operations. In those operations, in the spirit of
MDEV-12288, we should actually have written reset_trx_id instead of
using the transaction identifier of the current transaction
(and a bogus value of DB_ROLL_PTR=0). However, thanks to MySQL Bug#28432
which I had fixed in MySQL 5.6.8 as part of WL#6255, access to the
rebuilt table by earlier-started transactions should actually have been
refused with ER_TABLE_DEF_CHANGED.
reset_trx_id: Move the definition to data0type.cc and the declaration
to data0type.h.
btr_cur_ins_lock_and_undo(): When undo logging is disabled, use the
safe value that corresponds to reset_trx_id.
btr_cur_optimistic_insert(): Validate the DB_TRX_ID,DB_ROLL_PTR before
inserting into a clustered index leaf page.
ins_node_t::sys_buf[]: Replaces row_id_buf and trx_id_buf and some
heap usage.
row_ins_alloc_sys_fields(): Init ins_node_t::sys_buf[] to reset_trx_id.
row_ins_buf(): Only if undo logging is enabled, copy trx->id
to node->sys_buf. Otherwise, rely on the initialization in
row_ins_alloc_sys_fields().
row_purge_reset_trx_id(): Invoke mlog_write_string() with reset_trx_id
directly. (No functional change.)
trx_undo_page_report_modify(): Assert that the DB_ROLL_PTR is not 0.
trx_undo_get_undo_rec_low(): Assert that the roll_ptr is valid before
trying to dereference it.
dict_index_t::is_primary(): Check if the index is the primary key.
PageConverter::adjust_cluster_record(): Fix
MDEV-15249 Crash in MVCC read after IMPORT TABLESPACE
by resetting the system fields to reset_trx_id instead of writing
the current transaction ID (which will be committed at the
end of the IMPORT TABLESPACE) and DB_ROLL_PTR=0.
This can partially be viewed as a follow-up fix of MDEV-12288,
because IMPORT should already then have written
DB_TRX_ID=0 and DB_ROLL_PTR=1<<55 to prevent unnecessary
DB_TRX_ID lookups in subsequent accesses to the table.
With trx_sys_t::rw_trx_ids removal, MVCC snapshot overhead became
slightly higher. That is instead of copying an array we now have to
iterate LF_HASH. All this done under trx_sys.mutex protection.
This patch moves MVCC snapshot out of trx_sys.mutex.
Clean-ups:
Removed MVCC: doesn't make too much sense to keep it in a separate class
anymore.
Refactored ReadView so that it now calls register()/deregister() routines
(it was vice versa before).
ReadView doesn't have friends anymore. :(
Even less trx_sys.mutex references.
MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY
Move a test from innodb.rename_table_debug to innodb.alter_copy.
ha_innobase::extra(HA_EXTRA_BEGIN_ALTER_COPY): Register id-versioned
tables so that mysql.transaction_registry will be updated, even for
empty tables that are subjected to ALTER TABLE…ALGORITHM=COPY.
If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would spend
a lot of time rolling back writes to the intermediate copy of the table.
To reduce the amount of busy work done, a work-around was introduced in
commit fd069e2bb3 in MySQL 4.1.8 and 5.0.2,
to commit the transaction after every 10,000 inserted rows.
A proper fix would have been to disable the undo logging altogether and
to simply drop the intermediate copy of the table on subsequent server
startup. This is what happens in MariaDB 10.3 with MDEV-14717,MDEV-14585.
In MariaDB 10.2, the intermediate copy of the table would be left behind
with a name starting with the string #sql.
This is a backport of a bug fix from MySQL 8.0.0 to MariaDB,
contributed by jixianliang <271365745@qq.com>.
Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation
InnoDB must for now keep the undo logging enabled, so that the latest
row can be rolled back in case of an error.
In Galera cluster, the LOAD DATA statement will retain the existing
behaviour and commit the transaction after every 10,000 rows if
the parameter wsrep_load_data_splitting=ON is set. The logic to do
so (the wsrep_load_data_split() function and the call
handler::extra(HA_EXTRA_FAKE_START_STMT)) are joint work
by Ji Xianliang and Marko Mäkelä.
The original fix:
Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date: Wed Dec 2 16:09:15 2015 +0530
Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY
Problem:
During ALTER TABLE, we commit and restart the transaction for every
10,000 rows, so that the rollback after recovery would not take so long.
Fix:
Suppress the undo logging during copy alter operation. If fts_index is
present then insert directly into fts auxiliary table rather
than doing at commit time.
ha_innobase::num_write_row: Remove the variable.
ha_innobase::write_row(): Remove the hack for committing every 10000 rows.
row_lock_table_for_mysql(): Remove the extra 2 parameters.
lock_get_src_table(), lock_is_table_exclusive(): Remove.
Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
Remove unnecessary repeated lookups for undo pages.
trx_undo_assign(), trx_undo_assign_low(), trx_undo_seg_create(),
trx_undo_create(): Return the undo log block to the caller.
Inside InnoDB, each mini-transaction that generates any redo log records
will acquire log_sys->mutex during mtr_t::commit() in order to copy the
records into the global log_sys->buf for writing into the redo log file.
For single-row transactions, this incurs quite a bit of overhead.
We would use two mini-transactions for writing a record into a
freshly updated undo log page. (Only if the undo record will
not fit in that page, then we will have to commit and restart
the mini-transaction.)
trx_undo_assign(): Assign undo log for a persistent transaction,
or return the already assigned one.
trx_undo_assign_low(): Assign undo log for an operation on a
persistent or temporary table.
trx_undo_create(), trx_undo_reuse_cached(): Remove redundant parameters.
Merge the logic from trx_undo_mark_as_dict_operation().
Only invoke set_versioned() on trx_id versioned tables.
dict_table_t::versioned_by_id(): New accessor, to determine if
a table is system versioned by transaction ID.
There is only one transaction system object in InnoDB.
Allocate the storage for it at link time, not at runtime.
lock_rec_fetch_page(): Use the correct fetch mode BUF_GET.
Pages may never be deallocated from a tablespace while
record locks are pointing to them.