- InnoDB should avoid bulk insert operation when table has active
DDL. Because bulk insert writes only one undo log as TRX_UNDO_EMPTY
and logging of concurrent DML happens at commit time uses undo log
record to parse and get the value and operation.
- Removed ROW_T_EMPTY, ROW_OP_EMPTY and their associated functions
and also the test case which tries to log the ROW_OP_EMPTY
when table has active DDL.
- InnoDB DDL results in `Duplicate entry' if concurrent DML throws
duplicate key error. The following scenario explains the problem
connection con1:
ALTER TABLE t1 FORCE;
connection con2:
INSERT INTO t1(pk, uk) VALUES (2, 2), (3, 2);
In connection con2, InnoDB throws the 'DUPLICATE KEY' error because
of unique index. Alter operation will throw the error when applying
the concurrent DML log.
- Inserting the duplicate key for unique index logs the insert
operation for online ALTER TABLE. When insertion fails,
transaction does rollback and it leads to logging of
delete operation for online ALTER TABLE.
While applying the insert log entries, alter operation
encounters 'DUPLICATE KEY' error.
- To avoid the above fake duplicate scenario, InnoDB should
not write any log for online ALTER TABLE before DML transaction
commit.
- User thread which does DML can apply the online log if
InnoDB ran out of online log and index is marked as completed.
Set online log error if apply phase encountered any error.
It can also clear all other indexes log, marks the newly
added indexes as corrupted.
- Removed the old online code which was a part of DML operations
commit_inplace_alter_table() : Does apply the online log
for the last batch of secondary index log and does frees
the log for the completed index.
trx_t::apply_online_log: Set to true while writing the undo
log if the modified table has active DDL
trx_t::apply_log(): Apply the DML changes to online DDL tables
dict_table_t::is_active_ddl(): Returns true if the table
has an active DDL
dict_index_t::online_log_make_dummy(): Assign dummy value
for clustered index online log to indicate the secondary
indexes are being rebuild.
dict_index_t::online_log_is_dummy(): Check whether the online
log has dummy value
ha_innobase_inplace_ctx::log_failure(): Handle the apply log
failure for online DDL transaction
row_log_mark_other_online_index_abort(): Clear out all other
online index log after encountering the error during
row_log_apply()
row_log_get_error(): Get the error happened during row_log_apply()
row_log_online_op(): Does apply the online log if index is
completed and ran out of memory. Returns false if apply log fails
UndorecApplier: Introduced a class to maintain the undo log
record, latched undo buffer page, parse the undo log record,
maintain the undo record type, info bits and update vector
UndorecApplier::get_old_rec(): Get the correct version of the
clustered index record that was modified by the current undo
log record
UndorecApplier::clear_undo_rec(): Clear the undo log related
information after applying the undo log record
UndorecApplier::log_update(): Handle the update, delete undo
log and apply it on online indexes
UndorecApplier::log_insert(): Handle the insert undo log
and apply it on online indexes
UndorecApplier::is_same(): Check whether the given roll pointer
is generated by the current undo log record information
trx_t::rollback_low(): Set apply_online_log for the transaction
after partially rollbacked transaction has any active DDL
prepare_inplace_alter_table_dict(): After allocating the online
log, InnoDB does create fulltext common tables. Fulltext index
doesn't allow the index to be online. So removed the dead
code of online log removal
Thanks to Marko Mäkelä for providing the initial prototype and
Matthias Leich for testing the issue patiently.
Online log for insert operation of redundant table fails with
index->is_instant() assert. Purge can reset the n_core_fields when
alter is waiting to upgrade MDL for commit phase of DDL. In the
meantime, any insert DML tries to log the operation fails with
index is not being instant.
row_log_get_n_core_fields(): Get the n_core_fields of online log
for the given index.
rec_get_converted_size_comp_prefix_low(): Use n_core_fields of online
log when InnoDB calculates the size of data tuple during redundant
row format table rebuild.
rec_convert_dtuple_to_rec_comp(): Use n_core_fields of online log
when InnoDB does the conversion of data tuple to record during
redudant row format table rebuild.
- Adding the test case which has more than 129 instant columns.
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.
For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().
Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.
Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.
This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).
INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.
In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.
Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.
dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.
ins_node_t::bulk_insert: Whether bulk insert was initiated.
trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.
trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().
trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.
row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.
lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.
btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().
trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.
ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.
row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.
row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().
lock_sec_rec_cons_read_sees(): Replaced with lower-level code.
btr_root_page_init(): Refactored from btr_create().
dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.
ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.
This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
The nonnull attribute is not applicable to parameters that are
passed by reference, at least not in the Intel compiler.
Let us remove the reference indirection, which was only there
so that the pointer could be assigned to NULL, and let the
callers perform that task.
row_log_allocate(): Fix a bug in out-of-memory error handling
that would leave a pointer to freed memory.
offset_t: this is a type which represents one record offset.
It's unsigned short int.
a lot of functions: replace ulint with offset_t
btr_pcur_restore_position_func(),
page_validate(),
row_ins_scan_sec_index_for_duplicate(),
row_upd_clust_rec_by_insert_inherit_func(),
row_vers_impl_x_locked_low(),
trx_undo_prev_version_build():
allocate record offsets on the stack instead of waiting for rec_get_offsets()
to allocate it from mem_heap_t. So, reducing memory allocations.
RECORD_OFFSET, INDEX_OFFSET:
now it's less convenient to store pointers in offset_t*
array. One pointer occupies now several offset_t. And those constant are start
indexes into array to places where to store pointer values
REC_OFFS_HEADER_SIZE: adjusted for the new reality
REC_OFFS_NORMAL_SIZE:
increase size from 100 to 300 which means less heap allocations.
And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which
is smaller than previous 800 bytes.
REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality
rem0rec.h, rem0rec.ic, rem0rec.cc:
various arguments, return values and local variables types were changed to
fix numerous integer conversions issues.
enum field_type_t:
offset types concept was introduces which replaces old offset flags stuff.
Like in earlier version, 2 upper bits are used to store offset type.
And this enum represents those types.
REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed
get_type(), set_type(), get_value(), combine():
these are convenience functions to work with offsets and it's types
rec_offs_base()[0]:
still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL
rec_offs_base()[i]:
these have type offset_t now. Two upper bits contains type.
Also, related to MDEV-15522, MDEV-17304, MDEV-17835,
remove the Galera xtrabackup tests, because xtrabackup never worked
with MariaDB Server 10.3 due to InnoDB redo log format changes.
NULL values when there is no DEFAULT
Copy and inplace algorithm works similarly for
NULL to NOT NULL conversion for the following cases:
(1) strict sql mode - Should give error.
(2) non-strict sql mode - Should give warnings alone
(3) alter ignore table command. - Should give warnings alone.
- Allow NOT NULL constraint to replace the NULL value in the row with
explicit or implicit default value.
- If the default value is non-const value then inplace alter won't
support it.
- ALTER IGNORE will ignore the error if the concurrent DML contains
NULL value.
When MySQL 5.7 introduced indexed virtual columns, it introduced
several bugs into the online table-rebuilding ALTER, that is,
the row_log_table_apply() family of functions.
The online_log format that was introduced for online table-rebuilding
ALTER in MySQL 5.6 should be sufficient. Ideally, any indexed virtual
column values would be evaluated based on the log records in the temporary
file. There is no need to log virtual column values.
(For ADD INDEX, that is row_log_apply(), we always must log the values of
the keys, no matter if the columns are virtual.)
Because omitting the virtual column values removes any chance of
row_log_table_apply() working with indexed virtual columns, we
will for now refuse LOCK=NONE in table-rebuilding ALTER operations
when indexes on virtual columns exist. This restriction would be
lifted in MDEV-14341.
innobase_indexed_virtual_exist(): New predicate, to determine if
indexed virtual columns exist in a table definition.
ha_innobase::check_if_supported_inplace_alter(): Refuse online rebuild
if indexed virtual columns exist.
rec_get_converted_size_temp_v(), rec_convert_dtuple_to_temp_v(): Remove.
row_log_table_delete(), row_log_table_update(, row_log_table_insert():
Remove parameters for virtual columns.
trx_undo_read_v_rows(): Remove the col_map parameter.
row_log_table_apply(): Do not deal with virtual columns.
After MDEV-12288 and MDEV-13536, the DB_TRX_ID of old clustered index
records for which no history is available should be reset to 0.
This caused crashes in online table-rebuilding ALTER, because the
row_log_table_apply() is built on the assumption that the PRIMARY KEY
together with DB_TRX_ID,DB_ROLL_PTR identifies the record.
Both when copying the old table and when writing log about changes to
the old table, we must map "old" DB_TRX_ID to 0. "old" here is simply
"older than the trx_id of the ALTER TABLE transaction", because
the MDL_EXCLUSIVE (and exclusive InnoDB table lock) in
ha_innobase::prepare_inplace_alter_table() forces any transactions
accessing the table to commit or rollback. So, we know that we can
safely reset any DB_TRX_ID in the table that is older than the
transaction ID of the ALTER TABLE, because the undo log history would be
lost in a table-rebuilding ALTER.
Note: After a table-rebuilding online ALTER TABLE, the rebuilt table
may end up containing some nonzero DB_TRX_ID columns. The apply logic
identifies the rows by the combination of PRIMARY KEY and DB_TRX_ID.
These nonzero DB_TRX_ID would necessarily refer to concurrent DML
operations that were started during ha_innobase::inplace_alter_table().
row_log_allocate(): Add a parameter for the ALTER TABLE transaction.
row_log_t::min_trx: The ALTER TABLE transaction ID.
trx_id_check(): A debug function to check that DB_TRX_ID makes sense
(is either 0 or bigger than the ALTER TABLE transaction ID).
reset_trx_id[]: The reset DB_TRX_ID,DB_ROLL_PTR columns.
row_log_table_delete(), row_log_table_get_pk(): Reset the
DB_TRX_ID,DB_ROLL_PTR when they precede the ALTER TABLE transaction.
row_log_table_apply_delete(), row_log_table_apply_update():
Assert trx_id_check().
row_merge_insert_index_tuples(): Remove the unused parameter trx_id.
row_merge_read_clustered_index(): In a table-rebuilding ALTER,
reset the DB_TRX_ID,DB_ROLL_PTR when they precede the ALTER TABLE
transaction. Assert trx_id_check() on clustered index records that
are being buffered.
ATTRIBUTE_NORETURN is supported on all platforms (MSVS and GCC-like).
It declares that a function will not return; instead, the thread or
the whole process will terminate.
ATTRIBUTE_COLD is supported starting with GCC 4.3. It declares that
a function is supposed to be executed rarely. Rarely used error-handling
functions and functions that emit messages to the error log should be
tagged such.
Also, remove empty .ic files that were not removed by my MySQL commit.
Problem:
InnoDB used to support a compilation mode that allowed to choose
whether the function definitions in .ic files are to be inlined or not.
This stopped making sense when InnoDB moved to C++ in MySQL 5.6
(and ha_innodb.cc started to #include .ic files), and more so in
MySQL 5.7 when inline methods and functions were introduced
in .h files.
Solution:
Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from
all files, assuming that the symbols are never defined.
Remove the files fut0fut.cc and ut0byte.cc which only mattered when
UNIV_NONINL was defined.
Contains also:
MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
enable tests that were fixed in MDEV-10549
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
you the progress of inplace alter table and row log buffer usage
- (x 100%, it's 4-digit. 10000 means 100.00%)
- Innodb_onlineddl_rowlog_rows
Shows how many rows are stored in row log buffer.
- Innodb_onlineddl_rowlog_pct_used
Shows row log buffer usage in percent ( *100%, it's 4-digit. 10000 means 100.00% ).
- Innodb_onlineddl_pct_progress
Shows the progress of inplace alter table. It might
be not so accurate because inplace alter is highly
depend on disk and buffer pool status.
But still it is useful and better than nothing.
- Add some log for inplace alter table
XtraDB/InnoDB will print some message before and
after doing some task.
Update InnoDB to 5.6.14
Apply MySQL-5.6 hack for MySQL Bug#16434374
Move Aria-only HA_RTREE_INDEX from my_base.h to maria_def.h (breaks an assert in InnoDB)
Fix InnoDB memory leak