Problem:
=======
In slave_parallel_mode=optimistic configuration, when admin commands and
DML operation on the same table are scheduled simultaneously for execution,
it results in lock conflict and slave server either hangs due to
deadlock or goes down with an assert.
Analysis:
========
Admin commands OPTIMIZE, REPAIR and ANALYZE are written to binary log as
ordinary transactions. When 'slave_parallel_mode' is 'optimistic' DMLs are
allowed to run in parallel. But these locks are not detected by parallel
replication deadlock detection-and-handling mechanism. At times they result
in deadlock or assertion.
Fix:
===
Flag admin commands as DDL in Gtid_log_event at the time of writing to
binary log. Add a new bit EXECUTED_TABLE_ADMIN_CMD to
'm_unsafe_rollback_flags'. During 'mysql_admin_table' command execution it
accepts a list of tables to be processed and executes them in a loop. Upon
successful execution enable 'EXECUTED_TABLE_ADMIN_CMD' bit in
thd->transaction.stmt_unsafe_rollback_flags. Gtid_log_event constructor
will notice this flag and mark the current transaction with 'FL_DDL' flag.
Gtid_log_events marked as FL_DDL will not be scheduled parallel execution,
on the slave. They will execute in isolation to prevent deadlocks.
Note: Removed the call to 'trans_commit_implicit' from 'mysql_admin_table'
function as 'mysql_execute_command' will take care of invoking
'trans_commit_implicit'.
This is a backport of
commit fd9ca2a742 (MDEV-23295) and
commit 9a156e1a23 (MDEV-23345) to 10.3.
An instant ADD/DROP/reorder column could create a dummy table
object with the wrong ROW_FORMAT when innodb_default_row_format
was changed between CREATE TABLE and ALTER TABLE.
prepare_inplace_alter_table_dict(): If we had promised that
ALGORITHM=INPLACE is supported, we must preserve the ROW_FORMAT.
The rest of the changes are related to adding
Alter_inplace_info::inplace_supported to cache the return value of
handler::check_if_supported_inplace_alter().
Adds an implementation for SELECT ... FOR UPDATE SKIP LOCKED /
SELECT ... LOCK IN SHARED MODE SKIP LOCKED
This is implemented only InnoDB at the moment, not in RockDB yet.
This adds a new hander flag HA_CAN_SKIP_LOCKED than
will be used when the storage engine advertises the flag.
When a storage engine indicates this flag it will get
TL_WRITE_SKIP_LOCKED and TL_READ_SKIP_LOCKED transaction types.
The Lex structure has been updated to store both the FOR UPDATE/LOCK IN
SHARE as well as the SKIP LOCKED so the SHOW CREATE VIEW
implementation is simplier.
"SELECT FOR UPDATE ... SKIP LOCKED" combined with CREATE TABLE AS or
INSERT.. SELECT on the result set is not safe for STATEMENT based
replication. MIXED replication will replicate this as row based events."
Thanks to guidance from Facebook commit
193896c466
This helped verify basic test case, and components that need implementing
(even though every part was implemented differently).
Thanks Marko for guidance on simplier InnoDB implementation.
Reviewers: Marko, Monty
Problem:
The problem happened because of a conceptual flaw in the server code:
a. The table level CHARSET/COLLATE clause affected all data types,
including numeric and temporal ones:
CREATE TABLE t1 (a INT) CHARACTER SET utf8 [COLLATE utf8_general_ci];
In the above example, the Column_definition_attributes
(and then the FRM record) for the column "a" erroneously inherited
"utf8" as its character set.
b. The "ALTER TABLE t1 CONVERT TO CHARACTER SET csname" statement
also erroneously affected Column_definition_attributes::charset
for numeric and temporal data types and wrote "csname" as their
character set into FRM files.
So now we have arbitrary non-relevant charset ID values for numeric
and temporal data types in all FRM files in the world :)
The code in the server and the other engines did not seem to be affected
by this flaw. Only InnoDB inplace ALTER was affected.
Solution:
Fixing the code in the way that only character string data types
(CHAR,VARCHAR,TEXT,ENUM,SET):
- inherit the table level CHARSET/COLLATE clause
- get the charset value according to "CONVERT TO CHARACTER SET csname".
Numeric and temporal data types now always get &my_charset_numeric
in Column_definition_attributes::charset and always write its ID into FRM files:
- no matter what the table level CHARSET/COLLATE clause is, and
- no matter what "CONVERT TO CHARACTER SET" says.
Details:
1. Adding helper classes to pass small parts of HA_CREATE_INFO
into Type_handler methods:
- Column_derived_attributes - to pass table level CHARSET/COLLATE,
so columns that do not have explicit CHARSET/COLLATE clauses
can derive them from the table level, e.g.
CREATE TABLE t1 (a VARCHAR(1), b CHAR(1)) CHARACTER SET utf8;
- Column_bulk_alter_attributes - to pass bulk attribute changes
generated by the ALTER related code. These bulk changes affect
multiple columns at the same time:
ALTER TABLE ... CONVERT TO CHARACTER SET csname;
Note, passing the whole HA_CREATE_INFO directly to Type_handler
would not be good: HA_CREATE_INFO is huge and would need not desired
dependencies in sql_type.h and sql_type.cc. The Type_handler API should
use smallest possible data types!
2. Type_handler::Column_definition_prepare_stage1() is now responsible
to set Column_definition::charset properly, according to the data type,
for example:
- For string data types, Column_definition_attributes::charset is set from
the table level CHARSET/COLLATE clause (if not specified explicitly in
the column definition).
- For numeric and temporal fields, Column_definition_attributes::charset is
set to &my_charset_numeric, no matter what the table level
CHARSET/COLLATE says.
- For GEOMETRY, Column_definition_attributes::charset is set to
&my_charset_bin, no matter what the table level CHARSET/COLLATE says.
Previously this code (setting `charset`) was outside of of
Column_definition_prepare_stage1(), namely in
mysql_prepare_create_table(), and was erroneously called for
all data types.
3. Adding Type_handler::Column_definition_bulk_alter(), to handle
"ALTER TABLE .. CONVERT TO". Previously this code was inside
get_sql_field_charset() and was erroneously called for all data types.
4. Removing the Schema_specification_st parameter from
Type_handler::Column_definition_redefine_stage1().
Column_definition_attributes::charset is now fully properly initialized by
Column_definition_prepare_stage1(). So we don't need access to the
table level CHARSET/COLLATE clause in Column_definition_redefine_stage1()
any more.
5. Other changes:
- Removing global function get_sql_field_charset()
- Moving the part of the former get_sql_field_charset(), which was
responsible to inherit the table level CHARSET/COLLATE clause to
new methods:
-- Column_definition_attributes::explicit_or_derived_charset() and
-- Column_definition::prepare_charset_for_string().
This code is only needed for string data types.
Previously it was erroneously called for all data types.
- Moving another part, which was responsible to apply the
"CONVERT TO" clause, to
Type_handler_general_purpose_string::Column_definition_bulk_alter().
- Replacing the call for get_sql_field_charset() in sql_partition.cc
to sql_field->explicit_or_derived_charset() - it is perfectly enough.
The old code was redundant: get_sql_field_charset() was called from
sql_partition.cc only when there were no a "CONVERT TO CHARACTER SET"
clause involved, so its purpose was only to inherit the table
level CHARSET/COLLATE clause.
- Moving the code handling the BINCMP_FLAG flag from
mysql_prepare_create_table() to
Column_definition::prepare_charset_for_string():
This code is responsible to resolve the BINARY comparison style
into the corresponding _bin collation, to do the following transparent
rewrite:
CREATE TABLE t1 (a VARCHAR(10) BINARY) CHARSET utf8; ->
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin);
This code is only needed for string data types.
Previously it was erroneously called for all data types.
6. Renaming Table_scope_and_contents_source_pod_st::table_charset
to alter_table_convert_to_charset, because the only purpose it's used for
is handlering "ALTER .. CONVERT". The new name is much more self-descriptive.
Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed,
we are observing sporadic hangs during the execution of the
RESET MASTER statement. We are hoping to fix the hangs with these
changes, but due to the rather infrequent occurrence of the hangs
and our inability to reliably reproduce the hangs, we cannot be
sure of this.
What we do know is that innodb_force_recovery=2 (or a larger setting)
will prevent srv_master_callback (the former srv_master_thread) from
running. In that mode, periodic log flushes would never occur and
RESET MASTER could hang indefinitely. That is demonstrated by the new
test case that was developed by Andrei Elkin. We fix this case by
implementing a special case for it.
This also includes some code cleanup and renames of misleadingly
named code. The interface has nothing to do with log checkpoints in
the storage engine; it is only about requesting log writes to be
persistent.
handlerton::commit_checkpoint_request,
commit_checkpoint_notify_ha(): Remove the unused parameter hton.
log_requests.start: Replaces pending_checkpoint_list.
log_requests.end: Replaces pending_checkpoint_list_end.
log_requests.mutex: Replaces pending_checkpoint_mutex.
log_flush_notify_and_unlock(), log_flush_notify(): Replaces
innobase_mysql_log_notify(). The new implementation should be
functionally equivalent to the old one.
innodb_log_flush_request(): Replaces innobase_checkpoint_request().
Implement a fast path for common cases, and reduce the mutex hold time.
POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha()
for the current request if it is already satisfied, as well as invoke
log_flush_notify_and_unlock() for any satisfied requests.
log_write(): Invoke log_flush_notify() when the write is already durable.
This was missing WITH_PMEM when the log is in persistent memory.
Reviewed by: Vladislav Vaintroub
The problem was that the CONNECT engine is trying to open the .frm file
during drop_table(), which the code did not take into account.
Fixed by adding the HA_REUSES_FILE_NAMES table flag to CONNECT.
Other things:
- Fixed a wrong test of HA_REUSE_FILE_NAMES of in mysql_alter_table()
(Comment was correct, no the code)
- Added a test in the connect engine that if the .frm it tries to use in
delete is not made for connect, it will generate an error instead of
crash.
This feature adds the functionality of ignorability for indexes.
Indexes are not ignored be default.
To control index ignorability explicitly for a new index,
use IGNORE or NOT IGNORE as part of the index definition for
CREATE TABLE, CREATE INDEX, or ALTER TABLE.
Primary keys (explicit or implicit) cannot be made ignorable.
The table INFORMATION_SCHEMA.STATISTICS get a new column named IGNORED that
would store whether an index needs to be ignored or not.
When doing a truncate on an Innodb under lock tables, InnoDB would rename
the old table to #sql-... and recreate a new 't1' table. The table lock
would still be on the #sql-table.
When doing ALTER TABLE, Innodb would do the changes on the #sql table
(which would disappear on close).
When the SQL layer, as part of inline alter table, would close the
original t1 table (#sql in InnoDB) and then reopen the t1 table, Innodb
would notice that this does not match it's own (old) t1 table and
generate an error.
Fixed by adding code in truncate table that if we are under lock tables
and truncating an InnoDB table, we would close, reopen and lock the
table after truncate. This will remove the #sql table and ensure that
lock tables is using the new empty table.
Reviewer: Marko Mäkelä
Ever since commit 007f68c37f,
ALTER TABLE no longer invokes handler::open() after
handler::commit_inplace_alter_table().
ha_innobase::reload_statistics(): Reload or recompute statistics
after ALTER TABLE.
innodb_notify_tabledef_changed(): A new function to invoke
ha_innobase::reload_statistics().
handlerton::notify_tabledef_changed(): Add the parameter handler*
so that ha_innobase::reload_statistics() can be invoked.
ha_partition::notify_tabledef_changed(),
partition_notify_tabledef_changed(): Pass through the call
to any partitions or subpartitions.
This is based on code that was supplied by Monty.
Cause: no table->update_handler cloned at the moment of
vers_insert_history_row(). update_handler is needed because there
can't be several inited indexes at once in the same handler. First
index is inited by QUICK_RANGE_SELECT::reset(). Then when history row
is inserted check_duplicate_long_entry_key() is done and it requires
another index.
The idea of this fix is that it's enough to prevent the
next_auto_inc_val from incrementing if an error, to fix this problem
and also the MDEV-17333.
So this patch basically reverts the existing fix to the MDEV-17333.
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
MDEV-21953 deadlock between BACKUP STAGE BLOCK_COMMIT and parallel
replication
Fixed by partly reverting MDEV-21953 to put back MDL_BACKUP_COMMIT locking
before log_and_order.
The original problem for MDEV-21953 was that while a thread was waiting in
for another threads to commit in 'log_and_order', it had the
MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
This causes a deadlock as the waited-for thread can never get past the
MDL_BACKUP_COMMIT lock in ha_commit_trans.
The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
a thread is waiting for other 'previous' threads to commit. This ensures
that no transactional thread keeps MDL_BACKUP_COMMIT while waiting, which
ensures that there are no deadlocks anymore.
An instant ADD/DROP/reorder column could create a dummy table
object with the wrong ROW_FORMAT when innodb_default_row_format
was changed between CREATE TABLE and ALTER TABLE.
prepare_inplace_alter_table_dict(): If we had promised that
ALGORITHM=INPLACE is supported, we must preserve the ROW_FORMAT.
dict_table_t::prepare_instant(): Add debug assertions to catch
ROW_FORMAT mismatch.
The rest of the changes are related to adding
Alter_inplace_info::inplace_supported to cache the return value of
handler::check_if_supported_inplace_alter().