After two concurrent FTWRL/UNLOCK TABLES, the node stays in paused state
and the following CREATE TABLE fails with
ER_UNKNOWN_COM_ERROR (1047): Aborting TOI: Replication paused on
node for FTWRL/BACKUP STAGE.
The cause is the use of global `wsrep_locked_seqno` to determine
if the node should be resumed on UNLOCK TABLES. In some executions
the `wsrep_locked_seqno` is cleared by the first UNLOCK TABLES
after the second FTWRL gets past `make_global_read_lock_block_commit()`.
As a fix, use `thd->wsrep_desynced_backup_stage` to determine
if the thread should resume the node on UNLOCK TABLES.
Add MTR test galera.galera_ftwrl_concurrent to reproduce the
race. The test contains also cases for BACKUP STAGE which
uses similar mechanism for desyncing and pausing the node.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Also fixes MDEV-32025 Crashes in MDL_key::mdl_key_init with lower-case-table-names=2
Change overview:
- In changes made in MDEV-31948, MDEV-31982 the code path
which originaly worked only in case of lower-case-table-names==1
also started to work in case of lower-case-table-names==2 in a mistake.
Restoring the original check_db_name() compatible behavior
(but without re-using check_db_name() itself).
- MDEV-31978 erroneously added a wrong DBUG_ASSERT. Removing.
Details:
- In mysql_change_db() the database name should be lower-cased only
in case of lower_case_table_names==1. It should not be lower-cased
for lower_case_table_names==2. The problem was caused by MDEV-31948.
The new code version restored the pre-MDEV-31948 behavior, which
used check_db_name() behavior.
- Passing lower_case_table_names==1 instead of just lower_case_table_names
to the "casedn" parameter to DBNameBuffer constructor in sql_parse.cc
The database name should not be lower-cased for lower_case_table_names==2.
This restores pre-MDEV-31982 behavioir which used check_db_name() here.
- Adding a new data type Lex_ident_db_normalized, it stores database
names which are both checked and normalized to lower case
in case lower_case_table_names==1 and lower_case_table_names==2.
- Changing the data type for the "db" parameter to Lex_ident_db_normalized in
lock_schema_name(), lock_db_routines(), find_db_tables_and_rm_known_files().
This is to avoid incorrectly passing a non-normalized name in the future.
- Restoring the database name normalization in mysql_create_db_internal()
and mysql_rm_db_internal() before calling lock_schema_name().
The problem was caused MDEV-31982.
- Adding database name normalization in mysql_alter_db_internal()
and mysql_upgrade_db(). This fixes MDEV-32026.
- Removing a wrong assert in Create_sp_func::create_with_db() was incorrect:
DBUG_ASSERT(Lex_ident_fs(*db).ok_for_lower_case_names());
The database name comes to here checked, but not normalized
to lower case with lower-case-table-names=2.
The assert was erroneously added by MDEV-31978.
- Recording lowercase_tables2.results and lowercase_tables4.results
according to
MDEV-29446 Change SHOW CREATE TABLE to display default collations
These tests are skipped on buildbot on all platforms, so this change
was forgotten in the patch for MDEV-29446.
- Changing the global function ok_for_lower_case_names()
into a method in class Lex_ident_fs.
- Changing a few functions/methods to get the database name
as a "const LEX_CSTRING" instead of a "const char *".
All these functions/methods use ok_for_lower_case_names()
inside. This change helps to avoid new strlen() calls, and also
removes a few old strlen() calls.
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
:: Syntax change ::
Keyword AUTO enables history partition auto-creation.
Examples:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;
Or with explicit partitions:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
(PARTITION p0 HISTORY, PARTITION pn CURRENT);
To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# Disables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;
# Enables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
If the rest of partitioning specification is identical to CREATE TABLE
no repartitioning will be done (for details see MDEV-27328).
:: Description ::
Before executing history-generating DML command (see the list of commands below)
add N history partitions, so that N would be sufficient for potentially
generated history. N > 1 may be required when history partitions are switched
by INTERVAL and current_timestamp is N times further than the interval
boundary of the last history partition.
If the last history partition equals or exceeds LIMIT records then new history
partition is created and selected as the working partition. According to
MDEV-28411 partitions cannot be switched (or created) while the command is
running. Thus LIMIT does not carry strict limitation and the history partition
size must be planned as LIMIT value plus average number of history one DML
command can generate.
Auto-creation is implemented by synchronous fast_alter_partition_table() call
from the thread of the executed DML command before the command itself is run
(by the fallback and retry mechanism similar to Discovery feature,
see Open_table_context).
The name for newly added partitions are generated like default partition names
with extension of MDEV-22155 (which avoids name clashes by extending assignment
counter to next free-enough gap).
These DML commands can trigger auto-creation:
DELETE (including multitable DELETE, excluding DELETE HISTORY)
UPDATE (including multitable UPDATE)
REPLACE (including REPLACE .. SELECT)
INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
LOAD DATA .. REPLACE
:: Bug fixes ::
MDEV-23642 Locking timeout caused by auto-creation affects original DML
The reasons for this are:
- Do not disrupt main business process (the history is auxiliary service);
- Consequences are non-fatal (history is not lost, but comes into wrong
partition; fixed by partitioning rebuild);
- There is more freedom for application to fail in this case or not: it may
read warning info and find corresponding error number.
- While non-failing command is easy to handle by an application and fail it,
the opposite is hard to handle: there is no automatic actions to fix
failed command and retry, DBA intervention is required and until then
application is non-functioning.
MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers
Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it is
not possible in locked tables mode.
LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
of the box as fast_alter_partition_table() can reopen tables via
locked_tables_list.
In LTM_PRELOCKED we reopen and relock table manually.
:: More fixes ::
* some_table_marked_for_reopen flag fix
some_table_marked_for_reopen affets only reopen of
m_locked_tables. I.e. Locked_tables_list::reopen_tables() reopens only
tables from m_locked_tables.
* Unused can_recover_from_failed_open() condition
Is recover_from_failed_open() can be really used after
open_and_process_routine()?
:: Reviewed by ::
Sergei Golubchik <serg@mariadb.org>
Diagnostics_area::set_error_status (interrupted ALTER TABLE under LOCK)
Analysis: KILL_QUERY is not ignored when local memory used exceeds maximum
session memory. Hence the query proceeds, OK is sent and we end up
reopening tables that are marked for reopen. During this, kill status is
eventually checked and assertion failure happens during trying to send error
message because OK has already been sent.
Fix: Ok is already sent so statement has already executed. It is too
late to give error. So ignore kill.
This adds following new thread states:
* waiting to execute in isolation - DDL is waiting to execute in TOI mode.
* waiting for TOI DDL - some other statement is waiting for DDL to complete.
* waiting for flow control - some statement is paused while flow control is in effect.
* waiting for certification - the transaction is being certified.
Use < TL_FIRST_WRITE for determining a READ transaction.
Use TL_FIRST_WRITE as the relative operator replacing TL_WRITE_ALLOW_WRITE
as the minimium WRITE lock type.
This was caused by two different bugs:
1) Information_schema tables where not locked by lock_tables, but
get_lock_data() was not filtering these out. This caused a crash when
mysql_unlock_some_tables() tried to unlock tables early, including
not locked information schema tables.
Fixed by not locking SYSTEM_TMP_TABLES
2) In some cases the optimizer will notice that we do not need to read
the information_schema tables at all. In this case
join_tab->read_record is not set, which caused a crash in
get_schema_tables_result()
Fixed by ignoring const tables in get_schema_tables_result()
The reason why we have wsrep_on() at all is that the macro WSREP(thd)
depends on the definition of THD, and that is intentionally an opaque
data type for InnoDB. So, we cannot avoid invoking wsrep_on(), but
we can evaluate the less expensive conditions thd && WSREP_ON before
calling the function.
Global_read_lock: Use WSREP_NNULL(thd) instead of wsrep_on(thd)
because we not only know the definition of THD but also that
the pointer is not null.
wsrep_open(): Use WSREP(thd) instead of wsrep_on(thd).
InnoDB: Replace thd && wsrep_on(thd) with wsrep_on(thd), now that
the condition has been merged to the definition of the macro
wsrep_on().
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about
deprecated `register` keyword.
Too much warnings came from Mroonga and I gave up on it.
With MDEV-19384 fixed FTWRL releases HANDLER locks early, which allows
concurrent threads to go. Test case may get stuck on FTWRL waiting for
LOCK TABLES.
The deadlock happened between FTWRL under open HANDLER, LOCK TABLE and
DROP DATABASE
Fixed by reverting the previous fix for handler open in
lock_global_read_lock()
Fixed the original (wrong) test case in flush_read_lock.test to be
repeatable.
local_global_read_lock did release all HANDLER's before taking its
MDL_BACKUP_FTWRL# locks. This had a potential race condition if
there was a waiting LOCK TABLE for one of the freed handlers.
Fixed by moving the release of HANDLER's to after the backup locks are
taken.
After commit 8cf7e3459 it's not anymore critical to free HANDLER's in
FTWRL, but we will keep the mysql_ha_cleanup_no_free() call until 10.5
to not cause any issues with 10.4 just before it's going GA.