If async replication slave thread conflicts with cluster replication,
then the async slave transaction should be BF aborted, and depending on the
state of async slave transaction execution, potentially also replayed.
There were problems in such BF abort implementation and the replaying was not
started.
This pull request contains fixes which make sure that if async slave thread is
marked to abort and replay, it will complete carry out the rollback and
release all locks and resources before starting the replaying. After replaying,
async slave transactions is treated as successful, so the slave thread will
continue as usual, handling next replication event.
There is also new mtr test: galera.galera_slave_replay, which stresses both a
certification failure for async slave thread and a successful BF abort
followed by replaying.
For release builds, do not declare unused variables.
unpack_row(): Omit a debug-only variable from WSREP diagnostic message.
create_wsrep_THD(): Fix -Wmaybe-uninitialized for the PSI_thread_key.
Galera threads were not registered to performance schema and
used pthread_create when mysql_thread_create should have been
used.
Added test case to verify current galera performance schema
instrumentation does work.
Problem was that tests select INFORMATION_SCHEMA.PROCESSLIST processes
from user system user and empty state. Thus, there is not clear
state for slave threads.
Changes:
- Added new status variables that store current amount of applier threads
(wsrep_applier_thread_count) and rollbacker threads
(wsrep_rollbacker_thread_count). This will make clear how many slave threads
of certain type there is.
- Added THD state "wsrep applier idle" when applier slave thread is
waiting for work. This makes finding slave/applier threads easier.
- Added force-restart option for mtr to always restart servers between tests
to avoid race on start of the test
- Added wait_condition_with_debug to wait until the passed statement returns
true, or the operation times out. If operation times out, the additional error
statement will be executed
Changes to be committed:
new file: mysql-test/include/force_restart.inc
new file: mysql-test/include/wait_condition_with_debug.inc
modified: mysql-test/mysql-test-run.pl
modified: mysql-test/suite/galera/disabled.def
modified: mysql-test/suite/galera/r/MW-336.result
modified: mysql-test/suite/galera/r/galera_kill_applier.result
modified: mysql-test/suite/galera/r/galera_var_slave_threads.result
new file: mysql-test/suite/galera/t/MW-336.cnf
modified: mysql-test/suite/galera/t/MW-336.test
modified: mysql-test/suite/galera/t/galera_kill_applier.test
modified: mysql-test/suite/galera/t/galera_parallel_autoinc_largetrx.test
modified: mysql-test/suite/galera/t/galera_parallel_autoinc_manytrx.test
modified: mysql-test/suite/galera/t/galera_var_slave_threads.test
modified: mysql-test/suite/wsrep/disabled.def
modified: mysql-test/suite/wsrep/r/variables.result
modified: mysql-test/suite/wsrep/t/variables.test
modified: sql/mysqld.cc
modified: sql/wsrep_mysqld.cc
modified: sql/wsrep_mysqld.h
modified: sql/wsrep_thd.cc
modified: sql/wsrep_var.cc
This reverts commit 21b2fada7a
and commit 81d71ee6b2.
The MDEV-18464 change introduces a few data race issues. Contrary to
the documentation, the field trx_t::victim is not always being protected
by lock_sys_t::mutex and trx_t::mutex. Most importantly, it seems
that KILL QUERY could wrongly avoid acquiring both mutexes when
invoking lock_trx_handle_wait_low(), in case another thread had
already set trx->victim=true.
We also revert MDEV-12009, because it should depend on the MDEV-18464
fix being present.
As noted on kill_one_thread SUPER should be able to kill even
system threads i.e. threads/query flagged as high priority or
wsrep applier thread. Normal user, should not able to kill
threads/query flagged as high priority (BF) or wsrep applier
thread.
This patch contains a fix for the MDEV-17262/17243 issues and
new mtr test.
These issues (MDEV-17262/17243) have two reasons:
1) After an intermediate commit, a transaction loses its status
of "transaction that registered in the MySQL for 2pc coordinator"
(in the InnoDB) due to the fact that since version 10.2 the
write_row() function (which located in the ha_innodb.cc) does
not call trx_register_for_2pc(m_prebuilt->trx) during the processing
of split transactions. It is necessary to restore this call inside
the write_row() when an intermediate commit was made (for a split
transaction).
Similarly, we need to set the flag of the started transaction
(m_prebuilt->sql_stat_start) after intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) called from the
wsrep_load_data_split() function (which located in sql_load.cc)
will also do this, but it will be too late. As a result, the call
to the wsrep_append_keys() function from the InnoDB engine may be
lost or function may be called with invalid transaction identifier.
2) If a transaction with the LOAD DATA statement is divided into
logical mini-transactions (of the 10K rows) and binlog is rotated,
then in rare cases due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
splitting of the LOAD DATA into mini-transactions is technical,
I believe that we should not allow these mini-transactions to fall
into separate binlogs. Therefore, it is necessary to prohibit the
rotation of binlog in the middle of processing LOAD DATA statement.
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
Problem was that we skipped background persistent statistics calculation
on applier nodes if thread is marked as high priority (a.k.a BF).
However, on applier nodes all DDL which is replicate will be executed
as high priority i.e BF.
Fixed by allowing background persistent statistics calculation on
applier nodes even when thread is marked as BF. This could lead
BF lock waits but for queries on that node needs that statistics.
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
These test can sporadically show mutex deadlock warnings between LOCK_wsrep_thd
and LOCK_thd_data mutexes. This means that these mutexes can be locked in opposite
order by different threads, and thus result in deadlock situation.
To fix such issue, the locking policy of these mutexes should be revised and
enforced to be uniform. However, a quick code review shows that the number of
lock/unlock operations for these mutexes combined is between 100-200, and all these
mutex invocations should be checked/fixed.
On the other hand, it turns out that LOCK_wsrep_thd is used for protecting access to
wsrep variables of THD (wsrep_conflict_state, wsrep_query_state), whereas LOCK_thd_data
protects query, db and mysys_var variables in THD. Extending LOCK_thd_data to protect
also wsrep variables looks like a viable solution, as there should not be a use case
where separate threads need simultaneous access to wsrep variables and THD data variables.
In this commit LOCK_wsrep_thd mutex is refactored to be replaced by LOCK_thd_data.
By bluntly replacing LOCK_wsrep_thd by LOCK_thd_data, will result in double locking
of LOCK_thd_data, and some adjustements have been performed to fix such situations.
The problem was that the introduction of max-thread-mem-used can cause
an allocation error very early, even before mysql_parse() is called.
As mysql_parse() calls thd->reset_for_next_command(), which called
clear_error(), the error number was lost.
Fixed by adding an option to have unique messages for each KILL
signal and change max-thread-mem-used to use this new feature.
This removes a lot of problems with the original approach, where
one could get errors signaled silenty almost any time.
ixed by moving clear_error() from reset_for_next_command() to
do_command(), before any memory allocation for the thread.
Related changes:
- reset_for_next_command() now have an optional parameter if we should
call clear_error() or not. By default it's called, but not anymore from
dispatch_command() which was the original problem.
- Added optional paramater to clear_error() to force calling of
reset_diagnostics_area(). Before clear_error() only called
reset_diagnostics_area() if there was no error, so we normally
called reset_diagnostics_area() twice.
- This change removed several duplicated calls to clear_error()
when starting a query.
- Reset max_mem_used on COM_QUIT, to protect against kill during
quit.
- Use fatal_error() instead of setting is_fatal_error (cleanup)
- Set fatal_error if max_thead_mem_used is signaled.
(Same logic we use for other places where we are out of resources)
Do not silence uncertain cases, or fix any bugs.
The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
Merge feature into 10.2 from feature branch.
Delayed replication adds an option
CHANGE MASTER TO master_delay=<seconds>
Replication will then delay applying events with that many
seconds. This creates a replication slave that reflects the state of
the master some time in the past.
Feature is ported from MySQL source tree.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
- popping PS reprepare observer before BF aborted PS replaying begins
dangling observer will cause failure in open_table() ater on
- test case for this anomaly
Transaction replay causes the THD to re-apply the replication
events from execution, using the same path appliers do. While
applying the log events, the THD's timestamp is set to the
timestamp of the event.
Setting the timestamp explicitly causes function NOW() to
always the timestamp that was set. To avoid this behavior we
reset the timestamp after replaying is done.
mysqld maintains a list of TABLE objects for all temporary
tables created within a session in THD. Here each table is
represented by a TABLE object.
A query referencing a particular temporary table for more
than once, however, failed with ER_CANT_REOPEN_TABLE error
because a TABLE_SHARE was allocate together with the TABLE,
so temporary tables always had only one TABLE per TABLE_SHARE.
This patch lift this restriction by separating TABLE and
TABLE_SHARE objects and storing TABLE_SHAREs for temporary
tables in a list in THD, and TABLEs in a list within their
respective TABLE_SHAREs.
different fix. remove old ones, wait for THD to be fully
initialized before continuing with the server startup process.
This reverts commits db2e21b, 13615c5, 3f515a0, 70113ee.
* Wait for aborted thd (victim) to release MDL locks
* Skip aborting an already aborted thd
* Defer setting OK status in case of CTAS
* Minor cosmetic changes
* Added a test case
Annotate_rows event needs to be preserved until the last Rows event has
been applied because after it has been applied thd->query points to the
query stored inside this event.