- Added some extra command to rpl_start_stop to ensure that the
IO thread has connected to the master before we shut down the server.
- if signal returns signalhandler_t, use this with the alarm code
- Added missing tests to sys_vars
- Fixed some possible overflow bugs in tabxml.cpp
Problem is that FLUSH TABLES WITH READ LOCK first blocks threads from
starting new commits, then waits for running commits to complete. But
in-order parallel replication needs commits to happen in a particular
order, so this can easily deadlock.
To fix this problem, this patch introduces a way to temporarily pause
the parallel replication worker threads. Before starting FTWRL, we let
all worker threads complete in-progress transactions, and then
wait. Then we proceed to take the global read lock. Once the lock is
obtained, we unpause the worker threads. Now commits are blocked from
starting by the global read lock, so the deadlock will no longer occur.
(even when configured with --binlog-format=statement).
Before we got an error on the slave and the slave stopped if the master
was configured with --binlog-format=mixed or --binlog-format=row.
Use sync_with_master_gtid.inc instead of --sync_with_master. The latter is
not correct because the test case uses RESET MASTER; this invalidates the
existing binlog positions on the slave.
In this particular case, there was a small window where --sync_with_master
could trigger too early (on the old position), causing the test case to miss
one event.
The code was using the wrong variable when comparing the binlog name
for the UNTIL position. This could cause the comparison to fail after
binlog rotation, in turn causing the UNTIL clause to not trigger slave
stop.
The assertion is there to catch cases where we rollback while
mark_start_commit() is active. This can allow following event groups
to be replicated too early, causing conflicts.
But in this case, we have an _explicit_ ROLLBACK event in the binlog,
which should not assert.
We fix this by delaying the mark_start_commit() in the explicit
ROLLBACK case. It seems safest to delay this in ROLLBACK case anyway,
and there should be no reason to try to optimise this corner case.
This bug is essentially another variant of MDEV-7458.
If a transaction conflict caused a deadlock kill of T2 in record_gtid()
during commit, the code would do a rollback _before_ running
rgi->unmark_start_commit(). This creates a race where following transactions
could start too early (before T2 has completed its transaction retry). This
in turn could lead to replication failure, if there was a conflict that
caused eg. duplicate key error or similar.
The fix is to remove these rollbacks (in Query_log_event::do_apply_event()
and Xid_log_event::do_apply_event(). They seem out-of-place; code in
log_event.cc generally does not roll back on error, this is handled higher
up.
In addition, because of the extreme difficulty of reproducing bugs like
MDEV-7458 and MDEV-8302, this patch adds some extra precations to try to
detect (in debug builds) or prevent (in release builds) similar bugs.
ha_rollback_trans() will now call unmark_start_commit() if needed (and
assert in debug build when a caller does rollback without unmark first).
We also add an extra check for thd->killed() so that we avoid doing
mark_start_commit() if we already have a pending deadlock kill.
And we add a missing unmark_start_commit() call in the error case, found by
the above assertion.
Fix was to add a test in Query_log_event::Query_log_event() if we are using
CREATE ... SELECT and in this case use trans cache, like we do on the master.
This avoid using (with doesn't have checksum)
Other things:
- Removed dummy call my_checksum(0L, NULL, 0)
- More DBUG_PRINT
- Cleaned up Log_event::need_checksum() to make it more readable (similar as in MySQL 5.6)
- Renamed variable that was hiding another one in create_table_imp()
The fix is that if the slave has a different integer size than
the master, then they will assume the master has the same signed/unsigned modifier
as the slave.
This means that one can safely change a coon the slave an int to a bigint
or an unsigned int to an unsigned int. Changing an unsigned int to an
signed bigint will cause replication failures when the high bit of the
unsigned int is set.
We can't give an error if the signess is different on the master and slave
as the binary log doesn't contain the signess of the column on the master.
Changing the error message to:
"...from type 'decimal(0,?)/*old*/' to type ' 'decimal(10,7)'..."
So it's now clear that the master data type is OLD decimal.
When the slave processes the master restart format_description event,
parallel replication needs to complete any prior events before processing
the restart event (which closes temporary tables and such stuff).
This happens in wait_for_workers_idle(), however it was not waiting long
enough. The wait was using wait_for_prior_commit(), but at that points table
can still be open. This lead to assertion in this case.
So change wait_for_workers_idle() to wait until all worker threads have
reached finish_event_group(), at which point all tables should have been
closed.
1. After a period of wait (where last_master_timestamp=0)
do NOT restore the last_master_timestamp to the timestamp
of the last executed event (which would mean we've just
executed it, and we're that much behind the master).
2. Update last_master_timestamp before executing the event,
not after.
Take the approach from the this commit (but with a different test
case that actually makes sense):
commit 0c75ab453fb8c5439576af8fe5add7a1b89f1569
Author: Luis Soares <luis.soares@sun.com>
Date: Thu Apr 15 17:39:31 2010 +0100
BUG#52166: Seconds_Behind_Master spikes after long idle period
The slave SQL thread was clearing serial_rgi->thd before deleting
serial_rgi, which could cause access to NULL THD.
The clearing was introduced in commit
2e100cc5a4 and is just plain wrong. So revert
that part (single line) of that commit.
Thanks to Daniel Black for bug analysis and test case.
Three-way deadlock:
T1: SHOW GLOBAL STATUS
-> acquire LOCK_status
T2: STOP SLAVE
-> acquire LOCK_active_mi
-> terminate_slave_thread()
-> -> cond_timedwait for handle_slave_sql to stop
T3: sql slave thread (same applies to io thread)
-> handle_slave_sql(), when exiting
-> -> THD::add_status_to_global()
-> -> -> wait for LOCK_status...
T1: SHOW GLOBAL STATUS
-> for "Slave_heartbeat_period" status variable
-> -> show_heartbeat_period()
-> -> -> wait for LOCK_active_mi
cherry-pick from 5.6:
commit fc8b395898f40387b3468122bd0dae31e29a6fde
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date: Wed Jun 12 21:41:05 2013 +0530
BUG#16904035-SHOW STATUS - EXCESSIVE LOCKING ON LOCK_ACTIVE_MI AND
ACTIVE_MI->RLI->DATA_LOCK
Problem: Excessive locking on lock_active_mi and rli->data_lock
while executing any `show status like 'X'` command.
Analysis: SHOW_FUNCs for Slave_running, Slave_retried_transactions,
Slave_heartbeat_period, Slave_received_heartbeats,
Slave_last_heartbeat are acquiring lock_active_mi and rli->data_lock
to show their variable value. It is ok to show stale data while showing
the status variables i.e., even if they miss one update, it will
not cause any great trouble.
Fix: Remove the locks from the above mentioned SHOW_FUNC functions.
Add a test case
There was a rare race, where a deadlock error might not be correctly
handled, causing the slave to stop with something like this in the error
log:
150423 14:04:10 [ERROR] Slave SQL: Connection was killed, Gtid 0-1-2, Internal MariaDB error code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001 position 1234
The problem was incorrect error handling. When a deadlock is detected, it
causes a KILL CONNECTION on the offending thread. This error is then later
converted to a deadlock error, and the transaction is retried.
However, the deadlock error was not cleared at the start of the retry, nor
was the lingering kill signal. So it was possible to get another deadlock
kill early during retry. If this happened with particular thread
scheduling/timing, it was possible that the new KILL CONNECTION error was
masked by the earlier deadlock error, so that the second kill was not
properly converted into a deadlock error and retry.
This patch adds code that clears the old error and killed flag before
starting the retry. It also adds code to handle a deadlock kill caught in a
couple of places where it was not handled before.
This was a regression from the patch for MDEV-7668.
A test was incorrect, so the slave would not properly handle re-using
temporary tables, which lead to replication failure in this case.
Make sure that in parallel replication, we execute wait_for_prior_commit()
before setting table->in_use for a temporary table. Otherwise we can end up
with two parallel replication worker threads competing with each other for
use of a temporary table.
Re-factor the use of find_temporary_table() to be able to handle errors
in the caller (as wait_for_prior_commit() can return error in case of
deadlock kill).
[This commit cherry-picked to be able to merge MDEV-7936, of which it
is a pre-requisite, into both 10.0 and 10.1.]
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.
In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.
More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.
This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.
(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).
Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.
This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
Fix a race in the test case. When we do start_slave.inc immediately
followed by stop_slave.inc, it is possible to kill the IO thread while
it is still running inside get_master_version_and_clock(), and this
gives warnings in the error log that cause the test to fail.
The hangs occur when the group_commit_orderer object is freed before the last
mark_start_commit() call on it - this loses the wakeup to other waiting worker
threads, causing them to hang until killed manually.
The object was freed because wakeup_subsequent_commits() was called two early
in two places. For MDEV-7888, during ANALYZE TABLE, and for MDEV-7929 during
record_gtid() after processing a DDL event. The group_commit_orderer object
can be freed when its last transaction has called wait_for_prior_commit().
Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits()
that can be used in places where a transaction is committed without this being
the commit of the actual replication event group.
Also add a protection mechanism (that asserts in debug builds) which can
prevent the too-early free and hang if other similar bugs should remain in
other parts of the code.
This patch fixes a bug in the error handling in parallel replication, when one
worker thread gets a failure and other worker threads processing later
transactions have to rollback and abort.
The problem was with the lifetime of group_commit_orderer objects (GCOs).
A GCO is freed when we register that its last event group has committed. This
relies on register_wait_for_prior_commit() and wait_for_prior_commit() to
ensure that the fact that T2 has committed implies that any earlier T1 has
also committed, and can thus no longer execute mark_start_commit().
However, in the error case, the code was skipping the
register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus
commit ordering was not guaranteed, and a GCO could be freed too early. Then a
later mark_start_commit() would reference deallocated GCO, which could lead to
lost wakeup (causing slave threads to hang) or other corruption.
This patch makes also the error case respect commit order. This way, also the
error case gets the GCO lifetime correct, and the hang no longer occurs.
BINLOGGED INCORRECTLY - BREAKS A SLAVE
Submitted a incomplete patch with my previous push,
re submitting the extra changes the required to make
the patch complete.
Analysis:
In row based replication, Master does not send temp table information
to Slave. If there are any DDLs that involves in regular table that needs
to be sent to Slave and a temp tables (which will not be available at Slave),
the Master rewrites the query replacing temp table with it's defintion.
Eg: create table regular_table like temptable.
In rewrite logic, server is ignoring the database of regular table
which can cause problems mentioned in this bug.
Fix: dont ignore database information (if available) while
rewriting the query
Delay spawning parallel replication worker threads until a slave SQL
thread is running, and de-spawn them when the last SQL thread stops.
This is especially useful to avoid needless threads on a master in a
setup where same my.cnf is used on masters and slaves.
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.
In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.
More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.
This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.
(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).
Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.
This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.