1. After a period of wait (where last_master_timestamp=0)
do NOT restore the last_master_timestamp to the timestamp
of the last executed event (which would mean we've just
executed it, and we're that much behind the master).
2. Update last_master_timestamp before executing the event,
not after.
Take the approach from this commit (but with a different test
case that actually makes sense):
commit 0c75ab453fb8c5439576af8fe5add7a1b89f1569
Author: Luis Soares <luis.soares@sun.com>
Date: Thu Apr 15 17:39:31 2010 +0100
BUG#52166: Seconds_Behind_Master spikes after long idle period
The slave SQL thread was clearing serial_rgi->thd before deleting
serial_rgi, which could cause access to NULL THD.
The clearing was introduced in commit
2e100cc5a4 and is just plain wrong. So revert
that part (single line) of that commit.
Thanks to Daniel Black for bug analysis and test case.
There was a rare race, where a deadlock error might not be correctly
handled, causing the slave to stop with something like this in the error
log:
150423 14:04:10 [ERROR] Slave SQL: Connection was killed, Gtid 0-1-2, Internal MariaDB error code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 1234
The problem was incorrect error handling. When a deadlock is detected, it
causes a KILL CONNECTION on the offending thread. This error is then later
converted to a deadlock error, and the transaction is retried.
However, the deadlock error was not cleared at the start of the retry, nor
was the lingering kill signal. So it was possible to get another deadlock
kill early during retry. If this happened with particular thread
scheduling/timing, it was possible that the new KILL CONNECTION error was
masked by the earlier deadlock error, so that the second kill was not
properly converted into a deadlock error and retry.
This patch adds code that clears the old error and killed flag before
starting the retry. It also adds code to handle a deadlock kill caught in a
couple of places where it was not handled before.
This was a regression from the patch for MDEV-7668.
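As a rough illustration only (hypothetical C++ names and types, not the actual
server code), the retry preparation added by this patch amounts to:

  #include <atomic>

  // Hypothetical stand-ins for the real THD / worker state.
  struct WorkerState
  {
    std::atomic<bool> killed{false};  // set asynchronously by a deadlock kill
    int last_error= 0;                // e.g. "connection killed" or "deadlock"
  };

  // Called at the start of each retry of an event group.
  static void prepare_for_retry(WorkerState *w)
  {
    w->last_error= 0;          // clear the stale error from the failed attempt
    w->killed.store(false);    // clear the lingering kill signal
    // A *new* kill arriving during this retry is now seen on its own and can
    // be converted into a deadlock error and a further retry, instead of
    // being masked by the old error.
  }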
A test was incorrect, so the slave would not properly handle re-using
temporary tables, which led to replication failure in this case.
Make sure that in parallel replication, we execute wait_for_prior_commit()
before setting table->in_use for a temporary table. Otherwise we can end up
with two parallel replication worker threads competing with each other for
use of a temporary table.
Re-factor the use of find_temporary_table() to be able to handle errors
in the caller (as wait_for_prior_commit() can return error in case of
deadlock kill).
[This commit cherry-picked to be able to merge MDEV-7936, of which it
is a pre-requisite, into both 10.0 and 10.1.]
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.
In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as an INSERT into that table. Thus, a
lower-level slave could attempt to run them in parallel and get an error.
More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.
This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.
(More fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).
Note that row-based replication is not affected, as it does not use any
temporary tables on the slave side.
This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
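A minimal sketch of the serialisation point described above, using
hypothetical stand-in types (the real code works on THD / rpl_group_info and
its wait_for_prior_commit()):

  // Hypothetical, simplified types for illustration only.
  struct rgi_sketch
  {
    // Block until all earlier transactions in the batch have committed.
    // Returns 0 on success, non-zero if the wait was aborted (deadlock kill).
    int wait_for_prior_commit() { return 0; }
  };

  struct tmp_table_sketch { void *in_use= nullptr; };

  // Use of a temporary table from a parallel replication worker thread:
  static int use_temporary_table(rgi_sketch *rgi, tmp_table_sketch *t, void *thd)
  {
    if (int err= rgi->wait_for_prior_commit())  // serialise behind earlier trx
      return err;                               // propagate deadlock-kill error
    t->in_use= thd;                             // only now claim the table
    return 0;
  }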
Fix a race in the test case. When we do start_slave.inc immediately
followed by stop_slave.inc, it is possible to kill the IO thread while
it is still running inside get_master_version_and_clock(), and this
gives warnings in the error log that cause the test to fail.
The hangs occur when the group_commit_orderer object is freed before the last
mark_start_commit() call on it - this loses the wakeup to other waiting worker
threads, causing them to hang until killed manually.
The object was freed because wakeup_subsequent_commits() was called too early
in two places: for MDEV-7888, during ANALYZE TABLE, and for MDEV-7929, during
record_gtid() after processing a DDL event. The group_commit_orderer object
can be freed when its last transaction has called wait_for_prior_commit().
Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits()
that can be used in places where a transaction is committed without this being
the commit of the actual replication event group.
Also add a protection mechanism (that asserts in debug builds) which can
prevent the too-early free and hang if other similar bugs should remain in
other parts of the code.
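The suspend/resume idea can be pictured roughly like this (hypothetical C++
names, not the real wait_for_commit code):

  // Hypothetical sketch of deferring wakeups during "internal" commits.
  struct wakeup_sketch
  {
    bool suspended= false;  // set while committing something that is not the
                            // end of the replication event group
    bool pending= false;    // a wakeup was requested while suspended

    void suspend() { suspended= true; }

    void resume()
    {
      suspended= false;
      if (pending) { pending= false; do_wakeup(); }
    }

    // Called from commit paths such as ANALYZE TABLE or record_gtid():
    void wakeup_subsequent_commits()
    {
      if (suspended) { pending= true; return; }  // defer until resume()
      do_wakeup();
    }

    void do_wakeup() { /* signal worker threads waiting on this commit */ }
  };

While suspended, a commit done as part of processing the event group cannot
prematurely wake up later transactions (and thereby allow the
group_commit_orderer to be freed); the real wakeup happens only when the
event group itself finishes.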
This patch fixes a bug in the error handling in parallel replication, when one
worker thread gets a failure and other worker threads processing later
transactions have to rollback and abort.
The problem was with the lifetime of group_commit_orderer objects (GCOs).
A GCO is freed when we register that its last event group has committed. This
relies on register_wait_for_prior_commit() and wait_for_prior_commit() to
ensure that the fact that T2 has committed implies that any earlier T1 has
also committed, and can thus no longer execute mark_start_commit().
However, in the error case, the code was skipping the
register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus
commit ordering was not guaranteed, and a GCO could be freed too early. Then a
later mark_start_commit() would reference a deallocated GCO, which could lead to
lost wakeup (causing slave threads to hang) or other corruption.
This patch makes the error case also respect commit order. This way, the
error case also gets the GCO lifetime correct, and the hang no longer occurs.
Delay spawning parallel replication worker threads until a slave SQL
thread is running, and de-spawn them when the last SQL thread stops.
This is especially useful to avoid needless threads on a master in a
setup where same my.cnf is used on masters and slaves.
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.
In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as an INSERT into that table. Thus, a
lower-level slave could attempt to run them in parallel and get an error.
More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.
This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.
(More fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).
Note that row-based replication is not affected, as it does not use any
temporary tables on the slave side.
This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
The binlog contains specially marked format description events to mark
when a master restart happened (which could have caused temporary
tables to be silently dropped). Such events also cause the slave to close
temporary tables.
However, there was a bug: if, after this, the slave re-connects to the
master in GTID mode, the master can send an old format description
event again. If temporary tables are closed when such an event is seen
for the second time, it might drop temporary tables created after that
event and cause replication failure.
With this patch, the restart flag of the format description event is
cleared by the master when it is sent to the slave in a subsequent
connection, to avoid the erroneous temp table close.
The problem occurs in parallel replication in GTID mode, when we are using
multiple replication domains. In this case, if the SQL thread stops, the
slave GTID position may refer to a different point in the relay log for each
domain.
The bug was that when the SQL thread was stopped and restarted (but the IO
thread was kept running), the SQL thread would resume applying the relay log
from the point of the most advanced replication domain, silently skipping all
earlier events within other domains. This caused replication corruption.
This patch solves the problem by storing, when the SQL thread stops with
multiple parallel replication domains active, the current GTID
position. Additionally, the current position in the relay logs is moved back
to a point known to be earlier than the current position of any replication
domain. Then when the SQL thread restarts from the earlier position, GTIDs
encountered are compared against the stored GTID position. Any GTID that was
already applied before the stop is skipped to avoid duplicate apply.
This patch should have no effect if multi-domain GTID parallel replication is
not used. Similarly, if both SQL and IO thread are stopped and restarted, the
patch has no effect, as in this case the existing relay logs are removed and
re-fetched from the master at the current global @@gtid_slave_pos.
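A simplified sketch of the skip logic, with hypothetical types standing in for
the GTID state saved when the SQL thread stopped:

  #include <cstdint>
  #include <map>

  // Hypothetical, simplified GTID and saved-position types.
  struct Gtid { uint32_t domain_id; uint32_t server_id; uint64_t seq_no; };

  // Highest seq_no applied in each replication domain at the time of the stop.
  using SavedPos= std::map<uint32_t, uint64_t>;

  // True if this event group was already applied before the stop and must be
  // skipped while re-reading the relay log from the earlier position.
  static bool already_applied(const SavedPos &saved, const Gtid &gtid)
  {
    auto it= saved.find(gtid.domain_id);
    return it != saved.end() && gtid.seq_no <= it->second;
  }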
When the server starts up, check if the master-bin.state file was lost.
If it was, recover its contents by scanning the last binlog file, thus
avoiding running with a corrupt binlog state.
If somehow the COMMIT or XID event in an event group was missing, the code in
parallel replication to handle this was not sufficient, leading to server
deadlock.
cherry-pick the upstream fix
commit d4ba10184cd7bde9c31c610e664ecd0c93605c46
Author: Sujatha Sivakumar <sujatha.sivakumar@oracle.com>
Date: Wed Jul 2 11:34:11 2014 +0530
Bug#17453826:ASSERTION ERROR WHEN SETTING FUTURE BINLOG
FILE/POS WITH SEMISYNC
Problem:
========
When DMLs are in progress on the master, stopping a slave and
setting ahead binlog name/pos will cause an assert on the
master.
...
The bug occurs when a transaction does a retry after all transactions have
done mark_start_commit() in a batch of group commit from the master. In this
case, the retrying transaction can unmark_start_commit() after the following
batch has already started running and de-allocated the GCO. Then after retry,
the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
wakeup of later GCOs can be lost.
This was seen "in the wild" by a user, even though it is not known exactly
what circumstances can lead to retry of one transaction after all transactions
in a group have reached the commit phase.
The handling of the GCO lifetime was somewhat clunky anyway. With this patch, a GCO
lives until rpl_parallel_entry::last_committed_sub_id has reached the last
transaction in the GCO. This guarantees that the GCO will still be alive when
a transaction does mark_start_commit(). Also, we now loop over the list of
active GCOs for wakeup, to ensure we do not lose a wakeup even in the
problematic case.
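Roughly, the new wakeup handling can be pictured like this (hypothetical C++
names, not the actual rpl_parallel.cc code):

  #include <cstdint>

  // Hypothetical, simplified group_commit_orderer.
  struct gco_sketch
  {
    uint64_t wait_count= 0;        // all sub_ids below this must have committed
    bool started= false;           // workers in this GCO have been released
    gco_sketch *next_gco= nullptr;
    void wake_waiters() { /* broadcast to workers waiting on this GCO */ }
  };

  // Called whenever last_committed_sub_id advances: walk *all* still-active
  // GCOs, so a wakeup cannot be lost even if one transaction is retrying.
  static void wakeup_ready_gcos(gco_sketch *first_active, uint64_t last_committed_sub_id)
  {
    for (gco_sketch *g= first_active; g; g= g->next_gco)
    {
      if (!g->started && g->wait_count <= last_committed_sub_id)
      {
        g->started= true;
        g->wake_waiters();
      }
    }
    // A GCO is freed only once last_committed_sub_id has passed its last
    // transaction, so a late mark_start_commit() still finds it allocated.
  }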
Use include/sync_with_master_gtid.inc instead of --sync_with_master to avoid a
race in the test case.
In parallel replication, the old-style slave position (which is used by
--sync_with_master) is updated out-of-order between parallel threads. This
makes it possible for the position to be updated past DROP TEMPORARY TABLE t2
just before the commit of INSERT INTO t1 SELECT * FROM t2 becomes visible.
In this case, there is a small window where a SELECT just after
--sync_with_master may not see the changes from the INSERT.
Fix:
===
Backport Bug#11756194 to mysql-5.5. slave breaks if
'drop database' fails on master and mismatched tables on
slave.
'DROP TABLE <deleted tables>' was binlogged when
'DROP DATABASE' failed and at least one table was deleted
from the database. The log event would make the slave SQL thread
stop if some of the tables did not exist on the slave.
After this patch, it is always binlogged with the 'IF EXISTS'
option.
The problem was that repair() locked and unlocked tables, which left already locked tables in a wrong state
include/my_check_opt.h:
Added option T_NO_LOCKS to disable locking during repair()
Fixed duplicated bit T_NO_CREATE_RENAME_LSN
mysql-test/suite/rpl/r/myisam_external_lock.result:
Test case for MDEV-6871
mysql-test/suite/rpl/t/myisam_external_lock-slave.opt:
Test case for MDEV-6871
mysql-test/suite/rpl/t/myisam_external_lock.test:
Test case for MDEV-6871
storage/maria/ha_maria.cc:
Don't lock tables during enable_indexes()
Removed some calls to current_thd
storage/myisam/ha_myisam.cc:
Don't lock tables during enable_indexes()
Removed some calls to current_thd
There was a race. The test case was expecting the slave to start processing a
particular DELETE statement, then the test would stop the slave at this
point. But the test did not wait for the slave to actually
reach this point; thus, depending on timing, it was possible that the slave
would be stopped too early, causing a .result file difference.
Fixed by adding an appropriate wait to the test case.
Fix rare failures in test case rpl.rpl_gtid_basic:
- Add another possible error code when a connection is killed.
- Make sure that the IO thread has had time to complete its stop after START
SLAVE UNTIL. Otherwise, START SLAVE might run before IO thread stop,
leaving the test case with a stopped IO thread that eventually causes a
wait timeout.
There was a race, a small window between updating slave position and updating
Seconds_Behind_Master, during which the test case could see the wrong value.
Fix by waiting for the expected status to appear.
The replication relay log position was sometimes updated incorrectly at the
end of a transaction in parallel replication. This happened because the relay
log file name was taken from the current Relay_log_info (SQL driver thread),
not the correct value for the transaction in question.
The result was that if a transaction was applied while the SQL driver thread
was at least one relay log file ahead, _and_ the SQL thread was subsequently
stopped before applying any events from the most recent relay log file, then
the relay log position would be incorrect - wrong relay log file name. Thus,
when the slave was started again, usually a relay log read error would result,
or in rare cases, if the position happened to be readable, the slave might
even skip arbitrary amounts of events.
In GTID mode, the relay log position is reset when both slave threads are
restarted, so this bug would only be seen in non-GTID mode, or in GTID mode
when only the SQL thread, not the IO thread, was stopped.
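The idea of the fix, sketched with hypothetical types (not the real
rpl_group_info / Relay_log_info classes): the end-of-transaction position
update must take the relay log file name from the transaction being
committed, not from the SQL driver thread:

  #include <cstdint>
  #include <string>

  struct trx_state { std::string relay_log_name; uint64_t end_pos= 0; };
  struct driver_state
  {
    std::string current_relay_log_name;  // may already be one or more files ahead
    std::string group_relay_log_name;    // persisted restart position (file)
    uint64_t    group_relay_log_pos= 0;  // persisted restart position (offset)
  };

  static void update_relay_position(driver_state *rli, const trx_state *trx)
  {
    rli->group_relay_log_name= trx->relay_log_name;  // not current_relay_log_name
    rli->group_relay_log_pos = trx->end_pos;
  }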
I saw two test failures in rpl.rpl_gtid_crash where we get this in the error
log:
141123 12:47:54 [Note] InnoDB: Restoring possible half-written data pages
141123 12:47:54 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Warning: database page corruption or a failed
InnoDB: file read of space 6 page 3.
InnoDB: Trying to recover it from the doublewrite buffer.
141123 12:47:54 [Note] InnoDB: Recovered the page from the doublewrite buffer.
This test case deliberately crashes the server, and if this crash happens
right in the middle of writing a buffer pool page to disk, it is not
unexpected that we can get a half-written page. The page is recovered
correctly from the doublewrite buffer.
So this patch adds a suppression for this warning in the error log for this
test case.
When a master server restarts, it logs a special restart format description
event in its binlog. When the slave sees this event, it knows it needs to roll
back any active partial transaction, in case the master crashed previously in
the middle of writing such a transaction to its binlog.
However, there was a bug where this rollback did not reset rgi->pending_gtid.
This caused the @@gtid_slave_pos to be updated incorrectly with the GTID of
the partial transaction that was rolled back.
Fix this by always clearing rgi->pending_gtid in cleanup_context(), hopefully
preventing similar bugs from turning up in other special cases where a
transaction is rolled back during replication.
Thanks to Pavel Ivanov for tracking down the issue and providing a test case.
The real problem here was inconsistent handling of entry->commit_errno in
MYSQL_BIN_LOG::write_transaction_or_stmt(). Some return paths were setting it
to the value of errno, some were not. And the setting was redundant anyway,
as it is set consistently by the caller.
Fix by consistently setting it in the caller, and not in each return path in
the function.
The test failure happened because a DBUG_EXECUTE_IF() used in the test case
set an entry->commit_errno that was immediately overwritten in the caller with
whatever happened to be the value of errno. This could lead to a different
error message in the .result file.
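Sketched with hypothetical names, the pattern of the fix is simply:

  #include <cerrno>

  // Illustrative only; names are not from the server source.
  struct group_commit_entry_sketch { int commit_errno= 0; };

  // Callee: report success/failure, but never touch entry->commit_errno.
  static bool write_transaction_sketch(group_commit_entry_sketch *)
  {
    // ... every error path just does "return true;" ...
    return false;  // success
  }

  // Caller: record errno in exactly one place, the same way for all failures.
  static void commit_caller(group_commit_entry_sketch *entry)
  {
    if (write_transaction_sketch(entry))
      entry->commit_errno= errno;
  }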
This bug was seen when parallel replication experienced a deadlock between
transactions T1 and T2, where T2 has reached the commit phase and is waiting
for T1 to commit first. In this case, the deadlock is broken by sending a kill
to T2; that kill error is then later detected and converted to a deadlock
error, which causes T2 to be rolled back and retried.
The problem was that the kill caused ha_commit_trans() to erroneously call
wakeup_subsequent_commits() on T3, signalling it to abort because T2 failed
during commit. This is incorrect, because the error in T2 is only a temporary
error, which will be resolved by normal transaction retry. We should not
signal error to the next transaction until we have executed the code that
handles such temporary errors.
So this patch just removes the calls to wakeup_subsequent_commits() from
ha_commit_trans(). They are incorrect in this case, and they are not needed in
general, as wakeup_subsequent_commits() must in any case be called in
finish_event_group() to wake up any transactions that may have started to wait
after ha_commit_trans(). And normally, wakeup will in fact have happened
earlier, either from the binlog group commit code, or (in case of no
binlogging) after the fast part of InnoDB/XtraDB group commit.
The symptom of this bug was that replication would break on some transaction
with "Commit failed due to failure of an earlier commit on which this one
depends", but with no such failure of an earlier commit visible anywhere.
The retry of an event group in parallel replication set the wrong value for
the end log position of the event that was retried
(qev->future_event_relay_log_pos). It was too large by the size of the event,
so it pointed into the middle of the following event.
If the retry happened in the very last event of the event group, _and_ the SQL
thread was stopped just after successfully retrying that event, then the SQL
threads's relay log position would be left incorrect. Restarting the SQL
thread could then try to read events from a garbage offset in the relay log,
usually leading to an error about not being able to read the event.
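As a hedged illustration of the arithmetic (names are made up, not taken from
the server source):

  #include <cstdint>

  static uint64_t end_pos_after_event(uint64_t event_start_pos, uint64_t event_len)
  {
    uint64_t correct= event_start_pos + event_len;  // end of the retried event
    // The retry path effectively produced correct + event_len, i.e. an offset
    // in the middle of the *next* event; if the SQL thread stopped right
    // after the retried event, that garbage offset became its restart
    // position.
    return correct;
  }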
The code in binlog group commit around wait_for_commit that controls commit
order did the wakeup of subsequent commits early, as soon as a following
transaction was put into the group commit queue, but before any such commit
had actually taken place. This caused problems with too early wakeup of
transactions that need to wait for a prior transaction to commit, but do not
take part in the binlog group commit for one reason or another.
This patch solves the problem, by moving the wakeup to happen only after the
binlog group commit is completed.
This requires a new solution to ensure that transactions that arrive later
than the leader are still able to participate in group commit. This patch
introduces a flag wait_for_commit::commit_started. When this is set, a waiter
can queue up itself in the group commit queue.
This way, effectively the wait_for_prior_commit() is skipped only for
transactions that participate in group commit, so that skipping the wait is
safe. Other transactions still wait as needed for correctness.
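A rough sketch of the commit_started idea, with hypothetical and simplified
types (not the real wait_for_commit class):

  #include <mutex>

  struct wait_for_commit_sketch
  {
    std::mutex lock;
    bool commit_started= false;  // prior trx has entered the group commit queue
    bool committed= false;       // prior trx has fully committed
  };

  // A later transaction deciding whether it may queue itself into the same
  // binlog group commit instead of waiting for the prior commit to finish:
  static bool may_join_group_commit(wait_for_commit_sketch *prior)
  {
    std::lock_guard<std::mutex> guard(prior->lock);
    return prior->commit_started;  // skipping the wait is safe only in this case
  }

This way the early wakeup is no longer needed for correctness: a transaction
either sees commit_started and joins the group, or it simply waits for the
prior commit as before.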
The test case had a classic mistake: SET DEBUG_SYNC='now SIGNAL xxx'
followed immediately by SET DEBUG_SYNC='RESET'. This makes it
possible for the waiter to miss the signal, if it does not manage
to wake up prior to the RESET.
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6
This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220
Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schema and
performance_schema). These counters will be used on slaves to
differentiate the updates made on system databases from those made on
user databases.
innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.
dd2d11be7a
merge from MySQL-5.6, revision:
revno: 3677.2.1
committer: Alfranio Correia <alfranio.correia@oracle.com>
timestamp: Tue 2012-02-28 16:26:37 +0000
message:
BUG#13627921 - MISSING FLAGS IN SQL_COMMAND_FLAGS MAY LEAD TO REPLICATION PROBLEMS
Flags in sql_command_flags[command] are not correctly set for the following
commands:
. SQLCOM_SET_OPTION is missing CF_CAN_GENERATE_ROW_EVENTS;
. SQLCOM_BINLOG_BASE64_EVENT is missing CF_CAN_GENERATE_ROW_EVENTS;
. SQLCOM_REVOKE_ALL is missing CF_CHANGES_DATA;
. SQLCOM_CREATE_FUNCTION is missing CF_AUTO_COMMIT_TRANS;
This may lead to a wrong sequence of events in the binary log. To fix
the problem, we correctly set the flags in sql_command_flags[command].