This patch fixes a bug in the error handling in parallel replication, when one
worker thread gets a failure and other worker threads processing later
transactions have to rollback and abort.
The problem was with the lifetime of group_commit_orderer objects (GCOs).
A GCO is freed when we register that its last event group has committed. This
relies on register_wait_for_prior_commit() and wait_for_prior_commit() to
ensure that the fact that T2 has committed implies that any earlier T1 has
also committed, and can thus no longer execute mark_start_commit().
However, in the error case, the code was skipping the
register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus
commit ordering was not guaranteed, and a GCO could be freed too early. Then a
later mark_start_commit() would reference deallocated GCO, which could lead to
lost wakeup (causing slave threads to hang) or other corruption.
This patch makes also the error case respect commit order. This way, also the
error case gets the GCO lifetime correct, and the hang no longer occurs.
Review fixes:
- Coding style
- Fix bad .result file
- Fix test to be tolerant of different timing.
- Fix test to give better info in case of unexpected timing.
BINLOGGED INCORRECTLY - BREAKS A SLAVE
Submitted a incomplete patch with my previous push,
re submitting the extra changes the required to make
the patch complete.
Analysis:
In row based replication, Master does not send temp table information
to Slave. If there are any DDLs that involves in regular table that needs
to be sent to Slave and a temp tables (which will not be available at Slave),
the Master rewrites the query replacing temp table with it's defintion.
Eg: create table regular_table like temptable.
In rewrite logic, server is ignoring the database of regular table
which can cause problems mentioned in this bug.
Fix: dont ignore database information (if available) while
rewriting the query
MASTER_GTID_WAIT function needs some status to evaluate its use.
master_gtid_wait_count indicates how many times the function is called.
master_gtid_wait_time indicates how much time in microseconds occurred
waiting (or timing out)
master_gtid_timeouts indicates how many time times this function timed
out rather than all successful gtids events being available.
Delay spawning parallel replication worker threads until a slave SQL
thread is running, and de-spawn them when the last SQL thread stops.
This is especially useful to avoid needless threads on a master in a
setup where same my.cnf is used on masters and slaves.
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.
In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.
More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.
This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.
(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).
Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.
This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
The binlog contains specially marked format description events to mark
when a master restart happened (which could have caused temporary
tables to be silently dropped). Such events also cause slave to close
temporary tables.
However, there was a bug that if after this, slave re-connects to the
master in GTID mode, the master can send an old format description
event again. If temporary tables are closed when such event is seen
for the second time, it might drop temporary tables created after that
event, and cause replication failure.
With this patch, the restart flag of the format description event is
cleared by the master when it is sent to the slave in a subsequent
connection, to avoid the errorneous temp table close.
The problem occurs in parallel replication in GTID mode, when we are using
multiple replication domains. In this case, if the SQL thread stops, the
slave GTID position may refer to a different point in the relay log for each
domain.
The bug was that when the SQL thread was stopped and restarted (but the IO
thread was kept running), the SQL thread would resume applying the relay log
from the point of the most advanced replication domain, silently skipping all
earlier events within other domains. This caused replication corruption.
This patch solves the problem by storing, when the SQL thread stops with
multiple parallel replication domains active, the current GTID
position. Additionally, the current position in the relay logs is moved back
to a point known to be earlier than the current position of any replication
domain. Then when the SQL thread restarts from the earlier position, GTIDs
encountered are compared against the stored GTID position. Any GTID that was
already applied before the stop is skipped to avoid duplicate apply.
This patch should have no effect if multi-domain GTID parallel replication is
not used. Similarly, if both SQL and IO thread are stopped and restarted, the
patch has no effect, as in this case the existing relay logs are removed and
re-fetched from the master at the current global @@gtid_slave_pos.
When the server starts up, check if the master-bin.state file was lost.
If it was, recover its contents by scanning the last binlog file, thus
avoiding running with a corrupt binlog state.
If somehow the COMMIT or XID event in an event group was missing, the code in
parallel replication to handle this was not sufficient, leading to server
deadlock.
cherry-pick the upstream fix
commit d4ba10184cd7bde9c31c610e664ecd0c93605c46
Author: Sujatha Sivakumar <sujatha.sivakumar@oracle.com>
Date: Wed Jul 2 11:34:11 2014 +0530
Bug#17453826:ASSERTION ERROR WHEN SETTING FUTURE BINLOG
FILE/POS WITH SEMISYNC
Problem:
========
When DMLs are in progress on the master stopping a slave and
setting ahead binlog name/pos will cause an assert on the
master.
...
- Fixed compiler warnings
- Added include/wait_for_binlog_checkpoint.inc, as suggested by JonasO
- Updated 'build-tags' to work with git (Patch by Serg)
Adjust the configuration options, as discussed on the
maria-developers@ mailing list.
The option to hint a transaction to not be replicated in parallel is
now called @@skip_parallel_replication, consistent with
@@skip_replication.
And the --slave-parallel-mode is now simplified to have just one of
the following values:
none
minimal
conservative
optimistic
aggressive
This reflects successively harder efforts to find opportunities to run
things in parallel on the slave. It allows to extend the server with
more automatic heuristics in the future without having to introduce a
new configuration option for each and every one.
The problem was a too low timeout for slave reconnect. It was set to 9 seconds
(10 retries with 1 second in-between). This is occasinally too short on some
Buildbot hosts, when the test crashes and restarts the master while the slave
IO thread is running.
Fix by increasing --master-retry-count for this test.
The test case injects a DBUG that will crash the server during replication,
then does a START SLAVE. We need to use --error 0,2006,2013 on the START
SLAVE, so that we will not fail the test if the server has time to crash
before the START SLAVE returns to the client.
Fixes a failure seen in Buildbot.
special character sets like utf16, utf32, ucs2.
Analysis: MySQL server does not support few special character sets like
utf16,utf32 and ucs2 as "client's character set"(eg: utf16,utf32, ucs2).
It is known limitation listed in the documentation
http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html.
The default value for default-character-set parameter is 'auto'
which means that if the server's character set is not supported,
then server automatically changes client's character set to
predefined character-set which is 'latin1' in the current code.
Eg:
$ ./mysql -uroot -S$SOCKET_FILE --default-character-set=utf16
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of 'utf16'
$ ./mysql -uroot -S$SOCKET_FILE will be successfully connected to
server with 'latin1' as default client side character set.
When IO thread is trying to connect to Master, it sets server's character
set as client's character set. When Slave server is started with these
special character sets, IO thread (which is like a connection to Master)
fails because of the above said limitation.
Fix: Now even IO thread also behaves the same as a regular client behaves.
i.e., If server's character set is not supported as client's character set,
then set default's client character set(latin1) as client's character set.
The bug occurs when a transaction does a retry after all transactions have
done mark_start_commit() in a batch of group commit from the master. In this
case, the retrying transaction can unmark_start_commit() after the following
batch has already started running and de-allocated the GCO. Then after retry,
the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
wakeup of later GCOs can be lost.
This was seen "in the wild" by a user, even though it is not known exactly
what circumstances can lead to retry of one transaction after all transactions
in a group have reached the commit phase.
The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
lives until rpl_parallel_entry::last_committed_sub_id has reached the last
transaction in the GCO. This guarantees that the GCO will still be alive when
a transaction does mark_start_commit(). Also, we now loop over the list of
active GCOs for wakeup, to ensure we do not lose a wakeup even in the
problematic case.
Use include/sync_with_master_gtid.inc instead of --sync_with_master to avoid a
race in the test case.
In parallel replication, the old-style slave position (which is used by
--sync_with_master) is updated out-of-order between parallel threads. This
makes it possible for the position to be updated past DROP TEMPORARY TABLE t2
just before the commit of INSERT INTO t1 SELECT * FROM t2 becomes visible.
In this case, there is a small window where a SELECT just after
--sync_with_master may not see the changes from the INSERT.
Fix:
===
Backport Bug#11756194 to mysql-5.5. slave breaks if
'drop database' fails on master and mismatched tables on
slave.
'DROP TABLE <deleted tables>' was binlogged when
'DROP DATABASE' failed and at least one table was deleted
from the database. The log event would lead slave SQL thread
stop if some of the tables did not exist on slave.
After this patch, It is always binlogged with 'IF EXISTS'
option.
Implement --semi-sync-master-wait-point=AFTER_SYNC|AFTER_COMMIT.
When AFTER_SYNC, the semi-sync wait will be done earlier, before the storage
engine commit rather than after. This means that a transaction will not be
visible on the master until at least one slave has received it.
Make the binlog dump threads not need to take LOCK_log while sending
binlog events to slave. Instead, a new LOCK_binlog_end_pos is used
just to coordinate tracking the current end-of-log.
This is a pre-requisite for MDEV-162, "Enhanced semisync
replication". It should also help reduce the contention on LOCK_log on
a busy master.
Also does some much-needed refactoring/cleanup of the related code in
the binlog dump thread.
Problem was that repair() did lock and unlock tables, which leaved already locked tables in wrong state
include/my_check_opt.h:
Added option T_NO_LOCKS to disable locking during repair()
Fixed duplicated bit T_NO_CREATE_RENAME_LSN
mysql-test/suite/rpl/r/myisam_external_lock.result:
Test case for MDEV-6871
mysql-test/suite/rpl/t/myisam_external_lock-slave.opt:
Test case for MDEV-6871
mysql-test/suite/rpl/t/myisam_external_lock.test:
Test case for MDEV-6871
storage/maria/ha_maria.cc:
Don't lock tables during enable_indexes()
Removed some calls to current_thd
storage/myisam/ha_myisam.cc:
Don't lock tables during enable_indexes()
Removed some calls to current_thd