The test case injects a DBUG that will crash the server during replication,
then does a START SLAVE. We need to use --error 0,2006,2013 on the START
SLAVE, so that we will not fail the test if the server has time to crash
before the START SLAVE returns to the client.
Fixes a failure seen in Buildbot.
special character sets like utf16, utf32, ucs2.
Analysis: MySQL server does not support few special character sets like
utf16,utf32 and ucs2 as "client's character set"(eg: utf16,utf32, ucs2).
It is known limitation listed in the documentation
http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html.
The default value for default-character-set parameter is 'auto'
which means that if the server's character set is not supported,
then server automatically changes client's character set to
predefined character-set which is 'latin1' in the current code.
Eg:
$ ./mysql -uroot -S$SOCKET_FILE --default-character-set=utf16
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of 'utf16'
$ ./mysql -uroot -S$SOCKET_FILE will be successfully connected to
server with 'latin1' as default client side character set.
When IO thread is trying to connect to Master, it sets server's character
set as client's character set. When Slave server is started with these
special character sets, IO thread (which is like a connection to Master)
fails because of the above said limitation.
Fix: Now even IO thread also behaves the same as a regular client behaves.
i.e., If server's character set is not supported as client's character set,
then set default's client character set(latin1) as client's character set.
The bug occurs when a transaction does a retry after all transactions have
done mark_start_commit() in a batch of group commit from the master. In this
case, the retrying transaction can unmark_start_commit() after the following
batch has already started running and de-allocated the GCO. Then after retry,
the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
wakeup of later GCOs can be lost.
This was seen "in the wild" by a user, even though it is not known exactly
what circumstances can lead to retry of one transaction after all transactions
in a group have reached the commit phase.
The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
lives until rpl_parallel_entry::last_committed_sub_id has reached the last
transaction in the GCO. This guarantees that the GCO will still be alive when
a transaction does mark_start_commit(). Also, we now loop over the list of
active GCOs for wakeup, to ensure we do not lose a wakeup even in the
problematic case.
Use include/sync_with_master_gtid.inc instead of --sync_with_master to avoid a
race in the test case.
In parallel replication, the old-style slave position (which is used by
--sync_with_master) is updated out-of-order between parallel threads. This
makes it possible for the position to be updated past DROP TEMPORARY TABLE t2
just before the commit of INSERT INTO t1 SELECT * FROM t2 becomes visible.
In this case, there is a small window where a SELECT just after
--sync_with_master may not see the changes from the INSERT.
Fix:
===
Backport Bug#11756194 to mysql-5.5. slave breaks if
'drop database' fails on master and mismatched tables on
slave.
'DROP TABLE <deleted tables>' was binlogged when
'DROP DATABASE' failed and at least one table was deleted
from the database. The log event would lead slave SQL thread
stop if some of the tables did not exist on slave.
After this patch, It is always binlogged with 'IF EXISTS'
option.
Implement --semi-sync-master-wait-point=AFTER_SYNC|AFTER_COMMIT.
When AFTER_SYNC, the semi-sync wait will be done earlier, before the storage
engine commit rather than after. This means that a transaction will not be
visible on the master until at least one slave has received it.
Make the binlog dump threads not need to take LOCK_log while sending
binlog events to slave. Instead, a new LOCK_binlog_end_pos is used
just to coordinate tracking the current end-of-log.
This is a pre-requisite for MDEV-162, "Enhanced semisync
replication". It should also help reduce the contention on LOCK_log on
a busy master.
Also does some much-needed refactoring/cleanup of the related code in
the binlog dump thread.
Problem was that repair() did lock and unlock tables, which leaved already locked tables in wrong state
include/my_check_opt.h:
Added option T_NO_LOCKS to disable locking during repair()
Fixed duplicated bit T_NO_CREATE_RENAME_LSN
mysql-test/suite/rpl/r/myisam_external_lock.result:
Test case for MDEV-6871
mysql-test/suite/rpl/t/myisam_external_lock-slave.opt:
Test case for MDEV-6871
mysql-test/suite/rpl/t/myisam_external_lock.test:
Test case for MDEV-6871
storage/maria/ha_maria.cc:
Don't lock tables during enable_indexes()
Removed some calls to current_thd
storage/myisam/ha_myisam.cc:
Don't lock tables during enable_indexes()
Removed some calls to current_thd
A test clean-up: The "SHOW DATABASES" queries now use "LIKE 'db%'",
to display only the databases created during this test,
thus exclude the system databases, as some of them can be optional
(e.g. performance_schema).
Implement a new mode for parallel replication. In this mode, all transactions
are optimistically attempted applied in parallel. In case of conflicts, the
offending transaction is rolled back and retried later non-parallel.
This is an early-release patch to facilitate testing, more changes to user
interface / options will be expected. The new mode is not enabled by default.
There was a race. The test case was expecting the slave to start processing a
particular DELETE statement, then the test would stop the slave at this
point. But there was missing something to wait until the slave would actually
reach this point; thus depending on timing it was possible that the slave
would be stopped too early, causing .result file difference.
Fixed by adding an appropriate wait to the test case.
Fix rare failures in test case rpl.rpl_gtid_basic:
- Add another possible error code when a connection is killed.
- Make sure that the IO thread has had time to complete its stop after START
SLAVE UNTIL. Otherwise, START SLAVE might run before IO thread stop,
leaving the test case with a stopped IO thread that eventually causes a
wait timeout.
There was a race, a small window between updating slave position and updating
Seconds_Behind_Master, during which the test case could see the wrong value.
Fix by waiting for the expected status to appear.
The replication relay log position was sometimes updated incorrectly at the
end of a transaction in parallel replication. This happened because the relay
log file name was taken from the current Relay_log_info (SQL driver thread),
not the correct value for the transaction in question.
The result was that if a transaction was applied while the SQL driver thread
was at least one relay log file ahead, _and_ the SQL thread was subsequently
stopped before applying any events from the most recent relay log file, then
the relay log position would be incorrect - wrong relay log file name. Thus,
when the slave was started again, usually a relay log read error would result,
or in rare cases, if the position happened to be readable, the slave might
even skip arbitrary amounts of events.
In GTID mode, the relay log position is reset when both slave threads are
restarted, so this bug would only be seen in non-GTID mode, or in GTID mode
when only the SQL thread, not the IO thread, was stopped.
I saw two test failures in rpl.rpl_gtid_crash where we get this in the error
log:
141123 12:47:54 [Note] InnoDB: Restoring possible half-written data pages
141123 12:47:54 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Warning: database page corruption or a failed
InnoDB: file read of space 6 page 3.
InnoDB: Trying to recover it from the doublewrite buffer.
141123 12:47:54 [Note] InnoDB: Recovered the page from the doublewrite buffer.
This test case deliberately crashes the server, and if this crash happens
right in the middle of writing a buffer pool page to disk, it is not
unexpected that we can get a half-written page. The page is recovered
correctly from the doublewrite buffer.
So this patch adds a suppression for this warning in the error log for this
test case.
When a master slave restarts, it logs a special restart format description
event in its binlog. When the slave sees this event, it knows it needs to roll
back any active partial transaction, in case the master crashed previously in
the middle of writing such transaction to its binlog.
However, there was a bug where this rollback did not reset rgi->pending_gtid.
This caused the @@gtid_slave_pos to be updated incorrectly with the GTID of
the partial transaction that was rolled back.
Fix this by always clearing rgi->pending_gtid in cleanup_context(), hopefully
preventing similar bugs from turning up in other special cases where a
transaction is rolled back during replication.
Thanks to Pavel Ivanov for tracking down the issue and providing a test case.
The test case rpl.rpl_parallel_temptable deliberately crashes the master
server as part of the testing. This makes it unsuitable for Valgrind
testing. So make sure that it will be skipped when testing with Valgrind.
The real problem here was inconsistent handling of entry->commit_errno in
MYSQL_BIN_LOG::write_transaction_or_stmt(). Some return paths were setting it
to the value of errno, some where not. And the setting was redundant anyway,
as it is set consistently by the caller.
Fix by consistently setting it in the caller, and not in each return path in
the function.
The test failure happened because a DBUG_EXECUTE_IF() used in the test case
set an entry->commit_errno that was immediately overwritten in the caller with
whatever happened to be the value of errno. This could lead to different error
message in the .result file.
This bug was seen when parallel replication experienced a deadlock between
transactions T1 and T2, where T2 has reached the commit phase and is waiting
for T1 to commit first. In this case, the deadlock is broken by sending a kill
to T2; that kill error is then later detected and converted to a deadlock
error, which causes T2 to be rolled back and retried.
The problem was that the kill caused ha_commit_trans() to errorneously call
wakeup_subsequent_commits() on T3, signalling it to abort because T2 failed
during commit. This is incorrect, because the error in T2 is only a temporary
error, which will be resolved by normal transaction retry. We should not
signal error to the next transaction until we have executed the code that
handles such temporary errors.
So this patch just removes the calls to wakeup_subsequent_commits() from
ha_commit_trans(). They are incorrect in this case, and they are not needed in
general, as wakeup_subsequent_commits() must in any case be called in
finish_event_group() to wakeup any transactions that may have started to wait
after ha_commit_trans(). And normally, wakeup will in fact have happened
earlier, either from the binlog group commit code, or (in case of no
binlogging) after the fast part of InnoDB/XtraDB group commit.
The symptom of this bug was that replication would break on some transaction
with "Commit failed due to failure of an earlier commit on which this one
depends", but with no such failure of an earlier commit visible anywhere.
The retry of an event group in parallel replication set the wrong value for
the end log position of the event that was retried
(qev->future_event_relay_log_pos). It was too large by the size of the event,
so it pointed into the middle of the following event.
If the retry happened in the very last event of the event group, _and_ the SQL
thread was stopped just after successfully retrying that event, then the SQL
threads's relay log position would be left incorrect. Restarting the SQL
thread could then try to read events from a garbage offset in the relay log,
usually leading to an error about not being able to read the event.
The code in binlog group commit around wait_for_commit that controls commit
order, did the wakeup of subsequent commits early, as soon as a following
transaction is put into the group commit queue, but before any such commit has
actually taken place. This causes problems with too early wakeup of
transactions that need to wait for prior to commit, but do not take part in
the binlog group commit for one reason or the other.
This patch solves the problem, by moving the wakeup to happen only after the
binlog group commit is completed.
This requires a new solution to ensure that transactions that arrive later
than the leader are still able to participate in group commit. This patch
introduces a flag wait_for_commit::commit_started. When this is set, a waiter
can queue up itself in the group commit queue.
This way, effectively the wait_for_prior_commit() is skipped only for
transactions that participate in group commit, so that skipping the wait is
safe. Other transactions still wait as needed for correctness.
The test case had a classic mistake: SET DEBUG_SYNC='now SIGNAL xxx'
followed immediately by SET DEBUG_SYNC='RESET'. This makes it
possible for the waiter to miss the signal, if it does not manage
to wake up prior to the RESET.
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6
This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220
Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.
innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.
dd2d11be7a
The test case runs SHOW SLAVE HOSTS. The output of this is only stable after
all slaves have had time to register with the master; this happens
asynchroneously.
The test was waiting for the slave with server_id=3 to appear in the output,
but it was missing a similar wait for server_id=2. Thus, if server_id=2 was
much slower to connect for some reason, it could be missing from the output,
causing the test to fail.
merge from MySQL-5.6, revision:
revno: 3677.2.1
committer: Alfranio Correia <alfranio.correia@oracle.com>
timestamp: Tue 2012-02-28 16:26:37 +0000
message:
BUG#13627921 - MISSING FLAGS IN SQL_COMMAND_FLAGS MAY LEAD TO REPLICATION PROBLEMS
Flags in sql_command_flags[command] are not correctly set for the following
commands:
. SQLCOM_SET_OPTION is missing CF_CAN_GENERATE_ROW_EVENTS;
. SQLCOM_BINLOG_BASE64_EVENT is missing CF_CAN_GENERATE_ROW_EVENTS;
. SQLCOM_REVOKE_ALL is missing CF_CHANGES_DATA;
. SQLCOM_CREATE_FUNCTION is missing CF_AUTO_COMMIT_TRANS;
This may lead to a wrong sequence of events in the binary log. To fix
the problem, we correctly set the flags in sql_command_flags[command].
Added comments
Ensure that tokudb test works even if jemalloc is not installed
Removed not referenced function Item::remove_fixed()
mysql-test/suite/rpl/t/rpl_gtid_reconnect.test:
Fixed race condition
sql/item.cc:
Indentation fix
sql/item.h:
Removed not used function
Added comment
sql/sql_select.cc:
Fixed indentation
storage/tokudb/mysql-test/rpl/include/have_tokudb.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_add_index/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_alter_table/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_bugs/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_mariadb/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
We should assume that the store engine will report the first duplicate key for this case.
Old code of suppression of unsafe logging error with LIMIT didn't work, because of wrong usage of my_interval_timer().
Suppress unsafe logging errors to the error log if we get too many unsafe logging errors in a short time.
This is to not overflow the error log with meaningless errors.
- Each error code is suppressed and counted separately.
- We do a 5 minute suppression of new errors if we get more than 10 errors in that time.
Only print unsafe logging errors if log_warnings > 1.
mysql-test/suite/binlog/r/binlog_stm_unsafe_warning.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
mysql-test/suite/binlog/r/binlog_unsafe.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
mysql-test/suite/engines/README:
Fixed typos
mysql-test/suite/rpl/r/rpl_known_bugs_detection.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
sql/sql_base.cc:
Don't log warning if there are two unique keys used with INSERT .. ON DUPLICATE KEY UPDATE.
We should assume that the store engine will report the first duplicate key for this case.
sql/sql_class.cc:
Suppress error in binary log if we get too many unsafe logging errors in a short time.
Only print unsafe logging errors if log_warnings > 1