Add an option to control whether the master should keep waiting
until timeout when it detected that there is no semi-sync slave
available.
The bool option 'rpl_semi_sync_master_wait_no_slave' is 1 by
defalt, and will keep waiting until timeout. When set to 0, the
master will switch to asynchronous replication immediately when
no semi-sync slave is available.
Semi-sync status were not reset by FLUSH STATUS, this was because
all semi-sync status variables are defined as SHOW_FUNC and FLUSH
STATUS could only reset SHOW_LONG type variables.
This problem is fixed by change all status variables that should
be reset by FLUSH STATUS from SHOW_FUNC to SHOW_LONG.
After the fix, the following status variables will be reset by
FLUSH STATUS:
Rpl_semi_sync_master_yes_tx
Rpl_semi_sync_master_no_tx
Note: normally, FLUSH STATUS itself will be written into binlog
and be replicated, so after FLUSH STATS, one of
Rpl_semi_sync_master_yes_tx
Rpl_semi_sync_master_no_tx
can be 1 dependent on the semi-sync status. So it's recommended
to use FLUSH NO_WRITE_TO_BINLOG STATUS to avoid this.
Semi-sync uses an extra connection from slave to master to send
replies, this is a normal client connection, and used a normal
SET query to set the reply information on master, which is visible
to user and may cause some confusion and complaining.
This problem is fixed by using the method of sending reply by
using the same connection that is used by master dump thread to
send binlog to slave. Since now the semi-sync plugins are integrated
with the server code, it is not a problem to use the internal net
interfaces to do this.
The master dump thread will mark the event requires a reply and
wait for the reply when the event just sent is the last event
of a transaction and semi-sync status is ON; And the slave will
send a reply to master when it received such an event that requires
a reply.
If an EVENT is created without the DEFINER clause set explicitly or with it set
to CURRENT_USER, the master and slaves become inconsistent. This issue stems from
the fact that in both cases, the DEFINER is set to the CURRENT_USER of the current
thread. On the master, the CURRENT_USER is the mysqld's user, while on the slave,
the CURRENT_USER is empty for the SQL Thread which is responsible for executing
the statement.
To fix the problem, we do what follows. If the definer is not set explicitly,
a DEFINER clause is added when writing the query into binlog; if 'CURRENT_USER' is
used as the DEFINER, it is replaced with the value of the current user before
writing to binlog.
Slave does not correctly handle "expected errors" leading to inconsistencies
between the mater and slave. Specifically, when a statement changes both
transactional and non-transactional tables, the transactional changes are
automatically rolled back on the master but the slave ignores the error and
does not roll them back thus leading to inconsistencies.
To fix the problem, we automatically roll back a statement that fails on
the slave but note that the transaction is not rolled back unless a "rollback"
command is in the relay log file.
binlog
Mixing transactional (T) and non-transactional (N) tables on behalf of a
transaction may lead to inconsistencies among masters and slaves in STATEMENT
mode. The problem stems from the fact that although modifications done to
non-transactional tables on behalf of a transaction become immediately visible
to other connections they do not immediately get to the binary log and therefore
consistency is broken. Although there may be issues in mixing T and M tables in
STATEMENT mode, there are safe combinations that clients find useful.
In this bug, we fix the following issue. Mixing N and T tables in multi-level
(e.g. a statement that fires a trigger) or multi-table table statements (e.g.
update t1, t2...) were not handled correctly. In such cases, it was not possible
to distinguish when a T table was updated if the sequence of changes was N and T.
In a nutshell, just the flag "modified_non_trans_table" was not enough to reflect
that both a N and T tables were changed. To circumvent this issue, we check if an
engine is registered in the handler's list and changed something which means that
a T table was modified.
Check WL 2687 for a full-fledged patch that will make the use of either the MIXED or
ROW modes completely safe.
In STATEMENT based replication, a statement that failed on the master but that
updated non-transactional tables is written to binary log with the error code
appended to it. On the slave, the statement is executed and the same error is
expected. However, when an "expected error" did not happen on the slave and was
either ignored or was related to a concurrency issue on the master, the slave
did not rollback the effects of the statement and as such inconsistencies might
happen.
To fix the problem, we automatically rollback a statement that should have
failed on a slave but succeded and whose expected failure is either ignored or
stems from a concurrency issue on the master.
If the log_bin_trust_function_creators option is not defined, creating a stored
function requires either one of the modifiers DETERMINISTIC, NO SQL, or READS
SQL DATA. Executing a stored function should also follows the same rules if in
STATEMENT mode. However, this was not happening and a wrong error was being
printed out: ER_BINLOG_ROW_RBR_TO_SBR.
The patch makes the creation and execution compatible and prints out the correct
error ER_BINLOG_UNSAFE_ROUTINE when a stored function without one of the modifiers
above is executed in STATEMENT mode.
to wrong result
When using MIXED mode and issuing 'CREATE TEMPORARY TABLE t_tmp',
the statement is logged if the current binlogging mode is
STATEMENT. This causes the slave to replay the instruction and
create the temporary table as well. If there is no switch to ROW
mode, and later on a 'DROP TEMPORARY TABLE t_tmp' is issued, then
this statement will also be logged and the slave will
remove/close the temporary table.
However, if there is a switch to ROW mode between the CREATE and
DROP TEMPORARY table, the DROP statement will not be logged,
leaving the slave with a dangling temporary table.
This patch addresses this, by always logging a DROP TEMPORARY
TABLE IF EXISTS when in mixed mode and a drop statement is issued
for temporary table(s).
binlog
The fix for BUG 43929 introduced a regression issue. In a nutshell, when a
statement that changes a non-transactional table fails, it is written to the
binary log with the error code appended. Unfortunately, after BUG 43929, this
failure was flushing the transactional chace causing mismatch between execution
and logging histories. To fix this issue, we avoid flushing the transactional
cache when a commit or rollback is not issued.
The "get_master_version_and_clock(...)" function in sql/slave.cc ignores
error and passes directly when queries fail, or queries succeed
but the result retrieved is empty.
The "get_master_version_and_clock(...)" function should try to reconnect master
if queries fail because of transient network problems, and fail otherwise.
The I/O thread should print a warning if the some system variables do not
exist on master (very old master)
There is an inconsistency with DROP DATABASE|TABLE|EVENT IF EXISTS and
CREATE DATABASE|TABLE|EVENT IF NOT EXISTS. DROP IF EXISTS statements are
binlogged even if either the DB, TABLE or EVENT does not exist. In
contrast, Only the CREATE EVENT IF NOT EXISTS is binlogged when the EVENT
exists.
This patch fixes the following cases for all the replication formats:
CREATE DATABASE IF NOT EXISTS.
CREATE TABLE IF NOT EXISTS,
CREATE TABLE IF NOT EXISTS ... LIKE,
CREAET TABLE IF NOT EXISTS ... SELECT.
Had attempted to disable this test on Windows only, but the nature of this bug
does not allow for this. The master.opt file is processed before anything in
in the actual test. As a result, we must use disabled.def files to ensure
these tests are skipped on the problematic platforms.
Removed Windows-only code and updated the proper disabled.def files accordingly.
timeout
In STMT and MIXED modes, a statement that changes both non-transactional and
transactional tables must be written to the binary log whenever there are
changes to non-transactional tables. This means that the statement gets into the
binary log even when the changes to the transactional tables fail. In particular
, in the presence of a failure such statement is annotated with the error number
and wrapped in a begin/rollback. On the slave, while applying the statement, it
is expected the same failure and the rollback prevents the transactional changes
to be persisted.
Unfortunately, statements that fail due to concurrency issues (e.g. deadlocks,
timeouts) are logged in the same way causing the slave to stop as the statements
are applied sequentially by the SQL Thread. To fix this bug, we automatically
ignore concurrency failures on the slave. Specifically, the following failures
are ignored: ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK and ER_XA_RBDEADLOCK.
Large transactions and statements may corrupt the binary log if the size of the
cache, which is set by the max_binlog_cache_size, is not enough to store the
the changes.
In a nutshell, to fix the bug, we save the position of the next character in the
cache before starting processing a statement. If there is a problem, we simply
restore the position thus removing any effect of the statement from the cache.
Unfortunately, to avoid corrupting the binary log, we may end up loosing changes
on non-transactional tables if they do not fit in the cache. In such cases, we
store an Incident_log_event in order to stop the slave and alert users that some
changes were not logged.
Precisely, for every non-transactional changes that do not fit into the cache,
we do the following:
a) the statement is *not* logged
b) an incident event is logged after committing/rolling back the transaction,
if any. Note that if a failure happens before writing the incident event to
the binary log, the slave will not stop and the master will not have reported
any error.
c) its respective statement gives an error
For transactional changes that do not fit into the cache, we do the following:
a) the statement is *not* logged
b) its respective statement gives an error
To work properly, this patch requires two additional things. Firstly, callers to
MYSQL_BIN_LOG::write and THD::binlog_query must handle any error returned and
take the appropriate actions such as undoing the effects of a statement. We
already changed some calls in the sql_insert.cc, sql_update.cc and sql_insert.cc
modules but the remaining calls spread all over the code should be handled in
BUG#37148. Secondly, statements must be either classified as DDL or DML because
DDLs that do not get into the cache must generate an incident event since they
cannot be rolled back.
The server was not cleaning the last IO error and error number when
resetting slave.
This patch addresses this issue by backporting into 5.1 part of the
patch in BUG 34654. A fix for this issue had already been pushed into
6.0 as part of the aforementioned bug, however the patch also included
some refactoring. The fix for 5.1 does not take into account the
refactoring part.
Disabling these two tests as they are affected by this bug / causing PB2 failures
on Windows platforms. Can always disable via include/not_windows.inc if
the bug fix looks like it will take some time.
Certain multi-updates gave different results on InnoDB from
to MyISAM, due to on-the-fly updates being used on the former and
the update order matters.
Fixed by turning off on-the-fly updates when update order
dependencies are present.
1. Replace waiting of SQL thread stop by waiting of SQL error on slave and stopped
SQL thread.
2. Remove debug code because it already implemented in MTR2.
Respectively, replaced "--exec diff" by "--diff_files" which is a mysqltest command to run a
non-operating system specific diff. Removed the file rpl_000015-slave.sh as it is not
necessary in the new MTR.