In SBR, if a statement does not fail, it is always written to the binary
log, regardless if rows are changed or not. If there is a failure, a
statement is only written to the binary log if a non-transactional (.e.g.
MyIsam) engine is updated.
INSERT ON DUPLICATE KEY UPDATE and INSERT IGNORE were not following the
rule above and were not written to the binary log, if then engine was
Innodb.
The test started failing on the same day patch for BUG 49978 was
pushed. BUG 49978 changed part of the replication testing
infrastructure in mysql-test-run. This caused the test to fail
sporadically with result differences on relay log file
names. When the test fails the relay-log filenames are shifted by
one, eg:
-show relaylog events in 'slave-relay-bin.000002' from <binlog_start>;
+show relaylog events in 'slave-relay-bin.000003' from <binlog_start>;
The problem was caused by a bad cleanup when using the include
files:
- include/setup_fake_relay_log.inc
- include/cleanup_fake_relay_log.inc
Which would leave a spurious relay-log file around (not listed in
slave-relay-bin.index), causing the server to shift the name of
the relay logs by one, even if cleaning up with RESET SLAVE.
We fix this by removing the relay-log file when it is not needed
anymore, ie at setup time and after recreating the fake relay-log
index.
Additionally, to make the affected test more resilient, we
deployed a call to rpl_reset.inc (which resets both master and
slave, including log files) before actually running the test
case.
Finally, appart from the reported bug, we also fix: (a) an
unrelated issue with the failing test itself - in some cases, the
test was not setting the log file name to use when it should;
(b) one typo.
Normally, auto_increment value is generated for the column by
inserting either NULL or 0 into it. NO_AUTO_VALUE_ON_ZERO
suppresses this behavior for 0 so that only NULL generates
the auto_increment value. This behavior is also followed by
a slave, specifically by the SQL Thread, when applying events
in the statement format from a master. However, when applying
events in the row format, the flag was ignored thus causing
an assertion failure:
"Assertion failed: next_insert_id == 0, file .\handler.cc"
In fact, we never need to generate a auto_increment value for
the column when applying events in row format on slave. So we
don't allow it to happen by using 'MODE_NO_AUTO_VALUE_ON_ZERO'.
Refactoring: Get rid of all the sql_mode checks to rows_log_event
when applying it for avoiding problems caused by the inconsistency
of the sql_mode on slave and master as the sql_mode is not set for
Rows_log_event.
Normally, auto_increment value is generated for the column by
inserting either NULL or 0 into it. NO_AUTO_VALUE_ON_ZERO
suppresses this behavior for 0 so that only NULL generates
the auto_increment value. This behavior is also followed by
a slave, specifically by the SQL Thread, when applying events
in the statement format from a master. However, when applying
events in the row format, the flag was ignored thus causing
an assertion failure:
"Assertion failed: next_insert_id == 0, file .\handler.cc"
In fact, we never need to generate a auto_increment value for
the column when applying events in row format on slave. So we
don't allow it to happen by using 'MODE_NO_AUTO_VALUE_ON_ZERO'.
Refactoring: Get rid of all the sql_mode checks to rows_log_event
when applying it for avoiding problems caused by the inconsistency
of the sql_mode on slave and master as the sql_mode is not set for
Rows_log_event.
Major replication test framework cleanup. This does the following:
- Ensure that all tests clean up the replication state when they
finish, by making check-testcase check the output of SHOW SLAVE STATUS.
This implies:
- Slave must not be running after test finished. This is good
because it removes the risk for sporadic errors in subsequent
tests when a test forgets to sync correctly.
- Slave SQL and IO errors must be cleared when test ends. This is
good because we will notice if a test gets an unexpected error in
the slave threads near the end.
- We no longer have to clean up before a test starts.
- Ensure that all tests that wait for an error in one of the slave
threads waits for a specific error. It is no longer possible to
source wait_for_slave_[sql|io]_to_stop.inc when there is an error
in one of the slave threads. This is good because:
- If a test expects an error but there is a bug that causes
another error to happen, or if it stops the slave thread without
an error, then we will notice.
- When developing tests, wait_for_*_to_[start|stop].inc will fail
immediately if there is an error in the relevant slave thread.
Before this patch, we had to wait for the timeout.
- Remove duplicated and repeated code for setting up unusual replication
topologies. Now, there is a single file that is capable of setting
up arbitrary topologies (include/rpl_init.inc, but
include/master-slave.inc is still available for the most common
topology). Tests can now end with include/rpl_end.inc, which will clean
up correctly no matter what topology is used. The topology can be
changed with include/rpl_change_topology.inc.
- Improved debug information when tests fail. This includes:
- debug info is printed on all servers configured by include/rpl_init.inc
- User can set $rpl_debug=1, which makes auxiliary replication files
print relevant debug info.
- Improved documentation for all auxiliary replication files. Now they
describe purpose, usage, parameters, and side effects.
- Many small code cleanups:
- Made have_innodb.inc output a sensible error message.
- Moved contents of rpl000017-slave.sh into rpl000017.test
- Added mysqltest variables that expose the current state of
disable_warnings/enable_warnings and friends.
- Too many to list here: see per-file comments for details.
It is not necessary to support INSERT DELAYED for a single value insert,
while we do not support that for multi-values insert when binlog is
enabled in SBR.
The lock_type is upgrade to TL_WRITE from TL_WRITE_DELAYED for
INSERT DELAYED for single value insert as multi-values insert
did when binlog is enabled. Then it's safe. And binlog it as
INSERT without DELAYED.
When using BINLOG statement to execute rows log events, session variables
foreign_key_checks and unique_checks are changed temporarily. As each rows
log event has their own special session environment and its own
foreign_key_checks and unique_checks can be different from current session
which executing the BINLOG statement. But these variables are not restored
correctly after BINLOG statement. This problem will cause that the following
statements fail or generate unexpected data.
In this patch, code is added to backup and restore these two variables.
So BINLOG statement will not affect current session's variables again.
win x86 debug_max
The windows MTR run exhibited a different test execution
ordering (due to the fact that in these platforms MTR is invoked
with --parallel > 1). This uncovered a bug in the aforementioned
test case, which is triggered by the following conditions:
1. server is not restarted between two different tests;
2. the test before binlog.binlog_row_failure_mixing_engines
issues flush logs;
3. binlog.binlog_row_failure_mixing_engines uses binlog
positions to limit the output of show_binlog_events;
4. binlog.binlog_row_failure_mixing_engines does not state which
binlog file to use, thence it uses a wrong binlog file with
the correct position.
There are two possible fixes: 1. make sure that the test start
from a clean slate - binlog wise; 2. in addition to the position,
also state the binary log file before sourcing
show_binlog_events.inc .
We go for fix#1, ie, deploy a RESET MASTER before the test is
actually started.
After the WL#2687, the binlog_cache_size and max_binlog_cache_size affect both the
stmt-cache and the trx-cache. This means that the resource used is twice the amount
expected/defined by the user.
The binlog_cache_use is incremented when the stmt-cache or the trx-cache is used
and binlog_cache_disk_use is incremented when the disk space from the stmt-cache or the
trx-cache is used. This behavior does not allow to distinguish which cache may be harming
performance due to the extra disk accesses and needs to have its in-memory cache
increased.
To fix the problem, we introduced two new options and status variables related to the
stmt-cache:
Options:
. binlog_stmt_cache_size
. max_binlog_stmt_cache_size
Status Variables:
. binlog_stmt_cache_use
. binlog_stmt_cache_disk_use
So there are
. binlog_cache_size that defines the size of the transactional cache for
updates to transactional engines for the binary log.
. binlog_stmt_cache_size that defines the size of the statement cache for
updates to non-transactional engines for the binary log.
. max_binlog_cache_size that sets the total size of the transactional
cache.
. max_binlog_stmt_cache_size that sets the total size of the statement
cache.
. binlog_cache_use that identifies the number of transactions that used the
temporary transactional binary log cache.
. binlog_cache_disk_use that identifies the number of transactions that used
the temporary transactional binary log cache but that exceeded the value of
binlog_cache_size.
. binlog_stmt_cache_use that identifies the number of statements that used the
temporary non-transactional binary log cache.
. binlog_stmt_cache_disk_use that identifies the number of statements that used
the temporary non-transactional binary log cache but that exceeded the value of
binlog_stmt_cache_size.
The binlog_cache_use is incremented twice when changes to a transactional table
are committed, i.e. TC_LOG_BINLOG::log_xid calls is called. The problem happens
because log_xid calls both binlog_flush_stmt_cache and binlog_flush_trx_cache
without checking if such caches are empty thus unintentionally increasing the
binlog_cache_use value twice.
Although not explicitly mentioned in the bug, the binlog_cache_disk_use presents
the same problem.
The binlog_cache_use and binlog_cache_disk_use are status variables that are
incremented when transactional (i.e. trx-cache) or non-transactional (i.e.
stmt-cache) changes are committed. They are used to compute the ratio between
the use of disk and memory while gathering updates for a transaction.
The problem reported is fixed by avoiding incrementing the binlog_cache_use
and binlog_cache_disk_use if a cache is empty. We also have decided to increment
both variables when a cache is truncated as the cache is used although its
content is discarded and is not written to the binary log.
In this patch, we take the opportunity to re-organize the code around the
function binlog_flush_trx_cache and binlog_flush_stmt_cache.
replication aborts
When recieving a 'SLAVE STOP' command, slave SQL thread will roll back the
transaction and stop immidiately if there is only transactional table updated,
even through 'CREATE|DROP TEMPOARY TABLE' statement are in it. But These
statements can never be rolled back. Because the temporary tables to the user
session mapping remain until 'RESET SLAVE', Therefore it will abort SQL thread
with an error that the table already exists or doesn't exist, when it restarts
and executes the whole transaction again.
After this patch, SQL thread always waits till the transaction ends and then stops,
if 'CREATE|DROP TEMPOARY TABLE' statement are in it.
The root of the problem is that to interrupt a slave SQL thread
wait, the STOP SLAVE implementation uses thd->awake(THD::NOT_KILLED).
This appears as a spurious wakeup (e.g. from a sleep on a
condition variable) to the code that the slave SQL thread is
executing at the time of the STOP. If the code is not written
to be spurious-wakeup safe, unexpected behavior can occur. For
the reported case, this problem led to an infinite loop around
the interruptible_wait() function in item_func.cc (SLEEP()
function implementation). The loop was not being properly
restarted and, consequently, would not come to an end. Since the
SLEEP function sleeps on a timed event in order to be killable
and to perform periodic checks until the requested time has
elapsed, the spurious wake up was causing the requested sleep
time to be reset every two seconds.
The solution is to calculate the requested absolute time only
once and to ensure that the thread only sleeps until this
time is elapsed. In case of a spurious wake up, the sleep is
restarted using the previously calculated absolute time. This
restores the behavior present in previous releases. If a slave
thread is executing a SLEEP function, a STOP SLAVE statement
will wait until the time requested in the sleep function
has elapsed.
heavy load".
rpl_row_sp003.test has sporadically failed when run on machine
under heavy load or on slow hardware.
This patch fixes races in the test which were causing these
failures and also removes unnecessary 100 second wait from it.
The binlog_cache_use is incremented twice when changes to a transactional table
are committed, i.e. TC_LOG_BINLOG::log_xid calls is called. The problem happens
because log_xid calls both binlog_flush_stmt_cache and binlog_flush_trx_cache
without checking if such caches are empty thus unintentionally increasing the
binlog_cache_use value twice.
To fix the problem we avoided incrementing the binlog_cache_use if the cache is
empty. We also decided to increment binlog_cache_use when the cache is truncated
as the cache is used although its content is discarded and is not written to the
binary log.
Note that binlog_cache_use is incremented for both types of cache, transactional
and non-transactional and that the behavior presented in this patch also applies
to the binlog_cache_disk_use.
Finally, we re-organized the code around the functions binlog_flush_trx_cache and
binlog_flush_stmt_cache.
The lock_type is upgrade to TL_WRITE from TL_WRITE_DELAYED for
INSERT DELAYED when inserting multi values in one statement.
It's safe. But it causes an unsafe warning in SBR.
Make INSERT DELAYED safe by logging it as INSERT without DELAYED.
temp table
This patch introduces two key changes in the replication's behavior.
Firstly, it reverts part of BUG#51894 which puts any update to temporary tables
into the trx-cache. Now, updates to temporary tables are handled according to
the type of their engines as a regular table.
Secondly, an unsafe mixed statement, (i.e. a statement that access transactional
table as well non-transactional or temporary table, and writes to any of them),
are written into the trx-cache in order to minimize errors in the execution when
the statement logging format is in use.
Such changes has a direct impact on which statements are classified as unsafe
statements and thus part of BUG#53259 is reverted.
'CREATE TABLE IF NOT EXISTS ... SELECT' behaviour
BUG#47132, BUG#47442, BUG49494, BUG#23992 and BUG#48814 will disappear
automatically after the this patch.
BUG#55617 is fixed by this patch too.
This is the 5.5 part.
It implements:
- 'CREATE TABLE IF NOT EXISTS ... SELECT' statement will not insert
anything and binlog anything if the table already exists.
It only generate a warning that table already exists.
- A couple of test cases for the behavior changing.
'CREATE TABLE IF NOT EXISTS ... SELECT' behaviour
BUG#55474, BUG#55499, BUG#55598, BUG#55616 and BUG#55777 are fixed
in this patch too.
This is the 5.1 part.
It implements:
- if the table exists, binlog two events: CREATE TABLE IF NOT EXISTS
and INSERT ... SELECT
- Insert nothing and binlog nothing on master if the existing object
is a view. It only generates a warning that table already exists.