One statement that have more than one different tables to update with
autoinc columns just was marked as unsafe in mixed mode, so the unsafe
warning can't be produced in statement mode.
To fix the problem, mark the statement as unsafe in statement mode too.
binlog, replication aborts
In SBR or MBR, the schema name is not being written to the binlog
when executing a LOAD DATA statement. This becomes a problem when
the current database (lets call it db1) is different from the
table's schema (lets call it db2). For instance, take the
following statements:
use db1;
load data local infile 'infile.txt' into table db2.t
Should this statement be logged without t's schema (db2), when
replaying it, one can get db1.t populated instead of db2.t (if
db1.t exists). On the other hand, if there is no db1.t at all,
replication will stop.
We fix this by always logging the table (in load file) with fully
qualified name when its schema is different from the current
database or when no default database was selected.
The problem is that there is only one autoinc value associated with
the query when binlogging. If more than one autoinc values are used
in the query, the autoinc values after the first one can be inserted
wrongly on slave. So these autoinc values can become inconsistent on
master and slave.
The problem is resolved by marking all the statements that invoke
a trigger or call a function that updated autoinc fields as unsafe,
and will switch to row-format in Mixed mode. Actually, the statement
is safe if just one autoinc value is used in sub-statement, but it's
impossible to check how many autoinc values are used in sub-statement.)
vs not null
NOTE: Backporting the patch to next-mr.
The replication was generating corrupted data, warning messages on Valgrind
and aborting on debug mode while replicating a "null" to "not null" field.
Specifically the unpack_row routine, was considering the slave's table
definition and trying to retrieve a field value, where there was nothing to be
retrieved, ignoring the fact that the value was defined as "null" by the master.
To fix the problem, we proceed as follows:
1 - If it is not STRICT sql_mode, implicit default values are used, regardless
if it is multi-row or single-row statement.
2 - However, if it is STRICT mode, then a we do what follows:
2.1 If it is a transactional engine, we do a rollback on the first NULL that is
to be set into a NOT NULL column and return an error.
2.2 If it is a non-transactional engine and it is the first row to be inserted
with multi-row, we also return the error. Otherwise, we proceed with the
execution, use implicit default values and print out warning messages.
Unfortunately, the current patch cannot mimic the behavior showed by the master
for updates on multi-tables and multi-row inserts. This happens because such
statements are unfolded in different row events. For instance, considering the
following updates and strict mode:
(master)
create table t1 (a int);
create table t2 (a int not null);
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
t1 would have (10) and t2 would have (0) as this would be handled as a
multi-row update. On the other hand, if we had the following updates:
(master)
create table t1 (a int);
create table t2 (a int);
(slave)
create table t1 (a int);
create table t2 (a int not null);
(master)
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
On the master t1 would have (10) and t2 would have (NULL). On
the slave, t1 would have (10) but the update on t1 would fail.
NOTE: Backporting the patch to next-mr.
The reason of the bug was incompatibile with the master side behaviour.
INSERT query on the master is allowed to insert into a table without specifying
values of DEFAULT-less fields if sql_mode is not strict.
Fixed with checking sql_mode by the sql thread to decide how to react.
Non-strict sql_mode should allow Write_rows event to complete.
todo: warnings can be shown via show slave status, still this is a
separate rather general issue how to show warnings for the slave threads.
files
NOTE: this is the backport to next-mr.
SHOW BINLOG EVENTS does not work with relay log files. If issuing
"SHOW BINLOG EVENTS IN 'relay-log.000001'" in a non-empty relay
log file (relay-log.000001), mysql reports empty set.
This patch addresses this issue by extending the SHOW command
with RELAYLOG. Events in relay log files can now be inspected by
issuing SHOW RELAYLOG EVENTS [IN 'log_name'] [FROM pos] [LIMIT
[offset,] row_count].
failure to cleanup of table
The test case was not dropping a table before exiting (ie, it was
not cleaning itself after execution). In this case, the warning
message stating that the test did not do a proper cleanup was
deterministic (which can be annoying).
I have found other tests cases on which mtr sporadically reports
that they have not cleaned up after execution:
- rpl_ndb_circular
- rpl_failed_optimize
In this case, the master was dropping a table but there was no
synchronization between the slave and the master.
This patch addresses the rpl_ndb_circular_simplex case by adding
the missing DROP table. The other cases are fixed by deploying
the missing sync_slave_with_master instruction.
Network error happened here, but it can be caused by CR_CONNECTION_ERROR,
CR_CONN_HOST_ERROR, CR_SERVER_GONE_ERROR, CR_SERVER_LOST, ER_CON_COUNT_ERROR,
and ER_SERVER_SHUTDOWN. We just check CR_SERVER_LOST here, so the test fails.
To fix the problem, check all errors that can be cause by the master shutdown.
In RBR, There is an inconsistency between slaves and master.
When INSERT statement which includes an auto_increment field is executed,
Store engine of master will check the value of the auto_increment field.
It will generate a sequence number and then replace the value, if its value is NULL or empty.
if the field's value is 0, the store engine will do like encountering the NULL values
unless NO_AUTO_VALUE_ON_ZERO is set into SQL_MODE.
In contrast, if the field's value is 0, Store engine of slave always generates a new sequence number
whether or not NO_AUTO_VALUE_ON_ZERO is set into SQL_MODE.
SQL MODE of slave sql thread is always consistency with master's.
Another variable is related to this bug.
If generateing a sequence number is decided by the values of
table->auto_increment_field_not_null and SQL_MODE(if includes MODE_NO_AUTO_VALUE_ON_ZERO)
The table->auto_increment_is_not_null is FALSE, which causes this bug to appear. ..
Essentially, Bug#45574 results in this bug. The 'CREATE DATABASE IF NOT EXISTS' statement was not
binlogged, when the database has existed.
Sometimes, the master and slaves become inconsistent. The "CREATE DATABASE
IF NOT EXISTS mysqltest1" statement is not binlogged
if the db 'mysqltest1' existed before the test case is executed.
So the db 'mysqltest1' can't be created on slave.
Patch of Bug#45574 has resolved this problem.
But I think it is better to replace 'mysqltest1' by default db 'test'.
Slave does not correctly handle "expected errors" leading to inconsistencies
between the mater and slave. Specifically, when a statement changes both
transactional and non-transactional tables, the transactional changes are
automatically rolled back on the master but the slave ignores the error and
does not roll them back thus leading to inconsistencies.
To fix the problem, we automatically roll back a statement that fails on
the slave but note that the transaction is not rolled back unless a "rollback"
command is in the relay log file.
binlog
Mixing transactional (T) and non-transactional (N) tables on behalf of a
transaction may lead to inconsistencies among masters and slaves in STATEMENT
mode. The problem stems from the fact that although modifications done to
non-transactional tables on behalf of a transaction become immediately visible
to other connections they do not immediately get to the binary log and therefore
consistency is broken. Although there may be issues in mixing T and M tables in
STATEMENT mode, there are safe combinations that clients find useful.
In this bug, we fix the following issue. Mixing N and T tables in multi-level
(e.g. a statement that fires a trigger) or multi-table table statements (e.g.
update t1, t2...) were not handled correctly. In such cases, it was not possible
to distinguish when a T table was updated if the sequence of changes was N and T.
In a nutshell, just the flag "modified_non_trans_table" was not enough to reflect
that both a N and T tables were changed. To circumvent this issue, we check if an
engine is registered in the handler's list and changed something which means that
a T table was modified.
Check WL 2687 for a full-fledged patch that will make the use of either the MIXED or
ROW modes completely safe.
The "get_master_version_and_clock(...)" function in sql/slave.cc ignores
error and passes directly when queries fail, or queries succeed
but the result retrieved is empty.
The "get_master_version_and_clock(...)" function should try to reconnect master
if queries fail because of transient network problems, and fail otherwise.
The I/O thread should print a warning if the some system variables do not
exist on master (very old master)
The test case added failed sporadically on PB. This is due to the
fact that the user thread in some cases is waiting for slave IO
to stop and then check the error number. Thence, sometimes the
user thread would race for the error number with IO thread.
This post push fix addresses this by replacing the wait for slave
io to stop with a wait for slave io error (as it seems it was
added in 6.0 also after patch on which this is based was
pushed). This implied backporting wait_for_slave_io_error.inc
from 6.0 also.
The server was not cleaning the last IO error and error number when
resetting slave.
This patch addresses this issue by backporting into 5.1 part of the
patch in BUG 34654. A fix for this issue had already been pushed into
6.0 as part of the aforementioned bug, however the patch also included
some refactoring. The fix for 5.1 does not take into account the
refactoring part.
1. Test case was rewritten completely.
2. Test covers 3 cases:
a) do deadlock on slave, wait retries of transaction, unlock slave before lock
timeout;
b) do deadlock on slave and wait error 'lock timeout exceed' on slave;
c) same as b) but if of max relay log size = 0;
3. Added comments inline.
4. Updated result file.
LOAD_FILE
LOAD_FILE is not safe to replicate in STATEMENT mode, because it
depends on a file (which is loaded on master and may not exist in
slave(s)). This leads to scenarios on which the slave replicates the
statement with 'load_file' and it will try to load the file from local
file system. Given that the file may not exist in the slave filesystem
the operation will not succeed (probably returning NULL), causing
master and slave(s) to diverge. However, when using MIXED mode
replication, this can be made to work, if the statement including
LOAD_FILE is marked as unsafe, triggering a switch to ROW mode,
meaning that the contents of the file are written to binlog as row
events. Consequently, the contents from the file in the master will
reach the slave via the binlog.
This patch addresses this bug by marking the load_file function as
unsafe. When in mixed mode and when LOAD_FILE is issued, there will be
a switch to row mode. Furthermore, when in statement mode, the
LOAD_FILE will raise a warning that the statement is unsafe in that
mode.
TRUNCATE TABLE fails to replicate when stmt-based binlogging is not supported.
There were two separate problems with the code, both of which are fixed with
this patch:
1. An error was printed by InnoDB for TRUNCATE TABLE in statement mode when
the in isolation levels READ COMMITTED and READ UNCOMMITTED since InnoDB
does permit statement-based replication for DML statements. However,
the TRUNCATE TABLE is not transactional, but is a DDL, and should therefore
be allowed to be replicated as a statement.
2. The statement was not logged in mixed mode because of the error above, but
the error was not reported to the client.
This patch fixes the problem by treating TRUNCATE TABLE a DDL, that is, it is
always logged as a statement and not reporting an error from InnoDB for TRUNCATE
TABLE.
Documented behaviour was broken by the patch for bug 33699
that actually is not a bug.
This fix reverts patch for bug 33699 and reverts the
UPDATE of NOT NULL field with NULL query to old
behavior.
Remove size of binlog file from SHOW BINARY LOGS.
Changing size of binlog file is an affect of adding or removing events to/from
binlog and it can be checked in next command of test: SHOW BINLOG EVENTS.
For SHOW BINARY LOGS statement enough to show the list of file names.
conflicts:
Text conflict in client/mysqltest.cc
Text conflict in mysql-test/include/wait_until_connected_again.inc
Text conflict in mysql-test/lib/mtr_report.pm
Text conflict in mysql-test/mysql-test-run.pl
Text conflict in mysql-test/r/events_bugs.result
Text conflict in mysql-test/r/log_state.result
Text conflict in mysql-test/r/myisam_data_pointer_size_func.result
Text conflict in mysql-test/r/mysqlcheck.result
Text conflict in mysql-test/r/query_cache.result
Text conflict in mysql-test/r/status.result
Text conflict in mysql-test/suite/binlog/r/binlog_index.result
Text conflict in mysql-test/suite/binlog/r/binlog_innodb.result
Text conflict in mysql-test/suite/rpl/r/rpl_packet.result
Text conflict in mysql-test/suite/rpl/t/rpl_packet.test
Text conflict in mysql-test/t/disabled.def
Text conflict in mysql-test/t/events_bugs.test
Text conflict in mysql-test/t/log_state.test
Text conflict in mysql-test/t/myisam_data_pointer_size_func.test
Text conflict in mysql-test/t/mysqlcheck.test
Text conflict in mysql-test/t/query_cache.test
Text conflict in mysql-test/t/rpl_init_slave_func.test
Text conflict in mysql-test/t/status.test
The next number (AUTO_INCREMENT) field of the table for write
rows events are not initialized, and cause some engines (innodb)
not correctly update the tables's auto_increment value.
This patch fixed this problem by honor next number fields if present.
Problem 1: The test waits for an error in the slave sql thread,
then resolves the error and issues 'start slave'. However, there
is a gap between when the error is reported and the slave sql
thread stops. If this gap was long, the slave would still be
running when 'start slave' happened, so 'start slave' would fail
and cause a test failure.
Fix 1: Made wait_for_slave_sql_error wait for the slave to stop
instead of wait for error in the IO thread. After stopping, the
error code is verified. If the error code is wrong, debug info
is printed. To print debug info, the debug printing code in
wait_for_slave_param.inc was moved out to a new file,
show_rpl_debug_info.inc.
Problem 2: rpl_stm_mystery22 is a horrible name, the comments in
the file didn't explain anything useful, the test was generally
hard to follow, and the test was essentially duplicated between
rpl_stm_mystery22 and rpl_row_mystery22.
Fix 2: The test is about conflicts in the slave SQL thread,
hence I renamed the tests to rpl_{stm,row}_conflicts. Refactored
the test so that the work is done in
extra/rpl_tests/rpl_conflicts.inc, and
rpl.rpl_{row,stm}_conflicts merely sets some variables and then
sourced extra/rpl_tests/rpl_conflicts.inc.
The tests have been rewritten and comments added.
Problem 3: When calling wait_for_slave_sql_error.inc, you always
want to verify that the sql thread stops because of the expected
error and not because of some other error. Currently,
wait_for_slave_sql_error.inc allows the caller to omit the error
code, in which case all error codes are accepted.
Fix 3: Made wait_for_slave_sql_error.inc fail if no error code
is given. Updated rpl_filter_tables_not_exist accordingly.
Problem 4: rpl_filter_tables_not_exist had a typo, the dollar
sign was missing in a 'let' statement.
Fix 4: Added dollar sign.
Problem 5: When replicating from other servers than the one named
'master', the wait_for_slave_* macros were unable to print debug
info on the master.
Fix 5: Replace parameter $slave_keep_connection by
$master_connection.
where timeout can happen:
1. Added waiting start/stop slave to make sure that slave works properly.
2. Added cleanup for slave.
3. Updated related result files.
Problem: Many test cases don't clean up after themselves (fail
to drop tables or fail to reset variables). This implies that:
(1) check-testcase in the new mtr that currently lives in
5.1-rpl failed. (2) it may cause unexpected results in
subsequent tests.
Fix: make all tests clean up.
Also: cleaned away unnecessary output in rpl_packet.result
Also: fixed bug where rpl_log called RESET MASTER with a running
slave. This is not supposed to work.
Also: removed unnecessary code from rpl_stm_EE_err2 and made it
verify that an error occurred.
Also: removed unnecessary code from rpl_ndb_ctype_ucs2_def.