DBUG_SYNC_POINT has at least one strong limitation that it's not defined
on all platforms. It has issues cooperating with @@debug.
All in all its functionality is superseded by DEBUG_SYNC facility and
there is no reason to maintain the old less flexible one.
Fixed with adding debug_sync_set_action() function as a facility to set up
a sync-action in the server sources code and re-writing existing simulations
(found 3) to use it.
Couple of tests have been reworked as well.
The patch offers a pattern for setting sync-points in replication threads
where the standard DEBUG_SYNC does not suffice to reach goals.
When using MyIsam tables and processing concurrent DML
statements, the server may be sending back an OK to the client
before actually finishing the transaction commit procedure. This
has been reported before in BUG@37521 and BUG@29334.
This particular test case gets affected, because it performs the
following sequence:
connect (conn2, ...)
connection conn2;
LOAD DATA CONCURRENT ...
disconnect (conn2, ...)
connection master;
sync_slave_with_master
diff_tables
At this point diff_tables may report difference in the table
content (the master seems to be missing the conn2 rows).
To workaround this MyISAM concurrent DML statements issue and
make this test case deterministic, we wait on conn2 until the
rows inserted show up in the table. After this the test case
proceeds as normally would before this patch.
Original revision:
------------------------------------------------------------
revision-id: li-bing.song@sun.com-20100130124925-o6sfex42b6noyc6x
parent: joro@sun.com-20100129145427-0n79l9hnk0q43ajk
committer: <Li-Bing.Song@sun.com>
branch nick: mysql-5.1-bugteam
timestamp: Sat 2010-01-30 20:49:25 +0800
message:
Bug #48321 CURRENT_USER() incorrectly replicated for DROP/RENAME USER;
REVOKE/GRANT; ALTER EVENT.
The following statements support the CURRENT_USER() where a user is needed.
DROP USER
RENAME USER CURRENT_USER() ...
GRANT ... TO CURRENT_USER()
REVOKE ... FROM CURRENT_USER()
ALTER DEFINER = CURRENT_USER() EVENT
but, When these statements are binlogged, CURRENT_USER() just is binlogged
as 'CURRENT_USER()', it is not expanded to the real user name. When slave
executes the log event, 'CURRENT_USER()' is expand to the user of slave
SQL thread, but SQL thread's user name always NULL. This breaks the replication.
After this patch, All above statements are rewritten when they are binlogged.
The CURRENT_USER() is expanded to the real user's name and host.
------------------------------------------------------------
REVOKE/GRANT; ALTER EVENT.
The following statements support the CURRENT_USER() where a user is needed.
DROP USER
RENAME USER CURRENT_USER() ...
GRANT ... TO CURRENT_USER()
REVOKE ... FROM CURRENT_USER()
ALTER DEFINER = CURRENT_USER() EVENT
but, When these statements are binlogged, CURRENT_USER() just is binlogged
as 'CURRENT_USER()', it is not expanded to the real user name. When slave
executes the log event, 'CURRENT_USER()' is expand to the user of slave
SQL thread, but SQL thread's user name always NULL. This breaks the replication.
After this patch, All above statements are rewritten when they are binlogged.
The CURRENT_USER() is expanded to the real user's name and host.
In RBR, DDL statement will change binlog format to non row-based
format before it is binlogged, but the binlog format was not be
restored, and then manipulating a temporary table can not reset binlog
format to row-based format rightly. So that the manipulated statement
is binlogged with statement-based format.
To fix the problem, restore the state of binlog format after the DDL
statement is binlogged.
cant find record
Some engines return data for the record. Despite the fact that
the null bit is set for some fields, their old value may still in
the row. This can happen when unpacking an AI from the binlog on
top of a previous record in which a field is set to NULL, which
previously contained a value. Ultimately, this may cause the
comparison of records to fail when the slave is doing an index or
range scan.
We fix this by deploying a call to reset() for each field that is
set to null while unpacking a row from the binary log.
Furthermore, we also add mixed mode test case to cover the
scenario where updating and setting a field to null through a
Query event and later searching it through a rows event will
succeed.
Finally, we also change the reset() method, from Field_bit class,
so that it takes into account bits stored among the null bits and
not only the ones stored in the record.
It is well-known that due to concurrency issues, a slave can become
inconsistent when a transaction contains updates to both transaction and
non-transactional tables in statement and mixed modes.
In a nutshell, the current code-base tries to preserve causality among the
statements by writing non-transactional statements to the txn-cache which
is flushed upon commit. However, modifications done to non-transactional
tables on behalf of a transaction become immediately visible to other
connections but may not immediately get into the binary log and therefore
consistency may be broken.
In general, it is impossible to automatically detect causality/dependency
among statements by just analyzing the statements sent to the server. This
happen because dependency may be hidden in the application code and it is
necessary to know a priori all the statements processed in the context of
a transaction such as in a procedure. Moreover, even for the few cases that
we could automatically address in the server, the computation effort
required could make the approach infeasible.
So, in this patch we introduce the option
- "--binlog-direct-non-transactional-updates" that can be used to bypass
the current behavior in order to write directly to binary log statements
that change non-transactional tables.
Problem: When RAND() is binlogged in statement mode, the seed is
binlogged too, so the replication slave generates the same
sequence of random numbers. This makes replication work in many
cases, but not in all cases: the order of rows is not guaranteed
for, e.g., UPDATE or INSERT...SELECT statements, so the row data
will be different if master and slave retrieve the rows in
different orders.
Fix: Mark RAND() as unsafe. It will generate a warning if
binlog_format=STATEMENT and switch to row-logging if
binlog_format=ROW.
'LOAD DATA CONCURRENT [LOCAL] INFILE ...' statment only is binlogged as
'LOAD DATA [LOCAL] INFILE ...' in SBR and MBR. As a result, if replication is on,
queries on slaves will be blocked by the replication SQL thread.
This patch write code to write 'CONCURRENT' into the log event if 'CONCURRENT' option
is in the original statement in SBR and MBR.
escaped field names
When in mixed or statement mode, the master logs LOAD DATA
queries by resorting to an Execute_load_query_log_event. This
event does not contain the original query, but a rewritten
version of it, which includes the table field names. However, the
rewrite does not escape the field names. If these names match a
reserved keyword, then the slave will stop with a syntax error
when executing the event.
We fix this by escaping the fields names as it happens already
for the table name.
Problem: Some system functions that could return different values on
master and slave were not marked unsafe. In particular:
GET_LOCK
IS_FREE_LOCK
IS_USED_LOCK
MASTER_POS_WAIT
RELEASE_LOCK
SLEEP
SYSDATE
VERSION
Fix: Mark these functions unsafe.
binlog, replication aborts
In SBR or MBR, the schema name is not being written to the binlog
when executing a LOAD DATA statement. This becomes a problem when
the current database (lets call it db1) is different from the
table's schema (lets call it db2). For instance, take the
following statements:
use db1;
load data local infile 'infile.txt' into table db2.t
Should this statement be logged without t's schema (db2), when
replaying it, one can get db1.t populated instead of db2.t (if
db1.t exists). On the other hand, if there is no db1.t at all,
replication will stop.
We fix this by always logging the table (in load file) with fully
qualified name when its schema is different from the current
database or when no default database was selected.
Backporting BUG#43789 to mysql-5.1-bugteam
The replication was generating corrupted data, warning messages on Valgrind
and aborting on debug mode while replicating a "null" to "not null" field.
Specifically the unpack_row routine, was considering the slave's table
definition and trying to retrieve a field value, where there was nothing to be
retrieved, ignoring the fact that the value was defined as "null" by the master.
To fix the problem, we proceed as follows:
1 - If it is not STRICT sql_mode, implicit default values are used, regardless
if it is multi-row or single-row statement.
2 - However, if it is STRICT mode, then a we do what follows:
2.1 If it is a transactional engine, we do a rollback on the first NULL that is
to be set into a NOT NULL column and return an error.
2.2 If it is a non-transactional engine and it is the first row to be inserted
with multi-row, we also return the error. Otherwise, we proceed with the
execution, use implicit default values and print out warning messages.
Unfortunately, the current patch cannot mimic the behavior showed by the master
for updates on multi-tables and multi-row inserts. This happens because such
statements are unfolded in different row events. For instance, considering the
following updates and strict mode:
(master)
create table t1 (a int);
create table t2 (a int not null);
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
t1 would have (10) and t2 would have (0) as this would be handled as a
multi-row update. On the other hand, if we had the following updates:
(master)
create table t1 (a int);
create table t2 (a int);
(slave)
create table t1 (a int);
create table t2 (a int not null);
(master)
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
On the master t1 would have (10) and t2 would have (NULL). On
the slave, t1 would have (10) but the update on t1 would fail.
Backporting BUG#38173 to mysql-5.1-bugteam
The reason of the bug was incompatibile with the master side behaviour.
INSERT query on the master is allowed to insert into a table without specifying
values of DEFAULT-less fields if sql_mode is not strict.
Fixed with checking sql_mode by the sql thread to decide how to react.
Non-strict sql_mode should allow Write_rows event to complete.
todo: warnings can be shown via show slave status, still this is a
separate rather general issue how to show warnings for the slave threads.
The BINLOG statement was sharing too much code with the slave SQL thread, introduced with
the patch for Bug#32407. This caused statements to be logged with the wrong server_id, the
id stored inside the events of the BINLOG statement rather than the id of the running
server.
Fix by rearranging code a bit so that only relevant parts of the code are executed by
the BINLOG statement, and the server_id of the server executing the statements will
not be overrided by the server_id stored in the 'format description BINLOG statement'.
Let
- T be a transactional table and N non-transactional table.
- B be begin, C commit and R rollback.
- N be a statement that accesses and changes only N-tables.
- T be a statement that accesses and changes only T-tables.
In RBR, changes to N-tables that happen early in a transaction are not immediately flushed
upon committing a statement. This behavior may, however, break consistency in the presence
of concurrency since changes done to N-tables become immediately visible to other
connections. To fix this problem, we do the following:
. B N N T C would log - B N C B N C B T C.
. B N N T R would log - B N C B N C B T R.
Note that we are not preserving history from the master as we are introducing a commit that
never happened. However, this seems to be more acceptable than the possibility of breaking
consistency in the presence of concurrency.
Let
- T be a transactional table and N non-transactional table.
- B be begin, C commit and R rollback.
- M be a mixed statement, i.e. a statement that updates both T and N.
- M* be a mixed statement that fails while updating either T or N.
This patch restore the behavior presented in 5.1.37 for rows either produced in
the RBR or MIXED modes, when a M* statement that happened early in a transaction
had their changes written to the binary log outside the boundaries of the
transaction and wrapped in a BEGIN/ROLLBACK. This was done to keep the slave
consistent with with the master as the rollback would keep the changes on N and
undo them on T. In particular, we do what follows:
. B M* T C would log - B M* R B T C.
Note that, we are not preserving history from the master as we are introducing a
rollback that never happened. However, this seems to be more acceptable than
making the slave diverge. We do not fix the following case:
. B T M* C would log B T M* C.
The slave will diverge as the changes on T tables that originated from the M
statement are rolled back on the master but not on the slave. Unfortunately, we
cannot simply rollback the transaction as this would undo any uncommitted
changes on T tables.
SBR is not considered in this patch because a failing statement is written to
the binary along with the error code and a slave executes and then rolls back
the statement when it has an associated error code, thus undoing the effects
on T. In RBR and MBR, a full-fledged fix will be pushed after the WL 2687.
The problem is that there is only one autoinc value associated with
the query when binlogging. If more than one autoinc values are used
in the query, the autoinc values after the first one can be inserted
wrongly on slave. So these autoinc values can become inconsistent on
master and slave.
The problem is resolved by marking all the statements that invoke
a trigger or call a function that updated autoinc fields as unsafe,
and will switch to row-format in Mixed mode. Actually, the statement
is safe if just one autoinc value is used in sub-statement, but it's
impossible to check how many autoinc values are used in sub-statement.)
In RBR, 'DROP TEMPORARY TABLE IF EXISTS...' statement is binlogged when the table
does not exist.
In fact, 'DROP TEMPORARY TABLE ...' statement should never be binlogged in RBR
no matter if the table exists or not.
This patch addresses this by checking whether we are dropping a
temporary table or not, when building the custom drop statement.
failure to cleanup of table
The test case was not dropping a table before exiting (ie, it was
not cleaning itself after execution). In this case, the warning
message stating that the test did not do a proper cleanup was
deterministic (which can be annoying).
I have found other tests cases on which mtr sporadically reports
that they have not cleaned up after execution:
- rpl_ndb_circular
- rpl_failed_optimize
In this case, the master was dropping a table but there was no
synchronization between the slave and the master.
This patch addresses the rpl_ndb_circular_simplex case by adding
the missing DROP table. The other cases are fixed by deploying
the missing sync_slave_with_master instruction.
Network error happened here, but it can be caused by CR_CONNECTION_ERROR,
CR_CONN_HOST_ERROR, CR_SERVER_GONE_ERROR, CR_SERVER_LOST, ER_CON_COUNT_ERROR,
and ER_SERVER_SHUTDOWN. We just check CR_SERVER_LOST here, so the test fails.
To fix the problem, check all errors that can be cause by the master shutdown.
In RBR, There is an inconsistency between slaves and master.
When INSERT statement which includes an auto_increment field is executed,
Store engine of master will check the value of the auto_increment field.
It will generate a sequence number and then replace the value, if its value is NULL or empty.
if the field's value is 0, the store engine will do like encountering the NULL values
unless NO_AUTO_VALUE_ON_ZERO is set into SQL_MODE.
In contrast, if the field's value is 0, Store engine of slave always generates a new sequence number
whether or not NO_AUTO_VALUE_ON_ZERO is set into SQL_MODE.
SQL MODE of slave sql thread is always consistency with master's.
Another variable is related to this bug.
If generateing a sequence number is decided by the values of
table->auto_increment_field_not_null and SQL_MODE(if includes MODE_NO_AUTO_VALUE_ON_ZERO)
The table->auto_increment_is_not_null is FALSE, which causes this bug to appear. ..
Essentially, Bug#45574 results in this bug. The 'CREATE DATABASE IF NOT EXISTS' statement was not
binlogged, when the database has existed.
Sometimes, the master and slaves become inconsistent. The "CREATE DATABASE
IF NOT EXISTS mysqltest1" statement is not binlogged
if the db 'mysqltest1' existed before the test case is executed.
So the db 'mysqltest1' can't be created on slave.
Patch of Bug#45574 has resolved this problem.
But I think it is better to replace 'mysqltest1' by default db 'test'.
Slave does not correctly handle "expected errors" leading to inconsistencies
between the mater and slave. Specifically, when a statement changes both
transactional and non-transactional tables, the transactional changes are
automatically rolled back on the master but the slave ignores the error and
does not roll them back thus leading to inconsistencies.
To fix the problem, we automatically roll back a statement that fails on
the slave but note that the transaction is not rolled back unless a "rollback"
command is in the relay log file.
binlog
Mixing transactional (T) and non-transactional (N) tables on behalf of a
transaction may lead to inconsistencies among masters and slaves in STATEMENT
mode. The problem stems from the fact that although modifications done to
non-transactional tables on behalf of a transaction become immediately visible
to other connections they do not immediately get to the binary log and therefore
consistency is broken. Although there may be issues in mixing T and M tables in
STATEMENT mode, there are safe combinations that clients find useful.
In this bug, we fix the following issue. Mixing N and T tables in multi-level
(e.g. a statement that fires a trigger) or multi-table table statements (e.g.
update t1, t2...) were not handled correctly. In such cases, it was not possible
to distinguish when a T table was updated if the sequence of changes was N and T.
In a nutshell, just the flag "modified_non_trans_table" was not enough to reflect
that both a N and T tables were changed. To circumvent this issue, we check if an
engine is registered in the handler's list and changed something which means that
a T table was modified.
Check WL 2687 for a full-fledged patch that will make the use of either the MIXED or
ROW modes completely safe.
The "get_master_version_and_clock(...)" function in sql/slave.cc ignores
error and passes directly when queries fail, or queries succeed
but the result retrieved is empty.
The "get_master_version_and_clock(...)" function should try to reconnect master
if queries fail because of transient network problems, and fail otherwise.
The I/O thread should print a warning if the some system variables do not
exist on master (very old master)
"create as select" (innodb table)
Problem: code constructing "CREATE TABLE..." statement
doesn't take into account that current database is not set
in some cases. That may lead to a server crash.
Fix: check if current database is set.
The test case added failed sporadically on PB. This is due to the
fact that the user thread in some cases is waiting for slave IO
to stop and then check the error number. Thence, sometimes the
user thread would race for the error number with IO thread.
This post push fix addresses this by replacing the wait for slave
io to stop with a wait for slave io error (as it seems it was
added in 6.0 also after patch on which this is based was
pushed). This implied backporting wait_for_slave_io_error.inc
from 6.0 also.
The server was not cleaning the last IO error and error number when
resetting slave.
This patch addresses this issue by backporting into 5.1 part of the
patch in BUG 34654. A fix for this issue had already been pushed into
6.0 as part of the aforementioned bug, however the patch also included
some refactoring. The fix for 5.1 does not take into account the
refactoring part.
1. Test case was rewritten completely.
2. Test covers 3 cases:
a) do deadlock on slave, wait retries of transaction, unlock slave before lock
timeout;
b) do deadlock on slave and wait error 'lock timeout exceed' on slave;
c) same as b) but if of max relay log size = 0;
3. Added comments inline.
4. Updated result file.
LOAD_FILE
LOAD_FILE is not safe to replicate in STATEMENT mode, because it
depends on a file (which is loaded on master and may not exist in
slave(s)). This leads to scenarios on which the slave replicates the
statement with 'load_file' and it will try to load the file from local
file system. Given that the file may not exist in the slave filesystem
the operation will not succeed (probably returning NULL), causing
master and slave(s) to diverge. However, when using MIXED mode
replication, this can be made to work, if the statement including
LOAD_FILE is marked as unsafe, triggering a switch to ROW mode,
meaning that the contents of the file are written to binlog as row
events. Consequently, the contents from the file in the master will
reach the slave via the binlog.
This patch addresses this bug by marking the load_file function as
unsafe. When in mixed mode and when LOAD_FILE is issued, there will be
a switch to row mode. Furthermore, when in statement mode, the
LOAD_FILE will raise a warning that the statement is unsafe in that
mode.
TRUNCATE TABLE fails to replicate when stmt-based binlogging is not supported.
There were two separate problems with the code, both of which are fixed with
this patch:
1. An error was printed by InnoDB for TRUNCATE TABLE in statement mode when
the in isolation levels READ COMMITTED and READ UNCOMMITTED since InnoDB
does permit statement-based replication for DML statements. However,
the TRUNCATE TABLE is not transactional, but is a DDL, and should therefore
be allowed to be replicated as a statement.
2. The statement was not logged in mixed mode because of the error above, but
the error was not reported to the client.
This patch fixes the problem by treating TRUNCATE TABLE a DDL, that is, it is
always logged as a statement and not reporting an error from InnoDB for TRUNCATE
TABLE.
Documented behaviour was broken by the patch for bug 33699
that actually is not a bug.
This fix reverts patch for bug 33699 and reverts the
UPDATE of NOT NULL field with NULL query to old
behavior.
Remove size of binlog file from SHOW BINARY LOGS.
Changing size of binlog file is an affect of adding or removing events to/from
binlog and it can be checked in next command of test: SHOW BINLOG EVENTS.
For SHOW BINARY LOGS statement enough to show the list of file names.
conflicts:
Text conflict in client/mysqltest.cc
Text conflict in mysql-test/include/wait_until_connected_again.inc
Text conflict in mysql-test/lib/mtr_report.pm
Text conflict in mysql-test/mysql-test-run.pl
Text conflict in mysql-test/r/events_bugs.result
Text conflict in mysql-test/r/log_state.result
Text conflict in mysql-test/r/myisam_data_pointer_size_func.result
Text conflict in mysql-test/r/mysqlcheck.result
Text conflict in mysql-test/r/query_cache.result
Text conflict in mysql-test/r/status.result
Text conflict in mysql-test/suite/binlog/r/binlog_index.result
Text conflict in mysql-test/suite/binlog/r/binlog_innodb.result
Text conflict in mysql-test/suite/rpl/r/rpl_packet.result
Text conflict in mysql-test/suite/rpl/t/rpl_packet.test
Text conflict in mysql-test/t/disabled.def
Text conflict in mysql-test/t/events_bugs.test
Text conflict in mysql-test/t/log_state.test
Text conflict in mysql-test/t/myisam_data_pointer_size_func.test
Text conflict in mysql-test/t/mysqlcheck.test
Text conflict in mysql-test/t/query_cache.test
Text conflict in mysql-test/t/rpl_init_slave_func.test
Text conflict in mysql-test/t/status.test