Problem 1: tests often fail in pushbuild with a timeout when waiting
for the slave to start/stop/receive an error.
Fix 1: Updated the wait_for_slave_* macros in the following way:
- The timeout is increased by a factor of ten.
- The macros are refactored so that wait_for_slave_param does the
  work for the other macros (sketched below).
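For illustration, a typical call to the refactored macro could look
like this (a sketch; $slave_param and $slave_param_value are assumed
to be the macro's input variables):

  let $slave_param= Slave_SQL_Running;
  let $slave_param_value= Yes;
  source include/wait_for_slave_param.inc;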
Problem 2: Tests are often incorrectly written, lacking a
source include/wait_for_slave_to_[start|stop].inc.
Fix 2: Improved the chance to get it right by adding
include/start_slave.inc and include/stop_slave.inc, and updated tests
to use these.
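A test then sources the new files instead of issuing the bare
statements, e.g. (a sketch, assuming the include files issue the
statement and then wait for the slave threads):

  connection slave;
  source include/stop_slave.inc;
  # ... reconfigure replication here ...
  source include/start_slave.inc;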
Problem 3: The built-in test language command wait_for_slave_to_stop
is a misnomer (it does not wait for the slave IO thread) and does not
give as much debug info in case of failure as the otherwise
equivalent macro
source include/wait_for_slave_sql_to_stop.inc
Fix 3: Replaced all calls to the built-in command with a call to the
macro.
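In a test, the change is a one-line substitution (sketch):

  # before:
  wait_for_slave_to_stop;
  # after:
  source include/wait_for_slave_sql_to_stop.inc;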
Problem 4: Some, but not all, of the wait_for_slave_* macros had an
implicit connection slave. This made some tests confusing to read,
and made it more difficult to use the macro in circular replication
scenarios, where the connection named master needs to wait.
Fix 4: Removed the implicit connection slave from all
wait_for_slave_* macros, and updated tests to use an explicit
connection slave where necessary.
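The caller now selects the connection explicitly (sketch):

  # in circular replication, use 'connection master;' here instead
  connection slave;
  source include/wait_for_slave_sql_to_stop.inc;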
Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
were unused. Moreover, using them is difficult and error-prone.
Fix 5: Removed these macros.
Problem 6: log_bin_trust_function_creators_basic failed when running
tests because it assumed @@global.log_bin_trust_function_creators=1,
and some tests modified this variable without resetting it to its
original value.
Fix 6: All tests that use this variable have been updated so that
they reset the value at end of test.
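The usual pattern is to save the value at the start of the test and
restore it at the end (a sketch; the user variable name is
illustrative):

  SET @old_log_bin_trust_function_creators=
    @@global.log_bin_trust_function_creators;
  # ... body of the test ...
  SET @@global.log_bin_trust_function_creators=
    @old_log_bin_trust_function_creators;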
Problem: the test syncs the slave with a 'wait_condition' that waits
until table t1 has 5000 rows. However, there is no guarantee that t1
reaches the slave before the wait_condition executes.
Fix: added sync_slave_with_master just after t1 was created.
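Sketch of the fix (the column definition is illustrative):

  connection master;
  CREATE TABLE t1 (a INT);
  # guarantee that t1 exists on the slave before the wait_condition runs
  sync_slave_with_master;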
Problem: rpl_ndb_transaction fails because it assumes nothing
is written to the binlog at a certain point. However, ndb may
binlog updates in ndb system tables at a nondeterministic
time point after an ndb table update has been committed.
Fix: break the test into two. rpl_ndb_transaction still does
the ndb updates needed by the first half of the test. The new
test case rpl_bug26395 includes the part that assumes nothing
more will be written to the binlog.
Problem 1: main.loaddata tried to trigger an error caused by reading
files outside the vardir, by reading its own test file. However, if
loaddata.test is not world-readable (e.g., umask=0077), then another
error is triggered.
Fix 1: allow the other error too.
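In mysqltest this amounts to listing both error codes on the --error
line; a sketch, where the exact error pair and file path are
assumptions, not taken from the actual patch:

  # accept either the intended error or the one seen with umask=0077
  --error ER_TEXTFILE_NOT_READABLE,13
  LOAD DATA INFILE '../../t/loaddata.test' INTO TABLE t1;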
Problem 2: rpl_slave_skip and rpl_innodb_mixed_dml tried to
copy a file from mysql-test/suite/rpl/data to mysql-test/var
and then read it. That failed too if umask=0077, since the
file would not become world-readable.
Fix 2: move the files from mysql-test/suite/rpl/data to
mysql-test/std_data and update tests accordingly. Remove
the directory mysql-test/suite/rpl/data.
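After the move, a test can read the file directly instead of copying
it to the vardir first (a sketch; the file name and relative path are
illustrative):

  # no copy to the vardir needed; read straight from std_data
  LOAD DATA INFILE '../../std_data/rpl_mixed.dat' INTO TABLE t1;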
This bug has been fixed in two slightly different ways in
6.0-rpl and {5.1,6.0}-bugteam. To avoid future merge
problems, I'm now copying the 6.0-rpl fix to 5.1-bugteam.
The previous fix for the bug was incomplete. The test failed because
t2 did not exist on the slave (since the slave was lagging) when the
wait_condition was executed. Fixed by inserting sync_slave_with_master
just after t2 was created.
Problem: rpl_switch_stm_row_mixed did not wait until the row events
generated by INSERT DELAYED were written to the master binlog before
it synchronized the slave with the master. This caused sporadic
errors where these rows were missing on the slave.
Fix: wait until all rows appear on the slave (sketched below).
This is a backport: the same fix is applied to 5.1-bugteam as was
previously applied to 6.0-rpl.
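Sketch of such a wait on the slave (the row count and table name are
illustrative):

  connection slave;
  let $wait_condition= SELECT COUNT(*) = 5 FROM t1;
  source include/wait_condition.inc;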
On a slow environment like valgrind, the test is vulnerable because
it does not check whether the slave has stopped by the time the new
session requests `start slave;' -- disabling the test till it is
fixed.
The test is vulnerable because it does not check whether the slave
has stopped by the time the new session requests `start slave;'.
Fixed by explicitly deploying the wait_for_slave_to_stop
synchronization macro.
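Sketch of the deployed synchronization:

  connection slave;
  # make sure the previous STOP SLAVE has completed before starting again
  source include/wait_for_slave_to_stop.inc;
  START SLAVE;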
When flushing tables, there was a slight chance that the flush
occurred between the processing of two table map events. Since the
tables are opened one by one, this could leave the tables invalid, so
that subsequent locking of tables would cause the slave to crash.
The problem is solved by opening and locking all tables at once using
simple_open_n_lock_tables(). The patch also contains a change to
open_tables() so that pre-locking only takes place when trg_event_map
is not zero, which was not the case before (this caused the lock to
be placed in thd->locked_tables instead of thd->lock, since the
assumption was that triggers would be called later and therefore the
tables should be pre-locked).
Temporarily checking in an incorrect test case. Rationale: the impact of
this bug is negligible (it's almost a feature request). We need 5.1 to be
stable, and making a real fix is a bit risky. So the fix is postponed
to 6.0.
The test suite/rpl/t/rpl_innodb_bug28430.test was disabled because of
BUG#32247, but not re-enabled when BUG#32247 was fixed. I've re-enabled
it. The test and result file needed to be updated too.
Of the two reported artifacts, the critical one is that the Table map
event of a query following a CREATE-SELECT that failed with a
duplicate key error is never instantiated (and thus never binlogged).
This leads to sending a "chopped" group of data row events, without
the table map head, to the slave. The slave cannot apply data row
events alone.
It is not easy to force the slave to react with an error in such a
case (the second complaint on the bug report), because a missing
table in Rows_log_event::do_apply_event, the data row event handler,
is a common situation that normally indicates the event has to be
filtered out based on the replication do/ignore rules.
Fixed: table map creation and binlogging are restored by deploying
the standard cleanup call in select_create::abort().
No error is reported if by chance the table map was not binlogged.
This is left to be resolved together with the question of how to
combine the do/ignore rules with the situation where the Table_map
is erroneously not written to the binlog.
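For reference, a minimal hypothetical scenario of this shape (table
names and values are illustrative, not taken from the original test):

  CREATE TABLE t1 (a INT PRIMARY KEY);
  --error ER_DUP_ENTRY
  CREATE TABLE t2 (a INT PRIMARY KEY)
    SELECT 1 AS a UNION ALL SELECT 1;
  # with the bug, the row events of the next statement were written to
  # the binlog without a preceding Table map event:
  INSERT INTO t1 VALUES (1);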