Problem 1: The test waits for an error in the slave sql thread,
then resolves the error and issues 'start slave'. However, there
is a gap between when the error is reported and the slave sql
thread stops. If this gap was long, the slave would still be
running when 'start slave' happened, so 'start slave' would fail
and cause a test failure.
Fix 1: Made wait_for_slave_sql_error wait for the slave to stop
instead of wait for error in the IO thread. After stopping, the
error code is verified. If the error code is wrong, debug info
is printed. To print debug info, the debug printing code in
wait_for_slave_param.inc was moved out to a new file,
show_rpl_debug_info.inc.
Problem 2: rpl_stm_mystery22 is a horrible name, the comments in
the file didn't explain anything useful, the test was generally
hard to follow, and the test was essentially duplicated between
rpl_stm_mystery22 and rpl_row_mystery22.
Fix 2: The test is about conflicts in the slave SQL thread,
hence I renamed the tests to rpl_{stm,row}_conflicts. Refactored
the test so that the work is done in
extra/rpl_tests/rpl_conflicts.inc, and
rpl.rpl_{row,stm}_conflicts merely sets some variables and then
sourced extra/rpl_tests/rpl_conflicts.inc.
The tests have been rewritten and comments added.
Problem 3: When calling wait_for_slave_sql_error.inc, you always
want to verify that the sql thread stops because of the expected
error and not because of some other error. Currently,
wait_for_slave_sql_error.inc allows the caller to omit the error
code, in which case all error codes are accepted.
Fix 3: Made wait_for_slave_sql_error.inc fail if no error code
is given. Updated rpl_filter_tables_not_exist accordingly.
Problem 4: rpl_filter_tables_not_exist had a typo, the dollar
sign was missing in a 'let' statement.
Fix 4: Added dollar sign.
Problem 5: When replicating from other servers than the one named
'master', the wait_for_slave_* macros were unable to print debug
info on the master.
Fix 5: Replace parameter $slave_keep_connection by
$master_connection.
Problem 1: tests often fail in pushbuild with a timeout when waiting
for the slave to start/stop/receive error.
Fix 1: Updated the wait_for_slave_* macros in the following way:
- The timeout is increased by a factor ten
- Refactored the macros so that wait_for_slave_param does the work for
the other macros.
Problem 2: Tests are often incorrectly written, lacking a
source include/wait_for_slave_to_[start|stop].inc.
Fix 2: Improved the chance to get it right by adding
include/start_slave.inc and include/stop_slave.inc, and updated tests
to use these.
Problem 3: The the built-in test language command
wait_for_slave_to_stop is a misnomer (does not wait for the slave io
thread) and does not give as much debug info in case of failure as
the otherwise equivalent macro
source include/wait_for_slave_sql_to_stop.inc
Fix 3: Replaced all calls to the built-in command by a call to the
macro.
Problem 4: Some, but not all, of the wait_for_slave_* macros had an
implicit connection slave. This made some tests confusing to read,
and made it more difficult to use the macro in circular replication
scenarios, where the connection named master needs to wait.
Fix 4: Removed the implicit connection slave from all
wait_for_slave_* macros, and updated tests to use an explicit
connection slave where necessary.
Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
were unused. Moreover, using them is difficult and error-prone.
Fix 5: remove these macros.
Problem 6: log_bin_trust_function_creators_basic failed when running
tests because it assumed @@global.log_bin_trust_function_creators=1,
and some tests modified this variable without resetting it to its
original value.
Fix 6: All tests that use this variable have been updated so that
they reset the value at end of test.
Problem: rpl_ndb_transaction fails because it assumes nothing
is written to the binlog at a certain point. However, ndb may
binlog updates in ndb system tables at a nondeterministic
time point after an ndb table update has been committed.
Fix: break the test into two. rpl_ndb_transaction still does
the ndb updates needed by the first half of the test. The new
test case rpl_bug26395 includes the part that assumes nothing
more will be written to the binlog.