There was a race. The test case expected the slave to start processing a
particular DELETE statement, at which point the test would stop the slave.
But the test was missing a wait to ensure that the slave had actually reached
this point; thus, depending on timing, the slave could be stopped too early,
causing a .result file difference.
Fixed by adding an appropriate wait to the test case.
Fix rare failures in test case rpl.rpl_gtid_basic:
- Add another possible error code when a connection is killed.
- Make sure that the IO thread has had time to complete its stop after START
SLAVE UNTIL. Otherwise, START SLAVE might run before the IO thread has
stopped, leaving the test case with a stopped IO thread that eventually
causes a wait timeout.
There was a race: a small window between updating the slave position and
updating Seconds_Behind_Master, during which the test case could see the
wrong value.
Fixed by waiting for the expected status to appear.
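As an illustration of the wait pattern used by these test fixes, here is a
minimal C++ sketch (not the actual mysqltest code; the status variable,
timings and function names are invented): poll the status until the expected
value appears, instead of reading it once and racing with the thread that
updates it.

    // Minimal sketch of "wait for the expected status to appear" instead of
    // reading the status once and racing with the thread that updates it.
    // seconds_behind_master is an invented stand-in for the real status.
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    static std::atomic<int> seconds_behind_master{-1};

    // Poll until the status reaches `expected`; return false on timeout.
    static bool wait_for_status(int expected, std::chrono::seconds timeout)
    {
      const auto deadline= std::chrono::steady_clock::now() + timeout;
      while (seconds_behind_master.load() != expected)
      {
        if (std::chrono::steady_clock::now() >= deadline)
          return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
      }
      return true;
    }

    int main()
    {
      // Simulate the slave updating the status slightly later than a naive
      // check would have read it.
      std::thread updater([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        seconds_behind_master.store(0);
      });
      const bool ok= wait_for_status(0, std::chrono::seconds(5));
      updater.join();
      std::printf("%s\n", ok ? "expected status reached" : "timeout");
      return ok ? 0 : 1;
    }
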
The replication relay log position was sometimes updated incorrectly at the
end of a transaction in parallel replication. This happened because the relay
log file name was taken from the current Relay_log_info (SQL driver thread),
not the correct value for the transaction in question.
The result was that if a transaction was applied while the SQL driver thread
was at least one relay log file ahead, _and_ the SQL thread was subsequently
stopped before applying any events from the most recent relay log file, then
the relay log position would be incorrect (wrong relay log file name). Thus,
when the slave was started again, usually a relay log read error would result,
or in rare cases, if the position happened to be readable, the slave might
even skip an arbitrary number of events.
In GTID mode, the relay log position is reset when both slave threads are
restarted, so this bug would only be seen in non-GTID mode, or in GTID mode
when only the SQL thread, not the IO thread, was stopped.
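A hypothetical, heavily simplified C++ sketch of the idea behind the fix
(the real code uses Relay_log_info and the per-transaction group info in the
server source; the structure and function names below are invented):

    // The position recorded at the end of a transaction must use the relay
    // log coordinates captured for that transaction, not whatever file the
    // SQL driver thread happens to be reading at that moment.
    #include <cstdint>
    #include <cstdio>
    #include <string>

    struct RelayLogCoords
    {
      std::string file_name;   // relay log file the event group came from
      uint64_t    pos;         // end position of the event group in that file
    };

    struct DriverThreadState
    {
      RelayLogCoords current;  // where the SQL driver thread is reading *now*
    };

    struct TransactionContext
    {
      RelayLogCoords coords;   // captured when the event group was queued
    };

    // Buggy idea: file name taken from the driver thread, which may already
    // be one or more relay log files ahead of this transaction.
    static RelayLogCoords buggy_position(const DriverThreadState &driver,
                                         const TransactionContext &trx)
    {
      return RelayLogCoords{ driver.current.file_name, trx.coords.pos };
    }

    // Fixed idea: both file name and position come from the coordinates
    // saved for the transaction being committed.
    static RelayLogCoords fixed_position(const TransactionContext &trx)
    {
      return trx.coords;
    }

    int main()
    {
      DriverThreadState driver{ { "relay-bin.000003", 120 } };
      TransactionContext trx{ { "relay-bin.000002", 4096 } };
      RelayLogCoords b= buggy_position(driver, trx);
      RelayLogCoords f= fixed_position(trx);
      std::printf("buggy: %s:%llu  fixed: %s:%llu\n",
                  b.file_name.c_str(), (unsigned long long) b.pos,
                  f.file_name.c_str(), (unsigned long long) f.pos);
      return 0;
    }
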
Simply disallowing equality propagation into LIKE.
A more delicate fix would be possible, but it would need too many changes,
which is not desirable in 10.0 at this point.
MDEV-7106: Sporadic test failure in multi_source.gtid
MDEV-7153: Yet another sporadic failure of multi_source.gtid in buildbot
This patch fixes three races in the multi_source.gtid test case that could
cause sporadic failures:
1. Do not put SHOW ALL SLAVES STATUS in the output; its output is not stable.
2. Ensure that slave1 has replicated as far as expected, before stopping its
connection to master1 (otherwise the following wait will time out due to rows
not replicated from master1).
3. Ensure that slave2 has replicated far enough before connecting slave1 to it
(otherwise we get an error during connect that slave1 is ahead of slave2).
Moving Item_bool_func2 and Item_func_opt_neg from Item_int_func to
Item_bool_func. Now all functions that return is_bool_func()=true
have a common root class Item_bool_func.
This change is needed to fix MDEV-7149 properly.
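A hypothetical, heavily simplified sketch of the hierarchy change (the real
classes are declared in the server's item headers; only the shape of the
change is shown here):

    // Before: Item_bool_func2 / Item_func_opt_neg derived from Item_int_func.
    // After:  they derive from Item_bool_func, so every function returning
    //         is_bool_func() == true shares the common root Item_bool_func.
    struct Item_func
    {
      virtual bool is_bool_func() const { return false; }
      virtual ~Item_func() = default;
    };

    struct Item_int_func : Item_func {};

    struct Item_bool_func : Item_func
    {
      bool is_bool_func() const override { return true; }
    };

    struct Item_bool_func2   : Item_bool_func {};   // was: Item_int_func
    struct Item_func_opt_neg : Item_bool_func2 {};

    int main()
    {
      Item_bool_func2 cmp;
      return cmp.is_bool_func() ? 0 : 1;   // true with the new hierarchy
    }
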
Problem: the test tried to verify that no files were left behind in the
test database.
Fix: there is no need to list other file types; only *.par files need to
be listed.
The bug was in DEBUG_SYNC. When waiting, debug_sync_execute() temporarily sets
thd->mysys_var->current_mutex to a new value while waiting. However, if the
old value of current_mutex was NULL, it was not restored; current_mutex
remained set to the temporary value (debug_sync_global.ds_mutex).
This made possible the following race: Thread T1 goes to KILL thread T2. In
THD::awake(), T1 loads T2->mysys_var->current_mutex, it is set to ds_mutex, T1
locks this mutex.
Now T2 runs, it does ENTER_COND, it sets T2->mysys_var->current_mutex to
LOCK_wait_commit (for example).
Then T1 resumes, it reloads mysys_var->current_mutex, now it is set to
LOCK_wait_commit, T1 unlocks this mutex instead of the ds_mutex that it locked
previously.
This causes safe_mutex to assert with the message: "Trying to unlock mutex
LOCK_wait_commit that wasn't locked".
The fix is to ensure that DEBUG_SYNC also will restore
mysys_var->current_mutex in the case where the original value was NULL.
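A simplified C++ sketch of the save/restore pattern involved (not the actual
DEBUG_SYNC code; the structure and function below are stand-ins):

    // The old current_mutex value must be restored even when it was NULL;
    // the bug was that a NULL old value was never put back, so current_mutex
    // kept pointing at the temporary ds_mutex after the wait finished.
    #include <pthread.h>

    struct mysys_var_sketch
    {
      pthread_mutex_t *current_mutex;   // mutex a KILLing thread may lock
    };

    static pthread_mutex_t ds_mutex= PTHREAD_MUTEX_INITIALIZER;  // stand-in
                                                                 // for ds_mutex
    static void debug_sync_wait_sketch(mysys_var_sketch *var)
    {
      pthread_mutex_t *old_mutex= var->current_mutex;  // may legitimately be NULL

      var->current_mutex= &ds_mutex;    // temporary value while waiting
      /* ... wait on a condition variable protected by ds_mutex ... */

      // Restore unconditionally, including the NULL case.
      var->current_mutex= old_mutex;
    }

    int main()
    {
      mysys_var_sketch var{ nullptr };  // old value was NULL in the bug scenario
      debug_sync_wait_sketch(&var);
      return var.current_mutex == nullptr ? 0 : 1;   // restored correctly
    }
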
I saw two test failures in rpl.rpl_gtid_crash where we get this in the error
log:
141123 12:47:54 [Note] InnoDB: Restoring possible half-written data pages
141123 12:47:54 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Warning: database page corruption or a failed
InnoDB: file read of space 6 page 3.
InnoDB: Trying to recover it from the doublewrite buffer.
141123 12:47:54 [Note] InnoDB: Recovered the page from the doublewrite buffer.
This test case deliberately crashes the server, and if this crash happens
right in the middle of writing a buffer pool page to disk, it is not
unexpected that we can get a half-written page. The page is recovered
correctly from the doublewrite buffer.
So this patch adds a suppression for this warning in the error log for this
test case.
When a master server restarts, it logs a special restart format description
event in its binlog. When the slave sees this event, it knows it needs to roll
back any active partial transaction, in case the master crashed previously in
the middle of writing such a transaction to its binlog.
However, there was a bug where this rollback did not reset rgi->pending_gtid.
This caused the @@gtid_slave_pos to be updated incorrectly with the GTID of
the partial transaction that was rolled back.
Fix this by always clearing rgi->pending_gtid in cleanup_context(), hopefully
preventing similar bugs from turning up in other special cases where a
transaction is rolled back during replication.
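A hypothetical, simplified sketch of the idea behind this fix (the real code
is the replication group-info cleanup in the server source; the types below
are invented):

    // When a partial transaction is rolled back during replication, the GTID
    // remembered for it must be forgotten as well, so it can never be used
    // to update the recorded slave position (@@gtid_slave_pos).
    #include <cstdint>

    struct Gtid
    {
      uint32_t domain_id;
      uint32_t server_id;
      uint64_t seq_no;
    };

    struct rpl_group_info_sketch
    {
      bool gtid_pending;   // a GTID was read for a not-yet-committed transaction
      Gtid pending_gtid;   // the GTID of that transaction

      void cleanup_context(bool error_or_rollback)
      {
        if (error_or_rollback)
        {
          /* ... roll back the partial transaction ... */
        }
        // Always clear the pending GTID here, so a rolled-back partial
        // transaction cannot update @@gtid_slave_pos.
        gtid_pending= false;
        pending_gtid= Gtid();
      }
    };

    int main()
    {
      rpl_group_info_sketch rgi{ true, { 0, 1, 42 } };
      rgi.cleanup_context(true);
      return rgi.gtid_pending ? 1 : 0;
    }
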
Thanks to Pavel Ivanov for tracking down the issue and providing a test case.
after Operating system error number 36 in a file operation.
Analysis: os_file_get_status did not handle error ENAMETOOLONG
correctly.
Fix: Add correct handling for error ENAMETOOLONG. Note that in the InnoDB
case the error is not passed all the way up to the server; that would be
a bigger revamp.
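A sketch of the kind of handling added (a hypothetical helper, not the actual
os_file_get_status() code): classify ENAMETOOLONG from stat() as its own case
instead of letting it fall into the generic error path.

    #include <cerrno>
    #include <cstdio>
    #include <sys/stat.h>

    enum class FileStatus { OK, NOT_FOUND, NAME_TOO_LONG, OTHER_ERROR };

    static FileStatus get_file_status(const char *path, struct stat *out)
    {
      if (stat(path, out) == 0)
        return FileStatus::OK;

      switch (errno)
      {
      case ENOENT:
      case ENOTDIR:
        return FileStatus::NOT_FOUND;       // file simply does not exist
      case ENAMETOOLONG:                    // "Operating system error number 36"
        return FileStatus::NAME_TOO_LONG;   // previously fell into the generic path
      default:
        std::perror("stat");
        return FileStatus::OTHER_ERROR;
      }
    }

    int main(int argc, char **argv)
    {
      struct stat st;
      FileStatus s= get_file_status(argc > 1 ? argv[1] : "/nonexistent", &st);
      return s == FileStatus::OK ? 0 : 1;
    }
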
innodb.innodb_stats_drop_locked and innodb.innodb_stats_fetch_nonexistent
fail in buildbot on Windows.
Analysis: The problem is that the innodb_stats_create_on_corrupted test
renames mysql.innodb_index_stats, and the rest of the stats tests depend on
this table.
Fix: After renaming it back to the original name, restart mysqld to make
sure that the table is correct.
Use traditional statistics estimation by default (innodb-stats-traditional=true).
There could be a performance regression for customers if there are a lot of
open table operations.
* adjust viossl.c to take into account the new code
(SSL_get_error is used now, cannot simply remap it; see the sketch after
this list)
* remove unnecessary version check
* update the test to 10.0
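A sketch of the pattern viossl.c now relies on (a hypothetical wrapper, not
the actual vio code): after SSL_read()/SSL_write(), the condition has to be
classified with SSL_get_error() on the same SSL object and return value; it
cannot simply be remapped from errno.

    #include <openssl/ssl.h>

    // Returns >0: bytes read, 0: clean shutdown, -1: fatal error,
    //         -2: retry when the socket becomes readable/writable again.
    int ssl_read_classified(SSL *ssl, void *buf, int len)
    {
      int ret= SSL_read(ssl, buf, len);
      if (ret > 0)
        return ret;

      switch (SSL_get_error(ssl, ret))
      {
      case SSL_ERROR_ZERO_RETURN:           // peer sent close_notify
        return 0;
      case SSL_ERROR_WANT_READ:
      case SSL_ERROR_WANT_WRITE:            // non-blocking socket, try again later
        return -2;
      default:                              // SSL_ERROR_SYSCALL, SSL_ERROR_SSL, ...
        return -1;
      }
    }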