rpl_packet got a timeout failure sporadically on PB when stopping
slave. The real reason of this bug is that STOP SLAVE stopped
IO thread first and then stopped SQL thread. It was
possible that IO thread stopped after replicating part of a
transaction which SQL thread was executing. SQL thread would
be hung if the transaction could not be rolled back safely.
After this patch, STOP SLAVE will stop SQL thread first and then stop IO
thread, which guarantees that IO thread will fetch the reset of the
events of the transaction that SQL thread is executing, so that SQL
thread can finish the transaction if it cannot be rolled back safely.
Added below auxiliary files to make the test code neater.
restart_slave_sql.inc
rpl_connection_master.inc
rpl_connection_slave.inc
rpl_connection_slave1.inc
when generating new name.
If find_uniq_filename returns an error, then this error is not
being propagated upwards, and execution does not report error to
the user (although a entry in the error log is generated).
Additionally, some more errors were ignored in new_file_impl:
- when writing the rotate event
- when reopening the index and binary log file
This patch addresses this by propagating the error up in the
execution stack. Furthermore, when rotation of the binary log
fails, an incident event is written, because there may be a
chance that some changes for a given statement, were not properly
logged. For example, in SBR, LOAD DATA INFILE statement requires
more than one event to be logged, should rotation fail while
logging part of the LOAD DATA events, then the logged data would
become inconsistent with the data in the storage engine.
mysql-test/include/restart_mysqld.inc:
Refactored restart_mysqld so that it is not hardcoded for
mysqld.1, but rather for the current server.
mysql-test/suite/binlog/t/binlog_index.test:
The error on open of index and binary log on new_file_impl
is now caught. Thence the user will get an error message.
We need to accomodate this change in the test case for the
failing FLUSH LOGS.
mysql-test/suite/rpl/t/rpl_binlog_errors-master.opt:
Sets max_binlog_size to 4096.
mysql-test/suite/rpl/t/rpl_binlog_errors.test:
Added some test cases for asserting that the error is found
and reported.
sql/handler.cc:
Catching error now returned by unlog (in ha_commit_trans) and
returning it.
sql/log.cc:
Propagating errors from new_file_impl upwards. The errors that
new_file_impl catches now are:
- error on generate_new_name
- error on writing the rotate event
- error when opening the index or the binary log file.
sql/log.h:
Changing declaration of:
- rotate_and_purge
- new_file
- new_file_without_locking
- new_file_impl
- unlog
They now return int instead of void.
sql/mysql_priv.h:
Change signature of reload_acl_and_cache so that write_to_binlog
is an int instead of bool.
sql/mysqld.cc:
Redeclaring not_used var as int instead of bool.
sql/rpl_injector.cc:
Changes to catch the return from rotate_and_purge.
sql/slave.cc:
Changes to catch the return values for new_file and rotate_relay_log.
sql/slave.h:
Changes to rotate_relay_log declaration (now returns int
instead of void).
sql/sql_load.cc:
In SBR, some logging of LOAD DATA events goes through
IO_CACHE_CALLBACK invocation at mf_iocache.c:_my_b_get. The
IO_CACHE implementation is ignoring the return value for from
these callbacks (pre_read and post_read), so we need to find out
at the end of the execution if the error is set or not in THD.
sql/sql_parse.cc:
Catching the rotate_relay_log and rotate_and_purge return values.
Semantic change in reload_acl_and_cache so that we report errors
in binlog interactions through the write_to_binlog output parameter.
If there was any failure while rotating the binary log, we should
then report the error to the client when handling SQLCOMM_FLUSH.
Bug#57995: Compiler flag change build error on OSX 10.4: my_getncpus.c
Bug#57996: Compiler flag change build error on OSX 10.5 : bind.c
Bug#57994: Compiler flag change build error : my_redel.c
Bug#57993: Compiler flag change build error on FreeBsd 7.0 : regexec.c
Bug#57992: Compiler flag change build error on FreeBsd : mf_keycache.c
Bug#57997: Compiler flag change build error on OSX 10.6: debug_sync.cc
Fix assorted compiler generated warnings.
cmd-line-utils/readline/bind.c:
Bug#57996: Compiler flag change build error on OSX 10.5 : bind.c
Initialize variable to work around a false positive warning.
include/m_string.h:
Bug#57994: Compiler flag change build error : my_redel.c
The expansion of stpcpy (in glibc) causes warnings if the
return value of strmov is not being used. Since stpcpy is
a GNU extension and the expansion ends up using a built-in
provided by GCC, use the compiler provided built-in directly
when possible.
include/my_compiler.h:
Define a dummy MY_GNUC_PREREQ when not compiling with GCC.
libmysql/libmysql.c:
Bug#58057: 5.1 libmysql/libmysql.c unused variable/compile failure
Variable might not be used in some cases. So, tag it as unused.
mysys/mf_keycache.c:
Bug#57992: Compiler flag change build error on FreeBsd : mf_keycache.c
Use UNINIT_VAR to work around a false positive warning.
mysys/my_getncpus.c:
Bug#57995: Compiler flag change build error on OSX 10.4: my_getncpus.c
Declare variable in the same block where it is used.
regex/regexec.c:
Bug#57993: Compiler flag change build error on FreeBsd 7.0 : regexec.c
Work around a compiler bug which causes the cast to not be enforced.
sql/debug_sync.cc:
Bug#57997: Compiler flag change build error on OSX 10.6: debug_sync.cc
Use UNINIT_VAR to work around a false positive warning.
sql/handler.cc:
Use UNINIT_VAR to work around a false positive warning.
sql/slave.cc:
Use UNINIT_VAR to work around a false positive warning.
sql/sql_partition.cc:
Use UNINIT_VAR to work around a false positive warning.
storage/myisam/ft_nlq_search.c:
Use UNINIT_VAR to work around a false positive warning.
storage/myisam/mi_create.c:
Use UNINIT_VAR to work around a false positive warning.
storage/myisammrg/myrg_open.c:
Use UNINIT_VAR to work around a false positive warning.
tests/mysql_client_test.c:
Change function to take a pointer to const, no need for a cast.
replication aborts
When recieving a 'SLAVE STOP' command, slave SQL thread will roll back the
transaction and stop immidiately if there is only transactional table updated,
even through 'CREATE|DROP TEMPOARY TABLE' statement are in it. But These
statements can never be rolled back. Because the temporary tables to the user
session mapping remain until 'RESET SLAVE', Therefore it will abort SQL thread
with an error that the table already exists or doesn't exist, when it restarts
and executes the whole transaction again.
After this patch, SQL thread always waits till the transaction ends and then stops,
if 'CREATE|DROP TEMPOARY TABLE' statement are in it.
mysql-test/extra/rpl_tests/rpl_stop_slave.test:
Auxiliary file which is used to test this bug.
mysql-test/suite/rpl/t/rpl_stop_slave.test:
Test case for this bug.
sql/slave.cc:
Checking if OPTION_KEEP_LOG is set. If it is set, SQL thread should wait
until the transaction ends.
sql/sql_parse.cc:
Add a debug point for testing this bug.
When slave executes a transaction bigger than slave's max_binlog_cache_size,
slave will crash. It is caused by the assert that server should only roll back
the statement but not the whole transaction if the error ER_TRANS_CACHE_FULL
happens. But slave sql thread always rollbacks the whole transaction when
an error happens.
Ather this patch, we always clear any error set in sql thread(it is different
from the error in 'SHOW SLAVE STATUS') and it is cleared before rolling back
the transaction.
mysql-test/suite/rpl/r/rpl_binlog_max_cache_size.result:
SET binlog_cache_size and max_binlog_cache_size for all test cases.
Add test case for bug#55375.
mysql-test/suite/rpl/t/rpl_binlog_max_cache_size-master.opt:
binlog_cache_size and max_binlog_cache_size can be set in the client connection.
so remove this option file.
mysql-test/suite/rpl/t/rpl_binlog_max_cache_size.test:
SET binlog_cache_size and max_binlog_cache_size for all test cases.
Add test case for bug#55375.
sql/log_event.cc:
Some functions don't return the error code, so it is a wrong error code.
The error should always be set into thd->main_da. So we use
slave_rows_error_report to report the right error.
sql/slave.cc:
exec_relay_log_event() need call cleanup_context() to clear context.
clearup_context() will call end_trans().
Clear thd's error before cleanup_context. It avoid to trigger the assert
which cause this bug.
The procedure for setting up a fake binary log, by changing the
relay log files manually, is run twice when we issue mtr with
--repeat=2. However, when setting it up for the second time,
neither the sql thread is reset nor the server is restarted. This
means that previous stale relay log IO_CACHE metadata - from
first run - is left around. As a consequence, when the test is
run for the second time, the IO_CACHE for the relay log has
position in file different than zero, triggering the assertion.
We fix this by deploying a call to my_b_seek before calling
check_binlog_magic in next_event. This prevents the server
from asserting, in the cases that the SQL thread was reads
from a hot log (relay.NNNNN), then is stopped, then is restarted
from a previous cold log (relay.MMMMM), and then it reaches
the hot log relay.NNNNN again, in which case, the read
coordinates are not set to zero, but to the old values.
Additionally, some comments to the source code were added.
Fix warnings flagged by the new warning option -Wunused-but-set-variable
that was added to GCC 4.6 and that is enabled by -Wunused and -Wall. The
option causes a warning whenever a local variable is assigned to but is
later unused. It also warns about meaningless pointer dereferences.
client/mysql.cc:
Meaningless pointer dereferences.
client/mysql_upgrade.c:
Check whether reading from the file succeeded.
extra/comp_err.c:
Unused.
extra/yassl/src/yassl_imp.cpp:
Skip instead of reading data that is discarded.
include/my_pthread.h:
Variable is only used in debug builds.
include/mysys_err.h:
Add new error messages.
mysys/errors.c:
Add new error message for permission related functions.
mysys/mf_iocache.c:
Variable is only checked under THREAD.
mysys/my_copy.c:
Raise a error if chmod or chown fails.
mysys/my_redel.c:
Raise a error if chmod or chown fails.
regex/engine.c:
Use a equivalent variable for the assert.
server-tools/instance-manager/instance_options.cc:
Unused.
sql/field.cc:
Unused.
sql/item.cc:
Unused.
sql/log.cc:
Do not ignore the return value of freopen: only set buffer if
reopening succeeds.
Adjust doxygen comment to the right function.
Pass message lenght to log function.
sql/mysqld.cc:
Do not ignore the return value of freopen: only set buffer if
reopening succeeds.
sql/partition_info.cc:
Unused.
sql/slave.cc:
No need to set pointer to the address of '\0'.
sql/spatial.cc:
Unused. Left for historical purposes.
sql/sql_acl.cc:
Unused.
sql/sql_base.cc:
Pointers are always set to the same variables.
sql/sql_parse.cc:
End statement if reading fails.
Store the buffer after it has actually been updated.
sql/sql_repl.cc:
No need to set pointer to the address of '\0'.
sql/sql_show.cc:
Put variable under the same ifdef block.
sql/udf_example.c:
Set null pointer flag appropriately.
storage/csv/ha_tina.cc:
Meaningless dereferences.
storage/example/ha_example.cc:
Return the error since it's available.
storage/myisam/mi_locking.c:
Remove unused and dead code.
Apart strict-aliasing warnings, fix the remaining warnings
generated by GCC 4.4.4 -Wall and -Wextra flags.
One major source of warnings was the in-house function my_bcmp
which (unconventionally) took pointers to unsigned characters
as the byte sequences to be compared. Since my_bcmp and bcmp
are deprecated functions whose only difference with memcmp is
the return value, every use of the function is replaced with
memcmp as the special return value wasn't actually being used
by any caller.
There were also various other warnings, mostly due to type
mismatches, missing return values, missing prototypes, dead
code (unreachable) and ignored return values.
BUILD/SETUP.sh:
Remove flags that are implied by -Wall and -Wextra.
Do not warn about unused parameters in C++.
BUILD/check-cpu:
Print only the compiler version instead of verbose banner.
Although the option is gcc specific, the check was only
being used for GCC specific checks anyway.
client/mysql.cc:
bcmp is no longer defined.
client/mysqltest.cc:
Pass a string to function expecting a format string.
Replace use of bcmp with memcmp.
cmd-line-utils/readline/Makefile.am:
Always define _GNU_SOURCE when compiling GNU readline.
Required to make certain prototypes visible.
cmd-line-utils/readline/input.c:
Condition for the code to be meaningful.
configure.in:
Remove check for bcmp.
extra/comp_err.c:
Use appropriate type.
extra/replace.c:
Replace use of bcmp with memcmp.
extra/yassl/src/crypto_wrapper.cpp:
Do not ignore the return value of fgets. Retrieve the file
position if fgets succeed -- if it fails, the function will
bail out and return a error.
extra/yassl/taocrypt/include/blowfish.hpp:
Use a single array instead of accessing positions of the sbox_
through a subscript to pbox_.
extra/yassl/taocrypt/include/runtime.hpp:
One definition of such functions is enough.
extra/yassl/taocrypt/src/aes.cpp:
Avoid potentially ambiguous conditions.
extra/yassl/taocrypt/src/algebra.cpp:
Rename arguments to avoid shadowing related warnings.
extra/yassl/taocrypt/src/blowfish.cpp:
Avoid potentially ambiguous conditions.
extra/yassl/taocrypt/src/integer.cpp:
Do not define type within a anonymous union.
Use a variable to return a value instead of
leaving the result in a register -- compiler
does not know the logic inside the asm.
extra/yassl/taocrypt/src/misc.cpp:
Define handler for pure virtual functions.
Remove unused code.
extra/yassl/taocrypt/src/twofish.cpp:
Avoid potentially ambiguous conditions.
extra/yassl/testsuite/test.hpp:
Function must have C language linkage.
include/m_string.h:
Remove check which relied on bcmp being defined -- they weren't
being used as bcmp is only visible when _BSD_SOURCE is defined.
include/my_bitmap.h:
Remove bogus helpers which were used only in a few files and
were causing warnings about dead code.
include/my_global.h:
Due to G++ bug, always silence false-positive uninitialized
variables warnings when compiling C++ code with G++.
Remove bogus helper.
libmysql/Makefile.shared:
Remove built-in implementation of bcmp.
mysql-test/lib/My/SafeProcess/safe_process.cc:
Cast pid to largest possible type for a process identifier.
mysys/mf_loadpath.c:
Leave space of the ending nul.
mysys/mf_pack.c:
Replace bcmp with memcmp.
mysys/my_bitmap.c:
Dead code removal.
mysys/my_gethwaddr.c:
Remove unused variable.
mysys/my_getopt.c:
Silence bogus uninitialized variable warning.
Do not cast away the constant qualifier.
mysys/safemalloc.c:
Cast to expected type.
mysys/thr_lock.c:
Silence bogus uninitialized variable warning.
sql/field.cc:
Replace bogus helper with a more appropriate logic which is
used throughout the code.
sql/item.cc:
Remove bogus logical condition which always evaluates to TRUE.
sql/item_create.cc:
Simplify code to avoid signedness related warnings.
sql/log_event.cc:
Replace use of bcmp with memcmp.
No need to use helpers for simple bit operations.
sql/log_event_old.cc:
Replace bmove_align with memcpy.
sql/mysqld.cc:
Move use declaration of variable to the ifdef block where it
is used. Remove now-unnecessary casts and arguments.
sql/set_var.cc:
Replace bogus helpers with simple and classic bit operations.
sql/slave.cc:
Cast to expected type and silence bogus warning.
sql/sql_class.h:
Don't use enum values as bit flags, the supposed type safety is
bogus as the combined bit flags are not a value in the enumeration.
sql/udf_example.c:
Only declare variable when necessary.
sql/unireg.h:
Replace use of bmove_align with memcpy.
storage/innobase/os/os0file.c:
Silence bogus warning.
storage/myisam/mi_open.c:
Remove bogus cast, DBUG_DUMP expects a pointer to unsigned
char.
storage/myisam/mi_page.c:
Remove bogus cast, DBUG_DUMP expects a pointer to unsigned
char.
strings/bcmp.c:
Remove built-in bcmp.
strings/ctype-ucs2.c:
Silence bogus warning.
tests/mysql_client_test.c:
Use a appropriate type as expected by simple_command().
at mf_iocache.c, line 1722
The slave crashed while two threads: IO thread and user thread
raced for the same mutex (the append_buffer_lock protecting the
relay log's IO_CACHE). The IO thread was trying to flush the
cache, and for that was grabbing the append_buffer_lock.
However, the other thread was closing and reopening the relay log
when the IO thread tried to lock. Closing and reopening the log
includes destroying and reinitialising the IO_CACHE
mutex. Therefore, the IO thread tried to lock a destroyed mutex.
We fix this by backporting patch for BUG#50364 which fixed this
bug in mysql server 5.5+. The patch deploys missing
synchronization when flush_master_info is called and the relay
log is flushed by the IO thread. In detail the patch backports
revision (from mysql-trunk):
- luis.soares@sun.com-20100203165617-b1yydr0ee24ycpjm
This patch already includes the post-push fix also in BUG#50364:
- luis.soares@sun.com-20100222002629-0cijwqk6baxhj7gr
When issuing a 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER' statement, the previous
position along with the new position is dumped into the error log. Namely,
the following information is printed out: skip_counter, group_relay_log_name
and group_relay_log_pos.
DBUG_SYNC_POINT has at least one strong limitation that it's not defined
on all platforms. It has issues cooperating with @@debug.
All in all its functionality is superseded by DEBUG_SYNC facility and
there is no reason to maintain the old less flexible one.
Fixed with adding debug_sync_set_action() function as a facility to set up
a sync-action in the server sources code and re-writing existing simulations
(found 3) to use it.
Couple of tests have been reworked as well.
The patch offers a pattern for setting sync-points in replication threads
where the standard DEBUG_SYNC does not suffice to reach goals.
mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test:
rewriting the test from GET_LOCK()-based to DEBUG_SYNC-based;
a pattern of usage DEBUG_SYNC for replication testing is provided.
mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result:
results are changed.
mysql-test/suite/rpl/t/rpl_get_master_version_and_clock.test:
rewriting the test from GET_LOCK()-based to DEBUG_SYNC-based;
limiting the test to run only with MIXED binlog-format as the test last
some 10 secs sensitively contributing to the total of tests run.
mysql-test/suite/rpl/t/rpl_show_slave_running.test:
rewriting the test from GET_LOCK()-based to DEBUG_SYNC-based.
sql/debug_sync.cc:
adding debug_sync_set_action() function as a facility to set up
a sync-action in the server sources code.
sql/debug_sync.h:
externalizing debug_sync_set_action().
sql/item_func.cc:
purging sources from DBUG_SYNC_POINT.
sql/mysql_priv.h:
purging sources from DBUG_SYNC_POINT.
sql/slave.cc:
rewriting failure simulations to base on DEBUG_SYNC rather than GET_LOCK()-based DBUG_SYNC_POINT.
sql/sql_repl.cc:
removing an orphan failure simulation line because no counterpart in tests existing.
backporting of bug@30703 to 5.1.
The fixes are backed up with a regression test.
mysql-test/include/test_fieldsize.inc:
waiting to stop is to be actually exclusively for SQL thread.
mysql-test/suite/rpl/r/rpl_show_slave_running.result:
new results file is added.
mysql-test/suite/rpl/t/rpl_show_slave_running.test:
regression test for bug#30703 is added.
sql/mysqld.cc:
refining `show status like slave_running' handler to correspond to one of
`show slave status'.
sql/slave.cc:
A dbug-sync point is added to complement the regression test.
Until-pos guarding did not distiguish the master originated events from ones that the slave
can introduce to the relay log e.g Rotate to the next relay log at slave restarting.
The local Rotate's coordinate are incomparable with the Until-master-pos.
That led to the unexpectable stop this bug describes.
Fixed with to avoid Until-master-pos comparison for a local slave's event.
Notice that if --replicate-same-server is true such event is treated as coming from
the master side.
mysql-test/r/rpl_until.result:
results changed.
mysql-test/t/rpl_until.test:
regression test for bug#47210 is added.
sql/slave.cc:
st_relay_log_info::is_until_satisfied() is augmented with avoidance of
Until-master-pos comparison for a local slave's event.
if --replicate-same-server is true such event is treated as coming from
the master side.
sql/slave.h:
signature of is_until_satisfied() changed to supply THD and Event to the routine.
Implemented the server infrastructure for the fix:
1. Added a function LEX_STRING *thd_query_string(THD) to return
a LEX_STRING structure instead of char *.
This is the function that must be called in innodb instead of
thd_query()
2. Did some encapsulation in THD : aggregated thd_query and
thd_query_length into a LEX_STRING and made accessor and mutator
methods for easy code updating.
3. Updated the server code to use the new methods where applicable.
The BINLOG statement was sharing too much code with the slave SQL thread, introduced with
the patch for Bug#32407. This caused statements to be logged with the wrong server_id, the
id stored inside the events of the BINLOG statement rather than the id of the running
server.
Fix by rearranging code a bit so that only relevant parts of the code are executed by
the BINLOG statement, and the server_id of the server executing the statements will
not be overrided by the server_id stored in the 'format description BINLOG statement'.
mysql-test/extra/binlog_tests/binlog.test:
Added test to verify if the server_id stored in the 'format
description BINLOG statement' will override the server_id
of the server executing the statements.
mysql-test/suite/binlog/r/binlog_row_binlog.result:
Test result for bug#46640
mysql-test/suite/binlog/r/binlog_stm_binlog.result:
Test result for bug#46640
sql/log_event.cc:
Moved rows_event_stmt_clean() call from update_pos() to apply_event(). This in any case
makes more sense, and is needed as update_pos() is no longer called when executing
BINLOG statements.
Moved setting of rli->relay_log.description_event_for_exec from
Format_description_log_event::do_update_pos() to
Format_description_log_event::do_apply_event()
sql/log_event_old.cc:
Moved rows_event_stmt_clean() call from update_pos() to apply_event(). This in any case
makes more sense, and is needed as update_pos() is no longer called when executing
BINLOG statements.
sql/slave.cc:
The skip flag is no longer needed, as the code path for BINLOG statement has been
cleaned up.
sql/sql_binlog.cc:
Don't invoke the update_pos() code path for the BINLOG statement, as it contains code
that is redundant and/or harmful (especially setting thd->server_id).
On Mac OS X or Windows, sending a SIGHUP to the server or a
asynchronous flush (triggered by flush_time), would cause the
server to crash.
The problem was that a hook used to detach client API handles
wasn't prepared to handle cases where the thread does not have
a associated session.
The solution is to verify whether the thread has a associated
session before trying to detach a handle.
mysql-test/r/federated_debug.result:
Add test case result for Bug#47525
mysql-test/t/federated_debug-master.opt:
Debug point.
mysql-test/t/federated_debug.test:
Add test case for Bug#47525
sql/slave.cc:
Check whether a the thread has a associated session.
sql/sql_parse.cc:
Add debug code to simulate a reload without thread session.
when compiled with Sun Studio compiler).
The thing is that Sun Studio compiler calls destructor of stack
objects when pthread_exit() is called. That triggered an assertion
in DBUG_ENTER()/DBUG_RETURN() validation logic (if DBUG_ENTER() is
used in the beginning of function, all returns should be replaced
by DBUG_RETURN/DBUG_VOID_RETURN macros).
A fix is to explicitly use DBUG_LEAVE macro.
But there is no Last_IO_Error reported.
On the master, if a binary log event is larger than max_allowed_packet,
ER_MASTER_FATAL_ERROR_READING_BINLOG and the specific reason of this error is
sent to a slave when it requests a dump from the master, thus leading
the I/O thread to stop.
On a slave, the I/O thread stops when receiving a packet larger than max_allowed_packet.
In both cases, however, there was no Last_IO_Error reported.
This patch adds code to report the Last_IO_Error and exact reason before stopping the
I/O thread and also reports the case the out memory pops up while
handling packets from the master.
sql/sql_repl.cc:
The master send the Specific reasons instead of "error reading log entry" to the slave which is requesting a dump.
if an fatal error is returned by read_log_event function.
If the SQL Thread fails to execute an event due to a temporary error (e.g.
ER_LOCK_DEADLOCK) and the option "--slave_transaction_retries" is set the SQL
Thread should not be aborted and the transaction should be restarted from the
beginning and re-executed.
Unfortunately, a wrong interpretation of the THD::is_fatal_error was preventing
this behavior. In a nutshell, "this variable is set to TRUE if an execution of a
compound statement cannot continue. In particular, it is used to disable access
to the CONTINUE or EXIT handlers of stored routines. So even temporary errors
may have this variable set.
To fix the bug, we have done what follows:
DBUG_ENTER("has_temporary_error");
- if (thd->is_fatal_error)
- DBUG_RETURN(0);
-
DBUG_EXECUTE_IF("all_errors_are_temporary_errors",
if (thd->main_da.is_error())
{
Bug#45243: crash on win in sql thread clear_tables_to_lock() -> free()
Bug#45242: crash on win in mysql_close() -> free()
Bug#45238: rpl_slave_skip, rpl_change_master failed (lost connection) for STOP SLAVE
Bug#46030: rpl_truncate_3innodb causes server crash on windows
Bug#46014: rpl_stm_reset_slave crashes the server sporadically in pb2
When killing a user session on the server, it's necessary to
interrupt (notify) the thread associated with the session that
the connection is being killed so that the thread is woken up
if waiting for I/O. On a few platforms (Mac, Windows and HP-UX)
where the SIGNAL_WITH_VIO_CLOSE flag is defined, this interruption
procedure is to asynchronously close the underlying socket of
the connection.
In order to enable this schema, each connection serving thread
registers its VIO (I/O interface) so that other threads can
access it and close the connection. But only the owner thread of
the VIO might delete it as to guarantee that other threads won't
see freed memory (the thread unregisters the VIO before deleting
it). A side note: closing the socket introduces a harmless race
that might cause a thread attempt to read from a closed socket,
but this is deemed acceptable.
The problem is that this infrastructure was meant to only be used
by server threads, but the slave I/O thread was registering the
VIO of a mysql handle (a client API structure that represents a
connection to another server instance) as a active connection of
the thread. But under some circumstances such as network failures,
the client API might destroy the VIO associated with a handle at
will, yet the VIO wouldn't be properly unregistered. This could
lead to accesses to freed data if a thread attempted to kill a
slave I/O thread whose connection was already broken.
There was a attempt to work around this by checking whether
the socket was being interrupted, but this hack didn't work as
intended due to the aforementioned race -- attempting to read
from the socket would yield a "bad file descriptor" error.
The solution is to add a hook to the client API that is called
from the client code before the VIO of a handle is deleted.
This hook allows the slave I/O thread to detach the active vio
so it does not point to freed memory.
server-tools/instance-manager/mysql_connection.cc:
Add stub method required for linking.
sql-common/client.c:
Invoke hook.
sql/client_settings.h:
Export hook.
sql/slave.cc:
Introduce hook that clears the active VIO before it is freed
by the client API.
procedures causes crashes!
The problem of that bugreport was mostly fixed by the
patch for bug 38691.
However, attached test case focused on another crash or
valgrind warning problem: SHOW PROCESSLIST query accesses
freed memory of SP instruction that run in a parallel
connection.
Changes of thd->query/thd->query_length in dangerous
places have been guarded with the per-thread
LOCK_thd_data mutex (the THD::LOCK_delete mutex has been
renamed to THD::LOCK_thd_data).
sql/ha_myisam.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
Modification of THD::query/query_length has been guarded
with the a THD::set_query() method call/LOCK_thd_data
mutex.
Unnecessary locking with the global LOCK_thread_count
mutex has been removed.
sql/log_event.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
Modification of THD::query/query_length has been guarded
with the THD::set_query()) method call/LOCK_thd_data
mutex.
sql/slave.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
Modification of THD::query/query_length has been guarded
with the THD::set_query() method call/LOCK_thd_data mutex.
The THD::LOCK_delete mutex has been renamed to
THD::LOCK_thd_data.
sql/sp_head.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
Modification of THD::query/query_length has been guarded
with the a THD::set_query() method call/LOCK_thd_data
mutex.
sql/sql_class.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
The new THD::LOCK_thd_data mutex and THD::set_query()
method has been added to guard modifications of THD::query/
THD::query_length fields, also the Statement::set_statement()
method has been overloaded in the THD class.
The THD::LOCK_delete mutex has been renamed to
THD::LOCK_thd_data.
sql/sql_class.h:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
The new THD::LOCK_thd_data mutex and THD::set_query()
method has been added to guard modifications of THD::query/
THD::query_length fields, also the Statement::set_statement()
method has been overloaded in the THD class.
The THD::LOCK_delete mutex has been renamed to
THD::LOCK_thd_data.
sql/sql_insert.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
Modification of THD::query/query_length has been guarded
with the a THD::set_query() method call/LOCK_thd_data
mutex.
sql/sql_parse.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
Modification of THD::query/query_length has been guarded
with the a THD::set_query() method call/LOCK_thd_data mutex.
sql/sql_repl.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
The THD::LOCK_delete mutex has been renamed to
THD::LOCK_thd_data.
sql/sql_show.cc:
Bug #38816: kill + flush tables with read lock + stored
procedures causes crashes!
Inter-thread read of THD::query/query_length field has
been protected with a new per-thread LOCK_thd_data
mutex in the mysqld_list_processes function.
The "get_master_version_and_clock(...)" function in sql/slave.cc ignores
error and passes directly when queries fail, or queries succeed
but the result retrieved is empty.
The "get_master_version_and_clock(...)" function should try to reconnect master
if queries fail because of transient network problems, and fail otherwise.
The I/O thread should print a warning if the some system variables do not
exist on master (very old master)
mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test:
Added test file for bug #45214
mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result:
Added test result for bug #45214
mysql-test/suite/rpl/t/rpl_get_master_version_and_clock.test:
Added test file for bug #45214
sql/slave.cc:
The 'is_network_error()' function is added for checking if the error is caused by network.
Added a new return value (2) to 'get_master_version_and_clock()' function result set
to indicate transient network errors when queries fail, and the caller should
try to reconnect in this case.
timeout
In STMT and MIXED modes, a statement that changes both non-transactional and
transactional tables must be written to the binary log whenever there are
changes to non-transactional tables. This means that the statement gets into the
binary log even when the changes to the transactional tables fail. In particular
, in the presence of a failure such statement is annotated with the error number
and wrapped in a begin/rollback. On the slave, while applying the statement, it
is expected the same failure and the rollback prevents the transactional changes
to be persisted.
Unfortunately, statements that fail due to concurrency issues (e.g. deadlocks,
timeouts) are logged in the same way causing the slave to stop as the statements
are applied sequentially by the SQL Thread. To fix this bug, we automatically
ignore concurrency failures on the slave. Specifically, the following failures
are ignored: ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK and ER_XA_RBDEADLOCK.
The reason for the crash was rotate_relay_log (mi=0x0) did not verify
the passed value of active_mi. There are more cases where active_mi
is supposed to be non-zero e.g change_master(), stop_slave(), and it's
reasonable to protect from a similar crash all of them with common
fixes.
Fixed with spliting end_slave() in slave threads release and slave
data clean-up parts (a new close_active_mi()). The new function is
invoked at the very end of close_connections() so that all users of
active_mi are proven to have left.
sql/mysqld.cc:
added the 2nd part (data) of the slave's clean up.
sql/slave.cc:
end_slave() is split in two part to release the slave threads and the remained
resources separately.
The new close_active_mi() should be called after all possible users ofactive_mi
has left, i.e at the very end of close_connections().
sql/slave.h:
interface to the new end_active_mi() function is added.
with gcc 4.3.2
Compiling MySQL with gcc 4.3.2 and later produces a number of
warnings, many of which are new with the recent compiler
versions.
This bug will be resolved in more than one patch to limit the
size of changesets. This is the first patch, fixing a number
of the warnings, predominantly "suggest using parentheses
around && in ||", and empty for and while bodies.
with gcc 4.3.2
Compiling MySQL with gcc 4.3.2 and later produces a number of
warnings, many of which are new with the recent compiler
versions.
This bug will be resolved in more than one patch to limit the
size of changesets. This is the first patch, fixing a number
of the warnings, predominantly "suggest using parentheses
around && in ||", and empty for and while bodies.
The server was not cleaning the last IO error and error number when
resetting slave.
This patch addresses this issue by backporting into 5.1 part of the
patch in BUG 34654. A fix for this issue had already been pushed into
6.0 as part of the aforementioned bug, however the patch also included
some refactoring. The fix for 5.1 does not take into account the
refactoring part.
mysql-test/extra/rpl_tests/rpl_reset_slave.test:
Backported the test case and improved with deploying include/start_slave.inc
in relevant spots.
sql/slave.cc:
Backported part of patch from 6.0 that includes cleaning
mi->clear_error() at:
1. beginning of handle_slave_io
2. on successful connection
Also, backported the assertion added in the original patch.
sql/sql_repl.cc:
Backported the call to mi->clear_error() on reset_slave().
stop/start slave
When stopping and restarting the slave while it is replicating
temporary tables, the server would crash or raise an assertion
failure. This was due to the fact that although temporary tables are
saved between slave threads restart, the reference to the thread in
use (table->in_use) was not being properly updated when the restart
happened (it would still reference the old/invalid thread instead of
the new one).
This patch addresses this issue by resetting the reference to the new
slave thread on slave thread restart.
mysql-test/r/rpl_temporary.result:
Result file.
mysql-test/t/rpl_temporary.test:
Test case that checks that both failures go away.
sql/slave.cc:
Changed slave.cc to reset sql_thd reference in temporary tables.
165 changesets with 23 conflicts:
Text conflict in mysql-test/r/lock_multi.result
Text conflict in mysql-test/t/lock_multi.test
Text conflict in mysql-test/t/mysqldump.test
Text conflict in sql/item_strfunc.cc
Text conflict in sql/log.cc
Text conflict in sql/log_event.cc
Text conflict in sql/parse_file.cc
Text conflict in sql/slave.cc
Text conflict in sql/sp.cc
Text conflict in sql/sp_head.cc
Text conflict in sql/sql_acl.cc
Text conflict in sql/sql_base.cc
Text conflict in sql/sql_class.cc
Text conflict in sql/sql_crypt.cc
Text conflict in sql/sql_db.cc
Text conflict in sql/sql_lex.cc
Text conflict in sql/sql_parse.cc
Text conflict in sql/sql_select.cc
Text conflict in sql/sql_table.cc
Text conflict in sql/sql_view.cc
Text conflict in storage/innobase/handler/ha_innodb.cc
Text conflict in storage/myisam/mi_packrec.c
Text conflict in tests/mysql_client_test.c
Updates to Innobase, taken from main 5.1:
bzr: ERROR: Some change isn't sane:
File mysql-test/r/innodb-semi-consistent.result is owned by Innobase and should not be updated.
File mysql-test/t/innodb-semi-consistent.test is owned by Innobase and should not be updated.
File storage/innobase/handler/ha_innodb.cc is owned by Innobase and should not be updated.
File storage/innobase/ibuf/ibuf0ibuf.c is owned by Innobase and should not be updated.
File storage/innobase/include/row0mysql.h is owned by Innobase and should not be updated.
File storage/innobase/include/srv0srv.h is owned by Innobase and should not be updated.
File storage/innobase/include/trx0trx.h is owned by Innobase and should not be updated.
File storage/innobase/include/trx0trx.ic is owned by Innobase and should not be updated.
File storage/innobase/lock/lock0lock.c is owned by Innobase and should not be updated.
File storage/innobase/page/page0cur.c is owned by Innobase and should not be updated.
File storage/innobase/row/row0mysql.c is owned by Innobase and should not be updated.
File storage/innobase/row/row0sel.c is owned by Innobase and should not be updated.
File storage/innobase/srv/srv0srv.c is owned by Innobase and should not be updated.
File storage/innobase/trx/trx0trx.c is owned by Innobase and should not be updated.
(Set env var 'ALLOW_UPDATE_INNOBASE_OWNED' to override.)
The issue of the current bug is unguarded access to mi->slave_running
by the shutdown thread calling end_slave() that is bug#29968
(alas happened not to be cross-linked with the current bug)
Fixed:
with removing the unguarded read of the running status
and perform reading it in terminate_slave_thread()
at time run_lock is taken (mostly bug#29968 backporting, still with some
improvements over that patch - see the error reporting from
terminate_slave_thread()).
Issue of bug#38716 is fixed here for 5.0 branch as well.
Note:
There has been a separate artifact identified -
a race condition between init_slave() and end_slave() -
reported as Bug#44467.
mysql-test/r/rpl_bug38694.result:
a new results file is added.
mysql-test/t/rpl_bug38694-slave.opt:
simulating delay at slave threads shutdown.
mysql-test/t/rpl_bug38694.test:
A new test to check if a delay at the termination phase of slave threads
could cause any issue.
sql/slave.cc:
The unguarded read of the running status is removed. Its reading is done in
terminate_slave_thread() at time run_lock is taken;
Calling terminate_slave_threads(skip_lock := !need_slave_mutex) in the failing branch of start_slave_threads() which is bug#38716 issue.
sql/slave.h:
removing terminate_slave_thread() out of the global interface scope.