The root cause of the crash is that a TranxNode is freed while it is
still linked in the active list.
A TranxNode is allocated and inserted into the active list each time
a log event is written and flushed into the binlog file.
The memory for TranxNode is allocated with thd_alloc and is freed
at the end of the statement. The after_commit/after_rollback callback
was supposed to be called before the end of each statement and remove
the node from the active list. However, this assumption does not hold
in all cases (e.g. calling
'CREATE TEMPORARY TABLE myisam_t SELECT * FROM innodb_t' in a transaction
and having all temporary tables deleted automatically when the session
closes), so the memory allocated for a TranxNode can be freed
before the node is removed from the active list. The TranxNode pointer
in the active list then becomes a dangling pointer and causes the crash.
After this patch, a class called TranxNodeAllocate manages the memory
for allocating and freeing TranxNode. It uses my_malloc to allocate memory.
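The ownership change described above can be sketched as follows. This is
a minimal illustration, not the patch itself: Node and NodeAllocator are
hypothetical names, and std::malloc stands in for my_malloc. The point is
only that a node's lifetime is tied to an explicit release call rather
than to the end of a statement.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical, simplified stand-in for a transaction node.
struct Node {
  unsigned long long log_pos;
  Node *next;
};

// Owns nodes on the process heap, so their lifetime no longer depends on
// a per-statement memory arena.
class NodeAllocator {
 public:
  NodeAllocator() : live_(0) {}

  // Returns a node initialised with log_pos, or nullptr if out of memory.
  Node *allocate(unsigned long long log_pos) {
    void *mem = std::malloc(sizeof(Node));  // assumption: stands in for my_malloc
    if (mem == nullptr) return nullptr;
    Node *n = new (mem) Node{log_pos, nullptr};
    ++live_;
    return n;
  }

  // Must be called only after the node has been unlinked from the
  // active list.
  void release(Node *n) {
    if (n == nullptr) return;
    assert(live_ > 0);
    n->~Node();
    std::free(n);
    --live_;
  }

  // Number of nodes handed out and not yet released.
  unsigned long live() const { return live_; }

 private:
  unsigned long live_;
};
```

With this shape, a node that is still in the active list when the
statement ends stays valid until the list code releases it.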
sql/rpl_handler.cc:
params are not initialized.
Added back n_frees, use 'clear' instead of 'free' since memory is
not freed here.
plugin/semisync/semisync_master.cc:
Added back n_frees, use 'clear' instead of 'free' in the message since memory is not freed here.
Before this patch, semisync assumed that the number of transactions
running in parallel could not exceed max_connections, but this is not
true when the event scheduler is executing events, which causes semisync
to run out of preallocated transaction nodes.
Fix the problem by allocating transaction nodes dynamically.
This patch also fixes a possible deadlock when running UNINSTALL
PLUGIN rpl_semi_sync_master and updating in parallel. Fixed by
releasing the internal Delegate lock before unlocking the plugins.
mysql-test/suite/rpl/t/rpl_semi_sync_event.test:
Add test case for bug#49020
plugin/semisync/semisync_master.cc:
Allocating TranxNode dynamically
plugin/semisync/semisync_master.h:
Allocating TranxNode dynamically
sql/rpl_handler.cc:
Unlock plugins after we have released the Delegate lock, to avoid a possible deadlock when uninstalling the semisync master plugin and running updates in parallel.
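The lock ordering described for sql/rpl_handler.cc can be sketched as
below. This is an illustration under assumptions, not the server code:
PluginRef, Delegate and run_hooks are hypothetical names, and a
std::mutex stands in for the internal Delegate lock. The sketch records
the order of steps so that the ordering can be checked.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a locked plugin reference.
struct PluginRef {
  std::string name;
  bool locked;
};

class Delegate {
 public:
  // Calls the observers while holding the delegate lock, then unlocks the
  // plugins only after the delegate lock has been released.
  std::vector<std::string> run_hooks(std::vector<PluginRef> &plugins) {
    std::vector<std::string> steps;
    std::vector<PluginRef *> held;
    {
      std::lock_guard<std::mutex> guard(lock_);
      steps.push_back("delegate locked");
      for (PluginRef &p : plugins) {
        p.locked = true;  // pin the plugin while its hook runs
        held.push_back(&p);
      }
      steps.push_back("hooks called");
    }  // delegate lock is released here
    steps.push_back("delegate unlocked");
    for (PluginRef *p : held) p->locked = false;
    steps.push_back("plugins unlocked");
    return steps;
  }

 private:
  std::mutex lock_;  // assumption: models the internal Delegate lock
};
```

Because the plugins are unlocked outside the critical section, a
concurrent UNINSTALL PLUGIN that needs the delegate lock is not blocked
by a thread that is itself waiting on plugin unlocking.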
The semisync plugin library names on Unix-like systems were prefixed with
'lib', which did not follow the conventions.
Fix the problem by removing the 'lib' prefix on Unix-like systems.
mysql-test/mysql-test-run.pl:
Remove 'lib' prefix for semisync plugin library names
plugin/semisync/Makefile.am:
Remove 'lib' prefix for semisync plugin library names
plugin/semisync/plug.in:
Remove 'lib' prefix for semisync plugin library names
CMakeLists.txt:
Add plugin/semisync subdirectory
mysql-test/mysql-test-run.pl:
Check for semisync dll for Windows
mysql-test/suite/rpl/r/rpl_semi_sync.result:
Update result file
mysql-test/suite/rpl/t/rpl_semi_sync.test:
Test semi-sync on Windows
plugin/semisync/semisync_master.cc:
Define gettimeofday for Windows
rpl_semi_sync_master_wait_sessions was reset by FLUSH STATUS,
which could cause the master to fail to wake up waiting sessions and
result in the master timing out while waiting for the slave reply.
rpl_semi_sync_master_wait_sessions should not be reset; this
patch fixes the problem.
plugin/semisync/semisync_master_plugin.cc:
Change wait_sessions from SHOW_LONG back to SHOW_FUNC so that it will not be reset by FLUSH STATUS.
Remove functions that are no longer needed
Fix warning suppressions
mysql-test/suite/rpl/t/rpl_semi_sync.test:
Fix warning suppressions
plugin/semisync/semisync_slave.cc:
Remove functions that are no longer needed
plugin/semisync/semisync_slave.h:
Remove functions that are no longer needed
Add an option to control whether the master should keep waiting
until timeout when it detects that no semi-sync slave is
available.
The bool option 'rpl_semi_sync_master_wait_no_slave' is 1 by
default, in which case the master keeps waiting until timeout. When
set to 0, the master switches to asynchronous replication immediately
when no semi-sync slave is available.
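The decision this option controls can be sketched as a small function.
CommitMode and choose_commit_mode are hypothetical names used only for
illustration; the sketch encodes nothing beyond the behaviour stated
above.

```cpp
// Hypothetical result type: either wait for a slave reply (until timeout)
// or fall back to asynchronous replication right away.
enum class CommitMode { WaitForReply, Async };

// wait_no_slave mirrors rpl_semi_sync_master_wait_no_slave;
// semi_sync_slaves is the number of semi-sync slaves currently available.
CommitMode choose_commit_mode(bool wait_no_slave, unsigned semi_sync_slaves) {
  if (semi_sync_slaves > 0) return CommitMode::WaitForReply;
  return wait_no_slave ? CommitMode::WaitForReply : CommitMode::Async;
}
```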
Semi-sync status variables were not reset by FLUSH STATUS. This was
because all semi-sync status variables were defined as SHOW_FUNC and
FLUSH STATUS can only reset SHOW_LONG type variables.
This problem is fixed by changing all status variables that should
be reset by FLUSH STATUS from SHOW_FUNC to SHOW_LONG.
After the fix, the following status variables will be reset by
FLUSH STATUS:
Rpl_semi_sync_master_yes_tx
Rpl_semi_sync_master_no_tx
Note: normally, FLUSH STATUS itself is written into the binlog
and replicated, so after FLUSH STATUS, one of
Rpl_semi_sync_master_yes_tx
Rpl_semi_sync_master_no_tx
can be 1, depending on the semi-sync status. It is therefore
recommended to use FLUSH NO_WRITE_TO_BINLOG STATUS to avoid this.
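The difference between the two variable kinds can be modelled as below.
This is a simplified sketch, not the server's status-variable code:
ShowKind, StatusVar, flush_status and read_status are hypothetical names.
It only illustrates why a plain counter can be zeroed by a flush while a
function-backed value cannot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the two kinds discussed above.
enum ShowKind { KIND_LONG, KIND_FUNC };

struct StatusVar {
  std::string name;
  ShowKind kind;
  long *value;        // used when kind == KIND_LONG
  long (*compute)();  // used when kind == KIND_FUNC
};

// Live state that must survive a flush (e.g. a count of waiting sessions).
static long g_wait_sessions = 0;
long current_wait_sessions() { return g_wait_sessions; }

// Models FLUSH STATUS: only plain counters are reset.
void flush_status(std::vector<StatusVar> &vars) {
  for (StatusVar &v : vars)
    if (v.kind == KIND_LONG && v.value != nullptr) *v.value = 0;
}

long read_status(const StatusVar &v) {
  if (v.kind == KIND_LONG) {
    assert(v.value != nullptr);
    return *v.value;
  }
  assert(v.compute != nullptr);
  return v.compute();
}
```

In this model, a counter such as yes_tx returns to 0 after a flush, while
a function-backed value keeps reporting the live state.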
Errors when sending a reply to the master should never cause the IO
thread to stop, because the master can fall back to async replication
if it does not get a reply from the slave.
The problem is fixed by deliberately ignoring the return value of
slaveReply.
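A minimal sketch of the idea follows. send_reply and after_queue_event
are hypothetical stand-ins (the real slaveReply signature is not shown in
this message); the sketch only shows a hook that discards the reply
result so that the caller never sees a failure.

```cpp
// Hypothetical stand-in for the reply call: returns 0 on success and
// non-zero on a network error.
int send_reply(bool network_ok) { return network_ok ? 0 : 1; }

// Hypothetical hook run by the IO thread after queuing an event.
// It always reports success, so a failed reply cannot stop the IO thread.
int after_queue_event(bool network_ok) {
  (void)send_reply(network_ok);  // result deliberately ignored
  return 0;
}
```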
Semi-sync used an extra connection from slave to master to send
replies. This was a normal client connection that used a normal
SET query to set the reply information on the master, which was
visible to users and could cause confusion and complaints.
This problem is fixed by sending the reply over the same connection
that the master dump thread uses to send the binlog to the slave.
Since the semi-sync plugins are now integrated with the server code,
it is not a problem to use the internal net interfaces to do this.
The master dump thread marks an event as requiring a reply, and
waits for the reply, when the event just sent is the last event
of a transaction and semi-sync status is ON. The slave sends a
reply to the master when it receives an event that requires
a reply.
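The marking rule above can be sketched as follows. SentEvent, mark_reply
and replies_for are hypothetical names, and the on-the-wire encoding of
the flag is not modelled; the sketch only captures which events get a
reply.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical view of an event as sent by the master dump thread.
struct SentEvent {
  std::uint64_t end_pos;  // binlog position after this event
  bool ends_transaction;  // true for the last event of a transaction
  bool need_reply;        // set by the master, read by the slave
};

// Master side: request a reply only for the last event of a transaction,
// and only while semi-sync status is ON.
void mark_reply(SentEvent &ev, bool semi_sync_on) {
  ev.need_reply = semi_sync_on && ev.ends_transaction;
}

// Slave side: collect the positions of the events that require a reply.
std::vector<std::uint64_t> replies_for(const std::vector<SentEvent> &events) {
  std::vector<std::uint64_t> out;
  for (const SentEvent &ev : events)
    if (ev.need_reply) out.push_back(ev.end_pos);
  return out;
}
```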
On sparc, semisync master/slave status was always shown as OFF. This
is fixed by changing the rpl_semisync_master/slave_status variables from
long to char.
plugin/semisync/semisync_master.cc:
Change rpl_semisync_master_status variables from long to char
plugin/semisync/semisync_master.h:
Change rpl_semisync_master_status variables from long to char
plugin/semisync/semisync_slave.cc:
Change rpl_semisync_slave_status variables from long to char
plugin/semisync/semisync_slave.h:
Change rpl_semisync_slave_status variables from long to char
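One plausible mechanism for the sparc symptom, stated here as an
assumption rather than taken from the patch, is that the status value is
read as a one-byte flag. On a big-endian machine the first byte of a long
holding 1 is 0, so the flag reads as OFF. The sketch below builds the
big-endian layout by hand so that it behaves the same on any host;
big_endian_bytes and read_as_char_flag are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the big-endian byte layout of value stored in width bytes.
std::vector<unsigned char> big_endian_bytes(unsigned long value,
                                            std::size_t width) {
  std::vector<unsigned char> out(width, 0);
  for (std::size_t i = 0; i < width; ++i) {
    out[width - 1 - i] = static_cast<unsigned char>(value & 0xFF);
    value >>= 8;
  }
  return out;
}

// What a reader sees if it treats the storage as a one-byte flag.
bool read_as_char_flag(const std::vector<unsigned char> &storage) {
  assert(!storage.empty());
  return storage[0] != 0;
}
```

Under this assumption, an 8-byte big-endian value of 1 reads as OFF,
while a 1-byte value of 1 reads as ON, which matches the effect of
changing the variables from long to char.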