The semisync ack receiver thread (master side) now reports
details of the errors it encounters.
In case of a 'magic byte' error, a hexdump of the received packet
is always written to the error log at NOTE level.
In other cases the exact server-level error is printed
as a warning (as it may not be critical) when log_warnings > 2.
An MTR test is added for the magic byte error. The other errors are
covered by existing MTR tests, provided log_warnings > 2 is set.
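As an illustration of the magic byte check and the hexdump described above, here is a minimal sketch. It is not the server code: hexdump() and has_magic_byte() are hypothetical helpers, and the magic value 0xef is an assumption about the semisync reply packet format.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Assumption: a semisync reply packet starts with this magic byte.
constexpr unsigned char kPacketMagicNum = 0xef;

// Hypothetical helper: render a packet as space-separated two-digit
// hex bytes, suitable for a single error-log line.
std::string hexdump(const unsigned char *buf, std::size_t len)
{
  std::string out;
  char byte[4];
  for (std::size_t i = 0; i < len; i++)
  {
    std::snprintf(byte, sizeof(byte), "%02x", static_cast<unsigned>(buf[i]));
    if (i > 0)
      out += ' ';
    out += byte;
  }
  return out;
}

// Hypothetical check: true when the first byte is the expected magic.
bool has_magic_byte(const unsigned char *buf, std::size_t len)
{
  return len > 0 && buf[0] == kPacketMagicNum;
}
```

A caller would log hexdump(packet, length) only when has_magic_byte() returns false, so a well-formed ack costs nothing extra.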
Backport of upstream fix
commit 1800b015a1d487330f7b15f2020b887be348a66b
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date: Fri Sep 8 20:29:22 2017 +0530
Bug#26027024 SLAVE_COMPRESSED_PROTOCOL DOESN'T WORK WITH
SEMI-SYNC REPLICATION IN MYSQL-5.7
Analysis: In mysql-5.6, dump thread (the thread that is created
on Master after Slave requested for a binlog dump) is also used
to receive acknowledgements from the Slave and act on them accordingly.
For performance reasons, a special thread called Ack Receiver thread
is added in mysql-5.7 Semi synchronous replication plugin.
This thread does not have special handling to receive acknowledgements
if Slave has enabled compression in the protocol. Hence Master is
unable to handle any slave if Slave_compressed_protocol is enabled
on it.
Fix: Enable compress flag on the communication channels if the Slave
has Slave_compressed_protocol ON.
Originally introduced by e972125f1 to avoid a harmless wait for
LOCK_global_system_variables in a newly created thread whose creation was
initiated by a system variable update.
At the same time it opens a dangerous hole: the system variable update
thread may already have released LOCK_global_system_variables while the
ack_receiver thread has not yet completed construction of its new THD.
In this case the THD constructor runs completely unprotected.
Since ack_receiver.stop() waits for the thread to go down, we have to
temporarily release LOCK_global_system_variables so that it doesn't
deadlock with ack_receiver.run(). Unfortunately this breaks the atomicity
of rpl_semi_sync_master_enabled updates, so they are no longer serialized.
LOCK_rpl_semi_sync_master_enabled was introduced to work around the above.
TODO: move ack_receiver start/stop into repl_semisync_master
enable_master/disable_master under LOCK_binlog protection?
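The locking pattern described above can be sketched roughly as follows. This is a simplified model using std::mutex and std::thread rather than the server's own primitives; LOCK_global, LOCK_enabled, Receiver and update_enabled() are illustrative stand-ins, not the actual code.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

std::mutex LOCK_global;   // stand-in for LOCK_global_system_variables
std::mutex LOCK_enabled;  // stand-in for LOCK_rpl_semi_sync_master_enabled
bool enabled = false;     // stand-in for rpl_semi_sync_master_enabled

struct Receiver
{
  std::thread worker;
  std::atomic<bool> stop_requested{false};

  void start()
  {
    stop_requested = false;
    worker = std::thread([this] {
      // Like THD construction, the new thread briefly needs the global lock.
      { std::lock_guard<std::mutex> guard(LOCK_global); }
      while (!stop_requested)
        std::this_thread::yield();
    });
  }

  void stop()
  {
    // Waits for the thread to go down; would deadlock if the caller
    // still held LOCK_global while the thread is waiting for it.
    stop_requested = true;
    if (worker.joinable())
      worker.join();
  }
};

Receiver receiver;

// Called with LOCK_global held, as a system variable update would be.
void update_enabled(std::unique_lock<std::mutex> &global, bool value)
{
  global.unlock();  // let the receiver thread take the global lock
  {
    // Serialize updates now that the global lock no longer does so.
    std::lock_guard<std::mutex> serialize(LOCK_enabled);
    if (value && !enabled)
      receiver.start();
    else if (!value && enabled)
      receiver.stop();
    enabled = value;
  }
  global.lock();    // restore the caller's expectation
}
```

The second mutex is never taken while the global one is held, so the two locks cannot be acquired in conflicting orders.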
Part of MDEV-14984 - regression in connect performance
In contrast to thread_count, which is decremented by THD destructor,
this one was most probably intended to be decremented after all THD
destructors are done.
The THD_count class was added to achieve a similar effect for thread_count.
The aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
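One way to get a counter decremented only after the rest of an object's destruction is to put the counting into a base class, since base-class destructors run last. The sketch below shows this idea only; Count_guard and Session are hypothetical names, and nothing here is taken from the actual THD_count implementation.

```cpp
#include <atomic>

std::atomic<int> object_count{0};  // hypothetical global counter

// Hypothetical stand-in for a counting base class.
struct Count_guard
{
  Count_guard() { object_count++; }
  ~Count_guard() { object_count--; }
};

int count_seen_in_session_dtor = -1;

// The derived destructor body runs before the base destructor,
// so the object is still counted while it is being torn down.
struct Session : Count_guard
{
  ~Session() { count_seen_in_session_dtor = object_count.load(); }
};
```

Because the counter is atomic, this kind of bookkeeping needs no mutex or condition variable for the increment and decrement themselves.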
The semisync ack collector hits an assert on an out-of-bound fd value,
with a stack of
/usr/sbin/mysqld(_ZN12Ack_receiver17get_slave_socketsEP6fd_setPj+0x70)[0x7fa3bbe27400]
/usr/sbin/mysqld(_ZN12Ack_receiver3runEv+0x540)[0x7fa3bbe27980]
/usr/sbin/mysqld(ack_receive_handler+0x19)[0x7fa3bbe27a79]
The cause of the failure must be the same as in https://bugs.mysql.com/bug.php?id=79865
whose fixes are applied with minor changes.
Specifically, the semisync ack thread is changed to use poll()
instead of select() on platforms where the former is defined.
On systems that still use select(), the Ack receiver thread will generate
an error and semisync will be switched off. Windows is an exception
because this limitation does not exist there.
The patch sustains manual testing with `mysqlslap --concurrency > 1024' running in the background while the slave IO thread is restarted multiple times.
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. Change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider, sphinx and connect for now.
and specifically the ack receiving functionality.
Semisync is now built statically into the server instead of being a
plugin, so its functions are invoked at the same points as RUN_HOOKS.
The RUN_HOOKS and the observer interface remain, to be removed by a later
patch.
Todo:
React on killed status by repl_semisync_master.wait_after_sync(). Currently
Repl_semi_sync_master::commit_trx does not check the killed status.
A few bugfixes were found that are present in MySQL, and it is unclear
whether/how they are covered. Those include:
Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS
Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL
HAS BAD PERFORMANCE
Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS