.. file '/var/log/mysql/mariadb-bin.000001' not found in binlog
index, needed for recovery. Aborting.
In Galera cluster, while preparing for rsync/xtrabackup based
SST, the donor node takes an FTWRL followed by (REFRESH_ENGINE_LOG
in rsync based state transfer and) REFRESH_BINARY_LOG. The latter
rotates the binary log and logs Binlog_checkpoint_log_event
corresponding to the penultimate binary log file into the new file.
The checkpoint event for the current file is later logged
synchronously by binlog_background_thread.
Now, since in rsync/xtrabackup based snapshot state transfer methods,
only the last binary log file is transferred to the joiner node; the
file could get transferred even before the checkpoint event for the
same file gets written to it. As a result, the joiner node would fail
to start complaining about the missing binlog file needed for recovery.
In order to fix this, a mechanism has been put in place to make
REFRESH_BINARY_LOG operation wait for Binlog_checkpoint_log_event
to be logged for the current binary log file if the node is part of
a Galera cluster. As further safety, during rsync based state transfer
the donor node now acquires and owns LOCK_log for the duration of file
transfer during SST.
- fixes in innodb to skip wsrep processing (like kill victim) when running in native mysql mode
- similar fixes in mysql server side
- forcing tc_log_dummy in native mysql mode when no binlog used. wsrep hton messes up handler counter
and used to lead in using tc_log_mmap instead. Bad news is that tc_log_mmap does not seem to work at all
If a conflict happens under wsrep_on, the THD's wsrep_conflict_state
is typically set to MUST_ABORT and cleared later, when transaction is
aborted. However, when wsrep_on is disabled, no check is performed to
see whether wsrep_conflict_state is set. So this potentially creates
spurious deadlock errors on the subsequent statement that runs with
wsrep_on enabled.
To avoid this problem wsrep_thd_set_conflict_state() sets the conflict
state only if wsrep_on is enabled.
- At startup time global wsrep_on is set too late and some wsrep paths may be executed
because of this. e.g. replication slave restart could happen before wsrep_on state is defined.
- This fix checks both global wsrep_on and wsrep_provider values to determine if wsrep
processing should happen
- Fix affects all instances where WSREP_ON macro is used
- popping PS reprepare observer before BF aborted PS replaying begins
dangling observer will cause failure in open_table() ater on
- test case for this anomaly
- changed the condition when to do implicit desync as part of FTWRL to
cover only case when node is PC and synced. Donor node has alreaydy desycned
and other states mean that node is not in cluster, so desync is not even possible.
- reverted from tracking donor servicing thread. With xtrabackup SST,
xtrabackup thread will call FTWRL and node is desynced upfront
- Skipping desync in FTWRL if node is operating as donor
- enveloped FTWRL processing with wsrep desync/resync calls. This way FTWRL processing node
will not cause flow control to kick in
- donor servicing thread is unfortunate exception, we must let him to pause provider as part
of FTWRL phase, but not desync/resync as this is done as part of donor control on higher
level