Problem:- When we issue FTWRL with shutdown in parallel, there is race between
FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool)
before FTWRL can lock it. So we can get crash on FTWRL thread
Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for
FTWRL thread to complete its work , and then destroy.
So slave_prepare_for_shutdown will just deactivate the pool, and mutex is destroyed
later in end_slave()
Parallel slave server shutdown found to be hanging in
close_connections() triggered by shutdown due to a slave worker thread
would not be notified to exit in case the worker was sitting idle.
Fixed with destroying the worker pool earlier that is in
slave_prepare_for_shutdown() when all their driver threads have already left.
A test file is added to simulate the bug condition as well as check
multi-sourced and not-idle worker cases.
Sample log error message generated:
2021-01-21 2:33:24 139912137520896 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 369
33:24 139912137520896 [Note] master was 127.0.0.1:16400
2021-01-21 2:33:24 139912137828096 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 369
2021-01-21 2:33:24 139912137828096 [Note] master was 127.0.0.1:16400
Based on work by Hartmut Holzgraefe.
Reviewer: knielsen@knielsen-hq.org, Andrei, Sachin
Analysis:
========
Writes to 'rli->log_space_total' needs to be synchronized, otherwise both
SQL_THREAD and IO_THREAD can try to modify the variable simultaneously
resulting in incorrect rli->log_space_total. In the current test scenario
SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log'
and IO_THREAD is trying to increment the 'rli->log_space_total' in
'queue_event' simultaneously. Hence test occasionally fails with result
mismatch.
Fix:
===
Convert 'rli->log_space_total' variable to atomic type.
Problem:
=======
rpl.rpl_slave_load_tmpdir_not_exist 'stmt' w3 [ fail ] Found warnings/errors
in server log file!
Test ended at 2017-09-27 20:34:55
[Warning] Master is configured to log replication events with checksum, but
will not send such events to slaves that cannot process them
^ Found warnings in /mnt/buildbot/build/mariadb-10.2.10/mysql-test/var/3/log/mysqld.1.err
ok
Analysis:
========
When slave tries to connect to master 'get_master_version_and_clock' function
is invoked to perform elaborated slave-master handshake. During this process
slave server queries master server, to know if it is checksum aware and at the
same time master is notified about its CRC-awareness. The master's side
instant value of @@global.binlog_checksum is stored in the dump thread's
uservar area as well as cached locally to become known in consensus by master
and slave.
Post hand-shake slave requests master for binlog dump. It sends
'COM_BINLOG_DUMP'. This command is sent to master by 'cli_advanced_command'
call. If there is some temporary network failure during this request_dump
call, 'end_server' is invoked to close the current connection between master
and slave. Upon connection close the dump thread on the master gets terminated
and it clears the 'uservar' data it got through master-slave handshake.
The 'COM_BINLOG_DUMP' command is sent once again without master-slave
handshake. Since the checksum data is not available with new dump thread a
warning gets reported.
Fix:
===
Upon network write error donot attempt reconnect, proceed to master-slave
handshake. This ensures that master is aware of slave's capability to use
checksums.
The immediate bug was caused by a failure to recognize a correct
position to stop the slave applier run in optimistic parallel mode.
There were the following set of issues that the analysis unveil.
1 incorrect estimate for the event binlog position passed to
is_until_satisfied
2 wait for workers to complete by the driver thread did not account non-group events
that could be left unprocessed and thus to mix up the last executed
binlog group's file and position:
the file remained old and the position related to the new rotated file
3 incorrect 'slave reached file:pos' by the parallel slave report in the error log
4 relay log UNTIL missed out the parallel slave branch in
is_until_satisfied.
The patch addresses all of them to simplify logics of log change
notification in either the master and relay-log until case.
P.1 is addressed with passing the event into is_until_satisfied()
for proper analisis by the function.
P.2 is fixed by changes in handle_queued_pos_update().
P.4 required removing relay-log change notification by workers.
Instead the driver thread updates the notion of the current relay-log
fully itself with aid of introduced
bool Relay_log_info::until_relay_log_names_defer.
An extra print out of the requested until file:pos is arranged
with --log-warning=3.
in fact, in MariaDB it cannot, but it can show spurious slaves
in SHOW SLAVE HOSTS.
slave was registered in COM_REGISTER_SLAVE and un-registered after
COM_BINLOG_DUMP. If there was no COM_BINLOG_DUMP, it would never
unregister.
This is a backport of the applicable part of
commit 93475aff8d and
commit 2c39f69d34
from 10.4.
Before 10.4 and Galera 4, WSREP_ON is a macro that points to
a global Boolean variable, so it is not that expensive to
evaluate, but we will add an unlikely() hint around it.
WSREP_ON_NEW: Remove. This macro was introduced in
commit c863159c32
when reverting WSREP_ON to its previous definition.
We replace some use of WSREP_ON with WSREP(thd), like it was done
in 93475aff8d. Note: the macro
WSREP() in 10.1 is equivalent to WSREP_NNULL() in 10.4.
Item_func_rand::seed_random(): Avoid invoking current_thd
when WSREP is not enabled.
If async slave thread (slave SQL handler), becomes a BF victim, it may occasionally happen that rollbacker thread is used to carry out the rollback instead of the async slave thread.
This can happen, if async slave thread has flagged "idle" state when BF thread tries to figure out how to kill the victim.
The issue was possible to test by using a galera cluster as slave for external master, and issuing high load of conflicting writes through async replication and directly against galera cluster nodes.
However, a deterministic mtr test for the "conflict window" has not yet been worked on.
The fix, in this patch makes sure that async slave thread state is never set to IDLE. This prevents the rollbacker thread to intervene.
The wsrep_query_state change was refactored to happen by dedicated function to make controlling the idle state change in one place.
If async replication slave thread conflicts with cluster replication,
then the async slave transaction should be BF aborted, and depending on the
state of async slave transaction execution, potentially also replayed.
There were problems in such BF abort implementation and the replaying was not
started.
This pull request contains fixes which make sure that if async slave thread is
marked to abort and replay, it will complete carry out the rollback and
release all locks and resources before starting the replaying. After replaying,
async slave transactions is treated as successful, so the slave thread will
continue as usual, handling next replication event.
There is also new mtr test: galera.galera_slave_replay, which stresses both a
certification failure for async slave thread and a successful BF abort
followed by replaying.
The string doesn't appear to be null-terminated when binlog checksums are
enabled. This causes a corrupt binlog name in the error message when a
slave is ahead of the master.
In MariaDB 10.2 master could have been configured so that there
is extra annotate events. When we peak next event type for CTAS we
need to skip annotate events.
The problem happens when MariaDB master replicates writes for only non InnoDB
tables (e.g. writes to MyISAM table(s)). Async slave node, in Galera cluster,
can apply these writes successfully, but it will, in the end, write gtid position in
mysql.gtid_slave_pos table. mysql.gtid_slave_pos table is InnoDB engine, and
this write makes innodb handlerton part of the replicated "transaction".
Note that wsrep patch identifies that write to gtid_slave_pos should not be replicated
and skips appending wsrep keys for these writes. However, as InnoDB was present
in the transaction, and there are replication events (for MyISAM table) in transaction
cache, but there are no appended keys, wsrep raises an error, and this makes the söave
thread to stop.
The fix is simply to not treat it as an error if async slave tries to replicate a write
set with binlog events, but no keys. We just skip wsrep replication and return successfully.
This commit contains also a mtr test which forces mysql.gtid_slave_pos table isto be
of InnoDB engine, and executes MyISAM only write through asyn replication.
There is additional fix for declaring IO and background slave threads as non wsrep.
These threads should not write anything for wsrep replication, and this is just a safeguard
to make sure nothing leaks into cluster from these slave threads.
This PR contains a mtr test for reproducing a failure with replicating create table as select statement (CTAS) through asynchronous mariadb replication to mariadb galera cluster.
The problem happens when CTAS replication contains both create table statement followed by row events for populating the table. In such situation, the galera node operating as mariadb replication slave, will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and row events. This will lead all other nodes to fail for duplicate table create attempt, and crash due to this failure.
PR contains also a fix, which identifies the situation when CTAS has been replicated, and makes further scan in async replication stream to see if there are following row events. The slave node will replicate either single TOI in case the CTAS table is empty, or if CTAS table contains rows, then single bundled write set with create table and row events is replicated to galera cluster.
This fix should keep master server's GTID's for CTAS replication in sync with GTID's in galera cluster.
Analysis
Mysqlbinlog output for encrypted binary log
#Q> insert into tab1 values (3,'row 003')
#190912 17:36:35 server id 10221 end_log_pos 980 CRC32 0x53bcb3d3 Table_map: `test`.`tab1` mapped to number 19
# at 940
#190912 17:36:35 server id 10221 end_log_pos 1026 CRC32 0xf2ae5136 Write_rows: table id 19 flags: STMT_END_F
Here we can see Table_map_log_event ends at 980 but Next event starts at 940.
And the reason for that is we do not send START_ENCRYPTION_EVENT to the slave
Solution:-
Send Start_encryption_log_event as Ignorable_log_event to slave(mysqlbinlog),
So that mysqlbinlog can update its log_pos.
Since Slave can request multiple FORMAT_DESCRIPTION_EVENT while master does not
have so We only update slave master pos when master actually have the
FORMAT_DESCRIPTION_EVENT. Similar logic should be applied for START_ENCRYPTION_EVENT.
Also added the test case when new server reads the data from old server which
does not send START_ENCRYPTION_EVENT to slave.
Master Slave Upgrade Scenario.
When Slave is updated first, Slave will have extra logic of handling
START_ENCRYPTION_EVENT But master willnot be sending START_ENCRYPTION_EVENT.
So there will be no issue.
When Master is updated first, It will send START_ENCRYPTION_EVENT to
slave , But slave will ignore this event in queue_event.
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about
deprecated `register` keyword.
Too much warnings came from Mroonga and I gave up on it.
Problem:
========
When attempting to delay a Slave attached with GTID, there appears to be an
extra delay applied initially. For example, this output reflects a Slave that is
already delayed by 43200 seconds. When switching to GTID replication,
replication is paused until SQL_Remaining_Delay counts down to 0:
CHANGE MASTER TO master_use_gtid=current_pos; CHANGE MASTER TO
MASTER_DELAY=43200;
Seconds_Behind_Master: 44847
Using_Gtid: Current_Pos
SQL_Delay: 43200
SQL_Remaining_Delay: 43089
Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master
executed event
Analysis:
=========
When slave initiates a GTID based connection request to master, the master sends
two GTID_LIST events. The first one is actual GTID_LIST event and the second
one is a fake GTID_LIST event. This is sent by master to provide its current
binlary log file position. The fake GTID_LIST events will have their ev->when=0.
'when' (the timestamp) is set to 0 so that slave could distinguish between real
and fake Rotate events.
On slave side when MASTER_DELAY is configured to "X" the applier will ensure
that there is a time delay of "X" seconds before the event is applied.
General behaviour of MASTER_DELAY example:-
Master
timestamp of event e1=10
timestamp of event e2=11
On slave MASTER_DELAY=5
Event e1 will be applied at = 15
e2 will be applied at =16
In bug scenario:-
On Master: With GTIDs
timestamp of event e1=10
timestamp of event e2=0
On Slave:
e1 will be applied at = 10 + 5 =15
For e2, since "e2->when=0" e2->when is set to current timestamp.
i.e since the e2->when and current timestamp on slave is the same applier waits
for additional master_delay=5 seconds. the ev->when contributes to
"rli->last_master_timestamp".
rli->last_master_timestamp= ev->when + (time_t) ev->exec_time;
Fake events should not update the "ev->when" to "current timestamp" on slave.
Fix:
===
Remove the assignment of current timestamp to "ev->when" when "ev->when=0".
There was a failure in rpl_delayed_slave after recent MDEV-14528 commit.
The parallel applier should not set its
Relay_log::last_master_timestamp from Format-descriptor log event.
The latter may reflect a deep past so Seconds-behind-master will be
computed through it and displayed all time while the first possibly
"slow" group of events is executed.
The main MDEV-14528 is refined, rpl_delayed_slave now passes also
in the parallel mode.
When replicated events are from Master unaware of MariaDB GTID their handling by
the Parallel slave misses Seconds_Behind_Master updating.
In the bug condition the Show-Slave-Status' field remains unchanged.
Because in such case event execution is sequential the bug
is fixed with deploying the same logics as in the explicit single-threaded mode
with is to set
Relay_log_event::last_master_timestamp
member early at the end of event reading from the relay log.
These test can sporadically show mutex deadlock warnings between LOCK_wsrep_thd
and LOCK_thd_data mutexes. This means that these mutexes can be locked in opposite
order by different threads, and thus result in deadlock situation.
To fix such issue, the locking policy of these mutexes should be revised and
enforced to be uniform. However, a quick code review shows that the number of
lock/unlock operations for these mutexes combined is between 100-200, and all these
mutex invocations should be checked/fixed.
On the other hand, it turns out that LOCK_wsrep_thd is used for protecting access to
wsrep variables of THD (wsrep_conflict_state, wsrep_query_state), whereas LOCK_thd_data
protects query, db and mysys_var variables in THD. Extending LOCK_thd_data to protect
also wsrep variables looks like a viable solution, as there should not be a use case
where separate threads need simultaneous access to wsrep variables and THD data variables.
In this commit LOCK_wsrep_thd mutex is refactored to be replaced by LOCK_thd_data.
By bluntly replacing LOCK_wsrep_thd by LOCK_thd_data, will result in double locking
of LOCK_thd_data, and some adjustements have been performed to fix such situations.
Modern compilers (such as GCC 8) emit warnings that the
'register' keyword is deprecated and not valid C++17.
Let us remove most use of the 'register' keyword.
Code in 'extra/' is not touched.
Other things, mainly to get
create_mysqld_error_find_printf_error tool to work:
- Added protection to not include mysqld_error.h twice
- Include "unireg.h" instead of "mysqld_error.h" in server
- Added protection if ER_XX messages are already defined
- Removed wrong calls to my_error(ER_OUTOFMEMORY) as
my_malloc() and my_alloc will do this automatically
- Added missing %s to ER_DUP_QUERY_NAME
- Removed old and wrong calls to my_strerror() when using
MY_ERROR_ON_RENAME (wrong merge)
- Fixed deadlock error message from Galera. Before the extra
information given to ER_LOCK_DEADLOCK was missing because
ER_LOCK_DEADLOCK doesn't provide any extra information.
I kept #ifdef mysqld_error_find_printf_error_used in sql_acl.h
to make it easy to do this kind of check again in the future
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)
- Also fix printf-format warnings
Make the above mentioned warnings fatal.
- fix pthread_join on Windows to set return value.
Problem & Analysis: Slave's Receiver thread, Applier thread and worker
threads are created with LOCAL-INFILE option enabled. As the document
says https://dev.mysql.com/doc/refman/5.7/en/load-data-local.html,
there are some issues if a thread enables local infile.
This flag should be enabled with care. But for the above mentioned
internal threads, server is enabling it at the time of creation.
Fix: Further analysis on the code shows that none of threads really
need this flag to be enabled at any time as Slave never executes
"LOAD DATA LOCAL INFILE" after reading it from Relay log.
Applier thread removes "LOCAL" before start executing the query.