Fix a bunch of issues found with locking, ordering, and non-thread-safe stuff
in Relay_log_info.
Now able to do a simple benchmark, showing 4.5 times speedup for applying a
binlog with 10000 REPLACE statements.
Impement options --binlog-commit-wait-count and
--binlog-commit-wait-usec.
These options permit the DBA to deliberately increase latency
of an individual commit to get more transactions in each
binlog group commit. This increases the opportunity for
parallel replication on the slave, and can also decrease I/O
load on the master.
The options also make it easier to test the parallel
replication with mysql-test-run.
Fix some bugs around waiting for worker threads to end during SQL slave stop.
Free Log_event after parallel execution (still needs to be made thread-safe by
using rpl_group_info rather than rli).
Wait for all worker threads to finish when stopping the SQL thread.
(Only a basic wait; this still needs to be fixed to include timeout
logic as in sql_slave_killed()).
Hook in the wait-for-prior-commit logic (not really tested yet).
Clean up some resource maintenance around rpl_group_info (may still be some
smaller issues there though).
Add a ToDo list at the top of rpl_parallel.cc
Implement facility for the commit in one thread to wait for the commit of
another to complete first. The wait is done in a way that does not hinder
that a waiter and a waitee can group commit together with a single fsync()
in both binlog and InnoDB. The wait is done efficiently with respect to
locking.
The patch was originally made to support TaoBao parallel replication with
in-order commit; now it will be adapted to also be used for parallel
replication of group-committed transactions.
A waiter THD registers itself with a prior waitee THD. The waiter will then
complete its commit at the earliest in the same group commit of the waitee
(when using binlog). The wait can also be done explicitly by the waitee.
Now whenever we reach the GTID point requested from the slave (when using GTID
position to connect), we send a fake Gtid_list event. This event is used by
the slave to know the current old-style position for MASTER_POS_WAIT(), and
later the similar binlog position for MASTER_GTID_WAIT().
Without this fake event, if the slave is already fully up-to-date with the
master, there may be no events sent at the given position for an indeterminate
time.
If the mysql.gtid_slave_pos table is not available, we cannot load nor update
the current GTID position persistently. This can happen eg. after an upgrade,
before mysql_upgrade_db is run, or if the table is InnoDB and the server is
restarted without the InnoDB storage engine enabled.
Before, replication always failed to start if the table was unavailable. With
this patch, we try to continue with old-style replication, after suitable
complaints in the error log. In strict mode, or if slave is configured to use
GTID, slave still refuses to start.
There was some old code that cleared the position in CHANGE MASTER,
it was forgotten to be removed.
In addition, add code that saves/restores the old-style position
when we nuke the old relay logs as part of GTID slave start.
Normally we will not use these, but it could be useful in case
the GTID connect fails and user wants to go back to the old-style
coordinates.
mysql-test/include/wait_show_condition.inc:
Print failing statement if timeout
mysql-test/r/myisam-metadata.result:
Updated DBUG_SYNC
mysql-test/t/myisam-metadata.test:
Updated DBUG_SYNC.
Removed wait_show_condtion, as this is not needed when we use DBUG_SYNC
This should fix timing issues with the test
mysys/thr_mutex.c:
Added comments
sql/sql_acl.cc:
atoi -> atoll() (Safety)
storage/myisam/ha_myisam.cc:
Send signal before mi_repair_by_sort.
Fix problems related to reconnect. When we need to reconnect (ie. explict
stop/start of just the IO thread by user, or automatic reconnect due to
loosing network connection with the master), it is a bit complex to correctly
resume at the right point without causing duplicate or missing events in the
relay log. The previous code had multiple problems in this regard.
With this patch, the problem is solved as follows. The IO thread keeps track
(in memory) of which GTID was last queued to the relay log. If it needs to
reconnect, it resumes at that GTID position. It also counts number of events
received within the last, possibly partial, event group, and skips the same
number of events after a reconnect, so that events already enqueued before the
reconnect are not duplicated.
(There is no need to keep any persistent state; whenever we restart slave
threads after both of them being stopped (such as after server restart), we
erase the relay logs and start over from the last GTID applied by SQL thread.
But while the SQL thread is running, this patch is needed to get correct relay
log).
There were several cases where the slave GTID position was not loaded
correctly before being used. This caused various failures such as
corrupting the position at slave start and empty values of
@@gtid_slave_pos and @@gtid_current_pos.
Fixed by adding more checks for loaded position, and by always loading
the position at server startup.
Problem :
libreadline.so was already present on the machine, however the cmake check NEW_READLINE_INTERFACE was unsuccessfull indicating, thus bundled library had to be used instead of system library.
The problem was that the value for HAVE_HIST_ENTRY cmake variable was cached with incorrect value (1 on NetBSD).
The fix is to change HAVE_HIST_ENTRY to 0 with CACHE FORCE, after switching to bundled readline.
The idea in the code was to protect the user that tries to connect a slave
to a master with completely different domains than what was intended. If
none of the domains in the start position are present at all in the master
binlog, we gave an error.
However, this is a stupid idea. Because when a slave connects to a master
to start replication from the very start of binlogs - such as when setting
up new master->slave servers from scratch - there will be just this
situation, the requested slave position is empty for all the domains in the
master's binlog.
So the code that gives this error is wrong, and the solution is simply to
remove it.
When @@GLOBAL.gtid_strict_mode=1, then certain operations result
in error that would otherwise result in out-of-order binlog files
between servers.
GTID sequence numbers are now allocated independently per domain;
this results in less/no holes in GTID sequences, increasing the
likelyhood that diverging binlogs will be caught by the slave when
GTID strict mode is enabled.
The problem was the Gtid_list event which is logged to the binlog in
10.0 and is not understood by the 5.5 server.
This event is supposed to be replaced with a dummy event for 5.5
servers. But the very first event logged in the very first binlog
has an empty list of GTID, which makes the event too short to be
replacable with an empty event.
The fix is to pad the empty Gtid_list event to be big enough to
be replacable by a dummy event.
Change of user interface to be more logical and more in line with expectations
to work similar to old-style replication.
User can now explicitly choose in CHANGE MASTER whether binlog position is
taken into account (master_gtid_pos=current_pos) or not (master_gtid_pos=
slave_pos) when slave connects to master.
@@gtid_pos is replaced by three separate variables @@gtid_slave_pos (can
be set by user, replicated GTIDs only), @@gtid_binlog_pos (read only), and
@@gtid_current_pos (a combination of the two, most recent GTID within each
domain). mysql.rpl_slave_state is renamed to mysql.gtid_slave_pos to match.
This fixes MDEV-4474.
replaced snippets_udf.cc with the latest version (2.0.8 from sphinxsource.com), fixed trivial errors on Windows.
It will be compiled and installed into plugins directory now.