This commit adds 3 new status variables to 'show all slaves status':
- Master_last_event_time ; timestamp of the last event read from the
master by the IO thread.
- Slave_last_event_time ; Master timestamp of the last event committed
on the slave.
- Master_Slave_time_diff: The difference of the above two timestamps.
All the above variables are NULL until the slave has started and the
slave has read one query event from the master that changes data.
- Added information_schema.slave_status, which allows us to remove:
- show_master_info(), show_master_info_get_fields(),
send_show_master_info_data(), show_all_master_info()
- class Sql_cmd_show_slave_status.
- Protocol::store(I_List<i_string_pair>* str_list) as it is not
used anymore.
- Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to
use the SELECT code path, as all other SHOW ... STATUS commands.
Other things:
- Xid_log_time is set to time of commit to allow slave that reads the
binary log to calculate Master_last_event_time and
Slave_last_event_time.
This is needed as there is not 'exec_time' for row events.
- Fixed that Load_log_event calculates exec_time identically to
Query_event.
- Updated RESET SLAVE to reset Master/Slave_last_event_time
- Updated SQL thread's update on first transaction read-in to
only update Slave_last_event_time on group events.
- Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions
if allocation of 'field' would fail.
Reviewed By:
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
Remove alter_algorithm but keep the variable as no-op (with a warning).
The reasons for removing alter_algorithm are:
- alter_algorithm was introduced as a replacement for the
old_alter_table that was used to force the usage of the original
alter table algorithm (copy) in the cases where the new alter
algorithm did not work. The new option was added as a way to force
the usage of a specific algorithm when it should instead have made
it possible to disable algorithms that would not work for some
reason.
- alter_algorithm introduced some cases where ALTER TABLE would not
work without specifying the ALGORITHM=XXX option together with
ALTER TABLE.
- Having different values of alter_algorithm on master and slave could
cause slave to stop unexpectedly.
- ALTER TABLE FORCE, as used by mariadb-upgrade, would not always work
if alter_algorithm was set for the server.
- As part of the MDEV-33449 "improving repair of tables" it become
clear that alter- algorithm made it harder to provide a better and
more consistent ALTER TABLE FORCE and REPAIR TABLE and it would be
better to remove it.
This patch extends the timestamp from
2038-01-19 03:14:07.999999 to 2106-02-07 06:28:15.999999
for 64 bit hardware and OS where 'long' is 64 bits.
This is true for 64 bit Linux but not for Windows.
This is done by treating the 32 bit stored int as unsigned instead of
signed. This is safe as MariaDB has never accepted dates before the epoch
(1970).
The benefit of this approach that for normal timestamp the storage is
compatible with earlier version.
However for tables using system versioning we before stored a
timestamp with the year 2038 as the 'max timestamp', which is used to
detect current values. This patch stores the new 2106 year max value
as the max timestamp. This means that old tables using system
versioning needs to be updated with mariadb-upgrade when moving them
to 11.4. That will be done in a separate commit.
Similar to #2480.
567b681 introduced safe_strcpy() to minimize the use of C with
potentially unsafe memory overflow with strcpy() whose use is
discouraged.
Replace instances of strcpy() with safe_strcpy() where possible, limited
here to files in the `sql/` directory.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
Clear any pending deadlock kill after completing XA PREPARE, and before
updating the mysql.gtid_slave_pos table in a separate transaction.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Keep track of each recently active XID, recording which worker it was queued
on. If an XID might still be active, choose the same worker to queue event
groups that refer to the same XID to avoid conflicts.
Otherwise, schedule the XID freely in the next round-robin slot.
This way, XA PREPARE can normally be scheduled without restrictions (unless
duplicate XID transactions come close together). This improves scheduling
and parallelism over the old method, where the worker thread to schedule XA
PREPARE on was fixed based on a hash value of the XID.
XA COMMIT will normally be scheduled on the same worker as XA PREPARE, but
can be a different one if the XA PREPARE is far back in the event history.
Testcase and code for trimming dynamic array due to Andrei.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This is a preparatory patch to facilitate the next commit to improve
the scheduling of XA transactions in parallel replication.
When choosing the scheduling bucket for the next event group in
rpl_parallel_entry::choose_thread(), use an explicit FIFO for the
round-robin selection instead of a simple cyclic counter i := (i+1) % N.
This allows to schedule XA COMMIT/ROLLBACK dependencies explicitly without
changing the round-robin scheduling of other event groups.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
When rolling back and retrying a transaction in parallel replication, don't
release the domain ownership (for --gtid-ignore-duplicates) as part of the
rollback. Otherwise another master connection could grab the ownership and
double-apply the transaction in parallel with the retry.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This will makes it easier to find out what replication workers are
doing and what they are waiting for.
Things changed in processlist:
- Slave_SQL time was not consistent. Now time for state "Slave has
read all relay log; waiting for more updates" shows how long it has
waited for getting the next event.
- Slave_worker threads did often show "Closing tables" for a long
time. Now the state is reverted to the previous state after
"Closing tables" is done.
- Commit and Rollback states where not shown for replication (and some
other threads). Now Commit and Rollback states are always shown and
the state is reverted to previous state when the Commit/Rollback
have finished.
Code changes:
- Added thd->set_time_for_next_stage() for parallel replication when
when starting to wait for prior transactions to commit, group commit,
and FTWRL and for free space in thread pool.
Before we reset the time only after the above events.
- Moved THD_STAGE_INFO(stage_rollback) and THD_STAGE_INFO(stage_commit)
from sql_parse.cc to transaction.cc to ensure this is done for
all commits and not only 'normal connection queries'.
Test case changes:
- close_thread_tables() reverting stage to previous stage caused the
counter in performance_schema to be increased. In many case it is
the 'sql/starting' stage that was effected.
- We only change to "Commit" stage if there is a need for a commit.
This caused some "Commit" stages to disapper from perfschema reports.
TODO in 11.#:
- Slave_IO always showes "Waiting for master to send event" and the time is
from SLAVE START. We should in 11.# change this to be the time since
reading the last event.
The previous patch for MDEV-10653 changes the rpl_parallel::workers_idle()
function to use Relay_log_info::last_inuse_relaylog to check for idle
workers. But the code was missing a NULL check. Also, there was one place
during SQL slave thread start which was missing mutex synchronisation when
updating inuse_relaylog.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The error-injection inject_mdev8031 simulates a deadlock kill in a specific
place, by setting killed_for_retry to RETRY_KILL_KILLED directly. If a real
deadlock kill triggers at the same time, it is possible for the thread to
complete its transaction retry and set rgi_slave to NULL before the real
readlock kill can complete in the background. This will cause a segfault
due to null-pointer access.
Fix by changing the error injection to do a real background deadlock kill,
which ensures that the thread will wait for any pending background kills to
complete.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in
buildbot with timeout in include
A replication parallel worker thread can deadlock with another
connection running SHOW SLAVE STATUS. That is, if the replication
worker thread is in do_gco_wait() and is killed, it will already
hold the LOCK_parallel_entry, and during error reporting, try to
grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in
reverse order. It will initially grab the err_lock, and then try to
grab LOCK_parallel_entry. This leads to a deadlock when both threads
have grabbed their first lock without the second.
This patch implements the MDEV-31894 proposed fix to optimize the
workers_idle() check to compare the last in-use relay log’s
queued_count==dequeued_count for idleness. This removes the need for
workers_idle() to grab LOCK_parallel_entry, as these values are
atomically updated.
Huge thanks to Kristian Nielsen for diagnosing the problem!
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
MariaDB async replication SQL thread was stopped for any failure
in applying of replication events and error message logged for the failure
was: "Node has dropped from cluster". The assumption was that event applying
failure is always due to node dropping out.
With optimistic parallel replication, event applying can fail for natural
reasons and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped with first
applying failure.
To support optimistic parallel replication retrying logic this commit will
now skip replication slave abort, if node remains in cluster (wsrep_ready==ON)
and replication is configured for optimistic or aggressive retry logic.
During the development of this fix, galera.galera_as_slave_nonprim test showed
some problems. The test was analyzed, and it appears to need some attention.
One excessive sleep command was removed in this commit, but it will need more
fixes still to be fully deterministic. After this commit galera_as_slave_nonprim
is successful, though.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem was that parallel replication of temporary tables using
statement-based binlogging could overlap the COMMIT in one thread with a DML
or DROP TEMPORARY TABLE in another thread using the same temporary table.
Temporary tables are not safe for concurrent access, so this caused
reference to freed memory and possibly other nastiness.
The fix is to disable the optimisation with overlapping commits of one
transaction with the start of a later transaction, when temporary tables are
in use. Then the following event groups will be blocked from starting until
the one using temporary tables is completed.
This also fixes occasional test failures of rpl.rpl_parallel_temptable seen
in Buildbot.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
When the SQL driver thread goes to wait for room in the parallel slave
worker queue, there was a race where a kill at the right moment could
be ignored and the wait proceed uninterrupted by the kill.
Fix by moving the THD::check_killed() to occur _after_ doing ENTER_COND().
This bug was seen as sporadic failure of the testcase rpl.rpl_parallel
(rpl.rpl_parallel_gco_wait_kill since 10.5), with "Slave stopped with
wrong error code".
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.
Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:
MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.
Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:
MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master
Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB locking code changes.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The largest_started_sub_id needs to be set under LOCK_parallel_entry
together with testing stop_sub_id. However, in-between was the logic for
do_ftwrl_wait(), which temporarily releases the mutex. This could lead to
inconsistent stopping amongst worker threads and lost data.
Fix by moving all the stop-related logic out from unrelated do_gco_wait()
and do_ftwrl_wait() and into its own function do_stop_handling().
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The problem is that when a worker thread is (user) killed in
wait_for_prior_commit, the event group may complete out-of-order since the
wait for prior commit was aborted by the kill.
This fix ensures that event groups will always complete in-order, even
in the error case. This is done in finish_event_group() by doing an
extra wait_for_prior_commit(), if necessary, that ignores kills.
This fix supersedes the fix for MDEV-30780, so the earlier fix for
that is reverted in this patch.
Also fix that an error from wait_for_prior_commit() inside
finish_event_group() would not signal the error to
wakeup_subsequent_commits().
Based on earlier work by Brandon Nesterenko and Andrei Elkin, with
some changes to simplify the semantics of wait_for_prior_commit() and
make the code more robust to future changes.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The problem was an incorrect unmark_start_commit() in
signal_error_to_sql_driver_thread(). If an event group gets an error, this
unmark could run after the following GCO started, and the subsequent
re-marking could access de-allocated GCO.
The offending unmark_start_commit() looks obviously incorrect, and the fix
is to just remove it. It was introduced in the MDEV-8302 patch, the commit
message of which suggests it was added there solely to satisfy an assertion
in ha_rollback_trans(). So update this assertion instead to not trigger for
event groups that experienced an error (rgi->worker_error). When an error
occurs in an event group, all following event groups are skipped anyway, so
the unmark should never be needed in this case.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
At STOP SLAVE, worker threads will continue applying event groups until the
end of the current GCO before stopping. This is a left-over from when only
conservative mode was available. In optimistic and aggressive mode, often
_all_ queued event will be in the same GCO, and slave stop will be
needlessly delayed.
This patch instead records at STOP SLAVE time the latest (highest sub_id)
event group that has started. Then worker threads will continue to apply
event groups up to that event group, but skip any following. The result is
that each worker thread will complete its currently running event group, and
then the slave will stop.
If the slave is caught up, and STOP SLAVE is run in the middle of an event
group that is already executing in a worker thread, then that event group
will be rolled back and the slave stop immediately, as normal.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The problem is that a parallel replica would not immediately stop
running/queued transactions when issued STOP SLAVE. That is, it
allowed the current group of transactions to run, and sometimes the
transactions which belong to the next group could be started and run
through commit after STOP SLAVE was issued too, if the last group
had started committing. This would lead to long periods to wait for
all waiting transactions to finish.
This patch updates a parallel replica to try and abort immediately
and roll-back any ongoing transactions. The exception to this is any
transactions which are non-transactional (e.g. those modifying
sequences or non-transactional tables), and any prior transactions,
will be run to completion.
The specifics are as follows:
1. A new stage was added to SHOW PROCESSLIST output for the SQL
Thread when it is waiting for a replica thread to either rollback or
finish its transaction before stopping. This stage presents as
“Waiting for worker thread to stop”
2. Worker threads which error or are killed no longer perform GCO
cleanup if there is a concurrently running prior transaction. This
is because a worker thread scheduled to run in a future GCO could be
killed and incorrectly perform cleanup of the active GCO.
3. Refined cases when the FL_TRANSACTIONAL flag is added to GTID
binlog events to disallow adding it to transactions which modify
both transactional and non-transactional engines when the binlogging
configuration allow the modifications to exist in the same event,
i.e. when using binlog_direct_non_trans_update == 0 and
binlog_format == statement.
4. A few existing MTR tests relied on the completion of certain
transactions after issuing STOP SLAVE, and were re-recorded
(potentially with added synchronizations) under the new rollback
behavior.
Reviewed By
===========
Andrei Elkin <andrei.elkin@mariadb.com>