Commit graph

2701 commits

Author SHA1 Message Date
Sergei Golubchik
bf2bdd1a1a Merge branch '10.8' into 10.9 2022-05-19 14:07:55 +02:00
Sergei Golubchik
b7ffccf49b Merge branch '10.7' into 10.8 2022-05-18 13:26:48 +02:00
Sergei Golubchik
99a433ed1c Merge branch '10.6' into 10.7 2022-05-18 10:34:38 +02:00
Sergei Golubchik
b2187662bc Merge branch '10.5' into 10.6 2022-05-18 10:30:47 +02:00
Sergei Golubchik
7970ac7fe8 Merge branch '10.4' into 10.5 2022-05-18 09:50:26 +02:00
Sergei Golubchik
23ddc3518f Merge branch '10.3' into 10.4 2022-05-18 01:25:30 +02:00
Sergei Golubchik
a0d4f0f306 Merge branch '10.2' into 10.3
commit 84984b79f2 is null-merged
2022-05-18 01:23:47 +02:00
Andrei
726bd8c968 MDEV-28550 improper handling of replication event group that contains
GTID_LIST_EVENT or INCIDENT_EVENT.

It's legal to have either of the two inside a group. E.g
  Gtid_event, Gtid_log_list_event, Query_1, ... Xid_log_event
is permitted.
However, the slave IO thread treated both
as the terminal even when the group represents a DDL query.
That causes a premature Gtid state update so the slave IO would think
the whole group has been collected while in fact Query_1 etc are yet to process.

Fixed with correcting a condition to compute the terminal event
of the group.
Tested with rpl_mysqlbinlog_slave_consistency (of 10.9) and
rpl_gtid_errorlog.test.
2022-05-13 09:45:32 +02:00
Sergei Golubchik
443c2a715d Merge branch '10.7' into 10.8 2022-05-11 12:21:36 +02:00
Sergei Golubchik
fd132be117 Merge branch '10.6' into 10.7 2022-05-11 11:25:33 +02:00
Sergei Golubchik
3bc98a4ec4 Merge branch '10.5' into 10.6 2022-05-10 14:01:23 +02:00
Sergei Golubchik
ef781162ff Merge branch '10.4' into 10.5 2022-05-09 22:04:06 +02:00
Sergei Golubchik
a70a1cf3f4 Merge branch '10.3' into 10.4 2022-05-08 23:03:08 +02:00
Sergei Golubchik
93e64d1f58 cleanup: log_current_statement and OPTION_KEEP_LOG
rename OPTION_KEEP_LOG -> OPTION_BINLOG_THIS_TRX.
    Meaning: transaction cache will be written to binlog even on rollback.

    convert log_current_statement to OPTION_BINLOG_THIS_STMT.
    Meaning: the statement will be written to binlog (or trx binlog cache)
    even if it normally wouldn't be.

    setting OPTION_BINLOG_THIS_STMT must always set OPTION_BINLOG_THIS_TRX,
    otherwise the statement won't be logged if the transaction is rolled back.
    Use OPTION_BINLOG_THIS to set both.
2022-05-06 10:45:17 +03:00
Oleksandr Byelkin
9614fde1aa Merge branch '10.2' into 10.3 2022-05-03 10:59:54 +02:00
Sergei Golubchik
1430cf7873 MDEV-28428 Master_SSL_Crl shows Master_SSL_CA value in SHOW SLAVE STATUS output
it was showing ca and capath instead of crl and crl_path
2022-04-28 13:21:04 +02:00
Andrei
388032e990 MDEV-27697. Removed a false assert. 2022-04-26 19:47:59 +03:00
Andrei
945245aea4 MDEV-27697. Two affected tests fixed.
A result file is updated in one case and former error simulation got
refined.
2022-04-26 17:05:40 +03:00
Andrei
1bcdc3e9eb MDEV-27697 slave must recognize incomplete replication event group
In cases of a faulty master or an incorrect binlog event producer, that slave is working with,
sends an incomplete group of events slave must react with an error to not to log
into the relay-log any new events that do not belong to the incomplete group.

Fixed with extending received event properties check when slave connects to master
in gtid mode.
Specifically for the event that can be a part of a group its relay-logging is
permitted only when its position within the group is validated.
Otherwise slave IO thread stops with ER_SLAVE_RELAY_LOG_WRITE_FAILURE.
2022-04-25 16:00:35 +03:00
Brandon Nesterenko
a83c7ab1ea MDEV-11853: semisync thread can be killed after sync binlog but before ACK in the sync state
Problem:
========
If a primary is shutdown during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.

Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.

Notes:
 1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, If a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
 2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT

Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
2022-04-22 12:59:54 -06:00
Daniel Black
88ce8a3d8b Merge 10.7 into 10.8 2022-03-25 15:06:56 +11:00
Daniel Black
e86986a157 Merge 10.6 into 10.7 2022-03-24 18:57:07 +11:00
Andrei
5ccd845d51 MDEV-27760 event may non stop replicate in circular semisync setup
MDEV-21117 had to relax own events acceptance condition for a case
when a former semisync master server recovers after crash as the
semisync slave. That however admitted a possibility for endless event
"orbiting" in the non-strict slave gtid mode of semisync circular
setup.

The same server-id event termination is restored now for
the non-strict gtid mode to follow regular rules (that is it's ignored
unless @@global.replicate_same_server_id allows it in).

To address MDEV-21117 recovery agenda,
in the strict gtid mode and the transaction's gtid ordered strictly
greater than the current slave gtid state, the same server-id
transaction is accepted.

The gtid strict mode is safe to accept transactions even if
the slave state were not set correct by the user, e.g
at the former master.
An added test shows a typical out-of-order error at execution so
no data corruption is guaranteed in such a case.
2022-03-22 19:20:19 +02:00
Oleksandr Byelkin
4fb2cb1a30 Merge branch '10.7' into 10.8 2022-02-04 14:50:25 +01:00
Oleksandr Byelkin
9ed8deb656 Merge branch '10.6' into 10.7 2022-02-04 14:11:46 +01:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Oleksandr Byelkin
41a163ac5c Merge branch '10.2' into 10.3 2022-01-29 15:41:05 +01:00
Sachin
0c5d1342ae MDEV-11675 Lag Free Alter On Slave
This commit implements two phase binloggable ALTER.
When a new

      @@session.binlog_alter_two_phase = YES

ALTER query gets logged in two parts, the START ALTER and the COMMIT
or ROLLBACK ALTER. START Alter is written in binlog as soon as
necessary locks have been acquired for the table. The timing is
such that any concurrent DML:s that update the same table are either
committed, thus logged into binary log having done work on the old
version of the table, or will be queued for execution on its new
version.

The "COMPLETE" COMMIT or ROLLBACK ALTER are written at the very point
of a normal "single-piece" ALTER that is after the most of
the query work is done. When its result is positive COMMIT ALTER is
written, otherwise ROLLBACK ALTER is written with specific error
happened after START ALTER phase.
Replication of two-phase binloggable ALTER is
cross-version safe. Specifically the OLD slave merely does not
recognized the start alter part, still being able to process and
memorize its gtid.

Two phase logged ALTER is read from binlog by mysqlbinlog to produce
BINLOG 'string', where 'string' contains base64 encoded
Query_log_event containing either the start part of ALTER, or a
completion part. The Query details can be displayed with `-v` flag,
similarly to ROW format events.  Notice, mysqlbinlog output containing
parts of two-phase binloggable ALTER is processable correctly only by
binlog_alter_two_phase server.

@@log_warnings > 2 can reveal details of binlogging and slave side
processing of the ALTER parts.

The current commit also carries fixes to the following list of
reported bugs:
MDEV-27511, MDEV-27471, MDEV-27349, MDEV-27628, MDEV-27528.

Thanks to all people involved into early discussion of the feature
including Kristian Nielsen, those who helped to design, implement and
test: Sergei Golubchik, Andrei Elkin who took the burden of the
implemenation completion, Sujatha Sivakumar, Brandon
Nesterenko, Alice Sherepa, Ramesh Sivaraman, Jan Lindstrom.
2022-01-27 21:25:07 +02:00
Brandon Nesterenko
96de6bfd5e MDEV-16091: Seconds_Behind_Master spikes to millions of seconds
Problem:
========
A slave’s relay log format description event is used when
calculating Seconds_Behind_Master (SBM). This forces the SBM
value to spike when processing these events, as their creation
date is set to the timestamp that the IO thread begins.

Solution:
========
When the slave generates a format description event, mark the
event as a relay log event so it does not update the
rli->last_master_timestamp variable.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
2022-01-04 11:21:33 -07:00
sjaakola
ef2dbb8dbc MDEV-23328 Server hang due to Galera lock conflict resolution
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is

          wsrep_thd_LOCK()
          wsrep_kill_victim()
          lock_rec_other_has_conflicting()
          lock_clust_rec_read_check_and_lock()
          row_search_mvcc()
          ha_innobase::index_read()
          ha_innobase::rnd_pos()
          handler::ha_rnd_pos()
          handler::rnd_pos_by_record()
          handler::ha_rnd_pos_by_record()
          Rows_log_event::find_row()
          Update_rows_log_event::do_exec_row()
          Rows_log_event::do_apply_event()
          Log_event::apply_event()
          wsrep_apply_events()

and mutexes are taken in the order

          lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

          innobase_kill_query()
          kill_handlerton()
          plugin_foreach_with_mask()
          ha_kill_query()
          THD::awake()
          kill_one_thread()

        and mutexes are

          victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.

In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.

TOI replication is used, in this approach,  purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.

This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-10-29 20:40:35 +02:00
Jan Lindström
d5bc05798f MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL)
Revert "MDEV-23328 Server hang due to Galera lock conflict resolution"

This reverts commit eac8341df4.
2021-10-29 20:38:11 +02:00
sjaakola
5c230b21bf MDEV-23328 Server hang due to Galera lock conflict resolution
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is

          wsrep_thd_LOCK()
          wsrep_kill_victim()
          lock_rec_other_has_conflicting()
          lock_clust_rec_read_check_and_lock()
          row_search_mvcc()
          ha_innobase::index_read()
          ha_innobase::rnd_pos()
          handler::ha_rnd_pos()
          handler::rnd_pos_by_record()
          handler::ha_rnd_pos_by_record()
          Rows_log_event::find_row()
          Update_rows_log_event::do_exec_row()
          Rows_log_event::do_apply_event()
          Log_event::apply_event()
          wsrep_apply_events()

and mutexes are taken in the order

          lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

          innobase_kill_query()
          kill_handlerton()
          plugin_foreach_with_mask()
          ha_kill_query()
          THD::awake()
          kill_one_thread()

        and mutexes are

          victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.

In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.

TOI replication is used, in this approach,  purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.

This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-10-29 09:52:52 +03:00
Jan Lindström
aa7ca987db MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL)
Revert "MDEV-23328 Server hang due to Galera lock conflict resolution"

This reverts commit eac8341df4.
2021-10-29 09:52:40 +03:00
Marko Mäkelä
d7af7bfc2b Merge 10.6 into 10.7 2021-10-28 09:14:51 +03:00
Marko Mäkelä
d8c6c53a06 Merge 10.5 into 10.6 2021-10-28 09:08:58 +03:00
Marko Mäkelä
a8ded39557 Merge 10.4 into 10.5 2021-10-28 08:48:36 +03:00
Julius Goryavsky
7948a1dc53 MDEV-26914: Unreleased mutex in the exec_relay_log_event() function
In the replication-related code, in the exec_relay_log_event() (slave.cc)
function, where the "data_lock" mutex is captured, this mutex is then not
released on one of the early return branches within a specific insert for
WSREP, namely under the branch: "if (wsrep_before_statement(thd))". As a
result, the mutex remains captured, resulting in errors or hangs.

This commit fixes this issue, which is now showing up as intermittent
failures in mtr tests for galera and galera_sr suites.
2021-10-28 03:17:12 +02:00
Aleksey Midenkov
d324c03d0c Vanilla cleanups and refactorings
Dead code cleanup:

part_info->num_parts usage was wrong and working incorrectly in
mysql_drop_partitions() because num_parts is already updated in
prep_alter_part_table(). We don't have to update part_info->partitions
because part_info is destroyed at alter_partition_lock_handling().

Cleanups:

- DBUG_EVALUATE_IF() macro replaced by shorter form DBUG_IF();
- Typo in ER_KEY_COLUMN_DOES_NOT_EXITS.

Refactorings:

- Splitted write_log_replace_delete_frm() into write_log_delete_frm()
  and write_log_replace_frm();
- partition_info via DDL_LOG_STATE;
- set_part_info_exec_log_entry() removed.

DBUG_EVALUATE removed

DBUG_EVALUTATE was only added for consistency together with
DBUG_EVALUATE_IF. It is not used anywhere in the code.

DBUG_SUICIDE() fix on release build

On release DBUG_SUICIDE() was statement. It was wrong as
DBUG_SUICIDE() is used in expression context.
2021-10-26 17:07:46 +02:00
Sergei Golubchik
0299ec29d4 cleanup: MY_BITMAP mutex
in about a hundred of users of MY_BITMAP, only two were using its
built-in mutex, and only one of those two was actually needing it.

Remove the mutex from MY_BITMAP, remove all associated conditions
and checks in bitmap functions. Use an external LOCK_temp_pool
mutex and temp_pool_set_next/temp_pool_clear_bit acccessors.

Remove bitmap_init/bitmap_free, always use my_* versions.
2021-08-26 23:39:52 +02:00
Sujatha
6c39eaeb12 MDEV-21117: refine the server binlog-based recovery for semisync
Problem:
=======
When the semisync master is crashed and restarted as slave it could
recover transactions that former slaves may never have seen.
A known method existed to clear out all prepared transactions
with --tc-heuristic-recover=rollback does not care to adjust
binlog accordingly.

Fix:
===
The binlog-based recovery is made to concern of the slave semisync role of
post-crash restarted server.
No changes in behavior is done to the "normal" binloggging server
and the semisync master.

When the restarted server is configured with
  --rpl-semi-sync-slave-enabled=1
the refined recovery attempts to roll back prepared transactions
and truncate binlog accordingly.
In case of a partially committed (that is committed at least
in one of the engine participants) such transaction gets committed.
It's guaranteed no (partially as well) committed transactions
exist beyond the truncate position.
In case there exists a non-transactional replication event
(being in a way a committed transaction) past the
computed truncate position the recovery ends with an error.

As after master crash and failover to slave, the demoted-to-slave
ex-master must be ready to face and accept its own (generated by)
events, without generally necessary --replicate-same-server-id.
So the acceptance conditions are relaxed for the semisync slave
to accept own events without that option.
While gtid_strict_mode ON ensures no duplicate transaction can be
(re-)executed the master_use_gtid=none slave has to be
configured with --replicate-same-server-id.

*NOTE* for reviewers.

This patch does not handle the user XA which is done
in next git commit.
2021-06-11 19:49:39 +03:00
Rucha Deodhar
4e19539c14 MDEV-22189: Change error messages inside code to have mariadb instead of
mysql

Fix: Changed error messages, rerecorded results and changed other relevant
files.
2021-05-24 11:38:13 +05:30
Monty
85d6278fed Change replication to use uchar for all buffers instead of char
This change is to get rid of randomly failing tests, especially those
that reads random position of the binary log. From looking at the logs
it's clear that some failures is because of a read char (with value >= 128)
is converted to a big long value. Using uchar everywhere makes this much
less likely to happen.
Another benefit is that a lot of cast of char to uchar could be removed.

Other things:
- Removed some extra space before '=' and '+=' in assignments
- Fixed indentations and lines > 80 characters
- Replace '16' with 'element_size' (from class definition) in
  Gtid_list_log_event()
2021-05-19 22:54:12 +02:00
Monty
a206658b98 Change CHARSET_INFO character set and collaction names to LEX_CSTRING
This change removed 68 explict strlen() calls from the code.

The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name

Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible

Other things:
- There where compiler issues with ensuring that all character set names
  points to the same string: gcc doesn't allow one to use integer constants
  when defining global structures (constant char * pointers works fine).
  To get around this, I declared defines for each character set name
  length.
2021-05-19 22:54:07 +02:00
Monty
b6ff139aa3 Reduce usage of strlen()
Changes:
- To detect automatic strlen() I removed the methods in String that
  uses 'const char *' without a length:
  - String::append(const char*)
  - Binary_string(const char *str)
  - String(const char *str, CHARSET_INFO *cs)
  - append_for_single_quote(const char *)
  All usage of append(const char*) is changed to either use
  String::append(char), String::append(const char*, size_t length) or
  String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
  String::append()
- Added overflow argument to escape_string_for_mysql() and
  escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
  This was needed as most usage of the above functions never tested the
  result for -1 and would have given wrong results or crashes in case
  of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
  Changed all Item_func::func_name()'s to func_name_cstring()'s.
  The old Item_func_or_sum::func_name() is now an inline function that
  returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
  LEX_CSTRING.
- Changed for some functions the name argument from const char * to
  to const LEX_CSTRING &:
  - Item::Item_func_fix_attributes()
  - Item::check_type_...()
  - Type_std_attributes::agg_item_collations()
  - Type_std_attributes::agg_item_set_converter()
  - Type_std_attributes::agg_arg_charsets...()
  - Type_handler_hybrid_field_type::aggregate_for_result()
  - Type_handler_geometry::check_type_geom_or_binary()
  - Type_handler::Item_func_or_sum_illegal_param()
  - Predicant_to_list_comparator::add_value_skip_null()
  - Predicant_to_list_comparator::add_value()
  - cmp_item_row::prepare_comparators()
  - cmp_item_row::aggregate_row_elements_for_comparison()
  - Cursor_ref::print_func()
- Removes String_space() as it was only used in one cases and that
  could be simplified to not use String_space(), thanks to the fixed
  my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
  - NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
  clarify what the function really does.
- Rename of protocol function:
  bool store(const char *from, CHARSET_INFO *cs) to
  bool store_string_or_null(const char *from, CHARSET_INFO *cs).
  This was done to both clarify the difference between this 'store' function
  and also to make it easier to find unoptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now used LEX_CSTRING for name.

Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
  in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
  append is known or easily obtain but was not used.
- Removed some not needed 'virtual' definition for functions that was
  inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate needed buffer. Old code could
  case memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
2021-05-19 22:27:48 +02:00
Marko Mäkelä
ca3f497564 Merge 10.2 into 10.3, except MDEV-25682 2021-05-18 08:40:19 +03:00
Sachin Kumar
355dc74b76 MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL
Problem:- When we issue FTWRL with shutdown in parallel, there is race between
FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool)
before FTWRL can lock it. So we can get crash on FTWRL thread

Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for
FTWRL thread to complete its work , and then destroy.
So slave_prepare_for_shutdown will just deactivate the pool, and mutex is destroyed
later in end_slave()
2021-05-14 11:49:46 +01:00
Andrei Elkin
3616640a31 MDEV-20821 parallel slave server shutdown hang
Parallel slave server shutdown found to be hanging in
close_connections() triggered by shutdown due to a slave worker thread
would not be notified to exit in case the worker was sitting idle.

Fixed with destroying the worker pool earlier that is in
slave_prepare_for_shutdown() when all their driver threads have already left.
A test file is added to simulate the bug condition as well as check
multi-sourced and not-idle worker cases.
2021-05-14 11:49:26 +01:00
Sujatha
70642871bc MDEV-16437: merge 5.7 P_S replication instrumentation and tables
Merge 'replication_applier_status_by_coordinator' table.

This table captures SQL_THREAD status in case of both single threaded and
multi threaded slave configuration. When multi_source replication is enabled
this table will display each source specific SQL_THREAD status.

Added new columns for:
 - LAST_SEEN_TRANSACTION
 - LAST_TRANS_RETRY_COUNT
2021-04-16 09:02:00 +05:30