Commit graph

270 commits

Author SHA1 Message Date
Monty
25b5c63905 MDEV-33856: Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps
This commit adds 3 new status variables to 'show all slaves status':

- Master_last_event_time ; timestamp of the last event read from the
  master by the IO thread.
- Slave_last_event_time ; Master timestamp of the last event committed
  on the slave.
- Master_Slave_time_diff: The difference of the above two timestamps.

All the above variables are NULL until the slave has started and the
slave has read one query event from the master that changes data.

- Added information_schema.slave_status, which allows us to remove:
   - show_master_info(), show_master_info_get_fields(),
     send_show_master_info_data(), show_all_master_info()
   - class Sql_cmd_show_slave_status.
   - Protocol::store(I_List<i_string_pair>* str_list) as it is not
     used anymore.
- Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to
  use the SELECT code path, as all other SHOW ... STATUS commands.

Other things:
- Xid_log_time is set to time of commit to allow slave that reads the
  binary log to calculate Master_last_event_time and
  Slave_last_event_time.
  This is needed as there is not 'exec_time' for row events.
- Fixed that Load_log_event calculates exec_time identically to
  Query_event.
- Updated RESET SLAVE to reset Master/Slave_last_event_time
- Updated SQL thread's update on first transaction read-in to
  only update Slave_last_event_time on group events.
- Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions
  if allocation of 'field' would fail.

Reviewed By:
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-07-25 08:57:27 -06:00
Monty
dfdedd46e4 MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range
This patch extends the timestamp from
2038-01-19 03:14:07.999999 to 2106-02-07 06:28:15.999999
for 64 bit hardware and OS where 'long' is 64 bits.
This is true for 64 bit Linux but not for Windows.

This is done by treating the 32 bit stored int as unsigned instead of
signed.  This is safe as MariaDB has never accepted dates before the epoch
(1970).
The benefit of this approach that for normal timestamp the storage is
compatible with earlier version.

However for tables using system versioning we before stored a
timestamp with the year 2038 as the 'max timestamp', which is used to
detect current values.  This patch stores the new 2106 year max value
as the max timestamp. This means that old tables using system
versioning needs to be updated with mariadb-upgrade when moving them
to 11.4. That will be done in a separate commit.
2024-05-27 12:39:02 +02:00
Oleksandr Byelkin
99b370e023 Merge branch '11.2' into 11.4 2024-05-21 19:38:51 +02:00
Sergei Golubchik
bf5da43e50 Merge branch '11.1' into 11.2 2024-05-13 10:00:26 +02:00
Sergei Golubchik
9cf718859f cleanup: use THD_STAGE_INFO, not thd_proc_info
and put master-slave.inc *last* in the series of includes
2024-04-24 22:08:52 +02:00
Sergei Golubchik
018d537ec1 Merge branch '10.6' into 10.11 2024-04-22 15:23:10 +02:00
Marko Mäkelä
829cb1a49c Merge 10.5 into 10.6 2024-04-17 14:14:58 +03:00
Kristian Nielsen
16aa4b5f59 Merge from 10.4 to 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-15 17:46:49 +02:00
Monty
da47c0370d Fixed calculating of last_master_timestamp for parallel replication.
This effects the Seconds_Behind_Master value.
2024-04-10 17:01:24 +03:00
Kristian Nielsen
0a6f46965a MDEV-33475: --gtid-ignore-duplicate can double-apply event in case of parallel replication retry
When rolling back and retrying a transaction in parallel replication, don't
release the domain ownership (for --gtid-ignore-duplicates) as part of the
rollback. Otherwise another master connection could grab the ownership and
double-apply the transaction in parallel with the retry.

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-13 16:59:10 +01:00
Sergei Golubchik
7f0094aac8 Merge branch '11.2' into 11.3 2023-12-21 02:14:59 +01:00
Sergei Golubchik
fef31a26f3 Merge branch '11.1' into 11.2 2023-12-20 23:43:05 +01:00
Marko Mäkelä
2b99e5f7ef Merge 10.6 into 10.11 2023-12-20 15:58:36 +02:00
Marko Mäkelä
2b01e5103d Merge 10.5 into 10.6 2023-12-19 18:41:42 +02:00
Marko Mäkelä
4ae105a37d Merge 10.4 into 10.5 2023-12-18 08:59:07 +02:00
Brandon Nesterenko
8dad51481b MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave
AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in
buildbot with timeout in include

A replication parallel worker thread can deadlock with another
connection running SHOW SLAVE STATUS. That is, if the replication
worker thread is in do_gco_wait() and is killed, it will already
hold the LOCK_parallel_entry, and during error reporting, try to
grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in
reverse order. It will initially grab the err_lock, and then try to
grab LOCK_parallel_entry. This leads to a deadlock when both threads
have grabbed their first lock without the second.

This patch implements the MDEV-31894 proposed fix to optimize the
workers_idle() check to compare the last in-use relay log’s
queued_count==dequeued_count for idleness. This removes the need for
workers_idle() to grab LOCK_parallel_entry, as these values are
atomically updated.

Huge thanks to Kristian Nielsen for diagnosing the problem!

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2023-12-11 07:45:23 -07:00
Brandon Nesterenko
0c1bf5e247 MDEV-27247: Add keywords "SQL_BEFORE_GTIDS" and "SQL_AFTER_GTIDS" for START SLAVE UNTIL
New Feature:
============
This patch extends the START SLAVE UNTIL command with options
SQL_BEFORE_GTIDS and SQL_AFTER_GTIDS to allow user control of
whether the replica stops before or after a provided GTID state. Its
syntax is:

START SLAVE UNTIL (SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS)=”<gtid_list>”

When providing SQL_BEFORE_GTIDS=”<gtid_list>”, for each domain
specified in the gtid_list, the replica will execute transactions up
to the GTID found, and immediately stop processing events in that
domain (without executing the transaction of the specified GTID).
Once all domains have stopped, the replica will stop. Events
originating from domains that are not specified in the list are not
replicated.

START SLAVE UNTIL SQL_AFTER_GTIDS=”<gtid_list>” is an alias to the
default behavior of START SLAVE UNTIL master_gtid_pos=”<gtid_list>”.
That is, the replica will only execute transactions originating from
domain ids provided in the list, and will stop once all transactions
provided in the UNTIL list have all been executed.

Example:
=========
If a primary server has a binary log consisting of the following GTIDs:

0-1-1
1-1-1
0-1-2
1-1-2
0-1-3
1-1-3

If a fresh replica (i.e. one with an empty GTID position,
@@gtid_slave_pos='') is started with SQL_BEFORE_GTIDS, i.e.

START SLAVE UNTIL SQL_BEFORE_GTIDS=”1-1-2”

The resulting gtid_slave_pos of the replica will be “1-1-1”.
This is because the replica will execute only events from domain 1
until it sees the transaction with sequence number 2, and
immediately stop without executing it.

If the replica is started with SQL_AFTER_GTIDS, i.e.

START SLAVE UNTIL SQL_AFTER_GTIDS=”1-1-2”

then the resulting gtid_slave_pos of the replica will be “1-1-2”.
This is because it will only execute events from domain 1 until it
has executed the provided GTID.

Reviewed By:
============
Kristian Nielson <knielsen@knielsen-hq.org>
2023-10-23 06:40:05 -06:00
Nikita Malyavin
2be4c836e5 MDEV-29069 ER_KEY_NOT_FOUND on online autoinc addition + concurrent DELETE
We can't rely on keys formed with columns that were added during this
ALTER. These columns can be set with non-deterministic values, which can
end up with broken or incorrect search.

The same applies to the keys that contain reliable columns, but also have
bogus ones. Using them can narrow the search, but they're also ignored.

Also, added columns shouldn't be considered during the record match. To
determine them, table->has_value_set bitmap is used.

To fill has_value_set bitmap in the find_key call, extra unpack_row call
has been added.

For replication case, extra replica columns can be considered for this
case. We try to ignore them, too.
2023-08-15 10:16:13 +02:00
Sergei Golubchik
e72ed758c7 cleanup: remove rpl_group_info::get_table_data()
use table->pos_in_table_list instead.

Also, table->in_use is always set
2023-08-15 10:16:13 +02:00
Nikita Malyavin
bdbd357362 Simplify rgi->get_table_data call 2023-08-15 10:16:12 +02:00
Sergei Golubchik
0b67af5a81 cleanup
no functional changes here
2023-08-15 10:16:11 +02:00
Nikita Malyavin
ab4bfad206 MDEV-16329 [5/5] ALTER ONLINE TABLE
* Log rows in online_alter_binlog.
* Table online data is replicated within dedicated binlog file
* Cached data is written on commit.
* Versioning is fully supported.
* Works both wit and without binlog enabled.

* For now savepoints setup is forbidden while ONLINE ALTER goes on.
  Extra support is required. We can simply log the SAVEPOINT query events
  and replicate them together with row events. But it's not implemented
  for now.

* Cache flipping:

  We want to care for the possible bottleneck in the online alter binlog
  reading/writing in advance.

  IO_CACHE does not provide anything better that sequential access,
  besides, only a single write is mutex-protected, which is not suitable,
  since we should write a transaction atomically.

  To solve this, a special layer on top Event_log is implemented.
  There are two IO_CACHE files underneath: one for reading, and one for
  writing.

  Once the read cache is empty, an exclusive lock is acquired (we can wait
  for a currently active transaction finish writing), and flip() is emitted,
  i.e. the write cache is reopened for read, and the read cache is emptied,
  and reopened for writing.

  This reminds a buffer flip that happens in accelerated graphics
  (DirectX/OpenGL/etc).

  Cache_flip_event_log is considered non-blocking for a single reader and a
  single writer in this sense, with the only lock held by reader during flip.

  An alternative approach by implementing a fair concurrent circular buffer
  is described in MDEV-24676.

* Cache managers:
  We have two cache sinks: statement and transactional.
  It is important that the changes are first cached per-statement and
  per-transaction.
  If a statement fails, then only statement data is rolled back. The
  transaction moves along, however.

  Turns out, there's no guarantee that TABLE well persist in
  thd->open_tables to the transaction commit moment.
  If an error occurs, tables from statement are purged.
  Therefore, we can't store te caches in TABLE. Ideally, it should be
  handlerton, but we cut the corner and store it in THD in a list.
2023-08-15 10:16:11 +02:00
Aleksey Midenkov
92bfc0e8c4 MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
:: Syntax change ::

Keyword AUTO enables history partition auto-creation.

Examples:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
    STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;

Or with explicit partitions:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
    (PARTITION p0 HISTORY, PARTITION pn CURRENT);

To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

    # Disables auto-creation:
    ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;

    # Enables auto-creation:
    ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

If the rest of partitioning specification is identical to CREATE TABLE
no repartitioning will be done (for details see MDEV-27328).

:: Description ::

Before executing history-generating DML command (see the list of commands below)
add N history partitions, so that N would be sufficient for potentially
generated history. N > 1 may be required when history partitions are switched
by INTERVAL and current_timestamp is N times further than the interval
boundary of the last history partition.

If the last history partition equals or exceeds LIMIT records then new history
partition is created and selected as the working partition. According to
MDEV-28411 partitions cannot be switched (or created) while the command is
running. Thus LIMIT does not carry strict limitation and the history partition
size must be planned as LIMIT value plus average number of history one DML
command can generate.

Auto-creation is implemented by synchronous fast_alter_partition_table() call
from the thread of the executed DML command before the command itself is run
(by the fallback and retry mechanism similar to Discovery feature,
see Open_table_context).

The name for newly added partitions are generated like default partition names
with extension of MDEV-22155 (which avoids name clashes by extending assignment
counter to next free-enough gap).

These DML commands can trigger auto-creation:

    DELETE (including multitable DELETE, excluding DELETE HISTORY)
    UPDATE (including multitable UPDATE)
    REPLACE (including REPLACE .. SELECT)
    INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
    LOAD DATA .. REPLACE

:: Bug fixes ::

MDEV-23642 Locking timeout caused by auto-creation affects original DML

    The reasons for this are:

    - Do not disrupt main business process (the history is auxiliary service);

    - Consequences are non-fatal (history is not lost, but comes into wrong
      partition; fixed by partitioning rebuild);

    - There is more freedom for application to fail in this case or not: it may
      read warning info and find corresponding error number.

    - While non-failing command is easy to handle by an application and fail it,
      the opposite is hard to handle: there is no automatic actions to fix
      failed command and retry, DBA intervention is required and until then
      application is non-functioning.

MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers

    Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it is
    not possible in locked tables mode.

    LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
    of the box as fast_alter_partition_table() can reopen tables via
    locked_tables_list.

    In LTM_PRELOCKED we reopen and relock table manually.

:: More fixes ::

* some_table_marked_for_reopen flag fix

  some_table_marked_for_reopen affets only reopen of
  m_locked_tables. I.e. Locked_tables_list::reopen_tables() reopens only
  tables from m_locked_tables.

* Unused can_recover_from_failed_open() condition

  Is recover_from_failed_open() can be really used after
  open_and_process_routine()?

:: Reviewed by ::

Sergei Golubchik <serg@mariadb.org>
2022-05-06 15:11:02 +03:00
Sachin
0c5d1342ae MDEV-11675 Lag Free Alter On Slave
This commit implements two phase binloggable ALTER.
When a new

      @@session.binlog_alter_two_phase = YES

ALTER query gets logged in two parts, the START ALTER and the COMMIT
or ROLLBACK ALTER. START Alter is written in binlog as soon as
necessary locks have been acquired for the table. The timing is
such that any concurrent DML:s that update the same table are either
committed, thus logged into binary log having done work on the old
version of the table, or will be queued for execution on its new
version.

The "COMPLETE" COMMIT or ROLLBACK ALTER are written at the very point
of a normal "single-piece" ALTER that is after the most of
the query work is done. When its result is positive COMMIT ALTER is
written, otherwise ROLLBACK ALTER is written with specific error
happened after START ALTER phase.
Replication of two-phase binloggable ALTER is
cross-version safe. Specifically the OLD slave merely does not
recognized the start alter part, still being able to process and
memorize its gtid.

Two phase logged ALTER is read from binlog by mysqlbinlog to produce
BINLOG 'string', where 'string' contains base64 encoded
Query_log_event containing either the start part of ALTER, or a
completion part. The Query details can be displayed with `-v` flag,
similarly to ROW format events.  Notice, mysqlbinlog output containing
parts of two-phase binloggable ALTER is processable correctly only by
binlog_alter_two_phase server.

@@log_warnings > 2 can reveal details of binlogging and slave side
processing of the ALTER parts.

The current commit also carries fixes to the following list of
reported bugs:
MDEV-27511, MDEV-27471, MDEV-27349, MDEV-27628, MDEV-27528.

Thanks to all people involved into early discussion of the feature
including Kristian Nielsen, those who helped to design, implement and
test: Sergei Golubchik, Andrei Elkin who took the burden of the
implemenation completion, Sujatha Sivakumar, Brandon
Nesterenko, Alice Sherepa, Ramesh Sivaraman, Jan Lindstrom.
2022-01-27 21:25:07 +02:00
Sujatha
70642871bc MDEV-16437: merge 5.7 P_S replication instrumentation and tables
Merge 'replication_applier_status_by_coordinator' table.

This table captures SQL_THREAD status in case of both single threaded and
multi threaded slave configuration. When multi_source replication is enabled
this table will display each source specific SQL_THREAD status.

Added new columns for:
 - LAST_SEEN_TRANSACTION
 - LAST_TRANS_RETRY_COUNT
2021-04-16 09:02:00 +05:30
Otto Kekäläinen
cebf9ee204 Fix various spelling errors still found in code
Reseting -> Resetting
Unknow -> Unknown
capabilites -> capabilities
choosen -> chosen
direcory -> directory
informations -> information
openned -> opened
refered -> referred
to access -> one to access
missmatch -> mismatch
succesfully -> successfully
dont -> don't
2021-03-22 18:10:39 +11:00
Marko Mäkelä
d7a5824899 Merge 10.4 into 10.5 2020-11-13 21:54:21 +02:00
Sujatha
b2029c0300 Merge branch '10.3' into 10.4 2020-11-12 15:39:02 +05:30
Andrei Elkin
e156a8da08 MDEV-21851: Error in BINLOG_BASE64_EVENT i s always error-logged as if it is done by Slave
The prefix of error log message out of a failed BINLOG applying
is corrected to be the sql command name.
2020-06-12 11:25:27 +03:00
Marko Mäkelä
6da14d7b4a Merge 10.3 into 10.4 2020-05-30 11:04:27 +03:00
Marko Mäkelä
dad7a8ee7d Merge 10.2 into 10.3 2020-05-27 17:10:39 +03:00
Andrei Elkin
0c1f97b3ab MDEV-15152 Optimistic parallel slave doesnt cope well with START SLAVE UNTIL
The immediate bug was caused by a failure to recognize a correct
position to stop the slave applier run in optimistic parallel mode.
There were the following set of issues that the analysis unveil.
1 incorrect estimate for the event binlog position passed to
  is_until_satisfied
2 wait for workers to complete by the driver thread did not account non-group events
  that could be left unprocessed and thus to mix up the last executed
  binlog group's file and position:
  the file remained old and the position related to the new rotated file
3 incorrect 'slave reached file:pos' by the parallel slave report in the error log
4 relay log UNTIL missed out the parallel slave branch in
  is_until_satisfied.

The patch addresses all of them to simplify logics of log change
notification in either the master and relay-log until case.
P.1 is addressed with passing the event into is_until_satisfied()
for proper analisis by the function.
P.2 is fixed by changes in handle_queued_pos_update().
P.4 required removing relay-log change notification by workers.
Instead the driver thread updates the notion of the current relay-log
fully itself with aid of introduced
bool Relay_log_info::until_relay_log_names_defer.

An extra print out of the requested until file:pos is arranged
with --log-warning=3.
2020-05-26 18:49:43 +03:00
Sergey Vojtovich
5876ed9e5b Relay_log_info::executed_entries to Atomic_counter 2020-04-15 18:36:07 +04:00
Sergey Vojtovich
1c8de231a3 dequeued_count my_atomic to Atomic_counter
Also allocate inuse_relaylog with new rather than my_malloc(MY_ZEROFILL).
2020-03-25 23:49:38 +04:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru
5543b75550 Update FSF Address
* Update wrong zip-code
2019-05-11 21:29:06 +03:00
Kristian Nielsen
34f11b06e6 Move deletion of old GTID rows to slave background thread
This patch changes how old rows in mysql.gtid_slave_pos* tables are deleted.
Instead of doing it as part of every replicated transaction in
record_gtid(), it is done periodically (every @@gtid_cleanup_batch_size
transaction) in the slave background thread.

This removes the deletion step from the replication process in SQL or worker
threads, which could speed up replication with many small transactions. It
also decreases contention on the global mutex LOCK_slave_state. And it
simplifies the logic, eg. when a replicated transaction fails after having
deleted old rows.

With this patch, the deletion of old GTID rows happens asynchroneously and
slightly non-deterministic. Thus the number of old rows in
mysql.gtid_slave_pos can temporarily exceed @@gtid_cleanup_batch_size. But
all old rows will be deleted eventually after sufficiently many new GTIDs
have been replicated.
2018-12-07 07:10:40 +01:00
Marko Mäkelä
32062cc61c Merge 10.1 into 10.2 2018-11-06 08:41:48 +02:00
Kristian Nielsen
3eb2c46644 Merge branch 'gtid_table_garbage_rows' into gtid_table_garbage_rows_10.3 2018-10-07 23:40:32 +02:00
Kristian Nielsen
2f4a0c5be2 Fix accumulation of old rows in mysql.gtid_slave_pos
This would happen especially in optimistic parallel replication, where there
is a good chance that a transaction will be rolled back (due to conflicts)
after it has executed record_gtid(). If the transaction did any deletions of
old rows as part of record_gtid(), those deletions will be undone as well.
And the code did not properly ensure that the deletions would be re-tried.

This patch makes record_gtid() remember the list of deletions done as part
of a transaction. Then in rpl_slave_state::update() when the changes have
been committed, we discard the list. However, in case of error and rollback,
in cleanup_context() we will instead put the list back into
rpl_global_gtid_slave_state so that the deletions will be re-tried later.

Probably fixes part of the cause of MDEV-12147 as well.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2018-10-07 18:59:52 +02:00
Sergei Golubchik
bb8e99fdc3 Merge branch 'bb-10.2-ext' into 10.3 2017-08-26 00:34:43 +02:00
Monty
b77bb43f60 Use microsecond_interval_timer() instead of my_time() in replicaiton
- Use microsecond_interval_timer() to calculate time for applying row events.
  (Faster execution)
- Removed return value for set_row_stmt_start_timestamp()
  as it was never used.
2017-08-24 01:04:35 +02:00
Sergei Golubchik
cb1e76e4de Merge branch '10.1' into 10.2 2017-08-17 11:38:34 +02:00
Sergei Golubchik
8e8d42ddf0 Merge branch '10.0' into 10.1 2017-08-08 10:18:43 +02:00
Vicențiu Ciorbaru
786ad0a158 Merge remote-tracking branch 'origin/5.5' into 10.0 2017-07-25 00:41:54 +03:00
Sergei Golubchik
9a5fe1f4ea Merge remote-tracking branch 'mysql/5.5' into 5.5 2017-07-18 14:59:10 +02:00
Kristian Nielsen
1d91910b94 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Merge into MariaDB 10.3.
2017-07-03 09:33:41 +02:00