Commit graph

465 commits

Author SHA1 Message Date
Marko Mäkelä
5c69e93630 Merge 10.7 into 10.8 2022-03-30 09:34:07 +03:00
Marko Mäkelä
a4d753758f Merge 10.6 into 10.7 2022-03-30 08:52:05 +03:00
Marko Mäkelä
b242c3141f Merge 10.5 into 10.6 2022-03-29 16:16:21 +03:00
Marko Mäkelä
d62b0368ca Merge 10.4 into 10.5 2022-03-29 12:59:18 +03:00
Marko Mäkelä
ae6e214fd8 Merge 10.3 into 10.4 2022-03-29 11:13:18 +03:00
Brandon Nesterenko
cd88b0831f DBAAS-7828: Primary/replica: configuration change of autocommit=0 can not be applied
Problem:
========
When the mysql.gtid_slave_pos table uses the InnoDB engine, and
mysqld starts, it reads the table and begins a transaction. After
reading the value, it should end the transaction and release all
associated locks. The bug reported in DBAAS-7828 shows that when
autocommit is off, the locks are not released, resulting in
indefinite hangs on future attempts to change gtid_slave_pos. In
particular, the transaction was not properly finalized because
thd->server_status was not updated to reflect the end of the
transaction.

Solution:
========
This patch updates the code to properly commit the transaction after
reading gtid_slave_pos during mysqld start-up.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
2022-03-24 12:00:40 -06:00
Sachin
0c5d1342ae MDEV-11675 Lag Free Alter On Slave
This commit implements two phase binloggable ALTER.
When a new

      @@session.binlog_alter_two_phase = YES

ALTER query gets logged in two parts, the START ALTER and the COMMIT
or ROLLBACK ALTER. START Alter is written in binlog as soon as
necessary locks have been acquired for the table. The timing is
such that any concurrent DML:s that update the same table are either
committed, thus logged into binary log having done work on the old
version of the table, or will be queued for execution on its new
version.

The "COMPLETE" COMMIT or ROLLBACK ALTER are written at the very point
of a normal "single-piece" ALTER that is after the most of
the query work is done. When its result is positive COMMIT ALTER is
written, otherwise ROLLBACK ALTER is written with specific error
happened after START ALTER phase.
Replication of two-phase binloggable ALTER is
cross-version safe. Specifically the OLD slave merely does not
recognized the start alter part, still being able to process and
memorize its gtid.

Two phase logged ALTER is read from binlog by mysqlbinlog to produce
BINLOG 'string', where 'string' contains base64 encoded
Query_log_event containing either the start part of ALTER, or a
completion part. The Query details can be displayed with `-v` flag,
similarly to ROW format events.  Notice, mysqlbinlog output containing
parts of two-phase binloggable ALTER is processable correctly only by
binlog_alter_two_phase server.

@@log_warnings > 2 can reveal details of binlogging and slave side
processing of the ALTER parts.

The current commit also carries fixes to the following list of
reported bugs:
MDEV-27511, MDEV-27471, MDEV-27349, MDEV-27628, MDEV-27528.

Thanks to all people involved into early discussion of the feature
including Kristian Nielsen, those who helped to design, implement and
test: Sergei Golubchik, Andrei Elkin who took the burden of the
implemenation completion, Sujatha Sivakumar, Brandon
Nesterenko, Alice Sherepa, Ramesh Sivaraman, Jan Lindstrom.
2022-01-27 21:25:07 +02:00
Oleksandr Byelkin
763bdee81b MDEV-26637: (explicit length) ASAN: main.metadata and user_variables.basic MTR failures after MDEV-26572
Use explicit length for hash record length
2021-10-12 10:01:07 +02:00
Marko Mäkelä
f09d33f521 Merge 10.5 into 10.6 2021-05-18 11:13:45 +03:00
Marko Mäkelä
cc2651b74c Merge 10.4 into 10.5 2021-05-18 09:21:59 +03:00
Marko Mäkelä
4240704abc Merge 10.3 into 10.4 2021-05-18 08:59:12 +03:00
Marko Mäkelä
ca3f497564 Merge 10.2 into 10.3, except MDEV-25682 2021-05-18 08:40:19 +03:00
Daniel Black
80ae3677e1 MDEV-25681: --relay-log{,-index} missing warning
No longer a MySQL server, "his" is the wrong pronoun
for a server.

Thanks Michael Newton for highlighting these problems

Also changed slave -> replica.
2021-05-17 09:39:43 +10:00
Sujatha
70642871bc MDEV-16437: merge 5.7 P_S replication instrumentation and tables
Merge 'replication_applier_status_by_coordinator' table.

This table captures SQL_THREAD status in case of both single threaded and
multi threaded slave configuration. When multi_source replication is enabled
this table will display each source specific SQL_THREAD status.

Added new columns for:
 - LAST_SEEN_TRANSACTION
 - LAST_TRANS_RETRY_COUNT
2021-04-16 09:02:00 +05:30
Monty
cccc96d66c Fixed wrong initializations of Dynamic_array
Other things:
- Added size() function to Dynamic_array()
2021-03-20 21:17:32 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Sujatha
eb75e8705d MDEV-8134: The relay-log is not flushed after the slave-relay-log.999999 showed
Problem:
========
Auto purge of relaylogs stops when relay-log-file is
'slave-relay-log.999999' and slave_parallel_threads is enabled.

Analysis:
=========
The problem is that in Relay_log_info::inc_group_relay_log_pos() function,
when two log names are compared via strcmp() function, it gives correct
result, when log name sequence numbers are of same digits(6 digits), But
when the number goes to 7 digits, a 999999 compares greater than
1000000, which is wrong, hence the bug.

Fix:
====
Extract the numeric extension part of the file name, convert it into
unsigned long and compare.

Thanks to David Zhao for the contribution.
2021-01-21 13:00:02 +05:30
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
24ec8eaf66 MDEV-15532 after-merge fixes from Monty
The Galera tests were massively failing with debug assertions.
2020-12-02 16:16:29 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Monty
7edfed6305 After merge fixes
Change thd->mdl_context.release_transactional_locks() to
thd->mdl_release_transactional_locks()
2020-12-01 16:23:28 +02:00
Marko Mäkelä
81ab9ea63f Merge 10.2 into 10.3 2020-12-01 14:55:46 +02:00
Monty
828471cbf8 MDEV 15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete
The reason for the failure is that
thd->mdl_context.release_transactional_locks()
was called after commit & rollback even in cases where the current
transaction is still active.

For 10.2, 10.3 and 10.4 the fix is simple:
- Replace all calls to thd->mdl_context.release_transactional_locks() with
  thd->release_transactional_locks(). The thd function will only call
  the mdl_context function if there are no active transactional locks.
  In 10.6 we will better fix where we will change the return value for
  some trans_xxx() functions to indicate if transaction did close the
  transaction or not. This will avoid the need of the indirect call.

Other things:
- trans_xa_commit() and trans_xa_rollback() will automatically
  call release_transactional_locks() if the transaction is closed.
- We can't do that for the other functions as the caller of many of these
  are doing additional work (like close_thread_tables) before calling
  release_transactional_locks().
- Added missing abort_result_set() and missing DBUG_RETURN in
  select_create::send_eof()
- Fixed wrong indentation in injector::transaction::commit()
2020-11-30 22:21:43 +02:00
Marko Mäkelä
d7a5824899 Merge 10.4 into 10.5 2020-11-13 21:54:21 +02:00
Sujatha
b2029c0300 Merge branch '10.3' into 10.4 2020-11-12 15:39:02 +05:30
Sujatha
bafb011a82 Merge branch '10.2' into 10.3 2020-11-12 14:10:05 +05:30
Sujatha
984a06db2c MDEV-4633: multi_source.simple test fails sporadically
Analysis:
========
Writes to 'rli->log_space_total' needs to be synchronized, otherwise both
SQL_THREAD and IO_THREAD can try to modify the variable simultaneously
resulting in incorrect rli->log_space_total.  In the current test scenario
SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log'
and IO_THREAD is trying to increment the 'rli->log_space_total' in
'queue_event' simultaneously. Hence test occasionally fails with  result
mismatch.

Fix:
===
Convert 'rli->log_space_total' variable to atomic type.
2020-11-12 13:04:39 +05:30
Monty
2682458128 Use larger buffer when reading binary and relay logs
- Should speed up replication
2020-07-23 10:54:32 +03:00
Marko Mäkelä
1813d92d0c Merge 10.4 into 10.5 2020-07-02 09:41:44 +03:00
Marko Mäkelä
f347b3e0e6 Merge 10.3 into 10.4 2020-07-02 07:39:33 +03:00
Marko Mäkelä
1df1a63924 Merge 10.2 into 10.3 2020-07-02 06:17:51 +03:00
Marko Mäkelä
ea2bc974dc Merge 10.1 into 10.2 2020-07-01 12:03:55 +03:00
Sujatha
3bc8939552 MDEV-22806: MSAN reports use-of-uninitialized-value for rpl_parallel_conflicts.test
Problem:
========
Relay_log_info::flush reports following MSAN issue.
==17820==WARNING: MemorySanitizer: use-of-uninitialized-value is reported
#5  0x00005584f0981441 in my_write (Filedes=22,
Buffer=0x72500003e818 "5\n./slave-relay-bin.000003\n21385\n
master-bin.000001\n21643\n0\n", '\245' <repeats 141 times>..., Count=118,
MyFlags=532) at /home/sujatha/bug_repo/test-10.5-msan/mysys/my_write.c:49

Analysis:
=========
In parallel replication at the end of each statement execution the worker execution
status is updated in 'relay-log.info' file. When two workers try to flush
the status at the same time, since the write to cache is not serialized both
workers write to the same address simultaneously and increment the
length twice. Because of this the length of buffer is more than actual data.
When flush code tries to read the buffer beyond valid data length MSAN
reports uninitialized values error.

Fix:
===
Serialize the relay log flush operation using "rli->data_lock".
2020-06-25 16:14:23 +05:30
Andrei Elkin
e156a8da08 MDEV-21851: Error in BINLOG_BASE64_EVENT i s always error-logged as if it is done by Slave
The prefix of error log message out of a failed BINLOG applying
is corrected to be the sql command name.
2020-06-12 11:25:27 +03:00
Marko Mäkelä
4a0b56f604 Merge 10.4 into 10.5 2020-05-31 10:28:59 +03:00
Marko Mäkelä
6da14d7b4a Merge 10.3 into 10.4 2020-05-30 11:04:27 +03:00
Marko Mäkelä
dad7a8ee7d Merge 10.2 into 10.3 2020-05-27 17:10:39 +03:00
Andrei Elkin
0c1f97b3ab MDEV-15152 Optimistic parallel slave doesnt cope well with START SLAVE UNTIL
The immediate bug was caused by a failure to recognize a correct
position to stop the slave applier run in optimistic parallel mode.
There were the following set of issues that the analysis unveil.
1 incorrect estimate for the event binlog position passed to
  is_until_satisfied
2 wait for workers to complete by the driver thread did not account non-group events
  that could be left unprocessed and thus to mix up the last executed
  binlog group's file and position:
  the file remained old and the position related to the new rotated file
3 incorrect 'slave reached file:pos' by the parallel slave report in the error log
4 relay log UNTIL missed out the parallel slave branch in
  is_until_satisfied.

The patch addresses all of them to simplify logics of log change
notification in either the master and relay-log until case.
P.1 is addressed with passing the event into is_until_satisfied()
for proper analisis by the function.
P.2 is fixed by changes in handle_queued_pos_update().
P.4 required removing relay-log change notification by workers.
Instead the driver thread updates the notion of the current relay-log
fully itself with aid of introduced
bool Relay_log_info::until_relay_log_names_defer.

An extra print out of the requested until file:pos is arranged
with --log-warning=3.
2020-05-26 18:49:43 +03:00
Andrei Elkin
dbe447a789 MDEV-15152 Optimistic parallel slave doesnt cope well with START SLAVE UNTIL
The immediate bug was caused by a failure to recognize a correct
position to stop the slave applier run in optimistic parallel mode.
There were the following set of issues that the analysis unveil.
1 incorrect estimate for the event binlog position passed to
  is_until_satisfied
2 wait for workers to complete by the driver thread did not account non-group events
  that could be left unprocessed and thus to mix up the last executed
  binlog group's file and position:
  the file remained old and the position related to the new rotated file
3 incorrect 'slave reached file:pos' by the parallel slave report in the error log
4 relay log UNTIL missed out the parallel slave branch in
  is_until_satisfied.

The patch addresses all of them to simplify logics of log change
notification in either the master and relay-log until case.
P.1 is addressed with passing the event into is_until_satisfied()
for proper analisis by the function.
P.2 is fixed by changes in handle_queued_pos_update().
P.4 required removing relay-log change notification by workers.
Instead the driver thread updates the notion of the current relay-log
fully itself with aid of introduced
bool Relay_log_info::until_relay_log_names_defer.

An extra print out of the requested until file:pos is arranged
with --log-warning=3.
2020-05-26 18:26:50 +03:00
Monty
d1d472646d Change THD->transaction to a pointer to enable multiple transactions
All changes (except one) is of type
thd->transaction.  -> thd->transaction->

thd->transaction points by default to 'thd->default_transaction'
This allows us to 'easily' have multiple active transactions for a
THD object, like when reading data from the mysql.proc table
2020-05-23 12:29:10 +03:00
Marko Mäkelä
53aabda6b5 Merge 10.4 into 10.5 2020-03-27 09:39:15 +02:00
Sergey Vojtovich
1c8de231a3 dequeued_count my_atomic to Atomic_counter
Also allocate inuse_relaylog with new rather than my_malloc(MY_ZEROFILL).
2020-03-25 23:49:38 +04:00
Marko Mäkelä
3b25083785 Merge 10.4 into 10.5 2020-03-23 10:50:14 +02:00
Sergey Vojtovich
a39d92ca57 gtid_pos_table: my_atomic to std::atomic 2020-03-21 17:36:38 +04:00
Andrei Elkin
c8ae357341 MDEV-742 XA PREPAREd transaction survive disconnect/server restart
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.

Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of

    - binary logging extension to write prepared XA part of
      transaction signified with
      its XID in a new XA_prepare_log_event. The concusion part -
      with Commit or Rollback decision - is logged separately as
      Query_log_event.
      That is in the binlog the XA consists of two separate group of
      events.

      That makes the whole XA possibly interweaving in binlog with
      other XA:s or regular transaction but with no harm to
      replication and data consistency.

      Gtid_log_event receives two more flags to identify which of the
      two XA phases of the transaction it represents. With either flag
      set also XID info is added to the event.

      When binlog is ON on the server XID::formatID is
      constrained to 4 bytes.

    - engines are made aware of the server policy to keep up user
      prepared XA:s so they (Innodb, rocksdb) don't roll them back
      anymore at their disconnect methods.

    - slave applier is refined to cope with two phase logged XA:s
      including parallel modes of execution.

This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.

CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc

Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
   completion with ER_XA_RBROLLBACK.

2. binlog-* filtered XA when it changes engine data is regarded as
   loggable even when nothing got cached for binlog.  An empty
   XA-prepare group is recorded. Consequent Commit-or-Rollback
   succeeds in the Engine(s) as well as recorded into binlog.

3. The same applies to the non-transactional engine XA.

4. @@skip_log_bin=OFF does not record anything at XA-prepare
   (obviously), but the completion event is recorded into binlog to
   admit inconsistency with slave.

The following actions are taken by the patch.

At XA-prepare:
   when empty binlog cache - don't do anything to binlog if RO,
   otherwise write empty XA_prepare (assert(binlog-filter case)).

At Disconnect:
   when Prepared && RO (=> no binlogging was done)
     set Xid_cache_element::error := ER_XA_RBROLLBACK
     *keep* XID in the cache, and rollback the transaction.

At XA-"complete":
   Discover the error, if any don't binlog the "complete",
   return the error to the user.

Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
2020-03-14 22:45:48 +02:00
Sergei Golubchik
c1c5222cae cleanup: PSI key is *always* the first argument 2020-03-10 19:24:23 +01:00
Sergei Golubchik
22b6d8487a perfschema file instrumentation related changes 2020-03-10 19:24:22 +01:00
Sergei Golubchik
7c58e97bf6 perfschema memory related instrumentation changes 2020-03-10 19:24:22 +01:00