Commit graph

2329 commits

Author SHA1 Message Date
Sachin Setiya
2fe6186124 MDEV-10715 Galera: Replicate MariaDB GTID to other nodes in the cluster
Problem:- Gtid are not transferred in Galera Cluster.

Solution:- We need to transfer gtid in the case on either when cluster is
slave/master in async replication. In normal Gtid replication gtid are generated on
recieving node itself and it is always on sync with other nodes. Because galera keeps
node in sync , So all nodes get same no of event groups. So the issue arises when
say galera is slave in async replication.
A
|    (Async replication)
D <-> E <-> F  {Galera replication}
So what should happen is that all node should apply the master gtid but this does
node happen, becuase node E, F does not recieve gtid from D in write set , So what E(or F)
does is that it applies wsrep_gtid_domain_id, D server-id , E gtid next seq no. This
generated gtid does not always work when say A has different domain id.

So In this commit, on galera node when we see that this event is recieved from master
we simply write Gtid_Log_Event in write_set and send it to other nodes.
2017-12-25 13:57:42 +05:30
Andrei Elkin
c7e38076f3 MDEV-9510 Segmentation fault in binlog thread causes crash
With combination of --log-bin and Galera the server may crash
reporting two characteristic stacks:

  /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0xc7)[0x7f182a8e2cb7]
  /usr/sbin/mysqld(binlog_background_thread+0x2b5)[0x7f182a8e3275]

or

  /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7ff395b2dafd]
  /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7ff395b2db91]
  /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7ff395b300b2]

The reason of the failure appears to be non-matching decrements for
  `xid_count_per_binlog::xid_count`
which can occur when a transaction is executed having its connection issued
`SET @@sql_log_bin=0`. In such case the xid count is not incremented but
its decrements still runs to turn `binlog_xid_count_list` into improper state
which the following FLUSH BINARY LOGS exposes through the crash.

*Note_1*: the regression test reuses an existing galera.sql_log_bin
which does not run stably (even in its base form) by mtr with --log-bin.

*Note_2*: 10.0-galera branch is free of this issue having missed MDEV-7205
fixes.
2017-11-15 22:26:32 +02:00
Andrei Elkin
aae4932775 MDEV-12012/MDEV-11969 Can't remove GTIDs for a stale GTID Domain ID
As reported in MDEV-11969 "there's no way to ditch knowledge" about some
domain that is no longer updated on a server. Besides being of annoyance to
clutter output in DBA console stale domains can prevent the slave
to connect the master as MDEV-12012 witnesses.
What domain is obsolete must be evaluated by the user (DBA) according
to whether the domain info is still relevant and will the domain ever
receive any update.

This patch introduces a method to discard obsolete gtid domains from
the server binlog state. The removal requires no event group from such
domain present in existing binlog files though. If there are any the
containing logs must be first PURGEd in order for

  FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

succeed. Otherwise the command returns an error.

The list of obsolete domains can be computed through
intersecting two sets - the earliest (first) binlog's Gtid_list
and the current value of @@global.gtid_binlog_state - and extracting
the domain id components from the intersection list items.
The new DELETE_DOMAIN_ID featured FLUSH continues to rotate binlog
omitting the deleted domains from the active binlog file's Gtid_list.
Notice though when the command is ineffective - that none of requested to delete
domain exists in the binlog state - rotation does not occur.

Obsolete domain deletion is not harmful for connected slaves as long
as master side binlog files *purge* is synchronized with FLUSH-DELETE_DOMAIN_ID.
The slaves must have the last event from purged files processed as usual,
in order not to bump later into requesting a gtid from a file which
was already gone.
While the command is not replicated (as ordinary FLUSH BINLOG LOGS is)
slaves, even though having extra domains, won't suffer from reconnection errors
thanks to master-slave gtid connection protocol allowing the master
to be ignorant about a gtid domain.
Should at failover such slave to be promoted into master role it may run
the ex-master's

 FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

to clean its own binlog state.

NOTES.
  suite/perfschema/r/start_server_low_digest.result
is re-recorded as consequence of internal parser codes changes.
2017-11-15 22:26:32 +02:00
Sergei Golubchik
2e3a16e366 Merge branch '10.0' into 10.1 2017-09-21 22:02:21 +02:00
Sachin Setiya
e84f5356c3 MDEV-12290 Wrong timestamps in binary log causes replication issues
Binlog_background_thread does not make a call to set_time(), And when
we call binlog_checkpoint_log_event->write() , we write the wrong timestamp.
In this patch we correct this by calling thd->set_time().
2017-09-21 13:22:49 +05:30
Marko Mäkelä
829752973b Merge branch '10.0' into 10.1 2017-08-30 13:06:13 +03:00
Andrei Elkin
888a8b69bd MDEV-13437 InnoDB fails to return error for XA COMMIT or XA ROLLBACK in read-only mode
Assertions failed due to incorrect handling of the --tc-heuristic-recover
option when InnoDB is in read-only mode either due to innodb_read_only=1
or innodb_force_recovery>3. InnoDB failed to refuse a XA COMMIT or
XA ROLLBACK operation, and there were errors in the error handling in
the upper layer.

This was fixed by making InnoDB XA operations respect the
high_level_read_only flag. The InnoDB part of the fix and
parts of the test main.tc_heuristic_recover were provided
by Marko Mäkelä.

LOCK_log mutex lock/unlock had to be added to fix MDEV-13438.
The measure is confirmed by mysql sources as well.

For testing of the conflicting option combination, mysql-test-run is
made to export a new $MYSQLD_LAST_CMD. It holds the very last value
generated by mtr.mysqld_start().  Even though the options have been
also always stored in $mysqld->{'started_opts'} there were no access
to them beyond the automatic server restart by mtr through the expect
file interface.

Effectively therefore $MYSQLD_LAST_CMD represents a more general
interface to $mysqld->{'started_opts'} which can be used in wider
scopes including server launch with incompatible options.

Notice another existing method to restart the server with incompatible
options relying on $MYSQLD_CMD is is aware of $mysqld->{'started_opts'}
(the actual options that the server is launched by mtr). In order to use
this method they would have to be provided manually.

NOTE: When merging to 10.2, the file search_pattern_in_file++.inc
should be replaced with the pre-existing search_pattern_in_file.inc.
2017-08-29 11:59:59 +03:00
Sergei Golubchik
8e8d42ddf0 Merge branch '10.0' into 10.1 2017-08-08 10:18:43 +02:00
Sergei Golubchik
da2a838628 MDEV-12824 GCC 7 warning: this statement may fall through [-Wimplicit-fallthrough=] 2017-07-20 20:13:28 +02:00
Marko Mäkelä
b61700c221 Merge 10.0 into 10.1 2017-05-23 08:59:03 +03:00
Sergei Golubchik
725e47bfb5 Merge branch '5.5' into 10.0 2017-05-20 00:59:40 +02:00
Marko Mäkelä
13a350ac29 Merge 10.0 into 10.1 2017-05-19 12:29:37 +03:00
Sachin Setiya
b5cdf01404 MDEV-11092 Assertion `!writer.checksum_len || writer.remains == 0' failed
Problem:-
This crash happens because logged stmt is quite big and while writing
Annotate_rows_log_event it throws EFBIG error  but we ignore this error
and do not call cache_data->set_incident().

Solution:-
When we normally write Binlog_log_event we check for error EFBIG, but we did
do this for Annotate_rows_log_event. We check for this error and call
cache_data->set_incident() accordingly.

# Conflicts:
#	sql/log.cc
2017-05-18 17:13:37 +05:30
Marko Mäkelä
71cd205956 Silence bogus GCC 7 warnings -Wimplicit-fallthrough
Do not silence uncertain cases, or fix any bugs.

The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
2017-05-17 08:27:04 +03:00
Marko Mäkelä
7972da8aa1 Silence bogus GCC 7 warnings -Wimplicit-fallthrough
Do not silence uncertain cases, or fix any bugs.

The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
2017-05-17 08:07:02 +03:00
Sergei Golubchik
71b4503242 MDEV-9998 Fix issues caught by Clang's -Wpointer-bool-conversion warning
remove useless checks
and a couple of others
2017-05-15 22:23:10 +02:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Monty
f3c65ce951 Add protection to not access is_open() without LOCK_log mutex
Protection added to reopen_file() and new_file_impl().

Without this we could get an assert in fn_format() as name == 0,
because the file was closed and name reset, atthe same time
new_file_impl() was called.
2017-02-28 16:10:47 +01:00
Monty
4bad74e139 Added error checking for all calls to flush_relay_log_info() and stmt_done() 2017-02-28 16:10:47 +01:00
Sergei Golubchik
2f20d297f8 Merge branch '10.0' into 10.1 2016-12-11 09:53:42 +01:00
Sergei Golubchik
3e8155c637 Merge branch '5.5' into 10.0 2016-12-09 16:33:48 +01:00
Sergei Golubchik
f5e0522d92 MDEV-10388 MariaDB 10.1.x keeps (deleted) ML* files in tmpdir after LOAD DATA completes
truncate unused IO_CACHE backing store files in binlog_cache_data
to release the disk space they were occupying
2016-12-07 13:32:11 +01:00
Sergei Golubchik
f640527e65 typo fixed: s/MSYQL/MYSQL/ 2016-12-03 22:02:00 +01:00
Sergei Golubchik
66d9696596 Merge branch '10.0' into 10.1 2016-09-28 17:55:28 +02:00
Sergei Golubchik
77ce4ead81 Merge branch '5.5' into 10.0 2016-09-27 09:21:19 +02:00
Sergei Golubchik
8483659f4f report correct write error on log writes 2016-09-26 12:20:28 +02:00
Nirbhay Choubey
467217e669 MDEV-9510: Print extra info to error log
Activated by enabling wsrep_debug.
2016-08-26 12:45:48 -04:00
Kristian Nielsen
48fbb2bf07 MDEV-10553: Semi-sync replication hangs when master opens new binlog file
In the AFTER_SYNC case, semi-sync was taking the binlog file name from
the wrong place, so around binlog rotation it could be using the new
name with a position belonging to the previous binlog file name.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2016-08-16 12:34:58 +02:00
Nirbhay Choubey
3fd214c8be MDEV-9423: cannot add new node to the cluser: Binlog..
.. file '/var/log/mysql/mariadb-bin.000001' not found in binlog
index, needed for recovery. Aborting.

In Galera cluster, while preparing for rsync/xtrabackup based
SST, the donor node takes an FTWRL followed by (REFRESH_ENGINE_LOG
in rsync based state transfer and) REFRESH_BINARY_LOG. The latter
rotates the binary log and logs Binlog_checkpoint_log_event
corresponding to the penultimate binary log file into the new file.
The checkpoint event for the current file is later logged
synchronously by binlog_background_thread.

Now, since in rsync/xtrabackup based snapshot state transfer methods,
only the last binary log file is transferred to the joiner node; the
file could get transferred even before the checkpoint event for the
same file gets written to it. As a result, the joiner node would fail
to start complaining about the missing binlog file needed for recovery.

In order to fix this, a mechanism has been put in place to make
REFRESH_BINARY_LOG operation wait for Binlog_checkpoint_log_event
to be logged for the current binary log file if the node is part of
a Galera cluster. As further safety, during rsync based state transfer
the donor node now acquires and owns LOCK_log for the duration of file
transfer during SST.
2016-06-29 16:50:53 -04:00
Sergei Golubchik
3361aee591 Merge branch '10.0' into 10.1 2016-06-28 22:01:55 +02:00
Sergei Golubchik
c081c978a2 Merge branch '5.5' into bb-10.0 2016-06-21 14:11:02 +02:00
Sergei Golubchik
ae29ea2d86 Merge branch 'mysql/5.5' into 5.5 2016-06-14 13:55:28 +02:00
Sujatha Sivakumar
ef3f09f0c9 Bug#23251517: SEMISYNC REPLICATION HANGING
Revert following bug fix:

Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE

This fix results in a deadlock between slave IO thread
and SQL thread.

(cherry picked from commit e3fea6c6dbb36c6ab21c4ab777224560e9608b53)
2016-05-16 11:34:20 +02:00
Monty
732adec0a4 Removed some not needed when doing delete thd, which caused warnings about
wrong mutex usage from safe_mutex.
Ensure that LOCK_status is always taken before LOCK_thread_count
2016-04-28 13:39:55 +03:00
Sujatha Sivakumar
3a8f43bec7 Bug#22897202: RPL_IO_THD_WAIT_FOR_DISK_SPACE HAS OCCASIONAL
FAILURES

Analysis:
=========
Test script is not ensuring that "assert_grep.inc" should be
called only after 'Disk is full' error is written to the
error log.

Test checks for "Queueing master event to the relay log"
state. But this state is set before invoking 'queue_event'.
Actual 'Disk is full' error happens at a very lower level.
It can happen that we might even reset the debug point
before even the actual disk full simulation occurs and the
"Disk is full" message will never appear in the error log.

In order to guarentee that we must have some mechanism where
in after we write "Disk is full" error messge into the error
log we must signal the test to execute SSS and then reset
the debug point. So that test is deterministic.

Fix:
===
Added debug sync point to make script deterministic.
2016-04-19 11:44:34 +05:30
Sergei Golubchik
3b0c7ac1f9 Merge branch '10.0' into 10.1 2016-03-21 13:02:53 +01:00
Otto Kekäläinen
1777fd5f55 Fix spelling: occurred, execute, which etc 2016-03-04 02:09:37 +02:00
Sujatha Sivakumar
8361151765 Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE

Problem:
========
Currently SHOW SLAVE STATUS blocks if IO thread waits for
disk space. This makes automation tools verifying
server health block on taking relevant action. Finally this
will create SHOW SLAVE STATUS piles.

Analysis:
=========
SHOW SLAVE STATUS hangs on mi->data_lock if relay log write
is waiting for free disk space while holding mi->data_lock.
mi->data_lock is needed to protect the format description
event (mi->format_description_event) which is accessed by
the clients running FLUSH LOGS and slave IO thread. Note
relay log writes don't need to be protected by
mi->data_lock, LOCK_log is used to protect relay log between
IO and SQL thread (see MYSQL_BIN_LOG::append_event). The
code takes mi->data_lock to protect
mi->format_description_event during relay log rotate which
might get triggered right after relay log write.

Fix:
====
Release the data_lock just for the duration of writing into
relay log.

Made change to ensure the following lock order is maintained
to avoid deadlocks.

data_lock, LOCK_log

data_lock is held during relay log rotations to protect
the description event.
2016-03-01 12:29:51 +05:30
Sergei Golubchik
a5679af1b1 Merge branch '10.0' into 10.1 2016-02-23 21:35:05 +01:00
Sergei Golubchik
271fed4106 Merge branch '5.5' into 10.0 2016-02-15 22:50:59 +01:00
Sergei Golubchik
f3444df415 Merge branch 'mysql/5.5' into 5.5
reverted about half of commits as either not applicable or
outright wrong
2016-02-09 11:27:40 +01:00
Sergei Golubchik
0278151260 fix galera.lp1438990 test
binlog_savepoint_rollback() should not try to truncate a binlog
unless binlog_savepoint_set has actually remembered the position
to truncate to.
2015-12-22 11:59:15 +01:00
Sergei Golubchik
97d2c9bf39 MDEV-9214 Server miscalculates the number of XA-capable engines
Relax the number-of-XA-engines check on recovery. Allow *more*
engines to be present than absolutely necessary, extra engines
cannot affect ACID guarantees of the recovery process.

As a bonus, 10.0->crash->10.1 upgrade won't complain about
wsrep being a new XA storge engine.
2015-12-19 13:59:29 +01:00
Sujatha Sivakumar
c5ba706791 Bug#22278455: MYSQL 5.5:RPL_BINLOG_INDEX FAILS IN VALGRIND.
Problem:
=======
rpl_binlog_index.test fails with following valgrind error.

line
Conditional jump or move depends on uninitialised value(s)
at 0x4C2F842: __memcmp_sse4_1 (in /usr/lib64/valgrind/
vgpreload_memcheck-amd64-linux.so)
0x739E39: find_uniq_filename(char*) (log.cc:2212)
0x73A11B: MYSQL_LOG::generate_new_name(char*, char const*)
(log.cc:2492)
0x73A1ED: MYSQL_LOG::init_and_set_log_file_name(char const*,
char const*, enum_log_type, cache_type) (log.cc:2289)
0x73B6F5: MYSQL_BIN_LOG::open(char const*, enum_log_type,


Analysis and fix:
=================
This issue was fixed as part of Bug#20459363 fix in 5.6 and
above. Hence backporting the fix to MySQL-5.5.
2015-12-16 10:48:57 +05:30
Sergei Golubchik
4a450928e0 fix a few spelling mistakes
https://github.com/MariaDB/server/pull/56
2015-12-11 15:21:42 +01:00
Vladislav Vaintroub
2f8c84fd16 MDEV-7588 Add thd_wait_begin/thd_wait_end to wait_for_binlog_endpos 2015-11-24 14:16:48 +01:00
Sergei Golubchik
4046ed12bc rbr and savepoint in a subtatement
Apply MySQL fix for bug#76727:
https://github.com/mysql/mysql-server/commit/69d4e72c

  Bug#20901025: SLAVE ASSERTION IN UNPACK_ROW WITH ROLLBACK TO
  SAVEPOINT IN ERROR HANDLER
2015-11-19 17:04:19 +01:00
Sergei Golubchik
beded7d9c9 Merge branch '10.0' into 10.1 2015-11-19 15:52:14 +01:00
Vladislav Vaintroub
dd90dae3c0 MDEV-7588 Add thd_wait_begin/end to notify threadpool of binlog waits 2015-11-17 20:08:36 +01:00
Vladislav Vaintroub
e669a5fd87 MDEV-7588 Add thd_wait_begin/end to notify threadpool of binlog waits 2015-11-17 18:33:08 +01:00