Commit graph

3342 commits

Author SHA1 Message Date
Kristian Nielsen
ed34927065 MDEV-7936: Assertion `!table || table->in_use == _current_thd()' failed on parallel replication in optimistic mode
Additional 10.1-specific test case.
2015-04-13 14:38:25 +02:00
Kristian Nielsen
2de8db6296 Merge MDEV-7936 into 10.1 2015-04-13 14:28:07 +02:00
Kristian Nielsen
17aff4b17b Merge MDEV-7936 into 10.0.
Conflicts:
	sql/sql_base.cc
2015-04-13 14:27:25 +02:00
Kristian Nielsen
60d094aeac MDEV-7936: Assertion `!table || table->in_use == _current_thd()' failed on parallel replication in optimistic mode
Make sure that in parallel replication, we execute wait_for_prior_commit()
before setting table->in_use for a temporary table. Otherwise we can end up
with two parallel replication worker threads competing with each other for
use of a temporary table.

Re-factor the use of find_temporary_table() to be able to handle errors
in the caller (as wait_for_prior_commit() can return error in case of
deadlock kill).
2015-04-13 14:24:18 +02:00
Kristian Nielsen
c47fe0e9db MDEV-7668: Intermediate master groups CREATE TEMPORARY with INSERT, causing parallel replication failure
[This commit cherry-picked to be able to merge MDEV-7936, of which it
is a pre-requisite, into both 10.0 and 10.1.]

Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.

In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.

More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.

This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.

(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).

Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.

This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
2015-04-13 14:08:57 +02:00
Sergei Golubchik
d214c83b5e mtr: make search_pattern_in_file.inc more verbose
report when a pattern is found
do not abort, but merely report when a pattern is not found
2015-04-11 10:22:26 +02:00
Sergei Golubchik
a73676b2e6 Merge branch '10.1' into bb-10.1-serg 2015-04-10 19:32:14 +02:00
Sergei Golubchik
dd8f931957 be less annoying about sysvar-based table attributes
do not *always* add them to the create table definition,
but only when a sysvar value is different from a default.

also, when adding them - don't quote numbers
2015-04-10 02:36:54 +02:00
Kristian Nielsen
50d98e9cbd Merge MDEV-7940 into 10.0 2015-04-09 10:13:17 +02:00
Kristian Nielsen
abba4184e6 Merge MDEV-7940 into 10.1 2015-04-09 10:05:27 +02:00
Kristian Nielsen
15a2b5aab1 MDEV-7940: Sporadic failure in rpl.rpl_gtid_until
Fix a race in the test case. When we do start_slave.inc immediately
followed by stop_slave.inc, it is possible to kill the IO thread while
it is still running inside get_master_version_and_clock(), and this
gives warnings in the error log that cause the test to fail.
2015-04-09 10:03:59 +02:00
Kristian Nielsen
accdabd668 Merge MDEV-7888 and MDEV-7929 into 10.0. 2015-04-08 13:19:22 +02:00
Kristian Nielsen
7ee1a41ce1 MDEV-7888, MDEV-7929: Parallel replication hangs sometimes on ANALYZE TABLE or DDL
Follow-up patch with 10.1-specific changes.

Add test cases that more closely resembles the original bug report (which uses
the 10.1-specific --slave-parallel-mode=optimistic).

Also fix the code so that ANALYZE statements are now marked as DDL, and will
not be attempted to speculatively run in parallel with other transactions.
2015-04-08 13:15:04 +02:00
Kristian Nielsen
48c10fb5f7 Merge MDEV-7888 and MDEV-7929 into 10.1. 2015-04-08 11:04:24 +02:00
Kristian Nielsen
3b961347db MDEV-7888, MDEV-7929: Parallel replication hangs sometimes on ANALYZE TABLE or DDL
The hangs occur when the group_commit_orderer object is freed before the last
mark_start_commit() call on it - this loses the wakeup to other waiting worker
threads, causing them to hang until killed manually.

The object was freed because wakeup_subsequent_commits() was called two early
in two places. For MDEV-7888, during ANALYZE TABLE, and for MDEV-7929 during
record_gtid() after processing a DDL event. The group_commit_orderer object
can be freed when its last transaction has called wait_for_prior_commit().

Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits()
that can be used in places where a transaction is committed without this being
the commit of the actual replication event group.

Also add a protection mechanism (that asserts in debug builds) which can
prevent the too-early free and hang if other similar bugs should remain in
other parts of the code.
2015-04-08 11:01:18 +02:00
Alexander Barkov
7f613ebdb6 MDEV-7284 INDEX: CREATE OR REPLACE 2015-04-03 15:43:55 +04:00
Kristian Nielsen
f573b65e41 Merge MDEV-7847 and MDEV-7882 into 10.0.
Conflicts:
	mysql-test/suite/rpl/r/rpl_parallel.result
	sql/rpl_parallel.cc
2015-03-30 15:10:29 +02:00
Kristian Nielsen
c41e4d3b49 Merge MDEV-7847 and MDEV-7882 into 10.0.
Conflicts:
	mysql-test/suite/rpl/r/rpl_parallel.result
	mysql-test/suite/rpl/t/rpl_parallel.test
2015-03-30 14:51:25 +02:00
Kristian Nielsen
880f2273fd MDEV-7847: "Slave worker thread retried transaction 10 time(s) in vain, giving up", followed by replication hanging
This patch fixes a bug in the error handling in parallel replication, when one
worker thread gets a failure and other worker threads processing later
transactions have to rollback and abort.

The problem was with the lifetime of group_commit_orderer objects (GCOs).
A GCO is freed when we register that its last event group has committed. This
relies on register_wait_for_prior_commit() and wait_for_prior_commit() to
ensure that the fact that T2 has committed implies that any earlier T1 has
also committed, and can thus no longer execute mark_start_commit().

However, in the error case, the code was skipping the
register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus
commit ordering was not guaranteed, and a GCO could be freed too early. Then a
later mark_start_commit() would reference deallocated GCO, which could lead to
lost wakeup (causing slave threads to hang) or other corruption.

This patch makes also the error case respect commit order. This way, also the
error case gets the GCO lifetime correct, and the hang no longer occurs.
2015-03-30 14:33:44 +02:00
Kristian Nielsen
3d4850158f Fix embarrassing bug in test case that caused sporadic test failures. 2015-03-17 10:36:38 +01:00
Kristian Nielsen
1f8efee584 Merge MDEV-7198: status variable for Slave_skipped_errors 2015-03-16 14:54:16 +01:00
Kristian Nielsen
ef4d8db5ec MDEV-6981: feature request MASTER_GTID_WAIT status variables
Review fixes:
 - Coding style
 - Fix bad .result file
 - Fix test to be tolerant of different timing.
 - Fix test to give better info in case of unexpected timing.
2015-03-16 14:40:29 +01:00
Kristian Nielsen
0e717c5bf4 Merge branch 'mdev-6981-master_gtid_wait-status-variables' of https://github.com/openquery/mariadb-server into danblack
Conflicts:
	sql/mysqld.cc
2015-03-16 13:41:11 +01:00
Daniel Black
9362dd43ff additional slave_skip_errors status 2015-03-16 23:15:36 +11:00
Daniel Black
51ea3939b4 Complete test for status slave_skipped_errors 2015-03-16 23:06:30 +11:00
Kristian Nielsen
2e82a8233c MDEV-7785: errorneous -> erroneous spelling mistake 2015-03-16 10:54:47 +01:00
Venkatesh Duggirala
59142d9a27 Bug CREATE TABLE DB.TABLE LIKE TMPTABLE IS
BINLOGGED INCORRECTLY - BREAKS A SLAVE

Submitted a incomplete patch with my previous push,
re submitting the extra changes the required to make
the patch complete.
2015-03-13 13:13:48 +05:30
Venkatesh Duggirala
151b8ec4d1 Bug CREATE TABLE DB.TABLE LIKE TMPTABLE IS BINLOGGED INCORRECTLY - BREAKS A SLAVE
Analysis:
In row based replication, Master does not send temp table information
to Slave. If there are any DDLs that involves in regular table that needs
to be sent to Slave and a temp tables (which will not be available at Slave),
the Master rewrites the query replacing temp table with it's defintion.
Eg: create table regular_table like temptable.
In rewrite logic, server is ignoring the database of regular table
which can cause problems mentioned in this bug.

Fix: dont ignore database information (if available) while
rewriting the query
2015-03-13 12:32:44 +05:30
Oleksandr Byelkin
dab12366b1 MDEV-6956:SET STATEMENT default_master_connection = ... has no effect
the problem was in assigning default value during parsing.
2015-03-12 09:47:36 +01:00
Daniel Black
fa5809ce10 Add Master_gtid_wait_{count,time,timeouts} status
MASTER_GTID_WAIT function needs some status to evaluate its use.

master_gtid_wait_count indicates how many times the function is called.
master_gtid_wait_time indicates how much time in microseconds occurred
waiting (or timing out)
master_gtid_timeouts indicates how many time times this function timed
out rather than all successful gtids events being available.
2015-03-12 06:43:38 +11:00
Kristian Nielsen
ed04c40b01 MDEV-5289: master server starts slave parallel threads
Delay spawning parallel replication worker threads until a slave SQL
thread is running, and de-spawn them when the last SQL thread stops.

This is especially useful to avoid needless threads on a master in a
setup where same my.cnf is used on masters and slaves.
2015-03-11 09:18:16 +01:00
Kristian Nielsen
96784eb106 MDEV-7668: Intermediate master groups CREATE TEMPORARY with INSERT, causing parallel replication failure
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.

In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.

More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.

This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.

(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).

Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.

This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
2015-03-09 13:17:37 +01:00
Sergei Golubchik
2db62f686e Merge branch '10.0' into 10.1 2015-03-07 13:21:02 +01:00
Kristian Nielsen
7dee7a036a GTID: Add missing test of reconnecting into out-of-order binlog. 2015-03-05 09:41:01 +01:00
Kristian Nielsen
95d7208859 Merge MDEV-6589 and MDEV-6403 into 10.1.
Conflicts:
	sql/log.cc
	sql/rpl_rli.cc
	sql/sql_repl.cc
2015-03-04 13:49:37 +01:00
Kristian Nielsen
3ef0b9b235 Merge MDEV-6589 and MDEV-6403 into 10.0. 2015-03-04 13:36:54 +01:00
Kristian Nielsen
78c74dbe30 MDEV-6403: Temporary tables lost at STOP SLAVE in GTID mode if master has not rotated binlog since restart
The binlog contains specially marked format description events to mark
when a master restart happened (which could have caused temporary
tables to be silently dropped). Such events also cause slave to close
temporary tables.

However, there was a bug that if after this, slave re-connects to the
master in GTID mode, the master can send an old format description
event again. If temporary tables are closed when such event is seen
for the second time, it might drop temporary tables created after that
event, and cause replication failure.

With this patch, the restart flag of the format description event is
cleared by the master when it is sent to the slave in a subsequent
connection, to avoid the errorneous temp table close.
2015-03-04 13:36:29 +01:00
Kristian Nielsen
ad0d203f2e MDEV-6589: Incorrect relay log start position when restarting SQL thread after error in parallel replication
The problem occurs in parallel replication in GTID mode, when we are using
multiple replication domains. In this case, if the SQL thread stops, the
slave GTID position may refer to a different point in the relay log for each
domain.

The bug was that when the SQL thread was stopped and restarted (but the IO
thread was kept running), the SQL thread would resume applying the relay log
from the point of the most advanced replication domain, silently skipping all
earlier events within other domains. This caused replication corruption.

This patch solves the problem by storing, when the SQL thread stops with
multiple parallel replication domains active, the current GTID
position. Additionally, the current position in the relay logs is moved back
to a point known to be earlier than the current position of any replication
domain. Then when the SQL thread restarts from the earlier position, GTIDs
encountered are compared against the stored GTID position. Any GTID that was
already applied before the stop is skipped to avoid duplicate apply.

This patch should have no effect if multi-domain GTID parallel replication is
not used. Similarly, if both SQL and IO thread are stopped and restarted, the
patch has no effect, as in this case the existing relay logs are removed and
re-fetched from the master at the current global @@gtid_slave_pos.
2015-03-04 13:36:04 +01:00
Alexander Barkov
87b0cc9912 MDEV-7286 TRIGGER: CREATE OR REPLACE, CREATE IF NOT EXISTS
Based on the patch by Sriram Patil, made under terms of GSoC 2014.
2015-03-04 09:52:01 +04:00
Nirbhay Choubey
75a27eeaf7 MDEV-4987: Sort by domain_id when list of GTIDs are output
Added logic to sort gtid list based on domain_id before
populating them in string. Added a test case.
2015-02-27 23:33:22 -05:00
Kristian Nielsen
aa845d123c MDEV-6391: GTID binlog state not recovered if mariadb-bin.state is removed
When the server starts up, check if the master-bin.state file was lost.
If it was, recover its contents by scanning the last binlog file, thus
avoiding running with a corrupt binlog state.
2015-02-27 14:34:52 +01:00
Alexander Barkov
2d01907c1d MDEV-7281 EVENT: CREATE OR REPLACE 2015-02-27 13:34:18 +04:00
Kristian Nielsen
a227cf8046 MDEV-7335: Potential parallel slave deadlock with specific binlog corruption
If somehow the COMMIT or XID event in an event group was missing, the code in
parallel replication to handle this was not sufficient, leading to server
deadlock.
2015-02-24 14:39:15 +01:00
Sergei Golubchik
0ba168020e MDEV-6769 DROP TRIGGER IF NOT EXIST binlogged on master but not on slave
don't return from DROP TRIGGER IF NOT EXISTS on the slave side
early when the trigger couldn't be read
2015-02-22 12:54:52 +01:00
Sergei Golubchik
b739103f12 MDEV-7591 master crashed when slave specfied a future position with semi-repl plugin
cherry-pick the upstream fix

commit d4ba10184cd7bde9c31c610e664ecd0c93605c46
Author: Sujatha Sivakumar <sujatha.sivakumar@oracle.com>
Date:   Wed Jul 2 11:34:11 2014 +0530

    Bug#17453826:ASSERTION ERROR WHEN SETTING FUTURE BINLOG
    FILE/POS WITH SEMISYNC

    Problem:
    ========
    When DMLs are in progress on the master stopping a slave and
    setting ahead binlog name/pos will cause an assert on the
    master.
    ...
2015-02-22 12:54:52 +01:00
Sergei Golubchik
d7e7862364 Merge branch '5.5' into 10.0 2015-02-18 15:16:27 +01:00
Sergei Golubchik
8e80f91fa3 Merge remote-tracking branch 'mysql/5.5' into bb-5.5-merge @ mysql-5.5.42 2015-02-11 23:50:40 +01:00
Monty
3a3ec744b5 cleanups done as part of adding encryption
- Fixed compiler warnings
- Added include/wait_for_binlog_checkpoint.inc, as suggested by JonasO
- Updated 'build-tags' to work with git (Patch by Serg)
2015-02-10 10:21:16 +01:00
Sergei Golubchik
4280b25ed8 --getopt-prefix-matching command-line option 2015-02-10 10:21:15 +01:00
Kristian Nielsen
8672339328 MDEV-6676: Optimistic parallel replication
Adjust the configuration options, as discussed on the
maria-developers@ mailing list.

The option to hint a transaction to not be replicated in parallel is
now called @@skip_parallel_replication, consistent with
@@skip_replication.

And the --slave-parallel-mode is now simplified to have just one of
the following values:

  none
  minimal
  conservative
  optimistic
  aggressive

This reflects successively harder efforts to find opportunities to run
things in parallel on the slave. It allows to extend the server with
more automatic heuristics in the future without having to introduce a
new configuration option for each and every one.
2015-02-07 09:42:58 +01:00