Commit graph

63821 commits

Author SHA1 Message Date
f4rnham
060ec5b6b9 MDEV-7130: MASTER_POS_WAIT(log_name,log_pos,timeout,"connection_name") hangs, does not respect the timeout
Changed also arg_count check for connection_name to prevent same bug
if fifth argument is introduced in future
2015-04-24 13:08:27 +02:00
Kristian Nielsen
b616991a68 MDEV-8031: Parallel replication stops on "connection killed" error (probably incorrectly handled deadlock kill)
There was a rare race, where a deadlock error might not be correctly
handled, causing the slave to stop with something like this in the error
log:

150423 14:04:10 [ERROR] Slave SQL: Connection was killed, Gtid 0-1-2, Internal MariaDB error code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001 position 1234

The problem was incorrect error handling. When a deadlock is detected, it
causes a KILL CONNECTION on the offending thread. This error is then later
converted to a deadlock error, and the transaction is retried.

However, the deadlock error was not cleared at the start of the retry, nor
was the lingering kill signal. So it was possible to get another deadlock
kill early during retry. If this happened with particular thread
scheduling/timing, it was possible that the new KILL CONNECTION error was
masked by the earlier deadlock error, so that the second kill was not
properly converted into a deadlock error and retry.

This patch adds code that clears the old error and killed flag before
starting the retry. It also adds code to handle a deadlock kill caught in a
couple of places where it was not handled before.
2015-04-23 14:09:15 +02:00
Kristian Nielsen
4760528754 MDEV-8029: test failure in rpl.rpl_parallel_temptable
Fix a silly typo that caused the test to occasionally fail.
2015-04-21 10:16:14 +02:00
Kristian Nielsen
519ad0f7e3 MDEV-8016: Replication aborts on DROP /*!40005 TEMPORARY */ TABLE IF EXISTS
This was a regression from the patch for MDEV-7668.

A test was incorrect, so the slave would not properly handle re-using
temporary tables, which lead to replication failure in this case.
2015-04-20 12:59:46 +02:00
Kristian Nielsen
a8523559e9 Merge MDEV-7975 into 10.0 2015-04-14 14:23:35 +02:00
Kristian Nielsen
5d2b85a297 MDEV-7975: sporadic failure in test case rpl.rpl_gtid_startpos
Add some suppressions that were missing. They are for if a STOP SLAVE is
executed early during IO thread startup, when it is negotiating with the
master. The master connection may be killed in the middle of a
mysql_real_query(), which is not a test failure if it is a network error.

This also caught one real code error, fixed with this commit: The I/O thread
would fail to automatically reconnect if a network error happened while
fetching the value of @@GLOBAL.gtid_domain_id.
2015-04-14 13:03:11 +02:00
Kristian Nielsen
17aff4b17b Merge MDEV-7936 into 10.0.
Conflicts:
	sql/sql_base.cc
2015-04-13 14:27:25 +02:00
Kristian Nielsen
60d094aeac MDEV-7936: Assertion `!table || table->in_use == _current_thd()' failed on parallel replication in optimistic mode
Make sure that in parallel replication, we execute wait_for_prior_commit()
before setting table->in_use for a temporary table. Otherwise we can end up
with two parallel replication worker threads competing with each other for
use of a temporary table.

Re-factor the use of find_temporary_table() to be able to handle errors
in the caller (as wait_for_prior_commit() can return error in case of
deadlock kill).
2015-04-13 14:24:18 +02:00
Kristian Nielsen
c47fe0e9db MDEV-7668: Intermediate master groups CREATE TEMPORARY with INSERT, causing parallel replication failure
[This commit cherry-picked to be able to merge MDEV-7936, of which it
is a pre-requisite, into both 10.0 and 10.1.]

Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.

In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.

More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.

This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.

(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).

Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.

This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
2015-04-13 14:08:57 +02:00
Kristian Nielsen
50d98e9cbd Merge MDEV-7940 into 10.0 2015-04-09 10:13:17 +02:00
Kristian Nielsen
15a2b5aab1 MDEV-7940: Sporadic failure in rpl.rpl_gtid_until
Fix a race in the test case. When we do start_slave.inc immediately
followed by stop_slave.inc, it is possible to kill the IO thread while
it is still running inside get_master_version_and_clock(), and this
gives warnings in the error log that cause the test to fail.
2015-04-09 10:03:59 +02:00
Kristian Nielsen
670d4dd84f Merge MDEV-7910' into 10.0 2015-04-08 15:10:22 +02:00
Kristian Nielsen
b3c7c8cde8 MDEV-7910: innodb.binlog_consistent fails sporadically in buildbot
The test case was missing --source include/wait_for_binlog_checkpoint.inc.
So it could occasionally fail if the checkpoint managed to occur just at the
right point in time between fetching the two binlog positions to compare.
2015-04-08 15:08:53 +02:00
Kristian Nielsen
accdabd668 Merge MDEV-7888 and MDEV-7929 into 10.0. 2015-04-08 13:19:22 +02:00
Kristian Nielsen
3b961347db MDEV-7888, MDEV-7929: Parallel replication hangs sometimes on ANALYZE TABLE or DDL
The hangs occur when the group_commit_orderer object is freed before the last
mark_start_commit() call on it - this loses the wakeup to other waiting worker
threads, causing them to hang until killed manually.

The object was freed because wakeup_subsequent_commits() was called two early
in two places. For MDEV-7888, during ANALYZE TABLE, and for MDEV-7929 during
record_gtid() after processing a DDL event. The group_commit_orderer object
can be freed when its last transaction has called wait_for_prior_commit().

Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits()
that can be used in places where a transaction is committed without this being
the commit of the actual replication event group.

Also add a protection mechanism (that asserts in debug builds) which can
prevent the too-early free and hang if other similar bugs should remain in
other parts of the code.
2015-04-08 11:01:18 +02:00
Jan Lindström
e9c10f9916 MDEV-7908: assertion in innobase_release_savepoint
Problem was that in XA prepared state we should still be able to
release a savepoint, but assertions were too strict.
2015-04-06 17:38:51 +03:00
Jan Lindström
b53bcd438f MDEV-7367: Updating a virtual column corrupts table which crashes server
Analysis: MySQL table definition contains also virtual columns. Similarly,
index fielnr references MySQL table fields. However, InnoDB table definition
does not contain virtual columns. Therefore, when matching MySQL key fieldnr
we need to use actual column name to find out referenced InnoDB dictionary
column name.

Fix: Add new function to match MySQL index key columns to InnoDB dictionary.
2015-03-31 09:16:48 +03:00
Jan Lindström
0563f49ba3 MDEV-7754: innodb assert "array->n_elems < array->max_elems" on a huge blob update
Replace static array of thread sync levels with std::vector.
2015-03-31 09:16:48 +03:00
Kristian Nielsen
c41e4d3b49 Merge MDEV-7847 and MDEV-7882 into 10.0.
Conflicts:
	mysql-test/suite/rpl/r/rpl_parallel.result
	mysql-test/suite/rpl/t/rpl_parallel.test
2015-03-30 14:51:25 +02:00
Kristian Nielsen
880f2273fd MDEV-7847: "Slave worker thread retried transaction 10 time(s) in vain, giving up", followed by replication hanging
This patch fixes a bug in the error handling in parallel replication, when one
worker thread gets a failure and other worker threads processing later
transactions have to rollback and abort.

The problem was with the lifetime of group_commit_orderer objects (GCOs).
A GCO is freed when we register that its last event group has committed. This
relies on register_wait_for_prior_commit() and wait_for_prior_commit() to
ensure that the fact that T2 has committed implies that any earlier T1 has
also committed, and can thus no longer execute mark_start_commit().

However, in the error case, the code was skipping the
register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus
commit ordering was not guaranteed, and a GCO could be freed too early. Then a
later mark_start_commit() would reference deallocated GCO, which could lead to
lost wakeup (causing slave threads to hang) or other corruption.

This patch makes also the error case respect commit order. This way, also the
error case gets the GCO lifetime correct, and the hang no longer occurs.
2015-03-30 14:33:44 +02:00
Kristian Nielsen
3d4850158f Fix embarrassing bug in test case that caused sporadic test failures. 2015-03-17 10:36:38 +01:00
Kristian Nielsen
2e82a8233c MDEV-7785: errorneous -> erroneous spelling mistake 2015-03-16 10:54:47 +01:00
Kristian Nielsen
184f718fef MDEV-7249: Performance problem in parallel replication with multi-level slaves
Parallel replication (in 10.0 / "conservative" mode) relies on binlog group
commits to group transactions that can be safely run in parallel on the
slave. The --binlog-commit-wait-count and --binlog-commit-wait-usec options
exist to increase the number of commits per group. But in case of conflicts
between transactions, this can cause unnecessary delay and reduced througput,
especially on a slave where commit order is fixed.

This patch adds a heuristics to reduce this problem. When transaction T1 goes
to commit, it will first wait for N transactions to queue up for a group
commit. However, if we detect that another transaction T2 is waiting for a row
lock held by T1, then we will skip the wait and let T1 commit immediately,
releasing locks and let T2 continue.

On a slave, this avoids the unfortunate situation where T1 is waiting for T2
to join the group commit, but T2 is waiting for T1 to release locks, causing
no work to be done for the duration of the --binlog-commit-wait-usec timeout.

(The heuristic seems reasonable on the master as well, so it is enabled for
all transactions, not just replication transactions).
2015-03-13 14:01:52 +01:00
Alexander Barkov
bc902a2bfc MDEV-7387 [PATCH] Alter table xxx CHARACTER SET utf8, CONVERT TO CHARACTER SET latin1 should fail
A contribution from Daniel Black, with minor additional enhancements.
2015-03-13 16:12:54 +04:00
Kristian Nielsen
ed04c40b01 MDEV-5289: master server starts slave parallel threads
Delay spawning parallel replication worker threads until a slave SQL
thread is running, and de-spawn them when the last SQL thread stops.

This is especially useful to avoid needless threads on a master in a
setup where same my.cnf is used on masters and slaves.
2015-03-11 09:18:16 +01:00
Jan Lindström
a7fd11b31d MDEV-7685: MariaDB - server crashes when inserting more rows than
available space on disk

Add error handling when disk full situation happens and
intentionally bring server down with stacktrace because
on all cases InnoDB can't continue anyway.
2015-03-09 18:33:25 +02:00
Elena Stepanova
ec16d1b62f MDEV-7107 Sporadic test failure in multi_source.multisource
Extend show_slave_status.inc to run SHOW ALL SLAVES STATUS and
SHOW SLAVE 'name' STATUS on demand, and make the test use
the include file instead of direct SHOW statements
2015-03-09 15:42:26 +02:00
Kristian Nielsen
96784eb106 MDEV-7668: Intermediate master groups CREATE TEMPORARY with INSERT, causing parallel replication failure
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.

In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.

More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.

This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.

(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).

Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.

This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
2015-03-09 13:17:37 +01:00
Jan Lindström
040027c888 MDEV-7627 :Some symbols in table name can cause to Error Code: 1050
when created FK

Analysis: Table name is on filename charset but foreign key
identifiers are not. This lead incorrect foreign key
identifier number to be used.

Fix: Convert foreign key identifier to filename charset before
comparing it to table name when largest foreign key identifier
number is resolved.
2015-03-09 09:47:25 +02:00
Elena Stepanova
6fc0a8af24 MDEV-7187 perfschema.aggregate fails sporadically in buildbot
During slow execution, e.g. under valgrind, there was a chance
that Aria checkpoint would happen while P_S tables were being
queried; it could cause different data in joined P_S, and
thus combinations of results that the test did not expect.

Fixed by disabling Aria checkpoints for the test.
2015-03-08 23:12:19 +02:00
Sergei Golubchik
c0af8213fe MDEV-7669 tmp_table_count-7586 fails in ps and embedded
disable a broken test, pending a proper fix
2015-03-06 20:49:21 +01:00
Sergei Golubchik
5f510a9175 Merge branch '5.5' into 10.0 2015-03-06 18:41:32 +01:00
Sergei Golubchik
d7d19071d2 MDEV-6838 Using too big key for internal temp tables
update test results after the fix
2015-03-06 17:03:46 +01:00
Sergei Golubchik
12d87c3ba5 MDEV-7659 buildbot may leave stale mysqld
safe_process puts its children (mysqld, in this case) into a separate
process group, to be able to kill it all at once.

buildslave kills mtr's process group when it loses connection to
the master.

result? buildslave kills mtr and safe_process, but leaves stale
mysqld processes in their own process groups.

fix: put safe_process itself into a separate process group, then
buildslave won't kill it and safe_process will kill mysqld'd
and itself when it will notice that the parent mtr no longer exists.
2015-03-06 11:15:55 +01:00
Jan Lindström
206b111b11 MDEV-7672: Crash creating an InnoDB table with foreign keys
Analysis: after a red-black-tree lookup we use node withouth
checking did lookup succeed or not. This lead to situation
where NULL-pointer was used.

Fix: Add additional check that found node from red-back-tree
is valid.
2015-03-06 11:19:23 +02:00
Kristian Nielsen
7dee7a036a GTID: Add missing test of reconnecting into out-of-order binlog. 2015-03-05 09:41:01 +01:00
Kristian Nielsen
3ef0b9b235 Merge MDEV-6589 and MDEV-6403 into 10.0. 2015-03-04 13:36:54 +01:00
Kristian Nielsen
78c74dbe30 MDEV-6403: Temporary tables lost at STOP SLAVE in GTID mode if master has not rotated binlog since restart
The binlog contains specially marked format description events to mark
when a master restart happened (which could have caused temporary
tables to be silently dropped). Such events also cause slave to close
temporary tables.

However, there was a bug that if after this, slave re-connects to the
master in GTID mode, the master can send an old format description
event again. If temporary tables are closed when such event is seen
for the second time, it might drop temporary tables created after that
event, and cause replication failure.

With this patch, the restart flag of the format description event is
cleared by the master when it is sent to the slave in a subsequent
connection, to avoid the errorneous temp table close.
2015-03-04 13:36:29 +01:00
Kristian Nielsen
ad0d203f2e MDEV-6589: Incorrect relay log start position when restarting SQL thread after error in parallel replication
The problem occurs in parallel replication in GTID mode, when we are using
multiple replication domains. In this case, if the SQL thread stops, the
slave GTID position may refer to a different point in the relay log for each
domain.

The bug was that when the SQL thread was stopped and restarted (but the IO
thread was kept running), the SQL thread would resume applying the relay log
from the point of the most advanced replication domain, silently skipping all
earlier events within other domains. This caused replication corruption.

This patch solves the problem by storing, when the SQL thread stops with
multiple parallel replication domains active, the current GTID
position. Additionally, the current position in the relay logs is moved back
to a point known to be earlier than the current position of any replication
domain. Then when the SQL thread restarts from the earlier position, GTIDs
encountered are compared against the stored GTID position. Any GTID that was
already applied before the stop is skipped to avoid duplicate apply.

This patch should have no effect if multi-domain GTID parallel replication is
not used. Similarly, if both SQL and IO thread are stopped and restarted, the
patch has no effect, as in this case the existing relay logs are removed and
re-fetched from the master at the current global @@gtid_slave_pos.
2015-03-04 13:36:04 +01:00
Vicențiu Ciorbaru
45b6edb158 MDEV-6838: Using too big key for internal temp tables
This bug manifests due to wrong computation and evaluation of
keyinfo->key_length. The issues were:
* Using table->file->max_key_length() as an absolute value that must not be
  reached for a key, while it represents the maximum number of bytes
  possible for a table key.
* Incorrectly computing the keyinfo->key_length size during
  KEY_PART_INFO creation. The metadata information regarding the key
  such the field length (for strings) was added twice.
2015-02-28 23:58:05 +02:00
Kristian Nielsen
aa845d123c MDEV-6391: GTID binlog state not recovered if mariadb-bin.state is removed
When the server starts up, check if the master-bin.state file was lost.
If it was, recover its contents by scanning the last binlog file, thus
avoiding running with a corrupt binlog state.
2015-02-27 14:34:52 +01:00
Vicențiu Ciorbaru
ec4ff9a2e7 MDEV-7586: Merged derived tables/VIEWs increment created_tmp_tables
Temporary table count fix. The number of temporary tables was increased
when the table is not actually created. (when do_not_open was passed
as TRUE to create_tmp_table).
2015-02-26 23:09:54 +02:00
Sergei Golubchik
5c66abf0b0 Merge remote-tracking branch 'origin/10.0' into 10.0 2015-02-25 16:34:33 +01:00
Sergey Petrunya
4a3e94e025 MDEV-7413: optimizer_use_condition_selectivity > 2 crashes 10.0.15+maria-1~wheezy
Add a testcase. The bug itself was fixed by the fix for MDEV-7316.
2015-02-25 16:58:36 +03:00
Alexander Barkov
f825b5a4ee MDEV-7629 Regression: Bit and hex string literals changed column names in 10.0.14 2015-02-25 14:13:32 +04:00
Kristian Nielsen
a227cf8046 MDEV-7335: Potential parallel slave deadlock with specific binlog corruption
If somehow the COMMIT or XID event in an event group was missing, the code in
parallel replication to handle this was not sufficient, leading to server
deadlock.
2015-02-24 14:39:15 +01:00
Kristian Nielsen
41cfdc838e Add error handling on realpath() call.
(Without this, it happened for me that realpath() failed returning
undef for the default vardir. This in turn caused mysql-test-run.pl to
delete the source mysql-test/ directory.)

Backport from 10.1, it's not nice to get one's source directory nuked
by a rouge mysql-test-run.
2015-02-23 13:36:52 +01:00
Kristian Nielsen
b5d6aa5517 MDEV-7310: last_commit_pos_offset set to wrong value after binlog rotate in group commit
When the binlog was rotated due to @@max_binlog_size, the values of the
binlog_shapshot_file and binlog_snapshot_position were inconsistent in case of
non-transactional DML. The position was refering to the old file, while the
filename was of the new file after rotation. This patch makes them consistent
by making sure the position is also refering to the new file.
2015-02-23 13:27:51 +01:00
Sergei Golubchik
f2cb45daf3 Merge remote-tracking branch 'origin/10.0' into 10.0 2015-02-22 21:45:24 +01:00
Sergei Golubchik
3653de82e5 test failure on labrador: account for a different errno on Mac OS X 2015-02-22 12:54:52 +01:00