Commit graph

34130 commits

Author SHA1 Message Date
Kristian Nielsen
7671fd70c0 MDEV-7080: rpl.rpl_gtid_crash fails sporadically in buildbot
The real problem here was inconsistent handling of entry->commit_errno in
MYSQL_BIN_LOG::write_transaction_or_stmt(). Some return paths were setting it
to the value of errno, some where not. And the setting was redundant anyway,
as it is set consistently by the caller.

Fix by consistently setting it in the caller, and not in each return path in
the function.

The test failure happened because a DBUG_EXECUTE_IF() used in the test case
set an entry->commit_errno that was immediately overwritten in the caller with
whatever happened to be the value of errno. This could lead to different error
message in the .result file.
2014-11-17 08:53:42 +01:00
Kristian Nielsen
26b1113032 MDEV-6917: Parallel replication: "Commit failed due to failure of an earlier commit on which this one depends", but no prior failure seen
This bug was seen when parallel replication experienced a deadlock between
transactions T1 and T2, where T2 has reached the commit phase and is waiting
for T1 to commit first. In this case, the deadlock is broken by sending a kill
to T2; that kill error is then later detected and converted to a deadlock
error, which causes T2 to be rolled back and retried.

The problem was that the kill caused ha_commit_trans() to errorneously call
wakeup_subsequent_commits() on T3, signalling it to abort because T2 failed
during commit. This is incorrect, because the error in T2 is only a temporary
error, which will be resolved by normal transaction retry. We should not
signal error to the next transaction until we have executed the code that
handles such temporary errors.

So this patch just removes the calls to wakeup_subsequent_commits() from
ha_commit_trans(). They are incorrect in this case, and they are not needed in
general, as wakeup_subsequent_commits() must in any case be called in
finish_event_group() to wakeup any transactions that may have started to wait
after ha_commit_trans(). And normally, wakeup will in fact have happened
earlier, either from the binlog group commit code, or (in case of no
binlogging) after the fast part of InnoDB/XtraDB group commit.

The symptom of this bug was that replication would break on some transaction
with "Commit failed due to failure of an earlier commit on which this one
depends", but with no such failure of an earlier commit visible anywhere.
2014-11-13 11:01:31 +01:00
Kristian Nielsen
3dcd01e5e6 MDEV-7065: Incorrect relay log position in parallel replication after retry of transaction
The retry of an event group in parallel replication set the wrong value for
the end log position of the event that was retried
(qev->future_event_relay_log_pos). It was too large by the size of the event,
so it pointed into the middle of the following event.

If the retry happened in the very last event of the event group, _and_ the SQL
thread was stopped just after successfully retrying that event, then the SQL
threads's relay log position would be left incorrect. Restarting the SQL
thread could then try to read events from a garbage offset in the relay log,
usually leading to an error about not being able to read the event.
2014-11-13 10:46:09 +01:00
Kristian Nielsen
d08b893b39 MDEV-6775: Wrong binlog order in parallel replication: Intermediate commit
The code in binlog group commit around wait_for_commit that controls commit
order, did the wakeup of subsequent commits early, as soon as a following
transaction is put into the group commit queue, but before any such commit has
actually taken place. This causes problems with too early wakeup of
transactions that need to wait for prior to commit, but do not take part in
the binlog group commit for one reason or the other.

This patch solves the problem, by moving the wakeup to happen only after the
binlog group commit is completed.

This requires a new solution to ensure that transactions that arrive later
than the leader are still able to participate in group commit. This patch
introduces a flag wait_for_commit::commit_started. When this is set, a waiter
can queue up itself in the group commit queue.

This way, effectively the wait_for_prior_commit() is skipped only for
transactions that participate in group commit, so that skipping the wait is
safe. Other transactions still wait as needed for correctness.
2014-11-13 10:31:20 +01:00
Kristian Nielsen
ecc33da2ca Fix a confusing error message in the testsuite 2014-11-13 09:49:33 +01:00
Jan Lindström
0f32299437 MDEV-7035: Remove innodb_io_capacity setting depending on
setting of innodb_io_capacity_max

(a) Changed the behaviour so that if you set innodb_io_capacity to a 
value > innodb_io_capacity_max that the value is accepted AND 
that innodb_io_capacity_max = innodb_io_capacity * 2.

(b) If someone wants to reduce innodb_io_capacity_max and 
reduce it below innodb_io_capacity then innodb_io_capacity 
should be reduced to the same level as innodb_io_capacity_max.

In both cases give a warning to user.
2014-11-13 13:24:26 +02:00
Jan Lindström
da52110497 MDEV-7070: rpl.rpl_innodb_bug68220 fails in buildbot 2014-11-12 16:14:08 +02:00
Elena Stepanova
a1dfaa28a1 MDEV-7073 main.information_schema and main.information_schema_all_engines fail in buildbot on a build without perfschema
main.information_schema: added a condition to the query to exclude perfschema tables
main.information_schema_all_engines: added a call to the include file to check for the presence of perfschema
2014-11-12 06:27:56 +04:00
Elena Stepanova
aee7e67101 MDEV-7075 perfschema.mks_timer-6258 test not skipped on builds without perfschema
Added a call for the include file
2014-11-12 05:40:21 +04:00
Sergey Petrunya
9578a332a2 Fix buildbot failure: make selectivity.test and selectivity_innodb.test work when
table names are case-insensitive.
2014-11-11 19:59:31 +03:00
Alexander Barkov
9e8202013a MDEV-6965 non-captured group \2 in regexp_replace 2014-11-10 16:43:27 +04:00
Jan Lindström
080fdbf937 MDEV-4396: Fix innodb.innodb_bug14676111 test. 2014-11-03 15:47:57 +02:00
Sergei Golubchik
2160646c1d 5.5 merge 2014-11-03 17:47:37 +01:00
Alexander Barkov
d1ca1c1fae MDEV-7001 Bad result for NOT NOT STRCMP('a','b') and NOT NOT NULLIF(2,3)
The bug is not very important per se, but it was helpful to move
Item_func_strcmp out of Item_bool_func2 (to Item_int_func),
for the purposes of "MDEV-4912 Add a plugin to field types (column types)".
2014-11-02 01:08:09 +04:00
Kristian Nielsen
bad5fdec18 Fix sporadic test failure in main.processlist
The test runs a query in one thread, then in another queries the processlist
and expects to find the first thread in the COM_SLEEP state. The problem is
that the thread signals completion to the client before changing to COM_SLEEP
state, so there is a window where the other thread can see the wrong state.

A previous attempt to fix this was ineffective. It set a DEBUG_SYNC to handle
proper waiting, but unfortunately that DEBUG_SYNC point ended up triggering
already at the end of SET DEBUG_SYNC=xxx, so the wait was ineffective.

Fix it properly now (hopefully) by ensuring that we wait for the DEBUG_SYNC
point to trigger at the end of the SELECT SLEEP(), not just at the end of
SET DEBUG_SYNC=xxx.
2014-10-31 12:48:17 +01:00
Sergey Petrunya
e4521f8cae Merge 2014-10-29 15:20:46 +03:00
Sergey Petrunya
30b28babdc Merge 5.3->5.5 2014-10-29 13:22:48 +03:00
Igor Babaev
100b10d8ef Fixed bug mdev-6843.
The function  get_column_range_cardinality() returned a wrong result for any column
containing only null values.
2014-10-28 22:31:52 -07:00
Igor Babaev
2d088e265c Merge 2014-10-28 16:31:26 -07:00
Sergey Petrunya
a8341dfd6e MDEV-6879: Dereference of NULL primary_file->table in DsMrr_impl::get_disk_sweep_mrr_cost()
(Backport to 5.3)
(Attempt #2)
- Don't attempt to use BKA for materialized derived tables. The 
  table is neither filled nor fully opened yet, so attempt to 
  call handler->multi_range_read_info() causes crash.
2014-10-29 01:46:05 +03:00
Sergey Petrunya
9cb002b359 MDEV-6878: Use of uninitialized saved_primary_key in Mrr_ordered_index_reader::resume_read()
(Backport to 5.3)
(variant #2, with fixed coding style)
- Make Mrr_ordered_index_reader::resume_read() restore index position 
  only if it was saved before with Mrr_ordered_index_reader::interrupt_read().
2014-10-29 01:37:58 +03:00
Sergey Petrunya
94c8f33569 MDEV-6888: Query spends a long time in best_extension_by_limited_search with mrr enabled
- TABLE::create_key_part_by_field() should not set PART_KEY_FLAG in field->flags
  = The reason is that it is used by hash join code which calls it to create a hash
    table lookup structure. It doesn't create a real index.
  = Another caller of the function is TABLE::add_tmp_key(). Made it to set the flag itself.

- The differences in join_cache.result could also be observed before this patch: one
  could put "FLUSH TABLES" before the queries and get exactly the same difference.
2014-10-29 01:20:45 +03:00
Igor Babaev
592b7fbac9 Fixed bug mdev-6325.
Field::selectivity should be set for all fields used in range conditions.
2014-10-28 14:33:31 -07:00
Jan Lindström
dbc9123f1c Fix test failure. 2014-10-27 11:03:17 +02:00
Jan Lindström
caeffc7a7d MDEV-6926: innodb_rows_updated is misleading on slav
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6

This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220

Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.

innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.

dd2d11be7a
2014-10-26 07:22:51 +02:00
Jan Lindström
60e995cfec MDEV-6930: Make innodb_max_dirty_pages_pct my.cnf variable a double
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6

This change is to fix: http://bugs.mysql.com/62534

This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.

This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.

Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct

Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
2014-10-25 09:24:39 +03:00
Sergey Petrunya
135bf1fcfb Merge 2014-10-21 00:02:24 +04:00
Sergey Petrunya
1a996bde1e MDEV-6879: Dereference of NULL primary_file->table in DsMrr_impl::get_disk_sweep_mrr_cost()
(Attempt #2)
- Don't attempt to use BKA for materialized derived tables. The 
  table is neither filled nor fully opened yet, so attempt to 
  call handler->multi_range_read_info() causes crash.
2014-10-20 23:35:34 +04:00
Alexander Barkov
81194d9203 MDEV-6649 Different warnings for TIME and TIME(N) when @@old_mode=zero_date_time_cast 2014-10-20 16:42:00 +04:00
Kristian Nielsen
848d1166b6 Attempt to fix a failure in test case innodb.innodb_information_schema seen occasionally in Buildbot.
The test case waits for other threads to complete, but the wait is only 2
seconds. This is likely to sometimes be too little on our heavily loaded
buildbot VMs, that can easily stall for more than 2 seconds from time to time.

So let's try to increase the timeout (to about 40 seconds) and see if it
helps.
2014-10-29 15:10:02 +01:00
Kristian Nielsen
2fdb88e870 Fix a spurious test failure in rpl.rpl_show_slave_hosts
The test case runs SHOW SLAVE HOSTS. The output of this is only stable after
all slaves have had time to register with the master; this happens
asynchroneously.

The test was waiting for the slave with server_id=3 to appear in the output,
but it was missing a similar wait for server_id=2. Thus, if server_id=2 was
much slower to connect for some reason, it could be missing from the output,
causing the test to fail.
2014-10-29 14:44:40 +01:00
Kristian Nielsen
8e63a7fe27 Yet another attempt at fixing random failures in test case main.myisam-metadata
I think I finally found the problem, managed to reproduce locally using a
sleep in the test case to simulate the particular race condition that causes
the test to fail often in Buildbot.

The test starts an ALTER TABLE that does repair by sort in one thread, then
another thread waits for the sort to be visible in SHOW PROCESSLIST and runs a
SHOW statement in parallel.

The problem happens when the sort manages to run to completion before the
other thread has the time to look at SHOW PROCESSLIST. In this case, the wait
times out because the state looked for has already passed.

Earlier I added some DEBUG_SYNC to prevent this race, but it turns out that
DEBUG_SYNC itself changes the state in the processlist. So when the debug sync
point was hit, the processlist was showing the wrong state, so the wait would
still time out.

Fixed now by looking for the processlist to contain either the "Repair by
sorting" state or the debug sync wait stage.

Also clean up previous attempts to fix it. Set the wait timeout back to
reasonable 60 seconds, and simplify the DEBUG_SYNC operations to work closer
to how the original test case was intended.
2014-10-29 13:39:22 +01:00
Sergey Petrunya
ad66fafbbb Merge 2014-10-29 14:22:25 +03:00
Kristian Nielsen
5b99f4d330 Increase timeout for check-testcase and friends, in an attempt to cure some random buildbot test failures. 2014-10-28 12:45:39 +01:00
Kristian Nielsen
bd3d16754b Increase wait timeout in test main.myisam-metadata, in an attempt to get rid of Buildbot random failures. 2014-10-22 15:05:59 +02:00
Kristian Nielsen
64af1ecc20 Fix two races in test main.processlist that could cause random failures (seen in Buildbot)
1. Do not use NULL `info' field in processlist to select the thread of
interest. This can fail if the read of processlist ends up happening after
REAP succeeds, but before the `info' field is reset. Instead, select on the
CONNECTION_ID(), making sure we still scan the whole list to trigger the same
code as in the original test case.

2. Wait for the query to really complete before reading it in the
processlist. When REAP returns, it only means that ack has been sent to
client, the reset of query stage happens a bit later in the code.
2014-10-22 13:51:33 +02:00
Kristian Nielsen
b27a09561f Attempt to fix a rare random test error in main.information_schema.
Add missing REAP to the test.

A later test failed with strange incorrect values for COM_SELECT
in information_schema.global_status. Since global_status is updated
at the end of session activity, it seems appropriate to ensure that
all background connections have completed before accessing it.

(I checked that the original bug still triggers the test case after
the modification with REAP).
2014-10-21 15:23:40 +02:00
Kristian Nielsen
2b4445e60e Fix test failure in perfschema.myisam_file_io when perfschema is not compiled into the server. 2014-10-20 09:36:41 +02:00
Sergey Petrunya
af4d469a8d MDEV-6879: Dereference of NULL primary_file->table in DsMrr_impl::get_disk_sweep_mrr_cost()
- Don't attempt to use BKA for materialized derived tables. The 
  table is neither filled nor fully opened yet, so attempt to 
  call handler->multi_range_read_info() causes crash.
2014-10-16 22:58:08 +04:00
Sergey Petrunya
8925b4a935 MDEV-6878: Use of uninitialized saved_primary_key in Mrr_ordered_index_reader::resume_read()
- Make Mrr_ordered_index_reader::resume_read() restore index position 
  only if it was saved before with Mrr_ordered_index_reader::interrupt_read().
2014-10-16 17:57:13 +04:00
Sergey Vojtovich
f09a8ba6a0 MDEV-6872 - innodb.innodb fails on PPC64
innodb_buffer_pool_pages_total depends on page size. On Power8 it is 65k
compared to 4k on Intel. As we round allocations on page size we may get
slightly more memory for buffer pool.
2014-10-15 12:11:34 +04:00
Sergey Petrunya
b261ec393a MDEV-6484: Assertion `tab->ref.use_count' failed on query with joins, constant table, multi-part key
- test_if_skip_sort_order()/create_ref_for_key() may change table 
  access from EQ_REF(index1) to REF(index2). 
- Doing so doesn't make much sense from optimization POV, but since 
  they are doing it, they should update tab->read_record.unlock_row
  accordingly.
2014-10-14 15:11:06 +04:00
Sergei Golubchik
ba85a008e4 merge 2014-10-10 20:59:06 +02:00
Sergei Golubchik
1b75bed00f 5.5.40+ merge 2014-10-09 10:30:11 +02:00
Sergey Vojtovich
6f762cdd6c Backport from 10.0:
revno: 4301
committer: Alexey Botchkov <holyfoot@askmonty.org>
branch nick: 10exp
timestamp: Tue 2014-07-22 15:28:15 +0500
message:
  gis-precise.test fixed to work on Power8.
------------------------------------------------------------
revno: 4295
committer: Alexey Botchkov <holyfoot@askmonty.org>
branch nick: 10exp
timestamp: Mon 2014-07-21 13:07:48 +0500
message:
  gis-precise test fixed to pass on Power8.
2014-10-08 18:10:31 +04:00
Sergei Golubchik
9329467e79 don't run privilege checking tests in embedded 2014-10-08 09:24:41 +02:00
Sergei Golubchik
104771de9a decimal: *correct* implementation of ROUND_UP at last 2014-10-08 00:46:10 +02:00
Sergei Golubchik
fc58ba6c76 MDEV-5553 A view or procedure with a non existing definer can block "SHOW TABLE STATUS" with an unclear error message
Don't double-check privileges for a column in the GROUP BY that refers to
the same column in SELECT clause. Privileges were already checked for SELECT clause.
2014-10-07 11:55:39 +02:00
Sergei Golubchik
e5bc21af37 MDEV-4813 Replication fails on updating a MEMORY table with an index using btree
skip NULL VARCHAR key parts like it's done elsewhere
2014-10-07 10:54:14 +02:00
Sergei Golubchik
d2e808025a fixes for decimal type 2014-10-07 10:53:43 +02:00