Commit graph

3329 commits

Author SHA1 Message Date
Sujatha
7df07c7666 MDEV-20953: binlog_encryption.rpl_corruption failed in buildbot due to wrong error code
Problem:
========
CURRENT_TEST: binlog_encryption.rpl_corruption

mysqltest: In included file "./include/wait_for_slave_io_error.inc":
...
At line 72: Slave stopped with wrong error code
**** Slave stopped with wrong error code: 1743 (expected 1595,1913) ****

Analysis:
========
The test emulates the corruption at the various stages of replication for
example in binlog file, in network and in relay log etc. It verifies that all
corruption cases are handled through appropriate error messages.

The test cases which emulate network failure expect following errors.
--ER_SLAVE_RELAY_LOG_WRITE_FAILURE (1595)
--ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE (1743)

Ideally test should expect error codes as 1595 and 1743.
But the test actually waits on incorrect error code 1595,1913

Fix:
===
Added appropriate error code for 'ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE'.
Replaced 1913 with 1743.
2019-11-12 16:31:08 +05:30
Andrei Elkin
13db50fc03 MDEV-19376 Repl_semi_sync_master::commit_trx assertion failure: ... || !m_active_tranxs->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos)
The assert indicates that the current transaction got caught uncleaned from
the semisync master's cache when it is signaled to proceed upon its
ack receive.

The reason of missed cleanup turns out to be a flaw in the gtid
connect mode.
A submitted by connecting slave value of its last received event's
binlog file *name* was adopted into
{{Repl_semi_sync_master::m_reply_file_name}} as a part of semisync
initialization.

Notice that the initialization still refines the position part of the
submitted last received event's binlog coordinates.
The master side binlog filename:pos refinement is
specific to the gtid connect mode for purpose of computing the latest
binlog file to resume slave feeding from.
Effectively in the gtid connect mode the computed resumption filename:pos
may appear smaller in which case a new post-connect time committing
transaction may be logged with its filename:pos also less than the
submitted coordinates and that triggers the assert.

Fixed with making the semisync initialization to use the refined filename:pos.
It is guaranteed to be less than any new generated transaction's binlog:pos.
2019-11-10 16:16:37 +02:00
Sujatha
9b80f9300d MDEV-20645: Replication consistency is broken as workers miss the error notification from an earlier failed group.
Analysis:
========
In general if there are three groups.
1 - Inserts 32 which fails due to local entry '32' on slave.
2 - Inserts 33
3 - Inserts 34

Each group considers itself as a waiter and it waits for prior group 'waitee'.
This is done in 'register_wait_for_prior_event_group_commit'. If there is no
other parallel group being scheduled then no waitee will be there.

Let us assume 3 groups are being scheduled in parallel.

3-> waits for 2-> waits for->1

'1' upon completion it checks is there any registered subsequent waiter. If
so it wakes up the subsequent waiter with its execution status. This execution
status is stored in wakeup_error.

If '1' failed then it sends corresponding wakeup_error to 2. Then '2' aborts
and it propagates error to '3'.  So all further commits are aborted.  This
mechanism works only when all transactions reach a stage where they are
waiting for their prior commit to complete.

In case of optimistic following scenario occurs.

1,2,3 are scheduled in parallel.

3 - Reaches group_commit_code waits for 2 to complete.
1 - errors out sets stop_on_error_sub_id=1.

When a group execution results in error its corresponding sub_id is set to
'stop_on_error_sub_id'. Any new groups queued for execution will check if
their sub_id is > stop_on_error_sub_id.  If it is true their execution will be
skipped as prior group execution failed.  'skip_event_group=1' will be set.
Since the execution of SQL thread is about to stop we just skip execution of
all the following event groups.  We still do all the normal waiting and wakeup
processing between the event groups as a simple way to ensure that everything
is stopped and cleaned up correctly.

Upon error '1' transaction checks for registered waiters. Since no one is
there it simply goes away.

2 - Starts the execution. It checks do I have a waitee.

Since wait_commit_sub_id == entry->last_committed_sub_id no waitee is set.

Secondly: 'entry->stop_on_error_sub_id' is set by '1'st execution.  Now
'handle_parallel_thread' code checks if the current group 'sub_id' is greater
than the 'sub_id' set within 'stop_on_error_sub_id'.

Since the above is true 'skip_event_group=true' is set.  Simply call
'wait_for_prior_commit' to wakeup all waiters.  Group '2' didn't had any
waitee and its execution is skipped.  Hence its wakeup_error=0.It sends a
positive wakeup signal to '3'. Which commits. This results in a missed
transaction. i.e 33 is missed and 34 is committed.

Fix:
===
When a worker learns that an earlier transaction execution has failed, and it
should not proceed for further execution, it should mark its own execution
status as failed so that it alerts its followers to abort as well.
2019-09-30 13:22:37 +05:30
Sachin
0e38cd37c7 MDEV-20137 rpl.mdev_17588 fails in buildbot with "Table doesn't exist"
Fix the test case.
2019-09-04 17:45:25 +05:30
Sachin
284c72eacf MDEV-17614 INSERT on dup key update is replication unsafe
Problem:-
When mysql executes INSERT ON DUPLICATE KEY INSERT, the storage engine checks
if the inserted row would generate a duplicate key error. If yes, it returns
the existing row to mysql, mysql updates it and sends it back to the storage
engine.When the table has more than one unique or primary key, this statement
is sensitive to the order in which the storage engines checks the keys.
Depending on this order, the storage engine may determine different rows
to mysql, and hence mysql can update different rows.The order that the
storage engine checks keys is not deterministic. For example, InnoDB checks
keys in an order that depends on the order in which indexes were added to
the table. The first added index is checked first. So if master and slave
have added indexes in different orders, then slave may go out of sync.

Solution:-
Make INSERT...ON DUPLICATE KEY UPDATE unsafe while using stmt or mixed format
When there is more then one unique key.
Although there is two exception.
  1. Auto Increment key is not counted because Innodb will get gap lock for
    failed Insert and concurrent insert will get a next increment value. But if
    user supplies auto inc value it can be unsafe.
  2. Count only unique keys for which insertion is performed.

So this patch also addresses the bug id #72921
2019-08-09 19:36:56 +05:30
Thirunarayanan Balathandayuthapani
47f8a18fec MDEV-20247 Replication hangs with "preparing" and never starts
- The commit ab6dd77408 wrongly sets the
condition inside innobase_srv_conc_enter_innodb().  Problem is that
InnoDB makes the thread to sleep indefinitely if it is a replication
slave thread.

Thanks to Sujatha Sivakumar for contributing the replication test case.
2019-08-07 12:35:04 +05:30
Sujatha
eef7540405 MDEV-18930: Failed CREATE OR REPLACE TEMPORARY not written into binary log makes data on master and slave diverge
Problem:
=======
Failed CREATE OR REPLACE TEMPORARY TABLE statement which dropped the table but
failed at a later stage of creation of temporary table is not written to
binarylog in row based replication. This causes the slave to diverge.

Analysis:
========
CREATE OR REPLACE statements work as shown below.

CREATE OR REPLACE TABLE table_name (a int);
is basically the same as:

DROP TABLE IF EXISTS table_name;
CREATE TABLE table_name (a int);

Hence every CREATE OR REPLACE TABLE command which dropped the table should be
written to binary log, even when following CREATE TABLE part fails. In order
to achieve this, during the execution of CREATE OR REPLACE command, when a
table is dropped 'thd->log_current_statement' flag is set. When table creation
results in an error within 'mysql_create_table' code, the error handling part
looks for this flag. If it is set the failed CREATE OR REPLACE statement is
written into the binary log inspite of error. This ensure that slave doesn't
diverge from the master. In case of row based replication the error handling
code returns very early, if the table is of type temporary. This is done based
on the assumption that temporary tables are not replicated in row based
replication.

It fails to handle the cases where a temporary table was created as part of
statement based replication at an earlier stage and the binary log format was
changed to row because of an unsafe statement. In this case when a CREATE OR
REPLACE statement is executed on this temporary table it will dropped but the
query will not be written to binary log. Hence slave diverges.

Fix:
===
In error handling code check the return status of create table operation. If
it is successful and replication mode is row based and table is of type
temporary then return. Other wise proceed further to the code which checks for
thd->log_current_statement flag and does appropriate logging.
2019-08-05 14:34:31 +05:30
Sujatha
8d0dabc56b MDEV-20091 DROP TEMPORARY table is logged despite no CREATE was logged
Fixing post push test issues.

Recorded the result files for following:
rpl_stm_drop_create_temp_table.result
rpl_mixed_drop_create_temp_table.result.
2019-07-25 14:24:04 +05:30
Sujatha
e32f29b7f3 MDEV-20091 DROP TEMPORARY table is logged despite no CREATE was logged
MDEV-5589 commit set up a policy to skip DROP TEMPORARY TABLE binary logging
in case the target table has not been "CREATEed" in binlog (no CREATE
Query-log-event was logged into the binary log).

It turns out that

1. the rule did not cover non-existing table DROPped with IF-EXISTS clause.
   The logged-create knowledge for the non-existing one does not even need
   MDEV-5589 patch, and

2. connection close disobeys it to trigger automatic DROP-IF-EXISTS
   binlogging.

Either 1 or 2 or even both is/are also responsible for unexpected binlog
records observed in MDEV-17863, actually rendering a referred
@@global.read_only irrelevant as far as the described stored procedure
definition *and* the ROW binlog-format are concerned.
2019-07-25 11:38:45 +05:30
Sachin
bdc961acc7 MDEV-17588 replicate-do filters cause errors when creating filtered-out tables on master with syntax unsupported on slave
Add only test case for MDEV , because MDEV-19653 already fixed the issue.
2019-07-10 11:51:43 +05:30
Sujatha
49ec78ab70 MDEV-19370: rpl.kill_race_condition failed in buildbot with Wrong value for slave parameter
Problem:
=======
Executing test with following options will result in test failure.

./mtr rpl.kill_race_condition{,,,,,,,,,,} --repeat=10 --par 12 --mem

Fix:
====
Test simulates applier thread kill scenario while applying a row event.  But it
doesn't wait for applier to catch the error stop.

Added :wait_for_slave_sql_error.inc to catch the error.

Test uses START SLAVE as a final step and doesn't wait for both threads to
start.

Added: start_slave.inc
2019-06-28 19:54:15 +03:00
Andrei Elkin
5c1f51835a MDEV-19904 poradic buildbot failures in rpl.rpl_semi_sync_uninstall_plugin
The test allowed non-deterministic execution thanks to unresetable status
var of Slave_connections.

Fixed with expecting a correct value for Slaves_connected.
2019-06-28 18:34:58 +03:00
Sujatha
b347396181 MDEV-11094: Blackhole table updates on slave fail when row annotation is enabled
Problem:
=======
rpl_blackhole.test fails when executed with following options
mysqld=--binlog_annotate_row_events=1, mysqld=--replicate_annotate_row_events=1

Test output:
------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
rpl.rpl_blackhole_bug 'mix'              [ pass ]    791
rpl.rpl_blackhole_bug 'row'              [ fail ]
Replicate_Wild_Ignore_Table
Last_Errno	1032
Last_Error	Could not execute Update_rows_v1 event on table test.t1; Can't find
record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's
master log master-bin.000001, end_log_pos 1510

Analysis:
=========
Enabling "replicate_annotate_row_events" on slave, Tells the slave to write
annotate rows events received from the master to its own binary log. The
received annotate events are applied after the Gtid event as shown below.
thd->query() will be set to the actual query received from the master, through
annotate event. Annotate_rows event should not be deleted after the event is
applied as the thd->query will be used to generate new Annotate_rows event
during applying the subsequent Rows events. After the last Rows event has been
applied, the saved Annotate_rows event (if any) will be deleted.

In balckhole engine all the DML operations are noops as they donot store any
data. They simply return success without doing any operation. But the existing
strictly expects thd->query() to be 'NULL' to identify that row based
replication is in use. This assumption will fail when row annotations are
enabled as the query is not 'NULL'. Hence various row based operations like
'update', 'delete', 'index lookup' will fail when row annotations are enabled.

Fix:
===
Extend the row based replication check to include row annotations as well.
i.e Either the thd->query() is NULL or thd->query() points to query and row
annotations are in use.
2019-05-29 15:18:52 +05:30
Andrei Elkin
aaf53ea0b6 MDEV-17948 Assertion `thd_killed(thd) || !m_active_tranxs ..
Simulation of a big-sized event in rpl.rpl_semi_sync_skip_repl did not clean
up after itself so screw the last binlog event offset which could jump
backwards.
The test is refined to rotate a binlog file with simulation and use the next
one for logics of the test incl master-slave synchonization.
2019-05-24 17:30:35 +03:00
Sujatha
47637a3dd1 MDEV-11095: rpl.rpl_row_mysqlbinlog test fails if row annotation enabled
Problem:
=======
Whel rpl.rpl_row_mysqlbinlog test is executed as shown below it fails with
result content mismatch.

perl mtr rpl_row_mysqlbinlog --mysqld=--binlog-annotate-row-events=1

Analysis:
=========
When row annotations are enabled the actual query is written into the binlog
which helps users to understand the query, even when row based replication is
enabled.

For example: Simple insert in row based replication looks like shown below.

#190402 16:31:27 server id 1  end_log_pos 526 	Annotate_rows:
#Q> insert into t values (10)
#190402 16:31:27 server id 1  end_log_pos 566 	Table_map: `test`.`t` mapped to number 19
# at 566
#190402 16:31:27 server id 1  end_log_pos 600 	Write_rows: table id 19 flags: STMT_END_F

BINLOG '
B0GjXBMBAAAAKAAAADYCAAAAABMAAAAAAAEABHRlc3QAAXQAAQMAAQ==
B0GjXBcBAAAAIgAAAFgCAAAAABMAAAAAAAEAAf/+CgAAAA==
'/*!*/;
# at 600

The test creates some binary log events and redirects them into a SQL file.
Executes RESET MASTER and sources the SQL file back on clean master and verifies
that the data is available. Please refer following steps.

../client/mysqlbinlog ./var/mysqld.1/data/master-bin.000001 > test.sql
../client/mysql -uroot -S./var/tmp/mysqld.1.sock -Dtest  < test.sql
../client/mysqlbinlog ./var/mysqld.1/data/master-bin.000001 -v > row.sql

When the row based replication specific SQL file is sourced once again on master
the newly generated binlog will treat the entire "BASE 64" encoded event as
query and write it into the binary log.

Output from 'row.sql':

#Q> BINLOG '
#Q> B0GjXBMBAAAAKAAAADYCAAAAABMAAAAAAAEABHRlc3QAAXQAAQMAAQ==
#Q> B0GjXBcBAAAAIgAAAFgCAAAAABMAAAAAAAEAAf/+CgAAAA==
#190402 16:31:27 server id 1  end_log_pos 657 	Table_map: `test`.`t` mapped to number 23
# at 657
#190402 16:31:27 server id 1  end_log_pos 691 	Write_rows: table id 23 flags: STMT_END_F

BINLOG '
B0GjXBMBAAAAKAAAAJECAAAAABcAAAAAAAEABHRlc3QAAXQAAQMAAQ==
B0GjXBcBAAAAIgAAALMCAAAAABcAAAAAAAEAAQH+CgAAAA==
### INSERT INTO `test`.`t`
### SET
###   @1=10
'/*!*/;
# at 691


This is expected behaviour as we cannot extract query from BASE 64 encoded
input. This causes more number of binary logs to be generated when the test is
executed with row annotations.

The following lines from test assumes that only two binary logs will contain
entire data.

 --echo --- Test 4 Second Remote test --
---exec $MYSQL_BINLOG --read-from-remote-server --user=root --host=127.0.0.1
	--port=$MASTER_MYPORT master-bin.000001 > $MYSQLTEST_VARDIR/tmp/remote.sql
---exec $MYSQL_BINLOG --read-from-remote-server --user=root --host=127.0.0.1
	--port=$MASTER_MYPORT master-bin.000002 >> $MYSQLTEST_VARDIR/tmp/remote.sql

In a case when row annotations are enabled the data gets spread across four
binary logs. As test uses only the first two binary log files, data available in
other binary logs gets missed. Hence test fails with result content mismatch as
less data is avaialble.

Fix:
====
Use "-to-the-last" option of "mysqlbinlog" tool which will ensure that all the
available binary log specific contents are included in .sql file.
2019-05-14 13:03:06 +05:30
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru
5543b75550 Update FSF Address
* Update wrong zip-code
2019-05-11 21:29:06 +03:00
Vladislav Vaintroub
e8778f1c7c MDEV-19265 Server should throw warning if event is created and event_scheduler = OFF 2019-04-28 12:49:59 +02:00
Sergei Golubchik
2ce52790ff Merge branch '5.5' into 10.1 2019-04-26 14:02:37 +02:00
Sergei Golubchik
33fe3b58cc test case for a previous commit 2019-04-25 21:43:27 +02:00
Sergei Golubchik
1a4746e128 Merge branch '5.5' into 10.1 2019-03-27 19:35:03 +01:00
Marko Mäkelä
a6585d5ce9 Merge 10.0 into 10.1 2019-03-27 11:56:08 +02:00
Sujatha Sivakumar
f2d549d8db MDEV-14784: Slave crashes in show_status_array upon running a trigger with
select from I_S

Problem:
========
When applier thread tries to access 'variable_name' of
INFORMATION_SCHEMA.SESSION_VARIABLES table through triggers, it results in an
abnormal exit of slave server.

Analysis:
========
At the time of replication of stored routines and triggers, their associated
security context will be sent by the master. The applier thread on the slave
server will use this information to set the required security context for the
execution of stored routines and triggers. This is achieved as follows.

->The stored routine object has a member named 'm_security_ctx' which holds the
  security context received from master.
->The applier thread's security_ctx is stored into a 'backup' object.

->Set the applier thread's security_ctx to 'm_security_ctx'.

->Upon the completion of stored routine execution restore the original security
  context of applier thread from the backup.

During the above process the 'm_security_ctx' object is not initialized
properly. Hence the 'external_user' of 'm_security_ctx' has invalid value for
this variable and accessing this variable results in abnormal exit of server.

Fix:
===
Invoke the Security_context::init() call from the constructor of stored routine
so that 'm_security_ctx' gets initialized properly.
2019-03-27 12:34:03 +05:30
Marko Mäkelä
1933cf98e8 Merge 5.5 into 10.0 2019-03-26 14:13:46 +02:00
sachin
778c525ff8 MDEV-17119 replicate_rewrite_db does not work for single chardatabase name
Fixed issue in logic.
2019-03-22 12:13:24 +05:30
Marko Mäkelä
213ece2f2e Merge 10.1 into 10.1
This is joint work with Oleksandr Byelkin.
2019-02-02 13:00:15 +02:00
Andrei Elkin
b22354680e merge 10.0 -> 10.1 to resolve MDEV-17803 conflicts. 2019-01-23 20:16:21 +02:00
Sergei Golubchik
7886a70ef9 MDEV-17421: mtr does not restart the server whose parameters were changed
remove tests that rely on specific execution order
2019-01-23 17:18:57 +01:00
Jan Lindström
2d098933ee
Merge pull request #880 from tempesta-tech/sysprg/MDEV-17421
MDEV-17421: mtr does not restart the server whose parameters were changed
2019-01-23 18:03:51 +02:00
Andrei Elkin
cce2b45c8f MDEV-17803 Row-based event is not applied when table map id is greater
32 bit int

Row-based slave applier could not parse correctly the table id when
the value exceeded the max of 32 bit unsigned int.
The reason turns out in that the being parsed value placeholder
was sized as 4 bytes.

The type is fixed to ulonglong.

Additionally the patch works around Rows_log_event::m_table_id 4 bytes
size on 32 bits platforms. In case of last_table_id value overflows
the 4 byte max, there won't be the zero value for m_table_id generated
and the first wrapped-around value is one, this is thanks to excluding
UINT_MAX32 + 1 from TABLE_SHARE::table_map_id.
2019-01-23 15:48:06 +02:00
Sergei Golubchik
aeefd26ecb Merge branch '10.0' into 10.1 2018-12-29 23:44:45 +01:00
Sachin
f16d4d4c6e MDEV-17720 slave_ddl_exec_mode=IDEMPOTENT does not handle DROP DATABASE
Relevant if exists flag are added for create database and drop database.
2018-12-19 23:47:26 +05:30
Sergei Golubchik
44f6f44593 Merge branch '10.0' into 10.1 2018-10-30 15:10:01 +01:00
Sergei Golubchik
f4b8b6b9a3 MDEV-15919 lower_case_table_names does not behave as expected
followup for e31e697f17

Fix the test not to fail on Mac OS X
(lower_case_table_names=0 prevents mysqld from starting on case insensitive
filesystem)
2018-10-29 21:44:38 +01:00
Sergei Golubchik
37ab7e4596 Merge branch '5.5' into 10.0 2018-10-27 20:46:38 +02:00
Sachin
e31e697f17 MDEV-15919 lower_case_table_names does not behave as expected(nor...
consistently) on Replication Slave

lower_case_table_names  0 -> 1 replication works, it's safe as long as
 mixed case names mapping to the lower case ones is one-to-one
2018-10-17 10:46:20 +05:30
Kristian Nielsen
bc2903e744 Merge branch 'gtid_table_garbage_rows' into 10.1 2018-10-13 19:31:42 +02:00
Julius Goryavsky
1eb364f8b3 MDEV-17421: mtr does not restart the server whose parameters were changed
If a mtr test runs multiple servers and only some of them get
restarted on whatever reason with new command-line parameters,
then subsequent mtr test may fail, because no cleanup is performed.
Replication and Galera test suites are affected.

In the mtr script, there is a server_need_restart function
that decides whether we need to start a new mysqld process before
the new (next) test. If the mysqld parameters were changed in the
previous test - not necessarily the parameters of the primary mysqld
server, maybe even the secondary server parameters - this function
decides to start a new mysqld process. But since it does not remove
the old (changed) parameters, the new process starts with the
parameters changed by the *previous* test.

To correct this error, we must delete the modified process
parameters after checking that they have been changed during
the previous test.

This patch also simplifies and makes more stable the
galera_drop_database test, during debugging of which this
problem was detected.

https://jira.mariadb.org/browse/MDEV-17421
2018-10-10 17:16:34 +02:00
Kristian Nielsen
2f4a0c5be2 Fix accumulation of old rows in mysql.gtid_slave_pos
This would happen especially in optimistic parallel replication, where there
is a good chance that a transaction will be rolled back (due to conflicts)
after it has executed record_gtid(). If the transaction did any deletions of
old rows as part of record_gtid(), those deletions will be undone as well.
And the code did not properly ensure that the deletions would be re-tried.

This patch makes record_gtid() remember the list of deletions done as part
of a transaction. Then in rpl_slave_state::update() when the changes have
been committed, we discard the list. However, in case of error and rollback,
in cleanup_context() we will instead put the list back into
rpl_global_gtid_slave_state so that the deletions will be re-tried later.

Probably fixes part of the cause of MDEV-12147 as well.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2018-10-07 18:59:52 +02:00
Sachin
865237e5af Fix rpl_parallel_optimistic_nobinlog failure on binlog 2018-10-01 15:15:34 +05:30
Oleksandr Byelkin
31081593aa Merge branch '11.0' into 10.1 2018-09-06 22:45:19 +02:00
Sergei Golubchik
09bc99fac9 cleanup: remove extra/rpl_tests/rpl_foreign_key.test 2018-09-04 09:49:53 +02:00
Oleksandr Byelkin
b4210f3640 Merge branch '10.0' into 10.1 2018-08-21 10:07:26 +02:00
Oleksandr Byelkin
bcc677bb72 Merge branch '5.5' into 10.0 2018-08-15 16:48:13 +02:00
sachin
68ebfb31f2 MDEV-16166 RBR breaks with HA_ERR_KEY_NOT_FOUND upon DELETE from table...
with spatial index

So the issue is since it is spatial index , at the time of searching index
for key (Rows_log_event::find_row) we use wrong field image we use
Field::itRAW while we should be using Field::itMBR
2018-08-04 01:39:35 +05:30
Sergei Golubchik
bb24663f5a MDEV-13577 slave_parallel_mode=optimistic should not report the mode's specific temporary errors
Revert 7bbe324fc1
Fix the bug differently (in log_event.cc)
Fix the test case to actually fail without the bug fix
2018-06-20 11:10:27 +02:00
Andrei Elkin
7bbe324fc1 MDEV-13577 slave_parallel_mode=optimistic should not report the mode's
specific temporary errors

The optimistic parallel slave's worker thread could face a run-time error due to
the algorithm's specifics which allows for conflicts like the reported
"Can't find record in 'table'".
A typical stack is like

{noformat}
#0  handler::print_error (this=0x61c00008f8a0, error=149, errflag=0) at handler.cc:3650
#1  0x0000555555e95361 in write_record (thd=thd@entry=0x62a0000a2208, table=table@entry=0x61f00008ce88, info=info@entry=0x7fffdee356d0) at sql_insert.cc:1944
#2  0x0000555555ea7767 in mysql_insert (thd=thd@entry=0x62a0000a2208, table_list=0x61b00012ada0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=<optimized out>, ignore=<optimized out>) at sql_insert.cc:1039
#3  0x0000555555efda90 in mysql_execute_command (thd=thd@entry=0x62a0000a2208) at sql_parse.cc:3927
#4  0x0000555555f0cc50 in mysql_parse (thd=0x62a0000a2208, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at sql_parse.cc:7449
#5  0x00005555566d4444 in Query_log_event::do_apply_event (this=0x61200005b9c8, rgi=<optimized out>, query_arg=<optimized out>, q_len_arg=<optimized out>) at log_event.cc:4508
#6  0x00005555566d639e in Query_log_event::do_apply_event (this=<optimized out>, rgi=<optimized out>) at log_event.cc:4185
#7  0x0000555555d738cf in Log_event::apply_event (rgi=0x61d0001ea080, this=0x61200005b9c8) at log_event.h:1343
#8  apply_event_and_update_pos_apply (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080, reason=<optimized out>) at slave.cc:3479
#9  0x0000555555d8596b in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080) at slave.cc:3623
#10 0x00005555562aca83 in rpt_handle_event (qev=qev@entry=0x6190000fa088, rpt=rpt@entry=0x62200002bd68) at rpl_parallel.cc:50
#11 0x00005555562bd04e in handle_rpl_parallel_thread (arg=arg@entry=0x62200002bd68) at rpl_parallel.cc:1258
{noformat}

Here {{handler::print_error}} computes whether to error log the
current error when --log-warnings > 1. The decision flag is consulted
bu {{my_message_sql()}} which can be eventually called.
In the bug case the decision is to log.
However in the optimistic mode slave applier case any conflict is
attempted to resolve with rollback and retry to success. Hence the
logging is at least extraneous.

The case is fixed with adding a new flag {{ME_LOG_AS_WARN}} which
{{handler::print_error}} may propagate further on through {{my_error}}
when the error comes from an optimistically running slave worker thread.

The new flag effectively requests the warning level for the errlog record,
while the thread's DA records the actual error (which is regarded as temporary one
by the parallel slave error handler).
2018-06-12 15:29:16 +03:00
Monty
e744c687ca Merge remote-tracking branch 'origin/10.0' into 10.1 2018-05-24 11:08:02 +03:00
Monty
da71c1bad7 MDEV-16229 Replication aborts with ER_VIEW_SELECT_TMPTABLE after half-failed RENAME
Problem was that detection of temporary tables was all wrong for
RENAME TABLE.
(Temporary tables where opened by top level call to
open_temporary_tables(), which can't detect if a temporary table
was renamed to something and then reused).

Fixed by adding proper parsing of rename list to check against
the current name of a table at each rename stage.
Also change do_rename_temporary() to check against the current
state of temporary tables, not according to the state of start
of RENAME TABLE.
2018-05-22 23:05:48 +03:00
Sergei Golubchik
1bc3899a52 Merge branch '10.0' into 10.1 2018-05-08 17:08:23 +02:00