Before this patch, semisync assumed transactions running in parallel
can not be larger than max_connections, but this is not true when
the event scheduler is executing events, and cause semisync run out
of preallocated transaction nodes.
Fix the problem by allocating transaction nodes dynamically.
This patch also fixed a possible deadlock when running UNINSTALL
PLUGIN rpl_semi_sync_master and updating in parallel. Fixed by
releasing the internal Delegate lock before unlock the plugins.
mysql-test/suite/rpl/t/rpl_semi_sync_event.test:
Add test case for bug#49020
plugin/semisync/semisync_master.cc:
Allocating TranxNode dynamically
plugin/semisync/semisync_master.h:
Allocating TranxNode dynamically
sql/rpl_handler.cc:
Unlock plugins after we have released the Delegate lock to avoid possible deadlock when uninstalling semisync master plugin and doing update in parallel.
The 'slave_patternload_file' is assigned to the real path of the load data file
when initializing the object of Relay_log_info. But the path of the load data
file is not formatted to real path when executing event from relay log. So the
error will be encountered if the path of the load data file is a symbolic link.
Actually the global 'opt_secure_file_priv' is not formatted to real path when
loading data from file. So the same thing will happen too.
To fix these errors, the path of the load data file should be formatted to
real path when executing event from relay log. And the 'opt_secure_file_priv'
should be formatted to real path when loading data infile.
mysql-test/suite/rpl/r/rpl_loaddata_symlink.result:
Test result for bug#43913.
mysql-test/suite/rpl/t/rpl_loaddata_symlink-master.sh:
Added the test file to create a link from $MYSQLTEST_VARDIR/std_data
to $MYSQLTEST_VARDIR/std_data_master_link
mysql-test/suite/rpl/t/rpl_loaddata_symlink-slave.sh:
Added the test file to create a link from $MYSQLTEST_VARDIR/std_data
to $MYSQLTEST_VARDIR/std_data_slave_link
mysql-test/suite/rpl/t/rpl_loaddata_symlink.test:
Added the test file to verify if loading data infile will work fine
if the path of the load data file is a symbolic link.
sql/rpl_rli.cc:
Added call 'my_realpath' function for avoiding sometimes the 'fn_format'
function can't format real path rightly.
<tmp_tbl> with RBL
When binlogging the statement, the server always handle the existing
object as a table, even though it is a view. However a view is
handled differently in other parts of the code thus leading the
statement to crash in RBL if the view exists.
This happens because the underlying tables for the view are not opened
when we try to call store_create_info() on the view in order to build
a CREATE TABLE statement.
This patch will only address the crash problem, other binlogging
problems related to CREATE TABLE IF NOT EXISTS LIKE when the existing
object is a view will be solved by BUG 47442.
In RBR, All statements operating on temporary tables should not be binlogged.
Despite this fact, after executing 'TRUNCATE... ' on a temporary table,
the command is still logged, even if in row-based mode. Consequently, this raises
problems in the slave as the table may not exist, resulting in an
execution failure. Ultimately, this causes the slave to report
an error and abort.
After this patch, 'TRUNCATE ...' statement on a temporary table will not be
binlogged in RBR.
Problem: Some system functions that could return different values on
master and slave were not marked unsafe. In particular:
GET_LOCK
IS_FREE_LOCK
IS_USED_LOCK
MASTER_POS_WAIT
RELEASE_LOCK
SLEEP
SYSDATE
VERSION
Fix: Mark these functions unsafe.
mysql-test/extra/rpl_tests/rpl_stm_000001.test:
- The test does not work in mixed mode any more, since it tries to
simulate an error in the sql thread in a query that uses get_lock.
Since get_lock now causes the query to be logged in row format,
the error didn't happen. Hence, we now force statement mode.
- Warnings must be disabled when the unsafe query is issued.
- Replaced some save_master_pos+connection slave+sync_with_master
by sync_slave_with_master.
mysql-test/suite/binlog/r/binlog_stm_mix_innodb_myisam.result:
updated result file
mysql-test/suite/binlog/r/binlog_stm_row.result:
updated result file
mysql-test/suite/binlog/r/binlog_unsafe.result:
updated result file
mysql-test/suite/binlog/t/binlog_killed.test:
binlog_killed only works in statement format now, since
it switches to row mode in mixed mode.
mysql-test/suite/binlog/t/binlog_stm_mix_innodb_myisam.test:
suppress warnings for unsafe statements
mysql-test/suite/binlog/t/binlog_stm_row.test:
- Suppress warnings in test that causes warnings.
- The test sets binlog format explicitly, so no need to execute it
twice.
mysql-test/suite/binlog/t/binlog_unsafe.test:
Added test for all unsafe system functions. This test also includes
system functions that were unsafe prior to BUG#47995.
mysql-test/suite/rpl/r/rpl_err_ignoredtable.result:
updated result file
mysql-test/suite/rpl/r/rpl_get_lock.result:
updated result file
mysql-test/suite/rpl/r/rpl_nondeterministic_functions.result:
new result file
mysql-test/suite/rpl/r/rpl_stm_000001.result:
updated result file
mysql-test/suite/rpl/r/rpl_trigger.result:
updated result file
mysql-test/suite/rpl/t/rpl_err_ignoredtable.test:
- suppress warnings for unsafe statement
- replaced save_master_pos+connection slave+sync_with_master
with sync_slave_with_master
mysql-test/suite/rpl/t/rpl_get_lock.test:
update test case that causes new warnings
mysql-test/suite/rpl/t/rpl_nondeterministic_functions.test:
Added new test case for nondeterministic functions.
mysql-test/suite/rpl/t/rpl_trigger.test:
update test case that causes new warnings
sql/item_create.cc:
Marked some system functions unsafe.
sql/item_strfunc.cc:
Clarified comment related to this bug.
sql/sql_yacc.yy:
Marked sysdate unsafe.
The 'rpl_get_master_version_and_clock' test verifies if the slave I/O
thread tries to reconnect to master when it tries to get the values of
the UNIX_TIMESTAMP, SERVER_ID from master under network disconnection.
So the master server is restarted for making the transient network
disconnection. Restarting master server can bring two problems as following:
1. The time out error is encountered sporadically. The slave I/O thread tries
to reconnect master ten times, which is set in my.cnf. So in the test
framework sporadically the slave I/O thread really stoped when it can't
reconnect to master in the ten times successfully before the master starts,
then the time out error will be encountered while waiting for the slave to
start.
2. These warnings and errors are produced in server log file when
the slave I/O thread tries to get the values of the UNIX_TIMESTAMP,
SERVER_ID from master under the transient network disconnection.
To fix problem 1, increase the master retry count to sixty times,
so that the slave I/O thread has enough time to reconnect master
successfully.
To fix problem 2, suppress these warnings and errors by mtr suppression,
because they are expected.
mysql-test/suite/rpl/t/rpl_get_master_version_and_clock-slave.opt:
Added the *.opt file for increasing master retry count to
sixty times.
mysql-test/suite/rpl/t/rpl_get_master_version_and_clock.test:
Added mtr suppression for suppressing warnings and errors
in server log file.
binlog, replication aborts
In SBR or MBR, the schema name is not being written to the binlog
when executing a LOAD DATA statement. This becomes a problem when
the current database (lets call it db1) is different from the
table's schema (lets call it db2). For instance, take the
following statements:
use db1;
load data local infile 'infile.txt' into table db2.t
Should this statement be logged without t's schema (db2), when
replaying it, one can get db1.t populated instead of db2.t (if
db1.t exists). On the other hand, if there is no db1.t at all,
replication will stop.
We fix this by always logging the table (in load file) with fully
qualified name when its schema is different from the current
database or when no default database was selected.
CMakeLists.txt:
Add plugin/semisync subdirectory
mysql-test/mysql-test-run.pl:
Check for semisync dll for Windows
mysql-test/suite/rpl/r/rpl_semi_sync.result:
Update result file
mysql-test/suite/rpl/t/rpl_semi_sync.test:
Test semi-sync on Windows
plugin/semisync/semisync_master.cc:
Define gettimeofday for Windows
Conflicts
=========
Text conflict in .bzr-mysql/default.conf
Text conflict in libmysqld/CMakeLists.txt
Text conflict in libmysqld/Makefile.am
Text conflict in mysql-test/collections/default.experimental
Text conflict in mysql-test/extra/rpl_tests/rpl_row_sp006.test
Text conflict in mysql-test/suite/binlog/r/binlog_tmp_table.result
Text conflict in mysql-test/suite/rpl/r/rpl_loaddata.result
Text conflict in mysql-test/suite/rpl/r/rpl_loaddata_fatal.result
Text conflict in mysql-test/suite/rpl/r/rpl_row_create_table.result
Text conflict in mysql-test/suite/rpl/r/rpl_row_sp006_InnoDB.result
Text conflict in mysql-test/suite/rpl/r/rpl_stm_log.result
Text conflict in mysql-test/suite/rpl_ndb/r/rpl_ndb_circular_simplex.result
Text conflict in mysql-test/suite/rpl_ndb/r/rpl_ndb_sp006.result
Text conflict in mysql-test/t/mysqlbinlog.test
Text conflict in sql/CMakeLists.txt
Text conflict in sql/Makefile.am
Text conflict in sql/log_event_old.cc
Text conflict in sql/rpl_rli.cc
Text conflict in sql/slave.cc
Text conflict in sql/sql_binlog.cc
Text conflict in sql/sql_lex.h
21 conflicts encountered.
NOTE
====
mysql-5.1-rpl-merge has been made a mirror of mysql-next-mr:
- "mysql-5.1-rpl-merge$ bzr pull ../mysql-next-mr"
This is the first cset (merge/...) committed after pulling
from mysql-next-mr.
Backporting BUG#43789 to mysql-5.1-bugteam
The replication was generating corrupted data, warning messages on Valgrind
and aborting on debug mode while replicating a "null" to "not null" field.
Specifically the unpack_row routine, was considering the slave's table
definition and trying to retrieve a field value, where there was nothing to be
retrieved, ignoring the fact that the value was defined as "null" by the master.
To fix the problem, we proceed as follows:
1 - If it is not STRICT sql_mode, implicit default values are used, regardless
if it is multi-row or single-row statement.
2 - However, if it is STRICT mode, then a we do what follows:
2.1 If it is a transactional engine, we do a rollback on the first NULL that is
to be set into a NOT NULL column and return an error.
2.2 If it is a non-transactional engine and it is the first row to be inserted
with multi-row, we also return the error. Otherwise, we proceed with the
execution, use implicit default values and print out warning messages.
Unfortunately, the current patch cannot mimic the behavior showed by the master
for updates on multi-tables and multi-row inserts. This happens because such
statements are unfolded in different row events. For instance, considering the
following updates and strict mode:
(master)
create table t1 (a int);
create table t2 (a int not null);
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
t1 would have (10) and t2 would have (0) as this would be handled as a
multi-row update. On the other hand, if we had the following updates:
(master)
create table t1 (a int);
create table t2 (a int);
(slave)
create table t1 (a int);
create table t2 (a int not null);
(master)
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
On the master t1 would have (10) and t2 would have (NULL). On
the slave, t1 would have (10) but the update on t1 would fail.
Backporting BUG#38173 to mysql-5.1-bugteam
The reason of the bug was incompatibile with the master side behaviour.
INSERT query on the master is allowed to insert into a table without specifying
values of DEFAULT-less fields if sql_mode is not strict.
Fixed with checking sql_mode by the sql thread to decide how to react.
Non-strict sql_mode should allow Write_rows event to complete.
todo: warnings can be shown via show slave status, still this is a
separate rather general issue how to show warnings for the slave threads.
A fix and a test case for Bug#34898 "mysql_info() reports 0 warnings
while mysql_warning_count() reports 1"
Review the patch by Chad Miller, implement review comments
(since Chad left) and push the patch.
This bug is actually not a bug. At least according to Monty.
See Bug#841 "wrong number of warnings" reported back in July 2003
and closed as "not a bug".
mysql_info() was printing the number of truncated columns, not
the number of warnings.
But since the message of mysql_info() was "Warnings: <number of truncated
columns>", people would expect to get the number
of warnings in it, not the number of truncated columns.
So a possible fix would be to change the message of mysql_info()
to say Rows changed: <n>, truncated: <m>.
Instead, put the number of warnings there. That is, remove the
feature that thd->cuted_fields (the number of truncated fields)
is exposed to the client. The number of truncated columns can be
calculated on the client, by analyzing SHOW WARNINGS output,
and in future we may remove thd->cuted_fields altogether.
So let's have one less thing to worry about.
client/mysqltest.cc:
Fix a bug in mysqltest program which used to return
a wrong number of affected rows in ps-protocol, and a wrong
mysql_info() information in both protocols in presence of warnings.
mysql-test/r/insert.result:
Update results (Bug#34898)
mysql-test/suite/rpl/r/rpl_udf.result:
Update to the changed output of mysqltest: mysql_info() is now printed
before warnings.
mysql-test/t/insert.test:
Add a test case for Bug#34898.
sql/sql_table.cc:
A fix for Bug#34898 - report statement warn count, not the
number of truncated values in mysql_info().
sql/sql_update.cc:
A fix for Bug#34898 - report statement warn count, not the
number of truncated values in mysql_info().
Add an option to control whether the master should keep waiting
until timeout when it detected that there is no semi-sync slave
available.
The bool option 'rpl_semi_sync_master_wait_no_slave' is 1 by
defalt, and will keep waiting until timeout. When set to 0, the
master will switch to asynchronous replication immediately when
no semi-sync slave is available.
Semi-sync status were not reset by FLUSH STATUS, this was because
all semi-sync status variables are defined as SHOW_FUNC and FLUSH
STATUS could only reset SHOW_LONG type variables.
This problem is fixed by change all status variables that should
be reset by FLUSH STATUS from SHOW_FUNC to SHOW_LONG.
After the fix, the following status variables will be reset by
FLUSH STATUS:
Rpl_semi_sync_master_yes_tx
Rpl_semi_sync_master_no_tx
Note: normally, FLUSH STATUS itself will be written into binlog
and be replicated, so after FLUSH STATS, one of
Rpl_semi_sync_master_yes_tx
Rpl_semi_sync_master_no_tx
can be 1 dependent on the semi-sync status. So it's recommended
to use FLUSH NO_WRITE_TO_BINLOG STATUS to avoid this.
Semi-sync uses an extra connection from slave to master to send
replies, this is a normal client connection, and used a normal
SET query to set the reply information on master, which is visible
to user and may cause some confusion and complaining.
This problem is fixed by using the method of sending reply by
using the same connection that is used by master dump thread to
send binlog to slave. Since now the semi-sync plugins are integrated
with the server code, it is not a problem to use the internal net
interfaces to do this.
The master dump thread will mark the event requires a reply and
wait for the reply when the event just sent is the last event
of a transaction and semi-sync status is ON; And the slave will
send a reply to master when it received such an event that requires
a reply.
mysql-test/suite/rpl/r/rpl_stop_middle_group.result:
the new result file
mysql-test/suite/rpl/t/rpl_stop_middle_group.test:
renamed from rpl_row_stop_middle_update and added a regression test for bug#45940.
The issue appears when number of heartbeat events non-zero before start of test
block. But really we need to check that no new events has received during test block.
So I did following:
1. Replace absolute values by diff of values
2. Increase heartbeat period from 1.5 to 5 sec
Let
- T be a transactional table and N non-transactional table.
- B be begin, C commit and R rollback.
- N be a statement that accesses and changes only N-tables.
- T be a statement that accesses and changes only T-tables.
In RBR, changes to N-tables that happen early in a transaction are not immediately flushed
upon committing a statement. This behavior may, however, break consistency in the presence
of concurrency since changes done to N-tables become immediately visible to other
connections. To fix this problem, we do the following:
. B N N T C would log - B N C B N C B T C.
. B N N T R would log - B N C B N C B T R.
Note that we are not preserving history from the master as we are introducing a commit that
never happened. However, this seems to be more acceptable than the possibility of breaking
consistency in the presence of concurrency.
CHANGE MASTER TO command required the value for RELAY_LOG_FILE to
be an absolute path, which was different from the requirement of
MASTER_LOG_FILE.
This patch fixed the problem by changing the value for RELAY_LOG_FILE
to be the basename of the log file as that for MASTER_LOG_FILE.
The problem is that there is only one autoinc value associated with
the query when binlogging. If more than one autoinc values are used
in the query, the autoinc values after the first one can be inserted
wrongly on slave. So these autoinc values can become inconsistent on
master and slave.
The problem is resolved by marking all the statements that invoke
a trigger or call a function that updated autoinc fields as unsafe,
and will switch to row-format in Mixed mode. Actually, the statement
is safe if just one autoinc value is used in sub-statement, but it's
impossible to check how many autoinc values are used in sub-statement.)
mysql-test/suite/rpl/r/rpl_auto_increment_update_failure.result:
Test result for bug#45677
mysql-test/suite/rpl/t/rpl_auto_increment_update_failure.test:
Added test to verify the following two properties:
P1) insert/update in an autoinc column causes statement to
be logged in row format if binlog_format=mixed
P2) if binlog_format=mixed, and a trigger or function contains
two or more inserts/updates in a table that has an autoinc
column, then the slave should not go out of sync, even if
there are concurrent transactions.
sql/sql_base.cc:
Added function 'has_write_table_with_auto_increment' to check
if one (or more) write tables have auto_increment columns.
Removed function 'has_two_write_locked_tables_with_auto_increment',
because the function is included in function
'has_write_table_with_auto_increment'.
We cann connect() in a non-blocking mode to be able to specify a
non-standard timeout.
The problem was that we did not fetch the status from the
non-blocking connect(). We assumed that poll() would not return
a POLLIN flag if the connect failed. But on some platforms this
is not true.
After a successful poll() we do now retrieve the status value
from connect() with getsockopt(...SO_ERROR...). Now we do know
if (and how) the connect failed.
The test case for my investigation was rpl.rlp_ssl1 on an
Ubuntu 9.04 x86_64 machine. Both, IPV4 and IPV6 were active.
'localhost' resolved first for IPV6 and then for IPV4. The
connection over IPV6 was blocked. rpl.rlp_ssl1 timed out
as it did not notice the failed connect(). The first read()
failed, which was interpreted as a master crash and the
connection was tried to reestablish with the same result
until the retry limit was reached.
With the fix, the connect() problem is immediately recognized,
and the connect() is retried on the second resolution for
'localhost', which is successful.
libmysqld/libmysqld.c:
Bug#37267 - connect() EINPROGRESS failures mishandled in client library
Changed a DBUG print string to distinguish the two mysql_real_connect()
implementations in DBUG traces.
mysql-test/include/wait_for_slave_param.inc:
Bug#37267 - connect() EINPROGRESS failures mishandled in client library
Made timeout value available in error message.
mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result:
Bug#37267 - connect() EINPROGRESS failures mishandled in client library
Fixed test result. Connect error is now detected as CR_CONN_HOST_ERROR
(2003) instead of CR_SERVER_LOST (2013).
sql-common/client.c:
Bug#37267 - connect() EINPROGRESS failures mishandled in client library
Added retrieval of the error code from the non-blocking connect().
Added DBUG.
Added comment.
NOTE: Backporting the patch to next-mr.
The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
Although such solution has been proposed to reduce the probability of corrupted
files due to a slave-crash, the performance penalty introduced by it has
made the approach impractical for highly intensive workloads.
In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
this is the main source of performance issues.
This patch introduces new options that give more control to the user on
what should be fsynced and how often:
1) (--sync-master-info, integer) which syncs the master.info after #th event;
2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
events.
3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
transactions.
To provide both performance and increased reliability, we recommend the following
setup:
1) --sync-master-info = 0 eventually the operating system will fsync it;
2) --sync-relay-log = 0 eventually the operating system will fsync it;
3) --sync-relay-log-info = 1 fsyncs it after every transaction;
Notice, that the previous setup does not reduce the probability of
corrupted master.info and relay-log.bin*. To overcome the issue, this patch also
introduces a recovery mechanism that right after restart throws away relay-log.bin*
retrieved from a master and updates the master.info based on the relay.info:
4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
throws away relay-log.bin* after a crash.
However, it can only recover the incorrect binlog file and position in master.info,
if other informations (host, port password, etc) are corrupted or incorrect,
then this recovery mechanism will fail to work.
vs not null
NOTE: Backporting the patch to next-mr.
The replication was generating corrupted data, warning messages on Valgrind
and aborting on debug mode while replicating a "null" to "not null" field.
Specifically the unpack_row routine, was considering the slave's table
definition and trying to retrieve a field value, where there was nothing to be
retrieved, ignoring the fact that the value was defined as "null" by the master.
To fix the problem, we proceed as follows:
1 - If it is not STRICT sql_mode, implicit default values are used, regardless
if it is multi-row or single-row statement.
2 - However, if it is STRICT mode, then a we do what follows:
2.1 If it is a transactional engine, we do a rollback on the first NULL that is
to be set into a NOT NULL column and return an error.
2.2 If it is a non-transactional engine and it is the first row to be inserted
with multi-row, we also return the error. Otherwise, we proceed with the
execution, use implicit default values and print out warning messages.
Unfortunately, the current patch cannot mimic the behavior showed by the master
for updates on multi-tables and multi-row inserts. This happens because such
statements are unfolded in different row events. For instance, considering the
following updates and strict mode:
(master)
create table t1 (a int);
create table t2 (a int not null);
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
t1 would have (10) and t2 would have (0) as this would be handled as a
multi-row update. On the other hand, if we had the following updates:
(master)
create table t1 (a int);
create table t2 (a int);
(slave)
create table t1 (a int);
create table t2 (a int not null);
(master)
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
On the master t1 would have (10) and t2 would have (NULL). On
the slave, t1 would have (10) but the update on t1 would fail.