NOTE: Backporting the patch to next-mr.
The slave was crashing while failing to execute the init_slave() function.
The issue stems from two different reasons:
1 - A failure while allocating the master info structure generated a
segfault due to a NULL pointer.
2 - A failure while recovering generated a segfault due to a non-initialized
relay log file. In other words, the mi->init and rli->init were both set to true
before executing the recovery process thus creating an inconsistent state as the
relay log file was not initialized.
To circumvent such problems, we refactored the recovery process which is now executed
while initializing the relay log. It is ensured that the master info structure is
created before accessing it and any error is propagated thus avoiding to set mi->init
and rli->init to true when for instance the relay log is not initialized or the relay
info is not flushed.
The changes related to the refactory are described below:
1 - Removed call to init_recovery from init_slave.
2 - Changed the signature of the function init_recovery.
3 - Removed flushes. They are called while initializing the relay log and master
info.
4 - Made sure that if the relay info is not flushed the mi-init and rli-init are not
set to true.
In this patch, we also replaced the exit(1) in the fault injection by DBUG_ABORT()
to make it compliant with the code guidelines.
rpl_slave_skip fails randomly on PB2. This patch fixes the failure by
setting explicit wait for SQL thread to stop, instead of the
wait_for_slave_to_stop mysqltest command, after a start until command
is executed.
NOTE: Backporting the patch to next-mr.
The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
Although such solution has been proposed to reduce the probability of corrupted
files due to a slave-crash, the performance penalty introduced by it has
made the approach impractical for highly intensive workloads.
In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
this is the main source of performance issues.
This patch introduces new options that give more control to the user on
what should be fsynced and how often:
1) (--sync-master-info, integer) which syncs the master.info after #th event;
2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
events.
3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
transactions.
To provide both performance and increased reliability, we recommend the following
setup:
1) --sync-master-info = 0 eventually the operating system will fsync it;
2) --sync-relay-log = 0 eventually the operating system will fsync it;
3) --sync-relay-log-info = 1 fsyncs it after every transaction;
Notice, that the previous setup does not reduce the probability of
corrupted master.info and relay-log.bin*. To overcome the issue, this patch also
introduces a recovery mechanism that right after restart throws away relay-log.bin*
retrieved from a master and updates the master.info based on the relay.info:
4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
throws away relay-log.bin* after a crash.
However, it can only recover the incorrect binlog file and position in master.info,
if other informations (host, port password, etc) are corrupted or incorrect,
then this recovery mechanism will fail to work.
BUG#31665 sync_binlog should cause relay logs to be synchronized
NOTE: Backporting the patch to next-mr.
Add sync_relay_log option to server, this option works for relay log
the same as option sync_binlog for binlog. This option also synchronize
master info to disk when set to non-zero value.
Original patches from Sinisa and Mark, with some modifications
vs not null
NOTE: Backporting the patch to next-mr.
The replication was generating corrupted data, warning messages on Valgrind
and aborting on debug mode while replicating a "null" to "not null" field.
Specifically the unpack_row routine, was considering the slave's table
definition and trying to retrieve a field value, where there was nothing to be
retrieved, ignoring the fact that the value was defined as "null" by the master.
To fix the problem, we proceed as follows:
1 - If it is not STRICT sql_mode, implicit default values are used, regardless
if it is multi-row or single-row statement.
2 - However, if it is STRICT mode, then a we do what follows:
2.1 If it is a transactional engine, we do a rollback on the first NULL that is
to be set into a NOT NULL column and return an error.
2.2 If it is a non-transactional engine and it is the first row to be inserted
with multi-row, we also return the error. Otherwise, we proceed with the
execution, use implicit default values and print out warning messages.
Unfortunately, the current patch cannot mimic the behavior showed by the master
for updates on multi-tables and multi-row inserts. This happens because such
statements are unfolded in different row events. For instance, considering the
following updates and strict mode:
(master)
create table t1 (a int);
create table t2 (a int not null);
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
t1 would have (10) and t2 would have (0) as this would be handled as a
multi-row update. On the other hand, if we had the following updates:
(master)
create table t1 (a int);
create table t2 (a int);
(slave)
create table t1 (a int);
create table t2 (a int not null);
(master)
insert into t1 values (1);
insert into t2 values (2);
update t1, t2 SET t1.a=10, t2.a=NULL;
On the master t1 would have (10) and t2 would have (NULL). On
the slave, t1 would have (10) but the update on t1 would fail.
logging is disabled
NOTE: this is the backport to next-mr.
If one sets binlog-format but does NOT enable binary log, server
refuses to start up. The following messages appears in the error log:
090217 12:47:14 [ERROR] You need to use --log-bin to make
--binlog-format work.
090217 12:47:14 [ERROR] Aborting
This patch addresses this by making the server not to bail out if the
binlog-format is set without the log-bin option. Additionally, the
specified binlog-format is stored, in the global system variable
"binlog_format", and a warning is printed instead of an error.
beyond unsigned long.
BUG#44779: binlog.binlog_max_extension may be causing failure on
next test in PB
NOTE1: this is the backport to next-mr.
NOTE2: already includes patch for BUG#44779.
Binlog file extensions would turn into negative numbers once the
variable used to hold the value reached maximum for signed
long. Consequently, incrementing value to the next (negative) number
would lead to .000000 extension, causing the server to fail.
This patch addresses this issue by not allowing negative extensions
and by returning an error on find_uniq_filename, when the limit is
reached. Additionally, warnings are printed to the error log when the
limit is approaching. FLUSH LOGS will also report warnings to the
user, if the extension number has reached the limit. The limit has been
set to 0x7FFFFFFF as the maximum.
mysql-test/suite/binlog/t/binlog_max_extension.test:
Test case added that checks the maximum available number for
binlog extensions.
sql/log.cc:
Changes to find_uniq_filename and test_if_number.
sql/log.h:
Added macros with values for MAX_LOG_UNIQUE_FN_EXT and
LOG_WARN_UNIQUE_FN_EXT_LEFT, as suggested in review.
STATUS'
NOTE: this is the backport to next-mr.
SHOW SHOW STATUS LIKE 'Slave_running' command believes that
if active_mi->slave_running != 0, then io thread is running normally.
But it isn't so in fact. When some errors happen to make io thread
try to reconnect master, then it will become transitional status
(MYSQL_SLAVE_RUN_NOT_CONNECT == 1), which also doesn't equal 0.
Yet, "SHOW SLAVE STATUS" believes that only if
active_mi->slave_running == MYSQL_SLAVE_RUN_CONNECT, then io thread is running.
So "SHOW SLAVE STATUS" can get the correct result.
Fixed to make SHOW SHOW STATUS LIKE 'Slave_running' command have the same
check condition with "SHOW SLAVE STATUS". It only believe that the io thread
is running when active_mi->slave_running == MYSQL_SLAVE_RUN_CONNECT.
NOTE: this is the backport to next-mr.
This patch addresses the bug reported by checking wether
host argument is an empty string or not. If empty, an error is
reported to the client, otherwise continue normally.
This commit is based on the originally proposed patch and adds
a test case as requested during review as well as refines comments,
and makes test case result file less verbose (compared to previous patch).
NOTE: this is the backport to next-mr.
When using replication, the slave will not log any slow query logs queries
replicated from the master, even if the option "--log-slow-slave-statements"
is set and these take more than "log_query_time" to execute.
In order to log slow queries in replicated thread one needs to set the
--log-slow-slave-statements, so that the SQL thread is initialized with the
correct switch. Although setting this flag correctly configures the slave
thread option to log slow queries, there is an issue with the condition that
is used to check whether to log the slow query or not. When replaying binlog
events the statement contains the SET TIMESTAMP clause which will force the
slow logging condition check to fail. Consequently, the slow query logging will
not take place.
This patch addresses this issue by removing the second condition from the
log_slow_statements as it prevents slow queries to be binlogged and seems
to be deprecated.
NOTE: Backporting the patch to next-mr.
The reason of the bug was incompatibile with the master side behaviour.
INSERT query on the master is allowed to insert into a table without specifying
values of DEFAULT-less fields if sql_mode is not strict.
Fixed with checking sql_mode by the sql thread to decide how to react.
Non-strict sql_mode should allow Write_rows event to complete.
todo: warnings can be shown via show slave status, still this is a
separate rather general issue how to show warnings for the slave threads.
NOTE: Backporting the patch to next-mr.
WL#4828 Augment DBUG_ENTER/DBUG_EXIT to crash MySQL in different functions
-------
The assessment of the replication code in the presence of faults is extremely
import to increase reliability. In particular, one needs to know if servers
will either correctly recovery or print out appropriate error messages thus
avoiding unexpected problems in a production environment.
In order to accomplish this, the current patch refactories the debug macros
already provided in the source code and introduces three new macros that
allows to inject faults, specifically crashes, while entering or exiting a
function or method. For instance, to crash a server while returning from
the init_slave function (see module sql/slave.cc), one needs to do what
follows:
1 - Modify the source replacing DBUG_RETURN by DBUG_CRASH_RETURN;
DBUG_CRASH_RETURN(0);
2 - Use the debug variable to activate dbug instructions:
SET SESSION debug="+d,init_slave_crash_return";
The new macros are briefly described below:
DBUG_CRASH_ENTER (function) is equivalent to DBUG_ENTER which registers the
beginning of a function but in addition to it allows for crashing the server
while entering the function if the appropriate dbug instruction is activate.
In this case, the dbug instruction should be "+d,function_crash_enter".
DBUG_CRASH_RETURN (value) is equivalent to DBUG_RETURN which notifies the
end of a function but in addition to it allows for crashing the server
while returning from the function if the appropriate dbug instruction is
activate. In this case, the dbug instruction should be
"+d,function_crash_return". Note that "function" should be the same string
used by either the DBUG_ENTER or DBUG_CRASH_ENTER.
DBUG_CRASH_VOID_RETURN (value) is equivalent to DBUG_VOID_RETURN which
notifies the end of a function but in addition to it allows for crashing
the server while returning from the function if the appropriate dbug
instruction is activate. In this case, the dbug instruction should be
"+d,function_crash_return". Note that "function" should be the same string
used by either the DBUG_ENTER or DBUG_CRASH_ENTER.
To inject other faults, for instance, wrong return values, one should rely
on the macros already available. The current patch also removes a set of
macros that were either not being used or were redundant as other macros
could be used to provide the same feature. In the future, we also consider
dynamic instrumentation of the code.
BUG#45747 DBUG_CRASH_* is not setting the strict option
---------
When combining DBUG_CRASH_* with "--debug=d:t:i:A,file" the server crashes
due to a call to the abort function in the DBUG_CRASH_* macro althought the
appropriate keyword has not been set.
NOTE: Backporting the patch to next-mr.
The use of option log_slave_updates without log_bin was preventing the server
from starting. To fix the problem, we replaced the error message and the exit
call by a warning message.
files
NOTE: this is the backport to next-mr.
SHOW BINLOG EVENTS does not work with relay log files. If issuing
"SHOW BINLOG EVENTS IN 'relay-log.000001'" in a non-empty relay
log file (relay-log.000001), mysql reports empty set.
This patch addresses this issue by extending the SHOW command
with RELAYLOG. Events in relay log files can now be inspected by
issuing SHOW RELAYLOG EVENTS [IN 'log_name'] [FROM pos] [LIMIT
[offset,] row_count].
mysql-test/extra/rpl_tests/rpl_show_relaylog_events.inc:
Shared part of the test case.
mysql-test/include/show_binlog_events.inc:
Added options $binary_log_file, $binary_log_limit_row,
$binary_log_limit_offset so that show_binlog_events can take
same parameters as SHOW BINLOG EVENTS does.
mysql-test/include/show_relaylog_events.inc:
Clone of show_binlog_events for relaylog events.
mysql-test/suite/rpl/t/rpl_row_show_relaylog_events.test:
Test case for row based replication.
mysql-test/suite/rpl/t/rpl_stm_mix_show_relaylog_events.test:
Test case for statement and mixed mode replication.
sql/lex.h:
Added RELAYLOG symbol.
sql/mysqld.cc:
Added "show_relaylog_events" to status_vars.
sql/sp_head.cc:
Set SQLCOM_SHOW_RELAYLOG_EVENTS to return flags=
sp_head::MULTI_RESULTS; in sp_get_flags_for_command as
SQLCOM_SHOW_BINLOG_EVENTS does.
sql/sql_lex.h:
Added sql_command SQLCOM_SHOW_RELAYLOG_EVENTS to lex enum_sql_command.
sql/sql_parse.cc:
Added handling of SQLCOM_SHOW_RELAYLOG_EVENTS.
sql/sql_repl.cc:
mysql_show_binlog_events set to choose the log file to use based on
the command issued (SHOW BINLOG|RELAYLOG EVENTS).
sql/sql_yacc.yy:
Added RELAYLOG to the grammar.
There is a missing check for memory allocation failure when allocating
memory for the handlerton structure. If the handlerton init function
tries to de-reference the pointer, it will cause a segmentation fault
and crash the server.
This patch fixes the problem by not calling the init function if memory
allocation failed, and instead prints an informative error message and
reports the error to the caller.
sql/handler.cc:
Add a check if memory allocation succeeded before calling the init
function. If it failed, it is not necessary to free the memory,
but the plugin->data is set to NULL to ensure that it can be checked
for failure.
When setting AUTOCOMMIT=1 after starting a transaction, the binary log
did not commit the outstanding transaction. The reason was that the binary
log commit function saw the values of the new settings, deciding that there
were nothing to commit.
Fixed the problem by moving the implicit commit to before the thread option
flags were changed, so that the binary log sees the old values of the flags
instead of the values they will take after the statement.
mysql-test/extra/binlog_tests/implicit.test:
New test file to check implicit commits both inside and outside transactions.
mysql-test/suite/binlog/t/binlog_implicit_commit.test:
Test for implicit commit of SET AUTOCOMMIT and LOCK/UNLOCK TABLES.
sql/set_var.cc:
Adding code to commit pending transaction before changing option flags.
slave leaves slave unstable
Problem: when replicating from non-transactional to
transactional engine with autocommit off, no BEGIN/COMMIT
is written to the binlog. When the slave replicates, it
will start a transaction that never ends.
Fix: Force autocommit=on on slave by always replicating
autocommit=1 from the master.