From a user perspective, the problem is that a FLUSH LOGS or SIGHUP
signal could end up associating the stdout and stderr to random
files. In the case of this bug report, the streams would end up
associated to InnoDB ibd files.
The freopen(3) function is not thread-safe on FreeBSD. What this
means is that if another thread calls open(2) during freopen()
is executing that another thread's fd returned by open(2) may get
re-associated with the file being passed to freopen(3). See FreeBSD
PR number 79887 for reference:
http://www.freebsd.org/cgi/query-pr.cgi?pr=79887
This problem is worked around by substituting a internal hook within
the FILE structure. This avoids the loss of atomicity by not having
the original fd closed before its duplicated.
Patch based on the original work by Vasil Dimov.
Post-push fixes:
- fixed platform dependent result files
- appeasing valgrind warnings:
Fault injection was also uncovering a previously existing
potential mem leaks. For BUG#46166 testing purposes, fixed
by forcing handling the leak when injecting faults.
when generating new name.
If find_uniq_filename returns an error, then this error is not
being propagated upwards, and execution does not report error to
the user (although a entry in the error log is generated).
Additionally, some more errors were ignored in new_file_impl:
- when writing the rotate event
- when reopening the index and binary log file
This patch addresses this by propagating the error up in the
execution stack. Furthermore, when rotation of the binary log
fails, an incident event is written, because there may be a
chance that some changes for a given statement, were not properly
logged. For example, in SBR, LOAD DATA INFILE statement requires
more than one event to be logged, should rotation fail while
logging part of the LOAD DATA events, then the logged data would
become inconsistent with the data in the storage engine.
the DROP statement ..."
Problem: When using temporary tables and closing a session, an
implicit DROP TEMPORARY TABLE IF EXISTS is written to the binary
log (while cleaning up the context of the session THD - see:
sql_class.cc:THD::cleanup which calls close_temporary_tables).
close_temporary_tables, first checks if the binary log is opened
and then proceeds to creating the DROP statements. Then, such
statements, are written to the binary log through
MYSQL_BIN_LOG::write(Log_event *). Inside, there is another check
if the binary log is opened and if not an error is returned. This
is where the faulty behavior is triggered. Given that the test
case replays a binary log, with temp tables statements, and right
after it issues RESET MASTER, there is a chance that is_open will
report false (when the mysql session is closed and the temporary
tables are written).
is_open may return false, because MYSQL_BIN_LOG::reset_logs is
not setting the correct flag (LOG_CLOSE_TO_BE_OPENED), on the
MYSQL_LOG_BIN::log_state (instead it sets just the
LOG_CLOSE_INDEX flag, leaving the log_state to
LOG_CLOSED). Thence, when writing the DROP statement as part of
the THD::cleanup, the thread could get a return value of false
for is_open - inside MYSQL_BIN_LOG::write, ultimately reporting
that it can't write the event to the binary log.
Fix: We fix this by adding the correct flag, missing in the
second close.
Fix assorted warnings that are generated in optimized builds.
Most of it is silencing variables that are set but unused.
This patch also introduces the MY_ASSERT_UNREACHABLE macro
which helps the compiler to deduce that a certain piece of
code is unreachable.
For crash testing: kill the server without generating core file.
include/my_dbug.h
Use kill(getpid(), SIGKILL) which cannot be caught by signal handlers.
All DBUG_XXX macros should be no-ops in optimized mode, do that for DBUG_ABORT as well.
sql/handler.cc
Kill server without generating core.
sql/log.cc
Kill server without generating core.
A CREATE...SELECT that fails is written to the binary log if a non-transactional
statement is updated. If the logging format is ROW, the CREATE statement and the
changes are written to the binary log as distinct events and by consequence the
created table is not rolled back in the slave.
In this patch, we opted to let the slave goes out of sync by not writting to the
binary log the CREATE statement. We do this by simply reseting the binary log's
cache.
Fix warnings flagged by the new warning option -Wunused-but-set-variable
that was added to GCC 4.6 and that is enabled by -Wunused and -Wall. The
option causes a warning whenever a local variable is assigned to but is
later unused. It also warns about meaningless pointer dereferences.
Apart strict-aliasing warnings, fix the remaining warnings
generated by GCC 4.4.4 -Wall and -Wextra flags.
One major source of warnings was the in-house function my_bcmp
which (unconventionally) took pointers to unsigned characters
as the byte sequences to be compared. Since my_bcmp and bcmp
are deprecated functions whose only difference with memcmp is
the return value, every use of the function is replaced with
memcmp as the special return value wasn't actually being used
by any caller.
There were also various other warnings, mostly due to type
mismatches, missing return values, missing prototypes, dead
code (unreachable) and ignored return values.
This patch fixes two problems described as follows:
1 - If there is an on-going transaction and a temporary table is created or
dropped, any failed statement that follows the "create" or "drop commands"
triggers a rollback and by consequence the slave will go out sync because
the binary log will have a wrong sequence of events.
To fix the problem, we changed the expression that evaluates when the
cache should be flushed after either the rollback of a statment or
transaction.
2 - When a "CREATE TEMPORARY TABLE SELECT * FROM" was executed the
OPTION_KEEP_LOG was not set into the thd->options. For that reason, if
the transaction had updated only transactional engines and was rolled
back at the end (.e.g due to a deadlock) the changes were not written
to the binary log, including the creation of the temporary table.
To fix the problem, we have set the OPTION_KEEP_LOG into the thd->options
when a "CREATE TEMPORARY TABLE SELECT * FROM" is executed.
MYSQL_BIN_LOG m_table_map_version member and it's associated
functions were not used in the logic of binlogging and replication,
this patch removed all related code.
When mysqlbinlog was given the --database=X flag, it always printed
'ROLLBACK TO', but the corresponding 'SAVEPOINT' statement was not
printed. The replicated filter(replicated-do/ignore-db) and binlog
filter (binlog-do/ignore-db) has the same problem. They are solved
in this patch together.
After this patch, We always check whether the query is 'SAVEPOINT'
statement or not. Because this is a literal check, 'SAVEPOINT' and
'ROLLBACK TO' statements are also binlogged in uppercase with no
any comments.
The binlog before this patch can be handled correctly except one case
that any comments are in front of the keywords. for example:
/* bla bla */ SAVEPOINT a;
/* bla bla */ ROLLBACK TO a;
for InnoDB
The class Field_bit_as_char stores the metadata for the
field incorrecly because bytes_in_rec and bit_len are set
to (field_length + 7 ) / 8 and 0 respectively, while
Field_bit has the correct values field_length / 8 and
field_length % 8.
Solved the problem by re-computing the values for the
metadata based on the field_length instead of using the
bytes_in_rec and bit_len variables.
To handle compatibility with old server, a table map
flag was added to indicate that the bit computation is
exact. If the flag is clear, the slave computes the
number of bytes required to store the bit field and
compares that instead, effectively allowing replication
*without conversion* from any field length that require
the same number of bytes to store.
mode
When the master was executing in sql_mode='traditional' (which
implies that really_abort_on_warning returns TRUE - because of
MODE_STRICT_ALL_TABLES), the error code (ER_DUP_ENTRY in the
reported case) was not being set in the
Query_log_event. Therefore, even if a failure was to be expected
when replaying the statement on the slave, a failure would occur,
because the Query_log_event was not transporting the expected
error code, but 0 instead.
This was because when the master was getting the error code to
set it in the Query_log_event, the executing thread would be
assumed to have been killed:
THD::killed==THD::KILL_BAD_DATA. This would make the error code
fetch routine not to check thd->main_da.sql_errno(), but instead
the thd->killed value. What's more, is that the server would
thd->killed value if thd->killed == THD::KILL_BAD_DATA and return
0 instead. So this is a double inconsistency, as the we should
not even check thd->killed but rather thd->main_da.sql_errno().
We fix this by extending the condition used to choose whether to
check the thd->main_da.sql_errno() or thd->killed, so that it
takes into consideration the case when:
thd->killed==THD::KILL_BAD_DATA.
To 5.x Release
Notes
=====
This is a backport of BUG#23300 into 5.1 GA.
Original cset revid (in betony):
luis.soares@sun.com-20090929140901-s4kjtl3iiyy4ls2h
Description
===========
When using replication, the slave will not log any slow query
logs queries replicated from the master, even if the
option "--log-slow-slave-statements" is set and these take more
than "log_query_time" to execute.
In order to log slow queries in replicated thread one needs to
set the --log-slow-slave-statements, so that the SQL thread is
initialized with the correct switch. Although setting this flag
correctly configures the slave thread option to log slow queries,
there is an issue with the condition that is used to check
whether to log the slow query or not. When replaying binlog
events the statement contains the SET TIMESTAMP clause which will
force the slow logging condition check to fail. Consequently, the
slow query logging will not take place.
This patch addresses this issue by removing the second condition
from the log_slow_statements as it prevents slow queries to be
binlogged and seems to be deprecated.
It is well-known that due to concurrency issues, a slave can become
inconsistent when a transaction contains updates to both transaction and
non-transactional tables in statement and mixed modes.
In a nutshell, the current code-base tries to preserve causality among the
statements by writing non-transactional statements to the txn-cache which
is flushed upon commit. However, modifications done to non-transactional
tables on behalf of a transaction become immediately visible to other
connections but may not immediately get into the binary log and therefore
consistency may be broken.
In general, it is impossible to automatically detect causality/dependency
among statements by just analyzing the statements sent to the server. This
happen because dependency may be hidden in the application code and it is
necessary to know a priori all the statements processed in the context of
a transaction such as in a procedure. Moreover, even for the few cases that
we could automatically address in the server, the computation effort
required could make the approach infeasible.
So, in this patch we introduce the option
- "--binlog-direct-non-transactional-updates" that can be used to bypass
the current behavior in order to write directly to binary log statements
that change non-transactional tables.
The problem is a somewhat common misusage of the strmake function.
The strmake(dst, src, len) function writes at most /len/ bytes to
the string pointed to by src, not including the trailing null byte.
Hence, if /len/ is the exact length of the destination buffer, a
one byte buffer overflow can occur if the length of the source
string is equal to or greater than /len/.
Problem is that purge_logs implementation in ndb (ndbcluster_binlog_index_purge_file)
calls mysql_parse (with (thd->options & OPTION_BIN_LOG) === 0))
but MYSQL_BIN_LOG first takes LOCK_log and then checks thd->options
Solution in this patch, changes so that rotate_and_purge does not hold
LOCK_log when calling purge_logs_before_date. I think this is safe
as other "purge"-function(s) is called wo/ holding LOCK_log, e.g purge_master_logs
This patch fixes three bugs as follows. First, aborting the server while purging
binary logs might generate orphan files due to how the purge operation was
implemented:
(purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
1 - register the files to be removed in a temporary buffer.
2 - update the log-bin.index.
3 - flush the log-bin.index.
4 - erase the files whose names where register in the temporary buffer
in step 1.
Thus a failure while executing step 4 would generate an orphan file. Second,
a similar issue might happen while creating a new binary as follows:
(create routine - sql/log.cc - MYSQL_BIN_LOG::open)
1 - open the new log-bin.
2 - update the log-bin.index.
Thus a failure while executing step 1 would generate an orphan file.
To fix these issues, we record the files to be purged or created before really
removing or adding them. So if a failure happens such records can be used to
automatically remove dangling files. The new steps might be outlined as follows:
(purge routine - sql/log.cc - MYSQL_BIN_LOG::purge_logs)
1 - register the files to be removed in the log-bin.~rec~ placed in
the data directory.
2 - update the log-bin.index.
3 - flush the log-bin.index.
4 - delete the log-bin.~rec~.
(create routine - sql/log.cc - MYSQL_BIN_LOG::open)
1 - register the file to be created in the log-bin.~rec~
placed in the data directory.
2 - open the new log-bin.
3 - update the log-bin.index.
4 - delete the log-bin.~rec~.
(recovery routine - sql/log.cc - MYSQL_BIN_LOG::open_index_file)
1 - open the log-bin.index.
2 - open the log-bin.~rec~.
3 - for each file in log-bin.~rec~.
3.1 Check if the file is in the log-bin.index and if so ignore it.
3.2 Otherwise, delete it.
The third issue can be described as follows. The purge operation was allowing
to remove a file in use thus leading to the loss of data and possible
inconsistencies between the master and slave. Roughly, the routine was only
taking into account the dump threads and so if a slave was not connect the
file might be delete even though it was in use.
file
Issuing 'FLUSH LOGS' does not close and reopen indexfile.
Instead a SEEK_SET is performed.
This patch makes index file to be closed and reopened whenever a
rotation happens (FLUSH LOGS is issued or binary log exceeds
maximum configured size).
Implemented the server infrastructure for the fix:
1. Added a function LEX_STRING *thd_query_string(THD) to return
a LEX_STRING structure instead of char *.
This is the function that must be called in innodb instead of
thd_query()
2. Did some encapsulation in THD : aggregated thd_query and
thd_query_length into a LEX_STRING and made accessor and mutator
methods for easy code updating.
3. Updated the server code to use the new methods where applicable.
Let
- T be a transactional table and N non-transactional table.
- B be begin, C commit and R rollback.
- N be a statement that accesses and changes only N-tables.
- T be a statement that accesses and changes only T-tables.
In RBR, changes to N-tables that happen early in a transaction are not immediately flushed
upon committing a statement. This behavior may, however, break consistency in the presence
of concurrency since changes done to N-tables become immediately visible to other
connections. To fix this problem, we do the following:
. B N N T C would log - B N C B N C B T C.
. B N N T R would log - B N C B N C B T R.
Note that we are not preserving history from the master as we are introducing a commit that
never happened. However, this seems to be more acceptable than the possibility of breaking
consistency in the presence of concurrency.
Let
- T be a transactional table and N non-transactional table.
- B be begin, C commit and R rollback.
- M be a mixed statement, i.e. a statement that updates both T and N.
- M* be a mixed statement that fails while updating either T or N.
This patch restore the behavior presented in 5.1.37 for rows either produced in
the RBR or MIXED modes, when a M* statement that happened early in a transaction
had their changes written to the binary log outside the boundaries of the
transaction and wrapped in a BEGIN/ROLLBACK. This was done to keep the slave
consistent with with the master as the rollback would keep the changes on N and
undo them on T. In particular, we do what follows:
. B M* T C would log - B M* R B T C.
Note that, we are not preserving history from the master as we are introducing a
rollback that never happened. However, this seems to be more acceptable than
making the slave diverge. We do not fix the following case:
. B T M* C would log B T M* C.
The slave will diverge as the changes on T tables that originated from the M
statement are rolled back on the master but not on the slave. Unfortunately, we
cannot simply rollback the transaction as this would undo any uncommitted
changes on T tables.
SBR is not considered in this patch because a failing statement is written to
the binary along with the error code and a slave executes and then rolls back
the statement when it has an associated error code, thus undoing the effects
on T. In RBR and MBR, a full-fledged fix will be pushed after the WL 2687.
Problem: LOGGER::general_log_write() relied on valid "thd" parameter passed
but had inconsistent "if (thd)" check.
Fix: as we always pass a valid "thd" parameter to the method,
redundant check removed.
with gcc 4.3.2
This patch fixes a number of GCC warnings about variables used
before initialized. A new macro UNINIT_VAR() is introduced for
use in the variable declaration, and LINT_INIT() usage will be
gradually deprecated. (A workaround is used for g++, pending a
patch for a g++ bug.)
GCC warnings for unused results (attribute warn_unused_result)
for a number of system calls (present at least in later
Ubuntus, where the usual void cast trick doesn't work) are
also fixed.
binlog
Mixing transactional (T) and non-transactional (N) tables on behalf of a
transaction may lead to inconsistencies among masters and slaves in STATEMENT
mode. The problem stems from the fact that although modifications done to
non-transactional tables on behalf of a transaction become immediately visible
to other connections they do not immediately get to the binary log and therefore
consistency is broken. Although there may be issues in mixing T and M tables in
STATEMENT mode, there are safe combinations that clients find useful.
In this bug, we fix the following issue. Mixing N and T tables in multi-level
(e.g. a statement that fires a trigger) or multi-table table statements (e.g.
update t1, t2...) were not handled correctly. In such cases, it was not possible
to distinguish when a T table was updated if the sequence of changes was N and T.
In a nutshell, just the flag "modified_non_trans_table" was not enough to reflect
that both a N and T tables were changed. To circumvent this issue, we check if an
engine is registered in the handler's list and changed something which means that
a T table was modified.
Check WL 2687 for a full-fledged patch that will make the use of either the MIXED or
ROW modes completely safe.
binlog
The fix for BUG 43929 introduced a regression issue. In a nutshell, when a
statement that changes a non-transactional table fails, it is written to the
binary log with the error code appended. Unfortunately, after BUG 43929, this
failure was flushing the transactional chace causing mismatch between execution
and logging histories. To fix this issue, we avoid flushing the transactional
cache when a commit or rollback is not issued.
The problem: described in the bug report.
The fix:
--increase buffers where it's necessary
(buffers which are used in stxnmov)
--decrease buffer lengths which are used
Large transactions and statements may corrupt the binary log if the size of the
cache, which is set by the max_binlog_cache_size, is not enough to store the
the changes.
In a nutshell, to fix the bug, we save the position of the next character in the
cache before starting processing a statement. If there is a problem, we simply
restore the position thus removing any effect of the statement from the cache.
Unfortunately, to avoid corrupting the binary log, we may end up loosing changes
on non-transactional tables if they do not fit in the cache. In such cases, we
store an Incident_log_event in order to stop the slave and alert users that some
changes were not logged.
Precisely, for every non-transactional changes that do not fit into the cache,
we do the following:
a) the statement is *not* logged
b) an incident event is logged after committing/rolling back the transaction,
if any. Note that if a failure happens before writing the incident event to
the binary log, the slave will not stop and the master will not have reported
any error.
c) its respective statement gives an error
For transactional changes that do not fit into the cache, we do the following:
a) the statement is *not* logged
b) its respective statement gives an error
To work properly, this patch requires two additional things. Firstly, callers to
MYSQL_BIN_LOG::write and THD::binlog_query must handle any error returned and
take the appropriate actions such as undoing the effects of a statement. We
already changed some calls in the sql_insert.cc, sql_update.cc and sql_insert.cc
modules but the remaining calls spread all over the code should be handled in
BUG#37148. Secondly, statements must be either classified as DDL or DML because
DDLs that do not get into the cache must generate an incident event since they
cannot be rolled back.
statements missed from general log
A refinement of the test in the previous patch to avoid
using sleep as a means to ensure that timestamps are
added to the log entries.
BEGIN/COMMIT/ROLLBACK was subject to replication db rules, and
caused the boundary of a transaction not recognized correctly
when these queries were ignored by the rules.
Fixed the problem by skipping replication db rules for these
statements.
Make the caller of Query_log_event, Execute_load_log_event
constructors and THD::binlog_query to provide the error code
instead of having the constructors to figure out the error code.
Bug#44091: libmysqld gets stuck waiting on mutex on initialization
The problem was that libmysqld wasn't enforcing a certain
initialization and deinitialization order for the mysys
library. Another problem was that the global object used
for management of log event handlers (aka LOGGER) wasn't
being prepared for a possible reutilization.
What leads to the hang/crash reported is that a failure
to load the language file triggers a double call of the
cleanup functions, causing an already destroyed mutex to
be used.
The solution is enforce a order on the initialization and
deinitialization of the mysys library within the libmysqld
library and to ensure that the global LOGGER object reset
it's internal state during cleanup.