Report claims that Seconds_behind_master behaves unexpectedly.
Code analysis shows that there is an evident flaw in that treating of FormatDescription event is wrong
so that after FLUSH LOGS on slave the Seconds_behind_master's calculation slips and incorrect
value can be reported to SHOW SLAVE STATUS.
Even worse is that the gap between the correct and incorrect deltas grows with time.
Fixed with prohibiting changes to rpl->last_master_timestamp by artifical events (any kind of).
suggestion as comments is added how to fight with lack of info on the slave side by means of
new heartbeat feature coming.
The test can not be done ealily fully determistic.
In case of out-of-memory error received from the master, print the corresponding message to the error log and stop slave I/O thread to avoid reconnecting with a wrong binary log position.
The issue found with bug 25411 is due to the function skip_rear_comments()
which damages the source code while implementing a work around.
The root cause of the problem is in the lexical analyser, which does not
process special comments properly.
For special comments like :
[1] aaa /*!50000 bbb */ ccc
since 5.0 is a version older that the current code, the parser is in lining
the content of the special comment, so that the query to process is
[2] aaa bbb ccc
However, the text of the query captured when processing a stored procedure,
stored function or trigger (or event in 5.1), can be after rebuilding it:
[3] aaa bbb */ ccc
which is wrong.
To fix bug 25411 properly, the lexical analyser needs to return [2] when
in lining special comments.
In order to implement this, some preliminary cleanup is required in the code,
which is implemented by this patch.
Before this change, the structure named LEX (or st_lex) contains attributes
that belong to lexical analysis, as well as attributes that represents the
abstract syntax tree (AST) of a statement.
Creating a new LEX structure for each statements (which makes sense for the
AST part) also re-initialized the lexical analysis phase each time, which
is conceptually wrong.
With this patch, the previous st_lex structure has been split in two:
- st_lex represents the Abstract Syntax Tree for a statement. The name "lex"
has not been changed to avoid a bigger impact in the code base.
- class lex_input_stream represents the internal state of the lexical
analyser, which by definition should *not* be reinitialized when parsing
multiple statements from the same input stream.
This change is a pre-requisite for bug 25411, since the implementation of
lex_input_stream will later improve to deal properly with special comments,
and this processing can not be done with the current implementation of
sp_head::reset_lex and sp_head::restore_lex, which interfere with the lexer.
This change set alone does not fix bug 25411.
Problem: to handle a situation when the size of event on the master is greater than max_allowed_packet on slave, we checked for the wrong constant (ER_NET_PACKET_TOO_LARGE instead of CR_NET_PACKET_TOO_LARGE).
Solution: test for the client "packet too large" error code instead of the server one in slave I/O thread.
"INSERT... ON DUPLICATE KEY UPDATE skips auto_increment values"
didn't make it into 5.0.36 and 5.1.16,
so we need to adjust the bug-detection-based-on-version-number code.
Because the rpl tree has a too old version, rpl_insert_id cannot pass,
so I disable it (like is already the case in 5.1-rpl for the same reason),
and the repl team will re-enable it when they merge 5.0 and 5.1 into
their trees (thus getting the right version number).
"INSERT... ON DUPLICATE KEY UPDATE skips auto_increment values".
When in an INSERT ON DUPLICATE KEY UPDATE, using
an autoincrement column, we inserted some autogenerated values and
also updated some rows, some autogenerated values were not used
(for example, even if 10 was the largest autoinc value in the table
at the start of the statement, 12 could be the first autogenerated
value inserted by the statement, instead of 11). One autogenerated
value was lost per updated row. Led to exhausting the range of the
autoincrement column faster.
Bug introduced by fix of BUG#20188; present since 5.0.24 and 5.1.12.
This bug breaks replication from a pre-5.0.24 master.
But the present bugfix, as it makes INSERT ON DUP KEY UPDATE
behave like pre-5.0.24, breaks replication from a [5.0.24,5.0.34]
master to a fixed (5.0.36) slave! To warn users against this when
they upgrade their slave, as agreed with the support team, we add
code for a fixed slave to detect that it is connected to a buggy
master in a situation (INSERT ON DUP KEY UPDATE into autoinc column)
likely to break replication, in which case it cannot replicate so
stops and prints a message to the slave's error log and to SHOW SLAVE
STATUS.
For 5.0.36->[5.0.24,5.0.34] replication we cannot warn as master
does not know the slave's version (but we always recommended to users
to have slave at least as new as master).
As agreed with support, I'll also ask for an alert to be put into
the MySQL Network Monitoring and Advisory Service.
The possibility of the race is removed by changing sequence of calls
pthread_mutex_unlock(&mi->run_lock);
pthread_cond_broadcast(&mi->stop_cond);
into
pthread_cond_broadcast(&mi->stop_cond);
pthread_mutex_unlock(&mi->run_lock);
at the end of I/O thread (similar change at the end of SQL thread). This ensures
that no thread waiting on the condition executes between the broadcast and the
unlock and thus can't delete the mi structure which caused the bug.
The relay log may not be open for some reason (e.g. disk error) after rotation,
and using it causes the slave crash.
Fix: check we have it open before access, return error otherwise.
The update_slave_list() call is a remainder from attempts to implement failsafe
replication. This code is now obsolete and not maintained (see comments in
rpl_failsafe.cc).
Inspecting the code one can see that this function do not interferre with normal
slave operation and thus can be safely removed. This will solve the issue
reported in the bug (errors on slave reconnection).
A related issue is to remove unneccessary reconnections done by slave. This is
handled in the patch for BUG#20435.
- Removed not used variables and functions
- Added #ifdef around code that is not used
- Renamed variables and functions to avoid conflicts
- Removed some not used arguments
Fixed some class/struct warnings in ndb
Added define IS_LONGDATA() to simplify code in libmysql.c
I did run gcov on the changes and added 'purecov' comments on almost all lines that was not just variable name changes
Fixed compiler warnings (detected by VC++):
- Removed not used variables
- Added casts
- Fixed wrong assignments to bool
- Fixed wrong calls with bool arguments
- Added missing argument to store(longlong), which caused wrong store method to be called.
(Mostly in DBUG_PRINT() and unused arguments)
Fixed bug in query cache when used with traceing (--with-debug)
Fixed memory leak in mysqldump
Removed warnings from mysqltest scripts (replaced -- with #)
When we write 'query=...' string to a frm file for views on a slave,
indentifiers are not properly quoted due to missing OPTION_QUOTE_SHOW_CREATE
flag in the thd->options.
Fix: properly set thd->options for the slave thread.
Transaction on the slave sql thread got blocked against a slave's mysqld local ta's
lock. Since the default, slave-transaction-retries=10, there was replaying of the
replicated ta. That failed because of a new started from 5.0.13 policy not to rollback
a timed-out transaction. Effectively the first round of a timed-out ta becomes committed
by the replaying's first "BEGIN".
It was decided to backport already existed method working in 5.1 implemented in
bug #16228 for handling symmetrical deadlock problem. That patch introduced end_trans
execution whenever a replicated ta deadlocks or timed-out.
Note, that this solution can be practically suboptimal - in the light of the changed behavior
due to timeout we still could replay only the last statement - only with a high rate of timeouting
replicated transactions.
A communication packet can also be a binlog event sent from the master to the slave.
To be sent by master dump and accepted by slave io thread both have to have
the value of max_allowed_packet bigger than one that client connection had.
In the patch there is the MAX possible replicatio header size estimation for events
in binlog that embedded user query. Only these events of query_log_event type, i.e
just plain queries, require attention.
Fix when __attribute__() is stubbed out, add ATTRIBUTE_FORMAT() for specifying
__attribute__((format(...))) safely, make more use of the format attribute,
and fix some of the warnings that this turns up (plus a bonus unrelated one).
when calling a SP from C API"
The bug was caused by lack of checks for misuse in mysql_real_query.
A stored procedure always returns at least one result, which is the
status of execution of the procedure itself.
This result, or so-called OK packet, is similar to a result
returned by INSERT/UPDATE/CREATE operations: it contains the overall
status of execution, the number of affected rows and the number of
warnings. The client test program attached to the bug did not read this
result and ivnoked the next query. In turn, libmysql had no check for
such scenario and mysql_real_query was simply trying to send that query
without reading the pending response, thus messing up the communication
protocol.
The fix is to return an error from mysql_real_query when it's called
prior to retrieval of all pending results.
No test case as the bug is in an existing test case (rpl_trigger.test
when it is run under valgrind).
The warning was caused by memory corruption in replication slave: thd->db
was pointing at a stack address that was previously used by
sp_head::execute()::old_db. This happened because mysql_change_db
behaved differently in replication slave and did not make a copy of the
argument to assign to thd->db.
The solution is to always free the old value of thd->db and allocate a new
copy, regardless whether we're running in a replication slave or not.