Problem: For every event read, mysqlbinlog calls localtime() which in turn
calls stat(/etc/localtime) which is causing kernel mutex contention.
Analysis and Fix:
localtime() calls stat(/etc/localtime) for every instance of the call
where as localtime_r() the reentrant version was optimized to store
the read only tz internal structure. Hence it will not call
stat(/etc/localtime). It will call only once at the beginning.
The mysql server is calling localtime_r() and mysqlbinlog tool is
one place where we are still using localtime().
Once the process (mysqlbinlog) is started if timezone is changed
it will be not picked up the the process and it will continue
with the same values as the beginning of the process. This
behavior is in-lined with mysql server.
Also adding localtime_r() and gmtime_r() support for windows.
Problem:
========
In a master slave replication if a slave receives a
Start_log_event_v3 the payload is expected to be of fixed
size. If a payload which is smaller than the fixed size is
received it causes a read out of bounds issue.
Analysis:
========
According to documentation the fixed data part of
Start_log_event_v3 looks as shown below.
2 bytes: The binary log format version
50 bytes: The MySQL server's version
4 bytes: Timestamp in seconds when this event was created
Since the payload is expected to be of fixed size, therefore
ST_SERVER_VER_LEN (50) bytes are memcpy'ed into
server_version. But if a malicious master sends a shorter
payload it causes a read out of bounds issue.
Fix:
===
In Start_log_event_v3 event's constructor a check has been
added which expects the minimum payload length to be of size
common_header_len + ST_COMMON_HEADER_LEN_OFFSET bytes. If a
malicious packet of lesser length is received it will be
considered as an invalid event.
sql/log_event.cc:
Added code changes to check the minimum packet length
of Start_log_event_v3 should be > 56.
sql/log_event.h:
Moved server_version from stack to heap and modified
is_valid function for Start_log_event_v3.
Problem: A corrupted header length in FORMAT_DESCRIPTION_LOG_EVENT
can cause server to crash.
Analysis: FORMAT_DESCRIPTION_EVENT will be considered invalid if
header len is too small (i.e. below OLD_HEADER_LEN).
Format_description_log_event:: Format_description_log_event(...)
{
...
if ((common_header_len=buf[ST_COMMON_HEADER_LEN_OFFSET]) < OLD_HEADER_LEN)
DBUG_VOID_RETURN; /* sanity check */
...
post_header_len= my_memdup(...)
}
In that case Format_description_log_event constructor will return early,
without allocating any memory for post_header_len. Thence this variable is
left uninitialized and making server to crash when server is trying
to free the uninitialized value.
Fix: When Format_description_log_event constructor returns early, assign
NULL to post_header_len.
bzr merge -r4264 maria/5.5
Text conflict in sql/mysqld.cc
Text conflict in storage/xtradb/btr/btr0cur.c
Text conflict in storage/xtradb/buf/buf0buf.c
Text conflict in storage/xtradb/buf/buf0lru.c
Text conflict in storage/xtradb/handler/ha_innodb.cc
5 conflicts encountered.
~40% bugfixed(*) applied
~40$ bugfixed reverted (incorrect or we're not buggy)
~20% bugfixed applied, despite us being not buggy
(*) only changes in the server code, e.g. not cmakefiles
NON-EXISTS RECORDS
Problem:
========
In RBR replication, master deletes a record but the record
don't exist on slave. when slave tries to apply the
Delete_row_log_event from master, it will result in an
assert on slave.
Analysis:
========
This problem exists not only with Delete_rows event but also
with Update_rows event as well. Trying to update a non
existing row on the slave from the master will cause the
same assert. This assert occurs only for the tables that
doesn't have primary keys and which basically require
sequential scan to be done to locate a record. This bug
occurs only with innodb engine not with myisam.
When update or delete rows is executed on a slave on a table
which doesn't have primary key the updated record is stored
in a buffer named table->record[0] and the same is copied to
table->record[1] so that during sequential scan
table->record[0] can reloaded with fetched data from the
table and compared against table->record[1]. In a special
case where there is no record on the slave side scan will
result in EOF in that case we reinit the scan and we try to
compare record[0] with record[1] which are basically the
same. This comparison is incorrect. Since they both are the
same record_compare() will report that record is found and
we try to go ahead and try to update/delete non existing
row. Ideally if the scan results in EOF means no data found
hence no need to do a record_compare() at all.
Fix:
===
Avoid comparision of records on EOF.
sql/log_event.cc:
Avoid record comparison on end of file.
sql/log_event_old.cc:
Avoid record comparison on end of file.
Problem:
Load_log_event::print_query() function does not put escape character in file name
for "LOAD DATA INFILE" statement.
Analysis:
When we have "'" in our file name for "LOAD DATA INFILE" statement,
Load_log_event::print_query() function does not put escape character
in our file name.
This one result that when we show binary-log, we get file name without
escape character.
Solution:
To put escape character when we have "'" in file name, for this instead of using
simple memcpy() to put file-name, we will use pretty_print_str().
Performance schema tables are local to a server and they should not
be allowed to be executed by the slave from the relay log.
From 5.6.10, P_S events are not written into the binary log.
But prior to that, from mysql 5.5 onwards, P_S events are written
to the binary log by master.
The following are problematic scenarios:
1. Master 5.5 -> Slave 5.5
========================
A) RBR: Slave crashes
B) SBR: P_S statements are replicated.
2.Master 5.5 -> Slave 5.6
========================
A) RBR: SQL thd generates error
B) SBR : P_S statements are replicated
3. 5.5 binlog executed on a server 5.5 using mysqlbinlog|mysql
=================================================================
A) RBR: Server crash (because of BINLOG'... statement)
B) SBR: P_S statements are executed
4. 5.5 binlog executed on server 5.6 using mysqlbinlog|mysql
================================================================
A) RBR: SQL error (because of BINLOG'... statement)
B) SBR: P_S statements are executed.
The generalized behaviour should be:
a) Slave SQL thread should certainly ignore P_S events read from
the relay log.
b) mysqlbinlog|mysql should replay the binlog succesfully.
OF ROW DATA
Problem:
========
Inserting a row larger than 4G when server uses RBR leads
to crash.
Analysis:
========
Row-based binary logging logs changes in individual table
rows. During the execution of DML statements in RBR the
actual row data will be stored within "m_rows_buf" buffer
and this buffer contents will be written to binary log.
"m_rows_buf" is prepared within the following function
"Rows_log_event::do_add_row_data".
When a huge row is specified as in this bug scenario where
row size is 4294971520 > UINT_MAX (4294967295) then the
"m_rows_buf" is reallocated to accommodate the row data and
then the row is copied to the buffer. During this realloc
call, the length is getting type casted to "uint" which
results in overflow. Because of the overflow the reallocated
memory happens to be incorrect than what was requested
and it results in a crash during copy of rowdata to buffer.
Hence rows of size > 4GB cannot be written to binary log.
By default the event_length can be stored within 4 bytes
which in turn restricts an event's size to grow. Hence large
rows cannot be replicated using row based replication.
Fix:
===
An error is generated if the row size exceeds 4GB value.
sql/log_event.cc:
An error is generated if the row size exceeds 4GB value.
Debug simulations are added to test the fix.
AUTO_INC COLUMN ONLY ON SLAVE
In RBR, if the slave's table as one additional auto_inc column,
then, it will insert the value 0 instead of generating the next
auto_inc number.
We fix this by checking that if an auto_inc extra column exists,
when compared to column data of the row event, we explicitly set
it to NULL and flag the engine that a nulled auto_inc column will
be inserted.
(MYSQLBINLOG -V CRASHES WITH THAT BINLOG)
Problem: If slave receives a corrupted row event,
slave server is crashing.
Analysis: When slave is unpacking the row event, it is
not validating the data before applying the event. If the
data is corrupted for eg: the length of a field is wrong,
it could end up reading wrong data leading to a crash.
A similar problem happens when mysqlbinlog tool is used
against a corrupted binlog using '-v' option. Due to -v
option, the tool tries to print the values of all the
fields. Corrupted field length could lead to a crash.
Fix: Before unpacking the field, a verification
will be made on the length. If it falls into the event
range, only then it will be unpacked. Otherwise,
"ER_SLAVE_CORRUPT_EVENT" error will be thrown.
Incase mysqlbinlog -v case, the field value will not be
printed and the processing of the file will be stopped.
sql/field.h:
Removed a function which is not required anymore
sql/log_event.cc:
Adding a validation on the field length before
the tool tries to print the value.
sql/log_event.h:
Changing unpack_row call according to the new arguments
sql/log_event_old.h:
Changing unpack_row call according to the new arguments
sql/rpl_record.cc:
Adding a new argument 'row_end' which tells
the end position of the complete data in the
row event. It will be used to do validation
before doing 'unpack' field.
sql/rpl_record.h:
Adding a new argument 'row_end' which tells
the end position of the complete data in the
row event. It will be used to do validation
before doing 'unpack' field.
sql/rpl_utility.cc:
Now calc_field_size() is required for client too.
(and valgrind warnings)
* move thd userstat initialization to the same function
that was adding thd userstat to global counters.
* initialize thd->start_bytes_received in THD::init
(when thd->userstat_running is set)
"SHOW BINLOG EVENTS"
Problem:
========
mysql was crashed after executing "show binlog events in
'mysql-bin.000005' from 99", the crash happened randomly.
Analysis:
========
During construction of LOAD EVENT or NEW LOAD EVENT object
if the starting offset is provided as incorrect value then
all the object members that are retrieved from the offset
are also invalid. Some times it will lead to out of bound
address offsets. In the bug scenario, the file name is
extracrated from an invalid address and the same is fed to
strlen(fname) function. Passing invalid address to strlen
will lead to crash.
Fix:
===
Validate if the given offset falls within the event boundary
or not.
sql/log_event.cc:
Added code to validate fname's address. "fname" should
be within event boundary. Added code to find invalid
invents.
REPLICATION FILTERS ARE USED.
Problem:
When Filtered-slave applies Int_var_log_event and when it
tries to write the event to its own binlog, LAST_INSERT_ID
value is written wrongly.
Analysis:
THD::stmt_depends_on_first_successful_insert_id_in_prev_stmt
is a variable which is set when LAST_INSERT_ID() is used by
a statement. If it is set, first_successful_insert_id_in_
prev_stmt_for_binlog will be stored in the statement-based
binlog. This variable is CUMULATIVE along the execution of
a stored function or trigger: if one substatement sets it
to 1 it will stay 1 until the function/trigger ends,
thus making sure that first_successful_insert_id_in_
prev_stmt_for_binlog does not change anymore and is
propagated to the caller for binlogging. This is achieved
using the following code
if(!stmt_depends_on_first_successful_insert_id_in_prev_stmt)
{
/* It's the first time we read it */
first_successful_insert_id_in_prev_stmt_for_binlog=
first_successful_insert_id_in_prev_stmt;
stmt_depends_on_first_successful_insert_id_in_prev_stmt= 1;
}
Slave server, after receiving Int_var_log_event event from
master, it is setting
stmt_depends_on_first_successful_insert_id_in_prev_stmt
to true(*which is wrong*) and not setting
first_successful_insert_id_in_prev_stmt_for_binlog. Because
of this problem, when the actual DML statement with
LAST_INSERT_ID() is parsed by slave SQL thread,
first_successful_insert_id_in_prev_stmt_for_binlog is not
set. Hence the value zero (default value) is written to
slave's binlog.
Why only *Filtered slave* is effected when the code is
in common place:
-------------------------------------------------------
In Query_log_event::do_apply_event,
THD::stmt_depends_on_first_successful_insert_id_in_prev_stmt
is reset to zero at the end of the function. In case of
normal slave (No Filters), this variable will be reset.
In Filtered slave, Slave SQL thread defers all IRU events's
execution until IRU's Query_log event is received. Once it
receives Query_log_event it executes all pending IRU events
and then it executes Query_log_event. Hence the variable is
not getting reset to 0, causing this bug.
Fix: As described above, the root cause was setting
THD::stmt_depends_on_first_successful_insert_id_in_prev_stmt
when Int_var_log_event was executed by a SQL thread. Hence
removing the problematic line from the code.
Dump thread may encounter an error when reading events from the active binlog
file. However the errors may be temporary, so dump thread will try to read
the event again. But dump thread seeked to an wrong position, it caused some
events was sent twice.
To fix the bug, prev_pos is defined out the while loop and is set the correct
position after reading every event correctly.
This patch also make binlog_can_be_corrupted more accurate, only the binlogs
not closed normally are marked binlog_can_be_corrupted.
Finally, two warnings are added when dump threads encounter the temporary
errors.
mysql-test/suite/rpl/r/last_insert_id.result:
Test case for last_insert_id
mysql-test/suite/rpl/t/last_insert_id.cnf:
Test case for last_insert_id
mysql-test/suite/rpl/t/last_insert_id.test:
Test case for last_insert_id
sql/log_event.cc:
Added DBUG_PRINT
Set thd->first_successful_insert_id_in_prev_stmt_for_binlog when setting thd->first_successful_insert_id_in_prev_stmt.
This is required to get last_insert_id() replicated.
This is analog to how read_first_successful_insert_id_in_prev_stmt() works.
sql/rpl_utility.cc:
Added DBUG_PRINT
STATUS OF ROLLBACKED TRANSACTION" and bug #17054007 - "TRANSACTION
IS NOT FULLY ROLLED BACK IN CASE OF INNODB DEADLOCK".
The problem in the first bug report was that although deadlock involving
metadata locks was reported using the same error code and message as InnoDB
deadlock it didn't rollback transaction like the latter. This caused
confusion to users as in some cases after ER_LOCK_DEADLOCK transaction
could have been restarted immediately and in some cases rollback was
required.
The problem in the second bug report was that although InnoDB deadlock
caused transaction rollback in all storage engines it didn't cause release
of metadata locks. So concurrent DDL on the tables used in transaction was
blocked until implicit or explicit COMMIT or ROLLBACK was issued in the
connection which got InnoDB deadlock.
The former issue has stemmed from the fact that when support for detection
and reporting metadata locks deadlocks was added we erroneously assumed
that InnoDB doesn't rollback transaction on deadlock but only last statement
(while this is what happens on InnoDB lock timeout actually) and so didn't
implement rollback of transactions on MDL deadlocks.
The latter issue was caused by the fact that rollback of transaction due
to deadlock is carried out by setting THD::transaction_rollback_request
flag at the point where deadlock is detected and performing rollback
inside of trans_rollback_stmt() call when this flag is set. And
trans_rollback_stmt() is not aware of MDL locks, so no MDL locks are
released.
This patch solves these two problems in the following way:
- In case when MDL deadlock is detect transaction rollback is requested
by setting THD::transaction_rollback_request flag.
- Code performing rollback of transaction if THD::transaction_rollback_request
is moved out from trans_rollback_stmt(). Now we handle rollback request
on the same level as we call trans_rollback_stmt() and release statement/
transaction MDL locks.
See also MySQL Bug #39750 and similar ones.
Fix my_delete() on Windows, to safely remvoe files on Windows, including files that are opened by another threads in the same process, antiviruses and backup applications. If file to be deleted is also opened by another thread, the file is renamed to unique name prior to deletion - this makes it possible to create file with the same name right after deletion.
With this patch my_delete_allow_opened() becomes obsolete and is replaced with my_delete().
This patch is rework of the patch http://lists.mysql.com/commits/59327 for MySQL bug#39750.
bzr merge lp:maria/5.5 -rtag:mariadb-5.5.31
Text conflict in cmake/cpack_rpm.cmake
Text conflict in debian/dist/Debian/control
Text conflict in debian/dist/Ubuntu/control
Text conflict in sql/CMakeLists.txt
Conflict adding file sql/db.opt. Moved existing file to sql/db.opt.moved.
Conflict adding file sql/db.opt.moved. Moved existing file to sql/db.opt.moved.moved.
Text conflict in sql/mysqld.cc
Text conflict in support-files/mysql.spec.sh
8 conflicts encountered.
MDEV-4489 "Replication of big5, cp932, gbk, sjis strings makes wrong values on slave"
has been fixed.
Problem:
String constants of some Asian charsets (big5,cp932,gbk,sjis)
can have backslash '\' (0x5C) in the second byte of multi-byte characters.
Replicating of such constants using the standard '\'-escaping is dangerous.
Therefore, constants of these charsets are replicated using hex notation:
INSERT INTO t1 (a) VALUES (0x815C);
However, 0xHHHH constants do not work well in some cases,
because they can behave as strings and as numbers, depending on context
(for example, depending on the data type of the column in an INSERT statement).
This SQL script was not replicated correctly with statement-based replication:
SET NAMES gbk;
PREPARE STMT FROM 'INSERT INTO t1 (a) VALUES (?)';
SET @a = '1';
EXECUTE STMT USING @a;
The INSERT statement was replicated as:
INSERT INTO t1 (a) VALUES (0x31);
'1' was correctly converted to the number 1 on master.
But the 0x31 constant was treated as number 49 on slave.
Fix:
1. Binary log now uses X'HHHH' instead of 0xHHHH constants.
2. The X'HHHH' constants now work always as strings, in all contexts.
This is the SQL standard compliant behaviour.
After the fix, the above statement is replicated as:
INSERT INTO t1 (a) VALUES (X'31');
X'31' is treated as string '1' on slave, and is correctly converted to 1.
modified:
@ mysql-test/r/ctype_cp932_binlog_stm.result
@ mysql-test/r/select.result
@ mysql-test/r/select_jcl6.result
@ mysql-test/r/select_pkeycache.result
@ mysql-test/r/user_var-binlog.result
@ mysql-test/r/varbinary.result
@ mysql-test/suite/binlog/r/binlog_stm_ctype_ucs.result
@ mysql-test/suite/binlog/r/binlog_stm_mix_innodb_myisam.result
@ mysql-test/suite/rpl/r/rpl_charset_sjis.result
@ mysql-test/suite/rpl/r/rpl_mdev382.result
@ mysql-test/suite/rpl/t/rpl_charset_sjis.test
@ mysql-test/t/ctype_cp932_binlog_stm.test
@ mysql-test/t/select.test
@ mysql-test/t/varbinary.test
Adding and updating tests
@ sql/item.cc
@ sql/item.h
@ sql/sql_yacc.yy
@ sql/sql_lex.cc
Splitting the implementations of X'HH' and 0xHH constants into two
separate classes. Fixing the parser to distinguish the two syntaxes.
@ sql/log_event.cc
Using X'HH' instead of 0xHH for binary logging for string constants
of the "dangerous" charsets.
@ sql/sql_string.h
Adding a helped method String::append_hex().
NO ERRORS REPORTED
Problem:
=======
Errors from my_b_fill are ignored. MYSQL_BIN_LOG::write_cache
code assumes that 0 returned from my_b_fill always means
end-of-cache, but that is incorrect. It can result in error
and the error is ignored. Other callers of my_b_fill don't
check for error: my_b_copy_to_file, maybe my_b_gets.
Fix:
===
An error handler is already present to check the "cache"
error that is reported during "MYSQL_BIN_LOG::write_cache"
call. Hence error handlers are added for "my_b_copy_to_file"
and "my_b_gets".
During my_b_fill() function call, when the cache read fails
info->error= -1 is set. Hence a check for "info->error"
is added for the above to callers upon their return.
mysys/mf_iocache2.c:
Added a check for "cache->error" and simulation of cache read failure
mysys/my_read.c:
Simulation of read failure
sql/log_event.cc:
Added debug simulation
sql/sql_repl.cc:
Added a check for cache error
At logging a first Query referring a user var, the slave missed to log the user var.
It appears that at execution of a Uservar event the slaver applier
thought of the variable as already logged.
The reason of misjudgement is in coincidence of query id:s: of one that the thread
holds at Uservar execution and another one that the thread sees at the Query applying.
While the two are naturally different in the regular execution branch (as two computational
events are separated as individual events), in the deferred applying case the User var execution
effectively belongs to its Query processing.
Fixed with storing the Uservar parsing time (where desicion to defer is taken) query id
to temporarily substitute with it the actual query id at the Uservar execution time
(along with its query).
Such manipulation mimics behaviour of the regular applying branch.
sql/log_event.cc:
Storing the Uservar parsing time query id into a new member of the event
to to temporarily substitute
with it the actual thread id at the Uservar execution time.
sql/log_event.h:
Storage for keeping query-id in User-var intance is added.