Problem:
========
1) Drop table queries are re-generated by server
before writing the events(queries) into binlog
for various reasons. If table name/db name contains
a non regular characters (like latin characters),
the generated query is wrong. Hence it breaks the
replication.
2) In the edge case, when table name/db name contains
64 characters, server is throwing an assert
assert(M_TBLLEN < 128)
3) In the edge case, when db name contains 64 latin
characters, binlog content is interpreted badly
which is leading replication failure.
Analysis & Fix :
================
1) Parser reads the table name from the query and converts
it to standard charset(utf8) and stores it in table_name variable.
When drop table query is regenerated with the same table_name
variable, it should be converted back to the original charset
from standard charset(utf8).
2) Latin character takes two bytes for each character. Limit
of the identifier is 64. SYSTEM_CHARSET_MBMAXLEN is set to '3'.
So there is a possiblity that tablename/dbname contains 3 * 64.
Hence assert is changed to
(M_TBLLEN <= NAME_CHAR_LEN*SYSTEM_CHARSET_MBMAXLEN)
3) db_len in the binlog event header is taking 1 byte.
db_len is ranged from 0 to 192 bytes (3 * 64).
While reading the db_len from the event, server
is casting to uint instead of uchar which is leading
to bad db_len. This problem is fixed by changing the
cast type to uchar.
Problem & Analysis: If DML invokes a trigger or a
stored function that inserts into an AUTO_INCREMENT column,
that DML has to be marked as 'unsafe' statement. If the
tables are locked in the transaction prior to DML statement
(using LOCK TABLES), then the same statement is not marked as
'unsafe' statement. The logic of checking whether unsafeness
is protected with if (!thd->locked_tables_mode). Hence if
we lock the tables prior to DML statement, it is *not* entering
into this if condition. Hence the statement is not marked
as unsafe statement.
Fix: Irrespective of locked_tables_mode value, the unsafeness
check should be done. Now with this patch, the code is moved
out to 'decide_logging_format()' function where all these checks
are happening and also with out 'if(!thd->locked_tables_mode)'.
Along with the specified test case in the bug scenario
(BINLOG_STMT_UNSAFE_AUTOINC_COLUMNS), we also identified that
other cases BINLOG_STMT_UNSAFE_AUTOINC_NOT_FIRST,
BINLOG_STMT_UNSAFE_WRITE_AUTOINC_SELECT, BINLOG_STMT_UNSAFE_INSERT_TWO_KEYS
are also protected with thd->locked_tables_mode which is not right. All
of those checks also moved to 'decide_logging_format()' function.
Note: Backporting the patch from mysql-5.6.
Problem:
A CREATE TABLE with an invalid table name is detected
at SQL layer. So the table name is reset to an empty
string. But the storage engine is called with this
empty table name. The table name is specified as
"database/table". So, in the given scenario we get
only "database/".
Solution:
Within InnoDB, detect this error and report it to
higher layer.
rb#9274 approved by jimmy.
PROBLEMS
Description:- Server variable "--lower_case_tables_names"
when set to "0" on windows platform which does not support
case sensitive file operations leads to problems. A warning
message is printed in the error log while starting the
server with "--lower_case_tables_names=0". Also according to
the documentation, seting "lower_case_tables_names" to "0"
on a case-insensitive filesystem might lead to index
corruption.
Analysis:- The problem reported in the bug is:-
Creating an INNODB table 'a' and executing a query, "INSERT
INTO a SELECT a FROM A;" on a server started with
"--lower_case_tables_names=0" and running on a
case-insensitive filesystem leads innodb to flat spin.
Optimizer thinks that "a" and "A" are two different tables
as the variable "lower_case_table_names" is set to "0". As a
result, optimizer comes up with a plan which does not need a
temporary table. If the same table is used in select and
insert, a temporary table is needed. This incorrect
optimizer plan leads to infinite insertions.
Fix:- If the server is started with
"--lower_case_tables_names" set to 0 on a case-insensitive
filesystem, an error, "The server option
'lower_case_table_names'is configured to use case sensitive
table names but the data directory is on a case-insensitive
file system which is an unsupported combination. Please
consider either using a case sensitive file system for your
data directory or switching to a case-insensitive table name
mode.", is printed in the server error log and the server
exits.
Problem:
If we add a referential integrity constraint with a duplicate
name, an error occurs. The foreign key object would not have
been added to the dictionary cache. In the error path, there
is an attempt to remove this foreign key object. Since this
object is not there, the search returns a NULL result.
De-referencing the null object results in this crash.
Solution:
If the search to the foreign key object failed, then don't
attempt to access it.
rb#9309 approved by Marko.
AVOID DEADLOCK AFTER RESTORE
Analysis
--------
Accessing the restored NDB table in an active multi-statement
transaction was resulting in deadlock found error.
MySQL Server needs to discover metadata of NDB table from
data nodes after table is restored from backup. Metadata
discovery happens on the first access to restored table.
Current code mandates this statement to be the first one
in the transaction. This is because discover needs exclusive
metadata lock on the table. Lock upgrade at this point can
lead to MDL deadlock and the code was written at the time
when MDL deadlock detector was not present. In case when
discovery attempted in the statement other than the first
one in transaction ER_LOCK_DEADLOCK error is reported
pessimistically.
Fix:
---
Removed the constraint as any potential deadlock will be
handled by deadlock detector. Also changed code in discover
to keep metadata locks of active transaction.
Same issue was present in table auto repair scenario. Same
fix is added in repair path also.
if XA PREPARE transactions hold explicit locks.
innobase_shutdown_for_mysql(): Call trx_sys_close() before lock_sys_close()
(and dict_close()) so that trx_free_prepared() will see all locks intact.
RB: 8561
Reviewed-by: Vasil Dimov <vasil.dimov@oracle.com>
Backport from mysql-5.5 to mysql-5.1 of:
Bug19770858: MYSQLD CAN BE DRIVEN TO OOM WITH TWO SIMPLE SESSION VARS
The problem was that the maximum value of the transaction_prealloc_size
session system variable was ULONG_MAX which meant that it was possible
to cause the server to allocate excessive amounts of memory.
This patch fixes the problem by reducing the maxmimum value of
transaction_prealloc_size and transaction_alloc_block_size down
to 128K.
Note that transactions will still be able to allocate more than
128K if needed, this patch just reduces the amount that can be
preallocated - as well as the maximum size of the incremental
allocation blocks.
(cherry picked from commit 540c9f7ebb428bbf9ec028feabe1f7f919fdefd9)
Conflicts:
mysql-test/suite/sys_vars/r/transaction_alloc_block_size_basic.result
mysql-test/suite/sys_vars/r/transaction_alloc_block_size_basic_64.result
mysql-test/suite/sys_vars/t/disabled.def
mysql-test/suite/sys_vars/t/transaction_alloc_block_size_basic.test
sql/sys_vars.cc
BINLOGGED INCORRECTLY - BREAKS A SLAVE
Submitted a incomplete patch with my previous push,
re submitting the extra changes the required to make
the patch complete.
Analysis:
In row based replication, Master does not send temp table information
to Slave. If there are any DDLs that involves in regular table that needs
to be sent to Slave and a temp tables (which will not be available at Slave),
the Master rewrites the query replacing temp table with it's defintion.
Eg: create table regular_table like temptable.
In rewrite logic, server is ignoring the database of regular table
which can cause problems mentioned in this bug.
Fix: dont ignore database information (if available) while
rewriting the query
The problem was that the maximum value of the transaction_prealloc_size
session system variable was ULONG_MAX which meant that it was possible
to cause the server to allocate excessive amounts of memory.
This patch fixes the problem by reducing the maxmimum value of
transaction_prealloc_size and transaction_alloc_block_size down
to 128K.
Note that transactions will still be able to allocate more than
128K if needed, this patch just reduces the amount that can be
preallocated - as well as the maximum size of the incremental
allocation blocks.
special character sets like utf16, utf32, ucs2.
Analysis: MySQL server does not support few special character sets like
utf16,utf32 and ucs2 as "client's character set"(eg: utf16,utf32, ucs2).
It is known limitation listed in the documentation
http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html.
The default value for default-character-set parameter is 'auto'
which means that if the server's character set is not supported,
then server automatically changes client's character set to
predefined character-set which is 'latin1' in the current code.
Eg:
$ ./mysql -uroot -S$SOCKET_FILE --default-character-set=utf16
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of 'utf16'
$ ./mysql -uroot -S$SOCKET_FILE will be successfully connected to
server with 'latin1' as default client side character set.
When IO thread is trying to connect to Master, it sets server's character
set as client's character set. When Slave server is started with these
special character sets, IO thread (which is like a connection to Master)
fails because of the above said limitation.
Fix: Now even IO thread also behaves the same as a regular client behaves.
i.e., If server's character set is not supported as client's character set,
then set default's client character set(latin1) as client's character set.
Fix:
===
Backport Bug#11756194 to mysql-5.5. slave breaks if
'drop database' fails on master and mismatched tables on
slave.
'DROP TABLE <deleted tables>' was binlogged when
'DROP DATABASE' failed and at least one table was deleted
from the database. The log event would lead slave SQL thread
stop if some of the tables did not exist on slave.
After this patch, It is always binlogged with 'IF EXISTS'
option.
Description:
Using correct length when moving to next field in cmp_ref. The store
length already includes the length bytes of blobs, which is already considered
earlier for blob types.
Approved by Mattias, Jimmy [rb-7088]
The debug configuration parameter innodb_optimistic_insert_debug
which was introduced for testing corner cases in B-tree handling
had a bug in it. The value 1 would trigger an infinite sequence
of page splits.
Fix: When the value 1 is specified, disable this debug feature.
Approved by Yasufumi Kinoshita
dict_set_corrupted(): Use the canonical way of searching for
less-than-equal (PAGE_CUR_LE) and then checking low_match.
The code that was introduced in MySQL 5.5.17 in
Bug#11830883 SUPPORT "CORRUPTED" BIT FOR INNODB TABLES AND INDEXES
could position the cursor on the page supremum, and then attempt
to overwrite non-existing 7th field of the 1-field supremum record.
Approved by Jimmy Yang
some of the tables are created in InnoDB and some tables are created in MyISAM.
We need to create all tables on InnoDB. Fix is to add engine=innodb to the
CREATE TABLE statements.
approved in IM by Marko and Vasil.
Normally, SET SESSION SQL_LOG_BIN is used by DBAs to run a
non-conflicting command locally only, ensuring it does not
get replicated.
Setting GLOBAL SQL_LOG_BIN would not require all sessions to
disconnect. When SQL_LOG_BIN is changed globally, it does not
immediately take effect for any sessions. It takes effect by
becoming the session-level default inherited at the start of
each new session, and this setting is kept and cached for the
duration of that session. Setting it intentionally is unlikely
to have a useful effect under any circumstance; setting it
unintentionally, such as while intending to use SET [SESSION]
is potentially disastrous. Accidentally using SET GLOBAL
SQL_LOG_BIN will not show an immediate effect to the user,
instead not having the desired session-level effect, and thus
causing other potential problems with local-only maintenance
being binlogged and executed on slaves; And transactions from
new sessions (after SQL_LOG_BIN is changed globally) are not
binlogged and replicated, which would result in irrecoverable
or difficult data loss.
This is the regular GLOBAL variables way to work, but in
replication context it does not look right on a working server
(with connected sessions) 'set global sql_log_bin' and none of
that connections is affected. Unexperienced DBA after noticing
that the command did "nothing" will change the session var and
most probably won't unset the global var, causing new sessions
to not be binlog.
Setting GLOBAL SQL_LOG_BIN allows DBA to stop binlogging on all
new sessions, which can be used to make a server "replication
read-only" without restarting the server. But this has such big
requirements, stop all existing connections, that it is more
likely to make a mess, it is too risky to allow the GLOBAL variable.
The statement 'SET GLOBAL SQL_LOG_BIN=N' will produce an error
in 5.5, 5.6 and 5.7. Reading the GLOBAL SQL_LOG_BIN will produce
a deprecation warning in 5.7.
FROM A FUNCTION
Scenario:
In a stored procedure, CREATE TABLE statement is not allowed. But an
exception is provided for CREATE TEMPORARY TABLE. We can create a temporary
table in a stored procedure.
Let there be two stored functions f1 and f2 and two stored procedures p1 and
p2. Their properties are as follows:
. stored function f1() calls stored procedure p1().
. stored function f2() calls stored procedure p2().
. stored procedure p1() creates temporary table t1.
. stored procedure p2() does DML on t1.
Consider the following situation:
1. Autocommit mode is on.
2. select f1()
3. select f2()
Step 2: In this step, t1 would be created via p1(). A table level transaction
lock would have been taken. The ::external_lock() would not have been called
on this table. At the end of step 2, because of autocommit mode on, this table
level lock will be released.
Step 3: When we execute DML on table t1 via p2() we have two problems:
Problem 1:
The function ha_innobase::external_lock() would have been called but since
it is a select query no table level locks would have been taken. Hence the
following assert will fail:
ut_ad(lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
Solution:
The solution would be to identify this situation and take a table level lock
and use the proper lock type prebuilt->select_lock_type = LOCK_X for DML
operations.
Problem 2:
Another problem is that in step 3, ha_innobase::open() is never called on
the table t1.
Solution:
The solution would be to identify this situation and call re-init the handler
of table t1.
rb#6429 approved by Krunal.
Problem:
Creation of a table fails when innodb_strict_mode is enabled, but the same
table is created without any warning when innodb_strict_mode is enabled.
Solution:
If creation of a table fails with an error when innodb_strict_mode is
enabled, it must issue a warning when innodb_strict_mode is disabled.
rb#6723 approved by Krunal.
Problem:
We maintain two rb trees in each dict_table_t. The foreign_rbt must be in
sync with foreign_list. The referenced_rbt must be in sync with
referenced_list. There is one function which checks this consistency and it
failed, resulting in an assert failure.
The root cause of the problem was identified that the search order was
lost in the referenced_rbt. This is because while renaming the table,
we didn't not refresh this referenced_rbt.
Solution:
When a foreign key is renamed, we must delete and re-insert into both
foreign_rbt and referenced_rbt.
rb#6412 approved by Jimmy.
The test case makes use of the fine DEBUG_SYNC facility. Furthermore,
since it needs synchronization on internal threads (dump and SQL
threads) the server code has DEBUG_SYNC commands internally deployed
and activated through the DBUG_EXECUTE_IF macro. The internal
DBUG_SYNC commands are then controlled from the test case through the
DEBUG variable.
There were three problems around the DEBUG + DEBUG_SYNC facility
usage:
1. When signaling the SQL thread to continue, the test would reset
immediately the DEBUG_SYNC variable. This could mean that the SQL
thread might loose the signal and continue to wait forever;
2. A similar scenario was happening with the dump thread on the
master. This thread was instructed to wait, and later it would be
signaled to continue, but immediately after the DEBUG_SYNC would be
reset. This could lead to the dump thread missing the signal and
wait forever;
3. The test was not cleaning itself up with respect to the
instrumentation of the dump thread. This would leave the
conditional execution of an internal DEBUG_SYNC command active
(through the usage of DBUG_EXECUTE_IF).
We fix#1 and #2 by waiting for the threads to receive the signal and
only then issue the reset. We fix#3 by reseting the DEBUG variable,
thus deactivating the dump thread internal DEBUG_SYNC command.
Bug#16415173 CRLF INSTEAD OF LF IN SQL-BENCH SCRIPTS
Correct perms and converts from Windows style to UNIX style line endings on some files.
Fix perms on installed ini files.
(MySQL 5.5 version)
DISCONNECT CON1 AND CON2
Problem:
The test suite/binlog/t/binlog_killed.test makes the connections
con1 and con2 but forgets to disconnect them + wait till that
operation is finished at test end.
This mistake has the potential to harm subsequent tests in
case these tests depend on the content of the processlist.
Solution:
Added disconnect + wait_until_disconnected.inc
within the test cleanup.
SLOW/CRASHES SEMAPHORE
Problem:
There are 2 lakh tables - fk_000001, fk_000002 ... fk_200000. All of them
are related to the same parent_table through a foreign key constraint.
When the parent_table is loaded into the dictionary cache, all the child table
will also be loaded. This is taking lot of time. Since this operation happens
when the dictionary latch is taken, the scenario leads to "long semaphore wait"
situation and the server gets killed.
Analysis:
A simple performance analysis showed that the slowness is because of the
dict_foreign_find() function. It does a linear search on two linked list
table->foreign_list and table->referenced_list, looking for a particular
foreign key object based on foreign->id as the key. This is called two
times for each foreign key object.
Solution:
Introduce a rb tree in table->foreign_rbt and table->referenced_rbt, which
are some sort of index on table->foreign_list and table->referenced_list
respectively, using foreign->id as the key. These rbt structures will be
solely used by dict_foreign_find().
rb#5599 approved by Vasil
SHOW PROCESSLIST, SHOW BINLOGS
Problem: A deadlock was occurring when 4 threads were
involved in acquiring locks in the following way
Thread 1: Dump thread ( Slave is reconnecting, so on
Master, a new dump thread is trying kill
zombie dump threads. It acquired thread's
LOCK_thd_data and it is about to acquire
mysys_var->current_mutex ( which LOCK_log)
Thread 2: Application thread is executing show binlogs and
acquired LOCK_log and it is about to acquire
LOCK_index.
Thread 3: Application thread is executing Purge binary logs
and acquired LOCK_index and it is about to
acquire LOCK_thread_count.
Thread 4: Application thread is executing show processlist
and acquired LOCK_thread_count and it is
about to acquire zombie dump thread's
LOCK_thd_data.
Deadlock Cycle:
Thread 1 -> Thread 2 -> Thread 3-> Thread 4 ->Thread 1
The same above deadlock was observed even when thread 4 is
executing 'SELECT * FROM information_schema.processlist' command and
acquired LOCK_thread_count and it is about to acquire zombie
dump thread's LOCK_thd_data.
Analysis:
There are four locks involved in the deadlock. LOCK_log,
LOCK_thread_count, LOCK_index and LOCK_thd_data.
LOCK_log, LOCK_thread_count, LOCK_index are global mutexes
where as LOCK_thd_data is local to a thread.
We can divide these four locks in two groups.
Group 1 consists of LOCK_log and LOCK_index and the order
should be LOCK_log followed by LOCK_index.
Group 2 consists of other two mutexes
LOCK_thread_count, LOCK_thd_data and the order should
be LOCK_thread_count followed by LOCK_thd_data.
Unfortunately, there is no specific predefined lock order defined
to follow in the MySQL system when it comes to locks across these
two groups. In the above problematic example,
there is no problem in the way we are acquiring the locks
if you see each thread individually.
But If you combine all 4 threads, they end up in a deadlock.
Fix:
Since everything seems to be fine in the way threads are taking locks,
In this patch We are changing the duration of the locks in Thread 4
to break the deadlock. i.e., before the patch, Thread 4
('show processlist' command) mysqld_list_processes()
function acquires LOCK_thread_count for the complete duration
of the function and it also acquires/releases
each thread's LOCK_thd_data.
LOCK_thread_count is used to protect addition and
deletion of threads in global threads list. While show
process list is looping through all the existing threads,
it will be a problem if a thread is exited but there is no problem
if a new thread is added to the system. Hence a new mutex is
introduced "LOCK_thd_remove" which will protect deletion
of a thread from global threads list. All threads which are
getting exited should acquire LOCK_thd_remove
followed by LOCK_thread_count. (It should take LOCK_thread_count
also because other places of the code still thinks that exit thread
is protected with LOCK_thread_count. In this fix, we are changing
only 'show process list' query logic )
(Eg: unlink_thd logic will be protected with
LOCK_thd_remove).
Logic of mysqld_list_processes(or file_schema_processlist)
will now be protected with 'LOCK_thd_remove' instead of
'LOCK_thread_count'.
Now the new locking order after this patch is:
LOCK_thd_remove -> LOCK_thd_data -> LOCK_log ->
LOCK_index -> LOCK_thread_count
Problem: Uninstallation of semi sync plugin causes replication to
break.
Analysis: A semisync enabled replication is mutual agreement between
Master and Slave when the connection (I/O thread) is established.
Once I/O thread is started and if semisync is enabled on both
master and slave, master appends special magic header to events
using semisync plugin functions and sends it to slave. And slave
expects that each event will have that special magic header format
and reads those bytes using semisync plugin functions.
When semi sync replication is in use if users execute
uninstallation of the plugin on master, slave gets confused while
interpreting that event's content because it expects special
magic header at the beginning of the event. Slave SQL thread will
be stopped with "Missing magic number in the header" error.
Similar problem will happen if uninstallation of the plugin happens
on slave when semi sync replication is in in use. Master sends
the events with magic header and slave does not know about the
added magic header and thinks that it received a corrupted event.
Hence slave SQL thread stops with "Found corrupted event" error.
Fix: Uninstallation of semisync plugin will be blocked when semisync
replication is in use and will throw 'ER_UNKNOWN_ERROR' error.
To detect that semisync replication is in use, this patch uses
semisync status variable values.
> On Master, it checks for 'Rpl_semi_sync_master_status' to be OFF
before allowing the uninstallation of rpl_semi_sync_master plugin.
>> Rpl_semi_sync_master_status is OFF when
>>> there is no dump thread running
>>> there are no semisync slaves
> On Slave, it checks for 'Rpl_semi_sync_slave_status' to be OFF
before allowing the uninstallation of rpl_semi_sync_slave plugin.
>> Rpl_semi_sync_slave_status is OFF when
>>> there is no I/O thread running
>>> replication is asynchronous replication.
BREAKS RBR
Analysis:
--------
A table created using a query of the format:
CREATE TABLE t1 AS SELECT REPEAT('A',1000) DIV 1 AS a;
breaks the Row Based Replication.
The query above creates a table having a field of datatype
'bigint' with a display width of 3000 which is beyond the
maximum acceptable value of 255.
In the RBR mode, CREATE TABLE SELECT statement is
replicated as a combination of CREATE TABLE statement
equivalent to one the returned by SHOW CREATE TABLE and
row events for rows inserted. When this CREATE TABLE event
is executed on the slave, an error is reported:
Display width out of range for column 'a' (max = 255)
The following is the output of 'SHOW CREATE TABLE t1':
CREATE TABLE t1(`a` bigint(3000) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem is due to the combination of two facts:
1) The above CREATE TABLE SELECT statement uses the display
width of the result of DIV operation as the display width
of the column created without validating the width for out
of bound condition.
2) The DIV operation incorrectly returns the length of its first
argument as the display width of its result; thus allowing
creation of a table with an incorrect display width of 3000
for the field.
Fix:
----
This fix changes the DIV operation implementation to correctly
evaluate the display width of its result. We check if DIV's
results estimated width crosses maximum width for integer
value (21) and if yes set it to this maximum value.
This patch also fixes fixes maximum display width evaluation
for DIV function when its first argument is in UCS2.
WRITTEN WHILE ROWS REMAINS
Problem:
========
When truncate table fails while using transactional based
engines even though the operation errors out we still
continue and log it to binlog. Because of this master has
data but the truncate will be written to binary log which
will cause inconsistency.
Analysis:
========
Truncate table can happen either through drop and create of
table or by deleting rows. In the second case the existing
code is written in such a way that even if an error occurs
the truncate statement will always be binlogged. Which is not
correct.
Binlogging of TRUNCATE TABLE statement should check whether
truncate is executed "transactionally or not". If the table
is transaction based we log the TRUNCATE TABLE only on
successful completion.
If table is non transactional there are possibilities that on
error we could have partial changes done hence in such cases
we do log in spite of errors as some of the lines might have
been removed, so the statement has to be sent to slave.
Fix:
===
Using table handler whether truncate table is being executed
in transaction based mode or not is identified and statement
is binlogged accordingly.