SHOW PROCESSLIST, SHOW BINLOGS
Problem: A deadlock was occurring when 4 threads were
involved in acquiring locks in the following way
Thread 1: Dump thread ( Slave is reconnecting, so on
Master, a new dump thread is trying kill
zombie dump threads. It acquired thread's
LOCK_thd_data and it is about to acquire
mysys_var->current_mutex ( which LOCK_log)
Thread 2: Application thread is executing show binlogs and
acquired LOCK_log and it is about to acquire
LOCK_index.
Thread 3: Application thread is executing Purge binary logs
and acquired LOCK_index and it is about to
acquire LOCK_thread_count.
Thread 4: Application thread is executing show processlist
and acquired LOCK_thread_count and it is
about to acquire zombie dump thread's
LOCK_thd_data.
Deadlock Cycle:
Thread 1 -> Thread 2 -> Thread 3-> Thread 4 ->Thread 1
The same above deadlock was observed even when thread 4 is
executing 'SELECT * FROM information_schema.processlist' command and
acquired LOCK_thread_count and it is about to acquire zombie
dump thread's LOCK_thd_data.
Analysis:
There are four locks involved in the deadlock. LOCK_log,
LOCK_thread_count, LOCK_index and LOCK_thd_data.
LOCK_log, LOCK_thread_count, LOCK_index are global mutexes
where as LOCK_thd_data is local to a thread.
We can divide these four locks in two groups.
Group 1 consists of LOCK_log and LOCK_index and the order
should be LOCK_log followed by LOCK_index.
Group 2 consists of other two mutexes
LOCK_thread_count, LOCK_thd_data and the order should
be LOCK_thread_count followed by LOCK_thd_data.
Unfortunately, there is no specific predefined lock order defined
to follow in the MySQL system when it comes to locks across these
two groups. In the above problematic example,
there is no problem in the way we are acquiring the locks
if you see each thread individually.
But If you combine all 4 threads, they end up in a deadlock.
Fix:
Since everything seems to be fine in the way threads are taking locks,
In this patch We are changing the duration of the locks in Thread 4
to break the deadlock. i.e., before the patch, Thread 4
('show processlist' command) mysqld_list_processes()
function acquires LOCK_thread_count for the complete duration
of the function and it also acquires/releases
each thread's LOCK_thd_data.
LOCK_thread_count is used to protect addition and
deletion of threads in global threads list. While show
process list is looping through all the existing threads,
it will be a problem if a thread is exited but there is no problem
if a new thread is added to the system. Hence a new mutex is
introduced "LOCK_thd_remove" which will protect deletion
of a thread from global threads list. All threads which are
getting exited should acquire LOCK_thd_remove
followed by LOCK_thread_count. (It should take LOCK_thread_count
also because other places of the code still thinks that exit thread
is protected with LOCK_thread_count. In this fix, we are changing
only 'show process list' query logic )
(Eg: unlink_thd logic will be protected with
LOCK_thd_remove).
Logic of mysqld_list_processes(or file_schema_processlist)
will now be protected with 'LOCK_thd_remove' instead of
'LOCK_thread_count'.
Now the new locking order after this patch is:
LOCK_thd_remove -> LOCK_thd_data -> LOCK_log ->
LOCK_index -> LOCK_thread_count
ISSUE:
------
For UNION of selects, rows examined by the query will be sum
of rows examined by individual select operations and rows
examined for union operation. The value of session level
global counter that is used to count the rows examined by a
select statement should be accumulated and reset before it
is used for next select statement. But we have missed to
reset the same. Because of this examined row count of a
select query is accounted more than once.
SOLUTION:
---------
In union reset the session level global counter used to
accumulate count of examined rows after its value is saved.
mysql-test/r/union.result:
Expected output of testcase added.
mysql-test/t/union.test:
Test to verify examined row count of Union operations.
sql/sql_union.cc:
Reset the value of thd->examined_row_count after
accumulating the value.
ISSUE:
------
For UNION of selects, rows examined by the query will be sum
of rows examined by individual select operations and rows
examined for union operation. The value of session level
global counter that is used to count the rows examined by a
select statement should be accumulated and reset before it
is used for next select statement. But we have missed to
reset the same. Because of this examined row count of a
select query is accounted more than once.
SOLUTION:
---------
In union reset the session level global counter used to
accumulate count of examined rows after its value is saved.
Problem:
If there is a predicate on a column referenced by MIN/MAX and
that predicate is not present in all the disjunctions on
keyparts earlier in the compound index, Loose Index Scan will
not return correct result.
Analysis:
When loose index scan is chosen, range optimizer currently
groups all the predicates that contain group parts separately
and minmax parts separately. It therefore applies all the
conditions on the group parts first to the fetched row.
Then in the call to next_max, it processes the conditions
which have min/max keypart.
For ex in the following query:
Select f1, max(f2) from t1 where (f1 = 10 and f2 = 13) or
(f1 = 3) group by f1;
Condition (f2 = 13) would be applied even for rows that
satisfy (f1 = 3) thereby giving wrong results.
Solution:
Do not choose loose_index_scan for such cases. So a new rule
WA2 is introduced to take care of the same.
WA2: "If there are predicates on C, these predicates must
be in conjuction to all predicates on all earlier keyparts
in I."
Todo the same, fix reuses the function get_constant_key_infix().
Since this funciton will fail for all multi-range conditions, it
is re-written to recognize that if the sub-conditions are
equivalent across the disjuncts: it will now succeed.
And to achieve this a new helper function is introduced called
all_same().
The fix also moves the test of NGA3 up to the former only
caller, get_constant_key_infix().
mysql-test/r/group_min_max_innodb.result:
Added test result change for Bug#17909656
mysql-test/t/group_min_max_innodb.test:
Added test cases for Bug#17909656
sql/opt_range.cc:
Introduced Rule WA2 because of Bug#17909656
Problem:
If there is a predicate on a column referenced by MIN/MAX and
that predicate is not present in all the disjunctions on
keyparts earlier in the compound index, Loose Index Scan will
not return correct result.
Analysis:
When loose index scan is chosen, range optimizer currently
groups all the predicates that contain group parts separately
and minmax parts separately. It therefore applies all the
conditions on the group parts first to the fetched row.
Then in the call to next_max, it processes the conditions
which have min/max keypart.
For ex in the following query:
Select f1, max(f2) from t1 where (f1 = 10 and f2 = 13) or
(f1 = 3) group by f1;
Condition (f2 = 13) would be applied even for rows that
satisfy (f1 = 3) thereby giving wrong results.
Solution:
Do not choose loose_index_scan for such cases. So a new rule
WA2 is introduced to take care of the same.
WA2: "If there are predicates on C, these predicates must
be in conjuction to all predicates on all earlier keyparts
in I."
Todo the same, fix reuses the function get_constant_key_infix().
Since this funciton will fail for all multi-range conditions, it
is re-written to recognize that if the sub-conditions are
equivalent across the disjuncts: it will now succeed.
And to achieve this a new helper function is introduced called
all_same().
The fix also moves the test of NGA3 up to the former only
caller, get_constant_key_infix().
Typo leading to not including the last list values (partition).
Also improved pruning to skip last partition if not used.
rb#4762 approved by Aditya and Marko.
Typo leading to not including the last list values (partition).
Also improved pruning to skip last partition if not used.
rb#4762 approved by Aditya and Marko.
! pthread_equal(pth read_self(), (&(&LOCK_open)->m_mutex)->thread)'
fails in intern_sys_var_ptr on server shutdown after uninstalling
TokuDB plugin at runtime
This assertion was introduced by patch for MDEV-5089 to ensure proper lock order
among LOCK_open and LOCK_global_system_variables: LOCK_open must not be held
while acquiring LOCK_global_system_variables.
intern_sys_var_ptr() may be called while freeing storage engine variables with
PLUGIN_VAR_MEMALLOC flag (when destroying table share after storage engine was
uninstalled). In this case LOCK_open is held, which is harmless because we need
global value pointer and thus won't acquire LOCK_global_system_variables.
Relaxed assertion so it is valid only for session variables.
Problem: Uninstallation of semi sync plugin causes replication to
break.
Analysis: A semisync enabled replication is mutual agreement between
Master and Slave when the connection (I/O thread) is established.
Once I/O thread is started and if semisync is enabled on both
master and slave, master appends special magic header to events
using semisync plugin functions and sends it to slave. And slave
expects that each event will have that special magic header format
and reads those bytes using semisync plugin functions.
When semi sync replication is in use if users execute
uninstallation of the plugin on master, slave gets confused while
interpreting that event's content because it expects special
magic header at the beginning of the event. Slave SQL thread will
be stopped with "Missing magic number in the header" error.
Similar problem will happen if uninstallation of the plugin happens
on slave when semi sync replication is in in use. Master sends
the events with magic header and slave does not know about the
added magic header and thinks that it received a corrupted event.
Hence slave SQL thread stops with "Found corrupted event" error.
Fix: Uninstallation of semisync plugin will be blocked when semisync
replication is in use and will throw 'ER_UNKNOWN_ERROR' error.
To detect that semisync replication is in use, this patch uses
semisync status variable values.
> On Master, it checks for 'Rpl_semi_sync_master_status' to be OFF
before allowing the uninstallation of rpl_semi_sync_master plugin.
>> Rpl_semi_sync_master_status is OFF when
>>> there is no dump thread running
>>> there are no semisync slaves
> On Slave, it checks for 'Rpl_semi_sync_slave_status' to be OFF
before allowing the uninstallation of rpl_semi_sync_slave plugin.
>> Rpl_semi_sync_slave_status is OFF when
>>> there is no I/O thread running
>>> replication is asynchronous replication.
Problem: Uninstallation of semi sync plugin causes replication to
break.
Analysis: A semisync enabled replication is mutual agreement between
Master and Slave when the connection (I/O thread) is established.
Once I/O thread is started and if semisync is enabled on both
master and slave, master appends special magic header to events
using semisync plugin functions and sends it to slave. And slave
expects that each event will have that special magic header format
and reads those bytes using semisync plugin functions.
When semi sync replication is in use if users execute
uninstallation of the plugin on master, slave gets confused while
interpreting that event's content because it expects special
magic header at the beginning of the event. Slave SQL thread will
be stopped with "Missing magic number in the header" error.
Similar problem will happen if uninstallation of the plugin happens
on slave when semi sync replication is in in use. Master sends
the events with magic header and slave does not know about the
added magic header and thinks that it received a corrupted event.
Hence slave SQL thread stops with "Found corrupted event" error.
Fix: Uninstallation of semisync plugin will be blocked when semisync
replication is in use and will throw 'ER_UNKNOWN_ERROR' error.
To detect that semisync replication is in use, this patch uses
semisync status variable values.
> On Master, it checks for 'Rpl_semi_sync_master_status' to be OFF
before allowing the uninstallation of rpl_semi_sync_master plugin.
>> Rpl_semi_sync_master_status is OFF when
>>> there is no dump thread running
>>> there are no semisync slaves
> On Slave, it checks for 'Rpl_semi_sync_slave_status' to be OFF
before allowing the uninstallation of rpl_semi_sync_slave plugin.
>> Rpl_semi_sync_slave_status is OFF when
>>> there is no I/O thread running
>>> replication is asynchronous replication.
It is triple bug with one test suite:
1. Incorrect outer table detection
2. Incorrect leaf table processing for multi-update (should be full like for usual updates and inserts)
3. ON condition fix_fields() fould be called for all tables of the query.
WHERE ONE OF SELECT* IS DISTINCT FAILS.
ISSUE:
------
There are 2 issues related to explain union.
1. If we have subquery with union of selects. And, one of
the select need temp table to materialize its results
then it will replace its query structure with a simple
select from temporary table. Trying to display new
internal temporary table scan resulted in crash. But to
display the query plan, we should save the original
query structure.
2. Multiple execution of prepared explain statement which
have union of subqueries resulted in crash. If we have
constant subqueries, fake select used in union operation
will be evaluated once before using it for explain.
During first execution we have set fake select options to
SELECT_DESCRIBE, but did not reset after the explain.
Hence during next execution of prepared statement during
first time evaluation of fake select we had our select
options as SELECT_DESCRIBE this resulted in improperly
initialized data structures and crash.
SOLUTION:
---------
1. If called by explain now we save the original query
structure. And this will be used for displaying.
2. Reset the fake select options after it is called for
explain of union.
sql/sql_select.cc:
Reset the fake select options after it is called for explain of union
sql/sql_union.cc:
If called by explain but not from select_describe and we
need a temp table, then we create a temp join to preserve
original query structure.
WHERE ONE OF SELECT* IS DISTINCT FAILS.
ISSUE:
------
There are 2 issues related to explain union.
1. If we have subquery with union of selects. And, one of
the select need temp table to materialize its results
then it will replace its query structure with a simple
select from temporary table. Trying to display new
internal temporary table scan resulted in crash. But to
display the query plan, we should save the original
query structure.
2. Multiple execution of prepared explain statement which
have union of subqueries resulted in crash. If we have
constant subqueries, fake select used in union operation
will be evaluated once before using it for explain.
During first execution we have set fake select options to
SELECT_DESCRIBE, but did not reset after the explain.
Hence during next execution of prepared statement during
first time evaluation of fake select we had our select
options as SELECT_DESCRIBE this resulted in improperly
initialized data structures and crash.
SOLUTION:
---------
1. If called by explain now we save the original query
structure. And this will be used for displaying.
2. Reset the fake select options after it is called for
explain of union.
BREAKS RBR
Analysis:
--------
A table created using a query of the format:
CREATE TABLE t1 AS SELECT REPEAT('A',1000) DIV 1 AS a;
breaks the Row Based Replication.
The query above creates a table having a field of datatype
'bigint' with a display width of 3000 which is beyond the
maximum acceptable value of 255.
In the RBR mode, CREATE TABLE SELECT statement is
replicated as a combination of CREATE TABLE statement
equivalent to one the returned by SHOW CREATE TABLE and
row events for rows inserted. When this CREATE TABLE event
is executed on the slave, an error is reported:
Display width out of range for column 'a' (max = 255)
The following is the output of 'SHOW CREATE TABLE t1':
CREATE TABLE t1(`a` bigint(3000) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem is due to the combination of two facts:
1) The above CREATE TABLE SELECT statement uses the display
width of the result of DIV operation as the display width
of the column created without validating the width for out
of bound condition.
2) The DIV operation incorrectly returns the length of its first
argument as the display width of its result; thus allowing
creation of a table with an incorrect display width of 3000
for the field.
Fix:
----
This fix changes the DIV operation implementation to correctly
evaluate the display width of its result. We check if DIV's
results estimated width crosses maximum width for integer
value (21) and if yes set it to this maximum value.
This patch also fixes fixes maximum display width evaluation
for DIV function when its first argument is in UCS2.
BREAKS RBR
Analysis:
--------
A table created using a query of the format:
CREATE TABLE t1 AS SELECT REPEAT('A',1000) DIV 1 AS a;
breaks the Row Based Replication.
The query above creates a table having a field of datatype
'bigint' with a display width of 3000 which is beyond the
maximum acceptable value of 255.
In the RBR mode, CREATE TABLE SELECT statement is
replicated as a combination of CREATE TABLE statement
equivalent to one the returned by SHOW CREATE TABLE and
row events for rows inserted. When this CREATE TABLE event
is executed on the slave, an error is reported:
Display width out of range for column 'a' (max = 255)
The following is the output of 'SHOW CREATE TABLE t1':
CREATE TABLE t1(`a` bigint(3000) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem is due to the combination of two facts:
1) The above CREATE TABLE SELECT statement uses the display
width of the result of DIV operation as the display width
of the column created without validating the width for out
of bound condition.
2) The DIV operation incorrectly returns the length of its first
argument as the display width of its result; thus allowing
creation of a table with an incorrect display width of 3000
for the field.
Fix:
----
This fix changes the DIV operation implementation to correctly
evaluate the display width of its result. We check if DIV's
results estimated width crosses maximum width for integer
value (21) and if yes set it to this maximum value.
This patch also fixes fixes maximum display width evaluation
for DIV function when its first argument is in UCS2.
Bug#18396916 MAIN.OUTFILE_LOADDATA TEST FAILS ON ARM, AARCH64, PPC/PPC64
The recorded results for the failing tests were wrong.
They were introduced by the patch for
Bug#30946 mysqldump silently ignores --default-character-set when used with --tab
Correct results were returned for platforms where 'char' is implemented as unsigned.
This was reported as
Bug#46895 Test "outfile_loaddata" fails (reproducible)
Bug#11755168 46895: TEST "OUTFILE_LOADDATA" FAILS (REPRODUCIBLE)
The patch for that bug fixed only parts of the problem,
leaving the incorrect results in the .result file.
Solution: use 'uchar' for field_terminator and line_terminator on all platforms.
Also: remove some un-necessary casts, leaving the ones we actually need.
Bug#18396916 MAIN.OUTFILE_LOADDATA TEST FAILS ON ARM, AARCH64, PPC/PPC64
The recorded results for the failing tests were wrong.
They were introduced by the patch for
Bug#30946 mysqldump silently ignores --default-character-set when used with --tab
Correct results were returned for platforms where 'char' is implemented as unsigned.
This was reported as
Bug#46895 Test "outfile_loaddata" fails (reproducible)
Bug#11755168 46895: TEST "OUTFILE_LOADDATA" FAILS (REPRODUCIBLE)
The patch for that bug fixed only parts of the problem,
leaving the incorrect results in the .result file.
Solution: use 'uchar' for field_terminator and line_terminator on all platforms.
Also: remove some un-necessary casts, leaving the ones we actually need.
Both bugs are caused by the same problem: the function optimize_cond() should
update the value of *cond_equal rather than the value of join->cond_equal,
because it is called not only for the WHERE condition, but for the HAVING
condition as well.
WRITTEN WHILE ROWS REMAINS
Problem:
========
When truncate table fails while using transactional based
engines even though the operation errors out we still
continue and log it to binlog. Because of this master has
data but the truncate will be written to binary log which
will cause inconsistency.
Analysis:
========
Truncate table can happen either through drop and create of
table or by deleting rows. In the second case the existing
code is written in such a way that even if an error occurs
the truncate statement will always be binlogged. Which is not
correct.
Binlogging of TRUNCATE TABLE statement should check whether
truncate is executed "transactionally or not". If the table
is transaction based we log the TRUNCATE TABLE only on
successful completion.
If table is non transactional there are possibilities that on
error we could have partial changes done hence in such cases
we do log in spite of errors as some of the lines might have
been removed, so the statement has to be sent to slave.
Fix:
===
Using table handler whether truncate table is being executed
in transaction based mode or not is identified and statement
is binlogged accordingly.
mysql-test/suite/binlog/r/binlog_truncate_kill.result:
Added test case to test the fix for Bug#17942050.
mysql-test/suite/binlog/t/binlog_truncate_kill.test:
Added test case to test the fix for Bug#17942050.
sql/sql_truncate.cc:
Check if truncation is successful or not and retun appropriate
return values so that binlogging can be done based on that.
sql/sql_truncate.h:
Added a new enum.
WRITTEN WHILE ROWS REMAINS
Problem:
========
When truncate table fails while using transactional based
engines even though the operation errors out we still
continue and log it to binlog. Because of this master has
data but the truncate will be written to binary log which
will cause inconsistency.
Analysis:
========
Truncate table can happen either through drop and create of
table or by deleting rows. In the second case the existing
code is written in such a way that even if an error occurs
the truncate statement will always be binlogged. Which is not
correct.
Binlogging of TRUNCATE TABLE statement should check whether
truncate is executed "transactionally or not". If the table
is transaction based we log the TRUNCATE TABLE only on
successful completion.
If table is non transactional there are possibilities that on
error we could have partial changes done hence in such cases
we do log in spite of errors as some of the lines might have
been removed, so the statement has to be sent to slave.
Fix:
===
Using table handler whether truncate table is being executed
in transaction based mode or not is identified and statement
is binlogged accordingly.
Revert the old patch revid:monty@askmonty.org-20100325133339-7mkel6valai0b4lb
This patch caused the InnoDB part of the transaction to not be marked
read-write in some cases, which messes up XA commit (and likely other
stuff as well).
The problem was in the validation of the input data for blob types.
When assigned binary data, the character blob types were only checking if
the length of these data is a multiple of the minimum char length for the
destination charset.
And since e.g. UTF-8's minimum character length is 1 (becuase it's
variable length) even byte sequences that are invalid utf-8 strings (e.g.
wrong leading byte etc) were copied verbatim into utf-8 columns when
coming from binary strings or fields.
Storing invalid data into string columns was having all kinds of ill effects
on code that assumed that the encoding data are valid to begin with.
Fixed by additionally checking the incoming binary string for validity when
assigning it to a non-binary string column.
Made sure the conversions to charsets with no known "invalid" ranges
are not covered by the extra check.
Removed trailing spaces.
Test case added.
The problem was in the validation of the input data for blob types.
When assigned binary data, the character blob types were only checking if
the length of these data is a multiple of the minimum char length for the
destination charset.
And since e.g. UTF-8's minimum character length is 1 (becuase it's
variable length) even byte sequences that are invalid utf-8 strings (e.g.
wrong leading byte etc) were copied verbatim into utf-8 columns when
coming from binary strings or fields.
Storing invalid data into string columns was having all kinds of ill effects
on code that assumed that the encoding data are valid to begin with.
Fixed by additionally checking the incoming binary string for validity when
assigning it to a non-binary string column.
Made sure the conversions to charsets with no known "invalid" ranges
are not covered by the extra check.
Removed trailing spaces.
Test case added.
LOCAL AND IMPORT ERRORS
Description:
-----------
This bug happens due to the fact that current algorithm is designed
that in the case of LOCAL load of data, in case of the error, the
remaining part of the file is read in order to return the proper
error message to the client side.
But, the problem with current implementation is that data stream
for the client side is cleared only in the case where line delimiters
exist, which is not a case with, for example fixed width
fields.
Fix:
----
Ported patch provided by Sinisa Milivojevic n bug report for this
issue to 5.5+ versions.
As part of this patch code is changed to clear the data stream
by calling new member function "READ_INFO::skip_data_till_eof".
LOCAL AND IMPORT ERRORS
Description:
-----------
This bug happens due to the fact that current algorithm is designed
that in the case of LOCAL load of data, in case of the error, the
remaining part of the file is read in order to return the proper
error message to the client side.
But, the problem with current implementation is that data stream
for the client side is cleared only in the case where line delimiters
exist, which is not a case with, for example fixed width
fields.
Fix:
----
Ported patch provided by Sinisa Milivojevic n bug report for this
issue to 5.5+ versions.
As part of this patch code is changed to clear the data stream
by calling new member function "READ_INFO::skip_data_till_eof".
Before this fix, specially crafted queries
using the INFORMATION_SCHEMA could crash the server.
The root cause was a buffer overflow,
see the (private) bug comments for details.
With this fix, the buffer overflow condition is properly handled,
and the queries involved do return the expected result.
Before this fix, specially crafted queries
using the INFORMATION_SCHEMA could crash the server.
The root cause was a buffer overflow,
see the (private) bug comments for details.
With this fix, the buffer overflow condition is properly handled,
and the queries involved do return the expected result.
- With big_tables=ON, materialized table will use Aria (or MyISAM) SE, which
allows prefix key reads. However, the temp.table has rec_per_key=NULL which
causes the optimizer to crash when attempting to read index statistics for a
prefix index read.
- Fixed by providing a rec_per_key array with zeros (i.e. "no statistics data")