Note: Backporting the patch from mysql-5.6.
Problem:
A CREATE TABLE with an invalid table name is detected
at SQL layer. So the table name is reset to an empty
string. But the storage engine is called with this
empty table name. The table name is specified as
"database/table". So, in the given scenario we get
only "database/".
Solution:
Within InnoDB, detect this error and report it to
higher layer.
rb#9274 approved by jimmy.
recv_find_max_checkpoint(): Amend the error message to give advice
about downgrading. The 5.7.9 redo log format was intentionally changed
so that older MySQL versions will not find a valid redo log checkpoint.
PROBLEM
Whenever we insert in unique secondary index we take shared
locks on all possible duplicate record present in the table.
But while during a replace on the unique secondary index ,
we take exclusive and locks on the all duplicate record.
When the records are deleted, they are first delete marked
and later purged by the purge thread. While purging the
record we call the lock_update_delete() which in turn calls
lock_rec_inherit_to_gap() to inherit locks of the deleted
records. In repeatable read mode we inherit all the locks
from the record to the next record but in the read commited
mode we skip inherting them as gap type locks. We make a
exception here if the lock on the records is in shared mode
,we assume that it is set during insert for unique secondary
index and needs to be inherited to stop constraint violation.
We didnt handle the case when exclusive locks are set during
replace, we skip inheriting locks of these records and hence
causing constraint violation.
FIX
While inheriting the locks,check whether the transaction is
allowed to do TRX_DUP_REPLACE/TRX_DUP_IGNORE, if true
inherit the locks.
[ Revewied by Jimmy #rb9709]
The root cause is that x86 has a stronger memory model than the ARM
processors. And the GCC builtins didn't issue the correct fences when
setting/unsetting the lock word. In particular during the mutex release.
The solution is rewriting atomic TAS operations: replace '__sync_' by
'__atomic_' if possible.
Reviewed-by: Sunny Bains <sunny.bains@oracle.com>
Reviewed-by: Bin Su <bin.x.su@oracle.com>
Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
Reviewed-by: Krunal Bauskar <krunal.bauskar@oracle.com>
RB: 9782
RB: 9665
RB: 9783
INSERT INDEX RECORD
Problem:
=======
IBUF_BITMAP_FREE bit in ibuf bitmap array is used to indicate the free
space available in leaf page. IBUF_BITMAP_FREE bit indicates free
space more than actual existing free space for the leaf page.
Solution:
=========
Ibuf_bitmap_array is not updated for the secondary index leaf page when
insert operation is done by updating a delete marked existing
record in the index.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
RB: 9544
Post push fix. The function cmp_dtuple_rec() was used without a prototype
in the file row0purge.c. Adding the include file rem0cmp.h to row0purge.c
to resolve this issue.
approved by Krunal over IM.
Problem:
If we add a referential integrity constraint with a duplicate
name, an error occurs. The foreign key object would not have
been added to the dictionary cache. In the error path, there
is an attempt to remove this foreign key object. Since this
object is not there, the search returns a NULL result.
De-referencing the null object results in this crash.
Solution:
If the search to the foreign key object failed, then don't
attempt to access it.
rb#9309 approved by Marko.
Problem :
---------
This is a regression of Bug#19138298. In purge_node_t::validate_pcur
we are trying to get offsets for all columns of clustered index from
stored record in persistent cursor. This would fail when stored record
is not having all fields of the index. The stored record stores only
fields that are needed to uniquely identify the entry.
Solution :
----------
1. Use pcur.old_n_fields to get fields that are stored
2. Add comment to note dependency between stored fields in purge node
ref and stored cursor.
3. Return if the cursor record is not already stored as it is not safe
to access cursor record directly without latch.
Reviewed-by: Marko Makela <marko.makela@oracle.com>
RB: 9139
Problem :
---------
This is a regression of bug-19138298. During purge, if
btr_pcur_restore_position fails, we set found_clust to FALSE
so that it can find a possible clustered index record in future
calls for the same undo entry. This, however, overwrites the
old_rec_buf while initializing pcur again in next call.
The leak is reproducible in local environment and with the
test provided along with bug-19138298.
Solution :
----------
If btr_pcur_restore_position() fails close the cursor.
Reviewed-by: Marko Makela <Marko.Makela@oracle.com>
Reviewed-by: Annamalai Gurusami <Annamalai.Gurusami@oracle.com>
RB: 9074
As man page of open(2) suggested, we should open the same file in the same
mode, to have better performance. For some data files, we will first call
os_file_create_simple_no_error_handling_func() to open them, and then call
os_file_create_func() again. We have to make sure if DIRECT IO is specified,
these two functions should both open file with O_DIRECT.
Reviewed-by: Sunny Bains <sunny.bains@oracle.com>
RB: 8981
Scenario:
1. The purge thread takes an undo log record and parses it and forms
the record to be purged. We have the primary and secondary keys
to locate the actual records.
2. Using the secondary index key, we search in the secondary index.
One record is found.
3. Then it is checked if this record can be purged. The answer is we
can purge this record. To determine this we look up the clustered
index record. Either there is no corresponding clustered index
record, or the matching clustered index record is delete marked.
4. Then we check whether the secondary index record is delete marked.
We find that it is not delete marked. We report warning in optimized
build and assert in debug build.
Problem:
In step 3, we report that the record is purgeable even though it is
not delete marked. This is because of inconsistency between the
following members of purge_node_t structure - found_clust, ref and pcur.
Solution:
In the row_purge_reposition_pcur(), if the persistent cursor restore
fails, then reset the purge_node_t->found_clust member. This will
keep the members of purge_node_t structure in a consistent state.
rb#8813 approved by Marko.
if XA PREPARE transactions hold explicit locks.
innobase_shutdown_for_mysql(): Call trx_sys_close() before lock_sys_close()
(and dict_close()) so that trx_free_prepared() will see all locks intact.
RB: 8561
Reviewed-by: Vasil Dimov <vasil.dimov@oracle.com>
PROBLEM
Create time is calculated as last status change time of .frm file.
The first problem was that innodb was passing file name as
"table_name#po#p0.frm" to the stat() call which calculates the create time.
Since there is no frm file with this name create_time will be stored as NULL.
The second problem is ha_partition::info() updates stats for create time
when HA_STATUS_CONST flag was set ,where as innodb calculates this statistic
when HA_STATUS_TIME is set,which causes create_time to be set as NULL.
Fix
Pass proper .frm name to stat() call and calculate create time when
HA_STATUS_CONST flag is set.
TO USE A SECOND WATCH PAGE PER INSTANCE
Description:
BUF_POOL_WATCH_SIZE is also initialized to number of purge threads.
so BUF_POOL_WATCH_SIZE will never be lesser than number of purge threads.
From the code, there is no scope for purge thread to skip buf_pool_watch_unset.
So there can be at most one buffer pool watch active per purge thread.
In other words, there is no chance for purge thread instance to hold a watch
when setting another watch.
Solution:
Adding code comments to clarify the issue.
Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Approved via Bug page.
MYISAM TABLE CAUSES THE SERVER TO CRASH
Issue:
-----
During index maintanence, R-tree node might need a split.
In some cases the square of mbr could be calculated to
infinite (as in this case) or to NaN. This is currently
not handled. This is specific to MyISAM.
SOLUTION:
---------
If the calculated value in "mbr_join_square" is infinite or
NaN, set it to max double value.
Initialization of output parameters of "pick_seeds" is
required if calculation is infinite (or negative infinite).
Similar to the fix made for INNODB as part of Bug#19533996.
RESERVATION AND SIGNAL COUNT
Problem:
Reservation and Signal count value shows negative value for show engine
innodb statement.
Solution:
This is happening due to counter overflow error. Reservation and Signal
count values are defined as unsigned long but these variables are converted to
long while printing it. Change Reservation and Signal count values as unsigned
long datatype while printing it.
Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Approved in bug page.
MYISAM TABLE CAUSES THE SERVER TO CRASH
Issue:
-----
During index maintanence, R-tree node might need a split.
In some cases the square of mbr could be calculated to
infinite (as in this case) or to NaN. This is currently
not handled. This is specific to MyISAM.
SOLUTION:
---------
If the calculated value in "mbr_join_square" is infinite or
NaN, set it to max double value.
Initialization of output parameters of "pick_seeds" is
required if calculation is infinite (or negative infinite).
Similar to the fix made for INNODB as part of Bug#19533996.
Problem:
This is a coding mistake during error handling. When the specified foreign
key constraint is wrong because of data type mismatch, the resulting
foreign key object will not have valid foreign->id (it will be NULL.)
Solution:
While removing the foreign key object from dictionary cache during error
handling, ensure that foreign->id is not null before using it.
rb#8204 approved by Sunny.
ISSUE:
------
There can be up to MERGEBUFF2 number of sorted merge chunks,
We need enough buffer space for at least one record from
each merge chunks. If estimates are wrong(very low) and we
allocate buffer space for less than MERGEBUFF2, then we will
have issue in merge_buffers, if actual number of rows to be
sorted is bigger than estimate and external filesort is
chosen.
SOLUTION:
---------
Set number of rows to sort to be at least MERGEBUFF2.
CRASHES WITH AUTO_INCREMENT COLUMN
Description:- Creating a federated table with AUTO_INCREMENT
column using LIKE clause results in a server crash.
Analysis:- Creating a federated table with AUTO_INCREMENT
column using LIKE clause results in a federated server
crash due to the uninitialized connection structure(mysql).
Also due to unassigned connection string for the remote
server, at the time of preparation of "create_info"
structure, the creation of any federated table using LIKE
clause fails with an error, "ERROR 1 (HY000): server name:
'' doesn't exist!". This bug is not only with
AUTO_INCREMENT but in all creations of federated tables with
LIKE clause.
Fix :- In ha_federated::info(), "mysql->insert_id" assigned
to "stats.auto_increment_value" only when there is an
active connection. This fixes the crash issue. For creating
the federated table with LIKE clause, connection string is
assigned at the time of preparation of "create_info"
structure.
CRASHES ON EVERY START ATTEMPT
Description:
------------
push_warning_printf function is used to print the warning message
to the client. So this function should not invoke while recovering
the server. Moreover current_thd is NULL while starting the server.
Solution:
---------
- Avoiding the warning to be printed while recovery.
This patch already pushed in mysql-5.6.
COMMUNICATION PACKETS; FEDERATED TABLE
Description:- Execution of FLUSH TABLES on a federated
table which has been idle for wait_timeout (on the remote
server) + tcp_keepalive_time, fails with an error,
"ERROR 1160 (08S01): Got an error writing communication
packets."
Analysis:- During FLUSH TABLE execution the federated
table is closed which will inturn close the federated
connection. While closing the connection, federated server
tries to communincate with the remote server. Since the
connection was idle for wait_timeout(on the remote server)+
tcp_keepalive_time, the socket gets closed. So this
communication fails because of broken pipe and the error is
thrown. But federated connections are expected to reconnect
silently. And also it cannot reconnect because the
"auto_reconnect" variable is set to 0 in "mysql_close()".
Fix:- Before closing the federated connection, in
"ha_federated_close()", a check is added which will verify
wheather the connection is alive or not. If the connection
is not alive, then "mysql->net.error" is set to 2 which
will indicate that the connetion is broken. Also the
setting of "auto_reconnect" variable to 0 is delayed and is
done after "COM_QUIT" command.
NOTE:- For reproducing this issue, "tcp_keepalive_time" has
to be set to a smaller value. This value is set in the
"/proc/sys/net/ipv4/tcp_keepalive_time" file in Unix
systems. So we need root permission for changing it, which
can't be done through mtr test. So submitting the patch
without mtr test.
Description:
Using correct length when moving to next field in cmp_ref. The store
length already includes the length bytes of blobs, which is already considered
earlier for blob types.
Approved by Mattias, Jimmy [rb-7088]
The debug configuration parameter innodb_optimistic_insert_debug
which was introduced for testing corner cases in B-tree handling
had a bug in it. The value 1 would trigger an infinite sequence
of page splits.
Fix: When the value 1 is specified, disable this debug feature.
Approved by Yasufumi Kinoshita
Problem:
In the function dict_foreign_remove_from_cache(), the rb tree was updated
without actually verifying whether the given foreign key object is there in the
rb tree or not. There can be an existing foreign key object with the same id
in the rb tree, which must not be removed. Such a scenario comes when an
attempt is made to add a foreign key object with a duplicate identifier.
Solution:
When the foreign key object is removed from the dictionary cache, ensure
that the foreign key object removed from the rbt is the correct one.
rb#7168 approved by Jimmy and Marko.
dict_set_corrupted(): Use the canonical way of searching for
less-than-equal (PAGE_CUR_LE) and then checking low_match.
The code that was introduced in MySQL 5.5.17 in
Bug#11830883 SUPPORT "CORRUPTED" BIT FOR INNODB TABLES AND INDEXES
could position the cursor on the page supremum, and then attempt
to overwrite non-existing 7th field of the 1-field supremum record.
Approved by Jimmy Yang
Bug#17959689: MAKE GCC AND CLANG GIVE CONSISTENT COMPILATION WARNINGS
Bug#18313717: ENABLE -WERROR IN MAINTANER MODE WHEN COMPILING WITH CLANG
Bug#18510941: REMOVE CMAKE WORKAROUNDS FOR OLDER VERSIONS OF OS X/XCODE
Backport from mysql-5.6 to mysql-5.5
FROM A FUNCTION
Scenario:
In a stored procedure, CREATE TABLE statement is not allowed. But an
exception is provided for CREATE TEMPORARY TABLE. We can create a temporary
table in a stored procedure.
Let there be two stored functions f1 and f2 and two stored procedures p1 and
p2. Their properties are as follows:
. stored function f1() calls stored procedure p1().
. stored function f2() calls stored procedure p2().
. stored procedure p1() creates temporary table t1.
. stored procedure p2() does DML on t1.
Consider the following situation:
1. Autocommit mode is on.
2. select f1()
3. select f2()
Step 2: In this step, t1 would be created via p1(). A table level transaction
lock would have been taken. The ::external_lock() would not have been called
on this table. At the end of step 2, because of autocommit mode on, this table
level lock will be released.
Step 3: When we execute DML on table t1 via p2() we have two problems:
Problem 1:
The function ha_innobase::external_lock() would have been called but since
it is a select query no table level locks would have been taken. Hence the
following assert will fail:
ut_ad(lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
Solution:
The solution would be to identify this situation and take a table level lock
and use the proper lock type prebuilt->select_lock_type = LOCK_X for DML
operations.
Problem 2:
Another problem is that in step 3, ha_innobase::open() is never called on
the table t1.
Solution:
The solution would be to identify this situation and call re-init the handler
of table t1.
rb#6429 approved by Krunal.
Problem:
Creation of a table fails when innodb_strict_mode is enabled, but the same
table is created without any warning when innodb_strict_mode is enabled.
Solution:
If creation of a table fails with an error when innodb_strict_mode is
enabled, it must issue a warning when innodb_strict_mode is disabled.
rb#6723 approved by Krunal.
CHECK.
Analysis:
----------
Issue here is, while creating or altering the InnoDB table,
if the foreign key defined on the table references a parent
table on which the user has no access privileges then the
table is created without reporting any error.
Currently the privilege level REFERENCES_ACL is unused
and is not used for access evaluation while creating the
table with a foreign key constraint or adding the foreign
key constraint to a table. But when no privileges are granted
to user then also access evaluation on parent table is ignored.
Fix:
---------
For DMLs, irrelevant of the fact, support does not want any
changes to avoid permission checks on every operation.
So, as a fix, added a function "check_fk_parent_table_access"
to check whether any of the SELECT_ACL, INSERT_ACL, UDPATE_ACL,
DELETE_ACL or REFERENCE_ACL privileges are granted for user
at table level. If none of them is granted then error is reported.
This function is called during the table creation and alter
operation.
Problem:
We maintain two rb trees in each dict_table_t. The foreign_rbt must be in
sync with foreign_list. The referenced_rbt must be in sync with
referenced_list. There is one function which checks this consistency and it
failed, resulting in an assert failure.
The root cause of the problem was identified that the search order was
lost in the referenced_rbt. This is because while renaming the table,
we didn't not refresh this referenced_rbt.
Solution:
When a foreign key is renamed, we must delete and re-insert into both
foreign_rbt and referenced_rbt.
rb#6412 approved by Jimmy.
IN RECOVERY
During redo log processing, the data dictionary is not available. We should
check it in dict_find_table_by_space() to prevent SEGV error.
rb#5678, approved by Jimmy.
Problem:
When a unique secondary index is scanned for duplicate checking, gap locks
were not taken if the transaction had isolation level <= READ COMMITTED.
This change was done while fixing Bug #16133801 UNEXPLAINABLE INNODB UNIQUE
INDEX LOCKS ON DELETE + INSERT WITH SAME VALUES (rb#2035). Because of this
the duplicate check logic failed, and resulted in duplicate values in unique
secondary index.
Solution:
When a unique secondary index is scanned for duplicate checking, gap locks
must be taken irrespective of the transaction isolation level. This is
achieved by reverting rb#2035.
rb#5910 approved by Jimmy
WITH CERTAIN MAX_HEAP_TABLE_SIZE VALUES
Description:
When the system variable 'max_heap_table_size'
is set to 20GB, the server crashes on creation of a
temporary tables or tables using MEMORY storage engine.
Analysis:
The variable 'max_record' determines the amount heap
allocated for the records of the table. This value
is determined using the 'max_heap_table_size' variable.
'records_in_block' in turn uses the max_records to
determine the number of records per block.
When the 'max_heap_table_size' is set to 20GB, then
the 'records_in_block' is calculated to a value of
2^28.
The size of the block determined by multiplying the
'records_in_block' and 'recbuffer' results in overflow
and hence the value becomes zero. As a result, zero bytes
of the heap is allocated for the table. This will
result in a server crash when the table is accessed.
Fix:
The variables 'records_in_block' and 'recbuffer' are
typecasted to 'unsigned long' while calculating the
size of the block.
SLOW/CRASHES SEMAPHORE
Problem:
There are 2 lakh tables - fk_000001, fk_000002 ... fk_200000. All of them
are related to the same parent_table through a foreign key constraint.
When the parent_table is loaded into the dictionary cache, all the child table
will also be loaded. This is taking lot of time. Since this operation happens
when the dictionary latch is taken, the scenario leads to "long semaphore wait"
situation and the server gets killed.
Analysis:
A simple performance analysis showed that the slowness is because of the
dict_foreign_find() function. It does a linear search on two linked list
table->foreign_list and table->referenced_list, looking for a particular
foreign key object based on foreign->id as the key. This is called two
times for each foreign key object.
Solution:
Introduce a rb tree in table->foreign_rbt and table->referenced_rbt, which
are some sort of index on table->foreign_list and table->referenced_list
respectively, using foreign->id as the key. These rbt structures will be
solely used by dict_foreign_find().
rb#5599 approved by Vasil
Description: Using the temporary file vulnerability an
attacker can create a file with arbitrary content at a
location of his choice. This can be used to create the
file /var/lib/mysql/my.cnf, which will be read as a
configuration file by MySQL, because it is located in the
home directory of the mysql user. With this configuration
file, the attacker can specify his own plugin_dir variable,
which then allows him to load arbitrary code via
"INSTALL PLUGIN...".
Analysis: While creating the ".TMD" file we are not checking
if the file is already exits or not in mi_repair() function.
And we are truncating if the ".TMD" file exits and going ahead
This is creating the security breach.
Fix: We need to use O_EXCL flag along with O_RDWR and O_TRUNC
which will make sure if any user creates ".TMD" file, will
fails the repair table with "cannot create ".TMD" file error".
Actually we are initialing "param.tmpfile_createflag" member
with O_RDWR | O_TRUNC | O_EXCL in myisamchk_init(). And we
are modifying it in ha_myisam::repair() to O_RDWR | O_TRUNC.
So, we need to remove the line which is modifying the
"param.tmpfile_createflag".
archive table which is using an auto increment column, the
server hangs. In order to recover the mysqld process, it
has to be terminated abnormally using SIGKILL. The problem
is observed in mysql-5.5.
Bug #18065452 "PREPARING" STATE HOGS CPU WITH ARCHIVE
+ SUBQUERY
Analysis: This happens because the server is trapped inside
an infinite loop in the function,
"subselect_indexsubquery_engine::exec()". This function
resolves the correlated suquery by doing an index lookup
for the appropriate engine. In case of archive engine,
after reaching the end of records, "table->status" is not
set to STATUS_NOT_FOUND. As a result the loop is not
terminated.
Fix: The "table->status" is set to STATUS_NOT_FOUND when
the end of records is reached.
THE PERFORMANCE UNDER HEAVY INSERT
Problem:
There are three memset call to allocate memory for system fields
in each insert.
Solution:
Instead of calling it in 3 times, we can combine it into
one memset call. It will reduce the CPU usage under heavy insert.
Approved by Marko rb-4916
FAILING ASSERTION: FLEN == LEN
Problem:
Broken invariant triggered when building a unique index on a
binary column and the input data contains duplicate keys. This was broken
in debug builds only.
Fix:
Fixed length of the binary datatype can be greater than length of
the shorter prefix on which index is being created.