IN RECOVERY
During redo log processing, the data dictionary is not available.
dict_find_table_by_space() must check for this to prevent a SEGV.
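A minimal sketch of the guard, assuming it sits at the top of
dict_find_table_by_space() and that returning NULL is the not-found
convention there; recv_recovery_is_on() is InnoDB's own predicate for
redo processing:

    if (recv_recovery_is_on()) {
        /* The data dictionary is not yet usable during redo log
           processing; bail out instead of dereferencing it. */
        return(NULL);
    }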
rb#5678, approved by Jimmy.
Problem:
When a unique secondary index is scanned for duplicate checking, gap locks
were not taken if the transaction had isolation level <= READ COMMITTED.
This change was done while fixing Bug #16133801 UNEXPLAINABLE INNODB UNIQUE
INDEX LOCKS ON DELETE + INSERT WITH SAME VALUES (rb#2035). Because of this
the duplicate check logic failed, resulting in duplicate values in the
unique secondary index.
Solution:
When a unique secondary index is scanned for duplicate checking, gap locks
must be taken irrespective of the transaction isolation level. This is
achieved by reverting rb#2035.
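A sketch of the revert in the duplicate-scan locking decision
(LOCK_ORDINARY, LOCK_REC_NOT_GAP and TRX_ISO_READ_COMMITTED are
InnoDB's own flags; the surrounding scan code is elided):

    /* Before (rb#2035): no gap lock at READ COMMITTED or below.
       lock_type = trx->isolation_level <= TRX_ISO_READ_COMMITTED
                   ? LOCK_REC_NOT_GAP : LOCK_ORDINARY;
       After the revert: always take next-key (gap) locks, so a
       concurrent insert cannot slip into the scanned gap. */
    ulint lock_type = LOCK_ORDINARY;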
rb#5910 approved by Jimmy
SLOW/CRASHES SEMAPHORE
Problem:
There are 200,000 tables - fk_000001, fk_000002 ... fk_200000. All of
them are related to the same parent_table through a foreign key
constraint. When the parent_table is loaded into the dictionary cache,
all the child tables are loaded as well. This takes a long time. Since
this operation happens while the dictionary latch is held, the scenario
leads to a "long semaphore wait" situation and the server gets killed.
Analysis:
A simple performance analysis showed that the slowness is caused by the
dict_foreign_find() function. It does a linear search on two linked
lists, table->foreign_list and table->referenced_list, looking for a
particular foreign key object with foreign->id as the key. It is called
twice for each foreign key object.
Solution:
Introduce red-black trees table->foreign_rbt and table->referenced_rbt,
which act as indexes on table->foreign_list and table->referenced_list
respectively, using foreign->id as the key. These rbt structures are
used solely by dict_foreign_find().
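A sketch of the lookup path, assuming InnoDB's ib_rbt_t API from
ut0rbt.h (rbt_search() and the ib_rbt_bound_t cursor are real; the
exact payload layout of the tree nodes is assumed here):

    ib_rbt_bound_t  parent;

    /* O(log n) lookup by foreign->id instead of walking
       table->foreign_list linearly. */
    if (rbt_search(table->foreign_rbt, &parent, foreign->id) == 0) {
        /* found: the node value is assumed to store the
           dict_foreign_t pointer */
        return(*(dict_foreign_t**) parent.last->value);
    }
    return(NULL);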
rb#5599 approved by Vasil
Analysis: The error message cannot simply be disabled, because the
database could then be started with an incorrect log file size.
Fix: Improve the error message instead, to give users more
information.
THE PERFORMANCE UNDER HEAVY INSERT
Problem:
Three separate memset calls are made to initialize the memory for the
system fields on each insert.
Solution:
Instead of calling it three times, combine the calls into a single
memset. This reduces CPU usage under heavy insert load.
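A self-contained sketch of the idea (the field lengths mirror InnoDB's
DATA_ROW_ID_LEN, DATA_TRX_ID_LEN and DATA_ROLL_PTR_LEN constants; a
contiguous buffer for the three fields is assumed, as in the patch):

    #include <string.h>

    enum { ROW_ID_LEN = 6, TRX_ID_LEN = 6, ROLL_PTR_LEN = 7 };

    static void init_sys_fields(unsigned char *buf)
    {
        /* before: three separate calls
           memset(buf, 0, ROW_ID_LEN);
           memset(buf + ROW_ID_LEN, 0, TRX_ID_LEN);
           memset(buf + ROW_ID_LEN + TRX_ID_LEN, 0, ROLL_PTR_LEN); */
        memset(buf, 0, ROW_ID_LEN + TRX_ID_LEN + ROLL_PTR_LEN);
    }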
Approved by Marko rb-4916
This is not yet a fix. It is a change to print additional information
at the point where this assertion is about to fail: print as much
information as possible about the pages and the index, to find out why
the next page is not in compact format.
FAILING ASSERTION: FLEN == LEN
Problem:
A broken invariant was triggered when building a unique index on a
binary column where the input data contains duplicate keys. This
affected debug builds only.
Fix:
The fixed length of the binary datatype can be greater than the length
of the shorter prefix on which the index is being created, so the
invariant is adjusted accordingly.
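A minimal sketch of the adjusted invariant, assuming the check
compares the full column length (flen) with the indexed prefix length
(len), as the assertion text suggests:

    /* was: ut_ad(flen == len);
       a prefix index stores at most len bytes of a column whose
       fixed length flen may be longer */
    ut_ad(flen >= len);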
file storage.
Analysis: posix_fallocate was called with offset 0 and the desired size
as len. This is not optimal for SSDs.
Fix: Call posix_fallocate with the correct offset, i.e. the current
file size, and extend the file by len bytes from there.
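A self-contained sketch of the corrected call (posix_fallocate() and
lseek() as in POSIX):

    #include <fcntl.h>
    #include <unistd.h>

    /* Extend an open file by len bytes from its current end. */
    static int extend_file(int fd, off_t len)
    {
        off_t cur_size = lseek(fd, 0, SEEK_END);  /* current size */
        if (cur_size < 0) {
            return -1;
        }
        /* allocate only [cur_size, cur_size + len), instead of
           starting at offset 0 over the whole file */
        return posix_fallocate(fd, cur_size, len);
    }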
on debug build when innodb_use_fallocate=1
Analysis: A merge error caused an additional call to
fil_node_complete_io().
Fixed by removing the incorrect call and correcting the calculation of
the extended file size.
Problem:
In the clustered index, when an update operation is done the overall
scenario (after rb#4479) is as follows:
1. Delete mark the old record that is to be updated.
2. The old record disowns the blobs.
3. Insert the new record into clustered index.
4. For non-updated blobs, the new record must own them (verified by an
assert).
5. For non-updated blobs, they are marked as inherited in the new record.
Scenario involving DB_LOCK_WAIT:
If step 3 times out, then steps 1 and 2 are skipped on retry and
execution continues from step 3. This skipping is achieved by the
UPD_NODE_INSERT_BLOB state. In this case, step 4 does not hold: steps
1 and 2 already ran in the first attempt and the old record disowned
the blobs, so the new record need not own them. Hence the assert
failure.
Solution:
The assert in step 4 is removed. Instead, code is added to ensure that
the new record owns the blobs.
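A sketch of the replacement logic with hypothetical helper names
(blob_is_owned()/blob_set_owned() stand in for the real extern-field
ownership accessors):

    /* was: ut_a(blob_is_owned(rec, i)); -- fails after DB_LOCK_WAIT */
    if (!blob_is_owned(rec, i)) {
        /* steps 1-2 already disowned the blob in the old record,
           so take ownership in the new record explicitly */
        blob_set_owned(rec, i);
    }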
Note:
This is a regression caused by rb#4479.
rb#4571 approved by Marko
AUTO_INCREMENT_INCREMENT
Problem:
=======
When the auto_increment_increment system variable is decreased, the
immediate next value of the auto-increment column is not affected.
Solution:
========
Get the previously inserted value of the auto-increment column by
subtracting the previous auto_increment_increment from the next
autoinc value. Then calculate the current autoinc value using the
newly changed auto_increment_increment variable.
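A self-contained sketch of the recomputation (offset plays the role of
auto_increment_offset; prev >= offset is assumed):

    typedef unsigned long long ulonglong;

    /* Recover the last inserted value from the pending next value,
       then realign the next value with the new increment. */
    static ulonglong next_autoinc(ulonglong next,
                                  ulonglong prev_increment,
                                  ulonglong new_increment,
                                  ulonglong offset)
    {
        ulonglong prev = next - prev_increment;  /* last handed out */
        return offset
            + ((prev - offset) / new_increment + 1) * new_increment;
    }

For example, with prev_increment = 10, new_increment = 5 and
offset = 1, a pending next value of 21 recomputes to 16.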
Approved by Sunny [rb#4394]
Problem:
The function row_upd_changes_ord_field_binary() is used to decide whether to
use row_upd_clust_rec_by_insert() or row_upd_clust_rec(). The function
row_upd_changes_ord_field_binary() does not make use of charset information.
Based on a binary comparison, it decides that r1 and r2 differ in their
ordering fields.
In the function row_upd_clust_rec_by_insert(), an update is done by delete +
insert. These operations internally make use of cmp_dtuple_rec_with_match()
to compare records r1 and r2. This comparison takes place with the use of
charset information.
This means that it is possible for the deleted record to be reused by
the subsequent insert. In the given scenario, the characters 'a' and
'A' are considered equal in my_charset_latin1. When this happens, the
ownership information of externally stored blobs is not correctly
handled.
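A self-contained illustration of the mismatch (not InnoDB code): a
binary comparison distinguishes 'a' from 'A', while a case-insensitive,
latin1-style comparison does not:

    #include <stdio.h>
    #include <string.h>
    #include <strings.h>

    int main(void)
    {
        /* binary view: differ -> update done as delete + insert */
        printf("binary differ:  %d\n", memcmp("a", "A", 1) != 0);
        /* collation view: equal -> the delete-marked record can be
           reused by the subsequent insert */
        printf("charset differ: %d\n", strcasecmp("a", "A") != 0);
        return 0;
    }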
Solution:
When an update is done by delete followed by insert, disown the relevant
externally stored fields during the delete marking itself (within the same
mtr). If the insert succeeds, then nothing with respect to blob ownership
needs to be done. If the insert fails, then the disown done earlier will be
removed when the operation is rolled back.
rb#4479 approved by Marko.
The maximum value for innodb_thread_sleep_delay is 4294967295 (32-bit)
or 18446744073709551615 (64-bit) microseconds. This is far too large,
since the effective value of innodb_thread_sleep_delay is capped by
innodb_adaptive_max_sleep_delay whenever that variable is set to a
non-zero value (its default is 150,000).
Solution:
The maximum value of innodb_thread_sleep_delay should be the same as
the maximum value of innodb_adaptive_max_sleep_delay, which is 1000000.
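A sketch of the tightened variable definition, patterned on MySQL's
plugin sysvar macros (the exact parameter list here is assumed from
the 5.6 convention):

    static MYSQL_SYSVAR_ULONG(thread_sleep_delay,
      srv_thread_sleep_delay,
      PLUGIN_VAR_RQCMDARG,
      "Time of innodb thread sleeping before joining InnoDB queue"
      " (usec).",
      NULL, NULL,
      10000L,    /* default */
      0L,        /* minimum */
      1000000L,  /* maximum: matches innodb_adaptive_max_sleep_delay */
      0L);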
Approved by Jimmy, rb#4429
IN DOCUMENTATION
Problem
-------
The documentation says that we support the 'K' suffix when specifying
the size of an InnoDB data file in the innodb_data_file_path server
variable, but the function srv_parse_megabytes() only handles
'M' (megabytes) and 'G' (gigabytes).
Fix
---
Modify srv_parse_megabytes() to handle kilobytes. Also add to the
documentation that sizes specified in KB should be in multiples of
1024; otherwise they are rounded down to the nearest megabyte boundary
(e.g. a size given as 2313KB is treated as 2MB).
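A self-contained sketch of the suffix handling (srv_parse_megabytes()
itself works in megabyte units, so kilobytes round down on the
division):

    /* Convert a size with a K/M/G suffix to megabytes. */
    static unsigned long parse_size_mb(unsigned long value, char suffix)
    {
        switch (suffix) {
        case 'G': case 'g': return value * 1024;  /* GB -> MB */
        case 'K': case 'k': return value / 1024;  /* KB -> MB, rounds
                                                     down: 2313K -> 2M */
        case 'M': case 'm':
        default:            return value;         /* already MB */
        }
    }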
[ Approved by Marko rb#2387 ]
ERRORS IN THE FK SECTION
ANALYSIS
--------
Any error during the renaming of a table was incorrectly logged in
dict_foreign_err_file, so it showed up in the FOREIGN KEY section of
"show engine innodb status" output.
FIX
---
Prevent renaming errors from being logged in the dict_foreign_err_file
section.
[ Approved by Marko rb#2501 ]
possibly since it was introduced in the patch for Bug#16720368 around
2013-04-30. This fix simply adjusts the mtr.add_suppression() lines in
the testcase and adds a missing "\n" in the error message.
Approved by Marko in RB 3746
Regression from bug#14621190 due to disabled optimistic restoration of
the cursor, which required a full key lookup instead of verifying
whether the previously positioned btree cursor could be reused.
Fixed by enabling optimistic restore and adjusting the cursor
afterwards.
rb#3324 approved by Marko.
--Implemented CHECK TABLE...QUICK.
Introduce CHECK TABLE...QUICK, which skips the btr_validate_index() and
btr_search_validate() calls and only counts the number of records in
each index.
Approved by Marko and Kevin. (rb#3567).
AND 'KILL SESSION' LEAD TO CRASH
Analysis:
--------
This situation occurs when a connection executes the query
"show engine innodb status" and that connection is killed by another
connection executing "kill <con>".
In the function "innodb_show_status", the function "stat_print" is
called to print the status, but its return value is not checked. After
the connection is killed, if the write to the connection fails, an
error is returned and set in the Diagnostics area. Since FALSE is then
returned from "innodb_show_status", the assert in "set_eof_status"
(called from my_eof) that checks that no error is set fails.
Fix:
----
Changed the code to check the return value of "stat_print" in
"innodb_show_status".
ha_innobase::records_in_range() should return HA_POS_ERROR for a table
whose tablespace has been discarded, without requesting any pages. The
handler methods called afterwards should then treat the error
correctly.
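A sketch of the early-out, assuming the 5.6-style
dict_table_is_discarded() predicate on the prebuilt table handle:

    if (dict_table_is_discarded(prebuilt->table)) {
        /* no pages to read; let the optimizer treat the
           estimate as unavailable */
        return(HA_POS_ERROR);
    }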
Approved by Sunny in rb#3433
INDEX_READ_MAP HAD NO MATCH
If index_read_map is called for exact search and no matching records
exists it will position the cursor on the next record, but still having the
relative position to BTR_PCUR_ON.
This will make a call for index_next to read yet another next record,
instead of returning the record the cursor points to.
Fixed by setting pcur->rel_pos = BTR_PCUR_BEFORE if an exact
[prefix] search is done, but failed.
Also avoids optimistic restoration if rel_pos != BTR_PCUR_ON,
since btr_cur may be different than old_rec.
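A sketch of the repositioning fix (the exact condition for a failed
exact search is assumed; BTR_PCUR_BEFORE is InnoDB's own constant):

    if (!match_found) {
        /* the cursor stands on the record after the search target:
           remember we are BEFORE it, so a following index_next
           returns this record instead of skipping it */
        pcur->rel_pos = BTR_PCUR_BEFORE;
    }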
rb#3324, approved by Marko and Jimmy
The testcase for this bug fails randomly for two reasons:
1. An ibuf merge happening in the background.
2. A dict stats update bringing the evicted page back into the
buffer pool.
Fixed ibuf_contract_ext() to not do any merges with ibuf_debug
enabled, and changed dict_stats_update() to return fake statistics
without bringing the secondary index pages into the buffer pool.
Approved by Marko. rb#3419
IT IS DONE IN-PLACE
With the change buffer enabled, InnoDB does not write a redo log
record when it merges a record from the insert buffer into a secondary
index page if the insertion is performed as an update-in-place.
Fixed by logging the 'update-in-place' operation on secondary index
pages.
Approved by Marko. rb#2429
DICT_TABLE_GET_FORMAT(CLUST_INDEX->TABLE) >= 1
The function row_sel_sec_rec_is_for_clust_rec() was incorrectly
preparing to compare a NULL column prefix in a secondary index with a
non-NULL column in a clustered index.
This can trigger an assertion failure in 5.1 plugin and later. In the
built-in InnoDB of MySQL 5.1 and earlier, we would apparently only do
some extra work, by trimming the clustered index field for the
comparison.
The code might actually have worked properly apart from this debug
assertion failure. It is merely doing some extra work by fetching a
BLOB column and then comparing it to NULL (which returns the same
result no matter what the BLOB contains).
While the test case involves CHECK TABLE, this could theoretically
occur during any read that uses a secondary index on a column prefix
of a column that can be NULL.
rb#3101 approved by Mattias Jonsson
There was a race condition in the rollback of TRX_UNDO_UPD_DEL_REC.
Once row_undo_mod_clust() has rolled back the changes by the rolling-back
transaction, it attempts to purge the delete-marked record, if possible, in a
separate mini-transaction.
However, row_undo_mod_remove_clust_low() fails to check whether the
DB_TRX_ID of the record found after repositioning the cursor is still
the same.
If it is not, it means that the record was purged and another record was
inserted in its place.
So, the rollback would have performed an incorrect purge, breaking the
locking rules and causing corruption.
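A sketch of the missing check, assuming InnoDB's row_get_rec_trx_id()
accessor for the clustered index record (placement inside the removal
routine is per the description above):

    rec = btr_pcur_get_rec(&pcur);

    if (row_get_rec_trx_id(rec, index, offsets) != trx->id) {
        /* the delete-marked record was already purged and another
           record took its place: do not purge it again */
        goto func_exit;
    }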
The problem was found by creating a table that contains a unique
secondary index and a primary key, and two threads running REPLACE
with only one value for the unique column, so that the uniqueness
constraint would be violated all the time, leading to statement
rollback.
This bug exists in all InnoDB versions (I checked MySQL 3.23.53).
It has become easier to repeat in 5.5 and 5.6 thanks to scalability
improvements and a dedicated purge thread.
rb#3085 approved by Jimmy Yang
Since the mtr_t struct is marked as invalid in DEBUG_VALGRIND build
during mtr_commit, checking mtr->inside_ibuf will cause this warning.
Also since mtr->inside_ibuf cannot be set in mtr_commit (assert check)
and mtr->state is set to MTR_COMMITTED, the 'ut_ad(!ibuf_inside(&mtr))'
check is not needed if 'ut_ad(mtr.state == MTR_COMMITTED)' is also
checked.
SHUTDOWN IS IN PROGRESS
PROBLEM
-------
In the background thread srv_master_thread() we have a one-second
delay loop which continuously monitors server activity. If the server
is inactive (without any user activity) or in a shutdown state, we do
some background activity like flushing the changes. The current code
does not check whether the server is in a shutdown state before
sleeping for one second.
FIX
---
If the server is in a shutdown state, then do not go into the
one-second sleep.
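A sketch of the guard (srv_shutdown_state, SRV_SHUTDOWN_NONE and
os_thread_sleep() are InnoDB's own; placement in srv_master_thread()'s
loop follows the description):

    if (srv_shutdown_state == SRV_SHUTDOWN_NONE) {
        os_thread_sleep(1000000);  /* one second, in microseconds */
    }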
Problem:
When a user-specified foreign key name contains "_ibfk_", InnoDB
wrongly tries to rename it.
Solution:
When a table is renamed, its associated foreign keys are renamed along
with it only if the foreign key names were automatically generated. If
a foreign key name was given by the user, it must not be renamed, even
if it contains "_ibfk_".
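A self-contained sketch of the distinguishing test: an id counts as
auto-generated only when it has the exact form "<table_name>_ibfk_<n>",
not merely when it contains "_ibfk_" somewhere:

    #include <string.h>

    static int is_generated_fk_id(const char *fk_id,
                                  const char *table_name)
    {
        size_t tlen = strlen(table_name);

        /* generated ids have the form "<table_name>_ibfk_<n>" */
        return strncmp(fk_id, table_name, tlen) == 0
            && strncmp(fk_id + tlen, "_ibfk_", 6) == 0;
    }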
rb#2935 approved by Jimmy, Krunal and Satya
SERIALIZABLE
Problem:
The documentation claims that WITH CONSISTENT SNAPSHOT works for both
the REPEATABLE READ and SERIALIZABLE isolation levels, but it works
only for REPEATABLE READ. Also, the WITH CONSISTENT SNAPSHOT clause is
silently ignored when it is not applicable to the given isolation
level.
Solution:
Generate a warning when the clause WITH CONSISTENT SNAPSHOT is ignored.
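A sketch of emitting the warning, patterned on the server's
push_warning_printf() (the warning-level enum and the error code used
here are assumptions, not necessarily those in the patch):

    push_warning_printf(thd, Sql_condition::WARN_LEVEL_WARN,
                        ER_UNKNOWN_ERROR, /* placeholder code */
                        "InnoDB: WITH CONSISTENT SNAPSHOT was ignored"
                        " because this phrase can only be used with"
                        " REPEATABLE READ isolation level.");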
rb#2797 approved by Kevin.
Note: Support team wanted to push this to 5.5+.
MULTI-FILE TABLESPACE
ANALYSIS
--------
When a tablespace has multiple data files, InnoDB fails to open the
tablespace. This is because the first page of each ibd file is
checked, but the first page of an ibd file need not be the first page
of the tablespace. Only the first page of the tablespace contains the
tablespace header. When we check the first page of an ibd file that is
not the first page of the tablespace, the "tablespace flags" are not
actually available, yet they were wrongly used to decide whether a
page is corrupt.
FIX
---
Use the tablespace flags only if the page number is 0, i.e. on the
first page of the tablespace.
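A sketch of the corrected condition (page_get_page_no() and
fsp_header_get_flags() are real InnoDB accessors; the validation
itself is elided):

    if (page_get_page_no(page) == 0) {
        /* only the tablespace's first page carries the header,
           so only here are the flags meaningful */
        ulint flags = fsp_header_get_flags(page);
        /* ... validate the page against the flags ... */
    }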
[Approved by Inaam rb#2836 ]
Analysis
--------
The pthread_mutex commit_threads_m was initialized but never
used.
Fix
---
Removing the commit_threads_m mutex from the code base.
[ Approved by Marko rb#2475]
DDL AND I_S QUERIES
Skip partially created indexes (those whose name starts with
TEMP_INDEX_PREFIX) during stats gathering.
Because InnoDB reports HA_INPLACE_ADD_INDEX_NO_WRITE to MySQL, the latter
allows parallel execution of ha_innobase::add_index() and ha_innobase::info().
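A sketch of the skip, using InnoDB's index iteration helpers
(TEMP_INDEX_PREFIX is the real marker byte prepended to the names of
indexes still being built; the stats work itself is elided):

    for (index = dict_table_get_first_index(table);
         index != NULL;
         index = dict_table_get_next_index(index)) {

        if (*index->name == TEMP_INDEX_PREFIX) {
            continue;  /* partially created by concurrent DDL */
        }
        /* ... gather statistics for this index ... */
    }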
Reviewed by: Inaam (rb:2613)
IF IT HAS A WRONG COUNT
If CHECK TABLE finds that a secondary index contains the wrong
number of entries, it used to report an error but not mark the
index as corrupt. The error means that the index should be rebuilt,
which can be done with ALTER TABLE DROP INDEX and ALTER TABLE ADD
INDEX. But just in case the DBA does not pay any attention to the
output of CHECK TABLE, the secondary index should be marked as
corrupted so that it is not used again.
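A sketch of the marking step (dict_set_corrupted() is InnoDB's
flagging routine, though its exact signature varies between versions;
the entry counters are hypothetical names):

    if (actual_entries != expected_entries) {
        /* report the error as before, and additionally flag the
           index so it is no longer chosen for reads */
        dict_set_corrupted(index);
        is_ok = FALSE;
    }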
Approved by Inaam in RB:2607
ON DELETION ORDER
Problem:
When an InnoDB index page is under-filled, it is merged with either
the left sibling node or the right sibling node. But this check is
incorrect: when a left sibling exists, even if merging with it is not
possible, we do not check whether merging with the right sibling is
possible.
Solution:
If a left sibling exists but merging with it is not possible, then
check whether a merge with the right sibling is possible.
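A sketch of the corrected decision with hypothetical helper names
(can_merge()/merge_into() stand in for the real checks; FIL_NULL is
InnoDB's marker for a missing sibling):

    if (left_page_no != FIL_NULL && can_merge(block, left_page_no)) {
        merge_into(left_page_no, block);
    } else if (right_page_no != FIL_NULL
               && can_merge(block, right_page_no)) {
        /* previously unreachable whenever a left sibling existed */
        merge_into(right_page_no, block);
    }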
rb#2506 approved by jimmy & ima.
i_s_innodb_buffer_page_get_info(): Do not read the buffer block frame
contents of read-fixed blocks, because it may be invalid or
uninitialized. When we are going to decompress or read a block, we
will put it into buf_pool->page_hash and buf_pool->LRU, read-fix the
block and release the mutexes for the duration of the reading or
decompression.
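A sketch of the guard (buf_page_get_io_fix() and BUF_IO_READ are
InnoDB's own; exactly which fields remain safe to report is assumed):

    if (buf_page_get_io_fix(bpage) == BUF_IO_READ) {
        /* the frame is being read in and may contain garbage:
           report only control-block fields, not frame contents */
    } else {
        /* safe to read the frame (page type, index id, ...) */
    }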
rb#2500 approved by Jimmy Yang
ESCAPED WITH BACKSLASH
Problem:
When the CREATE TABLE statement used COMMENTs with escape sequences
like 'foo\'s', InnoDB did not parse them correctly when trying to
extract the foreign key information. Because of this, the foreign keys
specified in the CREATE TABLE statement were not created.
Solution:
Make the InnoDB internal parser aware of escape sequences.
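A self-contained sketch of quote scanning that honors backslash
escapes, so a comment like 'foo\'s' ends at its real closing quote:

    /* Return the position just past the closing quote. */
    static const char *skip_quoted(const char *p)
    {
        char quote = *p++;  /* opening ' or " */

        while (*p != '\0') {
            if (*p == '\\' && p[1] != '\0') {
                p += 2;            /* skip the escaped character */
            } else if (*p == quote) {
                return p + 1;      /* real closing quote */
            } else {
                p++;
            }
        }
        return p;  /* unterminated */
    }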
rb#2457 approved by Kevin.
INSERT BUFFER MERGE
Problem:
When a record is merged from the change buffer into the actual page,
under a particular condition it is assumed that the deleted rec will
be re-used by the inserted rec. With this assumption, the lock is
restored on the pointer to the deleted rec itself, on the belief that
it points to the newly inserted rec.
Solution:
Just before restoring the lock, update the rec pointer to point
to the newly inserted record. An assert has been added to verify
this. This assert will fail without the fix and will pass with
the fix.
rb#2449 in review by Marko and Jimmy
INNODB_FAST_SHUTDOWN IS 2
Problem:
When innodb_fast_shutdown is set to 2 and the master thread enters the
flush loop, under some circumstances it will not be able to exit it.
This may lead to a hanging shutdown.
This is happening because of the following:
1. In the flush_loop block of code, if the srv_fast_shutdown is
equal to 2 (very fast shutdown), then we do not flush dirty
pages in buffer pool to disk.
2. In the same flush_loop block of code, if the number of dirty
pages is more than user specified limit, we go to step 1.
This results in an infinite loop.
Solution:
When we are in the process of doing a very fast shutdown, don't
do step 2 above.
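A sketch of the loop exit (srv_fast_shutdown,
buf_get_modified_ratio_pct() and srv_max_buf_pool_modified_pct are
InnoDB's own; the goto mirrors the description above):

    if (srv_fast_shutdown == 2) {
        /* very fast shutdown: nothing was flushed in step 1,
           so rechecking the dirty-page count would loop forever */
    } else if (buf_get_modified_ratio_pct()
               > srv_max_buf_pool_modified_pct) {
        goto flush_loop;  /* step 2 */
    }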
rb#2328 approved by Inaam.