bug#39483 InnoDB hang on adaptive hash because of out of order ::open()
call by MySQL
Forward port of r2629
Under some conditions MySQL calls ::open while holding search_latch,
which leads to a deadlock because we try to acquire dict_sys->mutex
inside ::open, breaking the latching order. The fix is to release
search_latch.
Reviewed by: Heikki
Add the parameter struct charset_info_st* cs, so that the call
thd_charset(current_thd) can be avoided. The macro current_thd has no
defined value in the Windows plugin.
ha_innodb.cc: Declare strict_mode as PLUGIN_VAR_OPCMDARG, because we
do want to be able to disable innodb_strict_mode. This is a non-functional
change, because PLUGIN_VAR_NOCMDARG seems to accept an argument as well.
innodb-zip.test: Do not store innodb_strict_mode. It is a session variable.
Add a test case for innodb_strict_mode=off.
there will always be enough space for two node pointer records in an
empty B-tree page. This was reported as Mantis issue #73.
page_zip_rec_needs_ext(): Add the parameter n_fields, for accurate
estimation of the compressed size of the data dictionary information.
Given that this function is only invoked for records on leaf pages,
require that there be enough space for one record in the compressed
page. We check elsewhere that there will be enough room for two node
pointer records on higher-level pages.
btr_cur_optimistic_insert(): Ensure that there will be enough room for
two node pointer records on an empty non-leaf page. The rule for
leaf-page records will be enforced by the callers of
page_zip_rec_needs_ext().
btr_cur_pessimistic_insert(): Remove the insufficient check that the
leaf page record should be compressible by itself. Instead, now we
require that two node pointer records fit on a non-leaf page, and one
record will fit in uncompressed form on the leaf page.
page_zip_write_header(), page_zip_write_rec(): Re-enable the debug
assertions that were violated by the insufficient check in
btr_cur_pessimistic_insert().
innodb_bug36172.test: Use a larger compressed page size.
btr_search_drop_page_hash_index(): Add const qualifiers to the local
variables page, rec, and index, to ensure that they are not modified
by this function.
page_get_infimum_offset(), page_get_supremum_offset(): New functions.
page_get_infimum_rec(), page_get_supremum_rec(): Replaced by
const-preserving macros that invoke the accessor functions.
help in tracking down issue #63 (memory corruption). UNIV_BTR_DEBUG
is currently enabled in univ.i.
btr_root_fseg_validate(): New function, for validating a file segment
header on a B-tree root page.
btr_root_block_get(), btr_free_but_not_root(),
btr_root_raise_and_insert(), btr_discard_only_page_on_level():
Check PAGE_BTR_SEG_LEAF and PAGE_BTR_SEG_TOP on the root page with
btr_root_fseg_validate().
btr_root_raise_and_insert(): Move the assertion
dict_index_get_page(index) == page_get_page_no(root)
inside UNIV_BTR_DEBUG. It was previously enabled by UNIV_DEBUG.
btr_free_root(): Check PAGE_BTR_SEG_TOP on the root page with
btr_root_fseg_validate().
Add a test case to check that mysqld does not crash when running ANALYZE TABLE
with different values for innodb_stats_sample_pages.
Suggested by: Marko
Approved by: Marko
Limit the number of the pages that are sampled so it is never greater
than the total number of pages in the index.
The parameter that specifies the number of pages to test is global for
all tables. By limiting it this way we allow the user to set it "high"
to suit "large" tables and to avoid unnecessary work for "small" tables
(e.g. doing 100 dives in a table that has 5 pages, obviously testing
some pages more than once).
Suggested by: Ken
Approved by: Marko
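A minimal sketch of the capping rule described above, assuming a
stand-alone helper. The function name, parameter names and the integer
type are hypothetical and are not taken from the InnoDB source; only the
rule itself (never sample more pages than the index has) comes from the
entry.

```c
#include <assert.h>

/* Hypothetical stand-in for InnoDB's unsigned integer type. */
typedef unsigned long sketch_ulint;

/* Returns the number of pages to sample for one index: the globally
configured value, capped by the number of pages in that index.
Both parameter names are assumptions made for this sketch. */
static sketch_ulint
sketch_stats_pages_to_sample(
    sketch_ulint configured_sample_pages, /* global setting */
    sketch_ulint index_total_pages)       /* pages in this index */
{
    /* A setting of zero would make the estimate meaningless. */
    assert(configured_sample_pages > 0);

    if (configured_sample_pages > index_total_pages) {
        return(index_total_pages);
    }

    return(configured_sample_pages);
}
```

With this shape, a "high" global setting only costs extra work on
indexes that are large enough to have that many pages.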
Merge 2605:2617 from branches/5.1:
------------------------------------------------------------------------
r2609 | sunny | 2008-08-24 01:19:05 +0300 (Sun, 24 Aug 2008) | 12 lines
Changed paths:
M /branches/5.1/handler/ha_innodb.cc
M /branches/5.1/mysql-test/innodb-autoinc.result
M /branches/5.1/mysql-test/innodb-autoinc.test
branches/5.1: Fix for MySQL Bug#38839. Reset the statement level last
value field in prebuilt. This field tracks the last value in an autoincrement
interval. We use this value to check whether we need to update a table's
AUTOINC counter: if the value written to a table is less than this value,
then we avoid updating the table's AUTOINC value in order to reduce
mutex contention. If it's not reset (e.g., after a DELETE statement) then
there is the possibility of missing updates to the table's AUTOINC counter
resulting in a subsequent duplicate row error message under certain
conditions (see the test case for details).
Bug #38839 - auto increment does not work properly with InnoDB after update
------------------------------------------------------------------------
r2617 | vasil | 2008-09-09 15:46:17 +0300 (Tue, 09 Sep 2008) | 47 lines
Changed paths:
M /branches/5.1/mysql-test/innodb.result
branches/5.1:
Merge a change from MySQL (fix the failing innodb test):
------------------------------------------------------------
revno: 2646.12.1
committer: Mattias Jonsson <mattiasj@mysql.com>
branch nick: wl4176_2-51-bugteam
timestamp: Mon 2008-08-11 20:02:03 +0200
message:
Bug#20129: ALTER TABLE ... REPAIR PARTITION ... complains that
partition is corrupt
The main problem was that ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR
PARTITION took another code path (over mysql_alter_table instead of
mysql_admin_table) which differs in two ways:
1) alter table opens the tables in a different way than admin tables do
resulting in returning with error before it tried the command
2) alter table does not start to send any diagnostic rows to the client
which the lower admin functions continue to use -> resulting in
assertion crash
The fix:
Remapped ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR PARTITION to use
the same code path as ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE t.
Added a check in mysql_admin_table to set up the list of partitions
that should be used.
Partitioned tables will still not work with
REPAIR TABLE/PARTITION USE_FRM, since that requires moving partitions
to tables, REPAIR TABLE t USE_FRM, and check that the data still
fulfills the partitioning function and then move the table back to
being a partition.
NOTE: I have removed the following functions from the handler
interface:
analyze_partitions, check_partitions, optimize_partitions,
repair_partitions
since they are no longer needed.
THIS ALTERS THE STORAGE ENGINE API
I have verified that OPTIMIZE TABLE actually rebuilds the table
and calls ANALYZE.
Approved by: Heikki
foreign key constraint, find a truly equivalent index for it.
If none is available, refuse to drop the index. MySQL can drop
an index when creating a "stronger" index.
This was reported as Mantis issue #70 and MySQL Bug #38786.
innodb-index.test: Add a test case.
dict_foreign_find_equiv_index(): New function, to replace the
incorrectly written function dict_table_find_equivalent_index().
dict_table_replace_index_in_foreign_list(): Simplify the implementation.
in fast index creation. In r1399, we wrote undo log records about
creating indexes. The special undo log records were deemed
unnecessary later, but this special handling was not removed then.
row_merge_create_index(): Do not assign index->id.
dict_build_index_def_step(): Unconditionally assign index->id.
Merge 2537:2605 from branches/5.1:
------------------------------------------------------------------------
r2545 | vasil | 2008-07-25 17:24:23 +0300 (Fri, 25 Jul 2008) | 37 lines
Changed paths:
M /branches/5.1/handler/ha_innodb.cc
branches/5.1:
Fix Bug#38185 ha_innobase::info can hold locks even when called with HA_STATUS_NO_LOCK
The fix is to call fsp_get_available_space_in_free_extents() from
ha_innobase::info() only if HA_STATUS_NO_LOCK is not present in the flag
*AND*
change get_schema_tables_record() in MySQL's sql/sql_show.cc to call
::info() *without* HA_STATUS_NO_LOCK whenever a user issues SELECT FROM
information_schema.tables;
Without the change to sql/sql_show.cc this patch would lead to Bug#32440
resurfacing. I.e. delete_length would never be updated in ::info() and
will remain 0 forever, resulting in the free space not being shown
anywhere.
This is the change to sql/sql_show.cc, for reference. It needs to be
committed to the MySQL repo before or at the same time as this change
to ha_innodb.cc:
--- patch begins here ---
--- sql/sql_show.cc.orig 2008-07-23 09:32:14.000000000 +0300
+++ sql/sql_show.cc 2008-07-23 09:32:19.000000000 +0300
@@ -3549,8 +3549,7 @@ static int get_schema_tables_record(THD
if(file)
{
- file->info(HA_STATUS_VARIABLE | HA_STATUS_TIME | HA_STATUS_AUTO |
- HA_STATUS_NO_LOCK);
+ file->info(HA_STATUS_VARIABLE | HA_STATUS_TIME | HA_STATUS_AUTO);
enum row_type row_type = file->get_row_type();
switch (row_type) {
case ROW_TYPE_NOT_USED:
--- patch ends here ---
Approved by: Heikki
------------------------------------------------------------------------
r2603 | marko | 2008-08-21 16:25:05 +0300 (Thu, 21 Aug 2008) | 10 lines
Changed paths:
M /branches/5.1/handler/ha_innodb.cc
M /branches/5.1/include/ha_prototypes.h
M /branches/5.1/row/row0sel.c
branches/5.1: Identify SELECT statements by thd_sql_command() == SQLCOM_SELECT
instead of parsing the query string. This fixes MySQL Bug #37885 without
us having to implement lexical analysis of SQL comments in yet another place.
thd_is_select(): A new predicate.
row_search_for_mysql(): Use thd_is_select().
Approved by Heikki.
------------------------------------------------------------------------
dict_table_get_referenced_constraint(), dict_table_get_foreign_constraint():
Simplify the iteration loop.
dict_table_find_equivalent_index(): Correct the function comment.
cache, especially buf_pool->LRU_old and bpage->old.
buf_LRU_old_adjust_len(), buf_LRU_remove_block(): Check that blocks in
buf_pool->LRU_old have the "old" flag set and the blocks preceding
buf_pool->LRU_old have the "old" flag clear.
buf_LRU_add_block_low(), buf_relocate(): Check that buf_pool->LRU_old
is the first block in the LRU list whose "old" flag is set.
buf_LRU_free_block(): When replacing a control block in the LRU list
with a control block for a compressed page, assert that the "old"
flags in the neighboring LRU list entries grow monotonically.
buf_page_set_old(): Assert that the "old" flags in the neighboring LRU
list entries grow monotonically.
buf_pool->LRU_old_len. However, we forgot to check if buf_pool->LRU_old
happens to point to b's successor in the LRU list. If it does, we must
assign buf_pool->LRU_old = b. The following invariants hold:
In the LRU list, the "old" flag should grow monotonically, i.e., it is 0
for the first few items and 1 thereafter.
If buf_pool->LRU_old != NULL, it must point to the first item with old=1
in the LRU list, and there must be buf_pool->LRU_old_len old items in the list.
This should fix Mantis issue#50 and issue#68.
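The two invariants stated above can be checked with a single pass over
the list. The following is only a sketch: the structures are simplified,
hypothetical stand-ins for the buffer pool data structures, and the
function is not one of the debug checks added by this change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the buffer pool structures. */
struct sketch_page {
    struct sketch_page* next; /* next page towards the LRU tail */
    int                 old;  /* the "old" flag: 0 or 1 */
};

struct sketch_lru {
    struct sketch_page* first;       /* head of the LRU list */
    struct sketch_page* LRU_old;     /* first old page, or NULL */
    size_t              LRU_old_len; /* number of old pages */
};

/* Returns 1 if the "old" flags grow monotonically and, when LRU_old
is set, it points to the first old page and LRU_old_len matches the
number of old pages. Returns 0 otherwise. */
static int
sketch_lru_validate_old(const struct sketch_lru* lru)
{
    const struct sketch_page* page;
    const struct sketch_page* first_old = NULL;
    size_t                    n_old = 0;

    assert(lru != NULL);

    for (page = lru->first; page != NULL; page = page->next) {
        if (page->old) {
            if (first_old == NULL) {
                first_old = page;
            }
            n_old++;
        } else if (first_old != NULL) {
            /* A young page after an old one: not monotonic. */
            return(0);
        }
    }

    if (lru->LRU_old != NULL) {
        return(lru->LRU_old == first_old
               && lru->LRU_old_len == n_old);
    }

    return(1);
}
```

In this sketch, a list whose LRU_old still points to the successor of a
newly inserted old block fails the check, because LRU_old is then no
longer the first item with old=1.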
The cardinality of every index (the number of different key values) is
calculated when the table is opened, on SHOW TABLE STATUS and
ANALYZE TABLE, and in other circumstances (such as when the table has
changed too much). Note that if the mysql client is running with the
auto-rehash setting turned on (default) this causes all tables to be
opened when it starts.
Previously InnoDB sampled 8 random pages from the index to get an
estimate of the cardinality. Now the number of sampled pages can be
changed via the global parameter innodb_stats_sample_pages which can
be tuned at runtime. The default value for this parameter is 8.
If the value of this parameter is changed, there may be serious problems:
- small values (say, 1) can cause an error in table stats;
- values much larger than 8 (say, 100) can cause a big slowdown in
table opening time, SHOW TABLE STATUS, etc.
- query plans may be different from the old ones.
Approved by: Heikki
recovery, tolerate clustered index records whose externally stored
columns have not been written. This should remove the assertion failures
that were reported as Mantis issue#58, issue#62, issue#64.
trx_is_recv(): New function: TRUE if this transaction is rolling back
an incomplete transaction in crash recovery.
enum trx_rbmode: Rollback modes: no rollback, normal rollback, crash recovery.
btr_cur_pessimistic_delete(), btr_free_externally_stored_field(),
btr_rec_free_externally_stored_fields():
Replace the ibool parameter with enum trx_rbmode.
btr_free_externally_stored_field(): If field_ref is zero, return
but assert ut_a(rbmode == RB_RECOVERY). Unless InnoDB has crashed
while inserting a clustered index record, field_ref should not be zero.
btr_rec_free_updated_extern_fields(): Add the parameter enum trx_rbmode.
btr_cur_pessimistic_update(): Pass the rbmode parameter to
btr_rec_free_updated_extern_fields().
row_undo_ins(), row_undo_mod_upd_del_sec(): If row_build_index_entry()
fails, assert trx_is_recv() and skip this secondary index.
row_undo_mod_upd_del_sec(): Empty the heap at the end of each loop
iteration in order to conserve memory and to reduce the number of
low-level memory allocations.
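The zero field_ref rule noted above for
btr_free_externally_stored_field() can be sketched as follows. This is
an illustration under stated assumptions, not the actual function: the
enum constants other than the recovery mode, the reference size of 20
bytes, and the function name are all hypothetical, and the real
function frees BLOB pages where this sketch only returns a status.

```c
#include <assert.h>
#include <string.h>

/* Rollback modes as described in this entry. The constant names are
assumptions modelled on RB_RECOVERY. */
enum sketch_rbmode {
    SKETCH_RB_NONE,    /* no rollback */
    SKETCH_RB_NORMAL,  /* normal rollback */
    SKETCH_RB_RECOVERY /* rollback in crash recovery */
};

/* Assumed size of an external field reference, in bytes. */
#define SKETCH_FIELD_REF_SIZE 20

/* Returns 0 if there was nothing to free because the reference was
never written, and 1 if the caller would go on to free the
externally stored field. */
static int
sketch_free_extern_field(
    const unsigned char* field_ref,
    enum sketch_rbmode   rbmode)
{
    static const unsigned char zero_ref[SKETCH_FIELD_REF_SIZE] = {0};

    if (!memcmp(field_ref, zero_ref, SKETCH_FIELD_REF_SIZE)) {
        /* An all-zero reference is only expected if the server
        crashed while inserting the clustered index record. */
        assert(rbmode == SKETCH_RB_RECOVERY);
        return(0);
    }

    /* The real function would free the BLOB pages here. */
    return(1);
}
```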
buf_flush_init_for_writing(), buf_LRU_block_remove_hashed_page():
Dump the page frame featuring the incorrect FIL_PAGE_TYPE along with the
page_zip->data that might contain an earlier version of the page.
buf_block_align() on a non-file page frame that was created in
btr_cur_pessimistic_insert(), to see if a record fits on a compressed
page by itself. These assertions caused an assertion failure in
buf_block_align() in innodb_bug36172.test.
page_zip_write_rec(), page_zip_write_header(): Remove the assertion
that calls buf_frame_get_page_zip().
an incorrect value. This is to track down Mantis issue#63 and issue#65.
buf_LRU_block_remove_hashed_page(),
buf_flush_init_for_writing(): dump the compressed page before ut_error.
Fixes a race in recovery where the recovery thread recovering a
PREPARED trx and the background rollback thread can both try
to free the trx after its status is set to COMMITTED_IN_MEMORY.
trx->is_recovered flag was introduced in r2040.
Reviewed by: Sunny
This fix makes two basic changes in blob handling
(the bug was introduced in r2252):
1) The blob prefixes are no longer stored in the undo if
a) We are modifying a delete marked record and
b) The record was delete marked by an already committed trx.
2) When building old row versions to check whether one of them
can hold an implicit lock on the record, we stop our probe if
a) The version is delete marked and
b) The delete marking is done by a trx which is different from
the current active trx on the record.
Reviewed by: Heikki
------------------------------------------------------------------------
r2537 | inaam | 2008-07-15 20:46:03 +0300 (Tue, 15 Jul 2008) | 12 lines
branches/5.1 issue# 4
Fixed a timing hole where a thread dropping an index can free the
in-memory index struct while another thread is still using
that structure to remove entries from adaptive hash index belonging
to one of the pages that belongs to the index being dropped.
The fix is to have a reference counter in the index struct and to
wait for this counter to drop to zero before freeing the struct.
Reviewed by: Heikki
------------------------------------------------------------------------
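A single-threaded sketch of the reference counting protocol described
in r2537, assuming hypothetical names throughout. It deliberately omits
the mutex that would have to protect the counter and the loop in which
the dropping thread sleeps and retries; it only shows that the struct
is not freed while a user still holds a reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the in-memory index struct. */
struct sketch_index {
    unsigned long ref_count; /* number of threads using the struct */
};

/* Allocates an index struct with no users. Returns NULL on failure. */
static struct sketch_index*
sketch_index_create(void)
{
    struct sketch_index* index = malloc(sizeof *index);

    if (index != NULL) {
        index->ref_count = 0;
    }

    return(index);
}

/* Called by a thread before it starts using the struct. */
static void
sketch_index_ref(struct sketch_index* index)
{
    index->ref_count++;
}

/* Called by a thread when it has finished using the struct. */
static void
sketch_index_unref(struct sketch_index* index)
{
    assert(index->ref_count > 0);
    index->ref_count--;
}

/* Frees the struct and returns 1 if nobody is using it. Returns 0 if
the dropping thread has to wait and try again. */
static int
sketch_index_try_free(struct sketch_index* index)
{
    if (index->ref_count > 0) {
        return(0);
    }

    free(index);
    return(1);
}
```

In a real multi-threaded setting the increments, decrements and the
zero check would all need to happen under the same lock, otherwise the
timing hole would simply move into the counter itself.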