print page dump
buf_page_print(): Remove the ut_ad(0) from the beginning. Add two flags
(enum buf_page_print_flags) that can be bitwise-ORed together:
BUF_PAGE_PRINT_NO_CRASH:
Do not crash debug builds at the end of buf_page_print().
BUF_PAGE_PRINT_NO_FULL:
Do not print the full page dump. This can be useful when adding
diagnostic printout to flushing or to the doublewrite buffer.
trx_sys_doublewrite_init_or_restore_page(): Replace exit(1) with ut_error,
so that we can get a core dump if this extraordinary condition happens.
rb:924 approved by Sunny Bains
WITH LARGE BUFFER POOL
(Note: this a backport of revno:3472 from mysql-trunk)
rb://845
approved by: Marko
When dropping a table (with an .ibd file i.e.: with
innodb_file_per_table set) we scan entire LRU to invalidate pages from
that table. This can be painful in case of large buffer pools as we hold
the buf_pool->mutex for the scan. Note that gravity of the problem does
not depend on the size of the table. Even with an empty table but a
large and filled up buffer pool we'll end up scanning a very long LRU
list.
The fix is to scan flush_list and just remove the blocks belonging to
the table from the flush_list, marking them as non-dirty. The blocks
are left in the LRU list for eventual eviction due to aging. The
flush_list is typically much smaller than the LRU list but for cases
where it is very long we have the solution of releasing the
buf_pool->mutex after scanning 1K pages.
buf_page_[set|unset]_sticky(): Use new IO-state BUF_IO_PIN to ensure
that a block stays in the flush_list and LRU list when we release
buf_pool->mutex. Previously we have been abusing BUF_IO_READ to achieve
this.
Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE()
The fix introduced potentially more severe crash recovery problems
than the bug causes. Revert the fix for now.
The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.
Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().
btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.
btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.
btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).
btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).
btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.
btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.
btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.
fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.
fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.
fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".
mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.
row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.
row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.
There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.
rb:693 approved by Sunny Bains
btr_cur_compress_if_useful(), btr_compress(): Add the parameter ibool
adjust. If adjust=TRUE, adjust the cursor position after compressing
the page.
btr_lift_page_up(): Return a pointer to the father page.
BTR_KEEP_POS_FLAG: A new flag for btr_cur_pessimistic_update().
btr_cur_pessimistic_update(): If *big_rec != NULL and flags &
BTR_KEEP_POS_FLAG, keep the cursor positioned on the updated record.
Also, do not release the index tree x-lock if *big_rec != NULL.
btr_cur_mtr_commit_and_start(): Commits and restarts a
mini-transaction so that it will retain an x-lock on index->lock and
the page of the cursor. This is invoked when
btr_cur_pessimistic_update() returns *big_rec != NULL.
In all callers of btr_cur_pessimistic_update() that do not pass
BTR_KEEP_POS_FLAG, assert that *big_rec == NULL.
btr_cur_compress(): Unused function [in the built-in MySQL 5.1], remove.
page_rec_get_nth(): Return the nth record on the page (an inverse
function of page_rec_get_n_recs_before()). Refactored from
page_get_middle_rec().
page_get_middle_rec(): Invoke page_rec_get_nth().
page_cur_insert_rec_zip_reorg(): Make use of the page directory
shortcuts in page_rec_get_nth() instead of scanning the whole list of
records.
row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update().
row_ins_index_entry_low(): If row_ins_clust_index_entry_by_modify()
returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to
commit and start the mini-transaction without releasing the x-locks on
index->lock and the cursor page, and write the big_rec. Releasing the
page latch in mtr_commit() caused a race condition.
row_upd_clust_rec(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update(). If it returns a big_rec, invoke
btr_cur_mtr_commit_and_start() in order to commit and start the
mini-transaction without releasing the x-locks on index->lock and the
cursor page, and write the big_rec. Releasing the page latch in
mtr_commit() caused a race condition.
sync_thread_add_level(): Add the parameter ibool relock. When TRUE,
bypass the latching order rules.
rw_lock_add_debug_info(): For nested X-lock requests, pass relock=TRUE
to sync_thread_add_level().
rb:678 approved by Jimmy Yang
------------------------------------------------------------
revno 2876.244.305
revision id marko.makela@oracle.com-20110413082211-e6ouhjz5rmqxcqap
parent marko.makela@oracle.com-20110413075948-kvytmc37ye1nt7d9
committer Marko Mäkelä <marko.makela@oracle.com>
branch nick 5.6-innodb
timestamp Wed 2011-04-13 11:22:11 +0300
message:
Suppress the Bug #58815 (Bug #11765812) assertion failure.
buf_page_get_gen(): Introduce BUF_GET_POSSIBLY_FREED for suppressing the
check that the file page must not have been freed.
btr_estimate_n_rows_in_range_on_level(): Pass BUF_GET_POSSIBLY_FREED and
explain in the comments why this is needed and why it should be mostly
harmless to ignore the problem. If InnoDB had always initialized all
unused fields in data files, no problem would exist.
This change does not fix the bug, it just "shoots the messenger".
rb:647 approved by Jimmy Yang
buf_page_t: Remove the buf_pool pointer. Add buf_pool_index:6 next to
buf_fix_count:19 (it was buf_fix_count:25). This will limit the number
of concurrent transactions to somewhere around 524,288.
buf_pool_index(buf_pool): A new function to determine the index of a
buffer pool in buf_pool_ptr[].
buf_pool_ptr: Make this a dynamically allocated array instead of an
array of pointers. Allocate the array in buf_pool_init() and free in
buf_pool_free().
buf_pool_free_instance(): No longer free the buf_pool object, as it is
allocated from a big array.
buf_pool_from_bpage(), buf_pool_from_block(): Move the definitions to
the beginning of the files, because some compilers have (had) problems
with forward definitions of inline functions. Calculate the buffer
pool from buf_pool_index.
buf_pool_from_array(): Add debug assertions for input validation.
Rename buf_pool_watch, buf_pool_mutex, buf_pool_zip_mutex
to buf_pool->watch, buf_pool->mutex, buf_pool->zip_mutex
in comments. Refer to buf_pool->flush_list_mutex instead of
flush_list_mutex.
Remove obsolete declarations of buf_pool_mutex and buf_pool_zip_mutex.
Remove the pure attribute from a function. The function doesn't qualify as
a pure function because it has a side-effect (modifies its parameter). Add
a clarifying comment to another function's declaration.
bzr branch mysql-5.1-performance-version mysql-trunk # Summit
cd mysql-trunk
bzr merge mysql-5.1-innodb_plugin # which is 5.1 + Innodb plugin
bzr rm innobase # remove the builtin
Next step: build, test fixes.
Bug#36600 and Bug#36793:
Bug #36600 SHOW STATUS takes a lot of CPU in buf_get_latched_pages_number
Fix by removing the Innodb_buffer_pool_pages_latched variable from SHOW
STATUS output in non-UNIV_DEBUG compilation.
Bug #36793 rpl_innodb_bug28430 fails on Solaris
This is a back port from branches/zip. This code has been tested on a
big-endian machine too.
Bugs fixed:
- Bug #20877: InnoDB data dictionary memory footprint is too big
- Bug #13544: Second delete of same row in transaction illustrates non-optimal locking
- Bug #20791: valgrind errors in InnoDB
Bugs fixed:
- Bug #20791 valgrind errors in InnoDB
Remove Valgrind warning of Bug #20791 : in new database
creation, we read the doublewrite buffer magic number from
uninitialized memory; the code worked because it was extremely
unlikely that the memory would contain the magic number
- Bug #21784 DROP TABLE crashes 5.1.12-pre if concurrent
queries on the table
remove update_thd() in ::store_lock()
Also includes numerous coding style fixes, etc. See file-level
comments for details.
Fixed BUGS:
#3300: "UPDATE statement with no index column in where condition locks
all rows"
Implement semi-consistent read to reduce lock conflicts at the cost
of breaking serializability.
ha_innobase::unlock_row(): reset the "did semi consistent read" flag
ha_innobase::was_semi_consistent_read(),
ha_innobase::try_semi_consistent_read(): new methods
row_prebuilt_t, row_create_prebuilt(): add field row_read_type for
keeping track of semi-consistent reads
row_vers_build_for_semi_consistent_read(),
row_sel_build_committed_vers_for_mysql(): new functions
row_search_for_mysql(): implement semi-consistent reads
#9802: "Foreign key checks disallow alter table".
Added test cases.
#12456: "Cursor shows incorrect data - DML does not affect,
probably caching"
This patch implements a high-granularity read view to be used with
cursors. In this high-granularity consistent read view modifications
done by the creating transaction after the cursor is created or
future transactions are not visible. But those modifications that
transaction did before the cursor was created are visible.
#12701: "Support >4GB buffer pool and log files on 64-bit Windows"
Do not call os_file_create_tmpfile() at runtime. Instead, create all
tempfiles at startup and guard access to them with mutexes.
#13778: "If FOREIGN_KEY_CHECKS=0, one can create inconsistent FOREIGN KEYs".
When FOREIGN_KEY_CHECKS=0 we still need to check that datatypes between
foreign key references are compatible.
#14189: "VARBINARY and BINARY variables: trailing space ignored with InnoDB"
innobase_init(): Assert that
DATA_MYSQL_BINARY_CHARSET_COLL == my_charset_bin.number.
dtype_get_pad_char(): Do not pad VARBINARY or BINARY columns.
row_ins_cascade_calc_update_vec(): Refuse ON UPDATE CASCADE when trying
to change the length of a VARBINARY column that refers to or is referenced
by a BINARY column. BINARY columns are no longer padded on comparison,
and thus they cannot be padded on storage either.
#14747: "Race condition can cause btr_search_drop_page_hash_index() to crash"
Note that buf_block_t::index should be protected by btr_search_latch
or an s-latch or x-latch on the index page.
btr_search_drop_page_hash_index(): Read block->index while holding
btr_search_latch and use the cached value in the loop. Remove some
redundant assertions.
#15108: "mysqld crashes when innodb_log_file_size is set > 4G"
#15308: "Problem of Order with Enum Column in Primary Key"
#15550: "mysqld crashes in printing a FOREIGN KEY error in InnoDB"
row_ins_foreign_report_add_err(): When printing the parent record,
use the index in the parent table rather than the index in the child table.
#15653: "Slow inserts to InnoDB if many thousands of .ibd files"
Keep track on unflushed modifications to file spaces. When there are tens
of thousands of file spaces, flushing all files in fil_flush_file_spaces()
would be very slow.
fil_flush_file_spaces(): Only flush unflushed file spaces.
fil_space_t, fil_system_t: Add a list of unflushed spaces.
#15991: "innodb-file-per-table + symlink database + rename = cr"
os_file_handle_error(): Map the error codes EXDEV, ENOTDIR, and EISDIR
to the new code OS_FILE_PATH_ERROR. Treat this code as OS_FILE_PATH_ERROR.
This fixes the crash on RENAME TABLE when the .ibd file is a symbolic link
to a different file system.
#16157: "InnoDB crashes when main location settings are empty"
This patch is from Heikki.
#16298: "InnoDB segfaults in INSERTs in upgrade of 4.0 -> 5.0 tables
with VARCHAR BINARY"
dict_load_columns(): Set the charset-collation code
DATA_MYSQL_BINARY_CHARSET_COLL for those binary string columns
that lack a charset-collation code, i.e., the tables were created
with an older version of MySQL/InnoDB than 4.1.2.
#16229: "MySQL/InnoDB uses full explicit table locks in trigger processing"
Take a InnoDB table lock only if user has explicitly requested a table
lock. Added some additional comments to store_lock() and external_lock().
#16387: "InnoDB crash when dropping a foreign key <table>_ibfk_0"
Do not mistake TABLENAME_ibfk_0 for auto-generated id.
dict_table_get_highest_foreign_id(): Ignore foreign constraint
identifiers starting with the pattern TABLENAME_ibfk_0.
#16582: "InnoDB: Error in an adaptive hash index pointer to page"
Account for a race condition when dropping the adaptive hash index
for a B-tree page.
btr_search_drop_page_hash_index(): Retry the operation if a hash index
with different parameters was built meanwhile. Add diagnostics for the
case that hash node pointers to the page remain.
btr_search_info_update_hash(), btr_search_info_update_slow():
Document the parameter "info" as in/out.
#16814: "SHOW INNODB STATUS format error in LATEST FOREIGN KEY ERROR
section"
Add a missing newline to the LAST FOREIGN KEY ERROR section in SHOW
INNODB STATUS output.
dict_foreign_error_report(): Always print a newline after invoking
dict_print_info_on_foreign_key_in_create_format().
#16827: "Better InnoDB error message if ibdata files omitted from my.cnf"
#17126: "CHECK TABLE on InnoDB causes a short hang during check of adaptive
hash"
CHECK TABLE blocking other queries, by releasing the btr_search_latch
periodically during the adaptive hash table validation.
#17405: "Valgrind: conditional jump or move depends on unititialised values"
buf_block_init(): Reset magic_n, buf_fix_count and io_fix to avoid
testing uninitialized variables.