InnoDB: Remove HAVE_purify, UNIV_INIT_MEM_TO_ZERO, UNIV_SET_MEM_TO_ZERO.
The compile-time setting HAVE_purify can mask potential bugs.
It is being set in PB2 Valgrind runs. We should simply get rid of it,
and replace it with UNIV_MEM_INVALID() to declare uninitialized memory
as such in Valgrind-instrumented binaries.
os_mem_alloc_large(), ut_malloc_low(): Remove the parameter set_to_zero.
ut_malloc(): Define as a macro that invokes ut_malloc_low().
buf_pool_init(): Never initialize the buffer pool frames. All pages
must be initialized before flushing them to disk.
mem_heap_alloc(): Never initialize the allocated memory block.
os_mem_alloc_nocache(), ut_test_malloc(): Unused function, remove.
rb:813 approved by Jimmy Yang
WITH LARGE BUFFER POOL
(Note: this a backport of revno:3472 from mysql-trunk)
rb://845
approved by: Marko
When dropping a table (with an .ibd file i.e.: with
innodb_file_per_table set) we scan entire LRU to invalidate pages from
that table. This can be painful in case of large buffer pools as we hold
the buf_pool->mutex for the scan. Note that gravity of the problem does
not depend on the size of the table. Even with an empty table but a
large and filled up buffer pool we'll end up scanning a very long LRU
list.
The fix is to scan flush_list and just remove the blocks belonging to
the table from the flush_list, marking them as non-dirty. The blocks
are left in the LRU list for eventual eviction due to aging. The
flush_list is typically much smaller than the LRU list but for cases
where it is very long we have the solution of releasing the
buf_pool->mutex after scanning 1K pages.
buf_page_[set|unset]_sticky(): Use new IO-state BUF_IO_PIN to ensure
that a block stays in the flush_list and LRU list when we release
buf_pool->mutex. Previously we have been abusing BUF_IO_READ to achieve
this.
The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.
Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().
btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.
btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.
btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).
btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).
btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.
btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.
btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.
fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.
fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.
fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".
mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.
row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.
row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.
There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.
rb:693 approved by Sunny Bains
Also addressed issues in bug #11745133, where we could mark a table
corrupted instead of crashing the server when found a corrupted buffer/page
if the table created with innodb_file_per_table on.
------------------------------------------------------------
revno 2876.244.305
revision id marko.makela@oracle.com-20110413082211-e6ouhjz5rmqxcqap
parent marko.makela@oracle.com-20110413075948-kvytmc37ye1nt7d9
committer Marko Mäkelä <marko.makela@oracle.com>
branch nick 5.6-innodb
timestamp Wed 2011-04-13 11:22:11 +0300
message:
Suppress the Bug #58815 (Bug #11765812) assertion failure.
buf_page_get_gen(): Introduce BUF_GET_POSSIBLY_FREED for suppressing the
check that the file page must not have been freed.
btr_estimate_n_rows_in_range_on_level(): Pass BUF_GET_POSSIBLY_FREED and
explain in the comments why this is needed and why it should be mostly
harmless to ignore the problem. If InnoDB had always initialized all
unused fields in data files, no problem would exist.
This change does not fix the bug, it just "shoots the messenger".
rb:647 approved by Jimmy Yang
Remove most references to thread id in InnoDB. Three references
remain: the current holder of a mutex, and the current x-lock holder
of a rw-lock, and some references in UNIV_SYNC_DEBUG checks. This
allows MySQL to change the thread associated to a client connection.
Tighten the UNIV_SYNC_DEBUG checks, trying to ensure that no InnoDB
mutex or x-lock is being held when returning control to MySQL. The
only semaphore that may be held is the btr_search_latch in shared mode.
sync_thread_levels_empty_except_dict(): A wrapper for
sync_thread_levels_empty_gen(TRUE).
sync_thread_levels_nonempty_trx(): Check that the current thread is
not holding any InnoDB semaphores, except btr_search_latch if
trx->has_search_latch.
sync_thread_levels_empty(): Unused function; remove.
trx_t: Remove mysql_thread_id and mysql_process_no.
srv_slot_t: Remove id and handle.
row_search_for_mysql(), srv_conc_enter_innodb(),
srv_conc_force_enter_innodb(), srv_conc_force_exit_innodb(),
srv_conc_exit_innodb(), srv_suspend_mysql_thread: Assert
!sync_thread_levels_nonempty_trx().
rb:634 approved by Sunny Bains
ibuf_inside(), ibuf_enter(), ibuf_exit(): Add the parameter mtr. The
flag is no longer kept in the thread-local storage but in the
mini-transaction (mtr->inside_ibuf).
mtr_start(): Clean up the comment and remove the unused return value.
mtr_commit(): Assert !ibuf_inside(mtr) in debug builds.
ibuf_mtr_start(): Like mtr_start(), but sets the flag.
ibuf_mtr_commit(), ibuf_btr_pcur_commit_specify_mtr(): Wrappers that
assert ibuf_inside().
buf_page_get_zip(), buf_page_init_for_read(),
buf_read_ibuf_merge_pages(), fil_io(), ibuf_free_excess_pages(),
ibuf_contract_ext(): Remove assertions on ibuf_inside(), because a
mini-transaction is not available.
buf_read_ahead_linear(): Add the parameter inside_ibuf.
ibuf_restore_pos(): When this function returns FALSE, it commits mtr
and must therefore do ibuf_exit(mtr).
ibuf_delete_rec(): This function commits mtr and must therefore do
ibuf_exit(mtr).
ibuf_rec_get_page_no(), ibuf_rec_get_space(), ibuf_rec_get_info(),
ibuf_rec_get_op_type(), ibuf_build_entry_from_ibuf_rec(),
ibuf_rec_get_volume(), ibuf_get_merge_page_nos(),
ibuf_get_volume_buffered_count(), ibuf_get_entry_counter_low(): Add
the parameter mtr in debug builds, for asserting ibuf_inside(mtr).
rb:585 approved by Sunny Bains
ibuf_page(): Renamed to ibuf_page_low(). Add the parameters file, line
so that the latch diagnostics will be more meaningful.
In debug builds, add the parameter ibool x_latch. When x_latch=FALSE,
do not x-latch the page, but only buffer-fix it for reading the bit.
In UNIV_SYNC_DEBUG, display a message if an insert buffer bitmap page
was already latched. (The message should be displayed in those cases
where the code would have previously failed.)
ibuf_page(): A wrapper macro for ibuf_page_low(). Pass x_latch=TRUE.
ibuf_bitmap_page_get_bits(): Renamed to ibuf_bitmap_page_get_bits_low().
In UNIV_DEBUG, add the parameter latch_mode.
Remove the parameter mtr unless UNIV_DEBUG is defined.
ibuf_bitmap_page_get_bits(): A wrapper macro for
ibuf_bitmap_page_get_bits_low(). Pass latch_type=MTR_MEMO_PAGE_X_FIX.
buf_page_get_gen(): Use ibuf_page_low(x_latch=FALSE) in the debug assertion.
This avoids the possible deadlock.
Do not disable InnoDB inlining when UNIV_DEBUG is defined. The
inlining is now solely controlled by the preprocessor symbol
UNIV_MUST_NOT_INLINE and by any compiler options.
mtr_memo_contains(): Add an explicit type conversion from void*, so
that the function can be compiled by a C++ compiler. Previously, this
function was never seen by the C++ compiler, because it is only
present in UNIV_DEBUG builds and InnoDB inlining used to be disabled.
buf_flush_validate_skip(): A wrapper that skips most calls of
buf_flush_validate_low(). Invoked by debug assertions in
buf_flush_insert_into_flush_list() and buf_flush_remove().
fil_validate_skip(): A wrapper that skips most calls of
fil_validate(). Invoked by debug assertions in fil_io() and fil_io_wait().
os_aio_validate_skip(): A wrapper that skips most calls of
os_aio_validate(). Invoked by debug assertions in
os_aio_func(), os_aio_windows_handle() and os_aio_simulated_handle.
os_get_os_version(): Only include this function if __WIN__ is defined.
sync_array_deadlock_step(): Slight optimizations. This function is a
major CPU consumer in UNIV_SYNC_DEBUG builds.
buf_page_t: Remove the buf_pool pointer. Add buf_pool_index:6 next to
buf_fix_count:19 (it was buf_fix_count:25). This will limit the number
of concurrent transactions to somewhere around 524,288.
buf_pool_index(buf_pool): A new function to determine the index of a
buffer pool in buf_pool_ptr[].
buf_pool_ptr: Make this a dynamically allocated array instead of an
array of pointers. Allocate the array in buf_pool_init() and free in
buf_pool_free().
buf_pool_free_instance(): No longer free the buf_pool object, as it is
allocated from a big array.
buf_pool_from_bpage(), buf_pool_from_block(): Move the definitions to
the beginning of the files, because some compilers have (had) problems
with forward definitions of inline functions. Calculate the buffer
pool from buf_pool_index.
buf_pool_from_array(): Add debug assertions for input validation.
Rename buf_pool_watch, buf_pool_mutex, buf_pool_zip_mutex
to buf_pool->watch, buf_pool->mutex, buf_pool->zip_mutex
in comments. Refer to buf_pool->flush_list_mutex instead of
flush_list_mutex.
Remove obsolete declarations of buf_pool_mutex and buf_pool_zip_mutex.
Additional fixes in 5.5:
ibuf_set_del_mark(): Add diagnostics when setting a buffered delete-mark fails.
ibuf_delete(): Correct a misleading comment about non-found records.
rec_print(): Add a const qualifier to the index parameter.
Bug #56680 wrong InnoDB results from a case-insensitive covering index
row_search_for_mysql(): When a secondary index record might not be
visible in the current transaction's read view and we consult the
clustered index and optionally some undo log records, return the
relevant columns of the clustered index record to MySQL instead of the
secondary index record.
ibuf_insert_to_index_page_low(): New function, refactored from
ibuf_insert_to_index_page().
ibuf_insert_to_index_page(): When we are inserting a record in place
of a delete-marked record and some fields of the record differ, update
that record just like row_ins_sec_index_entry_by_modify() would do.
btr_cur_update_alloc_zip(): Make the function public.
mysql_row_templ_t: Add clust_rec_field_no.
row_sel_store_mysql_rec(), row_sel_push_cache_row_for_mysql(): Add the
flag rec_clust, for returning data at clust_rec_field_no instead of
rec_field_no. Resurrect the debug assertion that the record not be
marked for deletion. (Bug #55626)
[UNIV_DEBUG || UNIV_IBUF_DEBUG] ibuf_debug, buf_page_get_gen(),
buf_flush_page_try():
Implement innodb_change_buffering_debug=1 for evicting pages from the
buffer pool, so that change buffering will be attempted more
frequently.
row_search_for_mysql(): When a secondary index record might not be
visible in the current transaction's read view and we consult the
clustered index and optionally some undo log records, return the
relevant columns of the clustered index record to MySQL instead of the
secondary index record.
REC_INFO_DELETED_FLAG: Move the definition from rem0rec.ic to rem0rec.h.
ibuf_insert_to_index_page_low(): New function, refactored from
ibuf_insert_to_index_page().
ibuf_insert_to_index_page(): When we are inserting a record in place
of a delete-marked record and some fields of the record differ, update
that record just like row_ins_sec_index_entry_by_modify() would do.
mysql_row_templ_t: Add clust_rec_field_no.
row_sel_store_mysql_rec(), row_sel_push_cache_row_for_mysql(): Add the
flag rec_clust, for returning data at clust_rec_field_no instead of
rec_field_no. Resurrect the debug assertion that the record not be
marked for deletion. (Bug #55626)
buf_LRU_free_block(): Refactored from
buf_LRU_search_and_free_block(). This is needed for the
innodb_change_buffering_debug diagnostics.
[UNIV_DEBUG || UNIV_IBUF_DEBUG] ibuf_debug, buf_page_get_gen(),
buf_flush_page_try():
Implement innodb_change_buffering_debug=1 for evicting pages from the
buffer pool, so that change buffering will be attempted more
frequently.
Fix compiler warning:
buf/buf0flu.c: In function 'buf_flush_batch':
buf/buf0flu.c:844:9: error: variable 'old_page_count' set but not used [-Werror=unused-but-set-variable]
pages that it wants to flush then we should honor that value as in
not going beyond that in our eagerness to flush the neighbors of
the selected victim.
In order to allow thread schedulers to be dynamically loaded,
it is necessary to make the following changes to the server:
- Two new service interfaces
- Modifications to InnoDB to inform the thread scheduler of state changes.
- Changes to the VIO subsystem for checking if data is available on a socket.
- Elimination of remains of the old thread pool implementation.
The two new service interfaces introduces are:
my_thread_scheduler
A service interface to register a thread
scheduler.
thd_wait
A service interface to inform thread scheduler
that the thread is about to start waiting.
In addition, the patch adds code that:
- Add a call to thd_wait for table locks in mysys
thd_lock.c by introducing a set function that
can be used to set a callback to be used when
waiting on a lock and resuming from waiting.
- Calling the mysys set function from the server
to set the callbacks correctly.