and compressed tables
buf_LRU_drop_page_hash_for_tablespace(): after releasing and
reacquiring the buffer pool mutex, do not dereference any block
descriptor pointer that is not known to be a pointer to an
uncompressed page frame (type buf_block_t; state ==
BUF_BLOCK_FILE_PAGE). Also, defer the acquisition of the block_mutex
until it is needed.
buf_page_get_gen(): Add mode == BUF_GET_IF_IN_POOL_PEEK for
buffer-fixing a block without making it young in the LRU list.
buf_page_get_gen(), buf_page_init(), buf_LRU_block_remove_hashed_page():
Set bpage->state = BUF_BLOCK_ZIP_FREE before buf_buddy_free(bpage),
so that similar race conditions might be detected a little easier.
btr_search_drop_page_hash_when_freed(): Use BUF_GET_IF_IN_POOL_PEEK
when dropping the hash indexes.
rb://528 approved by Jimmy Yang
IN OVERFLOW
Do not assign the result of the difference to a signed variable and
checking whether it is negative afterwards because this limits the max diff
to 2G on 32 bit systems. E.g. "signed = 3.5G - 1G" would be negative and the
code would assume that 3.5G < 1G. Instead compare the two variables directly
and assign to unsigned only if we know that the result of the subtraction
will be positive.
Discussed with: Jimmy and Sunny (via IRC)
Bug #11766501: Multiple RBS break the get rseg with mininum trx_t::no code during purge
Bug# 59291 changes:
Main problem is that truncating the UNDO log at the completion of every
trx_purge() call is expensive as the number of rollback segments is increased.
We truncate after a configurable amount of pages. The innodb_purge_batch_size
parameter is used to control when InnoDB does the actual truncate. The truncate
is done once after 128 (or TRX_SYS_N_RSEGS iterations). In other words we
truncate after purge 128 * innodb_purge_batch_size. The smaller the batch
size the quicker we truncate.
Introduce a new parameter that allows how many rollback segments to use for
storing REDO information. This is really step 1 in allowing complete control
to the user over rollback space management.
New parameters:
i) innodb_rollback_segments = number of rollback_segments to use
(default is now 128) dynamic parameter, can be changed anytime.
Currently there is little benefit in changing it from the default.
Optimisations in the patch.
i. Change the O(n) behaviour of trx_rseg_get_on_id() to O(log n)
Backported from 5.6. Refactor some of the binary heap code.
Create a new include/ut0bh.ic file.
ii. Avoid truncating the rollback segments after every purge.
Related changes that were moved to a separate patch:
i. Purge should not do any flushing, only wait for space to be free so that
it only does purging of records unless it is held up by a long running
transaction that is preventing it from progressing.
ii. Give the purge thread preference over transactions when acquiring the
rseg->mutex during commit. This to avoid purge blocking unnecessarily
when getting the next rollback segment to purge.
Bug #11766501 changes:
Add the rseg to the min binary heap under the cover of the kernel mutex and
the binary heap mutex. This ensures the ordering of the min binary heap.
The two changes have to be committed together because they share the same
that fixes both issues.
rb://567 Approved by: Inaam Rana.
rw_lock_create_func(): Initialize lock->writer_thread, so that Valgrind
will not complain even when Valgrind instrumentation is not enabled.
Flag lock->writer_thread uninitialized, so that Valgrind can complain
when it is used uninitialized.
rw_lock_set_writer_id_and_recursion_flag(): Revert the bogus Valgrind
instrumentation that was pushed in the first attempt to fix this bug.
by silencing a bogus Valgrind warning:
==4392== Conditional jump or move depends on uninitialised value(s)
==4392== at 0x5A18416: rw_lock_set_writer_id_and_recursion_flag (sync0rw.ic:283)
==4392== by 0x5A1865C: rw_lock_x_lock_low (sync0rw.c:558)
==4392== by 0x5A18481: rw_lock_x_lock_func (sync0rw.c:617)
==4392== by 0x597EEE6: mtr_x_lock_func (mtr0mtr.ic:271)
==4392== by 0x597EBBD: fsp_header_init (fsp0fsp.c:970)
==4392== by 0x5A15E78: innobase_start_or_create_for_mysql (srv0start.c:1508)
==4392== by 0x598B789: innobase_init(void*) (ha_innodb.cc:2282)
os_compare_and_swap_thread_id() is defined as
__sync_bool_compare_and_swap(). From the GCC doc:
`bool __sync_bool_compare_and_swap (TYPE *ptr, TYPE oldval TYPE newval, ...)'
...
The "bool" version returns true if the comparison is successful and
NEWVAL was written.
So it is not possible that the return value is uninitialized, no matter what
the arguments to os_compare_and_swap_thread_id() are. Probably Valgrind gets
confused by the implementation of the GCC internal function
__sync_bool_compare_and_swap().
This option is known to be broken when tablespaces contain off-page
columns after crash recovery. It has only been tested when creating
the data files from the scratch.
btr_blob_dbg_t: A map from page_no:heap_no:field_no to first_blob_page_no.
This map is instantiated for every clustered index in index->blobs.
It is protected by index->blobs_mutex.
btr_blob_dbg_msg_issue(): Issue a diagnostic message.
Invoked when btr_blob_dbg_msg is set.
btr_blob_dbg_rbt_insert(): Insert a btr_blob_dbg_t into index->blobs.
btr_blob_dbg_rbt_delete(): Remove a btr_blob_dbg_t from index->blobs.
btr_blob_dbg_cmp(): Comparator for btr_blob_dbg_t.
btr_blob_dbg_add_blob(): Add a BLOB reference to the map.
btr_blob_dbg_add_rec(): Add all BLOB references from a record to the map.
btr_blob_dbg_print(): Display the map of BLOB references in an index.
btr_blob_dbg_remove_rec(): Remove all BLOB references of a record from
the map.
btr_blob_dbg_is_empty(): Check that no BLOB references exist to or
from a page. Disowned references from delete-marked records are
tolerated.
btr_blob_dbg_op(): Perform an operation on all BLOB references on a
B-tree page.
btr_blob_dbg_add(): Add all BLOB references from a B-tree page to the
map.
btr_blob_dbg_remove(): Remove all BLOB references from a B-tree page
from the map.
btr_blob_dbg_restore(): Restore the BLOB references after a failed
page reorganize.
btr_blob_dbg_set_deleted_flag(): Modify the 'deleted' flag in the BLOB
references of a record.
btr_blob_dbg_owner(): Own or disown a BLOB reference.
btr_page_create(), btr_page_free_low(): Assert that no BLOB references exist.
btr_create(): Create index->blobs for clustered indexes.
btr_page_reorganize_low(): Invoke btr_blob_dbg_remove() before copying
the records. Invoke btr_blob_dbg_restore() if the operation fails.
btr_page_empty(), btr_lift_page_up(), btr_compress(), btr_discard_page():
Invoke btr_blob_dbg_remove().
btr_cur_del_mark_set_clust_rec(): Invoke btr_blob_dbg_set_deleted_flag().
Other cases of modifying the delete mark are either in the secondary
index or during crash recovery, which we do not promise to support.
btr_cur_set_ownership_of_extern_field(): Invoke btr_blob_dbg_owner().
btr_store_big_rec_extern_fields(): Invoke btr_blob_dbg_add_blob().
btr_free_externally_stored_field(): Invoke btr_blob_dbg_assert_empty()
on the first BLOB page.
page_cur_insert_rec_low(), page_cur_insert_rec_zip(),
page_copy_rec_list_end_to_created_page(): Invoke btr_blob_dbg_add_rec().
page_cur_insert_rec_zip_reorg(), page_copy_rec_list_end(),
page_copy_rec_list_start(): After failure, invoke
btr_blob_dbg_remove() and btr_blob_dbg_add().
page_cur_delete_rec(): Invoke btr_blob_dbg_remove_rec().
page_delete_rec_list_end(): Invoke btr_blob_dbg_op(btr_blob_dbg_remove_rec).
page_zip_reorganize(): Invoke btr_blob_dbg_remove() before copying the records.
page_zip_copy_recs(): Invoke btr_blob_dbg_add().
row_upd_rec_in_place(): Invoke btr_blob_dbg_rbt_delete() and
btr_blob_dbg_rbt_insert().
innobase_start_or_create_for_mysql(): Warn when UNIV_BLOB_DEBUG is enabled.
rb://550 approved by Jimmy Yang
rb://566
approved by: Sunny
When using native aio on linux each IO helper thread should be able to
handle upto 256 IO requests. The number 256 is the same which is used
for simulated aio as well. In case of windows where we also use native
aio this limit is 32 because of OS constraints. It seems that we are
using the limit of 32 for all the platforms where we are using native
aio. The fix is to use 256 on all platforms except windows (when native
aio is enabled on windows)