IN OVERFLOW
Do not assign the result of the difference to a signed variable and
checking whether it is negative afterwards because this limits the max diff
to 2G on 32 bit systems. E.g. "signed = 3.5G - 1G" would be negative and the
code would assume that 3.5G < 1G. Instead compare the two variables directly
and assign to unsigned only if we know that the result of the subtraction
will be positive.
Discussed with: Jimmy and Sunny (via IRC)
Bug #11766501: Multiple RBS break the get rseg with mininum trx_t::no code during purge
Bug# 59291 changes:
Main problem is that truncating the UNDO log at the completion of every
trx_purge() call is expensive as the number of rollback segments is increased.
We truncate after a configurable amount of pages. The innodb_purge_batch_size
parameter is used to control when InnoDB does the actual truncate. The truncate
is done once after 128 (or TRX_SYS_N_RSEGS iterations). In other words we
truncate after purge 128 * innodb_purge_batch_size. The smaller the batch
size the quicker we truncate.
Introduce a new parameter that allows how many rollback segments to use for
storing REDO information. This is really step 1 in allowing complete control
to the user over rollback space management.
New parameters:
i) innodb_rollback_segments = number of rollback_segments to use
(default is now 128) dynamic parameter, can be changed anytime.
Currently there is little benefit in changing it from the default.
Optimisations in the patch.
i. Change the O(n) behaviour of trx_rseg_get_on_id() to O(log n)
Backported from 5.6. Refactor some of the binary heap code.
Create a new include/ut0bh.ic file.
ii. Avoid truncating the rollback segments after every purge.
Related changes that were moved to a separate patch:
i. Purge should not do any flushing, only wait for space to be free so that
it only does purging of records unless it is held up by a long running
transaction that is preventing it from progressing.
ii. Give the purge thread preference over transactions when acquiring the
rseg->mutex during commit. This to avoid purge blocking unnecessarily
when getting the next rollback segment to purge.
Bug #11766501 changes:
Add the rseg to the min binary heap under the cover of the kernel mutex and
the binary heap mutex. This ensures the ordering of the min binary heap.
The two changes have to be committed together because they share the same
that fixes both issues.
rb://567 Approved by: Inaam Rana.
rw_lock_create_func(): Initialize lock->writer_thread, so that Valgrind
will not complain even when Valgrind instrumentation is not enabled.
Flag lock->writer_thread uninitialized, so that Valgrind can complain
when it is used uninitialized.
rw_lock_set_writer_id_and_recursion_flag(): Revert the bogus Valgrind
instrumentation that was pushed in the first attempt to fix this bug.
by silencing a bogus Valgrind warning:
==4392== Conditional jump or move depends on uninitialised value(s)
==4392== at 0x5A18416: rw_lock_set_writer_id_and_recursion_flag (sync0rw.ic:283)
==4392== by 0x5A1865C: rw_lock_x_lock_low (sync0rw.c:558)
==4392== by 0x5A18481: rw_lock_x_lock_func (sync0rw.c:617)
==4392== by 0x597EEE6: mtr_x_lock_func (mtr0mtr.ic:271)
==4392== by 0x597EBBD: fsp_header_init (fsp0fsp.c:970)
==4392== by 0x5A15E78: innobase_start_or_create_for_mysql (srv0start.c:1508)
==4392== by 0x598B789: innobase_init(void*) (ha_innodb.cc:2282)
os_compare_and_swap_thread_id() is defined as
__sync_bool_compare_and_swap(). From the GCC doc:
`bool __sync_bool_compare_and_swap (TYPE *ptr, TYPE oldval TYPE newval, ...)'
...
The "bool" version returns true if the comparison is successful and
NEWVAL was written.
So it is not possible that the return value is uninitialized, no matter what
the arguments to os_compare_and_swap_thread_id() are. Probably Valgrind gets
confused by the implementation of the GCC internal function
__sync_bool_compare_and_swap().
This option is known to be broken when tablespaces contain off-page
columns after crash recovery. It has only been tested when creating
the data files from the scratch.
btr_blob_dbg_t: A map from page_no:heap_no:field_no to first_blob_page_no.
This map is instantiated for every clustered index in index->blobs.
It is protected by index->blobs_mutex.
btr_blob_dbg_msg_issue(): Issue a diagnostic message.
Invoked when btr_blob_dbg_msg is set.
btr_blob_dbg_rbt_insert(): Insert a btr_blob_dbg_t into index->blobs.
btr_blob_dbg_rbt_delete(): Remove a btr_blob_dbg_t from index->blobs.
btr_blob_dbg_cmp(): Comparator for btr_blob_dbg_t.
btr_blob_dbg_add_blob(): Add a BLOB reference to the map.
btr_blob_dbg_add_rec(): Add all BLOB references from a record to the map.
btr_blob_dbg_print(): Display the map of BLOB references in an index.
btr_blob_dbg_remove_rec(): Remove all BLOB references of a record from
the map.
btr_blob_dbg_is_empty(): Check that no BLOB references exist to or
from a page. Disowned references from delete-marked records are
tolerated.
btr_blob_dbg_op(): Perform an operation on all BLOB references on a
B-tree page.
btr_blob_dbg_add(): Add all BLOB references from a B-tree page to the
map.
btr_blob_dbg_remove(): Remove all BLOB references from a B-tree page
from the map.
btr_blob_dbg_restore(): Restore the BLOB references after a failed
page reorganize.
btr_blob_dbg_set_deleted_flag(): Modify the 'deleted' flag in the BLOB
references of a record.
btr_blob_dbg_owner(): Own or disown a BLOB reference.
btr_page_create(), btr_page_free_low(): Assert that no BLOB references exist.
btr_create(): Create index->blobs for clustered indexes.
btr_page_reorganize_low(): Invoke btr_blob_dbg_remove() before copying
the records. Invoke btr_blob_dbg_restore() if the operation fails.
btr_page_empty(), btr_lift_page_up(), btr_compress(), btr_discard_page():
Invoke btr_blob_dbg_remove().
btr_cur_del_mark_set_clust_rec(): Invoke btr_blob_dbg_set_deleted_flag().
Other cases of modifying the delete mark are either in the secondary
index or during crash recovery, which we do not promise to support.
btr_cur_set_ownership_of_extern_field(): Invoke btr_blob_dbg_owner().
btr_store_big_rec_extern_fields(): Invoke btr_blob_dbg_add_blob().
btr_free_externally_stored_field(): Invoke btr_blob_dbg_assert_empty()
on the first BLOB page.
page_cur_insert_rec_low(), page_cur_insert_rec_zip(),
page_copy_rec_list_end_to_created_page(): Invoke btr_blob_dbg_add_rec().
page_cur_insert_rec_zip_reorg(), page_copy_rec_list_end(),
page_copy_rec_list_start(): After failure, invoke
btr_blob_dbg_remove() and btr_blob_dbg_add().
page_cur_delete_rec(): Invoke btr_blob_dbg_remove_rec().
page_delete_rec_list_end(): Invoke btr_blob_dbg_op(btr_blob_dbg_remove_rec).
page_zip_reorganize(): Invoke btr_blob_dbg_remove() before copying the records.
page_zip_copy_recs(): Invoke btr_blob_dbg_add().
row_upd_rec_in_place(): Invoke btr_blob_dbg_rbt_delete() and
btr_blob_dbg_rbt_insert().
innobase_start_or_create_for_mysql(): Warn when UNIV_BLOB_DEBUG is enabled.
rb://550 approved by Jimmy Yang
rb://566
approved by: Sunny
When using native aio on linux each IO helper thread should be able to
handle upto 256 IO requests. The number 256 is the same which is used
for simulated aio as well. In case of windows where we also use native
aio this limit is 32 because of OS constraints. It seems that we are
using the limit of 32 for all the platforms where we are using native
aio. The fix is to use 256 on all platforms except windows (when native
aio is enabled on windows)
The bug would cause a crash of InnoDB if a non-standard or unknown table
flags existed in a SYS_TABLES record. This is important because the next
file version, Cheetah, will identify itself by expanding this field. So
unless this is fixed, an older engine that tries to open a table in a
tablespace with a newer file version will crash instead of report an error
and refuse to open the table, as it should do.
Reviewed at RB://583. Approved by Marko.
btr_rec_get_field_ref_offs(), btr_rec_get_field_ref(): New functions.
Get the pointer to an externally stored field.
btr_cur_set_ownership_of_extern_field(): Assert that the BLOB has not
already been disowned.
btr_store_big_rec_extern_fields(): Rename to
btr_store_big_rec_extern_fields_func() and add the debug parameter
update_in_place. All pointers to externally stored columns in the
record must either be zero or they must be pointers to inherited
columns, owned by this record or an earlier record version. For any
BLOB that is stored, the BLOB pointer must previously have been
zero. When the function completes, all BLOB pointers must be nonzero
and owned by the record.
rb://549 approved by Jimmy Yang
row_purge(): Change the return type to void. (The return value always
was DB_SUCCESS.) Remove some local variables.
row_undo_mod_remove_clust_low(): Remove some local variables.
rb://547 approved by Jimmy Yang