This patch changes the innodb mutexes and rw_locks implementation.
On supported platforms it uses GCC builtin atomics. These changes
are based on the patch sent by Mark Callaghan of Google under BSD
license. More technical discussion can be found at rb://30
Approved by: Heikki
btr_search_disabled: Rename to btr_search_enabled and change the type
to char, so that it can be directly linked to the MySQL parameters.
Note that the variable is protected by btr_search_latch and
btr_search_enabled_mutex, a new mutex introduced in this patch.
btr_search_enabled_mutex: A new mutex, to protect btr_search_enabled
together with btr_search_latch.
buf_pool_drop_hash_index(): New function, to be called from
btr_search_disable().
btr_search_disable(), btr_search_enable(): Fix bugs. These functions
were previously unused.
btr_search_guess_on_hash(), btr_search_build_page_hash_index():
Check btr_search_enabled once more, while holding btr_search_latch.
btr_cur_search_to_nth_level(): Note that the reads of btr_search_enabled
may be dirty and explain why it should not be a problem.
innobase_adaptive_hash_index: Remove. The variable btr_search_enabled will be used directly instead.
innodb_adaptive_hash_index_update(): New function, an update callback for
innodb_adaptive_hash_index. This will call either btr_search_disable()
or btr_search_enable() when the value is assigned. The functions will
be called even if the value does not appear to be changed, e.g., when
setting from TRUE to TRUE or FALSE to FALSE.
rb://85 approved by Heikki Tuuri. This addresses Issue #163.
All InnoDB related tests passed on Windows, except
known failure in partition_innodb_semi_consistent.
The inadvertent change to btr0sea.c in this commit is reverted in r4060.
------------------------------------------------------------------------
r4035 | vasil | 2009-01-26 09:26:25 -0600 (Mon, 26 Jan 2009) | 23 lines
branches/5.1:
Merge a change from MySQL:
------------------------------------------------------------
revno: 2646.161.4
committer: Tatiana A. Nurnberg <azundris@mysql.com>
branch nick: 51-31177v2
timestamp: Mon 2009-01-12 06:32:49 +0100
message:
Bug#31177: Server variables can't be set to their current values
Bounds-checks and blocksize corrections were applied to user-input,
but constants in the server were trusted implicitly. If these values
did not actually meet the requirements, the user could not set change
a variable, then set it back to the (wonky) factory default or maximum
by explicitly specifying it (SET <var>=<value> vs SET <var>=DEFAULT).
Now checks also apply to the server's presets. Wonky values and maxima
get corrected at startup. Consequently all non-offsetted values the user
sees are valid, and users can set the variable to that exact value if
they so desire.
introduced in r4036.
Do not call buf_block_get_space(), buf_block_get_page_no()
unless the block state is BUF_BLOCK_FILE_PAGE.
This bug was reported by Michael.
assertion failure that was accidentally introduced in r4036.
Instead of calling buf_block_get_frame(), which asserts that the
block must be buffer-fixed, access block->frame directly. That
is safe, because changes of block->page.state are protected by
the buffer pool mutex, which we are holding.
This bug was reported by Michael.
within UNIV_DEBUG. The two remaining callers in non-debug builds,
btr_search_guess_on_hash() and btr_search_validate(), were rewritten
to call buf_page_hash_get().
To implement support for a resizeable buffer pool, the function
buf_block_align() had been rewritten to perform a page hash lookup in
the buffer pool. The caller was also made responsible for holding the
buffer pool mutex.
Because the page hash lookup is expensive and it has to be done while
holding the buffer pool mutex, implement buf_block_align() by pointer
arithmetics again, and make btr_search_guess_on_hash() call it. Note
that this will have to be adjusted if the interface to the resizeable
buffer pool is actually implemented.
rb://83 approved by Heikki Tuuri, to address Issue #161.
As a deviation from the approved patch, this patch also makes
btr_search_validate() (invoked by CHECK TABLE) check that
buf_pool->page_hash is consistent with buf_block_align().
------------------------------------------------------------------------
r4032 | marko | 2009-01-23 15:43:51 +0200 (Fri, 23 Jan 2009) | 10 lines
branches/5.1: Merge r4031 from branches/5.0:
btr_search_drop_page_hash_when_freed(): Check if buf_page_get_gen()
returns NULL. The page may have been evicted from the buffer pool
between buf_page_peek_if_search_hashed() and buf_page_get_gen(),
because the buffer pool mutex will be released between these two calls.
(Bug #42279, Issue #160)
rb://82 approved by Heikki Tuuri
------------------------------------------------------------------------
from or to 0, compress the page at the same time. This is necessary,
because the column information stored on the compressed page will
differ between leaf and non-leaf pages. Leaf pages are identified by
PAGE_LEVEL=0. This bug was reported as Issue #150.
Document the similarity between btr_page_create() and
btr_page_empty(). Make the function signature of btr_page_empty()
identical with btr_page_create(). (This will add the parameter "level".)
btr_root_raise_and_insert(): Replace some code with a call to
btr_page_empty().
btr_attach_half_pages(): Assert that the page level has already been
set on both block and new_block. Do not set it again.
btr_discard_only_page_on_level(): Document that this function is
probably never called. Make it work on any height tree. (Tested on
2-high tree by disabling btr_lift_page_up().)
rb://68
when inserting or deleting from the insert buffer B-tree. Assert that
records in the insert buffer B-tree are never updated. This could cure
Issue #135.
btr_cur_optimistic_insert(): Do not update the insert buffer bitmap
when inserting to the insert buffer tree.
btr_cur_optimistic_delete(): Do not update the insert buffer bitmap
when deleting from the insert buffer tree. This could be the cause
of the assertion failure that was reported in Issue #135.
btr_cur_update_alloc_zip(): Assert that the index is not the insert
buffer. The insert buffer will never be stored in compressed format.
btr_cur_update_in_place(), btr_cur_optimistic_update(),
btr_cur_pessimistic_update(): Assert that these functions are never
invoked on the insert buffer tree. The insert buffer only supports
the insertion and deletion of records.
Make the variables rw_latch and buf_mode local in the for loop.
Initialize them at the beginning of each for loop round to reduce
register spilling on register-starved platforms such as the x86. Move
the assignment of rw_latch and buf_mode from the end of the loop to
the beginning of the loop. These parameters will only be needed in
the buf_page_get_gen() call at the start of the loop.
Remove the second (redundant) call to ibuf_should_try().
ibuf_should_try(): Now that the successful calls to this function will
be halved, halve the magic constant that ibuf_flush_count will be
compared to, accordingly.
rb://61 approved by Heikki over IM.
ibuf_get_volume_buffered(): Declare with static linkage.
This function is private to ibuf0ibuf.c.
btr_cur_pessimistic_delete(): Use the cached result of
btr_cur_get_index(cursor).
column of a compressed table, the BTR_EXTERN_LEN field in the BLOB pointer
will be written as 0. Tolerate this in the functions that deal with
externally stored columns. This fixes Issue #80 and was posted at rb://26.
Note that the clustered index record is always deleted or purged last,
after any secondary index records referring to it have been deleted.
btr_free_externally_stored_field(): On an uncompressed table, zero out
the BTR_EXTERN_LEN, so that half-deleted BLOBs can be detected after
crash recovery.
btr_copy_externally_stored_field_prefix(): Return 0 if the BLOB has been
half-deleted.
row_upd_ext_fetch(): Assert that the externally stored column exists.
row_ext_cache_fill(): Allow btr_copy_externally_stored_field_prefix()
to return 0.
row_sel_sec_rec_is_for_blob(): Return FALSE if the BLOB has been half-deleted.
This is correct, because the clustered index record would have been deleted
or purged last, after any secondary index records referring to it had been
deleted.
and the adaptive hash index. This should fix Issue #95 and Issue #87.
page_zip_copy_recs(): Copy PAGE_MAX_TRX_ID as well, to have similar behavior
to page_copy_rec_list_start() and page_copy_rec_list_end().
btr_root_raise_and_insert(), btr_page_split_and_insert(), btr_lift_page_up():
Update the lock table and the adaptive hash index.
Change the definition of add_on from ulint to ullint, to eliminate
the warning in .\btr\btr0cur.c:
conversion from 'ullint' to 'ulint', possible loss of data
Approved by: Heikki (on IM)
btr_lift_page_up(): Invoke page_zip_validate() on every page whose level
is adjusted.
btr_compress(): Invoke page_zip_validate() on merge_page at the end.
page_zip_copy_recs(): Relax the page_zip_validate(...) to
page_zip_validate_low(..., sloppy = TRUE) to avoid bogus assertion failures.
page_copy_rec_list_end(), page_delete_rec_list_start():
Invoke page_zip_validate_low(sloppy = TRUE).
in r2631. Include the node pointer field in the size calculation.
rec_get_converted_size_comp_prefix(): New function, to compute the storage
size of the prefix of an ordinary record in COMPACT format.
rec_get_converted_size_comp(): Use rec_get_converted_size_comp_prefix().
buf_block_dbg_add_level(block, level): Define as an empty macro when
UNIV_SYNC_DEBUG is not defined. Remove #ifdef UNIV_SYNC_DEBUG around
all invocations.
there will always be enough space for two node pointer records in an
empty B-tree page. This was reported as Mantis issue #73.
page_zip_rec_needs_ext(): Add the parameter n_fields, for accurate
estimation of the compressed size of the data dictionary information.
Given that this function is only invoked for records on leaf pages,
require that there be enough space for one record in the compressed
page. We check elsewhere that there will be enough room for two node
pointer records on higher-level pages.
btr_cur_optimistic_insert(): Ensure that there will be enough room for
two node pointer records on an empty non-leaf page. The rule for
leaf-page records will be enforced by the callers of
page_zip_rec_needs_ext().
btr_cur_pessimistic_insert(): Remove the insufficient check that the
leaf page record should be compressible by itself. Instead, now we
require that two node pointer records fit on a non-leaf page, and one
record will fit in uncompressed form on the leaf page.
page_zip_write_header(), page_zip_write_rec(): Re-enable the debug
assertions that were violated by the insufficient check in
btr_cur_pessimistic_insert().
innodb_bug36172.test: Use a larger compressed page size.
btr_search_drop_page_hash_index(): Add const qualifiers to the local
variables page, rec, and index, to ensure that they are not modified
by this function.
page_get_infimum_offset(), page_get_supremum_offset(): New functions.
page_get_infimum_rec(), page_get_supremum_rec(): Replaced by
const-preserving macros that invoke the accessor functions.
help in tracking down issue #63 (memory corruption). UNIV_BTR_DEBUG
is currently enabled in univ.i.
btr_root_fseg_validate(): New function, for validating a file segment
header on a B-tree root page.
btr_root_block_get(), btr_free_but_not_root(),
btr_root_raise_and_insert(), btr_discard_only_page_on_level():
Check PAGE_BTR_SEG_LEAF and PAGE_BTR_SEG_TOP on the root page with
btr_root_fseg_validate().
btr_root_raise_and_insert(): Move the assertion
dict_index_get_page(index) == page_get_page_no(root)
inside UNIV_BTR_DEBUG. It was previously enabled by UNIV_DEBUG.
btr_free_root(): Check PAGE_BTR_SEG_TOP on the root page with
btr_root_fseg_validate().
Limit the number of the pages that are sampled so it is never greater
than the total number of pages in the index.
The parameter that specifies the number of pages to test is global for
all tables. By limiting it this way we allow the user to set it "high"
to suit "large" tables and to avoid unnecessary work for "small" tables
(e.g. doing 100 dives in a table that has 5 pages, obviously testing
some pages more than once).
Suggested by: Ken
Approved by: Marko
The cardinality of every index (the number of different key values) is
calculated when the table is opened, at SHOW TABLE STATUS,
ANALYZE TABLE and on other circumstances (like when the table has
changed too much). Note that if the mysql client is running with the
auto-rehash setting turned on (default) this causes all tables to be
opened when it starts.
Previously InnoDB sampled 8 random pages from the index to get an
estimate of the cardinality. Now the number of sampled pages can be
changed via the global parameter innodb_stats_sample_pages which can
be tuned at runtime. The default value for this parameter is 8.
If the value of this parameter is changed, there may be serious problems:
- small values (say, 1) can cause an error in table stats;
- values much larger than 8 (say, 100), can cause a big slowdown in
table opening time, SHOW TABLE status, etc.
- query plans may be different from the old ones.
Approved by: Heikki
recovery, tolerate clustered index records whose externally stored
columns have not been written. This should remove the assertion failures
that were reported as Mantis issue#58, issue#62, issue#64.
trx_is_recv(): New function: TRUE if this transaction is rolling back
an incomplete transaction in crash recovery.
enum trx_rbmode: Rollback modes: no rollback, normal rollback, crash recovery.
btr_cur_pessimistic_delete(), btr_free_externally_stored_field(),
btr_rec_free_externally_stored_fields():
Replace the ibool parameter with enum trx_rbmode.
btr_free_externally_stored_field(): If field_ref is zero, return
but assert ut_a(rbmode == RB_RECOVERY). Unless InnoDB has crashed
while inserting a clustered index record, field_ref should not be zero.
btr_rec_free_updated_extern_fields(): Add the parameter enum trx_rbmode.
btr_cur_pessimistic_update(): Pass the rbmode parameter to
btr_rec_free_updated_extern_fields().
row_undo_ins(), row_undo_mod_upd_del_sec(): If row_build_index_entry()
fails, assert trx_is_recv() and skip this secondary index.
row_undo_mod_upd_del_sec(): Empty the heap at the end of each loop
iteration in order to conserve memory and to reduce the number of
low-level memory allocations.
------------------------------------------------------------------------
r2537 | inaam | 2008-07-15 20:46:03 +0300 (Tue, 15 Jul 2008) | 12 lines
branches/5.1 issue# 4
Fixed a timing hole where a thread dropping an index can free the
in-memory index struct while another thread is still using
that structure to remove entries from adaptive hash index belonging
to one of the pages that belongs to the index being dropped.
The fix is to have a reference counter in the index struct and to
wait for this counter to drop to zero beforing freeing the struct.
Reviewed by: Heikki
------------------------------------------------------------------------
page must be big enough to store two records, this is done because compression
later may very well result in two records residing on the same page. This
change handles the case where only one record fits on a page. We don't
split the page in the middle by default if there is only record on the page.
We only split the page if the tuple to be inserted is less than existing
record. That way the existing record is copied over to the right page
during the split and the new tuple is inserted to the left.
externally stored columns.
innodb-zip.test: Correct the test case. Without the fixes, the test
would fail, because the BLOB would be prepended with a 768-byte prefix
of the data.
row_upd_index_replace_new_col_vals_index_pos(),
row_upd_index_replace_new_col_vals(): Use only one "heap"
parameter that must be non-NULL. When fetching externally
stored columns, use upd_field_t::orig_len.
upd_get_field_by_field_no(): New accessor function, for retrieving
an field from an update vector by field_no.
row_upd_index_replace_new_col_val(): New function, for replacing the
value from an update vector. This used to be duplicated code in
row_upd_index_replace_new_col_vals_index_pos() and
row_upd_index_replace_new_col_vals().
blocks that contains uncompressed and compressed frames. This patch was
designed by Heikki and Inaam, implemented by Inaam, and refined and reviewed
by Marko and Sunny.
buf_buddy_n_frames, buf_buddy_min_n_frames, buf_buddy_max_n_frames: Remove.
buf_page_belongs_to_unzip_LRU(): New predicate:
bpage->zip.data && buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE.
buf_pool_t, buf_block_t: Add the linked list unzip_LRU. A block in the
regular LRU list is in unzip_LRU iff buf_page_belongs_to_unzip_LRU() holds.
buf_LRU_free_block(): Add a third return value to refine the case
"cannot free the block".
buf_LRU_search_and_free_block(): Update the documentation to reflect the
implementation.
buf_LRU_stat_t, buf_LRU_stat_cur, buf_LRU_stat_sum, buf_LRU_stat_arr[]:
Statistics for the unzip_LRU algorithm.
buf_LRU_stat_update(): New function: Update the statistics. Called once
per second by srv_error_monitor_thread().
buf_LRU_validate(): Validate the unzip_LRU list as well.
buf_LRU_evict_from_unzip_LRU(): New predicate: Use the unzip_LRU before
falling back to the regular LRU?
buf_LRU_free_from_unzip_LRU_list(), buf_LRU_free_from_common_LRU_list():
Subfunctions of buf_LRU_search_and_free_block().
buf_LRU_search_and_free_block(): Reimplement. Try to evict an uncompressed
page from the unzip_LRU list before falling back to evicting an entire block
from the common LRU list.
buf_unzip_LRU_remove_block_if_needed(): New function.
buf_unzip_LRU_add_block(): New function: Add a block to the unzip_LRU list.
symbols. Use it for all definitions of non-static variables and functions.
lexyy.c, make_flex.sh: Declare yylex as UNIV_INTERN, not static. It is
referenced from pars0grm.c.
Actually, according to
nm .libs/ha_innodb.so|grep -w '[ABCE-TVXYZ]'
the following symbols are still global:
* The vtable for class ha_innodb
* pars0grm.c: The function yyparse() and the variables yychar, yylval, yynerrs
The required changes to the Bison-generated file pars0grm.c will be addressed
in a separate commit, which will add a script similar to make_flex.sh.
The class ha_innodb is renamed from class ha_innobase by a #define. Thus,
there will be no clash with the builtin InnoDB. However, there will be some
overhead for invoking virtual methods of class ha_innodb. Ideas for making
the vtable hidden are welcome. -fvisibility=hidden is not available in GCC 3.
btr_cur_pessimistic_update(): Note why the externally stored columns
of a record on a latched page cannot have been purged.
trx_undo_get_undo_rec(): Clarify that the stack of versions is locked
all the way down to the purge view.
trx_undo_prev_version_build(): Set *old_vers = NULL also when the record
could have been purged already. Add some clarifying comments.
is not indexed.
btr_search_update_hash_ref(), btr_search_drop_page_hash_index(),
btr_search_build_page_hash_index(), btr_search_update_hash_on_delete(),
btr_search_update_hash_node_on_insert(), btr_search_update_hash_on_insert(),
btr_search_validate():
Assert that hashed blocks do not belong to the insert buffer tree.
btr_search_move_or_delete_hash_entries():
When invoked on the insert buffer tree, assert that neither block is hashed.