in r2631. Include the node pointer field in the size calculation.
rec_get_converted_size_comp_prefix(): New function, to compute the storage
size of the prefix of an ordinary record in COMPACT format.
rec_get_converted_size_comp(): Use rec_get_converted_size_comp_prefix().
buf_block_dbg_add_level(block, level): Define as an empty macro when
UNIV_SYNC_DEBUG is not defined. Remove #ifdef UNIV_SYNC_DEBUG around
all invocations.
there will always be enough space for two node pointer records in an
empty B-tree page. This was reported as Mantis issue #73.
page_zip_rec_needs_ext(): Add the parameter n_fields, for accurate
estimation of the compressed size of the data dictionary information.
Given that this function is only invoked for records on leaf pages,
require that there be enough space for one record in the compressed
page. We check elsewhere that there will be enough room for two node
pointer records on higher-level pages.
btr_cur_optimistic_insert(): Ensure that there will be enough room for
two node pointer records on an empty non-leaf page. The rule for
leaf-page records will be enforced by the callers of
page_zip_rec_needs_ext().
btr_cur_pessimistic_insert(): Remove the insufficient check that the
leaf page record should be compressible by itself. Instead, now we
require that two node pointer records fit on a non-leaf page, and one
record will fit in uncompressed form on the leaf page.
page_zip_write_header(), page_zip_write_rec(): Re-enable the debug
assertions that were violated by the insufficient check in
btr_cur_pessimistic_insert().
innodb_bug36172.test: Use a larger compressed page size.
btr_search_drop_page_hash_index(): Add const qualifiers to the local
variables page, rec, and index, to ensure that they are not modified
by this function.
page_get_infimum_offset(), page_get_supremum_offset(): New functions.
page_get_infimum_rec(), page_get_supremum_rec(): Replaced by
const-preserving macros that invoke the accessor functions.
help in tracking down issue #63 (memory corruption). UNIV_BTR_DEBUG
is currently enabled in univ.i.
btr_root_fseg_validate(): New function, for validating a file segment
header on a B-tree root page.
btr_root_block_get(), btr_free_but_not_root(),
btr_root_raise_and_insert(), btr_discard_only_page_on_level():
Check PAGE_BTR_SEG_LEAF and PAGE_BTR_SEG_TOP on the root page with
btr_root_fseg_validate().
btr_root_raise_and_insert(): Move the assertion
dict_index_get_page(index) == page_get_page_no(root)
inside UNIV_BTR_DEBUG. It was previously enabled by UNIV_DEBUG.
btr_free_root(): Check PAGE_BTR_SEG_TOP on the root page with
btr_root_fseg_validate().
Limit the number of the pages that are sampled so it is never greater
than the total number of pages in the index.
The parameter that specifies the number of pages to test is global for
all tables. By limiting it this way we allow the user to set it "high"
to suit "large" tables and to avoid unnecessary work for "small" tables
(e.g. doing 100 dives in a table that has 5 pages, obviously testing
some pages more than once).
Suggested by: Ken
Approved by: Marko
The cardinality of every index (the number of different key values) is
calculated when the table is opened, at SHOW TABLE STATUS,
ANALYZE TABLE and on other circumstances (like when the table has
changed too much). Note that if the mysql client is running with the
auto-rehash setting turned on (default) this causes all tables to be
opened when it starts.
Previously InnoDB sampled 8 random pages from the index to get an
estimate of the cardinality. Now the number of sampled pages can be
changed via the global parameter innodb_stats_sample_pages which can
be tuned at runtime. The default value for this parameter is 8.
If the value of this parameter is changed, there may be serious problems:
- small values (say, 1) can cause an error in table stats;
- values much larger than 8 (say, 100), can cause a big slowdown in
table opening time, SHOW TABLE status, etc.
- query plans may be different from the old ones.
Approved by: Heikki
recovery, tolerate clustered index records whose externally stored
columns have not been written. This should remove the assertion failures
that were reported as Mantis issue#58, issue#62, issue#64.
trx_is_recv(): New function: TRUE if this transaction is rolling back
an incomplete transaction in crash recovery.
enum trx_rbmode: Rollback modes: no rollback, normal rollback, crash recovery.
btr_cur_pessimistic_delete(), btr_free_externally_stored_field(),
btr_rec_free_externally_stored_fields():
Replace the ibool parameter with enum trx_rbmode.
btr_free_externally_stored_field(): If field_ref is zero, return
but assert ut_a(rbmode == RB_RECOVERY). Unless InnoDB has crashed
while inserting a clustered index record, field_ref should not be zero.
btr_rec_free_updated_extern_fields(): Add the parameter enum trx_rbmode.
btr_cur_pessimistic_update(): Pass the rbmode parameter to
btr_rec_free_updated_extern_fields().
row_undo_ins(), row_undo_mod_upd_del_sec(): If row_build_index_entry()
fails, assert trx_is_recv() and skip this secondary index.
row_undo_mod_upd_del_sec(): Empty the heap at the end of each loop
iteration in order to conserve memory and to reduce the number of
low-level memory allocations.
------------------------------------------------------------------------
r2537 | inaam | 2008-07-15 20:46:03 +0300 (Tue, 15 Jul 2008) | 12 lines
branches/5.1 issue# 4
Fixed a timing hole where a thread dropping an index can free the
in-memory index struct while another thread is still using
that structure to remove entries from adaptive hash index belonging
to one of the pages that belongs to the index being dropped.
The fix is to have a reference counter in the index struct and to
wait for this counter to drop to zero beforing freeing the struct.
Reviewed by: Heikki
------------------------------------------------------------------------
page must be big enough to store two records, this is done because compression
later may very well result in two records residing on the same page. This
change handles the case where only one record fits on a page. We don't
split the page in the middle by default if there is only record on the page.
We only split the page if the tuple to be inserted is less than existing
record. That way the existing record is copied over to the right page
during the split and the new tuple is inserted to the left.
externally stored columns.
innodb-zip.test: Correct the test case. Without the fixes, the test
would fail, because the BLOB would be prepended with a 768-byte prefix
of the data.
row_upd_index_replace_new_col_vals_index_pos(),
row_upd_index_replace_new_col_vals(): Use only one "heap"
parameter that must be non-NULL. When fetching externally
stored columns, use upd_field_t::orig_len.
upd_get_field_by_field_no(): New accessor function, for retrieving
an field from an update vector by field_no.
row_upd_index_replace_new_col_val(): New function, for replacing the
value from an update vector. This used to be duplicated code in
row_upd_index_replace_new_col_vals_index_pos() and
row_upd_index_replace_new_col_vals().
blocks that contains uncompressed and compressed frames. This patch was
designed by Heikki and Inaam, implemented by Inaam, and refined and reviewed
by Marko and Sunny.
buf_buddy_n_frames, buf_buddy_min_n_frames, buf_buddy_max_n_frames: Remove.
buf_page_belongs_to_unzip_LRU(): New predicate:
bpage->zip.data && buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE.
buf_pool_t, buf_block_t: Add the linked list unzip_LRU. A block in the
regular LRU list is in unzip_LRU iff buf_page_belongs_to_unzip_LRU() holds.
buf_LRU_free_block(): Add a third return value to refine the case
"cannot free the block".
buf_LRU_search_and_free_block(): Update the documentation to reflect the
implementation.
buf_LRU_stat_t, buf_LRU_stat_cur, buf_LRU_stat_sum, buf_LRU_stat_arr[]:
Statistics for the unzip_LRU algorithm.
buf_LRU_stat_update(): New function: Update the statistics. Called once
per second by srv_error_monitor_thread().
buf_LRU_validate(): Validate the unzip_LRU list as well.
buf_LRU_evict_from_unzip_LRU(): New predicate: Use the unzip_LRU before
falling back to the regular LRU?
buf_LRU_free_from_unzip_LRU_list(), buf_LRU_free_from_common_LRU_list():
Subfunctions of buf_LRU_search_and_free_block().
buf_LRU_search_and_free_block(): Reimplement. Try to evict an uncompressed
page from the unzip_LRU list before falling back to evicting an entire block
from the common LRU list.
buf_unzip_LRU_remove_block_if_needed(): New function.
buf_unzip_LRU_add_block(): New function: Add a block to the unzip_LRU list.
symbols. Use it for all definitions of non-static variables and functions.
lexyy.c, make_flex.sh: Declare yylex as UNIV_INTERN, not static. It is
referenced from pars0grm.c.
Actually, according to
nm .libs/ha_innodb.so|grep -w '[ABCE-TVXYZ]'
the following symbols are still global:
* The vtable for class ha_innodb
* pars0grm.c: The function yyparse() and the variables yychar, yylval, yynerrs
The required changes to the Bison-generated file pars0grm.c will be addressed
in a separate commit, which will add a script similar to make_flex.sh.
The class ha_innodb is renamed from class ha_innobase by a #define. Thus,
there will be no clash with the builtin InnoDB. However, there will be some
overhead for invoking virtual methods of class ha_innodb. Ideas for making
the vtable hidden are welcome. -fvisibility=hidden is not available in GCC 3.
btr_cur_pessimistic_update(): Note why the externally stored columns
of a record on a latched page cannot have been purged.
trx_undo_get_undo_rec(): Clarify that the stack of versions is locked
all the way down to the purge view.
trx_undo_prev_version_build(): Set *old_vers = NULL also when the record
could have been purged already. Add some clarifying comments.
is not indexed.
btr_search_update_hash_ref(), btr_search_drop_page_hash_index(),
btr_search_build_page_hash_index(), btr_search_update_hash_on_delete(),
btr_search_update_hash_node_on_insert(), btr_search_update_hash_on_insert(),
btr_search_validate():
Assert that hashed blocks do not belong to the insert buffer tree.
btr_search_move_or_delete_hash_entries():
When invoked on the insert buffer tree, assert that neither block is hashed.
to the undo log, also store the original length of the column, so that the
changes will be correctly undone in transaction rollback or when fetching
previous versions of the row.
innodb-zip.test: New file, for tests of the compression.
upd_field_t: Add orig_len, the original length of new_val.
btr_push_update_extern_fields(): Restore the original prefix of the column.
Add the parameter heap where memory will be allocated if necessary.
trx_undo_rec_get_col_val(): Add the output parameter orig_len.
trx_undo_page_report_modify_ext(): New function: Write an externally
stored column to the undo log. This is only called from
trx_undo_page_report_modify(), and this is the only caller of
trx_undo_page_fetch_ext().
trx_undo_update_rec_get_update(): Read the original length of the column
prefix to upd_field->orig_len.
btr_page_get_sure_split_rec(): Remove the check if insert_size
exceeds free_space.
btr_page_split_and_insert(): If a compressed page has already been split,
avoid further splits by inserting the record to an empty page. As a
performance optimization, avoid invoking btr_page_insert_fits() on
compressed tables.
stored columns (BLOBs).
btr_copy_blob_prefix(), btr_copy_zblob_prefix(),
btr_copy_externally_stored_field_prefix_low(),
btr_copy_externally_stored_field_prefix(),
btr_copy_externally_stored_field(),
btr_rec_copy_externally_stored_field():
Note that the page containing the clustered index record that points to
the BLOB must be latched.
btr_copy_zblob_prefix(): Note that there is no latch on the page, and thus
all accesses to a given page via this function must be covered by the same
set of locks or latches.
btr_copy_zblob_prefix(): Note that the block acquired by
buf_page_get_zip() is protected by an exclusive table lock or
or by a latch on the clustered index record.
row_build(), row_upd_index_replace_new_col_vals_index_pos(), and
row_upd_index_replace_new_col_vals() are safe.
btr_cur_optimistic_update(), btr_cur_pessimistic_update(): Note that
the B-tree page of the clustered index record is latched in mtr.
trx_undo_prev_version_build(): Add const qualifiers to index_rec
and rec. Note that the page of index_rec is latched in index_mtr.
row_vers_impl_x_locked_off_kernel(), row_vers_old_has_index_entry():
Note that the stack of versions is locked by mtr and thus it is
safe to call row_build().
buf_pool->mutex: Rename to buf_pool_mutex, so that the wrappers will have
to be used when changes are merged from other source trees.
buf_pool->zip_mutex: Rename to buf_pool_zip_mutex.
buf_pool_mutex_own(), buf_pool_mutex_enter(), buf_pool_mutex_exit():
Wrappers for buf_pool_mutex.
btr_copy_blob_prefix(), btr_copy_externally_stored_field_prefix_low():
Document the return value as "number of bytes written", not "bytes written".
trx_undo_page_fetch_ext(): Explain the assertion ut_a(ext_len).
row_build_index_entry(): Explain the assertion ut_a(!ext).
mutex is temporarily released.
buf_LRU_free_block(), buf_buddy_alloc_clean(): Add an output parameter that
will be assigned TRUE when the buffer pool mutex is released.
This bug was spotted by and fix provided by Sunny.
univ.i: Do not define UNIV_DEBUG, UNIV_ZIP_DEBUG.
btr_cur_del_unmark_for_ibuf(): Use the same comment in both btr0cur.c and
btr0cur.h. Wrap long lines.
contents end up with conflicting versions of a record's state. The zipped
page record was not being marked as "(un)deleted" because we were not
passing the zipped page contents to the (un)delete function, which first
(un)delete marks the uncompressed version and then based on whether
page_zip is NULL or not (un)delete marks the record in the compressed page.
inserted, uncommitted clustered index records when determining if a
secondary index record that contains a column prefix of an externally
stored column is referencing the clustered index record.
field_ref_zero[]: A BLOB pointer full of zero, for use in comparisons.
btr_copy_externally_stored_field_prefix(): Assert that the BLOB pointer is set.
row_ext_lookup_ith(), row_ext_lookup(), row_ext_lookup_low(): Document
that field_ref_zero is returned when the BLOB cannot be fetched.
row_ext_lookup_low(): Return field_ref_zero and *len = 0 when the
BLOB pointer is unset.
row_build_index_entry(): Return NULL when a needed BLOB pointer cannot
be dereferenced (row_ext_lookup returns field_ref_zero). Check the
return value for NULL in callers.
row_vers_impl_x_locked_off_kernel(): Avoid comparisons when
row_build_index_entry() returns NULL.
row_vers_old_has_index_entry(): Ignore records for which
row_build_index_entry() returns NULL. The entry should never be NULL
in rollback, but it may be NULL in purge.
row_merge_buf_add(): Assert that row_ext_lookup() does not return
field_ref_zero. The table will be locked during index creation.
btr_cur_optimistic_insert(), pass big_rec to it, so that
the field references of externally stored columns (BLOB pointers)
will not be left uninitialized after a successful optimistic insert.
This bug was spotted by Sunny.
column prefix of an externally stored column.
row_upd_ext_fetch(): New function.
row_upd_index_replace_new_col_vals(),
row_upd_index_replace_new_col_vals_index_pos(): Fetch prefixes of
externally stored columns when they are needed for column prefix
indexes. For memory allocation, add the parameter ext_heap. Avoid
repeating the inner loop after finding a matching upd_field->field_no.