This fix makes two basic changes in blob handling:
(The bug was introduced in r2252)
1) The blob prefixes are no longer stored in the undo if
a) We are modifying a delete marked record and
b) The record was delete marked by an already committed trx.
2) When building old row version to check if one of these versions
can hold an implicit lock on the record we stop our probe if
a) The version is delete marked and
b) The delete marking is done by a trx which is different from
the current active trx on the record.
Reviewed by: Heikki
externally stored columns.
innodb-zip.test: Correct the test case. Without the fixes, the test
would fail, because the BLOB would be prepended with a 768-byte prefix
of the data.
row_upd_index_replace_new_col_vals_index_pos(),
row_upd_index_replace_new_col_vals(): Use only one "heap"
parameter that must be non-NULL. When fetching externally
stored columns, use upd_field_t::orig_len.
upd_get_field_by_field_no(): New accessor function, for retrieving
an field from an update vector by field_no.
row_upd_index_replace_new_col_val(): New function, for replacing the
value from an update vector. This used to be duplicated code in
row_upd_index_replace_new_col_vals_index_pos() and
row_upd_index_replace_new_col_vals().
lock_get_table(), locks_row_eq_lock(), buf_page_get_mutex(): Add return
after ut_error. On Windows, ut_error is not declared as "noreturn".
Add explicit type casts when assigning ulint to byte to get rid of
"possible loss of precision" warnings.
struct i_s_table_cache_struct: Declare rows_used, rows_allocd as ulint
instead of ullint. 32 bits should be enough.
fill_innodb_trx_from_cache(), i_s_zip_fill_low(): Cast 64-bit unsigned
integers to longlong when calling Field::store(longlong, bool is_unsigned).
Otherwise, the compiler would implicitly convert them to double and
invoke Field::store(double) instead.
recv_truncate_group(), recv_copy_group(), recv_calc_lsn_on_data_add():
Cast ib_uint64_t expressions to ulint to get rid of "possible loss of
precision" warnings. (There should not be any loss of precision in
these cases.)
log_close(), log_checkpoint_margin(): Declare some variables as ib_uint64_t
instead of ulint, so that there won't be any potential loss of precision.
mach_write_ull(): Cast the second argument of mach_write_to_4() to ulint.
OS_FILE_FROM_FD(): Cast the return value of _get_osfhandle() to HANDLE.
row_merge_dict_table_get_index(): Cast the parameter of mem_free() to (void*)
in order to get rid of the bogus MSVC warning C4090, which has been reported
as MSVC bug 101661:
<http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=101661>
row_mysql_read_blob_ref(): To get rid of a bogus MSVC warning C4090,
drop a const qualifier.
suggested by Heikki, because it breaks row_vers_impl_x_locked_off_kernel();
see Mantis issue #10.
However, now that Heikki's fix has been removed, the code may break elsewhere
when it tries to dereference half-freed or completely freed externally
stored columns.
about undo_rec possibly being uninitialized. When trx_undo_get_undo_rec()
leaves undo_rec uninitialized, both functions will return DB_MISSING_HISTORY
without dereferencing undo_rec.
This closes Mantis issue #7.
symbols. Use it for all definitions of non-static variables and functions.
lexyy.c, make_flex.sh: Declare yylex as UNIV_INTERN, not static. It is
referenced from pars0grm.c.
Actually, according to
nm .libs/ha_innodb.so|grep -w '[ABCE-TVXYZ]'
the following symbols are still global:
* The vtable for class ha_innodb
* pars0grm.c: The function yyparse() and the variables yychar, yylval, yynerrs
The required changes to the Bison-generated file pars0grm.c will be addressed
in a separate commit, which will add a script similar to make_flex.sh.
The class ha_innodb is renamed from class ha_innobase by a #define. Thus,
there will be no clash with the builtin InnoDB. However, there will be some
overhead for invoking virtual methods of class ha_innodb. Ideas for making
the vtable hidden are welcome. -fvisibility=hidden is not available in GCC 3.
btr_cur_pessimistic_update(): Note why the externally stored columns
of a record on a latched page cannot have been purged.
trx_undo_get_undo_rec(): Clarify that the stack of versions is locked
all the way down to the purge view.
trx_undo_prev_version_build(): Set *old_vers = NULL also when the record
could have been purged already. Add some clarifying comments.
to the undo log, also store the original length of the column, so that the
changes will be correctly undone in transaction rollback or when fetching
previous versions of the row.
innodb-zip.test: New file, for tests of the compression.
upd_field_t: Add orig_len, the original length of new_val.
btr_push_update_extern_fields(): Restore the original prefix of the column.
Add the parameter heap where memory will be allocated if necessary.
trx_undo_rec_get_col_val(): Add the output parameter orig_len.
trx_undo_page_report_modify_ext(): New function: Write an externally
stored column to the undo log. This is only called from
trx_undo_page_report_modify(), and this is the only caller of
trx_undo_page_fetch_ext().
trx_undo_update_rec_get_update(): Read the original length of the column
prefix to upd_field->orig_len.
row_build(), row_upd_index_replace_new_col_vals_index_pos(), and
row_upd_index_replace_new_col_vals() are safe.
btr_cur_optimistic_update(), btr_cur_pessimistic_update(): Note that
the B-tree page of the clustered index record is latched in mtr.
trx_undo_prev_version_build(): Add const qualifiers to index_rec
and rec. Note that the page of index_rec is latched in index_mtr.
row_vers_impl_x_locked_off_kernel(), row_vers_old_has_index_entry():
Note that the stack of versions is locked by mtr and thus it is
safe to call row_build().
btr_copy_blob_prefix(), btr_copy_externally_stored_field_prefix_low():
Document the return value as "number of bytes written", not "bytes written".
trx_undo_page_fetch_ext(): Explain the assertion ut_a(ext_len).
row_build_index_entry(): Explain the assertion ut_a(!ext).
enough prefixes of externally stored columns, so that purge will not have
to dereference any BLOB pointers, which may be invalid. This will not be
necessary for logging inserts, because inserts are no-ops in purge, and
the record will remain locked during transaction rollback.
TODO: in dict_build_table_def_step() or dict_build_index_def_step(),
prevent the creation of tables with too many columns for which a
prefix index is defined. This is because there is a size limit of undo
log records, and for each prefix-indexed column, the log must store
REC_MAX_INDEX_COL_LEN + BTR_EXTERN_FIELD_REF_SIZE bytes.
trx_undo_page_report_insert(): Assert that the index is clustered.
trx_undo_page_fetch_ext(): New function, for fetching the BLOB prefix
in trx_undo_page_report_modify().
trx_undo_page_report_modify(): Write long enough prefixes of the externally
stored columns to the undo log.
trx_undo_rec_get_partial_row(): Remove the parameter "ext". Assert that
the undo log contains long enough prefixes of the externally stored columns.
purge_node_t: Remove the field "ext".
Only add indexed BLOBs to row_ext.
trx_undo_rec_get_partial_row(): Move the BLOB fetching to row_ext_create().
row_build(): Pass only those BLOBs to row_ext_create() that are referenced by
ordering columns of some indexes, similar to trx_undo_rec_get_partial_row().
row_ext_create(): Add the parameter "tuple". Move the implementation
from row0ext.ic to row0ext.c.
row_ext_lookup_ith(), row_ext_lookup(): Return a const pointer. Remove
the parameters "field" and "f_len". Make the row_ext_t* parameter const.
row_ext_t: Remove the field zip_size.
field_ref_zero[]: Declare in btr0types.h instead of btr0cur.h.
row_ext_lookup_low(): Rename to row_ext_cache_fill() and change the
signature.
only for those externally stored columns that occur in the ordering columns
of indexes. Prefetch the prefixes of those columns, because the clustered
index record and the BLOBs may have been deleted by the time when the
purge thread needs to read the BLOB prefixes.
row_ext_create(): Add the debug assertion ut_ad(ut_is_2pow(zip_size)).
column prefix of an externally stored column.
row_upd_ext_fetch(): New function.
row_upd_index_replace_new_col_vals(),
row_upd_index_replace_new_col_vals_index_pos(): Fetch prefixes of
externally stored columns when they are needed for column prefix
indexes. For memory allocation, add the parameter ext_heap. Avoid
repeating the inner loop after finding a matching upd_field->field_no.
set the "external storage" flag. When parsing the undo log, do not
misinterpret a SQL NULL column for externally stored.
These bugs were spotted by Heikki and Sunny.
trx_undo_page_report_modify(): Set the UNIV_EXTERN_STORAGE_FIELD flag
when needed.
trx_undo_rec_get_partial_row(): Check for len == UNIV_SQL_NULL.
fix the bugs introduced in r1591.
row_rec_to_index_entry_low(): Clear "n_ext". Do not allow it to be NULL.
Add const qualifier to dict_index_t*.
row_rec_to_index_entry(): Add the parameters "offsets" and "n_ext".
btr_cur_optimistic_update(): Add an assertion that there are no externally
stored columns. Remove the unreachable call to btr_cur_unmark_extern_fields()
and the preceding unnecessary call to rec_get_offsets().
btr_push_update_extern_fields(): Remove the parameters index, offsets.
Only report the additional externally stored columns of the update vector.
row_build(), trx_undo_rec_get_partial_row(): Flag externally stored columns
also with dfield_set_ext().
rec_copy_prefix_to_dtuple(): Assert that there are no externally stored
columns in the prefix.
row_build_row_ref(): Note and assert that the index is a secondary index,
and assert that there are no externally stored columns.
row_build_row_ref_fast(): Assert that there are no externally stored columns.
rec_offs_get_n_alloc(): Expose the function.
row_build_row_ref_in_tuple(): Assert that there are no externally stored
columns in a record of a secondary index.
row_build_row_ref_from_row(): Assert that there are no externally stored
columns.
row_upd_check_references_constraints(): Add the parameter offsets, to
avoid a redundant call to rec_get_offsets().
row_upd_del_mark_clust_rec(): Add the parameter offsets. Remove
duplicated code.
row_ins_index_entry_set_vals(): Copy the external storage flag.
sel_pop_prefetched_row(): Assert that there are no externally stored
columns.
row_scan_and_check_index(): Copy offsets to a temporary heap across
the invocation of row_rec_to_index_entry().
and use it for flagging externally stored columns in the data tuple.
The data tuple contains the same columns as the clustered index record,
but in a different order. This error was introduced in r1591.
TODO: the assertion ut_ad(!dfield_is_ext()) may fail in
btr_cur_pessimistic_update().
offsets_[] arrays, as suggested by Vasil.
rec_offs_set_n_alloc(): Declare as a public function. Assert that
n_alloc > REC_OFFS_HEADER_SIZE.
rec_offs_get_n_alloc(): Assert that n_alloc > REC_OFFS_HEADER_SIZE.
trx_t: Remove dict_undo_list and dict_redo_list.
innobase_create_temporary_tablename(): Replace TEMP_TABLE_PREFIX with
a table name suffix "#1" or "#2". In this way, the user can restore
precious data, should anything go wrong. It is possible to reach an
inconsistent state, because the creation, deletion and renaming of
single-table tablespaces are not transactional.
ut_print_namel(), fil_make_ibd_name(), innobase_rename_table(): Remove
the special treatment of TEMP_TABLE_PREFIX.
Introduce TEMP_INDEX_PREFIX == 0xff for temporary indexes. This byte
cannot occur in index names since MySQL 4.1. However, it might have
been possible to use this byte in MySQL 4.0.
recv_recovery_from_checkpoint_finish(): Call the new function
row_merge_drop_temp_indexes(), to drop all indexes whose name starts
with the byte 0xff.
row_merge_rename_indexes(): Renamed from row_merge_rename_index().
Remove the parameter "index".
row_drop_table_for_mysql(): Unconditionally call trx_commit_for_mysql().
row_drop_table_for_mysql_no_commit(): Correct the function commit,
based on the corrected comment of row_drop_table_for_mysql(). Rely on
table->to_be_dropped instead of TEMP_TABLE_PREFIX.
ha_innobase::add_index(): Simplify the control flow.
Some things still fail in innodb-index.test, and there seems to be
a race condition (data dictionary lock wait) when running with --valgrind.
dfield_t: Add an "external storage" flag, dfield->ext.
dfield_is_null(), dfield_is_ext(), dfield_set_ext(), dfield_set_null():
New functions.
dfield_copy(), dfield_copy_data(): Add const qualifiers, fix in/out comments.
data_write_sql_null(): Use memset().
big_rec_field_t: Replace byte* data with const void* data.
ut_ulint_sort(): Remove.
upd_field_t: Remove extern_storage.
upd_node_t: Replace ext_vec, n_ext_vec with n_ext.
row_merge_copy_blobs(): New function.
row_ins_index_entry(): Add the parameter "ibool foreign" for suppressing
foreign key checks during fast index creation or when inserting into
secondary indexes.
btr_page_insert_fits(): Add const qualifiers.
btr_cur_add_ext(), upd_ext_vec_contains(): Remove.
dfield_print_also_hex(), dfield_print(): Replace if...else if with switch.
Observe dfield_is_ext().
trx_undo_report_row_operation(), trx_undo_report_dict_operation():
Reduce the scope of some variables. Move the return(DB_SUCCESS)
case inside the for loop.
trx_undo_left(): Add const qualifiers.
trx_undo_page_report_insert(): Use exact trx_undo_left() limit.
Remove a duplicated trx_undo_left() check.
trx_undo_page_report_modify(): Eliminate the local variable len.
Document that no prefix for BLOBs needs to be stored in the undo log.
Lump two trx_undo_left() checks together.
trx_undo_page_report_modify(), trx_undo_report_row_operation():
Add const qualifier to the parameter rec. Remove some local variables.
trx_undo_report_row_operation(): Invoke rec_get_offsets() only once.
buf_page_get_gen(). This saves one mutex operation per block request.
buf_page_get_gen(), various macros and functions: Add parameter zip_size.
btr_node_ptr_get_child(): Add parameter index.
fil_space_get_latch(): Add optional output parameter zip_size.
fil_space_get_zip_size(): Return 0 for space id==0, because the
system tablespace is never compressed.
fsp_header_init(): Remove the parameter zip_size.
ibuf_free_excess_pages(): Remove the parameter zip_size.
trx_rseg_t, trx_undo_t: Add field zip_size.
xdes_lst_get_next(): Remove, unused.
Replace buf_frame_t* guess with buf_block_t* guess in order to avoid
a buf_block_align() call.
trx_undo_t: Replace page_t* guess_page with buf_block_t* guess_block.
btr_search_t: Replace page_t* root_guess with buf_block_t* root_guess.
accessors returning pointers with macros that preserve const qualifiers.
In UNIV_DEBUG builds, retain the accessors and cast away constness there.
dfield_get_type(), dfield_get_data(), dtuple_get_nth_field(),
dict_table_get_nth_col(), dict_table_get_sys_col(): Implement as macro
unless #ifdef UNIV_DEBUG.
rec_get_nth_field(): Replace with rec_get_nth_field_offs() that does not
do pointer arithmetics. Implement rec_get_nth_field() as a macro.
and modify some functions to return const pointers. Add const qualifiers
to local variable declarations or casts to remove the const qualifier
in those places where write access is needed.
Replace ut_ad(mtr_memo_contains(mtr, buf_block_align(ptr), ...))
with ut_ad(mtr_memo_contains_page(mtr, ptr, ...)) in order to reduce the
number of buf_block_align() calls.
of externally stored columns, and fix bugs introduced in r873. (Bug #22496)
btr_page_get_sure_split_rec(), btr_page_insert_fits(),
rec_get_converted_size(), rec_convert_dtuple_to_rec(),
rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_new():
Add parameters ext and n_ext. Flag external fields during the
conversion.
rec_set_field_extern_bits(), rec_set_field_extern_bits_new(),
rec_offs_set_nth_extern(), rec_set_nth_field_extern_bit_old():
Remove. The bits are set by rec_convert_dtuple_to_rec().
page_cur_insert_rec_low(): Remove the parameters ext and n_ext.
btr_cur_add_ext(): New utility function for updating and sorting ext[].
Low-level functions now expect the array to be in ascending order
for performance reasons. Used in btr_cur_optimistic_insert(),
btr_cur_pessimistic_insert(), and btr_cur_pessimistic_update().
btr_cur_optimistic_insert(): Remove some defensive code, because we cannot
compute the added parameters of rec_get_converted_size().
btr_push_update_extern_fields(): Sort the array. Require the array to
be twice the maximum usage, so that ut_ulint_sort() can be used.
dtuple_convert_big_rec(): Allocate new space for the BLOB pointer,
to avoid overwriting prefix indexes to the same column. Adapt
dtuple_convert_back_big_rec().
row_build_index_entry(): Fetch the columns also for prefix indexes of
the clustered index.
page_zip_apply_log(), page_zip_decompress_clust(): Allow externally
stored fields to lack a locally stored part.
in the clustered index to be smaller than the indexed prefix in secondary
indexes.
row_ext_lookup(): Return NULL if the column is not stored externally.
trx_undo_rec_get_partial_row(): row_build(): Add parameter row_ext_t** ext.
row_build_index_entry(): Add the parameter row_ext_t* ext.
Invoke row_ext_lookup() to fetch prefixes of externally stored columns.
upd_node_t, undo_node_t, purge_node_t: Add the field row_ext_t* ext.
btr_page_split_and_insert(): Avoid dereferencing pointers to garbage on
the old page.
btr_cur_pessimistic_insert(): Pass pointer to big_rec_vec to
btr_cur_optimistic_insert().
trx_undo_prev_version_build(): Only invoke rec_set_field_extern_bits()
if n_ext_vect > 0.
row_ins_index_entry_low(): Simplify a debug assertion.
page_copy_rec_list_end_no_locks(): Make the loop slightly more readable.
page_delete_rec_list_end(): Delete records on compressed pages one by one.
trx-undo_prev_version_build(): Pass offsets==NULL to
rec_set_field_extern_bits().
rec_set_field_extern_bits(), rec_set_field_extern_bits_new():
Accept offsets==NULL.
row_upd_rec_in_place(): Remove the bogus comment that the function
would only be invoked on a clustered index. Remove the related
debug assertion.
page_zip_dir_delete() will need to handle BLOBs.
rec_set_field_extern_bits(), rec_set_field_extern_bits_new():
Add parameter offsets.
rec_offs_set_nth_extern(): New function to set an extern bit in offsets.
This will be called when an extern bit is set in a record.
page_cur_rec_insert(), page_cur_insert_rec_low(): Document that the
parameter "offsets" is in/out.
page_zip_dir_delete(): Note that the array of BLOB pointers will need
to be shifted.
page0zip.ic: Document the entry type for clearing a record.
page_zip_available(): Add parameter "index". Remove parameters
"is_leaf" and "is_clustered".
page_zip_get_trailer_len(): New function for computing the trailer length
of the compressed page.
page_zip_apply_log(): Implement the modification log entry type for
clearing the data bytes of a record.
page_zip_decompress(): Initialize n_blobs when actually copying the
BLOB pointers to place.
page_zip_validate(): Add diagnostic messages for failures. Check
also m_start, m_end, and n_blobs.
page_zip_write_blob_ptr(): Add page_zip_validate() assertion.
of clustered indexes. Previously, parts of the code assumed that these
columns would exist on all leaf pages. Simplify the update-in-place of
these columns.
Add inline function dict_index_is_clust() to replace all tests
index->type & DICT_CLUSTERED.
Remove the redo log entry types MLOG_ZIP_WRITE_TRX_ID and
MLOG_ZIP_WRITE_ROLL_PTR, because the modifications to these columns
are covered by logical logging.
Fuse page_zip_write_trx_id() and page_zip_write_roll_ptr() into
page_zip_write_trx_id_and_roll_ptr().
page_zip_dir_add_slot(), page_zip_available(): Add flag "is_clustered",
so that no space will be reserved for TRX_ID and ROLL_PTR on leaf pages
of secondary indexes.
page_zip_apply_log(): Flag an error when val==0 is encoded with two bytes.
page_zip_write_rec(): Add debug assertions that there is enough space
available for the entry before copying the data bytes of the record.
row_upd_rec_in_place(), page_zip_write_rec(): Add parameter "index".
page_dir_set_n_heap(): Add a debug assertion that on compressed
pages, n_heap will always be incremented by one. Improve code formatting.
page_zip_dir_add_slot(): New function, called from
page_cur_insert_rec_low() after page_mem_alloc_heap().
rec_set_n_owned_new(): Do not call page_zip_rec_set_owned()
on the supremum record.
rec_offs_make_valid(): Add debug assertions.
page_zip_dir_user_size(): Correct an off-by-one error in the debug assertion.
page_zip_apply_log(): Add parameter trx_id_col. Skip trx_id and roll_ptr.
page_zip_decompress(): Simplify the handling of "storage" in the loop that
copies the uncompressed fields.
page_zip_write_rec(): Store trx_id and roll_ptr separately.
page_zip_write_trx_id(), page_zip_write_roll_ptr(): Fix off-by-one errors.
page_cur_insert_rec_low(): Call page_zip_dir_add_slot() after
page_mem_alloc_heap(). Remove some redundant assertions.
Pass page_zip to page_dir_split_slot().
compressed pages.
btr_root_raise_and_insert(): Distinguish root_page_zip and new_page_zip.
btr_cur_set_ownership_of_extern_field(): Do not log the write on the
uncompressed page if it will be logged for page_zip.
lock_rec_insert_check_and_lock(), lock_sec_rec_modify_check_and_lock():
Update the max_trx_id field also on the compressed page.
mlog_write_ulint(): Add UNIV_UNLIKELY hints. Remove trailing white space.
mlog_log_string(): Remove trailing white space.
rec_set_field_extern_bits(): Remove parameter mtr, as the write will either
occur in the heap, or it will be logged at a higher level.
recv_parse_or_apply_log_rec_body(),
page_zip_write_header(): Add log record type MLOG_ZIP_WRITE_HEADER.
page_header_set_field(): Pass mtr=NULL to page_zip_write_header().
page_header_reset_last_insert(): Pass mtr to page_zip_write_header().
btr_page_set_index_id(), btr_page_set_level(),
btr_page_set_next(), btr_page_set_prev(): Pass mtr to page_zip_write_header().
row_upd_rec_sys_fields(): Pass mtr=NULL to page_zip_write_trx_id() and
page_zip_write_roll_ptr(), since the write will be logged at a higher level.
page_zip_write_header(): Add parameter mtr.
page_zip_write_header_log(): New function.
Remove rec_set_nth_field_extern_bit().
Make rec_set_nth_field_extern_bit_old() static.
Rename rec_set_nth_field_extern_bit_new()
to rec_set_field_extern_bits_new() and make it static.
row_ins_index_entry_low(): Remove bogus TODO comment.
BLOB pointers, trx_id, and roll_ptr.
btr_empty(), btr_create(), page_create(): Add parameter "index", as some
index information will be encoded on the compressed page.
Define REC_NODE_PTR_SIZE as 4.
Allow btr_page_reorganize() and btr_page_reorganize_low() to fail.
Define the error code DB_ZIP_OVERFLOW.
Make row_ins_index_entry_low() static.
page0zip: Encode the index, log reorganized records, and store uncompressed
fields separately from the compressed data stream.