columns
When the server crashes after a record stub has been inserted and
before all its off-page columns have been written, the record will
contain incomplete off-page columns after crash recovery. Such records
may only be accessed at the READ UNCOMMITTED isolation level or when
rolling back a recovered transaction in recv_recovery_rollback_active().
Skip these records at the READ UNCOMMITTED isolation level.
TODO: Add assertions for checking the above assumptions hold when an
incomplete BLOB is encountered.
btr_rec_copy_externally_stored_field(): Return NULL if the field is
incomplete.
row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this
context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs.
row_sel_store_mysql_rec(): Return FALSE if not all columns could be
retrieved. Previously this function always returned TRUE. Assert that
the record is not delete-marked.
row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns
could be retrieved.
row_search_for_mysql(): Skip records containing incomplete off-page
columns. Assert that the transaction isolation level is READ
UNCOMMITTED.
rb://380 approved by Jimmy Yang
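
The skip logic described above can be sketched roughly as follows. This is a
simplified illustration, not the InnoDB code: the sketch_ types and functions
are hypothetical stand-ins, and an incomplete off-page column is modelled as a
NULL data pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record with a single off-page column. */
typedef struct {
	const char*	ext_data;	/* NULL if the column is incomplete */
	int		delete_marked;
} sketch_rec_t;

enum { SKETCH_READ_UNCOMMITTED, SKETCH_REPEATABLE_READ };

/* Stand-in for copying an externally stored field.
@return the column data, or NULL if the field is incomplete */
static const char*
sketch_copy_ext_field(const sketch_rec_t* rec)
{
	return(rec->ext_data);
}

/* Stand-in for converting a record to the output format.
@return 1 if all columns were retrieved, 0 otherwise */
static int
sketch_store_rec(char* buf, size_t buf_size, const sketch_rec_t* rec)
{
	const char*	data;

	assert(!rec->delete_marked);

	data = sketch_copy_ext_field(rec);
	if (data == NULL) {
		return(0);
	}

	strncpy(buf, data, buf_size - 1);
	buf[buf_size - 1] = '\0';
	return(1);
}

/* Scan the records, skipping those with incomplete off-page columns.
@return the number of records that would be returned to the caller */
static size_t
sketch_scan(const sketch_rec_t* recs, size_t n_recs, int isolation_level)
{
	char	buf[64];
	size_t	n_returned = 0;
	size_t	i;

	for (i = 0; i < n_recs; i++) {
		if (!sketch_store_rec(buf, sizeof buf, &recs[i])) {
			/* Incomplete columns are only expected to be
			visible at READ UNCOMMITTED. */
			assert(isolation_level == SKETCH_READ_UNCOMMITTED);
			continue;
		}
		n_returned++;
	}

	return(n_returned);
}
```

The point of the boolean return value is that the caller, not the conversion
routine, decides to move on to the next record.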
and clarifies the invariant in dict_table_get_on_id().
In Mar 2007 Marko observed a crash during recovery. The crash resulted from
an UNDO operation on a system table. His solution was to acquire an X lock on
the data dictionary, which in hindsight was overkill. It is unclear what
caused the crash; the current hypothesis is that it was memory corruption.
The X lock causes performance issues when undoing changes due to
rollback during normal operation on regular tables.
Why the change is safe:
======================
The InnoDB code has changed since the original X lock change was made. In the
new code we always lock the data dictionary in X mode during startup when
UNDOing operations on the system tables (this is a given). This ensures that
the crash Marko observed cannot happen as long as all transactions that update
the system tables follow the standard rules by setting the appropriate DICT_OP
flag when writing the log records when they make the changes.
If transactions violate the above mentioned rule then during recovery (at
startup) the rollback code (see trx0roll.c) will not acquire the X lock
and we will see the crash again. This will however be a different bug.
ha_innobase::create(): Add the local variable row_type = form->s->row_type.
Adjust it to ROW_TYPE_COMPRESSED when ROW_FORMAT is not specified or inherited
but KEY_BLOCK_SIZE is. Observe the inherited ROW_FORMAT even when it is not
explicitly specified.
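
A minimal sketch of the adjustment, assuming that an unspecified ROW_FORMAT is
represented by a "default" value and an unspecified KEY_BLOCK_SIZE by 0. The
sketch_ enum and function are hypothetical and only illustrate the rule; they
are not the ha_innobase::create() code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the row type values. */
enum sketch_row_type {
	SKETCH_ROW_TYPE_DEFAULT,	/* ROW_FORMAT not specified */
	SKETCH_ROW_TYPE_REDUNDANT,
	SKETCH_ROW_TYPE_COMPACT,
	SKETCH_ROW_TYPE_DYNAMIC,
	SKETCH_ROW_TYPE_COMPRESSED
};

/* Determine the row type to use for the table.
@return COMPRESSED if only KEY_BLOCK_SIZE was given, else row_type */
static enum sketch_row_type
sketch_effective_row_type(
	enum sketch_row_type	row_type,	/* specified or inherited */
	unsigned		key_block_size)	/* 0 if not specified */
{
	if (row_type == SKETCH_ROW_TYPE_DEFAULT && key_block_size != 0) {
		return(SKETCH_ROW_TYPE_COMPRESSED);
	}

	/* An explicitly specified or inherited ROW_FORMAT is observed. */
	return(row_type);
}
```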
innodb_bug54679.test: New test, to test the bug and to ensure that there are
no regressions. (The only difference in the test result without the patch
applied is that the first ALTER TABLE changes ROW_FORMAT to Compact.)
when renaming tables
Allocate the table name using ut_malloc() instead of table->heap because
the latter cannot be freed.
Adjust dict_sys->size calculations all over the code.
Change dict_table_t::name from const char* to char* because we need to
ut_malloc()/ut_free() it.
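
The idea can be sketched as follows, with malloc()/free() standing in for
ut_malloc()/ut_free(). The sketch_ structs and functions are hypothetical, and
the size accounting is assumed to count the name length plus its terminating
NUL byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified dictionary objects. */
typedef struct {
	char*	name;	/* separately allocated, so it can be freed */
} sketch_table_t;

typedef struct {
	size_t	size;	/* bytes accounted to the dictionary cache */
} sketch_dict_sys_t;

/* Duplicate a string with malloc(). @return copy, or NULL */
static char*
sketch_strdup(const char* s)
{
	size_t	len = strlen(s) + 1;
	char*	copy = malloc(len);

	if (copy != NULL) {
		memcpy(copy, s, len);
	}
	return(copy);
}

/* Set the initial table name. @return 1 on success, 0 on failure */
static int
sketch_table_init(sketch_dict_sys_t* sys, sketch_table_t* table,
		  const char* name)
{
	table->name = sketch_strdup(name);
	if (table->name == NULL) {
		return(0);
	}
	sys->size += strlen(name) + 1;
	return(1);
}

/* Rename the table, freeing the old name instead of leaking it in a
heap that lives as long as the table. @return 1 on success */
static int
sketch_table_rename(sketch_dict_sys_t* sys, sketch_table_t* table,
		    const char* new_name)
{
	char*	copy = sketch_strdup(new_name);

	if (copy == NULL) {
		return(0);
	}

	sys->size -= strlen(table->name) + 1;
	free(table->name);

	table->name = copy;
	sys->size += strlen(new_name) + 1;
	return(1);
}
```

With a heap-allocated name, every rename would leave the previous name behind
until the table object itself is freed; a separate allocation lets the old
name be released immediately.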
Reviewed by: Inaam, Marko, Heikki (rb://384)
Approved by: Heikki (rb://384)
ha_innobase::index_read(), ha_innobase::records_in_range(): Check that
the index is useable before invoking row_sel_convert_mysql_key_to_innobase().
This fix is based on a suggestion by Yasufumi Kinoshita.
dict_check_tablespaces_and_store_max_id(): Initialize max_space_id
and fil_system->max_assigned_id from DICT_HDR_MAX_SPACE_ID.
fil_space_create(): Suppress the warning while recv_recovery_on is set
(do not complain while applying the redo log).
The Valgrind warning happens because of uninitialized null bytes.
In row_sel_push_cache_row_for_mysql() we fill the fetch cache
with the necessary field values. row_sel_store_mysql_rec() is called
for this, and it leaves the null bytes untouched.
Later, row_sel_pop_cached_row_for_mysql() overwrites the table record
buffer with the uninitialized null bytes. We can see the problem from the
test case:
At 'SELECT...' we call the row_sel_push...->row_sel_store...->row_sel_pop_cached...
chain, which overwrites the table->record[0] buffer with uninitialized null bytes.
When we execute the 'UPDATE...' statement, compare_record uses this buffer and
the Valgrind warning occurs.
The fix is to initialize the null bytes with default values.
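
A rough sketch of the fix, assuming the null bytes sit at the start of the
record buffer and that a record with default values is available to copy them
from. The sketch_ names and the fixed sizes are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NULL_BYTES	2	/* assumed size of the null bitmap */
#define SKETCH_REC_LEN		8	/* assumed total record length */

/* Fill a fetch cache row. Before the fix, only the field values were
written and the leading null bytes kept whatever the buffer held. */
static void
sketch_fill_cache_row(
	unsigned char*		buf,		/* out: cache row */
	const unsigned char*	default_rec,	/* record with defaults */
	const unsigned char*	fields)		/* field values to store */
{
	/* The fix: initialize the null bytes from the default record. */
	memcpy(buf, default_rec, SKETCH_NULL_BYTES);

	/* Store the field values after the null bytes. */
	memcpy(buf + SKETCH_NULL_BYTES, fields,
	       SKETCH_REC_LEN - SKETCH_NULL_BYTES);
}
```

Once every byte of the cache row is written, copying it back into the table
record buffer can no longer propagate uninitialized memory.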
for InnoDB plugin
dict_load_table(): Pass the correct tablespace flags to
fil_open_single_table_tablespace(). For ROW_FORMAT=COMPACT and REDUNDANT,
the tablespace flags are 0. The table flags would be 0 or DICT_TF_COMPACT.
In semi-consistent read, only unlock freshly locked non-matching records.
lock_rec_lock_fast(): Return LOCK_REC_SUCCESS,
LOCK_REC_SUCCESS_CREATED, or LOCK_REC_FAIL instead of TRUE/FALSE.
enum db_err: Add DB_SUCCESS_LOCKED_REC for indicating a successful
operation where a record lock was created.
lock_sec_rec_read_check_and_lock(),
lock_clust_rec_read_check_and_lock(), lock_rec_enqueue_waiting(),
lock_rec_lock_slow(), lock_rec_lock(), row_ins_set_shared_rec_lock(),
row_ins_set_exclusive_rec_lock(), sel_set_rec_lock(),
row_sel_get_clust_rec_for_mysql(): Return DB_SUCCESS_LOCKED_REC if a
new record lock was created. Adjust callers.
row_unlock_for_mysql(): Correct the function documentation.
row_prebuilt_t::new_rec_locks: Correct the documentation.
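
The return-value change can be illustrated with a small sketch. The sketch_
enums and functions are hypothetical simplifications; in particular, the slow
path is reduced to "the request has to wait".

```c
#include <assert.h>

/* Hypothetical stand-in for the tri-state fast-path result. */
enum sketch_lock_rec_req_status {
	SKETCH_LOCK_REC_FAIL,		/* fast path not applicable */
	SKETCH_LOCK_REC_SUCCESS,	/* lock was already held */
	SKETCH_LOCK_REC_SUCCESS_CREATED	/* a new record lock was created */
};

/* Hypothetical stand-in for the relevant error codes. */
enum sketch_db_err {
	SKETCH_DB_SUCCESS,
	SKETCH_DB_SUCCESS_LOCKED_REC,
	SKETCH_DB_LOCK_WAIT
};

static enum sketch_lock_rec_req_status
sketch_lock_rec_fast(int already_held, int conflicting)
{
	if (conflicting) {
		return(SKETCH_LOCK_REC_FAIL);
	}
	return(already_held
	       ? SKETCH_LOCK_REC_SUCCESS
	       : SKETCH_LOCK_REC_SUCCESS_CREATED);
}

/* Map the fast-path result to an error code for the callers. */
static enum sketch_db_err
sketch_lock_rec(int already_held, int conflicting)
{
	switch (sketch_lock_rec_fast(already_held, conflicting)) {
	case SKETCH_LOCK_REC_SUCCESS:
		return(SKETCH_DB_SUCCESS);
	case SKETCH_LOCK_REC_SUCCESS_CREATED:
		return(SKETCH_DB_SUCCESS_LOCKED_REC);
	default:
		return(SKETCH_DB_LOCK_WAIT);	/* simplified slow path */
	}
}

/* In semi-consistent read, unlock a non-matching record only if this
request freshly locked it.
@return 1 if the record lock should be released */
static int
sketch_should_unlock(enum sketch_db_err err, int rec_matches)
{
	return(err == SKETCH_DB_SUCCESS_LOCKED_REC && !rec_matches);
}
```

A TRUE/FALSE result cannot distinguish a lock that was already held from one
that was just created, so a caller could not tell whether unlocking the record
would release a lock taken earlier in the transaction.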
In semi-consistent read, only unlock freshly locked non-matching records.
Define DB_SUCCESS_LOCKED_REC for indicating a successful operation
where a record lock was created.
lock_rec_lock_fast(): Return LOCK_REC_SUCCESS,
LOCK_REC_SUCCESS_CREATED, or LOCK_REC_FAIL instead of TRUE/FALSE.
lock_sec_rec_read_check_and_lock(),
lock_clust_rec_read_check_and_lock(), lock_rec_enqueue_waiting(),
lock_rec_lock_slow(), lock_rec_lock(), row_ins_set_shared_rec_lock(),
row_ins_set_exclusive_rec_lock(), sel_set_rec_lock(),
row_sel_get_clust_rec_for_mysql(): Return DB_SUCCESS_LOCKED_REC if a
new record lock was created. Adjust callers.
row_unlock_for_mysql(): Correct the function documentation.
row_prebuilt_t::new_rec_locks: Correct the documentation.