Bug#13639204 64111: CRASH ON SELECT SUBQUERY WITH NON UNIQUE INDEX
The crash happened due to wrong calculation
of key length during creation of reference for
sort order index. The problem is that
keyuse->used_tables can have OUTER_REF_TABLE_BIT enabled
but used_tables parameter(create_ref_for_key() func) does
not have it. So key parts which have OUTER_REF_TABLE_BIT
are ommited and it could lead to incorrect key length
calculation(zero key length).
Fix the calculation of the next autoinc value when offset > 1. Some of the
results have changed due to the changes in the allocation calculation. The
new calculation will result in slightly bigger gaps for bulk inserts.
rb://866 Approved by Jimmy Yang.
Backported from mysql-trunk (5.6)
rb://942
approved by: Marko Makela
We don't need to scan LRU for dropping AHI entries when DROPing a table.
AHI entries are already removed when we free up extents for the btree.
IN LOCK_VALIDATE()
rb://917
approved by: Marko Makela
In lock_validate() the limit is used to release the kernel_mutex during
the validation, to obey the latching order.
If we do the limit++ then we are rechecking the same lock most times on
each iteration because limit is being incremented by one and
<space, page_no> will nearly always be > limit. If we set the limit
correctly to (space, page+1) then we are actually making progress
during the iteration.
truncating, inserting the same set of rows. When a table is
re-created with the same set of rows, the data file size must
not grow.
rb:968
Approved by Marko.
This bug has been there at least since MySQL 4.0.9. (Before 4.0.9, the
code probably was even more severely broken.)
btr_pcur_restore_position(): When cursor restoration fails, before
invoking btr_pcur_store_position() move to the previous or next record
unless cursor->rel_pos==BTR_PCUR_ON or the record was not a user
record.
This bug can cause skipped records when btr_pcur_store_position() is
called on the last record of a page. A symptom would be record count
mismatch in CHECK TABLE, or failure to find a record to delete-mark or
update or purge. The following operations should be affected by the
bug:
* row_search_for_mysql(): SELECT, UPDATE, REPLACE, CHECK TABLE,
(almost anything else than INSERT)
* foreign key CASCADE operations
* row_merge_read_clustered_index(): index creation (since MySQL 5.1
InnoDB Plugin)
* multi-threaded purge (after MySQL 5.5): not sure, but it might fail
to purge some records
Not all callers of btr_pcur_restore_position() should be affected.
Anything that asserts or checks that restoration succeeds is
unaffected. For example, cursor restoration on the change buffer tree
should always succeed, because access is being protected by additional
latches. Likewise, rollback, or any code accesses data dictionary
tables while holding dict_sys->mutex should be safe.
rb:967 approved by Jimmy Yang
There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
Backporting the fix from MySQL 5.5 to 5.1
rb:961
rb:947
There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
rb:947
approved by Jimmy Yang
The problem was introduced in 5.5.20 by Bug 13116225. It tried to
protect against downgrading from a version 5.6.4 database that was
created with a page size other than 16k. Version 5.6.4 supports
page sizes 4k and 8k and stamps that page size into the header page
of each tablespace file. Version 5.5.20 attempts to read that page
size in the file header.
But it turns out that only the first system tablespace file has a
reliable flags field in the header. So only ibdata1 can be or needs
to be tested for another page size. Extra system tablespace files
like ibdata2, ibdata3, etc do not and should not be tested since the
flags field is unreliable.
OF WIDE RECORDS
row_ins_index_entry_low(), row_upd_clust_rec(): Make a redo log
checkpoint if a DEBUG flag is set. Add DEBUG_SYNC around
btr_store_big_rec_extern_fields().
rb:946 approved by Jimmy Yang
The actual Bug#11754376 does not exist in MySQL 5.5 because at startup
we drop entries for temporary tables from InnoDB dictionary cache (only
if ROW_FORMAT is not REDUNDANT). But nevertheless the bug in
normalize_table_name_low() is present so we fix it.
GRACEFUL SHUTDOWN
During startup mysql picks up .frm files from the tmpdir directory and
tries to drop those tables in the storage engine.
The problem is that when tmpdir ends in / then ha_innobase::delete_table()
is passed a string like "/var/tmp//#sql123", then it wrongly normalizes it
to "/#sql123" and calls row_drop_table_for_mysql() which of course fails
to delete the table entry from the InnoDB dictionary cache.
ha_innobase::delete_table() returns an error but nevertheless mysql wipes
away the .frm file and the entry in the InnoDB dictionary cache remains
orphaned with no easy way to remove it.
The "no easy" way to remove it is to create a similar temporary table again,
copy its .frm file to tmpdir under "#sql123.frm" and restart mysqld with
tmpdir=/var/tmp (no trailing slash) - this way mysql will pick the .frm file
after restart and will try to issue drop table for "/var/tmp/#sql123"
(notice do double slash), ha_innobase::delete_table() will normalize it to
"tmp/#sql123" and row_drop_table_for_mysql() will successfully remove the
table entry from the dictionary cache.
The solution is to fix normalize_table_name_low() to normalize things like
"/var/tmp//table" correctly to "tmp/table".
This patch also adds a test function which invokes
normalize_table_name_low() with various inputs to make sure it works
correctly and a mtr test that calls this test function.
Reviewed by: Marko (http://bur03.no.oracle.com/rb/r/929/)
print page dump
buf_page_print(): Remove the ut_ad(0) from the beginning. Add two flags
(enum buf_page_print_flags) that can be bitwise-ORed together:
BUF_PAGE_PRINT_NO_CRASH:
Do not crash debug builds at the end of buf_page_print().
BUF_PAGE_PRINT_NO_FULL:
Do not print the full page dump. This can be useful when adding
diagnostic printout to flushing or to the doublewrite buffer.
trx_sys_doublewrite_init_or_restore_page(): Replace exit(1) with ut_error,
so that we can get a core dump if this extraordinary condition happens.
rb:924 approved by Sunny Bains
rb://914
approved by: Marko Makela
Poll in fil_rename_tablespace() after setting ::stop_ios flag can
result in a hang because the other thread actually dispatching the IO
won't wake IO helper threads or flush the tablespace before starting
wait in fil_mutex_enter_and_prepare_for_io().
This fix does not remove the underlying cause of the assertion
failure. It just works around the problem, allowing a corrupted
secondary index to be fixed by DROP INDEX and CREATE INDEX (or in the
worst case, by re-creating the table).
ibuf_delete(): If the record to be purged is the last one in the page
or it is not delete-marked, refuse to purge it. Instead, write an
error message to the error log and let a debug assertion fail.
ibuf_set_del_mark(): If the record to be delete-marked is not found,
display some more information in the error log and let a debug
assertion fail.
row_undo_mod_del_unmark_sec_and_undo_update(),
row_upd_sec_index_entry(): Let a debug assertion fail when the record
to be delete-marked is not found.
buf_page_print(): Add ut_ad(0) so that corruption will be more
prominent in stress testing with debug binaries. Add ut_ad(0) here and
there where corruption is noticed.
btr_corruption_report(): Display some data on page_is_comp() mismatch.
btr_assert_not_corrupted(): A wrapper around btr_corruption_report().
Assert that page_is_comp() agrees with the table flags.
rb:911 approved by Inaam Rana
rb://898
approved by: Marko Makela
On some kernel versions native aio operations are not supported on
tmpfs. Check this during start up and fall back to simulated aio.
ISSUES WITH COPYING PARTITIONED INNODB TABLES FROM LINUX TO WINDOWS
This problem was already fixed in mysql-trunk as part of bug #11755924. I am
backporting the fix to mysql-5.1.
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang
InnoDB: Remove HAVE_purify, UNIV_INIT_MEM_TO_ZERO, UNIV_SET_MEM_TO_ZERO.
The compile-time setting HAVE_purify can mask potential bugs.
It is being set in PB2 Valgrind runs. We should simply get rid of it,
and replace it with UNIV_MEM_INVALID() to declare uninitialized memory
as such in Valgrind-instrumented binaries.
os_mem_alloc_large(), ut_malloc_low(): Remove the parameter set_to_zero.
ut_malloc(): Define as a macro that invokes ut_malloc_low().
buf_pool_init(): Never initialize the buffer pool frames. All pages
must be initialized before flushing them to disk.
mem_heap_alloc(): Never initialize the allocated memory block.
os_mem_alloc_nocache(), ut_test_malloc(): Unused function, remove.
rb:813 approved by Jimmy Yang
CREATE TABLE bug13510739 (c INTEGER NOT NULL, PRIMARY KEY (c)) ENGINE=INNODB;
INSERT INTO bug13510739 VALUES (1), (2), (3), (4);
DELETE FROM bug13510739 WHERE c=2;
HANDLER bug13510739 OPEN;
HANDLER bug13510739 READ `primary` = (2);
HANDLER bug13510739 READ `primary` NEXT; <-- crash
The bug is that in the particular testcase row_search_for_mysql() picked up
a delete-marked record and quit, leaving the cursor non-positioned state and
on the subsequent 'get next' call the code crashed because of the
non-positioned cursor.
In row0sel.cc (line numbers from mysql-trunk):
4653 if (rec_get_deleted_flag(rec, comp)) {
...
4679 if (index == clust_index && unique_search) {
4680
4681 err = DB_RECORD_NOT_FOUND;
4682
4683 goto normal_return;
4684 }
it quit from here, not storing the cursor position.
In contrast, if the record=2 is not found at all (e.g. sleep(1) after DELETE
to let the purge wipe it away completely) then 'get = 2' does find record=3
and quits from here:
4366 if (0 != cmp_dtuple_rec(search_tuple, rec, offsets)) {
...
4394 btr_pcur_store_position(pcur, &mtr);
4395
4396 err = DB_RECORD_NOT_FOUND;
4397 #if 0
4398 ut_print_name(stderr, trx, FALSE, index->name);
4399 fputs(" record not found 3\n", stderr);
4400 #endif
4401
4402 goto normal_return;
Another fix could be to extend the condition on line 4366 to hold only if
seach_tuple matches rec AND if rec is not delete marked.
Notice that in the above test case if we wait about 1 second somewhere after
DELETE and before 'get = 2', then the testcase does not crash and returns 4
instead. Not sure if this is the correct behavior, but this bugfix removes
the crash and makes the code return what it also returns in the non-crashing
case (if rec=2 is not found during 'get = 2', e.g. we have sleep(1) there).
Approved by: Marko (http://bur03.no.oracle.com/rb/r/863/)
The counter handler_read_key (SSV::ha_read_key_count) is incremented
incorrectly.
The mysql server maintains a per thread system_status_var (SSV)
object. This object contains among other things the counter
SSV::ha_read_key_count. The purpose of this counter is to measure the
number of requests to read a row based on a key (or the number of
index lookups).
This counter was wrongly incremented in the
ha_innobase::innobase_get_index(). The fix removes
this increment statement (for both innodb and innodb_plugin).
The various callers of the innobase_get_index() was checked to
determine if anybody must increment this counter (if they first call
innobase_get_index() and then perform an index lookup). It was found
that no caller of innobase_get_index() needs to worry about the
SSV::ha_read_key_count counter.
This bug ensures that a live downgrade from an InnoDB 5.6 with WL5756 and
a database created with innodb-page-size=8k or 4k will make a version 5.5
installation politely refuse to start. Instead of crashing or giving some
indication about a corrupted database, it will indicate the page size
difference.
This patch takes only that part of the Wl5756 patch that is needed to
protect against opening a tablespace that is stamped with a different
page size.
It also contains the change in dict_index_find_on_id_low() just in case
a database with another page size is created by recompiling a pre-WL5756
InnoDB.
WITH LARGE BUFFER POOL
(Note: this a backport of revno:3472 from mysql-trunk)
rb://845
approved by: Marko
When dropping a table (with an .ibd file i.e.: with
innodb_file_per_table set) we scan entire LRU to invalidate pages from
that table. This can be painful in case of large buffer pools as we hold
the buf_pool->mutex for the scan. Note that gravity of the problem does
not depend on the size of the table. Even with an empty table but a
large and filled up buffer pool we'll end up scanning a very long LRU
list.
The fix is to scan flush_list and just remove the blocks belonging to
the table from the flush_list, marking them as non-dirty. The blocks
are left in the LRU list for eventual eviction due to aging. The
flush_list is typically much smaller than the LRU list but for cases
where it is very long we have the solution of releasing the
buf_pool->mutex after scanning 1K pages.
buf_page_[set|unset]_sticky(): Use new IO-state BUF_IO_PIN to ensure
that a block stays in the flush_list and LRU list when we release
buf_pool->mutex. Previously we have been abusing BUF_IO_READ to achieve
this.
Remove btr_cur_t::ibuf_cnt. The only dependence on cursor->ibuf_cnt
was ibuf_set_entry_counter(). That code path was removed in the
original bug fix (marko.makela@oracle.com-20111107072802-dgwagejlpub0rjkd).
rb:819 approved by Jimmy Yang
I manually checked that all the conflicting InnoDB changes are in 5.5 already.
Two things I am not sure about - I commented them with XXX in this patch.
I will further check with the authors of the changesets whether these things
should be present or not.
a.k.a. Bug#7975 deadlock without any locking, simple select and update
Bug#7975 was reintroduced when the storage engine API was made
pluggable in MySQL 5.1. Instead of looking at thd->lex directly, we
rely on handler::extra(). But, we were looking at the wrong extra()
flag, and we were ignoring the TRX_DUP_REPLACE flag in places where we
should obey it.
innodb_replace.test: Add tests for hopefully all affected statement
types, so that bug should never ever resurface. This kind of tests
should have been added when fixing Bug#7975 in MySQL 5.0.3 in the
first place.
rb:806 approved by Sunny Bains
ibuf_insert_low(), the only caller of ibuf_set_entry_counter(), will
have latched an insert buffer bitmap page in bitmap_mtr before
invoking ibuf_set_entry_counter(). The latching order forbids any
further pages to be latched.
ibuf_set_entry_counter(): Renamed to ibuf_get_entry_counter(),
simplified the code and added comments.
Added the following symbols for predefined field numbers in change
buffer records:
#define IBUF_REC_FIELD_SPACE 0 /*!< in the pre-4.1 format,
the page number. later, the space_id */
#define IBUF_REC_FIELD_MARKER 1 /*!< starting with 4.1, a marker
consisting of 1 byte that is 0 */
#define IBUF_REC_FIELD_PAGE 2 /*!< starting with 4.1, the
page number */
#define IBUF_REC_FIELD_METADATA 3 /* the metadata field */
#define IBUF_REC_FIELD_USER 4 /* first user field */
rb:802 approved by Sunny Bains
Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE()
The fix introduced potentially more severe crash recovery problems
than the bug causes. Revert the fix for now.
This was an attempt to address problems with the Bug#12612184 fix.
Even with this follow-up fix, crash recovery can be broken.
Let us fix the bug later.
In the ON UPDATE CASCADE clause of FOREIGN KEY constraints, the
calculated update vector was not fully initialized. This bug was
introduced in the InnoDB Plugin when implementing support for
ROW_FORMAT=DYNAMIC.
Additionally, the data type information was not initialized, but
apparently it has never been needed in this case. Nevertheless, it is
not good programming practice to pass uninitialized values around.
calc_row_difference(): Declare the update field uninitialized in
Valgrind. Copy the data type information as well, except when the
field is SQL NULL. In the built-in InnoDB, initialize
ufield->extern_storage = FALSE (an initialization bug that had gone
unnoticed this far). The InnoDB Plugin and later have this flag to
dfield_t and have always initialized it properly.
row_ins_cascade_calc_update_vec(): Reduce the scope of some
pointers. Initialize orig_len. (This caused the bug in InnoDB Plugin
and later.)
row_ins_foreign_check_on_constraint(): Simplify a condition. Declare
the update vector uninitialized.
rb:771 approved by Jimmy Yang
btr_record_not_null_field_in_rec(): Remove the parameter rec.
Use rec_offs_nth_sql_null() instead of rec_get_nth_field().
rb:788 approved by Jimmy Yang
rw_lock_x_lock_func(): Assert that the thread is not already holding
the lock in a conflicting mode (RW_LOCK_SHARED).
rw_lock_s_lock_func(): Assert that the thread is not already holding
the lock in a conflicting mode (RW_LOCK_EX).
Bug 12980094 - ASSERTION IN INNODB DETECTED IN RQG_PARTITION_DDL
Bug 13034534 - RQG TESTS FAIL ON WINDOWS WITH CRASH NEAR RW_LOCK_DEBUG_PRINT
All access to struct rw_lock_debug_struct must be protected by rw_lock_debug_mutex_enter().
Replace part of the patch that Kevin apparently forgot to push.
Fix the bug also in the built-in InnoDB of MySQL 5.1.
I cannot explain why the test case was not failing without the
full patch.
This was rb:762, approved by me.
The problem occurred when indexes are added between the time that an
UNDO record is created and the time that the purge thread comes around
and deletes the old secondary index entries. The purge thread would
hit an assert when trying to build a secondary index entry for
searching. The problem was that the old value of those fields were not
in the UNDO record since they were not part of an index when the UPDATE
occured.
A test case was added to innodb-index.test.
when buffered changes are to be discarded
ibuf_add_free_page(): Lower the latching order of the newly allocated page
to SYNC_IBUF_TREE_NODE_NEW after latching the insert buffer tree root.
This bug always was bogus UNIV_SYNC_DEBUG alarm. The function
buf_block_dbg_add_level() is a no-op unless UNIV_SYNC_DEBUG is defined.
Tweak the faulty UNIV_SYNC_DEBUG diagnostics a little bit more.
ibuf_add_free_page(): Lower the latching order of the newly allocated page
only after acquiring the ibuf_mutex.
This fix was accidentally pushed to mysql-5.1 after the 5.1.59 clone-off in
bzr revision id marko.makela@oracle.com-20110829081642-z0w992a0mrc62s6w
with the fix of Bug#12704861 Corruption after a crash during BLOB update
but not merged to mysql-5.5 and upwards.
In the Barracuda formats, the clustered index record no longer
contains a prefix of off-page columns. Because of this, the undo log
must contain these prefixes, so that purge and multi-versioning will
continue to work. However, this also means that an undo log record can
become too big to fit in an undo log page. (It is a limitation of the
undo log that undo records cannot span across multiple pages.)
In case the checks for undo log size fail when CREATE TABLE or CREATE
INDEX is executed, we need a fallback that blocks a modification
operation when the undo log record would exceed the maximum size.
trx_undo_free_last_page_func(): Renamed from trx_undo_free_page_in_rollback().
Define the trx_t parameter only in debug builds.
trx_undo_free_last_page(): Wrapper for trx_undo_free_last_page_func().
Pass the trx_t parameter only in debug builds.
trx_undo_truncate_end_func(): Renamed from trx_undo_truncate_end().
Define the trx_t parameter only in debug builds. Rewrite a for(;;) loop
as a while loop for clarity.
trx_undo_truncate_end(): Wrapper for from trx_undo_truncate_end_func().
Pass the trx_t parameter only in debug builds.
trx_undo_erase_page_end(): Return TRUE if the page was non-empty
to begin with. Refuse to erase empty pages.
trx_undo_report_row_operation(): If the page for which the undo log
was too big was empty, free the undo page and return DB_TOO_BIG_RECORD.
rb:749 approved by Inaam Rana
The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.
Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().
btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.
btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.
btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).
btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).
btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.
btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.
btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.
fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.
fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.
fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".
mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.
row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.
row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.
There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.
rb:693 approved by Sunny Bains
Also addressed issues in bug #11745133, where we could mark a table
corrupted instead of crashing the server when found a corrupted buffer/page
if the table created with innodb_file_per_table on.