There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
Backporting the fix from MySQL 5.5 to 5.1
rb:961
rb:947
This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.
This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.
When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.
Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.
The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.
The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.
In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.
btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.
btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().
btr_page_free(): Add a debug assertion that the page was a B-tree page.
btr_lift_page_up(): Return the father block.
btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.
btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.
btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.
btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.
fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.
xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.
fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().
fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.
fsp_free_page(): Add ut_ad(0) to the error outcomes.
fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.
fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().
fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.
fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.
fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().
buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.
mtr_t: Add n_freed_pages (number of pages that have been freed).
page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().
trx_undo_rec_copy(): Add a debug assertion for the length.
trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.
trx_undo_report_row_operation(): Add debug assertions.
trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.
page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().
row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.
row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.
row_build(): Tighten a debug assertion about null BLOB pointers.
row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.
rb:939 approved by Jimmy Yang
IS EXECUTED TWICE FROM P
This bug is a duplicate of bug 12567331, which was pushed to the
optimizer backporting tree on 2011-06-11. This is just a back-port of
the fix. Both test cases are included as they differ somewhat.
GRACEFUL SHUTDOWN
During startup mysql picks up .frm files from the tmpdir directory and
tries to drop those tables in the storage engine.
The problem is that when tmpdir ends in / then ha_innobase::delete_table()
is passed a string like "/var/tmp//#sql123", then it wrongly normalizes it
to "/#sql123" and calls row_drop_table_for_mysql() which of course fails
to delete the table entry from the InnoDB dictionary cache.
ha_innobase::delete_table() returns an error but nevertheless mysql wipes
away the .frm file and the entry in the InnoDB dictionary cache remains
orphaned with no easy way to remove it.
The "no easy" way to remove it is to create a similar temporary table again,
copy its .frm file to tmpdir under "#sql123.frm" and restart mysqld with
tmpdir=/var/tmp (no trailing slash) - this way mysql will pick the .frm file
after restart and will try to issue drop table for "/var/tmp/#sql123"
(notice do double slash), ha_innobase::delete_table() will normalize it to
"tmp/#sql123" and row_drop_table_for_mysql() will successfully remove the
table entry from the dictionary cache.
The solution is to fix normalize_table_name_low() to normalize things like
"/var/tmp//table" correctly to "tmp/table".
This patch also adds a test function which invokes
normalize_table_name_low() with various inputs to make sure it works
correctly and a mtr test that calls this test function.
Reviewed by: Marko (http://bur03.no.oracle.com/rb/r/929/)
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
KEY HANDLING ON SUBSEQUENT CREATE TABLE IF NOT EXISTS
PROBLEM:
--------
Consider a SP routine which does CREATE TABLE
with REFERENCES clause. The first call to this routine
invokes parser and the parsed items are cached, so as
to avoid parsing for the second execution of the routine.
It is obsevered that valgrind reports a warning
upon read of thd->lex->alter_info->key_list->Foreign_key object,
which seem to be pointing to a invalid memory address
during second time execution of the routine. Accessing this object
theoretically could cause a crash.
ANALYSIS:
---------
The problem stems from the fact that for some reason
elements of ref_columns list in thd->lex->alter_info->
key_list->Foreign_key object are changed to point to
objects allocated on runtime memory root.
During the first execution of routine we create
a copy of thd->lex->alter_info object.
As part of this process we create a clones of objects in
Alter_info::key_list and of Foreign_key object in particular.
Then Foreign_key object is cloned for some reason we
perform shallow copies of both Foreign_key::ref_columns
and Foreign_key::columns list. So new instance of
Foreign_key object starts to SHARE contents of ref_columns
and columns list with the original instance.
After that as part of cloning process we call
list_copy_and_replace_each_value() for elements of
ref_columns list. As result ref_columns lists in both
original and cloned Foreign_key object start to contain
pointers to Key_part_spec objects allocated on runtime
memory root because of shallow copy.
So when we start copying of thd->lex->alter_info object
during the second execution of stored routine we indeed
encounter pointer to the Key_part_spec object allocated
on runtime mem-root which was cleared during at the end
of previous execution. This is done in sp_head::execute(),
by a call to free_root(&execute_mem_root,MYF(0));
As result we get valgrind warnings about accessing
unreferenced memory.
FIX:
----
The safest solution to this problem is to
fix Foreign_key(Foreign_key, MEM_ROOT) constructor to do
a deep copy of columns lists, similar to Key(Key, MEM_ROOT)
constructor.
Bug#13011410 CRASH IN FILESORT CODE WITH GROUP BY/ROLLUP
The assert in 13580775 is visible in 5.6 only,
but shows that all versions are vulnerable.
13011410 crashes in all versions.
filesort tries to re-use the sort buffer between invocations in order to save
malloc/free overhead.
The fix for Bug 11748783 - 37359: FILESORT CAN BE MORE EFFICIENT.
added an assert that buffer properties (num_records, record_length) are
consistent between invocations. Indeed, they are not necessarily consistent.
Fix: re-allocate the sort buffer if properties change.
BUG#13519696 - 62940: SELECT RESULTS VARY WITH VERSION AND
WITH/WITHOUT INDEX RANGE SCAN
BUG#13453382 - REGRESSION SINCE 5.1.39, RANGE OPTIMIZER WRONG
RESULTS WITH DECIMAL CONVERSION
BUG#13463488 - 63437: CHAR & BETWEEN WITH INDEX RETURNS WRONG
RESULT AFTER MYSQL 5.1.
Those are all cases where the range optimizer got it wrong
with > and >=.
- Reverting the patch for Bug # 12584302
The patch will be reverted in 5.1 and 5.5.
The patch will not be reverted in 5.6, the change will
be properly documented in 5.6.
- Backporting DBUG_ASSERT not to crash on '0000-01-00'
(already fixed in mysql-trunk (5.6))
Introducing new collations:
utf8_general_mysql500_ci and ucs2_general_mysql500_ci,
to reproduce behaviour of utf8_general_ci and ucs2_general_ci
from mysql-5.1.23 (and earlier).
The collations are added to simplify upgrade from mysql-5.1.23 and earlier.
Note: The patch does not make new server start over old data automatically.
Some manual upgrade procedures are assumed.
Paul: please get in touch with me to discuss upgrade procedures
when documenting this bug.
modified:
include/m_ctype.h
mysql-test/r/ctype_utf8.result
mysql-test/t/ctype_utf8.test
mysys/charset-def.c
strings/ctype-ucs2.c
strings/ctype-utf8.c
Test extra/rpl_tests/rpl_extra_col_master.test (used by
rpl_extra_col_master_*) ends with the active connection pointing to the
slave. Thus, the two last tests never succeed in changing the binlog
format of the master away from 'row'. With correct active connection
(master) tests fail for binlog 'statement' and 'mixed' formats.
Tests rpl_extra_col_master_* only run when binary log format is
row. Statement and mix replication do not make sense in this
tests since it will try to execute statements on columns that do
not exist. This fix is basically a backport from mysql-5.5, see
changes done as part of BUG 39934.
routines.
mysqldump in xml mode did not dump routines, events or
triggers.
This patch fixes this issue by fixing the if conditions
that disallowed the dump of above mentioned objects in
xml mode, and added the required code to enable dump
in xml format.
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang
------------------------------------------------------------
revno: 3258
committer: Jon Olav Hauglid <jon.hauglid@oracle.com>
branch nick: mysql-trunk-bug12663165
timestamp: Thu 2011-07-14 10:05:12 +0200
message:
Bug#12663165 SP DEAD CODE REMOVAL DOESN'T UNDERSTAND CONTINUE HANDLERS
When stored routines are loaded, a simple optimizer tries to locate
and remove dead code. The problem was that this dead code removal
did not work correctly with CONTINUE handlers.
If a statement triggers a CONTINUE handler, the following statement
will be executed after the handler statement has completed. This
means that the following statement is not dead code even if the
previous statement unconditionally alters control flow. This fact
was lost on the dead code removal routine, which ended up with
removing instructions that could have been executed. This could
then lead to assertions, crashes and generally bad behavior when
the stored routine was executed.
This patch fixes the problem by marking as live code all stored
routine instructions that are in the same scope as a CONTINUE handler.
Test case added to sp.test.
If init_command was incorrect, we couldn't let users execute
queries, but we couldn't report the issue to the client either
as it does not expect error messages before even sending a
command. Thus, we simply disconnected them without throwing
a clear error.
We now go through the proper sequence once (without executing
any user statements) so we can report back what the problem
is. Only then do we disconnect the user.
As always, root remains unaffected by this as init_command is
(still) not executed for them.
CREATE TABLE bug13510739 (c INTEGER NOT NULL, PRIMARY KEY (c)) ENGINE=INNODB;
INSERT INTO bug13510739 VALUES (1), (2), (3), (4);
DELETE FROM bug13510739 WHERE c=2;
HANDLER bug13510739 OPEN;
HANDLER bug13510739 READ `primary` = (2);
HANDLER bug13510739 READ `primary` NEXT; <-- crash
The bug is that in the particular testcase row_search_for_mysql() picked up
a delete-marked record and quit, leaving the cursor non-positioned state and
on the subsequent 'get next' call the code crashed because of the
non-positioned cursor.
In row0sel.cc (line numbers from mysql-trunk):
4653 if (rec_get_deleted_flag(rec, comp)) {
...
4679 if (index == clust_index && unique_search) {
4680
4681 err = DB_RECORD_NOT_FOUND;
4682
4683 goto normal_return;
4684 }
it quit from here, not storing the cursor position.
In contrast, if the record=2 is not found at all (e.g. sleep(1) after DELETE
to let the purge wipe it away completely) then 'get = 2' does find record=3
and quits from here:
4366 if (0 != cmp_dtuple_rec(search_tuple, rec, offsets)) {
...
4394 btr_pcur_store_position(pcur, &mtr);
4395
4396 err = DB_RECORD_NOT_FOUND;
4397 #if 0
4398 ut_print_name(stderr, trx, FALSE, index->name);
4399 fputs(" record not found 3\n", stderr);
4400 #endif
4401
4402 goto normal_return;
Another fix could be to extend the condition on line 4366 to hold only if
seach_tuple matches rec AND if rec is not delete marked.
Notice that in the above test case if we wait about 1 second somewhere after
DELETE and before 'get = 2', then the testcase does not crash and returns 4
instead. Not sure if this is the correct behavior, but this bugfix removes
the crash and makes the code return what it also returns in the non-crashing
case (if rec=2 is not found during 'get = 2', e.g. we have sleep(1) there).
Approved by: Marko (http://bur03.no.oracle.com/rb/r/863/)
The counter handler_read_key (SSV::ha_read_key_count) is incremented
incorrectly.
The mysql server maintains a per thread system_status_var (SSV)
object. This object contains among other things the counter
SSV::ha_read_key_count. The purpose of this counter is to measure the
number of requests to read a row based on a key (or the number of
index lookups).
This counter was wrongly incremented in the
ha_innobase::innobase_get_index(). The fix removes
this increment statement (for both innodb and innodb_plugin).
The various callers of the innobase_get_index() was checked to
determine if anybody must increment this counter (if they first call
innobase_get_index() and then perform an index lookup). It was found
that no caller of innobase_get_index() needs to worry about the
SSV::ha_read_key_count counter.
SMALL KEY CACHE
The server crashed on division by zero because the key cache was not
initialized and the block length was 0 which was used in a division.
The fix was to not allow CACHE INDEX if the key cache was not initiallized.
Thus never try LOAD INDEX INTO CACHE for an uninitialized key cache.
Also added some windows files/directories to .bzrignore.
AND HANG IN SHOW TABLE STATUS.
ISSUE: Table corruption due to concurrent queries.
Different threads running insert and check
query leads to table corruption. Not properly locked,
rows are inserted in between check query.
SOLUTION: In check query mutex lock is acquired
for a longer time to handle concurrent
insert and check query.
NOTE: Additionally we backported the fix for CHECKSUM
issue(bug#11758979).
rb://816
approved by: Marko Makela
The title is misleading. This bug was actually introduced by
bug 12635227 and was unearthed by a later optimization.
We need to free buf_page_t structs that we are allocating using
malloc() at shutdown.
The bug was accidentally fixed by fixing
Bug#11759688 52020: InnoDB can still deadlock on just INSERT...ON DUPLICATE KEY
a.k.a. the reintroduction of
Bug#7975 deadlock without any locking, simple select and update
a.k.a. Bug#7975 deadlock without any locking, simple select and update
Bug#7975 was reintroduced when the storage engine API was made
pluggable in MySQL 5.1. Instead of looking at thd->lex directly, we
rely on handler::extra(). But, we were looking at the wrong extra()
flag, and we were ignoring the TRX_DUP_REPLACE flag in places where we
should obey it.
innodb_replace.test: Add tests for hopefully all affected statement
types, so that bug should never ever resurface. This kind of tests
should have been added when fixing Bug#7975 in MySQL 5.0.3 in the
first place.
rb:806 approved by Sunny Bains
This was an attempt to address problems with the Bug#12612184 fix.
Even with this follow-up fix, crash recovery can be broken.
Let us fix the bug later.