When a new primary key is added to an InnoDB table, then the following
steps are taken by InnoDB plugin:
. let t1 be the original table.
. a temporary table t1@00231 will be created by cloning t1.
. all data will be copied from t1 to t1@00231.
. rename t1 to t1@00232.
. rename t1@00231 to t1.
. drop t1@00232.
The rename and drop operations involve file operations. But file operations
cannot be rolled back. So in row_merge_rename_tables(), just after doing data
dictionary update and before doing any file operations, generate redo logs
for file operations and commit the transaction. This will ensure that any
crash after this commit, the table is still recoverable by moving .ibd and
.frm files. Manual recovery is required.
During recovery, the rename file operation redo logs are processed.
Previously this was being ignored.
rb://1460 approved by Marko Makela.
Brief description: After insert some rows to MEMORY table with HASH key some
rows can't be deleted in one step.
Problem Analysis/solution: info->current_ptr will have the information about the
current hash pointer from where we can traverse to the list to get all the
remaining tuples.
In hp_delete_key we are updating info->current_ptr with the last_pos based on
the flag parameter(which is the keydef and last index are same). As part of the
fix we are making it to zero only when the code flow reaches to the end of the
function hp_delete_key() it means that the next record which has to get deleted
will be at the starting of the list so, that in the next call to
read record(heap_rnext()) will take line number 100 path instead of 102 path,
please see the below code in file hp_rnext.c, function heap_rnext().
99 else if (!info->current_ptr) /* Deleted or first call */
100 pos= hp_search(info, keyinfo, info->lastkey, 0);
101 else
102 pos= hp_search(info, keyinfo, info->lastkey, 1);
with that change the hp_search() will update the info->current_ptr with the
record which needs to be deleted.
Problem description:
mysql server crashes when we run repair table on currupted table.
Analysis:
The problem with this bug seem to be key_reflength out of bounds
(186 according to debugger). We read this value from meta-data
segment of .MYI file while doing mi_open().
If you look into _mi_kpointer() you can see that the upper limit
for key_reflength is 7.
Solution:
In mi_open() there is a line like:
if (share->base.keystart > 65535 || share->base.rec_reflength > 8)
we should verify key_reflength here as well.
TO 'MYISAM_SORT_BUFFER_SIZE'
Problem: 'myisam_sort_buffer_size' is a parameter used by
mysqld program only whereas 'sort_buffer_size' is used by
mysqld and myisamchk programs. But the error message printed
when myisamchk program is run with insufficient buffer size
is myisam_sort_buffer_size is too small which may mislead to the
server parameter myisam_sort_buffer_size.
SOLUTION: A parameter 'myisam_sort_buffer_size' is added as an
alias for 'sort_buffer_size' and the 'sort_buffer_size' parameter
is marked as deprecated. So myisamchk also has both the parameters
with the same role.
TO 'MYISAM_SORT_BUFFER_SIZE'
Problem: 'myisam_sort_buffer_size' is a parameter used by
mysqld program only whereas 'sort_buffer_size' is used by
mysqld and myisamchk programs. But the error message printed
when myisamchk program is run with insufficient buffer size
is myisam_sort_buffer_size is too small which may mislead to the
server parameter myisam_sort_buffer_size.
SOLUTION: A parameter 'myisam_sort_buffer_size' is added as an
alias for 'sort_buffer_size' and the 'sort_buffer_size' parameter
is marked as deprecated. So myisamchk also has both the parameters
with the same role.
IN ALTER TABLE ... ADD UNIQUE KEY
A bogus debug assertion failure occurred when reporting a duplicate
key on a column prefix of a CHAR column.
This is a regression from Bug#14729221 IN-PLACE ALTER TABLE REPORTS ''
INSTEAD OF REAL DUPLICATE VALUE FOR PREFIX KEYS. The assertion is only
present when UNIV_DEBUG is defined (which it is in debug builds
starting from MySQL 5.5). It is a case of overasserting.
Fix approved by Inaam Rana on IM.
LEN <= SIZEOF(ULONGLONG)
This bug was caught in the WL#6255 ALTER TABLE...ADD COLUMN in MySQL
5.6, but there is a bug in all InnoDB versions that support
auto-increment columns.
row_search_autoinc_read_column(): When reading the maximum value of
the auto-increment column, and the column only contains NULL values,
return 0. This corresponds to the case when the table is empty in
row_search_max_autoinc().
rb:1415 approved by Sunny Bains
REAL DUPLICATE VALUE FOR PREFIX KEYS
innobase_rec_to_mysql(): Invoke dict_index_get_nth_col_or_prefix_pos()
instead of dict_index_get_nth_col_pos() to find the column.
SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE
LOOKUPS (honoring kill query while accessing sec_index)
If secondary index is being used for select query evaluation and this
query is operating with consistent read snapshot it might take good time for
secondary index to return back control to mysql as MVCC would kick in.
If user issues "kill query <id>" while query is actively accessing
secondary index it will not be honored as there is no hook to check
for this condition. Added hook for this check.
-----
Parallely secondary index taking too long to evaluate for consistent
read snapshot case is being examined for performance improvement. WL#6540.
SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE
LOOKUPS (honoring kill query while accessing sec_index)
If secondary index is being used for select query evaluation and this
query is operating with consistent read snapshot it might take good time for
secondary index to return back control to mysql as MVCC would kick in.
If user issues "kill query <id>" while query is actively accessing
secondary index it will not be honored as there is no hook to check
for this condition. Added hook for this check.
-----
Parallely secondary index taking too long to evaluate for consistent
read snapshot case is being examined for performance improvement. WL#6540.
The problem is in the error handling in row_create_table_for_mysql().
In the 'disk full' case we may forget to call dict_mem_table_free() on
the table object.
Approved by: Marko (rb:1377 and rb:1386)
CONSISTENT SNAPSHOT OPTION
A transaction is started with a consistent snapshot. After
the transaction is started new indexes are added to the
table. Now when we issue an update statement, the optimizer
chooses an index. When the index scan is being initialized
via ha_innobase::change_active_index(), InnoDB reports
the error code HA_ERR_TABLE_DEF_CHANGED, with message
stating that "insufficient history for index".
This error message is propagated up to the SQL layer. But
the my_error() api is never called. The statement level
diagnostics area is not updated with the correct error
status (it remains in Diagnostics_area::DA_EMPTY).
Hence the following check in the Protocol::end_statement()
fails.
516 case Diagnostics_area::DA_EMPTY:
517 default:
518 DBUG_ASSERT(0);
519 error= send_ok(thd->server_status, 0, 0, 0, NULL);
520 break;
The fix is to backport the fix of bugs 14365043, 11761652
and 11746399.
14365043 PROTOCOL::END_STATEMENT(): ASSERTION `0' FAILED
11761652 HA_RND_INIT() RESULT CODE NOT CHECKED
11746399 RETURN VALUES OF HA_INDEX_INIT() AND INDEX_INIT() IGNORED
rb://1227 approved by guilhem and mattiasj.
We did not allocate enough bits for index->trx_id_offset, causing an
UPDATE or DELETE of a table with a PRIMARY KEY longer than 1024 bytes
to corrupt the PRIMARY KEY.
dict_index_t: Allocate enough bits.
dict_index_build_internal_clust(): Check for overflow of
index->trx_id_offset. Trip a debug assertion when overflow occurs.
rb:1380 approved by Jimmy Yang
TRANSACTION ROLLBACK
Description: During the rollback operation, a blob page
is removed earlier than desired. Consider following scenario:
1. create table t1(a int primary key,b blob) engine=innodb;
2. insert into t1 values (1,repeat('b',9000));
3. begin;
4. update t1 set b=concat(b,'b');
5. update t1 set a=a+1;
6. insert into t1 values (1,repeat('b',9000));
7. rollback;
The update operation in line 5 produces 2 undo log record. The first
undo record (TRX_UNDO_DEL_MARK_REC) goes to trx->update_undo and the
second undo record (TRX_UNDO_INSERT_REC) goes to trx->insert_undo.
During rollback, they are executed out of order.
When the undo record TRX_UNDO_DEL_MARK_REC is applied/executed,
the blob ownership is also reset. Because of this the blob page
is released earlier than desired. This blob page must have been
freed only as part of applying/executing the undo record
TRX_UNDO_INSERT_REC.
This problem can be avoided by executing the undo records in
order. This patch will make innodb to execute the undo records
in order.
rb://1125 approved by Marko.
Delete-mark change buffer records when resorting to a pessimistic
delete from the change buffer B-tree. Skip delete-marked records in
the change buffer merge and when estimating whether an operation can
be buffered. Without this fix, we could try to apply the same buffered
changes multiple times if the server was killed at the right moment.
In MySQL 5.5 and later: ibuf_get_volume_buffered_count_func(): Ignore
delete-marked (already processed) records.
ibuf_delete_rec(): Add a crash point before optimistic delete. If the
optimistic delete fails, flag the record processed before
mtr_commit().
ibuf_merge_or_delete_for_page(): Ignore delete-marked (already
processed) records.
Backport to 5.1: Rename btr_cur_del_unmark_for_ibuf() to
btr_cur_set_deleted_flag_for_ibuf() and add a parameter.
rb:1307 approved by Jimmy Yang
page_zip_validate(), page_zip_validate_low(): Add a parameter for the
B-tree index.
page_zip_validate_low(): If the page contents does not match, check
that the record link chains match. Furthermore, if dict_index_t is
passed, check that the records match. (This reduces coverage a bit: if
index=NULL, we will ignore differences in record contents, that is,
the page payload.)
rb:1264 approved by Inaam Rana
rb://1293
approved by: Marko Makela
There is race when dropping a single table tablespace where a reader
thread can initiate a read request before the delete flag is set and
before it is finished the deleting thread can attempt to free the
fil_node.
This patch checks the status in fil_io() to make sure that the
tablespace is not being deleted. If it is being deleted then
an error is returned instead of attempting IO.
THOUGH IT IS NOT.
The following error message is misleading because it claims
that the BLOB space is not counted.
"ERROR 1118 (42000): Row size too large. The maximum row size for
the used table type, not counting BLOBs, is 8126. You have to
change some columns to TEXT or BLOBs"
When the ROW_FORMAT=compact or ROW_FORMAT=REDUNDANT is used,
the BLOB prefix is stored inline along with the row. So
the above error message is changed as follows depending on
the row format used:
For ROW_FORMAT=COMPRESSED or ROW_FORMAT=DYNAMIC, the error
message is as follows:
"ERROR 42000: Row size too large (> 8126). Changing some
columns to TEXT or BLOB may help. In current row format,
BLOB prefix of 0 bytes is stored inline."
For ROW_FORMAT=COMPACT or ROW_FORMAT=REDUNDANT, the error
message is as follows:
"ERROR 42000: Row size too large (> 8126). Changing some
columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPRESSED may help. In current row
format, BLOB prefix of 768 bytes is stored inline."
rb://1252 approved by Marko Makela
PAGE SPLIT
page_rec_get_nth_const(): Map nth==0 to the page infimum.
btr_compress(adjust=TRUE): Add a debug assertion for nth>0. The cursor
should never be positioned on the page infimum.
btr_index_page_validate(): Add test instrumentation for checking the
return values of page_rec_get_nth_const() during CHECK TABLE, and for
checking that the page directory slot 0 always contains only one
record, the predefined page infimum record.
page_cur_delete_rec(), page_delete_rec_list_end(): Add debug
assertions guarding against accessing the page slot 0.
page_copy_rec_list_start(): Clarify a comment about ret_pos==0.
rb:1248 approved by Jimmy Yang
ha_innodb::records_in_range(): Remove a debug assertion
that prohibits an open range (full table).
The patch by Jorgen Loland only removed the assertion from the
built-in InnoDB, not from the InnoDB Plugin.
ha_innobase::records_in_range(): Remove a debug assertion
that prohibits an open range (full table).
This assertion catches unnecessary calls to this method,
but such calls are not harming correctness.
The ha_innobase table handler contained two search key buffers
(srch_key_val1, srch_key_val2) of fixed size used to store the search
key. The size of these buffers where fixed at
REC_VERSION_56_MAX_INDEX_COL_LEN + 2. But this size is not sufficient
to hold the search key. Hence the following assert in
row_sel_convert_mysql_key_to_innobase() failed.
2438 /* Storing may use at most data_len bytes of buf */
2439
2440 if (UNIV_LIKELY(!is_null)) {
2441 ut_a(buf + data_len <= original_buf + buf_len);
2442 row_mysql_store_col_in_innobase_format(
2443 dfield, buf,
2444 FALSE, /* MySQL key value format col */
2445 key_ptr + data_offset, data_len,
2446 dict_table_is_comp(index->table));
2447 buf += data_len;
2448 }
The buffer size is now calculated with the formula
MAX_KEY_LENGTH + MAX_REF_PARTS*2. This properly takes into account
the extra bytes needed to store the length for each column. An index
can contain a maximum of MAX_REF_PARTS columns in it, and for each
column 2 bytes are needed to store length.
rb://1238 approved by Marko and Vasil Dimov.
Backport from mysql-5.6 the fix
(revision-id sunny.bains@oracle.com-20120315045831-20rgfa4cozxmz7kz)
Bug#13839886 - CRASH IN INNOBASE_NEXT_AUTOINC
The assertion introduce in the fix for Bug#13817703
is too strong, a negative number can be greater
than the column max value, when the column value is
a negative number.
rb://978 Approved by Jimmy Yang.
rb:1236 approved by Marko Makela
WARNING
This patch is for mysql-5.5 only,
to be null-merged to mysql-5.6 and mysql-trunk.
This is a partial rollback of the file io instrumentation,
removing the instrumentation for mysql_file_stat in the archive engine.
See the bug comments for details.
HEURISTICS FOR COMPRESSED PAGE SIZE
The fix of Bug#12845774 was supposed to skip known-to-fail
btr_cur_optimistic_insert() calls. There was only one such call, in
btr_cur_pessimistic_update(). All other callers of
btr_cur_pessimistic_insert() would release and reacquire the B-tree
page latch before attempting the pessimistic insert. This would allow
other threads to restructure the B-tree, allowing (and requiring) the
insert to succeed as an optimistic (single-page) operation.
Failure to attempt an optimistic insert before a pessimistic one would
trigger an attempt to split an empty page.
rb:1234 approved by Sunny Bains
Facebook got a case where the page compresses really well so that
btr_cur_optimistic_update() returns DB_UNDERFLOW, but when a record
gets updated, the compression rate radically changes so that
btr_cur_insert_if_possible() can not insert in place despite
reorganizing/recompressing the page, leading to the assertion failing.
rb:1220 approved by Sunny Bains
COMPRESSED PAGE SIZE
This was submitted as MySQL Bug 61456 and a patch provided by
Facebook. This patch follows the same idea, but instead of adding a
parameter to btr_cur_pessimistic_insert(), we simply remove the
btr_cur_optimistic_insert() call there and add it to the only caller
that needs it.
btr_cur_pessimistic_insert(): Do not try btr_cur_optimistic_insert().
btr_insert_on_non_leaf_level_func(): Invoke btr_cur_optimistic_insert()
before invoking btr_cur_pessimistic_insert().
btr_cur_pessimistic_update(): Clarify in a comment why it is not
necessary to invoke btr_cur_optimistic_insert().
btr_root_raise_and_insert(): Assert that the root page is not empty.
This could happen if a pessimistic insert (involving a split or merge)
is performed without first attempting an optimistic (intra-page) insert.
rb:1219 approved by Sunny Bains
btr_cur_optimistic_insert(): Remove a bogus assertion. The insert may
fail after reorganizing the page.
btr_cur_optimistic_update(): Do not attempt to reorganize compressed pages,
because compression may fail after reorganization.
page_copy_rec_list_start(): Use page_rec_get_nth() to restore to the
ret_pos, which may also be the page infimum.
rb:1221
IN QUERIES
This bug was caused by an incorrect fix of
Bug#13807811 BTR_PCUR_RESTORE_POSITION() CAN SKIP A RECORD
There was nothing wrong with btr_pcur_restore_position(), but with the
use of it in the table scan during index creation.
rb:1206 approved by Jimmy Yang
Problem description:
Table 't' created with two colums having compound index on both the
columns under innodb/myisam engine at remote machine. In the local
machine same table is created undet the federated engine.
A select having where clause with along 'AND' operation gives wrong
results on local machine.
Analysis:
The given query at federated engine is wrongly transformed by
federated::create_where_from_key() function and the same was sent to
the remote machine. Hence the local machine is showing wrong results.
Given query "select c1 from t where c1 <= 2 and c2 = 1;"
Query transformed, after ha_federated::create_where_from_key() function is:
SELECT `c1`, `c2` FROM `t` WHERE (`c1` IS NOT NULL ) AND
( (`c1` >= 2) AND (`c2` <= 1) ) and the same sent to real_query().
In the above the '<=' and '=' conditions were transformed to '>=' and
'<=' respectively.
ha_federated::create_where_from_key() function behaving as below:
The key_range is having both the start_key and end_key. The start_key
is used to get "(`c1` IS NOT NULL )" part of the where clause, this
transformation is correct. The end_key is used to get "( (`c1` >= 2)
AND (`c2` <= 1) )", which is wrong, here the given conditions('<=' and '=')
are changed as wrong conditions('>=' and '<=').
The end_key is having {key = 0x39fa6d0 "", length = 10, keypart_map = 3,
flag = HA_READ_AFTER_KEY}
The store_length is having value '5'. Based on store_length and length
values the condition values is applied in HA_READ_AFTER_KEY switch case.
The switch case 'HA_READ_AFTER_KEY' is applicable to only the last part of
the end_key and for previous parts it is going to 'HA_READ_KEY_OR_NEXT' case,
here the '>=' is getting added as a condition instead of '<='.
Fix:
Updated the 'if' condition in 'HA_READ_AFTER_KEY' case to affect for all
parts of the end_key. i.e 'i > 0' will used for end_key, Hence added it in
the if condition.
Backporting the WL#5716, "Information schema table for InnoDB
buffer pool information". Backporting revisions 2876.244.113,
2876.244.102 from mysql-trunk.
rb://1175 approved by Jimmy Yang.
Backporting the WL#5716, "Information schema table for InnoDB
buffer pool information". Backporting revisions 2876.244.113,
2876.244.102 from mysql-trunk.
rb://1177 approved by Jimmy Yang.
ISSUE: Incorrect key file. Key file is corrupted,
Reading incorrect key information (keyseg)
from index file. Key definition in .MYI
and .FRM file differs. Starting pointer
to read the keyseg information is changed
to a value greater than the pack_reclength.
Memcpy tries to read keyseg information from
unallocated memory which causes the crash.
SOLUTION: One more check added to compare the
the key definition in .MYI and .FRM
file. If the definition differ, server
produces an error.
primary key with innodb tables
The bug was triggered if a single ALTER TABLE statement both
added and dropped indexes and ALTER TABLE failed during drop
(e.g. because the index was needed in a foreign key constraint).
In such cases, the server index information would get out of
sync with InnoDB - the added index would be present inside
InnoDB, but not in the server. This could then lead to InnoDB
error messages and/or server crashes.
The root cause is that new indexes are added before old indexes
are dropped. This means that if ALTER TABLE fails while dropping
indexes, index changes will be reverted in the server but not
inside InnoDB.
This patch fixes the problem by dropping any added indexes
if drop fails (for ALTER TABLE statements that both adds
and drops indexes).
However, this won't work if we added a primary key as this
key might not be possible to drop inside InnoDB. Therefore,
we resort to the copy algorithm if a primary key is added
by an ALTER TABLE statement that also drops an index.
In 5.6 this bug is more properly fixed by the handler interface
changes done in the scope of WL#5534 "Online ALTER".
ISSUE: Incorrect key file. Key file is corrupted,
Reading incorrect key information (keyseg)
from index file. Key definition in .MYI
and .FRM file differs. Starting pointer
to read the keyseg information is changed
to a value greater than the pack_reclength.
Memcpy tries to read keyseg information from
unallocated memory which causes the crash.
SOLUTION: One more check added to compare the
the key definition in .MYI and .FRM
file. If the definition differ, server
produces an error.
Changes in the InnoDB codebase required to compile and
integrate the MEB codebase with MySQL 5.5.
@ storage/innobase/btr/btr0btr.c
Excluded buffer pool usage from MEB build.
buf_pool_from_bpage calls are in buf0buf.ic, and
the buffer pool functions from that file are
disabled in MEB.
@ storage/innobase/buf/buf0buf.c
Disabling more buffer pool functions unused in MEB.
@ storage/innobase/dict/dict0dict.c
Disabling dict_ind_free that is unused in MEB.
@ storage/innobase/dict/dict0mem.c
The include
#include "ha_prototypes.h"
Was causing conflicts with definitions in my_global.h
Linking C executable mysqlbackup
libinnodb.a(dict0mem.c.o): In function `dict_mem_foreign_table_name_lookup_set':
dict0mem.c:(.text+0x91c): undefined reference to `innobase_get_lower_case_table_names'
libinnodb.a(dict0mem.c.o): In function `dict_mem_referenced_table_name_lookup_set':
dict0mem.c:(.text+0x9fc): undefined reference to `innobase_get_lower_case_table_names'
libinnodb.a(dict0mem.c.o): In function `dict_mem_foreign_table_name_lookup_set':
dict0mem.c:(.text+0x96e): undefined reference to `innobase_casedn_str'
libinnodb.a(dict0mem.c.o): In function `dict_mem_referenced_table_name_lookup_set':
dict0mem.c:(.text+0xa4e): undefined reference to `innobase_casedn_str'
collect2: ld returned 1 exit status
make[2]: *** [mysqlbackup] Error 1
innobase_get_lower_case_table_names
innobase_casedn_str
are functions that are part of ha_innodb.cc that is not part of the build
dict_mem_foreign_table_name_lookup_set
function is not there in the current codebase, meaning we do not use it in MEB.
@ storage/innobase/fil/fil0fil.c
The srv_fast_shutdown variable is declared in
srv0srv.c that is not compiled in the
mysqlbackup codebase.
This throws an undeclared error.
From the Manual
---------------
innodb_fast_shutdown
--------------------
The InnoDB shutdown mode. The default value is 1
as of MySQL 3.23.50, which causes a “fast� shutdown
(the normal type of shutdown). If the value is 0,
InnoDB does a full purge and an insert buffer merge
before a shutdown. These operations can take minutes,
or even hours in extreme cases. If the value is 1,
InnoDB skips these operations at shutdown.
This ideally does not matter from mysqlbackup
@ storage/innobase/ha/ha0ha.c
In file included from /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/ha/ha0ha.c:34:0:
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/btr0sea.h:286:17: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
make[2]: *** [CMakeFiles/innodb.dir/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/ha/ha0ha.c.o] Error 1
make[1]: *** [CMakeFiles/innodb.dir/all] Error 2
make: *** [all] Error 2
# include "sync0rw.h" is excluded from hotbackup compilation in dict0dict.h
This causes extern rw_lock_t* btr_search_latch_temp; to throw a failure because
the definition of rw_lock_t is not found.
@ storage/innobase/include/buf0buf.h
Excluding buffer pool functions that are unused from the
MEB codebase.
@ storage/innobase/include/buf0buf.ic
replicated the exclusion of
#include "buf0flu.h"
#include "buf0lru.h"
#include "buf0rea.h"
by looking at the current codebase in <meb-trunk>/src/innodb
@ storage/innobase/include/dict0dict.h
dict_table_x_lock_indexes, dict_table_x_unlock_indexes, dict_table_is_corrupted,
dict_index_is_corrupted, buf_block_buf_fix_inc_func are unused in MEB and was
leading to compilation errors and hence excluded.
@ storage/innobase/include/dict0dict.ic
dict_table_x_lock_indexes, dict_table_x_unlock_indexes, dict_table_is_corrupted,
dict_index_is_corrupted, buf_block_buf_fix_inc_func are unused in MEB and was
leading to compilation errors and hence excluded.
@ storage/innobase/include/log0log.h
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/log0log.h: At top level:
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/log0log.h:767:2: error: expected specifier-qualifier-list before â⠂¬Ëœmutex_t’
mutex_t definitions were excluded as seen from ambient code
hence excluding definition for log_flush_order_mutex also.
@ storage/innobase/include/os0file.h
Bug in InnoDB code, create_mode should have been create.
@ storage/innobase/include/srv0srv.h
In file included from /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/buf/buf0buf.c:50:0:
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/srv0srv.h: At top level:
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/srv0srv.h:120:16: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘srv_use_native_aio’
srv_use_native_aio - we do not use native aio of the OS anyway from MEB. MEB does not compile
InnoDB with this option. Hence disabling it.
@ storage/innobase/include/trx0sys.h
[ 56%] Building C object CMakeFiles/innodb.dir/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c.o
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c: In function ‘trx_sys_read_file_format_id’:
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c:1499:20: error: ‘TRX_SYS_FILE_FORMAT_TAG_MAGIC_N’ undeclared (first use in this function)
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c:1499:20: note: each undeclared identifier is reported only once for each function it appears in
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c: At top level:
/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/buf0buf.h:607:1: warning: ‘buf_block_buf_fix_inc_func’ declared ‘static’ but never defined
make[2]: *** [CMakeFiles/innodb.dir/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c.o] Error 1
unused calls excluded to enable compilation
@ storage/innobase/mem/mem0dbg.c
excluding #include "ha_prototypes.h" that lead to definitions in ha_innodb.cc
@ storage/innobase/os/os0file.c
InnoDB not compiled with aio support from MEB anyway. Hence excluding this from
the compilation.
@ storage/innobase/page/page0zip.c
page0zip.c:(.text+0x4e9e): undefined reference to `buf_pool_from_block'
collect2: ld returned 1 exit status
buf_pool_from_block defined in buf0buf.ic, most of the file is excluded for compilation of MEB
@ storage/innobase/ut/ut0dbg.c
excluding #include "ha_prototypes.h" since it leads to definitions in ha_innodb.cc
innobase_basename(file) is defined in ha_innodb.cc. Hence excluding that also.
@ storage/innobase/ut/ut0ut.c
cal_tm unused from MEB, was leading to earnings, hence disabling for MEB.
WHEN KILLING
Suppose there is a query waiting for a lock. If the user kills
this query, then "Got error -1 when reading table" error message
must not be logged in the server log file. Since this is a user
requested interruption, no spurious error message must be logged
in the server log. This patch will remove the error message from
the log.
approved by joh and tatjana
Fixed by backport of:
------------------------------------------------------------
revno: 3402.50.156
committer: Jon Olav Hauglid <jon.hauglid@oracle.com>
branch nick: mysql-trunk-test
timestamp: Wed 2012-02-08 14:10:23 +0100
message:
Bug#13417754 ASSERT IN ROW_DROP_DATABASE_FOR_MYSQL DURING DROP SCHEMA
This assert could be triggered if an InnoDB table was being moved
to a different database using ALTER TABLE ... RENAME, while this
database concurrently was being dropped by DROP DATABASE.
The reason for the problem was that no metadata lock was taken
on the target database by ALTER TABLE ... RENAME.
DROP DATABASE was therefore not blocked and could remove
the database while ALTER TABLE ... RENAME was executing. This
could cause the assert in InnoDB to be triggered.
This patch fixes the problem by taking a IX metadata lock on
the target database before ALTER TABLE ... RENAME starts
moving a table to a different database.
Note that this problem did not occur with RENAME TABLE which
already takes the correct metadata locks.
Also note that this patch slightly changes the behavior of
ALTER TABLE ... RENAME. Before, the statement would abort and
return an error if a lock on the target table name could not
be taken immediately. With this patch, ALTER TABLE ... RENAME
will instead block and wait until the lock can be taken
(or until we get a lock timeout). This also means that it is
possible to get ER_LOCK_DEADLOCK errors in this situation
since we allow ALTER TABLE ... RENAME to wait and not just
abort immediately.
WHEN KILLING
Suppose there is a query waiting for a lock. If the user kills
this query, then "Got error -1 when reading table" error message
must not be logged in the server log file. Since this is a user
requested interruption, no spurious error message must be logged
in the server log. This patch will remove the error message from
the log.
approved by joh and tatjana
Details:
- Archive storage engine file access were not instrumented and thus
were not shown in PS tables.
Fix:
- Added instrumentation code by using PS Apis for I/O.
rb://1088
approved by: Marko Makela
This bug was introduced in early stages of plugin. We were not
checking for an implicit lock on sec index rec for trx_id that is
stamped on current version of the clust_index in case where the
clust_index has a previous delete marked version.
The following scenario crashes our mysql server:
1. set global innodb_file_per_table=1;
2. create table t1(c1 int) engine=innodb;
3. alter table t1 discard tablespace;
4. alter table t1 add unique index(c1);
Step 4 crashes the server. This patch introduces a check on discarded
tablespace to avoid the crash.
rb://1041 approved by Marko Makela
FULLTEXT INDEX AND CONCURRENT DML.
Problem Statement:
------------------
1) Create a table with FT index.
2) Enable concurrent inserts.
3) In multiple threads do below operations repeatedly
a) truncate table
b) insert into table ....
c) select ... match .. against .. non-boolean/boolean mode
After some time we could observe two different assert core dumps
Analysis:
--------
1)assert core dump at key_read_cache():
Two select threads operating in-parallel on same key
root block.
1st select thread block->status is set to BLOCK_ERROR
because the my_pread() in read_block() is returning '0'.
Truncate table made the index file size as 1024 and pread
was asked to get the block of count bytes(1024 bytes)
from offset of 1024 which it cannot read since its
"end of file" and retuning '0' setting
"my_errno= HA_ERR_FILE_TOO_SHORT" and the key_file_length,
key_root[0] is same i.e. 1024. Since block status has BLOCK_ERROR
the 1st select thread enter into the free_block() and will
be under wait on conditional mutex by making status as
BLOCK_REASSIGNED and goes for wait_on_readers(). Other select
thread will also work on the same block and sees the status as
BLOCK_ERROR and enters into free_block(), checks for BLOCK_REASSIGNED
and asserting the server.
2)assert core dump at key_write_cache():
One select thread and One insert thread.
Select thread gets the unlocks the 'keycache->cache_lock',
which allows other threads to continue and gets the pread()
return value as'0'(please see the explanation above) and
tries to get the lock on 'keycache->cache_lock' and waits
there for the lock.
Insert thread requests for the block, block will be assigned
from the hash list and makes the page_status as
'PAGE_WAIT_TO_BE_READ' and goes for the read_block(), waits
in the queue since there are some other threads performing
reads on the same block.
Select thread which was waiting for the 'keycache->cache_lock'
mutex in the read_block() will continue after getting the my_pread()
value as '0' and sets the block status as BLOCK_ERROR and goes to
the free_block() and go to the wait_for_readers().
Now the insert thread will awake and continues. and checks
block->status as not BLOCK_READ and it asserts.
Fix:
---
In the full text code, multiple readers of index file is not guarded.
Hence added below below code in _ft2_search() and walk_and_match().
to lock the key_root I have used below code in _ft2_search()
if (info->s->concurrent_insert)
mysql_rwlock_rdlock(&share->key_root_lock[0]);
and to unlock
if (info->s->concurrent_insert)
mysql_rwlock_unlock(&share->key_root_lock[0]);
dict_table_replace_index_in_foreign_list(): Replace the dropped index
also in the foreign key constraints of child tables that are
referencing this table.
row_ins_check_foreign_constraint(): If the underlying index is
missing, refuse the operation.
rb:1051 approved by Jimmy Yang
The following scenario crashes our mysql server:
1. set global innodb_file_per_table=1;
2. create table t1(c1 int) engine=innodb;
3. alter table t1 discard tablespace;
4. alter table t1 add unique index(c1);
Step 4 crashes the server. This patch introduces a check on discarded
tablespace to avoid the crash.
rb://1041 approved by Marko Makela
FULLTEXT INDEX AND CONCURRENT DML.
Problem Statement:
------------------
1) Create a table with FT index.
2) Enable concurrent inserts.
3) In multiple threads do below operations repeatedly
a) truncate table
b) insert into table ....
c) select ... match .. against .. non-boolean/boolean mode
After some time we could observe two different assert core dumps
Analysis:
--------
1)assert core dump at key_read_cache():
Two select threads operating in-parallel on same key
root block.
1st select thread block->status is set to BLOCK_ERROR
because the my_pread() in read_block() is returning '0'.
Truncate table made the index file size as 1024 and pread
was asked to get the block of count bytes(1024 bytes)
from offset of 1024 which it cannot read since its
"end of file" and retuning '0' setting
"my_errno= HA_ERR_FILE_TOO_SHORT" and the key_file_length,
key_root[0] is same i.e. 1024. Since block status has BLOCK_ERROR
the 1st select thread enter into the free_block() and will
be under wait on conditional mutex by making status as
BLOCK_REASSIGNED and goes for wait_on_readers(). Other select
thread will also work on the same block and sees the status as
BLOCK_ERROR and enters into free_block(), checks for BLOCK_REASSIGNED
and asserting the server.
2)assert core dump at key_write_cache():
One select thread and One insert thread.
Select thread gets the unlocks the 'keycache->cache_lock',
which allows other threads to continue and gets the pread()
return value as'0'(please see the explanation above) and
tries to get the lock on 'keycache->cache_lock' and waits
there for the lock.
Insert thread requests for the block, block will be assigned
from the hash list and makes the page_status as
'PAGE_WAIT_TO_BE_READ' and goes for the read_block(), waits
in the queue since there are some other threads performing
reads on the same block.
Select thread which was waiting for the 'keycache->cache_lock'
mutex in the read_block() will continue after getting the my_pread()
value as '0' and sets the block status as BLOCK_ERROR and goes to
the free_block() and go to the wait_for_readers().
Now the insert thread will awake and continues. and checks
block->status as not BLOCK_READ and it asserts.
Fix:
---
In the full text code, multiple readers of index file is not guarded.
Hence added below below code in _ft2_search() and walk_and_match().
to lock the key_root I have used below code in _ft2_search()
if (info->s->concurrent_insert)
mysql_rwlock_rdlock(&share->key_root_lock[0]);
and to unlock
if (info->s->concurrent_insert)
mysql_rwlock_unlock(&share->key_root_lock[0]);
BY A CONCURRENT TRANSACTIO
The member function QUICK_RANGE_SELECT::init_ror_merged_scan() performs
a table handler clone. Innodb does not provide a clone operation.
The ha_innobase::clone() is not there. The handler::clone() does not
take care of the ha_innobase->prebuilt->select_lock_type. Because of
this what happens is that for one index we do a locking read, and
for the other index we were doing a non-locking (consistent) read.
The patch introduces ha_innobase::clone() member function.
It is implemented similar to ha_myisam::clone(). It calls the
base class handler::clone() and then does any additional operation
required. I am setting the ha_innobase->prebuilt->select_lock_type
correctly.
rb://1060 approved by Marko
rb://1043
approved by: Sunny Bains
Two internal counters were incremented twice for a single
operations. The counters are:
srv_buf_pool_flushed
buf_lru_flush_page_count
TABLES IN INCORRECT ENGINE
PROBLEM:
CREATE/ALTER TABLE currently can move system tables like
mysql.db, user, host etc, to engines other than MyISAM. This is not
completely supported as of now, by mysqld. When some of system tables
like plugin, servers, event, func, *_priv, time_zone* are moved
to innodb, mysqld restart crashes. Currently system tables
can be moved to BLACKHOLE also!!!.
ANALYSIS:
The problem is that there is no check before creating or moving
a system table to some particular engine.
System tables are suppose to be residing in MyISAM. We can think
of restricting system tables to exist only in MyISAM. But, there could
be future needs of these system tables to be part of other engines
by design. For eg, NDB cluster expects some tables to be on innodb
or ndb engine. This calls for a solution, by which system
tables can be supported by any desired engine, with minimal effort.
FIX:
The solution provides a handlerton interface using which,
mysqld server can query particular storage engine handlerton for
system tables that it supports. This way each storage engine
layer can define their own system database and system tables.
The check_engine() function uses the new handlerton function
ha_check_if_supported_system_table() to check if db.tablename
provided in the DDL is supported by the SE.
Note: This fix has modified a test in help.test, which was moving
mysql.help_* to innodb. The primary intention of the test was not
to move them between engines.
Bug#13639204 64111: CRASH ON SELECT SUBQUERY WITH NON UNIQUE INDEX
The crash happened due to wrong calculation
of key length during creation of reference for
sort order index. The problem is that
keyuse->used_tables can have OUTER_REF_TABLE_BIT enabled
but used_tables parameter(create_ref_for_key() func) does
not have it. So key parts which have OUTER_REF_TABLE_BIT
are ommited and it could lead to incorrect key length
calculation(zero key length).
Fix the calculation of the next autoinc value when offset > 1. Some of the
results have changed due to the changes in the allocation calculation. The
new calculation will result in slightly bigger gaps for bulk inserts.
rb://866 Approved by Jimmy Yang.
Backported from mysql-trunk (5.6)
LOCK_THREAD_COUNT
When using the performance schema file io instrumentation in MySQL 5.5,
a thread would loop forever inside lf_pinbox_put_pins, when disconnecting.
It would also hold LOCK_thread_count while doing so, effectively killing the
server.
The root cause of the loop in lf_pinbox_put_pins() is a leak of LF_PINS,
when used with the filename_hash LF_HASH table in the performance schema.
This fix contains the following changes:
1)
Added the missing call to lf_hash_search_unpin(), to prevent the leak.
2)
In mysys/lf_alloc-pin.c, there was some extra debugging code
(MY_LF_EXTRA_DEBUG) written to detect precisely this kind of issues,
but it was never used.
Replaced MY_LF_EXTRA_DEBUG with DBUG_OFF, so that leaks similar to this one
can be always detected in regular debug builds.
3)
Backported the fix for the following bug, from 5.6 to 5.5:
Bug#13417446 - 63339: INCORRECT FILE PATH IN PEFORMANCE_SCHEMA ON WINDOWS
HANG IN PREPARING WITH 100% CPU USAGE
Infinite loop in the subselect_indexsubquery_engine::exec()
function caused Server hang with 100% CPU usage.
The BLACKHOLE storage engine didn't update handler's
table->status variable after index operations, that
caused an infinite "while(!table->status)" execution.
Index access methods of the BLACKHOLE engine handler
have been updated to set table->status variable to
STATUS_NOT_FOUND or 0 when such a method returns a
HA_ERR_END_OF_FILE error or 0 respectively.
rb://942
approved by: Marko Makela
We don't need to scan LRU for dropping AHI entries when DROPing a table.
AHI entries are already removed when we free up extents for the btree.
FROM BUFFER POOL
rb://975
approved by: Marko Makela
There is a race in lock_validate() where we try to access a page
without ensuring that the tablespace stays valid during the operation
i.e.: it is not deleted. This patch tries to fix that by using an
existing flag (the flag is renamed to make it's name more generic
in line with it's new use).
IN OS_THREAD_EQ
rb://977
approved by: Marko Makela
rw_lock::writer_thread field contains the thread id of current x-holder
or wait-x thread. This field is un-initialized at lock creation and is
written to for the first time when an attempt is made to x-lock.
Current code considers ::writer_thread as valid memory region only when
the lock is held in x-mode (or there is an x-waiter). This is an
overkill and it generates valgrind warnings.
The fix is to consider ::writer_thread as valid memory region once it
has been written to.
Reasoning:
==========
The ::writer_thread can be safely considered valid because:
* We only ever do comparison with current calling threads id.
* We only ever do comparison when ::recursive flag is set
* We always unset ::recursive flag in x-unlock
* Same thread cannot be unlocking and attempting to lock at the same
time
* thread_id recycling is not an issue because before an id is recycled
the thread must leave innodb meaning it must release all locks meaning
it must unset ::recursive flag.
IN LOCK_VALIDATE()
rb://917
approved by: Marko Makela
In lock_validate() the limit is used to release the kernel_mutex during
the validation, to obey the latching order.
If we do the limit++ then we are rechecking the same lock most times on
each iteration because limit is being incremented by one and
<space, page_no> will nearly always be > limit. If we set the limit
correctly to (space, page+1) then we are actually making progress
during the iteration.
truncating, inserting the same set of rows. When a table is
re-created with the same set of rows, the data file size must
not grow.
rb:968
Approved by Marko.
This bug has been there at least since MySQL 4.0.9. (Before 4.0.9, the
code probably was even more severely broken.)
btr_pcur_restore_position(): When cursor restoration fails, before
invoking btr_pcur_store_position() move to the previous or next record
unless cursor->rel_pos==BTR_PCUR_ON or the record was not a user
record.
This bug can cause skipped records when btr_pcur_store_position() is
called on the last record of a page. A symptom would be record count
mismatch in CHECK TABLE, or failure to find a record to delete-mark or
update or purge. The following operations should be affected by the
bug:
* row_search_for_mysql(): SELECT, UPDATE, REPLACE, CHECK TABLE,
(almost anything else than INSERT)
* foreign key CASCADE operations
* row_merge_read_clustered_index(): index creation (since MySQL 5.1
InnoDB Plugin)
* multi-threaded purge (after MySQL 5.5): not sure, but it might fail
to purge some records
Not all callers of btr_pcur_restore_position() should be affected.
Anything that asserts or checks that restoration succeeds is
unaffected. For example, cursor restoration on the change buffer tree
should always succeed, because access is being protected by additional
latches. Likewise, rollback, or any code accesses data dictionary
tables while holding dict_sys->mutex should be safe.
rb:967 approved by Jimmy Yang
There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
Backporting the fix from MySQL 5.5 to 5.1
rb:961
rb:947
also filed as Bug#13146269, Bug#13713178
btr_get_size(): Add mtr_t parameter. Require that the caller S-latches
index->lock. If index->page==FIL_NULL or the index is to be dropped,
return ULINT_UNDEFINED to indicate that the statistics are
unavailable.
dict_update_statistics(): If btr_get_size() returns ULINT_UNDEFINED,
fake the index cardinality statistics.
dict_index_set_page(): Unused function, remove.
row_drop_table_for_mysql(): Before starting to drop the table, mark
the indexes unavailable in the data dictionary cache while holding
index->lock X-latch.
ha_innobase::prepare_drop_index(), ha_innobase::final_drop_index():
When setting index->to_be_dropped, acquire the index->lock X-latch.
rb:960 approved by Jimmy Yang
There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
rb:947
approved by Jimmy Yang
This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.
This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.
When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.
Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.
The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.
The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.
In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.
btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.
btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().
btr_page_free(): Add a debug assertion that the page was a B-tree page.
btr_lift_page_up(): Return the father block.
btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.
btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.
btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.
btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.
fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.
xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.
fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().
fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.
fsp_free_page(): Add ut_ad(0) to the error outcomes.
fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.
fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().
fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.
fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.
fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().
buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.
mtr_t: Add n_freed_pages (number of pages that have been freed).
page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().
trx_undo_rec_copy(): Add a debug assertion for the length.
trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.
trx_undo_report_row_operation(): Add debug assertions.
trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.
page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().
row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.
row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.
row_build(): Tighten a debug assertion about null BLOB pointers.
row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.
rb:939 approved by Jimmy Yang
The problem was introduced in 5.5.20 by Bug 13116225. It tried to
protect against downgrading from a version 5.6.4 database that was
created with a page size other than 16k. Version 5.6.4 supports
page sizes 4k and 8k and stamps that page size into the header page
of each tablespace file. Version 5.5.20 attempts to read that page
size in the file header.
But it turns out that only the first system tablespace file has a
reliable flags field in the header. So only ibdata1 can be or needs
to be tested for another page size. Extra system tablespace files
like ibdata2, ibdata3, etc do not and should not be tested since the
flags field is unreliable.
OF WIDE RECORDS
row_ins_index_entry_low(), row_upd_clust_rec(): Make a redo log
checkpoint if a DEBUG flag is set. Add DEBUG_SYNC around
btr_store_big_rec_extern_fields().
rb:946 approved by Jimmy Yang
During FIC error handling the trx->error_state was not being set to DB_SUCCESS
after failure, before attempting the next DDL SQL operation. This reset to
DB_SUCCESS is somewhat of a requirement though not explicitly stated anywhere.
The fix is to reset it to DB_SUCCESS in row0merge.cc if row_merge_rename_indexes
or row_merge_drop_index functions fail, also reset to DB_SUCCESS at trx commit.
rb://935 Approved by Jimmy Yang.
During FIC error handling the trx->error_state was not being set to DB_SUCCESS
after failure, before attempting the next DDL SQL operation. This reset to
DB_SUCCESS is somewhat of a requirement though not explicitly stated anywhere.
The fix is to reset it to DB_SUCCESS in row0merge.cc if row_merge_rename_indexes
or row_merge_drop_index functions fail, also reset to DB_SUCCESS at trx commit.
rb://935 Approved by Jimmy Yang.
The actual Bug#11754376 does not exist in MySQL 5.5 because at startup
we drop entries for temporary tables from InnoDB dictionary cache (only
if ROW_FORMAT is not REDUNDANT). But nevertheless the bug in
normalize_table_name_low() is present so we fix it.
GRACEFUL SHUTDOWN
During startup mysql picks up .frm files from the tmpdir directory and
tries to drop those tables in the storage engine.
The problem is that when tmpdir ends in / then ha_innobase::delete_table()
is passed a string like "/var/tmp//#sql123", then it wrongly normalizes it
to "/#sql123" and calls row_drop_table_for_mysql() which of course fails
to delete the table entry from the InnoDB dictionary cache.
ha_innobase::delete_table() returns an error but nevertheless mysql wipes
away the .frm file and the entry in the InnoDB dictionary cache remains
orphaned with no easy way to remove it.
The "no easy" way to remove it is to create a similar temporary table again,
copy its .frm file to tmpdir under "#sql123.frm" and restart mysqld with
tmpdir=/var/tmp (no trailing slash) - this way mysql will pick the .frm file
after restart and will try to issue drop table for "/var/tmp/#sql123"
(notice do double slash), ha_innobase::delete_table() will normalize it to
"tmp/#sql123" and row_drop_table_for_mysql() will successfully remove the
table entry from the dictionary cache.
The solution is to fix normalize_table_name_low() to normalize things like
"/var/tmp//table" correctly to "tmp/table".
This patch also adds a test function which invokes
normalize_table_name_low() with various inputs to make sure it works
correctly and a mtr test that calls this test function.
Reviewed by: Marko (http://bur03.no.oracle.com/rb/r/929/)
CORRUPTED WHEN RUN CONCURRENTLY WITH
ISSUE: Table corruption due to concurrent queries.
Different threads running check, repair query
along with insert. Locks not properly acquired
in repair query. Rows are inserted inbetween
repair query.
SOLUTION: Mutex lock is acquired before the
repair call. Concurrent queries wont
effect the call to repair.
ON 64 BIT MACHINES
PROBLEM: When sorting index during repair of
myisam tables, due to improper casting
of buffer size variables value of myisam_
sort_buffer_size is not set greater than
4GB.
SOLUTION: Proper casting of buffer size variable.
myisam_buffer_size changed to unsigned
long long to handle size > 4GB on
linux as well as windows.
print page dump
buf_page_print(): Remove the ut_ad(0) from the beginning. Add two flags
(enum buf_page_print_flags) that can be bitwise-ORed together:
BUF_PAGE_PRINT_NO_CRASH:
Do not crash debug builds at the end of buf_page_print().
BUF_PAGE_PRINT_NO_FULL:
Do not print the full page dump. This can be useful when adding
diagnostic printout to flushing or to the doublewrite buffer.
trx_sys_doublewrite_init_or_restore_page(): Replace exit(1) with ut_error,
so that we can get a core dump if this extraordinary condition happens.
rb:924 approved by Sunny Bains
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
rb://914
approved by: Marko Makela
Poll in fil_rename_tablespace() after setting ::stop_ios flag can
result in a hang because the other thread actually dispatching the IO
won't wake IO helper threads or flush the tablespace before starting
wait in fil_mutex_enter_and_prepare_for_io().
This fix does not remove the underlying cause of the assertion
failure. It just works around the problem, allowing a corrupted
secondary index to be fixed by DROP INDEX and CREATE INDEX (or in the
worst case, by re-creating the table).
ibuf_delete(): If the record to be purged is the last one in the page
or it is not delete-marked, refuse to purge it. Instead, write an
error message to the error log and let a debug assertion fail.
ibuf_set_del_mark(): If the record to be delete-marked is not found,
display some more information in the error log and let a debug
assertion fail.
row_undo_mod_del_unmark_sec_and_undo_update(),
row_upd_sec_index_entry(): Let a debug assertion fail when the record
to be delete-marked is not found.
buf_page_print(): Add ut_ad(0) so that corruption will be more
prominent in stress testing with debug binaries. Add ut_ad(0) here and
there where corruption is noticed.
btr_corruption_report(): Display some data on page_is_comp() mismatch.
btr_assert_not_corrupted(): A wrapper around btr_corruption_report().
Assert that page_is_comp() agrees with the table flags.
rb:911 approved by Inaam Rana
rb://898
approved by: Marko Makela
On some kernel versions native aio operations are not supported on
tmpfs. Check this during start up and fall back to simulated aio.
When mode==BUF_KEEP_OLD, buffered inserts are being merged to the page.
It is possible that a read request for a page was pending while the page
was freed in DROP INDEX or DROP TABLE. In these cases, it is OK (although
useless) to merge the buffered changes to the freed page.
ISSUES WITH COPYING PARTITIONED INNODB TABLES FROM LINUX TO WINDOWS
This problem was already fixed in mysql-trunk as part of bug #11755924. I am
backporting the fix to mysql-5.1.
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang
InnoDB: Remove HAVE_purify, UNIV_INIT_MEM_TO_ZERO, UNIV_SET_MEM_TO_ZERO.
The compile-time setting HAVE_purify can mask potential bugs.
It is being set in PB2 Valgrind runs. We should simply get rid of it,
and replace it with UNIV_MEM_INVALID() to declare uninitialized memory
as such in Valgrind-instrumented binaries.
os_mem_alloc_large(), ut_malloc_low(): Remove the parameter set_to_zero.
ut_malloc(): Define as a macro that invokes ut_malloc_low().
buf_pool_init(): Never initialize the buffer pool frames. All pages
must be initialized before flushing them to disk.
mem_heap_alloc(): Never initialize the allocated memory block.
os_mem_alloc_nocache(), ut_test_malloc(): Unused function, remove.
rb:813 approved by Jimmy Yang
COMMUNICATION PACKETS, ERROR_CODE: 1160
If idle FEDERATED table is evicted from the table cache when
a connection to remote server is lost, query that initiated
eviction may fail.
If this query is executed by slave SQL thread it may fail as well.
An error of close was stored in diagnostics area, which was later
attributed to the statement that caused eviction.
With this patch FEDERATED clears an error of close.
CREATE TABLE bug13510739 (c INTEGER NOT NULL, PRIMARY KEY (c)) ENGINE=INNODB;
INSERT INTO bug13510739 VALUES (1), (2), (3), (4);
DELETE FROM bug13510739 WHERE c=2;
HANDLER bug13510739 OPEN;
HANDLER bug13510739 READ `primary` = (2);
HANDLER bug13510739 READ `primary` NEXT; <-- crash
The bug is that in the particular testcase row_search_for_mysql() picked up
a delete-marked record and quit, leaving the cursor non-positioned state and
on the subsequent 'get next' call the code crashed because of the
non-positioned cursor.
In row0sel.cc (line numbers from mysql-trunk):
4653 if (rec_get_deleted_flag(rec, comp)) {
...
4679 if (index == clust_index && unique_search) {
4680
4681 err = DB_RECORD_NOT_FOUND;
4682
4683 goto normal_return;
4684 }
it quit from here, not storing the cursor position.
In contrast, if the record=2 is not found at all (e.g. sleep(1) after DELETE
to let the purge wipe it away completely) then 'get = 2' does find record=3
and quits from here:
4366 if (0 != cmp_dtuple_rec(search_tuple, rec, offsets)) {
...
4394 btr_pcur_store_position(pcur, &mtr);
4395
4396 err = DB_RECORD_NOT_FOUND;
4397 #if 0
4398 ut_print_name(stderr, trx, FALSE, index->name);
4399 fputs(" record not found 3\n", stderr);
4400 #endif
4401
4402 goto normal_return;
Another fix could be to extend the condition on line 4366 to hold only if
seach_tuple matches rec AND if rec is not delete marked.
Notice that in the above test case if we wait about 1 second somewhere after
DELETE and before 'get = 2', then the testcase does not crash and returns 4
instead. Not sure if this is the correct behavior, but this bugfix removes
the crash and makes the code return what it also returns in the non-crashing
case (if rec=2 is not found during 'get = 2', e.g. we have sleep(1) there).
Approved by: Marko (http://bur03.no.oracle.com/rb/r/863/)
The counter handler_read_key (SSV::ha_read_key_count) is incremented
incorrectly.
The mysql server maintains a per thread system_status_var (SSV)
object. This object contains among other things the counter
SSV::ha_read_key_count. The purpose of this counter is to measure the
number of requests to read a row based on a key (or the number of
index lookups).
This counter was wrongly incremented in the
ha_innobase::innobase_get_index(). The fix removes
this increment statement (for both innodb and innodb_plugin).
The various callers of the innobase_get_index() was checked to
determine if anybody must increment this counter (if they first call
innobase_get_index() and then perform an index lookup). It was found
that no caller of innobase_get_index() needs to worry about the
SSV::ha_read_key_count counter.
This bug ensures that a live downgrade from an InnoDB 5.6 with WL5756 and
a database created with innodb-page-size=8k or 4k will make a version 5.5
installation politely refuse to start. Instead of crashing or giving some
indication about a corrupted database, it will indicate the page size
difference.
This patch takes only that part of the Wl5756 patch that is needed to
protect against opening a tablespace that is stamped with a different
page size.
It also contains the change in dict_index_find_on_id_low() just in case
a database with another page size is created by recompiling a pre-WL5756
InnoDB.
WITH LARGE BUFFER POOL
(Note: this a backport of revno:3472 from mysql-trunk)
rb://845
approved by: Marko
When dropping a table (with an .ibd file i.e.: with
innodb_file_per_table set) we scan entire LRU to invalidate pages from
that table. This can be painful in case of large buffer pools as we hold
the buf_pool->mutex for the scan. Note that gravity of the problem does
not depend on the size of the table. Even with an empty table but a
large and filled up buffer pool we'll end up scanning a very long LRU
list.
The fix is to scan flush_list and just remove the blocks belonging to
the table from the flush_list, marking them as non-dirty. The blocks
are left in the LRU list for eventual eviction due to aging. The
flush_list is typically much smaller than the LRU list but for cases
where it is very long we have the solution of releasing the
buf_pool->mutex after scanning 1K pages.
buf_page_[set|unset]_sticky(): Use new IO-state BUF_IO_PIN to ensure
that a block stays in the flush_list and LRU list when we release
buf_pool->mutex. Previously we have been abusing BUF_IO_READ to achieve
this.
WITH MYISAM_USE_MMAP ENABLED
MySQL server can crash due to segmentation fault when
started with myisam_use_mmap.
The reason behind this being, while making a request to
unmap (munmap) the previously mapped memory (mmap), the
size passed was 7 bytes larger than the size requested at
the time of mapping. This can eventually unmap the adjacent
memory mapped block, belonging to some other memory-map pool.
Hence the subsequent call to mmap can map a region which was
still a valid memory mapped area.
Fixed by removing the extra 7-byte margin which was erroneously
added to the size, used for unmappping.
AND HANG IN SHOW TABLE STATUS.
ISSUE: Table corruption due to concurrent queries.
Different threads running insert and check
query leads to table corruption. Not properly locked,
rows are inserted in between check query.
SOLUTION: In check query mutex lock is acquired
for a longer time to handle concurrent
insert and check query.
NOTE: Additionally we backported the fix for CHECKSUM
issue(bug#11758979).
Remove btr_cur_t::ibuf_cnt. The only dependence on cursor->ibuf_cnt
was ibuf_set_entry_counter(). That code path was removed in the
original bug fix (marko.makela@oracle.com-20111107072802-dgwagejlpub0rjkd).
rb:819 approved by Jimmy Yang
rb://816
approved by: Marko Makela
The title is misleading. This bug was actually introduced by
bug 12635227 and was unearthed by a later optimization.
We need to free buf_page_t structs that we are allocating using
malloc() at shutdown.
I manually checked that all the conflicting InnoDB changes are in 5.5 already.
Two things I am not sure about - I commented them with XXX in this patch.
I will further check with the authors of the changesets whether these things
should be present or not.
a.k.a. Bug#7975 deadlock without any locking, simple select and update
Bug#7975 was reintroduced when the storage engine API was made
pluggable in MySQL 5.1. Instead of looking at thd->lex directly, we
rely on handler::extra(). But, we were looking at the wrong extra()
flag, and we were ignoring the TRX_DUP_REPLACE flag in places where we
should obey it.
innodb_replace.test: Add tests for hopefully all affected statement
types, so that bug should never ever resurface. This kind of tests
should have been added when fixing Bug#7975 in MySQL 5.0.3 in the
first place.
rb:806 approved by Sunny Bains
btr_pcur_restore_position_func(): When the cursor was positioned at
the tree infimum or supremum, initialize pos_state and latch_mode. The
assertion failed, because pos_state was BTR_PCUR_WAS_POSITIONED. In
the test failure of WL#5874, the purge thread attempted to restore the
cursor position on the infimum record (the clustered index was empty).
btr_pcur_detach(), btr_pcur_is_detached(): Unused functions, remove.
rb:804 approved by Inaam Rana
ibuf_insert_low(), the only caller of ibuf_set_entry_counter(), will
have latched an insert buffer bitmap page in bitmap_mtr before
invoking ibuf_set_entry_counter(). The latching order forbids any
further pages to be latched.
ibuf_set_entry_counter(): Renamed to ibuf_get_entry_counter(),
simplified the code and added comments.
Added the following symbols for predefined field numbers in change
buffer records:
#define IBUF_REC_FIELD_SPACE 0 /*!< in the pre-4.1 format,
the page number. later, the space_id */
#define IBUF_REC_FIELD_MARKER 1 /*!< starting with 4.1, a marker
consisting of 1 byte that is 0 */
#define IBUF_REC_FIELD_PAGE 2 /*!< starting with 4.1, the
page number */
#define IBUF_REC_FIELD_METADATA 3 /* the metadata field */
#define IBUF_REC_FIELD_USER 4 /* first user field */
rb:802 approved by Sunny Bains
row_rename_table_for_mysql(): Return DB_ERROR instead of DB_SUCCESS
when fil_rename_tablespace() returns an error. This bug was introduced
in the InnoDB Plugin.
Approved by Sunny Bains over IM.
Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE()
The fix introduced potentially more severe crash recovery problems
than the bug causes. Revert the fix for now.
This was an attempt to address problems with the Bug#12612184 fix.
Even with this follow-up fix, crash recovery can be broken.
Let us fix the bug later.
In the ON UPDATE CASCADE clause of FOREIGN KEY constraints, the
calculated update vector was not fully initialized. This bug was
introduced in the InnoDB Plugin when implementing support for
ROW_FORMAT=DYNAMIC.
Additionally, the data type information was not initialized, but
apparently it has never been needed in this case. Nevertheless, it is
not good programming practice to pass uninitialized values around.
calc_row_difference(): Declare the update field uninitialized in
Valgrind. Copy the data type information as well, except when the
field is SQL NULL. In the built-in InnoDB, initialize
ufield->extern_storage = FALSE (an initialization bug that had gone
unnoticed this far). The InnoDB Plugin and later have this flag to
dfield_t and have always initialized it properly.
row_ins_cascade_calc_update_vec(): Reduce the scope of some
pointers. Initialize orig_len. (This caused the bug in InnoDB Plugin
and later.)
row_ins_foreign_check_on_constraint(): Simplify a condition. Declare
the update vector uninitialized.
rb:771 approved by Jimmy Yang
TESTS: CRASH, CORRUPTION, 4G MEMOR
Issue: Valgrind errors due to checksum and optimize
query against archive tables with null columns.
Table record buffer was not initialized.
Solution: Initialize the record buffer.