storage/innobase/include/sync0rw.ic:
Prerequisite for compiling with gcc4 on solaris: ignore result from
os_compare_and_swap_ulint
storage/myisam/mi_dynrec.c:
Prerequisite for compiling with gcc4 on solaris: cast to void*
There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
Backporting the fix from MySQL 5.5 to 5.1
rb:961
rb:947
There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
rb:947
approved by Jimmy Yang
print page dump
buf_page_print(): Remove the ut_ad(0) from the beginning. Add two flags
(enum buf_page_print_flags) that can be bitwise-ORed together:
BUF_PAGE_PRINT_NO_CRASH:
Do not crash debug builds at the end of buf_page_print().
BUF_PAGE_PRINT_NO_FULL:
Do not print the full page dump. This can be useful when adding
diagnostic printout to flushing or to the doublewrite buffer.
trx_sys_doublewrite_init_or_restore_page(): Replace exit(1) with ut_error,
so that we can get a core dump if this extraordinary condition happens.
rb:924 approved by Sunny Bains
This fix does not remove the underlying cause of the assertion
failure. It just works around the problem, allowing a corrupted
secondary index to be fixed by DROP INDEX and CREATE INDEX (or in the
worst case, by re-creating the table).
ibuf_delete(): If the record to be purged is the last one in the page
or it is not delete-marked, refuse to purge it. Instead, write an
error message to the error log and let a debug assertion fail.
ibuf_set_del_mark(): If the record to be delete-marked is not found,
display some more information in the error log and let a debug
assertion fail.
row_undo_mod_del_unmark_sec_and_undo_update(),
row_upd_sec_index_entry(): Let a debug assertion fail when the record
to be delete-marked is not found.
buf_page_print(): Add ut_ad(0) so that corruption will be more
prominent in stress testing with debug binaries. Add ut_ad(0) here and
there where corruption is noticed.
btr_corruption_report(): Display some data on page_is_comp() mismatch.
btr_assert_not_corrupted(): A wrapper around btr_corruption_report().
Assert that page_is_comp() agrees with the table flags.
rb:911 approved by Inaam Rana
ISSUES WITH COPYING PARTITIONED INNODB TABLES FROM LINUX TO WINDOWS
This problem was already fixed in mysql-trunk as part of bug #11755924. I am
backporting the fix to mysql-5.1.
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang
InnoDB: Remove HAVE_purify, UNIV_INIT_MEM_TO_ZERO, UNIV_SET_MEM_TO_ZERO.
The compile-time setting HAVE_purify can mask potential bugs.
It is being set in PB2 Valgrind runs. We should simply get rid of it,
and replace it with UNIV_MEM_INVALID() to declare uninitialized memory
as such in Valgrind-instrumented binaries.
os_mem_alloc_large(), ut_malloc_low(): Remove the parameter set_to_zero.
ut_malloc(): Define as a macro that invokes ut_malloc_low().
buf_pool_init(): Never initialize the buffer pool frames. All pages
must be initialized before flushing them to disk.
mem_heap_alloc(): Never initialize the allocated memory block.
os_mem_alloc_nocache(), ut_test_malloc(): Unused function, remove.
rb:813 approved by Jimmy Yang
This bug ensures that a live downgrade from an InnoDB 5.6 with WL5756 and
a database created with innodb-page-size=8k or 4k will make a version 5.5
installation politely refuse to start. Instead of crashing or giving some
indication about a corrupted database, it will indicate the page size
difference.
This patch takes only that part of the Wl5756 patch that is needed to
protect against opening a tablespace that is stamped with a different
page size.
It also contains the change in dict_index_find_on_id_low() just in case
a database with another page size is created by recompiling a pre-WL5756
InnoDB.
WITH LARGE BUFFER POOL
(Note: this a backport of revno:3472 from mysql-trunk)
rb://845
approved by: Marko
When dropping a table (with an .ibd file i.e.: with
innodb_file_per_table set) we scan entire LRU to invalidate pages from
that table. This can be painful in case of large buffer pools as we hold
the buf_pool->mutex for the scan. Note that gravity of the problem does
not depend on the size of the table. Even with an empty table but a
large and filled up buffer pool we'll end up scanning a very long LRU
list.
The fix is to scan flush_list and just remove the blocks belonging to
the table from the flush_list, marking them as non-dirty. The blocks
are left in the LRU list for eventual eviction due to aging. The
flush_list is typically much smaller than the LRU list but for cases
where it is very long we have the solution of releasing the
buf_pool->mutex after scanning 1K pages.
buf_page_[set|unset]_sticky(): Use new IO-state BUF_IO_PIN to ensure
that a block stays in the flush_list and LRU list when we release
buf_pool->mutex. Previously we have been abusing BUF_IO_READ to achieve
this.
Remove btr_cur_t::ibuf_cnt. The only dependence on cursor->ibuf_cnt
was ibuf_set_entry_counter(). That code path was removed in the
original bug fix (marko.makela@oracle.com-20111107072802-dgwagejlpub0rjkd).
rb:819 approved by Jimmy Yang
Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE()
The fix introduced potentially more severe crash recovery problems
than the bug causes. Revert the fix for now.
This was an attempt to address problems with the Bug#12612184 fix.
Even with this follow-up fix, crash recovery can be broken.
Let us fix the bug later.
sql/sql_insert.cc:
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
******
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
small cleanup
******
small cleanup
rw_lock_x_lock_func(): Assert that the thread is not already holding
the lock in a conflicting mode (RW_LOCK_SHARED).
rw_lock_s_lock_func(): Assert that the thread is not already holding
the lock in a conflicting mode (RW_LOCK_EX).
Bug 12980094 - ASSERTION IN INNODB DETECTED IN RQG_PARTITION_DDL
Bug 13034534 - RQG TESTS FAIL ON WINDOWS WITH CRASH NEAR RW_LOCK_DEBUG_PRINT
All access to struct rw_lock_debug_struct must be protected by rw_lock_debug_mutex_enter().
This fix was accidentally pushed to mysql-5.1 after the 5.1.59 clone-off in
bzr revision id marko.makela@oracle.com-20110829081642-z0w992a0mrc62s6w
with the fix of Bug#12704861 Corruption after a crash during BLOB update
but not merged to mysql-5.5 and upwards.
In the Barracuda formats, the clustered index record no longer
contains a prefix of off-page columns. Because of this, the undo log
must contain these prefixes, so that purge and multi-versioning will
continue to work. However, this also means that an undo log record can
become too big to fit in an undo log page. (It is a limitation of the
undo log that undo records cannot span across multiple pages.)
In case the checks for undo log size fail when CREATE TABLE or CREATE
INDEX is executed, we need a fallback that blocks a modification
operation when the undo log record would exceed the maximum size.
trx_undo_free_last_page_func(): Renamed from trx_undo_free_page_in_rollback().
Define the trx_t parameter only in debug builds.
trx_undo_free_last_page(): Wrapper for trx_undo_free_last_page_func().
Pass the trx_t parameter only in debug builds.
trx_undo_truncate_end_func(): Renamed from trx_undo_truncate_end().
Define the trx_t parameter only in debug builds. Rewrite a for(;;) loop
as a while loop for clarity.
trx_undo_truncate_end(): Wrapper for from trx_undo_truncate_end_func().
Pass the trx_t parameter only in debug builds.
trx_undo_erase_page_end(): Return TRUE if the page was non-empty
to begin with. Refuse to erase empty pages.
trx_undo_report_row_operation(): If the page for which the undo log
was too big was empty, free the undo page and return DB_TOO_BIG_RECORD.
rb:749 approved by Inaam Rana
The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.
Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().
btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.
btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.
btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).
btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).
btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.
btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.
btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.
fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.
fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.
fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".
mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.
row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.
row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.
There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.
rb:693 approved by Sunny Bains
Also addressed issues in bug #11745133, where we could mark a table
corrupted instead of crashing the server when found a corrupted buffer/page
if the table created with innodb_file_per_table on.
Some ut_a(!rec_offs_any_null_extern()) assertion failures are indicating
genuine BLOB bugs, others are bogus failures when rolling back incomplete
transactions at crash recovery. This needs more work, and until I get a
chance to work on it, other testing must not be disrupted by this.
If UNIV_DEBUG or UNIV_BLOB_LIGHT_DEBUG is enabled, add
!rec_offs_any_null_extern() assertions, ensuring that records do not
contain null pointers to externally stored columns in inappropriate
places.
btr_cur_optimistic_update(): Assert !rec_offs_any_null_extern().
Incomplete records must never be updated or deleted. This assertion
will cover also the pessimistic route.
row_build(): Assert !rec_offs_any_null_extern(). Search tuples must
never be built from incomplete index entries.
row_rec_to_index_entry(): Assert !rec_offs_any_null_extern() unless
ROW_COPY_DATA is requested. ROW_COPY_DATA is used for
multi-versioning, and therefore it might be valid to copy the most
recent (uncommitted) version while it contains a null pointer to
off-page columns.
row_vers_build_for_consistent_read(),
row_vers_build_for_semi_consistent_read(): Assert !rec_offs_any_null_extern()
on all versions except the most recent one.
trx_undo_prev_version_build(): Assert !rec_offs_any_null_extern() on
the previous version.
rb:682 approved by Sunny Bains
Refactor the !rec_offs_any_extern relaxation in row_build().
trx_assert_active(trx_id): Assert that the given transaction is active.
(In the 5.1 built-in InnoDB, there is no trx->is_recovered field.)
trx_assert_recovered(trx_id): Assert that the given transaction is
active and has been recovered after a crash.
row_build(): Replace a bunch of code with an assertion that invokes
trx_assert_active() or trx_assert_recovered() and row_get_rec_trx_id().
row_get_trx_id_offset(): Make the function inlined. Remove the unused
parameter rec, and make all parameters const.
row_get_rec_trx_id(), row_get_rec_roll_ptr(): Make all parameters const.
rb:691 approved by Jimmy Yang
Replace UNIV_BLOB_NULL_DEBUG with UNIV_DEBUG||UNIV_BLOB_LIGHT_DEBUG.
Fix known bogus failures.
btr_cur_optimistic_update(): If rec_offs_any_null_extern(), assert that
the current transaction is an incomplete transaction that is being
rolled back in crash recovery.
row_build(): If rec_offs_any_null_extern(), assert that the transaction
that last updated the record was recovered during crash recovery
(and will soon be rolled back).
btr_cur_compress_if_useful(), btr_compress(): Add the parameter ibool
adjust. If adjust=TRUE, adjust the cursor position after compressing
the page.
btr_lift_page_up(): Return a pointer to the father page.
BTR_KEEP_POS_FLAG: A new flag for btr_cur_pessimistic_update().
btr_cur_pessimistic_update(): If *big_rec != NULL and flags &
BTR_KEEP_POS_FLAG, keep the cursor positioned on the updated record.
Also, do not release the index tree x-lock if *big_rec != NULL.
btr_cur_mtr_commit_and_start(): Commits and restarts a
mini-transaction so that it will retain an x-lock on index->lock and
the page of the cursor. This is invoked when
btr_cur_pessimistic_update() returns *big_rec != NULL.
In all callers of btr_cur_pessimistic_update() that do not pass
BTR_KEEP_POS_FLAG, assert that *big_rec == NULL.
btr_cur_compress(): Unused function [in the built-in MySQL 5.1], remove.
page_rec_get_nth(): Return the nth record on the page (an inverse
function of page_rec_get_n_recs_before()). Refactored from
page_get_middle_rec().
page_get_middle_rec(): Invoke page_rec_get_nth().
page_cur_insert_rec_zip_reorg(): Make use of the page directory
shortcuts in page_rec_get_nth() instead of scanning the whole list of
records.
row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update().
row_ins_index_entry_low(): If row_ins_clust_index_entry_by_modify()
returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to
commit and start the mini-transaction without releasing the x-locks on
index->lock and the cursor page, and write the big_rec. Releasing the
page latch in mtr_commit() caused a race condition.
row_upd_clust_rec(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update(). If it returns a big_rec, invoke
btr_cur_mtr_commit_and_start() in order to commit and start the
mini-transaction without releasing the x-locks on index->lock and the
cursor page, and write the big_rec. Releasing the page latch in
mtr_commit() caused a race condition.
sync_thread_add_level(): Add the parameter ibool relock. When TRUE,
bypass the latching order rules.
rw_lock_add_debug_info(): For nested X-lock requests, pass relock=TRUE
to sync_thread_add_level().
rb:678 approved by Jimmy Yang
With this change, the index prefix column length lifted from 767 bytes
to 3072 bytes if "innodb_large_prefix" is set to "true".
rb://603 approved by Marko
mtr_start(): Declare the mtr memory area uninitialized in Valgrind
before initializing the fields.
mtr_commit(): Declare everything uninitialized except
mtr->start_lsn, mtr->end_lsn and mtr->state.
lock_clust_rec_some_has_impl(), row_get_rec_trx_id(),
lock_rec_queue_validate(), lock_table_other_has_incompatible(),
lock_table_has_to_wait_in_queue(), lock_table_queue_validate():
Add const qualifiers.
row_get_trx_id_offset(): Add const qualifiers. Keep the parameter rec
only in UNIV_DEBUG builds. Inline the function.
lock_rec_validate_page(): Take the buffer block as a parameter, to
avoid a buf_page_get_gen() call in most cases.
lock_rec_validate_page_low(): A version of lock_rec_validate_page()
that assumes that the lock system mutexes are already being held.
lock_rec_get_next_on_page_const(): A const variant of
lock_rec_get_next_on_page().
lock_validate(): Do not release the lock system mutex while
buffer-fixing the block for the lock_rec_validate_page() call.
Releasing the mutex apparently caused the assertion failure.
rb:665 approved by Sunny Bains
------------------------------------------------------------
revno 2876.244.305
revision id marko.makela@oracle.com-20110413082211-e6ouhjz5rmqxcqap
parent marko.makela@oracle.com-20110413075948-kvytmc37ye1nt7d9
committer Marko Mäkelä <marko.makela@oracle.com>
branch nick 5.6-innodb
timestamp Wed 2011-04-13 11:22:11 +0300
message:
Suppress the Bug #58815 (Bug #11765812) assertion failure.
buf_page_get_gen(): Introduce BUF_GET_POSSIBLY_FREED for suppressing the
check that the file page must not have been freed.
btr_estimate_n_rows_in_range_on_level(): Pass BUF_GET_POSSIBLY_FREED and
explain in the comments why this is needed and why it should be mostly
harmless to ignore the problem. If InnoDB had always initialized all
unused fields in data files, no problem would exist.
This change does not fix the bug, it just "shoots the messenger".
rb:647 approved by Jimmy Yang
DB_COL_APPEARS_TWICE_IN_INDEX: Remove. This condition is already
checked and reported by MySQL before passing the index definition to
the storage engine.
row_create_index_for_mysql(): Remove the redundant check for
DB_COL_APPEARS_TWICE_IN_INDEX. When enforcing the column prefix index
limit, invoke dict_mem_index_free(index) to plug the memory leak. In
the loop, use index->n_def instead of dict_index_get_n_fields(index),
because the latter would be 0 for indexes that have not been copied to
the data dictionary cache.
innodb-use-sys-malloc.test:
Add test cases for attempting to trigger the error checks in
row_create_index_for_mysql(). Before MySQL 5.5 and WL#5743, the leak
is only reproducible if ha_innobase::max_supported_key_part_length()
returned a higher limit than the one used in
row_create_index_for_mysql().
In MySQL 5.5 and later, the leak is reproducible with
innodb_large_prefix=true.
rb:688 approved by Jimmy Yang
The innoDB global variable srv_lower_case_table_names is set to the value of lower_case_table_names declared in mysqld.h server in ha_innodb.cc. Since this variable can change at runtime, it is reset for each handler call to ::create, ::open, ::rename_table & ::delete_table.
But it is possible for tables to be implicitly opened before an explicit handler call is made when an engine is first started or restarted. I was able to reproduce that with the testcase in this patch on a version of InnoDB from 2 weeks ago. It seemed like the change buffer entries for the secondary key was getting put into pages after the restart. (But I am not sure, I did not write down the call stack while it was reproducing.) In the current code, the implicit open, which is actually a call to dict_load_foreigns(), does not occur with this testcase.
The change is to replace srv_lower_case_table_names by an interface function in innodb.cc that retrieves the server global variable when it is needed.
causes future shutdown hang
InnoDB would hang on shutdown if any XA transactions exist in the
system in the PREPARED state. This has been masked by the fact that
MySQL would roll back any PREPARED transaction on shutdown, in the
spirit of Bug #12161 Xa recovery and client disconnection.
[mysql-test-run] do_shutdown_server: Interpret --shutdown_server 0 as
a request to kill the server immediately without initiating a
shutdown procedure.
xid_cache_insert(): Initialize XID_STATE::rm_error in order to avoid a
bogus error message on XA ROLLBACK of a recovered PREPARED transaction.
innobase_commit_by_xid(), innobase_rollback_by_xid(): Free the InnoDB
transaction object after rolling back a PREPARED transaction.
trx_get_trx_by_xid(): Only consider transactions whose
trx->is_prepared flag is set. The MySQL layer seems to prevent
attempts to roll back connected transactions that are in the PREPARED
state from another connection, but it is better to play it safe. The
is_prepared flag was introduced in the InnoDB Plugin.
trx_n_prepared: A new counter, counting the number of InnoDB
transactions in the PREPARED state.
logs_empty_and_mark_files_at_shutdown(): On shutdown, allow
trx_n_prepared transactions to exist in the system.
trx_undo_free_prepared(), trx_free_prepared(): New functions, to free
the memory objects of PREPARED transactions on shutdown. This is not
needed in the built-in InnoDB, because it would collect all allocated
memory on shutdown. The InnoDB Plugin needs this because of
innodb_use_sys_malloc.
trx_sys_close(): Invoke trx_free_prepared() on all remaining
transactions.
On shutdown, do not exit threads in os_event_wait(). This method of
exiting was only used by the I/O handler threads. Exit them on a
higher level.
os_event_wait_low(), os_event_wait_time_low(): Do not exit on shutdown.
os_thread_exit(), ut_dbg_assertion_failed(), ut_print_timestamp(): Add
attribute cold, so that GCC knows that these functions are rarely
invoked and can be optimized for size.
os_aio_linux_collect(): Return on shutdown.
os_aio_linux_handle(), os_aio_simulated_handle(), os_aio_windows_handle():
Set *message1 = *message2 = NULL and return TRUE on shutdown.
fil_aio_wait(): Return on shutdown.
logs_empty_and_mark_files_at_shutdown(): Even in very fast shutdown
(innodb_fast_shutdown=2), allow the background threads to exit, but
skip the flushing and log checkpointing.
innobase_shutdown_for_mysql(): Always wait for all the threads to exit.
rb:633 approved by Sunny Bains
Remove most references to thread id in InnoDB. Three references
remain: the current holder of a mutex, and the current x-lock holder
of a rw-lock, and some references in UNIV_SYNC_DEBUG checks. This
allows MySQL to change the thread associated to a client connection.
Tighten the UNIV_SYNC_DEBUG checks, trying to ensure that no InnoDB
mutex or x-lock is being held when returning control to MySQL. The
only semaphore that may be held is the btr_search_latch in shared mode.
sync_thread_levels_empty_except_dict(): A wrapper for
sync_thread_levels_empty_gen(TRUE).
sync_thread_levels_nonempty_trx(): Check that the current thread is
not holding any InnoDB semaphores, except btr_search_latch if
trx->has_search_latch.
sync_thread_levels_empty(): Unused function; remove.
trx_t: Remove mysql_thread_id and mysql_process_no.
srv_slot_t: Remove id and handle.
row_search_for_mysql(), srv_conc_enter_innodb(),
srv_conc_force_enter_innodb(), srv_conc_force_exit_innodb(),
srv_conc_exit_innodb(), srv_suspend_mysql_thread: Assert
!sync_thread_levels_nonempty_trx().
rb:634 approved by Sunny Bains
sync_array_print_long_waits(): Return the longest waiting thread ID
and the longest waited-for lock. Only if those remain unchanged
between calls in srv_error_monitor_thread(), increment
fatal_cnt. Otherwise, reset fatal_cnt.
Background: There is a built-in watchdog in InnoDB whose purpose is to
kill the server when some thread is stuck waiting for a mutex or
rw-lock. Before this fix, the logic was flawed.
The function sync_array_print_long_waits() returns TRUE if it finds a
lock wait that exceeds 10 minutes (srv_fatal_semaphore_wait_threshold).
The function srv_error_monitor_thread() will kill the server if this
happens 10 times in a row (fatal_cnt reaches 10), checked every 30
seconds. This is wrong, because this situation does not mean that the
server is hung. If the server is very busy for a little over 15
minutes, it will be killed.
Consider this example. Thread T1 is waiting for mutex M. Some time
later, threads T2..Tn start waiting for the same mutex M. If T1 keeps
waiting for 600 seconds, fatal_cnt will be incremented to 1. So far,
so good. Now, if M is granted to T1, the server was obviously not
stuck. But, T2..Tn keeps waiting, and their wait time will be longer
than 600 seconds. If 5 minutes later, some Tn has still been waiting
for more than 10 minutes for the mutex M, the server can be killed,
even though it is not stuck.
rb:622 approved by Jimmy Yang
ibuf_inside(), ibuf_enter(), ibuf_exit(): Add the parameter mtr. The
flag is no longer kept in the thread-local storage but in the
mini-transaction (mtr->inside_ibuf).
mtr_start(): Clean up the comment and remove the unused return value.
mtr_commit(): Assert !ibuf_inside(mtr) in debug builds.
ibuf_mtr_start(): Like mtr_start(), but sets the flag.
ibuf_mtr_commit(), ibuf_btr_pcur_commit_specify_mtr(): Wrappers that
assert ibuf_inside().
buf_page_get_zip(), buf_page_init_for_read(),
buf_read_ibuf_merge_pages(), fil_io(), ibuf_free_excess_pages(),
ibuf_contract_ext(): Remove assertions on ibuf_inside(), because a
mini-transaction is not available.
buf_read_ahead_linear(): Add the parameter inside_ibuf.
ibuf_restore_pos(): When this function returns FALSE, it commits mtr
and must therefore do ibuf_exit(mtr).
ibuf_delete_rec(): This function commits mtr and must therefore do
ibuf_exit(mtr).
ibuf_rec_get_page_no(), ibuf_rec_get_space(), ibuf_rec_get_info(),
ibuf_rec_get_op_type(), ibuf_build_entry_from_ibuf_rec(),
ibuf_rec_get_volume(), ibuf_get_merge_page_nos(),
ibuf_get_volume_buffered_count(), ibuf_get_entry_counter_low(): Add
the parameter mtr in debug builds, for asserting ibuf_inside(mtr).
rb:585 approved by Sunny Bains
Remove the slot_no member of struct thr_local_struct.
enum srv_thread_type: Remove unused thread types.
srv_get_thread_type(): Unused function, remove.
thr_local_get_slot_no(), thr_local_set_slot_no(): Remove.
srv_thread_type_validate(), srv_slot_get_type(): New functions, for debugging.
srv_table_reserve_slot(): Return the srv_slot_t* directly. Do not create
thread-local storage.
srv_suspend_thread(): Get the srv_slot_t* as parameter. Return void;
the caller knows slot->event already.
srv_thread_has_reserved_slot(), srv_release_threads(): Assert
srv_thread_type_validate(type).
srv_init(): Use mem_zalloc() instead of mem_alloc(). Replace
srv_table_get_nth_slot(), because it now asserts that the kernel_mutex
is being held.
srv_master_thread(), srv_purge_thread(): Remember the slot from
srv_table_reserve_slot().
rb:629 approved by Inaam Rana