sync_array_print_long_waits(): Return the longest waiting thread ID
and the longest waited-for lock. Only if those remain unchanged
between calls in srv_error_monitor_thread(), increment
fatal_cnt. Otherwise, reset fatal_cnt.
Background: There is a built-in watchdog in InnoDB whose purpose is to
kill the server when some thread is stuck waiting for a mutex or
rw-lock. Before this fix, the logic was flawed.
The function sync_array_print_long_waits() returns TRUE if it finds a
lock wait that exceeds 10 minutes (srv_fatal_semaphore_wait_threshold).
The function srv_error_monitor_thread() will kill the server if this
happens 10 times in a row (fatal_cnt reaches 10), checked every 30
seconds. This is wrong, because this situation does not mean that the
server is hung. If the server is very busy for a little over 15
minutes, it will be killed.
Consider this example. Thread T1 is waiting for mutex M. Some time
later, threads T2..Tn start waiting for the same mutex M. If T1 keeps
waiting for 600 seconds, fatal_cnt will be incremented to 1. So far,
so good. Now, if M is granted to T1, the server was obviously not
stuck. But, T2..Tn keeps waiting, and their wait time will be longer
than 600 seconds. If 5 minutes later, some Tn has still been waiting
for more than 10 minutes for the mutex M, the server can be killed,
even though it is not stuck.
rb:622 approved by Jimmy Yang
ibuf_inside(), ibuf_enter(), ibuf_exit(): Add the parameter mtr. The
flag is no longer kept in the thread-local storage but in the
mini-transaction (mtr->inside_ibuf).
mtr_start(): Clean up the comment and remove the unused return value.
mtr_commit(): Assert !ibuf_inside(mtr) in debug builds.
ibuf_mtr_start(): Like mtr_start(), but sets the flag.
ibuf_mtr_commit(), ibuf_btr_pcur_commit_specify_mtr(): Wrappers that
assert ibuf_inside().
buf_page_get_zip(), buf_page_init_for_read(),
buf_read_ibuf_merge_pages(), fil_io(), ibuf_free_excess_pages(),
ibuf_contract_ext(): Remove assertions on ibuf_inside(), because a
mini-transaction is not available.
buf_read_ahead_linear(): Add the parameter inside_ibuf.
ibuf_restore_pos(): When this function returns FALSE, it commits mtr
and must therefore do ibuf_exit(mtr).
ibuf_delete_rec(): This function commits mtr and must therefore do
ibuf_exit(mtr).
ibuf_rec_get_page_no(), ibuf_rec_get_space(), ibuf_rec_get_info(),
ibuf_rec_get_op_type(), ibuf_build_entry_from_ibuf_rec(),
ibuf_rec_get_volume(), ibuf_get_merge_page_nos(),
ibuf_get_volume_buffered_count(), ibuf_get_entry_counter_low(): Add
the parameter mtr in debug builds, for asserting ibuf_inside(mtr).
rb:585 approved by Sunny Bains
Remove the slot_no member of struct thr_local_struct.
enum srv_thread_type: Remove unused thread types.
srv_get_thread_type(): Unused function, remove.
thr_local_get_slot_no(), thr_local_set_slot_no(): Remove.
srv_thread_type_validate(), srv_slot_get_type(): New functions, for debugging.
srv_table_reserve_slot(): Return the srv_slot_t* directly. Do not create
thread-local storage.
srv_suspend_thread(): Get the srv_slot_t* as parameter. Return void;
the caller knows slot->event already.
srv_thread_has_reserved_slot(), srv_release_threads(): Assert
srv_thread_type_validate(type).
srv_init(): Use mem_zalloc() instead of mem_alloc(). Replace
srv_table_get_nth_slot(), because it now asserts that the kernel_mutex
is being held.
srv_master_thread(), srv_purge_thread(): Remember the slot from
srv_table_reserve_slot().
rb:629 approved by Inaam Rana
Bug #11766501: Multiple RBS break the get rseg with mininum trx_t::no code during purge
Bug# 59291 changes:
Main problem is that truncating the UNDO log at the completion of every
trx_purge() call is expensive as the number of rollback segments is increased.
We truncate after a configurable amount of pages. The innodb_purge_batch_size
parameter is used to control when InnoDB does the actual truncate. The truncate
is done once after 128 (or TRX_SYS_N_RSEGS iterations). In other words we
truncate after purge 128 * innodb_purge_batch_size. The smaller the batch
size the quicker we truncate.
Introduce a new parameter that allows how many rollback segments to use for
storing REDO information. This is really step 1 in allowing complete control
to the user over rollback space management.
New parameters:
i) innodb_rollback_segments = number of rollback_segments to use
(default is now 128) dynamic parameter, can be changed anytime.
Currently there is little benefit in changing it from the default.
Optimisations in the patch.
i. Change the O(n) behaviour of trx_rseg_get_on_id() to O(log n)
Backported from 5.6. Refactor some of the binary heap code.
Create a new include/ut0bh.ic file.
ii. Avoid truncating the rollback segments after every purge.
Related changes that were moved to a separate patch:
i. Purge should not do any flushing, only wait for space to be free so that
it only does purging of records unless it is held up by a long running
transaction that is preventing it from progressing.
ii. Give the purge thread preference over transactions when acquiring the
rseg->mutex during commit. This to avoid purge blocking unnecessarily
when getting the next rollback segment to purge.
Bug #11766501 changes:
Add the rseg to the min binary heap under the cover of the kernel mutex and
the binary heap mutex. This ensures the ordering of the min binary heap.
The two changes have to be committed together because they share the same
that fixes both issues.
rb://567 Approved by: Inaam Rana.
rb://566
approved by: Sunny
When using native aio on linux each IO helper thread should be able to
handle upto 256 IO requests. The number 256 is the same which is used
for simulated aio as well. In case of windows where we also use native
aio this limit is 32 because of OS constraints. It seems that we are
using the limit of 32 for all the platforms where we are using native
aio. The fix is to use 256 on all platforms except windows (when native
aio is enabled on windows)
"rows examined" estimates". This change implements "innodb_stats_method"
with options of "nulls_equal", "nulls_unequal" and "null_ignored".
rb://553 approved by Marko
Check whether the master and purge thread are active after creating them. Do
not proceed until both threads have started. We do this by checking whether a
slot has been reserved by both the respective threads.
Add srv_thread_has_reserved_slot() returns slot no or ULINT_UNDEFINED.
rb://536 Approved by Jimmy
InnoDB does not attempt to handle lower_case_table_names == 2 when looking
up foreign table names and referenced table name. It turned that server
variable into a boolean and ignored the possibility of it being '2'.
The setting lower_case_table_names == 2 means that it should be stored and
displayed in mixed case as given, but compared internally in lower case.
Normally the server deals with this since it stores table names. But
InnoDB stores referential constraints for the server, so it needs to keep
track of both lower case and given names.
This solution creates two table name pointers for each foreign and referenced
table name. One to display the name, and one to look it up. Both pointers
point to the same allocated string unless this setting is 2. So the overhead
added is not too much.
Two functions are created in dict0mem.c to populate the ..._lookup versions
of these pointers. Both dict_mem_foreign_table_name_lookup_set() and
dict_mem_referenced_table_name_lookup_set() are called 5 times each.
Fix a race condition in srv_master_thread(). We need to acquire the kernel
mutex before calling srv_table_reserve_slot(). Add a mutex_own() assertion
in srv_table_reserve_slot().
Add new function os_cond_wait_timed(). Change the os_thread_sleep() calls
to timed conditional waits. Signal the background threads during the shutdown
phase so that we avoid waiting for the sleep to timeout thus saving some time.
rb://439 -- Approved by Jimmy Yang
Issue an error message to the error log when
trx->dict_operation_lock_mode == RW_X_LATCH in
srv_suspend_mysql_thread(). Transactions that modify InnoDB
data dictionary tables must be free of lock waits, because they
must be holding the data dictionary latch in exclusive mode.
The transactions must not be accessing any other tables other than
the data dictionary tables.
The handling of RW_X_LATCH was accidentally added in the InnoDB Plugin,
as a wrong fix of an assertion failure. (Fast index creation was accessing
both data dictionary tables and user tables in the same transaction.)
Silence the UNIV_SYNC_DEBUG assertion failure while upgrading old data files
to multiple rollback segments during server startup. Because the upgrade
takes place while InnoDB is running a single thread, we can safely ignore the
latching order checks without fearing deadlocks.
innobase_start_or_create_for_mysql(): Set srv_is_being_started = FALSE
only after trx_sys_create_rsegs() has completed.
sync_thread_add_level(): If srv_is_being_started, ignore latching order
violations for SYNC_TRX_SYS_HEADER and SYNC_IBUF_BITMAP.
Create all the non-IO threads after creating the extra rollback segments.
Patch originally from Marko with some additions by Sunny.
This patch was originally developed by Vladislav Vaintroub.
The main changes are:
* Use TryEnterCriticalSection in os_fast_mutex_trylock().
* Use lightweight condition variables on Vista or later Windows;
but fall back to events on older Windows, such as XP.
This patch also fixes the following bugs:
bug# 52102 InnoDB Plugin shows performance drop compared to InnoDB
on Windows
bug# 53204 os_fastmutex_trylock is implemented incorrectly on Windows
rb://363 approved by Inaam Rana
and innodb_file_format_max two system variables. And this also fixes
bug #53654 after 2nd shutdown innodb_file_format_check attains strange
values.
rb://366 approved by Marko
In order to allow thread schedulers to be dynamically loaded,
it is necessary to make the following changes to the server:
- Two new service interfaces
- Modifications to InnoDB to inform the thread scheduler of state changes.
- Changes to the VIO subsystem for checking if data is available on a socket.
- Elimination of remains of the old thread pool implementation.
The two new service interfaces introduces are:
my_thread_scheduler
A service interface to register a thread
scheduler.
thd_wait
A service interface to inform thread scheduler
that the thread is about to start waiting.
In addition, the patch adds code that:
- Add a call to thd_wait for table locks in mysys
thd_lock.c by introducing a set function that
can be used to set a callback to be used when
waiting on a lock and resuming from waiting.
- Calling the mysys set function from the server
to set the callbacks correctly.
------------------------------------------------------------
revno: 3094
revision-id: vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh
parent: vasil.dimov@oracle.com-20100512173700-byf8xntxjur1hqov
committer: Vasil Dimov <vasil.dimov@oracle.com>
branch nick: mysql-trunk-innodb
timestamp: Thu 2010-05-13 10:46:52 +0300
message:
Followup to Bug#51920, fix binlog.binlog_killed
This is a followup to the fix of
Bug#51920 Innodb connections in row lock wait ignore KILL until lock wait
timeout
in that fix (rb://279) the behavior was changed to honor when a trx is
interrupted during lock wait, but the returned error code was still
"lock wait timeout" when it should be "interrupted".
This change fixes the non-deterministically failing test binlog.binlog_killed,
that failed like this:
binlog.binlog_killed 'stmt' [ fail ]
Test ended at 2010-05-12 11:39:08
CURRENT_TEST: binlog.binlog_killed
mysqltest: At line 208: query 'reap' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 0...
Approved by: Sunny Bains (rb://344)
------------------------------------------------------------
This merge is non-trivial since it has to introduce the DB_INTERRUPTED
error code.
Also revert vasil.dimov@oracle.com-20100408165555-9rpjh24o0sa9ad5y
which adjusted the binlog.binlog_killed test to the new (wrong) behavior
This is a followup to the fix of
Bug#51920 Innodb connections in row lock wait ignore KILL until lock wait
timeout
in that fix (rb://279) the behavior was changed to honor when a trx is
interrupted during lock wait, but the returned error code was still
"lock wait timeout" when it should be "interrupted".
This change fixes the non-deterministically failing test binlog.binlog_killed,
that failed like this:
binlog.binlog_killed 'stmt' [ fail ]
Test ended at 2010-05-12 11:39:08
CURRENT_TEST: binlog.binlog_killed
mysqltest: At line 208: query 'reap' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 0...
Approved by: Sunny Bains (rb://344)
in the code but they have nothing to do with the kernel mutex split code.
Some subsequent commits use the new functions. This patch has been tested
with: ./mtr --suite=innodb with UNIV_DEBUG and UNIV_SYNC_DEBUG enabled.
All tests were successful.