SRV_CONC_FORCE_EXIT_INNODB
This is a bogus UNIV_SYNC_DEBUG assertion failure that I introduced
when introducing assertions for checking that InnoDB is not holding
any mutexes or rw-locks when returning control to MySQL.
srv_suspend_mysql_thread(): Release dict_operation_lock before
invoking srv_conc_force_exit_innodb(), which would now check that the
thread is not holding any mutexes or rw-locks. After resuming, check
sync_thread_levels_nonempty_trx() and do srv_conc_force_enter_innodb()
before reacquiring the dict_operation_lock.
rb:646 approved by Sunny Bains
On shutdown, do not exit threads in os_event_wait(). This method of
exiting was only used by the I/O handler threads. Exit them on a
higher level.
os_event_wait_low(), os_event_wait_time_low(): Do not exit on shutdown.
os_thread_exit(), ut_dbg_assertion_failed(), ut_print_timestamp(): Add
attribute cold, so that GCC knows that these functions are rarely
invoked and can be optimized for size.
os_aio_linux_collect(): Return on shutdown.
os_aio_linux_handle(), os_aio_simulated_handle(), os_aio_windows_handle():
Set *message1 = *message2 = NULL and return TRUE on shutdown.
fil_aio_wait(): Return on shutdown.
logs_empty_and_mark_files_at_shutdown(): Even in very fast shutdown
(innodb_fast_shutdown=2), allow the background threads to exit, but
skip the flushing and log checkpointing.
innobase_shutdown_for_mysql(): Always wait for all the threads to exit.
rb:633 approved by Sunny Bains
Remove most references to thread id in InnoDB. Three kinds of reference
remain: the current holder of a mutex, the current x-lock holder of a
rw-lock, and some references in UNIV_SYNC_DEBUG checks. This allows
MySQL to change the thread associated with a client connection.
Tighten the UNIV_SYNC_DEBUG checks, trying to ensure that no InnoDB
mutex or x-lock is being held when returning control to MySQL. The
only semaphore that may be held is the btr_search_latch in shared mode.
sync_thread_levels_empty_except_dict(): A wrapper for
sync_thread_levels_empty_gen(TRUE).
sync_thread_levels_nonempty_trx(): Check that the current thread is
not holding any InnoDB semaphores, except btr_search_latch if
trx->has_search_latch.
sync_thread_levels_empty(): Unused function; remove.
trx_t: Remove mysql_thread_id and mysql_process_no.
srv_slot_t: Remove id and handle.
row_search_for_mysql(), srv_conc_enter_innodb(),
srv_conc_force_enter_innodb(), srv_conc_force_exit_innodb(),
srv_conc_exit_innodb(), srv_suspend_mysql_thread(): Assert
!sync_thread_levels_nonempty_trx().
rb:634 approved by Sunny Bains
sync_array_print_long_waits(): Return the longest waiting thread ID
and the longest waited-for lock. Only if those remain unchanged
between calls in srv_error_monitor_thread(), increment
fatal_cnt. Otherwise, reset fatal_cnt.
Background: There is a built-in watchdog in InnoDB whose purpose is to
kill the server when some thread is stuck waiting for a mutex or
rw-lock. Before this fix, the logic was flawed.
The function sync_array_print_long_waits() returns TRUE if it finds a
lock wait that exceeds 10 minutes (srv_fatal_semaphore_wait_threshold).
The function srv_error_monitor_thread() will kill the server if this
happens 10 times in a row (fatal_cnt reaches 10), checked every 30
seconds. This is wrong, because this situation does not mean that the
server is hung. If the server is very busy for a little over 15
minutes, it will be killed.
Consider this example. Thread T1 is waiting for mutex M. Some time
later, threads T2..Tn start waiting for the same mutex M. If T1 keeps
waiting for 600 seconds, fatal_cnt will be incremented to 1. So far,
so good. Now, if M is granted to T1, the server was obviously not
stuck. But T2..Tn keep waiting, and their wait times will be longer
than 600 seconds. If 5 minutes later, some Tn has still been waiting
for more than 10 minutes for the mutex M, the server can be killed,
even though it is not stuck.
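
The corrected counting rule can be sketched as follows. This is only an
illustration of the idea described above, not the actual InnoDB code: the
type watchdog_t, the function watchdog_check(), the limit macro, and the
use of a plain integer as the thread id are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit: how many identical observations count as a hang. */
#define WATCHDOG_FATAL_LIMIT 10

/* Hypothetical watchdog state; not the real InnoDB structures. */
typedef struct {
	unsigned long	waiter;		/* longest waiter seen last time */
	const void*	sema;		/* semaphore it was waiting for */
	unsigned	fatal_cnt;	/* consecutive identical observations */
} watchdog_t;

/* Called once per monitoring interval. long_wait is true when some
wait exceeded the threshold; waiter and sema identify the longest one.
Returns true when the server should be considered hung. */
static bool
watchdog_check(watchdog_t* w, bool long_wait,
	       unsigned long waiter, const void* sema)
{
	assert(w != NULL);

	if (long_wait && waiter == w->waiter && sema == w->sema) {
		/* The same thread is still stuck on the same semaphore. */
		w->fatal_cnt++;
		return(w->fatal_cnt >= WATCHDOG_FATAL_LIMIT);
	}

	/* Something changed, so progress was made: start over. */
	w->fatal_cnt = 0;
	w->waiter = waiter;
	w->sema = sema;
	return(false);
}
```

In the T1/T2..Tn example, the longest waiter changes from T1 to some Tn once
M is granted to T1, so the counter is reset instead of continuing to grow.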
rb:622 approved by Jimmy Yang
ibuf_inside(), ibuf_enter(), ibuf_exit(): Add the parameter mtr. The
flag is no longer kept in the thread-local storage but in the
mini-transaction (mtr->inside_ibuf).
mtr_start(): Clean up the comment and remove the unused return value.
mtr_commit(): Assert !ibuf_inside(mtr) in debug builds.
ibuf_mtr_start(): Like mtr_start(), but sets the flag.
ibuf_mtr_commit(), ibuf_btr_pcur_commit_specify_mtr(): Wrappers that
assert ibuf_inside().
buf_page_get_zip(), buf_page_init_for_read(),
buf_read_ibuf_merge_pages(), fil_io(), ibuf_free_excess_pages(),
ibuf_contract_ext(): Remove assertions on ibuf_inside(), because a
mini-transaction is not available.
buf_read_ahead_linear(): Add the parameter inside_ibuf.
ibuf_restore_pos(): When this function returns FALSE, it commits mtr
and must therefore do ibuf_exit(mtr).
ibuf_delete_rec(): This function commits mtr and must therefore do
ibuf_exit(mtr).
ibuf_rec_get_page_no(), ibuf_rec_get_space(), ibuf_rec_get_info(),
ibuf_rec_get_op_type(), ibuf_build_entry_from_ibuf_rec(),
ibuf_rec_get_volume(), ibuf_get_merge_page_nos(),
ibuf_get_volume_buffered_count(), ibuf_get_entry_counter_low(): Add
the parameter mtr in debug builds, for asserting ibuf_inside(mtr).
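
A minimal sketch of keeping the flag in the mini-transaction rather than in
thread-local storage. The struct and the *_sketch names are hypothetical
stand-ins for mtr_t and the functions listed above, and assert() stands in
for the debug-build assertions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the mini-transaction handle. */
typedef struct {
	bool	active;		/* started and not yet committed */
	bool	inside_ibuf;	/* set while operating on the insert buffer */
} mtr_sketch_t;

static void
mtr_sketch_start(mtr_sketch_t* mtr)
{
	mtr->active = true;
	mtr->inside_ibuf = false;
}

static bool
ibuf_sketch_inside(const mtr_sketch_t* mtr)
{
	return(mtr->inside_ibuf);
}

static void
ibuf_sketch_enter(mtr_sketch_t* mtr)
{
	assert(!mtr->inside_ibuf);
	mtr->inside_ibuf = true;
}

static void
ibuf_sketch_exit(mtr_sketch_t* mtr)
{
	assert(mtr->inside_ibuf);
	mtr->inside_ibuf = false;
}

/* A plain commit must never happen while the flag is still set. */
static void
mtr_sketch_commit(mtr_sketch_t* mtr)
{
	assert(mtr->active);
	assert(!ibuf_sketch_inside(mtr));
	mtr->active = false;
}

/* Like mtr_sketch_start(), but sets the flag. */
static void
ibuf_sketch_mtr_start(mtr_sketch_t* mtr)
{
	mtr_sketch_start(mtr);
	ibuf_sketch_enter(mtr);
}

/* Wrapper that asserts the flag, clears it, and commits. */
static void
ibuf_sketch_mtr_commit(mtr_sketch_t* mtr)
{
	assert(ibuf_sketch_inside(mtr));
	ibuf_sketch_exit(mtr);
	mtr_sketch_commit(mtr);
}
```

Because the flag travels with the mini-transaction, it no longer matters
which thread executes the code between start and commit.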
rb:585 approved by Sunny Bains
Remove the slot_no member of struct thr_local_struct.
enum srv_thread_type: Remove unused thread types.
srv_get_thread_type(): Unused function, remove.
thr_local_get_slot_no(), thr_local_set_slot_no(): Remove.
srv_thread_type_validate(), srv_slot_get_type(): New functions, for debugging.
srv_table_reserve_slot(): Return the srv_slot_t* directly. Do not create
thread-local storage.
srv_suspend_thread(): Get the srv_slot_t* as parameter. Return void;
the caller knows slot->event already.
srv_thread_has_reserved_slot(), srv_release_threads(): Assert
srv_thread_type_validate(type).
srv_init(): Use mem_zalloc() instead of mem_alloc(). Replace
srv_table_get_nth_slot(), because it now asserts that the kernel_mutex
is being held.
srv_master_thread(), srv_purge_thread(): Remember the slot from
srv_table_reserve_slot().
rb:629 approved by Inaam Rana
Bug #11766501: Multiple RBS break the get rseg with minimum trx_t::no code during purge
Bug# 59291 changes:
Main problem is that truncating the UNDO log at the completion of every
trx_purge() call is expensive as the number of rollback segments is increased.
We now truncate after a configurable number of pages. The
innodb_purge_batch_size parameter controls when InnoDB does the actual
truncate. The truncate is done once every 128 (or TRX_SYS_N_RSEGS)
iterations. In other words, we truncate after purging
128 * innodb_purge_batch_size pages. The smaller the batch size, the sooner
we truncate.
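
The arithmetic above can be sketched as follows. The macro and function
names are hypothetical, and the counter is only an illustration of
"truncate once every 128 batches", not the actual trx_purge() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stands in for TRX_SYS_N_RSEGS. */
#define N_RSEGS_SKETCH 128UL

/* Count one finished purge batch and report whether the undo log
should be truncated now, that is, on every 128th batch. */
static bool
purge_should_truncate(unsigned long* n_batches)
{
	assert(n_batches != NULL);

	++*n_batches;
	return((*n_batches % N_RSEGS_SKETCH) == 0);
}

/* Number of pages purged between two truncates for a given
innodb_purge_batch_size. */
static unsigned long
pages_between_truncates(unsigned long batch_size)
{
	return(N_RSEGS_SKETCH * batch_size);
}
```

For example, a batch size of 20 gives 128 * 20 = 2560 pages between
truncates.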
Introduce a new parameter that controls how many rollback segments to use for
storing UNDO information. This is really step 1 in allowing complete control
to the user over rollback space management.
New parameters:
i) innodb_rollback_segments = number of rollback_segments to use
(default is now 128) dynamic parameter, can be changed anytime.
Currently there is little benefit in changing it from the default.
Optimisations in the patch.
i. Change the O(n) behaviour of trx_rseg_get_on_id() to O(log n)
Backported from 5.6. Refactor some of the binary heap code.
Create a new include/ut0bh.ic file.
ii. Avoid truncating the rollback segments after every purge.
Related changes that were moved to a separate patch:
i. Purge should not do any flushing, only wait for space to be free so that
it only does purging of records unless it is held up by a long running
transaction that is preventing it from progressing.
ii. Give the purge thread preference over transactions when acquiring the
rseg->mutex during commit. This is to avoid purge blocking unnecessarily
when getting the next rollback segment to purge.
Bug #11766501 changes:
Add the rseg to the min binary heap under the cover of the kernel mutex and
the binary heap mutex. This ensures the ordering of the min binary heap.
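
For illustration, a minimal array-based min binary heap keyed on the
transaction number. All names here are hypothetical and this is not the
ut0bh code. Locking is deliberately omitted; as described above, the real
insertion happens while holding the kernel mutex and the binary heap mutex.

```c
#include <assert.h>
#include <stddef.h>

#define RSEG_HEAP_CAP 128

/* Hypothetical heap element: a rollback segment and its key. */
typedef struct {
	unsigned long long	trx_no;		/* ordering key */
	unsigned		rseg_id;	/* rollback segment id */
} rseg_elem_t;

typedef struct {
	rseg_elem_t	elems[RSEG_HEAP_CAP];
	size_t		n;			/* number of elements */
} rseg_heap_t;

static void
rseg_elem_swap(rseg_elem_t* a, rseg_elem_t* b)
{
	rseg_elem_t	tmp = *a;
	*a = *b;
	*b = tmp;
}

/* Insert an element and sift it up. Returns 0, or -1 if full. */
static int
rseg_heap_push(rseg_heap_t* h, rseg_elem_t e)
{
	size_t	i;

	if (h->n == RSEG_HEAP_CAP) {
		return(-1);
	}

	i = h->n++;
	h->elems[i] = e;

	while (i > 0) {
		size_t	parent = (i - 1) / 2;

		if (h->elems[parent].trx_no <= h->elems[i].trx_no) {
			break;
		}
		rseg_elem_swap(&h->elems[parent], &h->elems[i]);
		i = parent;
	}
	return(0);
}

/* Remove the element with the smallest trx_no. Returns 0, or -1
if the heap is empty. */
static int
rseg_heap_pop(rseg_heap_t* h, rseg_elem_t* out)
{
	size_t	i = 0;

	if (h->n == 0) {
		return(-1);
	}

	*out = h->elems[0];
	h->elems[0] = h->elems[--h->n];

	for (;;) {
		size_t	l = 2 * i + 1;
		size_t	r = l + 1;
		size_t	min = i;

		if (l < h->n && h->elems[l].trx_no < h->elems[min].trx_no) {
			min = l;
		}
		if (r < h->n && h->elems[r].trx_no < h->elems[min].trx_no) {
			min = r;
		}
		if (min == i) {
			break;
		}
		rseg_elem_swap(&h->elems[i], &h->elems[min]);
		i = min;
	}
	return(0);
}
```

Purge would then always pop the rollback segment with the smallest
transaction number first, regardless of insertion order.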
The two changes have to be committed together because they share the same
code that fixes both issues.
rb://567 Approved by: Inaam Rana.
rb://566
approved by: Sunny
When using native aio on Linux, each IO helper thread should be able to
handle up to 256 IO requests. The number 256 is the same as the one used
for simulated aio. On Windows, where we also use native aio, this limit
is 32 because of OS constraints. It seems that we were using the limit
of 32 on all platforms where we use native aio. The fix is to use 256 on
all platforms except Windows (when native aio is enabled on Windows).
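
The resulting rule can be sketched as follows. The function and macro names
are hypothetical, not the identifiers used in the InnoDB source.

```c
#include <assert.h>
#include <stdbool.h>

/* Limits taken from the description above. */
#define AIO_PENDING_PER_THREAD_DEFAULT		256U
#define AIO_PENDING_PER_THREAD_WIN_NATIVE	32U

/* Maximum number of pending IO requests per IO helper thread. */
static unsigned
aio_pending_per_thread(bool is_windows, bool native_aio)
{
	if (is_windows && native_aio) {
		/* OS constraint on Windows native aio. */
		return(AIO_PENDING_PER_THREAD_WIN_NATIVE);
	}

	/* Simulated aio everywhere, and native aio elsewhere. */
	return(AIO_PENDING_PER_THREAD_DEFAULT);
}
```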
"rows examined" estimates". This change implements "innodb_stats_method"
with options of "nulls_equal", "nulls_unequal" and "nulls_ignored".
rb://553 approved by Marko
Check whether the master and purge thread are active after creating them. Do
not proceed until both threads have started. We do this by checking whether a
slot has been reserved by both the respective threads.
Add srv_thread_has_reserved_slot(), which returns the slot number or
ULINT_UNDEFINED.
rb://536 Approved by Jimmy
InnoDB did not attempt to handle lower_case_table_names == 2 when looking
up foreign table names and referenced table names. It turned that server
variable into a boolean and ignored the possibility of it being '2'.
The setting lower_case_table_names == 2 means that it should be stored and
displayed in mixed case as given, but compared internally in lower case.
Normally the server deals with this since it stores table names. But
InnoDB stores referential constraints for the server, so it needs to keep
track of both lower case and given names.
This solution creates two table name pointers for each foreign and referenced
table name. One to display the name, and one to look it up. Both pointers
point to the same allocated string unless this setting is 2. So the overhead
added is not too much.
Two functions are created in dict0mem.c to populate the ..._lookup versions
of these pointers. Both dict_mem_foreign_table_name_lookup_set() and
dict_mem_referenced_table_name_lookup_set() are called 5 times each.
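
A minimal sketch of the two-pointer scheme. The struct and function names
are hypothetical simplifications, not the dict0mem.c code, and malloc()
stands in for InnoDB's own memory heap allocation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair of pointers for one table name. */
typedef struct {
	char*	name;		/* as given, used for display */
	char*	name_lookup;	/* used for lookups */
} table_name_sketch_t;

/* Populate name_lookup. Only when lower_case_table_names == 2 is a
separate lower-case copy allocated; otherwise both pointers share the
same string. Returns 0 on success, -1 on allocation failure. */
static int
table_name_lookup_set(table_name_sketch_t* t, int lower_case_table_names)
{
	if (lower_case_table_names == 2) {
		size_t	len = strlen(t->name);
		size_t	i;
		char*	s = malloc(len + 1);

		if (s == NULL) {
			return(-1);
		}
		for (i = 0; i < len; i++) {
			s[i] = (char) tolower((unsigned char) t->name[i]);
		}
		s[len] = '\0';
		t->name_lookup = s;
	} else {
		t->name_lookup = t->name;
	}
	return(0);
}

/* Free the lookup copy only if it is a separate allocation. */
static void
table_name_lookup_free(table_name_sketch_t* t)
{
	if (t->name_lookup != t->name) {
		free(t->name_lookup);
	}
	t->name_lookup = NULL;
}
```

With the setting at 2, a name given as "test/MyTable" is still displayed
as given but looked up as "test/mytable".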
Fix a race condition in srv_master_thread(). We need to acquire the kernel
mutex before calling srv_table_reserve_slot(). Add a mutex_own() assertion
in srv_table_reserve_slot().