Contains also:
MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
enable tests that were fixed in MDEV-10549
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
Bug #79636: CACHE_LINE_SIZE should be 128 on AArch64
Bug #79637: Hard-coded cache line size
Bug #79638: Reconcile CACHE_LINE_SIZE with CPU_LEVEL1_DCACHE_LINESIZE
Bug #79652: Suspicious padding in srv_conc_t
- changed CPU_LEVEL1_DCACHE_LINESIZE to default to 128 bytes on POWER
and AArch64 architectures in cases when no value could be detected
by CMake using getconf
- changed CACHE_LINE_SIZE definition in ut0counter.h to be an alias of
CPU_LEVEL1_DCACHE_LINESIZE
- changed a number of hard-coded 64-byte cache line size values in the
InnoDB code
- fixed insufficient padding for srv_conc members in srv0conc.cc
Ported to Mariadb by Daniel Black <daniel.black@au.ibm.com>
Added s390 cache size of 256 at same time.
we can see from the hang stacktrace, srv_monitor_thread is blocked
when getting log_sys::mutex, so that sync_arr_wake_threads_if_sema_free
cannot get a change to break the mutex deadlock.
The fix is simply removing any mutex wait in srv_monitor_thread.
Patch is reviewed by Sunny over IM.
we can see from the hang stacktrace, srv_monitor_thread is blocked
when getting log_sys::mutex, so that sync_arr_wake_threads_if_sema_free
cannot get a change to break the mutex deadlock.
The fix is simply removing any mutex wait in srv_monitor_thread.
Patch is reviewed by Sunny over IM.
Backport pull request #125 from grooverdan/MDEV-8923_innodb_buffer_pool_dump_pct to 10.0
WL#6504 InnoDB buffer pool dump/load enchantments
This patch consists of two parts:
1. Dump only the hottest N% of the buffer pool(s)
2. Prevent hogging the server duing BP load
From MySQL - commit b409342c43ce2edb68807100a77001367c7e6b8e
Add testcases for innodb_buffer_pool_dump_pct_basic.
Part of the code authored by Daniel Black
- Make accelerated checksum available to InnoDB and XtraDB.
- Fall back to slice-by-eight if not available. The mode used is printed on startup.
- Will only build on POWER systems at the moment until CMakeLists are modified
to only add the crc32_power8/ files when building on POWER.
running MySQL-5.7 unittest/gunit/innodb/ut0crc32-t
Before:
1..2
Using software crc32 implementation, CPU is little-endian
ok 1
Using software crc32 implementation, CPU is little-endian
normal CRC32: real 0.148006 sec
normal CRC32: user 0.148000 sec
normal CRC32: sys 0.000000 sec
big endian CRC32: real 0.144293 sec
big endian CRC32: user 0.144000 sec
big endian CRC32: sys 0.000000 sec
ok 2
After:
1..2
Using POWER8 crc32 implementation, CPU is little-endian
ok 1
Using POWER8 crc32 implementation, CPU is little-endian
normal CRC32: real 0.008097 sec
normal CRC32: user 0.008000 sec
normal CRC32: sys 0.000000 sec
big endian CRC32: real 0.147043 sec
big endian CRC32: user 0.144000 sec
big endian CRC32: sys 0.000000 sec
ok 2
Author CRC32 ASM code: Anton Blanchard <anton@au.ibm.com>
ref: https://github.com/antonblanchard/crc32-vpmsum
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
WL#6504 InnoDB buffer pool dump/load enchantments
This patch consists of two parts:
1. Dump only the hottest N% of the buffer pool(s)
2. Prevent hogging the server duing BP load
From MySQL - commit b409342c43ce2edb68807100a77001367c7e6b8e
Analysis: Current implementation will write and read at least one block
(sort_buffer_size bytes) from disk / index even if that block does not
contain any records.
Fix: Avoid writing / reading empty blocks to temporary files (disk).
Added new dynamic configuration variable innodb_buf_dump_status_frequency
to configure how often buffer pool dump status is printed in the logs.
A number between [0, 100] that tells how oftern buffer pool dump status
in percentages should be printed. E.g. 10 means that buffer pool dump
status is printed when every 10% of number of buffer pool pages are
dumped. Default is 0 (only start and end status is printed).
Analysis: Problem was that we did create crypt data for encrypted table but
this new crypt data was not written to page 0. Instead a default crypt data
was written to page 0 at table creation.
Fixed by explicitly writing new crypt data to page 0 after successfull
table creation.
Avoid creating innodb buffer pool dump/load thread if mysqld is started
with wsrep recovery mode (--wsrep-recover).
(Merged fix for lp:1305955 from lp:percona-xtradb-cluster)
if XA PREPARE transactions hold explicit locks.
innobase_shutdown_for_mysql(): Call trx_sys_close() before lock_sys_close()
(and dict_close()) so that trx_free_prepared() will see all locks intact.
RB: 8561
Reviewed-by: Vasil Dimov <vasil.dimov@oracle.com>
Step 2:
-- Introduce temporal memory array to buffer pool where to allocate
temporary memory for encryption/compression
-- Rename PAGE_ENCRYPTION -> ENCRYPTION
-- Rename PAGE_ENCRYPTION_KEY -> ENCRYPTION_KEY
-- Rename innodb_default_page_encryption_key -> innodb_default_encryption_key
-- Allow enable/disable encryption for tables by changing
ENCRYPTION to enum having values DEFAULT, ON, OFF
-- In create table store crypt_data if ENCRYPTION is ON or OFF
-- Do not crypt tablespaces having ENCRYPTION=OFF
-- Store encryption mode to crypt_data and redo-log
Step 1:
-- Remove page encryption from dictionary (per table
encryption will be handled by storing crypt_data to page 0)
-- Remove encryption/compression from os0file and all functions
before that (compression will be added to buf0buf.cc)
-- Use same CRYPT_SCHEME_1 for all encryption methods
-- Do some code cleanups to confort InnoDB coding style
MDEV-7399: Add support for INFORMATION_SCHEMA.INNODB_MUTEXES
MDEV-7618: Improve semaphore instrumentation
Introduced two new information schema tables to monitor mutex waits
and semaphore waits. Added a new configuration variable
innodb_intrument_semaphores to add thread_id, file name and
line of current holder of mutex/rw_lock.
in file buf0mtflu.cc line 439
Analysis: At shutdown multi-threaded flush sends a exit work
items to all mtflush threads. We wait until the work queue is
empty. However, as we did not hold the mutex, some other thread
could also put work-items to work queue.
Fix: Take mutex before adding exit work items to work queue and
wait until all work-items are really processed. Release
mutex after we have marked that multi-threaded flush is not
anymore active.
Fix test failure on innodb_bug12902967 caused by unnecessary
info output on xtradb/buf/buf0mtflush.cc.
Do not try to enable atomic writes if the file type
is not OS_DATA_FILE. Atomic writes are unnecessary
for log files. If we try to enable atomic writes
to log writes that are stored to media supporting
atomic writes we will end up problems later.
Analysis: For some reason actual thread handle is not
returned on Windows instead lpThreadId was returned and
thread handle was closed after thread create. Later
CloseHandle was called for recv_writer_thread_handle
and psort_info->thread_hdl.
Fix: Return thread handle from os_thread_create()
also on Windows and store these thread handles also
in srv0start.cc so that they can be later closed.
Use traditional statistics estimation by default (innodb-stats-traditional=true).
There could be performance regression for customers if there is a lot of
open table operations.
innodb_stats_sample_pages
Analysis: If you set the number of analyzed pages
to very low number compared to actual pages on
that table/index it randomly pics those pages
(default 8 pages), this leads to fact that query
after analyze table returns different results. If
the index tree is small, smaller than 10 *
n_sample_pages + total_external_size, then the
estimate is ok. For bigger index trees it is
common that we do not see any borders between
key values in the few pages we pick. But still
there may be n_sample_pages different key values,
or even more. And it just tries to
approximate to n_sample_pages (8).
Fix: (1) Introduced new dynamic configuration variable
innodb_stats_sample_traditional that retains
the current design. Default false.
(2) If traditional sample is not used we use
n_sample_pages = max(min(srv_stats_sample_pages,
index->stat_index_size),
log2(index->stat_index_size)*
srv_stats_sample_pages);
(3) Introduced new dynamic configuration variable
stat_modified_counter (default = 0) if set
sets lower bound for row updates when statistics is re-estimated.
If user has provided upper bound for how many rows needs to be updated
before we calculate new statistics we use minimum of provided value
and 1/16 of table every 16th round. If no upper bound is provided
(srv_stats_modified_counter = 0, default) then calculate new statistics
if 1 / 16 of table has been modified
since the last time a statistics batch was run.
We calculate statistics at most every 16th round, since we may have
a counter table which is very small and updated very often.
@param t table
@return true if the table has changed too much and stats need to be
recalculated
*/
#define DICT_TABLE_CHANGED_TOO_MUCH(t) \
((ib_int64_t) (t)->stat_modified_counter > (srv_stats_modified_counter ? \
ut_min(srv_stats_modified_counter, (16 + (t)->stat_n_rows / 16)) : \
16 + (t)->stat_n_rows / 16))
Merge Facebook commit cd063ab930
authored by Peng Tian from https://github.com/facebook/mysql-5.6
Introduced a new configuration variable innodb_fatal_semaphore_wait_threshold,
it makes the fatal semaphore timeout configurable. Modified original commit
so that no MariaDB server files are changed, instead introduced a new
InnoDB/XtraDB configuration variable.
Its default/min/max vlaues are 600/1/2^32-1 in seconds (it was hardcoded
as 600, now its default value is 600, so the default behavior of this diff
should be no change).
Merged Facebook commit 617aef9f911d825e9053f3d611d0389e02031225
authored by Inaam Rana to InnoDB storage engine (not XtraDB)
from https://github.com/facebook/mysql-5.6
WL#7047 - Optimize buffer pool list scans and related batch processing
Reduce excessive scanning of pages when doing flush list batches. The
fix is to introduce the concept of "Hazard Pointer", this reduces the
time complexity of the scan from O(n*n) to O.
The concept of hazard pointer is reversed in this work. Academically
hazard pointer is a pointer that the thread working on it will declar
such and as long as that thread is not done no other thread is allowe
do anything with it.
In this WL we declare the pointer as a hazard pointer and then if any
thread attempts to work on it, it is allowed to do so but it has to a
the hazard pointer to the next valid value. We use hazard pointer sol
reverse traversal of lists within a buffer pool instance.
Add an event to control the background flush thread. The background f
thread wait has been converted to an os event timed wait so that it c
signalled by threads that want to kick start a background flush when
buffer pool is running low on free/dirty pages.
Merge Facebook commit 154c579b828a60722a7d9477fc61868c07453d08
and e8f0052f9b112dc786bf9b957ed5b16a5749f7fd authored
by Steaphan Greene from https://github.com/facebook/mysql-5.6
Optimize prefix index queries to skip cluster index lookup when possible.
Currently InnoDB will always fetch the clustered index (primary key
index) for all prefix columns in an index, even when the value of a
particular record is smaller than the prefix length. This change
optimizes that case to use the record from the secondary index and avoid
the extra lookup.
Also adds two status vars that track how effective this is:
innodb_secondary_index_triggered_cluster_reads:
Times secondary index lookup triggered cluster lookup.
innodb_secondary_index_triggered_cluster_reads_avoided:
Times prefix optimization avoided triggering cluster lookup.
Merge Facebook commit 4f3e0343fd2ac3fc7311d0ec9739a8f668274f0d
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
Adds innodb_idle_flush_pct to enable tuning of the page flushing rate
when the system is relatively idle. We care about this, since doing
extra unnecessary flash writes shortens the lifespan of the flash.
New generation hard drives, SSDs and NVM devices support 4K
sector size. Supported sector size can be found using fstatvfs()
or GetDiskFreeSpace() functions.
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6
This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220
Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.
innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.
dd2d11be7a
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This change is to fix: http://bugs.mysql.com/62534
This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.
This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.
Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct
Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
you the progress of inplace alter table and row log buffer usage
- (x 100%, it's 4-digit. 10000 means 100.00%)
- Innodb_onlineddl_rowlog_rows
Shows how many rows are stored in row log buffer.
- Innodb_onlineddl_rowlog_pct_used
Shows row log buffer usage in percent ( *100%, it's 4-digit. 10000 means 100.00% ).
- Innodb_onlineddl_pct_progress
Shows the progress of inplace alter table. It might
be not so accurate because inplace alter is highly
depend on disk and buffer pool status.
But still it is useful and better than nothing.
- Add some log for inplace alter table
XtraDB/InnoDB will print some message before and
after doing some task.
Merged lp:maria/maria-10.0-galera up to revision 3880.
Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.
Merged lp:maria/maria-10.0-galera up to revision 3879.
Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.
Analysis: When database is migrated from 5.5 or earlier and
database needs crash recovery, there is possibility that
SYS_DATAFILES system table does not exists, but
crash recovery in function dict_check_tablespaces_and_store_max_id()
assumes that SYS_DATAFILES exists.
Fix: If SYS_DATAFILES does not exists, create it before
we end up to function dict_check_tablespaces_and_store_max_id()
on crash recovery.
Part of this work is based on Stewart Smitch's memory barrier and lower priori
patches for power8.
- Added memory syncronization for innodb & xtradb for power8.
- Added HAVE_WINDOWS_MM_FENCE to CMakeList.txt
- Added os_isync to fix a syncronization problem on power
- Added log_get_lsn_nowait which is now used srv_error_monitor_thread to ensur
if log mutex is locked.
All changes done both for InnoDB and Xtradb
Analysis: InnoDB writes also files that do not contain FIL-header.
This could lead incorrect analysis on os_fil_read_func function
when it tries to see is page page compressed based on FIL_PAGE_TYPE
field on FIL-header. With bad luck uncompressed page that does
not contain FIL-headed, the byte on FIL_PAGE_TYPE position could
indicate that page is page comrpessed.
Fix: Upper layer must indicate is file space page compressed
or not. If this is not yet known, we need to read the FIL-header
and find it out. Files that we know that are not page compressed
we can always just provide FALSE.
Analysis: Can't disable the error message because you may get database
started with incorrect log file size.
Fix: Thus only improve the error message to give more information
to users.
special builds UNIV_PAGECOMPRESS_DEBUG and UNIV_MTFLUSH_DEBUG).
Added a new status variable compress_pages_page_compression_error to count possible
compression errors.
unnecessary. There is a lot more index pages than there is normal pages.
Earlier all pages were compressed and this provided best performance and
compression ratio. Added status variable to show how many non index pages
are written.
innodb_force_primary_key default off. If option is true, create table without
primary key or unique key where all keyparts are NOT NULL is not
accepted. Instead an error message is printed. Variable value can
be changed with set global innodb_force_primary_key = <value>.
Added option to disable multi-threaded flush with innodb_use_mtflush = 0
option, by default multi-threaded flush is used.
Updated innochecksum tool, still it does not support new checksums.
Fixed issue on multi-threaded flush at shutdown.
Removed unnecessary startup option innodb_compress_pages.
Added a new startup option innodb_mtflush_threads, default 8.
Update InnoDB to 5.6.14
Apply MySQL-5.6 hack for MySQL Bug#16434374
Move Aria-only HA_RTREE_INDEX from my_base.h to maria_def.h (breaks an assert in InnoDB)
Fix InnoDB memory leak
IN DOCUMENTATION
Problem
-------
The documentation says that we support 'K' prefix
while specifiying size for innodb datafile in the
server variable for innodb_data_file_path ,but the
function srv_parse_megabytes() only handles only
'M' (megabytes) and 'G' (gigabytes) .
Fix
---
Modify srv_parse_megabytes() to handle Kilobytes.
Add in documentation that while specifying size
in KB it should be mentioned in multiples of 1024
other wise they will be rounded off to nearest
MB (megabyte) boundry .(eg if size mentioned
as 2313KB will be considered as 2 MB ).
[ Approved by Marko #rb 2387 ]
IN DOCUMENTATION
Problem
-------
The documentation says that we support 'K' prefix
while specifiying size for innodb datafile in the
server variable for innodb_data_file_path ,but the
function srv_parse_megabytes() only handles only
'M' (megabytes) and 'G' (gigabytes) .
Fix
---
Modify srv_parse_megabytes() to handle Kilobytes.
Add in documentation that while specifying size
in KB it should be mentioned in multiples of 1024
other wise they will be rounded off to nearest
MB (megabyte) boundry .(eg if size mentioned
as 2313KB will be considered as 2 MB ).
[ Approved by Marko #rb 2387 ]
includes:
* remove some remnants of "Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING"
* introduce LOCK_share, now LOCK_ha_data is strictly for engines
* rea_create_table() always creates .par file (even in "frm-only" mode)
* fix a 5.6 bug, temp file leak on dummy ALTER TABLE
INNODB_FAST_SHUTDOWN IS 2
Problem:
When innodb_fast_shutdown is set to 2 and the master thread enters
flush loop, under some circumstances it will not be able to exit it.
This may lead to a shutdown hanging.
This is happening because of the following:
1. In the flush_loop block of code, if the srv_fast_shutdown is
equal to 2 (very fast shutdown), then we do not flush dirty
pages in buffer pool to disk.
2. In the same flush_loop block of code, if the number of dirty
pages is more than user specified limit, we go to step 1.
This results in infinite loop.
Solution:
When we are in the process of doing a very fast shutdown, don't
do step 2 above.
rb#2328 approved by Inaam.
INNODB_FAST_SHUTDOWN IS 2
Problem:
When innodb_fast_shutdown is set to 2 and the master thread enters
flush loop, under some circumstances it will not be able to exit it.
This may lead to a shutdown hanging.
This is happening because of the following:
1. In the flush_loop block of code, if the srv_fast_shutdown is
equal to 2 (very fast shutdown), then we do not flush dirty
pages in buffer pool to disk.
2. In the same flush_loop block of code, if the number of dirty
pages is more than user specified limit, we go to step 1.
This results in infinite loop.
Solution:
When we are in the process of doing a very fast shutdown, don't
do step 2 above.
rb#2328 approved by Inaam.
After a clean shutdown, InnoDB will not check the *.ibd file headers,
for maximum performance. This is unchanged before and after this
patch.
What this fix addresses is the case when crash recovery is
needed. Previously, InnoDB could load a corrupted tablespace file.
buf_page_is_corrupted(): Add the parameter check_lsn.
fil_check_first_page(): New function, to perform a consistency check
on the first page of a file. This can be overridden by setting
innodb_force_recovery.
fil_read_first_page(), fil_open_single_table_tablespace(),
fil_load_single_table_tablespace(): Invoke fil_check_first_page().
open_or_create_data_files(): Check the status of
fil_open_single_table_tablespace().
rb#2352 approved by Jimmy Yang
After a clean shutdown, InnoDB will not check the *.ibd file headers,
for maximum performance. This is unchanged before and after this
patch.
What this fix addresses is the case when crash recovery is
needed. Previously, InnoDB could load a corrupted tablespace file.
buf_page_is_corrupted(): Add the parameter check_lsn.
fil_check_first_page(): New function, to perform a consistency check
on the first page of a file. This can be overridden by setting
innodb_force_recovery.
fil_read_first_page(), fil_open_single_table_tablespace(),
fil_load_single_table_tablespace(): Invoke fil_check_first_page().
open_or_create_data_files(): Check the status of
fil_open_single_table_tablespace().
rb#2352 approved by Jimmy Yang
trx_undo_prev_version_build() should confirm existence of inherited (not-own) external pages.
Bug #14676084 : ROW_UNDO_MOD_UPD_DEL_SEC() DOESN'T NEED UNDO_ROW AND UNDO_EXT INITIALIZED
mtr script could hit the assertion error !bpage->file_page_was_freed using this path.
So, also fixed
rb://1337 approved by Marko Makela.
trx_undo_prev_version_build() should confirm existence of inherited (not-own) external pages.
Bug #14676084 : ROW_UNDO_MOD_UPD_DEL_SEC() DOESN'T NEED UNDO_ROW AND UNDO_EXT INITIALIZED
mtr script could hit the assertion error !bpage->file_page_was_freed using this path.
So, also fixed
rb://1337 approved by Marko Makela.
btr_lift_page_up() writes wrong page number (different by -1) for upper than father page.
But in almost all of the cases, the father page should be root page, no upper
pages. It is very rare path.
In addition the leaf page should not be lifted unless the father page is root.
Because the branch pages should not become the leaf pages.
rb://1336 approved by Marko Makela.
btr_lift_page_up() writes wrong page number (different by -1) for upper than father page.
But in almost all of the cases, the father page should be root page, no upper
pages. It is very rare path.
In addition the leaf page should not be lifted unless the father page is root.
Because the branch pages should not become the leaf pages.
rb://1336 approved by Marko Makela.
btr_lift_page_up() writes wrong page number (different by -1) for upper than father page.
But in almost all of the cases, the father page should be root page, no upper
pages. It is very rare path.
In addition the leaf page should not be lifted unless the father page is root.
Because the branch pages should not become the leaf pages.
rb://1336 approved by Marko Makela.
btr_lift_page_up() writes wrong page number (different by -1) for upper than father page.
But in almost all of the cases, the father page should be root page, no upper
pages. It is very rare path.
In addition the leaf page should not be lifted unless the father page is root.
Because the branch pages should not become the leaf pages.
rb://1336 approved by Marko Makela.
The problem was introduced in 5.5.20 by Bug 13116225. It tried to
protect against downgrading from a version 5.6.4 database that was
created with a page size other than 16k. Version 5.6.4 supports
page sizes 4k and 8k and stamps that page size into the header page
of each tablespace file. Version 5.5.20 attempts to read that page
size in the file header.
But it turns out that only the first system tablespace file has a
reliable flags field in the header. So only ibdata1 can be or needs
to be tested for another page size. Extra system tablespace files
like ibdata2, ibdata3, etc do not and should not be tested since the
flags field is unreliable.
The problem was introduced in 5.5.20 by Bug 13116225. It tried to
protect against downgrading from a version 5.6.4 database that was
created with a page size other than 16k. Version 5.6.4 supports
page sizes 4k and 8k and stamps that page size into the header page
of each tablespace file. Version 5.5.20 attempts to read that page
size in the file header.
But it turns out that only the first system tablespace file has a
reliable flags field in the header. So only ibdata1 can be or needs
to be tested for another page size. Extra system tablespace files
like ibdata2, ibdata3, etc do not and should not be tested since the
flags field is unreliable.
This fix does not remove the underlying cause of the assertion
failure. It just works around the problem, allowing a corrupted
secondary index to be fixed by DROP INDEX and CREATE INDEX (or in the
worst case, by re-creating the table).
ibuf_delete(): If the record to be purged is the last one in the page
or it is not delete-marked, refuse to purge it. Instead, write an
error message to the error log and let a debug assertion fail.
ibuf_set_del_mark(): If the record to be delete-marked is not found,
display some more information in the error log and let a debug
assertion fail.
row_undo_mod_del_unmark_sec_and_undo_update(),
row_upd_sec_index_entry(): Let a debug assertion fail when the record
to be delete-marked is not found.
buf_page_print(): Add ut_ad(0) so that corruption will be more
prominent in stress testing with debug binaries. Add ut_ad(0) here and
there where corruption is noticed.
btr_corruption_report(): Display some data on page_is_comp() mismatch.
btr_assert_not_corrupted(): A wrapper around btr_corruption_report().
Assert that page_is_comp() agrees with the table flags.
rb:911 approved by Inaam Rana
This fix does not remove the underlying cause of the assertion
failure. It just works around the problem, allowing a corrupted
secondary index to be fixed by DROP INDEX and CREATE INDEX (or in the
worst case, by re-creating the table).
ibuf_delete(): If the record to be purged is the last one in the page
or it is not delete-marked, refuse to purge it. Instead, write an
error message to the error log and let a debug assertion fail.
ibuf_set_del_mark(): If the record to be delete-marked is not found,
display some more information in the error log and let a debug
assertion fail.
row_undo_mod_del_unmark_sec_and_undo_update(),
row_upd_sec_index_entry(): Let a debug assertion fail when the record
to be delete-marked is not found.
buf_page_print(): Add ut_ad(0) so that corruption will be more
prominent in stress testing with debug binaries. Add ut_ad(0) here and
there where corruption is noticed.
btr_corruption_report(): Display some data on page_is_comp() mismatch.
btr_assert_not_corrupted(): A wrapper around btr_corruption_report().
Assert that page_is_comp() agrees with the table flags.
rb:911 approved by Inaam Rana
This bug ensures that a live downgrade from an InnoDB 5.6 with WL5756 and
a database created with innodb-page-size=8k or 4k will make a version 5.5
installation politely refuse to start. Instead of crashing or giving some
indication about a corrupted database, it will indicate the page size
difference.
This patch takes only that part of the Wl5756 patch that is needed to
protect against opening a tablespace that is stamped with a different
page size.
It also contains the change in dict_index_find_on_id_low() just in case
a database with another page size is created by recompiling a pre-WL5756
InnoDB.
This bug ensures that a live downgrade from an InnoDB 5.6 with WL5756 and
a database created with innodb-page-size=8k or 4k will make a version 5.5
installation politely refuse to start. Instead of crashing or giving some
indication about a corrupted database, it will indicate the page size
difference.
This patch takes only that part of the Wl5756 patch that is needed to
protect against opening a tablespace that is stamped with a different
page size.
It also contains the change in dict_index_find_on_id_low() just in case
a database with another page size is created by recompiling a pre-WL5756
InnoDB.
The innoDB global variable srv_lower_case_table_names is set to the value of lower_case_table_names declared in mysqld.h server in ha_innodb.cc. Since this variable can change at runtime, it is reset for each handler call to ::create, ::open, ::rename_table & ::delete_table.
But it is possible for tables to be implicitly opened before an explicit handler call is made when an engine is first started or restarted. I was able to reproduce that with the testcase in this patch on a version of InnoDB from 2 weeks ago. It seemed like the change buffer entries for the secondary key was getting put into pages after the restart. (But I am not sure, I did not write down the call stack while it was reproducing.) In the current code, the implicit open, which is actually a call to dict_load_foreigns(), does not occur with this testcase.
The change is to replace srv_lower_case_table_names by an interface function in innodb.cc that retrieves the server global variable when it is needed.
The innoDB global variable srv_lower_case_table_names is set to the value of lower_case_table_names declared in mysqld.h server in ha_innodb.cc. Since this variable can change at runtime, it is reset for each handler call to ::create, ::open, ::rename_table & ::delete_table.
But it is possible for tables to be implicitly opened before an explicit handler call is made when an engine is first started or restarted. I was able to reproduce that with the testcase in this patch on a version of InnoDB from 2 weeks ago. It seemed like the change buffer entries for the secondary key was getting put into pages after the restart. (But I am not sure, I did not write down the call stack while it was reproducing.) In the current code, the implicit open, which is actually a call to dict_load_foreigns(), does not occur with this testcase.
The change is to replace srv_lower_case_table_names by an interface function in innodb.cc that retrieves the server global variable when it is needed.
SRV_CONC_FORCE_EXIT_INNODB
This is a bogus UNIV_SYNC_DEBUG assertion failure that I introduced
when introducing assertions for checking that InnoDB is not holding
any mutexes or rw-locks when returning control to MySQL.
srv_suspend_mysql_thread(): Release dict_operation_lock before
invoking srv_conc_force_exit_innodb(), which would now check that the
thread is not holding any mutexes or rw-locks. After resuming, check
sync_thread_levels_nonempty_trx() and do srv_conc_force_enter_innodb()
before reacquiring the dict_operation_lock.
rb:646 approved by Sunny Bains
SRV_CONC_FORCE_EXIT_INNODB
This is a bogus UNIV_SYNC_DEBUG assertion failure that I introduced
when introducing assertions for checking that InnoDB is not holding
any mutexes or rw-locks when returning control to MySQL.
srv_suspend_mysql_thread(): Release dict_operation_lock before
invoking srv_conc_force_exit_innodb(), which would now check that the
thread is not holding any mutexes or rw-locks. After resuming, check
sync_thread_levels_nonempty_trx() and do srv_conc_force_enter_innodb()
before reacquiring the dict_operation_lock.
rb:646 approved by Sunny Bains
On shutdown, do not exit threads in os_event_wait(). This method of
exiting was only used by the I/O handler threads. Exit them on a
higher level.
os_event_wait_low(), os_event_wait_time_low(): Do not exit on shutdown.
os_thread_exit(), ut_dbg_assertion_failed(), ut_print_timestamp(): Add
attribute cold, so that GCC knows that these functions are rarely
invoked and can be optimized for size.
os_aio_linux_collect(): Return on shutdown.
os_aio_linux_handle(), os_aio_simulated_handle(), os_aio_windows_handle():
Set *message1 = *message2 = NULL and return TRUE on shutdown.
fil_aio_wait(): Return on shutdown.
logs_empty_and_mark_files_at_shutdown(): Even in very fast shutdown
(innodb_fast_shutdown=2), allow the background threads to exit, but
skip the flushing and log checkpointing.
innobase_shutdown_for_mysql(): Always wait for all the threads to exit.
rb:633 approved by Sunny Bains
On shutdown, do not exit threads in os_event_wait(). This method of
exiting was only used by the I/O handler threads. Exit them on a
higher level.
os_event_wait_low(), os_event_wait_time_low(): Do not exit on shutdown.
os_thread_exit(), ut_dbg_assertion_failed(), ut_print_timestamp(): Add
attribute cold, so that GCC knows that these functions are rarely
invoked and can be optimized for size.
os_aio_linux_collect(): Return on shutdown.
os_aio_linux_handle(), os_aio_simulated_handle(), os_aio_windows_handle():
Set *message1 = *message2 = NULL and return TRUE on shutdown.
fil_aio_wait(): Return on shutdown.
logs_empty_and_mark_files_at_shutdown(): Even in very fast shutdown
(innodb_fast_shutdown=2), allow the background threads to exit, but
skip the flushing and log checkpointing.
innobase_shutdown_for_mysql(): Always wait for all the threads to exit.
rb:633 approved by Sunny Bains
Remove most references to thread id in InnoDB. Three references
remain: the current holder of a mutex, and the current x-lock holder
of a rw-lock, and some references in UNIV_SYNC_DEBUG checks. This
allows MySQL to change the thread associated to a client connection.
Tighten the UNIV_SYNC_DEBUG checks, trying to ensure that no InnoDB
mutex or x-lock is being held when returning control to MySQL. The
only semaphore that may be held is the btr_search_latch in shared mode.
sync_thread_levels_empty_except_dict(): A wrapper for
sync_thread_levels_empty_gen(TRUE).
sync_thread_levels_nonempty_trx(): Check that the current thread is
not holding any InnoDB semaphores, except btr_search_latch if
trx->has_search_latch.
sync_thread_levels_empty(): Unused function; remove.
trx_t: Remove mysql_thread_id and mysql_process_no.
srv_slot_t: Remove id and handle.
row_search_for_mysql(), srv_conc_enter_innodb(),
srv_conc_force_enter_innodb(), srv_conc_force_exit_innodb(),
srv_conc_exit_innodb(), srv_suspend_mysql_thread: Assert
!sync_thread_levels_nonempty_trx().
rb:634 approved by Sunny Bains
Remove most references to thread id in InnoDB. Three references
remain: the current holder of a mutex, and the current x-lock holder
of a rw-lock, and some references in UNIV_SYNC_DEBUG checks. This
allows MySQL to change the thread associated to a client connection.
Tighten the UNIV_SYNC_DEBUG checks, trying to ensure that no InnoDB
mutex or x-lock is being held when returning control to MySQL. The
only semaphore that may be held is the btr_search_latch in shared mode.
sync_thread_levels_empty_except_dict(): A wrapper for
sync_thread_levels_empty_gen(TRUE).
sync_thread_levels_nonempty_trx(): Check that the current thread is
not holding any InnoDB semaphores, except btr_search_latch if
trx->has_search_latch.
sync_thread_levels_empty(): Unused function; remove.
trx_t: Remove mysql_thread_id and mysql_process_no.
srv_slot_t: Remove id and handle.
row_search_for_mysql(), srv_conc_enter_innodb(),
srv_conc_force_enter_innodb(), srv_conc_force_exit_innodb(),
srv_conc_exit_innodb(), srv_suspend_mysql_thread: Assert
!sync_thread_levels_nonempty_trx().
rb:634 approved by Sunny Bains
sync_array_print_long_waits(): Return the longest waiting thread ID
and the longest waited-for lock. Only if those remain unchanged
between calls in srv_error_monitor_thread(), increment
fatal_cnt. Otherwise, reset fatal_cnt.
Background: There is a built-in watchdog in InnoDB whose purpose is to
kill the server when some thread is stuck waiting for a mutex or
rw-lock. Before this fix, the logic was flawed.
The function sync_array_print_long_waits() returns TRUE if it finds a
lock wait that exceeds 10 minutes (srv_fatal_semaphore_wait_threshold).
The function srv_error_monitor_thread() will kill the server if this
happens 10 times in a row (fatal_cnt reaches 10), checked every 30
seconds. This is wrong, because this situation does not mean that the
server is hung. If the server is very busy for a little over 15
minutes, it will be killed.
Consider this example. Thread T1 is waiting for mutex M. Some time
later, threads T2..Tn start waiting for the same mutex M. If T1 keeps
waiting for 600 seconds, fatal_cnt will be incremented to 1. So far,
so good. Now, if M is granted to T1, the server was obviously not
stuck. But, T2..Tn keeps waiting, and their wait time will be longer
than 600 seconds. If 5 minutes later, some Tn has still been waiting
for more than 10 minutes for the mutex M, the server can be killed,
even though it is not stuck.
rb:622 approved by Jimmy Yang
sync_array_print_long_waits(): Return the longest waiting thread ID
and the longest waited-for lock. Only if those remain unchanged
between calls in srv_error_monitor_thread(), increment
fatal_cnt. Otherwise, reset fatal_cnt.
Background: There is a built-in watchdog in InnoDB whose purpose is to
kill the server when some thread is stuck waiting for a mutex or
rw-lock. Before this fix, the logic was flawed.
The function sync_array_print_long_waits() returns TRUE if it finds a
lock wait that exceeds 10 minutes (srv_fatal_semaphore_wait_threshold).
The function srv_error_monitor_thread() will kill the server if this
happens 10 times in a row (fatal_cnt reaches 10), checked every 30
seconds. This is wrong, because this situation does not mean that the
server is hung. If the server is very busy for a little over 15
minutes, it will be killed.
Consider this example. Thread T1 is waiting for mutex M. Some time
later, threads T2..Tn start waiting for the same mutex M. If T1 keeps
waiting for 600 seconds, fatal_cnt will be incremented to 1. So far,
so good. Now, if M is granted to T1, the server was obviously not
stuck. But, T2..Tn keeps waiting, and their wait time will be longer
than 600 seconds. If 5 minutes later, some Tn has still been waiting
for more than 10 minutes for the mutex M, the server can be killed,
even though it is not stuck.
rb:622 approved by Jimmy Yang
ibuf_inside(), ibuf_enter(), ibuf_exit(): Add the parameter mtr. The
flag is no longer kept in the thread-local storage but in the
mini-transaction (mtr->inside_ibuf).
mtr_start(): Clean up the comment and remove the unused return value.
mtr_commit(): Assert !ibuf_inside(mtr) in debug builds.
ibuf_mtr_start(): Like mtr_start(), but sets the flag.
ibuf_mtr_commit(), ibuf_btr_pcur_commit_specify_mtr(): Wrappers that
assert ibuf_inside().
buf_page_get_zip(), buf_page_init_for_read(),
buf_read_ibuf_merge_pages(), fil_io(), ibuf_free_excess_pages(),
ibuf_contract_ext(): Remove assertions on ibuf_inside(), because a
mini-transaction is not available.
buf_read_ahead_linear(): Add the parameter inside_ibuf.
ibuf_restore_pos(): When this function returns FALSE, it commits mtr
and must therefore do ibuf_exit(mtr).
ibuf_delete_rec(): This function commits mtr and must therefore do
ibuf_exit(mtr).
ibuf_rec_get_page_no(), ibuf_rec_get_space(), ibuf_rec_get_info(),
ibuf_rec_get_op_type(), ibuf_build_entry_from_ibuf_rec(),
ibuf_rec_get_volume(), ibuf_get_merge_page_nos(),
ibuf_get_volume_buffered_count(), ibuf_get_entry_counter_low(): Add
the parameter mtr in debug builds, for asserting ibuf_inside(mtr).
rb:585 approved by Sunny Bains
ibuf_inside(), ibuf_enter(), ibuf_exit(): Add the parameter mtr. The
flag is no longer kept in the thread-local storage but in the
mini-transaction (mtr->inside_ibuf).
mtr_start(): Clean up the comment and remove the unused return value.
mtr_commit(): Assert !ibuf_inside(mtr) in debug builds.
ibuf_mtr_start(): Like mtr_start(), but sets the flag.
ibuf_mtr_commit(), ibuf_btr_pcur_commit_specify_mtr(): Wrappers that
assert ibuf_inside().
buf_page_get_zip(), buf_page_init_for_read(),
buf_read_ibuf_merge_pages(), fil_io(), ibuf_free_excess_pages(),
ibuf_contract_ext(): Remove assertions on ibuf_inside(), because a
mini-transaction is not available.
buf_read_ahead_linear(): Add the parameter inside_ibuf.
ibuf_restore_pos(): When this function returns FALSE, it commits mtr
and must therefore do ibuf_exit(mtr).
ibuf_delete_rec(): This function commits mtr and must therefore do
ibuf_exit(mtr).
ibuf_rec_get_page_no(), ibuf_rec_get_space(), ibuf_rec_get_info(),
ibuf_rec_get_op_type(), ibuf_build_entry_from_ibuf_rec(),
ibuf_rec_get_volume(), ibuf_get_merge_page_nos(),
ibuf_get_volume_buffered_count(), ibuf_get_entry_counter_low(): Add
the parameter mtr in debug builds, for asserting ibuf_inside(mtr).
rb:585 approved by Sunny Bains
Remove the slot_no member of struct thr_local_struct.
enum srv_thread_type: Remove unused thread types.
srv_get_thread_type(): Unused function, remove.
thr_local_get_slot_no(), thr_local_set_slot_no(): Remove.
srv_thread_type_validate(), srv_slot_get_type(): New functions, for debugging.
srv_table_reserve_slot(): Return the srv_slot_t* directly. Do not create
thread-local storage.
srv_suspend_thread(): Get the srv_slot_t* as parameter. Return void;
the caller knows slot->event already.
srv_thread_has_reserved_slot(), srv_release_threads(): Assert
srv_thread_type_validate(type).
srv_init(): Use mem_zalloc() instead of mem_alloc(). Replace
srv_table_get_nth_slot(), because it now asserts that the kernel_mutex
is being held.
srv_master_thread(), srv_purge_thread(): Remember the slot from
srv_table_reserve_slot().
rb:629 approved by Inaam Rana
Remove the slot_no member of struct thr_local_struct.
enum srv_thread_type: Remove unused thread types.
srv_get_thread_type(): Unused function, remove.
thr_local_get_slot_no(), thr_local_set_slot_no(): Remove.
srv_thread_type_validate(), srv_slot_get_type(): New functions, for debugging.
srv_table_reserve_slot(): Return the srv_slot_t* directly. Do not create
thread-local storage.
srv_suspend_thread(): Get the srv_slot_t* as parameter. Return void;
the caller knows slot->event already.
srv_thread_has_reserved_slot(), srv_release_threads(): Assert
srv_thread_type_validate(type).
srv_init(): Use mem_zalloc() instead of mem_alloc(). Replace
srv_table_get_nth_slot(), because it now asserts that the kernel_mutex
is being held.
srv_master_thread(), srv_purge_thread(): Remember the slot from
srv_table_reserve_slot().
rb:629 approved by Inaam Rana
Bug #11766501: Multiple RBS break the get rseg with mininum trx_t::no code during purge
Bug# 59291 changes:
Main problem is that truncating the UNDO log at the completion of every
trx_purge() call is expensive as the number of rollback segments is increased.
We truncate after a configurable amount of pages. The innodb_purge_batch_size
parameter is used to control when InnoDB does the actual truncate. The truncate
is done once after 128 (or TRX_SYS_N_RSEGS iterations). In other words we
truncate after purge 128 * innodb_purge_batch_size. The smaller the batch
size the quicker we truncate.
Introduce a new parameter that allows how many rollback segments to use for
storing REDO information. This is really step 1 in allowing complete control
to the user over rollback space management.
New parameters:
i) innodb_rollback_segments = number of rollback_segments to use
(default is now 128) dynamic parameter, can be changed anytime.
Currently there is little benefit in changing it from the default.
Optimisations in the patch.
i. Change the O(n) behaviour of trx_rseg_get_on_id() to O(log n)
Backported from 5.6. Refactor some of the binary heap code.
Create a new include/ut0bh.ic file.
ii. Avoid truncating the rollback segments after every purge.
Related changes that were moved to a separate patch:
i. Purge should not do any flushing, only wait for space to be free so that
it only does purging of records unless it is held up by a long running
transaction that is preventing it from progressing.
ii. Give the purge thread preference over transactions when acquiring the
rseg->mutex during commit. This to avoid purge blocking unnecessarily
when getting the next rollback segment to purge.
Bug #11766501 changes:
Add the rseg to the min binary heap under the cover of the kernel mutex and
the binary heap mutex. This ensures the ordering of the min binary heap.
The two changes have to be committed together because they share the same
that fixes both issues.
rb://567 Approved by: Inaam Rana.
Bug #11766501: Multiple RBS break the get rseg with mininum trx_t::no code during purge
Bug# 59291 changes:
Main problem is that truncating the UNDO log at the completion of every
trx_purge() call is expensive as the number of rollback segments is increased.
We truncate after a configurable amount of pages. The innodb_purge_batch_size
parameter is used to control when InnoDB does the actual truncate. The truncate
is done once after 128 (or TRX_SYS_N_RSEGS iterations). In other words we
truncate after purge 128 * innodb_purge_batch_size. The smaller the batch
size the quicker we truncate.
Introduce a new parameter that allows how many rollback segments to use for
storing REDO information. This is really step 1 in allowing complete control
to the user over rollback space management.
New parameters:
i) innodb_rollback_segments = number of rollback_segments to use
(default is now 128) dynamic parameter, can be changed anytime.
Currently there is little benefit in changing it from the default.
Optimisations in the patch.
i. Change the O(n) behaviour of trx_rseg_get_on_id() to O(log n)
Backported from 5.6. Refactor some of the binary heap code.
Create a new include/ut0bh.ic file.
ii. Avoid truncating the rollback segments after every purge.
Related changes that were moved to a separate patch:
i. Purge should not do any flushing, only wait for space to be free so that
it only does purging of records unless it is held up by a long running
transaction that is preventing it from progressing.
ii. Give the purge thread preference over transactions when acquiring the
rseg->mutex during commit. This to avoid purge blocking unnecessarily
when getting the next rollback segment to purge.
Bug #11766501 changes:
Add the rseg to the min binary heap under the cover of the kernel mutex and
the binary heap mutex. This ensures the ordering of the min binary heap.
The two changes have to be committed together because they share the same
that fixes both issues.
rb://567 Approved by: Inaam Rana.
rb://566
approved by: Sunny
When using native aio on linux each IO helper thread should be able to
handle upto 256 IO requests. The number 256 is the same which is used
for simulated aio as well. In case of windows where we also use native
aio this limit is 32 because of OS constraints. It seems that we are
using the limit of 32 for all the platforms where we are using native
aio. The fix is to use 256 on all platforms except windows (when native
aio is enabled on windows)
rb://566
approved by: Sunny
When using native aio on linux each IO helper thread should be able to
handle upto 256 IO requests. The number 256 is the same which is used
for simulated aio as well. In case of windows where we also use native
aio this limit is 32 because of OS constraints. It seems that we are
using the limit of 32 for all the platforms where we are using native
aio. The fix is to use 256 on all platforms except windows (when native
aio is enabled on windows)
"rows examined" estimates". This change implements "innodb_stats_method"
with options of "nulls_equal", "nulls_unequal" and "null_ignored".
rb://553 approved by Marko
"rows examined" estimates". This change implements "innodb_stats_method"
with options of "nulls_equal", "nulls_unequal" and "null_ignored".
rb://553 approved by Marko
Check whether the master and purge thread are active after creating them. Do
not proceed until both threads have started. We do this by checking whether a
slot has been reserved by both the respective threads.
Add srv_thread_has_reserved_slot() returns slot no or ULINT_UNDEFINED.
rb://536 Approved by Jimmy
Check whether the master and purge thread are active after creating them. Do
not proceed until both threads have started. We do this by checking whether a
slot has been reserved by both the respective threads.
Add srv_thread_has_reserved_slot() returns slot no or ULINT_UNDEFINED.
rb://536 Approved by Jimmy
InnoDB does not attempt to handle lower_case_table_names == 2 when looking
up foreign table names and referenced table name. It turned that server
variable into a boolean and ignored the possibility of it being '2'.
The setting lower_case_table_names == 2 means that it should be stored and
displayed in mixed case as given, but compared internally in lower case.
Normally the server deals with this since it stores table names. But
InnoDB stores referential constraints for the server, so it needs to keep
track of both lower case and given names.
This solution creates two table name pointers for each foreign and referenced
table name. One to display the name, and one to look it up. Both pointers
point to the same allocated string unless this setting is 2. So the overhead
added is not too much.
Two functions are created in dict0mem.c to populate the ..._lookup versions
of these pointers. Both dict_mem_foreign_table_name_lookup_set() and
dict_mem_referenced_table_name_lookup_set() are called 5 times each.
InnoDB does not attempt to handle lower_case_table_names == 2 when looking
up foreign table names and referenced table name. It turned that server
variable into a boolean and ignored the possibility of it being '2'.
The setting lower_case_table_names == 2 means that it should be stored and
displayed in mixed case as given, but compared internally in lower case.
Normally the server deals with this since it stores table names. But
InnoDB stores referential constraints for the server, so it needs to keep
track of both lower case and given names.
This solution creates two table name pointers for each foreign and referenced
table name. One to display the name, and one to look it up. Both pointers
point to the same allocated string unless this setting is 2. So the overhead
added is not too much.
Two functions are created in dict0mem.c to populate the ..._lookup versions
of these pointers. Both dict_mem_foreign_table_name_lookup_set() and
dict_mem_referenced_table_name_lookup_set() are called 5 times each.
Fix a race condition in srv_master_thread(). We need to acquire the kernel
mutex before calling srv_table_reserve_slot(). Add a mutex_own() assertion
in srv_table_reserve_slot().
Fix a race condition in srv_master_thread(). We need to acquire the kernel
mutex before calling srv_table_reserve_slot(). Add a mutex_own() assertion
in srv_table_reserve_slot().
Add new function os_cond_wait_timed(). Change the os_thread_sleep() calls
to timed conditional waits. Signal the background threads during the shutdown
phase so that we avoid waiting for the sleep to timeout thus saving some time.
rb://439 -- Approved by Jimmy Yang