The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
Remove the debug parameter innodb_force_recovery_crash that was
introduced into MySQL 5.6 by me in WL#6494 which allowed InnoDB
to resize the redo log on startup.
Let innodb.log_file_size actually start up the server, but ensure
that the InnoDB storage engine refuses to start up in each of the
scenarios.
srv_release_threads(): Actually wait for the threads to resume
from suspension. On CentOS 5 and possibly other platforms,
os_event_set() may be lost.
srv_resume_thread(): A counterpart of srv_suspend_thread().
Optionally wait for the event to be set, optionally with a timeout,
and then release the thread from suspension.
srv_free_slot(): Unconditionally suspend the thread. It is always
in resumed state when this function is entered.
srv_active_wake_master_thread_low(): Only call os_event_set().
srv_purge_coordinator_suspend(): Use srv_resume_thread() instead
of the complicated logic.
Sometimes innodb_data_file_size_debug was reported as INT UNSIGNED
instead of BIGINT UNSIGNED. Make it uint instead of ulong to get
a more deterministic result.
InnoDB shutdown failed to properly take fil_crypt_thread() into account.
The encryption threads were signalled to shut down together with other
non-critical tasks. This could be much too early in case of slow shutdown,
which could need minutes to complete the purge. Furthermore, InnoDB
failed to wait for the fil_crypt_thread() to actually exit before
proceeding to the final steps of shutdown, causing the race conditions.
Furthermore, the log_scrub_thread() was shut down way too early.
Also it should remain until the SRV_SHUTDOWN_FLUSH_PHASE.
fil_crypt_threads_end(): Remove. This would cause the threads to
be terminated way too early.
srv_buf_dump_thread_active, srv_dict_stats_thread_active,
lock_sys->timeout_thread_active, log_scrub_thread_active,
srv_monitor_active, srv_error_monitor_active: Remove a race condition
between startup and shutdown, by setting these in the startup thread
that creates threads, not in each created thread. In this way, once the
flag is cleared, it will remain cleared during shutdown.
srv_n_fil_crypt_threads_started, fil_crypt_threads_event: Declare in
global rather than static scope.
log_scrub_event, srv_log_scrub_thread_active, log_scrub_thread():
Declare in static rather than global scope. Let these be created by
log_init() and freed by log_shutdown().
rotate_thread_t::should_shutdown(): Do not shut down before the
SRV_SHUTDOWN_FLUSH_PHASE.
srv_any_background_threads_are_active(): Remove. These checks now
exist in logs_empty_and_mark_files_at_shutdown().
logs_empty_and_mark_files_at_shutdown(): Shut down the threads in
the proper order. Keep fil_crypt_thread() and log_scrub_thread() alive
until SRV_SHUTDOWN_FLUSH_PHASE, and check that they actually terminate.
fil_space_t::recv_size: New member: recovered tablespace size in pages;
0 if no size change was read from the redo log,
or if the size change was implemented.
fil_space_set_recv_size(): New function for setting space->recv_size.
innodb_data_file_size_debug: A debug parameter for setting the system
tablespace size in recovery even when the redo log does not contain
any size changes. It is hard to write a small test case that would
cause the system tablespace to be extended at the critical moment.
recv_parse_log_rec(): Note those tablespaces whose size is being changed
by the redo log, by invoking fil_space_set_recv_size().
innobase_init(): Correct an error message, and do not require a larger
innodb_buffer_pool_size when starting up with a smaller innodb_page_size.
innobase_start_or_create_for_mysql(): Allow startup with any initial
size of the ibdata1 file if the autoextend attribute is set. Require
the minimum size of fixed-size system tablespaces to be 640 pages,
not 10 megabytes. Implement innodb_data_file_size_debug.
open_or_create_data_files(): Round the system tablespace size down
to pages, not to full megabytes, (Our test truncates the system
tablespace to more than 800 pages with innodb_page_size=4k.
InnoDB should not imagine that it was truncated to 768 pages
and then overwrite good pages in the tablespace.)
fil_flush_low(): Refactored from fil_flush().
fil_space_extend_must_retry(): Refactored from
fil_extend_space_to_desired_size().
fil_mutex_enter_and_prepare_for_io(): Extend the tablespace if
fil_space_set_recv_size() was called.
The test case has been successfully run with all the
innodb_page_size values 4k, 8k, 16k, 32k, 64k.
Reduce the number of calls to encryption_get_key_get_latest_version
when doing key rotation with two different methods:
(1) We need to fetch key information when tablespace not yet
have a encryption information, invalid keys are handled now
differently (see below). There was extra call to detect
if key_id is not found on key rotation.
(2) If key_id is not found from encryption plugin, do not
try fetching new key_version for it as it will fail anyway.
We store return value from encryption_get_key_get_latest_version
call and if it returns ENCRYPTION_KEY_VERSION_INVALID there
is no need to call it again.
Analysis: By design InnoDB was reading first page of every .ibd file
at startup to find out is tablespace encrypted or not. This is
because tablespace could have been encrypted always, not
encrypted newer or encrypted based on configuration and this
information can be find realible only from first page of .ibd file.
Fix: Do not read first page of every .ibd file at startup. Instead
whenever tablespace is first time accedded we will read the first
page to find necessary information about tablespace encryption
status.
TODO: Add support for SYS_TABLEOPTIONS where all table options
encryption information included will be stored.
Backport pull request #125 from grooverdan/MDEV-8923_innodb_buffer_pool_dump_pct to 10.0
WL#6504 InnoDB buffer pool dump/load enchantments
This patch consists of two parts:
1. Dump only the hottest N% of the buffer pool(s)
2. Prevent hogging the server duing BP load
From MySQL - commit b409342c43ce2edb68807100a77001367c7e6b8e
Add testcases for innodb_buffer_pool_dump_pct_basic.
Part of the code authored by Daniel Black
WL#6504 InnoDB buffer pool dump/load enchantments
This patch consists of two parts:
1. Dump only the hottest N% of the buffer pool(s)
2. Prevent hogging the server duing BP load
From MySQL - commit b409342c43ce2edb68807100a77001367c7e6b8e
Analysis: Current implementation will write and read at least one block
(sort_buffer_size bytes) from disk / index even if that block does not
contain any records.
Fix: Avoid writing / reading empty blocks to temporary files (disk).
Added new dynamic configuration variable innodb_buf_dump_status_frequency
to configure how often buffer pool dump status is printed in the logs.
A number between [0, 100] that tells how oftern buffer pool dump status
in percentages should be printed. E.g. 10 means that buffer pool dump
status is printed when every 10% of number of buffer pool pages are
dumped. Default is 0 (only start and end status is printed).
Step 2:
-- Introduce temporal memory array to buffer pool where to allocate
temporary memory for encryption/compression
-- Rename PAGE_ENCRYPTION -> ENCRYPTION
-- Rename PAGE_ENCRYPTION_KEY -> ENCRYPTION_KEY
-- Rename innodb_default_page_encryption_key -> innodb_default_encryption_key
-- Allow enable/disable encryption for tables by changing
ENCRYPTION to enum having values DEFAULT, ON, OFF
-- In create table store crypt_data if ENCRYPTION is ON or OFF
-- Do not crypt tablespaces having ENCRYPTION=OFF
-- Store encryption mode to crypt_data and redo-log
MDEV-7399: Add support for INFORMATION_SCHEMA.INNODB_MUTEXES
MDEV-7618: Improve semaphore instrumentation
Introduced two new information schema tables to monitor mutex waits
and semaphore waits. Added a new configuration variable
innodb_intrument_semaphores to add thread_id, file name and
line of current holder of mutex/rw_lock.
This patch ports the work that facebook has performed
to make innochecksum handle compressed tables.
the basic idea is to use actual innodb-code to perform
checksum verification rather than duplicating in innochecksum.cc.
to make this work, innodb code has been annotated with
lots of #ifndef UNIV_INNOCHECKSUM so that it can be
compiled outside of storage/innobase.
A new testcase is also added that verifies that innochecksum
works on compressed/non-compressed tables.
Merged from commit fabc79d2ea976c4ff5b79bfe913e6bc03ef69d42
from https://code.google.com/p/google-mysql/
The actual steps to produce this patch are:
take innochecksum from 5.6.14
apply changes in innodb from facebook patches needed to make innochecksum compile
apply changes in innochecksum from facebook patches
add handcrafted testcase
The referenced facebook patches used are:
91e25120e7847fe76ea51135628a5a4dbf7c240c
innodb_stats_sample_pages
Analysis: If you set the number of analyzed pages
to very low number compared to actual pages on
that table/index it randomly pics those pages
(default 8 pages), this leads to fact that query
after analyze table returns different results. If
the index tree is small, smaller than 10 *
n_sample_pages + total_external_size, then the
estimate is ok. For bigger index trees it is
common that we do not see any borders between
key values in the few pages we pick. But still
there may be n_sample_pages different key values,
or even more. And it just tries to
approximate to n_sample_pages (8).
Fix: (1) Introduced new dynamic configuration variable
innodb_stats_sample_traditional that retains
the current design. Default false.
(2) If traditional sample is not used we use
n_sample_pages = max(min(srv_stats_sample_pages,
index->stat_index_size),
log2(index->stat_index_size)*
srv_stats_sample_pages);
(3) Introduced new dynamic configuration variable
stat_modified_counter (default = 0) if set
sets lower bound for row updates when statistics is re-estimated.
If user has provided upper bound for how many rows needs to be updated
before we calculate new statistics we use minimum of provided value
and 1/16 of table every 16th round. If no upper bound is provided
(srv_stats_modified_counter = 0, default) then calculate new statistics
if 1 / 16 of table has been modified
since the last time a statistics batch was run.
We calculate statistics at most every 16th round, since we may have
a counter table which is very small and updated very often.
@param t table
@return true if the table has changed too much and stats need to be
recalculated
*/
#define DICT_TABLE_CHANGED_TOO_MUCH(t) \
((ib_int64_t) (t)->stat_modified_counter > (srv_stats_modified_counter ? \
ut_min(srv_stats_modified_counter, (16 + (t)->stat_n_rows / 16)) : \
16 + (t)->stat_n_rows / 16))
Merge Facebook commit cd063ab930
authored by Peng Tian from https://github.com/facebook/mysql-5.6
Introduced a new configuration variable innodb_fatal_semaphore_wait_threshold,
it makes the fatal semaphore timeout configurable. Modified original commit
so that no MariaDB server files are changed, instead introduced a new
InnoDB/XtraDB configuration variable.
Its default/min/max vlaues are 600/1/2^32-1 in seconds (it was hardcoded
as 600, now its default value is 600, so the default behavior of this diff
should be no change).
Merge Facebook commit 154c579b828a60722a7d9477fc61868c07453d08
and e8f0052f9b112dc786bf9b957ed5b16a5749f7fd authored
by Steaphan Greene from https://github.com/facebook/mysql-5.6
Optimize prefix index queries to skip cluster index lookup when possible.
Currently InnoDB will always fetch the clustered index (primary key
index) for all prefix columns in an index, even when the value of a
particular record is smaller than the prefix length. This change
optimizes that case to use the record from the secondary index and avoid
the extra lookup.
Also adds two status vars that track how effective this is:
innodb_secondary_index_triggered_cluster_reads:
Times secondary index lookup triggered cluster lookup.
innodb_secondary_index_triggered_cluster_reads_avoided:
Times prefix optimization avoided triggering cluster lookup.
Merge Facebook commit 4f3e0343fd2ac3fc7311d0ec9739a8f668274f0d
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
Adds innodb_idle_flush_pct to enable tuning of the page flushing rate
when the system is relatively idle. We care about this, since doing
extra unnecessary flash writes shortens the lifespan of the flash.
New generation hard drives, SSDs and NVM devices support 4K
sector size. Supported sector size can be found using fstatvfs()
or GetDiskFreeSpace() functions.
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6
This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220
Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.
innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.
dd2d11be7a
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This change is to fix: http://bugs.mysql.com/62534
This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.
This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.
Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct
Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
you the progress of inplace alter table and row log buffer usage
- (x 100%, it's 4-digit. 10000 means 100.00%)
- Innodb_onlineddl_rowlog_rows
Shows how many rows are stored in row log buffer.
- Innodb_onlineddl_rowlog_pct_used
Shows row log buffer usage in percent ( *100%, it's 4-digit. 10000 means 100.00% ).
- Innodb_onlineddl_pct_progress
Shows the progress of inplace alter table. It might
be not so accurate because inplace alter is highly
depend on disk and buffer pool status.
But still it is useful and better than nothing.
- Add some log for inplace alter table
XtraDB/InnoDB will print some message before and
after doing some task.
Merged lp:maria/maria-10.0-galera up to revision 3880.
Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.