after Operating system error number 36 in a file operation.
Analysis: os_file_get_status did not handle error ENAMETOOLONG
correctly.
Fix: Add correct handling for error ENAMETOOLONG. Note that on InnoDB
case the error is not passed all the way up to server. That would
be bigger rewamp.
innodb_stats_sample_pages
Analysis: If you set the number of analyzed pages
to very low number compared to actual pages on
that table/index it randomly pics those pages
(default 8 pages), this leads to fact that query
after analyze table returns different results. If
the index tree is small, smaller than 10 *
n_sample_pages + total_external_size, then the
estimate is ok. For bigger index trees it is
common that we do not see any borders between
key values in the few pages we pick. But still
there may be n_sample_pages different key values,
or even more. And it just tries to
approximate to n_sample_pages (8).
Fix: (1) Introduced new dynamic configuration variable
innodb_stats_sample_traditional that retains
the current design. Default false.
(2) If traditional sample is not used we use
n_sample_pages = max(min(srv_stats_sample_pages,
index->stat_index_size),
log2(index->stat_index_size)*
srv_stats_sample_pages);
(3) Introduced new dynamic configuration variable
stat_modified_counter (default = 0) if set
sets lower bound for row updates when statistics is re-estimated.
If user has provided upper bound for how many rows needs to be updated
before we calculate new statistics we use minimum of provided value
and 1/16 of table every 16th round. If no upper bound is provided
(srv_stats_modified_counter = 0, default) then calculate new statistics
if 1 / 16 of table has been modified
since the last time a statistics batch was run.
We calculate statistics at most every 16th round, since we may have
a counter table which is very small and updated very often.
@param t table
@return true if the table has changed too much and stats need to be
recalculated
*/
#define DICT_TABLE_CHANGED_TOO_MUCH(t) \
((ib_int64_t) (t)->stat_modified_counter > (srv_stats_modified_counter ? \
ut_min(srv_stats_modified_counter, (16 + (t)->stat_n_rows / 16)) : \
16 + (t)->stat_n_rows / 16))
The bug was that full memory barrier was missing in the code that ensures that
a waiter on an InnoDB mutex will not go to sleep unless it is guaranteed to be
woken up again by another thread currently holding the mutex. This made
possible a race where a thread could get stuck waiting for a mutex that is in
fact no longer locked. If that thread was also holding other critical locks,
this could stall the entire server. There is an error monitor thread than can
break the stall, it runs about once per second. But if the error monitor
thread itself got stuck or was not running, then the entire server could hang
infinitely.
This was introduced on i386/amd64 platforms in 5.5.40 and 10.0.13 by an
incorrect patch that tried to fix the similar problem for PowerPC.
This commit reverts the incorrect PowerPC patch, and instead implements a fix
for PowerPC that does not change i386/amd64 behaviour, making PowerPC work
similarly to i386/amd64.
Merge Facebook commit cd063ab930
authored by Peng Tian from https://github.com/facebook/mysql-5.6
Introduced a new configuration variable innodb_fatal_semaphore_wait_threshold,
it makes the fatal semaphore timeout configurable. Modified original commit
so that no MariaDB server files are changed, instead introduced a new
InnoDB/XtraDB configuration variable.
Its default/min/max vlaues are 600/1/2^32-1 in seconds (it was hardcoded
as 600, now its default value is 600, so the default behavior of this diff
should be no change).
Analysis: InnoDB error monitor is responsible to call every second
sync_arr_wake_threads_if_sema_free() to wake up possible hanging
threads if they are missed in mutex_signal_object. This is not
possible if error monitor itself is on mutex/semaphore wait. We
should avoid all unnecessary mutex/semaphore waits on error monitor.
Currently error monitor calls function buf_flush_stat_update()
that calls log_get_lsn() function and there we will try to get
log_sys mutex. Better, solution for error monitor is that in
buf_flush_stat_update() we will try to get lsn with
mutex_enter_nowait() and if we did not get mutex do not update
the stats.
Fix: Use log_get_lsn_nowait() function on buf_flush_stat_update()
function. If returned lsn is 0, we do not update flush stats.
log_get_lsn_nowait() will use mutex_enter_nowait() and if
we get mutex we return a correct lsn if not we return 0.
on work-amd64-valgrind.
Fixed issue by finding out first the current used priority
for both treads and using that seeing did we really change
the priority or not.
Analysis: InnoDB error monitor is responsible to call every second
sync_arr_wake_threads_if_sema_free() to wake up possible hanging
threads if they are missed in mutex_signal_object. This is not
possible if error monitor itself is on mutex/semaphore wait. We
should avoid all unnecessary mutex/semaphore waits on error monitor.
Currently error monitor calls function buf_flush_stat_update()
that calls log_get_lsn() function and there we will try to get
log_sys mutex. Better, solution for error monitor is that in
buf_flush_stat_update() we will try to get lsn with
mutex_enter_nowait() and if we did not get mutex do not update
the stats.
Fix: Use log_get_lsn_nowait() function on buf_flush_stat_update()
function. If returned lsn is 0, we do not update flush stats.
log_get_lsn_nowait() will use mutex_enter_nowait() and if
we get mutex we return a correct lsn if not we return 0.
Merged Facebook commit 617aef9f911d825e9053f3d611d0389e02031225
authored by Inaam Rana to InnoDB storage engine (not XtraDB)
from https://github.com/facebook/mysql-5.6
WL#7047 - Optimize buffer pool list scans and related batch processing
Reduce excessive scanning of pages when doing flush list batches. The
fix is to introduce the concept of "Hazard Pointer", this reduces the
time complexity of the scan from O(n*n) to O.
The concept of hazard pointer is reversed in this work. Academically
hazard pointer is a pointer that the thread working on it will declar
such and as long as that thread is not done no other thread is allowe
do anything with it.
In this WL we declare the pointer as a hazard pointer and then if any
thread attempts to work on it, it is allowed to do so but it has to a
the hazard pointer to the next valid value. We use hazard pointer sol
reverse traversal of lists within a buffer pool instance.
Add an event to control the background flush thread. The background f
thread wait has been converted to an os event timed wait so that it c
signalled by threads that want to kick start a background flush when
buffer pool is running low on free/dirty pages.
Merge Facebook commit 154c579b828a60722a7d9477fc61868c07453d08
and e8f0052f9b112dc786bf9b957ed5b16a5749f7fd authored
by Steaphan Greene from https://github.com/facebook/mysql-5.6
Optimize prefix index queries to skip cluster index lookup when possible.
Currently InnoDB will always fetch the clustered index (primary key
index) for all prefix columns in an index, even when the value of a
particular record is smaller than the prefix length. This change
optimizes that case to use the record from the secondary index and avoid
the extra lookup.
Also adds two status vars that track how effective this is:
innodb_secondary_index_triggered_cluster_reads:
Times secondary index lookup triggered cluster lookup.
innodb_secondary_index_triggered_cluster_reads_avoided:
Times prefix optimization avoided triggering cluster lookup.
Merge Facebook commit 4f3e0343fd2ac3fc7311d0ec9739a8f668274f0d
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
Adds innodb_idle_flush_pct to enable tuning of the page flushing rate
when the system is relatively idle. We care about this, since doing
extra unnecessary flash writes shortens the lifespan of the flash.
New generation hard drives, SSDs and NVM devices support 4K
sector size. Supported sector size can be found using fstatvfs()
or GetDiskFreeSpace() functions.
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6
This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220
Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.
innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.
dd2d11be7a
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This change is to fix: http://bugs.mysql.com/62534
This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.
This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.
Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct
Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
Merge Facebook commit f981a51a47519b0ba527917887f8adc6df9ae147
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6.
This just moves some structure definitions from inside a
single .cc file to a shared .h file, with a few tweaks to
allow these structures to be shared.
On its own, it should have no actual effect. This is needed later.
Merge Facebook commit 25295d003cb0c17aa8fb756523923c77250b3294
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This adds a pointer to the trx to each mtr.
This allows the trx to be accessed in parts of the code
where it was otherwise not available. This is needed later.
you the progress of inplace alter table and row log buffer usage
- (x 100%, it's 4-digit. 10000 means 100.00%)
- Innodb_onlineddl_rowlog_rows
Shows how many rows are stored in row log buffer.
- Innodb_onlineddl_rowlog_pct_used
Shows row log buffer usage in percent ( *100%, it's 4-digit. 10000 means 100.00% ).
- Innodb_onlineddl_pct_progress
Shows the progress of inplace alter table. It might
be not so accurate because inplace alter is highly
depend on disk and buffer pool status.
But still it is useful and better than nothing.
- Add some log for inplace alter table
XtraDB/InnoDB will print some message before and
after doing some task.
MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64
This problem affects only debug builds on PPC64.
There are at least two race conditions around
rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:
- rw_lock_debug_waiters was loaded/stored without setting
appropriate locks/memory barriers.
- there is a gap between calls to os_event_reset() and
os_event_wait() and in such case we're supposed to pass
return value of the former to the latter.
Fixed by replacing self-cooked spinlocks with system mutexes.
These days system mutexes offer much better performance. OTOH
performance is not that critical for debug builds.
MDEV-6450 - MariaDB crash on Power8 when built with advance tool
chain
InnoDB mutex_exit() function calls __sync_test_and_set() to release
the lock. According to manual this function is supposed to create
"acquire" memory barrier whereas in fact we need "release" memory
barrier at mutex_exit().
The problem isn't repeatable with gcc because it creates
"acquire-release" memory barrier for __sync_test_and_set().
ATC creates just "acquire" barrier.
Fixed by creating proper barrier at mutex_exit() by using
__sync_lock_release() instead of __sync_test_and_set().
Merged lp:maria/maria-10.0-galera up to revision 3880.
Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.
Merged lp:maria/maria-10.0-galera up to revision 3879.
Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.
Part of this work is based on Stewart Smitch's memory barrier and lower priori
patches for power8.
- Added memory syncronization for innodb & xtradb for power8.
- Added HAVE_WINDOWS_MM_FENCE to CMakeList.txt
- Added os_isync to fix a syncronization problem on power
- Added log_get_lsn_nowait which is now used srv_error_monitor_thread to ensur
if log mutex is locked.
All changes done both for InnoDB and Xtradb
buf0buf.cc line 2642.
Analysis: innodb_compression_algorithm is a global variable and
can change while we are building page compressed page. This could
lead page corruption.
Fix: Cache innodb_compression_algorithm on local variable before
doing any compression or page formating to avoid concurrent
change. Improved page verification on debug builds.
Analysis: InnoDB writes also files that do not contain FIL-header.
This could lead incorrect analysis on os_fil_read_func function
when it tries to see is page page compressed based on FIL_PAGE_TYPE
field on FIL-header. With bad luck uncompressed page that does
not contain FIL-headed, the byte on FIL_PAGE_TYPE position could
indicate that page is page comrpessed.
Fix: Upper layer must indicate is file space page compressed
or not. If this is not yet known, we need to read the FIL-header
and find it out. Files that we know that are not page compressed
we can always just provide FALSE.
tool chain
This is an addition to the original patch. On Windows
InterlockedExchange implies full memory barrier, whereas
only acquire/release barriers required.
This problem affects only debug builds on PPC64.
There are at least two race conditions around
rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:
- rw_lock_debug_waiters was loaded/stored without setting
appropriate locks/memory barriers.
- there is a gap between calls to os_event_reset() and
os_event_wait() and in such case we're supposed to pass
return value of the former to the latter.
Fixed by replacing self-cooked spinlocks with system mutexes.
These days system mutexes offer much better performance. OTOH
performance is not that critical for debug builds.