Commit graph

225 commits

Author SHA1 Message Date
Sergei Golubchik
853077ad7e Merge branch '10.0' into bb-10.1-merge
Conflicts:
	.bzrignore
	VERSION
	cmake/plugin.cmake
	debian/dist/Debian/control
	debian/dist/Ubuntu/control
	mysql-test/r/join_outer.result
	mysql-test/r/join_outer_jcl6.result
	mysql-test/r/null.result
	mysql-test/r/old-mode.result
	mysql-test/r/union.result
	mysql-test/t/join_outer.test
	mysql-test/t/null.test
	mysql-test/t/old-mode.test
	mysql-test/t/union.test
	packaging/rpm-oel/mysql.spec.in
	scripts/mysql_config.sh
	sql/ha_ndbcluster.cc
	sql/ha_ndbcluster_binlog.cc
	sql/ha_ndbcluster_cond.cc
	sql/item_cmpfunc.h
	sql/lock.cc
	sql/sql_select.cc
	sql/sql_show.cc
	sql/sql_update.cc
	sql/sql_yacc.yy
	storage/innobase/buf/buf0flu.cc
	storage/innobase/fil/fil0fil.cc
	storage/innobase/include/srv0srv.h
	storage/innobase/lock/lock0lock.cc
	storage/tokudb/CMakeLists.txt
	storage/xtradb/buf/buf0flu.cc
	storage/xtradb/fil/fil0fil.cc
	storage/xtradb/include/srv0srv.h
	storage/xtradb/lock/lock0lock.cc
	support-files/mysql.spec.sh
2014-12-02 22:25:16 +01:00
Jan Lindström
f3bdf9d741 MDEV-7046: MySQL#74480 - Failing assertion: os_file_status(newpath, &exists, &type)
after Operating system error number 36 in a file operation.

Analysis: os_file_get_status did not handle error ENAMETOOLONG
correctly.

Fix: Add correct handling for error ENAMETOOLONG. Note that on InnoDB
case the error is not passed all the way up to server. That would
be bigger rewamp.
2014-11-25 11:38:01 +02:00
Sergei Golubchik
32ec8625af XtraDB 5.6.21-70.0 2014-11-20 17:05:13 +01:00
Sergei Golubchik
3c12c27907 5.5 merge 2014-11-20 16:07:34 +01:00
Sergei Golubchik
afca52bb52 5.5 merge 2014-11-20 15:26:31 +01:00
Jan Lindström
8bc5eabea8 MDEV-7084: innodb index stats inadequate using constant
innodb_stats_sample_pages

Analysis: If you set the number of analyzed pages 
to very low number compared to actual pages on 
that table/index it randomly pics those pages 
(default 8 pages), this leads to fact that query 
after analyze table returns different results. If 
the index tree is small, smaller than 10 * 
n_sample_pages + total_external_size, then the 
estimate is ok. For bigger index trees it is 
common that we do not see any borders between 
key values in the few pages we pick. But still 
there may be n_sample_pages different key values, 
or even more. And it just tries to 
approximate to n_sample_pages (8).

Fix: (1) Introduced new dynamic configuration variable
innodb_stats_sample_traditional  that retains
the current design. Default false.

(2) If traditional sample is not used we use
n_sample_pages = max(min(srv_stats_sample_pages,
                         index->stat_index_size),
                     log2(index->stat_index_size)*
                          srv_stats_sample_pages);

(3) Introduced new dynamic configuration variable
stat_modified_counter (default = 0) if set
sets lower bound for row updates when statistics is re-estimated.

If user has provided upper bound for how many rows needs to be updated
before we calculate new statistics we use minimum of provided value
and 1/16 of table every 16th round. If no upper bound is provided
(srv_stats_modified_counter = 0, default) then calculate new statistics
if 1 / 16 of table has been modified
since the last time a statistics batch was run.
We calculate statistics at most every 16th round, since we may have
a counter table which is very small and updated very often.
@param t table
@return true if the table has changed too much and stats need to be
recalculated
*/
#define DICT_TABLE_CHANGED_TOO_MUCH(t) \
	((ib_int64_t) (t)->stat_modified_counter > (srv_stats_modified_counter ? \
	ut_min(srv_stats_modified_counter, (16 + (t)->stat_n_rows / 16)) : \
		16 + (t)->stat_n_rows / 16))
2014-11-19 20:27:34 +02:00
Sergei Golubchik
3495801e2e 5.5 merge 2014-11-19 17:23:39 +01:00
Kristian Nielsen
6ea41f1e84 MDEV-7026: Race in InnoDB/XtraDB mutex implementation can stall or hang the server.
The bug was that full memory barrier was missing in the code that ensures that
a waiter on an InnoDB mutex will not go to sleep unless it is guaranteed to be
woken up again by another thread currently holding the mutex. This made
possible a race where a thread could get stuck waiting for a mutex that is in
fact no longer locked. If that thread was also holding other critical locks,
this could stall the entire server. There is an error monitor thread than can
break the stall, it runs about once per second. But if the error monitor
thread itself got stuck or was not running, then the entire server could hang
infinitely.

This was introduced on i386/amd64 platforms in 5.5.40 and 10.0.13 by an
incorrect patch that tried to fix the similar problem for PowerPC.

This commit reverts the incorrect PowerPC patch, and instead implements a fix
for PowerPC that does not change i386/amd64 behaviour, making PowerPC work
similarly to i386/amd64.
2014-11-19 13:56:46 +01:00
Jan Lindström
7bf391c205 MDEV-7108: Make long semaphore wait timeout configurable
Merge Facebook commit cd063ab930
authored by Peng Tian from https://github.com/facebook/mysql-5.6

Introduced a new configuration variable innodb_fatal_semaphore_wait_threshold,
it makes the fatal semaphore timeout configurable. Modified original commit
so that no MariaDB server files are changed, instead introduced a new
InnoDB/XtraDB configuration variable.

Its default/min/max vlaues are 600/1/2^32-1 in seconds (it was hardcoded
as 600, now its default value is 600, so the default behavior of this diff
should be no change).
2014-11-17 09:59:52 +02:00
Jan Lindström
bff2d46bf7 MDEV-7100: InnoDB error monitor might unnecessary wait log_sys mutex
Analysis: InnoDB error monitor is responsible to call every second
sync_arr_wake_threads_if_sema_free() to wake up possible hanging
threads if they are missed in mutex_signal_object. This is not
possible if error monitor itself is on mutex/semaphore wait. We
should avoid all unnecessary mutex/semaphore waits on error monitor.
Currently error monitor calls function buf_flush_stat_update()
that calls log_get_lsn() function and there we will try to get
log_sys mutex. Better, solution for error monitor is that in
buf_flush_stat_update() we will try to get lsn with
mutex_enter_nowait() and if we did not get mutex do not update
the stats.

Fix: Use log_get_lsn_nowait() function on buf_flush_stat_update()
function. If returned lsn is 0, we do not update flush stats.
log_get_lsn_nowait() will use mutex_enter_nowait() and if
we get mutex we return a correct lsn if not we return 0.
2014-11-13 12:00:57 +02:00
Jan Lindström
84f3f3fa1e MDEV-7083: sys_vars.innodb_sched_priority* tests fail in buildbot
on work-amd64-valgrind.

Fixed issue by finding out first the current used priority
for both treads and using that seeing did we really change
the priority or not.
2014-11-13 11:05:22 +02:00
Jan Lindström
8c7ef99bb2 MDEV-7100: InnoDB error monitor might unnecessary wait log_sys mutex
Analysis: InnoDB error monitor is responsible to call every second
sync_arr_wake_threads_if_sema_free() to wake up possible hanging 
threads if they are missed in mutex_signal_object. This is not 
possible if error monitor itself is on mutex/semaphore wait. We 
should avoid all unnecessary mutex/semaphore waits on error monitor.
Currently error monitor calls function buf_flush_stat_update() 
that calls log_get_lsn() function and there we will try to get 
log_sys mutex. Better, solution for error monitor is that in 
buf_flush_stat_update() we will try to get lsn with 
mutex_enter_nowait() and if we did not get mutex do not update 
the stats.

Fix: Use log_get_lsn_nowait() function on buf_flush_stat_update()
function. If returned lsn is 0, we do not update flush stats. 
log_get_lsn_nowait() will use mutex_enter_nowait() and if
we get mutex we return a correct lsn if not we return 0.
2014-11-13 11:24:19 +02:00
Jan Lindström
a03dd94be8 MDEV-6936: Buffer pool list scan optimization
Merged Facebook commit 617aef9f911d825e9053f3d611d0389e02031225
authored by Inaam Rana to InnoDB storage engine (not XtraDB)
from https://github.com/facebook/mysql-5.6

WL#7047 - Optimize buffer pool list scans and related batch processing

Reduce excessive scanning of pages when doing flush list batches. The
fix is to introduce the concept of "Hazard Pointer", this reduces the
time complexity of the scan from O(n*n) to O.

The concept of hazard pointer is reversed in this work. Academically
hazard pointer is a pointer that the thread working on it will declar
such and as long as that thread is not done no other thread is allowe
do anything with it.

In this WL we declare the pointer as a hazard pointer and then if any
thread attempts to work on it, it is allowed to do so but it has to a
the hazard pointer to the next valid value. We use hazard pointer sol
reverse traversal of lists within a buffer pool instance.

Add an event to control the background flush thread. The background f
thread wait has been converted to an os event timed wait so that it c
signalled by threads that want to kick start a background flush when
buffer pool is running low on free/dirty pages.
2014-11-06 13:17:11 +02:00
Jan Lindström
8b1b62dd8f Fix compiler failure on Windows. 2014-11-04 15:41:39 +02:00
Jan Lindström
2da6f7ceba MDEV-7017: Add function to print semaphore waits
Add function to print to stderr all current semaphore 
waits. This function should be able to executed 
inside a gdb/ddd.
2014-11-03 15:43:44 +02:00
Jan Lindström
cb37c55768 MDEV-6929: Port Facebook Prefix Index Queries Optimization
Merge Facebook commit 154c579b828a60722a7d9477fc61868c07453d08
and e8f0052f9b112dc786bf9b957ed5b16a5749f7fd authored
by Steaphan Greene from https://github.com/facebook/mysql-5.6

Optimize prefix index queries to skip cluster index lookup when possible.

Currently InnoDB will always fetch the clustered index (primary key
index) for all prefix columns in an index, even when the value of a
particular record is smaller than the prefix length. This change
optimizes that case to use the record from the secondary index and avoid
the extra lookup.

Also adds two status vars that track how effective this is:

innodb_secondary_index_triggered_cluster_reads:
Times secondary index lookup triggered cluster lookup.

innodb_secondary_index_triggered_cluster_reads_avoided:
Times prefix optimization avoided triggering cluster lookup.
2014-11-03 11:18:52 +02:00
Jan Lindström
2bf3e416fe MDEV-6932: Enable Lazy Flushing
Merge Facebook commit 4f3e0343fd2ac3fc7311d0ec9739a8f668274f0d
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6

Adds innodb_idle_flush_pct to enable tuning of the page flushing rate
when the system is relatively idle. We care about this, since doing
extra unnecessary flash writes shortens the lifespan of the flash.
2014-10-29 13:49:12 +02:00
Jan Lindström
b96697d286 MDEV-6648: InnoDB: Add support for 4K sector size if supported
New generation hard drives, SSDs and NVM devices support 4K
sector size. Supported sector size can be found using fstatvfs()
or GetDiskFreeSpace() functions.
2014-10-29 11:07:11 +02:00
Jan Lindström
caeffc7a7d MDEV-6926: innodb_rows_updated is misleading on slav
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6

This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220

Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.

innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.

dd2d11be7a
2014-10-26 07:22:51 +02:00
Jan Lindström
60e995cfec MDEV-6930: Make innodb_max_dirty_pages_pct my.cnf variable a double
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6

This change is to fix: http://bugs.mysql.com/62534

This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.

This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.

Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct

Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
2014-10-25 09:24:39 +03:00
Jan Lindström
cff0012d28 MDEV-6927: Share more structures
Merge Facebook commit f981a51a47519b0ba527917887f8adc6df9ae147
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6.

This just moves some structure definitions from inside a
single .cc file to a shared .h file, with a few tweaks to
allow these structures to be shared.

On its own, it should have no actual effect. This is needed later.
2014-10-25 08:21:52 +03:00
Jan Lindström
3486135bb5 MDEV-6928: Add trx pointer to struct mtr_t
Merge Facebook commit 25295d003cb0c17aa8fb756523923c77250b3294
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6

This adds a pointer to the trx to each mtr.
This allows the trx to be accessed in parts of the code
where it was otherwise not available. This is needed later.
2014-10-24 22:26:31 +03:00
Sergei Golubchik
f62c12b405 Merge 10.0.14 into 10.1 2014-10-15 12:59:13 +02:00
Sergei Golubchik
7e6d4bba0c XtraDB 5.5.40-36.1 2014-10-08 00:44:37 +02:00
Sergei Golubchik
0d2cba5df2 XtraDB 5.5.39-36.0 2014-10-06 20:06:39 +02:00
Sergei Golubchik
7aabc2ded2 fixing embedded: WaaS. Wsrep as a Service. 2014-10-01 23:48:34 +02:00
Sergey Vojtovich
5e1fad8512 MDEV-6533 - MySQL Bug#72718 - CACHE_LINE_SIZE in innodb should be 128 on POWER
This patch is suggested by Stewart Smith.
Define CACHE_LINE_SIZE to 128 on POWER.
2014-10-03 17:38:46 +04:00
Jan Lindström
fc2df3c637 MDEV-6812: Merge Kakao: Add global status variables which tell
you the progress of inplace alter table and row log buffer usage

-    (x 100%, it's 4-digit. 10000 means 100.00%)
-    Innodb_onlineddl_rowlog_rows
       Shows how many rows are stored in row log buffer.
-    Innodb_onlineddl_rowlog_pct_used
       Shows row log buffer usage in percent ( *100%, it's 4-digit. 10000 means 100.00% ).
-    Innodb_onlineddl_pct_progress
       Shows the progress of inplace alter table. It might
       be not so accurate because inplace alter is highly
       depend on disk and buffer pool status.
       But still it is useful and better than nothing.

-    Add some log for inplace alter table
       XtraDB/InnoDB will print some message before and
       after doing some task.
2014-09-30 14:50:34 +03:00
Sergei Golubchik
a006813fb1 merge 2014-09-16 14:08:05 +02:00
Sergei Golubchik
7e29c1b539 5.5 merge 2014-09-16 14:03:17 +02:00
Jan Lindström
ec46381990 Avoid using log_sys mutex when printing out show engine innodb status,
instead peek (maybe) old data.
2014-09-16 07:37:00 +03:00
Sergei Golubchik
4a68817d8d XtraDB 5.6.20-68.0 2014-09-11 16:44:16 +02:00
Sergei Golubchik
8deb9066e2 fix compilation on windows - wrong include file 2014-09-08 17:10:48 +02:00
Sergey Vojtovich
c01c819209 Backport from 10.0:
MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64

This problem affects only debug builds on PPC64.

There are at least two race conditions around
rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:

- rw_lock_debug_waiters was loaded/stored without setting
  appropriate locks/memory barriers.
- there is a gap between calls to os_event_reset() and
  os_event_wait() and in such case we're supposed to pass
  return value of the former to the latter.

Fixed by replacing self-cooked spinlocks with system mutexes.
These days system mutexes offer much better performance. OTOH
performance is not that critical for debug builds.
2014-08-29 16:14:11 +04:00
Sergey Vojtovich
40497577ff Backport from 10.0:
MDEV-6450 - MariaDB crash on Power8 when built with advance tool
            chain

InnoDB mutex_exit() function calls __sync_test_and_set() to release
the lock. According to manual this function is supposed to create
"acquire" memory barrier whereas in fact we need "release" memory
barrier at mutex_exit().

The problem isn't repeatable with gcc because it creates
"acquire-release" memory barrier for __sync_test_and_set().
ATC creates just "acquire" barrier.

Fixed by creating proper barrier at mutex_exit() by using
__sync_lock_release() instead of __sync_test_and_set().
2014-08-29 16:02:46 +04:00
Jan Lindström
ab150128ce MDEV-6247: Merge 10.0-galera to 10.1.
Merged lp:maria/maria-10.0-galera up to revision 3880.

    Added a new functions to handler API to forcefully abort_transaction,
    producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
    were added for future possiblity to add more storage engines that
    could use galera replication.
2014-08-27 13:15:37 +03:00
Jan Lindström
df4dd593f2 MDEV-6247: Merge 10.0-galera to 10.1.
Merged lp:maria/maria-10.0-galera up to revision 3879.

Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.
2014-08-26 15:43:46 +03:00
Michael Widenius
5569132ffe MDEV-6450 - MariaDB crash on Power8 when built with advance tool chain
Part of this work is based on Stewart Smitch's memory barrier and lower priori
patches for power8.

- Added memory syncronization for innodb & xtradb for power8.
- Added HAVE_WINDOWS_MM_FENCE to CMakeList.txt
- Added os_isync to fix a syncronization problem on power
- Added log_get_lsn_nowait which is now used srv_error_monitor_thread to ensur
  if log mutex is locked.

All changes done both for InnoDB and Xtradb
2014-08-19 19:28:35 +03:00
Monty
3bde13932e Minor cleanups, fix compiler warnings 2014-08-09 13:22:01 +03:00
Jan Lindström
6dad23f04a MDEV-5834: Merge Kakao Defragmentation implementation to MariaDB 10.1
Merge https://github.com/kakao/mariadb-10.0 that contains Facebook's
    implementation for defragmentation

    facebook/mysql-5.6@a2d3a74
    facebook/mysql-5.6@def96c8
    facebook/mysql-5.6@9c67c5d
    facebook/mysql-5.6@921a81b
    facebook/mysql-5.6@aa519bd
    facebook/mysql-5.6@fea7d13
    facebook/mysql-5.6@09b29d3
    facebook/mysql-5.6@9284abb
    facebook/mysql-5.6@dbd623d
    facebook/mysql-5.6@aed55dc
    facebook/mysql-5.6@aad5c82

    This version does not add new SQL-syntax and new handler API function.
    Instead optimize table is mapped to defragment table if
    innodb_defragment=ON, by default the feature is off.

    Contains changes authored by Sunguck Lee (Kakao).
2014-08-06 15:28:58 +03:00
Jan Lindström
27d23c020a Merged percona-server-5.5.38-35.2. 2014-08-01 12:54:56 +03:00
Jan Lindström
e974b56438 MDEV-6512: InnoDB: Assertion failure in thread 4537024512 in file
buf0buf.cc line 2642.

Analysis: innodb_compression_algorithm is a global variable and
can change while we are building page compressed page. This could
lead page corruption.

Fix: Cache innodb_compression_algorithm on local variable before
doing any compression or page formating to avoid concurrent
change. Improved page verification on debug builds.
2014-07-31 11:31:39 +03:00
Jan Lindström
56c4b016ad Fiix random test failures on fil_decompress_page_2 function.
Analysis: InnoDB writes also files that do not contain FIL-header.
This could lead incorrect analysis on os_fil_read_func function
when it tries to see is page page compressed based on FIL_PAGE_TYPE
field on FIL-header. With bad luck uncompressed page that does
not contain FIL-headed, the byte on FIL_PAGE_TYPE position could
indicate that page is page comrpessed.

Fix: Upper layer must indicate is file space page compressed
or not. If this is not yet known, we need to read the FIL-header
and find it out. Files that we know that are not page compressed
we can always just provide FALSE.
2014-07-25 14:37:10 +03:00
Jan Lindström
dbc79ce055 MDEV-6354: Implement a way to read MySQL 5.7.4-labs-tplc page
compression format (Fusion-IO).

Addeed LZMA and BZIP2 compression methods.
2014-07-22 06:56:50 +03:00
Michael Widenius
bd2117d154 Automatic merge from 5.5
Fixed 2 failing tests by replacing result files
2014-08-19 21:35:14 +03:00
Sergei Golubchik
a7f39aacd5 xtradb-5.6.19-67.0 2014-08-06 19:57:06 +02:00
Sergey Vojtovich
82ce2a2503 MDEV-6450 - MariaDB crash on Power8 when built with advance
tool chain

Reverted addition to the original patch:
InterlockedExchangeAcquire and InterlockedAndRelease are
supported only on Itanium-based systems.
2014-08-04 11:55:38 +04:00
Sergey Vojtovich
53360fd45f MDEV-6450 - MariaDB crash on Power8 when built with advance
tool chain

This is an addition to the original patch. On Windows
InterlockedExchange implies full memory barrier, whereas
only acquire/release barriers required.
2014-07-31 14:31:05 +04:00
Sergei Golubchik
3e9d4541d5 MDEV-6340 Mariadb 10.0.12 fatal "Lost connection" error w/ GCC 4.9 'Release' build; workaround ~ CFLAGS="-fno-delete-null-pointer-checks"
don't use attribute nonnull for arguments that can be null
2014-07-31 09:51:05 +02:00
Sergey Vojtovich
6192f0bffa MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64
This problem affects only debug builds on PPC64.

There are at least two race conditions around
rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:

- rw_lock_debug_waiters was loaded/stored without setting
  appropriate locks/memory barriers.
- there is a gap between calls to os_event_reset() and
  os_event_wait() and in such case we're supposed to pass
  return value of the former to the latter.

Fixed by replacing self-cooked spinlocks with system mutexes.
These days system mutexes offer much better performance. OTOH
performance is not that critical for debug builds.
2014-07-24 18:12:32 +04:00