file trx0trx.ic line 60
The problem was that the trx might not have been started when we enter release
savepoint; this can happen when the trx with the savepoint has already been
aborted or when we try to release a non-existing savepoint.
MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64
This problem affects only debug builds on PPC64.
There are at least two race conditions around
rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:
- rw_lock_debug_waiters was loaded/stored without setting
appropriate locks/memory barriers.
- there is a gap between calls to os_event_reset() and
os_event_wait(), and in such a case we're supposed to pass the
return value of the former to the latter.
Fixed by replacing self-cooked spinlocks with system mutexes.
These days system mutexes offer much better performance. OTOH
performance is not that critical for debug builds.
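(Not the committed code, just a minimal sketch of the idea, with plain pthread
names standing in for the InnoDB OS mutex wrappers:)

    #include <pthread.h>

    /* One system mutex protects the rw-lock debug lists, replacing the
       hand-rolled spinlock built from rw_lock_debug_waiters + os_event. */
    static pthread_mutex_t rw_lock_debug_mutex = PTHREAD_MUTEX_INITIALIZER;

    static void rw_lock_debug_mutex_enter(void)
    {
        /* blocks in the kernel; no separate "waiters" flag, so no
           load/store ordering races and no reset/wait gap */
        pthread_mutex_lock(&rw_lock_debug_mutex);
    }

    static void rw_lock_debug_mutex_exit(void)
    {
        pthread_mutex_unlock(&rw_lock_debug_mutex);
    }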
MDEV-6450 - MariaDB crash on Power8 when built with advance tool
chain
InnoDB mutex_exit() calls __sync_test_and_set() to release
the lock. According to the manual this function is supposed to create
an "acquire" memory barrier, whereas we actually need a "release" memory
barrier at mutex_exit().
The problem isn't repeatable with gcc because it creates an
"acquire-release" memory barrier for __sync_test_and_set().
ATC creates just an "acquire" barrier.
Fixed by creating proper barrier at mutex_exit() by using
__sync_lock_release() instead of __sync_test_and_set().
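(Illustration only, assuming the GCC __sync builtins; gcc spells the
test-and-set builtin __sync_lock_test_and_set():)

    #include <stdint.h>

    typedef struct { volatile uint32_t lock_word; } ib_mutex_sketch_t;

    static void mutex_enter_sketch(ib_mutex_sketch_t *m)
    {
        /* acquire semantics: reads/writes after the lock cannot move above it */
        while (__sync_lock_test_and_set(&m->lock_word, 1)) {
            /* spin; the real code backs off and waits on an event */
        }
    }

    static void mutex_exit_sketch(ib_mutex_sketch_t *m)
    {
        /* release semantics: all earlier stores become visible before the
           lock word is cleared.  Using another test-and-set here gives
           only acquire semantics, which is the bug described above. */
        __sync_lock_release(&m->lock_word);
    }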
Merged lp:maria/maria-10.0-galera up to revision 3880.
Added new functions to the handler API to forcefully abort_transaction,
produce fake_trx_id, and get_checkpoint/set_checkpoint for XA. These
were added for the future possibility of adding more storage engines that
could use galera replication.
Merged lp:maria/maria-10.0-galera up to revision 3879.
Added new functions to the handler API to forcefully abort_transaction,
produce fake_trx_id, and get_checkpoint/set_checkpoint for XA. These
were added for the future possibility of adding more storage engines that
could use galera replication.
Analysis: When the database is migrated from 5.5 or earlier and
the database needs crash recovery, there is a possibility that the
SYS_DATAFILES system table does not exist, but
crash recovery in the function dict_check_tablespaces_and_store_max_id()
assumes that SYS_DATAFILES exists.
Fix: If SYS_DATAFILES does not exist, create it before we reach
dict_check_tablespaces_and_store_max_id() during crash recovery.
Analysis: The problem is that we execute galera code when we are actually
executing asynchronous replication.
Fix: Do not execute galera code if the wsrep provider is not set.
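(A minimal sketch of the guard with an illustrative helper name; the point is to
check the wsrep provider setting before entering any galera-specific path:)

    #include <stdbool.h>
    #include <string.h>

    /* Illustrative helper: an unset or "none" wsrep_provider means galera
       is disabled, so asynchronous replication must not run galera code. */
    static bool wsrep_provider_is_set(const char *wsrep_provider)
    {
        return wsrep_provider != NULL
            && strcmp(wsrep_provider, "none") != 0;
    }

    /* caller sketch:
     *   if (!wsrep_provider_is_set(wsrep_provider))
     *       return;    // plain async replication: skip galera handling
     */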
Part of this work is based on Stewart Smith's memory barrier and lower priority
patches for Power8.
- Added memory synchronization for InnoDB & XtraDB on Power8.
- Added HAVE_WINDOWS_MM_FENCE to CMakeLists.txt
- Added os_isync to fix a synchronization problem on Power
- Added log_get_lsn_nowait, which is now used by srv_error_monitor_thread so
that it does not block if the log mutex is locked.
All changes done both for InnoDB and XtraDB
config.h.cmake:
define NOMINMAX, otherwise Windows system headers define min() and max() macros
sql/slave.cc:
mi->report() has one more argument in MariaDB
storage/xtradb/buf/buf0flu.cc:
xtradb fixes for windows, again
A donor node does a flush tables and then tries to
freeze innodb writes before proceeding with SST.
However, innodb_disallow_writes was missing in xtradb.
Merged the 'InnodbFreeze' patch from maria-5.5-galera to
XtraDB. Also merged some changes missing from innobase's
os0file.cc.
Added a basic test case for the innodb_disallow_writes system
variable.
tool chain
This is an addition to the original patch. On Windows,
InterlockedExchange implies a full memory barrier, whereas
only acquire/release barriers are required.
buf0buf.cc line 2642.
Analysis: innodb_compression_algorithm is a global variable and
can change while we are building a page compressed page. This could
lead to page corruption.
Fix: Cache innodb_compression_algorithm in a local variable before
doing any compression or page formatting, to avoid a concurrent
change. Improved page verification on debug builds.
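(Sketch of the fix; the function and parameter names are illustrative, the point
is to read the global exactly once per page:)

    #include <stddef.h>

    extern unsigned long innodb_compression_algorithm;   /* the global sysvar */

    void build_page_compressed(const unsigned char *in, unsigned char *out,
                               size_t len)
    {
        /* One snapshot per page: SET GLOBAL may change the sysvar while we
           are still formatting this page. */
        unsigned long comp_algo = innodb_compression_algorithm;

        /* ... select the compressor from comp_algo, compress 'in' into
           'out', and write comp_algo into the page header so the reader
           decompresses with the same algorithm even if the global has
           changed in the meantime ... */
        (void)in; (void)out; (void)len; (void)comp_algo;
    }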
buf0flu.cc line 549.
Analysis: If buf_page_get_state(bpage) == BUF_BLOCK_REMOVE_HASH then
buf_page_in_file(bpage) might not be true.
Fix: ut_a(buf_page_in_file(bpage) || buf_page_get_state(bpage) == BUF_BLOCK_REMOVE_HASH);
InnoDB: Failing assertion: mutex_own(&buf_pool->LRU_list_mutex)
and
InnoDB: Assertion failure in thread 2868898624 in file buf0lru.c line 1077
InnoDB: Failing assertion: mutex_own(&buf_pool->LRU_list_mutex)
Analysis: The function buf_LRU_free_block might release the LRU_list_mutex in
some cases to avoid mutex order problems; we need to take it back
before accessing the list.
Analysis: InnoDB also writes files that do not contain a FIL header.
This could lead to incorrect analysis in the os_file_read_func function
when it tries to determine whether a page is page compressed based on the
FIL_PAGE_TYPE field of the FIL header. With bad luck, for an uncompressed
page that does not contain a FIL header, the byte at the FIL_PAGE_TYPE
position could indicate that the page is page compressed.
Fix: The upper layer must indicate whether the file space is page compressed
or not. If this is not yet known, we need to read the FIL header
and find it out. For files that we know are not page compressed
we can always just pass FALSE.
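(Sketch with illustrative names, not the real os0file API; the caller passes
what it knows about the space instead of the read path guessing from
FIL_PAGE_TYPE:)

    #include <sys/types.h>
    #include <unistd.h>

    enum page_compressed_hint {
        SPACE_NOT_PAGE_COMPRESSED,  /* e.g. redo log, doublewrite buffer */
        SPACE_PAGE_COMPRESSED,
        SPACE_UNKNOWN               /* read the FIL header to find out */
    };

    int read_page_hinted(int fd, void *buf, size_t n, off_t off,
                         enum page_compressed_hint hint)
    {
        if (pread(fd, buf, n, off) != (ssize_t) n)
            return -1;

        if (hint == SPACE_NOT_PAGE_COMPRESSED)
            return 0;               /* never try to decompress such files */

        /* hint == SPACE_PAGE_COMPRESSED, or SPACE_UNKNOWN resolved from
           the FIL header: decompress only when the page really is page
           compressed. */
        return 0;
    }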
4229: MDEV-5670: Assertion failure in file buf0lru.c line 2355
Add more status information if repeatable.
4230: MDEV-5673: Crash while parallel dropping multiple tables under heavy load
Improve long semaphore wait output to include all semaphore waits
and try to find out if there is a sequence of waiters.
4233: Fix compiler errors on product build.
4237: Fix too aggressive long semaphore wait output and add a guard against
introducing compression failures on the insert buffer.
4238: Fix test failure caused by simulated compression failure on
IBUF_DUMMY table.
This problem affects only debug builds on PPC64.
There are at least two race conditions around
rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:
- rw_lock_debug_waiters was loaded/stored without setting
appropriate locks/memory barriers.
- there is a gap between calls to os_event_reset() and
os_event_wait(), and in such a case we're supposed to pass the
return value of the former to the latter.
Fixed by replacing self-cooked spinlocks with system mutexes.
These days system mutexes offer much better performance. OTOH
performance is not that critical for debug builds.
If mysql.innodb_table_stats or mysql.innodb_index_stats is not found or has
an unexpected structure, output that error only once instead of repeating it
for every table trying to use them. If they do exist, print fetch or
recalculation errors only once per table or index.
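(A sketch of the "report once" idea; struct and field names are illustrative:)

    #include <stdbool.h>
    #include <stdio.h>

    struct table_stats_state {
        const char *table_name;
        bool        stats_error_printed;   /* illustrative flag */
    };

    static void report_stats_error_once(struct table_stats_state *t,
                                        const char *msg)
    {
        if (t->stats_error_printed)
            return;                         /* already reported for this table */

        fprintf(stderr, "InnoDB: %s for table %s\n", msg, t->table_name);
        t->stats_error_printed = true;
    }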
line 8473
In case the InnoDB index is not found, print the MySQL and InnoDB index
name we were trying to find, and all the MySQL and InnoDB index names
there are for this table.
ha_innodb.cc line 8473
If the index is not found in InnoDB, make sure we print what we
were trying to find, and all the MySQL and InnoDB index names there
are for this table.
chain
InnoDB mutex_exit() calls __sync_test_and_set() to release
the lock. According to the manual this function is supposed to create
an "acquire" memory barrier, whereas we actually need a "release" memory
barrier at mutex_exit().
The problem isn't repeatable with gcc because it creates an
"acquire-release" memory barrier for __sync_test_and_set().
ATC creates just an "acquire" barrier.
Fixed by creating proper barrier at mutex_exit() by using
__sync_lock_release() instead of __sync_test_and_set().
Merge the patches into MariaDB 10.0 main.
With this patch, parallel replication will now automatically retry a
transaction that fails due to deadlock or other temporary error, same as
single-threaded replication.
We catch deadlocks with InnoDB transactions due to enforced commit order. If
T1 must commit before T2 in parallel replication and T1 ends up waiting for T2
inside InnoDB, we kill T2 and retry it later to resolve the deadlock
automatically.
After-review changes.
For this patch in 10.0, we do not introduce a new public storage engine API,
we just fix the InnoDB/XtraDB issues. In 10.1, we will make a better public
API that can be used for all storage engines (MDEV-6429).
Eliminate the background thread that did deadlock kills asynchronously.
Instead, we ensure that the InnoDB/XtraDB code can handle doing the kill from
inside the deadlock detection code (when thd_report_wait_for() needs to kill a
later thread to resolve a deadlock).
(We preserve the part of the original patch that introduces dedicated mutex
and condition for the slave init thread, to remove the abuse of
LOCK_thread_count for start/stop synchronisation of the slave init thread).
in file buf0mtflu.cc line 570.
Analysis: Real timing bug; we should take the mutex before we
try to send the shutdown messages. That would make sure
that threads doing an unfinished flush (they have acquired
this mutex) have time to do their work before we add shutdown
messages to the work queue. Currently we just add the shutdown
messages to the work queue, and the code assumes that at flush
time there is a constant number of items to be processed, thus
leading to the assertion.
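(Sketch of the ordering fix, with pthread names and an illustrative queue API:)

    #include <pthread.h>

    extern pthread_mutex_t wq_mutex;          /* guards the flush work queue */
    void work_queue_push(int item);           /* illustrative */
    #define MT_WT_SHUTDOWN  0xDEAD            /* illustrative message value */

    void mtflush_shutdown(int n_workers)
    {
        int i;

        /* Take the queue mutex first: any worker in the middle of a flush
           batch holds it, so it finishes with the item count it expected
           before the shutdown messages are appended. */
        pthread_mutex_lock(&wq_mutex);
        for (i = 0; i < n_workers; i++)
            work_queue_push(MT_WT_SHUTDOWN);  /* one per worker thread */
        pthread_mutex_unlock(&wq_mutex);
    }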
Analysis: For some reason the table stats for a table pointed to from an index
are not initialized. Added additional warning output for this situation
and table stats initialization. This is better than asserting.
than with InnoDB plugin
Fix: os0file.h in XtraDB had OS_AIO_N_PENDING_IOS_PER_THREAD 256,
while in InnoDB it is OS_AIO_N_PENDING_IOS_PER_THREAD 32. Changed
XtraDB to also use 32.
* Introduce a set of PLUGIN_xxx cmake options with values
NO, STATIC, DYNAMIC, AUTO, YES (abort if plugin is not compiled)
* Deprecate redundant and ambiguous WITH_xxx, WITH_PLUGIN_xxx,
WITH_xxx_STORAGE_ENGINE, WITHOUT_xxx, WITHOUT_PLUGIN_xxx,
WITHOUT_xxx_STORAGE_ENGINE
* Actually check whether a plugin is disabled (DISABLED keyword was
always present, but it was ignored until now).
* Support conditionally disabled plugins - keyword ONLY_IF
* Use ONLY_IF for conditionally skipping plugins, instead of
doing MYSQL_ADD_PLUGIN conditionally as before. Because if
MYSQL_ADD_PLUGIN isn't done at all, PLUGIN_xxx=YES cannot work.
mark path-related variables (AIO_LIBRARY, ODBC_LIBRARY, ODBC_INCLUDE_DIR,
Thrift_LIBS, Thrift_INCLUDE_DIRS, CRYPTO_LIBRARY, OPENSSL_LIBRARIES,
OPENSSL_ROOT_DIR, OPENSSL_INCLUDE_DIR) as advanced - paths are
automatically discovered by cmake.
mark a few choice variables (ENABLED_LOCAL_INFILE, WITHOUT_SERVER,
DISABLE_SHARED) as not advanced - they are user choices, not automatically
configured values.
remove unused BACKUP_TEST variable.
Auto-generate the allowed list of values for enum/set/flagset options
in --help output. But don't do that when the help text already has them.
Also, remove lists of values from help strings of various options, where
they were simply listed without any additional information.
than with InnoDB plugin
Fix: os0file.h in XtraDB had OS_AIO_N_PENDING_IOS_PER_THREAD 256,
while in InnoDB it is OS_AIO_N_PENDING_IOS_PER_THREAD 32. Changed
XtraDB to also use 32.
Analysis: Based on the crash, the buffer pool instance identifier is
not correct for the block to be freed. Add LRU list mutex holding
in the functions calling free, and add additional safety checks.
replication causing replication to fail.
Remove the temporary fix for MDEV-5914, which used READ COMMITTED for parallel
replication worker threads. Replace it with a better, more selective solution.
The issue is with certain edge cases of InnoDB gap locks, for example between
INSERT and ranged DELETE. It is possible for the gap lock set by the DELETE to
block the INSERT, if the DELETE runs first, while the record lock set by
INSERT does not block the DELETE, if the INSERT runs first. This can cause a
conflict between the two in parallel replication on the slave even though they
ran without conflicts on the master.
With this patch, InnoDB will ask the server layer about the two involved
transactions before blocking on a gap lock. If the server layer tells InnoDB
that the transactions are already fixed wrt. commit order, as they are in
parallel replication, InnoDB will ignore the gap lock and allow the two
transactions to proceed in parallel, avoiding the conflict.
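(A sketch of the check; the helper parameters are illustrative, and
thd_need_ordering_with() stands for the server-layer callback described
above, with an assumed signature:)

    #include <stdbool.h>

    typedef struct THD THD;                   /* opaque server thread handle */
    extern int thd_need_ordering_with(const THD *thd, const THD *other_thd);

    bool gap_lock_must_wait(bool locks_conflict, bool either_is_gap_lock,
                            const THD *waiter_thd, const THD *holder_thd)
    {
        if (!locks_conflict)
            return false;                     /* no conflict at all */

        if (either_is_gap_lock
            && !thd_need_ordering_with(waiter_thd, holder_thd)) {
            /* The server layer already fixed the commit order of these two
               transactions (parallel replication), so the gap lock cannot
               create a conflict that did not exist on the master: skip the
               wait and let both proceed. */
            return false;
        }
        return true;                          /* normal case: wait */
    }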
Improve the fix for MDEV-6020. When InnoDB itself detects a deadlock, it now
asks the server layer for any preferences about which transaction to roll
back. In case of parallel replication with two transactions T1 and T2 fixed to
commit T1 before T2, the server layer will ask InnoDB to roll back T2 as the
deadlock victim, not T1. This helps in some cases to avoid excessive deadlock
rollback, as T2 will in any case need to wait for T1 to complete before it can
itself commit.
Also some misc. fixes found during development and testing:
- Remove thd_rpl_is_parallel(), it is not used or needed.
- Use KILL_CONNECTION instead of KILL_QUERY when a parallel replication
worker thread is killed to resolve a deadlock with fixed commit
ordering. There are some cases, eg. in sql/sql_parse.cc, where a KILL_QUERY
can be ignored if the query otherwise completed successfully, and this
could cause the deadlock kill to be lost, so that the deadlock was not
correctly resolved.
- Fix random test failure due to missing wait_for_binlog_checkpoint.inc.
- Make sure that deadlock or other temporary errors during parallel
replication are not printed to the error log; there were some places
around the replication code with extra error logging. These conditions can
occur occasionally and are handled automatically without breaking
replication, so they should not pollute the error log.
- Fix handling of rgi->gtid_sub_id. We need to be able to access this also at
the end of a transaction, to be able to detect and resolve deadlocks due to
commit ordering. But this value was also used as a flag to mark whether
record_gtid() had been called, by being set to zero, losing the value. Now,
introduce a separate flag rgi->gtid_pending, so rgi->gtid_sub_id remains
valid for the entire duration of the transaction.
- Fix one place where the code to handle ignored errors called reset_killed()
unconditionally, even if no error was caught that should be ignored. This
could cause loss of a deadlock kill signal, breaking deadlock detection and
resolution.
- Fix a couple of missing mysql_reset_thd_for_next_command(). This could
cause a prior error condition to remain for the next event executed,
causing assertions about errors already being set and possibly giving
incorrect error handling for following event executions.
- Fix code that cleared thd->rgi_slave in the parallel replication worker
threads after each event execution; this caused the deadlock detection and
handling code to not be able to correctly process the associated
transactions as belonging to replication worker threads.
- Remove useless error code in slave_background_kill_request().
- Fix bug where wfc->wakeup_error was not cleared at
wait_for_commit::unregister_wait_for_prior_commit(). This could cause the
error condition to wrongly propagate to a later wait_for_prior_commit(),
causing spurious ER_PRIOR_COMMIT_FAILED errors.
- Do not put the binlog background thread into the processlist. It causes
too many result differences in mtr, but also it probably is not useful
for users to pollute the process list with a system thread that does not
really perform any user-visible tasks...
replication causing replication to fail.
In parallel replication, we run transactions from the master in parallel, but
force them to commit in the same order they did on the master. If we force T1
to commit before T2, but T2 holds eg. a row lock that is needed by T1, we get
a deadlock when T2 waits until T1 has committed.
Usually, we do not run T1 and T2 in parallel if there is a chance that they
can have conflicting locks like this, but there are certain edge cases where
it can occasionally happen (eg. MDEV-5914, MDEV-5941, MDEV-6020). The bug was
that this would cause replication to hang, eventually getting a lock timeout
and causing the slave to stop with error.
With this patch, InnoDB will report back to the upper layer whenever a
transactions T1 is about to do a lock wait on T2. If T1 and T2 are parallel
replication transactions, and T2 needs to commit later than T1, we can thus
detect the deadlock; we then kill T2, setting a flag that causes it to catch
the kill and convert it to a deadlock error; this error will then cause T2 to
roll back and release its locks (so that T1 can commit), and later T2 will be
re-tried and eventually also committed.
The kill happens asynchronously in a slave background thread; this is
necessary, as the reporting from InnoDB about lock waits happen deep inside
the locking code, at a point where it is not possible to directly call
THD::awake() due to mutexes held.
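(Sketch; slave_background_kill_request() and thd_report_wait_for() are the
names used elsewhere in this log, commit_order_requires() is illustrative and
the declarations are assumptions:)

    typedef struct THD THD;

    void slave_background_kill_request(THD *to_kill);       /* queues a kill */
    int  commit_order_requires(THD *waiter, THD *blocker);  /* illustrative */

    /* Called from deep inside the lock code while lock mutexes are held,
       so it must not call THD::awake() itself. */
    void thd_report_wait_for_sketch(THD *waiter, THD *blocker)
    {
        if (!commit_order_requires(waiter, blocker))
            return;               /* no fixed ordering, nothing to resolve */

        /* Just enqueue the request and wake the slave background thread;
           that thread later kills the victim with KILL_CONNECTION, turning
           the wait into a deadlock error so the victim rolls back, releases
           its locks and is retried in the correct commit order. */
        slave_background_kill_request(blocker);
    }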
Deadlock is assumed to be (very) rarely occurring, so this patch tries to
minimise the performance impact on the normal case where no deadlocks occur,
rather than optimise the handling of the occasional deadlock.
Also fix transaction retry due to deadlock when it happens after a transaction
already signalled to later transactions that it started to commit. In this
case we need to undo this signalling (and later redo it when we commit again
during retry), so following transactions will not start too early.
Also add a missing thd->send_kill_message() that got triggered during testing
(this corrects an incorrect fix for MySQL Bug#58933).
This patch allows up to 64K pages for tables with DYNAMIC, COMPACT
and REDUNDANT row types. Tables with the COMPRESSED row type still
allow only <= 16K page size. Note that a single row must still be
<= 16K and the max key length is not affected.
on select from I_S.INNODB_CHANGED_PAGES
Analysis: limit_lsn_range_from_condition() incorrectly parses
start_lsn and/or end_lsn conditions.
Fix from SergeyP. Added some test cases.
Analysis: We can't disable the error message, because otherwise you may get
the database started with an incorrect log file size.
Fix: Thus only improve the error message to give more information
to users.
on select from I_S.INNODB_CHANGED_PAGES
Analysis: limit_lsn_range_from_condition() incorrectly parses
start_lsn and/or end_lsn conditions.
Fix from SergeyP. Added some test cases.
option leaves the database in inconsistent state,
Analysis: The problem was that the atomic writes variable had an incorrect
type in some places, leading to the fact that e.g. the OFF option was
not recognized. Furthermore, some error check code was missing
from both the InnoDB and XtraDB engines. Finally, when a table is
created we have already created the .ibd file, and if we can't
set atomic writes it stays there.
Fix: Fix the atomic writes variable type to ulint as it should be.
Fix: Add proper error code checking for os errors in both InnoDB
and XtraDB.
Fix: Remove the .ibd file when atomic writes can't be enabled
for a new table.
(Inadequate background LRU flushing for write workloads with InnoDB compression).
If InnoDB compression is used and the workload has writes, the
following situation is possible. The LRU flusher issues an LRU flush
request for an instance. buf_do_LRU_batch decides to perform
unzip_LRU eviction and this eviction might fully satisfy the
request. Then buf_flush_LRU_tail checks the number of flushed pages in
the last iteration, finds it to be zero, and wrongly decides not to
flush that instance anymore.
Fixed by maintaining unzip_LRU eviction counter in struct
flush_counter_t variables, and checking it in buf_flush_LRU_tail when
deciding whether to stop flushing the current instance.
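(Sketch of the counter change; the field names are illustrative apart from
flush_counter_t itself:)

    typedef unsigned long ulint;

    typedef struct {
        ulint flushed;             /* dirty pages written by this batch */
        ulint evicted;             /* clean pages evicted from the LRU */
        ulint unzip_LRU_evicted;   /* uncompressed copies evicted from unzip_LRU */
    } flush_counter_t;

    /* buf_flush_LRU_tail loop (sketch): keep iterating on an instance while
       n.flushed + n.evicted + n.unzip_LRU_evicted > 0, instead of stopping
       as soon as n.flushed alone is zero. */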
Added test cases for new configuration files to get mysql-test-run suite sys_vars
to pass. Fix some small errors.
special builds UNIV_PAGECOMPRESS_DEBUG and UNIV_MTFLUSH_DEBUG).
Added a new status variable compress_pages_page_compression_error to count possible
compression errors.
when compressed tables are used.
Analysis: The number of flushed pages is incorrectly calculated in
buf_do_LRU_batch. This leads to a problem when the utility function
flushes dirty blocks from the end of the flush list of
all buffer pool instances in a loop until enough pages are flushed
or the time limit is reached. Because the number of flushed pages is
incorrectly calculated, the loop mostly tries to flush until the time
limit is reached, since the page count limit is never reached.
Fix: Fix the calculation of flushed pages (very short). This fix
was provided by Alexey Stroganov (Percona).
Due to how gap locks work, two transactions could group commit together on the
master, but get lock conflicts and then deadlock due to different thread
scheduling order on slave.
For now, remove these deadlocks by running the parallel slave in READ
COMMITTED mode. And let InnoDB/XtraDB allow statement-based binlogging for the
parallel slave in READ COMMITTED.
We are also investigating a different solution long-term, which is based on
relaxing the gap locks only between the transactions running in parallel for
one slave, but not against possibly external transactions.
Analysis: XtraDB merge regression; at the end of mutex_spin_wait, before the
goto mutex_loop, the following is missing:
    if (prio_mutex) {
        os_atomic_decrement_ulint(&prio_mutex->high_priority_waiters, 1);
    }
Hence we get an unbalanced waiter count.
Thanks to Laurynas Biveinis for finding this.
Analysis: This was a merge error in file fil0fil.cc; the fil_system mutex was taken twice because of it.
Fix: Remove the unnecessary mutex_enter and fix the issue with slow posix_fallocate usage.
unnecessary. There are a lot more index pages than normal pages.
Earlier all pages were compressed, and this provided the best performance and
compression ratio. Added a status variable to show how many non-index pages
are written.
innodb_force_primary_key, default off. If the option is true, a CREATE TABLE
without a primary key (or a unique key where all keyparts are NOT NULL) is not
accepted; instead an error message is printed. The variable value can
be changed with set global innodb_force_primary_key = <value>.
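(Sketch of the check done at table creation; names are illustrative, not the
committed code:)

    #include <stdbool.h>
    #include <stdio.h>

    /* Reject the table when the option is ON and there is neither a
       primary key nor a unique key whose keyparts are all NOT NULL. */
    int check_force_primary_key(bool force_pk_on, bool has_usable_key,
                                const char *table_name)
    {
        if (force_pk_on && !has_usable_key) {
            fprintf(stderr,
                    "InnoDB: innodb_force_primary_key is ON and table %s has "
                    "no primary key or NOT NULL unique key\n", table_name);
            return 1;       /* caller refuses the CREATE TABLE */
        }
        return 0;
    }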
- Table locks now end with state "After table lock"
- Open table now ends with state "After opening tables"
- All calls to close_thread_tables(), not only from mysql_execute_command(), have state "closing tables"
- Added state "executing" for mysql admin commands, like CACHE INDEX, REPAIR TABLE etc.
- Added state "Finding key cache" for CACHE INDEX
- Added state "Filling schema table" when we generate temporary table for SHOW commands and information schema.
Other things:
Add limit from innobase for thread_sleep_delay. This fixed a failing test case.
Added db.opt to support-files to make 'make package' work
mysql-test/suite/funcs_1/datadict/processlist_val.inc:
Use new state
mysql-test/suite/funcs_1/r/processlist_priv_no_prot.result:
Updated test result because of new state
mysql-test/suite/funcs_1/r/processlist_val_no_prot.result:
Updated test result because of new state
sql/CMakeLists.txt:
Have option files in support-files
sql/lock.cc:
Added new state 'After table lock'
sql/sql_admin.cc:
Added state "executing" and "Sending data" for mysql admin commands, like CACHE INDEX, REPAIR TABLE etc.
Added state "Finding key cache"
sql/sql_base.cc:
open tables now ends with state "After table lock", instead of NULL
sql/sql_parse.cc:
Moved state "closing tables" to close_thread_tables()
sql/sql_show.cc:
Added state "Filling schema table" when we generate temporary table for SHOW commands and information schema.
storage/xtradb/buf/buf0buf.c:
Removed compiler warning
storage/xtradb/handler/ha_innodb.cc:
Add limit from innobase for thread_sleep_delay. This fixed a failing test case.
support-files/db.opt:
cmake needs this to create the data/test directory
This is not yet a fix. It is a change to print additional information at the point
where this assertion is going to happen, printing as much information as possible
about the pages and the index to find out why the next page is not in compact format.
Added an option to disable multi-threaded flush (innodb_use_mtflush = 0);
by default multi-threaded flush is used.
Updated the innochecksum tool; it still does not support the new checksums.
execution and global memory heap on shutdown.
Added a function to get work items from the queue without waiting, and
additional info when there is no work to do for extended periods.
waiting for a job. They should sleep and let other threads do their work. At
shutdown, we know that we put "work" in the queue and that it is handled as soon as possible.
file storage.
Analysis: posix_fallocate was called using 0 as the offset and len as the
desired size. This is not optimal for SSDs.
Fix: Call posix_fallocate with the correct offset, i.e. the current file size,
and extend the file from there by len bytes.
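(Sketch of the corrected call; the wrapper name is illustrative:)

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Extend an open file by 'len' bytes starting at its current end. */
    int extend_file_sketch(int fd, off_t len)
    {
        struct stat st;

        if (fstat(fd, &st) != 0)
            return -1;

        /* old, slow on SSDs: posix_fallocate(fd, 0, st.st_size + len);
           it re-touches the whole file on every extension */
        return posix_fallocate(fd, st.st_size, len);   /* only the new tail */
    }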
on debug build when innodb_use_fallocate=1
Analysis: There was a merge error that caused an additional call to
fil_node_complete_io.
Fixed by removing the incorrect call and fixing the calculation of the
extended file size.
Fixed issue on multi-threaded flush at shutdown.
Removed unnecessary startup option innodb_compress_pages.
Added a new startup option innodb_mtflush_threads, default 8.
Analysis: In the wsrep case there are two transactions involved: trx and the
conflicting lock owning transaction (c_lock->trx). The code was handling the
trx mutex correctly but not the c_lock->trx case.
Fixed by taking the c_lock->trx mutex when needed and releasing the trx
mutex if it is taken at lower levels.