Commit graph

720 commits

Author SHA1 Message Date
Marko Mäkelä
6c3e860cbf Merge 10.4 into 10.5 2021-04-14 11:35:39 +03:00
Marko Mäkelä
5008171b05 Merge 10.3 into 10.4 2021-04-14 10:33:59 +03:00
sjaakola
a1e70388c4 MDEV-24966 Galera multi-master regression
After the merging of MDEV-24915, 10.6 branch has regressions with handling of
concurrent write load against two or more cluster nodes. These regressions may
surface as cluster hanging, node crashes or data inconsistency. With some test
scenarios, the only visible symptom could be that the BF victim aborting happens
only by innodb lock wait timeout expiration. This would result only to poor
performance (by default 50 sec hang for each BF conflict), and could be somewhat
difficult to diagnose.

This pull request has following fixes to handle concurrent write load from
multiple nodes:

In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen
that victim has advanced into pre commit state. This was fixed by choosing
victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.

Victim transaction may be in several different states at the time of detected
lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim
may advance further before the actual BF aborting takes place. The BF aborting
in MDEV-24915 did not wake the victim, if it was in the state of waiting for
some other lock (than the one that was blocking the high priority thread).
This anomaly caused the innodb lock wait timeout expiration delays and poor
performance symptom. To fix this, lock_wait_wsrep_kill() now looks if
victim is in lock waiting state, and uses lock_cancel_waiting_and_release()
to cancel this lock wait.

wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib),
and starts a new transaction if there was no active transaction before.
Due to late BF aborting, the victim may have e.g. failed in certification
and is already aborting or has aborted at this stage. This has caused
problems in testing where BF aborter tries to BF abort himself.
The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting
or has aborted. Victim may not have started transaction yet in wsrep context,
but it may have acquired MDL locks (due to DDL execution), and this has
caused BF conflict. Such case does not require aborting in wsrep or
replication provider state.

BF aborting could cause BF-BF conflict scenario, if victim was already aborted
and changed to replayer having high priority as well. This BF-BF conflict
scenario is now avoided in lock_wait_wsrep() where we now check if blocking
lock holder is also high priority and is ordered before, caller should wait
for the lock in this situation.

The natural innodb deadlock resolving algorithm could pick BF thread as
deadlock victim. This is fixed by giving max weigh to BF threads in
Deadlock::report().

MDEV-24341 has changed excution paths in do_command() and this affects BF
aborted victim execution. This PR fixes one assert in do_command():
 DBUG_ASSERT(!thd->async_state.pending_ops())
Which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.

With these fixes, long term highly conflicting write load could be run against
to node cluster. If binlogging is configured, log_slave_updates should be
also set.
2021-04-13 14:58:54 +03:00
Marko Mäkelä
b8c8692fd9 MDEV-24620 ASAN heap-buffer-overflow in btr_pcur_restore_position()
Between btr_pcur_store_position() and btr_pcur_restore_position()
it is possible that purge empties a table and enlarges
index->n_core_fields and index->n_core_null_bytes.
Therefore, we must cache index->n_core_fields in
btr_pcur_t::old_n_core_fields so that btr_pcur_t::old_rec can be
parsed correctly.

Unfortunately, this is a huge change, because we will replace
"bool leaf" parameters with "ulint n_core"
(passing index->n_core_fields, or 0 for non-leaf pages).
For special cases where we know that index->is_instant() cannot hold,
we may also pass index->n_fields.
2021-04-13 10:28:13 +03:00
Marko Mäkelä
7c524d4414 MDEV-25371 Potential hang in wsrep_is_BF_lock_timeout()
In commit e71e613353 (MDEV-24671),
lock_sys.wait_mutex was moved above lock_sys.mutex
(which was later replaced with lock_sys.latch) in the latching order.

In commit 7cf4419fc4 (MDEV-24789),
a potential hang was introduced to Galera. The function lock_wait()
would hold lock_sys.wait_mutex while invoking wsrep_is_BF_lock_timeout(),
which in turn could acquire LockMutexGuard for some diagnostic printout.

wsrep_is_BF_lock_timeout(): Do not invoke trx_print_latched() or
LockMutexGuard.

lock_sys_t::wr_lock(), lock_sys_t::rd_lock(): Assert that the current
thread is not holding lock_sys.wait_mutex.
Unfortunately, RW-locks are not covered by SAFE_MUTEX.

Reviewed by: Jan Lindström
2021-04-08 13:32:16 +03:00
Marko Mäkelä
cf552f5886 MDEV-25312 Replace fil_space_t::name with fil_space_t::name()
A consistency check for fil_space_t::name is causing recovery failures
in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field
altogether.

fil_space_t::name was more or less a copy of dict_table_t::name
(except for some special cases), and it was not being used for
anything useful.

There used to be a name_hash, but it had been removed already in
commit a75dbfd718 (MDEV-12266).

We will also remove os_normalize_path(), OS_PATH_SEPARATOR,
OS_PATH_SEPATOR_ALT. On Microsoft Windows, we will treat \ and /
roughly in the same way. The intention is that for per-table
tablespaces, the filenames will always follow the pattern
prefix/databasename/tablename.ibd. (Any \ in the prefix must not
be converted.)

ut_basename_noext(): Remove (unused function).

read_link_file(): Replaces RemoteDatafile::read_link_file().
We will ensure that the last two path component separators are
forward slashes (converting up to 2 trailing backslashes on
Microsoft Windows), so that everywhere else we can
assume that data file names end in "/databasename/tablename.ibd".

Note: On Microsoft Windows, path names that start with \\?\ must
not contain / as path component separators. Previously, such paths
did work in the DATA DIRECTORY argument of InnoDB tables.

Reviewed by: Vladislav Vaintroub
2021-04-07 18:01:13 +03:00
Marko Mäkelä
1d41e9df4a MDEV-25314 Assertion `trx.is_wsrep()' failed in wsrep_is_BF_lock_timeout()
In commit 1bd4115841 the intention
was to move the trx_t::is_wsrep() check to the caller of the function
wsrep_is_BF_lock_timeout().
2021-04-01 09:19:21 +03:00
Marko Mäkelä
1bd4115841 After-merge fix: WITH_WSREP=ON CMAKE_BUILD_TYPE=RelWithDebInfo
Merge commit 176aaf93d1
accidentally broke the WITH_WSREP release build.
2021-03-31 22:15:54 +03:00
Marko Mäkelä
176aaf93d1 Merge 10.5 into 10.6 2021-03-31 12:04:50 +03:00
Marko Mäkelä
5eae8c2742 Merge 10.4 into 10.5 2021-03-31 11:05:21 +03:00
Marko Mäkelä
50de71b026 Merge 10.3 into 10.4 2021-03-31 09:47:14 +03:00
Marko Mäkelä
d6d3d9ae2f Merge 10.2 into 10.3 2021-03-31 08:01:03 +03:00
Jan Lindström
d217a925b2 MDEV-24923 : Port selected Galera conflict resolution changes from 10.6
Add condition on trx->state == TRX_STATE_COMMITTED_IN_MEMORY in order to
avoid unnecessary work. If a transaction has already been committed or
rolled back, it will release its locks in lock_release() and let
the waiting thread(s) continue execution.

Let BF wait on lock_rec_has_to_wait and if necessary other BF
is replayed.

wsrep_trx_order_before
  If BF is not even replicated yet then they are ordered
  correctly.

bg_wsrep_kill_trx
  Make sure victim_trx is found and check also its state. If
  state is TRX_STATE_COMMITTED_IN_MEMORY transaction is
  already committed or rolled back and will release it locks
  soon.

wsrep_assert_no_bf_bf_wait
  Transaction requesting new record lock should be TRX_STATE_ACTIVE
  Conflicting transaction can be in states TRX_STATE_ACTIVE,
  TRX_STATE_COMMITTED_IN_MEMORY or in TRX_STATE_PREPARED.
  If conflicting transaction is already committed in memory or
  prepared we should wait. When transaction is committed in memory we
  held trx mutex, but not lock_sys->mutex. Therefore, we
  could end here before transaction has time to do lock_release()
  that is protected with lock_sys->mutex.

lock_rec_has_to_wait
  We very well can let bf to wait normally as other BF will be
  replayed in case of conflict. For debug builds we will do
  additional sanity checks to catch unsupported bf wait if any.

wsrep_kill_victim
  Check is victim already in TRX_STATE_COMMITTED_IN_MEMORY state and
  if it is we can return.

lock_rec_dequeue_from_page
lock_rec_unlock
  Remove unnecessary wsrep_assert_no_bf_bf_wait function calls.
  We can very well let BF wait here.
2021-03-30 08:58:10 +03:00
Marko Mäkelä
8ea923f55b MDEV-24818: Optimize multi-statement INSERT into an empty table
If the user "opts in" (as in the parent
commit 92b2a911e5),
we can optimize multiple INSERT statements to use table-level locking
and undo logging.

There will be a change of behavior:

    CREATE TABLE t(a PRIMARY KEY) ENGINE=InnoDB;
    SET foreign_key_checks=0, unique_checks=0;
    BEGIN; INSERT INTO t SET a=1; INSERT INTO t SET a=1; COMMIT;

will end up with an empty table, because in case of an error,
the entire transaction will be rolled back, instead of rolling
back the failing statement. Previously, the second INSERT statement
would have been logged row by row, and only that second statement
would have been rolled back, leaving the first INSERT intact.

lock_table_x_unlock(), trx_mod_table_time_t::WAS_BULK: Remove.
Because we cannot really support statement rollback in this
optimized mode, we will not optimize the locking. The exclusive
table lock will be held until the end of the transaction.
2021-03-16 15:21:34 +02:00
Marko Mäkelä
bda8a2a63a MDEV-24671 fixup: Merge lock_sys_t::wait_pending into wait_count
The maximum number of concurrently waiting transactions is one less
than the maximum number of concurrent transactions.

A 45-bit cumulative counter of lock waits will support more than
one million lock waits per second for a year.
2021-03-09 09:05:26 +02:00
Marko Mäkelä
10d544aa7b Merge 10.4 into 10.5 2021-03-05 12:54:43 +02:00
Marko Mäkelä
8bab5bb332 Merge 10.3 into 10.4 2021-03-05 10:36:51 +02:00
Marko Mäkelä
1c7d4f8de7 MDEV-25016 Race condition between lock_sys_t::cancel() and page split or merge
In commit 8d16da1487 (MDEV-24789)
we accidentally introduced a race condition. During the time a
waiting lock request is being removed, the request might be
moved to another page due to a concurrent page split or merge.
To prevent this, we must hold exclusive lock_sys.latch when releasing
a record lock.

lock_release_autoinc_locks(): Avoid a potential hang.
No dict_table_t::lock_mutex must be waited for while already holding
lock_sys.wait_mutex or trx_t::mutex.

lock_cancel_waiting_and_release(): Correctly handle AUTO_INCREMENT locks.
2021-03-03 13:49:49 +02:00
Marko Mäkelä
5bd994b0d5 MDEV-24811 Assertion find(table) failed with innodb_evict_tables_on_commit_debug
This is a backport of commit 18535a4028
from 10.6.

lock_release(): Implement innodb_evict_tables_on_commit_debug.
Before releasing any locks, collect the identifiers of tables to
be evicted. After releasing all locks, look up for the tables and
evict them if it is safe to do so.

trx_commit_in_memory(): Invoke trx_update_mod_tables_timestamp()
before lock_release(), so that our locks will protect the tables
from being evicted.
2021-03-03 10:13:56 +02:00
Marko Mäkelä
33aec68a87 Merge 10.5 into 10.6 2021-03-02 14:30:50 +02:00
Marko Mäkelä
18535a4028 MDEV-24811 Assertion find(table) failed with innodb_evict_tables_on_commit_debug
lock_release_try(): Implement innodb_evict_tables_on_commit_debug.
Before releasing any locks, collect the identifiers of tables to
be evicted. After releasing all locks, look up for the tables and
evict them if it is safe to do so.

trx_t::commit_tables(): Remove the eviction logic.

trx_t::commit_in_memory(): Invoke release_locks() only after
commit_tables().
2021-03-02 14:26:57 +02:00
Marko Mäkelä
8513007c84 Cleanup: Remove some lock accessor functions 2021-03-02 14:26:57 +02:00
Marko Mäkelä
8d16da1487 MDEV-24789: Reduce lock_sys mutex contention further
lock_sys_t::deadlock_check(): Assume that only lock_sys.wait_mutex
is being held by the caller.

lock_sys_t::rd_lock_try(): New function.

lock_sys_t::cancel(trx_t*): Kill an active transaction that may be
holding a lock.

lock_sys_t::cancel(trx_t*, lock_t*): Cancel a waiting lock request.

lock_trx_handle_wait(): Avoid acquiring mutexes in some cases,
and in never acquire lock_sys.latch in exclusive mode.
This function is only invoked in a semi-consistent read
(locking a clustered index record only if it matches the search condition).
Normally, lock_wait() will take care of lock waits.

lock_wait(): Invoke the new function lock_sys_t::cancel() at the end,
to avoid acquiring exclusive lock_sys.latch.

lock_rec_other_trx_holds_expl(): Use LockGuard instead of LockMutexGuard.

lock_release_autoinc_locks(): Explicitly acquire table->lock_mutex,
in case only a shared lock_sys.latch is being held. Deadlock::report()
will still hold exclusive lock_sys.latch while invoking
lock_cancel_waiting_and_release().

lock_cancel_waiting_and_release(): Acquire trx->mutex in this function,
instead of expecting the caller to do so.

lock_unlock_table_autoinc(): Only acquire shared lock_sys.latch.

lock_table_has_locks(): Do not acquire lock_sys.latch at all.

Deadlock::check_and_resolve(): Only acquire shared lock_sys.latchm
for invoking lock_sys_t::cancel(trx, wait_lock).

innobase_query_caching_table_check_low(),
row_drop_tables_for_mysql_in_background(): Do not acquire lock_sys.latch.
2021-03-02 14:26:33 +02:00
Marko Mäkelä
01b44c054d MDEV-25026 Various code paths are accessing freed pages
The test case encryption.innodb_encrypt_freed was failing in
MemorySanitizer builds.

recv_recover_page(): Mark non-recovered pages as freed.

fil_crypt_rotate_page(): Before comparing the block->frame contents,
check if the block was marked as freed.

Other places: Whenever using BUF_GET_POSSIBLY_FREED, check the
block->page.status before accessing the page frame.

(Both uses of BUF_GET_IF_IN_POOL should be correct now.)
2021-03-02 11:51:22 +02:00
Marko Mäkelä
7cf4419fc4 MDEV-24789: Reduce lock_sys.wait_mutex contention
A performance regression was introduced by
commit e71e613353 (MDEV-24671)
and mostly addressed by
commit 455514c800.

The regression is likely caused by increased contention
lock_sys.latch (former lock_sys.mutex), possibly indirectly
caused by contention on lock_sys.wait_mutex. This change aims to
reduce both, but further improvements will be needed.

lock_wait(): Minimize the lock_sys.wait_mutex hold time.

lock_sys_t::deadlock_check(): Add a parameter for indicating
whether lock_sys.latch is exclusively locked.

trx_t::was_chosen_as_deadlock_victim: Always use atomics.

lock_wait_wsrep(): Assume that no mutex is being held.

Deadlock::report(): Always kill the victim transaction.

lock_sys_t::timeout: New counter to back MONITOR_TIMEOUT.
2021-02-26 14:58:48 +02:00
Marko Mäkelä
21987e5919 MDEV-20612 fixup: Reduce hash table lookups
Let us calculate the hash table cell address while we are calculating
the latch address, to avoid repeated computations of the address.
The latch address can be derived from the cell address with a simple
bitmask operation.
2021-02-24 14:47:42 +02:00
Marko Mäkelä
43b239a081 MDEV-24915 Galera conflict resolution is unnecessarily complex
The fix of MDEV-23328 introduced a background thread for
killing conflicting transactions.
Thanks to the refactoring that was conducted in MDEV-24671,
the high-priority ("brute-force") applier thread can kill the
conflicting transactions itself, before waiting for the
locks to be finally released (after the conflicting transactions
have been rolled back).

This also allows us to remove the hack LockGGuard that had to
be added in MDEV-20612, and remove Galera-related function
parameters from lock creation.
2021-02-18 12:16:51 +02:00
Marko Mäkelä
18dc5b0192 MDEV-20612 fixup: Remove a redundant check
lock_wait_rpl_report(): Only reload trx->lock.wait_lock
if lock_sys.wait_mutex had to be released and reacquired.
2021-02-18 12:02:36 +02:00
Marko Mäkelä
94b4578704 Merge 10.5 into 10.6 2021-02-17 19:39:05 +02:00
Marko Mäkelä
9f13670004 MDEV-24738 fixup: heap-use-after-poison in lock_sys_t::deadlock_check()
Deadlock::report(): Require the caller to acquire lock_sys.latch
if invoking on a transaction that is now owned by the current thread.
2021-02-17 17:44:23 +02:00
Marko Mäkelä
c68007d958 MDEV-24738 Improve the InnoDB deadlock checker
A new configuration parameter innodb_deadlock_report is introduced:
* innodb_deadlock_report=off: Do not report any details of deadlocks.
* innodb_deadlock_report=basic: Report transactions and waiting locks.
* innodb_deadlock_report=full (default): Report also the blocking locks.

The improved deadlock checker will consider all involved transactions
in one loop, even if the deadlock loop includes several transactions.
The theoretical maximum number of transactions that can be involved in
a deadlock is `innodb_page_size` * 8, limited by the persistent data
structures.

Note: Similar to
mysql/mysql-server@3859219875
our deadlock checker will consider at most one blocking transaction
for each waiting transaction. The new field trx->lock.wait_trx be
nullptr if and only if trx->lock.wait_lock is nullptr. Note that
trx->lock.wait_lock->trx == trx (the waiting transaction), while
trx->lock.wait_trx points to one of the transactions whose lock is
conflicting with trx->lock.wait_lock.

Considering only one blocking transaction will greatly simplify
our deadlock checker, but it may also make the deadlock checker
blind to some deadlocks where the deadlock cycle is 'hidden' by
the fact that the registered trx->lock.wait_trx is not actually
waiting for any InnoDB lock, but something else. So, instead of
deadlocks, sometimes lock wait timeout may be reported.

To improve on this, whenever trx->lock.wait_trx is changed, we
will register further 'candidate' transactions in Deadlock::to_check(),
and check for 'revealed' deadlocks as soon as possible, in lock_release()
and innobase_kill_query().

The old DeadlockChecker was holding lock_sys.latch, even though using
lock_sys.wait_mutex should be less contended (and thus preferred)
in the likely case that no deadlock is present.

lock_wait(): Defer the deadlock check to this function, instead of
executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().

DeadlockChecker: Complete rewrite:
(1) Explicitly keep track of transactions that are being waited for,
in trx->lock.wait_trx, protected by lock_sys.wait_mutex. Previously,
we were painstakingly traversing the lock heaps while blocking
concurrent registration or removal of any locks (even uncontended ones).
(2) Use Brent's cycle-detection algorithm for deadlock detection,
traversing each trx->lock.wait_trx edge at most 2 times.
(3) If a deadlock is detected, release lock_sys.wait_mutex,
acquire LockMutexGuard, re-acquire lock_sys.wait_mutex and re-invoke
find_cycle() to find out whether the deadlock is still present.
(4) Display information on all transactions that are involved in the
deadlock, and choose a victim to be rolled back.

lock_sys.deadlocks: Replaces lock_deadlock_found. Protected by wait_mutex.

Deadlock::find_cycle(): Quickly find a cycle of trx->lock.wait_trx...
using Brent's cycle detection algorithm.

Deadlock::report(): Report a deadlock cycle that was found by
Deadlock::find_cycle(), and choose a victim with the least weight.
Altogether, we may traverse each trx->lock.wait_trx edge up to 5
times (2*find_cycle()+1 time for reporting and choosing the victim).

Deadlock::check_and_resolve(): Find and resolve a deadlock.

lock_wait_rpl_report(): Report the waits-for information to
replication. This used to be executed as part of DeadlockChecker.
Replication must know the waits-for relations even if no deadlocks
are present in InnoDB.

Reviewed by: Vladislav Vaintroub
2021-02-17 12:44:08 +02:00
Marko Mäkelä
584e52118c MDEV-20612 fixup: Make comments refer to lock_sys.latch 2021-02-17 12:18:03 +02:00
Marko Mäkelä
e5d83ad472 MDEV-20612 fixup: Fix a memory leak in buffer pool resize 2021-02-16 11:27:13 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Marko Mäkelä
26d6224dd6 MDEV-20612: Enable concurrent lock_release()
lock_release_try(): Try to release locks while only holding
shared lock_sys.latch.

lock_release(): If 5 attempts of lock_release_try() fail,
proceed to acquire exclusive lock_sys.latch.
2021-02-12 17:44:58 +02:00
Marko Mäkelä
b08448de64 MDEV-20612: Partition lock_sys.latch
We replace the old lock_sys.mutex (which was renamed to lock_sys.latch)
with a combination of a global lock_sys.latch and table or page hash lock
mutexes.

The global lock_sys.latch can be acquired in exclusive mode, or
it can be acquired in shared mode and another mutex will be acquired
to protect the locks for a particular page or a table.

This is inspired by
mysql/mysql-server@1d259b87a6
but the optimization of lock_release() will be done in the next commit.
Also, we will interleave mutexes with the hash table elements, similar
to how buf_pool.page_hash was optimized
in commit 5155a300fa (MDEV-22871).

dict_table_t::autoinc_trx: Use Atomic_relaxed.

dict_table_t::autoinc_mutex: Use srw_mutex in order to reduce the
memory footprint. On 64-bit Linux or OpenBSD, both this and the new
dict_table_t::lock_mutex should be 32 bits and be stored in the same
64-bit word. On Microsoft Windows, the underlying SRWLOCK is 32 or 64
bits, and on other systems, sizeof(pthread_mutex_t) can be much larger.

ib_lock_t::trx_locks, trx_lock_t::trx_locks: Document the new rules.
Writers must assert lock_sys.is_writer() || trx->mutex_is_owner().

LockGuard: A RAII wrapper for acquiring a page hash table lock.

LockGGuard: Like LockGuard, but when Galera Write-Set Replication
is enabled, we must acquire all shards, for updating arbitrary trx_locks.

LockMultiGuard: A RAII wrapper for acquiring two page hash table locks.

lock_rec_create_wsrep(), lock_table_create_wsrep(): Special
Galera conflict resolution in non-inlined functions in order
to keep the common code paths shorter.

lock_sys_t::prdt_page_free_from_discard(): Refactored from
lock_prdt_page_free_from_discard() and
lock_rec_free_all_from_discard_page().

trx_t::commit_tables(): Replaces trx_update_mod_tables_timestamp().

lock_release(): Let trx_t::commit_tables() invalidate the query cache
for those tables that were actually modified by the transaction.
Merge lock_check_dict_lock() to lock_release().

We must never release lock_sys.latch while holding any
lock_sys_t::hash_latch. Failure to do that could lead to
memory corruption if the buffer pool is resized between
the time lock_sys.latch is released and the hash_latch is released.
2021-02-12 17:44:32 +02:00
Marko Mäkelä
b01d8e1a33 MDEV-20612: Replace lock_sys.mutex with lock_sys.latch
For now, we will acquire the lock_sys.latch only in exclusive mode,
that is, use it as a mutex.

This is preparation for the next commit where we will introduce
a less intrusive alternative, combining a shared lock_sys.latch
with dict_table_t::lock_mutex or a mutex embedded in
lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash.
2021-02-11 14:52:10 +02:00
Marko Mäkelä
903464929c MDEV-20612 preparation: LockMutexGuard
Let us use the RAII wrapper LockMutexGuard for most operations where
lock_sys.mutex is acquired.
2021-02-11 14:36:11 +02:00
Marko Mäkelä
2e64513fba MDEV-20612 preparation: Fewer calls to buf_page_t::id() 2021-02-11 12:48:07 +02:00
Marko Mäkelä
74ab97f58f Cleanup: Remove lock_trx_lock_list_init(), lock_table_get_n_locks() 2021-02-07 11:18:21 +02:00
Marko Mäkelä
487fbc2e15 MDEV-21452 fixup: Introduce trx_t::mutex_is_owner()
When we replaced trx_t::mutex with srw_mutex
in commit 38fd7b7d91
we lost the SAFE_MUTEX instrumentation.
Let us introduce a replacement and restore the assertions.
2021-02-05 16:37:06 +02:00
Marko Mäkelä
455514c800 MDEV-24789: Try to reduce mutex contention 2021-02-05 16:16:44 +02:00
Marko Mäkelä
3e45f8e36a MDEV-24789: Reduce sizeof(trx_lock_t)
trx_lock_t::cond: Use pthread_cond_t directly, because no instrumentation
will ever be used. This saves sizeof(void*) and removes some duplicated
inline code.

trx_lock_t::was_chosen_as_wsrep_victim: Fold into
trx_lock_t::was_chosen_as_deadlock_victim.

trx_lock_t::cancel, trx_lock_t::rec_cached, trx_lock_t::table_cached:
Use only one byte of storage, reducing memory alignment waste.

On AMD64 GNU/Linux, MDEV-24671 caused a sizeof(trx_lock_t) increase
of 48 bytes (plus the PLUGIN_PERFSCHEMA overhead of trx_lock_t::cond).
These changes should save 32 bytes.
2021-02-05 13:15:56 +02:00
Marko Mäkelä
465bdabb7a Cleanup: Reduce some lock_sys.mutex contention
lock_table(): Remove the constant parameter flags=0.

lock_table_resurrect(): Merge lock_table_ix_resurrect() and
lock_table_x_resurrect().

lock_rec_lock(): Only acquire LockMutexGuard if lock_table_has()
does not hold.
2021-02-05 13:14:50 +02:00
Marko Mäkelä
de407e7cb4 MDEV-24731 fixup: bogus assertion
DeadlockChecker::search(): Move a bogus assertion into a condition.
If the current transaction is waiting for a table lock (on something
else than an auto-increment lock), it is well possible that other
transactions are holding not only a conflicting lock, but also an
auto-increment lock.

This mistake was noticed during the testing of MDEV-24731, but it was
accidentally introduced in commit 5f46385764.

lock_wait_end(): Remove an unused variable, and add an assertion.
2021-02-05 08:35:15 +02:00
Marko Mäkelä
5f46385764 MDEV-24731 Excessive mutex contention in DeadlockChecker::check_and_resolve()
The DeadlockChecker expects to be able to freeze the waits-for graph.
Hence, it is best executed somewhere where we are not holding any
additional mutexes.

lock_wait(): Defer the deadlock check to this function, instead
of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().

DeadlockChecker::trx_rollback(): Merge with the only caller,
check_and_resolve().

LockMutexGuard: RAII accessor for lock_sys.mutex.

lock_sys.deadlocks: Replaces lock_deadlock_found.

trx_t: Clean up some comments.
2021-02-04 16:38:07 +02:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Jan Lindström
75546dfbb1 MDEV-24704 : Galera test failure on galera.galera_nopk_unicode
Analysis:
=========

Reason for test failure was a mutex deadlock between DeadlockChecker with stack

Thread 6 (Thread 0xffff70066070 (LWP 24667)):
0  0x0000ffff784e850c in __lll_lock_wait (futex=futex@entry=0xffff04002258, private=0) at lowlevellock.c:46
1  0x0000ffff784e19f0 in __GI___pthread_mutex_lock (mutex=mutex@entry=0xffff04002258) at pthread_mutex_lock.c:135
2  0x0000aaaaac8cd014 in inline_mysql_mutex_lock (src_file=0xaaaaacea0f28 "/home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc", src_line=762, that=0xffff04002258) at /home/buildbot/buildbot/build/mariadb-10.2.37/include/mysql/psi/mysql_thread.h:675
3  wsrep_thd_is_BF (thd=0xffff040009a8, sync=sync@entry=1 '\001') at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:762
4  0x0000aaaaacadce68 in lock_rec_has_to_wait (for_locking=false, lock_is_on_supremum=<optimized out>, lock2=0xffff628952d0, type_mode=291, trx=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:826
5  lock_has_to_wait (lock1=<optimized out>, lock2=0xffff628952d0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:873
6  0x0000aaaaacadd0b0 in DeadlockChecker::search (this=this@entry=0xffff70061fe8) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7142
7  0x0000aaaaacae2dd8 in DeadlockChecker::check_and_resolve (lock=lock@entry=0xffff62894120, trx=trx@entry=0xffff62894070) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:7286
8  0x0000aaaaacae3070 in lock_rec_enqueue_waiting (c_lock=0xffff628952d0, type_mode=type_mode@entry=3, block=block@entry=0xffff62076c40, heap_no=heap_no@entry=2, index=index@entry=0xffff4c076f28, thr=thr@entry=0xffff4c078810, prdt=prdt@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:1796
9  0x0000aaaaacae3900 in lock_rec_lock_slow (thr=0xffff4c078810, index=0xffff4c076f28, heap_no=2, block=0xffff62076c40, mode=3, impl=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2106
10 lock_rec_lock (impl=false, mode=3, block=0xffff62076c40, heap_no=2, index=0xffff4c076f28, thr=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:2168
11 0x0000aaaaacae3ee8 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0xffff62076c40, rec=rec@entry=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", index=index@entry=0xffff4c076f28, offsets=0xffff4c080690, offsets@entry=0xffff70062a30, mode=LOCK_X, mode@entry=1653162096, gap_mode=0, gap_mode@entry=281470749427104, thr=thr@entry=0xffff4c078810) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/lock/lock0lock.cc:6082
12 0x0000aaaaacb684c4 in sel_set_rec_lock (pcur=0xaaaac841c270, pcur@entry=0xffff4c077d58, rec=0xffff6240407f "\303\221\342\200\232\303\220\302\265\303\220\302\272\303\221\302\201\303\221\342\200\232", rec@entry=0x28 <error: Cannot access memory at address 0x28>, index=index@entry=0xffff4c076f28, offsets=0xffff70062a30, mode=281472334905456, type=281470749427104, thr=0xffff4c078810, thr@entry=0x9f, mtr=0x0, mtr@entry=0xffff70063928) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:1270
13 0x0000aaaaacb6bb64 in row_search_mvcc (buf=buf@entry=0xffff4c080690 "\376\026", mode=mode@entry=PAGE_CUR_GE, prebuilt=0xffff4c077b98, match_mode=match_mode@entry=1, direction=direction@entry=0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/row/row0sel.cc:5181
14 0x0000aaaaacaae568 in ha_innobase::index_read (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key_ptr=<optimized out>, key_len=768, find_flag=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:9393
15 0x0000aaaaac9201cc in handler::ha_index_read_map (this=0xffff4c038a80, buf=0xffff4c080690 "\376\026", key=0xffff4c07ccf8 "", keypart_map=keypart_map@entry=18446744073709551615, find_flag=find_flag@entry=HA_READ_KEY_EXACT) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/handler.cc:2718
16 0x0000aaaaac9f36b0 in Rows_log_event::find_row (this=this@entry=0xffff4c030098, rgi=rgi@entry=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13461
17 0x0000aaaaac9f3e44 in Update_rows_log_event::do_exec_row (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:13936
18 0x0000aaaaac9e7ee8 in Rows_log_event::do_apply_event (this=0xffff4c030098, rgi=0xffff4c01b510) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.cc:11101
19 0x0000aaaaac8ca4e8 in Log_event::apply_event (rgi=0xffff4c01b510, this=0xffff4c030098) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/log_event.h:1454
20 wsrep_apply_events (buf_len=0, events_buf=0x1, thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:164
21 wsrep_apply_cb (ctx=0xffff4c0009a8, buf=0x1, buf_len=18446743528248705000, flags=<optimized out>, meta=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_applier.cc:267
22 0x0000ffff7322d29c in galera::TrxHandle::apply (this=this@entry=0xffff4c027960, recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=apply_cb@entry=0xaaaaac8c9fe8 <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, meta=...) at /home/buildbot/buildbot/build/galera/src/trx_handle.cpp:317
23 0x0000ffff73239664 in apply_trx_ws (recv_ctx=recv_ctx@entry=0xffff4c0009a8, apply_cb=0xaaaaac8c9fe8 <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, commit_cb=0xaaaaac8ca8d0 <wsrep_commit_cb(void*, uint32_t, wsrep_trx_meta_t const*, wsrep_bool_t*, bool)>, trx=..., meta=...) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:34
24 0x0000ffff7323c0c4 in galera::ReplicatorSMM::apply_trx (this=this@entry=0xaaaac7c7ebc0, recv_ctx=recv_ctx@entry=0xffff4c0009a8, trx=trx@entry=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:454
25 0x0000ffff7323e8b8 in galera::ReplicatorSMM::process_trx (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8, trx=0xffff4c027960) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:1258
26 0x0000ffff73268f68 in galera::GcsActionSource::dispatch (this=this@entry=0xaaaac7c7f348, recv_ctx=recv_ctx@entry=0xffff4c0009a8, act=..., exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:115
27 0x0000ffff73269dd0 in galera::GcsActionSource::process (this=0xaaaac7c7f348, recv_ctx=0xffff4c0009a8, exit_loop=@0xffff7006535f: false) at /home/buildbot/buildbot/build/galera/src/gcs_action_source.cpp:180
28 0x0000ffff7323ef5c in galera::ReplicatorSMM::async_recv (this=0xaaaac7c7ebc0, recv_ctx=0xffff4c0009a8) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:362
29 0x0000ffff73217760 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /home/buildbot/buildbot/build/galera/src/wsrep_provider.cpp:244
30 0x0000aaaaac8cb344 in wsrep_replication_process (thd=0xffff4c0009a8) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_thd.cc:486
31 0x0000aaaaac8bc3a0 in start_wsrep_THD (arg=arg@entry=0xaaaac7cb3e38) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/wsrep_mysqld.cc:2173
32 0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869
33 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void*)>) at pthread_create.c:335
34 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

and background victim transaction kill with stack

Thread 28 (Thread 0xffff485fa070 (LWP 24870)):
0  0x0000ffff784e530c in __pthread_cond_wait (cond=cond@entry=0xaaaac83e98e0, mutex=mutex@entry=0xaaaac83e98b0) at pthread_cond_wait.c:186
1  0x0000aaaaacb10788 in os_event::wait (this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:158
2  os_event::wait_low (reset_sig_count=2, this=0xaaaac83e98a0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:325
3  os_event_wait_low (event=0xaaaac83e98a0, reset_sig_count=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/os/os0event.cc:507
4  0x0000aaaaacb98480 in sync_array_wait_event (arr=arr@entry=0xaaaac7dbb450, cell=@0xffff485f96e8: 0xaaaac7dbb560) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/sync/sync0arr.cc:471
5  0x0000aaaaacab53c8 in TTASEventMutex<GenericPolicy>::enter (line=19524, filename=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", max_delay=<optimized out>, max_spins=0, this=0xaaaac83cc8c0) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:516
6  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0xaaaac83cc8c0, n_spins=<optimized out>, n_delay=<optimized out>, name=0xaaaaacf2ce40 "/home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc", line=19524) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/include/ib0mutex.h:637
7  0x0000aaaaacaaa52c in bg_wsrep_kill_trx (void_arg=0xffff4c057430) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/innobase/handler/ha_innodb.cc:19524
8  0x0000aaaaac79e7f0 in handle_manager (arg=arg@entry=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.37/sql/sql_manager.cc:112
9  0x0000aaaaaca89198 in pfs_spawn_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.37/storage/perfschema/pfs.cc:1869
10 0x0000ffff784defc4 in start_thread (arg=0xaaaaaca890d8 <pfs_spawn_thread(void*)>) at pthread_create.c:335
11 0x0000ffff7821c3f0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Fix:
====

Do not use THD::LOCK_thd_data mutex if we already hold lock_sys->mutex because it
will cause mutexing order violation. Victim transaction holding conflicting
locks can't be committed or rolled back while we hold lock_sys->mutex. Thus,
it is safe to do wsrep_thd_is_BF call with no additional mutexes.
2021-01-27 19:46:20 +02:00
Marko Mäkelä
68b2819342 Cleanup: Remove many C-style lock_get_ accessors
Let us prefer member functions to the old C-style accessor functions.
Also, prefer bitwise AND operations for checking multiple flags.
2021-01-27 18:41:58 +02:00
Marko Mäkelä
cbb0a60c57 Cleanup: Remove lock_get_size() 2021-01-27 18:02:11 +02:00
Marko Mäkelä
121d0f7f53 MDEV-20612: Speed up lock_table_other_has_incompatible()
dict_table_t::n_lock_x_or_s: Keep track of LOCK_S or LOCK_X on the table.

lock_table_other_has_incompatible(): In the likely case that no
transaction is waiting for or holding LOCK_S or LOCK_X on the table,
return early: conflicts cannot exist.

This is based on the idea of Zhai Weixiang, who reported MySQL Bug #72948.

lock_table_has_to_wait_in_queue(), lock_table_dequeue():
Extend the optimization, inspired by
mysql/mysql-server@bb7191d6cb
by Jakub Łopuszański.
2021-01-27 15:45:39 +02:00
Marko Mäkelä
3329f0ed0c Cleanup: Remove LOCK_REC (which was mutually exclusive with LOCK_TABLE) 2021-01-27 15:45:39 +02:00
Marko Mäkelä
e71e613353 MDEV-24671: Replace lock_wait_timeout_task with mysql_cond_timedwait()
lock_wait(): Replaces lock_wait_suspend_thread(). Wait for the lock to
be granted or the transaction to be killed using mysql_cond_timedwait()
or mysql_cond_wait().

lock_wait_end(): Replaces que_thr_end_lock_wait() and
lock_wait_release_thread_if_suspended().

lock_wait_timeout_task: Remove. The operating system kernel will
resume the mysql_cond_timedwait() in lock_wait(). An added benefit
is that innodb_lock_wait_timeout no longer has a 'jitter' of 1 second,
which was caused by this wake-up task waking up only once per second,
and then waking up any threads for which the timeout (which was only
measured in seconds) was exceeded.

innobase_kill_query(): Set trx->error_state=DB_INTERRUPTED,
so that a call trx_is_interrupted(trx) in lock_wait() can be avoided.

We will protect things more consistently with lock_sys.wait_mutex,
which will be moved below lock_sys.mutex in the latching order.

trx_lock_t::cond: Condition variable for !wait_lock, used with
lock_sys.wait_mutex.

srv_slot_t: Remove. Replaced by trx_lock_t::cond,

lock_grant_after_reset(): Merged to to lock_grant().

lock_rec_get_index_name(): Remove.

lock_sys_t: Introduce wait_pending, wait_count, wait_time, wait_time_max
that are protected by wait_mutex.

trx_lock_t::que_state: Remove.

que_thr_state_t: Remove QUE_THR_COMMAND_WAIT, QUE_THR_LOCK_WAIT.

que_thr_t: Remove is_active, start_running(), stop_no_error().

que_fork_t::n_active_thrs, trx_lock_t::n_active_thrs: Remove.
2021-01-27 15:45:39 +02:00
Marko Mäkelä
7f1ab8f742 Cleanups:
que_thr_t::fork_type: Remove.

QUE_THR_SUSPENDED, TRX_QUE_COMMITTING: Remove.

Cleanup lock_cancel_waiting_and_release()
2021-01-27 15:45:39 +02:00
Marko Mäkelä
898dcf93a8 Cleanup the lock creation
LOCK_MAX_N_STEPS_IN_DEADLOCK_CHECK, LOCK_MAX_DEPTH_IN_DEADLOCK_CHECK,
LOCK_RELEASE_INTERVAL: Replace with the bare use of the constants.

lock_rec_create_low(): Remove LOCK_PAGE_BITMAP_MARGIN altogether.
We already have REDZONE_SIZE as a 'safety margin' in AddressSanitizer
builds, to catch any out-of-bounds access.

lock_prdt_add_to_queue(): Avoid a useless search when enqueueing
a waiting lock request.

lock_prdt_lock(): Reduce the size of the trx->mutex critical section.
2021-01-27 15:45:38 +02:00
Marko Mäkelä
469da6c34d Cleanup: Remove trx_get_id_for_print()
Any transaction that has requested a lock must have trx->id!=0.

trx_print_low(): Distinguish non-locking or inactive transaction
objects by displaying the pointer in parentheses.

fill_trx_row(): Do not try to map trx->id to a pointer-based value.
2021-01-27 15:45:38 +02:00
Marko Mäkelä
3cef4f8f0f MDEV-515 Reduce InnoDB undo logging for insert into empty table
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.

For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().

Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.

Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.

This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).

INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.

In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.

Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.

dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.

ins_node_t::bulk_insert: Whether bulk insert was initiated.

trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.

trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().

trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.

row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.

lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.

btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().

trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.

ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.

row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

btr_root_page_init(): Refactored from btr_create().

dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.

ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.

This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
2021-01-25 18:41:27 +02:00
sjaakola
beaea31ab1 MDEV-23851 BF-BF Conflict issue because of UK GAP locks
Some DML operations on tables having unique secondary keys cause scanning
in the secondary index, for instance to find potential unique key violations
in the seconday index. This scanning may involve GAP locking in the index.
As this locking happens also when applying replication events in high priority
applier threads, there is a probabality for lock conflicts between two wsrep
high priority threads.

This PR avoids lock conflicts of high priority wsrep threads, which do
secondary index scanning e.g. for duplicate key detection.

The actual fix is the patch in sql_class.cc:thd_need_ordering_with(), where
we allow relaxed GAP locking protocol between wsrep high priority threads.
wsrep high priority threads (replication appliers, replayers and TOI processors)
are ordered by the replication provider, and they will not need serializability
support gained by secondary index GAP locks.

PR contains also a mtr test, which exercises a scenario where two replication
applier threads have a false positive conflict in GAP of unique secondary index.
The conflicting local committing transaction has to replay, and the test verifies
also that the replaying phase will not conflict with the latter repllication applier.
Commit also contains new test scenario for galera.galera_UK_conflict.test,
where replayer starts applying after a slave applier thread, with later seqno,
has advanced to commit phase. The applier and replayer have false positive GAP
lock conflict on secondary unique index, and replayer should ignore this.
This test scenario caused crash with earlier version in this PR, and to fix this,
the secondary index uniquenes checking has been relaxed even further.

Now innodb trx_t structure has new member: bool wsrep_UK_scan, which is set to
true, when high priority thread is performing unique secondary index scanning.
The member trx_t::wsrep_UK_scan is defined inside WITH_WSREP directive, to make
it possible to prepare a MariaDB build where this additional trx_t member is
not present and is not used in the code base. trx->wsrep_UK_scan is set to true
only for the duration of function call for: lock_rec_lock() trx->wsrep_UK_scan
is used only in lock_rec_has_to_wait() function to relax the need to wait if
wsrep_UK_scan is set and conflicting transaction is also high priority.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-01-18 08:09:06 +02:00
Marko Mäkelä
666565c7f0 Merge 10.5 into 10.6 2021-01-11 17:32:08 +02:00
Marko Mäkelä
8de233af81 Merge 10.4 into 10.5 2021-01-11 16:29:51 +02:00
Marko Mäkelä
fd5e103aa4 Merge 10.3 into 10.4 2021-01-11 10:35:06 +02:00
Marko Mäkelä
5a1a714187 Merge 10.2 into 10.3 (except MDEV-17556)
The fix of MDEV-17556 (commit e25623e78a
and commit 61a362c949) has been
omitted due to conflicts and will have to be applied separately later.
2021-01-11 09:41:54 +02:00
Jan Lindström
775fccea0c MDEV-23536 : Race condition between KILL and transaction commit
A race condition may occur between the execution of transaction commit,
and an execution of a KILL statement that would attempt to abort that
transaction.

MDEV-17092 worked around this race condition by modifying InnoDB code.
After that issue was closed, Sergey Vojtovich pointed out that this
race condition would better be fixed above the storage engine layer:

If you look carefully into the above, you can conclude that
thd->free_connection() can be called concurrently with
KILL/thd->awake(). Which is the bug. And it is partially fixed in
THD::~THD(), that is destructor waits for KILL completion:

Fix: Add necessary mutex operations to THD::free_connection()
and move WSREP specific code also there. This ensures that no
one is using THD while we do free_connection(). These mutexes
will also ensures that there can't be concurrent KILL/THD::awake().

innobase_kill_query
  We can now remove usage of trx_sys_mutex introduced on MDEV-17092.

trx_t::free()
  Poison trx->state and trx->mysql_thd

This patch is validated with an RQG run similar to the one that
reproduced MDEV-17092.
2021-01-08 17:11:54 +02:00
Marko Mäkelä
8a4ca33938 Cleanup: Declare trx_weight_ge() inline 2021-01-05 14:18:10 +02:00
Marko Mäkelä
bd52f1a2dd Cleanup: Remove lock_number_of_rows_locked()
Let us access trx->lock.n_rec_locks directly.
2021-01-04 15:30:34 +02:00
Marko Mäkelä
a64cb6d265 Merge 10.3 into 10.4 2020-12-28 13:46:22 +02:00
Marko Mäkelä
7f037b8c9f Merge 10.2 into 10.3 2020-12-28 13:30:20 +02:00
sjaakola
8e3e87d2fc MDEV-23851 MDEV-24229 BF-BF conflict issues
Issues MDEV-23851 and MDEV-24229 are probably duplicates and are caused by the new self-asserting function lock0lock.cc:wsrep_assert_no_bf_bf_wait().
The criteria for asserting is too strict and does not take in consideration scenarios of "false positive" lock conflicts, which are resolved by replaying the local transaction.
As a fix, this PR is relaxing the assert criteria by two conditions, which skip assert if high priority transactions are locking in correct order or if conflicting high priority lock holder is aborting and has just not yet released the lock.

Alternative fix would be to remove wsrep_assert_no_bf_bf_wait() altogether, or remove the assert in this function and let it only print warnings in error log.
But in my high conflict rate multi-master test scenario, this relaxed asserting appears to be safe.

This PR also removes two wsrep_report_bf_lock_wait() calls in innodb lock manager, which cause mutex access assert in debug builds.

Foreign key appending missed handling of data types of float and double in INSERT execution. This is not directly related to the actual issue here but is fixed in this PR nevertheless. Missing these foreign keys values in certification could cause problems in some multi-master load scenarios.

Finally, some problem reports suggest that some of the issues reported in MDEV-23851 might relate to false positive lock conflicts over unique secondary index gaps. There is separate work for relaxing UK index gap locking of replication appliers, and separate PR will be submitted for it, with a related mtr test as well.
2020-12-28 09:06:16 +02:00
Marko Mäkelä
cf2480dd77 MDEV-21452: Retain the watchdog only on dict_sys.mutex, for performance
Most hangs seem to involve dict_sys.mutex. While holding lock_sys.mutex
we rarely acquire any buffer pool page latches, which are a frequent
source of potential hangs.
2020-12-15 17:56:18 +02:00
Marko Mäkelä
ff5d306e29 MDEV-21452: Replace ib_mutex_t with mysql_mutex_t
SHOW ENGINE INNODB MUTEX functionality is completely removed,
as are the InnoDB latching order checks.

We will enforce innodb_fatal_semaphore_wait_threshold
only for dict_sys.mutex and lock_sys.mutex.

dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex.

lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex.

FIXME: srv_sys should be removed altogether; it is duplicating tpool
functionality.

fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must
not hold fil_system.mutex.

fil_close_all_files(): To prevent SAFE_MUTEX warnings for
fil_space_destroy_crypt_data(), we must not hold fil_system.mutex
while invoking fil_space_free_low() on a detached tablespace.
2020-12-15 17:56:18 +02:00
Marko Mäkelä
38fd7b7d91 MDEV-21452: Replace all direct use of os_event_t
Let us replace os_event_t with mysql_cond_t, and replace the
necessary ib_mutex_t with mysql_mutex_t so that they can be
used with condition variables.

Also, let us replace polling (os_thread_sleep() or timed waits)
with plain mysql_cond_wait() wherever possible.

Furthermore, we will use the lightweight srw_mutex for trx_t::mutex,
to hopefully reduce contention on lock_sys.mutex.

FIXME: Add test coverage of
mariabackup --backup --kill-long-queries-timeout
2020-12-15 17:56:17 +02:00
Marko Mäkelä
9702be2c73 MDEV-24142: Remove __FILE__,__LINE__ related to buf_block_t::lock 2020-12-03 15:28:53 +02:00
Marko Mäkelä
ac028ec5d8 MDEV-24142: Remove the LatchDebug interface to rw-locks
The latching order checks for rw-locks have not caught many bugs
in the past few years and they are greatly complicating the code.

Last time the debug checks were useful was in
commit 59caf2c3c1 (MDEV-13485).

The B-tree hang MDEV-14637 was not caught by LatchDebug,
because the granularity of the checks is not sufficient
to distinguish the levels of non-leaf B-tree pages.

The interface was already made dead code by the grandparent
commit 03ca6495df.
2020-12-03 15:27:50 +02:00
Marko Mäkelä
4e359eb88f Cleanup: Reduce trx_t::mutex hold time 2020-11-24 15:44:55 +02:00
Marko Mäkelä
814bc21305 Cleanup: Use Atomic_relaxed for trx_t::state
For reading trx_t::state we can avoid acquiring trx_t::mutex.
Atomic load and store should be similar to normal load and store
on most instruction set architectures. The atomicity of the operation
would merely prohibit the compiler from reordering some operations.
2020-11-24 15:44:55 +02:00
Marko Mäkelä
edbde4a11f MDEV-24167: Replace fil_space::latch
We must avoid acquiring a latch while we are already holding one.
The tablespace latch was being acquired recursively in some
operations that allocate or free pages.
2020-11-24 15:43:12 +02:00
Marko Mäkelä
254bb1c35b Merge 10.5 into 10.6 2020-11-12 15:54:08 +02:00
Dmitry Shulga
7b20aa576b MDEV-24116: Fix clang 12 -Wrange-loop-analysis 2020-11-04 16:54:25 +02:00
Marko Mäkelä
09a1f0075a Merge 10.5 into 10.6 2020-11-02 12:49:19 +02:00
Marko Mäkelä
a8de8f261d Merge 10.2 into 10.3 2020-10-28 10:01:50 +02:00
Marko Mäkelä
118e258aaa MDEV-23855: Shrink fil_space_t
Merge n_pending_ios, n_pending_ops to std::atomic<uint32_t> n_pending.
Change some more fil_space_t members to uint32_t to reduce
the memory footprint.

fil_space_t::add(), fil_ibd_create(): Attach the already opened
handle to the tablespace, and enforce the fil_system.n_open limit.

dict_boot(): Initialize fil_system.max_assigned_id.

srv_boot(): Call srv_thread_pool_init() before anything else,
so that files should be opened in the correct mode on Windows.

fil_ibd_create(): Create the file in OS_FILE_AIO mode, just like
fil_node_open_file_low() does it.

dict_table_t::is_accessible(): Replaces fil_table_accessible().

Reviewed by: Vladislav Vaintroub
2020-10-26 17:53:54 +02:00
Marko Mäkelä
1657b7a583 Merge 10.4 to 10.5 2020-10-22 17:08:49 +03:00
Marko Mäkelä
46957a6a77 Merge 10.3 into 10.4 2020-10-22 13:27:18 +03:00
Marko Mäkelä
e3d692aa09 Merge 10.2 into 10.3 2020-10-22 08:26:28 +03:00
Marko Mäkelä
1595189250 MDEV-23897 SIGSEGV on commit with innodb_lock_schedule_algorithm=VATS
This regression for debug builds was introduced by
MDEV-23101 (commit 224c950462).

Due to MDEV-16664, the parameter
innodb_lock_schedule_algorithm=VATS
is not enabled by default.

The purpose of the added assertions was to enforce the invariant that
Galera replication cannot be enabled together with VATS due to MDEV-12837.
However, upon closer inspection, it is obvious that the variable 'lock'
may be assigned to the null pointer if no match is found in the
previous->hash list.

lock_grant_and_move_on_page(), lock_grant_and_move_on_rec():
Assert !lock->trx->is_wsrep() only after ensuring that lock
is not a null pointer.
2020-10-06 22:35:43 +03:00
Marko Mäkelä
b4fb15ccd4 MDEV-16664: Remove innodb_lock_schedule_algorithm
The setting innodb_lock_schedule_algorithm=VATS that was introduced
in MDEV-11039 (commit 021212b525)
causes conflicting exclusive locks to be incorrectly granted to
two transactions. Specifically, in lock_rec_insert_by_trx_age()
the predicate !lock_rec_has_to_wait_in_queue(in_lock) would hold even
though an active transaction is already holding an exclusive lock.
This was observed between two DELETE of the same clustered index record.
The HASH_DELETE invocation in lock_rec_enqueue_waiting() may be related.

Due to lack of progress in diagnosing the problem, we will remove the
option. The unsafe option was enabled by default between
commit 0c15d1a6ff (MariaDB 10.2.3)
and the parent of
commit 1cc1d0429d (MariaDB 10.2.17, 10.3.9),
and it was deprecated in
commit 295e2d500b (MariaDB 10.2.34).
2020-10-05 10:30:26 +03:00
Marko Mäkelä
a9550c47e4 MDEV-16264 fixup: Remove unused code and data
LATCH_ID_OS_AIO_READ_MUTEX,
LATCH_ID_OS_AIO_WRITE_MUTEX,
LATCH_ID_OS_AIO_LOG_MUTEX,
LATCH_ID_OS_AIO_IBUF_MUTEX,
LATCH_ID_OS_AIO_SYNC_MUTEX: Remove. The tpool is not instrumented.

lock_set_timeout_event(): Remove.

srv_sys_mutex_key, srv_sys_t::mutex, SYNC_THREADS: Remove.

srv_slot_t::suspended: Remove. We only ever assigned this data member
true, so it is redundant.

ib_wqueue_wait(), ib_wqueue_timedwait(): Remove.

os_thread_join(): Remove.

os_thread_create(), os_thread_exit(): Remove redundant parameters.

These were missed in commit 5e62b6a5e0.
2020-09-30 14:28:11 +03:00
Marko Mäkelä
882ce206db Merge 10.4 into 10.5 2020-09-23 11:32:43 +03:00
Marko Mäkelä
3a423088ac Merge 10.3 into 10.4 2020-09-21 12:29:00 +03:00
Marko Mäkelä
cbcb4ecabb Merge 10.2 into 10.3 2020-09-21 11:04:04 +03:00
Marko Mäkelä
00cd53d39a MDEV-23719: Make lock_sys use page_id_t
Since commit 8ccb3caafb it should be
more efficient to use page_id_t rather than two separate variables
for tablespace identifier and page number.

lock_rec_fold(): Replaced with page_id_t::fold().

lock_rec_hash(): Replaced with lock_sys.hash(page_id).

lock_rec_expl_exist_on_page(), lock_rec_get_first_on_page_addr(),
lock_rec_get_first_on_page(): Replaced with lock_sys.get_first().
2020-09-17 14:08:41 +03:00
Marko Mäkelä
852771ba1e MDEV-23719: Remove buf_block_t::lock_hash_val
The InnoDB buffer block page descriptor is caching a value
buf_block_t::lock_hash_val that should be quick to compute
in the first place, as suggested by
commit 14be814380.

lock_rec_fold(): Define as page_id_t::fold() instead of
ut_fold_ulint_pair().
2020-09-17 14:08:34 +03:00
Jan Lindström
224c950462 MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled
Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue
and move condition to correct callers. Add a function to report
BF lock waits and assert if incorrect BF-BF lock wait happens.

wsrep_report_bf_lock_wait
	Add a new function to report BF lock wait.

wsrep_assert_no_bf_bf_wait
	Add a new function to check do we have a
	BF-BF wait and if we have report this case
	and assert as it is a bug.

lock_rec_has_to_wait
	Use new wsrep_assert_bf_wait to check BF-BF wait.

lock_rec_create_low
lock_table_create
	Use new function to report BF lock waits.

lock_rec_insert_by_trx_age
lock_grant_and_move_on_page
lock_grant_and_move_on_rec
	Assert that trx is not Galera as VATS is not compatible
	with Galera.

lock_rec_add_to_queue
	If there is conflicting lock in a queue make sure that
	transaction is BF.

lock_rec_has_to_wait_in_queue
	Remove incorrect BF handling. If there is conflicting
	locks in a queue all transactions must wait.

lock_rec_dequeue_from_page
lock_rec_unlock
	If there is conflicting lock make sure it is not
	BF-BF case.

lock_rec_queue_validate
	Add Galera record locking rules comment and use
	new function to report BF lock waits.

All attempts to reproduce the original assertion have been
failed. Therefore, there is no test case on this commit.
2020-09-10 13:18:12 +03:00
Marko Mäkelä
1c58748196 Merge 10.4 into 10.5 2020-08-10 21:38:55 +03:00
Marko Mäkelä
eae968f62d Merge 10.3 into 10.4 2020-08-10 21:08:46 +03:00
Marko Mäkelä
bafc5c1321 Merge 10.2 into 10.3 2020-08-10 18:40:57 +03:00
Oleksandr Byelkin
48b5777ebd Merge branch '10.4' into 10.5 2020-08-04 17:24:15 +02:00
Marko Mäkelä
91caf130b7 MDEV-23101 fixup: Remove redundant code
lock_rec_has_to_wait_in_queue(): Remove an obviously redundant assertion
that was added in commit a8ec45863b
and also enclose a Galera-specific condition in #ifdef WITH_WSREP.
2020-08-04 09:56:09 +03:00
Marko Mäkelä
bbd70fcc43 MDEV-23379 Deprecate&ignore InnoDB concurrency throttling parameters
The parameters innodb_thread_concurrency and innodb_commit_concurrency
were useful years ago when both computing resources and the implementation
of some shared data structures were limited. MySQL 5.0 or 5.1 had trouble
scaling beyond 8 concurrent connections. Most of the scalability bottlenecks
have been removed since then, and the transactions per second delivered
by MariaDB Server 10.5 should not dramatically drop upon exceeding the
'optimal' number of connections.

Hence, enabling any concurrency throttling for InnoDB actually makes
things worse. We have seen many customers mistakenly setting this to a
small value like 16 or 64 and then complaining the server was slow.

Ignoring the parameters allows us to remove some normally unused code
and data structures, which could slightly improve performance.

innodb_thread_concurrency, innodb_commit_concurrency,
innodb_replication_delay, innodb_concurrency_tickets,
innodb_thread_sleep_delay, innodb_adaptive_max_sleep_delay:
Deprecate and ignore; hard-wire to 0.

The column INFORMATION_SCHEMA.INNODB_TRX.trx_concurrency_tickets
will always report 0.
2020-08-04 06:59:29 +03:00
Jan Lindström
a8ec45863b MDEV-23101: SIGSEGV in lock_rec_unlock() when Galera is enabled
lock_rec_has_to_wait
wsrep_kill_victim
lock_rec_create_low
lock_rec_add_to_queue
DeadlockChecker::select_victim()

	THD can't change from normal transaction to BF (brute force) transaction
	here, thus there is no need to syncronize access in wsrep_thd_is_BF
	function.

lock_rec_has_to_wait_in_queue

	Add condition that lock is not NULL and add assertions if we are in
	strong state.
2020-08-03 15:15:40 +03:00
Eugene Kosov
e9c389c334 MDEV-22701 InnoDB: encapsulate trx_sys.mutex and trx_sys.trx_list into a separate class
thread_safe_trx_ilist_t: almost generic one

UT_LIST was replaced with ilist<t>

innobase_kill_query: wrong comment removed.
2020-06-23 19:11:57 +03:00
Marko Mäkelä
cfd3d70ccb MDEV-22871: Remove pointer indirection for InnoDB hash_table_t
hash_get_n_cells(): Remove. Access n_cells directly.

hash_get_nth_cell(): Remove. Access array directly.

hash_table_clear(): Replaced with hash_table_t::clear().

hash_table_create(), hash_table_free(): Remove.

hash0hash.cc: Remove.
2020-06-18 14:16:01 +03:00
Marko Mäkelä
c7a2fb1e08 Merge 10.3 into 10.4 2020-06-06 22:05:32 +03:00
Marko Mäkelä
e14ffd85d0 MDEV-22721 fixup for 32-bit GCC
lock_check_trx_id_sanity(): Because the argument of UNIV_LIKELY
or __builtin_expect() can be less than sizeof(trx_id_t) on 32-bit
systems, it cannot reliably perform an implicit comparison to 0.
2020-06-06 18:18:40 +03:00
Marko Mäkelä
a08a8bc191 MDEV-22721: Fix GCC 5.3.1 -Wconversion 2020-06-06 17:29:41 +03:00
Marko Mäkelä
6877ef9a7c Merge 10.4 into 10.5 2020-06-05 20:36:43 +03:00
Marko Mäkelä
68d9d512e9 Merge 10.3 into 10.4 2020-06-05 18:05:22 +03:00
Marko Mäkelä
680463a8d9 Merge 10.2 into 10.3 2020-06-05 16:51:26 +03:00
Marko Mäkelä
b1ab211dee MDEV-15053 Reduce buf_pool_t::mutex contention
User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE
and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0
and will no longer report the PAGE_STATE value READY_FOR_USE.

We will remove some fields from buf_page_t and move much code to
member functions of buf_pool_t and buf_page_t, so that the access
rules of data members can be enforced consistently.

Evicting or adding pages in buf_pool.LRU will remain covered by
buf_pool.mutex.

Evicting or adding pages in buf_pool.page_hash will remain
covered by both buf_pool.mutex and the buf_pool.page_hash X-latch.

After this fix, buf_pool.page_hash lookups can entirely
avoid acquiring buf_pool.mutex, only relying on
buf_pool.hash_lock_get() S-latch.

Similarly, buf_flush_check_neighbors() can will rely solely on
buf_pool.mutex, no buf_pool.page_hash latch at all.

The buf_pool.mutex is rather contended in I/O heavy benchmarks,
especially when the workload does not fit in the buffer pool.

The first attempt to alleviate the contention was the
buf_pool_t::mutex split in
commit 4ed7082eef
which introduced buf_block_t::mutex, which we are now removing.

Later, multiple instances of buf_pool_t were introduced
in commit c18084f71b
and recently removed by us in
commit 1a6f708ec5 (MDEV-15058).

UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool
related debugging in otherwise non-debug builds has not been used
for years. Instead, we have been using UNIV_DEBUG, which is enabled
in CMAKE_BUILD_TYPE=Debug.

buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on
std::atomic and the buf_pool.page_hash latches, and in some cases
depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before.
We must always release buf_block_t::lock before invoking
unfix() or io_unfix(), to prevent a glitch where a block that was
added to the buf_pool.free list would apper X-latched. See
commit c5883debd6 how this glitch
was finally caught in a debug environment.

We move some buf_pool_t::page_hash specific code from the
ha and hash modules to buf_pool, for improved readability.

buf_pool_t::close(): Assert that all blocks are clean, except
on aborted startup or crash-like shutdown.

buf_pool_t::validate(): No longer attempt to validate
n_flush[] against the number of BUF_IO_WRITE fixed blocks,
because buf_page_t::flush_type no longer exists.

buf_pool_t::watch_set(): Replaces buf_pool_watch_set().
Reduce mutex contention by separating the buf_pool.watch[]
allocation and the insert into buf_pool.page_hash.

buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a
buf_pool.page_hash latch.
Replaces and extends buf_page_hash_lock_s_confirm()
and buf_page_hash_lock_x_confirm().

buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES.

buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads:
Use Atomic_counter.

buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out().

buf_pool_t::LRU_remove(): Remove a block from the LRU list
and return its predecessor. Incorporates buf_LRU_adjust_hp(),
which was removed.

buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(),
for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by
BTR_DELETE_OP (purge), which is never invoked on temporary tables.

buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments.

buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition.

buf_LRU_free_page(): Clarify the function comment.

buf_flush_check_neighbor(), buf_flush_check_neighbors():
Rewrite the construction of the page hash range. We will hold
the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64)
consecutive lookups of buf_pool.page_hash.

buf_flush_page_and_try_neighbors(): Remove.
Merge to its only callers, and remove redundant operations in
buf_flush_LRU_list_batch().

buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite.
Do not acquire buf_pool.mutex, and iterate directly with page_id_t.

ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined
and avoids any loops.

fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove.

buf_flush_page(): Add a fil_space_t* parameter. Minimize the
buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated
atomically with the io_fix, and we will protect most buf_block_t
fields with buf_block_t::lock. The function
buf_flush_write_block_low() is removed and merged here.

buf_page_init_for_read(): Use static linkage. Initialize the newly
allocated block and acquire the exclusive buf_block_t::lock while not
holding any mutex.

IORequest::IORequest(): Remove the body. We only need to invoke
set_punch_hole() in buf_flush_page() and nowhere else.

buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type.
This field is only used during a fil_io() call.
That function already takes IORequest as a parameter, so we had
better introduce  for the rarely changing field.

buf_block_t::init(): Replaces buf_page_init().

buf_page_t::init(): Replaces buf_page_init_low().

buf_block_t::initialise(): Initialise many fields, but
keep the buf_page_t::state(). Both buf_pool_t::validate() and
buf_page_optimistic_get() requires that buf_page_t::in_file()
be protected atomically with buf_page_t::in_page_hash
and buf_page_t::in_LRU_list.

buf_page_optimistic_get(): Now that buf_block_t::mutex
no longer exists, we must check buf_page_t::io_fix()
after acquiring the buf_pool.page_hash lock, to detect
whether buf_page_init_for_read() has been initiated.
We will also check the io_fix() before acquiring hash_lock
in order to avoid unnecessary computation.
The field buf_block_t::modify_clock (protected by buf_block_t::lock)
allows buf_page_optimistic_get() to validate the block.

buf_page_t::real_size: Remove. It was only used while flushing
pages of page_compressed tables.

buf_page_encrypt(): Add an output parameter that allows us ot eliminate
buf_page_t::real_size. Replace a condition with debug assertion.

buf_page_should_punch_hole(): Remove.

buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch().
Add the parameter size (to replace buf_page_t::real_size).

buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page().
Add the parameter size (to replace buf_page_t::real_size).

fil_system_t::detach(): Replaces fil_space_detach().
Ensure that fil_validate() will not be violated even if
fil_system.mutex is released and reacquired.

fil_node_t::complete_io(): Renamed from fil_node_complete_io().

fil_node_t::close_to_free(): Replaces fil_node_close_to_free().
Avoid invoking fil_node_t::close() because fil_system.n_open
has already been decremented in fil_space_t::detach().

BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY.

BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE,
and distinguish dirty pages by buf_page_t::oldest_modification().

BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead.
This state was only being used for buf_page_t that are in
buf_pool.watch.

buf_pool_t::watch[]: Remove pointer indirection.

buf_page_t::in_flush_list: Remove. It was set if and only if
buf_page_t::oldest_modification() is nonzero.

buf_page_decrypt_after_read(), buf_corrupt_page_release(),
buf_page_check_corrupt(): Change the const fil_space_t* parameter
to const fil_node_t& so that we can report the correct file name.

buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function.

buf_page_io_complete(): Split to buf_page_read_complete() and
buf_page_write_complete().

buf_dblwr_t::in_use: Remove.

buf_dblwr_t::buf_block_array: Add IORequest::flush_t.

buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of
os_aio_wait_until_no_pending_writes().

buf_flush_write_complete(): Declare static, not global.
Add the parameter IORequest::flush_t.

buf_flush_freed_page(): Simplify the code.

recv_sys_t::flush_lru: Renamed from flush_type and changed to bool.

fil_read(), fil_write(): Replaced with direct use of fil_io().

fil_buffering_disabled(): Remove. Check srv_file_flush_method directly.

fil_mutex_enter_and_prepare_for_io(): Return the resolved
fil_space_t* to avoid a duplicated lookup in the caller.

fil_report_invalid_page_access(): Clean up the parameters.

fil_io(): Return fil_io_t, which comprises fil_node_t and error code.
Always invoke fil_space_t::acquire_for_io() and let either the
sync=true caller or fil_aio_callback() invoke
fil_space_t::release_for_io().

fil_aio_callback(): Rewrite to replace buf_page_io_complete().

fil_check_pending_operations(): Remove a parameter, and remove some
redundant lookups.

fil_node_close_to_free(): Wait for n_pending==0. Because we no longer
do an extra lookup of the tablespace between fil_io() and the
completion of the operation, we must give fil_node_t::complete_io() a
chance to decrement the counter.

fil_close_tablespace(): Remove unused parameter trx, and document
that this is only invoked during the error handling of IMPORT TABLESPACE.

row_import_discard_changes(): Merged with the only caller,
row_import_cleanup(). Do not lock up the data dictionary while
invoking fil_close_tablespace().

logs_empty_and_mark_files_at_shutdown(): Do not invoke
fil_close_all_files(), to avoid a !needs_flush assertion failure
on fil_node_t::close().

innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files().

fil_close_all_files(): Invoke fil_flush_file_spaces()
to ensure proper durability.

thread_pool::unbind(): Fix a crash that would occur on Windows
after srv_thread_pool->disable_aio() and os_file_close().
This fix was submitted by Vladislav Vaintroub.

Thanks to Matthias Leich and Axel Schwenke for extensive testing,
Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.
2020-06-05 12:35:46 +03:00
Marko Mäkelä
eba2d10ac5 MDEV-22721 Remove bloat caused by InnoDB logger class
Introduce a new ATTRIBUTE_NOINLINE to
ib::logger member functions, and add UNIV_UNLIKELY hints to callers.

Also, remove some crash reporting output. If needed, the
information will be available using debugging tools.

Furthermore, remove some fts_enable_diag_print output that included
indexed words in raw form. The code seemed to assume that words are
NUL-terminated byte strings. It is not clear whether a NUL terminator
is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
typically contain many NUL bytes.
2020-06-04 10:24:10 +03:00
Sergey Vojtovich
ccdfcedf10 MDEV-22693 - InnoDB: get rid of casts for rw_trx_hash iterator
Less error prone, stricter type control.
2020-05-30 10:24:11 +04:00
Sergey Vojtovich
50b0ce44a2 MDEV-22593 - InnoDB: don't take trx_sys.mutex in ReadView::open()
This was the last abuse of trx_sys.mutex, which is now exclusively
protecting trx_sys.trx_list.

This global acquisition was also potential scalability bottleneck for
oltp_read_write benchmark. Although it didn't expose itself as such due
to larger scalability issues.

Replaced trx_sys.mutex based synchronisation between ReadView creator
thread and purge coordinator thread performing latest view clone with
ReadView::m_mutex.

It also allowed to simplify tri-state view m_state down to boolean
m_open flag.

For performance reasons trx->read_view.close() is left as atomic relaxed
store, so that we don't have to waste resources for otherwise meaningless
mutex acquisition.
2020-05-26 17:11:20 +04:00
Marko Mäkelä
5ece2155cb Merge 10.4 into 10.5 2020-05-20 17:46:05 +03:00
Marko Mäkelä
2bf93a8fd6 Merge 10.3 into 10.4 2020-05-19 21:18:15 +03:00
Marko Mäkelä
79ed33c184 Merge 10.2 into 10.3 2020-05-19 17:05:05 +03:00
Marko Mäkelä
c93f8aca65 MDEV-22618 Assertion !dict_index_is_online_ddl ... in lock_table_locks_lookup
lock_table_locks_lookup(): Relax the assertion.
Locks must not exist while online secondary index creation is
in progress. However, if CREATE UNIQUE INDEX has not been committed
yet, but the index creation has been completed, concurrent DML
transactions may acquire record locks on the index. Furthermore,
such concurrent DML may cause duplicate key violation, causing
the DDL operation to be rolled back. After that, the online_status
may be ONLINE_INDEX_ABORTED or ONLINE_INDEX_ABORTED_DROPPED.

So, the debug assertion may only forbid the state ONLINE_INDEX_CREATION.
2020-05-19 10:42:56 +03:00
Aleksey Midenkov
f2a944516e Merge 10.4 into 10.5 2020-05-15 17:13:35 +03:00
Jan Lindström
523d67a272 MDEV-22494 : Galera assertion lock_sys.mutex.is_owned() at lock_trx_handle_wait_low
Problem was that trx->lock.was_chosen_as_wsrep_victim variable was
not set back to false after it was set true.

wsrep_thd_bf_abort
	Add assertions for correct mutex status and take necessary
	mutexes before calling thd->awake_no_mutex().

innobase_rollback_trx()
	Reset trx->lock.was_chosen_as_wsrep_victim

wsrep_abort_slave_trx()
	Removed unused function.

wsrep_innobase_kill_one_trx()
	Added function comment, removed unnecessary parameters
	and added debug assertions to enforce correct usage. Added
	more debug output to help out on error analysis.

wsrep_abort_transaction()
	Added debug assertions and removed unused variables.

trx0trx.h
	Removed assert_trx_is_free macro and replaced it with
	assert_freed() member function.

trx_create()
	Use above assert_free() and initialize wsrep variables.

trx_free()
	Use assert_free()

trx_t::commit_in_memory()
	Reset lock.was_chosen_as_wsrep_victim

trx_rollback_for_mysql()
	Reset trx->lock.was_chosen_as_wsrep_victim

Add test case galera_bf_kill
2020-05-15 09:04:02 +03:00
Marko Mäkelä
7bcaa541aa Merge 10.4 into 10.5 2020-05-05 21:16:22 +03:00
Marko Mäkelä
2c3c851d2c Merge 10.3 into 10.4 2020-05-05 20:33:10 +03:00
Oleksandr Byelkin
7fb73ed143 Merge branch '10.2' into 10.3 2020-05-04 16:47:11 +02:00
Marko Mäkelä
f544a712c8 Cleanup: Reduce que_thr_t, que_fork_t, trx_lock_t size
que_thr_t::magic_n: Remove. Access to freed data is best caught
by AddressSanitizer.

que_thr_t::start_running(): Replaces que_thr_move_to_run_state_for_mysql()
and que_thr_move_to_run_state(), which were identical non-inline functions.

que_thr_t::stop_no_error(): Replaces que_thr_stop_for_mysql_no_error().

que_fork_t::n_active_thrs, trx_lock_t::n_active_thrs: Make debug-only.

que_fork_t::set_active(bool active): Update n_active_thrs.
2020-04-30 10:43:43 +03:00
Marko Mäkelä
496d0372ef Merge 10.4 into 10.5 2020-04-29 15:40:51 +03:00
Daniel Black
ba2061da52 MDEV-21595: innodb offset_t rename to rec_offs
thanks to:

perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)
2020-04-29 12:02:47 +03:00
Marko Mäkelä
cfbbf5424b MDEV-7962: Follow-up fix for 10.4
Replace wsrep_on() with trx_t::is_wsrep() where possible.

Also, rename some functions to member functions and
remove unused DBUG_EXECUTE_IF instrumentation:

trx_t::commit(): Renamed from trx_commit().

trx_t::commit_low(): Renamed from trx_commit_low().

trx_t::commit_in_memory(): Renamed from trx_commit_in_memory().
2020-04-29 11:50:03 +03:00
Marko Mäkelä
b63446984c Merge 10.3 into 10.4 2020-04-27 17:38:17 +03:00
Marko Mäkelä
2e12d471ea Merge 10.2 into 10.3 2020-04-27 14:24:41 +03:00
Marko Mäkelä
c06845d6f0 Merge 10.1 into 10.2 2020-04-27 13:28:13 +03:00
Marko Mäkelä
edbdfc2f99 MDEV-7962 wsrep_on() takes 0.14% in OLTP RO
The function wsrep_on() was being called rather frequently
in InnoDB and XtraDB. Let us cache it in trx_t and invoke
trx_t::is_wsrep() instead.

innobase_trx_init(): Cache trx->wsrep = wsrep_on(thd).

ha_innobase::write_row(): Replace many repeated calls to current_thd,
and test the cheapest condition first.
2020-04-27 11:18:11 +03:00
Marko Mäkelä
5203bc10f1 Merge 10.4 into 10.5 2020-03-21 11:37:10 +02:00
Marko Mäkelä
a786f50de5 MDEV-21962 Allocate buf_pool statically
Thanks to MDEV-15058, there is only one InnoDB buffer pool.
Allocating buf_pool statically removes one level of pointer indirection
and makes code more readable, and removes the awkward initialization of
some buf_pool members.

While doing this, we will also declare some buf_pool_t data members
private and replace some functions with member functions. This is
mostly affecting buffer pool resizing.

This is not aiming to be a complete rewrite of buf_pool_t to
a proper class. Most of the buffer pool interface, such as
buf_page_get_gen(), will remain in the C programming style
for now.

buf_pool_t::withdrawing: Replaces buf_pool_withdrawing.
buf_pool_t::withdraw_clock_: Replaces buf_withdraw_clock.

buf_pool_t::create(): Repalces buf_pool_init().
buf_pool_t::close(): Replaces buf_pool_free().

buf_bool_t::will_be_withdrawn(): Replaces buf_block_will_be_withdrawn(),
buf_frame_will_be_withdrawn().

buf_pool_t::clear_hash_index(): Replaces buf_pool_clear_hash_index().
buf_pool_t::get_n_pages(): Replaces buf_pool_get_n_pages().
buf_pool_t::validate(): Replaces buf_validate().
buf_pool_t::print(): Replaces buf_print().
buf_pool_t::block_from_ahi(): Replaces buf_block_from_ahi().
buf_pool_t::is_block_field(): Replaces buf_pointer_is_block_field().
buf_pool_t::is_block_mutex(): Replaces buf_pool_is_block_mutex().
buf_pool_t::is_block_lock(): Replaces buf_pool_is_block_lock().
buf_pool_t::is_obsolete(): Replaces buf_pool_is_obsolete().
buf_pool_t::io_buf: Make default-constructible.
buf_pool_t::io_buf::create(): Delayed 'constructor'
buf_pool_t::io_buf::close(): Early 'destructor'

HazardPointer: Make default-constructible. Define all member functions
inline, also for derived classes.
2020-03-18 22:32:40 +02:00
Marko Mäkelä
f224525204 MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.

Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.

GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
about x|=y even when x and y are compatible types that are narrower
than int.  Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.

In GCC 6 and later, the warning for assigning wider to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
cases.

The bitwise negation operator appears to promote short integers
to a wider type, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.

This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
2020-03-12 19:46:41 +02:00
Marko Mäkelä
574d8b2940 MDEV-21907: Fix most clang -Wconversion in InnoDB
Declare innodb_purge_threads as 4-byte integer (UINT)
instead of 4-or-8-byte (ULONG) and adjust the documentation string.
2020-03-11 08:29:48 +02:00
Thirunarayanan Balathandayuthapani
a35b4ae898 MDEV-15528 Punch holes when pages are freed
When a InnoDB data file page is freed, its contents becomes garbage,
and any storage allocated in the data file is wasted. During flushing,
InnoDB initializes the page with zeros if scrubbing is enabled. If the
tablespace is compressed then InnoDB should punch a hole else ignore the
flushing of the freed page.

buf_page_t:
- Replaced the variable file_page_was_freed, init_on_flush in buf_page_t
with status enum variable.
- Changed all debug assert of file_page_was_freed to DBUG_ASSERT
of buf_page_t::status

Removed buf_page_set_file_page_was_freed(),
buf_page_reset_file_page_was_freed().

buf_page_free(): Newly added function which takes X-lock on the page
before marking the status as FREED. So that InnoDB flush handler can
avoid concurrent flush of the freed page. Also while flushing the page,
InnoDB make sure that redo log which does freeing of the page also written
to the disk. Currently, this function only marks the page as FREED if
it is in buffer pool

buf_flush_freed_page(): Newly added function which initializes zeros
asynchorously if innodb_immediate_scrub_data_uncompressed is enabled.
Punch a hole to the file synchorously if page_compressed is enabled.
Reset the io_fix to NORMAL. Release the block from flush list and
associated mutex before writing zeros or punch a hole to the file.

buf_flush_page(): Removed the unnecessary usage of temporary
variable "flush"

fil_io(): Introduce new parameter called punch_hole. It allows fil_io()
to punch the hole to the file for the given offset.

buf_page_create(): Let the callers assign buf_page_t::status.
Every caller should eventually invoke mtr_t::init().

fsp_page_create(): Remove the unused mtr_t parameter.

In all other callers of buf_page_create() except fsp_page_create(),
before invoking mtr_t::init(), invoke
mtr_t::sx_latch_at_savepoint() or mtr_t::x_latch_at_savepoint().

mtr_t::init(): Initialize buf_page_t::status also for the temporary
tablespace (when redo logging is disabled), to avoid assertion failures.
2020-03-10 10:51:08 +05:30
mkaruza
d87c16be79 MDEV-20616: MariaDB-Galera 10.4.8 | Transaction aborted | Sig 6 Shutdown
When connections go to same node and deadlock happens, BF abort should
not happen for victim thread. Fixed by guarding
`wsrep_handle_SR_rollback()` so that is called only for SR transactions.

Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Daniele Sciascia <daniele.sciascia@galeracluster.com>
2020-03-03 10:29:45 +01:00
Vlad Lesin
721ec44e2a MDEV-14479: Do not acquire InnoDB record locks when covering table locks
exist

lock_rec_lock() does not set record lock if table lock is stronger or
equal to the acquired record lock.
2020-03-02 09:09:51 +03:00
Marko Mäkelä
1a6f708ec5 MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances
Our benchmarking efforts indicate that the reasons for splitting the
buf_pool in commit c18084f71b
have mostly gone away, possibly as a result of
mysql/mysql-server@ce6109ebfd
or similar work.

Only in one write-heavy benchmark where the working set size is
ten times the buffer pool size, the buf_pool->mutex would be
less contended with 4 buffer pool instances than with 1 instance,
in buf_page_io_complete(). That contention could be alleviated
further by making more use of std::atomic and by splitting
buf_pool_t::mutex further (MDEV-15053).

We will deprecate and ignore the following parameters:

	innodb_buffer_pool_instances
	innodb_page_cleaners

There will be only one buffer pool and one page cleaner task.

In a number of INFORMATION_SCHEMA views, columns that indicated
the buffer pool instance will be removed:

	information_schema.innodb_buffer_page.pool_id
	information_schema.innodb_buffer_page_lru.pool_id
	information_schema.innodb_buffer_pool_stats.pool_id
	information_schema.innodb_cmpmem.buffer_pool_instance
	information_schema.innodb_cmpmem_reset.buffer_pool_instance
2020-02-12 14:45:21 +02:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Marko Mäkelä
8fa759a576 Merge 10.3 into 10.4
We disable the MDEV-21189 test galera.galera_partition
because it times out.
2019-12-13 17:30:37 +02:00
Marko Mäkelä
3466b47b0d Merge 10.2 into 10.3 2019-12-13 10:08:57 +02:00
Eugene Kosov
f0aa073f2b MDEV-20950 Reduce size of record offsets
offset_t: this is a type which represents one record offset.
It's unsigned short int.

a lot of functions: replace ulint with offset_t

btr_pcur_restore_position_func(),
page_validate(),
row_ins_scan_sec_index_for_duplicate(),
row_upd_clust_rec_by_insert_inherit_func(),
row_vers_impl_x_locked_low(),
trx_undo_prev_version_build():
  allocate record offsets on the stack instead of waiting for rec_get_offsets()
  to allocate it from mem_heap_t. So, reducing  memory allocations.

RECORD_OFFSET, INDEX_OFFSET:
  now it's less convenient to store pointers in offset_t*
  array. One pointer occupies now several offset_t. And those constant are start
  indexes into array to places where to store pointer values

REC_OFFS_HEADER_SIZE: adjusted for the new reality

REC_OFFS_NORMAL_SIZE:
  increase size from 100 to 300 which means less heap allocations.
  And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which
  is smaller than previous 800 bytes.

REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality

rem0rec.h, rem0rec.ic, rem0rec.cc:
  various arguments, return values and local variables types were changed to
  fix numerous integer conversions issues.

enum field_type_t:
  offset types concept was introduces which replaces old offset flags stuff.
  Like in earlier version, 2 upper bits are used to store offset type.
  And this enum represents those types.

REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed

get_type(), set_type(), get_value(), combine():
  these are convenience functions to work with offsets and it's types

rec_offs_base()[0]:
  still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL

rec_offs_base()[i]:
  these have type offset_t now. Two upper bits contains type.
2019-12-13 00:26:50 +07:00
Vladislav Vaintroub
a808c18b73 Make .clang-format work with clang-8
Remove keywords that are too new.
2019-11-15 18:09:30 +01:00
Vladislav Vaintroub
5e62b6a5e0 MDEV-16264 Use threadpool for Innodb background work.
Almost all threads have gone
- the "ticking" threads, that sleep a while then do some work)
(srv_monitor_thread, srv_error_monitor_thread, srv_master_thread)
were replaced with timers. Some timers are periodic,
e.g the "master" timer.

- The btr_defragment_thread is also replaced by a timer , which
reschedules it self when current defragment "item" needs throttling

- the buf_resize_thread and buf_dump_threads are substitutes with tasks
Ditto with page cleaner workers.

- purge workers threads are not tasks as well, and purge cleaner
coordinator is a combination of a task and timer.

- All AIO is outsourced to tpool, Innodb just calls thread_pool::submit_io()
and provides the callback.

- The srv_slot_t was removed, and innodb_debug_sync used in purge
is currently not working, and needs reimplementation.
2019-11-15 18:09:30 +01:00
Marko Mäkelä
0ccfdc8eff Remove InnoDB wrappers of <string.h> functions
ut_strcmp(), ut_strcpy(), ut_strlen(), ut_memcpy(), ut_memcmp(),
ut_memmove(): Remove. Invoke the standard library functions directly.
2019-10-30 07:31:39 +02:00
Marko Mäkelä
72f671ab7b Merge 10.4 into 10.5 2019-09-27 07:15:07 +03:00
Marko Mäkelä
bb5afc7ceb Merge 10.3 into 10.4 2019-09-26 16:56:02 +03:00
Marko Mäkelä
3e4931cdf3 MDEV-20675 Crash in SHOW ENGINE INNODB STATUS with innodb_force_recovery=5
lock_print_info::operator(): Do not dereference purge_sys.query in case
it is NULL. We would not initialize purge_sys if innodb_force_recovery
is set to 5 or 6.

The test case will be added by merge from 10.2.
2019-09-26 13:18:22 +03:00
Marko Mäkelä
1333da90b5 Merge 10.4 into 10.5 2019-09-24 10:07:56 +03:00
Marko Mäkelä
5a92ccbaea Merge 10.3 into 10.4
Disable MDEV-20576 assertions until MDEV-20595 has been fixed.
2019-09-23 17:35:29 +03:00