If trx_free() and trx_create_low() were called while a call to
trx_reference() was pending, we could get a reference to a wrong
transaction object.
trx_reference(): Return NULL if the trx->id no longer matches.
lock_trx_release_locks(): Assign trx->id = 0, so that trx_reference()
will not return a reference to this object.
trx_cleanup_at_db_startup(): Assign trx->id = 0.
assert_trx_is_free(): Assert !trx->n_ref. Assert trx->id == 0,
now that it will be cleared as part of a transaction commit.
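The id check described above can be illustrated with a minimal sketch. The names sketch_trx_t, sketch_trx_reference(), sketch_trx_release() and sketch_trx_commit() are hypothetical stand-ins and not the InnoDB definitions; the sketch shows only the rule that a reference is refused once the id no longer matches, and it does not model the real synchronization.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for trx_t (assumption, not InnoDB code).
struct sketch_trx_t {
  std::atomic<uint64_t> id{0};    // 0 means "free, not a live transaction"
  std::atomic<int32_t> n_ref{0};  // number of outstanding references
};

// Take a reference only if the object still represents expected_id.
// Returns nullptr if the object was freed or reused in the meantime.
inline sketch_trx_t* sketch_trx_reference(sketch_trx_t* trx,
                                          uint64_t expected_id) {
  assert(expected_id != 0);  // the caller is expected to filter out id 0
  trx->n_ref.fetch_add(1);
  if (trx->id.load() != expected_id) {
    trx->n_ref.fetch_sub(1);  // undo the speculative reference
    return nullptr;
  }
  return trx;
}

inline void sketch_trx_release(sketch_trx_t* trx) {
  int32_t old = trx->n_ref.fetch_sub(1);
  assert(old > 0);
  (void)old;
}

// Commit-time cleanup: clearing the id makes later reference attempts fail.
inline void sketch_trx_commit(sketch_trx_t* trx) { trx->id.store(0); }
```

With this rule, a stale lookup for transaction 42 yields a null pointer after the object has been committed, even if the same memory is later reused for another transaction.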
Allocate trx->lock.rec_pool and trx->lock.table_pool directly from trx_t.
Remove unnecessary use of std::vector.
In order to do this, move some definitions from lock0priv.h to
lock0types.h, so that ib_lock_t will not be an opaque type.
InnoDB executed code that is meant to run only when Galera
is used, and with bad luck one of the transactions was incorrectly
selected as a deadlock victim. Fixed by adding a wsrep_on_trx()
condition before entering the actual Galera transaction handling.
No reliably repeatable test case for this issue is known.
Make dict_table_t::n_ref_count private, and protect it with
a combination of dict_sys->mutex and atomics. We want to be
able to invoke dict_table_t::release() without holding
dict_sys->mutex.
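A minimal sketch of a private, atomically maintained reference count follows. The class sketch_table_t and its member functions are hypothetical; in particular, the sketch assumes that any mutex protection for acquire() is the caller's responsibility, while release() relies on the atomic alone.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for dict_table_t (assumption, not InnoDB code).
class sketch_table_t {
 public:
  // Assumption: callers hold whatever mutex guards table lookup.
  void acquire() { n_ref_count.fetch_add(1, std::memory_order_relaxed); }

  // May be called without holding any mutex.
  void release() {
    uint32_t old = n_ref_count.fetch_sub(1, std::memory_order_release);
    assert(old > 0);  // releasing an unreferenced table would be a bug
    (void)old;
  }

  uint32_t get_ref_count() const {
    return n_ref_count.load(std::memory_order_acquire);
  }

 private:
  // Private, so every update goes through acquire() or release().
  std::atomic<uint32_t> n_ref_count{0};
};
```

Making the counter private forces all call sites through the two accessors, which is what allows the locking rules to be changed in one place.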
In InnoDB, an INSERT will not create an explicit lock object. Instead,
the inserted record is initially implicitly locked by the transaction
that wrote its trx_t::id to the hidden system column DB_TRX_ID.
(Other transactions would check if DB_TRX_ID is referring to a
transaction that has not been committed.)
If a record was inserted in the current transaction, it would be
implicitly locked by that transaction. Only if some other transaction
is requesting access to the record should the implicit lock be
converted to an explicit one, so that the waits-for graph can be
constructed for detecting deadlocks and lock wait timeouts.
Before this fix, InnoDB would convert implicit locks to
explicit ones, even if no conflict exists.
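The decision described above can be sketched as a small pure function. The enum impl_lock_action and the function sketch_check_implicit() are hypothetical; the real code inspects the record and the transaction system, whereas this sketch takes the relevant facts as parameters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of examining the DB_TRX_ID of a record.
enum class impl_lock_action {
  NONE_NEEDED,          // the writer has committed: no implicit lock exists
  CALLER_OWNS,          // the caller wrote the record itself: no conflict
  CONVERT_TO_EXPLICIT   // another active transaction holds the implicit lock
};

// caller_trx_id:  id of the transaction requesting access
// rec_trx_id:     value of DB_TRX_ID stored in the record
// rec_trx_active: whether the transaction rec_trx_id has not yet committed
inline impl_lock_action sketch_check_implicit(uint64_t caller_trx_id,
                                              uint64_t rec_trx_id,
                                              bool rec_trx_active) {
  assert(caller_trx_id != 0);
  if (rec_trx_id == caller_trx_id) {
    // No lookup and no conversion: the caller already owns the record.
    return impl_lock_action::CALLER_OWNS;
  }
  if (!rec_trx_active) {
    return impl_lock_action::NONE_NEEDED;
  }
  // A real conflict: an explicit lock is needed for the waits-for graph.
  return impl_lock_action::CONVERT_TO_EXPLICIT;
}
```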
lock_rec_convert_impl_to_expl(): Return whether caller_trx
already holds an explicit lock that covers the record.
row_vers_impl_x_locked_low(): Avoid a lookup if the record matches
caller_trx->id.
lock_trx_has_expl_x_lock(): Renamed from lock_trx_has_rec_x_lock().
row_upd_clust_step(): In a debug assertion, check for implicit lock
before invoking lock_trx_has_expl_x_lock().
rw_trx_hash_t::find(): Make do_ref_count a mandatory parameter.
Assert that trx_id is not 0 (the caller should check it).
trx_sys_t::is_registered(): Only invoke find() if id != 0.
trx_sys_t::find(): Add the optional parameter do_ref_count.
lock_rec_queue_validate(): Avoid lookup for trx_id == 0.
The following conditions decide whether query cache retrieval or
storing is permitted inside InnoDB:
(1) There must not be any locks on the table.
(2) No other transaction may have invalidated the cache before the
transaction started.
(3) A read view should not exist. If one exists, then the view
low_limit_id must be greater than or equal to the ID of the transaction
that invalidated the cache for the particular table.
For a read-only transaction: (1) and (3) must be satisfied.
For a read-write transaction: (1), (2), and (3) must be satisfied.
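How the three conditions combine for read-only and read-write transactions can be sketched as a predicate. The struct sketch_qc_input and the function sketch_query_cache_permitted() are hypothetical; condition (2) is passed in as a precomputed flag because the text above does not spell out how it is evaluated.

```cpp
#include <cstdint>

// Hypothetical inputs (assumption, not the InnoDB data structures).
struct sketch_qc_input {
  bool table_has_locks;             // negation of condition (1)
  bool read_write;                  // false for a read-only transaction
  bool condition2_holds;            // condition (2), evaluated by the caller
  bool has_read_view;               // whether a read view exists
  uint64_t view_low_limit_id;       // meaningful only if has_read_view
  uint64_t query_cache_inv_trx_id;  // id of the invalidating transaction
};

inline bool sketch_query_cache_permitted(const sketch_qc_input& in) {
  // (1) applies to every transaction.
  if (in.table_has_locks) return false;
  // (2) applies only to read-write transactions.
  if (in.read_write && !in.condition2_holds) return false;
  // (3) no read view, or a view that is recent enough.
  if (!in.has_read_view) return true;
  return in.view_low_limit_id >= in.query_cache_inv_trx_id;
}
```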
- Renamed the variable query_cache_inv_id to query_cache_inv_trx_id.
- Moved the function row_search_check_if_query_cache_permitted() from
row0sel.h and made it a static function in ha_innodb.cc.
Thanks to Sergey Vojtovich for feedback and many ideas.
purge_state_t: Remove. The states are replaced with
purge_sys_t::enabled() and purge_sys_t::paused() as follows:
PURGE_STATE_INIT, PURGE_STATE_EXIT, PURGE_STATE_DISABLED: !enabled().
PURGE_STATE_RUN, PURGE_STATE_STOP: paused() distinguishes these.
purge_sys_t::m_paused: Renamed from purge_sys_t::n_stop.
Protected by atomic memory access only, not purge_sys_t::latch.
purge_sys_t::m_enabled: An atomically updated Boolean that
replaces purge_sys_t::state.
purge_sys_t::running: Remove, because it duplicates
srv_sys.n_threads_active[SRV_PURGE].
purge_sys_t::running(): Accessor for srv_sys.n_threads_active[SRV_PURGE].
purge_sys_t::stop(): Renamed from trx_purge_stop().
purge_sys_t::resume(): Renamed from trx_purge_run().
Do not acquire latch; solely rely on atomics.
purge_sys_t::is_initialised(), purge_sys_t::m_initialised: Remove.
purge_sys_t::create(), purge_sys_t::close(): Instead of invoking
is_initialised(), check whether event is NULL.
purge_sys_t::event: Move before latch, so that fields that are
protected by latch can reside on the same cache line with latch.
srv_start_wait_for_purge_to_start(): Merge to the only caller srv_start().
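The replacement of the state enum by two atomically accessed fields can be sketched as follows. The class sketch_purge_sys_t is a hypothetical reduction; enable() and disable() are invented names for illustration, and the sketch models only the flag and the pause counter, not any waiting for purge threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reduction of purge_sys_t (assumption, not InnoDB code).
class sketch_purge_sys_t {
  std::atomic<bool> m_enabled{false};  // stands in for the removed state
  std::atomic<int32_t> m_paused{0};    // count of outstanding stop() calls

 public:
  bool enabled() const { return m_enabled.load(std::memory_order_acquire); }
  bool paused() const { return m_paused.load(std::memory_order_relaxed) != 0; }

  // Hypothetical names for start-up and shutdown transitions.
  void enable() { m_enabled.store(true, std::memory_order_release); }
  void disable() { m_enabled.store(false, std::memory_order_release); }

  // Each stop() must be paired with one resume().
  void stop() { m_paused.fetch_add(1, std::memory_order_relaxed); }
  void resume() {
    int32_t old = m_paused.fetch_sub(1, std::memory_order_relaxed);
    assert(old > 0);  // resume() without a matching stop() is a bug
    (void)old;
  }
};
```

In this model the former PURGE_STATE_RUN corresponds to enabled() && !paused(), and PURGE_STATE_STOP to enabled() && paused().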
If the tablespace is dropped or truncated after the
space->is_stopping() check in fil_crypt_get_page_throttle_func(),
we would proceed to request the page, and eventually report a fatal
error.
buf_page_get_gen(): Do not retry reading if mode==BUF_GET_POSSIBLY_FREED.
lock_rec_block_validate(): Be prepared for a NULL return value when
invoking buf_page_get_gen() with mode=BUF_GET_POSSIBLY_FREED.
Remove unused InnoDB function parameters and functions.
i_s_sys_virtual_fill_table(): Do not allocate heap memory.
mtr_is_block_fix(): Replace with mtr_memo_contains().
mtr_is_page_fix(): Replace with mtr_memo_contains_page().
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".
innodb_change_buffering: Declare as ENUM, not STRING.
innodb_flush_method: Declare as ENUM, not STRING.
innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.
LOG_BUFFER_SIZE: Remove.
SysTablespace::normalize_size(): Renamed from normalize().
innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.
innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.
srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.
SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.
xb_normalize_init_values(): Merge to innodb_init_param().
fil_space_t::n_pending_ops, n_pending_ios: Use a combination of
fil_system.mutex and atomic memory access for protection.
fil_space_t::release(): Replaces fil_space_release().
Does not acquire fil_system.mutex.
fil_space_t::release_for_io(): Replaces fil_space_release_for_io().
Does not acquire fil_system.mutex.
We can rely on the dict_table_t::space. All indexes of a table object
are always in the same tablespace. (For fulltext indexes, the data is
located in auxiliary tables, and these will continue to have their own
table objects, separate from the main table.)
Currently trx_sys.mutex protects only trx_sys.mysql_trx_list and
trx_sys.m_views, which are not accessed by lock0lock debug routines.
Thus there is no need to acquire trx_sys.mutex here.
Removed trx_assert_started(): this assertion is fully covered by
check_trx_state().
This is a fixup after commit 8026cd6202.
We must not silently allow a lock wait to occur during InnoDB data
dictionary transactions. The dict_operation_lock is supposed to
prevent lock waits, and we want to be aware of any errors.
lock_rec_trx_wait(): Merge to the only caller lock_prdt_rec_move().
lock_rec_reset_nth_bit(), lock_set_lock_and_trx_wait(),
lock_reset_lock_and_trx_wait(): Define in lock0priv.h.
By definition, c_lock->trx->lock.wait_lock==c_lock cannot hold.
That is, the owner transaction of a lock cannot be waiting for that
particular lock. It must have been waiting for some other lock.
Remove the dead code related to that. Also, test c_lock for NULLness
only once.
Refactor lock_grant(). With innodb_lock_schedule_algorithm=VATS
some callers were passing an incorrect parameter owns_trx_mutex
to lock_grant().
lock_grant_after_reset(): Refactored from lock_grant(), without
the call to lock_reset_lock_and_trx_wait().
lock_grant_have_trx_mutex(): A variant of lock_grant() where the
caller already holds the lock->trx->mutex. The normal lock_grant()
will acquire and release lock->trx->mutex.
lock_grant(): Define as a wrapper that will acquire lock->trx->mutex.
lock_table_create(): Move the WSREP parameter c_lock last,
and make it NULL by default, to avoid the need for a wrapper
function.
lock_table_enqueue_waiting(): Move the WSREP parameter c_lock last.
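The split of lock_grant() into a mutex-acquiring wrapper and a variant for callers that already hold the mutex can be sketched as below. All types and functions carry a sketch_ prefix and are hypothetical; the assumed composition (reset, then grant) follows the description above but is not copied from the InnoDB source.

```cpp
#include <cassert>
#include <mutex>

struct sketch_lock_t;

// Hypothetical stand-ins (assumption, not the InnoDB definitions).
struct sketch_trx_t {
  std::mutex mutex;
  sketch_lock_t* wait_lock = nullptr;  // the lock this transaction waits for
  bool suspended = false;
};

struct sketch_lock_t {
  sketch_trx_t* trx = nullptr;
  bool waiting = false;
};

// Caller must hold lock->trx->mutex.
inline void sketch_reset_lock_and_trx_wait(sketch_lock_t* lock) {
  assert(lock->trx->wait_lock == lock);
  lock->trx->wait_lock = nullptr;
  lock->waiting = false;
}

// Grant without resetting the wait state; caller holds lock->trx->mutex.
inline void sketch_lock_grant_after_reset(sketch_lock_t* lock) {
  lock->trx->suspended = false;  // stands in for waking up the waiter
}

// Variant for callers that already hold lock->trx->mutex.
inline void sketch_lock_grant_have_trx_mutex(sketch_lock_t* lock) {
  sketch_reset_lock_and_trx_wait(lock);
  sketch_lock_grant_after_reset(lock);
}

// Wrapper that acquires and releases lock->trx->mutex itself.
inline void sketch_lock_grant(sketch_lock_t* lock) {
  std::lock_guard<std::mutex> guard(lock->trx->mutex);
  sketch_lock_grant_have_trx_mutex(lock);
}
```

Encoding the mutex ownership in the function name, rather than in a boolean parameter such as owns_trx_mutex, removes the possibility of passing the wrong flag value.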
Revert the dead code for MySQL 5.7 multi-master replication (GCS),
also known as
WL#6835: InnoDB: GCS Replication: Deterministic Deadlock Handling
(High Prio Transactions in InnoDB).
Also, make innodb_lock_schedule_algorithm=vats skip SPATIAL INDEX,
because the code does not seem to be compatible with them.
Add FIXME comments to some SPATIAL INDEX locking code. It looks
like Galera write-set replication might not work with SPATIAL INDEX.
The merge only covered 10.1 up to
commit 4d248974e0.
Actually merge the changes up to
commit 0a534348c7.
Also, remove the unused InnoDB field trx_t::abort_type.
Unlike commit a54abf0175 claimed,
the caller of THD::awake() may actually hold the InnoDB lock_sys->mutex.
That commit introduced a deadlock of threads in the replication slave
when running the test rpl.rpl_parallel_optimistic_nobinlog.
lock_trx_handle_wait(): Expect the callers to acquire and release
lock_sys->mutex and trx->mutex.
innobase_kill_query(): Restore the logic for conditionally acquiring
and releasing the mutexes. THD::awake() can be called from inside
InnoDB while holding one or both mutexes, via thd_report_wait_for() and
via wsrep_innobase_kill_one_trx().
With MDEV-15132 in MariaDB 10.3.5, InnoDB no longer writes the
transaction identifier to the TRX_SYS page. The information is
only written to undo log headers and sometimes rollback segment
headers. Because the setting innodb_force_recovery=5 will skip
reading any of those pages, the maximum transaction identifier
will no longer be determined.
innobase_map_isolation_level(): Always report READ UNCOMMITTED
if innodb_force_recovery has been set to 5 or more, or
innodb_read_only is set. This will avoid errors reported by
lock_check_trx_id_sanity() and ReadView::check_trx_id_sanity().
lock_clust_rec_cons_read_sees(): Do not check for
innodb_read_only, now that innobase_map_isolation_level() will
guarantee that no read view will be created or used.
row_search_mvcc(): Do not check for innodb_force_recovery<5,
now that innobase_map_isolation_level() will guarantee that
no read view will be created or used.
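The mapping rule for innobase_map_isolation_level() can be sketched as a pure function. The enum sketch_iso and the function sketch_map_isolation_level() are hypothetical; the real function reads the settings from server state rather than taking them as parameters.

```cpp
#include <cassert>

// Hypothetical isolation level enum (assumption, not the InnoDB constants).
enum class sketch_iso {
  READ_UNCOMMITTED,
  READ_COMMITTED,
  REPEATABLE_READ,
  SERIALIZABLE
};

// requested:      the level requested by the SQL layer
// force_recovery: value of innodb_force_recovery (0 to 6)
// read_only:      value of innodb_read_only
inline sketch_iso sketch_map_isolation_level(sketch_iso requested,
                                             unsigned force_recovery,
                                             bool read_only) {
  assert(force_recovery <= 6);
  if (force_recovery >= 5 || read_only) {
    // No read view will be created, so no trx id sanity check can fail.
    return sketch_iso::READ_UNCOMMITTED;
  }
  return requested;
}
```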
There is only one lock_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
lock_sys_t::create(): The deferred constructor.
lock_sys_t::close(): The early destructor.
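The pattern of a statically allocated singleton with a deferred constructor and an early destructor can be sketched as follows. The class sketch_lock_sys_t, its members, and the 64-byte cache line size are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cache line size; the real value is platform dependent.
constexpr std::size_t SKETCH_CACHE_LINE = 64;

// Hypothetical reduction of lock_sys_t (assumption, not InnoDB code).
class sketch_lock_sys_t {
  bool m_initialised = false;
  std::vector<int> rec_hash;  // stands in for the real hash tables

 public:
  // Two frequently modified members, each on its own cache line,
  // to avoid false sharing between threads that update them.
  alignas(SKETCH_CACHE_LINE) int mutex_word = 0;
  alignas(SKETCH_CACHE_LINE) int wait_mutex_word = 0;

  // Deferred constructor: runs after start-up parameters are known.
  void create(std::size_t n_cells) {
    assert(!m_initialised);
    rec_hash.assign(n_cells, 0);
    m_initialised = true;
  }

  // Early destructor: frees resources before static destruction.
  void close() {
    if (!m_initialised) return;
    rec_hash.clear();
    rec_hash.shrink_to_fit();
    m_initialised = false;
  }

  bool is_initialised() const { return m_initialised; }
  std::size_t n_cells() const { return rec_hash.size(); }
};

// The single instance: accessed as an object, not through a pointer.
sketch_lock_sys_t sketch_lock_sys;
```

Because the object lives in static storage, an access such as sketch_lock_sys.n_cells() needs no pointer load, and create() can be given values that are only available after parameter parsing.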
There is only one purge_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
purge_sys_t::create(): The deferred constructor.
purge_sys_t::close(): The early destructor.
undo::Truncate::create(): The deferred constructor.
Because purge_sys.undo_trunc is constructed before the start-up
parameters are parsed, the normal constructor would copy a
wrong value of srv_purge_rseg_truncate_frequency.
TrxUndoRsegsIterator: Do not forward-declare an inline constructor,
because the static construction of purge_sys.rseg_iter would not have
access to it.
Before MDEV-12288 in MariaDB 10.3.1, InnoDB used to partition
the persistent transaction undo log into insert_undo and update_undo.
MDEV-12288 repurposes the update_undo as the single undo log.
In order to support an upgrade from earlier MariaDB versions,
the insert_undo is recovered in data structures, called old_insert.
An assertion failure occurred in TrxUndoRsegsIterator::set_next()
when an incomplete transaction was recovered with both insert_undo
and update_undo log. This could be easily demonstrated by starting
./mysql-test-run --manual-gdb innodb.read_only_recovery
in MariaDB 10.2, and after the first kill, starting up the MariaDB 10.3
server with the same parameters.
The problem is that MariaDB 10.3 would roll back the recovered
transaction, and finally "commit" it twice (with all changes to
data rolled back): once for insert_undo and once for update_undo,
both with the same commit end identifier (trx->no).
Our fix is to introduce a "commit number" that comprises two components:
(trx->no << 1 | !old_insert). In this way, the assertion in the purge
subsystem can be relaxed so that only the trx->no component must match.
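The two-component commit number can be worked through with a short sketch. The helper names sketch_commit_no() and sketch_trx_no() are hypothetical; only the expression (trx->no << 1 | !old_insert) comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Combine trx->no with a low-order bit that is 0 for old_insert
// and 1 for the regular undo log.
inline uint64_t sketch_commit_no(uint64_t trx_no, bool old_insert) {
  assert(trx_no < (static_cast<uint64_t>(1) << 63));  // the shift must not overflow
  return trx_no << 1 | static_cast<uint64_t>(!old_insert);
}

// Recover the trx->no component, which is what the relaxed assertion
// in the purge subsystem compares.
inline uint64_t sketch_trx_no(uint64_t commit_no) { return commit_no >> 1; }
```

For trx->no = 5, the old_insert log gets commit number 10 and the regular undo log gets 11. The two entries are distinct and ordered, yet both map back to trx->no = 5.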
purge_iter_t::operator<=(): Ordering comparison.
This replaces trx_purge_check_limit() with the difference that
we are not comparing undo_rseg_space. (In MariaDB, temporary
undo logs do not enter the purge subsystem at all.)
purge_sys_t::done: Remove. This was not used for anything.
purge_sys_t::tail: Renamed from purge_sys_t::iter.
purge_sys_t::head: Renamed from purge_sys_t::limit.