mirror of
https://github.com/MariaDB/server.git
synced 2025-01-31 02:51:44 +01:00
b07beff894
MySQL 5.7.9 (and MariaDB 10.2.2) introduced a race condition
between InnoDB transaction commit and the conversion of implicit
locks into explicit ones.

The assertion failure can be triggered with a test that runs
3 concurrent single-statement transactions in a loop on a simple
table:

CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
thread1: INSERT INTO t SET a=1;
thread2: DELETE FROM t;
thread3: SELECT * FROM t FOR UPDATE; -- or DELETE FROM t;

The failure scenario is as follows:

(1) The INSERT statement is being committed, waiting for lock_sys->mutex.
(2) At the time of the failure, both the DELETE and SELECT transactions
are active but have not logged any changes yet.
(3) The transaction where the !other_lock assertion fails started
lock_rec_convert_impl_to_expl().
(4) After this point, the commit of the INSERT removed the transaction from
trx_sys->rw_trx_set, in trx_erase_lists().
(5) The other transaction consulted trx_sys->rw_trx_set and determined
that there is no implicit lock. Hence, it grabbed the lock.
(6) The !other_lock assertion fails in lock_rec_add_to_queue()
for the lock_rec_convert_impl_to_expl(), because the lock was 'stolen'.

This assertion failure looks genuine, because the INSERT transaction
is still active (trx->state=TRX_STATE_ACTIVE).

The problematic step (4) was introduced in
mysql/mysql-server@e27e0e0bb7
which fixed something related to MVCC (covered by the test
innodb.innodb-read-view). Basically, it reintroduced an error
that had been mentioned in an earlier commit
mysql/mysql-server@a17be6963f:
"The active transaction was removed from trx_sys->rw_trx_set prematurely."

Our fix goes along the following lines:

(a) Implicit locks will be released by assigning
trx->state=TRX_STATE_COMMITTED_IN_MEMORY as the first step.
This transition will no longer be protected by lock_sys_t::mutex,
only by trx->mutex. This idea is by Sergey Vojtovich.
(b) We detach the transaction from trx_sys before starting to release
explicit locks.
(c) All callers of trx_rw_is_active() and trx_rw_is_active_low() must
recheck trx->state after acquiring trx->mutex.
(d) Before releasing any explicit locks, we will ensure that any activity
by other threads to convert implicit locks into explicit will have ceased,
by checking !trx_is_referenced(trx). There was a glitch
in this check when it was part of lock_trx_release_locks(); at the end
we would release trx->mutex and acquire lock_sys->mutex and trx->mutex,
and fail to recheck (trx_is_referenced() is protected by trx_t::mutex).
(e) Explicit locks can be released in batches (LOCK_RELEASE_INTERVAL=1000)
just like we did before.

trx_t::state: Document that the transition to COMMITTED is only
protected by trx_t::mutex, no longer by lock_sys_t::mutex.

trx_rw_is_active_low(), trx_rw_is_active(): Document that the transaction
state should be rechecked after acquiring trx_t::mutex.

trx_t::commit_state(): New function to change a transaction to committed
state, to release implicit locks.

trx_t::release_locks(): New function to release the explicit locks
after commit_state().

lock_trx_release_locks(): Move much of the logic to the caller
(which must invoke trx_t::commit_state() and trx_t::release_locks()
as needed), and assert that the transaction will have locks.

trx_get_trx_by_xid(): Make the parameter a pointer to const.

lock_rec_other_trx_holds_expl(): Recheck trx->state after acquiring
trx->mutex, and avoid a redundant lookup of the transaction.

lock_rec_queue_validate(): Recheck impl_trx->state while holding
impl_trx->mutex.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low():
Document that the transaction state must be rechecked after
trx_mutex_enter().

trx_free_prepared(): Adjust for the changes to lock_trx_release_locks().
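Steps (a), (c) and (d) above describe a "look up, then recheck under the mutex" protocol. The following is a minimal single-threaded sketch of that idea and is not InnoDB code: the types Trx and TrxState and the functions find_active(), commit_state(), convert_implicit_to_explicit() and is_referenced() are hypothetical stand-ins, and std::mutex merely plays the role of trx_t::mutex.

```cpp
#include <mutex>

// Hypothetical stand-ins for illustration only; not the InnoDB types.
enum class TrxState { Active, CommittedInMemory };

struct Trx {
	std::mutex mutex;                   // plays the role of trx_t::mutex
	TrxState state = TrxState::Active;  // protected by mutex
	int n_ref = 0;                      // reference count, protected by mutex
};

// Lookup step: if the transaction looks active, take a reference
// so that the committer knows somebody is still working with it.
Trx* find_active(Trx& trx) {
	std::lock_guard<std::mutex> guard(trx.mutex);
	if (trx.state != TrxState::Active) {
		return nullptr;
	}
	++trx.n_ref;
	return &trx;
}

// Step (a): the commit flips the state while holding only the
// transaction's own mutex. From here on, no implicit lock exists.
void commit_state(Trx& trx) {
	std::lock_guard<std::mutex> guard(trx.mutex);
	trx.state = TrxState::CommittedInMemory;
}

// Step (c): the earlier lookup result may be stale, so the state is
// rechecked under the mutex before an explicit lock would be created.
// Returns true only if an explicit lock should be created on behalf of trx.
// The reference taken by find_active() is released in either case.
bool convert_implicit_to_explicit(Trx& trx) {
	std::lock_guard<std::mutex> guard(trx.mutex);
	const bool still_active = trx.state == TrxState::Active;
	--trx.n_ref;
	return still_active;
}

// Step (d): the committer would wait until this returns false
// before releasing any explicit locks.
bool is_referenced(Trx& trx) {
	std::lock_guard<std::mutex> guard(trx.mutex);
	return trx.n_ref > 0;
}
```

In this sketch, a caller that obtained a pointer from find_active() before commit_state() ran will see convert_implicit_to_explicit() return false afterwards, so it does not create an explicit lock for a transaction that has already committed. The real code additionally has to coordinate with lock_sys->mutex and trx_sys, which is omitted here.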
156 lines
6 KiB
C++
/*****************************************************************************

Copyright (c) 1997, 2016, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2017, 2019, MariaDB Corporation.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

*****************************************************************************/

/**************************************************//**
@file include/row0vers.h
Row versions

Created 2/6/1997 Heikki Tuuri
*******************************************************/

#ifndef row0vers_h
#define row0vers_h

#include "data0data.h"
#include "trx0types.h"
#include "que0types.h"
#include "rem0types.h"
#include "mtr0mtr.h"
#include "dict0mem.h"
#include "row0types.h"

// Forward declaration
class ReadView;

/** Determine if an active transaction has inserted or modified a secondary
index record.
@param[in]	rec	secondary index record
@param[in]	index	secondary index
@param[in]	offsets	rec_get_offsets(rec, index)
@return	the active transaction; state must be rechecked after
trx_mutex_enter(), and trx_release_reference() must be invoked
@retval	NULL if the record was committed */
trx_t*
row_vers_impl_x_locked(
	const rec_t*	rec,
	dict_index_t*	index,
	const ulint*	offsets);

/*****************************************************************//**
Finds out if we must preserve a delete marked earlier version of a clustered
index record, because it is >= the purge view.
@param[in]	trx_id	transaction id in the version
@param[in]	name	table name
@param[in,out]	mtr	mini transaction holding the latch on the
			clustered index record; it will also hold
			the latch on purge_view
@return TRUE if earlier version should be preserved */
ibool
row_vers_must_preserve_del_marked(
/*==============================*/
	trx_id_t		trx_id,
	const table_name_t&	name,
	mtr_t*			mtr);

/** Finds out if a version of the record, where the version >= the current
purge view, should have ientry as its secondary index entry. We check
if there is any not delete marked version of the record where the trx
id >= purge view, and the secondary index entry == ientry; exactly in
this case we return TRUE.
@param[in]	also_curr	TRUE if also rec is included in the versions
				to search; otherwise only versions prior
				to it are searched
@param[in]	rec		record in the clustered index; the caller
				must have a latch on the page
@param[in]	mtr		mtr holding the latch on rec; it will
				also hold the latch on purge_view
@param[in]	index		secondary index
@param[in]	ientry		secondary index entry
@param[in]	roll_ptr	roll_ptr for the purge record
@param[in]	trx_id		transaction ID on the purging record
@param[in,out]	vcol_info	virtual column information for purge thread.
@return TRUE if earlier version should have */
bool
row_vers_old_has_index_entry(
	bool			also_curr,
	const rec_t*		rec,
	mtr_t*			mtr,
	dict_index_t*		index,
	const dtuple_t*		ientry,
	roll_ptr_t		roll_ptr,
	trx_id_t		trx_id,
	purge_vcol_info_t*	vcol_info=NULL);

/*****************************************************************//**
Constructs the version of a clustered index record which a consistent
read should see. We assume that the trx id stored in rec is such that
the consistent read should not see rec in its present version.
@return DB_SUCCESS or DB_MISSING_HISTORY */
dberr_t
row_vers_build_for_consistent_read(
/*===============================*/
	const rec_t*	rec,	/*!< in: record in a clustered index; the
				caller must have a latch on the page; this
				latch locks the top of the stack of versions
				of this record */
	mtr_t*		mtr,	/*!< in: mtr holding the latch on rec; it will
				also hold the latch on purge_view */
	dict_index_t*	index,	/*!< in: the clustered index */
	ulint**		offsets,/*!< in/out: offsets returned by
				rec_get_offsets(rec, index) */
	ReadView*	view,	/*!< in: the consistent read view */
	mem_heap_t**	offset_heap,/*!< in/out: memory heap from which
				the offsets are allocated */
	mem_heap_t*	in_heap,/*!< in: memory heap from which the memory for
				*old_vers is allocated; memory for possible
				intermediate versions is allocated and freed
				locally within the function */
	rec_t**		old_vers,/*!< out, own: old version, or NULL
				if the history is missing or the record
				does not exist in the view, that is,
				it was freshly inserted afterwards */
	dtuple_t**	vrow);	/*!< out: reports virtual column info if any */

/*****************************************************************//**
Constructs the last committed version of a clustered index record,
which should be seen by a semi-consistent read. */
void
row_vers_build_for_semi_consistent_read(
/*====================================*/
	const rec_t*	rec,	/*!< in: record in a clustered index; the
				caller must have a latch on the page; this
				latch locks the top of the stack of versions
				of this record */
	mtr_t*		mtr,	/*!< in: mtr holding the latch on rec */
	dict_index_t*	index,	/*!< in: the clustered index */
	ulint**		offsets,/*!< in/out: offsets returned by
				rec_get_offsets(rec, index) */
	mem_heap_t**	offset_heap,/*!< in/out: memory heap from which
				the offsets are allocated */
	mem_heap_t*	in_heap,/*!< in: memory heap from which the memory for
				*old_vers is allocated; memory for possible
				intermediate versions is allocated and freed
				locally within the function */
	const rec_t**	old_vers,/*!< out: rec, old version, or NULL if the
				record does not exist in the view, that is,
				it was freshly inserted afterwards */
	dtuple_t**	vrow);	/*!< out: holds virtual column info if any
				is updated in the view */

#endif