mariadb/storage/innobase/row/row0purge.cc

1236 lines
33 KiB
C++
Raw Normal View History

/*****************************************************************************
Copyright (c) 1997, 2017, Oracle and/or its affiliates. All Rights Reserved.
Merge 10.1 into 10.2 This only merges MDEV-12253, adapting it to MDEV-12602 which is already present in 10.2 but not yet in the 10.1 revision that is being merged. TODO: Error handling in crash recovery needs to be improved. If a page cannot be decrypted (or read), we should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply redo log to other pages. Currently, the test encryption.innodb-redo-badkey randomly fails like this (the last messages are from cmake -DWITH_ASAN): 2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994 2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1 2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption 2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown... i================================================================= ==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0 #0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750) #1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4 #2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17 #3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3 #4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2 #5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2 #6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
2017-05-05 10:25:29 +03:00
Copyright (c) 2017, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
/**************************************************//**
@file row/row0purge.cc
Purge obsolete records
Created 3/14/1997 Heikki Tuuri
*******************************************************/
#include "row0purge.h"
#include "fsp0fsp.h"
#include "mach0data.h"
MDEV-12698 innodb.innodb_stats_del_mark test failure In my merge of the MySQL fix for Oracle Bug#23333990 / WL#9513 I overlooked some subsequent revisions to the test, and I also failed to notice that the test is actually always failing. Oracle introduced the parameter innodb_stats_include_delete_marked but failed to consistently take it into account in FOREIGN KEY constraints that involve CASCADE or SET NULL. When innodb_stats_include_delete_marked=ON, obviously the purge of delete-marked records should update the statistics as well. One more omission was that statistics were never updated on ROLLBACK. We are fixing that as well, properly taking into account the parameter innodb_stats_include_delete_marked. dict_stats_analyze_index_level(): Simplify an expression. (Using the ternary operator with a constant operand is unnecessary obfuscation.) page_scan_method_t: Revert the change done by Oracle. Instead, examine srv_stats_include_delete_marked directly where it is needed. dict_stats_update_if_needed(): Renamed from row_update_statistics_if_needed(). row_update_for_mysql_using_upd_graph(): Assert that the table statistics are initialized, as guaranteed by ha_innobase::open(). Update the statistics in a consistent way, both for FOREIGN KEY triggers and for the main table. If FOREIGN KEY constraints exist, do not dereference a freed pointer, but cache the proper value of node->is_delete so that it matches prebuilt->table. row_purge_record_func(): Update statistics if innodb_stats_include_delete_marked=ON. row_undo_ins(): Update statistics (on ROLLBACK of a fresh INSERT). This is independent of the parameter; the record is not delete-marked. row_undo_mod(): Update statistics on the ROLLBACK of updating key columns, or (if innodb_stats_include_delete_marked=OFF) updating delete-marks. innodb.innodb_stats_persistent: Renamed and extended from innodb.innodb_stats_del_mark. Reduced the unnecessarily large dataset from 262,144 to 32 rows. Test both values of the configuration parameter innodb_stats_include_delete_marked. Test that purge is updating the statistics. innodb_fts.innodb_fts_multiple_index: Adjust the result. The test is performing a ROLLBACK of an INSERT, which now affects the statistics. include/wait_all_purged.inc: Moved from innodb.innodb_truncate_debug to its own file.
2017-05-16 12:07:26 +03:00
#include "dict0stats.h"
#include "trx0rseg.h"
#include "trx0trx.h"
#include "trx0roll.h"
#include "trx0undo.h"
#include "trx0purge.h"
#include "trx0rec.h"
#include "que0que.h"
#include "row0row.h"
#include "row0upd.h"
#include "row0vers.h"
#include "row0mysql.h"
#include "row0log.h"
#include "log0log.h"
#include "srv0mon.h"
#include "srv0start.h"
#include "handler.h"
#include "ha_innodb.h"
#include "fil0fil.h"
/*************************************************************************
IMPORTANT NOTE: Any operation that generates redo MUST check that there
is enough space in the redo log before for that operation. This is
done by calling log_free_check(). The reason for checking the
availability of the redo log space before the start of the operation is
that we MUST not hold any synchonization objects when performing the
check.
If you make a change in this module make sure that no codepath is
introduced where a call to log_free_check() is bypassed. */
/** Create a purge node to a query graph.
@param[in] parent parent node, i.e., a thr node
@param[in] heap memory heap where created
@return own: purge node */
purge_node_t*
row_purge_node_create(
que_thr_t* parent,
mem_heap_t* heap)
{
purge_node_t* node;
2016-06-21 14:21:03 +02:00
ut_ad(parent != NULL);
ut_ad(heap != NULL);
node = static_cast<purge_node_t*>(
mem_heap_zalloc(heap, sizeof(*node)));
node->common.type = QUE_NODE_PURGE;
node->common.parent = parent;
node->done = TRUE;
node->heap = mem_heap_create(256);
return(node);
}
/***********************************************************//**
Repositions the pcur in the purge node on the clustered index record,
2015-08-03 13:03:47 +02:00
if found. If the record is not found, close pcur.
@return TRUE if the record was found */
static
ibool
row_purge_reposition_pcur(
/*======================*/
ulint mode, /*!< in: latching mode */
purge_node_t* node, /*!< in: row purge node */
mtr_t* mtr) /*!< in: mtr */
{
if (node->found_clust) {
2015-08-03 13:03:47 +02:00
ut_ad(node->validate_pcur());
2015-08-03 13:03:47 +02:00
node->found_clust = btr_pcur_restore_position(mode, &node->pcur, mtr);
} else {
node->found_clust = row_search_on_row_ref(
&node->pcur, mode, node->table, node->ref, mtr);
if (node->found_clust) {
btr_pcur_store_position(&node->pcur, mtr);
}
}
2015-08-03 13:03:47 +02:00
/* Close the current cursor if we fail to position it correctly. */
if (!node->found_clust) {
btr_pcur_close(&node->pcur);
}
return(node->found_clust);
}
/***********************************************************//**
Removes a delete marked clustered index record if possible.
@retval true if the row was not found, or it was successfully removed
@retval false if the row was modified after the delete marking */
2016-06-21 14:21:03 +02:00
static MY_ATTRIBUTE((nonnull, warn_unused_result))
bool
row_purge_remove_clust_if_poss_low(
/*===============================*/
purge_node_t* node, /*!< in/out: row purge node */
ulint mode) /*!< in: BTR_MODIFY_LEAF or BTR_MODIFY_TREE */
{
dict_index_t* index;
bool success = true;
mtr_t mtr;
rec_t* rec;
mem_heap_t* heap = NULL;
ulint* offsets;
ulint offsets_[REC_OFFS_NORMAL_SIZE];
rec_offs_init(offsets_);
ut_ad(rw_lock_own(dict_operation_lock, RW_LOCK_S));
index = dict_table_get_first_index(node->table);
log_free_check();
mtr_start(&mtr);
mtr.set_named_space(index->space);
if (!row_purge_reposition_pcur(mode, node, &mtr)) {
/* The record was already removed. */
goto func_exit;
}
rec = btr_pcur_get_rec(&node->pcur);
offsets = rec_get_offsets(
rec, index, offsets_, ULINT_UNDEFINED, &heap);
if (node->roll_ptr != row_get_rec_roll_ptr(rec, index, offsets)) {
/* Someone else has modified the record later: do not remove */
goto func_exit;
}
ut_ad(rec_get_deleted_flag(rec, rec_offs_comp(offsets)));
/* In delete-marked records, DB_TRX_ID must
always refer to an existing undo log record. */
ut_ad(row_get_rec_trx_id(rec, index, offsets));
if (mode == BTR_MODIFY_LEAF) {
success = btr_cur_optimistic_delete(
btr_pcur_get_btr_cur(&node->pcur), 0, &mtr);
} else {
dberr_t err;
ut_ad(mode == (BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE));
btr_cur_pessimistic_delete(
&err, FALSE, btr_pcur_get_btr_cur(&node->pcur), 0,
false, &mtr);
switch (err) {
case DB_SUCCESS:
break;
case DB_OUT_OF_FILE_SPACE:
success = false;
break;
default:
ut_error;
}
}
func_exit:
if (heap) {
mem_heap_free(heap);
}
2015-08-03 13:03:47 +02:00
/* Persistent cursor is closed if reposition fails. */
if (node->found_clust) {
btr_pcur_commit_specify_mtr(&node->pcur, &mtr);
} else {
mtr_commit(&mtr);
}
return(success);
}
/***********************************************************//**
Removes a clustered index record if it has not been modified after the delete
marking.
@retval true if the row was not found, or it was successfully removed
@retval false the purge needs to be suspended because of running out
of file space. */
2016-06-21 14:21:03 +02:00
static MY_ATTRIBUTE((nonnull, warn_unused_result))
bool
row_purge_remove_clust_if_poss(
/*===========================*/
purge_node_t* node) /*!< in/out: row purge node */
{
if (row_purge_remove_clust_if_poss_low(node, BTR_MODIFY_LEAF)) {
return(true);
}
for (ulint n_tries = 0;
n_tries < BTR_CUR_RETRY_DELETE_N_TIMES;
n_tries++) {
if (row_purge_remove_clust_if_poss_low(
node, BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE)) {
return(true);
}
os_thread_sleep(BTR_CUR_RETRY_SLEEP_TIME);
}
return(false);
}
/***********************************************************//**
Determines if it is possible to remove a secondary index entry.
Removal is possible if the secondary index entry does not refer to any
not delete marked version of a clustered index record where DB_TRX_ID
is newer than the purge view.
NOTE: This function should only be called by the purge thread, only
while holding a latch on the leaf page of the secondary index entry
(or keeping the buffer pool watch on the page). It is possible that
this function first returns true and then false, if a user transaction
inserts a record that the secondary index entry would refer to.
However, in that case, the user transaction would also re-insert the
secondary index entry after purge has removed it and released the leaf
page latch.
@return true if the secondary index record can be purged */
bool
row_purge_poss_sec(
/*===============*/
purge_node_t* node, /*!< in/out: row purge node */
dict_index_t* index, /*!< in: secondary index */
const dtuple_t* entry) /*!< in: secondary index entry */
{
bool can_delete;
mtr_t mtr;
ut_ad(!dict_index_is_clust(index));
mtr_start(&mtr);
can_delete = !row_purge_reposition_pcur(BTR_SEARCH_LEAF, node, &mtr)
|| !row_vers_old_has_index_entry(TRUE,
btr_pcur_get_rec(&node->pcur),
&mtr, index, entry,
node->roll_ptr, node->trx_id);
2015-08-03 13:03:47 +02:00
/* Persistent cursor is closed if reposition fails. */
if (node->found_clust) {
btr_pcur_commit_specify_mtr(&node->pcur, &mtr);
} else {
mtr_commit(&mtr);
}
return(can_delete);
}
/***************************************************************
Removes a secondary index entry if possible, by modifying the
index tree. Does not try to buffer the delete.
@return TRUE if success or if not found */
2016-06-21 14:21:03 +02:00
static MY_ATTRIBUTE((nonnull, warn_unused_result))
ibool
row_purge_remove_sec_if_poss_tree(
/*==============================*/
purge_node_t* node, /*!< in: row purge node */
dict_index_t* index, /*!< in: index */
const dtuple_t* entry) /*!< in: index entry */
{
btr_pcur_t pcur;
btr_cur_t* btr_cur;
ibool success = TRUE;
dberr_t err;
mtr_t mtr;
enum row_search_result search_result;
log_free_check();
mtr_start(&mtr);
mtr.set_named_space(index->space);
if (!index->is_committed()) {
/* The index->online_status may change if the index is
or was being created online, but not committed yet. It
is protected by index->lock. */
mtr_sx_lock(dict_index_get_lock(index), &mtr);
if (dict_index_is_online_ddl(index)) {
/* Online secondary index creation will not
copy any delete-marked records. Therefore
there is nothing to be purged. We must also
skip the purge when a completed index is
dropped by rollback_inplace_alter_table(). */
goto func_exit_no_pcur;
}
} else {
/* For secondary indexes,
index->online_status==ONLINE_INDEX_COMPLETE if
index->is_committed(). */
ut_ad(!dict_index_is_online_ddl(index));
}
search_result = row_search_index_entry(
index, entry,
BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE,
&pcur, &mtr);
switch (search_result) {
case ROW_NOT_FOUND:
/* Not found. This is a legitimate condition. In a
rollback, InnoDB will remove secondary recs that would
be purged anyway. Then the actual purge will not find
the secondary index record. Also, the purge itself is
eager: if it comes to consider a secondary index
record, and notices it does not need to exist in the
index, it will remove it. Then if/when the purge
comes to consider the secondary index record a second
time, it will not exist any more in the index. */
/* fputs("PURGE:........sec entry not found\n", stderr); */
/* dtuple_print(stderr, entry); */
goto func_exit;
case ROW_FOUND:
break;
case ROW_BUFFERED:
case ROW_NOT_DELETED_REF:
/* These are invalid outcomes, because the mode passed
to row_search_index_entry() did not include any of the
flags BTR_INSERT, BTR_DELETE, or BTR_DELETE_MARK. */
ut_error;
}
btr_cur = btr_pcur_get_btr_cur(&pcur);
/* We should remove the index record if no later version of the row,
which cannot be purged yet, requires its existence. If some requires,
we should do nothing. */
if (row_purge_poss_sec(node, index, entry)) {
/* Remove the index record, which should have been
marked for deletion. */
2014-11-18 17:41:12 +01:00
if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
dict_table_is_comp(index->table))) {
ib::error()
<< "tried to purge non-delete-marked record"
" in index " << index->name
<< " of table " << index->table->name
<< ": tuple: " << *entry
<< ", record: " << rec_index_print(
btr_cur_get_rec(btr_cur), index);
2014-11-18 17:41:12 +01:00
ut_ad(0);
goto func_exit;
}
btr_cur_pessimistic_delete(&err, FALSE, btr_cur, 0,
false, &mtr);
switch (UNIV_EXPECT(err, DB_SUCCESS)) {
case DB_SUCCESS:
break;
case DB_OUT_OF_FILE_SPACE:
success = FALSE;
break;
default:
ut_error;
}
}
func_exit:
btr_pcur_close(&pcur);
func_exit_no_pcur:
mtr_commit(&mtr);
return(success);
}
/***************************************************************
Removes a secondary index entry without modifying the index tree,
if possible.
@retval true if success or if not found
@retval false if row_purge_remove_sec_if_poss_tree() should be invoked */
2016-06-21 14:21:03 +02:00
static MY_ATTRIBUTE((nonnull, warn_unused_result))
bool
row_purge_remove_sec_if_poss_leaf(
/*==============================*/
purge_node_t* node, /*!< in: row purge node */
dict_index_t* index, /*!< in: index */
const dtuple_t* entry) /*!< in: index entry */
{
mtr_t mtr;
btr_pcur_t pcur;
enum btr_latch_mode mode;
enum row_search_result search_result;
bool success = true;
log_free_check();
ut_ad(index->table == node->table);
ut_ad(!dict_table_is_temporary(index->table));
mtr_start(&mtr);
mtr.set_named_space(index->space);
if (!index->is_committed()) {
/* For uncommitted spatial index, we also skip the purge. */
if (dict_index_is_spatial(index)) {
goto func_exit_no_pcur;
}
/* The index->online_status may change if the the
index is or was being created online, but not
committed yet. It is protected by index->lock. */
mtr_s_lock(dict_index_get_lock(index), &mtr);
if (dict_index_is_online_ddl(index)) {
/* Online secondary index creation will not
copy any delete-marked records. Therefore
there is nothing to be purged. We must also
skip the purge when a completed index is
dropped by rollback_inplace_alter_table(). */
goto func_exit_no_pcur;
}
mode = BTR_PURGE_LEAF_ALREADY_S_LATCHED;
} else {
/* For secondary indexes,
index->online_status==ONLINE_INDEX_COMPLETE if
index->is_committed(). */
ut_ad(!dict_index_is_online_ddl(index));
/* Change buffering is disabled for spatial index. */
mode = dict_index_is_spatial(index)
? BTR_MODIFY_LEAF
: BTR_PURGE_LEAF;
}
/* Set the purge node for the call to row_purge_poss_sec(). */
pcur.btr_cur.purge_node = node;
if (dict_index_is_spatial(index)) {
rw_lock_sx_lock(dict_index_get_lock(index));
pcur.btr_cur.thr = NULL;
} else {
/* Set the query thread, so that ibuf_insert_low() will be
able to invoke thd_get_trx(). */
pcur.btr_cur.thr = static_cast<que_thr_t*>(
que_node_get_parent(node));
}
search_result = row_search_index_entry(
index, entry, mode, &pcur, &mtr);
if (dict_index_is_spatial(index)) {
rw_lock_sx_unlock(dict_index_get_lock(index));
}
switch (search_result) {
case ROW_FOUND:
/* Before attempting to purge a record, check
if it is safe to do so. */
if (row_purge_poss_sec(node, index, entry)) {
btr_cur_t* btr_cur = btr_pcur_get_btr_cur(&pcur);
/* Only delete-marked records should be purged. */
2014-11-18 17:41:12 +01:00
if (!rec_get_deleted_flag(
btr_cur_get_rec(btr_cur),
dict_table_is_comp(index->table))) {
ib::error()
<< "tried to purge non-delete-marked"
" record" " in index " << index->name
<< " of table " << index->table->name
<< ": tuple: " << *entry
<< ", record: "
<< rec_index_print(
btr_cur_get_rec(btr_cur),
index);
2014-11-18 17:41:12 +01:00
ut_ad(0);
btr_pcur_close(&pcur);
goto func_exit_no_pcur;
}
if (dict_index_is_spatial(index)) {
const page_t* page;
const trx_t* trx = NULL;
if (btr_cur->rtr_info != NULL
&& btr_cur->rtr_info->thr != NULL) {
trx = thr_get_trx(
btr_cur->rtr_info->thr);
}
page = btr_cur_get_page(btr_cur);
if (!lock_test_prdt_page_lock(
trx,
page_get_space_id(page),
page_get_page_no(page))
&& page_get_n_recs(page) < 2
&& btr_cur_get_block(btr_cur)
->page.id.page_no() !=
dict_index_get_page(index)) {
/* this is the last record on page,
and it has a "page" lock on it,
which mean search is still depending
on it, so do not delete */
DBUG_LOG("purge",
"skip purging last"
" record on page "
<< btr_cur_get_block(btr_cur)
->page.id);
btr_pcur_close(&pcur);
mtr_commit(&mtr);
return(success);
}
}
if (!btr_cur_optimistic_delete(btr_cur, 0, &mtr)) {
/* The index entry could not be deleted. */
success = false;
}
}
/* (The index entry is still needed,
or the deletion succeeded) */
/* fall through */
case ROW_NOT_DELETED_REF:
/* The index entry is still needed. */
case ROW_BUFFERED:
/* The deletion was buffered. */
case ROW_NOT_FOUND:
/* The index entry does not exist, nothing to do. */
btr_pcur_close(&pcur);
func_exit_no_pcur:
mtr_commit(&mtr);
return(success);
}
ut_error;
return(false);
}
/***********************************************************//**
Removes a secondary index entry if possible. */
2016-06-21 14:21:03 +02:00
UNIV_INLINE MY_ATTRIBUTE((nonnull(1,2)))
void
row_purge_remove_sec_if_poss(
/*=========================*/
purge_node_t* node, /*!< in: row purge node */
dict_index_t* index, /*!< in: index */
const dtuple_t* entry) /*!< in: index entry */
{
ibool success;
ulint n_tries = 0;
/* fputs("Purge: Removing secondary record\n", stderr); */
if (!entry) {
/* The node->row must have lacked some fields of this
index. This is possible when the undo log record was
written before this index was created. */
return;
}
if (row_purge_remove_sec_if_poss_leaf(node, index, entry)) {
return;
}
retry:
success = row_purge_remove_sec_if_poss_tree(node, index, entry);
/* The delete operation may fail if we have little
file space left: TODO: easiest to crash the database
and restart with more file space */
if (!success && n_tries < BTR_CUR_RETRY_DELETE_N_TIMES) {
n_tries++;
os_thread_sleep(BTR_CUR_RETRY_SLEEP_TIME);
goto retry;
}
ut_a(success);
}
/** Skip uncommitted virtual indexes on newly added virtual column.
@param[in,out] index dict index object */
static
inline
void
row_purge_skip_uncommitted_virtual_index(
dict_index_t*& index)
{
/* We need to skip virtual indexes which is not
committed yet. It's safe because these indexes are
newly created by alter table, and because we do
not support LOCK=NONE when adding an index on newly
added virtual column.*/
while (index != NULL && dict_index_has_virtual(index)
&& !index->is_committed() && index->has_new_v_col) {
index = dict_table_get_next_index(index);
}
}
/***********************************************************//**
Purges a delete marking of a record.
@retval true if the row was not found, or it was successfully removed
@retval false the purge needs to be suspended because of
running out of file space */
2016-06-21 14:21:03 +02:00
static MY_ATTRIBUTE((nonnull, warn_unused_result))
bool
row_purge_del_mark(
/*===============*/
purge_node_t* node) /*!< in/out: row purge node */
{
mem_heap_t* heap;
heap = mem_heap_create(1024);
while (node->index != NULL) {
/* skip corrupted secondary index */
dict_table_skip_corrupt_index(node->index);
row_purge_skip_uncommitted_virtual_index(node->index);
if (!node->index) {
break;
}
if (node->index->type != DICT_FTS) {
dtuple_t* entry = row_build_index_entry_low(
node->row, NULL, node->index,
heap, ROW_BUILD_FOR_PURGE);
row_purge_remove_sec_if_poss(node, node->index, entry);
mem_heap_empty(heap);
}
node->index = dict_table_get_next_index(node->index);
}
mem_heap_free(heap);
return(row_purge_remove_clust_if_poss(node));
}
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
/** Reset DB_TRX_ID, DB_ROLL_PTR of a clustered index record
whose old history can no longer be observed.
@param[in,out] node purge node
@param[in,out] mtr mini-transaction (will be started and committed) */
static
void
row_purge_reset_trx_id(purge_node_t* node, mtr_t* mtr)
{
ut_ad(rw_lock_own(dict_operation_lock, RW_LOCK_S));
/* Reset DB_TRX_ID, DB_ROLL_PTR for old records. */
mtr->start();
if (row_purge_reposition_pcur(BTR_MODIFY_LEAF, node, mtr)) {
dict_index_t* index = dict_table_get_first_index(
node->table);
ulint trx_id_pos = index->n_uniq ? index->n_uniq : 1;
rec_t* rec = btr_pcur_get_rec(&node->pcur);
mem_heap_t* heap = NULL;
/* Reserve enough offsets for the PRIMARY KEY and 2 columns
so that we can access DB_TRX_ID, DB_ROLL_PTR. */
ulint offsets_[REC_OFFS_HEADER_SIZE + MAX_REF_PARTS + 2];
rec_offs_init(offsets_);
ulint* offsets = rec_get_offsets(
rec, index, offsets_, trx_id_pos + 2, &heap);
ut_ad(heap == NULL);
ut_ad(dict_index_get_nth_field(index, trx_id_pos)
->col->mtype == DATA_SYS);
ut_ad(dict_index_get_nth_field(index, trx_id_pos)
->col->prtype == (DATA_TRX_ID | DATA_NOT_NULL));
ut_ad(dict_index_get_nth_field(index, trx_id_pos + 1)
->col->mtype == DATA_SYS);
ut_ad(dict_index_get_nth_field(index, trx_id_pos + 1)
->col->prtype == (DATA_ROLL_PTR | DATA_NOT_NULL));
/* Only update the record if DB_ROLL_PTR matches (the
record has not been modified after this transaction
became purgeable) */
if (node->roll_ptr
== row_get_rec_roll_ptr(rec, index, offsets)) {
ut_ad(!rec_get_deleted_flag(rec,
rec_offs_comp(offsets)));
mtr->set_named_space(index->space);
if (page_zip_des_t* page_zip
= buf_block_get_page_zip(
btr_pcur_get_block(&node->pcur))) {
page_zip_write_trx_id_and_roll_ptr(
page_zip, rec, offsets, trx_id_pos,
0, 1ULL << ROLL_PTR_INSERT_FLAG_POS,
mtr);
} else {
ulint len;
byte* ptr = rec_get_nth_field(
rec, offsets, trx_id_pos, &len);
ut_ad(len == DATA_TRX_ID_LEN);
memset(ptr, 0, DATA_TRX_ID_LEN
+ DATA_ROLL_PTR_LEN);
ptr[DATA_TRX_ID_LEN] = 1U
<< (ROLL_PTR_INSERT_FLAG_POS - CHAR_BIT
* (DATA_ROLL_PTR_LEN - 1));
mlog_log_string(ptr, DATA_TRX_ID_LEN
+ DATA_ROLL_PTR_LEN, mtr);
}
}
}
mtr->commit();
}
/***********************************************************//**
Purges an update of an existing record. Also purges an update of a delete
marked record if that record contained an externally stored field. */
static
void
row_purge_upd_exist_or_extern_func(
/*===============================*/
#ifdef UNIV_DEBUG
const que_thr_t*thr, /*!< in: query thread */
#endif /* UNIV_DEBUG */
purge_node_t* node, /*!< in: row purge node */
trx_undo_rec_t* undo_rec) /*!< in: record to purge */
{
mem_heap_t* heap;
ut_ad(rw_lock_own(dict_operation_lock, RW_LOCK_S));
if (node->rec_type == TRX_UNDO_UPD_DEL_REC
|| (node->cmpl_info & UPD_NODE_NO_ORD_CHANGE)) {
goto skip_secondaries;
}
heap = mem_heap_create(1024);
while (node->index != NULL) {
dict_table_skip_corrupt_index(node->index);
row_purge_skip_uncommitted_virtual_index(node->index);
if (!node->index) {
break;
}
if (row_upd_changes_ord_field_binary(node->index, node->update,
thr, NULL, NULL)) {
/* Build the older version of the index entry */
dtuple_t* entry = row_build_index_entry_low(
node->row, NULL, node->index,
heap, ROW_BUILD_FOR_PURGE);
row_purge_remove_sec_if_poss(node, node->index, entry);
mem_heap_empty(heap);
}
node->index = dict_table_get_next_index(node->index);
}
mem_heap_free(heap);
skip_secondaries:
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
mtr_t mtr;
dict_index_t* index = dict_table_get_first_index(node->table);
/* Free possible externally stored fields */
for (ulint i = 0; i < upd_get_n_fields(node->update); i++) {
const upd_field_t* ufield
= upd_get_nth_field(node->update, i);
if (dfield_is_ext(&ufield->new_val)) {
trx_rseg_t* rseg;
buf_block_t* block;
ulint internal_offset;
byte* data_field;
ibool is_insert;
ulint rseg_id;
ulint page_no;
ulint offset;
/* We use the fact that new_val points to
undo_rec and get thus the offset of
dfield data inside the undo record. Then we
can calculate from node->roll_ptr the file
address of the new_val data */
internal_offset
= ((const byte*)
dfield_get_data(&ufield->new_val))
- undo_rec;
ut_a(internal_offset < UNIV_PAGE_SIZE);
trx_undo_decode_roll_ptr(node->roll_ptr,
&is_insert, &rseg_id,
&page_no, &offset);
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
2017-03-30 13:11:34 +03:00
rseg = trx_sys->rseg_array[rseg_id];
ut_a(rseg != NULL);
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
2017-03-30 13:11:34 +03:00
ut_ad(rseg->id == rseg_id);
ut_ad(rseg->is_persistent());
mtr_start(&mtr);
/* We have to acquire an SX-latch to the clustered
index tree (exclude other tree changes) */
mtr_sx_lock(dict_index_get_lock(index), &mtr);
mtr.set_named_space(index->space);
/* NOTE: we must also acquire an X-latch to the
root page of the tree. We will need it when we
free pages from the tree. If the tree is of height 1,
the tree X-latch does NOT protect the root page,
because it is also a leaf page. Since we will have a
latch on an undo log page, we would break the
latching order if we would only later latch the
root page of such a tree! */
btr_root_get(index, &mtr);
block = buf_page_get(
page_id_t(rseg->space, page_no),
univ_page_size, RW_X_LATCH, &mtr);
buf_block_dbg_add_level(block, SYNC_TRX_UNDO_PAGE);
data_field = buf_block_get_frame(block)
+ offset + internal_offset;
ut_a(dfield_get_len(&ufield->new_val)
>= BTR_EXTERN_FIELD_REF_SIZE);
btr_free_externally_stored_field(
index,
data_field + dfield_get_len(&ufield->new_val)
- BTR_EXTERN_FIELD_REF_SIZE,
NULL, NULL, NULL, 0, false, &mtr);
mtr_commit(&mtr);
}
}
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
row_purge_reset_trx_id(node, &mtr);
}
#ifdef UNIV_DEBUG
# define row_purge_upd_exist_or_extern(thr,node,undo_rec) \
row_purge_upd_exist_or_extern_func(thr,node,undo_rec)
#else /* UNIV_DEBUG */
# define row_purge_upd_exist_or_extern(thr,node,undo_rec) \
row_purge_upd_exist_or_extern_func(node,undo_rec)
#endif /* UNIV_DEBUG */
/***********************************************************//**
Parses the row reference and other info in a modify undo log record.
@return true if purge operation required */
static
bool
row_purge_parse_undo_rec(
/*=====================*/
purge_node_t* node, /*!< in: row undo node */
trx_undo_rec_t* undo_rec, /*!< in: record to purge */
bool* updated_extern, /*!< out: true if an externally
stored field was updated */
que_thr_t* thr) /*!< in: query thread */
{
dict_index_t* clust_index;
byte* ptr;
undo_no_t undo_no;
table_id_t table_id;
roll_ptr_t roll_ptr;
ulint info_bits;
ulint type;
2016-06-21 14:21:03 +02:00
ut_ad(node != NULL);
ut_ad(thr != NULL);
ptr = trx_undo_rec_get_pars(
undo_rec, &type, &node->cmpl_info,
updated_extern, &undo_no, &table_id);
node->rec_type = type;
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
switch (type) {
case TRX_UNDO_INSERT_REC:
break;
default:
#ifdef UNIV_DEBUG
ut_ad(0);
return false;
case TRX_UNDO_UPD_DEL_REC:
case TRX_UNDO_UPD_EXIST_REC:
case TRX_UNDO_DEL_MARK_REC:
#endif /* UNIV_DEBUG */
ptr = trx_undo_update_rec_get_sys_cols(ptr, &node->trx_id,
&roll_ptr, &info_bits);
break;
}
/* Prevent DROP TABLE etc. from running when we are doing the purge
for this row */
try_again:
rw_lock_s_lock_inline(dict_operation_lock, 0, __FILE__, __LINE__);
node->table = dict_table_open_on_id(
table_id, FALSE, DICT_TABLE_OP_NORMAL);
if (node->table == NULL) {
/* The table has been dropped: no need to do purge */
goto err_exit;
}
ut_ad(!dict_table_is_temporary(node->table));
if (!fil_table_accessible(node->table)) {
dict_table_close(node->table, FALSE, FALSE);
node->table = NULL;
goto err_exit;
}
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
if (type != TRX_UNDO_INSERT_REC
&& node->table->n_v_cols && !node->table->vc_templ
&& dict_table_has_indexed_v_cols(node->table)) {
/* Need server fully up for virtual column computation */
if (!mysqld_server_started) {
dict_table_close(node->table, FALSE, FALSE);
rw_lock_s_unlock(dict_operation_lock);
if (srv_shutdown_state != SRV_SHUTDOWN_NONE) {
return(false);
}
os_thread_sleep(1000000);
goto try_again;
}
/* Initialize the template for the table */
innobase_init_vc_templ(node->table);
}
clust_index = dict_table_get_first_index(node->table);
if (clust_index == NULL
|| dict_index_is_corrupted(clust_index)) {
/* The table was corrupt in the data dictionary.
dict_set_corrupted() works on an index, and
we do not have an index to call it with. */
dict_table_close(node->table, FALSE, FALSE);
err_exit:
rw_lock_s_unlock(dict_operation_lock);
return(false);
}
ptr = trx_undo_rec_get_row_ref(ptr, clust_index, &(node->ref),
node->heap);
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
if (type == TRX_UNDO_INSERT_REC) {
return(true);
}
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
ptr = trx_undo_update_rec_get_update(ptr, clust_index, type,
node->trx_id,
roll_ptr, info_bits,
thr_get_trx(thr),
node->heap, &(node->update));
/* Read to the partial row the fields that occur in indexes */
if (!(node->cmpl_info & UPD_NODE_NO_ORD_CHANGE)) {
ptr = trx_undo_rec_get_partial_row(
ptr, clust_index, &node->row,
type == TRX_UNDO_UPD_DEL_REC,
node->heap);
}
return(true);
}
/***********************************************************//**
Purges the parsed record.
@return true if purged, false if skipped */
2016-06-21 14:21:03 +02:00
static MY_ATTRIBUTE((nonnull, warn_unused_result))
bool
row_purge_record_func(
/*==================*/
purge_node_t* node, /*!< in: row purge node */
trx_undo_rec_t* undo_rec, /*!< in: record to purge */
#ifdef UNIV_DEBUG
const que_thr_t*thr, /*!< in: query thread */
#endif /* UNIV_DEBUG */
bool updated_extern) /*!< in: whether external columns
were updated */
{
dict_index_t* clust_index;
bool purged = true;
2015-08-03 13:03:47 +02:00
ut_ad(!node->found_clust);
clust_index = dict_table_get_first_index(node->table);
node->index = dict_table_get_next_index(clust_index);
ut_ad(!trx_undo_roll_ptr_is_insert(node->roll_ptr));
switch (node->rec_type) {
case TRX_UNDO_DEL_MARK_REC:
purged = row_purge_del_mark(node);
MDEV-12698 innodb.innodb_stats_del_mark test failure In my merge of the MySQL fix for Oracle Bug#23333990 / WL#9513 I overlooked some subsequent revisions to the test, and I also failed to notice that the test is actually always failing. Oracle introduced the parameter innodb_stats_include_delete_marked but failed to consistently take it into account in FOREIGN KEY constraints that involve CASCADE or SET NULL. When innodb_stats_include_delete_marked=ON, obviously the purge of delete-marked records should update the statistics as well. One more omission was that statistics were never updated on ROLLBACK. We are fixing that as well, properly taking into account the parameter innodb_stats_include_delete_marked. dict_stats_analyze_index_level(): Simplify an expression. (Using the ternary operator with a constant operand is unnecessary obfuscation.) page_scan_method_t: Revert the change done by Oracle. Instead, examine srv_stats_include_delete_marked directly where it is needed. dict_stats_update_if_needed(): Renamed from row_update_statistics_if_needed(). row_update_for_mysql_using_upd_graph(): Assert that the table statistics are initialized, as guaranteed by ha_innobase::open(). Update the statistics in a consistent way, both for FOREIGN KEY triggers and for the main table. If FOREIGN KEY constraints exist, do not dereference a freed pointer, but cache the proper value of node->is_delete so that it matches prebuilt->table. row_purge_record_func(): Update statistics if innodb_stats_include_delete_marked=ON. row_undo_ins(): Update statistics (on ROLLBACK of a fresh INSERT). This is independent of the parameter; the record is not delete-marked. row_undo_mod(): Update statistics on the ROLLBACK of updating key columns, or (if innodb_stats_include_delete_marked=OFF) updating delete-marks. innodb.innodb_stats_persistent: Renamed and extended from innodb.innodb_stats_del_mark. Reduced the unnecessarily large dataset from 262,144 to 32 rows. Test both values of the configuration parameter innodb_stats_include_delete_marked. Test that purge is updating the statistics. innodb_fts.innodb_fts_multiple_index: Adjust the result. The test is performing a ROLLBACK of an INSERT, which now affects the statistics. include/wait_all_purged.inc: Moved from innodb.innodb_truncate_debug to its own file.
2017-05-16 12:07:26 +03:00
if (purged) {
if (node->table->stat_initialized
&& srv_stats_include_delete_marked) {
dict_stats_update_if_needed(node->table);
}
MONITOR_INC(MONITOR_N_DEL_ROW_PURGE);
}
break;
2017-08-15 17:18:55 +03:00
case TRX_UNDO_INSERT_REC:
node->roll_ptr |= 1ULL << ROLL_PTR_INSERT_FLAG_POS;
/* fall through */
default:
if (!updated_extern) {
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
2017-07-07 13:08:16 +03:00
mtr_t mtr;
row_purge_reset_trx_id(node, &mtr);
break;
}
/* fall through */
case TRX_UNDO_UPD_EXIST_REC:
row_purge_upd_exist_or_extern(thr, node, undo_rec);
MONITOR_INC(MONITOR_N_UPD_EXIST_EXTERN);
break;
}
if (node->found_clust) {
btr_pcur_close(&node->pcur);
node->found_clust = FALSE;
}
if (node->table != NULL) {
dict_table_close(node->table, FALSE, FALSE);
node->table = NULL;
}
return(purged);
}
#ifdef UNIV_DEBUG
# define row_purge_record(node,undo_rec,thr,updated_extern) \
row_purge_record_func(node,undo_rec,thr,updated_extern)
#else /* UNIV_DEBUG */
# define row_purge_record(node,undo_rec,thr,updated_extern) \
row_purge_record_func(node,undo_rec,updated_extern)
#endif /* UNIV_DEBUG */
/***********************************************************//**
Fetches an undo log record and does the purge for the recorded operation.
If none left, or the current purge completed, returns the control to the
parent node, which is always a query thread node. */
2016-06-21 14:21:03 +02:00
static MY_ATTRIBUTE((nonnull))
void
row_purge(
/*======*/
purge_node_t* node, /*!< in: row purge node */
trx_undo_rec_t* undo_rec, /*!< in: record to purge */
que_thr_t* thr) /*!< in: query thread */
{
if (undo_rec != &trx_purge_dummy_rec) {
bool updated_extern;
while (row_purge_parse_undo_rec(
node, undo_rec, &updated_extern, thr)) {
bool purged = row_purge_record(
node, undo_rec, thr, updated_extern);
rw_lock_s_unlock(dict_operation_lock);
if (purged
|| srv_shutdown_state != SRV_SHUTDOWN_NONE) {
return;
}
/* Retry the purge in a second. */
os_thread_sleep(1000000);
}
}
}
/***********************************************************//**
Reset the purge query thread. */
UNIV_INLINE
void
row_purge_end(
/*==========*/
que_thr_t* thr) /*!< in: query thread */
{
purge_node_t* node;
ut_ad(thr);
node = static_cast<purge_node_t*>(thr->run_node);
ut_ad(que_node_get_type(node) == QUE_NODE_PURGE);
thr->run_node = que_node_get_parent(node);
node->undo_recs = NULL;
node->done = TRUE;
ut_a(thr->run_node != NULL);
mem_heap_empty(node->heap);
}
/***********************************************************//**
Does the purge operation for a single undo log record. This is a high-level
function used in an SQL execution graph.
@return query thread to run next or NULL */
que_thr_t*
row_purge_step(
/*===========*/
que_thr_t* thr) /*!< in: query thread */
{
purge_node_t* node;
ut_ad(thr);
node = static_cast<purge_node_t*>(thr->run_node);
node->table = NULL;
node->row = NULL;
node->ref = NULL;
node->index = NULL;
node->update = NULL;
node->found_clust = FALSE;
node->rec_type = ULINT_UNDEFINED;
node->cmpl_info = ULINT_UNDEFINED;
ut_a(!node->done);
ut_ad(que_node_get_type(node) == QUE_NODE_PURGE);
if (!(node->undo_recs == NULL || ib_vector_is_empty(node->undo_recs))) {
trx_purge_rec_t*purge_rec;
purge_rec = static_cast<trx_purge_rec_t*>(
ib_vector_pop(node->undo_recs));
node->roll_ptr = purge_rec->roll_ptr;
row_purge(node, purge_rec->undo_rec, thr);
if (ib_vector_is_empty(node->undo_recs)) {
row_purge_end(thr);
} else {
thr->run_node = node;
}
} else {
row_purge_end(thr);
}
innobase_reset_background_thd(thr_get_trx(thr)->mysql_thd);
return(thr);
}
2015-08-03 13:03:47 +02:00
#ifdef UNIV_DEBUG
/***********************************************************//**
Validate the persisent cursor. The purge node has two references
to the clustered index record - one via the ref member, and the
other via the persistent cursor. These two references must match
each other if the found_clust flag is set.
@return true if the stored copy of persistent cursor is consistent
with the ref member.*/
bool
purge_node_t::validate_pcur()
{
if (!found_clust) {
return(true);
}
if (index == NULL) {
return(true);
}
if (index->type == DICT_FTS) {
return(true);
}
if (!pcur.old_stored) {
2015-08-03 13:03:47 +02:00
return(true);
}
dict_index_t* clust_index = pcur.btr_cur.index;
2015-08-03 13:03:47 +02:00
ulint* offsets = rec_get_offsets(
pcur.old_rec, clust_index, NULL, pcur.old_n_fields, &heap);
2015-08-03 13:03:47 +02:00
/* Here we are comparing the purge ref record and the stored initial
part in persistent cursor. Both cases we store n_uniq fields of the
cluster index and so it is fine to do the comparison. We note this
dependency here as pcur and ref belong to different modules. */
int st = cmp_dtuple_rec(ref, pcur.old_rec, offsets);
if (st != 0) {
ib::error() << "Purge node pcur validation failed";
ib::error() << rec_printer(ref).str();
ib::error() << rec_printer(pcur.old_rec, offsets).str();
2015-08-03 13:03:47 +02:00
return(false);
}
return(true);
}
#endif /* UNIV_DEBUG */