mariadb/sql/sql_rename.cc
Monty e62dc52420 MDEV-25292 Atomic CREATE OR REPLACE TABLE
Atomic CREATE OR REPLACE keeps the old table intact if the command
fails or the server crashes. This is done by renaming the original
table to a temporary name as a backup and restoring it if the CREATE
fails. When the command is complete and logged, the backup table is
deleted.

Atomic replace algorithm

  Two DDL chains are used for CREATE OR REPLACE:
  ddl_log_state_create (C) and ddl_log_state_rm (D).

  1. (C) Log rename of ORIG to TMP (on recovery: rename TMP back to ORIG).
  2. Rename ORIG to TMP.
  3. (C) Log CREATE_TABLE_ACTION of ORIG (on recovery: drops ORIG).
  4. Do everything with ORIG (like insert data).
  5. (D) Log drop of TMP
  6. Write query to binlog (this marks (C) to be closed in
     case of failure)
  7. Execute drop of TMP through (D)
  8. Close (C) and (D)

  If there is a failure before 6), we revert the changes in (C).
  Chain (D) is only executed if 6) succeeded (C is closed on
  crash recovery).
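
  The steps above can be illustrated with a toy model. This is only a
  sketch under simplifying assumptions, not the server code: the
  catalog is a plain map, the two chains are reduced to flags, and the
  names Catalog and create_or_replace are hypothetical. It shows that a
  crash before step 6 leaves the old table in place, while a crash at
  or after step 7 leaves only the new table.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy catalog: table name -> contents.
using Catalog = std::map<std::string, std::string>;

// Runs steps 1..8 on a copy of the catalog, "crashing" right before
// step `crash_before` (pass 9 for no crash), then applies a simplified
// recovery and returns the resulting catalog.
Catalog create_or_replace(Catalog cat, int crash_before)
{
  bool c_rename_logged= false;  // (C): on recovery rename TMP back to ORIG
  bool c_create_logged= false;  // (C): on recovery drop the new ORIG
  bool d_drop_logged= false;    // (D): drop TMP
  bool binlogged= false;        // step 6 done, (C) is closed on recovery

  for (int step= 1; step <= 8 && step < crash_before; step++)
  {
    switch (step) {
    case 1: c_rename_logged= true; break;
    case 2:
    {
      std::string old_data= cat.at("ORIG");
      cat.erase("ORIG");
      cat["TMP"]= old_data;
      break;
    }
    case 3: c_create_logged= true; break;
    case 4: cat["ORIG"]= "new"; break;
    case 5: d_drop_logged= true; break;
    case 6: binlogged= true; break;
    case 7: cat.erase("TMP"); d_drop_logged= false; break;
    case 8: c_rename_logged= c_create_logged= false; break;
    }
  }

  // Simplified recovery
  if (!binlogged)
  {
    // Revert (C) in reverse order of logging
    if (c_create_logged)
      cat.erase("ORIG");
    if (c_rename_logged && cat.count("TMP"))
    {
      cat["ORIG"]= cat.at("TMP");
      cat.erase("TMP");
    }
  }
  else if (d_drop_logged)
    cat.erase("TMP");           // (C) is closed, execute (D)
  return cat;
}
```

  Calling create_or_replace({{"ORIG", "old"}}, n) for n in 1..6 models
  a failure before the binlog write; n in 7..9 models a failure after
  it or a clean run.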

Foreign key errors are detected at step 1).

Additional notes

  - CREATE TABLE without REPLACE and CREATE of temporary tables are
    not affected by this commit.
    set @@drop_before_create_or_replace=1 can be used to
    get the old behaviour where existing tables are dropped
    in CREATE OR REPLACE.

  - CREATE TABLE is reverted if binlogging the query fails.

  - Engines having HTON_EXPENSIVE_RENAME flag set are not affected by
    this commit. Conflicting tables marked with this flag will be
    deleted with CREATE OR REPLACE.

  - Replication execution is not affected by this commit.
    - Replication will first drop the conflicting table and then
      create the new one.

  - CREATE TABLE .. SELECT XID usage is fixed and now there is no need
    to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see comments in
    do_postlock()). XID is now correctly updated so it disables
    DDL_LOG_DROP_TABLE_ACTION. Note that binary log is flushed at the
    final stage when the table is ready. So if we have XID in the
    binary log we don't need to drop the table.

  - Three variations of CREATE OR REPLACE handled:

    1. CREATE OR REPLACE TABLE t1 (..);
    2. CREATE OR REPLACE TABLE t1 LIKE t2;
    3. CREATE OR REPLACE TABLE t1 SELECT ..;

  - Test case uses 6 combinations for engines (aria, aria_notrans,
    myisam, ib, lock_tables, expensive_rename) and 2 combinations for
    binlog types (row, stmt). Combinations help to check differences
    between the results. Error failures are tested for the above three
    variations.

  - expensive_rename tests CREATE OR REPLACE without atomic
    replace. The effect should be the same as with the old behaviour
    before this commit.

  - Triggers mechanism is unaffected by this change. This is tested in
    create_replace.test.

  - LOCK TABLES is affected. Lock restoration must be done after the
    new table is created or TMP is renamed back to ORIG.

  - Moved ddl_log_complete() from send_eof() to finalize_ddl(). This
    checkpoint was not executed before for normal CREATE TABLE but is
    executed now.

  - CREATE TABLE will now also roll back if writing to the binary
    log fails. See rpl_gtid_strict.test

backup ddl log changes

- In case of a successful CREATE OR REPLACE we only log
  the CREATE event, not the DROP TABLE event of the old table.

ddl_log.cc changes

  ddl_log_execute_action() now properly returns error conditions.
  ddl_log_disable_entry() added to allow one to disable one entry.
  The entry on disk is still reserved until ddl_log_complete() is
  executed.

On XID usage

  Like with all other atomic DDL operations XID is used to avoid
  inconsistency between master and slave in the case of a crash after
  binary log is written and before ddl_log_state_create is closed. On
  recovery XIDs are taken from binary log and corresponding DDL log
  events get disabled.  That is done by
  ddl_log_close_binlogged_events().
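
  As a rough illustration of this recovery rule (a sketch only; the
  struct DdlEntry and the function close_binlogged_entries are
  hypothetical stand-ins and do not mirror the real
  ddl_log_close_binlogged_events() signature): every active DDL log
  entry whose XID is present in the binary log is disabled, so recovery
  will not revert an operation that was already binlogged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified DDL log entry
struct DdlEntry
{
  std::uint64_t xid;   // 0 means "no XID assigned"
  bool active;         // active entries are executed on recovery
};

// Disable every active entry whose XID was found in the binary log.
// Returns the number of entries that were disabled.
std::size_t close_binlogged_entries(std::vector<DdlEntry> &log,
                                    const std::set<std::uint64_t> &binlog_xids)
{
  std::size_t closed= 0;
  for (DdlEntry &entry : log)
  {
    if (entry.active && entry.xid != 0 && binlog_xids.count(entry.xid))
    {
      entry.active= false;
      closed++;
    }
  }
  return closed;
}
```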

On linking two chains together

  Chains are executed in the ascending order of entry_pos of execute
  entries. But entry_pos assignment order is undefined: it may assign
  bigger number for the first chain and then smaller number for the
  second chain. So the execution order in that case will be reverse:
  second chain will be executed first.

  To avoid that we link one chain to another. While the base chain
  (ddl_log_state_create) is active the secondary chain
  (ddl_log_state_rm) is not executed. That is: only one chain can be
  executed in two linked chains.

  The interface ddl_log_link_chains() was defined in "MDEV-22166
  ddl_log_write_execute_entry() extension".
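
  The effect of linking can be sketched as follows. This is a
  simplified model, not the ddl_log.cc implementation: the struct
  Chain, its field master and the function recovery_order are
  hypothetical names used only for illustration. Chains are visited in
  ascending entry_pos, and a linked chain is skipped while its base
  chain is still active.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an execute entry heading a DDL log chain
struct Chain
{
  std::string name;
  unsigned entry_pos;   // position of the execute entry in the log
  bool active;          // inactive chains are ignored by recovery
  int master;           // index of the base chain, or -1 if not linked
};

// Returns the names of the chains recovery would execute, in order.
std::vector<std::string> recovery_order(const std::vector<Chain> &chains)
{
  std::vector<const Chain*> sorted;
  for (const Chain &c : chains)
    sorted.push_back(&c);
  std::sort(sorted.begin(), sorted.end(),
            [](const Chain *a, const Chain *b)
            { return a->entry_pos < b->entry_pos; });

  std::vector<std::string> executed;
  for (const Chain *c : sorted)
  {
    if (!c->active)
      continue;
    // A linked chain must wait while its base chain is still active
    if (c->master >= 0 && chains[c->master].active)
      continue;
    executed.push_back(c->name);
  }
  return executed;
}
```

  With {"create", 7, true, -1} and {"rm", 3, true, 0} only "create" is
  executed, even though "rm" has the smaller entry_pos. Without the
  link (master set to -1) the order would be "rm", then "create".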

Atomic info parameters in HA_CREATE_INFO

  Many functions in CREATE TABLE pass the same parameters. These
  parameters are part of table creation info and should be in
  HA_CREATE_INFO (or whatever). Passing parameters via single
  structure is much easier for adding new data and
  refactoring.

InnoDB changes
  Added ha_innobase::can_be_renamed_to_backup() to check if
  a table with foreign keys can be renamed.

Aria changes:
- Fixed an issue in the Aria engine with CREATE + locked tables
  where data was not properly committed in some cases when the
  server crashed.

Known issues:
- InnoDB tables with foreign key definitions are not fully supported
  with atomic create and replace:
  - ha_innobase::can_be_renamed_to_backup() can detect some cases
    where InnoDB does not support renaming table with foreign key
    constraints.  In this case MariaDB will drop the old table before
    creating the new one.
    The detected cases are:
    - The new and old tables use the same foreign key constraint
      name.
    - The old table has self referencing constraints.
  - If the old and new tables use the same name for a constraint, the
    create of the new table will fail. The original table will be
    restored in this case.
  - The above issues will be fixed in a future commit.
- CREATE OR REPLACE TEMPORARY table is not fully atomic. Any conflicting
  table will always be dropped before creating a new one. (Old behaviour).
2025-03-18 18:28:16 +01:00

/*
Copyright (c) 2000, 2013, Oracle and/or its affiliates.
Copyright (c) 2011, 2021, Monty Program Ab.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA */
/*
Atomic rename of table; RENAME TABLE t1 to t2, tmp to t1 [,...]
*/
#include "mariadb.h"
#include "sql_priv.h"
#include "unireg.h"
#include "sql_rename.h"
#include "sql_cache.h" // query_cache_*
#include "sql_table.h" // write_bin_log
#include "sql_view.h" // mysql_frm_type, mysql_rename_view
#include "sql_trigger.h"
#include "sql_base.h" // tdc_remove_table, lock_table_names,
#include "sql_handler.h" // mysql_ha_rm_tables
#include "sql_statistics.h"
#include "ddl_log.h"
#include "wsrep_mysqld.h"
#include "debug.h"
/* Holds one from/to pair in the list of renamed temporary tables */
struct TABLE_PAIR
{
TABLE_LIST *from, *to;
};
static bool rename_tables(THD *thd, TABLE_LIST *table_list,
DDL_LOG_STATE *ddl_log_state,
bool skip_error, bool if_exists,
bool *force_if_exists,
bool *not_logged_temporary_tables);
/*
Every two entries in the table_list form a pair of original name and
the new name.
*/
bool mysql_rename_tables(THD *thd, TABLE_LIST *table_list, bool silent,
bool if_exists)
{
bool error= 1;
bool binlog_error= 0, force_if_exists, not_logged_temporary_tables;
TABLE_LIST *ren_table= 0;
int to_table;
const char *rename_log_table[2]= {NULL, NULL};
DDL_LOG_STATE ddl_log_state;
DBUG_ENTER("mysql_rename_tables");
/*
Avoid problems with a rename on a table that we have locked or
if the user is trying to do this in a transaction context
*/
if (thd->locked_tables_mode || thd->in_active_multi_stmt_transaction())
{
my_message(ER_LOCK_OR_ACTIVE_TRANSACTION,
ER_THD(thd, ER_LOCK_OR_ACTIVE_TRANSACTION), MYF(0));
DBUG_RETURN(1);
}
mysql_ha_rm_tables(thd, table_list);
if (logger.is_log_table_enabled(QUERY_LOG_GENERAL) ||
logger.is_log_table_enabled(QUERY_LOG_SLOW))
{
/*
Rules for rename of a log table:
IF 1. Log tables are enabled
AND 2. Rename operates on the log table and nothing is being
renamed to the log table.
DO 3. Throw an error message.
ELSE 4. Perform rename.
*/
for (to_table= 0, ren_table= table_list; ren_table;
to_table= 1 - to_table, ren_table= ren_table->next_local)
{
int log_table_rename;
if ((log_table_rename= check_if_log_table(ren_table, TRUE, NullS)))
{
/*
as we use log_table_rename as an array index, we need it to start
with 0, while QUERY_LOG_SLOW == 1 and QUERY_LOG_GENERAL == 2.
So, we shift the value to start with 0;
*/
log_table_rename--;
if (rename_log_table[log_table_rename])
{
if (to_table)
rename_log_table[log_table_rename]= NULL;
else
{
/*
Two renames of "log_table TO" w/o rename "TO log_table" in
between.
*/
my_error(ER_CANT_RENAME_LOG_TABLE, MYF(0),
ren_table->table_name.str,
ren_table->table_name.str);
goto err;
}
}
else
{
if (to_table)
{
/*
Attempt to rename a table TO log_table w/o renaming
log_table TO some table.
*/
my_error(ER_CANT_RENAME_LOG_TABLE, MYF(0),
ren_table->table_name.str,
ren_table->table_name.str);
goto err;
}
else
{
/* save the name of the log table to report an error */
rename_log_table[log_table_rename]= ren_table->table_name.str;
}
}
}
}
if (rename_log_table[0] || rename_log_table[1])
{
if (rename_log_table[0])
my_error(ER_CANT_RENAME_LOG_TABLE, MYF(0), rename_log_table[0],
rename_log_table[0]);
else
my_error(ER_CANT_RENAME_LOG_TABLE, MYF(0), rename_log_table[1],
rename_log_table[1]);
goto err;
}
}
if (lock_table_names(thd, table_list, 0, thd->variables.lock_wait_timeout,
0))
goto err;
error=0;
bzero(&ddl_log_state, sizeof(ddl_log_state));
/*
An exclusive lock on table names is satisfactory to ensure
no other thread accesses this table.
*/
error= rename_tables(thd, table_list, &ddl_log_state,
0, if_exists, &force_if_exists,
&not_logged_temporary_tables);
if (likely(!silent && !error))
{
ulonglong save_option_bits= thd->variables.option_bits;
if (force_if_exists && ! if_exists)
{
/* Add IF EXISTS to binary log */
thd->variables.option_bits|= OPTION_IF_EXISTS;
}
debug_crash_here("ddl_log_rename_before_binlog");
/*
Store xid in ddl log and binary log so that we can check on ddl recovery
if the item is in the binary log (and thus the operation was complete).
*/
thd->binlog_xid= thd->query_id;
ddl_log_update_xid(&ddl_log_state, thd->binlog_xid);
if (mysql_bin_log.is_open())
{
if (not_logged_temporary_tables)
binlog_error= thd->binlog_renamed_tmp_tables(table_list);
else
binlog_error= write_bin_log(thd, TRUE, thd->query(),
thd->query_length());
if (binlog_error)
error= 1;
}
thd->binlog_xid= 0;
thd->variables.option_bits= save_option_bits;
debug_crash_here("ddl_log_rename_after_binlog");
if (likely(!binlog_error))
my_ok(thd);
}
if (likely(!error))
{
query_cache_invalidate3(thd, table_list, 0);
ddl_log_complete(&ddl_log_state);
}
else
{
/* Revert the renames of normal tables with the help of the ddl log */
error|= ddl_log_revert(thd, &ddl_log_state);
}
err:
DBUG_RETURN(error || binlog_error);
}
static bool
do_rename_temporary(THD *thd, TABLE_LIST *ren_table, TABLE_LIST *new_table)
{
LEX_CSTRING *new_alias;
DBUG_ENTER("do_rename_temporary");
new_alias= (lower_case_table_names == 2) ? &new_table->alias :
&new_table->table_name;
if (thd->find_temporary_table(new_table, THD::TMP_TABLE_ANY))
{
my_error(ER_TABLE_EXISTS_ERROR, MYF(0), new_alias->str);
DBUG_RETURN(1); // This can't be skipped
}
DBUG_RETURN(thd->rename_temporary_table(ren_table->table,
&new_table->db, new_alias));
}
/**
check_rename()
Check pre-conditions for rename
- From table should exist
- To table should not exist.
SYNOPSIS
@param new_table_name The new table/view name
@param new_table_alias The new table/view alias
@param if_exists If not set, give an error if the table does not
exist. If set, just give a warning in this case.
@return
@retval 0 ok
@retval >0 Error (from table doesn't exist or to table exists)
@retval <0 Can't do rename, but no error
*/
static int
check_rename(THD *thd, rename_param *param,
const TABLE_LIST *ren_table,
const Lex_ident_db &new_db,
const Lex_ident_table &new_table_name,
const Lex_ident_table &new_table_alias,
bool if_exists)
{
DBUG_ENTER("check_rename");
DBUG_PRINT("enter", ("if_exists: %d", (int) if_exists));
if (lower_case_table_names == 2)
{
param->old_alias= ren_table->alias;
param->new_alias= new_table_alias;
}
else
{
param->old_alias= ren_table->table_name;
param->new_alias= new_table_name;
}
DBUG_ASSERT(param->new_alias.str);
if (!ha_table_exists(thd, &ren_table->db, &param->old_alias,
&param->old_version, &param->from_table_hton, NULL, 0) ||
!param->from_table_hton)
{
my_error(ER_NO_SUCH_TABLE, MYF(if_exists ? ME_NOTE : 0),
ren_table->db.str, param->old_alias.str);
DBUG_RETURN(if_exists ? -1 : 1);
}
if (param->from_table_hton != view_pseudo_hton &&
ha_check_if_updates_are_ignored(thd, param->from_table_hton, "RENAME"))
{
/*
Shared table. Just drop the old .frm as it's not correct anymore
Discovery will find the old table when it's accessed
*/
tdc_remove_table(thd, ren_table->db.str, ren_table->table_name.str);
quick_rm_table(thd, 0, &ren_table->db, &param->old_alias, QRMT_FRM);
DBUG_RETURN(-1);
}
if (ha_table_exists(thd, &new_db, &param->new_alias, NULL, NULL, NULL,
(param->rename_flags & FN_TO_IS_TMP)))
{
my_error(ER_TABLE_EXISTS_ERROR, MYF(0), param->new_alias.str);
DBUG_RETURN(1); // This can't be skipped
}
DBUG_RETURN(0);
}
/*
Rename a single table or a view
SYNOPSIS
do_rename()
thd Thread handle
param rename parameters
ddl_log_state Parameter for ddl logging from check_rename
ren_table A table/view to be renamed
new_db The database to which the table is to be moved
skip_error Skip error, but only if the table didn't exist
force_if_exists Set to 1 if we have to log the query with 'IF EXISTS'
Otherwise don't touch the value
DESCRIPTION
Rename a single table or a view.
In case of failure, all changes will be reverted
Even if mysql_rename_tables() cannot be used with LOCK TABLES,
the table can still be locked if we come here from CREATE ... REPLACE.
If ddl_log_state is NULL then we will not log the rename to the ddl log.
RETURN
false Ok
true rename failed
*/
bool
do_rename(THD *thd, const rename_param *param, DDL_LOG_STATE *ddl_log_state,
TABLE_LIST *ren_table, const Lex_ident_db *new_db,
bool skip_error, bool *force_if_exists)
{
int rc= 1;
handlerton *hton;
TRIGGER_RENAME_PARAM rename_param;
DBUG_ENTER("do_rename");
DBUG_PRINT("enter", ("skip_error: %d", (int) skip_error));
const Lex_ident_table * const old_alias= &param->old_alias;
const Lex_ident_table * const new_alias= &param->new_alias;
hton= param->from_table_hton;
rename_param.rename_flags= param->rename_flags;
#ifdef WITH_WSREP
if (WSREP(thd) && hton && hton != view_pseudo_hton &&
!wsrep_should_replicate_ddl(thd, hton))
DBUG_RETURN(1);
#endif
if (!(param->rename_flags & FN_FROM_IS_TMP))
tdc_remove_table(thd, ren_table->db.str, ren_table->table_name.str);
if (hton != view_pseudo_hton)
{
if (hton->flags & HTON_TABLE_MAY_NOT_EXIST_ON_SLAVE)
*force_if_exists= 1;
/* Check if we can rename triggers */
if (!(param->rename_flags & FN_IS_TMP) &&
Table_triggers_list::prepare_for_rename(thd, &rename_param,
ren_table->db,
*old_alias,
ren_table->table_name,
*new_db,
*new_alias))
DBUG_RETURN(!skip_error);
thd->replication_flags= 0;
if (ddl_log_state &&
ddl_log_rename_table(ddl_log_state, hton,
&ren_table->db, old_alias, new_db, new_alias,
DDL_RENAME_PHASE_TABLE,
rename_flags_to_ddl_flags(param->rename_flags)))
DBUG_RETURN(1);
debug_crash_here("ddl_log_rename_before_rename_table");
if (!(rc= mysql_rename_table(hton, &ren_table->db, old_alias,
new_db, new_alias, &param->old_version,
param->rename_flags | QRMT_DEFAULT)))
{
/* Table rename succeeded.
It's safe to start recovery at rename trigger phase
*/
debug_crash_here("ddl_log_rename_before_phase_trigger");
if (ddl_log_state)
ddl_log_update_phase(ddl_log_state, DDL_RENAME_PHASE_TRIGGER);
debug_crash_here("ddl_log_rename_before_rename_trigger");
rc= 0;
if (!(param->rename_flags & FN_IS_TMP))
{
if (!(rc= Table_triggers_list::change_table_name(thd,
&rename_param,
&ren_table->db,
old_alias,
&ren_table->table_name,
new_db,
new_alias)))
{
debug_crash_here("ddl_log_rename_before_stat_tables");
(void) rename_table_in_stat_tables(thd, &ren_table->db,
&ren_table->table_name,
new_db, new_alias);
debug_crash_here("ddl_log_rename_after_stat_tables");
}
else
{
/*
We've succeeded in renaming table's .frm and in updating
corresponding handler data, but have failed to update table's
triggers appropriately. So let us revert operations on .frm
and handler's data and report about failure to rename table.
*/
debug_crash_here("ddl_log_rename_after_failed_rename_trigger");
(void) mysql_rename_table(hton, new_db, new_alias,
&ren_table->db, old_alias,
&param->old_version,
QRMT_DEFAULT | NO_FK_CHECKS);
debug_crash_here("ddl_log_rename_after_revert_rename_table");
if (ddl_log_state)
ddl_log_disable_entry(ddl_log_state);
debug_crash_here("ddl_log_rename_after_disable_entry");
}
}
else
{
/*
We come here in case of CREATE OR REPLACE when renaming the
trigger file TO/FROM a temporary table name
*/
Table_triggers_list::rename_trigger_file(thd,
&ren_table->db,
&ren_table->table_name,
new_db, new_alias,
param->rename_flags);
debug_crash_here("ddl_log_rename_after_rename_trigger_file");
}
}
if (thd->replication_flags & OPTION_IF_EXISTS)
*force_if_exists= 1;
}
else
{
/*
Change of schema is not allowed
except for the ALTER ...UPGRADE DATA DIRECTORY NAME command
because a view has valid internal db&table names in this case.
*/
if (thd->lex->sql_command != SQLCOM_ALTER_DB_UPGRADE &&
cmp(&ren_table->db, new_db))
{
my_error(ER_FORBID_SCHEMA_CHANGE, MYF(0), ren_table->db.str, new_db->str);
DBUG_RETURN(1);
}
DBUG_ASSERT(ddl_log_state);
ddl_log_rename_view(ddl_log_state, &ren_table->db,
&ren_table->table_name, new_db, new_alias);
debug_crash_here("ddl_log_rename_before_rename_view");
rc= mysql_rename_view(thd, new_db, new_alias, &ren_table->db,
&ren_table->table_name);
debug_crash_here("ddl_log_rename_after_rename_view");
if (rc)
{
/*
On error mysql_rename_view() will leave things as such.
*/
if (ddl_log_state)
ddl_log_disable_entry(ddl_log_state);
debug_crash_here("ddl_log_rename_after_disable_entry");
}
}
DBUG_RETURN(rc && !skip_error ? 1 : 0);
}
/*
Rename all tables in list; Return pointer to wrong entry if something goes
wrong. Note that the table_list may be empty!
*/
/*
Rename tables/views in the list
SYNOPSIS
rename_tables()
thd Thread handle
table_list List of tables to rename
ddl_log_state ddl logging
skip_error Whether to skip errors
if_exists Don't give an error if table doesn't exist
force_if_exists Set to 1 if we have to log the query with 'IF EXISTS'
Otherwise set it to 0
not_logged_temporary_tables Set to 1 if there was a temporary table in the statement
that was not binary logged.
DESCRIPTION
Take a table/view name from an odd list element and rename it to
the name taken from list element+1. Note that the table_list may be
empty.
RETURN
0 Ok
1 error
All tables are reverted to their original names
*/
static bool
rename_tables(THD *thd, TABLE_LIST *table_list, DDL_LOG_STATE *ddl_log_state,
bool skip_error, bool if_exists, bool *force_if_exists,
bool *not_logged_temporary_tables)
{
TABLE_LIST *ren_table, *new_table;
List<TABLE_PAIR> tmp_tables;
DBUG_ENTER("rename_tables");
*force_if_exists= 0;
*not_logged_temporary_tables= 0;
for (ren_table= table_list; ren_table; ren_table= new_table->next_local)
{
new_table= ren_table->next_local;
if (is_temporary_table(ren_table))
{
/*
Store renamed temporary tables into a list.
We don't store these in the ddl log to avoid writes and syncs
when only using temporary tables. We don't need the log as
all temporary tables will disappear anyway in a crash.
*/
TABLE_PAIR *pair= thd->alloc<TABLE_PAIR>(1);
if (! pair || tmp_tables.push_front(pair, thd->mem_root))
goto revert_rename;
pair->from= ren_table;
pair->to= new_table;
if (do_rename_temporary(thd, ren_table, new_table))
goto revert_rename;
if (!ren_table->table->s->table_creation_was_logged)
*not_logged_temporary_tables= 1;
}
else
{
int error;
rename_param param;
error= check_rename(thd, &param, ren_table, new_table->db,
new_table->table_name,
new_table->alias, (skip_error || if_exists));
if (error < 0)
continue; // Ignore rename (if exists)
if (error > 0)
goto revert_rename;
if (do_rename(thd, &param, ddl_log_state,
ren_table, &new_table->db,
skip_error, force_if_exists))
goto revert_rename;
}
}
DBUG_RETURN(0);
revert_rename:
/* Revert temporary tables. Normal tables are reverted in the caller */
List_iterator_fast<TABLE_PAIR> it(tmp_tables);
while (TABLE_PAIR *pair= it++)
do_rename_temporary(thd, pair->to, pair->from);
DBUG_RETURN(1);
}