Commit graph

459 commits

Author SHA1 Message Date
Marko Mäkelä
1dea7f7977 Merge 10.3 into 10.4 2021-05-25 15:38:57 +03:00
Marko Mäkelä
1864a8ea93 Merge 10.2 into 10.3 2021-05-24 09:38:49 +03:00
Thirunarayanan Balathandayuthapani
349d77ecdd MDEV-25721 Double free of table when inplace alter
FTS add index fails

Problem:
========
InnoDB double frees the table if auxiliary fts table
creation fails and fails to set the dict operation
for the transaction. It leads to failure while
dropping newly added index.

Solution:
=========
  InnoDB should avoid double freeing and set the
dictionary operation of transaction in
fts_create_common_tables()
2021-05-23 15:53:59 +05:30
Marko Mäkelä
49e2c8f0a6 MDEV-25743: Unnecessary copying of table names in InnoDB dictionary
Many InnoDB data dictionary cache operations require that the
table name be copied so that it will be NUL terminated.
(For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.)

dict_table_t::is_garbage_name(): Check if a name belongs to
the background drop table queue.

dict_check_if_system_table_exists(): Remove.

dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables
SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup.

dict_sys_t::create_or_check_sys_tables(): Replaces
dict_create_or_check_foreign_constraint_tables() and
dict_create_or_check_sys_virtual().

dict_sys_t::load_table(): Replaces dict_table_get_low()
and dict_load_table().

dict_sys_t::find_table(): Renamed from get_table().

dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded
tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist.

trx_t::has_stats_table_lock(): Moved to dict0stats.cc.

Some error messages will now report table names in the internal
databasename/tablename format, instead of `databasename`.`tablename`.
2021-05-21 18:03:40 +03:00
Monty
7762ee5dbe MDEV-25180 Atomic ALTER TABLE
MDEV-25604 Atomic DDL: Binlog event written upon recovery does not
           have default database

The purpose of this task is to ensure that ALTER TABLE is atomic even if
the MariaDB server would be killed at any point of the alter table.
This means that either the ALTER TABLE succeeds (including that triggers,
the status tables and the binary log are updated) or things should be
reverted to their original state.

If the server crashes before the new version is fully up to date and
commited, it will revert to the original table and remove all
temporary files and tables.
If the new version is commited, crash recovery will use the new version,
and update triggers, the status tables and the binary log.
The one execption is ALTER TABLE .. RENAME .. where no changes are done
to table definition. This one will work as RENAME and roll back unless
the whole statement completed, including updating the binary log (if
enabled).

Other changes:
- Added handlerton->check_version() function to allow the ddl recovery
  code to check, in case of inplace alter table, if the table in the
  storage engine is of the new or old version.
- Added handler->table_version() so that an engine can report the current
  version of the table. This should be changed each time the table
  definition changes.
- Added  ha_signal_ddl_recovery_done() and
  handlerton::signal_ddl_recovery_done() to inform all handlers when
  ddl recovery has been done. (Needed by InnoDB).
- Added handlerton call inplace_alter_table_committed, to signal engine
  that ddl_log has been closed for the alter table query.
- Added new handerton flag
  HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we
  should call hton->notify_tabledef_changed() during
  mysql_inplace_alter_table. This was required as MyRocks and InnoDB
  needed the call at different times.
- Added function server_uuid_value() to be able to generate a temporary
  xid when ddl recovery writes the query to the binary log. This is
  needed to be able to handle crashes during ddl log recovery.
- Moved freeing of the frm definition to end of mysql_alter_table() to
  remove duplicate code and have a common exit strategy.

-------
InnoDB part of atomic ALTER TABLE
(Implemented by Marko Mäkelä)
innodb_check_version(): Compare the saved dict_table_t::def_trx_id
to determine whether an ALTER TABLE operation was committed.

We must correctly recover dict_table_t::def_trx_id for this to work.
Before purge removes any trace of DB_TRX_ID from system tables, it
will make an effort to load the user table into the cache, so that
the dict_table_t::def_trx_id can be recovered.

ha_innobase::table_version(): return garbage, or the trx_id that would
be used for committing an ALTER TABLE operation.

In InnoDB, table names starting with #sql-ib will remain special:
they will be dropped on startup. This may be revisited later in
MDEV-18518 when we implement proper undo logging and rollback
for creating or dropping multiple tables in a transaction.

Table names starting with #sql will retain some special meaning:
dict_table_t::parse_name() will not consider such names for
MDL acquisition, and dict_table_rename_in_cache() will treat such
names specially when handling FOREIGN KEY constraints.

Simplify InnoDB DROP INDEX.
Prevent purge wakeup

To ensure that dict_table_t::def_trx_id will be recovered correctly
in case the server is killed before ddl_log_complete(), we will block
the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS
between ha_innobase::commit_inplace_alter_table(commit=true)
(purge_sys.stop_SYS()) and purge_sys.resume_SYS().
The completion callback purge_sys.resume_SYS() must be between
ddl_log_complete() and MDL release.

--------

MyRocks support for atomic ALTER TABLE
(Implemented by Sergui Petrunia)

Implement these SE API functions:
- ha_rocksdb::table_version()
- hton->check_version = rocksdb_check_versionMyRocks data dictionary
  now stores table version for each table.
  (Absence of table version record is interpreted as table_version=0,
  that is, which means no upgrade changes are needed)
- For inplace alter table of a partitioned table, call the underlying
  handlerton when checking if the table is ok. This assumes that the
  partition engine commits all changes at once.
2021-05-19 22:54:13 +02:00
Rucha Deodhar
2fdb556e04 MDEV-8334: Rename utf8 to utf8mb3
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
2021-05-19 06:48:36 +02:00
Marko Mäkelä
86dc7b4d4c MDEV-24626 Remove synchronous write of page0 file during file creation
During data file creation, InnoDB holds dict_sys mutex, tries to
write page 0 of the file and flushes the file. This not only causing
unnecessary contention but also a deviation from the write-ahead
logging protocol.

The clean sequence of operations is that we first start a dictionary
transaction and write SYS_TABLES and SYS_INDEXES records that identify
the tablespace. Then, we durably write a FILE_CREATE record to the
write-ahead log and create the file.

Recovery should not unnecessarily insist that the first page of each
data file that is referred to by the redo log is valid. It must be
enough that page 0 of the tablespace can be initialized based on the
redo log contents.

We introduce a new data structure deferred_spaces that keeps track
of corrupted-looking files during recovery. The data structure holds
the last LSN of a FILE_ record referring to the data file, the
tablespace identifier, and the last known file name.

There are two scenarios can happen during recovery:
i) Sufficient memory: InnoDB can reconstruct the
tablespace after parsing all redo log records.

ii) Insufficient memory(multiple apply phase): InnoDB should
store the deferred tablespace redo logs even though
tablespace is not present. InnoDB should start constructing
the tablespace when it first encounters deferred tablespace
id.

Mariabackup copies the zero filled ibd file in backup_fix_ddl() as
the extension of .new file. Mariabackup test case does page flushing
when it deals with DDL operation during backup operation.

fil_ibd_create(): Remove the write of page0 and flushing of file

fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has
zero filled page0

Datafile: Clean up the error handling, and do not report errors
if we are in the middle of recovery. The caller will check
Datafile::m_defer.

fil_node_t::deferred: Indicates whether the tablespace loading was
deferred during recovery

FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that tablespace
file was cannot be loaded.

recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to
initialize fil_space_t based on buffered metadata and records to
initialize page 0. Ignore the flags in fil_name_t, because they are
intentionally invalid.

fil_name_process(): Update deferred_spaces.

recv_sys_t::parse(): Store the redo log if the tablespace id
is present in deferred spaces

recv_sys_t::recover_low(): Should recover the first page of
the tablespace even though the tablespace instance is not
present

recv_sys_t::apply(): Initialize the deferred tablespace
before applying the deferred tablespace records

recv_validate_tablespace(): Skip the validation for deferred_spaces.

recv_rename_files(): Moved and revised from recv_sys_t::apply().
For deferred-recovery tablespaces, do not attempt to rename the
file if a deferred-recovery tablespace is associated with the name.

recv_recovery_from_checkpoint_start(): Invoke recv_rename_files()
and initialize all deferred tablespaces before applying redo log.

fil_node_t::read_page0(): Skip page0 validation if the tablespace
is deferred

buf_page_create_deferred(): A variant of buf_page_create() when
the fil_space_t is not available yet

This is joint work with Thirunarayanan Balathandayuthapani,
who implemented an initial prototype.
2021-05-17 18:12:33 +03:00
Marko Mäkelä
3a427c568b Cleanup: Remove useless message "If you are installing InnoDB" 2021-05-14 08:26:51 +03:00
Marko Mäkelä
916b237b3f Merge 10.5 into 10.6 2021-05-07 15:00:27 +03:00
Marko Mäkelä
cc2ddde4d8 MDEV-18518 follow-up fixes
Make DDL operations that involve FULLTEXT INDEX atomic.
In particular, we must drop the internal FTS_ tables in the same
DDL transaction with ALTER TABLE.

Remove all references to fts_drop_orphaned_tables().

row_merge_drop_temp_indexes(): Drop also the internal FTS_ tables
that are associated with index stubs that were created in
prepare_inplace_alter_table_dict() for
CREATE FULLTEXT INDEX before the server was killed.

fts_clear_all(): Remove the fts_drop_tables() call. It has to be
executed before the transaction is committed!

dict_load_indexes(): Do not load any metadata for index stubs
that had been created by prepare_inplace_alter_table_dict()

fts_create_one_common_table(), fts_create_common_tables(),
fts_create_one_index_table(), fts_create_index_tables():
Remove redundant error handling. The tables will be dropped
just fine by dict_drop_index_tree().

commit_try_norebuild(): Also drop the FTS_ tables when dropping
FULLTEXT INDEX.

The changes to the test case innodb_fts.crash_recovery has been
extensively tested. The non-debug server will be killed while
the 3 ALTER TABLE are in any phase of execution. With the debug
server, DEBUG_SYNC should make the test deterministic.
2021-05-06 16:04:29 +03:00
Nikita Malyavin
3f55c56951 Merge branch bb-10.4-release into bb-10.5-release 2021-05-05 23:57:11 +03:00
Nikita Malyavin
509e4990af Merge branch bb-10.3-release into bb-10.4-release 2021-05-05 23:03:01 +03:00
Nikita Malyavin
a8a925dd22 Merge branch bb-10.2-release into bb-10.3-release 2021-05-04 14:49:31 +03:00
Thirunarayanan Balathandayuthapani
13b9af50e4 MDEV-25536 InnoDB: Failing assertion: sym_node->table != NULL in pars_retrieve_table_def
- Fixing post-push failure of innodb_fts_misc_1 test case.
2021-04-30 20:05:12 +05:30
Thirunarayanan Balathandayuthapani
0024524d54 MDEV-25536 InnoDB: Failing assertion: sym_node->table != NULL in pars_retrieve_table_def
InnoDB tries to fetch the deleted doc ids for discarded
tablespace. In i_s_fts_deleted_generic_fill(), InnoDB needs
to check whether the table is discarded or not before fetching
deleted doc ids.
2021-04-30 13:40:28 +05:30
Marko Mäkelä
4930f9c94b Merge 10.5 into 10.6 2021-04-21 11:45:00 +03:00
Marko Mäkelä
d2e2d32933 Merge 10.5 into 10.6 2021-04-14 12:32:27 +03:00
Marko Mäkelä
6c3e860cbf Merge 10.4 into 10.5 2021-04-14 11:35:39 +03:00
Marko Mäkelä
5008171b05 Merge 10.3 into 10.4 2021-04-14 10:33:59 +03:00
Marko Mäkelä
450c017c2d Merge 10.2 into 10.3 2021-04-09 14:32:06 +03:00
Thirunarayanan Balathandayuthapani
c32edd7515 MDEV-25295 Aborted FTS_DOC_ID_INDEX considered as existing FTS_DOC_ID_INDEX during DDL
InnoDB should skip the dropped aborted FTS_DOC_ID_INDEX while
checking the existing FTS_DOC_ID_INDEX in the table. InnoDB
should able to create new FTS_DOC_ID_INDEX if the fulltext
index is being added for the first time.
2021-04-06 18:52:22 +05:30
Marko Mäkelä
176aaf93d1 Merge 10.5 into 10.6 2021-03-31 12:04:50 +03:00
Marko Mäkelä
5eae8c2742 Merge 10.4 into 10.5 2021-03-31 11:05:21 +03:00
Marko Mäkelä
50de71b026 Merge 10.3 into 10.4 2021-03-31 09:47:14 +03:00
Marko Mäkelä
d6d3d9ae2f Merge 10.2 into 10.3 2021-03-31 08:01:03 +03:00
Thirunarayanan Balathandayuthapani
b771ab242b MDEV-25200 Index count mismatch due to aborted FULLTEXT INDEX
- Aborting of fulltext index creation fails to remove the
index from sys indexes table. When we try to reload the
table definition, InnoDB fails with index count mismatch
error. InnoDB should remove the index from sys indexes while
rollbacking the secondary index creation.
2021-03-30 20:40:14 +05:30
Marko Mäkelä
92b2a911e5 MDEV-24818 Concurrent use of InnoDB table is impossible until the first transaction is finished
In MDEV-515, we enabled an optimization where an insert into an
empty table will use table-level locking and undo logging.
This may break applications that expect row-level locking.

The SQL statements created by the mysqldump utility will include the
following:

    SET unique_checks=0, foreign_key_checks=0;

We will use these flags to enable the table-level locked and logged
insert. Unless the parameters are set, INSERT will be executed in
the old way, with row-level undo logging and implicit record locks.
2021-03-16 15:20:26 +02:00
Marko Mäkelä
a43ff483fa Merge 10.5 into 10.6 2021-03-11 20:20:07 +02:00
Marko Mäkelä
a4b7232b2c Merge 10.4 into 10.5 2021-03-11 20:09:34 +02:00
Marko Mäkelä
1ea6ac3c95 Merge 10.3 into 10.4 2021-03-11 19:33:45 +02:00
Marko Mäkelä
6e7ac4060d MDEV-25070 fixup: Correct the result 2021-03-11 12:34:56 +02:00
Thirunarayanan Balathandayuthapani
cc9c303a45 MDEV-25070 SIGSEGV in fts_create_in_mem_aux_table
InnoDB set the space in dict_table_t as NULL when table
is discarded. So InnoDB shouldn't use the space present
in table to detect whether the given tablespace is
temporary tablespace.
2021-03-10 16:50:10 +05:30
Varun Gupta
f691d9865b MDEV-7317: Make an index ignorable to the optimizer
This feature adds the functionality of ignorability for indexes.
Indexes are not ignored be default.

To control index ignorability explicitly for a new index,
use IGNORE or NOT IGNORE as part of the index definition for
CREATE TABLE, CREATE INDEX, or ALTER TABLE.

Primary keys (explicit or implicit) cannot be made ignorable.

The table INFORMATION_SCHEMA.STATISTICS get a new column named IGNORED that
would store whether an index needs to be ignored or not.
2021-03-04 22:50:00 +05:30
Marko Mäkelä
94b4578704 Merge 10.5 into 10.6 2021-02-17 19:39:05 +02:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Marko Mäkelä
3cef4f8f0f MDEV-515 Reduce InnoDB undo logging for insert into empty table
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.

For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().

Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.

Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.

This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).

INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.

In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.

Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.

dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.

ins_node_t::bulk_insert: Whether bulk insert was initiated.

trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.

trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().

trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.

row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.

lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.

btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().

trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.

ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.

row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

btr_root_page_init(): Refactored from btr_create().

dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.

ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.

This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
2021-01-25 18:41:27 +02:00
Marko Mäkelä
46234f03c8 Merge 10.5 into 10.6 2021-01-25 12:56:30 +02:00
Marko Mäkelä
3467f63764 Merge 10.3 into 10.4 2021-01-25 11:02:07 +02:00
Aleksey Midenkov
775aa6f08a MDEV-24403 Segfault on CREATE TABLE with explicit FTS_DOC_ID_INDEX by multiple fields
Ignore table->fts freed previously by create_table_info_t::create_table().
2021-01-19 14:25:51 +03:00
Marko Mäkelä
049811ec3c Merge 10.2 into 10.3 2021-01-19 08:31:03 +02:00
Thirunarayanan Balathandayuthapani
fdc4b7a6b2 MDEV-21478 Inplace ALTER fails to report error when FTS_DOC_ID
with wrong data type is added

  Inplace alter fails to report error when fts_doc_id column with
wrong data type is added.

prepare_inplace_alter_table_dict(): Should check whether the column
is fts_doc_id. It should be of bigint type, should accept non null
data type and it should be in capital letters.
2021-01-11 18:12:59 +05:30
Marko Mäkelä
9bc874a594 MDEV-23497 Make ROW_FORMAT=COMPRESSED read-only by default
Let us introduce the parameter innodb_read_only_compressed
that is ON by default, making any ROW_FORMAT=COMPRESSED tables
read-only.

I developed the ROW_FORMAT=COMPRESSED format based on
Heikki Tuuri's rough design between 2005 and 2008. It might
have been a good idea back then, but no proper benchmarks were
ever run to validate the design or the implementation.

The format has been more or less obsolete for years.
It limits innodb_page_size to 16384 bytes (the default),
and instant ALTER TABLE is not supported.

This is the first step towards deprecating and removing
write support for ROW_FORMAT=COMPRESSED tables.
2020-11-11 11:15:11 +02:00
Marko Mäkelä
46957a6a77 Merge 10.3 into 10.4 2020-10-22 13:27:18 +03:00
Marko Mäkelä
e3d692aa09 Merge 10.2 into 10.3 2020-10-22 08:26:28 +03:00
Marko Mäkelä
620ea816ad Merge 10.1 into 10.2 2020-10-21 14:02:04 +03:00
Marko Mäkelä
c7552969d0 MDEV-23999 Potential stack overflow in InnoDB fulltext search
fts_query_t::nested_sub_exp: Keep track of nested
fts_ast_visit_sub_exp() calls.

fts_ast_visit_sub_exp(): Return DB_OUT_OF_MEMORY if the
maximum recursion depth is exceeded.

This is motivated by a change in MySQL 5.6.50:
mysql/mysql-server@e2a46b4834
Bug #29929684 USING MANY NESTED ARGUMENTS WITH BOOLEAN FTS CAN LEAD
TO TERMINATE SERVER
2020-10-21 10:04:44 +03:00
Thirunarayanan Balathandayuthapani
6504d3d229 MDEV-23722 InnoDB: Assertion: result != FTS_INVALID in fts_trx_row_get_new_state
Marking of deletion of row in fts index happens twice in
self-referential foreign key relation. So while performing
referential checks of foreign key, InnoDB can avoid updating
of fts index if the foreign key has self-referential relationship.

Reviewed-by: Marko Mäkelä
2020-10-08 19:41:03 +05:30
Marko Mäkelä
f347b3e0e6 Merge 10.3 into 10.4 2020-07-02 07:39:33 +03:00
Marko Mäkelä
1df1a63924 Merge 10.2 into 10.3 2020-07-02 06:17:51 +03:00