InnoDB insisted on closing the file handle before renaming a file.
Renaming a file should never be a problem on POSIX systems. On
Windows, it should also work if the file was opened in
FILE_SHARE_DELETE mode.
fil_space_t::stop_ios: Remove. We no longer need to stop file access
during rename operations.
fil_mutex_enter_and_prepare_for_io(): Remove the wait for stop_ios.
fil_rename_tablespace(): Remove the retry logic; do not close the
file handle. Remove the unused fault injection that was added along
with the DATA DIRECTORY functionality (MySQL WL#5980).
os_file_create_simple_func(), os_file_create_func(),
os_file_create_simple_no_error_handling_func(): Include FILE_SHARE_DELETE
in the share_mode. (We will still prevent multiple InnoDB instances
from using the same files by not setting FILE_SHARE_WRITE.)
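For illustration, here is a minimal standalone Win32 sketch of that
share-mode combination (an example of the idea, not the actual
os0file.cc code):

    #include <windows.h>

    /* Open a data file so that a concurrent rename or delete of the
    file is allowed (FILE_SHARE_DELETE), while writes from another
    process remain blocked (FILE_SHARE_WRITE is not included). */
    HANDLE open_data_file(const wchar_t* name)
    {
        return CreateFileW(name,
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_DELETE,
                           NULL,                  /* default security */
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL,
                           NULL);                 /* no template file */
    }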
ha_innobase::optimize(): If both innodb_defragment and
innodb_optimize_fulltext_only are at their default settings (OFF),
fall back to ALTER TABLE. Otherwise, process one or both options.
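The dispatch can be sketched as follows (a simplified self-contained
outline, not the actual handler code; the two bool parameters stand in
for innodb_defragment and innodb_optimize_fulltext_only):

    /* Illustrative dispatch for OPTIMIZE TABLE on InnoDB. */
    enum opt_result { OPT_TRY_ALTER, OPT_OK };

    opt_result optimize_dispatch(bool defragment, bool fts_only)
    {
        if (!defragment && !fts_only) {
            /* Both options OFF: tell the server to map
            OPTIMIZE TABLE to ALTER TABLE (rebuild + analyze). */
            return OPT_TRY_ALTER;
        }
        if (fts_only) {
            /* optimize the FULLTEXT indexes of the table */
        }
        if (defragment) {
            /* defragment the table's indexes */
        }
        return OPT_OK;
    }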
ha_innobase::set_partition_owner_stats(): Remove (unused function).
ha_innobase::ha_partition_stats: Remove (the variable is never read).
Remove unused ut_timer functions.
Also fixes MDEV-14727 and MDEV-14491
(InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0)
by replacing the flawed wait logic in dict_index_remove_from_cache_low().
On DISCARD TABLESPACE, there is no need to drop the adaptive hash index.
We must drop it on IMPORT TABLESPACE, and eventually on DROP TABLE or
DROP INDEX. As long as the dict_index_t object remains in the cache
and the table remains inaccessible, adaptive hash index entries
pointing to orphaned pages do no harm. They will be dropped when the
buffer pool pages are reused for something else.
btr_search_drop_page_hash_when_freed(), buf_LRU_drop_page_hash_batch():
Remove the parameter zip_size, and pass 0 to buf_page_get_gen().
buf_page_get_gen(): Ignore zip_size if mode==BUF_PEEK_IF_IN_POOL.
buf_LRU_drop_page_hash_for_tablespace(): New global function for
dropping the adaptive hash index. Drop it even if the tablespace is
inaccessible.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Remove the parameter drop_ahi.
dict_index_remove_from_cache_low(): Actively drop the adaptive hash index
if entries exist. This should prevent InnoDB hangs on DROP TABLE or
DROP INDEX.
row_import_for_mysql(): Drop any adaptive hash index entries for the table.
row_drop_table_for_mysql(): Drop any adaptive hash index for the table,
except if the table resides in the system tablespace. (DISCARD TABLESPACE
does not apply to the system tablespace, and we do not want to drop the
adaptive hash index for other tables than the one that is being dropped.)
row_truncate_table_for_mysql(): Drop any adaptive hash index entries for
the table, except if the table resides in the system tablespace.
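Schematically, both functions gate the drop as follows (a hedged
fragment; the exact signature of the new function may differ):

    /* Drop the adaptive hash index for the table being dropped or
    truncated, unless it resides in the system tablespace, where
    AHI entries of unrelated tables must be left alone. */
    if (table->space != TRX_SYS_SPACE) {
        buf_LRU_drop_page_hash_for_tablespace(table->space);
    }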
When the transaction isolation level is SERIALIZABLE, or when
a locking read is performed in the REPEATABLE READ isolation level,
InnoDB must lock delete-marked records in order to prevent another
transaction from inserting something.
However, at READ UNCOMMITTED or READ COMMITTED isolation level or
when the parameter innodb_locks_unsafe_for_binlog is set, the
repeatability of the reads does not matter, and there is no need
to lock any records.
row_search_for_mysql(): Skip locks on delete-marked committed records
upfront, instead of invoking row_unlock_for_mysql() afterwards.
The unlocking never worked for secondary index records.
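The early check can be sketched as follows (a hedged fragment of the
row_search_for_mysql() logic; was_deleted_by_committed_trx() is an
illustrative placeholder, not a real helper):

    /* At most READ COMMITTED (or with innodb_locks_unsafe_for_binlog),
    a delete-marked record whose delete-mark is already committed
    does not need to be locked at all. */
    if (prebuilt->select_lock_type != LOCK_NONE
        && trx->isolation_level <= TRX_ISO_READ_COMMITTED
        && rec_get_deleted_flag(rec, comp)
        && was_deleted_by_committed_trx(rec)) { /* illustrative */
        goto next_rec;  /* skip the record without locking it */
    }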
Problem:
When a FULLTEXT index is added to a table that has no FTS_DOC_ID
column, InnoDB rebuilds the table to add that column. When the FTS
index is later dropped, InnoDB does not rebuild the table to remove
the FTS_DOC_ID column; it deletes the FTS index auxiliary tables,
but not the FTS common auxiliary tables.
Later, when the database containing this table is renamed, the FTS
auxiliary tables are not renamed, because the DICT_TF2_FTS flag in
the table's flags2 (dict_table_t::flags2) was reset during the FTS
index drop. Dropping the old database then leads to an assertion
failure.
Fix:
When renaming the FTS auxiliary tables, OR in a condition that checks
whether the table has the DICT_TF2_FTS_HAS_DOC_ID flag set.
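The adjusted condition can be sketched as (DICT_TF2_FLAG_IS_SET is the
existing flag-test macro; surrounding code omitted):

    /* Rename the FTS auxiliary tables if the table ever carried a
    FULLTEXT index: either DICT_TF2_FTS is still set, or only the
    hidden FTS_DOC_ID column remains after the index was dropped. */
    if (DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS)
        || DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS_HAS_DOC_ID)) {
        /* ...rename the FTS auxiliary tables as well... */
    }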
RB: 18769
Reviewed-by: Jimmy.Yang@oracle.com
Problem:
=======
Multiple INSERT statements into a table that contains a FULLTEXT KEY
and an FTS_DOC_ID column abort the server if the FTS_DOC_ID exceeds
FTS_DOC_ID_MAX_STEP.
Solution:
========
Remove the exception for first committed insert statement.
Reviewed-by: Jimmy Yang<jimmy.yang@oracle.com>
RB: 18023
When Oracle fixed MDEV-13899 in their own way, they moved the
condition to the only caller of PageConverter::update_records().
Thus, the merge of 5.6.40 into MariaDB added a redundant condition.
PageConverter::update_records(): Move the page_is_leaf() condition
to the only caller, PageConverter::update_index_page().
The problem is hard to repeat, and I failed to create a deterministic
test case. Online index creation creates stubs for to-be-created indexes.
If index creation fails, we could remove these stubs while locks exist
in the indexes. (This would require that the index creation was completed,
and a concurrent DML operation acquired a lock on a record in the
uncommitted index. If a duplicate key error occurs in an uncommitted
index, the error will be reported for the CREATE UNIQUE INDEX, not for
the DML operation that tried to insert the duplicate.)
dict_table_try_drop_aborted(), row_merge_drop_indexes(): If transactional
locks exist on the table, keep the table->indexes intact.
ha_innobase::commit_inplace_alter_table(): Defer the freeing of ctx->trx
until after the operation has been successfully committed. In this way,
rollback on a partitioned table will be possible.
rollback_inplace_alter_table(): Handle ctx->new_table == NULL when
ctx->trx != NULL.
If the tablespace is dropped or truncated after the
space->is_stopping() check in fil_crypt_get_page_throttle_func(),
we would proceed to request the page, and eventually report a fatal
error.
buf_page_get_gen(): Do not retry reading if mode==BUF_GET_POSSIBLY_FREED.
lock_rec_block_validate(): Be prepared for a NULL return value when
invoking buf_page_get_gen() with mode=BUF_GET_POSSIBLY_FREED.
Issue:
------
Prefixes of externally stored columns were being written to the
online_log when an ALTER TABLE caused the table to be rebuilt. Space
in the online_log is limited, and if the prefix of an externally
stored column was very long, it was written to the log without
checking whether it fits. This leads to memory corruption.
Fix:
----
After the fix for Bug#16544143, there is no need to store prefixes of
externally stored columns in the online_log, so remove the code that
stores them. Also, before writing anything to the online_log, make
sure it fits in the available space, to avoid memory corruption.
Read RB page for more details.
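The second part of the fix amounts to a bounds check before appending
to the log buffer. A minimal self-contained sketch of the idea (names
are illustrative, not the row0log.cc internals):

    #include <string.h>

    /* Append rec to the online_log buffer only if it fits in the
    remaining space; otherwise signal overflow (the server would
    report DB_ONLINE_LOG_TOO_BIG) instead of corrupting memory. */
    bool log_append(unsigned char* buf, unsigned long buf_size,
                    unsigned long* used,
                    const unsigned char* rec, unsigned long rec_size)
    {
        if (rec_size > buf_size - *used) {
            return false;   /* does not fit */
        }
        memcpy(buf + *used, rec, rec_size);
        *used += rec_size;
        return true;
    }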
Reviewed-by: Annamalai Gurusami <annamalai.gurusami@oracle.com>
RB: 18239
In commit e7f4e61f6e, the call fil_flush_file_spaces(FIL_LOG) is
necessary: tablespaces will be flushed as part of the redo log
checkpoint, but the redo log will not necessarily be flushed,
depending on innodb_flush_method.
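Schematically, the checkpoint code needs the explicit log flush (a
hedged fragment; the constant name is as used above):

    /* Tablespaces are flushed by the checkpoint itself, but the
    redo log may not be, depending on innodb_flush_method;
    flush it explicitly. */
    fil_flush_file_spaces(FIL_LOG);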
These tests can sporadically show mutex deadlock warnings between the
LOCK_wsrep_thd and LOCK_thd_data mutexes. This means that the mutexes
can be locked in opposite order by different threads, resulting in a
deadlock. To fix this, the locking policy of these mutexes should be
revised and enforced to be uniform. However, a quick code review shows
that the number of lock/unlock operations for these mutexes combined
is between 100 and 200, and all of those call sites would have to be
checked and fixed.
On the other hand, it turns out that LOCK_wsrep_thd protects access to
the wsrep variables of THD (wsrep_conflict_state, wsrep_query_state),
whereas LOCK_thd_data protects the query, db and mysys_var members of
THD. Extending LOCK_thd_data to also protect the wsrep variables looks
like a viable solution, as there should be no use case where separate
threads need simultaneous access to the wsrep variables and the THD
data members.
In this commit, the LOCK_wsrep_thd mutex is replaced by LOCK_thd_data.
Bluntly replacing LOCK_wsrep_thd with LOCK_thd_data would result in
double locking of LOCK_thd_data, so some call sites have been adjusted
to avoid that.
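The hazard can be illustrated with a generic standalone sketch (plain
C++ mutexes, not the server code):

    #include <mutex>

    std::mutex lock_thd_data;   /* stand-in for LOCK_thd_data */
    std::mutex lock_wsrep_thd;  /* stand-in for LOCK_wsrep_thd */

    void thread_a() {
        /* Thread A takes the mutexes in one order... */
        std::lock_guard<std::mutex> g1(lock_thd_data);
        std::lock_guard<std::mutex> g2(lock_wsrep_thd);
    }

    void thread_b() {
        /* ...thread B in the opposite order: both can block forever,
        each holding the mutex the other one wants. */
        std::lock_guard<std::mutex> g1(lock_wsrep_thd);
        std::lock_guard<std::mutex> g2(lock_thd_data);
    }

After merging both roles into LOCK_thd_data, a code path that already
holds the mutex must not lock it again (the mutex is not recursive);
those are the call sites that needed adjusting.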
dict_load_table_low(): When flagging an error, assign *table = NULL.
Failure to do so could cause a crash if an error was flagged when
accessing INFORMATION_SCHEMA.INNODB_SYS_TABLES.
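The convention can be sketched as follows (parameters abbreviated; a
hedged fragment, not the exact dict0load.cc code):

    static const char* dict_load_table_low(const rec_t* rec,
                                           dict_table_t** table)
    {
        if (rec_get_deleted_flag(rec, 0)) {
            *table = NULL;  /* never leave the out-parameter dangling */
            return("delete-marked record in SYS_TABLES");
        }
        /* ... otherwise construct *table and return NULL ... */
        return(NULL);
    }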
While the test case crashes a MariaDB 10.2 debug build only,
let us apply the fix to the earliest applicable MariaDB series (10.0)
to avoid any data corruption on a table-rebuilding ALTER TABLE
using ALGORITHM=INPLACE.
innobase_create_key_defs(): Use altered_table->s->primary_key
when a new primary key is being created.
Disable online ALTER TABLE ... ADD PRIMARY KEY for InnoDB if the
table is opened or locked more than once in the current connection
(see the assertion in ha_innobase::add_index()).
fil_crypt_rotate_pages(): If the tablespace is marked as stopping,
stop page rotation as well.
fil_crypt_flush_space(): If the tablespace is marked as stopping, do
not try to read page 0 and write it back. The problem was that if the
tablespace was encrypted, we would also try to copy page 0 from the
read buffer to the write buffer, which in that case are the same
memory area.
fil_iterate(): When the tablespace is encrypted or compressed, its
first page (page 0) is not encrypted or compressed, and there is no
need to copy the buffer.
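The fil_iterate() special case can be sketched as (a simplified
fragment; readptr, writeptr, page_no, n_pages and size are
illustrative):

    for (ulint i = 0; i < n_pages; i++) {
        if (page_no + i == 0) {
            /* Page 0 is stored unencrypted and uncompressed; when
            the read and write buffers are the same memory area,
            copying it onto itself is both useless and unsafe. */
            continue;
        }
        /* encrypt/compress the page at readptr + i * size
        into writeptr + i * size */
    }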
If innodb_fast_shutdown<2, all transactions of active connections
will be rolled back on shutdown. This can take a long time, and
the systemd shutdown timeout should be extended during the wait.
logs_empty_and_mark_files_at_shutdown(): Extend the timeout when
waiting for other threads to complete.
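systemd allows a service to extend its stop timeout at runtime via
sd_notify(); a minimal sketch of the mechanism (the server wraps this
in its own service-manager helper):

    #include <systemd/sd-daemon.h>

    /* Request 30 more seconds before systemd declares the shutdown
    hung; meant to be called repeatedly while rollback progresses. */
    static void extend_shutdown_timeout(void)
    {
        sd_notify(0, "EXTEND_TIMEOUT_USEC=30000000");
    }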
This reverts commit 76ec37f522.
This behaviour change will be done separately in:
MDEV-15832 With innodb_fast_shutdown=3, skip the rollback
of connected transactions
fil_crypt_read_crypt_data(): Do not attempt to read the tablespace
if the file is inaccessible due to a pending DDL operation, such as
a file rename, DROP TABLE, or TRUNCATE TABLE. This only reduces the
probability of the race condition; it does not completely prevent it.
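A hedged sketch of the guard (the window between the check and the
read remains, which is why this is only a mitigation):

    if (space->is_stopping()) {
        /* A pending DDL operation is tearing down the tablespace;
        do not attempt to read the crypt data from page 0. */
        return;
    }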