Add variable rocksdb_remove_mariabackup_checkpoint.
If set, it will remove the $rocksdb_datadir/mariabackup-checkpoint directory.
The variable is to be used exclusively by mariabackup
to remove temporary checkpoints.
InnoDB insisted on closing the file handle before renaming a file.
Renaming an open file should never be a problem on POSIX systems. On
Windows, it should also work if the file was opened with
FILE_SHARE_DELETE.
fil_space_t::stop_ios: Remove. We no longer need to stop file access
during rename operations.
fil_mutex_enter_and_prepare_for_io(): Remove the wait for stop_ios.
fil_rename_tablespace(): Remove the retry logic; do not close the
file handle. Remove the unused fault injection that was added along
with the DATA DIRECTORY functionality (MySQL WL#5980).
os_file_create_simple_func(), os_file_create_func(),
os_file_create_simple_no_error_handling_func(): Include FILE_SHARE_DELETE
in the share_mode. (We will still prevent multiple InnoDB instances
from using the same files by not setting FILE_SHARE_WRITE.)
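As a standalone illustration (not InnoDB code; the file names and the
control flow are made up), a handle opened with FILE_SHARE_DELETE can be
renamed on Windows while it is still open:

    #ifdef _WIN32
    #include <windows.h>
    #include <cstdio>

    int main() {
      // No FILE_SHARE_WRITE: other processes still cannot write to the
      // file, which keeps a second InnoDB instance from using it.
      HANDLE h = CreateFileA("old.ibd", GENERIC_READ | GENERIC_WRITE,
                             FILE_SHARE_READ | FILE_SHARE_DELETE,
                             nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                             nullptr);
      if (h == INVALID_HANDLE_VALUE) return 1;
      // The rename succeeds even though the handle is still open,
      // because the file was opened with FILE_SHARE_DELETE.
      if (!MoveFileExA("old.ibd", "new.ibd", MOVEFILE_REPLACE_EXISTING))
        std::fprintf(stderr, "rename failed: %lu\n", GetLastError());
      CloseHandle(h);
      return 0;
    }
    #endif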
ha_innobase::optimize(): If both innodb_defragment and
innodb_optimize_fulltext_only are at their default settings (OFF),
fall back to ALTER TABLE. Else process one or both options.
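A rough sketch of the resulting decision (stand-in code; the HA_ADMIN_*
values below are placeholders, not the server's constants):

    enum { HA_ADMIN_OK = 0, HA_ADMIN_TRY_ALTER = -1 }; // placeholder values

    int optimize_sketch(bool innodb_defragment,
                        bool innodb_optimize_fulltext_only) {
      bool did_work = false;
      if (innodb_optimize_fulltext_only) {
        // ... optimize only the FULLTEXT indexes ...
        did_work = true;
      }
      if (innodb_defragment) {
        // ... defragment the table's indexes ...
        did_work = true;
      }
      // Both options at their default OFF: ask the SQL layer to run
      // ALTER TABLE instead.
      return did_work ? HA_ADMIN_OK : HA_ADMIN_TRY_ALTER;
    }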
ha_innobase::set_partition_owner_stats(): Remove (unused function).
ha_innobase::ha_partition_stats: Remove (the variable is never read).
Remove unused ut_timer functions.
After a failed ADD INDEX, dict_index_remove_from_cache_low()
could iterate the index fields and dereference a freed virtual
column object when trying to remove the index from the v_indexes
of the virtual column.
This regression was caused by a merge of
MDEV-16119 InnoDB lock->index refers to a freed object.
ha_innobase_inplace_ctx::clear_added_indexes(): Detach the
uncommitted indexes from virtual columns, so that
the iteration in dict_index_remove_from_cache_low() can be avoided.
ha_innobase::prepare_inplace_alter_table(): Ignore uncommitted
corrupted indexes when rejecting ALTER TABLE. (This minor bug was
revealed by the extension of the test case.)
dict_index_t::detach_columns(): Detach an index from virtual columns.
Invoked by both dict_index_remove_from_cache_low() and
ha_innobase_inplace_ctx::clear_added_indexes().
dict_col_t::detach(const dict_index_t& index): Detach an index from
a column.
dict_col_t::is_virtual(): Replaces dict_col_is_virtual().
dict_index_t::has_virtual(): Replaces dict_index_has_virtual().
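A simplified standalone model of the detach logic (the real structures
in dict0mem.h differ; the column type here is reduced to the
virtual-column case):

    #include <algorithm>
    #include <vector>

    struct dict_index_t;

    struct dict_col_t {                 // reduced to a virtual column
      std::vector<const dict_index_t*> v_indexes; // indexes referencing it
      void detach(const dict_index_t& index) {
        v_indexes.erase(std::remove(v_indexes.begin(), v_indexes.end(),
                                    &index),
                        v_indexes.end());
      }
    };

    struct dict_field_t { dict_col_t* v_col; }; // null unless virtual

    struct dict_index_t {
      std::vector<dict_field_t> fields;
      void detach_columns() {           // invoked from both call sites
        for (dict_field_t& f : fields)
          if (f.v_col) f.v_col->detach(*this);
      }
    };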
log_crypt_101_read_block(): Mimic MariaDB 10.1, and use the first
encryption key if the key for the checkpoint cannot be found.
Redo log encryption key rotation was ultimately disabled
in MDEV-9422 (MariaDB 10.1.13) due to design issues. So, from
MariaDB 10.1.13 onwards only one log encryption key should matter.
recv_log_format_0_recover(): Add the parameter 'bool crypt'.
Indicate when the log cannot be decrypted for upgrade, instead of
making a possibly false claim that the log requires crash recovery.
init_crypt_key(): Remove extra space from a message.
Also fixes MDEV-14727, MDEV-14491
InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0
by replacing the flawed wait logic in dict_index_remove_from_cache_low().
On DISCARD TABLESPACE, there is no need to drop the adaptive hash index.
We must drop it on IMPORT TABLESPACE, and eventually on DROP TABLE or
DROP INDEX. As long as the dict_index_t object remains in the cache
and the table remains inaccessible, the adaptive hash index entries
pointing to orphaned pages would not do any harm. They would be dropped when
buffer pool pages are reused for something else.
btr_search_drop_page_hash_when_freed(), buf_LRU_drop_page_hash_batch():
Remove the parameter zip_size, and pass 0 to buf_page_get_gen().
buf_page_get_gen(): Ignore zip_size if mode==BUF_PEEK_IF_IN_POOL.
buf_LRU_drop_page_hash_for_tablespace(): Drop the adaptive hash index
even if the tablespace is inaccessible.
buf_LRU_drop_page_hash_for_tablespace(): New global function, to drop
the adaptive hash index.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Remove the parameter drop_ahi.
dict_index_remove_from_cache_low(): Actively drop the adaptive hash index
if entries exist. This should prevent InnoDB hangs on DROP TABLE or
DROP INDEX.
row_import_for_mysql(): Drop any adaptive hash index entries for the table.
row_drop_table_for_mysql(): Drop any adaptive hash index for the table,
except if the table resides in the system tablespace. (DISCARD TABLESPACE
does not apply to the system tablespace, and we do not want to drop the
adaptive hash index for other tables than the one that is being dropped.)
row_truncate_table_for_mysql(): Drop any adaptive hash index entries for
the table, except if the table resides in the system tablespace.
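The resulting policy, in a simplified outline (stand-in code;
TRX_SYS_SPACE, the system tablespace, has ID 0):

    #include <cstdint>

    // Stub standing in for the new global function.
    void buf_LRU_drop_page_hash_for_tablespace(uint32_t /*space_id*/) {}

    const uint32_t TRX_SYS_SPACE = 0;

    void drop_ahi_sketch(uint32_t space_id, bool discard_tablespace) {
      if (discard_tablespace)
        return; // orphaned AHI entries are harmless; pages get reused
      if (space_id == TRX_SYS_SPACE)
        return; // would also hit unrelated tables in the shared space
      buf_LRU_drop_page_hash_for_tablespace(space_id);
    }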
When the transaction isolation level is SERIALIZABLE, or when
a locking read is performed in the REPEATABLE READ isolation level,
InnoDB must lock delete-marked records in order to prevent another
transaction from inserting something.
However, at READ UNCOMMITTED or READ COMMITTED isolation level or
when the parameter innodb_locks_unsafe_for_binlog is set, the
repeatability of the reads does not matter, and there is no need
to lock any records.
row_search_for_mysql(): Skip locks on delete-marked committed records
upfront, instead of invoking row_unlock_for_mysql() afterwards.
The unlocking never worked for secondary index records.
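The skip condition, as an illustrative predicate (the enumerator values
follow InnoDB's TRX_ISO_* ordering):

    enum trx_isolation {
      TRX_ISO_READ_UNCOMMITTED = 0,
      TRX_ISO_READ_COMMITTED   = 1,
      TRX_ISO_REPEATABLE_READ  = 2,
      TRX_ISO_SERIALIZABLE     = 3
    };

    // A delete-marked committed record can be skipped without locking
    // when the repeatability of the read does not matter.
    bool can_skip_lock(trx_isolation iso, bool locks_unsafe_for_binlog) {
      return locks_unsafe_for_binlog || iso <= TRX_ISO_READ_COMMITTED;
    }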
infos[]: Allocate enough entries to accommodate all keys from both
checkpoint pages.
infos_used: The number of used entries in infos[].
get_crypt_info(): Merge to the only caller, log_crypt_101_read_block().
log_crypt_101_read_block(): Do not validate the log block checksum,
because it will not be valid when upgrading from MariaDB 10.1.10.
Instead, check that the encryption key exists.
log_crypt_101_read_checkpoint(): Append to infos[] instead of overwriting.
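A simplified model of the resulting key handling (the types and the
per-page key capacity are made-up stand-ins):

    #include <cstddef>
    #include <cstdint>

    struct crypt_info_t { uint64_t checkpoint_no; uint8_t key[32]; };

    const size_t KEYS_PER_PAGE = 5;             // placeholder capacity
    const size_t MAX_INFOS = 2 * KEYS_PER_PAGE; // both checkpoint pages
    crypt_info_t infos[MAX_INFOS];
    size_t infos_used = 0;                      // used entries in infos[]

    void append_info(const crypt_info_t& info) { // append, don't overwrite
      if (infos_used < MAX_INFOS) infos[infos_used++] = info;
    }

    const crypt_info_t* find_info(uint64_t checkpoint_no) {
      for (size_t i = 0; i < infos_used; i++)
        if (infos[i].checkpoint_no == checkpoint_no) return &infos[i];
      // Mimic MariaDB 10.1: key rotation was disabled in MDEV-9422,
      // so fall back to the first key.
      return infos_used ? &infos[0] : nullptr;
    }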
thd_destructor_proxy(): Ensure that purge actually exits,
as the logic should have done ever since MDEV-14080.
srv_purge_shutdown(): A new function to wait for the
purge coordinator to exit. Before exiting, the
purge coordinator will ensure that all purge workers have exited.
This is the MariaDB 10.2 version of the patch.
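The shape of the new wait, modeled with standard C++ primitives (the
server uses its own events and mutexes, not std::condition_variable):

    #include <condition_variable>
    #include <mutex>

    std::mutex purge_mutex;
    std::condition_variable purge_exited;
    bool purge_coordinator_running = true; // cleared by the coordinator

    // Sketch of srv_purge_shutdown(): block until the purge coordinator
    // has exited; the coordinator in turn waits for all purge workers.
    void srv_purge_shutdown_sketch() {
      std::unique_lock<std::mutex> lk(purge_mutex);
      purge_exited.wait(lk, [] { return !purge_coordinator_running; });
    }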
field_store_string(): Simplify the code.
field_store_index_name(): Remove, and use field_store_string()
instead. Starting with MariaDB 10.2.2, there is the predicate
dict_index_t::is_committed(), and dict_index_t::name never
contains the magic byte 0xff.
Correct some comments to refer to TEMP_INDEX_PREFIX_STR.
i_s_cmp_per_index_fill_low(): Use the appropriate value NULL to
identify that an index was not found. Check that storing each
column value succeeded.
i_s_innodb_buffer_page_fill(), i_s_innodb_buf_page_lru_fill():
Only invoke Field::set_notnull() if the index was found.
(This fixes the bug.)
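The shape of the fix, using a minimal stand-in for the server's Field
class:

    struct Field {                 // minimal stand-in, not sql/field.h
      bool null_value = true;
      const char* ptr = nullptr;
      void set_notnull() { null_value = false; }
      void store(const char* s) { ptr = s; }
    };

    void fill_index_name(Field& f, const char* index_name) {
      if (index_name) {            // index found: store its name
        f.set_notnull();
        f.store(index_name);
      }
      // else: leave the field NULL; previously set_notnull() was
      // called unconditionally, which was the bug.
    }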
i_s_dict_fill_sys_indexes(): Adjust the index->name that was
directly loaded from SYS_INDEXES.NAME (which can start with
the 0xff byte). This was the only function that depended
on the translation in field_store_index_name().
MDEV-10130 Assertion `share->in_trans == 0' failed in storage/maria/ma_close.c
MDEV-10378 Assertion `trn' failed in virtual int ha_maria::start_stmt
The problem was that maria_handler->trn was not properly reset
at commit/rollback, and ha_maria::external_lock() could get confused
because of this.
There was some old code in ha_maria::implicit_commit() that tried
to take care of this, but it was not bullet proof.
Fixed by adding a list of all tables that are part of the maria
transaction to TRN (modeled in the sketch below).
A nice side effect of the fix is that the loops in
ha_maria::implicit_commit() became much simpler.
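In outline (stand-in types; the real TRN uses its own structures):

    struct MARIA_HA;                     // per-open-table handler (opaque)

    struct used_table {                  // one node per table in the trans.
      used_table* next;
      MARIA_HA* table;
    };

    struct TRN_sketch {
      used_table* used_tables = nullptr; // all tables in this transaction
    };
    // At commit/rollback, walking used_tables lets each handler's trn
    // pointer be reset reliably instead of being guessed at later.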
Other things:
- Fixed a bug in mysql_admin_table() where the argument open_for_modify
was wrongly reset for the next table in the chain
- Roll back the admin command also in case of a fatal error.
- Split _ma_set_trn_for_table() into three versions to simplify the code
and debugging.
- Several new asserts to detect the original problem (that the file was
not properly removed from trn before calling ma_close())
Step#1: RocksDB files require a special #define when they are compiled
with valgrind. Without that, valgrind fails with an 'unimplemented syscall'
error for an fcntl call.
The failures with valgrind occur as a result of Spider sometimes using the
wrong transaction for operations in background threads that send requests to
the data nodes. The use of the wrong transaction caused the networking to the
data nodes to use the wrong thread in some cases. Valgrind eventually
detects this because the wrong transaction, when it is freed, uses such
a thread to disconnect from the data node after the thread has already
been destroyed.
I have fixed the problem by correcting the transaction used in each of these
cases.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit afe5a51 on branch 10.2
The failures with valgrind occur as a result of Spider sometimes using the
wrong transaction for operations in background threads that send requests to
the data nodes. The use of the wrong transaction caused the networking to the
data nodes to use the wrong thread in some cases. Valgrind eventually
detects this because the wrong transaction, when it is freed, uses such
a thread to disconnect from the data node after the thread has already
been destroyed.
I have fixed the problem by correcting the transaction used in each of these
cases.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged:
Commit 4d576d9 on branch bb-10.3-MDEV-12900
The crash occurs when a thread that is closing its connection attempts to
access Spider transaction information when another thread has freed that memory
while processing Spider plugin deinit. This occurs because Spider does not
adjust the plugin's reference count when it sets a transaction information
pointer for the plugin.
The fix I implemented changes the way Spider sets the transaction information
pointer to use thd_set_ha_data() so that Spider's plugin reference counter is
adjusted as well.
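An illustrative fragment (SPIDER_TRX and the wrapper function are
stand-ins; thd_set_ha_data() is the plugin service function declared in
mysql/plugin.h):

    #include <mysql/plugin.h>  // declares thd_set_ha_data()

    struct SPIDER_TRX;         // Spider's transaction object (opaque here)

    // Attaching the transaction through the service API also pins the
    // plugin, so it cannot be deinitialized while the pointer is set.
    void spider_attach_trx(MYSQL_THD thd, struct handlerton* spider_hton,
                           SPIDER_TRX* trx) {
      thd_set_ha_data(thd, spider_hton, trx);
    }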
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged From:
Commit ab9d420 on branch 10.2
Fix two issues:
1. Rdb_ddl_manager::rename() loses the value of m_hidden_pk_val. The new
object used to get 0, which means "not loaded from the db yet".
2. ha_rocksdb::load_hidden_pk_value() uses the current transaction (and its
snapshot) when loading the hidden PK value from disk. This may cause it to
load an out-of-date value.
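A reduced model of fix 1 (Rdb_tbl_def is collapsed here to the single
counter; the atomic type is an assumption of this sketch):

    #include <atomic>
    #include <cstdint>

    struct Rdb_tbl_def_sketch {
      std::atomic<uint64_t> m_hidden_pk_val{0}; // 0 = "not loaded yet"
    };

    // On rename, carry the counter over to the new object so it does
    // not silently revert to "not loaded from the db yet".
    void rename_sketch(const Rdb_tbl_def_sketch& from,
                       Rdb_tbl_def_sketch& to) {
      to.m_hidden_pk_val.store(from.m_hidden_pk_val.load());
    }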
The crash occurs when a thread that is closing its connection attempts to
access Spider transaction information when another thread has freed that memory
while processing Spider plugin deinit. This occurs because Spider does not
adjust the plugin's reference count when it sets a transaction information
pointer for the plugin.
The fix I implemented changes the way Spider sets the transaction information
pointer to use thd_set_ha_data() so that Spider's plugin reference counter is
adjusted as well.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged From:
Commit eabfadc on branch bb-10.3-MDEV-7914
jemalloc > 5.0.0 doesn't like to be linked with
a dlopen-ed module.
Don't link tokudb with jemalloc on Fedora 28,
LD_PRELOAD it instead with mysqld_safe
and with systemd.
srv_purge_coordinator_thread(): Wait for all purge worker threads
to actually exit. An analysis of a core dump of a hung 10.3 server
revealed that one srv_worker_thread did not exit, even though the
purge coordinator had exited. This caused kill_server_thread and
mysqld_main to wait indefinitely. The main InnoDB shutdown was
never called, because unireg_end() was never called.
The problem was that copy_data_between_tables() didn't do proper
cleanup in case of failures:
- copy object was not properly freed
- end_bulk_insert() was not called
- mysql_trans_prepare_alter_copy_data() set THD->transaction.on to
  false, which was not properly restored
The last part caused a crash in Aria, as Aria depends on the THD
state being correct.
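One way to picture the required restore, as a hedged sketch (the server
does this with explicit calls on each exit path, not a guard object):

    struct TransactionOnGuard {        // illustrative RAII guard
      bool& transaction_on;
      bool saved;
      explicit TransactionOnGuard(bool& flag)
        : transaction_on(flag), saved(flag) {
        transaction_on = false;        // mysql_trans_prepare_alter_copy_data()
      }
      // Runs on all paths, including the failure paths that previously
      // left the flag set to false.
      ~TransactionOnGuard() { transaction_on = saved; }
    };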
Other things:
- Reset info->switched_transactional after usage (safety)
- Reset bulk_insert_single_undo (safety)
RocksDB now supports "iterator bounds" which are min and max keys
that an iterator is interested in.
The iterator initialization function doesn't copy the keys, though; it
keeps pointers to them.
So if the buffer space for the keys is reused for another iterator (the one
for checking for UNIQUE constraint violation in ha_rocksdb::ha_update_row),
then one can get an incorrect query result.
Fixed by using a separate buffer for iterator bounds in the unique constraint
violation check.
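A standalone sketch of the ownership rule against the public RocksDB API
(MyRocks' actual bound buffers are members of ha_rocksdb):

    #include <rocksdb/db.h>
    #include <memory>
    #include <string>

    // RocksDB stores only the Slice pointers, so every live iterator
    // needs bound buffers that stay valid for its whole lifetime.
    struct BoundedIterator {
      std::string lower_buf, upper_buf; // owned copies of the bound keys
      rocksdb::Slice lower, upper;
      std::unique_ptr<rocksdb::Iterator> it;

      BoundedIterator(rocksdb::DB* db, std::string lo, std::string hi)
        : lower_buf(std::move(lo)), upper_buf(std::move(hi)),
          lower(lower_buf), upper(upper_buf) {
        rocksdb::ReadOptions ro;
        ro.iterate_lower_bound = &lower; // not copied by RocksDB
        ro.iterate_upper_bound = &upper; // must outlive the iterator
        it.reset(db->NewIterator(ro));
      }
    };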
The problem was that the bitmap needs to be flushed before disabling
logging of redo entries, because writing the bitmap to disk by the
background checkpoint may cause redo entries.
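In outline (stand-in function names):

    void flush_changed_page_bitmap() { /* may itself generate redo */ }
    void disable_redo_logging()      { /* stops redo generation */ }

    void prepare_sketch() {
      // The bitmap write can produce redo entries, so it must happen
      // while redo logging is still enabled.
      flush_changed_page_bitmap();
      disable_redo_logging();
    }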
Merge a test case and a code change from MySQL 5.7.22.
There was no commit message, but a test case was included.
d3ec326bcd
There is no Bug 25899959 mentioned in the MySQL 8.0.11 history.
Based on the number, it should have been filed before August 2017.
Maybe it was initially fixed in a not-yet-public MySQL 9.0 branch?
The code change differs from MySQL 5.7, because the mbminmaxlen
were split in MariaDB in MDEV-7049.
MariaDB has a scaled-down version of the test, so we need to set
@@rocksdb_max_row_locks lower to trigger the desired error.
(This was not caught on the BB test run because the test is marked as "big".)
The bloom filter is only used when reading the data from disk. If the data
happens to still be in the memtable, the bloom filter won't be used.
Stabilize the testcase by making sure the data is on disk before we
read it.
trx_purge_stop(): Release purge_sys->latch before attempting to
wake up the purge threads, so that they can actually wake up.
This is a regression of commit a13a636c74
which attempted to fix MDEV-11802 by ensuring that srv_purge_wakeup()
will actually wait until all purge threads wake up.
Due to the purge_sys->latch, the threads might never wake up,
because some purge threads could end up waiting for purge_sys->latch
in trx_undo_prev_version_build() while holding dict_operation_lock
in shared mode. This in turn would block any DDL operations, the
InnoDB master thread, and InnoDB shutdown.
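The pattern, modeled with standard C++ primitives (the server uses
rw_lock_t and os_event):

    #include <condition_variable>
    #include <mutex>

    std::mutex purge_latch;            // stands in for purge_sys->latch
    std::condition_variable purge_cv;
    bool purge_stop_requested = false;

    void trx_purge_stop_sketch() {
      {
        std::lock_guard<std::mutex> lk(purge_latch);
        purge_stop_requested = true;
      }                                // release the latch first ...
      purge_cv.notify_all();           // ... then wake the purge threads,
                                       // which may need the latch to run
    }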