Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed,
we have been observing sporadic hangs during the execution of the
RESET MASTER statement. We hope that these changes fix the hangs,
but because the hangs occur only rarely and we are unable to
reproduce them reliably, we cannot be sure of this.
What we do know is that innodb_force_recovery=2 (or a larger setting)
will prevent srv_master_callback (the former srv_master_thread) from
running. In that mode, periodic log flushes would never occur and
RESET MASTER could hang indefinitely. This is demonstrated by a new
test case developed by Andrei Elkin. We fix the hang by treating this
mode as a special case.
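For illustration, a minimal sketch of the scenario that the new test
case covers (following the description above; exact option spelling
aside):

  # server started with binary logging and innodb_force_recovery=2,
  # so srv_master_callback never runs its periodic log flushes
  RESET MASTER;  -- before this fix: could hang waiting for a log
                 -- flush that would never happen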
This also includes some code cleanup and renames of misleadingly
named code. The interface has nothing to do with log checkpoints in
the storage engine; it is only about requesting log writes to be
persistent.
handlerton::commit_checkpoint_request,
commit_checkpoint_notify_ha(): Remove the unused parameter hton.
log_requests.start: Replaces pending_checkpoint_list.
log_requests.end: Replaces pending_checkpoint_list_end.
log_requests.mutex: Replaces pending_checkpoint_mutex.
log_flush_notify_and_unlock(), log_flush_notify(): Replace
innobase_mysql_log_notify(). The new implementation should be
functionally equivalent to the old one.
innodb_log_flush_request(): Replaces innobase_checkpoint_request().
Implement a fast path for common cases, and reduce the mutex hold time.
POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha()
for the current request if it is already satisfied, as well as invoke
log_flush_notify_and_unlock() for any satisfied requests.
log_write(): Invoke log_flush_notify() when the write is already durable.
This notification was missing in WITH_PMEM builds when the log resides
in persistent memory.
Reviewed by: Vladislav Vaintroub
The problem was that the CONNECT engine tries to open the .frm file
during drop_table(), which the code did not take into account.
Fixed by adding the HA_REUSES_FILE_NAMES table flag to CONNECT.
Other things:
- Fixed a wrong test of HA_REUSES_FILE_NAMES in mysql_alter_table()
  (the comment was correct, but the code was not).
- Added a test to the CONNECT engine: if the .frm file it tries to use
  during delete was not created by CONNECT, it generates an error
  instead of crashing.
This feature adds the functionality of ignorability for indexes.
Indexes are not ignored by default.
To control index ignorability explicitly for a new index,
use IGNORED or NOT IGNORED as part of the index definition in
CREATE TABLE, CREATE INDEX, or ALTER TABLE.
Primary keys (explicit or implicit) cannot be made ignorable.
The table INFORMATION_SCHEMA.STATISTICS gets a new column named IGNORED
that stores whether an index is ignored or not.
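A minimal sketch of the syntax and the new INFORMATION_SCHEMA column
(table and index names are placeholders):

  CREATE TABLE t1 (a INT, b INT, INDEX k1 (b) IGNORED);
  ALTER TABLE t1 ALTER INDEX k1 NOT IGNORED;
  SELECT INDEX_NAME, IGNORED
    FROM INFORMATION_SCHEMA.STATISTICS
    WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';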
When doing a TRUNCATE on an InnoDB table under LOCK TABLES, InnoDB
would rename the old table to #sql-... and recreate a new 't1' table.
The table lock would still be on the #sql- table.
When doing ALTER TABLE, InnoDB would apply the changes to the #sql-
table (which would disappear on close).
When the SQL layer, as part of inplace ALTER TABLE, closed the
original t1 table (the #sql- table inside InnoDB) and then reopened
the t1 table, InnoDB would notice that this does not match its own
(old) t1 table and would generate an error.
Fixed by adding code to TRUNCATE TABLE: if we are under LOCK TABLES
and truncating an InnoDB table, we close, reopen, and lock the table
after the truncate. This removes the #sql- table and ensures that
LOCK TABLES uses the new empty table.
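A minimal sketch of the previously failing sequence (table name is a
placeholder):

  CREATE TABLE t1 (a INT) ENGINE=InnoDB;
  LOCK TABLES t1 WRITE;
  TRUNCATE TABLE t1;         -- InnoDB renames t1 to #sql-... and
                             -- recreates t1; the lock used to stay
                             -- on the #sql- table
  ALTER TABLE t1 ADD b INT;  -- could fail here before the fix
  UNLOCK TABLES;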
Reviewer: Marko Mäkelä
Ever since commit 007f68c37f,
ALTER TABLE no longer invokes handler::open() after
handler::commit_inplace_alter_table().
ha_innobase::reload_statistics(): Reload or recompute statistics
after ALTER TABLE.
innodb_notify_tabledef_changed(): A new function to invoke
ha_innobase::reload_statistics().
handlerton::notify_tabledef_changed(): Add the parameter handler*
so that ha_innobase::reload_statistics() can be invoked.
ha_partition::notify_tabledef_changed(),
partition_notify_tabledef_changed(): Pass through the call
to any partitions or subpartitions.
This is based on code that was supplied by Monty.
Cause: table->update_handler was not cloned at the time of
vers_insert_history_row(). update_handler is needed because there
cannot be several initialized indexes at once in the same handler. The
first index is initialized by QUICK_RANGE_SELECT::reset(). Then, when
the history row is inserted, check_duplicate_long_entry_key() is
called, and it requires another index.
The idea of this fix is that preventing next_auto_inc_val from
incrementing on error is enough to fix both this problem and
MDEV-17333. So this patch essentially reverts the existing fix for
MDEV-17333.
This follows up commits 94a520ddbe and 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
MDEV-21953 deadlock between BACKUP STAGE BLOCK_COMMIT and parallel
replication
Fixed by partly reverting the earlier fix for MDEV-21953, putting back
MDL_BACKUP_COMMIT locking before log_and_order.
The original problem for MDEV-21953 was that while a thread was
waiting for other threads to commit in 'log_and_order', it held the
MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
This causes a deadlock as the waited-for thread can never get past the
MDL_BACKUP_COMMIT lock in ha_commit_trans.
The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
a thread is waiting for other 'previous' threads to commit. This ensures
that no transactional thread holds MDL_BACKUP_COMMIT while waiting,
which eliminates the deadlock.
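For illustration, the two waiters in the original deadlock, sketched
as SQL (the committing session stands in for any transactional thread,
such as a parallel replication worker):

  -- Session 1: holds MDL_BACKUP_COMMIT while waiting in log_and_order
  -- for a previous transaction to commit
  COMMIT;
  -- Session 2 (backup): waits for MDL_BACKUP_WAIT_COMMIT, which blocks
  -- all new MDL_BACKUP_COMMIT requests, so session 1 can never finish
  BACKUP STAGE START;
  BACKUP STAGE BLOCK_COMMIT;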
An instant ADD/DROP/reorder column could create a dummy table
object with the wrong ROW_FORMAT when innodb_default_row_format
was changed between CREATE TABLE and ALTER TABLE.
prepare_inplace_alter_table_dict(): If we had promised that
ALGORITHM=INPLACE is supported, we must preserve the ROW_FORMAT.
dict_table_t::prepare_instant(): Add debug assertions to catch
ROW_FORMAT mismatch.
The rest of the changes are related to adding
Alter_inplace_info::inplace_supported to cache the return value of
handler::check_if_supported_inplace_alter().
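A minimal sketch of the scenario (table name is a placeholder):

  SET GLOBAL innodb_default_row_format = redundant;
  CREATE TABLE t1 (a INT) ENGINE=InnoDB;  -- ROW_FORMAT=REDUNDANT
  SET GLOBAL innodb_default_row_format = dynamic;
  ALTER TABLE t1 ADD b INT, ALGORITHM=INSTANT;
  -- the dummy table object must keep REDUNDANT rather than pick up
  -- the new default DYNAMIC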
The issue was:
T1, a parallel slave worker thread, is waiting for another worker thread to
commit. While waiting, it has the MDL_BACKUP_COMMIT lock.
T2, working for mariabackup, is doing BACKUP STAGE BLOCK_COMMIT and blocks
all commits.
This causes a deadlock, as the thread that T1 is waiting for cannot
commit.
Fixed by moving the locking of MDL_BACKUP_COMMIT from ha_commit_trans()
to commit_one_phase_2().
Other things:
- Added a new argument to ha_commit_one_phase() to signal if the
  transaction was a write transaction.
- Ensured that ha_maria::implicit_commit() is always called under
  MDL_BACKUP_COMMIT. This code is not needed in 10.5.
- Ensure that the MDL_request members 'type' and 'ticket' are always
  initialized. This makes it easier to check the state of the
  MDL_request.
- Moved thd->store_globals() earlier in handle_rpl_parallel_thread(),
  as thd->init_for_queries() could use an MDL that could crash if
  store_globals() had not been called.
- Don't call ha_enable_transactions() in THD::init_for_queries() as this
is both slow (uses MDL locks) and not needed.
First step in moving drop table out of the handler.
Todo: handle other methods that don't need an open table.
For now hton->drop_table is optional, for backward compatibility
reasons.
When using field_conv(), which is called in the case of a
field1=field2 copy in fill_records(), the full varstring was copied,
including uninitialized bytes. This caused valgrind to complain about
the use of uninitialized bytes when using Aria static-length records.
Fixed by not using memcpy() when copying varstrings and instead
copying only the real bytes.
When converting a table (test.t1 in the example below) from S3 to
another engine, the following will be logged to the binary log:
DROP TABLE IF EXISTS test.t1;
CREATE OR REPLACE TABLE test.t1 (...) ENGINE=new_engine
INSERT rows to test.t1 in binary-row-log-format
The bug is that the above statements are logged one by one to the binary
log. This means that a fast slave, configured to use the same S3 storage
as the master, would be able to execute the DROP and CREATE from the
binary log before the master has finished the ALTER TABLE.
In this case the slave would ignore the DROP (as it is on an S3
table), but it would stop on the CREATE of the local table, as the
table still exists in S3. The REPLACE part would be ignored by the
slave, as it cannot touch the S3 table.
The fix is to ensure that all the above statements are written to the
binary log AFTER the table has been deleted from S3.
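For illustration, the statement that produces the above sequence is an
engine conversion such as (target engine is arbitrary):

  ALTER TABLE test.t1 ENGINE=InnoDB;
  -- with this fix, the DROP/CREATE/row events are written to the
  -- binary log only after t1 has been deleted from S3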
MCOL-3875 Columnstore write cache
The main change is to make the thr_lock function get_status return a
value that indicates that we have to abort the lock.
Other things:
- Made start_bulk_insert() and end_bulk_insert() protected so that the
  insert cache can use them.
The code is largely based on code from Tencent.
The problem is that in some rare cases there may be a conflict between
.frm files and the files in the storage engine. In this case DROP TABLE
was not able to properly drop the table.
Some MariaDB/MySQL forks have solved this by adding a FORCE option to
DROP TABLE. After some discussion among MariaDB developers, we concluded
that users expect DROP TABLE to always work, even if the table is not
consistent. There should be no need for a separate keyword to ensure
that the table is really deleted.
The solution used is:
- If the .frm file doesn't exist, try dropping the table from all
  storage engines.
- If the .frm file exists but the table does not exist in the engine,
  try dropping the table from all storage engines.
- Update storage engines that use multiple table files (CSV, MyISAM,
  Aria) to succeed with the drop even if some of the files are missing.
- Add HTON_AUTOMATIC_DELETE_TABLE for handlertons where delete_table()
  is not needed and always succeeds. This is used by
  ha_delete_table_force() to know which handlers to ignore when trying
  to drop a table without a .frm file.
The disadvantage of this solution is that DROP TABLE on a non-existing
table will be a bit slower, as we have to ask all active storage
engines whether they know anything about the table.
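The user-visible contract, sketched (t1 is a placeholder whose .frm
file is assumed to be missing or inconsistent):

  DROP TABLE IF EXISTS t1;
  -- every active storage engine is asked to drop t1, so leftover
  -- engine-side files are removed even without a matching .frm file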
Other things:
- Added a new flag MY_IGNORE_ENOENT to my_delete() to not give an error
if the file doesn't exist. This simplifies some of the code.
- Don't clear thd->error in ha_delete_table() if there was an active
error. This is a bug fix.
- handler::delete_table() will not abort if the first file doesn't
  exist. This is a bug fix to handle the case when a drop table was
  aborted in the middle.
- Cleaned up mysql_rm_table_no_locks() to ensure that if_exists uses
  the same code path as when it's not used.
- Use non_existing_table_error() to detect whether the table didn't
  exist. The old code used different error tests in different places.
- Table_triggers_list::drop_all_triggers() now drops the trigger file
  if it can't be parsed, instead of leaving it hanging around (bug fix).
- InnoDB no longer prints an error about the .frm file being out of
  sync with the InnoDB data dictionary if the .frm file does not exist.
  This change was required to be able to try to drop an InnoDB table
  when the .frm file doesn't exist.
- Fixed a bug in mi_delete_table() where the .MYD file would not be
  dropped if the .MYI file didn't exist.
- Fixed a memory leak in Mroonga when deleting a non-existing table.
- Fixed a memory leak in CONNECT when deleting a non-existing table.
Bugs introduced by the original version of this commit and fixed here:
MDEV-22826 Presence of Spider prevents tables from being force-deleted
from other engines
Apply this patch from Percona Server (amended for 10.5):
commit cd7201514fee78aaf7d3eb2b28d2573c76f53b84
Author: Laurynas Biveinis <laurynas.biveinis@gmail.com>
Date: Tue Nov 14 06:34:19 2017 +0200
Fix bug 1704195 / 87065 / TDB-83 (Stop ANALYZE TABLE from flushing table definition cache)
Make ANALYZE TABLE stop flushing affected tables from the table
definition cache, which has the effect of not blocking any subsequent
new queries involving the table if there's a parallel long-running
query:
- new table flag HA_ONLINE_ANALYZE, return it for InnoDB and TokuDB
tables;
- in mysql_admin_table, if we are performing ANALYZE TABLE, and the
table flag is set, do not remove the table from the table
definition cache, do not invalidate query cache;
- in partitioning handler, refresh the query optimizer statistics
after ANALYZE if the underlying handler supports HA_ONLINE_ANALYZE;
- new testcases main.percona_nonflushing_analyze_debug,
  parts.percona_nonflushing_analyze_debug and a supporting debug sync
  point.
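A sketch of the user-visible effect (t1 is a placeholder):

  ANALYZE TABLE t1;
  -- with HA_ONLINE_ANALYZE, t1 is no longer flushed from the table
  -- definition cache, so a long-running query on t1 does not block
  -- subsequent new queries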
For TokuDB, this change exposes bug TDB-83 (Index cardinality stats
updated for handler::info(HA_STATUS_CONST), not often enough for
tokudb_cardinality_scale_percent). TokuDB may return different
rec_per_key values depending on dynamic variable
tokudb_cardinality_scale_percent value. The server does not have a way
of knowing that changing this variable invalidates the previous
rec_per_key values in any opened table shares, and so does not call
info(HA_STATUS_CONST) again. Fix by updating rec_per_key for both
HA_STATUS_CONST and HA_STATUS_VARIABLE. This also forces a re-record
of tokudb.bugs.db756_card_part_hash_1_pick, with the new output
seeming to be more correct.