The function ibuf_remove_free_page() may be called while the caller
is holding several mutexes or rw-locks. Because of this, this
housekeeping loop may cause performance glitches for operations that
involve tables that are stored in the InnoDB system tablespace.
Also deadlocks might be possible.
The worst impact of all is that due to the mutexes being held, calls to
log_free_check() had to be skipped during this housekeeping.
This means that the cyclic InnoDB redo log may be overwritten.
If the system crashes during this, it would be unable to recover.
The entry point to the problematic code is ibuf_free_excess_pages().
It would make sense to call it before acquiring any mutexes or rw-locks,
in any 'pessimistic' operation that involves the system tablespace.
fseg_create_general(), fseg_alloc_free_page_general(): Do not call
ibuf_free_excess_pages() while potentially holding some latches.
ibuf_remove_free_page(): Do call log_free_check(), like every operation
that is about to generate redo log should do.
ibuf_free_excess_pages(): Remove some assertions that are replaced
by stricter assertions in the log_free_check() that is now called by
ibuf_remove_free_page().
row_ins_sec_index_entry(), row_undo_ins_remove_sec_low(),
row_undo_mod_del_mark_or_remove_sec_low(),
row_undo_mod_del_unmark_sec_and_undo_update(): Call
ibuf_free_excess_pages() if the operation may involve allocating pages
and change buffering in the system tablespace.
This bug was a regression caused by MDEV-12698.
On non-leaf pages, the delete-mark flag in the node pointer records is
basically garbage. (Delete-marking only makes sense at the leaf level
anyway. The purpose of the delete-mark is to tell MVCC, locking and purge
that a leaf-level record does not exist in the READ UNCOMMITTED view,
but it used to exist.)
Node pointer records and non-leaf pages are glue that attaches multiple
leaf pages to an index. This glue is supposed to be transparent to the
transactional layer.
When a page is split, InnoDB creates a node pointer record out of the
child page record that the cursor is positioned on. The node pointer record
for the parent page will be a copy of the child page record, amended with
the child page number. If the child page record happened to carry the
delete-mark flag, then the node pointer record would also carry this flag
(even though the flag makes no sense outside child pages).
(On a related note, for the first node pointer record in the first
node pointer page of each tree level, if the MIN_REC_FLAG is set,
the rest of the record contents (except the child page number)
is basically garbage. From this garbage you could deduce at which point
the child was originally split.)
page_scan_method_t: Replace with bool, as there are only 2 values.
dict_stats_scan_page(): Replace the parameter scan_method with is_leaf.
Ignore the bogus (garbage) delete-mark flag if !is_leaf.
- Add include/index_merge*. Upstream has different files than MariaDB,
use copies theirs, not ours.
- There was a prblem with running "DDL-like" commands with binlog=ON:
MariaDB sets binlog_format=STATEMENT for the duration of such command
to prevent RBR replication from catching (and replicating) updates to
system tables.
However, MyRocks tries to prevent any writes to MyRocks tables with
binlog_format!=ROW.
- Added exceptions for DDL-type commands (ANALYZE TABLE, OPTIMIZE TABLE)
- Added special handling for "LOCK TABLE(s) myrocks_table WRITE".
# ib_logfile0 expecting FOUND
-FOUND 3 /public|gossip/ in ib_logfile0
+FOUND 2 /public|gossip/ in ib_logfile0
The most plausible explanation for this difference
should be that the redo log payload grew was so big that
one of the strings (for writing the undo log record,
clustered index record, and secondary index record)
was written to ib_logfile1 instead of ib_logfile0.
Let us run the test with --innodb-log-files-in-group=1 so that
only a single log file will be used.
When MySQL 5.0.3 introduced InnoDB support for two-phase commit,
it also introduced the questionable logic to roll back XA PREPARE
transactions on startup when innodb_force_recovery is 1 or 2.
Remove this logic in order to avoid unwanted side effects when
innodb_force_recovery is being set for other reasons. That is,
XA PREPARE transactions will always remain in that state until
InnoDB receives an explicit XA ROLLBACK or XA COMMIT request
from the upper layer.
At the time the logic was introduced in MySQL 5.0.3, there already
was a startup parameter that is the preferred way of achieving
the behaviour: --tc-heuristic-recover=ROLLBACK.
The MySQL 5.6.36 merge (commit 0af9818240
in MariaDB Server 10.0.31, 10.1.24, 10.2.7) introduced a change from
Oracle:
Bug#25551311 BACKPORT BUG #23517560 REMOVE SPACE_ID RESTRICTION
FOR UNDO TABLESPACES
Some debug assertions in MariaDB 10.2 were still assuming that the
InnoDB undo tablespace IDs start from 1. With the above mentioned
change, the undo tablespace IDs must be contiguous and nonzero.
In key rotation, we must initialize unallocated but previously
initialized pages, so that if encryption is enabled on a table,
all clear-text data for the page will eventually be overwritten.
But we should not rotate keys on pages that were never allocated
after the data file was created.
According to the latching order rules, after acquiring the
tablespace latch, no page latches of previously allocated user pages
may be acquired. So, key rotation should check the page allocation
status after acquiring the page latch, not before. But, the latching
order rules also prohibit accessing pages that were not allocated first,
and then acquiring the tablespace latch. Such behaviour would indeed
result in a deadlock when running the following tests:
encryption.innodb_encryption-page-compression
encryption.innodb-checksum-algorithm
Because the key rotation is accessing potentially unallocated pages, it
cannot reliably check if these pages were allocated. It can only check
the page header. If the page number is zero, we can assume that the
page is unallocated.
fil_crypt_rotate_pages(): Skip pages that are known to be uninitialized.
fil_crypt_rotate_page(): Detect uninitialized pages by FIL_PAGE_OFFSET.
Page 0 is never encrypted, and on other pages that are initialized,
FIL_PAGE_OFFSET must contain the page number.
fil_crypt_is_page_uninitialized(): Remove. It suffices to check the
page number field in fil_crypt_rotate_page().
In key rotation, we must initialize unallocated but previously
initialized pages, so that if encryption is enabled on a table,
all clear-text data for the page will eventually be overwritten.
But we should not rotate keys on pages that were never allocated
after the data file was created.
According to the latching order rules, after acquiring the
tablespace latch, no page latches of previously allocated user pages
may be acquired. So, key rotation should check the page allocation
status after acquiring the page latch, not before. But, the latching
order rules also prohibit accessing pages that were not allocated first,
and then acquiring the tablespace latch. Such behaviour would indeed
result in a deadlock when running the following tests:
encryption.innodb_encryption-page-compression
encryption.innodb-checksum-algorithm
Because the key rotation is accessing potentially unallocated pages, it
cannot reliably check if these pages were allocated. It can only check
the page header. If the page number is zero, we can assume that the
page is unallocated.
fil_crypt_rotate_page(): Detect uninitialized pages by FIL_PAGE_OFFSET.
Page 0 is never encrypted, and on other pages that are initialized,
FIL_PAGE_OFFSET must contain the page number.
fil_crypt_is_page_uninitialized(): Remove. It suffices to check the
page number field in fil_crypt_rotate_page().
xdes_get_descriptor_const(): New function, to get read-only access to
the allocation descriptor.
fseg_page_is_free(): Only acquire a shared latch on the tablespace,
not an exclusive latch. Calculate the descriptor page address before
acquiring the tablespace latch. If the page number is out of bounds,
return without fetching any page. Access only one descriptor page.
fsp_page_is_free(), fsp_page_is_free_func(): Remove.
Use fseg_page_is_free() instead.
fsp_init_file_page(): Move the debug parameter into a separate function.
btr_validate_level(): Remove the unused variable "seg".
The parameter --innodb-sync-debug, which is disabled by default,
aims to find potential deadlocks in InnoDB.
When the parameter is enabled, lots of tests failed. Most of these
failures were due to bogus diagnostics. But, as part of this fix,
we are also fixing a bug in error handling code and removing dead
code, and fixing cases where an uninitialized mutex was being
locked and unlocked.
dict_create_foreign_constraints_low(): Remove an extraneous
mutex_exit() call that could cause corruption in an error handling
path. Also, do not unnecessarily acquire dict_foreign_err_mutex.
Its only purpose is to control concurrent access to
dict_foreign_err_file.
row_ins_foreign_trx_print(): Replace a redundant condition with a
debug assertion.
srv_dict_tmpfile, srv_dict_tmpfile_mutex: Remove. The
temporary file is never being written to or read from.
log_free_check(): Allow SYNC_FTS_CACHE (fts_cache_t::lock)
to be held.
ha_innobase::inplace_alter_table(), row_merge_insert_index_tuples():
Assert that no unexpected latches are being held.
sync_latch_meta_init(): Properly initialize dict_operation_lock_key
at SYNC_DICT_OPERATION. dict_sys->mutex is SYNC_DICT, and
the now-removed SRV_DICT_TMPFILE was wrongly registered at
SYNC_DICT_OPERATION.
buf_block_init(): Correctly register buf_block_t::debug_latch.
It was previously misleadingly reported as LATCH_ID_DICT_FOREIGN_ERR.
latch_level_t: Correct the relative latching order of
SYNC_IBUF_PESS_INSERT_MUTEX,SYNC_INDEX_TREE and
SYNC_FILE_FORMAT_TAG,SYNC_DICT_OPERATION to avoid bogus failures.
row_drop_table_for_mysql(): Avoid accessing btr_defragment_mutex
if the defragmentation thread has not been started. This is the
case during fts_drop_orphaned_tables() in recv_recovery_rollback_active().
fil_space_destroy_crypt_data(): Avoid acquiring fil_crypt_threads_mutex
when it is uninitialized. We may have created crypt_data before the
mutex was created, and the mutex creation would be skipped if
InnoDB startup failed or --innodb-read-only was specified.
InnoDB stopped generating the MLOG_INIT_FILE_PAGE record in
MySQL 5.7.5. Starting with MySQL 5.7.9 (which was imported to
MariaDB Server 10.2.2), the InnoDB redo log format tag prevents
crash recovery from old-format redo logs.
Remove the dead code for dealing with MLOG_INIT_FILE_PAGE.
The previous fix (commit dcdc1c6d09)
should have removed the assertion from log_close(), because every
caller that requires this assertion is already asserting that log
writes are allowed. When fil_names_clear() is called, it must be
able to write the MLOG_CHECKPOINT records. The purpose of the debug
variable recv_no_log_write is to prevent the creation of page-level
redo log records, or modifications to persistent data.
It allows to push conditions into derived with window functions not
only in the cases when the window specifications of these window
functions use the same partition, but also in the cases when the window
functions use partitions that share only some fields. In these
cases only the conditions over the common fields are pushed.
Add suppressions for the read and decompression errors.
This may be 10.3 specific and related to MDEV-13536 which increases
purge activity. But it does not hurt to suppress rarely occurring
and plausible error messages for this fault-injection test already in 10.2.
backup_release(): New function, refactored from backup_finish().
Release some resources that may have been acquired by backup_startup()
and should be released even after a failed operation.
xtrabackup_backup_low(): Refactored from xtrabackup_backup_func().
xtrabackup_backup_func(): Always call backup_release() after calling
backup_start().
The test mariabackup.incremental_backup revealed a memory leak
in have_queries_to_wait_for(). The problem is that
xb_mysql_query() is being invoked with bool use_result=true
but the result is not being freed by mysql_store_result().
There are similar leaks in other functions.
have_queries_to_wait_for(): Invoke mysql_free_result() to
clean up after the mysql_store_result() that was invoked
by xb_mysql_query().
select_incremental_lsn_from_history(): Plug the leak on failure.
kill_long_queries(): Plug the memory leak.
(This function always leaked memory when it was called.)
Problem was that if column was created in alter table when
it was refered again it was not tried to find from list
of current columns.
mysql_prepare_alter_table:
There is two cases
(1) If alter table adds a new column and then later alter
changes the field definition, there was no check from
list of new columns, instead an incorrect error was given.
(2) If alter table adds a new column and then later alter
changes the default, there was no check from list of
new columns, instead an incorrect error was given.
The fix broke mariabackup --prepare --incremental.
The restore of an incremental backup starts up (parts of) InnoDB twice.
First, all data files are discovered for applying .delta files. Then,
after the .delta files have been applied, InnoDB will be restarted
more completely, so that the redo log records will be applied via the
buffer pool.
During the first startup, the buffer pool is not initialized, and thus
trx_rseg_get_n_undo_tablespaces() must not be invoked. The apply of
the .delta files will currently assume that the --innodb-undo-tablespaces
option correctly specifies the number of undo tablespace files, just
like --backup does.
The second InnoDB startup of --prepare for applying the redo log will
properly invoke trx_rseg_get_n_undo_tablespaces().
enum srv_operation_mode: Add SRV_OPERATION_RESTORE_DELTA for
distinguishing the apply of .delta files from SRV_OPERATION_RESTORE.
srv_undo_tablespaces_init(): In mariabackup --prepare --incremental,
in the initial SRV_OPERATION_RESTORE_DELTA phase, do not invoke
trx_rseg_get_n_undo_tablespaces() because the buffer pool or the
redo logs are not available. Instead, blindly rely on the parameter
--innodb-undo-tablespaces.
fails with ERROR_INVALID_FUNCTION
This DeviceIoControl seems to happen on different boxes from time to time,
and there is not much user can do about it.
Instead of error, log a single INFO message, so it does not disturb users
much.
(and mask another fail until MDEV-10179 is fixed)
modified: storage/connect/mysql-test/connect/r/tbl_thread.result
modified: storage/connect/mysql-test/connect/t/tbl_thread.test