number of non-user tablespaces.
fil_space_t::try_to_close(): Don't try to close a
tablespace that is acquired by the caller of
the function.
Added a suppression message to the open_files_limit test case.
number of non-user tablespaces.
- InnoDB only closes user tablespaces when the number of open
files exceeds the innodb_open_files limit. In that case, InnoDB should
make sure that the innodb_open_files value is greater than the number
of undo, system and temporary tablespace files (as sketched below).
- InnoDB page compression works only on COMPACT or DYNAMIC row
format tables, so InnoDB should throw an error when ALTER TABLE
tries to enable PAGE_COMPRESSED for a ROW_FORMAT=REDUNDANT table.
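Referring to the open-files item above, a minimal toy sketch of the
invariant (standalone C++; the file counts and variable names are
illustrative, not InnoDB code):

  #include <algorithm>
  #include <cstdio>

  int main() {
    // Hypothetical counts of files InnoDB must always keep open:
    unsigned undo_files = 3, system_files = 1, temp_files = 1;
    unsigned non_user = undo_files + system_files + temp_files;
    unsigned innodb_open_files = 4;  // user-supplied setting, too low here
    // The fix amounts to enforcing: innodb_open_files > non_user.
    innodb_open_files = std::max(innodb_open_files, non_user + 1);
    std::printf("effective innodb_open_files=%u\n", innodb_open_files);
    return 0;
  }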
BUF_LRU_MIN_LEN (256) is too high a limit for small buffer pool (BP)
sizes. For example, with a 16K page size the limit is more than 5% of
the total BP for any BP size below 80M, and for the smallest BP of 5M
it is 80% of the BP.
Non-data objects like explicit locks can occupy part of the BP,
reducing the pages available for the LRU. If the LRU reaches the
minimum limit and no free pages are available, the server would hang,
with the page cleaner unable to free any more pages.
Fix: To avoid such a hang, we adjust the LRU limit to be lower than the
limit for data objects as checked in
buf_LRU_check_size_of_non_data_objects(),
i.e. one page less than 5% of the BP.
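The arithmetic behind this, as a standalone illustration (the constants
mirror the description above; this is not the actual buf0lru.cc code):

  #include <algorithm>
  #include <cstdio>

  int main() {
    const unsigned page_size = 16384;       // 16K pages
    const unsigned BUF_LRU_MIN_LEN = 256;
    for (unsigned pool_mib : {5u, 80u, 1024u}) {
      unsigned n_pages = pool_mib * 1024 * 1024 / page_size;
      // Cap the LRU minimum at one page less than 5% of the pool.
      unsigned limit = std::min(BUF_LRU_MIN_LEN, n_pages * 5 / 100 - 1);
      std::printf("pool=%4u MiB  pages=%6u  LRU minimum=%u\n",
                  pool_mib, n_pages, limit);
    }
    return 0;
  }

For the 5M pool this yields 15 pages instead of 256 (which would be 80%
of the pool), while for a large pool the minimum stays at 256.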
trx_free_at_shutdown(): Similar to trx_t::commit_in_memory(),
clear the detailed_error (FOREIGN KEY constraint error) before
invoking trx_t::free(). We only do this on debug instrumented
builds in order to avoid a debug assertion failure on shutdown.
Problem:
========
- InnoDB wrongly calculates the record size in
btr_node_ptr_max_size() when a prefix index of
the column has to be stored externally.
Fix:
====
- InnoDB should add the maximum field size to the
record size when the field is a fixed-length one.
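A toy model of the estimate (standalone C++, not the actual btr0cur.cc
code); the point is that a fixed-length field always contributes its
full length, even when a prefix of a column may end up stored
externally:

  #include <cstdio>

  struct FieldDef { unsigned fixed_len; unsigned max_len; };

  // Every field contributes its maximum possible length to the node
  // pointer record size estimate; the bug was to account a
  // fixed-length field with less than fixed_len in the
  // externally-stored-prefix case.
  static unsigned node_ptr_max_size(const FieldDef *fields, unsigned n) {
    unsigned size = 0;
    for (unsigned i = 0; i < n; i++)
      size += fields[i].fixed_len ? fields[i].fixed_len
                                  : fields[i].max_len;
    return size;
  }

  int main() {
    // e.g. an INT column plus a long indexed column prefix:
    const FieldDef fields[] = {{4, 4}, {0, 768}};
    std::printf("max record size=%u\n", node_ptr_max_size(fields, 2));
    return 0;
  }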
On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to
conflicting share modes between processes accessing the same file can
result in CreateFile failures.
mysys' my_open() already incorporates a workaround by implementing
wait/retry logic on Windows.
But this does not help if files are opened using shell redirection, as
mysqltest traditionally did it, i.e. via
--exec echo "some text" > output_file
In such cases, it is cmd.exe that opens the output_file, and it
won't do any sharing-violation retries.
This commit addresses the issue by introducing a new built-in command,
'write_line', in mysqltest. This new command serves as a concise
alternative to 'write_file' for single-line output, and it also
resolves variables like 'exec' would.
Internally, this command uses my_open(), and therefore benefits from
the retry-on-error logic.
Hopefully this will eliminate the very sporadic "can't open file because
it is used by another process" error on CI.
Issue:
------
The actual order of acquisition of the IBUF pessimistic insert mutex
(SYNC_IBUF_PESS_INSERT_MUTEX) and the IBUF header page latch
(SYNC_IBUF_HEADER) w.r.t. the space latch (SYNC_FSP) differs from the
order defined in sync0types.h. It was not discovered earlier because
the path to ibuf_remove_free_page was not covered by the mtr tests.
The ideal order, as defined in sync0types.h, is as follows:
SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP
In ibuf_remove_free_page, we acquire the space latch earlier, giving
the following order and resulting in an assertion failure with
innodb_sync_debug=ON:
SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX
Fix:
---
We do maintain this order in other places and there doesn't seem to be
any real issue here. To reduce the impact in GA versions, we avoid
making extensive changes in mutex ordering to match the current
SYNC_IBUF_PESS_INSERT_MUTEX order. Instead we relax the ordering check
for the IBUF pessimistic insert mutex using SYNC_NO_ORDER_CHECK.
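The mechanism, reduced to a toy checker (nothing like sync0debug's
internals, but the same idea: latches must be acquired in descending
level order, and SYNC_NO_ORDER_CHECK opts a latch out of the check):

  #include <cassert>
  #include <vector>

  enum { SYNC_NO_ORDER_CHECK = -1, SYNC_FSP = 400 };

  static std::vector<int> held;

  static void acquire(int level) {
    if (level == SYNC_NO_ORDER_CHECK)
      return;                        // exempt from ordering checks
    if (!held.empty())
      assert(level < held.back());   // fires on out-of-order acquisition
    held.push_back(level);
  }

  int main() {
    acquire(SYNC_FSP);
    // With the fix, the pessimistic insert mutex no longer participates
    // in the ordering check, so taking it after the space latch is fine:
    acquire(SYNC_NO_ORDER_CHECK);
    return 0;
  }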
If some unique key fields are nullable, there can be several records
with the same key field values in a unique index when at least one key
field is NULL, as NULL != NULL.
When a transaction is resumed after waiting on a record with at least
one key field equal to NULL, and the record stored in the persistent
cursor has been deleted, the persistent cursor can be restored to a
record whose key fields are all equal to the stored ones, but with at
least one field equal to NULL. Such a record is wrongly treated as
having the same unique key as the one stored in the persistent cursor,
which is wrong because NULL != NULL.
The fix is to check whether at least one unique field is NULL in the
restored persistent cursor position, and, if so, not to treat the
record as one with the same unique key as the stored record key.
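A plain C++ illustration of the rule (toy code, not InnoDB's record
comparison): two key tuples with identical field values are still
distinct unique keys if any field is NULL:

  #include <iostream>
  #include <optional>
  #include <vector>

  using Field = std::optional<int>;  // nullopt plays the role of SQL NULL

  static bool same_unique_key(const std::vector<Field> &a,
                              const std::vector<Field> &b) {
    for (size_t i = 0; i < a.size(); i++) {
      if (!a[i] || !b[i])
        return false;                // any NULL makes the keys distinct
      if (*a[i] != *b[i])
        return false;
    }
    return true;
  }

  int main() {
    std::vector<Field> stored{1, std::nullopt};
    std::vector<Field> restored{1, std::nullopt};
    // Identical values, but the NULL field means "not the same key":
    std::cout << same_unique_key(stored, restored) << '\n';  // prints 0
    return 0;
  }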
dict_index_t::nulls_equal was removed, as it was initially developed
for "intrinsic tables", which never existed in MariaDB, and there is no
code that would set it to "true".
Reviewed by Marko Mäkelä.
In commit d74d95961a (MDEV-18543)
there was an error that would cause the hidden metadata record
to be deleted, and therefore cause the table to appear corrupted
when it is reloaded into the data dictionary cache.
PageConverter::update_records(): Do not delete the metadata record,
but do validate it.
RecIterator::open(): Make the API more similar to 10.6, to simplify
merges.
- InnoDB unnecessarily reserves free extents during blob
page allocation, even though btr_page_alloc() can handle
reserving an extent when the existing one runs out of usable pages.
On a busy system it might take time for buffer_page_written_index_leaf
to reach the correct value. Wait for it.
Also, tag identical statements so that they differ in the result file.
Problem:
======
- InnoDB fails to do an instant operation while adding a variable
length column. The problem is that InnoDB wrongly assumes that a
variable-length character field can never be part of an externally
stored page.
Solution:
========
instant_alter_column_possible(): A variable-length
character field can be stored on an externally stored page.
- Suppress the "Difficult to find free blocks" warning
globally to avoid many different test cases failing.
- Demote the error message in validate_first_page() to a note,
so that the first page can be recovered from the doublewrite buffer;
an error is thrown only if the page is not found in the doublewrite
buffer.
The issue here is that ha_innobase::get_auto_increment() could cause a
deadlock involving the auto-increment lock and roll back the
transaction implicitly. For such cases, storage engines usually call
thd_mark_transaction_to_rollback() to inform the SQL engine about it,
which in turn takes appropriate actions and closes the transaction. In
InnoDB, we call it while converting an InnoDB error code to a MySQL
one.
However, since ::innobase_get_autoinc() returns void, we skip the call
for error code conversion and also miss marking the transaction for
rollback on a deadlock error. We eventually assert while releasing a
savepoint, as the transaction state is not active.
Since convert_error_code_to_mysql() performs some generic error
handling, like invoking the callback when needed, we should call
that function in ha_innobase::get_auto_increment() even if we don't
return the resulting MySQL error code back.
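A toy model of that control flow (hypothetical names; the real pair is
convert_error_code_to_mysql() and thd_mark_transaction_to_rollback()),
showing why the converter must run for its side effect even when its
return value is discarded:

  #include <cstdio>

  enum dberr_t { DB_SUCCESS, DB_DEADLOCK };

  struct Trx { bool marked_for_rollback = false; };

  // Stand-in for the converter: besides mapping the code, it tells the
  // SQL layer that the transaction must be rolled back.
  static int convert_error(dberr_t err, Trx &trx) {
    if (err == DB_DEADLOCK) {
      trx.marked_for_rollback = true;  // thd_mark_transaction_to_rollback()
      return 1213;                     // ER_LOCK_DEADLOCK
    }
    return 0;
  }

  int main() {
    Trx trx;
    dberr_t err = DB_DEADLOCK;  // e.g. from an auto-increment lock wait
    // get_auto_increment() returns void, so the mysql error code is
    // unused, but the call is still needed for its side effect:
    (void) convert_error(err, trx);
    std::printf("marked_for_rollback=%d\n", (int) trx.marked_for_rollback);
    return 0;
  }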
Problem:
========
During upgrade, InnoDB writes redo log records for adjusting
the tablespace size or tablespace flags even before the log
has been upgraded to the configured format. This could lead to data
inconsistency if a crash happened during the upgrade process.
Fix:
===
srv_start(): Write the redo log records for the tablespace flags
adjustment and the increased tablespace size only after the redo log
upgrade.
log_write_low(), log_reserve_and_write_fast(): Check whether
the redo log is in the physical format.
The root cause is the WAL logging of a file operation when the actual
operation fails afterwards. It creates a situation with a log entry for
an operation that would always fail. I could simulate both the backup
scenario error and the InnoDB recovery failure by exploiting this
weakness.
We follow WAL for the file rename operation, and once it is logged the
operation must eventually complete successfully, or it is a major
catastrophe. Right now we fail the rename and handle it as a normal
error, and that is the problem.
I created a patch to address a RENAME operation to a non-existing
schema, where the destination schema directory is missing. The patch
checks for the missing schema before logging, in an attempt to avoid
the failure after the WAL log is written/flushed. I also checked that
the schema cannot be dropped and that there cannot be any race with
another rename to the same file; this is protected by the MDL lock in
SQL today.
This patch should be a good improvement over the current situation
and solves the issue at hand.
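A sketch of the pre-check idea, under the assumption that a schema maps
to a directory (toy C++17 code, not the actual fil0fil.cc patch):

  #include <cstdio>
  #include <filesystem>

  namespace fs = std::filesystem;

  // Validate the destination before WAL-logging the rename, so that a
  // logged rename can always be replayed.
  static bool rename_precheck(const fs::path &new_name) {
    return fs::is_directory(new_name.parent_path());
  }

  int main() {
    fs::path target = "missing_schema/t1.ibd";
    if (!rename_precheck(target)) {
      std::puts("refusing to log rename: destination schema is missing");
      return 1;
    }
    // ... write the redo log record, then perform the rename ...
    return 0;
  }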
Problem:
========
- InnoDB reads the length of a variable-length field wrongly
while applying the modification log of an instant table.
Solution:
========
rec_init_offsets_comp_ordinary(): For the temporary instant
file record, InnoDB should read the length of a variable-length
field from the record itself.
MDEV-33308 CHECK TABLE is modifying .frm file even if --read-only
As noted in commit d0ef1aaf61,
MySQL as well as older versions of MariaDB server would, during
ALTER TABLE ... IMPORT TABLESPACE, write bogus values to the
PAGE_MAX_TRX_ID field in pages of the clustered index, instead of
letting that field remain 0.
In commit 8777458a6e this field
was repurposed for PAGE_ROOT_AUTO_INC in the clustered index root page.
To avoid trouble when upgrading from MySQL or older versions of MariaDB,
we will try to detect and correct bogus values of PAGE_ROOT_AUTO_INC
when opening a table for the first time from the SQL layer.
btr_read_autoinc_with_fallback(): Add the parameters mysql_version, max
to indicate the TABLE_SHARE::mysql_version of the .frm file and the
maximum value allowed for the type of the AUTO_INCREMENT column.
In case the table was originally created in MySQL or an older version
of MariaDB, also read the maximum value of the AUTO_INCREMENT column
from the table, and reset the PAGE_ROOT_AUTO_INC if it is above the
limit.
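The fallback, reduced to a toy (illustrative names and a simplified
predicate; the real function also consults TABLE_SHARE::mysql_version
before distrusting the stored value):

  #include <cstdint>
  #include <cstdio>

  // A stored PAGE_ROOT_AUTO_INC above the column type's maximum is
  // treated as bogus (e.g. a leftover PAGE_MAX_TRX_ID written by an
  // old IMPORT) and is recomputed from the actual column data.
  static uint64_t autoinc_with_fallback(uint64_t page_root_autoinc,
                                        uint64_t col_type_max,
                                        uint64_t max_value_in_column) {
    if (page_root_autoinc > col_type_max)
      return max_value_in_column + 1;  // reset from the real data
    return page_root_autoinc;
  }

  int main() {
    std::printf("%llu\n", (unsigned long long)
                autoinc_with_fallback(0xDEADBEEFCAFEULL, 255, 42));
    return 0;  // prints 43
  }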
dict_table_t::get_index(const dict_col_t &) const: Find an index that
starts with the specified column.
ha_innobase::check_for_upgrade(): Return HA_ADMIN_FAILED if InnoDB
needs upgrading but is in read-only mode. In this way, the call to
update_frm_version() will be skipped.
row_import_autoinc(): Adjust the AUTO_INCREMENT column at the end of
ALTER TABLE...IMPORT TABLESPACE. This refinement was suggested by
Debarun Banerjee.
The changes outside InnoDB were developed by Michael 'Monty' Widenius:
Added print_check_msg() service for easy reporting of check/repair messages
in ENGINE=Aria and ENGINE=InnoDB.
Fixed that CHECK TABLE does not update the .frm file under --read-only.
Added 'handler_flags' to HA_CHECK_OPT as a way for storage engines to
store state from handler::check_for_upgrade().
Reviewed by: Debarun Banerjee
row_discard_tablespace(): Do not invoke dict_index_t::clear_instant_alter()
because that would corrupt any adaptive hash index entries in the table.
row_import_for_mysql(): Invoke dict_index_t::clear_instant_alter()
after detaching any adaptive hash index entries.
If we fail to open a tablespace while looking for FILE_CHECKPOINT, we
set the corruption flag. Specifically, if an encryption key is missing,
we would not be able to open an encrypted tablespace and the flag could
be set. We missed checking for this flag and reported "Missing
FILE_CHECKPOINT".
Address a review comment to improve the test: flush pages before
starting the no-checkpoint block. This should reduce the number of
cases where the test is skipped because some intermediate checkpoint is
triggered.
THE FIX MUST NOT BE MERGED TO 10.6+, BECAUSE 10.6+ IS NOT AFFECTED!
The test is waiting for delete-marked record purging. But this does not
happen under the following conditions:
1. "START TRANSACTION WITH CONSISTENT SNAPSHOT" - is active, has not
been rolled back yet
2. "DELETE FROM t WHERE b = 20 # trx_1" - is committed
3. "INSERT INTO t VALUES(10, 20) # trx_2" - hanging on
"ib_after_row_insert" sync point, waiting for "first_ins_cont" signal
4. "DELETE FROM t WHERE b = 20 # trx_3" - blocked on delete-marked by
trx_1 record, waiting for trx_2
5. connection "default" is waiting on
'now WAIT_FOR row_purge_del_mark_finished'
purge_coordinator_callback_low() sets
purge_state.m_history_length= srv_do_purge(&n_total_purged);
even if nothing was purged, as in our case. Nothing was purged because
a transaction with a consistent snapshot was still alive during the
purging procedure.
Then purge_coordinator_timer_callback() does not wake the purge thread
if the following condition is true:
purge_state.m_history_length == trx_sys.rseg_history_len
The above condition is true for our case, because we are waiting for
delete-marked record purging, and trx_sys.rseg_history_len does not
grow.
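The stall condition, restated as a toy predicate (not the actual
srv0srv.cc code):

  #include <cstdint>
  #include <cstdio>

  // 10.5 only wakes the purge thread when the history length changed;
  // a history list that cannot shrink (purge blocked by an old read
  // view) and does not grow never triggers another wakeup.
  static bool wake_purge(uint64_t recorded_len, uint64_t current_len) {
    return recorded_len != current_len;
  }

  int main() {
    std::printf("wake=%d\n", (int) wake_purge(100, 100));  // wake=0
    return 0;
  }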
Only 10.5 is affected, because there is no such condition in 10.6, i.e.
the purge thread is woken up even if the history size was not changed
while the purge coordinator thread was suspended.
The easiest way to fix it is just to remove the test from 10.5.
During import, if a cfg file is not specified, we don't update the
autoinc field in the InnoDB dictionary object dict_table_t. The next
insert tries to insert from the starting position of the auto-increment
and fails.
It can be observed that the issue is resolved once the server is
restarted, as the persistent value is read correctly from
PAGE_ROOT_AUTO_INC in the index root page. The patch fixes the issue by
reading the auto-increment value directly from PAGE_ROOT_AUTO_INC
during import if a cfg file is not specified.
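A toy model of the decision (hypothetical structure names, not the
actual row0import.cc logic):

  #include <cstdint>
  #include <cstdio>

  struct DictTable { uint64_t autoinc; };  // stand-in for dict_table_t

  // Without a .cfg file, fall back to the persistent PAGE_ROOT_AUTO_INC
  // value read from the clustered index root page instead of leaving
  // the cached counter stale.
  static void import_autoinc(DictTable &t, bool have_cfg,
                             uint64_t cfg_autoinc,
                             uint64_t page_root_autoinc) {
    t.autoinc = have_cfg ? cfg_autoinc : page_root_autoinc + 1;
  }

  int main() {
    DictTable t{0};
    import_autoinc(t, /*have_cfg=*/false, 0, /*PAGE_ROOT_AUTO_INC=*/42);
    std::printf("next autoinc=%llu\n", (unsigned long long) t.autoinc);
    return 0;
  }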
Test Fix:
1. import_bugs.test: The embedded-mode warning has an absolute path.
Added a regular expression replacement in the test.
2. full_crc32_import.test: Table-level auto-increment mismatch after
import. It was using the auto-increment data from the table prior to
the discard and import, which is not right. That cached auto-increment
value is higher than the actual inserted value and the value stored
in PAGE_ROOT_AUTO_INC. Updated the result file and added validation for
checking the maximum value of the auto-increment column.
Reason:
======
undo_space_dblwr test case fails if the first page of the undo
tablespace is not flushed before restarting the server. While
restarting the server, InnoDB fails to detect the first
page of the undo tablespace from the doublewrite buffer.
Fix:
===
Use "ib_log_checkpoint_avoid_hard" debug sync point
to avoid checkpoint and make sure to flush the
dirtied page before killing the server.
innodb_make_page_dirty(): Fails to set
srv_fil_make_page_dirty_debug variable.
recv_dblwr_t::find_first_page(): Free the allocated memory
to read the first 3 pages from tablespace.
innodb.doublewrite: Added sleep to ensure page cleaner thread
wake up from my_cond_wait
- InnoDB fails to find the space id from page 0 of
the tablespace. In that case, InnoDB can use the
doublewrite buffer to recover page 0 and write it
into the file.
- buf_dblwr_t::init_or_load_pages(): Loads only the pages
which are valid (page lsn >= checkpoint). To do that,
InnoDB has to open the redo log before the system
tablespace and read the latest checkpoint information.
recv_dblwr_t::find_first_page():
1) Iterate over the doublewrite buffer pages and find the 0th page.
2) Read the tablespace flags and space id from the 0th page.
3) Read the 1st, 2nd and 3rd pages from the tablespace file and
compare their space id with the space id which is stored
in the doublewrite buffer.
4) If they match, then we can write into the file.
5) Return the space which matches the pages from the file (see the
sketch below).
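The matching logic condensed into a toy (hypothetical structures; the
real code also validates flags and checksums):

  #include <cstdint>
  #include <vector>

  struct Page { uint32_t page_no; uint32_t space_id; };

  // Accept a doublewrite copy of page 0 only if its space id agrees
  // with pages 1-3 read from the candidate data file.
  static const Page *find_first_page(const std::vector<Page> &dblwr,
                                     const std::vector<Page> &file_pages) {
    for (const Page &p : dblwr) {
      if (p.page_no != 0)
        continue;                        // step 1: locate a 0th page
      bool match = true;                 // steps 3-4: compare space ids
      for (const Page &f : file_pages)
        if (f.space_id != p.space_id) { match = false; break; }
      if (match)
        return &p;                       // step 5: a usable copy
    }
    return nullptr;
  }

  int main() {
    std::vector<Page> dblwr{{0, 7}};
    std::vector<Page> file{{1, 7}, {2, 7}, {3, 7}};
    return find_first_page(dblwr, file) ? 0 : 1;
  }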
SysTablespace::read_lsn_and_check_flags(): Remove the
retry logic for validating the first page. After
restoring the first page from the doublewrite buffer,
assign the tablespace flags by reading the first page.
recv_recovery_read_max_checkpoint(): Reads the maximum
checkpoint information from the log file.
recv_recovery_from_checkpoint_start(): Avoid reading
the checkpoint header information from the log file.
Datafile::validate_first_page(): Throw an error in case
first page validation fails.
The problem is that the test is skipped after sourcing
include/master-slave.inc. This leaves the slave threads running after
the test is skipped, causing a following test to fail during rpl setup.
Also rename have_normal_bzip.inc to the more appropriate _zlib.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
buf_flush_page_cleaner(): A continue or break inside DBUG_EXECUTE_IF
is actually a no-op. Use an explicit call to _db_keyword_() to
really avoid advancing the checkpoint.
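Why the no-op happens, shown with stand-ins (simplified names; the real
macro lives in mysys' dbug facility and the check function is
_db_keyword_()): the macro body is wrapped in do { ... } while (0), so
break/continue only leave that hidden loop:

  #include <cstdio>
  #include <cstring>

  static bool my_db_keyword(const char *kw) {
    return std::strcmp(kw, "checkpoint_avoid") == 0;  // pretend it is set
  }

  #define MY_DBUG_EXECUTE_IF(kw, body) \
    do { if (my_db_keyword(kw)) { body } } while (0)

  int main() {
    for (int i = 0; i < 3; i++) {
      MY_DBUG_EXECUTE_IF("checkpoint_avoid", continue;);  // a no-op!
      std::printf("iteration %d was not skipped by the macro\n", i);
      if (my_db_keyword("checkpoint_avoid"))  // the explicit check works
        continue;
      std::printf("unreached while the keyword is set\n");
    }
    return 0;
  }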
buf_flush_list_now_set(): Invoke os_aio_wait_until_no_pending_writes()
to ensure that the page write to the system tablespace is completed.
srv_start(): Move a read-only mode startup tweak from
innodb_init_params() to the correct location. Also, if
innodb_force_recovery=6, we will disable the doublewrite buffer,
because InnoDB must run in read-only mode to prevent further corruption.
This change only affects debug checks. Whenever srv_read_only_mode holds,
the buf_pool.flush_list will be empty, that is, there will be no writes
of persistent InnoDB data pages.
Reviewed by: Thirunarayanan Balathandayuthapani
- innodb.doublewrite_debug should avoid a checkpoint
before killing the server, so it uses a debug sync point and
innodb_flush_sync to avoid the checkpoint completely.
The test case is allowed to be skipped on the MSAN builder due to an
extra checkpoint.
row_ins_clust_index_entry_low(): Invoke btr_set_instant() in the same
mini-transaction that has successfully inserted the metadata record.
In this way, if inserting the metadata record fails before any
undo log record was written for it, the index root page will remain
consistent.
innobase_instant_try(): Remove the btr_set_instant() call.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
ha_innobase::check_if_supported_inplace_alter(): On ALTER_OPTIONS,
if innodb_file_per_table=1 and the table resides in the system tablespace,
require that the table be rebuilt (and moved to an .ibd file).
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
This test was using a sleep of 1 second in an attempt to ensure that the
timestamp that is part of an InnoDB status string would increase.
This not only prolongs the test execution time by 1+1 seconds, but it
also is inaccurate. It is possible that the actual sleep duration is
less than a second.
Let us wait for the creation of the file ib_buffer_pool and then wait
for the buffer pool dump completion. In that way, the test can complete
in a dozen or two milliseconds (1% of the previous duration) and work
more reliably.
Omit `state` when selecting processlist to verify which threads are running.
The state changes as threads are running (enter_state()), and this causes
sporadic test failures.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Let us remove a test that frequently fails with a result difference.
This test had been added in fc279d7ea2
to cover a bug in thd_destructor_proxy(), which was replaced with
simpler logic in 5e62b6a5e0 (MDEV-16264).
- Split the doublewrite test into two tests (doublewrite,
doublewrite_debug) to reduce the execution time of the test
- Removed the big_test tag for the newly added test case
- Made the doublewrite test a non-debug test
- Added a search pattern to make sure that InnoDB uses the doublewrite
buffer
- Replaced all kill_mysqld.inc with shutdown_mysqld.inc and a
zero shutdown timeout
- Removed the case where fsp_flags got corrupted, because from commit
3da5d047b8 (MDEV-31851) onwards,
the doublewrite buffer code removes the conversion of fsp flags from
the buggy 10.1 format
Thanks to Marko Mäkelä for providing the non-debug test