- Replace statement fails with duplicate key error when multiple
unique key conflict happens. Reason is that Server expects the
InnoDB engine to store the confliciting keys in ascending order.
But the InnoDB doesn't store the conflicting keys
in ascending order.
Fix:
===
- Enable HA_DUPLICATE_KEY_NOT_IN_ORDER for InnoDB storage engine
only when unique index order is different in .frm and innodb dictionary.
Problem:
========
- dict_stats_table_clone_create() does not initialize the
flag stats_error_printed in either dict_table_t or dict_index_t.
Because dict_stats_save_index_stat() is operating on a copy
of a dict_index_t object, it appears that
dict_index_t::stats_error_printed will always be false
for actual metadata objects, and uninitialized in
dict_stats_save_index_stat().
Solution:
=========
dict_stats_table_clone_create(): Assign stats_error_printed
for table and index while copying the statistics
Post-fix for MDEV-35144.
Cannot allocate options values on the statement arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially empty, it will be reset for every execution
and any values allocated on the statement arena will be lost.
Cannot allocate option values on the execution arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially NOT empty, any values appended to the
end will be preserved and if they're on the execution arena their
content will be destroyed.
Let's use thd->change_item_tree() to save and restore necessary pointers
for every execution.
followup for 3da565c41d
When adding a column or index that uses plugin-defined
sysvar-based options with CREATE ... LIKE the server
was using the current value of the sysvar, not the default one.
Because parse_option_list() function was used both in create
and open and it tried to guess when it's create (need to use
current sysvar value and add a new name=value pair to the list)
or open (need to use default, without extending the list).
Let's move the list extending functionality into a separate
function and call it explicitly when needed. Operations that
add new objects (CREATE, ALTER ... ADD) will extend the list,
other operations (ALTER, CREATE ... LIKE, open) will not.
Remove workaround for MDEV-13941, it served for 5 years,and all affected
pre-release 10.2 installation should have been already fixed in between.
Apparently Innodb is using is_sparse parameter in os_file_set_size()
inconsistently, and it passes is_sparse=false now during first file
extension. With MDEV-13941 workaround in place, it would unsparse
the file, which is makes compression not to work at all anymore.
Problem:
=======
- Redundant table fails to insert into the table after
instant drop blob column. Instant drop column only marking
the column as hidden and consecutive insert statement tries
to insert NULL value for the dropped BLOB column and returns
the fixed length of the blob type as 65535. This lead to
row size too large error.
Fix:
====
For redundant table, if the non-fixed dropped column can be null
then set the length of the field type as 0.
- InnoDB fails to set the index information or index number
for the spatial index error HA_ERR_NULL_IN_SPATIAL.
row_build_spatial_index_key(): Initialize the tmp_mbr array completely.
check_if_supported_inplace_alter(): Fix the spelling mistake of alter
Sometimes, in MariaDB Server 10.5 but apparently not in later branches,
the test would hang because con1 and con2 would be blocked in
debug_sync (for example, lock_wait_suspend_thread_enter and
row_ins_sec_index_entry_dup_locks_created) and therefore blocking
the purge of transactions from completing.
To prevent an occasional DEBUG_SYNC induced hang in the test, we will
wait for everything to be purged, except the last 2 transactions.
This change should be null-merged to 10.6, because the test is not
failing in 10.6 or later major versions.
row_purge_reset_trx_id(): Reserve large enough offsets for accomodating
the maximum width PRIMARY KEY followed by DB_TRX_ID,DB_ROLL_PTR.
Reviewed by: Thirunarayanan Balathandayuthapani
Don't allow the referencing key column from NULL TO NOT NULL
when
1) Foreign key constraint type is ON UPDATE SET NULL
2) Foreign key constraint type is ON DELETE SET NULL
3) Foreign key constraint type is UPDATE CASCADE and referenced
column declared as NULL
Don't allow the referenced key column from NOT NULL to NULL
when foreign key constraint type is UPDATE CASCADE
and referencing key columns doesn't allow NULL values
get_foreign_key_info(): InnoDB sends the information about
nullability of the foreign key fields and referenced key fields.
fk_check_column_changes(): Enforce the above rules for COPY
algorithm
innobase_check_foreign_drop_col(): Checks whether the dropped
column exists in existing foreign key relation
innobase_check_foreign_low() : Enforce the above rules for
INPLACE algorithm
dict_foreign_t::check_fk_constraint_valid(): This is used
by CREATE TABLE statement to check nullability for foreign
key relation.
Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"
Fix for v.10.5
In the bug report MDEV-32817 it occurred that the function
row_mysql_get_table_status() is outputting a fil_space_t*
as if it were a numeric tablespace identifier.
ib_push_warning(): Remove. Let us invoke push_warning_printf() directly.
innodb_decryption_failed(): Report a decryption failure and set the
dict_table_t::file_unreadable flag. This code was being duplicated in
very many places. We return the constant value DB_DECRYPTION_FAILED
in order to avoid code duplication in the callers and to allow tail calls.
innodb_fk_error(): Report a FOREIGN KEY error.
dict_foreign_def_get(), dict_foreign_def_get_fields(): Remove.
This code was being used in dict_create_add_foreign_to_dictionary()
in an apparently uncovered code path. That ib_push_warning() call
would pass the integer i+1 instead of a pointer to NUL terminated
string ("%s"), and therefore the call should have resulted in a crash.
dict_print_info_on_foreign_key_in_create_format(),
innobase_quote_identifier(): Add const qualifiers.
row_mysql_get_table_error(): Replaces row_mysql_get_table_status().
Display no message on DB_CORRUPTION; it should be properly reported at
the SQL layer anyway.
Reason:
======
- InnoDB fails to load the instant alter table metadata from
clustered index while loading the table definition.
The reason is that InnoDB metadata blob has the column length
exceeds maximum fixed length column size.
Fix:
===
- InnoDB should treat the long fixed length column as variable
length fields that needs external storage while initializing
the field map for instant alter operation
- commit 85db534731 (MDEV-33400)
retains the instantness in the table definition after discard
tablespace. So there is no need to assign n_core_null_bytes
during instant table preparation unless they are not
initialized.
- During XA PREPARE, InnoDB releases the non-exclusive locks.
But it fails to remove the non-exclusive table lock from the
transaction table locks. In the mean time, main thread evicts
the table from the LRU cache. While rollbacking the XA transaction,
InnoDB iterates through the table locks to check whether it
holds lock on any system tables and wrongly assumes the
evicted table as system table since the table id is 0
Fix:
===
During XA PREPARE, remove the table locks of the transaction while
releasing the non-exclusive locks.
- Column stat_value and sample_size in mysql.innodb_index_stats
table is declared as BIGINT UNSIGNED without any check constraint.
user manually updates the value of stat_value and sample_size
to zero. InnoDB aborts the server while reading the statistics
information because InnoDB expects at least one leaf
page to exist for the index.
- To fix this issue, InnoDB should interpret the value of
stat_n_leaf_pages, stat_index_size in innodb_index_stats
stat_clustered_index_size, stat_sum_of_other_index_sizes
in innodb_table_stats as valid one even though user
mentioned it as 0.
During read only mode, InnoDB doesn't allow checkpoint to happen.
So InnoDB should throw the warning when InnoDB tries to
force the checkpoint when innodb_read_only = 1 or
innodb_force_recovery = 6.
- Few of test case should make sure that InnoDB does hit
the debug sync point during startup of the server.
InnoDB can remove the double quotes of debug point
in restart parameters.
- InnoDB tries to write FILE_CHECKPOINT marker during
early recovery when log file size is insufficient.
While updating the log checkpoint at the end of the recovery,
InnoDB must already have written out all pending changes
to the persistent files. To complete the checkpoint, InnoDB
has to write some log records for the checkpoint and to
update the checkpoint header. If the server gets killed
before updating the checkpoint header then it would lead
the logfile to be unrecoverable.
- This patch avoids FILE_CHECKPOINT marker during early
recovery and narrows down the window of opportunity to
make the log file unrecoverable.
number of non-user tablespace.
fil_space_t::try_to_close(): Don't try to close
the tablespace which is acquired by the caller of
the function
Added the suppression message in open_files_limit test case
number of non-user tablespace.
- InnoDB only closes the user tablespace when the number of open
files exceeds innodb_open_files limit. In that case, InnoDB should
make sure that innodb_open_files value should be greater
than number of undo tablespace, system and temporary tablespace files.
- InnoDB page compression works only on COMPACT or DYNAMIC row
format tables. So InnoDB should throw error when alter table
tries to enable PAGE_COMPRESSED for redundant table.
BUF_LRU_MIN_LEN (256) is too high value for low buffer pool(BP) size.
For example, for BP size lower than 80M and 16 K page size, the limit is
more than 5% of total BP and for lowest BP 5M, it is 80% of the BP.
Non-data objects like explicit locks could occupy part of the BP pool
reducing the pages available for LRU. If LRU reaches minimum limit and
if no free pages are available, server would hang with page cleaner not
able to free any more pages.
Fix: To avoid such hang, we adjust the LRU limit lower than the limit
for data objects as checked in buf_LRU_check_size_of_non_data_objects()
i.e. one page less than 5% of BP.
trx_free_at_shutdown(): Similar to trx_t::commit_in_memory(),
clear the detailed_error (FOREIGN KEY constraint error) before
invoking trx_t::free(). We only do this on debug instrumented
builds in order to avoid a debug assertion failure on shutdown.
Problem:
========
- InnoDB wrongly calulates the record size in
btr_node_ptr_max_size() when prefix index of
the column has to be stored externally.
Fix:
====
- InnoDB should add the maximum field size to
record size when the field is a fixed length one.
On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to
conflicting share modes between processes accessing the same file can
result in CreateFile failures.
mysys' my_open() already incorporates a workaround by implementing
wait/retry logic on Windows.
But this does not help if files are opened using shell redirection like
mysqltest traditionally did it, i.e via
--echo exec "some text" > output_file
In such cases, it is cmd.exe, that opens the output_file, and it
won't do any sharing-violation retries.
This commit addresses the issue by introducing a new built-in command,
'write_line', in mysqltest. This new command serves as a brief alternative
to 'write_file', with a single line output, that also resolves variables
like "exec" would.
Internally, this command will use my_open(), and therefore retry-on-error
logic.
Hopefully this will eliminate the very sporadic "can't open file because
it is used by another process" error on CI.
Issue:
------
The actual order of acquisition of the IBUF pessimistic insert mutex
(SYNC_IBUF_PESS_INSERT_MUTEX) and IBUF header page latch
(SYNC_IBUF_HEADER) w.r.t space latch (SYNC_FSP) differs from the order
defined in sync0types.h. It was not discovered earlier as the path to
ibuf_remove_free_page was not covered by the mtr test. Ideal order and
one defined in sync0types.h is as follows.
SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP
In ibuf_remove_free_page, we acquire space latch earlier and we have
the order as follows resulting in the assert with innodb_sync_debug=on.
SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX
Fix:
---
We do maintain this order in other places and there doesn't seem to be
any real issue here. To reduce impact in GA versions, we avoid doing
extensive changes in mutex ordering to match the current
SYNC_IBUF_PESS_INSERT_MUTEX order. Instead we relax the ordering check
for IBUF pessimistic insert mutex using SYNC_NO_ORDER_CHECK.
In the case if some unique key fields are nullable, there can be
several records with the same key fields in unique index with at least
one key field equal to NULL, as NULL != NULL.
When transaction is resumed after waiting on the record with at least one
key field equal to NULL, and stored in persistent cursor record is
deleted, persistent cursor can be restored to the record with all key
fields equal to the stored ones, but with at least one field equal to
NULL. And such record is wrongly treated as a record with the same unique
key as stored in persistent cursor record one, what is wrong as
NULL != NULL.
The fix is to check if at least one unique field is NULL in restored
persistent cursor position, and, if so, then don't treat the record as
one with the same unique key as in the stored record key.
dict_index_t::nulls_equal was removed, as it was initially developed for
never existed in MariaDB "intrinsic tables", and there is no code, which
would set it to "true".
Reviewed by Marko Mäkelä.
In commit d74d95961a (MDEV-18543)
there was an error that would cause the hidden metadata record
to be deleted, and therefore cause the table to appear corrupted
when it is reloaded into the data dictionary cache.
PageConverter::update_records(): Do not delete the metadata record,
but do validate it.
RecIterator::open(): Make the API more similar to 10.6, to simplify
merges.
- InnoDB reserves the free extents unnecessarily during blob
page allocation even though btr_page_alloc() can handle
reserving the extent when the existing ran out of pages to be used.
on a busy system it might take time for buffer_page_written_index_leaf
to reach the correct value. Wait for it.
also, tag identical statements to be different in the result file.
Problem:
======
- InnoDB fail to do instant operation while adding the variable
length column. Problem is that InnoDB wrongly assumes that
variable character length can never part of externally stored
page.
Solution:
========
instant_alter_column_possible(): Variable length
character field can be stored as externally stored page.
- Suppress the "Difficult to find free blocks" warning
globally to avoid many different test case failing.
- Demote the error information in validate_first_page() to note.
So first page can recovered from doublewrite buffer and can throw
error in case the page wasn't found in doublewrite buffer.
The issue here is ha_innobase::get_auto_increment() could cause a
deadlock involving auto-increment lock and rollback the transaction
implicitly. For such cases, storage engines usually call
thd_mark_transaction_to_rollback() to inform SQL engine about it which
in turn takes appropriate actions and close the transaction. In innodb,
we call it while converting Innodb error code to MySQL.
However, since ::innobase_get_autoinc() returns void, we skip the call
for error code conversion and also miss marking the transaction for
rollback for deadlock error. We assert eventually while releasing a
savepoint as the transaction state is not active.
Since convert_error_code_to_mysql() is handling some generic error
handling part, like invoking the callback when needed, we should call
that function in ha_innobase::get_auto_increment() even if we don't
return the resulting mysql error code back.
Problem:
========
During upgrade, InnoDB does write the redo log for adjusting
the tablespace size or tablespace flags even before the log
has upgraded to configured format. This could lead to data
inconsistent if any crash happened during upgrade process.
Fix:
===
srv_start(): Write the tablespace flags adjustment, increased
tablespace size redo log only after redo log upgradation.
log_write_low(), log_reserve_and_write_fast(): Check whether
the redo log is in physical format.
The root cause is the WAL logging of file operation when the actual
operation fails afterwards. It creates a situation with a log entry for
a operation that would always fail. I could simulate both the backup
scenario error and Innodb recovery failure exploiting the weakness.
We are following WAL for file rename operation and once logged the
operation must eventually complete successfully, or it is a major
catastrophe. Right now, we fail for rename and handle it as normal error
and it is the problem.
I created a patch to address RENAME operation to a non existing schema
where the destination schema directory is missing. The patch checks for
the missing schema before logging in an attempt to avoid the failure
after WAL log is written/flushed. I also checked that the schema cannot
be dropped or there cannot be any race with other rename to the same
file. This is protected by the MDL lock in SQL today.
The patch should this be a good improvement over the current situation
and solves the issue at hand.
Problem:
========
- InnoDB reads the length of the variable length field wrongly
while applying the modification log of instant table.
Solution:
========
rec_init_offsets_comp_ordinary(): For the temporary instant
file record, InnoDB should read the length of the variable length
field from the record itself.
MDEV-33308 CHECK TABLE is modifying .frm file even if --read-only
As noted in commit d0ef1aaf61,
MySQL as well as older versions of MariaDB server would during
ALTER TABLE ... IMPORT TABLESPACE write bogus values to the
PAGE_MAX_TRX_ID field to pages of the clustered index, instead of
letting that field remain 0.
In commit 8777458a6e this field
was repurposed for PAGE_ROOT_AUTO_INC in the clustered index root page.
To avoid trouble when upgrading from MySQL or older versions of MariaDB,
we will try to detect and correct bogus values of PAGE_ROOT_AUTO_INC
when opening a table for the first time from the SQL layer.
btr_read_autoinc_with_fallback(): Add the parameters to mysql_version,max
to indicate the TABLE_SHARE::mysql_version of the .frm file and the
maximum value allowed for the type of the AUTO_INCREMENT column.
In case the table was originally created in MySQL or an older version of
MariaDB, read also the maximum value of the AUTO_INCREMENT column from
the table and reset the PAGE_ROOT_AUTO_INC if it is above the limit.
dict_table_t::get_index(const dict_col_t &) const: Find an index that
starts with the specified column.
ha_innobase::check_for_upgrade(): Return HA_ADMIN_FAILED if InnoDB
needs upgrading but is in read-only mode. In this way, the call to
update_frm_version() will be skipped.
row_import_autoinc(): Adjust the AUTO_INCREMENT column at the end of
ALTER TABLE...IMPORT TABLESPACE. This refinement was suggested by
Debarun Banerjee.
The changes outside InnoDB were developed by Michael 'Monty' Widenius:
Added print_check_msg() service for easy reporting of check/repair messages
in ENGINE=Aria and ENGINE=InnoDB.
Fixed that CHECK TABLE do not update the .frm file under --read-only.
Added 'handler_flags' to HA_CHECK_OPT as a way for storage engines to
store state from handler::check_for_upgrade().
Reviewed by: Debarun Banerjee
row_discard_tablespace(): Do not invoke dict_index_t::clear_instant_alter()
because that would corrupt any adaptive hash index entries in the table.
row_import_for_mysql(): Invoke dict_index_t::clear_instant_alter()
after detaching any adaptive hash index entries.