Problem:
========
Currently import operation fails with schema mismatch when
cfg file has hidden fts document id and hidden fts document index.
Fix:
====
To fix this issue, simply add the fts doc id column,
indexes in table definition and try to import the table.
In case of success:
1) update the fts document id in sys columns.
2) update the number of columns in sys tables.
3) insert the new fts index entry in sys indexes table
and sys fields.
4) Reload the table with new table definition
purge_sys_t::truncating_tablespace(): Clamp the
innodb_max_undo_log_size to the maximum number of pages
before converting the result into a 32-bit unsigned integer.
This fixes up commit f8c88d905b (MDEV-33213).
In later major versions, we would use 32-bit unsigned integer here
due to commit ca501ffb04
and the code would crash also on 64-bit processors.
Reviewed by: Debarun Banerjee
This fixes up the merge commit 7e39470e33
dict_table_open_on_name(): Report ER_TABLE_CORRUPT in a consistent
fashion, with a pretty-printed table name.
MDEV-33308 CHECK TABLE is modifying .frm file even if --read-only
As noted in commit d0ef1aaf61,
MySQL as well as older versions of MariaDB server would during
ALTER TABLE ... IMPORT TABLESPACE write bogus values to the
PAGE_MAX_TRX_ID field to pages of the clustered index, instead of
letting that field remain 0.
In commit 8777458a6e this field
was repurposed for PAGE_ROOT_AUTO_INC in the clustered index root page.
To avoid trouble when upgrading from MySQL or older versions of MariaDB,
we will try to detect and correct bogus values of PAGE_ROOT_AUTO_INC
when opening a table for the first time from the SQL layer.
btr_read_autoinc_with_fallback(): Add the parameters to mysql_version,max
to indicate the TABLE_SHARE::mysql_version of the .frm file and the
maximum value allowed for the type of the AUTO_INCREMENT column.
In case the table was originally created in MySQL or an older version of
MariaDB, read also the maximum value of the AUTO_INCREMENT column from
the table and reset the PAGE_ROOT_AUTO_INC if it is above the limit.
dict_table_t::get_index(const dict_col_t &) const: Find an index that
starts with the specified column.
ha_innobase::check_for_upgrade(): Return HA_ADMIN_FAILED if InnoDB
needs upgrading but is in read-only mode. In this way, the call to
update_frm_version() will be skipped.
row_import_autoinc(): Adjust the AUTO_INCREMENT column at the end of
ALTER TABLE...IMPORT TABLESPACE. This refinement was suggested by
Debarun Banerjee.
The changes outside InnoDB were developed by Michael 'Monty' Widenius:
Added print_check_msg() service for easy reporting of check/repair messages
in ENGINE=Aria and ENGINE=InnoDB.
Fixed that CHECK TABLE do not update the .frm file under --read-only.
Added 'handler_flags' to HA_CHECK_OPT as a way for storage engines to
store state from handler::check_for_upgrade().
Reviewed by: Debarun Banerjee
row_discard_tablespace(): Do not invoke dict_index_t::clear_instant_alter()
because that would corrupt any adaptive hash index entries in the table.
row_import_for_mysql(): Invoke dict_index_t::clear_instant_alter()
after detaching any adaptive hash index entries.
lock_table_children(): A new function to lock all child tables of a table.
We will only hold dict_sys.latch while traversing
dict_table_t::referenced_set. To prevent a race condition with
std::set::erase() we will copy the pointers to the child tables to a
local vector. Once we have acquired MDL and references to all child tables,
we can safely release dict_sys.latch, wait for the locks, and finally
release the references.
dict_acquire_mdl_shared(): A new variant that takes mdl_context as a
parameter.
lock_table_for_trx(): Assert that we are not holding dict_sys.latch.
ha_innobase::truncate(): When foreign_key_checks=ON, assert that
no child tables exist (other than the current table).
In any case, we will invoke lock_table_children()
so that the child table metadata can be safely updated.
(It is possible that a child table is being created concurrently
with TRUNCATE TABLE.)
ha_innobase::delete_table(): Before and after acquiring exclusive
locks on the current table as well as all child tables,
check that FOREIGN KEY constraints will not be violated.
In this way, we can reject impossible DROP TABLE without having to
wait for locks first.
This fixes up commit 2ca1123464 (MDEV-26217)
and commit c3c53926c4 (MDEV-26554).
If we fail to open a tablespace while looking for FILE_CHECKPOINT, we
set the corruption flag. Specifically, if encryption key is missing, we
would not be able to open an encrypted tablespace and the flag could be
set. We miss checking for this flag and report "Missing FILE_CHECKPOINT"
Address review comment to improve the test. Flush pages before starting
no-checkpoint block. It should improve the number of cases where the
test is skipped because some intermediate checkpoint is triggered.
This follows up commit 383f77cd84
which simplified dict_table_schema_check().
Note: We can display quoted names like this:
my_snprintf(buf, sizeof buf, "%`.*s.%`s",
int(t->name.dblen()), t->name.m_name, t->name.basename());
THE FIX MUST NOT BE MERGED TO 10.6+, BECAUSE 10.6+ IS NOT AFFECTED!
The test is waiting for delete-marked record purging. But this does not
happen under the following conditions:
1. "START TRANSACTION WITH CONSISTENT SNAPSHOT" - is active, has not
been rolled back yet
2. "DELETE FROM t WHERE b = 20 # trx_1" - is committed
3. "INSERT INTO t VALUES(10, 20) # trx_2" - hanging on
"ib_after_row_insert" sync point, waiting for "first_ins_cont" signal
4. "DELETE FROM t WHERE b = 20 # trx_3" - blocked on delete-marked by
trx_1 record, waiting for trx_2
5. connection "default" is waiting on
'now WAIT_FOR row_purge_del_mark_finished'
purge_coordinator_callback_low() sets
purge_state.m_history_length= srv_do_purge(&n_total_purged);
even if nothing was purged, like in our case. Nothing was purged because
transaction with consistent snapshot was still alive during purging
procedure.
Then purge_coordinator_timer_callback() does not wake purge thread if
the following condition is true:
purge_state.m_history_length == trx_sys.rseg_history_len
The above condition is true for our case, because we are waiting for
delete-marked record purging, and trx_sys.rseg_history_len does not
grow.
Only 10.5 is affected, because there is no such condition in 10.6, i.e.
purge thread is woken up even if history size was not changed during
purge coordinator thread suspending.
The easiest way to fix it is just to remove the test from 10.5.
During import, if cfg file is not specified, we don't update the autoinc
field in innodb dictionary object dict_table_t. The next insert tries to
insert from the starting position of auto increment and fails.
It can be observed that the issue is resolved once server is restarted
as the persistent value is read correctly from PAGE_ROOT_AUTO_INC from
index root page. The patch fixes the issue by reading the the auto
increment value directly from PAGE_ROOT_AUTO_INC during import if cfg
file is not specified.
Test Fix:
1. import_bugs.test: Embedded mode warning has absolute path. Regular
expression replacement in test.
2. full_crc32_import.test: Table level auto increment mismatch after
import. It was using the auto increment data from the table prior to
discard and import which is not right. This value has cached auto
increment value higher than the actual inserted value and value stored
in PAGE_ROOT_AUTO_INC. Updated the result file and added validation for
checking the maximum value of auto increment column.
Reason:
======
undo_space_dblwr test case fails if the first page of undo
tablespace is not flushed before restart the server. While
restarting the server, InnoDB fails to detect the first
page of undo tablespace from doublewrite buffer.
Fix:
===
Use "ib_log_checkpoint_avoid_hard" debug sync point
to avoid checkpoint and make sure to flush the
dirtied page before killing the server.
innodb_make_page_dirty(): Fails to set
srv_fil_make_page_dirty_debug variable.
It is not sufficient to sleep for only 1 second to
ensure that the page cleaner has gone idle.
The timings could have been changed compared to earlier releases
by commit a635c40648 (MDEV-27774).
Let us allow the non-debug test innodb.doublewrite to be skipped,
and remove the insufficient 1-second sleep.
recv_dblwr_t::find_first_page(): Free the allocated memory
to read the first 3 pages from tablespace.
innodb.doublewrite: Added sleep to ensure page cleaner thread
wake up from my_cond_wait
- InnoDB fails to find the space id from the page0 of
the tablespace. In that case, InnoDB can use
doublewrite buffer to recover the page0 and write
into the file.
- buf_dblwr_t::init_or_load_pages(): Loads only the pages
which are valid.(page lsn >= checkpoint). To do that,
InnoDB has to open the redo log before system
tablespace, read the latest checkpoint information.
recv_dblwr_t::find_first_page():
1) Iterate the doublewrite buffer pages and find the 0th page
2) Read the tablespace flags, space id from the 0th page.
3) Read the 1st, 2nd and 3rd page from tablespace file and
compare the space id with the space id which is stored
in doublewrite buffer.
4) If it matches then we can write into the file.
5) Return space which matches the pages from the file.
SysTablespace::read_lsn_and_check_flags(): Remove the
retry logic for validating the first page. After
restoring the first page from doublewrite buffer,
assign tablespace flags by reading the first page.
recv_recovery_read_max_checkpoint(): Reads the maximum
checkpoint information from log file
recv_recovery_from_checkpoint_start(): Avoid reading
the checkpoint header information from log file
Datafile::validate_first_page(): Throw error in case
of first page validation fails.
The problem is the test is skipped after sourcing include/master-slave.inc.
This leaves the slave threads running after the test is skipped, causing a
following test to fail during rpl setup.
Also rename have_normal_bzip.inc to the more appropriate _zlib.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
buf_flush_page_cleaner(): A continue or break inside DBUG_EXECUTE_IF
actually is a no-op. Use an explicit call to _db_keyword_() to
actually avoid advancing the checkpoint.
buf_flush_list_now_set(): Invoke os_aio_wait_until_no_pending_writes()
to ensure that the page write to the system tablespace is completed.
srv_start(): Move a read only mode startup tweak from
innodb_init_params() to the correct location. Also if
innodb_force_recovery=6 we will disable the doublewrite buffer,
because InnoDB must run in read-only mode to prevent further corruption.
This change only affects debug checks. Whenever srv_read_only_mode holds,
the buf_pool.flush_list will be empty, that is, there will be no writes
of persistent InnoDB data pages.
Reviewed by: Thirunarayanan Balathandayuthapani
- innodb.doublewrite_debug should avoid the checkpoint
before killing the server. So used debug sync and
innodb_flush_sync to avoid the checkpoint completely.
Test case allowed to skip on MSAN builder due to extra
checkpoint.
row_ins_clust_index_entry_low(): Invoke btr_set_instant() in the same
mini-transaction that has successfully inserted the metadata record.
In this way, if inserting the metadata record fails before any
undo log record was written for it, the index root page will remain
consistent.
innobase_instant_try(): Remove the btr_set_instant() call.
This is a 10.6 version of c17aca2f11.
Tested by: Matthias Leich
Reviewed by: Thirunarayanan Balathandayuthapani
row_ins_clust_index_entry_low(): Invoke btr_set_instant() in the same
mini-transaction that has successfully inserted the metadata record.
In this way, if inserting the metadata record fails before any
undo log record was written for it, the index root page will remain
consistent.
innobase_instant_try(): Remove the btr_set_instant() call.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
ha_innobase::check_if_supported_inplace_alter(): On ALTER_OPTIONS,
if innodb_file_per_table=1 and the table resides in the system tablespace,
require that the table be rebuilt (and moved to an .ibd file).
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
This test was using a sleep of 1 second in an attempt to ensure that the
timestamp that is part of an InnoDB status string would increase.
This not only prolongs the test execution time by 1+1 seconds, but it
also is inaccurate. It is possible that the actual sleep duration is
less than a second.
Let us wait for the creation of the file ib_buffer_pool and then wait
for the buffer pool dump completion. In that way, the test can complete
in a dozen or two milliseconds (1% of the previous duration) and work
more reliably.
Omit `state` when selecting processlist to verify which threads are running.
The state changes as threads are running (enter_state()), and this causes
sporadic test failures.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Let us remove a test that frequently fails with a result difference.
This test had been added in fc279d7ea2
to cover a bug in thd_destructor_proxy(), which was replaced with
simpler logic in 5e62b6a5e0 (MDEV-16264).
Enable unusable key notes for non-equality predicates:
<, <=, =>, >, BETWEEN, IN, LIKE
Note, in some scenarios it displays duplicate notes, e.g.
for queries with ORDER BY:
SELECT * FROM t1
WHERE indexed_string_column >= 10
ORDER BY indexed_string_column
LIMIT 5;
This should be tolarable. Getting rid of the diplicate note
completely would need a much more complex patch, which is
not desiable in 10.6.
Details:
- Changing RANGE_OPT_PARAM::note_unusable_keys from bool
to a new data type Item_func::Bitmap, so the caller can
choose with a better granuality which predicates
should raise unusable key notes inside the range optimizer:
a. all predicates (=, <=>, <, <=, =>, >, BETWEEN, IN, LIKE)
b. all predicates except equality (=, <=>)
c. none of the predicates
"b." is needed because in some scenarios equality predicates (=, <=>)
send unusable key notes at an earlier stage, before the range optimizer,
during update_ref_and_keys(). Calling the range optimizer with
"all predicates" would produce duplicate notes for = and <=> in such cases.
- Fixing get_quick_record_count() to call the range optimizer
with "all predicates except equality" instead of "none of the predicates".
Before this change the range optimizer suppressed all notes for
non-equality predicates: <, <=, =>, >, BETWEEN, IN, LIKE.
This actually fixes the reported problem.
- Fixing JOIN::make_range_rowid_filters() to call the range optimizer
with "all predicates except equality" instead of "all predicates".
Before this change the range optimizer produced duplicate notes
for = and <=> during a rowid_filter optimization.
- Cleanup:
Adding the op_collation argument to Field::raise_note_cannot_use_key_part()
and displaying the operation collation rather than the argument collation
in the unusable key note. This is important for operations with more than
two arguments: BETWEEN and IN, e.g.:
SELECT * FROM t1
WHERE column_utf8mb3_general_ci
BETWEEN 'a' AND 'b' COLLATE utf8mb3_unicode_ci;
SELECT * FROM t1
WHERE column_utf8mb3_general_ci
IN ('a', 'b' COLLATE utf8mb3_unicode_ci);
The note for 'a' now prints utf8mb3_unicode_ci as the collation.
which is the collation of the entire operation:
Cannot use key key1 part[0] for lookup:
"`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
"'a'" of collation `utf8mb3_unicode_ci`
Before this change it printed the collation of 'a',
so the note was confusing:
Cannot use key key1 part[0] for lookup:
"`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
"'a'" of collation `utf8mb3_general_ci`"
- Split the doublewrite test into two test (doublewrite,
doublewrite_debug) to reduce the execution time of the test
- Removed big_test tag for the newly added test case
- Made doublewrite test as non-debug test
- Added search pattern to make sure that InnoDB uses doublewrite buffer
- Replaced all kill_mysqld.inc with shutdown_mysqld.inc and
zero shutdown timeout
- Removed the case where fsp_flags got corrupted. Because from commit
3da5d047b8 (MDEV-31851) onwards,
doublewrite buffer removes the conversion the fsp flags from buggy
10.1 format
Thanks to Marko Mäkelä for providing the non-debug test
srv_export_innodb_status(): Update
export_vars.innodb_buffer_pool_read_requests
with buf_pool.stat.n_page_gets. This is caused due
to incorrect merge commit 44c9008ba6
Problem:
========
- InnoDB should have two keys on the same column for the self
referencing foreign key relation.
Solution:
=========
- Allow self referential foreign key relation to work with one
key.
InnoDB could return off-by-1 estimates for the involved tables.
This would cause off-by-many difference in join output cardinality
for the top-level SELECT, and so different query plan for the subquery.
The fix: Introduce mysql-test/include/innodb_stable_estimates.{inc,opt}
which disables InnoDB's background statistics collection, and use it.
Fix old_mode flags conflict between OLD_MODE_NO_NULL_COLLATION_IDS
and OLD_MODE_LOCK_ALTER_TABLE_COPY.
Both flags used to be 1 << 6, now OLD_MODE_LOCK_ALTER_TABLE_COPY changed
to be 1 << 7
row_upd_clust_rec_by_insert(): If we are resuming from a lock wait,
reset the 'disowned' flag of the BLOB pointers in 'entry' that we
copied from 'rec' on which we had invoked btr_cur_disown_inherited_fields()
before the lock wait started. In this way, the inserted record with
the updated PRIMARY KEY value will have the BLOB ownership associated
with itself, like it is supposed to be.
Note: If the lock wait had been aborted, then rollback would have
invoked btr_cur_unmark_extern_fields() and no corruption would be possible.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
trx_t::commit_in_memory(): Empty the detailed_error string, so that
FOREIGN KEY error messages from an earlier transaction will not be
wrongly reused in ha_innobase::get_error_message().
Reviewed by: Thirunarayanan Balathandayuthapani
mysql_prepare_alter_table(): Alter table should check whether
foreign key exists when it expected to exists and
report the error in early stage
dict_foreign_parse_drop_constraints(): Don't throw error if the
foreign key constraints doesn't exist when if exists is given
in the statement.
Let us wait for the completion of purge before testing the KILL of
CREATE INDEX c2d ON t1(c2), so that there will be no table handle
acquisition by a purge task before the operation is rolled back.
Also, let us make the test compatible with ./mtr --repeat,
and convert variable_value from string to integer so that any
comparisons will be performed correctly.
Let us make the test compatible with ./mtr --repeat
and convert variable_value to integer, so that comparisons like
16>9 will work as intended, instead of being compared as '16'<'9'.
In any test that uses wait_all_purged.inc, ensure that InnoDB tables
will be created without persistent statistics.
This is a follow-up to commit cd04673a17
after a similar failure was observed in the innodb_zip.blob test.
Problem:
========
- InnoDB fails to free the allocated buffer of stored cursor
when information schema query is interrupted.
Solution:
=========
- In case of error handling, information schema query should free
the allocated buffer to store the cursor.
Let us tolerate multiple "Memory pressure event freed"
in case there a real memory pressure event occurred
in addition to the one that this test simulates.
Also, clean up some SET variables.
buf_page_t::set_os_unused(): Remove the system call that had been added in
commit 16c9718758 and revised in
commit c1fd082e9c for Microsoft Windows.
buf_pool_t::garbage_collect(): A new function to collect any garbage
from the InnoDB buffer pool that can be removed without writing any
log or data files. This will also invoke madvise() for all of buf_pool.free.
To trigger this the following MDEV is implemented:
MDEV-24670 avoid OOM by linux kernel co-operative memory management
To avoid frequent triggers that caused the MDEV-31953 regression, while
still preserving the 10.11 functionality of non-greedy kernel memory
usage, memory triggers are used.
On the triggering of memory pressure, if supported in the Linux kernel,
trigger the garbage collection of the innodb buffer pool.
The hard coded triggers occur where there is:
* some memory pressure in 5 of the last 10 seconds
* a full stall on memory pressure for 10ms in the last 2 seconds
The kernel will trigger only one in each of these time windows. To avoid
mariadb being in a constant state of memory garbage collection, this has
been limited to once per minute.
For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring
CAP_SYS_RESOURCE that was lifted[1] to support the use case of user
memory pressure. It not currently possible to set CAP_SYS_RESOURCES in
a systemd service as its setting a capability inside a usernamespace.
Running under systemd v254+ requires the default MemoryPressureWatch=auto
(or alternately "on").
Functionality was tested in a 6.4 kernel Fedora successfully under a
systemd service.
Running in a container requires that (unmask=)/sys/fs/cgroup be writable
by the mariadbd process.
To aid testing, the buf_pool_resize was a convient trigger point on
which to trigger garbage collection.
ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427
Co-Author: Daniel Black (on memory pressure trigger)
Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin,
Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.
dict_check_tablespaces_and_store_max_id(): In the normal case
(no encryption plugin has been loaded and the change buffer is empty),
invoke dict_find_max_space_id() and do not open any .ibd files.
If a std::set<uint32_t> has been specified, open the files whose
tablespace ID is mentioned. Else, open all data files that are identified
by SYS_TABLES records.
fil_ibd_open(): Remove a call to os_file_get_last_error() that can
report a misleading error, such as EINVAL inside my_realpath() that is
not an actual error. This could be invoked when a data file is found
but the FSP_SPACE_FLAGS are incorrect, such as is the case for
table test.td in
./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags
buf_load(): If any tablespaces could not be found, invoke
dict_check_tablespaces_and_store_max_id() on the missing tablespaces.
dict_load_tablespace(): Try to load the tablespace unless it was found
to be futile. This fixes failures related to FTS_*.ibd files for
FULLTEXT INDEX.
btr_cur_t::search_leaf(): Prevent a crash when the tablespace
does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
when the change to dict_load_tablespaces() was not present.
We modify a few tests to ensure that tables will not be loaded at startup.
For some fault injection tests this means that the corrupted tables
will not be loaded, because dict_load_tablespace() would perform stricter
checks than dict_check_tablespaces_and_store_max_id().
Tested by: Matthias Leich
Reviewed by: Thirunarayanan Balathandayuthapani
ha_innobase::extra(): Do not invoke log_buffer_flush_to_disk()
if high_level_read_only holds.
log_buffer_flush_to_disk(): Remove an assertion that duplicates one
at the start of log_write_up_to().
This test occasionally fails with a failure to purge history.
Let us try to purge everything before starting the interesting part,
to make that occasional failure go away.
The patch for "MDEV-25440: Indexed CHAR ... broken with NO_PAD collations"
fixed these scenarios from MDEV-26743:
- Basic latin letter vs equal accented letter
- Two letters vs equal (but space padded) expansion
However, this scenario was still broken:
- Basic latin letter (but followed by an ignorable character)
vs equal accented letter
Fix:
When processing for a NOPAD collation a string with trailing ignorable
characters, like:
'<non-ignorable><ignorable><ignorable>'
the string gets virtually converted to:
'<non-ignorable><ignorable><ignorable><space><space><space>...'
After the fix the code works differently in these two cases:
1. <space> fits into the "nchars" limit
2. <space> does not fit into the "nchars" limit
Details:
1. If "nchars" is large enough (4+ in this example),
return weights as follows:
'[weight-for-non-ignorable, 1 char] [weight-for-space-character, 3 chars]'
i.e. the weight for the virtual trailing space character now indicates
that it corresponds to total 3 characters:
- two ignorable characters
- one virtual trailing space character
2. If "nchars" is small (3), then the virtual trailing space character
does not fit into the "nchar" limit, so return 0x00 as weight, e.g.:
'[weight-for-non-ignorable, 1 char] [0x00, 2 chars]'
Adding corresponding MTR tests and unit tests.
recv_group_scan_log_recs(): Set the debug flag recv_sys.after_apply
after actually completing the log scan.
In the test, suppress some errors that may be reported when
the crash recovery of RENAME TABLE t1 TO t2 is preceded by
copying t2.ibd to t1.ibd.
This imports and adapts a number of MySQL 5.7 test cases that are
applicable to MariaDB.
Some tests for old bug fixes are not that relevant because the code
has been refactored since then (especially starting with
MariaDB Server 10.6), and the tests would not reproduce the
original bug if the fix was reverted.
In the test innodb_fts.opt, there are many duplicate MATCH ranks, which
would make the results nondeterministic. The test was stabilized by
changing some LIMIT clauses or by adding sorted_result in those cases
where the purpose of a test was to show that no sorting took place
in the server.
In the test innodb_fts.phrase, MySQL 5.7 would generate FTS_DOC_ID that
are 1 larger than in MariaDB. In innodb_fts.index_table the difference is 2.
This is because in MariaDB, fts_get_next_doc_id() post-increments
cache->next_doc_id, while MySQL 5.7 pre-increments it.
Reviewed by: Thirunarayanan Balathandayuthapani
Before MariaDB 10.3.5, the binlog position was stored in the TRX_SYS page,
while after it is stored in rollback segments. There is code to read the
legacy position from TRX_SYS to handle upgrades. The problem was if the
legacy position happens to compare larger than the position found in
rollback segments; in this case, the old TRX_SYS position would incorrectly
be preferred over the newer position from rollback segments.
Fixed by always preferring a position from rollback segments over a legacy
position.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Problem:
=======
- InnoDB fails to find the foreign key index for the newly
added foreign key relation. This is caused by commit
5f09b53bdb (MDEV-31086).
FIX:
===
In check_col_is_in_fk_indexes(), while iterating through
the newly added foreign key relationship, InnoDB should
consider that foreign key relation may not have foreign index
when foreign key check is disabled.
The InnoDB table lookup in purge worker threads is a bottleneck that can
degrade a slow shutdown to utilize less than 2 threads. Let us fix that
bottleneck by constructing a local lookup table that does not require any
synchronization while the undo log records of the current batch
are being processed.
TRX_PURGE_TABLE_BUCKETS: The initial number of std::unordered_map
hash buckets used during a purge batch. This could avoid some
resizing and rehashing in trx_purge_attach_undo_recs().
purge_node_t::tables: A lookup table from table ID to an already
looked up and locked table. Replaces many fields.
trx_purge_attach_undo_recs(): Look up each table in the purge batch
only once.
trx_purge(): Close all tables and release MDL at the end of the batch.
trx_purge_table_open(), trx_purge_table_acquire(): Open a table in purge
and acquire a metadata lock on it. This replaces
dict_table_open_on_id<true>() and dict_acquire_mdl_shared().
purge_sys_t::close_and_reopen(): In case of an MDL conflict, close and
reopen all tables that are covered by the current purge batch.
It may be that some of the tables have been dropped meanwhile and can
be ignored. This replaces wait_SYS() and wait_FTS().
row_purge_parse_undo_rec(): Make purge_coordinator_task issue a
MDL warrant to any purge_worker_task which might need it
when innodb_purge_threads>1.
purge_node_t::end(): Clear the MDL warrant.
Reviewed by: Vladislav Lesin and Vladislav Vaintroub
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.
Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.
The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.
purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).
purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.
purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().
purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().
Reviewed by: Vladislav Lesin
Also fixes: MDEV-30050 Inconsistent results of DISTINCT with NOPAD
Problem:
Key segments for CHAR columns where compared using strnncollsp()
for engines MyISAM and Aria.
This did not work correct in case if the engine applyied trailing
space compression.
Fix:
Replacing ha_compare_text() calls to new functions:
- ha_compare_char_varying()
- ha_compare_char_fixed()
- ha_compare_word()
- ha_compare_word_prefix()
- ha_compare_word_or_prefix()
The code branch corresponding to comparison of CHAR column keys
(HA_KEYTYPE_TEXT segment type) now uses ha_compare_char_fixed()
which calls strnncollsp_nchars().
This patch does not change the behavior for the rest of the code:
- comparison of VARCHAR/TEXT column keys
(HA_KEYTYPE_VARTEXT1, HA_KEYTYPE_VARTEXT2 segments types)
- comparison in the fulltext code
Problem:
========
InnoDB fails to find the foreign key index for the
foreign key relation in the table while iterating the
foreign key constraints during alter operation. This is
caused by commit 5f09b53bdb
(MDEV-31086).
Fix:
====
In check_col_is_in_fk_indexes(), while iterating through
the foreign key relationship, InnoDB should consider that
foreign key relation may not have foreign index when
foreign key check is disabled.
In MDEV-31086, SET FOREIGN_KEY_CHECKS=0 cannot bypass checks that
make column types of foreign keys incompatible. An unfortunate
consequence is that adding an AUTO_INCREMENT is considered
incompatible in Field_{num,decimal}::is_equal and for the purpose
of FK checks this isn't relevant.
innodb.foreign_key - pragmaticly left wait_until_count_sessions.inc at
end of test to match the second line of test.
Reporter: horrockss@github - https://github.com/MariaDB/mariadb-docker/issues/528
Co-Author: Marko Mäkelä <marko.makela@mariadb.com>
Reviewer: Nikita Malyavin
For the future reader this was attempted:
Removing AUTO_INCREMENT checks from Field_{num,decimal}::is_equals
failed in the following locations (noted for future fixing):
* MyISAM and Aria (not InnoDB) don't adjust AUTO_INCREMENT next number
correctly, hence added a test to main.auto_increment to catch
the next person that attempts this fix.
* InnoDB must perform an ALGORITHM=COPY to populate NULL values of
an original table (MDEV-19190 mtr test period.copy), this requires
ALTER_STORED_COLUMN_TYPE to be set in fill_alter_inplace_info
which doesn't get hit because field->is_equal is true.
* InnoDB must not perform the change inplace (below patch)
* innodb.innodb-alter-timestamp main.partition_innodb test would
also need futher investigation.
InnoDB ha_innobase::check_if_supported_inplace_alter to support the
removal of Field_{num,decimal}::is_equal AUTO_INCREMENT checks would need the following change
diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index a5ccb1957f3..9d778e2d39a 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -2455,10 +2455,15 @@ ha_innobase::check_if_supported_inplace_alter(
/* An AUTO_INCREMENT attribute can only
be added to an existing column by ALGORITHM=COPY,
but we can remove the attribute. */
- ut_ad((MTYP_TYPENR((*af)->unireg_check)
- != Field::NEXT_NUMBER)
- || (MTYP_TYPENR(f->unireg_check)
- == Field::NEXT_NUMBER));
+ if ((MTYP_TYPENR((*af)->unireg_check)
+ == Field::NEXT_NUMBER)
+ && (MTYP_TYPENR(f->unireg_check)
+ != Field::NEXT_NUMBER))
+ {
+ ha_alter_info->unsupported_reason = my_get_err_msg(
+ ER_ALTER_OPERATION_NOT_SUPPORTED_REASON_AUTOINC);
+ DBUG_RETURN(HA_ALTER_INPLACE_NOT_SUPPORTED);
+ }
With this change the main.auto_increment test for bug #14573, under
innodb, will pass without the 2 --error ER_DUP_ENTRY entries.
The function header comment was updated to reflect the MDEV-31086
changes.
fil_page_compress_low returns 0 for both innodb_compression_algorithm=0
and where there is compression errors. On the two callers to this
function, don't increment the compression errors if the algorithm was
none.
Reviewed by: Marko Mäkelä
Problem:
========
- InnoDB fails to open undo tablespace when page0 is corrupted
and fails to throw error.
Solution:
=========
- InnoDB throws DB_CORRUPTION error when InnoDB encounters
page0 corruption of undo tablespace.
- InnoDB restores the page0 of undo tablespace from
doublewrite buffer if it encounters page corruption
- Moved Datafile::restore_from_doublewrite() to
recv_dblwr_t::restore_first_page(). So that undo
tablespace and system tablespace can use this function
instead of duplicating the code
srv_undo_tablespace_open(): Returns 0 if file doesn't exist
or ULINT_UNDEFINED if page0 is corrupted.
- InnoDB fails to check the overflow buffer while applying
the operation to the table that was rebuilt. This is caused
by commit 3cef4f8f0f (MDEV-515).
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
The error is caused by MDEV-30165 fix with the following commit:
d13a57ae81
There is logical error in lock_release_on_prepare_try():
if (supremum_bit)
lock_rec_unlock_supremum(*cell, lock);
else
lock_rec_dequeue_from_page(lock, false);
Because there can be other bits set in the lock's bitmap, and the lock
type can be suitable for releasing criteria, but the above logic
releases only supremum bit of the lock.
The fix is to release lock if it suits for releasing criteria and unlock
supremum if supremum is locked otherwise.
Tere is also the test for the case, which was reported by QA team. I
placed it in a separate files, because it requires debug build.
Reviewed by: Marko Mäkelä
While checking for altered column in foreign key constraints,
InnoDB fails to ignore virtual columns. This issue caused
by commit 5f09b53bdb4e973e7c7ec2c53a24c98321223f98(MDEV-31086).
MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT monitor should has
MONITOR_DISPLAY_CURRENT flag set in its definition, as it shows the
current state and does not accumulate anything.
Reviewed by: Marko Mäkelä
trx_t::set_skip_lock_inheritance() must be invoked at the very beginning
of lock_release_on_prepare(). Currently trx_t::set_skip_lock_inheritance()
is invoked at the end of lock_release_on_prepare() when lock_sys and trx
are released, and there can be a case when locks on prepare are released,
but "not inherit gap locks" bit has not yet been set, and page split
inherits lock to supremum.
Also reset supremum bit and rebuild waiting queue when XA is prepared.
Reviewed by: Marko Mäkelä
fseg_free_extent(): After fsp_free_extent() succeeded, properly
mark the affected pages as freed. We failed to write FREE_PAGE records.
This bug was revealed or caused by
commit e938d7c18f (MDEV-32028).
lock_wait(): Never return the transient error code DB_LOCK_WAIT.
In commit 78a04a4c22 (MDEV-29869)
some assignments assign trx->error_state = DB_SUCCESS were removed,
and it was possible that the field was left at its initial value
DB_LOCK_WAIT.
The test case for this is nondeterministic; without this fix, it
would only occasionally fail.
Reviewed by: Vladislav Lesin
innodb_monitor_validate(): Let item_val_str() allocate the memory
in THD, so that it will be available to innodb_monitor_update().
In this way, there is no need to allocate another buffer, and
no problem if the call to innodb_monitor_update() is skipped due
to an invalid value that is passed to another configuration parameter.
There are some other callers to st_mysql_sys_var::val_str()
that validate configuration parameters that are related to FULLTEXT INDEX,
but they will allocate memory by invoking thd_strmake().
The problem is that s390x is not using the default bzip library we use
on other platforms, which causes compressed string lengths to be differnt
than what mtr tests expects.
Fixed by:
- Added have_normal_bzip.inc, which checks if compress() returns the
expected length.
- Adjust the results to match the expected one
- main.func_compress.test & archive.archive
- Don't print lengths that depends on compression library
- mysqlbinlog compress tests & connect.zip
- Don't print DATA_LENGTH for SET column_compression_zlib_level=1
- main.column_compression
- Server aborts when table doesn't have referenced index.
This is caused by 5f09b53bdb (MDEV-31086).
While iterating the foreign key constraints, we fail to
consider that InnoDB doesn't have referenced index for
it when foreign key check is disabled.
- This issue caused by commit 4700f2ac70f8c79f2ac1968b6b59d18716f492bf(MDEV-30796)
During bulk insert operation, InnoDB wrongly stores the next autoincrement
value as current autoincrement value. So update the current autoincrement
value rather than next auto increment value.
Problem:
========
InnoDB fails to mark the page status as FREED during
freeing of an extent of a segment. This behaviour affects
scrubbing and doesn't write all zeroes in file even though
pages are freed.
Solution:
========
InnoDB should mark the page status as FREED before
reinitialize the extent descriptor entry.
- This commit is different from 10.6 commit c438284863.
Due to Commit 045757af4c (MDEV-24621),
InnoDB does buffer and pre-sort the records for each index, and build
the indexes one page at a time.
Multiple large insert ignore statment aborts the server during bulk
insert operation. Problem is that InnoDB merge record exceeds
the page size. To avoid this scenario, InnoDB should catch
too big record while buffering the insert operation itself.
row_merge_buf_encode(): returns length of the encoded index record
row_merge_buf_write(): Catches the DB_TOO_BIG_RECORD earlier and
returns error
trx_t::commit_empty(): A special case of transaction "commit" when
the transaction was actually rolled back or the persistent undo log
is empty. In this case, we need to change the undo log header state to
TRX_UNDO_CACHED and move the undo log from rseg->undo_list to
rseg->undo_cached for fast reuse. Furthermore, unless this is the only
undo log record in the page, we will remove the record and rewind
TRX_UNDO_PAGE_START, TRX_UNDO_PAGE_FREE, TRX_UNDO_LAST_LOG.
We must also ensure that the system-wide transaction identifier
will be persisted up to this->id, so that there will not be warnings or
errors due to a PAGE_MAX_TRX_ID being too large. We might have modified
secondary index pages before being rolled back, and any changes of
PAGE_MAX_TRX_ID are never rolled back.
Even though it is not going to be written persistently anywhere,
we will invoke trx_sys.assign_new_trx_no(this), so that in the test
innodb.instant_alter everything will be purged as expected.
trx_t::write_serialisation_history(): Renamed from
trx_write_serialisation_history(). If there is no undo log,
invoke commit_empty().
trx_purge_add_undo_to_history(): Simplify an assertion and remove a
comment. This function will not be invoked on an empty undo log anymore.
trx_undo_header_create(): Add a debug assertion.
trx_undo_mem_create_at_db_start(): Remove a duplicated assignment.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
innodb_max_purge_lag_wait_update(): Return immediately if we are
in high_level_read_only mode.
srv_wake_purge_thread_if_not_active(): Relax a debug assertion.
If srv_read_only_mode holds, purge_sys.enabled() will not hold
and this function will do nothing.
trx_t::commit_in_memory(): Remove a redundant condition before
invoking srv_wake_purge_thread_if_not_active().
The test innodb.row_size_error_log_warnings_3 that was added in
commit 372b0e6355 (MDEV-20194)
failed to take into account the earlier adjustment in
commit cf574cf53b (MDEV-27634)
that is specific to many GNU/Linux distributions for the s390x.
The test innodb.alter_rename_files rather frequently hangs in
checkpoint_set_now. The test was removed in MariaDB Server 10.5
commit 37e7bde12a when the code that
it aimed to cover was simplified. Starting with MariaDB Server 10.5
the page flushing and log checkpointing is much simpler, handled
by the single buf_flush_page_cleaner() thread.
Let us remove the test to avoid occasional failures. We are not going
to fix the cause of the failure in MariaDB Server 10.4.
- InnoDB aborts when table is dropping the column. This is
caused by 5f09b53bdb (MDEV-31086).
While iterating the altered table fields, we fail to consider
the dropped columns.
* invoke check_expression() for all vcol_info's in
mysql_prepare_create_table() to check for FK CASCADE
* also check for SET NULL and SET DEFAULT
* to check against existing FKs when a vcol is added in ALTER TABLE,
old FKs must be added to the new_key_list just like other indexes are
* check columns recursively, if vcol1 references vcol2,
flags of vcol2 must be taken into account
* remove check_table_name_processor(), put that logic under
check_vcol_func_processor() to avoid walking the tree twice
- Introduce the option :autoshrink attribute to be
added to innodb_data_file_path variable to allow
the shrinking of system tablespace during startup process.
Steps for shrinking the system tablespace:
1) Find the last used extent in system tablespace
by iterating through the BITMAP in extent descriptor pages
2) If the last used extent is lesser than user specified size
then set desired target size to user specified size.
3) Store the page contents of "to be modified" extent
descriptor pages, latches the "to be modified"
extent descriptor pages and check for buffer pool
memory availability
4) Make checkpoint to flush all pages in buffer pool, so
that pages in flush list doesn't have to use doublewrite
buffer and disable doublewrite buffer during shrinking process
5) Update the FSP_SIZE and FSP_FREE_LIMIT in header page
6) Remove the "to be truncated" pages from FSP_FREE and
FSP_FREE_FRAG list
7) Reset the bitmap in the last descriptor pages for the
"to be truncated" pages.
8) In case of multiple files, calculate the truncated last
file size and do the truncation in last file
9) Check whether mini-transaction log size doesn't exceed
the minimum value of innodb_log_buffer_size which is 2MB.
In that case, replace the modified buffer pool pages with
the page old content.
11) Commit the mini-transaction for shrinking the tablespace
and enable/disable the doublewrite buffer depends on user
specified value.
recv_sys_t::apply(): Handle the truncation of system tablespace
only if the recovered tablespace size is lesser than actual
existing size.