buf_flush_page(): Never wait for a page latch, even in checkpoint
flushing (flush_type == BUF_FLUSH_LIST), to prevent a hang of the
page cleaner threads when a large number of pages is latched.
In mysql/mysql-server@9542f3015b
it was claimed that such a hang only affects CREATE FULLTEXT INDEX.
Their fix was to retain buffer-fix but release exclusive latch
on non-leaf pages, and subsequently write to those pages while
they are not associated with the mini-transaction, which would
trip a debug assertion in the MariaDB version of
mtr_t::memo_modify_page() and cause potential corruption
when using the default MariaDB setting innodb_log_optimize_ddl=OFF.
This change essentially backports a small part of
commit 7cffb5f6e8 (MDEV-23399)
from MariaDB Server 10.5.7.
Added checking for support of vfork by a platform where
building being done. Set HAVE_VFORK macros in case vfork()
system call is supported. Use vfork() system call if the
macros HAVE_VFORK is set, else use fork().
Starting with 10.3, an assertion would fail on the rollback of
a recovered incomplete transaction if a table definition violates
a FOREIGN KEY constraint.
DICT_ERR_IGNORE_RECOVER_LOCK: Include also DICT_ERR_IGNORE_FK_NOKEY
so that trx_resurrect_table_locks() will be able to load
table definitions and resurrect IX locks. Previously, if the
FOREIGN KEY constraints of a table were incomplete, the table
would fail to load until rollback, and in 10.3 or later an assertion
would fail that the rollback was not protected by a table IX lock.
Thanks to commit 9de2e60d74 there
will be no problems to enforce subsequent FOREIGN KEY operations
even though a table with invalid REFERENCES clause was loaded.
Creating a temporary table with Spider is non-sense because a Spider
table cannot hold any physical data and it requires an additional
effort to manage even if it is configured correctly.
Set HTON_TEMPORARY_NOT_SUPPORTED to spider_hton->flags.
Reviewed-by: nayuta.yanagisawa@hey.com
Co-authored-by: d8sk4ueun@gmail.com
mariabackup does not load dictionary during backup, but it loads
tablespaces, that is why fil_system.max_assigned_id is not set with
dict_check_tablespaces_and_store_max_id(). There is no sense to issue the
warning during backup.
In commit 437da7bc54 (MDEV-19534),
the default value of the global variable srv_checksum_algorithm
in innochecksum was changed from SRV_CHECKSUM_ALGORITHM_INNODB
to implied 0 (innodb_checksum_algorithm=crc32). As a result,
the function buf_page_is_corrupted() would by default invoke
buf_calc_page_crc32() in innochecksum, and crc32_inited would hold.
This would cause "innochecksum" to fail on a particular page.
The actual problem is older, introduced in 2011 in
mysql/mysql-server@17e497bdb7
(MySQL 5.6.3). It should affect the validation of pages of old
data files that were written with innodb_checksum_algorithm=innodb.
When using innodb_checksum_algorithm=crc32 (the default setting
since MariaDB Server 10.2), some valid pages would be rejected
only because exactly one of the two checksum fields accidentally
matches the innodb_checksum_algorithm=crc32 value.
buf_page_is_corrupted(): Simplify the logic of non-strict
checksum validation, by always invoking buf_calc_page_crc32().
Remove a bogus condition that if only one of the checksum fields
contains the value returned by buf_calc_page_crc32(), the page
is corrupted.
btr_cur_optimistic_insert(): Disregard DEBUG_DBUG injection to
invoke btr_page_reorganize() if the page (and the table) is empty.
Otherwise, an assertion would fail in btr_page_reorganize_low()
because PAGE_MAX_TRX_ID is 0 in an empty secondary index leaf page.
The result is not used anywhere but in the output of Innodb information
schema, but this can take as much as 7%CPU (only) on a benchmark.
Fix to move fs blocksize calculate to where it is used.
convert_error_code_to_mysql(): Use the correct limit FK_MAX_CASCADE_DEL
in the error message. The DICT_FK_MAX_RECURSIVE_LOAD applies to
the number of foreign key constraints in table definitions,
not to the number of rows that are visited while processing
a foreign key constraint.
Some GNU/Linux distributions ship a zlib that is modified to use
the s390x DFLTCC instruction. That modification would essentially
redefine compressBound(sourceLen) as (sourceLen * 16 + 2308) / 8 + 6.
Let us relax the tests for InnoDB ROW_FORMAT=COMPRESSED to cope with
such a weaker compression guarantee.
create_table_info_t::row_size_is_acceptable(): Remove a bogus debug-only
assertion that would fail to hold for the test innodb_zip.bug36169.
The function page_zip_empty_size() may indeed return 0.
row_sel_sec_rec_is_for_clust_rec() treats empty BLOB prefix field in
secondary index as a field equal to any external BLOB field in clustered
index. Row_sel_get_clust_rec_for_mysql::operator() doesn't zerro out
clustered record pointer in row_search_mvcc(), and row_search_mvcc()
thinks that delete-marked secondary index record has visible for
"CHECK TABLE"'s read view old-versioned clustered index record, and
row_scan_index_for_mysql() counts it as a row.
The fix is to execute row_sel_sec_rec_is_for_blob() in
row_sel_sec_rec_is_for_clust_rec() if clustered field contains BLOB's
reference.
accessing freed memory.
Before XMLCOL::WriteColumn() Tdbp->Clist gets assigned
a nodelist in
Clist = RowNode->SelectNodes(g, Colname, Clist);
which is RowNode->Doc->Xop->nodesetval.
In XMLCOL::WriteColumn()
ValNode = ColNode->SelectSingleNode(g, Xname, Vxnp);
calls LIBXMLDOC::GetNodeList() again, which frees the previous
XPath object Xop and replaces it with a new one.
In this case RowNode->Doc == ColNode->Doc, so Clist->Listp
points to a freed memory now.
It's misleading to compare and write to user number of columns and fields.
Thus, it would be better to remove that check and let use see a subsequent
error message about missing or mispaced column.
row_import::match_schema(): remove misleading check
The InnoDB DATA DIRECTORY attribute is not implemented via
symbolic links but something similar, *.isl files that contain
the names of data files.
InnoDB failed to ignore the DATA DIRECTORY attribute even though
the server was started with --skip-symbolic-links.
Native ALTER TABLE in InnoDB will retain the DATA DIRECTORY attribute
of the table, no matter if the table will be rebuilt or not.
Generic ALTER TABLE (with ALGORITHM=COPY) as well as TRUNCATE TABLE
will discard the DATA DIRECTORY attribute.
All tests have been run with and without the ./mtr option
--mysqld=--skip-symbolic-links
and some tests that use the InnoDB DATA DIRECTORY attribute
have been adjusted for this.
Add a --rocksdb_ignore_datadic_errors plugin option for MyRocks.
The default is 0, and this means MyRocks will call abort() if it detects
a DDL mismatch.
Setting rocksdb_ignore_datadic_errors=1 makes MyRocks to try to ignore the
errors and allow to start the server for repairs.
In ha_rocksdb::open(), check if the number of indexes seen from the
SQL layer matches the number of indexes in the internal MyRocks data
dictionary.
Produce an error if there is a mismatch. (If we don't produce this error,
we are likely to crash as soon as we attempt to use an index)
.. to be the same as startup.
In resolving MDEV-27461, BUF_LRU_MIN_LEN (256) is the minimum number of
pages for the innodb buffer pool size. Obviously we need more than just
flushing pages. Taking the 16k page size and its default minimum, an
extra 25% is needed on top of the flushing pages to make a workable buffer
pool.
The minimum innodb_buffer_pool_chunk_size (1M) restricts the minimum
otherwise we'd have a pool made up of different chunk sizes.
The resulting minimum innodb buffer pool sizes are:
Page Size, Previously minimum (startup), with change.
4k 5M 2M
8k 5M 3M
16k 5M 5M
32k 24M 10M
64k 24M 20M
With this patch, SET GLOBAL innodb_buffer_pool_size minimums are
enforced.
The evident minimum system variable size for innodb_buffer_pool_size
is 2M, however this is only setable if using 4k page size. As
the order of the page_size and buffer_pool_size aren't fixed, we can't
hide this change.
Subsequent changes:
* innodb_buffer_pool_resize_with_chunks.test - raised of pool resize due to new
minimums. Chunk size also needed increase as the test was for
pool_size < chunk_size to generate a warning.
* Removed srv_buf_pool_min_size and replaced use with MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Removed srv_buf_pool_def_size and replaced constant defination in
MYSQL_SYSVAR_LONGLONG(buffer_pool_size)
* Reordered ha_innodb to allow for direct use of MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Moved buf_pool_size_align into ha_innodb to access to MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* loose-innodb_disable_resize_buffer_pool_debug is needed in the
innodb.restart.opt test so that under debug mode, resizing of the
innodb buffer pool can occur.
There were no InnoDB changes in the MySQL 5.7.37 release that would be
relevant to MariaDB Server. We will merely update the reported
InnoDB version number.
table: rows are counted twice
Analysis: When the table we are trying to insert into and the SELECT table
are same for INSERT ... SELECT, rows from the SELECT table are copied into
internal temporary table and then to the INSERT table. We only want to
count the rows when we start inserting into the table.
Fix: Reset the counter to 1 before starting to copy from internal temporary
table to select table and then increment the counter.
create_log_files(): Check log_set_capacity() before modifying
or creating any log files.
innobase_start_or_create_for_mysql(): If create_log_files()
fails and we were initializing a new database, delete the
system tablespace files before exiting.
fil_space_decrypt(): change signature to return status via dberr_t only.
Also replace impossible condition with an assertion and prove it via
test cases.
Upon investigation, decided this to be a compiler bug
(happens with new compiler, on code that did not change for the last 15 years)
Fixed by de-optimizing single function remove_key(), using MSVC pragma
When restoring lastinx last_key.keyinfo must be updated as well. The
good example is in _ma_check_index().
The point of failure is extra(HA_EXTRA_NO_KEYREAD) in
ha_maria::get_auto_increment():
1. extra(HA_EXTRA_KEYREAD) saves lastinx;
2. maria_rkey() changes index, so the lastinx and last_key.keyinfo;
3. extra(HA_EXTRA_NO_KEYREAD) restores lastinx but not
last_key.keyinfo.
So we have discrepancy between lastinx and last_key.keyinfo after 3.
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>