Problem:
========
InnoDB fails to fetch the page0 from dblwr if page0 is
corrupted.In that case, InnoDB defers the tablespace
and doesn't find the INIT_PAGE redo log record for page0
and it leads to failure.
Solution:
=========
InnoDB should recover page0 from dblwr if space_id can
be found for deferred tablespace.
trx_rseg_header_create(): Add a parameter for the value that is
to be written to TRX_RSEG_MAX_TRX_ID. If we omit this write, then
the updated test innodb.undo_truncate will fail for the 4k, 8k, 16k
page sizes. This was broken ever since
commit 947efe17ed (MDEV-15158)
removed the writes of transaction identifiers to the TRX_SYS page.
srv_do_purge(): Truncate undo tablespaces also during slow shutdown
(innodb_fast_shutdown=0).
Thanks to Krunal Bauskar for noticing this problem.
trx_purge_truncate_history(): Do not force a write of the undo tablespace
that is being truncated. Instead, prevent page writes by acquiring
an exclusive latch on all dirty pages of the tablespace.
fseg_create(): Relax an assertion that could fail if a dirty undo page
is being initialized during undo tablespace truncation (and
trx_purge_truncate_history() already acquired an exclusive latch on it).
fsp_page_create(): If we are truncating a tablespace, try to reuse
a page that we may have already latched exclusively (because it was
in buf_pool.flush_list). To some extent, this helps the test
innodb.undo_truncate,16k to avoid running out of buffer pool.
mtr_t::commit_shrink(): Mark as clean all pages that are outside the
new bounds of the tablespace, and only add the newly reinitialized pages
to the buf_pool.flush_list.
buf_page_create(): Do not unnecessarily invoke change buffer merge on
undo tablespaces.
buf_page_t::clear_oldest_modification(bool temporary): Move some
assertions to the caller buf_page_write_complete().
innodb.undo_truncate: Use a bigger innodb_buffer_pool_size=24M.
On my system, it would otherwise hang 1 out of 1547 attempts
(on the 40th repeat of innodb.undo_truncate,16k).
Other page sizes were not affected.
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.
The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.
This is improved from commit 1cb218c37c
which was applied to earlier-version branches.
fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.
fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.
mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace:
(1) durably write the log
(2) release the page latches of the rebuilt tablespace
(3) release the mutexes
(4) truncate the file
(5) release the tablespace latch
This is refactored from trx_purge_truncate_history().
log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
While the redo log is being resized in srv_start(),
we must not write checkpoint information to the old log.
Thanks to Matthias Leich for noticing this.
Actual problem was that we tried to calculate persistent statistics
to wsrep_schema tables in this case wsrep_streaming_log. These tables
should not have persistent statistics. Therefore, in table creation
tables should be created with STATS_PERSISTENT=0 table option. During
rolling-upgrade tables naturally already exists, thus we need to
alter them to contain STATS_PERSISTENT=0 table option.
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.
The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.
fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.
fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.
mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace
file. First, durably write the log, then shrink the file, and finally
release the page latches of the rebuilt tablespace. Refactored from
trx_purge_truncate_history().
log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
- Handle stored function conditions correctly, with the same logic as with UDFs.
- When running queries on Spider SE, by default, we do not push down WHERE conditions containing usage of UDFs/stored functions to remote data nodes, unless the user demands (by setting spider_use_pushdown_udf).
- Disable direct update/delete when a udf condition is skipped.
- Handle stored function conditions correctly, with the same logic as with UDFs.
- When running queries on Spider SE, by default, we do not push down WHERE conditions containing usage of UDFs/stored functions to remote data nodes, unless the user demands (by setting spider_use_pushdown_udf).
this fix is adding alternative name to MariaDB-common on Fedora.
the fix is placed outside of IF/ELSEIF blocks to do not overwrite
existing one for Fedora.
Avoid reading uninitialized memory by thd_get_error_context_description().
Note, that THD::real_id can't be initialized at this stage, so it will be zeroed.
added alternative name for MariaDB-devel package to replace
mariadb-connector-c-devel from RHEL 8 distribution.
this patch is for 10.3+ on RHEL/Centos 8
SST scripts currently use Linux-specific construction
to create a temporary directory if the path prefix for
that directory is specified by the user. This does not
work with FreeBSD. This commit adds support for FreeBSD.
No separate test required.
btr_defragment_save_defrag_stats_if_needed(): Do not save
defragmentation statistics for temporary tables.
They are exempt of defragmentation anyway
(ha_innobase::optimize() never invokes defragmentation for them),
and the user-visible names are not available inside InnoDB.
Furthermore, InnoDB assumes that temporary tables are never accessed
by other threads than the one that handles the session with which
the temporary table is associated with.
Furthermore, we simplify the test innodb.innodb_defrag_stats
and include a test case that demonstrates that defragmentation
statistics are no longer being saved for temporary tables.
dict_stats_process_entry_from_defrag_pool(): Acquire MDL on the table
for which we are invoking dict_stats_save_defrag_stats(), to avoid
any race condition with DROP TABLE or similar operations.
When computing statistics, let us play it safe and check whether
an insert into an empty table is in progress, once we have acquired
the root page latch. If yes, let us pretend that the table is empty,
just like MVCC reads would do.
It is unlikely that this race condition could lead into any crashes
before MDEV-24621, because the index tree structure should be protected
by the page latches. But, at least we can avoid some busy work and
return earlier.
As part of this, some code that is only used for statistics calculation
is being moved into static functions in that compilation unit.
btr_root_block_get(): Check for index->page == FIL_NULL.
btr_root_get(): Declare static. Other callers can invoke
btr_root_block_get() directly.
btr_get_size(): Remove conditions that are checked in
btr_root_block_get().
* buffer pool has latches that protect access to pages.
* there is a latch per N pages.
(check page_hash_table for more details)
* N is calculated based on the cacheline size.
* for example: if cacheline size is
: 64 then 7 pages pointers + 1 latch can be hosted on the same cacheline
: 128 then 15 pages pointers + 1 latch can be hosted on the same cacheline
* arm generally have wider cacheline so with arm 1 latch is used
to access 15 pages vs with x86 1 latch is used to access 7 pages.
Naturally, the contention is more with arm case.
* said patch help relax this contention by limiting the elements
per cacheline to 7 (+ 1 latch slot).
for wider-cacheline (say 128), the remaining 8 slots are kept empty.
this ensures there are no 2 latches on the same cacheline to avoid
latch level contention.
Based on suggestion from Marko, the same logic is now extended to
lock_sys_t::hash_table.
Problem was that there was extra condition !thd->lex->no_write_to_binlog
before call to begin TOI. It seems that this variable is not initialized.
TRUNCATE does not support [NO_WRITE_TO_BINLOG | LOCAL] keywords, thus
we should not check this condition. All this was hidden in a macro,
so I decided to remove those macros that were used only a few places
with actual function calls.
innodb_max_purge_lag_wait_update(): To align with
purge_coordinator_state::refresh(), we must trigger a page flush batch
if the last log checkpoint is too old.
This was caught in a hang of the test innodb_gis.rtree_compress.
buf_flush_page_cleaner(): Always try to advance the log checkpoint,
even when no pages were flushed during the latest batch.
Maybe, since the previous batch, there was an LRU flush that
removed the last dirty pages.
Failure to advance the log checkpoint will cause unnecessary work
in Mariabackup and on crash recovery.
This was required as I added a new error code to my_base.h and rocksdb
is adding it's own errors after the last official one
Updated result file also for index_merge_rocksdb2. This is a big test
and we have probably not before noticed that some optimizer changes
caused a difference.
dict_index_t::clear_instant_alter(): when searhing for an AUTO_INCREMENT column
don't skip the beginning of the list because the field can be at the beginning of the list