Commit graph

1416 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
be0dfcdb99 MDEV-34200 InnoDB tries to write to read-only system tablespace
in buf_dblwr_t::init_or_load_pages()

- InnoDB fails to set the TRX_SYS_DOUBLEWRITE_SPACE_ID_STORED
flag in the transaction system header page while recreating
the undo log tablespaces.

buf_dblwr_t::init_or_load_pages(): Tries to reset the
space id and to write into the doublewrite buffer even
when read-only mode is enabled.

In srv_all_undo_tablespaces_open(), InnoDB should try to
open the extra unused undo tablespaces instead of trying to
create them.
2024-05-22 13:16:10 +05:30
Sergei Golubchik
a6b2f820e0 Merge branch '10.6' into 10.11 2024-05-10 20:02:18 +02:00
Sergei Golubchik
7b53672c63 Merge branch '10.5' into 10.6 2024-05-08 20:06:00 +02:00
Thirunarayanan Balathandayuthapani
9b2bf09b95 MDEV-33980 mariadb-backup --backup is missing retry logic for undo tablespaces
- This is a merge of commit f378e76434 from 10.4 to 10.5.
2024-05-06 19:50:20 +05:30
Julius Goryavsky
b88c20ce1b Merge branch 10.4 into 10.5 2024-05-06 13:55:42 +02:00
Thirunarayanan Balathandayuthapani
f378e76434 MDEV-33980 mariadb-backup --backup is missing retry logic for undo tablespaces
Problem:
========
- Currently, mariabackup has to reread pages in case they are
modified concurrently by the server. But while reading an undo
tablespace, mariabackup failed to reread the page in case of
an error.

Fix:
===
Mariabackup --backup functionality should have retry logic
while reading the undo tablespaces.
2024-04-30 16:15:26 +05:30
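The retry logic described in commit f378e76434 can be illustrated with a minimal sketch. This is not the mariabackup implementation: ReadStatus, read_with_retry() and the read_page callback are hypothetical names introduced only for this example, and the bounded attempt count is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical result of one read attempt; not a mariabackup type.
enum class ReadStatus { Ok, Corrupted };

// Re-read a page up to max_attempts times. read_page is a caller-supplied
// callback (an assumption of this sketch) that reads and validates one page.
// Returns the number of attempts used on success, or 0 if all attempts failed.
inline std::size_t read_with_retry(const std::function<ReadStatus()>& read_page,
                                   std::size_t max_attempts) {
  assert(max_attempts > 0);
  for (std::size_t attempt = 1; attempt <= max_attempts; ++attempt) {
    if (read_page() == ReadStatus::Ok) {
      return attempt;  // the page was consistent on this attempt
    }
    // The server may have been modifying the page concurrently; try again.
  }
  return 0;  // give up and let the caller report the error
}
```

A caller would pass a callback that reads one undo tablespace page and checks it, and would treat a return value of 0 as a failed backup step.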
mariadb-DebarunBanerjee
5928e04d5f MDEV-32489 Change buffer index fails to delete the records
When the change buffer records for a page span across multiple change
buffer leaf pages or the starting record is at the beginning of a page
with a left sibling, ibuf_delete_recs deletes only the records in first
page and fails to move to subsequent pages.

Subsequently a slow shutdown hangs trying to delete those left over
records.

Fix-A: Position the cursor to a user record in the B-tree and exit only
when all records are exhausted.

Fix-B: Make sure we call ibuf_delete_recs during slow shutdown for
pages with IBUF entries to clean up any previously left-over records.
2024-04-18 08:30:21 +05:30
Marko Mäkelä
788953463d Merge 10.6 into 10.11
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
2024-03-28 09:16:57 +02:00
Marko Mäkelä
fa8a46eb68 MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool
By design, InnoDB has always hung when permanently running out of
buffer pool, for example when several threads are waiting to allocate
a block, and all of the buffer pool is buffer-fixed by the active threads.

The hang that we are fixing here occurs when the buffer pool is only
temporarily running out and the situation could be rescued by writing out
some dirty pages or evicting some clean pages.

buf_LRU_get_free_block(): Simplify how we wait for
the buf_flush_page_cleaner thread. This fixes occasional hangs
of the test encryption.innochecksum that were introduced by
commit a55b951e60 (MDEV-26827).
To play it safe, we use a timed wait when waiting for the
buf_flush_page_cleaner() thread to perform its job. Should that
thread get stuck, we will invoke buf_pool.LRU_warn() in order to
display a message that pages could not be freed, and keep trying
to wake up the buf_flush_page_cleaner() thread.

The INFORMATION_SCHEMA.INNODB_METRICS counters
buffer_LRU_single_flush_failure_count and
buffer_LRU_get_free_waits will be removed.
The latter is represented by buffer_pool_wait_free.

Also removed will be the message
"InnoDB: Difficult to find free blocks in the buffer pool"
because in d34479dc66 we
introduced a more precise message
"InnoDB: Could not free any blocks in the buffer pool"
in the buf_flush_page_cleaner thread.

buf_pool_t::LRU_warn(): Issue the warning message that we could
not free any blocks in the buffer pool. This may also be invoked
by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears
to be stuck.

buf_pool_t::n_flush_dec(): Remove.

buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec().

buf_flush_LRU_list_batch(): Increment the eviction counter for blocks
of temporary, discarded or dropped tablespaces.

buf_flush_LRU(): Make static, and remove the constant parameter
evict=false. The only caller will be the buf_flush_page_cleaner()
thread.

IORequest::is_LRU(): Remove. The only case of evicting pages on
write completion will be when we are writing out pages of the
temporary tablespace. Those pages are not in buf_pool.flush_list,
only in buf_pool.LRU.

buf_page_t::flush(): Remove the parameter evict.

buf_page_t::write_complete(): Change the parameter "bool temporary"
to "bool persistent" and add a parameter for an already read state().

Reviewed by: Debarun Banerjee
2024-03-22 14:17:39 +02:00
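The timed wait described in commit fa8a46eb68 (wait for the page cleaner, warn if it appears stuck, keep trying) can be sketched as follows. This is a simplified model, not InnoDB code: FreeBlockPool and its members are hypothetical, the warning is reduced to a counter standing in for buf_pool.LRU_warn(), and the number of attempts is bounded here only so that the sketch always terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the buffer pool free list; not InnoDB code.
class FreeBlockPool {
 public:
  // Called by the "page cleaner" side when a block becomes available.
  void add_free_block() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      ++free_blocks_;
    }
    cv_.notify_one();
  }

  // Wait for a free block using a timed wait. Each time the wait times out,
  // count a warning and try again, up to max_attempts times.
  bool get_free_block(std::chrono::milliseconds timeout, int max_attempts) {
    assert(max_attempts > 0);
    std::unique_lock<std::mutex> lock(mutex_);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
      if (cv_.wait_for(lock, timeout, [this] { return free_blocks_ > 0; })) {
        --free_blocks_;
        return true;
      }
      ++warnings_;  // the cleaner appears stuck: warn and keep trying
    }
    return false;
  }

  // Read without locking; only meant for single-threaded inspection.
  int warnings() const { return warnings_; }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int free_blocks_ = 0;
  int warnings_ = 0;
};
```

The point of the timed wait is that a missed or delayed wakeup degrades into a warning and a retry rather than an indefinite hang.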
Marko Mäkelä
c3a00dfa53 Merge 10.5 into 10.6 2024-03-12 09:19:57 +02:00
Thirunarayanan Balathandayuthapani
6e5333fc8c MDEV-32445 InnoDB may corrupt its log before upgrading it on startup
Problem:
========
During upgrade, InnoDB writes redo log records for adjusting
the tablespace size or tablespace flags even before the log
has been upgraded to the configured format. This could lead to
data inconsistency if a crash happened during the upgrade process.

Fix:
===
srv_start(): Write the redo log records for the tablespace flags
adjustment and the increased tablespace size only after the redo
log has been upgraded.

log_write_low(), log_reserve_and_write_fast(): Check whether
the redo log is in physical format.
2024-03-06 15:01:26 +05:30
Marko Mäkelä
71834ccb6c MDEV-24671 fixup: Remove srv_max_n_threads
The variable srv_max_n_threads lost its usefulness in
commit db006a9a43 (MDEV-21452)
and commit e71e613353 (MDEV-24671).
2024-02-27 11:14:28 +02:00
Marko Mäkelä
3dd7b0a80c Cleanup: Remove OS_FILE_ON_ERROR_NO_EXIT
Ever since commit 412ee0330c
or commit a440d6ed3a
InnoDB should generally not abort when failing to open or create files.
In Datafile::open_or_create() we had failed to set the flag
to avoid abort() on failure, but everywhere else we were setting it.

We may still call abort() via os_file_handle_error().

Reviewed by: Vladislav Vaintroub
2024-02-20 11:22:52 +02:00
Marko Mäkelä
466069b184 Merge 10.5 into 10.6 2024-02-08 10:38:53 +02:00
mariadb-DebarunBanerjee
fb9da7f751 MDEV-33023 Crash in mariadb-backup --prepare --export after --prepare
mariadb-backup with --prepare option could result in empty redo log
file. When --prepare is followed by --prepare --export, we exit early
in srv_start function without opening the ibdata1 tablespace. Later
while trying to read rollback segment header page, we hit the debug
assert which claims that the system space should already have been
opened.

There are two assert cases here.

Issue-1: System tablespace object is not there in fil space hash i.e.
srv_sys_space.open_or_create() is not called.

Issue-2: The system tablespace data file ibdata1 is not opened i.e.
fil_system.sys_space->open() is not called.

Fix: For empty redo log and restore operation, open system tablespace
before returning.
2024-02-07 23:12:15 +05:30
Marko Mäkelä
9d20853c74 Merge 10.6 into 10.11 2024-01-18 19:22:23 +02:00
Marko Mäkelä
ad13fb36bf Merge 10.6 into 10.11 2024-01-17 17:37:15 +02:00
Marko Mäkelä
3a96eba25f Merge 10.5 into 10.6 2024-01-17 13:35:05 +02:00
Marko Mäkelä
f8c88d905b MDEV-33213 History list is not shrunk unless there is a pause in the workload
The parameter innodb_undo_log_truncate=ON enables a multi-phased logic:
1. Any "producers" (new starting transactions) are prohibited
from using the rollback segments that reside in the undo tablespace.
2. Any transactions that use any of the rollback segments must be
committed or aborted.
3. The purge of committed transaction history must process all the
rollback segments.
4. The undo tablespace is truncated and rebuilt.
5. The rollback segments are re-enabled for new transactions.

There was one flaw in this logic: The first step was not being invoked
as often as it could be, and therefore innodb_undo_log_truncate=ON
would have no chance to work during a heavy write workload.

Independent of innodb_undo_log_truncate, even after
commit 86767bcc0f
we are missing some chances to free processed undo log pages.
If we prohibited the creation of new transactions in one busy
rollback segment at a time, we would be eventually guaranteed
to be able to free such pages.

purge_sys_t::skipped_rseg: The current candidate rollback segment
for shrinking the history independent of innodb_undo_log_truncate.

purge_sys_t::iterator::free_history_rseg(): Renamed from
trx_purge_truncate_rseg_history(). Implement the logic
around purge_sys.m_skipped_rseg.

purge_sys_t::truncate_undo_space: Renamed from truncate.

purge_sys.truncate_undo_space.last: Changed the type to integer
to get rid of some pointer dereferencing and conditional branches.

purge_sys_t::truncating_tablespace(), purge_sys_t::undo_truncate_try():
Refactored from trx_purge_truncate_history().
Set purge_sys.truncate_undo_space.current if applicable,
or return an already set purge_sys.truncate_undo_space.current.

purge_coordinator_state::do_purge(): Invoke
purge_sys_t::truncating_tablespace() as part of the normal work loop,
to implement innodb_undo_log_truncate=ON as often as possible.

trx_purge_truncate_rseg_history(): Remove a redundant parameter.

trx_undo_truncate_start(): Replace dead code with a debug assertion.

Correctness tested by: Matthias Leich
Performance tested by: Axel Schwenke
Reviewed by: Debarun Banerjee
2024-01-17 11:14:24 +02:00
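The five phases of innodb_undo_log_truncate=ON listed in commit f8c88d905b can be modelled as a small state machine. This is only an illustrative sketch: TruncatePhase, UndoSpaceState and step() are hypothetical names, and the real logic lives in purge_sys and operates on rollback segments rather than on plain counters.

```cpp
#include <cassert>

// Hypothetical model of the five phases listed in the commit message.
enum class TruncatePhase {
  BlockProducers,   // 1. stop new transactions from using the segments
  DrainActive,      // 2. wait for transactions using them to finish
  PurgeHistory,     // 3. purge must process all the rollback segments
  TruncateRebuild,  // 4. truncate and rebuild the undo tablespace
  ReEnable,         // 5. re-enable the rollback segments
  Done
};

struct UndoSpaceState {
  int active_transactions = 0;  // transactions still using the segments
  int unpurged_records = 0;     // committed history not yet purged
  bool accepting_new = true;
  bool truncated = false;
};

// Advance by at most one phase and return the phase to run next.
inline TruncatePhase step(TruncatePhase phase, UndoSpaceState& s) {
  switch (phase) {
    case TruncatePhase::BlockProducers:
      s.accepting_new = false;
      return TruncatePhase::DrainActive;
    case TruncatePhase::DrainActive:
      return s.active_transactions == 0 ? TruncatePhase::PurgeHistory : phase;
    case TruncatePhase::PurgeHistory:
      return s.unpurged_records == 0 ? TruncatePhase::TruncateRebuild : phase;
    case TruncatePhase::TruncateRebuild:
      assert(!s.accepting_new && s.active_transactions == 0);
      s.truncated = true;
      return TruncatePhase::ReEnable;
    case TruncatePhase::ReEnable:
      s.accepting_new = true;
      return TruncatePhase::Done;
    case TruncatePhase::Done:
      break;
  }
  return TruncatePhase::Done;
}
```

In this model, the flaw described in the commit message corresponds to step() rarely being called with BlockProducers under a heavy write workload, so the later phases never get a chance to run.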
Thirunarayanan Balathandayuthapani
caad34df54 MDEV-32968 InnoDB fails to restore tablespace first page from doublewrite buffer when page is empty
- InnoDB fails to find the space id from the page0 of
the tablespace. In that case, InnoDB can use
doublewrite buffer to recover the page0 and write
into the file.

- buf_dblwr_t::init_or_load_pages(): Loads only the pages
which are valid (page LSN >= checkpoint LSN). To do that,
InnoDB has to open the redo log before the system
tablespace and read the latest checkpoint information.

recv_dblwr_t::find_first_page():
1) Iterate the doublewrite buffer pages and find the 0th page
2) Read the tablespace flags, space id from the 0th page.
3) Read the 1st, 2nd and 3rd page from tablespace file and
compare the space id with the space id which is stored
in doublewrite buffer.
4) If it matches then we can write into the file.
5) Return space which matches the pages from the file.

SysTablespace::read_lsn_and_check_flags(): Remove the
retry logic for validating the first page. After
restoring the first page from doublewrite buffer,
assign tablespace flags by reading the first page.

recv_recovery_read_max_checkpoint(): Reads the maximum
checkpoint information from log file

recv_recovery_from_checkpoint_start(): Avoid reading
the checkpoint header information from log file

Datafile::validate_first_page(): Throw an error if
first page validation fails.
2024-01-15 14:08:27 +05:30
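The numbered steps given for recv_dblwr_t::find_first_page() in commit caad34df54 can be sketched roughly as below. This is not the InnoDB function: Page is a hypothetical two-field struct, the pages are passed in as vectors instead of being read from files, and requiring all of pages 1 to 3 to carry the same space id is an assumption of this sketch, because the commit message does not say how partial matches are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified page: only the fields this sketch needs.
struct Page {
  uint32_t space_id;
  uint32_t page_no;
};

// Scan the doublewrite copies for a page 0 whose space id agrees with
// pages 1..3 read from the data file. On success, store the space id in
// space_id_out and return true; otherwise return false.
inline bool find_first_page(const std::vector<Page>& doublewrite_pages,
                            const std::vector<Page>& file_pages_1_to_3,
                            uint32_t& space_id_out) {
  for (const Page& candidate : doublewrite_pages) {
    if (candidate.page_no != 0) continue;          // step 1: only page 0
    const uint32_t space_id = candidate.space_id;  // step 2: read space id
    bool matches = !file_pages_1_to_3.empty();
    for (const Page& p : file_pages_1_to_3) {      // step 3: compare ids
      if (p.space_id != space_id) {
        matches = false;
        break;
      }
    }
    if (matches) {                                 // steps 4 and 5
      space_id_out = space_id;
      assert(candidate.page_no == 0);
      return true;
    }
  }
  return false;
}
```

A caller that gets true back would then write the matching doublewrite copy of page 0 into the data file.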
Marko Mäkelä
1eb11da3e5 Merge 10.6 into 10.11 2024-01-10 12:37:19 +02:00
Marko Mäkelä
593278f927 MDEV-32050 fixup: Remove srv_purge_rseg_truncate_frequency 2024-01-10 11:52:26 +02:00
Marko Mäkelä
bdf65893dd Merge 10.6 into 10.11 2024-01-03 15:37:57 +02:00
Marko Mäkelä
8bd5a3de7f Merge 10.5 into 10.6 2024-01-03 14:24:47 +02:00
Marko Mäkelä
cc5c0eda4c MDEV-33156 Crash on innodb_buf_flush_list_now=ON and innodb_force_recovery=6
srv_start(): Move a read only mode startup tweak from
innodb_init_params() to the correct location. Also if
innodb_force_recovery=6 we will disable the doublewrite buffer,
because InnoDB must run in read-only mode to prevent further corruption.

This change only affects debug checks. Whenever srv_read_only_mode holds,
the buf_pool.flush_list will be empty, that is, there will be no writes
of persistent InnoDB data pages.

Reviewed by: Thirunarayanan Balathandayuthapani
2024-01-03 12:08:21 +02:00
Sergei Golubchik
fd0b47f9d6 Merge branch '10.6' into 10.11 2023-12-18 11:19:04 +01:00
Thirunarayanan Balathandayuthapani
d018b90990 MDEV-32920 innodb_buffer_pool_read_requests always 0
srv_export_innodb_status(): Update
export_vars.innodb_buffer_pool_read_requests
with buf_pool.stat.n_page_gets. The bug was caused by
the incorrect merge commit 44c9008ba6.
2023-12-07 15:18:24 +05:30
Marko Mäkelä
90d968dab9 Merge 10.6 into 10.11 2023-11-20 10:08:19 +02:00
Marko Mäkelä
eb1f8b2919 MDEV-32027 Opening all .ibd files on InnoDB startup can be slow
dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.

dict_check_tablespaces_and_store_max_id(): In the normal case
(no encryption plugin has been loaded and the change buffer is empty),
invoke dict_find_max_space_id() and do not open any .ibd files.
If a std::set<uint32_t> has been specified, open the files whose
tablespace ID is mentioned. Else, open all data files that are identified
by SYS_TABLES records.

fil_ibd_open(): Remove a call to os_file_get_last_error() that can
report a misleading error, such as EINVAL inside my_realpath() that is
not an actual error. This could be invoked when a data file is found
but the FSP_SPACE_FLAGS are incorrect, such as is the case for
table test.td in
./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags

buf_load(): If any tablespaces could not be found, invoke
dict_check_tablespaces_and_store_max_id() on the missing tablespaces.

dict_load_tablespace(): Try to load the tablespace unless it was found
to be futile. This fixes failures related to FTS_*.ibd files for
FULLTEXT INDEX.

btr_cur_t::search_leaf(): Prevent a crash when the tablespace
does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
when the change to dict_load_tablespaces() was not present.

We modify a few tests to ensure that tables will not be loaded at startup.
For some fault injection tests this means that the corrupted tables
will not be loaded, because dict_load_tablespace() would perform stricter
checks than dict_check_tablespaces_and_store_max_id().

Tested by: Matthias Leich
Reviewed by: Thirunarayanan Balathandayuthapani
2023-11-17 15:07:51 +02:00
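The startup decision described for dict_check_tablespaces_and_store_max_id() in commit eb1f8b2919 can be sketched as a pure function. StartupPlan and plan_startup() are hypothetical names, SYS_TABLES is reduced to a list of tablespace IDs, and the precedence used here (an explicit set of IDs is honoured first, then the "normal case" shortcut) is an assumption drawn from the wording of the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical result: the maximum space id plus the files to open.
struct StartupPlan {
  uint32_t max_space_id = 0;
  std::vector<uint32_t> to_open;
};

// sys_tables_spaces: tablespace IDs found in SYS_TABLES records.
// requested: optional set of IDs that must be opened (may be nullptr).
// normal_case: no encryption plugin loaded and the change buffer is empty.
inline StartupPlan plan_startup(const std::vector<uint32_t>& sys_tables_spaces,
                                const std::set<uint32_t>* requested,
                                bool normal_case) {
  StartupPlan plan;
  for (uint32_t id : sys_tables_spaces) {
    // Equivalent of SELECT MAX(SPACE) FROM SYS_TABLES.
    plan.max_space_id = std::max(plan.max_space_id, id);
    if (requested != nullptr) {
      if (requested->count(id) != 0) plan.to_open.push_back(id);
    } else if (!normal_case) {
      plan.to_open.push_back(id);  // open every file named in SYS_TABLES
    }
    // Normal case without a requested set: open nothing at startup.
  }
  assert(plan.to_open.size() <= sys_tables_spaces.size());
  return plan;
}
```

In the normal case the plan contains only the maximum ID, which is what makes startup fast when there are many .ibd files.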
Marko Mäkelä
52ca2e65af Merge 10.5 into 10.6 2023-11-15 14:10:21 +02:00
Marko Mäkelä
c638051d80 MDEV-32798 innodb_fast_shutdown=0 hang after incomplete startup
innodb_preshutdown(): Only wait for active transactions to be terminated
if InnoDB was started and innodb_force_recovery=3 or larger does not
prevent a rollback.

This fixes the following:

./mtr --parallel=auto --mysqld=--innodb-fast-shutdown=0 \
innodb.log_file_size innodb.innodb_force_recovery \
innodb.read_only_recovery innodb.read_only_recover_committed \
mariabackup.apply-log-only-incr
2023-11-14 14:35:51 +02:00
Oleksandr Byelkin
fecd78b837 Merge branch '10.10' into 10.11 2023-11-08 16:46:47 +01:00
Oleksandr Byelkin
04d9a46c41 Merge branch '10.6' into 10.10 2023-11-08 16:23:30 +01:00
Oleksandr Byelkin
b83c379420 Merge branch '10.5' into 10.6 2023-11-08 15:57:05 +01:00
Oleksandr Byelkin
6cfd2ba397 Merge branch '10.4' into 10.5 2023-11-08 12:59:00 +01:00
Marko Mäkelä
2ba9702163 MDEV-32050: Boost innodb_purge_batch_size on slow shutdown
A slow shutdown using the previous default innodb_purge_batch_size=300
could be extremely slow, employing at most a few CPU cores on average.
Let us use the maximum batch size in order to increase throughput.

Reviewed by: Vladislav Lesin
2023-10-25 10:21:49 +03:00
Marko Mäkelä
88733282fb MDEV-32050: Look up tables in the purge coordinator
The InnoDB table lookup in purge worker threads is a bottleneck that can
degrade a slow shutdown to utilize less than 2 threads. Let us fix that
bottleneck by constructing a local lookup table that does not require any
synchronization while the undo log records of the current batch
are being processed.

TRX_PURGE_TABLE_BUCKETS: The initial number of std::unordered_map
hash buckets used during a purge batch. This could avoid some
resizing and rehashing in trx_purge_attach_undo_recs().

purge_node_t::tables: A lookup table from table ID to an already
looked up and locked table. Replaces many fields.

trx_purge_attach_undo_recs(): Look up each table in the purge batch
only once.

trx_purge(): Close all tables and release MDL at the end of the batch.

trx_purge_table_open(), trx_purge_table_acquire(): Open a table in purge
and acquire a metadata lock on it. This replaces
dict_table_open_on_id<true>() and dict_acquire_mdl_shared().

purge_sys_t::close_and_reopen(): In case of an MDL conflict, close and
reopen all tables that are covered by the current purge batch.
It may be that some of the tables have been dropped meanwhile and can
be ignored. This replaces wait_SYS() and wait_FTS().

row_purge_parse_undo_rec(): Make purge_coordinator_task issue a
MDL warrant to any purge_worker_task which might need it
when innodb_purge_threads>1.

purge_node_t::end(): Clear the MDL warrant.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub
2023-10-25 10:08:20 +03:00
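The local lookup table described in commit 88733282fb (each table is looked up once per purge batch, then read without synchronization) can be sketched as follows. PurgeBatchTables, Table, kInitialBuckets and the open_table callback are hypothetical names and values chosen for this example; the real code uses purge_node_t::tables, TRX_PURGE_TABLE_BUCKETS and also acquires metadata locks, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

struct Table { uint64_t id; };  // hypothetical stand-in for a table object

constexpr std::size_t kInitialBuckets = 128;  // hypothetical bucket count

class PurgeBatchTables {
 public:
  explicit PurgeBatchTables(std::function<Table*(uint64_t)> open_table)
      : tables_(kInitialBuckets), open_table_(std::move(open_table)) {
    assert(open_table_ != nullptr);
  }

  // Coordinator: look up each table ID of the batch only once.
  void attach(const std::vector<uint64_t>& table_ids) {
    for (uint64_t id : table_ids) {
      if (tables_.find(id) == tables_.end()) {
        tables_.emplace(id, open_table_(id));
      }
    }
  }

  // Workers: read-only lookup. No locking is needed as long as the map
  // is not modified while the records of the batch are being processed.
  Table* find(uint64_t id) const {
    auto it = tables_.find(id);
    return it == tables_.end() ? nullptr : it->second;
  }

  // End of batch: forget all tables (the real code also releases MDL).
  void close_all() { tables_.clear(); }

 private:
  std::unordered_map<uint64_t, Table*> tables_;
  std::function<Table*(uint64_t)> open_table_;
};
```

Pre-sizing the map with an initial bucket count mirrors the stated goal of avoiding some resizing and rehashing while the batch is being built.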
Marko Mäkelä
d70a98ae06 MDEV-32050: Revert the throttling of MDEV-26356
purge_coordinator_state::do_purge(): Simply use all innodb_purge_threads,
no matter what the LSN age is. During shutdown with innodb_fast_shutdown=0
this code could degrade to using only 1 thread.

Also, restore the periodic "InnoDB: to purge" messages that were
accidentally disabled in commit 80585c9d6f.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub
2023-10-25 09:42:38 +03:00
Marko Mäkelä
44689eb7d8 MDEV-32050: Improve srv_wake_purge_thread_if_not_active()
purge_sys_t::wake_if_not_active(): Replaces
srv_wake_purge_thread_if_not_active().

innodb_ddl_recovery_done(): Move the wakeup call to
srv_init_purge_tasks().

purge_coordinator_timer: Remove. The srv_master_callback() already
invokes purge_sys.wake_if_not_active() once per second.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub
2023-10-25 09:38:21 +03:00
Marko Mäkelä
14685b10df MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.

Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.

The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.

purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).

purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.

purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().

purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().

Reviewed by: Vladislav Lesin
2023-10-25 09:11:58 +03:00
Marko Mäkelä
65700edb26 Merge 10.10 into 10.11 2023-10-19 14:50:42 +03:00
Marko Mäkelä
c92d06748a Merge 10.6 into 10.10 2023-10-19 14:35:31 +03:00
Marko Mäkelä
6991b1c47c Merge 10.5 into 10.6 2023-10-19 13:50:00 +03:00
Thirunarayanan Balathandayuthapani
85751ed81d MDEV-31851 After crash recovery, undo tablespace fails to open
srv_all_undo_tablespaces_open(): While opening the extra unused
undo tablespaces, InnoDB should use ULINT_UNDEFINED instead of
SRV_SPACE_ID_UPPER_BOUND.
2023-10-19 15:39:44 +05:30
Marko Mäkelä
f833ef5a2a Merge 10.10 into 10.11 2023-10-18 18:35:39 +03:00
Marko Mäkelä
c857259ebb Merge 10.6 into 10.10 2023-10-18 16:38:09 +03:00
Marko Mäkelä
bf7c6fc20b MDEV-32511 Assertion !os_aio_pending_writes() failed
In MemorySanitizer builds of 10.10 and 10.11, we would rather often
have the assertion fail in innodb_init() during mariadb-backup --prepare.
The assertion could also fail during InnoDB startup, but less often.

Before commit 685d958e38 in 10.8 the
log file cleanup after a successfully applied backup is different,
and the os_aio_pending_writes() assertion is in srv0start.cc.

IORequest::write_complete(): Invoke node->complete_write() before
releasing the page latch, so that a log checkpoint that is about to
execute concurrently will not miss a fdatasync() or fsync() on the
file, in case this was the first write since the last such call.

create_log_file(), srv_start(): Replace the debug assertion with
a debug check. For all intents and purposes, all writes could have
been completed but some write_io_callback() may not have invoked
io_slots::release() yet.
2023-10-18 16:33:11 +03:00
Thirunarayanan Balathandayuthapani
3da5d047b8 MDEV-31851 After crash recovery, undo tablespace fails to open
Problem:
========
- InnoDB fails to open an undo tablespace when page0 is corrupted
and fails to throw an error.

Solution:
=========
- InnoDB throws a DB_CORRUPTION error when it encounters
page0 corruption of an undo tablespace.

- InnoDB restores the page0 of an undo tablespace from the
doublewrite buffer if it encounters page corruption.

- Moved Datafile::restore_from_doublewrite() to
recv_dblwr_t::restore_first_page(), so that the undo
tablespace and the system tablespace can use this function
instead of duplicating the code.

srv_undo_tablespace_open(): Returns 0 if file doesn't exist
or ULINT_UNDEFINED if page0 is corrupted.
2023-10-17 18:41:21 +05:30
Marko Mäkelä
2ecc0443ec Merge 10.10 into 10.11 2023-10-17 16:04:21 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as is
a bug fix where ANALYZE TABLE wrongly built histograms for
a single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00