Commit graph

817 commits

Author SHA1 Message Date
Daniel Black
fbd11d5f29 MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success
Review cleanups.
2023-10-13 09:48:57 +11:00
Daniel Black
c79ca7c7ad MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success
There are many filesystem related errors that can occur with
MariaBackup. These already outputed to stderr with a good description of
the error. Many of these are permission or resource (file descriptor)
limits where the assertion and resulting core crash doesn't offer
developers anything more than the log message. To the user, assertions
and core crashes come across as poor error handling.

As such we return an error and handle this all the way up the stack.
2023-10-12 21:37:27 +11:00
Marko Mäkelä
6e9b421f77 MDEV-32364 Server crashes when starting server with high innodb_log_buffer_size
log_t::create(): Return whether the initialisation succeeded.
It may fail if too large an innodb_log_buffer_size is specified.
2023-10-06 14:16:01 +03:00
Marko Mäkelä
84dbd0253d MDEV-31487: Recovery or backup failure after innodb_undo_log_truncate=ON
recv_sys_t::parse(): For undo tablespace truncation mini-transactions,
remember the start_lsn instead of the end LSN. This is what we expect
after commit 461402a564 (MDEV-30479).
2023-06-27 09:12:38 +03:00
Marko Mäkelä
bb9da13baf MDEV-31373 innodb_undo_log_truncate=ON recovery results in a corrupted undo log
recv_sys_t::apply(): When applying an undo log truncation operation,
invoke os_file_truncate() on space->recv_size, which must not be
less than the original truncated file size.

Alternatively, as pointed out by Thirunarayanan Balathandayuthapani,
we could assign space->size = t.pages, so that
fil_system_t::extend_to_recv_size() would extend the file back
to space->recv_size.
2023-06-01 12:11:18 +03:00
Oleksandr Byelkin
ac5a534a4c Merge remote-tracking branch '10.4' into 10.5 2023-03-31 21:32:41 +02:00
Vlad Lesin
4c226c1850 MDEV-29050 mariabackup issues error messages during InnoDB tablespaces export on partial backup preparing
The solution is to suppress error messages for missing tablespaces if
mariabackup is launched with "--prepare --export" options.

"mariabackup --prepare --export" invokes itself with --mysqld parameter.
If the parameter is set, then it starts server to feed "FLUSH TABLES ...
FOR EXPORT;" queries for exported tablespaces. This is "normal" server
start, that's why new srv_operation value is introduced.

Reviewed by Marko Makela.
2023-03-27 20:15:10 +03:00
Marko Mäkelä
5300c0fb76 MDEV-30657 InnoDB: Not applying UNDO_APPEND due to corruption
This almost completely reverts
commit acd23da4c2 and
retains a safe optimization:

recv_sys_t::parse(): Remove any old redo log records for the
truncated tablespace, to free up memory earlier.
If recovery consists of multiple batches, then recv_sys_t::apply()
will must invoke recv_sys_t::trim() again to avoid wrongly
applying old log records to an already truncated undo tablespace.
2023-02-15 18:16:41 +02:00
Thirunarayanan Balathandayuthapani
1a5c7552ea MDEV-30552 InnoDB recovery crashes when error handling scenario
- InnoDB fails to reset the after_apply variable before applying
the redo log in last batch during multi-batch recovery.
2023-02-14 14:36:17 +05:30
Thirunarayanan Balathandayuthapani
3eea2e8e10 MDEV-30551 InnoDB recovery hangs when buffer pool ran out of memory
- During non-last batch of multi-batch recovery, InnoDB holds
log_sys.mutex and preallocates the block which may intiate
page flush, which may initiate log flush, which requires
log_sys.mutex to acquire again. This leads to assert failure.
So InnoDB recovery should release log_sys.mutex before
preallocating the block.
2023-02-14 14:35:35 +05:30
Marko Mäkelä
acd23da4c2 MDEV-30479 optimization: Invoke recv_sys_t::trim() earlier
recv_sys_t::parse(): Discard old page-level redo log when parsing
a TRIM_PAGES record.

recv_sys_t::apply(): trim() was invoked in parse() already.

recv_sys_t::truncated_undo_spaces[]: Only store the size, no LSN.
2023-02-06 20:29:42 +02:00
Marko Mäkelä
461402a564 MDEV-30479 OPT_PAGE_CHECKSUM mismatch after innodb_undo_log_truncate=ON
page_recv_t::trim(): Do remove log records for mini-transactions
that end right at the threshold LSN. This will avoid an inconsistency
where a dirty page had been evicted from the buffer pool during
undo tablespace truncation, and recovery would attempt to apply
log records for which the last available copy in the data file is
too new. These changes would be discarded anyway.
2023-02-06 20:29:29 +02:00
Thirunarayanan Balathandayuthapani
17858e03a7 MDEV-30179 mariabackup --backup fails with FATAL ERROR: ... failed
to copy datafile

- Mariabackup fails to copy the undo log tablespace when it undergoes
truncation. So Mariabackup should detect the redo log which does
undo tablespace truncation and also backup should read the minimum
file size of the tablespace and ignore the error while reading.

- Throw error when innodb undo tablespace read failed, but backup
doesn't find the redo log for undo tablespace truncation
2023-01-10 15:47:13 +05:30
Marko Mäkelä
bd694bb7b2 MDEV-24412 InnoDB: Upgrade after a crash is not supported
recv_log_recover_10_4(): Widen the operand of bitwise and to 64 bits,
so that the upgrade check will work when the redo log record is located
more than 4 gigabytes from the start of the first file.
2022-11-28 11:56:09 +02:00
Marko Mäkelä
9d388192c7 Cleanup: Say "mariadbd" instead of "mysqld" in InnoDB messages 2022-11-22 15:32:47 +02:00
Marko Mäkelä
cff9939d09 MDEV-30068 Confusing error message when encryption is not available on recovery
fil_name_process(): If fil_ibd_load() returns FIL_LOAD_INVALID,
display the file name and the tablespace identifier.
2022-11-22 15:31:12 +02:00
Marko Mäkelä
0b25551a61 MDEV-29999 innodb_undo_log_truncate=ON is not crash safe
If a log checkpoint occurs at the end LSN of mtr.commit_shrink(space)
in trx_purge_truncate_history(), then recovery may fail because
it could try to apply too old log records to too old copies of
undo log pages. This was repeated with the following test:

./mtr innodb.undo_log_truncate,4k,strict_full_crc32

recv_sys_t::trim(): Move some code to the caller.

recv_sys_t::apply(): For undo tablespace truncation, discard
all old redo log for the undo tablespace, and then truncate
the file to the desired size.

Tested by: Matthias Leich
2022-11-15 16:56:13 +02:00
Marko Mäkelä
e0e096faaa MDEV-29982 Improve the InnoDB log overwrite error message
The InnoDB write-ahead log ib_logfile0 is of fixed size,
specified by innodb_log_file_size. If the tail of the log
manages to overwrite the head (latest checkpoint) of the log,
crash recovery will be broken.

Let us clarify the messages about this, including adding
a message on the completion of a log checkpoint that notes
that the dangerous situation is over.

To reproduce the dangerous scenario, we will introduce the
debug injection label ib_log_checkpoint_avoid_hard, which will
avoid log checkpoints even harder than the previous
ib_log_checkpoint_avoid.

log_t::overwrite_warned: The first known dangerous log sequence number.
Set in log_close() and cleared in log_write_checkpoint_info(),
which will output a "Crash recovery was broken" message.
2022-11-14 12:18:03 +02:00
Marko Mäkelä
2283f82de5 MDEV-29905 fixup: Remove some unnecessary code
srv_shutdown(): Do not call log_free_check(), because it will now
be repeatedly called by ibuf_merge_all(). Do not call
srv_sync_log_buffer_in_background(), because we do not actually care
about durability during shutdown. Log writes will already be triggered
by buf_flush_page_cleaner() for writing back modified pages, possibly by
log_free_check().

logs_empty_and_mark_files_at_shutdown(): Clean up a condition.
This function is the caller of srv_shutdown(), and it will ensure that
the log and the buffer pool will be in clean state before shutdown.
2022-11-14 10:22:29 +02:00
Marko Mäkelä
a732d5e2ba Merge 10.4 into 10.5 2022-11-08 17:01:28 +02:00
Marko Mäkelä
93b4f84ab2 Merge 10.3 into 10.4 2022-11-08 16:04:01 +02:00
Marko Mäkelä
9ac8be4e29 Include some advice in the crash-upgrade message 2022-11-08 10:39:29 +02:00
Marko Mäkelä
2f1a4328cb MDEV-29613 fixup: clang -Wunused-but-set-variable 2022-10-11 15:36:24 +03:00
Marko Mäkelä
fed0d85de7 MDEV-29559 Recovery of INSERT_HEAP_DYNAMIC into secondary index fails
log_phys_t::apply(): When parsing an INSERT_HEAP_DYNAMIC record,
allow ll==rlen to hold for the last part. A secondary index record
may inherit all preceding bytes from the infimum pseudo-record.

For INSERT_HEAP_REDUNDANT, some header bytes will always be present
because the header will never be copied from the page infimum.
We will tolerate ll==rlen also in that case to be consistent with
the parsing of INSERT_HEAP_DYNAMIC.
2022-09-19 11:46:25 +03:00
Marko Mäkelä
244fdc435d MDEV-29438 Recovery or backup of instant ALTER TABLE is incorrect
This bug was found in MariaDB Server 10.6 thanks to the
OPT_PAGE_CHECKSUM record that was implemented
in commit 4179f93d28 for catching
this type of recovery failures.

page_cur_insert_rec_low(): If the previous record is the page infimum,
correctly limit the end of the record. We do not want to copy data from
the header of the page supremum. This omission caused the incorrect
recovery of DB_TRX_ID in an instant ALTER TABLE metadata record, because
part of the DB_TRX_ID was incorrectly copied from the n_owned of the
page supremum, which in recovery would be updated after the copying,
but in normal operation would already have been updated at the time the
common prefix was being determined.

log_phys_t::apply(): If a data page is found to be corrupted, do not
flag the log corrupted but instead return a new status APPLIED_CORRUPTED
so that the caller may discard all log for this page. We do not want
the recovery of unrelated pages to fail in recv_recover_page().

No test case is included, because the known test case would only work
in 10.6, and even after this fix, it would trigger another bug in
instant ALTER TABLE crash recovery.
2022-09-05 09:54:47 +03:00
Marko Mäkelä
5d8dcfd86c MDEV-25975: Merge 10.4 into 10.5 2022-04-06 10:30:49 +03:00
Marko Mäkelä
cbdf62ae90 MDEV-25975 merge fixup 2022-04-06 10:13:21 +03:00
Marko Mäkelä
d172df9913 MDEV-25975: Merge 10.3 into 10.4 2022-04-06 09:18:38 +03:00
Marko Mäkelä
e9735a8185 MDEV-25975 innodb_disallow_writes causes shutdown to hang
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.

During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:

sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.

sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.

The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.

Reviewed by: Jan Lindström
2022-04-06 08:06:49 +03:00
Marko Mäkelä
42609c240d Cleanup: Replace log_sys.n_pending_checkpoint_writes with a Boolean
Only one checkpoint may be in progress at a time.
The counter log_sys.n_pending_checkpoint_writes
was being protected by log_sys.mutex.
Let us replace it with the Boolean log_sys.checkpoint_pending.
2022-03-29 14:56:44 +03:00
Marko Mäkelä
5503c40460 Stabilize innodb.redo_log_during_checkpoint
Externally kill and restart the server, and remove the
unreliable crash_after_checkpoint.
2022-03-11 09:46:50 +02:00
Vladislav Vaintroub
881918bf77 MDEV-27754 : Assertion with innodb_flush_method=O_DSYNC
If innodb_flush_method=O_DSYNC, log_sys.flushed_to_disk_lsn  is changed
without 'flush_lock' protection inside log_write().

This leads to a race condition, if there are 2 threads running in parallel,
doing log_write_up_to() with different values for 'flush_to_disk'

In this case, log_write() and log_write_flush_to_disk_low() can execute at
the same time, and both would change flushed_lsn.

The fix is to remove special treatment of durable writes from log_write().
There is no apparent reason for this special treatment, log_write_flush_to_disk_low()
is already optimized for durable writes.

Nor there is an apparent reason to call log_flush_notify() more often in
for O_DSYNC.
2022-02-07 09:14:00 +01:00
Marko Mäkelä
56f5599f09 MDEV-27610 Unnecessary wait in InnoDB crash recovery
In recv_sys_t::apply(), we were unnecessarily looking up pages
in buf_pool.page_hash and potentially waiting for exclusive page latches.

Before buf_page_get_low() would return an x-latched page,
that page will have to be read and buf_page_read_complete() would
have invoked recv_recover_page() to apply the log to the page.

Therefore, it suffices to invoke recv_read_in_area() to trigger
a transition from RECV_NOT_PROCESSED.

recv_read_in_area(): Take the iterator as a parameter, and remove
page_id lookups. Should the page already be in buf_pool.page_hash,
buf_page_init_for_read() will return nullptr to buf_read_page_low()
and buf_read_page_background().

recv_sys_t::apply(): Replace goto, remove dead code, and add assertions
to guarantee that the iteration will make progress.

Reviewed by: Vladislav Lesin
2022-01-26 13:29:34 +02:00
Marko Mäkelä
8535c260dd Remove FIXME comments that refer to an early MDEV-14425 plan
In MDEV-14425, an early plan was to introduce a separate log file
for file-level records and checkpoint information. The reasoning was
that fil_system.mutex contention would be reduced by not having to
maintain fil_system.named_spaces. The mutex contention was actually
fixed in MDEV-23855 by making some data fields in fil_space_t and
fil_node_t use std::atomic.

Using a single circular log file simplifies recovery and backup.
2022-01-14 20:27:51 +02:00
Eugene Kosov
f443cd1100 MDEV-27022 Buffer pool is being flushed during recovery
The problem was introduced by the removal of buf_pool.flush_rbt
in commit 46b1f50098 (MDEV-23399)

recv_sys_t::apply(): don't write to disc and fsync() the last batch.
Insead, sort it by oldest_modification for MariaDB server and some
mariabackup operations.

log_sort_flush_list(): a thread-safe function which sorts buf_pool::flush_list
2022-01-11 16:20:20 +03:00
Marko Mäkelä
4c3ad24413 MDEV-27416 InnoDB hang in buf_flush_wait_flushed(), on log checkpoint
InnoDB could sometimes hang when triggering a log checkpoint. This is
due to commit 7b1252c03d (MDEV-24278),
which introduced an untimed wait to buf_flush_page_cleaner().

The hang was noticed by occasional failures of IMPORT TABLESPACE tests,
such as innodb.innodb-wl5522, which would (unnecessarily) invoke
log_make_checkpoint() from row_import_cleanup().

The reason of the hang was that buf_flush_page_cleaner() would enter
untimed sleep despite buf_flush_sync_lsn being set. The exact failure
scenario is unclear, because buf_flush_sync_lsn should actually be
protected by buf_pool.flush_list_mutex. We prevent the hang by
invoking buf_pool.page_cleaner_set_idle(false) whenever we are
setting buf_flush_sync_lsn and signaling buf_pool.do_flush_list.

The bulk of these changes was originally developed as a preparation
for MDEV-26827, to invoke buf_flush_list() from fewer threads,
and tested on 10.6 by Matthias Leich.

This fix was tested by running 100 repetitions of 100 concurrent instances
of the test innodb.innodb-wl5522 on a RelWithDebInfo build, using ext4fs
and innodb_flush_method=O_DIRECT on a SATA SSD with 4096-byte block size.
During the test, the call to log_make_checkpoint() in row_import_cleanup()
was present.

buf_flush_list(): Make static.

buf_flush_wait(): Wait for buf_pool.get_oldest_modification()
to reach a target, by work done in the buf_flush_page_cleaner.
If buf_flush_sync_lsn is going to be set, we will invoke
buf_pool.page_cleaner_set_idle(false).

buf_flush_ahead(): If buf_flush_sync_lsn or buf_flush_async_lsn
is going to be set and the page cleaner woken up, we will invoke
buf_pool.page_cleaner_set_idle(false).

buf_flush_wait_flushed(): Invoke buf_flush_wait().

buf_flush_sync(): Invoke recv_sys.apply() at the start in case
crash recovery is active. Invoke buf_flush_wait().

buf_flush_sync_batch(): A lower-level variant of buf_flush_sync()
that is only called by recv_sys_t::apply().

buf_flush_sync_for_checkpoint(): Do not trigger log apply
or checkpoint during recovery.

buf_dblwr_t::create(): Only initiate a buffer pool flush, not
a checkpoint.

row_import_cleanup(): Do not unnecessarily invoke log_make_checkpoint().
Invoking buf_flush_list_space() before starting to generate redo log
for the imported tablespace should suffice.

srv_prepare_to_delete_redo_log_file():
Set recv_sys.recovery_on in order to prevent
buf_flush_sync_for_checkpoint() from initiating a checkpoint
while the log is inaccessible. Remove a wait loop that is already
part of buf_flush_sync().
Do not invoke fil_names_clear() if the log is being upgraded,
because the FILE_MODIFY record is specific to the latest format.

create_log_file(): Clear recv_sys.recovery_on only after calling
log_make_checkpoint(), to prevent buf_flush_page_cleaner from
invoking a checkpoint.

innodb_shutdown(): Simplify the logic in mariadb-backup --prepare.

os_aio_wait_until_no_pending_writes(): Update the function comment.
Apart from row_quiesce_table_start() during FLUSH TABLES...FOR EXPORT,
this is being called by buf_flush_list_space(), which is invoked
by ALTER TABLE...IMPORT TABLESPACE as well as some encryption operations.
2022-01-04 07:40:31 +02:00
Marko Mäkelä
1df05a0854 Correct some copyright messages
Most of the Facebook contribution
mysql/mysql-server@72d656acdf
was removed in
commit 5bea43f5e0 (MDEV-12353).
Mainly the configuration parameter innodb_compression_level remains.
It had been renamed to page_zip_level in
mysql/mysql-server@5b38f2a712.
2022-01-03 07:23:39 +02:00
Marko Mäkelä
c14dd0d19d Cleanup: Remove RECV_READ_AHEAD_AREA
Let us directly use the constant 32 in recv_read_in_area().
2022-01-03 07:23:18 +02:00
Marko Mäkelä
cfcfdc65df MDEV-27190 InnoDB upgrade from 10.2, 10.3, 10.4 is not crash-safe
During startup, InnoDB must write a FILE_CHECKPOINT record.
However, before MDEV-12353 (in MariaDB Server 10.2, 10.3, 10.4)
the corresponding record MLOG_CHECKPOINT was encoded in a different way.

When we are upgrading from a logically empty 10.2, 10.3, or 10.4 redo log,
we must not write anything to the old log file, because if the server were
killed during the upgrade, we would end up with a corrupted log file, and
both the old and the new server would refuse to start up.

On upgrade, we must simply create a new logically empty log file
and replace the old ib_logfile0 with that.
2021-12-07 17:00:46 +02:00
Eugene Kosov
890c55177d MDEV-27183 optimize std::map lookup in during crash recovery
This is a low hanging fruit. Before this patch std::map::emplace() was
a ~50% of the whole recv_sys_t::parse() operation in by test.
After the fix it's only ~20%.

recv_sys_t::parse() recv_sys_t::pages is a collection of all pages
to recovery. Often, there are multiple changes for a single page.
Often, they go in a row and for such cases let's avoid
lookup in a std::map. cached_pages_it serves as a cache
of size 1.

recv_sys_t::add(): replace page_id argument with a std::map::iterator
2021-12-07 15:50:00 +06:00
Eugene Kosov
0064316f19 cleanup: reduce code bloat 2021-12-06 14:06:17 +06:00
Eugene Kosov
5d7da02793 MDEV-27139 32-bit systems fail to use big innodb-log-file-size
log_write_buf(): do not cast to size_t which prevents to write to files
which a bigger that 4G and remove useless assertion
2021-12-03 14:57:23 +06:00
Marko Mäkelä
a0fda162eb Fix GCC 11.2.0 -m32 (IA-32) warnings
page_create_low(): Fix -Warray-bounds

log_buffer_extend(): Fix -Wstringop-overflow
2021-10-21 15:31:21 +03:00
Marko Mäkelä
a736a3174a Merge 10.3 into 10.4 2021-10-13 12:03:32 +03:00
Marko Mäkelä
4a7dfda373 Merge 10.2 into 10.3 2021-10-13 11:38:21 +03:00
Marko Mäkelä
2bb8d7c2f3 MDEV-26811: Assertion "log_sys->n_pending_flushes == 1" fails
In commit 1cb218c37c (MDEV-26450)
we introduced the function log_write_and_flush(), which may
compete with log_checkpoint() invoking log_write_flush_to_disk_low()
from another thread.

The assertion n_pending_flushes==1 is too strict.
There is no possibility of a race condition here, because
fil_flush() is protected by fil_system->mutex and the
rest will be protected by log_sys->mutex.

log_write_flush_to_disk_low(), log_write_and_flush():
Relax the assertions to test for a nonzero count.
2021-10-13 10:38:41 +03:00
Marko Mäkelä
f5fddae3cb MDEV-26450: Corruption due to innodb_undo_log_truncate
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.

The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.

This is improved from commit 1cb218c37c
which was applied to earlier-version branches.

fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.

fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.

mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace:
(1) durably write the log
(2) release the page latches of the rebuilt tablespace
(3) release the mutexes
(4) truncate the file
(5) release the tablespace latch
This is refactored from trx_purge_truncate_history().

log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
2021-09-24 08:22:19 +03:00
Marko Mäkelä
9024498e88 Merge 10.3 into 10.4 2021-09-22 18:26:54 +03:00
Marko Mäkelä
b46cf33ab8 Merge 10.2 into 10.3 2021-09-22 18:01:41 +03:00
Marko Mäkelä
1cb218c37c MDEV-26450: Corruption due to innodb_undo_log_truncate
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.

The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.

fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.

fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.

mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace
file. First, durably write the log, then shrink the file, and finally
release the page latches of the rebuilt tablespace. Refactored from
trx_purge_truncate_history().

log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
2021-09-22 14:15:00 +03:00