Commit graph

466 commits

Author SHA1 Message Date
Yuchen Pei
2d3e2c58b6
Merge branch '10.11' into 11.1 2024-05-31 10:54:31 +10:00
Marko Mäkelä
22ba7e4ff8 Merge 10.6 into 10.11 2024-05-30 16:04:00 +03:00
Marko Mäkelä
5ba542e9ee Merge 10.5 into 10.6 2024-05-30 14:27:07 +03:00
mariadb-DebarunBanerjee
b2944adb76 MDEV-34166 Server could hang with BP < 80M under stress
BUF_LRU_MIN_LEN (256) is too high value for low buffer pool(BP) size.
For example, for BP size lower than 80M and 16 K page size, the limit is
more than 5% of total BP and for lowest BP 5M, it is 80% of the BP.
Non-data objects like explicit locks could occupy part of the BP pool
reducing the pages available for LRU. If LRU reaches minimum limit and
if no free pages are available, server would hang with page cleaner not
able to free any more pages.

Fix: To avoid such hang, we adjust the LRU limit lower than the limit
for data objects as checked in buf_LRU_check_size_of_non_data_objects()
i.e. one page less than 5% of BP.
2024-05-21 14:13:29 +05:30
Sergei Golubchik
f9807aadef Merge branch '10.11' into 11.0 2024-05-12 12:18:28 +02:00
Sergei Golubchik
a6b2f820e0 Merge branch '10.6' into 10.11 2024-05-10 20:02:18 +02:00
Sergei Golubchik
7b53672c63 Merge branch '10.5' into 10.6 2024-05-08 20:06:00 +02:00
Vladislav Vaintroub
b18259ecf5 Compiling - Fix MSVC compile warnings, on x86 2024-05-03 22:00:56 +02:00
Marko Mäkelä
3f9f5ca48e MDEV-33447: libpmem is not available in RHEL 8
Because the Red Hat Enterprise Linux 8 core repository does not include
libpmem, let us implement the necessary subset ourselves.

pmem_persist(): Implement for 64-bit x86, ARM, POWER, RISC-V, Loongarch
in a way that should be compatible with the https://github.com/pmem/pmdk/
implementation of pmem_persist().

The CMake option WITH_INNODB_PMEM can be used for enabling or disabling
this interface at compile time. By default, it is enabled on all applicable
systems that are covered by our CI system.

Note: libpmem had not been previously enabled for Loongarch in our
Debian packaging. It was enabled for RISC-V, but we will not enable it
by default on RISC-V or Loongarch because we lack CI coverage.

The generated code for x86_64 was reviewed and tested on two
Intel implementations: one that only supports clflush, and
another that supports both clflushopt and clwb.

The generated machine code was also reviewed on https://godbolt.org
using various compiler versions. Godbolt helpfully includes an option
to compile to binary code and display the encoding, which was
useful on POWER.

Reviewed by: Vladislav Vaintroub
2024-04-19 10:54:08 +03:00
Marko Mäkelä
1122ac978e MDEV-33545: Improve innodb_doublewrite to cover NO_FSYNC
In commit 24648768b4 (MDEV-30136)
the parameter innodb_flush_method was deprecated, with no direct
replacement for innodb_flush_method=O_DIRECT_NO_FSYNC.

Let us change innodb_doublewrite from Boolean to ENUM that can
be changed while the server is running:

OFF: Assume that writes of innodb_page_size are atomic
ON: Prevent torn writes (the default)
fast: Like ON, but avoid synchronizing writes to data files

The deprecated start-up parameter innodb_flush_method=NO_FSYNC will cause
innodb_doublewrite=ON to be changed to innodb_doublewrite=fast,
which will prevent InnoDB from making any durable writes to data files.
This would normally be done right before the log checkpoint LSN is updated.
Depending on the file systems being used and their configuration,
this may or may not be safe.

The value innodb_doublewrite=fast differs from the previous combination of
innodb_doublewrite=ON and innodb_flush_method=O_DIRECT_NO_FSYNC by always
invoking os_file_flush() on the doublewrite buffer itself
in buf_dblwr_t::flush_buffered_writes_completed(). This should be safer
when there are multiple doublewrite batches between checkpoints.
Typically, once per second, buf_flush_page_cleaner() would write out
up to innodb_io_capacity pages and advance the log checkpoint.
Also typically, innodb_io_capacity>128, which is the size of the
doublewrite buffer in pages. Should os_file_flush_func() not be invoked
between doublewrite batches, writes could be reordered in an unsafe way.

The setting innodb_doublewrite=fast could be safe when the doublewrite
buffer (the first file of the system tablespace) and the data files
reside in the same file system.

This was tested by running "./mtr --rr innodb.alter_kill". On the first
server startup, with innodb_doublewrite=fast, os_file_flush_func()
would only be invoked on the ibdata1 file and possibly ib_logfile0.
On subsequent startups with innodb_doublewrite=OFF, os_file_flush_func()
will be invoked on the individual data files during log_checkpoint().

Note: The setting debug_no_sync (in the code, my_disable_sync) would
disable all durable writes to InnoDB files, which would be much less safe.

IORequest::Type: Introduce special values WRITE_DBL and PUNCH_DBL
for asynchronous writes that are submitted via the doublewrite buffer.
In this way, fil_space_t::use_doublewrite() or buf_dblwr.in_use()
will only be consulted during buf_page_t::flush() and the doublewrite
buffer can be enabled or disabled without any fear of inconsistency.

buf_dblwr_t::block_size: Replaces block_size().

buf_dblwr_t::flush_buffered_writes(): If !in_use() and the doublewrite
buffer is empty, just invoke fil_flush_file_spaces() and return. The
doublewrite buffer could have been disabled while a batch was in
progress.

innodb_init_params(): If innodb_flush_method=O_DIRECT_NO_FSYNC,
set innodb_doublewrite=fast or innodb_doublewrite=fearless.

Thanks to Mark Callaghan for reporting this, and Vladislav Vaintroub
for feedback.
2024-04-04 08:12:54 +03:00
Marko Mäkelä
fec2fd6add Merge 10.11 into 11.0 2024-03-28 10:51:36 +02:00
Marko Mäkelä
788953463d Merge 10.6 into 10.11
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
2024-03-28 09:16:57 +02:00
Marko Mäkelä
ccb7a1e9a1 Merge 10.5 into 10.6 2024-03-27 15:00:56 +02:00
Marko Mäkelä
f0590db5c5 MDEV-33591 MONITOR_INC_VALUE_CUMULATIVE is executed regardless of "if" condition
MONITOR_INC_VALUE_CUMULATIVE is a multiline macro, so the second statement
will be executed always, regardless of "if" condition.

These problems first started with
commit b1ab211dee (MDEV-15053).

Thanks to Yury Chaikou from ServiceNow for the report.
2024-03-22 14:57:00 +02:00
Marko Mäkelä
fa8a46eb68 MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool
By design, InnoDB has always hung when permanently running out of
buffer pool, for example when several threads are waiting to allocate
a block, and all of the buffer pool is buffer-fixed by the active threads.

The hang that we are fixing here occurs when the buffer pool is only
temporarily running out and the situation could be rescued by writing out
some dirty pages or evicting some clean pages.

buf_LRU_get_free_block(): Simplify the way how we wait for
the buf_flush_page_cleaner thread. This fixes occasional hangs
of the test encryption.innochecksum that were introduced by
commit a55b951e60 (MDEV-26827).
To play it safe, we use a timed wait when waiting for the
buf_flush_page_cleaner() thread to perform its job. Should that
thread get stuck, we will invoke buf_pool.LRU_warn() in order to
display a message that pages could not be freed, and keep trying
to wake up the buf_flush_page_cleaner() thread.

The INFORMATION_SCHEMA.INNODB_METRICS counters
buffer_LRU_single_flush_failure_count and
buffer_LRU_get_free_waits will be removed.
The latter is represented by buffer_pool_wait_free.

Also removed will be the message
"InnoDB: Difficult to find free blocks in the buffer pool"
because in d34479dc66 we
introduced a more precise message
"InnoDB: Could not free any blocks in the buffer pool"
in the buf_flush_page_cleaner thread.

buf_pool_t::LRU_warn(): Issue the warning message that we could
not free any blocks in the buffer pool. This may also be invoked
by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears
to be stuck.

buf_pool_t::n_flush_dec(): Remove.

buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec().

buf_flush_LRU_list_batch(): Increment the eviction counter for blocks
of temporary, discarded or dropped tablespaces.

buf_flush_LRU(): Make static, and remove the constant parameter
evict=false. The only caller will be the buf_flush_page_cleaner()
thread.

IORequest::is_LRU(): Remove. The only case of evicting pages on
write completion will be when we are writing out pages of the
temporary tablespace. Those pages are not in buf_pool.flush_list,
only in buf_pool.LRU.

buf_page_t::flush(): Remove the parameter evict.

buf_page_t::write_complete(): Change the parameter "bool temporary"
to "bool persistent" and add a parameter for an already read state().

Reviewed by: Debarun Banerjee
2024-03-22 14:17:39 +02:00
Marko Mäkelä
bf0b82d24b MDEV-33515 log_sys.lsn_lock causes excessive context switching
The log_sys.lsn_lock is a very contended resource with a small
critical section in log_sys.append_prepare(). On many processor
microarchitectures, replacing the system call based log_sys.lsn_lock
with a pure spin lock would fare worse during high concurrency workloads,
wasting a significant amount of CPU cycles in the spin loop.

On other microarchitectures, we would see a significant amount of time
being spent in native_queued_spin_lock_slowpath() in the Linux kernel,
plus context switching between user and kernel address space. This was
pointed out by Steve Shaw from Intel Corporation.

Depending on the workload and the hardware implementation, it may be
useful to use a pure spin lock in log_sys.append_prepare().
We will introduce a parameter. The statement

	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=50;

would enable a spin lock that will execute that many MY_RELAX_CPU()
operations (such as the x86 PAUSE instruction) between successive
attempts of acquiring the spin lock. The use of a system call based
log_sys.lsn_lock (which is the default setting) can be enabled by

	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=0;

This patch will also introduce #ifdef LOG_LATCH_DEBUG
(part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON) for more accurate
tracking of log_sys.latch ownership and reorganize the fields of
log_sys to improve the locality of reference and to reduce the
chances of false sharing.

When a spin lock is being used, it will be maintained in the
most significant bit of log_sys.buf_free. This is useful, because that is
one of the fields that is covered by the lock. For IA-32 or AMD64, we
implement the spin lock specially via log_t::lsn_lock_bts(), employing the
i386 LOCK BTS instruction. A straightforward std::atomic::fetch_or() would
translate into an inefficient loop around LOCK CMPXCHG.

mtr_t::spin_wait_delay: The value of innodb_log_spin_wait_delay.

mtr_t::finisher: Pointer to the currently used mtr_t::finish_write()
implementation. This allows to avoid introducing conditional branches.
We no longer invoke log_sys.is_pmem() at the mini-transaction level,
but we would do that in log_write_up_to().

mtr_t::finisher_update(): Update finisher when spin_wait_delay is
changed from or to 0 (the spin lock is changed to log_sys.lsn_lock or
vice versa).
2024-03-22 12:29:01 +02:00
Marko Mäkelä
0772ac1f16 MDEV-33508 Performance regression due to frequent scan of full buf_pool.flush_list
buf_flush_page_cleaner(): Remove a loop that had originally been added
in commit 9d1466522e (MDEV-32029) and made
redundant by commit 5b53342a6a (MDEV-32588).

Starting with commit d34479dc66 (MDEV-33053)
this loop would cause a significant performance regression in workloads
where buf_pool.need_LRU_eviction() constantly holds in
buf_flush_page_cleaner().

Thanks to Steve Shaw of Intel for noticing this.

Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
2024-02-28 12:48:38 +02:00
Marko Mäkelä
8155342a96 Merge 10.11 into 11.0 2024-02-20 15:31:18 +02:00
Marko Mäkelä
3dd7b0a80c Cleanup: Remove OS_FILE_ON_ERROR_NO_EXIT
Ever since commit 412ee0330c
or commit a440d6ed3a
InnoDB should generally not abort when failing to open or create files.
In Datafile::open_or_create() we had failed to set the flag
to avoid abort() on failure, but everywhere else we were setting it.

We may still call abort() via os_file_handle_error().

Reviewed by: Vladislav Vaintroub
2024-02-20 11:22:52 +02:00
Marko Mäkelä
35cc4b6c05 Merge 10.11 into 11.0 2024-01-22 10:10:50 +02:00
Marko Mäkelä
b3ca7fa089 Merge 10.6 into 10.11 2024-01-22 08:49:04 +02:00
Marko Mäkelä
495e7f1b3d MDEV-33053 fixup: Correct a condition before a message 2024-01-22 08:24:08 +02:00
Marko Mäkelä
7e65e3027e MDEV-33275 buf_flush_LRU(): mysql_mutex_assert_owner(&buf_pool.mutex) failed
In commit a55b951e60 (MDEV-26827)
an error was introduced in a rarely executed code path of the
buf_flush_page_cleaner() thread. As a result, the function
buf_flush_LRU() could be invoked while not holding buf_pool.mutex.

Reviewed by: Debarun Banerjee
2024-01-19 12:40:32 +02:00
Marko Mäkelä
d34479dc66 MDEV-33053 InnoDB LRU flushing does not run before running out of buffer pool
buf_flush_LRU(): Display a warning if no pages could be evicted and
no writes initiated.

buf_pool_t::need_LRU_eviction(): Renamed from buf_pool_t::ran_out().
Check if the amount of free pages is smaller than innodb_lru_scan_depth
instead of checking if it is 0.

buf_flush_page_cleaner(): For the final LRU flush after a checkpoint
flush, use a "budget" of innodb_io_capacity_max, like we do in the
case when we are not in "furious" checkpoint flushing.

Co-developed by: Debarun Banerjee
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
2024-01-19 12:40:16 +02:00
Marko Mäkelä
c2da55ac01 Merge 10.11 into 11.0 2024-01-10 12:42:56 +02:00
Marko Mäkelä
1eb11da3e5 Merge 10.6 into 10.11 2024-01-10 12:37:19 +02:00
Marko Mäkelä
3613fb2aa8 MDEV-33112 innodb_undo_log_truncate=ON is blocking page write
When innodb_undo_log_truncate=ON causes an InnoDB undo tablespace
to be truncated, we must guarantee that the undo tablespace will
be rebuilt atomically: After mtr_t::commit_shrink() has durably
written the mini-transaction that rebuilds the undo tablespace,
we must not write any old pages to the tablespace.

To guarantee this, in trx_purge_truncate_history() we used to
traverse the entire buf_pool.flush_list in order to acquire
exclusive latches on all pages for the undo tablespace that
reside in the buffer pool, so that those pages cannot be written
and will be evicted during mtr_t::commit_shrink(). But, this
traversal may interfere with the page writing activity of
buf_flush_page_cleaner(). It would be better to lazily discard
the old pages of the truncated undo tablespace.

fil_space_t::is_being_truncated, fil_space_t::clear_stopping(): Remove.

fil_space_t::create_lsn: A new field, identifying the LSN of the
latest rebuild of a tablespace.

buf_page_t::flush(), buf_flush_try_neighbors(): Evict pages whose
FIL_PAGE_LSN is below fil_space_t::create_lsn.

mtr_t::commit_shrink(): Update fil_space_t::create_lsn and
fil_space_t::size right before the log is durably written and the
tablespace file is being truncated.

fsp_page_create(), trx_purge_truncate_history(): Simplify the logic.

Reviewed by: Thirunarayanan Balathandayuthapani, Vladislav Lesin
Performance tested by: Axel Schwenke
Correctness tested by: Matthias Leich
2024-01-10 11:53:00 +02:00
Marko Mäkelä
6538a91e94 Merge 10.5 into 10.6 2024-01-08 14:39:56 +02:00
Marko Mäkelä
0b612619ef MDEV-33098: Fix some instrumentation for innodb.doublewrite_debug
buf_flush_page_cleaner(): A continue or break inside DBUG_EXECUTE_IF
actually is a no-op. Use an explicit call to _db_keyword_() to
actually avoid advancing the checkpoint.

buf_flush_list_now_set(): Invoke os_aio_wait_until_no_pending_writes()
to ensure that the page write to the system tablespace is completed.
2024-01-08 14:36:54 +02:00
Sergei Golubchik
8c8bce05d2 Merge branch '10.11' into 11.0 2023-12-19 15:53:18 +01:00
Sergei Golubchik
fd0b47f9d6 Merge branch '10.6' into 10.11 2023-12-18 11:19:04 +01:00
Sergei Golubchik
e95bba9c58 Merge branch '10.5' into 10.6 2023-12-17 11:20:43 +01:00
Marko Mäkelä
c8346c0bac MDEV-31939 Adaptive flush recommendation ignores dirty ratio and checkpoint age
buf_flush_page_cleaner(): Pass pct_lwm=srv_max_dirty_pages_pct_lwm
(innodb_max_dirty_pages_pct_lwm) to
page_cleaner_flush_pages_recommendation() unless the dirty page ratio
of the buffer pool is below that. Starting with
commit d4265fbde5 we used to always
pass pct_lwm=0.0, which was not intended.

Reviewed by: Vladislav Vaintroub
2023-12-08 09:31:34 +02:00
Marko Mäkelä
5b6134b040 Merge 10.11 into 11.0 2023-11-24 11:20:56 +02:00
Marko Mäkelä
f2bd662f6c Merge 10.6 into 10.11 2023-11-22 18:14:11 +02:00
Marko Mäkelä
d963584d4c Merge 10.5 into 10.6 2023-11-22 16:56:47 +02:00
Marko Mäkelä
78c9a12c8f MDEV-32861 InnoDB hangs when running out of I/O slots
When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
and the server is run with the minimum parameters
innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
were observed.

tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
will be woken up when the cache previously was empty.

buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
batch so that os_aio_wait_until_no_pending_writes() has a chance of
returning. Add a Boolean parameter and pass wait_for_reads=false inside
buf_page_decrypt_after_read(), because those calls will be executed
inside a read completion callback, and therefore
os_aio_wait_until_no_pending_reads() would block indefinitely.
2023-11-22 16:54:41 +02:00
Marko Mäkelä
7443ad1c8a MDEV-32374 log_sys.lsn_lock is a performance hog
The log_sys.lsn_lock that was introduced in
commit a635c40648
had better be located in the same cache line with log_sys.latch
so that log_t::append_prepare() needs to modify only two first
cache lines where log_sys is stored.

log_t::lsn_lock: On Linux, change the type from pthread_mutex_t to
something that may be as small as 32 bits, to pack more data members
in the same cache line. On Microsoft Windows, CRITICAL_SECTION works
better.

log_t::check_flush_or_checkpoint_: Renamed to need_checkpoint.
There is no need to pause all writer threads in log_free_check() when
we only need to write log_sys.buf to ib_logfile0. That will be done in
mtr_t::commit().

log_t::append_prepare_wait(): Make the member function non-static
to simplify the call interface, and add a parameter for the LSN.

log_t::append_prepare(): Invoke append_prepare_wait() at most once.
Only set_check_for_checkpoint() if a log checkpoint needs to
be written. If the log buffer needs to be written, we will take care
of it ourselves later in our caller. This will reduce interference
with log_free_check() in other threads.

mtr_t::commit(): Call log_write_up_to() if needed.

log_t::get_write_target(): Return a log_write_up_to() target
to mtr_t::commit().

buf_flush_ahead(): If we are in furious flushing, call
log_sys.set_check_for_checkpoint() so that all writers will wait
in log_free_check() until the checkpoint is done. Otherwise,
the test innodb.insert_into_empty could occasionally report
an error "Crash recovery is broken".

log_check_margins(): Replaced by log_free_check().

log_flush_margin(): Removed. This is part of mtr_t::commit()
and other operations that write log.

log_t::create(), log_t::attach(): Guarantee that buf_free < max_buf_free
will always hold on PMEM, to satisfy an assumption of
log_t::get_write_target().

log_write_up_to(): Assert lsn!=0. Such calls are not incorrect, but it
is cheaper to test that single unlikely condition in mtr_t::commit()
rather than test several conditions in log_write_up_to().

innodb_drop_database(), unlock_and_close_files(): Check the LSN before
calling log_write_up_to().

ha_innobase::commit_inplace_alter_table(): Remove redundant calls to
log_write_up_to() after calling unlock_and_close_files().

Reviewed by: Vladislav Vaintroub
Stress tested by: Matthias Leich
Performance tested by: Steve Shaw
2023-11-21 14:38:35 +02:00
Marko Mäkelä
9a545eb67c MDEV-26055: Correct the formula for adaptive flushing
This is a 10.5 backport of 10.6
commit d4265fbde5.

page_cleaner_flush_pages_recommendation(): If dirty_pct is
between innodb_max_dirty_pages_pct_lwm
and innodb_max_dirty_pages_pct,
scale the effort relative to how close we are to
innodb_max_dirty_pages_pct.

The previous formula was missing a multiplication by 100.
2023-11-16 17:45:37 +02:00
Marko Mäkelä
a3d0d5fc33 MDEV-26055: Improve adaptive flushing
This is a 10.5 backport from 10.6
commit 9593cccf28.

Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0
(not default) and innodb_adaptive_flushing=ON (default).
There is also the parameter innodb_adaptive_flushing_lwm
(default: 10 per cent of the log capacity). It should enable some
adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.
That is not being changed here.

This idea was first presented by Inaam Rana several years ago,
and I discussed it with Jean-François Gagné at FOSDEM 2023.

buf_flush_page_cleaner(): When we are not near the log capacity limit
(neither buf_flush_async_lsn nor buf_flush_sync_lsn are set),
also try to move clean blocks from the buf_pool.LRU list to buf_pool.free
or initiate writes (but not the eviction) of dirty blocks, until
the remaining I/O capacity has been consumed.

buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify
whether dirty least recently used pages (from buf_pool.LRU) should
be evicted immediately after they have been written out. Callers outside
buf_flush_page_cleaner() will pass evict=true, to retain the existing
behaviour.

buf_do_LRU_batch(): Add the parameter bool evict.
Return counts of evicted and flushed pages.

buf_flush_LRU(): Add the parameter bool evict.
Assume that the caller holds buf_pool.mutex and
will invoke buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list()
whose caller must hold buf_pool.mutex and invoke
buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have
buf_flush_wait_batch_end().

page_cleaner_flush_pages_recommendation(): Avoid some floating-point
arithmetics.

buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(),
buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict".

buf_free_from_unzip_LRU_list_batch(): Remove the parameter.
Only actual page writes will contribute towards the limit.

buf_LRU_free_page(): Evict freed pages of temporary tables.

buf_pool.done_free: Broadcast whenever a block is freed
(and buf_pool.try_LRU_scan is set).

buf_pool_t::io_buf_t::reserve(): Retry indefinitely.
During the test encryption.innochecksum we easily run out of
these buffers for PAGE_COMPRESSED or ENCRYPTED pages.

Tested by Matthias Leich and Axel Schwenke
2023-11-16 17:45:18 +02:00
Oleksandr Byelkin
48af85db21 Merge branch '10.11' into 11.0 2023-11-08 17:09:44 +01:00
Oleksandr Byelkin
04d9a46c41 Merge branch '10.6' into 10.10 2023-11-08 16:23:30 +01:00
Marko Mäkelä
5b53342a6a MDEV-32588 InnoDB may hang when running out of buffer pool
buf_flush_LRU_list_batch(): Do not skip pages that are actually clean
but in buf_pool.flush_list due to the "lazy removal" optimization of
commit 22b62edaed, but try to evict them.
After acquiring buf_pool.flush_list_mutex, reread oldest_modification
to ensure that the block still remains in buf_pool.flush_list.

In addition to server hangs, this bug could also cause
InnoDB: Failing assertion: list.count > 0
in invocations of UT_LIST_REMOVE(flush_list, ...).

This fixes a regression that was caused by
commit a55b951e60
and possibly made more likely to hit due to
commit aa719b5010.
2023-10-26 15:10:53 +03:00
Marko Mäkelä
39e3ca8bd2 MDEV-31826 InnoDB may fail to recover after being killed in fil_delete_tablespace()
InnoDB was violating the write-ahead-logging protocol when a file
was being deleted, like this:

1. fil_delete_tablespace() set the fil_space_t::STOPPING flag
2. The buf_flush_page_cleaner() thread discards some changed pages for
this tablespace advances the log checkpoint a little.
3. The server process is killed before fil_delete_tablespace() wrote
a FILE_DELETE record.
4. Recovery will try to apply log to pages of the tablespace, because
there was no FILE_DELETE record. This will fail, because some pages
that had been modified since the latest checkpoint had not been written
by the page cleaner.

Page writes must not be stopped before a FILE_DELETE record has been
durably written.

fil_space_t::drop(): Replaces fil_space_t::check_pending_operations().
Add the parameter detached_handle, and return a tablespace pointer
if this thread was the first one to stop I/O on the tablespace.

mtr_t::commit_file(): Remove the parameter detached_handle, and
move some handling to fil_space_t::drop().

fil_space_t: STOPPING_READS, STOPPING_WRITES: Separate flags for STOPPING.
We want to stop reads (and encryption) before stopping page writes.

fil_space_t::is_stopping_writes(), fil_space_t::get_for_write():
Special accessors for the write path.

fil_space_t::flush_low(): Ignore the STOPPING_READS flag and only
stop if STOPPING_WRITES is set, to avoid an infinite loop in
fil_flush_file_spaces(), which was occasionally repeated by
running the test encryption.create_or_replace.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2023-10-26 15:07:59 +03:00
Marko Mäkelä
5a8fca5a4f Merge 10.6 into 10.10 2023-10-23 18:43:36 +03:00
Marko Mäkelä
b21f52ee73 Merge 10.5 into 10.6 2023-10-23 16:43:48 +03:00
Marko Mäkelä
b5e43a1d35 MDEV-32552 Write-ahead logging is broken for freed pages
buf_page_free(): Flag the freed page as modified if it is found in
the buffer pool.

buf_flush_page(): If the page has been freed, ensure that the log
for it has been durably written, before removing the page
from buf_pool.flush_list.

FindBlockX: Find also MTR_MEMO_PAGE_X_MODIFY in order to avoid an
occasional failure of innodb.innodb_defrag_concurrent, which involves
freeing and reallocating pages in the same mini-transaction.

This fixes a regression that was introduced in
commit a35b4ae898 (MDEV-15528).

This logic was tested by commenting out the $shutdown_timeout line
from a test and running the following:

./mtr --rr innodb.scrub
rr replay var/log/mysqld.1.rr/mariadbd-0

A breakpoint in the modified buf_flush_page() was hit, and the
FIL_PAGE_LSN of that page had been last modified during the
mtr_t::commit() of a mini-transaction where buf_page_free()
had been executed on that page.
2023-10-23 16:13:16 +03:00
Marko Mäkelä
be24e75229 Merge 10.11 into 11.0 2023-10-19 08:12:16 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Marko Mäkelä
cdd2fa7fc5 MDEV-32134 InnoDB hang in buf_flush_wait_LRU_batch_end()
buf_flush_page_cleaner(): Before finishing a batch, wake up any threads
that are waiting for buf_pool.done_flush_LRU.

This should fix a hung shutdown that we observed
after SET GLOBAL innodb_buffer_pool_size started was executed
to shrink the InnoDB buffer pool.
2023-09-11 14:54:50 +03:00