Commit graph

1566 commits

Author SHA1 Message Date
Sergei Golubchik
fd0b47f9d6 Merge branch '10.6' into 10.11 2023-12-18 11:19:04 +01:00
Sergei Golubchik
e95bba9c58 Merge branch '10.5' into 10.6 2023-12-17 11:20:43 +01:00
Marko Mäkelä
c8346c0bac MDEV-31939 Adaptive flush recommendation ignores dirty ratio and checkpoint age
buf_flush_page_cleaner(): Pass pct_lwm=srv_max_dirty_pages_pct_lwm
(innodb_max_dirty_pages_pct_lwm) to
page_cleaner_flush_pages_recommendation() unless the dirty page ratio
of the buffer pool is below that. Starting with
commit d4265fbde5 we used to always
pass pct_lwm=0.0, which was not intended.

Reviewed by: Vladislav Vaintroub
2023-12-08 09:31:34 +02:00
Marko Mäkelä
f074223ae7 MDEV-32068 Some calls to buf_read_ahead_linear() seem to be useless
The linear read-ahead (enabled by nonzero innodb_read_ahead_threshold)
works best if index leaf pages or undo log pages have been allocated
on adjacent page numbers. The read-ahead is assumed not to be helpful
in other types of page accesses, such as non-leaf index pages.

buf_page_get_low(): Do not invoke buf_page_t::set_accessed(),
buf_page_make_young_if_needed(), or buf_read_ahead_linear().
We will invoke them in those callers of buf_page_get_gen() or
buf_page_get() where it makes sense: the access is not
one-time-on-startup and the page and not going to be freed soon.

btr_copy_blob_prefix(), btr_pcur_move_to_next_page(),
trx_undo_get_prev_rec_from_prev_page(),
trx_undo_get_first_rec(), btr_cur_t::search_leaf(),
btr_cur_t::open_leaf(): Invoke buf_read_ahead_linear().

We will not invoke linear read-ahead in functions that would
essentially allocate or free pages, because pages that are
freshly allocated are expected to be initialized by buf_page_create()
and not read from the data file. Likewise, freeing pages should
not involve accessing any sibling pages, except for freeing
singly-linked lists of BLOB pages.

We will not invoke read-ahead in btr_cur_t::pessimistic_search_leaf()
or in a pessimistic operation of btr_cur_t::open_leaf(), because
it is assumed that pessimistic operations should be preceded by
optimistic operations, which should already have invoked read-ahead.

buf_page_make_young_if_needed(): Invoke also buf_page_t::set_accessed()
and return the result.

btr_cur_nonleaf_make_young(): Like buf_page_make_young_if_needed(),
but do not invoke buf_page_t::set_accessed().

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2023-12-05 12:31:29 +02:00
Marko Mäkelä
6d0bcfc4b9 Merge 10.6 into 10.11 2023-11-30 13:03:59 +02:00
Marko Mäkelä
bb511def1d MDEV-32371 Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict()
buf_page_get_zip(): Do not wait for the page latch while holding hash_lock.
If the latch is not available, ensure that any concurrent
buf_pool_t::corrupted_evict() will be able to acquire the hash_lock,
and then retry the lookup. If the page was corrupted, we will finally
"goto must_read_page", retry the read once more, and then report an error.

Reviewed by: Thirunarayanan Balathandayuthapani
2023-11-30 10:35:53 +02:00
Daniel Black
a48c1b89c8 MDEV-24670 memory pressure - eventfd rather than pipe
Eventfds have a simplier interface and are one file
descriptor rather than two.

Reuse the patten of the accepting socket connections
by testing for abort after a poll returns. This way
the same event descriptor can be used for Quit
and debugging trigger.

Also correct the registration of mem pressure file
descriptors.
2023-11-23 18:39:38 +11:00
Marko Mäkelä
f2bd662f6c Merge 10.6 into 10.11 2023-11-22 18:14:11 +02:00
Marko Mäkelä
d963584d4c Merge 10.5 into 10.6 2023-11-22 16:56:47 +02:00
Marko Mäkelä
78c9a12c8f MDEV-32861 InnoDB hangs when running out of I/O slots
When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
and the server is run with the minimum parameters
innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
were observed.

tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
will be woken up when the cache previously was empty.

buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
batch so that os_aio_wait_until_no_pending_writes() has a chance of
returning. Add a Boolean parameter and pass wait_for_reads=false inside
buf_page_decrypt_after_read(), because those calls will be executed
inside a read completion callback, and therefore
os_aio_wait_until_no_pending_reads() would block indefinitely.
2023-11-22 16:54:41 +02:00
Marko Mäkelä
7443ad1c8a MDEV-32374 log_sys.lsn_lock is a performance hog
The log_sys.lsn_lock that was introduced in
commit a635c40648
had better be located in the same cache line with log_sys.latch
so that log_t::append_prepare() needs to modify only two first
cache lines where log_sys is stored.

log_t::lsn_lock: On Linux, change the type from pthread_mutex_t to
something that may be as small as 32 bits, to pack more data members
in the same cache line. On Microsoft Windows, CRITICAL_SECTION works
better.

log_t::check_flush_or_checkpoint_: Renamed to need_checkpoint.
There is no need to pause all writer threads in log_free_check() when
we only need to write log_sys.buf to ib_logfile0. That will be done in
mtr_t::commit().

log_t::append_prepare_wait(): Make the member function non-static
to simplify the call interface, and add a parameter for the LSN.

log_t::append_prepare(): Invoke append_prepare_wait() at most once.
Only set_check_for_checkpoint() if a log checkpoint needs to
be written. If the log buffer needs to be written, we will take care
of it ourselves later in our caller. This will reduce interference
with log_free_check() in other threads.

mtr_t::commit(): Call log_write_up_to() if needed.

log_t::get_write_target(): Return a log_write_up_to() target
to mtr_t::commit().

buf_flush_ahead(): If we are in furious flushing, call
log_sys.set_check_for_checkpoint() so that all writers will wait
in log_free_check() until the checkpoint is done. Otherwise,
the test innodb.insert_into_empty could occasionally report
an error "Crash recovery is broken".

log_check_margins(): Replaced by log_free_check().

log_flush_margin(): Removed. This is part of mtr_t::commit()
and other operations that write log.

log_t::create(), log_t::attach(): Guarantee that buf_free < max_buf_free
will always hold on PMEM, to satisfy an assumption of
log_t::get_write_target().

log_write_up_to(): Assert lsn!=0. Such calls are not incorrect, but it
is cheaper to test that single unlikely condition in mtr_t::commit()
rather than test several conditions in log_write_up_to().

innodb_drop_database(), unlock_and_close_files(): Check the LSN before
calling log_write_up_to().

ha_innobase::commit_inplace_alter_table(): Remove redundant calls to
log_write_up_to() after calling unlock_and_close_files().

Reviewed by: Vladislav Vaintroub
Stress tested by: Matthias Leich
Performance tested by: Steve Shaw
2023-11-21 14:38:35 +02:00
Marko Mäkelä
90d968dab9 Merge 10.6 into 10.11 2023-11-20 10:08:19 +02:00
Marko Mäkelä
2323483528 MDEV-31953 madvise(..., MADV_FREE) is causing a performance regression
buf_page_t::set_os_unused(): Remove the system call that had been added in
commit 16c9718758 and revised in
commit c1fd082e9c for Microsoft Windows.

buf_pool_t::garbage_collect(): A new function to collect any garbage
from the InnoDB buffer pool that can be removed without writing any
log or data files. This will also invoke madvise() for all of buf_pool.free.

To trigger this the following MDEV is implemented:
MDEV-24670 avoid OOM by linux kernel co-operative memory management

To avoid frequent triggers that caused the MDEV-31953 regression, while
still preserving the 10.11 functionality of non-greedy kernel memory
usage, memory triggers are used.

On the triggering of memory pressure, if supported in the Linux kernel,
trigger the garbage collection of the innodb buffer pool.

The hard coded triggers occur where there is:
* some memory pressure in 5 of the last 10 seconds
* a full stall on memory pressure for 10ms in the last 2 seconds

The kernel will trigger only one in each of these time windows. To avoid
mariadb being in a constant state of memory garbage collection, this has
been limited to once per minute.

For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring
CAP_SYS_RESOURCE that was lifted[1] to support the use case of user
memory pressure. It not currently possible to set CAP_SYS_RESOURCES in
a systemd service as its setting a capability inside a usernamespace.

Running under systemd v254+ requires the default MemoryPressureWatch=auto
(or alternately "on").

Functionality was tested in a 6.4 kernel Fedora successfully under a
systemd service.

Running in a container requires that (unmask=)/sys/fs/cgroup be writable
by the mariadbd process.

To aid testing, the buf_pool_resize was a convient trigger point on
which to trigger garbage collection.

ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427

Co-Author: Daniel Black (on memory pressure trigger)

Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin,
   Thirunarayanan Balathandayuthapani

Tested by: Matthias Leich
2023-11-18 20:12:33 +11:00
Marko Mäkelä
eb1f8b2919 MDEV-32027 Opening all .ibd files on InnoDB startup can be slow
dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.

dict_check_tablespaces_and_store_max_id(): In the normal case
(no encryption plugin has been loaded and the change buffer is empty),
invoke dict_find_max_space_id() and do not open any .ibd files.
If a std::set<uint32_t> has been specified, open the files whose
tablespace ID is mentioned. Else, open all data files that are identified
by SYS_TABLES records.

fil_ibd_open(): Remove a call to os_file_get_last_error() that can
report a misleading error, such as EINVAL inside my_realpath() that is
not an actual error. This could be invoked when a data file is found
but the FSP_SPACE_FLAGS are incorrect, such as is the case for
table test.td in
./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags

buf_load(): If any tablespaces could not be found, invoke
dict_check_tablespaces_and_store_max_id() on the missing tablespaces.

dict_load_tablespace(): Try to load the tablespace unless it was found
to be futile. This fixes failures related to FTS_*.ibd files for
FULLTEXT INDEX.

btr_cur_t::search_leaf(): Prevent a crash when the tablespace
does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
when the change to dict_load_tablespaces() was not present.

We modify a few tests to ensure that tables will not be loaded at startup.
For some fault injection tests this means that the corrupted tables
will not be loaded, because dict_load_tablespace() would perform stricter
checks than dict_check_tablespaces_and_store_max_id().

Tested by: Matthias Leich
Reviewed by: Thirunarayanan Balathandayuthapani
2023-11-17 15:07:51 +02:00
Marko Mäkelä
9a545eb67c MDEV-26055: Correct the formula for adaptive flushing
This is a 10.5 backport of 10.6
commit d4265fbde5.

page_cleaner_flush_pages_recommendation(): If dirty_pct is
between innodb_max_dirty_pages_pct_lwm
and innodb_max_dirty_pages_pct,
scale the effort relative to how close we are to
innodb_max_dirty_pages_pct.

The previous formula was missing a multiplication by 100.
2023-11-16 17:45:37 +02:00
Marko Mäkelä
a3d0d5fc33 MDEV-26055: Improve adaptive flushing
This is a 10.5 backport from 10.6
commit 9593cccf28.

Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0
(not default) and innodb_adaptive_flushing=ON (default).
There is also the parameter innodb_adaptive_flushing_lwm
(default: 10 per cent of the log capacity). It should enable some
adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.
That is not being changed here.

This idea was first presented by Inaam Rana several years ago,
and I discussed it with Jean-François Gagné at FOSDEM 2023.

buf_flush_page_cleaner(): When we are not near the log capacity limit
(neither buf_flush_async_lsn nor buf_flush_sync_lsn are set),
also try to move clean blocks from the buf_pool.LRU list to buf_pool.free
or initiate writes (but not the eviction) of dirty blocks, until
the remaining I/O capacity has been consumed.

buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify
whether dirty least recently used pages (from buf_pool.LRU) should
be evicted immediately after they have been written out. Callers outside
buf_flush_page_cleaner() will pass evict=true, to retain the existing
behaviour.

buf_do_LRU_batch(): Add the parameter bool evict.
Return counts of evicted and flushed pages.

buf_flush_LRU(): Add the parameter bool evict.
Assume that the caller holds buf_pool.mutex and
will invoke buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list()
whose caller must hold buf_pool.mutex and invoke
buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have
buf_flush_wait_batch_end().

page_cleaner_flush_pages_recommendation(): Avoid some floating-point
arithmetics.

buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(),
buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict".

buf_free_from_unzip_LRU_list_batch(): Remove the parameter.
Only actual page writes will contribute towards the limit.

buf_LRU_free_page(): Evict freed pages of temporary tables.

buf_pool.done_free: Broadcast whenever a block is freed
(and buf_pool.try_LRU_scan is set).

buf_pool_t::io_buf_t::reserve(): Retry indefinitely.
During the test encryption.innochecksum we easily run out of
these buffers for PAGE_COMPRESSED or ENCRYPTED pages.

Tested by Matthias Leich and Axel Schwenke
2023-11-16 17:45:18 +02:00
Oleksandr Byelkin
fecd78b837 Merge branch '10.10' into 10.11 2023-11-08 16:46:47 +01:00
Oleksandr Byelkin
04d9a46c41 Merge branch '10.6' into 10.10 2023-11-08 16:23:30 +01:00
Marko Mäkelä
5b53342a6a MDEV-32588 InnoDB may hang when running out of buffer pool
buf_flush_LRU_list_batch(): Do not skip pages that are actually clean
but in buf_pool.flush_list due to the "lazy removal" optimization of
commit 22b62edaed, but try to evict them.
After acquiring buf_pool.flush_list_mutex, reread oldest_modification
to ensure that the block still remains in buf_pool.flush_list.

In addition to server hangs, this bug could also cause
InnoDB: Failing assertion: list.count > 0
in invocations of UT_LIST_REMOVE(flush_list, ...).

This fixes a regression that was caused by
commit a55b951e60
and possibly made more likely to hit due to
commit aa719b5010.
2023-10-26 15:10:53 +03:00
Marko Mäkelä
39e3ca8bd2 MDEV-31826 InnoDB may fail to recover after being killed in fil_delete_tablespace()
InnoDB was violating the write-ahead-logging protocol when a file
was being deleted, like this:

1. fil_delete_tablespace() set the fil_space_t::STOPPING flag
2. The buf_flush_page_cleaner() thread discards some changed pages for
this tablespace advances the log checkpoint a little.
3. The server process is killed before fil_delete_tablespace() wrote
a FILE_DELETE record.
4. Recovery will try to apply log to pages of the tablespace, because
there was no FILE_DELETE record. This will fail, because some pages
that had been modified since the latest checkpoint had not been written
by the page cleaner.

Page writes must not be stopped before a FILE_DELETE record has been
durably written.

fil_space_t::drop(): Replaces fil_space_t::check_pending_operations().
Add the parameter detached_handle, and return a tablespace pointer
if this thread was the first one to stop I/O on the tablespace.

mtr_t::commit_file(): Remove the parameter detached_handle, and
move some handling to fil_space_t::drop().

fil_space_t: STOPPING_READS, STOPPING_WRITES: Separate flags for STOPPING.
We want to stop reads (and encryption) before stopping page writes.

fil_space_t::is_stopping_writes(), fil_space_t::get_for_write():
Special accessors for the write path.

fil_space_t::flush_low(): Ignore the STOPPING_READS flag and only
stop if STOPPING_WRITES is set, to avoid an infinite loop in
fil_flush_file_spaces(), which was occasionally repeated by
running the test encryption.create_or_replace.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2023-10-26 15:07:59 +03:00
Marko Mäkelä
3036b36f9b Merge 10.10 into 10.11 2023-10-23 18:44:12 +03:00
Marko Mäkelä
5a8fca5a4f Merge 10.6 into 10.10 2023-10-23 18:43:36 +03:00
Marko Mäkelä
b21f52ee73 Merge 10.5 into 10.6 2023-10-23 16:43:48 +03:00
Marko Mäkelä
b5e43a1d35 MDEV-32552 Write-ahead logging is broken for freed pages
buf_page_free(): Flag the freed page as modified if it is found in
the buffer pool.

buf_flush_page(): If the page has been freed, ensure that the log
for it has been durably written, before removing the page
from buf_pool.flush_list.

FindBlockX: Find also MTR_MEMO_PAGE_X_MODIFY in order to avoid an
occasional failure of innodb.innodb_defrag_concurrent, which involves
freeing and reallocating pages in the same mini-transaction.

This fixes a regression that was introduced in
commit a35b4ae898 (MDEV-15528).

This logic was tested by commenting out the $shutdown_timeout line
from a test and running the following:

./mtr --rr innodb.scrub
rr replay var/log/mysqld.1.rr/mariadbd-0

A breakpoint in the modified buf_flush_page() was hit, and the
FIL_PAGE_LSN of that page had been last modified during the
mtr_t::commit() of a mini-transaction where buf_page_free()
had been executed on that page.
2023-10-23 16:13:16 +03:00
Marko Mäkelä
65700edb26 Merge 10.10 into 10.11 2023-10-19 14:50:42 +03:00
Marko Mäkelä
c92d06748a Merge 10.6 into 10.10 2023-10-19 14:35:31 +03:00
Marko Mäkelä
6991b1c47c Merge 10.5 into 10.6 2023-10-19 13:50:00 +03:00
Thirunarayanan Balathandayuthapani
3da5d047b8 MDEV-31851 After crash recovery, undo tablespace fails to open
Problem:
========
- InnoDB fails to open undo tablespace when page0 is corrupted
and fails to throw error.

Solution:
=========
- InnoDB throws DB_CORRUPTION error when InnoDB encounters
page0 corruption of undo tablespace.

- InnoDB restores the page0 of undo tablespace from
doublewrite buffer if it encounters page corruption

- Moved Datafile::restore_from_doublewrite() to
recv_dblwr_t::restore_first_page(). So that undo
tablespace and system tablespace can use this function
instead of duplicating the code

srv_undo_tablespace_open(): Returns 0 if file doesn't exist
or ULINT_UNDEFINED if page0 is corrupted.
2023-10-17 18:41:21 +05:30
Marko Mäkelä
2ecc0443ec Merge 10.10 into 10.11 2023-10-17 16:04:21 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Vladislav Vaintroub
e33e2fa949 MDEV-31095 tpool - restrict threadpool concurrency during bufferpool load
Add threadpool functionality to restrict concurrency during "batch"
periods (where tasks are added in rapid succession).
This will throttle thread creation more agressively than usual, while
keeping performance at least on-par.

One of these cases is bufferpool load, where async read IOs are executed
without any throttling. There can be as much as 650K read IOs for
loading 10GB buffer pool.

Another one is recovery, where "fake read" IOs are executed.

Why there are more threads than we expect?
Worker threads are not be recognized as idle, until they return to the
standby list, and to return to that list, they need to acquire
mutex currently held in the submit_task(). In those cases, submit_task()
has no worker to wake, and would create threads until default concurrency
level (2*ncpus) is satisfied. Only after that throttling would happen.
2023-10-04 17:44:02 +02:00
Marko Mäkelä
0f9acce3f2 Merge 10.5 into 10.6 2023-09-14 09:01:15 +03:00
Marko Mäkelä
d20a4da23d MDEV-32150 InnoDB reports corruption on 32-bit platforms with ibd files sizes > 4GB
buf_read_page_low(): Use 64-bit arithmetics when computing the
file byte offset. In other calls to fil_space_t::io() the offset
was being computed correctly, for example by
buf_page_t::physical_offset().
2023-09-12 15:16:31 +03:00
Thirunarayanan Balathandayuthapani
a03b8cd0a2 MDEV-32145 Disable read-ahead for temporary tablespace
- Lifetime of temporary tables is expected to be short, it would
seem to make sense to assume that all temporary tablespace pages
will remain in the buffer pool. It doesn't make sense to have
read-ahead for pages of temporary tablespace
2023-09-11 18:02:53 +05:30
Marko Mäkelä
cdd2fa7fc5 MDEV-32134 InnoDB hang in buf_flush_wait_LRU_batch_end()
buf_flush_page_cleaner(): Before finishing a batch, wake up any threads
that are waiting for buf_pool.done_flush_LRU.

This should fix a hung shutdown that we observed
after SET GLOBAL innodb_buffer_pool_size started was executed
to shrink the InnoDB buffer pool.
2023-09-11 14:54:50 +03:00
Marko Mäkelä
9d1466522e MDEV-32029 Assertion failures in log_sort_flush_list upon crash recovery
In commit 0d175968d1 (MDEV-31354)
we only waited that no buf_pool.flush_list writes are in progress.
The buf_flush_page_cleaner() thread could still initiate page writes
from the buf_pool.LRU list while only holding buf_pool.mutex, not
buf_pool.flush_list_mutex. This is something that was changed in
commit a55b951e60 (MDEV-26827).

log_sort_flush_list(): Wait for the buf_flush_page_cleaner() thread to
be completely idle, including LRU flushing.

buf_flush_page_cleaner(): Always broadcast buf_pool.done_flush_list
when becoming idle, so that log_sort_flush_list() will be woken up.
Also, ensure that buf_pool.n_flush_inc() or
buf_pool.flush_list_set_active() has been invoked before any page
writes are initiated.

buf_flush_try_neighbors(): Release buf_pool.mutex here and not in the
callers, to avoid code duplication. Make innodb_flush_neighbors=ON
obey the innodb_io_capacity limit.
2023-08-30 14:40:13 +03:00
Marko Mäkelä
31ea201ecc MDEV-30986 Slow full index scan for I/O bound case
buf_page_init_for_read(): Test a condition before acquiring a latch,
not while holding it.

buf_read_ahead_linear(): Do not use a memory transaction, because it
could be too large, leading to frequent retries.
Release the hash_lock as early as possible.
2023-08-30 13:20:27 +03:00
Marko Mäkelä
08a549c33d Clean up buf_LRU_remove_hashed()
buf_LRU_block_remove_hashed(): Test for "not ROW_FORMAT=COMPRESSED" first,
because in that case we can assume that an uncompressed page exists.
This removes a condition from the likely code branch.
2023-08-25 13:44:59 +03:00
Marko Mäkelä
a60462d93e Remove bogus references to replaced Google contributions
In commit 03ca6495df and
commit ff5d306e29
we forgot to remove some Google copyright notices related to
a contribution of using atomic memory access in the old InnoDB
mutex_t and rw_lock_t implementation.

The copyright notices had been mostly added in
commit c6232c06fa
due to commit a1bb700fd2.

The following Google contributions remain:
* some logic related to the parameter innodb_io_capacity
* innodb_encrypt_tables, added in MariaDB Server 10.1
2023-08-21 15:51:16 +03:00
Marko Mäkelä
6cc88c3db1 Clean up buf0buf.inl
Let us move some #include directives from buf0buf.inl to
the compilation units where they are really used.
2023-08-21 15:51:10 +03:00
Marko Mäkelä
448c2077fb Merge 10.5 into 10.6 2023-08-21 15:50:31 +03:00
Marko Mäkelä
be5fd3ec35 Remove a stale comment
buf_LRU_block_remove_hashed(): Remove a comment that had been added
in mysql/mysql-server@aad1c7d0dd
and apparently referring to buf_LRU_invalidate_tablespace(),
which was later replaced with buf_LRU_flush_or_remove_pages() and
ultimately with buf_flush_remove_pages() and buf_flush_list_space().
All that code is covered by buf_pool.mutex. The note about releasing
the hash_lock for the buf_pool.page_hash slice would actually apply to
the last reference to hash_lock in buf_LRU_free_page(), for the
case zip=false (retaining a ROW_FORMAT=COMPRESSED page while
discarding the uncompressed one).
2023-08-21 13:28:12 +03:00
Oleksandr Byelkin
036df5f970 Merge branch '10.10' into 10.11 2023-08-08 14:57:31 +02:00
Oleksandr Byelkin
34a8e78581 Merge branch '10.6' into 10.9 2023-08-04 08:01:06 +02:00
Marko Mäkelä
72928e640e MDEV-27593: Crashing on I/O error is unhelpful
buf_page_t::write_complete(), buf_page_write_complete(),
IORequest::write_complete(): Add a parameter for passing
an error code. If an error occurred, we will release the
io-fix, buffer-fix and page latch but not reset the
oldest_modification field. The block would remain in
buf_pool.LRU and possibly buf_pool.flush_list, to be written
again later, by buf_flush_page_cleaner(). If all page writes
start consistently failing, all write threads should eventually
hang in log_free_check() because the log checkpoint cannot
be advanced to make room in the circular write-ahead-log ib_logfile0.

IORequest::read_complete(): Add a parameter for passing
an error code. If a read operation fails, we report the error
and discard the page, just like we would do if the page checksum
was not validated or the page could not be decrypted.
This only affects asynchronous reads, due to linear or random read-ahead
or crash recovery. When buf_page_get_low() invokes buf_read_page(),
that will be a synchronous read, not involving this code.

This was tested by randomly injecting errors in
write_io_callback() and read_io_callback(), like this:

  if (!ut_rnd_interval(100))
    cb->m_err= 42;
2023-08-01 14:39:29 +03:00
Marko Mäkelä
96cfdb8710 MDEV-31816 fixup: Relax a debug assertion
buf_LRU_free_page(): The block may also be in the IBUF_EXIST state
when executing the test innodb.innodb_bulk_create_index_debug.
2023-08-01 13:22:16 +03:00
Marko Mäkelä
d794d3484b MDEV-31816 buf_LRU_free_page() does not preserve ROW_FORMAT=COMPRESSED block state
buf_LRU_free_page(): When we are discarding the uncompressed copy of a
ROW_FORMAT=COMPRESSED page, buf_page_t::can_relocate() must have ensured
that the block descriptor state is one of FREED, UNFIXED, REINIT.
Do not overwrite the state with UNFIXED. We do not want to write back
pages that were actually freed, and we want to avoid doublewrite for
pages that were (re)initialized by log records written since the latest
checkpoint. Last but not least, we do not want crashes like those that
commit dc1bd1802a (MDEV-31386)
was supposed to fix.

The test innodb_zip.wl5522_zip should typically cover all 3 states.

This bug is a regression due to
commit aaef2e1d8c (MDEV-27058).
2023-08-01 09:58:15 +03:00
Marko Mäkelä
bce3ee704f Merge 10.10 into 10.11 2023-07-26 14:44:43 +03:00
Marko Mäkelä
7cde5c539b Merge 10.6 into 10.9 2023-07-10 11:22:21 +03:00
Monty
99bd226059 MDEV-31558 Add InnoDB engine information to the slow query log
The new statistics is enabled by adding the "engine", "innodb" or "full"
option to --log-slow-verbosity

Example output:

 # Pages_accessed: 184  Pages_read: 95  Pages_updated: 0  Old_rows_read: 1
 # Pages_read_time: 17.0204  Engine_time: 248.1297

Page_read_time is time doing physical reads inside a storage engine.
(Writes cannot be tracked as these are usually done in the background).
Engine_time is the time spent inside the storage engine for the full
duration of the read/write/update calls. It uses the same code as
'analyze statement' for calculating the time spent.

The engine statistics is done with a generic interface that should be
easy for any engine to use. It can also easily be extended to provide
even more statistics.

Currently only InnoDB has counters for Pages_% and Undo_% status.
Engine_time works for all engines.

Implementation details:

class ha_handler_stats holds all engine stats.  This class is included
in handler and THD classes.
While a query is running, all statistics is updated in the handler. In
close_thread_tables() the statistics is added to the THD.

handler::handler_stats is a pointer to where statistics should be
collected. This is set to point to handler::active_handler_stats if
stats are requested. If not, it is set to 0.
handler_stats has also an element, 'active' that is 1 if stats are
requested. This is to allow engines to avoid doing any 'if's while
updating the statistics.

Cloned or partition tables have the pointer set to the base table if
status are requested.

There is a small performance impact when using --log-slow-verbosity=engine:
- All engine calls in 'select' will be timed.
- IO calls for InnoDB reads will be timed.
- Incrementation of counters are done on local variables and accesses
  are inline, so these should have very little impact.
- Statistics has to be reset for each statement for the THD and each
  used handler. This is only 40 bytes, which should be neglectable.
- For partition tables we have to loop over all partitions to update
  the handler_status as part of table_init(). Can be optimized in the
  future to only do this is log-slow-verbosity changes. For this to work
  we have to update handler_status for all opened partitions and
  also for all partitions opened in the future.

Other things:
- Added options 'engine' and 'full' to log-slow-verbosity.
- Some of the new files in the test suite comes from Percona server, which
  has similar status information.
- buf_page_optimistic_get(): Do not increment any counter, since we are
  only validating a pointer, not performing any buf_pool.page_hash lookup.
- Added THD argument to save_explain_data_intern().
- Switched arguments for save_explain_.*_data() to have
  always THD first (generates better code as other functions also have THD
  first).
2023-07-07 12:53:18 +03:00