Commit graph

78 commits

Author SHA1 Message Date
Vladislav Vaintroub
37b7986467 Merge branch '10.5' into 10.6 2024-11-05 21:02:22 +01:00
Vladislav Vaintroub
d64034770e MDEV-35273 tpool::worker_data - replace MY_ALIGNED with pad member 2024-10-28 16:24:53 +01:00
Daniel Black
d472391471 tpool: correct LIBAIO_REQIRED typo
Noticed thanks to Razvan Liviu Varzaru
2024-06-29 15:25:08 +10:00
Marko Mäkelä
0076eb3d4e Merge 10.5 into 10.6 2024-06-24 13:09:47 +03:00
Dave Gosselin
db0c28eff8 MDEV-33746 Supply missing override markings
Find and fix missing virtual override markings.  Updates cmake
maintainer flags to include -Wsuggest-override and
-Winconsistent-missing-override.
2024-06-20 11:32:13 -04:00
Marko Mäkelä
c849952b71 MDEV-33840: Fix GCC -Wreorder
This fixes up the merge commit 829cb1a49c
2024-06-13 19:57:40 +03:00
mariadb-DebarunBanerjee
52f6df99ed MDEV-33669 mariabackup --backup hangs
This is a server hang and not an issue with backup. While concurrent
DDLs in server gets in hanged state, mariabackup waits for DDLs to
finish trying to acquire MDL_BACKUP_BLOCK_DDL.

The server hang is serious in nature and caused by thread pool state
being incorrectly set to thread creation pending state while no creation
is actually pending. Once a thread pool reaches such state no new thread
gets created in the pool.

While it could possibly affect all thread pools in server, the innodb
thread pool is the victim in current bug where IO job gets blocked when
the pool is stuck with much less number of threads than intended.
Available workers are blocked in purge waiting for page lock to be
released by IO write (SX lock) causing a complete deadlock.

The issue is caused by the state variable m_thread_creation_pending
introduced by MDEV-31095: 9e62ab7aaf. We check and set the variable
early while attempting to create a new thread in pool but fail to reset
it if we exit the flow for other reasons like maximum threads reached
or get into thread creation throttling path.

Fix: The simple fix is to make sure that the state is reset back in case
we don't actually attempt to create the thread.
2024-04-29 08:25:18 +05:30
Marko Mäkelä
829cb1a49c Merge 10.5 into 10.6 2024-04-17 14:14:58 +03:00
Vladislav Vaintroub
f6e9600f42 MDEV-33840 tpool- switch to longer maintainence timer interval, if pool is idle
Previous solution, that would entirely switch timer off, turned out
to be deadlock prone.

This patch fixed previous attempt to switch between long/short interval
periods in MDEV-24295. Now, initial state of the timer is fixed (it is ON).
Also, avoid switching timer to longer periods if there is any activity in
the pool.
2024-04-17 10:49:59 +02:00
Vladislav Vaintroub
2ba79aba2b Revert "MDEV-33840 tpool : switch off maintenance timer when not needed."
This reverts commit 09bae92c16.
2024-04-17 09:58:34 +02:00
Sergei Golubchik
41296a07c8 Merge branch '10.5' into 10.6 2024-04-11 13:58:22 +02:00
Vladislav Vaintroub
09bae92c16 MDEV-33840 tpool : switch off maintenance timer when not needed.
Before patch, maintenance timer will tick every 0.4 seconds.
After this patch, timer will tick every 0.4 seconds when necessary(
there are delayed thread creation), switching off completely after 20
seconds of being idle.
2024-04-09 08:31:32 +02:00
Marko Mäkelä
d963584d4c Merge 10.5 into 10.6 2023-11-22 16:56:47 +02:00
Marko Mäkelä
78c9a12c8f MDEV-32861 InnoDB hangs when running out of I/O slots
When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
and the server is run with the minimum parameters
innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
were observed.

tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
will be woken up when the cache previously was empty.

buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
batch so that os_aio_wait_until_no_pending_writes() has a chance of
returning. Add a Boolean parameter and pass wait_for_reads=false inside
buf_page_decrypt_after_read(), because those calls will be executed
inside a read completion callback, and therefore
os_aio_wait_until_no_pending_reads() would block indefinitely.
2023-11-22 16:54:41 +02:00
Vladislav Vaintroub
9e62ab7aaf MDEV-31095 tpool - do not create new worker, if thread creation is pending.
Use an std::atomic_flag to track thread creation in progress.
This is mainly a cleanup, the effect of this change was not measureable
in my tests.
2023-10-04 17:44:13 +02:00
Vladislav Vaintroub
e33e2fa949 MDEV-31095 tpool - restrict threadpool concurrency during bufferpool load
Add threadpool functionality to restrict concurrency during "batch"
periods (where tasks are added in rapid succession).
This will throttle thread creation more agressively than usual, while
keeping performance at least on-par.

One of these cases is bufferpool load, where async read IOs are executed
without any throttling. There can be as much as 650K read IOs for
loading 10GB buffer pool.

Another one is recovery, where "fake read" IOs are executed.

Why there are more threads than we expect?
Worker threads are not be recognized as idle, until they return to the
standby list, and to return to that list, they need to acquire
mutex currently held in the submit_task(). In those cases, submit_task()
has no worker to wake, and would create threads until default concurrency
level (2*ncpus) is satisfied. Only after that throttling would happen.
2023-10-04 17:44:02 +02:00
Marko Mäkelä
f50abab195 MDEV-31048 PERFORMANCE_SCHEMA lakcs InnoDB read_slots and write_slots
tpool::cache::m_mtx: Add PERFORMANCE_SCHEMA instrumentation
(wait/synch/mutex/innodb/tpool_cache_mutex). This covers the
InnoDB read_slots and write_slots for asynchronous data page I/O.
2023-04-13 15:18:26 +03:00
Marko Mäkelä
a091d6ac4e MDEV-26827 fixup: Do not duplicate io_slots::pending_io_count()
os_aio_pending_reads_approx(), os_aio_pending_reads(): Replaces
buf_pool.n_pend_reads.

os_aio_pending_writes(): Replaces buf_dblwr.pending_writes().

buf_dblwr_t::write_cond, buf_dblwr_t::writes_pending: Remove.
2023-04-12 13:49:57 +03:00
Marko Mäkelä
6aec87544c Merge 10.5 into 10.6 2023-02-10 13:03:01 +02:00
Khem Raj
75bbf645a6 Add missing include <cstdio>
This is needed with GCC 13 and newer [1]

[1] https://www.gnu.org/software/gcc/gcc-13/porting_to.html

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2023-01-27 12:43:38 +11:00
Heiko Becker
15226a2822 Add missing include for std::runtime_error
Fixes the following error when building with gcc 13:

"tpool/aio_liburing.cc:64:18: error: 'runtime_error' is not a member of 'std'
   64 |       throw std::runtime_error("aio_uring()");"
2023-01-25 17:30:18 +11:00
Daniel Black
fd8dbe0d2c MDEV-29443: prevent uring access to galera sst /notify scripts
The resources like uring in MariaDB aren't intended for spawned
processes so we restrict access using the io_uring_ring_dontfork
liburing library call.
2022-09-06 08:03:49 +10:00
Marko Mäkelä
76bb671e42 Merge 10.5 into 10.6 2022-08-25 16:02:44 +03:00
Vladislav Vaintroub
a3fd9e6b06 MDEV-29367 Refactor tpool::cache
Removed use std::vector's ba push_back(), pop_back() to  make it more
obvious that memory in the vectors won't be reallocated.

Also, "borrowed" elements can be debugged a little better now,
they are put into the start of the m_cache vector.
2022-08-24 13:36:49 +02:00
Vladislav Vaintroub
eb7f46ca1e Merge remote-tracking branch 'origin/10.5' into 10.6 2022-06-23 06:29:57 +02:00
Vladislav Vaintroub
35f2cdcb99 MDEV-28920 Rescheduling of innodb_stats_func() missing
Fixed tpool timer implementation on POSIX.
Prior to this patch, under some specific rare circumstances (concurrency
related), timer callback execution might be skipped.
2022-06-23 05:53:55 +02:00
Marko Mäkelä
3794673111 MDEV-28836: Memory alignment cleanup
Table_cache_instance: Define the structure aligned at
the CPU cache line, and remove a pad[] data member.
Krunal Bauskar reported this to improve performance on ARMv8.

aligned_malloc(): Wrapper for the Microsoft _aligned_malloc()
and the ISO/IEC 9899:2011 <stdlib.h> aligned_alloc().
Note: The parameters are in the Microsoft order (size, alignment),
opposite of aligned_alloc(alignment, size).
Note: The standard defines that size must be an integer multiple
of alignment. It is enforced by AddressSanitizer but not by GNU libc
on Linux.

aligned_free(): Wrapper for the Microsoft _aligned_free() and
the standard free().

HAVE_ALIGNED_ALLOC: A new test. Unfortunately, support for
aligned_alloc() may still be missing on some platforms.
We will fall back to posix_memalign() for those cases.

HAVE_MEMALIGN: Remove, along with any use of the nonstandard memalign().

PFS_ALIGNEMENT (sic): Removed; we will use CPU_LEVEL1_DCACHE_LINESIZE.

PFS_ALIGNED: Defined using the C++11 keyword alignas.

buf_pool_t::page_hash_table::create(),
lock_sys_t::hash_table::create():
lock_sys_t::hash_table::resize(): Pad the allocation size to an
integer multiple of the alignment.

Reviewed by: Vladislav Vaintroub
2022-06-21 16:59:49 +03:00
Marko Mäkelä
db0fde3f24 MDEV-28665 aio_uring::thread_routine terminates prematurely, causing hang
aio_uring::thread_routine(): Handle -EINTR from io_uring_wait_cqe()
in the same way as aio_linux::getevent_thread_routine() does it:
simply ignore it and invoke the system call again.

Reviewed by: Vladislav Vaintroub
2022-05-25 13:18:24 +03:00
Daniel Black
6350a52445 tpool: liburing typo in error
Also the ENOSYS is more likely explained by seccomp
filters in containers than a pre-5.1 kernel, so include
both.
2022-04-27 09:56:28 +10:00
Marko Mäkelä
2aed566d22 Cleanup: alignas(CPU_LEVEL1_DCACHE_LINESIZE)
Let us replace all use of MY_ALIGNED in InnoDB with C++11 alignas.

CACHE_LINE_SIZE: Replaced with CPU_LEVEL1_DCACHE_LINESIZE.
2022-04-14 10:40:26 +03:00
Sergei Golubchik
f92388fa14 MDEV-27900 fixes
* prevent infinite recursion in beyond-EOF reads (when pread returns 0)
* reduce code duplication

followup for d78173828e and f4fb6cb3fe
2022-03-25 20:33:42 +01:00
Daniel Black
f4fb6cb3fe MDEV-27900: aio handle partial reads/writes (uring)
MDEV-27900 continued for uring.

Also spell synchronously correctly in sql_parse.cc.

Reviewed by Wlad.
2022-03-12 16:16:47 +11:00
Daniel Black
bd1ba7801f Merge branch 10.5 into 10.6 2022-03-12 16:16:03 +11:00
Daniel Black
d78173828e MDEV-27900: aio handle partial reads/writes
As btrfs showed, a partial read of data in AIO /O_DIRECT circumstances can
really confuse MariaDB.

Filipe Manana (SuSE)[1] showed how database programmers can assume
O_DIRECT is all or nothing.

While a fix was done in the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we do only get a partial read/write.

Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.

[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a

Also spell synchronously correctly in other files.
2022-03-12 09:47:53 +11:00
Sergei Golubchik
e3894f5d39 Merge branch '10.5 into 10.6 2022-02-10 21:07:03 +01:00
Vladislav Vaintroub
012e724deb MDEV-27796 Windows - starting server with huge innodb-log-buffer-size may fail
Fixed tpool::pread() and tpool::pwrite() to return SSIZE_T on Windows,
so that huge numbers are not converted to negatives.

Also, make sure to never attempt reading/writing more bytes than
DWORD can accomodate (4G)
2022-02-10 17:25:12 +01:00
Marko Mäkelä
db915f7387 MDEV-27058: Move buf_page_t::slot to IORequest::slot
MDEV-23855 and MDEV-23399 already moved some transient data fields
from buffer pool page descriptors to IORequest, but the write buffer
of PAGE_COMPRESSED or ENCRYPTED tables was missed. Since is only needed
during asynchronous page write requests, it belongs to IORequest.
2021-11-18 17:44:33 +02:00
Marko Mäkelä
03e4cb2484 MDEV-24512 fixup: Remove after_task_callback
In commit ff5d306e29 we removed
dbug_after_task_callback but forgot to revert the rest of
commit bada05a883.
2021-09-14 16:23:23 +03:00
Marko Mäkelä
b630f0b1b9 Merge 10.5 into 10.6 2021-06-18 09:16:20 +03:00
Vladislav Vaintroub
78bd7d86a4 MDEV-25953 Tpool - prevent potential deadlock in simulated AIO
Do not execute user callback just after pwrite. Instead, submit user
function as task into thread pool. This way, the IO thread would not hog
aiocb, which is a limited (in Innodb) resource
2021-06-17 20:39:47 +02:00
Marko Mäkelä
6ba938af62 MDEV-25905: Assertion table2==NULL in dict_sys_t::add()
In commit 49e2c8f0a6 (MDEV-25743)
we made dict_sys_t::find() incompatible with the rest of the
table name hash table operations in case the table name contains
non-ASCII octets (using a compatibility mode that facilitates the
upgrade into the MySQL 5.0 filename-safe encoding) and the target
platform implements signed char.

ut_fold_string(): Remove; replace with my_crc32c(). This also makes
table name hash value calculations independent on whether char
is unsigned or signed.
2021-06-14 12:38:56 +03:00
Marko Mäkelä
28e362eaca MDEV-25760: Resubmit IO job on -EAGAIN from io_uring
The server still may abort if there is no enough free space in the
ring buffer to resubmit the IO job, but the behavior is equal to
the failure of os_aio() -> submit_io().
2021-06-14 12:38:56 +03:00
Daniel Black
f82e69735e io_liburing: ENOMEM handling - use io_uring_mlock_size
This gives the user the size required and how to set
memlock limits for the process.

Thanks Jens Axboe for providing this requested interface

ref: https://github.com/axboe/liburing/issues/246

Also don't put \n on my_printf_error, its implicit.
2021-04-15 07:42:13 +10:00
Vladislav Vaintroub
cb545f1116 CMake cleanup
- use FIND_PACKAGE(LIBAIO) to find libaio
- Use standard CMake conventions in Find{PMEM,URING}.cmake
- Drop the LIB from LIB{PMEM,URING}_{INCLUDE_DIR,LIBRARIES}
  It is cleaner, and consistent with how other packages are handled in CMake.
  e.g successful FIND_PACKAGE(PMEM) now sets PMEM_FOUND, PMEM_LIBRARIES,
  PMEM_INCLUDE_DIR, not  LIBPMEM_{FOUND,LIBRARIES,INCLUDE_DIR}.
- Decrease the output. use FIND_PACKAGE with QUIET argument.
- for Linux packages, either liburing, or libaio is required
  If liburing is installed, libaio does not need to be present   .
  Use FIND_PACKAGE([LIBAIO|URING] REQUIRED) if either library is required.
2021-03-23 17:20:17 +01:00
Marko Mäkelä
e8113f7572 CMake cleanup: Make WITH_URING, WITH_PMEM Boolean
The new default values WITH_URING:BOOL=OFF, WITH_PMEM:BOOL=OFF imply
that the dependencies are optional.
An explicit request WITH_URING=ON or WITH_PMEM=ON will cause the
build to fail if the requested dependencies are not available.

Last, to prevent a feature to be built in even though the built-time
dependencies are available, the following can be used:

cmake -DCMAKE_DISABLE_FIND_PACKAGE_URING=1
cmake -DCMAKE_DISABLE_FIND_PACKAGE_PMEM=1

This cleanup was suggested by Vladislav Vaintroub.
2021-03-20 16:23:47 +02:00
Marko Mäkelä
a0558b8c96 MDEV-24883 fixup: Add a dependency
In commit 783625d78f we forgot to
declare a dependency on the generated file mysqld_error.h.
2021-03-15 12:11:41 +02:00
Marko Mäkelä
783625d78f MDEV-24883 add io_uring support for tpool
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.

aio_uring: class which wraps io_uring stuff

aio_uring::bind()/unbind(): optional optimization

aio_uring::submit_io(): mutex prevents data race. liburing calls are
thread-unsafe. But if you look into it's implementation you'll see
atomic operations. They're used for synchronization between kernel and
user-space only. That's why our own synchronization is still needed.

For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.

aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).

This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
2021-03-15 11:30:17 +02:00
Marko Mäkelä
f24b738318 MDEV-24313 (2 of 2): Silently ignored innodb_use_native_aio=1
In commit 5e62b6a5e0 (MDEV-16264)
the logic of os_aio_init() was changed so that it will never fail,
but instead automatically disable innodb_use_native_aio (which is
enabled by default) if the io_setup() system call would fail due
to resource limits being exceeded. This is questionable, especially
because falling back to simulated AIO may lead to significantly
reduced performance.

srv_n_file_io_threads, srv_n_read_io_threads, srv_n_write_io_threads:
Change the data type from ulong to uint.

os_aio_init(): Remove the parameters, and actually return an error code.

thread_pool::configure_aio(): Do not silently fall back to simulated AIO.

Reviewed by: Vladislav Vaintroub
2020-12-14 15:27:03 +02:00
Vladislav Vaintroub
3ee24b2306 Simplify clang workarounds. 2020-12-07 10:35:57 +01:00
Marko Mäkelä
f3a58ed801 MDEV-24295: Fix the non-clang build
Sorry, only tested commit 4174fc1a1b
on clang. Other compilers do not define __has_feature().
2020-12-02 22:04:57 +02:00