Commit graph

38 commits

Author SHA1 Message Date
Vladislav Vaintroub
c483c5ca56 MDEV-33627 preparation - tpool fix
Fix tpool to not use maintenance timer for fixed pool size.
2024-07-16 15:00:29 +02:00
Marko Mäkelä
0076eb3d4e Merge 10.5 into 10.6 2024-06-24 13:09:47 +03:00
Dave Gosselin
db0c28eff8 MDEV-33746 Supply missing override markings
Find and fix missing virtual override markings.  Updates cmake
maintainer flags to include -Wsuggest-override and
-Winconsistent-missing-override.
2024-06-20 11:32:13 -04:00
Marko Mäkelä
c849952b71 MDEV-33840: Fix GCC -Wreorder
This fixes up the merge commit 829cb1a49c
2024-06-13 19:57:40 +03:00
mariadb-DebarunBanerjee
52f6df99ed MDEV-33669 mariabackup --backup hangs
This is a server hang and not an issue with backup. While concurrent
DDLs in server gets in hanged state, mariabackup waits for DDLs to
finish trying to acquire MDL_BACKUP_BLOCK_DDL.

The server hang is serious in nature and caused by thread pool state
being incorrectly set to thread creation pending state while no creation
is actually pending. Once a thread pool reaches such state no new thread
gets created in the pool.

While it could possibly affect all thread pools in server, the innodb
thread pool is the victim in current bug where IO job gets blocked when
the pool is stuck with much less number of threads than intended.
Available workers are blocked in purge waiting for page lock to be
released by IO write (SX lock) causing a complete deadlock.

The issue is caused by the state variable m_thread_creation_pending
introduced by MDEV-31095: 9e62ab7aaf. We check and set the variable
early while attempting to create a new thread in pool but fail to reset
it if we exit the flow for other reasons like maximum threads reached
or get into thread creation throttling path.

Fix: The simple fix is to make sure that the state is reset back in case
we don't actually attempt to create the thread.
2024-04-29 08:25:18 +05:30
Marko Mäkelä
829cb1a49c Merge 10.5 into 10.6 2024-04-17 14:14:58 +03:00
Vladislav Vaintroub
f6e9600f42 MDEV-33840 tpool- switch to longer maintainence timer interval, if pool is idle
Previous solution, that would entirely switch timer off, turned out
to be deadlock prone.

This patch fixed previous attempt to switch between long/short interval
periods in MDEV-24295. Now, initial state of the timer is fixed (it is ON).
Also, avoid switching timer to longer periods if there is any activity in
the pool.
2024-04-17 10:49:59 +02:00
Vladislav Vaintroub
2ba79aba2b Revert "MDEV-33840 tpool : switch off maintenance timer when not needed."
This reverts commit 09bae92c16.
2024-04-17 09:58:34 +02:00
Sergei Golubchik
41296a07c8 Merge branch '10.5' into 10.6 2024-04-11 13:58:22 +02:00
Vladislav Vaintroub
09bae92c16 MDEV-33840 tpool : switch off maintenance timer when not needed.
Before patch, maintenance timer will tick every 0.4 seconds.
After this patch, timer will tick every 0.4 seconds when necessary(
there are delayed thread creation), switching off completely after 20
seconds of being idle.
2024-04-09 08:31:32 +02:00
Vladislav Vaintroub
9e62ab7aaf MDEV-31095 tpool - do not create new worker, if thread creation is pending.
Use an std::atomic_flag to track thread creation in progress.
This is mainly a cleanup, the effect of this change was not measureable
in my tests.
2023-10-04 17:44:13 +02:00
Vladislav Vaintroub
e33e2fa949 MDEV-31095 tpool - restrict threadpool concurrency during bufferpool load
Add threadpool functionality to restrict concurrency during "batch"
periods (where tasks are added in rapid succession).
This will throttle thread creation more agressively than usual, while
keeping performance at least on-par.

One of these cases is bufferpool load, where async read IOs are executed
without any throttling. There can be as much as 650K read IOs for
loading 10GB buffer pool.

Another one is recovery, where "fake read" IOs are executed.

Why there are more threads than we expect?
Worker threads are not be recognized as idle, until they return to the
standby list, and to return to that list, they need to acquire
mutex currently held in the submit_task(). In those cases, submit_task()
has no worker to wake, and would create threads until default concurrency
level (2*ncpus) is satisfied. Only after that throttling would happen.
2023-10-04 17:44:02 +02:00
Vladislav Vaintroub
eb7f46ca1e Merge remote-tracking branch 'origin/10.5' into 10.6 2022-06-23 06:29:57 +02:00
Vladislav Vaintroub
35f2cdcb99 MDEV-28920 Rescheduling of innodb_stats_func() missing
Fixed tpool timer implementation on POSIX.
Prior to this patch, under some specific rare circumstances (concurrency
related), timer callback execution might be skipped.
2022-06-23 05:53:55 +02:00
Marko Mäkelä
3794673111 MDEV-28836: Memory alignment cleanup
Table_cache_instance: Define the structure aligned at
the CPU cache line, and remove a pad[] data member.
Krunal Bauskar reported this to improve performance on ARMv8.

aligned_malloc(): Wrapper for the Microsoft _aligned_malloc()
and the ISO/IEC 9899:2011 <stdlib.h> aligned_alloc().
Note: The parameters are in the Microsoft order (size, alignment),
opposite of aligned_alloc(alignment, size).
Note: The standard defines that size must be an integer multiple
of alignment. It is enforced by AddressSanitizer but not by GNU libc
on Linux.

aligned_free(): Wrapper for the Microsoft _aligned_free() and
the standard free().

HAVE_ALIGNED_ALLOC: A new test. Unfortunately, support for
aligned_alloc() may still be missing on some platforms.
We will fall back to posix_memalign() for those cases.

HAVE_MEMALIGN: Remove, along with any use of the nonstandard memalign().

PFS_ALIGNEMENT (sic): Removed; we will use CPU_LEVEL1_DCACHE_LINESIZE.

PFS_ALIGNED: Defined using the C++11 keyword alignas.

buf_pool_t::page_hash_table::create(),
lock_sys_t::hash_table::create():
lock_sys_t::hash_table::resize(): Pad the allocation size to an
integer multiple of the alignment.

Reviewed by: Vladislav Vaintroub
2022-06-21 16:59:49 +03:00
Marko Mäkelä
2aed566d22 Cleanup: alignas(CPU_LEVEL1_DCACHE_LINESIZE)
Let us replace all use of MY_ALIGNED in InnoDB with C++11 alignas.

CACHE_LINE_SIZE: Replaced with CPU_LEVEL1_DCACHE_LINESIZE.
2022-04-14 10:40:26 +03:00
Sergei Golubchik
f92388fa14 MDEV-27900 fixes
* prevent infinite recursion in beyond-EOF reads (when pread returns 0)
* reduce code duplication

followup for d78173828e and f4fb6cb3fe
2022-03-25 20:33:42 +01:00
Daniel Black
bd1ba7801f Merge branch 10.5 into 10.6 2022-03-12 16:16:03 +11:00
Daniel Black
d78173828e MDEV-27900: aio handle partial reads/writes
As btrfs showed, a partial read of data in AIO /O_DIRECT circumstances can
really confuse MariaDB.

Filipe Manana (SuSE)[1] showed how database programmers can assume
O_DIRECT is all or nothing.

While a fix was done in the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we do only get a partial read/write.

Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.

[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a

Also spell synchronously correctly in other files.
2022-03-12 09:47:53 +11:00
Marko Mäkelä
03e4cb2484 MDEV-24512 fixup: Remove after_task_callback
In commit ff5d306e29 we removed
dbug_after_task_callback but forgot to revert the rest of
commit bada05a883.
2021-09-14 16:23:23 +03:00
Marko Mäkelä
783625d78f MDEV-24883 add io_uring support for tpool
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.

aio_uring: class which wraps io_uring stuff

aio_uring::bind()/unbind(): optional optimization

aio_uring::submit_io(): mutex prevents data race. liburing calls are
thread-unsafe. But if you look into it's implementation you'll see
atomic operations. They're used for synchronization between kernel and
user-space only. That's why our own synchronization is still needed.

For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.

aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).

This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
2021-03-15 11:30:17 +02:00
Vladislav Vaintroub
3ee24b2306 Simplify clang workarounds. 2020-12-07 10:35:57 +01:00
Marko Mäkelä
f3a58ed801 MDEV-24295: Fix the non-clang build
Sorry, only tested commit 4174fc1a1b
on clang. Other compilers do not define __has_feature().
2020-12-02 22:04:57 +02:00
Marko Mäkelä
4174fc1a1b MDEV-24295: Fix the WITH_MSAN build
For some reason, commit 5bb5d4ad3a
made clang++-11 unhappy about a constexpr declaration.
2020-12-02 21:46:01 +02:00
Vladislav Vaintroub
5bb5d4ad3a MDEV-24295 Reduce wakeups by tpool maintenance timer, when server is idle
If maintenance timer does not do much for prolonged time, it will
wake up less frequently, once every 4 seconds instead of once every 0.4
second.

It will wakeup more often if thread creation is throttled, to avoid stalls.
2020-11-30 16:46:05 +01:00
Monty
279b5f87de Avoid some DBUG prints from idle server in thread pool 2020-11-26 19:13:37 +02:00
Vladislav Vaintroub
2de95f7a1e Fix misspelling.
Kudos to Marko for finding.
2020-11-25 13:07:49 +01:00
Vicențiu Ciorbaru
efa9079fbd Fix compilation error due to type mismatch in tpool_generic.cc
size_t compared to int
2020-02-13 13:42:01 +02:00
Vladislav Vaintroub
b19760b843 MDEV-21551 : Assertion `m_active_threads.size() >= m_long_tasks_count + m_waiting_task_count' failed"
Happened when running innodb_fts.sync_ddl

m_long_task_count could be wrongly reset to 0, if m_task_queue is
empty.
2020-01-23 15:23:46 +01:00
Vladislav Vaintroub
fde1589f9b MDEV-21551 Fix race condition in thread_pool_generic::wait_begin()
While waiting for mutex, thread_pool_generic::wait_begin(),
current task can be marked long-running. This is done by periodic
mantainence task, that runs in parallel.

Fix to recheck is_long_task() after the mutex acquisition.
2020-01-22 19:36:08 +01:00
Marko Mäkelä
588eac58fd MDEV-21551: Fix -Wsign-compare
An assertion added in commit c20bf8fd49
includes a sign mismatch. Make the affected data members unsigned.
2020-01-22 10:06:07 +02:00
Vladislav Vaintroub
c20bf8fd49 MDEV-21551 Fix calculation of current concurrency level in
maybe_wake_or_create_thread()

A task that is executed,could be counted as waiting (after wait_begin()
before wait_end()) or as long-running (callback runs for a long time).

If task is both marked waiting and long running, then calculation of
current concurrency (# of executing tasks - # of long tasks - #of waiting tasks)
is wrong, as task is counted twice.

Thus current concurrency could go negative, but with unsigned arithmetic
it will become a huge number.

As a result, maybe_wake_or_create_thread() would neither wake or create
a thread, when it should. Which may result in a deadlock.
2020-01-22 00:01:25 +01:00
Vladislav Vaintroub
508bc20a85 tpool - misc fixes 2020-01-12 21:34:59 +01:00
Vladislav Vaintroub
c27577a1ad MDEV-21326 : Address TSAN warnings in tpool.
1. Fix places where data race warnings were relevant.

tls_worker_data::m_state should be modified under mutex protection,
since both maintainence timer and current worker set this flag.

2. Suppress warnings that are legitimate, yet harmless.
Apparently, the dirty reads in waitable_task::get_ref_count() or
write_slots->pending_io_count()

Avoiding race entirely without side-effects here is tricky,
and the effects of race is harmless.

The worst thing that can happen due to race is an extra wait notification,
under rare circumstances.
2020-01-12 20:30:26 +01:00
Vladislav Vaintroub
bada05a883 tpool - implement post-task callback (for Innodb debugging) 2020-01-12 19:08:02 +01:00
Vladislav Vaintroub
66de4fef76 MDEV-16264 - some improvements
- wait notification, tpool_wait_begin/tpool_wait_end - to notify the
threadpool that current thread is going to wait

Use it to wait for IOs to complete and also when purge waits for workers.
2019-12-09 21:12:13 +01:00
Marko Mäkelä
8040998624 MDEV-16264: Fix some white space 2019-11-15 19:55:13 +02:00
Vladislav Vaintroub
00ee8d85c9 MDEV-16264: Add threadpool library
The library is capable of
- asynchronous execution of tasks (and optionally waiting for them)
- asynchronous file IO
  This is implemented using libaio on Linux and completion ports on
  Windows. Elsewhere, async io is "simulated", which means worker threads
  are performing synchronous IO.
- timers, scheduling work asynchronously in some point of the future.
  Also periodic timers are implemented.
2019-11-15 16:50:22 +01:00