Commit graph

9 commits

Author SHA1 Message Date
Marko Mäkelä
783625d78f MDEV-24883 add io_uring support for tpool
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.

aio_uring: class which wraps io_uring stuff

aio_uring::bind()/unbind(): optional optimization

aio_uring::submit_io(): mutex prevents data race. liburing calls are
thread-unsafe. But if you look into it's implementation you'll see
atomic operations. They're used for synchronization between kernel and
user-space only. That's why our own synchronization is still needed.

For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.

aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).

This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
2021-03-15 11:30:17 +02:00
Vladislav Vaintroub
1435f35bda Clarify some comments.
- the intention for my_getevents syscall is now better explained,
why are we using it (to be able to interrupt io_getevents syscall via
io_destroy()).

- Fix comment for MAX_EVENTS in getevent_thread_routine.
MAX_EVENTS is more of less arbitrary constant, chosen such that events array
is big enough to get multiple simultaneous io completions, but small
enough so it does not blow the thread's stack.
2020-11-30 16:46:06 +01:00
Marko Mäkelä
f693b72547 MDEV-24270: Clarify some comments 2020-11-25 16:08:26 +02:00
Vladislav Vaintroub
c130c60b2b Cleanup. Provide accurate comment on my_getevents(). 2020-11-25 13:07:08 +01:00
Vladislav Vaintroub
78df9e37a6 Partially Revert "MDEV-24270: Collect multiple completed events at a time"
This partially reverts commit 6479006e14.

Remove the constant tpool::aio::N_PENDING, which has no
intrinsic meaning for the tpool.
2020-11-25 13:07:08 +01:00
Marko Mäkelä
6479006e14 MDEV-24270: Collect multiple completed events at a time
tpool::aio::N_PENDING: Replaces OS_AIO_N_PENDING_IOS_PER_THREAD.
This limits two similar things: the number of outstanding requests
that a thread may io_submit(), and the number of completed requests
collected at a time by io_getevents().
2020-11-25 09:42:38 +02:00
Marko Mäkelä
7a9405e3dc MDEV-24270 Misuse of io_getevents() causes wake-ups at least twice per second
In the asynchronous I/O interface, InnoDB is invoking io_getevents()
with a timeout value of half a second, and requesting exactly 1 event
at a time.

The reason to have such a short timeout is to facilitate shutdown.

We can do better: Use an infinite timeout, wait for a larger maximum
number of events. On shutdown, we will invoke io_destroy(), which
should lead to the io_getevents system call reporting EINVAL.

my_getevents(): Reimplement the libaio io_getevents() by only invoking
the system call. The library implementation would try to elide the
system call and return 0 immediately if aio_ring_is_empty() holds.
Here, we do want a blocking system call, not 100% CPU usage. Neither
do we want the aio_ring_is_empty() trigger SIGSEGV because it is
dereferencing some memory that was freed by io_destroy().
2020-11-25 09:40:12 +02:00
Marko Mäkelä
57444a3b30 MDEV-16264: Minor cleanup
aio_linux::m_max_io_count: Unused data member; remove.

aiocb::m_ret_len: Declare as the more compatible type size_t.
Unfortunately, ssize_t is not available on Microsoft Visual Studio.
2019-12-03 11:05:18 +02:00
Vladislav Vaintroub
00ee8d85c9 MDEV-16264: Add threadpool library
The library is capable of
- asynchronous execution of tasks (and optionally waiting for them)
- asynchronous file IO
  This is implemented using libaio on Linux and completion ports on
  Windows. Elsewhere, async io is "simulated", which means worker threads
  are performing synchronous IO.
- timers, scheduling work asynchronously in some point of the future.
  Also periodic timers are implemented.
2019-11-15 16:50:22 +01:00