12 commits

Author SHA1 Message Date
Sergei Golubchik
f92388fa14 MDEV-27900 fixes
* prevent infinite recursion in beyond-EOF reads (when pread returns 0)
* reduce code duplication

followup for d78173828e and f4fb6cb3fe
2022-03-25 20:33:42 +01:00
Daniel Black
bd1ba7801f Merge branch 10.5 into 10.6 2022-03-12 16:16:03 +11:00
Daniel Black
d78173828e MDEV-27900: aio handle partial reads/writes
As btrfs showed, a partial read of data in AIO/O_DIRECT circumstances can
really confuse MariaDB.

Filipe Manana (SuSE)[1] showed how database programmers can assume
O_DIRECT is all or nothing.

While a fix was done in the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we do only get a partial read/write.

Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.

[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a

Also spell synchronously correctly in other files.
2022-03-12 09:47:53 +11:00
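The approach described in d78173828e (finish a partially completed request synchronously) and the pread-returns-0 guard from f92388fa14 can be sketched as follows. This is a minimal illustration, not the tpool code: finish_read() and finish_read_fd() are hypothetical names, and the reader callback is an assumption made so that the loop can be exercised without a real file.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not the tpool API: complete a request of `len` bytes
// of which `done` bytes were already transferred asynchronously.
// `read_at(dst, count, offset)` must behave like pread(): it returns the
// number of bytes read, 0 at end of file, or -1 with errno set.
template <typename ReadAt>
ssize_t finish_read(ReadAt read_at, char *buf, size_t len, off_t offset,
                    size_t done)
{
  assert(done <= len);
  while (done < len)
  {
    ssize_t n= read_at(buf + done, len - done, offset + (off_t) done);
    if (n < 0)
    {
      if (errno == EINTR)
        continue;   // interrupted before any data was transferred: retry
      return -1;    // a concrete error for the caller to handle
    }
    if (n == 0)
      break;        // beyond EOF: stop rather than retry forever
    done+= (size_t) n;
  }
  return (ssize_t) done;
}

// Convenience wrapper (also hypothetical) that uses pread() on a descriptor.
inline ssize_t finish_read_fd(int fd, char *buf, size_t len, off_t offset,
                              size_t done)
{
  return finish_read(
      [fd](char *dst, size_t count, off_t off) -> ssize_t
      { return pread(fd, dst, count, off); },
      buf, len, offset, done);
}
```

A completion handler that saw a short transfer could call finish_read_fd(fd, buf, len, offset, bytes_done) and treat a result smaller than len as a read that reached end of file.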
Marko Mäkelä
783625d78f MDEV-24883 add io_uring support for tpool
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.

aio_uring: class which wraps io_uring stuff

aio_uring::bind()/unbind(): optional optimization

aio_uring::submit_io(): mutex prevents data race. liburing calls are
thread-unsafe. But if you look into its implementation you'll see
atomic operations. They're used for synchronization between kernel and
user-space only. That's why our own synchronization is still needed.

For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.

aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).

This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
2021-03-15 11:30:17 +02:00
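Two points from 783625d78f, serializing submissions with a mutex and tolerating -EINTR from the wait call, can be sketched without liburing. This is only an illustration under stated assumptions: wait_tolerating_eintr() and serialized_submitter are hypothetical names, and the callables stand in for io_uring_wait_cqe() and the submission calls, which report failure as a negative errno value.

```cpp
#include <cassert>
#include <cerrno>
#include <mutex>

// Hypothetical helper: wait_fn() stands in for io_uring_wait_cqe(), which
// returns 0 on success or a negative errno value. An -EINTR result is not
// treated as an error; the wait is simply repeated.
template <typename WaitFn>
int wait_tolerating_eintr(WaitFn wait_fn)
{
  for (;;)
  {
    int ret= wait_fn();
    if (ret != -EINTR)
      return ret;   // success, or an error other than an interruption
  }
}

// Hypothetical wrapper: submit_fn() stands in for the thread-unsafe calls
// that prepare and submit a request. The mutex ensures that only one thread
// at a time touches the submission queue.
class serialized_submitter
{
  std::mutex m_mutex;
public:
  template <typename SubmitFn>
  int submit(SubmitFn submit_fn)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    return submit_fn();
  }
};
```

The lock covers the whole prepare-and-submit sequence because, as the commit message notes, the atomic operations inside the library only synchronize with the kernel, not between user-space threads.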
Vladislav Vaintroub
1435f35bda Clarify some comments.
- Better explain why we use the my_getevents syscall wrapper: to be able
to interrupt the io_getevents syscall via io_destroy().

- Fix comment for MAX_EVENTS in getevent_thread_routine.
MAX_EVENTS is a more or less arbitrary constant, chosen such that the events
array is big enough to get multiple simultaneous io completions, but small
enough so it does not blow the thread's stack.
2020-11-30 16:46:06 +01:00
Marko Mäkelä
f693b72547 MDEV-24270: Clarify some comments 2020-11-25 16:08:26 +02:00
Vladislav Vaintroub
c130c60b2b Cleanup. Provide accurate comment on my_getevents(). 2020-11-25 13:07:08 +01:00
Vladislav Vaintroub
78df9e37a6 Partially Revert "MDEV-24270: Collect multiple completed events at a time"
This partially reverts commit 6479006e14.

Remove the constant tpool::aio::N_PENDING, which has no
intrinsic meaning for the tpool.
2020-11-25 13:07:08 +01:00
Marko Mäkelä
6479006e14 MDEV-24270: Collect multiple completed events at a time
tpool::aio::N_PENDING: Replaces OS_AIO_N_PENDING_IOS_PER_THREAD.
This limits two similar things: the number of outstanding requests
that a thread may io_submit(), and the number of completed requests
collected at a time by io_getevents().
2020-11-25 09:42:38 +02:00
Marko Mäkelä
7a9405e3dc MDEV-24270 Misuse of io_getevents() causes wake-ups at least twice per second
In the asynchronous I/O interface, InnoDB is invoking io_getevents()
with a timeout value of half a second, and requesting exactly 1 event
at a time.

The reason to have such a short timeout is to facilitate shutdown.

We can do better: Use an infinite timeout, wait for a larger maximum
number of events. On shutdown, we will invoke io_destroy(), which
should lead to the io_getevents system call reporting EINVAL.

my_getevents(): Reimplement the libaio io_getevents() by only invoking
the system call. The library implementation would try to elide the
system call and return 0 immediately if aio_ring_is_empty() holds.
Here, we do want a blocking system call, not 100% CPU usage. Neither
do we want aio_ring_is_empty() to trigger SIGSEGV because it is
dereferencing some memory that was freed by io_destroy().
2020-11-25 09:40:12 +02:00
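The idea behind my_getevents() in 7a9405e3dc, calling the io_getevents system call directly instead of going through the library wrapper, can be sketched as below. This is a Linux-only illustration, not the actual function: the name getevents_blocking() and the choice to return a negative errno value on failure are assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical sketch: always enter the kernel, so that the call blocks
// until at least min_nr events have completed. A null timeout means
// "wait forever". Returns the number of events stored in `events`, or a
// negative errno value on failure (for example -EINVAL for a context that
// is not valid, such as one that has been destroyed).
static int getevents_blocking(aio_context_t ctx, long min_nr, long nr,
                              io_event *events)
{
  assert(min_nr <= nr);
  long ret= syscall(__NR_io_getevents, ctx, min_nr, nr, events, nullptr);
  return ret < 0 ? -errno : (int) ret;
}
```

A collector thread could call this in a loop with min_nr set to 1 and a larger nr, and leave the loop when the call fails after shutdown has invoked io_destroy().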
Marko Mäkelä
57444a3b30 MDEV-16264: Minor cleanup
aio_linux::m_max_io_count: Unused data member; remove.

aiocb::m_ret_len: Declare as the more compatible type size_t.
Unfortunately, ssize_t is not available on Microsoft Visual Studio.
2019-12-03 11:05:18 +02:00
Vladislav Vaintroub
00ee8d85c9 MDEV-16264: Add threadpool library
The library is capable of
- asynchronous execution of tasks (and optionally waiting for them)
- asynchronous file IO
  This is implemented using libaio on Linux and completion ports on
  Windows. Elsewhere, async io is "simulated", which means worker threads
  are performing synchronous IO.
- timers, scheduling work asynchronously at some point in the future.
  Also periodic timers are implemented.
2019-11-15 16:50:22 +01:00