92 commits

Author SHA1 Message Date
Marko Mäkelä
f2bd662f6c Merge 10.6 into 10.11 2023-11-22 18:14:11 +02:00
Marko Mäkelä
d963584d4c Merge 10.5 into 10.6 2023-11-22 16:56:47 +02:00
Marko Mäkelä
78c9a12c8f MDEV-32861 InnoDB hangs when running out of I/O slots
When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
and the server is run with the minimum parameters
innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
were observed.

tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
will be woken up when the cache previously was empty.

buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
batch so that os_aio_wait_until_no_pending_writes() has a chance of
returning. Add a Boolean parameter and pass wait_for_reads=false inside
buf_page_decrypt_after_read(), because those calls will be executed
inside a read completion callback, and therefore
os_aio_wait_until_no_pending_reads() would block indefinitely.
2023-11-22 16:54:41 +02:00
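The put()/get() wake-up described in 78c9a12c8f can be sketched roughly as follows. This is a simplified stand-in, not the MariaDB code: the class name `slot_cache`, its members, and `demo_wakeup()` are hypothetical, and the sketch assumes a plain mutex plus condition variable.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for a blocking object cache.
template <typename T>
class slot_cache
{
  std::mutex m_mtx;
  std::condition_variable m_cv;
  std::vector<T> m_storage;  // owns the elements; never resized
  std::vector<T*> m_free;    // elements currently available to get()
public:
  explicit slot_cache(size_t n) : m_storage(n)
  {
    m_free.reserve(n);
    for (T &t : m_storage)
      m_free.push_back(&t);
  }

  // Blocks until an element is available.
  T *get()
  {
    std::unique_lock<std::mutex> lk(m_mtx);
    m_cv.wait(lk, [this] { return !m_free.empty(); });
    T *t = m_free.back();
    m_free.pop_back();
    return t;
  }

  // Returns an element and wakes waiters if the cache was empty before.
  void put(T *t)
  {
    std::lock_guard<std::mutex> lk(m_mtx);
    assert(m_free.size() < m_storage.size());
    const bool was_empty = m_free.empty();
    m_free.push_back(t);
    if (was_empty)
      m_cv.notify_all();
  }

  size_t available()
  {
    std::lock_guard<std::mutex> lk(m_mtx);
    return m_free.size();
  }
};

// Usage sketch: a waiter blocked on an empty cache is released by put().
inline bool demo_wakeup()
{
  slot_cache<int> cache(1);
  int *slot = cache.get();  // the cache is now empty
  std::thread waiter([&cache] { cache.put(cache.get()); });
  cache.put(slot);          // must wake the waiter
  waiter.join();
  return cache.available() == 1;
}
```

With a single slot, as in the OS_AIO_N_PENDING_IOS_PER_THREAD=1 experiment, a missed notification in put() would leave the waiter in get() blocked forever.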
Marko Mäkelä
2ecc0443ec Merge 10.10 into 10.11 2023-10-17 16:04:21 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Vladislav Vaintroub
9e62ab7aaf MDEV-31095 tpool - do not create new worker, if thread creation is pending.
Use an std::atomic_flag to track thread creation in progress.
This is mainly a cleanup; the effect of this change was not measurable
in my tests.
2023-10-04 17:44:13 +02:00
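A minimal sketch of the idea in 9e62ab7aaf, tracking a pending thread creation with std::atomic_flag. The class `thread_creation_gate` and its method names are hypothetical and do not reflect the actual tpool code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch; names are illustrative, not the tpool API.
class thread_creation_gate
{
  std::atomic_flag m_pending = ATOMIC_FLAG_INIT;
public:
  // Returns true if the caller may create a worker now,
  // false if another thread creation is still in progress.
  bool try_begin()
  {
    return !m_pending.test_and_set(std::memory_order_acquire);
  }

  // Called once the new worker is running (or creation failed).
  void end() { m_pending.clear(std::memory_order_release); }
};

// Usage sketch.
inline void demo_gate()
{
  thread_creation_gate gate;
  assert(gate.try_begin());   // first submitter may create a thread
  assert(!gate.try_begin());  // second submitter sees creation pending
  gate.end();
  assert(gate.try_begin());   // allowed again after the worker started
}
```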
Vladislav Vaintroub
e33e2fa949 MDEV-31095 tpool - restrict threadpool concurrency during bufferpool load
Add threadpool functionality to restrict concurrency during "batch"
periods (where tasks are added in rapid succession).
This will throttle thread creation more aggressively than usual, while
keeping performance at least on par.

One of these cases is bufferpool load, where async read IOs are executed
without any throttling. There can be as many as 650K read IOs for
loading a 10GB buffer pool.

Another one is recovery, where "fake read" IOs are executed.

Why are there more threads than we expect?
Worker threads are not recognized as idle until they return to the
standby list, and to return to that list, they need to acquire the
mutex currently held in submit_task(). In those cases, submit_task()
has no worker to wake, and would create threads until the default concurrency
level (2*ncpus) is satisfied. Only after that would throttling happen.
2023-10-04 17:44:02 +02:00
Marko Mäkelä
656c2e18b1 Merge 10.10 into 10.11 2023-04-14 13:08:28 +03:00
Marko Mäkelä
e552747cfd Merge 10.6 into 10.8 2023-04-13 15:52:46 +03:00
Marko Mäkelä
f50abab195 MDEV-31048 PERFORMANCE_SCHEMA lacks InnoDB read_slots and write_slots
tpool::cache::m_mtx: Add PERFORMANCE_SCHEMA instrumentation
(wait/synch/mutex/innodb/tpool_cache_mutex). This covers the
InnoDB read_slots and write_slots for asynchronous data page I/O.
2023-04-13 15:18:26 +03:00
Marko Mäkelä
1d1e0ab2cc Merge 10.6 into 10.8 2023-04-12 15:50:08 +03:00
Marko Mäkelä
a091d6ac4e MDEV-26827 fixup: Do not duplicate io_slots::pending_io_count()
os_aio_pending_reads_approx(), os_aio_pending_reads(): Replaces
buf_pool.n_pend_reads.

os_aio_pending_writes(): Replaces buf_dblwr.pending_writes().

buf_dblwr_t::write_cond, buf_dblwr_t::writes_pending: Remove.
2023-04-12 13:49:57 +03:00
Marko Mäkelä
1fd0099839 Merge 10.10 into 10.11 2023-02-16 11:41:18 +02:00
Marko Mäkelä
dbab3e8d90 Merge 10.6 into 10.8 2023-02-10 13:43:53 +02:00
Marko Mäkelä
6aec87544c Merge 10.5 into 10.6 2023-02-10 13:03:01 +02:00
Oleksandr Byelkin
c7c415734d Merge branch '10.10' into 10.11 2023-01-31 11:07:08 +01:00
Oleksandr Byelkin
b923b80cfd Merge branch '10.6' into 10.7 2023-01-31 09:33:58 +01:00
Khem Raj
75bbf645a6 Add missing include <cstdio>
This is needed with GCC 13 and newer [1]

[1] https://www.gnu.org/software/gcc/gcc-13/porting_to.html

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2023-01-27 12:43:38 +11:00
Heiko Becker
15226a2822 Add missing include for std::runtime_error
Fixes the following error when building with gcc 13:

"tpool/aio_liburing.cc:64:18: error: 'runtime_error' is not a member of 'std'
   64 |       throw std::runtime_error("aio_uring()");"
2023-01-25 17:30:18 +11:00
Marko Mäkelä
3ec4241b00 Merge 10.10 into 10.11 2022-09-07 10:14:41 +03:00
Marko Mäkelä
0c0b697ae3 Merge 10.6 into 10.7 2022-09-07 08:56:06 +03:00
Daniel Black
fd8dbe0d2c MDEV-29443: prevent uring access to galera sst /notify scripts
Resources like uring in MariaDB aren't intended for spawned
processes, so we restrict access using the io_uring_ring_dontfork
liburing library call.
2022-09-06 08:03:49 +10:00
Marko Mäkelä
fe1f8f2c6b Merge 10.10 into 10.11 2022-08-30 13:36:30 +03:00
Marko Mäkelä
b86be02ecf Merge 10.6 into 10.7 2022-08-30 13:02:42 +03:00
Marko Mäkelä
76bb671e42 Merge 10.5 into 10.6 2022-08-25 16:02:44 +03:00
Vladislav Vaintroub
a3fd9e6b06 MDEV-29367 Refactor tpool::cache
Removed use of std::vector's push_back(), pop_back() to make it more
obvious that memory in the vectors won't be reallocated.

Also, "borrowed" elements can be debugged a little better now,
they are put into the start of the m_cache vector.
2022-08-24 13:36:49 +02:00
Vladislav Vaintroub
c8e3bcf79b MDEV-11026 Make InnoDB number of IO write/read threads dynamic
Fix concurrency error: avoid accessing deleted memory when io_slots is
resized. The deleted memory in this case was the vftable pointer in
aiocb::m_internal_task.

The fix avoids calling the dummy release function, via a flag in task_group.
2022-06-27 12:00:31 +02:00
Vladislav Vaintroub
49e660bb12 MDEV-11026 Make InnoDB number of IO write/read threads dynamic
Resize the read/write slots, and recreate the io_context (for Linux libaio)
2022-06-27 11:59:20 +02:00
Marko Mäkelä
5d0496c749 Merge 10.6 into 10.7 2022-06-23 13:20:25 +03:00
Vladislav Vaintroub
eb7f46ca1e Merge remote-tracking branch 'origin/10.5' into 10.6 2022-06-23 06:29:57 +02:00
Vladislav Vaintroub
35f2cdcb99 MDEV-28920 Rescheduling of innodb_stats_func() missing
Fixed tpool timer implementation on POSIX.
Prior to this patch, under some specific rare circumstances (concurrency
related), timer callback execution might be skipped.
2022-06-23 05:53:55 +02:00
Marko Mäkelä
6680fd8d4b Merge 10.6 into 10.7 2022-06-21 18:02:41 +03:00
Marko Mäkelä
3794673111 MDEV-28836: Memory alignment cleanup
Table_cache_instance: Define the structure aligned at
the CPU cache line, and remove a pad[] data member.
Krunal Bauskar reported this to improve performance on ARMv8.

aligned_malloc(): Wrapper for the Microsoft _aligned_malloc()
and the ISO/IEC 9899:2011 <stdlib.h> aligned_alloc().
Note: The parameters are in the Microsoft order (size, alignment),
opposite of aligned_alloc(alignment, size).
Note: The standard defines that size must be an integer multiple
of alignment. It is enforced by AddressSanitizer but not by GNU libc
on Linux.

aligned_free(): Wrapper for the Microsoft _aligned_free() and
the standard free().

HAVE_ALIGNED_ALLOC: A new test. Unfortunately, support for
aligned_alloc() may still be missing on some platforms.
We will fall back to posix_memalign() for those cases.

HAVE_MEMALIGN: Remove, along with any use of the nonstandard memalign().

PFS_ALIGNEMENT (sic): Removed; we will use CPU_LEVEL1_DCACHE_LINESIZE.

PFS_ALIGNED: Defined using the C++11 keyword alignas.

buf_pool_t::page_hash_table::create(),
lock_sys_t::hash_table::create(),
lock_sys_t::hash_table::resize(): Pad the allocation size to an
integer multiple of the alignment.

Reviewed by: Vladislav Vaintroub
2022-06-21 16:59:49 +03:00
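The aligned_malloc()/aligned_free() wrappers from 3794673111 might look roughly like the sketch below. The `_sketch` suffixes and the helper `pad_to_alignment()` are hypothetical names, the alignment is assumed to be a power of two, and the posix_memalign() fallback mentioned in the commit is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>
#ifdef _WIN32
# include <malloc.h>
#endif

// Round size up to the next multiple of alignment (a power of two).
inline size_t pad_to_alignment(size_t size, size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Parameters are in the Microsoft order: (size, alignment).
inline void *aligned_malloc_sketch(size_t size, size_t alignment)
{
#ifdef _WIN32
  return _aligned_malloc(size, alignment);
#else
  // ISO C11 order is the opposite: aligned_alloc(alignment, size).
  // The caller is expected to pass a size that is a multiple of alignment.
  return aligned_alloc(alignment, size);
#endif
}

inline void aligned_free_sketch(void *ptr)
{
#ifdef _WIN32
  _aligned_free(ptr);
#else
  free(ptr);
#endif
}

// Usage sketch: pad the size first, as the hash table callers do.
inline bool demo_aligned_malloc()
{
  const size_t alignment = 64;
  const size_t size = pad_to_alignment(100, alignment);  // 128
  void *p = aligned_malloc_sketch(size, alignment);
  const bool ok =
    p != nullptr && reinterpret_cast<uintptr_t>(p) % alignment == 0;
  aligned_free_sketch(p);
  return ok;
}
```

Padding in the caller rather than inside the wrapper keeps the wrapper a thin pass-through, which matches the commit's note that the size rule is enforced by AddressSanitizer but not by GNU libc.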
Marko Mäkelä
712b443a3c Merge 10.6 into 10.7 2022-06-02 07:48:30 +03:00
Marko Mäkelä
db0fde3f24 MDEV-28665 aio_uring::thread_routine terminates prematurely, causing hang
aio_uring::thread_routine(): Handle -EINTR from io_uring_wait_cqe()
in the same way as aio_linux::getevent_thread_routine() does it:
simply ignore it and invoke the system call again.

Reviewed by: Vladislav Vaintroub
2022-05-25 13:18:24 +03:00
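The retry described in db0fde3f24 can be illustrated without liburing by a generic helper. This is only a sketch: `retry_on_eintr()` is a hypothetical name, and the callable stands in for a function such as io_uring_wait_cqe() that reports failure as a negative errno value.

```cpp
#include <cassert>
#include <cerrno>

// Invoke wait() again for as long as it reports -EINTR;
// any other result (success or a different error) is returned as is.
template <typename Wait>
int retry_on_eintr(Wait wait)
{
  int ret;
  do
    ret = wait();
  while (ret == -EINTR);
  return ret;
}

// Usage sketch: the fake call is interrupted twice, then succeeds.
inline int demo_retry(int &calls)
{
  calls = 0;
  return retry_on_eintr([&calls] { return ++calls < 3 ? -EINTR : 0; });
}
```

Treating -EINTR as fatal would end the completion thread while requests are still in flight, which is the premature termination and hang named in the commit title.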
Sergei Golubchik
fd132be117 Merge branch '10.6' into 10.7 2022-05-11 11:25:33 +02:00
Daniel Black
6350a52445 tpool: liburing typo in error
Also the ENOSYS is more likely explained by seccomp
filters in containers than a pre-5.1 kernel, so include
both.
2022-04-27 09:56:28 +10:00
Marko Mäkelä
c235295525 Merge 10.6 into 10.7 2022-04-14 13:31:07 +03:00
Marko Mäkelä
2aed566d22 Cleanup: alignas(CPU_LEVEL1_DCACHE_LINESIZE)
Let us replace all use of MY_ALIGNED in InnoDB with C++11 alignas.

CACHE_LINE_SIZE: Replaced with CPU_LEVEL1_DCACHE_LINESIZE.
2022-04-14 10:40:26 +03:00
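A small illustration of the alignas usage adopted in 2aed566d22. The constant `cache_line_size` (64 bytes) is an assumption standing in for CPU_LEVEL1_DCACHE_LINESIZE, and `padded_counter` is a hypothetical type.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 64 bytes stands in for CPU_LEVEL1_DCACHE_LINESIZE.
constexpr size_t cache_line_size = 64;

// Each object starts on its own cache line; the compiler pads the
// size to a multiple of the alignment, so no pad[] member is needed.
struct alignas(cache_line_size) padded_counter
{
  unsigned long value = 0;
};

static_assert(alignof(padded_counter) == cache_line_size,
              "object must start at a cache line boundary");
static_assert(sizeof(padded_counter) % cache_line_size == 0,
              "array elements must not share a cache line");
```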
Marko Mäkelä
a4d753758f Merge 10.6 into 10.7 2022-03-30 08:52:05 +03:00
Sergei Golubchik
f92388fa14 MDEV-27900 fixes
* prevent infinite recursion in beyond-EOF reads (when pread returns 0)
* reduce code duplication

followup for d78173828e and f4fb6cb3fe
2022-03-25 20:33:42 +01:00
Marko Mäkelä
e67d46e4a1 Merge 10.6 into 10.7 2022-03-14 11:30:32 +02:00
Daniel Black
f4fb6cb3fe MDEV-27900: aio handle partial reads/writes (uring)
MDEV-27900 continued for uring.

Also spell synchronously correctly in sql_parse.cc.

Reviewed by Wlad.
2022-03-12 16:16:47 +11:00
Daniel Black
bd1ba7801f Merge branch 10.5 into 10.6 2022-03-12 16:16:03 +11:00
Daniel Black
d78173828e MDEV-27900: aio handle partial reads/writes
As btrfs showed, a partial read of data in AIO/O_DIRECT circumstances can
really confuse MariaDB.

Filipe Manana (SuSE)[1] showed how database programmers can assume
O_DIRECT is all or nothing.

While a fix was done on the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we only get a partial read/write.

Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.

[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a

Also spell synchronously correctly in other files.
2022-03-12 09:47:53 +11:00
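The approach in d78173828e, finishing a partially completed request synchronously, can be sketched for the read case with POSIX pread(). The function `finish_partial_read()` and the demo are hypothetical; the end-of-file check mirrors the later follow-up f92388fa14 (pread returning 0).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <sys/types.h>
#include <unistd.h>

// An asynchronous read of len bytes at offset delivered only `done` bytes.
// Read the remainder synchronously. Returns the total number of bytes
// in buf, or -1 with errno set if a synchronous read fails.
inline ssize_t finish_partial_read(int fd, char *buf, size_t len,
                                   off_t offset, size_t done)
{
  while (done < len)
  {
    ssize_t n = pread(fd, buf + done, len - done,
                      offset + static_cast<off_t>(done));
    if (n < 0)
    {
      if (errno == EINTR)
        continue;
      return -1;  // the caller now has a concrete error to handle
    }
    if (n == 0)
      break;      // end of file: stop instead of retrying forever
    done += static_cast<size_t>(n);
  }
  return static_cast<ssize_t>(done);
}

// Usage sketch on a temporary file holding 11 bytes.
inline bool demo_finish_partial_read()
{
  FILE *f = tmpfile();
  if (!f)
    return false;
  fputs("hello world", f);
  fflush(f);
  char buf[16] = "hello";  // pretend the async read delivered 5 bytes
  ssize_t n = finish_partial_read(fileno(f), buf, 11, 0, 5);
  bool ok = n == 11 && memcmp(buf, "hello world", 11) == 0;
  // Asking for more than the file holds stops at EOF.
  ok = ok && finish_partial_read(fileno(f), buf, sizeof buf, 0, 11) == 11;
  fclose(f);
  return ok;
}
```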
Sergei Golubchik
65f602310c Merge branch '10.6' into 10.7 2022-02-10 21:16:50 +01:00
Sergei Golubchik
e3894f5d39 Merge branch '10.5' into 10.6 2022-02-10 21:07:03 +01:00
Vladislav Vaintroub
012e724deb MDEV-27796 Windows - starting server with huge innodb-log-buffer-size may fail
Fixed tpool::pread() and tpool::pwrite() to return SSIZE_T on Windows,
so that huge numbers are not converted to negatives.

Also, make sure to never attempt reading/writing more bytes than
DWORD can accommodate (4G).
2022-02-10 17:25:12 +01:00
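The size limit from 012e724deb can be illustrated portably. This sketch assumes the underlying call takes a 32-bit unsigned length, as a Win32 DWORD is; `clamp_io_size()` and `max_io_chunk` are hypothetical names, not the tpool functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Largest byte count that fits in a 32-bit unsigned length (DWORD).
constexpr uint64_t max_io_chunk = std::numeric_limits<uint32_t>::max();

// Never request more bytes in one call than a DWORD can express.
inline uint32_t clamp_io_size(uint64_t requested)
{
  return static_cast<uint32_t>(std::min(requested, max_io_chunk));
}

// The result is returned in a 64-bit signed type, so a count above
// 2^31 is not misread as a negative (error) value.
inline int64_t bytes_transferred(uint32_t count)
{
  return static_cast<int64_t>(count);
}
```

A caller that needs more than one chunk would loop, advancing the offset by the number of bytes actually transferred.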
Marko Mäkelä
7e8a13d9d7 Merge 10.6 into 10.7 2021-11-19 17:45:52 +02:00
Marko Mäkelä
db915f7387 MDEV-27058: Move buf_page_t::slot to IORequest::slot
MDEV-23855 and MDEV-23399 already moved some transient data fields
from buffer pool page descriptors to IORequest, but the write buffer
of PAGE_COMPRESSED or ENCRYPTED tables was missed. Since it is only needed
during asynchronous page write requests, it belongs to IORequest.
2021-11-18 17:44:33 +02:00