MDEV-31953 madvise(..., MADV_FREE) is causing a performance regression
buf_page_t::set_os_unused(): Remove the system call that had been added in
commit 16c9718758cb3bbff76672405d4ce1bce6da6c6f and revised in
commit c1fd082e9c7369f4511eb5a52e58cb15489caa74 for Microsoft Windows.
buf_pool_t::garbage_collect(): A new function to collect any garbage
from the InnoDB buffer pool that can be removed without writing any
log or data files. This will also invoke madvise() for all of buf_pool.free.
To trigger this the following MDEV is implemented:
MDEV-24670 avoid OOM by linux kernel co-operative memory management
To avoid the frequent triggers that caused the MDEV-31953 regression, while
still preserving the 10.11 functionality of non-greedy kernel memory
usage, memory pressure triggers are used.
When the Linux kernel supports memory pressure notifications and reports
pressure, garbage collection of the InnoDB buffer pool is triggered.
The hard-coded triggers fire when there is:
* some memory pressure in 5 of the last 10 seconds
* a full stall on memory pressure for 10ms in the last 2 seconds
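As a rough illustration of the two thresholds above, the sketch below formats them in the kernel's PSI trigger syntax, `<some|full> <stall in us> <window in us>`. The helper name `psi_trigger` and the `TRIGGERS` list are hypothetical and only show the arithmetic; the exact strings used by the server are assumed from the description in this commit message.

```python
# Sketch only: helper and constant names are hypothetical, not taken
# from the MariaDB source.

def psi_trigger(kind: str, stall_ms: int, window_ms: int) -> str:
    """Format one PSI trigger line; the kernel expects microseconds."""
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    if stall_ms > window_ms:
        raise ValueError("stall threshold cannot exceed the window")
    return f"{kind} {stall_ms * 1000} {window_ms * 1000}"


# The two thresholds listed above.
TRIGGERS = [
    psi_trigger("some", 5_000, 10_000),  # some pressure for 5 s of the last 10 s
    psi_trigger("full", 10, 2_000),      # full stall for 10 ms of the last 2 s
]
```

With the kernel PSI interface, each such string is written to an open file descriptor for a pressure file, and the descriptor is then polled for priority events.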
The kernel will trigger only once in each of these time windows. To avoid
MariaDB being in a constant state of memory garbage collection, this has
been limited to once per minute.
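The once-per-minute limit can be pictured as a small rate limiter in front of the garbage collection call. This is a minimal sketch under stated assumptions: the class name `GcRateLimiter` and the injectable clock are illustrative and do not reflect how the server actually implements the limit.

```python
import time


class GcRateLimiter:
    """Allow an action at most once per min_interval_s seconds (hypothetical helper)."""

    def __init__(self, min_interval_s=60.0, clock=time.monotonic):
        self._min_interval = min_interval_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._last = None    # time of the last permitted run, if any

    def should_run(self) -> bool:
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False  # an event arrived too soon after the last run
        self._last = now
        return True
```

A pressure event handler would call `should_run()` first and skip garbage collection when it returns `False`.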
For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring
CAP_SYS_RESOURCE that was lifted[1] to support the use case of user
memory pressure. It is not currently possible to set CAP_SYS_RESOURCE in
a systemd service, as that means setting a capability inside a user namespace.
Running under systemd v254+ requires the default MemoryPressureWatch=auto
(or alternatively "on").
Functionality was tested successfully on Fedora with a 6.4 kernel, running
as a systemd service.
Running in a container requires that (unmask=)/sys/fs/cgroup be writable
by the mariadbd process.
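To make the container requirement more concrete, the sketch below derives the cgroup v2 `memory.pressure` path for a process from the contents of `/proc/self/cgroup`. The function name `memory_pressure_path` is hypothetical, and the assumption that the pressure file is looked up this way is inferred from the description here, not confirmed against the server source.

```python
import posixpath
from typing import Optional


def memory_pressure_path(proc_self_cgroup: str,
                         root: str = "/sys/fs/cgroup") -> Optional[str]:
    """Return the cgroup v2 memory.pressure path, or None if no v2 entry exists."""
    for line in proc_self_cgroup.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        # The cgroup v2 entry has hierarchy ID 0 and an empty controller list.
        if hierarchy_id == "0" and controllers == "":
            return posixpath.join(root, path.lstrip("/"), "memory.pressure")
    return None
```

A caller could then check `os.access(path, os.W_OK)` on the result, since registering a trigger requires write access to that file.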
To aid testing, buf_pool_resize was used as a convenient point at
which to trigger garbage collection.
ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427
Co-Author: Daniel Black (on memory pressure trigger)
Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin,
Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
2023-10-24 09:47:46 +03:00
--source include/have_debug.inc
2024-08-14 08:03:37 +10:00
--source include/have_cgroupv2.inc
2023-10-24 09:47:46 +03:00
--source include/not_embedded.inc
--source include/have_innodb.inc
--source include/have_sequence.inc

--echo #
--echo # MDEV-24670 avoid OOM by linux kernel co-operative memory management
--echo #

set @save_dbug=@@debug_dbug;

set @save_limit=@@GLOBAL.innodb_limit_optimistic_insert_debug;
2023-11-20 13:44:47 +02:00
# Wait for the undo logs to be empty from previous tests.
# This is not an actual parameter, so there is no need to restore it.
2023-10-24 09:47:46 +03:00
set GLOBAL innodb_max_purge_lag_wait=0;
2023-11-20 13:44:47 +02:00
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
2023-10-24 09:47:46 +03:00
SET GLOBAL innodb_limit_optimistic_insert_debug=2;
2023-11-20 13:44:47 +02:00
SET STATEMENT unique_checks=0, foreign_key_checks=0 FOR
INSERT INTO t1 SELECT * FROM seq_1_to_1000;
2023-10-24 09:47:46 +03:00
SET GLOBAL innodb_limit_optimistic_insert_debug=@save_limit;

DROP TABLE t1;
2024-10-03 10:55:08 +03:00
--disable_cursor_protocol
2023-10-24 09:47:46 +03:00
SELECT CAST(VARIABLE_VALUE AS INTEGER) INTO @dirty_prev
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty';
2024-10-03 10:55:08 +03:00
--enable_cursor_protocol
2023-10-24 09:47:46 +03:00
set debug_dbug="d,trigger_garbage_collection";
SET GLOBAL innodb_buffer_pool_size=@@innodb_buffer_pool_size;
2024-08-14 08:03:37 +10:00
let SEARCH_FILE= $MYSQLTEST_VARDIR/log/mysqld.1.err;
# either a fail or the pressure event
let SEARCH_PATTERN= [Mm]emory pressure.*;
--source include/search_pattern_in_file.inc
2024-10-18 16:49:51 +02:00
# The garbage collection happens asynchronously after trigger, in a background
# thread. So wait for it to happen to avoid sporadic failure.
let $wait_condition=
SELECT CAST(VARIABLE_VALUE AS INTEGER) < @dirty_prev AS LESS_DIRTY_IS_GOOD
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty';
--source include/wait_condition.inc
eval $wait_condition;
2023-11-20 13:44:47 +02:00
let SEARCH_PATTERN= InnoDB: Memory pressure event freed.*;
2024-10-18 16:49:51 +02:00
let SEARCH_WAIT= FOUND;
2023-10-24 09:47:46 +03:00
--source include/search_pattern_in_file.inc

set debug_dbug=@save_dbug;

--echo # End of 10.11 tests