When using the default innodb_log_buffer_size=2m, mariadb-backup --backup
would spend a lot of time re-reading and re-parsing the log. For reads,
it would be beneficial to memory-map the entire ib_logfile0 to the
address space (typically 48 bits or 256 TiB) and read it from there,
both during --backup and --prepare.
We will introduce the Boolean read-only parameter innodb_log_file_mmap
that will be OFF by default on most platforms, to avoid aggressive
read-ahead of the entire ib_logfile0 in when only a tiny portion would be
accessed. On Linux and FreeBSD the default is innodb_log_file_mmap=ON,
because those platforms define a specific mmap(2) option for enabling
such read-ahead and therefore it can be assumed that the default would
be on-demand paging. This parameter will only have impact on the initial
InnoDB startup and recovery. Any writes to the log will use regular I/O,
except when the ib_logfile0 is stored in a specially configured file system
that is backed by persistent memory (Linux "mount -o dax").
We also experimented with allowing writes of the ib_logfile0 via a
memory mapping and decided against it. A fundamental problem would be
unnecessary read-before-write in case of a major page fault, that is,
when a new, not yet cached, virtual memory page in the circular
ib_logfile0 is being written to. There appears to be no way to tell
the operating system that we do not care about the previous contents of
the page, or that the page fault handler should just zero it out.
Many references to HAVE_PMEM have been replaced with references to
HAVE_INNODB_MMAP.
The predicate log_sys.is_pmem() has been replaced with
log_sys.is_mmap() && !log_sys.is_opened().
Memory-mapped regular files differ from MAP_SYNC (PMEM) mappings in the
way that an open file handle to ib_logfile0 will be retained. In both
code paths, log_sys.is_mmap() will hold. Holding a file handle open will
allow log_t::clear_mmap() to disable the interface with fewer operations.
It should be noted that ever since
commit 685d958e38 (MDEV-14425)
most 64-bit Linux platforms on our CI platforms
(s390x a.k.a. IBM System Z being a notable exception) read and write
/dev/shm/*/ib_logfile0 via a memory mapping, pretending that it is
persistent memory (mount -o dax). So, the memory mapping based log
parsing that this change is enabling by default on Linux and FreeBSD
has already been extensively tested on Linux.
::log_mmap(): If a log cannot be opened as PMEM and the desired access
is read-only, try to open a read-only memory mapping.
xtrabackup_copy_mmap_snippet(), xtrabackup_copy_mmap_logfile():
Copy the InnoDB log in mariadb-backup --backup from a memory
mapped file.
In commit 232d7a5e2d we almost got
the detection logic right. However, the XGETBV instruction would
crash if Linux was started up with the option noxsave.
have_vpclmulqdq(): Check for the XSAVE flag at the correct position
and also for the AVX flag.
This was tested on Ubuntu 22.04 by starting up its Linux 5.15 kernel
with and without the noxsave option.
It is not sufficient to check that the CPU supports the necessary
instructions. Also the operating system (or virtual machine hypervisor)
must enable all the AVX registers to be saved and restored on a
context switch.
Because clang 8 does not support the compiler intrinsic _xgetbv()
we will require clang 9 or later for enabling the use of VPCLMULQDQ
and the related AVX512 features.
Line numbers had to be removed from the ignorelists in order to be
diffed against since locations of the same findings can differ
across runs. Therefore preprocessing has to be done on the CI findings
so that it can be compared to the ignorelist and new findings can be
outputted. However, since line numbers have to be removed, a situation
occurs where it is difficult to reference the location of findings
in code given the output of the CI job.
To lessen this pain, change the cppcheck template to include
code snippets which make it easier to reference where in the code
the finding is referring to, even in the absence of line numbers.
Ignorelisting works as before since locations of the finding may
change but not the code it is referring to.
Furthermore, due to the innate difficulty in maintaining ignorelists
across branches and triaging new findings, allow failure as to not
have constantly failing pipelines as a result of a new findings that
have not been addressed yet.
Lastly, update SAST ignorelists to match the newly refactored cppcheck
job and the current state of the codebase.
All new code of the whole pull request, including one or several
files that are either new files or modified ones, are contributed
under the BSD-new license. I am contributing on behalf of my
employer Amazon Web Services, Inc.
Rectify cases of mismatched brackets and address
possible cases of division by zero by checking if
the denominator is zero before dividing.
No functional changes were made.
All new code of the whole pull request, including one or several
files that are either new files or modified ones, are contributed
under the BSD-new license. I am contributing on behalf of my
employer Amazon Web Services, Inc.
MariaDB supports a "wait-free concurrent allocator based on pinning addresses".
In `lf_pinbox_real_free()` it tries to sort the pinned addresses for better
performance to use binary search during "real free". `alloca()` was used to
allocate stack memory and copy addresses.
To prevent a stack overflow when allocating the stack memory the function checks
if there's enough stack space. However, the available stack size was calculated
inaccurately which eventually caused database crash due to stack overflow.
The crash was seen on MariaDB 10.6.11 but the same code defect exists on all
MariaDB versions.
A similar issue happened previously and the fix in fc2c1e43 was to add a
`ALLOCA_SAFETY_MARGIN` which is 8192 bytes. However, that safety margin is not
enough during high connection workloads.
MySQL also had a similar issue and the fix
https://github.com/mysql/mysql-server/commit/b086fda was to remove the use of
`alloca` and replace qsort approach by a linear scan through all pointers (pins)
owned by each thread.
This commit is mostly the same as it is the only way to solve this issue as:
1. Frame sizes in different architecture can be different.
2. Number of active (non-null) pinned addresses varies, so the frame
size for the recursive sorting function `msort_with_tmp` is also hard
to predict.
3. Allocating big memory blocks in stack doesn't seem to be a very good
practice.
For further details see the mentioned commit in MySQL and the inline comments.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
We have an issue if a user have the following in a configuration file:
log_slow_filter="" # Log everything to slow query log
log_queries_not_using_indexes=ON
This set log_slow_filter to 'not_using_index' which disables
slow_query_logging of most queries.
In effect, on should never use log_slow_filter="" in config files but
instead use log_slow_filter=ALL.
Fixed by changing log_slow_filter="" that comes either from a
configuration file or from the command line, when starting to the server,
to log_slow_filter=ALL.
A warning will be printed when this happens.
Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
(backported from 10.6)
Apart from better performance when accessing thread local variables,
we'll get rid of things that depend on initialization/cleanup of
pthread_key_t variables.
Where appropriate, use compiler-dependent pre-C++11 thread-local
equivalents, where it makes sense, to avoid initialization check overhead
that non-static thread_local can suffer from.