When using the default innodb_log_buffer_size=2m, mariadb-backup --backup
would spend a lot of time re-reading and re-parsing the log. For reads,
it would be beneficial to memory-map the entire ib_logfile0 to the
address space (typically 48 bits or 256 TiB) and read it from there,
both during --backup and --prepare.
We will introduce the Boolean read-only parameter innodb_log_file_mmap
that will be OFF by default on most platforms, to avoid aggressive
read-ahead of the entire ib_logfile0 in when only a tiny portion would be
accessed. On Linux and FreeBSD the default is innodb_log_file_mmap=ON,
because those platforms define a specific mmap(2) option for enabling
such read-ahead and therefore it can be assumed that the default would
be on-demand paging. This parameter will only have impact on the initial
InnoDB startup and recovery. Any writes to the log will use regular I/O,
except when the ib_logfile0 is stored in a specially configured file system
that is backed by persistent memory (Linux "mount -o dax").
We also experimented with allowing writes of the ib_logfile0 via a
memory mapping and decided against it. A fundamental problem would be
unnecessary read-before-write in case of a major page fault, that is,
when a new, not yet cached, virtual memory page in the circular
ib_logfile0 is being written to. There appears to be no way to tell
the operating system that we do not care about the previous contents of
the page, or that the page fault handler should just zero it out.
Many references to HAVE_PMEM have been replaced with references to
HAVE_INNODB_MMAP.
The predicate log_sys.is_pmem() has been replaced with
log_sys.is_mmap() && !log_sys.is_opened().
Memory-mapped regular files differ from MAP_SYNC (PMEM) mappings in the
way that an open file handle to ib_logfile0 will be retained. In both
code paths, log_sys.is_mmap() will hold. Holding a file handle open will
allow log_t::clear_mmap() to disable the interface with fewer operations.
It should be noted that ever since
commit 685d958e38 (MDEV-14425)
most 64-bit Linux platforms on our CI platforms
(s390x a.k.a. IBM System Z being a notable exception) read and write
/dev/shm/*/ib_logfile0 via a memory mapping, pretending that it is
persistent memory (mount -o dax). So, the memory mapping based log
parsing that this change is enabling by default on Linux and FreeBSD
has already been extensively tested on Linux.
::log_mmap(): If a log cannot be opened as PMEM and the desired access
is read-only, try to open a read-only memory mapping.
xtrabackup_copy_mmap_snippet(), xtrabackup_copy_mmap_logfile():
Copy the InnoDB log in mariadb-backup --backup from a memory
mapped file.
10.5 added contents of cmake/os/FreeBSD.cmake in c991efd9c3.
in the merge to 10.11, d002b1f removed this file.
In the past FreeBSD.cmake was removed in 5369df741b
in the 10.11 branch as no remaining code was needed. The combination
of this and the merge lead to the the file being removed. My assumption is
this was a non-stable branch at the time.
The purpose of this patch is clang doesn't have /usr/local/lib in
the path. As such there are various depedency linkages that will fail.
For example pcre and libfmt.
Apparently, invoking fcntl(fd, F_SETFL, O_DIRECT) will lead to
unexpected behaviour on Linux bcachefs and possibly other file systems,
depending on the operating system version. So, let us avoid doing that,
and instead just attempt to pass the O_DIRECT flag to open(). This should
make us compatible with NetBSD, IBM AIX, as well as Solaris and its
derivatives.
We will only implement innodb_log_file_buffering=OFF on systems where
we can determine the physical block size (typically 512 or 4096 bytes).
Currently, those operating systems are Linux and Microsoft Windows.
HAVE_FCNTL_DIRECT, os_file_set_nocache(): Remove.
OS_FILE_OVERWRITE, OS_FILE_CREATE_PATH: Remove (never used parameters).
os_file_log_buffered(), os_file_log_maybe_unbuffered(): Helper functions.
os_file_create_func(): When applicable, initially attempt to open files
in O_DIRECT mode. For type==OS_LOG_FILE && create_mode != OS_FILE_CREATE
we will first invoke stat(2) on the file name to find out if the size
is compatible with O_DIRECT. If create_mode == OS_FILE_CREATE, we will
invoke fstat(2) on the created log file afterwards, and may close and
reopen the file in O_DIRECT mode if applicable.
create_temp_file(): Support O_DIRECT. This is only used if O_TMPFILE is
available and innodb_disable_sort_file_cache=ON (non-default value).
Notably, that setting never worked on Microsoft Windows.
row_merge_file_create_mode(): Split from row_merge_file_create_low().
Create a temporary file in the specified mode.
Reviewed by: Vladislav Vaintroub
Apparently, invoking fcntl(fd, F_SETFL, O_DIRECT) will lead to
unexpected behaviour on Linux bcachefs and possibly other file systems,
depending on the operating system version. So, let us avoid doing that,
and instead just attempt to pass the O_DIRECT flag to open(). This should
make us compatible with NetBSD, IBM AIX, as well as Solaris and its
derivatives.
This fix does not change the fact that we had only implemented
innodb_log_file_buffering=OFF on systems where we can determine the
physical block size (typically 512 or 4096 bytes).
Currently, those operating systems are Linux and Microsoft Windows.
HAVE_FCNTL_DIRECT, os_file_set_nocache(): Remove.
OS_FILE_OVERWRITE, OS_FILE_CREATE_PATH: Remove (never used parameters).
os_file_log_buffered(), os_file_log_maybe_unbuffered(): Helper functions.
os_file_create_simple_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
os_file_create_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
For type==OS_LOG_FILE && create_mode != OS_FILE_CREATE
we will first invoke stat(2) on the file name to find out if the size
is compatible with O_DIRECT. If create_mode == OS_FILE_CREATE, we will
invoke fstat(2) on the created log file afterwards, and may close and
reopen the file in O_DIRECT mode if applicable.
create_temp_file(): Support O_DIRECT. This is only used if O_TMPFILE is
available and innodb_disable_sort_file_cache=ON (non-default value).
Notably, that setting never worked on Microsoft Windows.
row_merge_file_create_mode(): Split from row_merge_file_create_low().
Create a temporary file in the specified mode.
Reviewed by: Vladislav Vaintroub
The directio(3C) function on Solaris is supported on NFS and UFS
while the majority of users should be on ZFS, which is a copy-on-write
file system that implements transparent compression and therefore
cannot support unbuffered I/O.
Let us remove the call to directio() and simply treat
innodb_flush_method=O_DIRECT in the same way as the previous
default value innodb_flush_method=fsync on Solaris. Also, let us
remove some dead code around calls to os_file_set_nocache() on
platforms where fcntl(2) is not usable with O_DIRECT.
On IBM AIX, O_DIRECT is not documented for fcntl(2), only for open(2).
AIX compilation failed, because glibc's non-standard extension to
`struct tm` were used - additional members tm_gmtoff and tm_zone.
The patch fixes it by adding corresponding compile-time check.
Additionally, for the calculation of GMT offset on AIX, a portable
variant of timegm() was required.Implementation here is inspired by
SergeyD's answer on Stackoverflow :
https://stackoverflow.com/questions/16647819/timegm-cross-platform
Remove alarm() remnants
- Replace thread-unsafe use of alarm() inside my_lock.c with a
timed loop.
- Remove configure time checks
- Remove mysys my_alarm.c/my_alarm.h
Thanks to references from Brad Smith, BSDs use getmntinfo as
a system call for mounted filesystems.
Most BSDs return statfs structures, (and we use OSX's statfs64),
but NetBSD uses a statvfs structure.
Simplify Linux getmntent_r to just use getmntent.
AIX uses getmntent.
An attempt at writing Solaris compatibility with
a small bit of HPUX compatibility was made based on man page
entries only. Fixes welcome.
statvfs structures now use f_bsize for consistency with statfs
Test case adjusted as PATH_MAX is OS defined (e.g. 1023 on AIX)
Fixes: 0ee5cf837e
also fixes:
MDEV-27818: Disk plugin does not show zpool mounted devices
This is because zpool mounted point don't begin with /.
Due to the proliferation of multiple filesystem types since this
was written, we restrict the entries listed in the disks plugin
to excude:
* read only mount points (no point monitoring, and
includes squash, snaps, sysfs, procfs, cgroups...)
* mount points that aren't directories (excludes /etc/hostname and
similar mounts in containers). (getmntent (Linux/AIX) only)
* exclude systems where there is no capacity listed (excludes various
virtual filesystem types).
Reviewer: Sergei Golubchik
As pointed out with MDEV-29308 there are issues with the code as is.
MariaDB is built as C++11 / C99. aligned_alloc() is not guarenteed
to be exposed when building with any mode other than C++17 / C11.
The other *BSD's have their stdlib.h header to expose the function
with C+11 anyway, but the issue exists in the C99 code too, the
build just does not use -Werror. Linux globally defines _GNU_SOURCE
hiding the issue as well.
On all Unix platforms, link libexecinfo as system library,
if it contains backtrace_symbols_fd function, and libc does not contain
this function
Also remove cmake/os/OpenBSD.cmake, as after the fix it serves no purpose.
Table_cache_instance: Define the structure aligned at
the CPU cache line, and remove a pad[] data member.
Krunal Bauskar reported this to improve performance on ARMv8.
aligned_malloc(): Wrapper for the Microsoft _aligned_malloc()
and the ISO/IEC 9899:2011 <stdlib.h> aligned_alloc().
Note: The parameters are in the Microsoft order (size, alignment),
opposite of aligned_alloc(alignment, size).
Note: The standard defines that size must be an integer multiple
of alignment. It is enforced by AddressSanitizer but not by GNU libc
on Linux.
aligned_free(): Wrapper for the Microsoft _aligned_free() and
the standard free().
HAVE_ALIGNED_ALLOC: A new test. Unfortunately, support for
aligned_alloc() may still be missing on some platforms.
We will fall back to posix_memalign() for those cases.
HAVE_MEMALIGN: Remove, along with any use of the nonstandard memalign().
PFS_ALIGNEMENT (sic): Removed; we will use CPU_LEVEL1_DCACHE_LINESIZE.
PFS_ALIGNED: Defined using the C++11 keyword alignas.
buf_pool_t::page_hash_table::create(),
lock_sys_t::hash_table::create():
lock_sys_t::hash_table::resize(): Pad the allocation size to an
integer multiple of the alignment.
Reviewed by: Vladislav Vaintroub
Fixed inlining flags. Remove /Ob1 added by CMake for RelWithDebInfo.
(the actual compiler default is /Ob2 if optimizations are enabled)
Allow to define custom /Ob flag with new variable MSVC_INLINE, if desired
CreateServiceA, OpenServiceA, and couple of other functions do not work
correctly with non-ASCII character, in the special case where application
has defined activeCodePage=UTF8.
Workaround by redefining affected ANSI functions to own wrapper, which
works by converting narrow(ANSI) to wide, then calling wide function.
Deprecate original ANSI service functions, via declspec, so that we can catch
their use.
Workaround Windows' bug in services "ANSI" when process ANSI codepage is
UTF8.
They turn out to work unlike any other API .
Expected behavior : strings be converted from GetACP() to Unicode,
and "wide" function would be then called.
Actual current behavior :
it seems to handle strings as-if they would be encoded in system-default
ACP, rather than process-specific GetACP()
Fix: redefine the OpenService,CreateService and ChangeServiceConfig
and do ANSI-Wide conversion outselves.
Tell compiler to deprecate some ANSI service functions.
xxx
Introduce -DFAST_BUILD parameter for a little faster build or test
if set,
- do not compile with /d2OptimizeHugeFunctions, this makes compilation
of bison output much slower on optimized build
- do not use runtime checks on debug build (RTC1). This slows down tests
considerably
Brad Smith made this OpenBSD file based of the
FreeBSD cmake directives in commit ab58904367
by the Monty Program Ab.
As such the Oracle Copyright header isn't really applicable.