The problem was that when using clang + asan, we do not get a correct value
for the thread stack as some local variables are not allocated at the
normal stack.
It looks like that for example clang 18.1.3, when compiling with
-O2 -fsanitize=addressan it puts local variables and things allocated by
alloca() in other areas than on the stack.
The following code shows the issue
Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
(connect=0x5080000027b8,
put_in_cache=<optimized out>) at sql/sql_connect.cc:1399
THD *thd;
1399 thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0
The address of thd is 24M away from the stack pointer
(gdb) info reg
...
rsp 0x7fffef4e7bc0 0x7fffef4e7bc0
...
r13 0x7fffedee7060 140737185214560
r13 is pointing to the address of the thd. Probably some kind of
"local stack" used by the sanitizer
I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects was stored in a local heap,
not on the stack.
To solve this issue in a portable way, I have added two functions:
my_get_stack_pointer() returns the address of the current stack pointer.
The code is using asm instructions for intel 32/64 bit, powerpc,
arm 32/64 bit and sparc 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we are using _AddressOfReturnAddress()
As a fallback for other compilers/arch we use the address of a local
variable.
my_get_stack_bounds() that will return the address of the base stack
and stack size using pthread_attr_getstack() or NtCurrentTed() with
fallback to using the address of a local variable and user provided
stack size.
Server changes are:
- Moving setting of thread_stack to THD::store_globals() using
my_get_stack_bounds().
- Removing setting of thd->thread_stack, except in functions that
allocates a lot on the stack before calling store_globals(). When
using estimates for stack start, we reduce stack_size with
MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
before calling store_globals().
I also added a unittest, stack_allocation-t, to verify the new code.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
When using the default innodb_log_buffer_size=2m, mariadb-backup --backup
would spend a lot of time re-reading and re-parsing the log. For reads,
it would be beneficial to memory-map the entire ib_logfile0 to the
address space (typically 48 bits or 256 TiB) and read it from there,
both during --backup and --prepare.
We will introduce the Boolean read-only parameter innodb_log_file_mmap
that will be OFF by default on most platforms, to avoid aggressive
read-ahead of the entire ib_logfile0 in when only a tiny portion would be
accessed. On Linux and FreeBSD the default is innodb_log_file_mmap=ON,
because those platforms define a specific mmap(2) option for enabling
such read-ahead and therefore it can be assumed that the default would
be on-demand paging. This parameter will only have impact on the initial
InnoDB startup and recovery. Any writes to the log will use regular I/O,
except when the ib_logfile0 is stored in a specially configured file system
that is backed by persistent memory (Linux "mount -o dax").
We also experimented with allowing writes of the ib_logfile0 via a
memory mapping and decided against it. A fundamental problem would be
unnecessary read-before-write in case of a major page fault, that is,
when a new, not yet cached, virtual memory page in the circular
ib_logfile0 is being written to. There appears to be no way to tell
the operating system that we do not care about the previous contents of
the page, or that the page fault handler should just zero it out.
Many references to HAVE_PMEM have been replaced with references to
HAVE_INNODB_MMAP.
The predicate log_sys.is_pmem() has been replaced with
log_sys.is_mmap() && !log_sys.is_opened().
Memory-mapped regular files differ from MAP_SYNC (PMEM) mappings in the
way that an open file handle to ib_logfile0 will be retained. In both
code paths, log_sys.is_mmap() will hold. Holding a file handle open will
allow log_t::clear_mmap() to disable the interface with fewer operations.
It should be noted that ever since
commit 685d958e38 (MDEV-14425)
most 64-bit Linux platforms on our CI platforms
(s390x a.k.a. IBM System Z being a notable exception) read and write
/dev/shm/*/ib_logfile0 via a memory mapping, pretending that it is
persistent memory (mount -o dax). So, the memory mapping based log
parsing that this change is enabling by default on Linux and FreeBSD
has already been extensively tested on Linux.
::log_mmap(): If a log cannot be opened as PMEM and the desired access
is read-only, try to open a read-only memory mapping.
xtrabackup_copy_mmap_snippet(), xtrabackup_copy_mmap_logfile():
Copy the InnoDB log in mariadb-backup --backup from a memory
mapped file.
At least starting with ca83115b3e
the source code cannot be compiled with anything older than GCC 4.8.5.
Furthermore, 64-bit atomic read-modify-write operations on IA-32
would depend on the LOCK CMPXCHG8B instruction, which was introduced
in the Intel Pentium. Our IA-32 builds ought to be -march=i686
starting with commit 9cabc9fd8a.
Approved by Sergei Golubchik
Apparently, invoking fcntl(fd, F_SETFL, O_DIRECT) will lead to
unexpected behaviour on Linux bcachefs and possibly other file systems,
depending on the operating system version. So, let us avoid doing that,
and instead just attempt to pass the O_DIRECT flag to open(). This should
make us compatible with NetBSD, IBM AIX, as well as Solaris and its
derivatives.
This fix does not change the fact that we had only implemented
innodb_log_file_buffering=OFF on systems where we can determine the
physical block size (typically 512 or 4096 bytes).
Currently, those operating systems are Linux and Microsoft Windows.
HAVE_FCNTL_DIRECT, os_file_set_nocache(): Remove.
OS_FILE_OVERWRITE, OS_FILE_CREATE_PATH: Remove (never used parameters).
os_file_log_buffered(), os_file_log_maybe_unbuffered(): Helper functions.
os_file_create_simple_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
os_file_create_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
For type==OS_LOG_FILE && create_mode != OS_FILE_CREATE
we will first invoke stat(2) on the file name to find out if the size
is compatible with O_DIRECT. If create_mode == OS_FILE_CREATE, we will
invoke fstat(2) on the created log file afterwards, and may close and
reopen the file in O_DIRECT mode if applicable.
create_temp_file(): Support O_DIRECT. This is only used if O_TMPFILE is
available and innodb_disable_sort_file_cache=ON (non-default value).
Notably, that setting never worked on Microsoft Windows.
row_merge_file_create_mode(): Split from row_merge_file_create_low().
Create a temporary file in the specified mode.
Reviewed by: Vladislav Vaintroub
The directio(3C) function on Solaris is supported on NFS and UFS
while the majority of users should be on ZFS, which is a copy-on-write
file system that implements transparent compression and therefore
cannot support unbuffered I/O.
Let us remove the call to directio() and simply treat
innodb_flush_method=O_DIRECT in the same way as the previous
default value innodb_flush_method=fsync on Solaris. Also, let us
remove some dead code around calls to os_file_set_nocache() on
platforms where fcntl(2) is not usable with O_DIRECT.
On IBM AIX, O_DIRECT is not documented for fcntl(2), only for open(2).
AIX compilation failed, because glibc's non-standard extension to
`struct tm` were used - additional members tm_gmtoff and tm_zone.
The patch fixes it by adding corresponding compile-time check.
Additionally, for the calculation of GMT offset on AIX, a portable
variant of timegm() was required.Implementation here is inspired by
SergeyD's answer on Stackoverflow :
https://stackoverflow.com/questions/16647819/timegm-cross-platform
Remove alarm() remnants
- Replace thread-unsafe use of alarm() inside my_lock.c with a
timed loop.
- Remove configure time checks
- Remove mysys my_alarm.c/my_alarm.h
This allows to simplify net_real_read() and net_real_write() a bit.
Removed some superfluous #ifdef/ifndef MYSQL_SERVER from net_serv.cc
The code always runs in server, either normal or embedded.
Dead code for switching socket between blocking and non-blocking modes,
is also removed.
Removed pthread_kill() with alarm signal that woke up main thread on
server shutdown. Used shutdown(2) on polling sockets instead, to the same
effect.
Removed yet another superstitious pthread_kill(), that ran on non-Windows
in terminate_slave_thread().
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
According to commit ea56841997
the stack normally grows downwards, except on HP PA-RISC where
it grows upwards. Because determining the stack direction is not
possible in a portable way, let us determine the default STACK_DIRECTION
in CMake based on the ISA.
On clang 16.0.6 running on and targeting AMD64, STACK_DIRECTION=1 is
being incorrectly detected, causing the failure of a number of tests.
As pointed out with MDEV-29308 there are issues with the code as is.
MariaDB is built as C++11 / C99. aligned_alloc() is not guarenteed
to be exposed when building with any mode other than C++17 / C11.
The other *BSD's have their stdlib.h header to expose the function
with C+11 anyway, but the issue exists in the C99 code too, the
build just does not use -Werror. Linux globally defines _GNU_SOURCE
hiding the issue as well.
On all Unix platforms, link libexecinfo as system library,
if it contains backtrace_symbols_fd function, and libc does not contain
this function
Also remove cmake/os/OpenBSD.cmake, as after the fix it serves no purpose.
On GNU/Linux, even though the C11 aligned_alloc() appeared in
GNU libc early on, some custom memory allocators did not
implement it until recently. For example, before
gperftools/gperftools@d406f22853
the free() in tcmalloc would fail to free memory that was
returned by aligned_alloc(), because the latter would map to the
built-in allocator of libc. The Linux specific memalign() has a
similar interface and is safer to use, because it has been
available for a longer time. For AddressSanitizer, we will use
aligned_alloc() so that the constraint on size can be enforced.
buf_tmp_reserve_compression_buf(): When HAVE_ALIGNED_ALLOC holds,
round up the size to be an integer multiple of the alignment.
pfs_malloc(): In the unit test stub, round up the size to be an
integer multiple of the alignment.
Table_cache_instance: Define the structure aligned at
the CPU cache line, and remove a pad[] data member.
Krunal Bauskar reported this to improve performance on ARMv8.
aligned_malloc(): Wrapper for the Microsoft _aligned_malloc()
and the ISO/IEC 9899:2011 <stdlib.h> aligned_alloc().
Note: The parameters are in the Microsoft order (size, alignment),
opposite of aligned_alloc(alignment, size).
Note: The standard defines that size must be an integer multiple
of alignment. It is enforced by AddressSanitizer but not by GNU libc
on Linux.
aligned_free(): Wrapper for the Microsoft _aligned_free() and
the standard free().
HAVE_ALIGNED_ALLOC: A new test. Unfortunately, support for
aligned_alloc() may still be missing on some platforms.
We will fall back to posix_memalign() for those cases.
HAVE_MEMALIGN: Remove, along with any use of the nonstandard memalign().
PFS_ALIGNEMENT (sic): Removed; we will use CPU_LEVEL1_DCACHE_LINESIZE.
PFS_ALIGNED: Defined using the C++11 keyword alignas.
buf_pool_t::page_hash_table::create(),
lock_sys_t::hash_table::create():
lock_sys_t::hash_table::resize(): Pad the allocation size to an
integer multiple of the alignment.
Reviewed by: Vladislav Vaintroub