The problem was that when using clang + asan, we do not get a correct value
for the thread stack as some local variables are not allocated at the
normal stack.
It looks like that for example clang 18.1.3, when compiling with
-O2 -fsanitize=addressan it puts local variables and things allocated by
alloca() in other areas than on the stack.
The following code shows the issue
Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
(connect=0x5080000027b8,
put_in_cache=<optimized out>) at sql/sql_connect.cc:1399
THD *thd;
1399 thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0
The address of thd is 24M away from the stack pointer
(gdb) info reg
...
rsp 0x7fffef4e7bc0 0x7fffef4e7bc0
...
r13 0x7fffedee7060 140737185214560
r13 is pointing to the address of the thd. Probably some kind of
"local stack" used by the sanitizer
I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects was stored in a local heap,
not on the stack.
To solve this issue in a portable way, I have added two functions:
my_get_stack_pointer() returns the address of the current stack pointer.
The code is using asm instructions for intel 32/64 bit, powerpc,
arm 32/64 bit and sparc 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we are using _AddressOfReturnAddress()
As a fallback for other compilers/arch we use the address of a local
variable.
my_get_stack_bounds() that will return the address of the base stack
and stack size using pthread_attr_getstack() or NtCurrentTed() with
fallback to using the address of a local variable and user provided
stack size.
Server changes are:
- Moving setting of thread_stack to THD::store_globals() using
my_get_stack_bounds().
- Removing setting of thd->thread_stack, except in functions that
allocates a lot on the stack before calling store_globals(). When
using estimates for stack start, we reduce stack_size with
MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
before calling store_globals().
I also added a unittest, stack_allocation-t, to verify the new code.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Improve detection for DES support in OpenSSL, to allow compilation
against system OpenSSL without DES.
Note that MariaDB needs to be compiled against OpenSSL-like library
that itself has DES support which cmake detected. Positive detection
is indicated with CMake variable HAVE_des 1.
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
Thanks to references from Brad Smith, BSDs use getmntinfo as
a system call for mounted filesystems.
Most BSDs return statfs structures, (and we use OSX's statfs64),
but NetBSD uses a statvfs structure.
Simplify Linux getmntent_r to just use getmntent.
AIX uses getmntent.
An attempt at writing Solaris compatibility with
a small bit of HPUX compatibility was made based on man page
entries only. Fixes welcome.
statvfs structures now use f_bsize for consistency with statfs
Test case adjusted as PATH_MAX is OS defined (e.g. 1023 on AIX)
Fixes: 0ee5cf837e
also fixes:
MDEV-27818: Disk plugin does not show zpool mounted devices
This is because zpool mounted point don't begin with /.
Due to the proliferation of multiple filesystem types since this
was written, we restrict the entries listed in the disks plugin
to excude:
* read only mount points (no point monitoring, and
includes squash, snaps, sysfs, procfs, cgroups...)
* mount points that aren't directories (excludes /etc/hostname and
similar mounts in containers). (getmntent (Linux/AIX) only)
* exclude systems where there is no capacity listed (excludes various
virtual filesystem types).
Reviewer: Sergei Golubchik
Added checking for support of vfork by a platform where
building being done. Set HAVE_VFORK macros in case vfork()
system call is supported. Use vfork() system call if the
macros HAVE_VFORK is set, else use fork().
`mallinfo` is deprecated since glibc 2.33 and has been replaced by mallinfo2.
The deprecation causes building the server to fail if glibc version is > 2.33.
Check if mallinfo2 exist on the system and use it instead.
Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction
after Intel whitepaper, and is ported from rocksdb code.
Optimized ARM and POWER CRC32 were already present in mysys.
When MDEV-22669 introduced CRC-32C acceleration to IA-32,
it worked around a compiler bug by disabling the acceleration
on GCC 4 for IA-32 altogether, even though the compiler bug
only affects -fPIC builds that are targeting IA-32.
Let us extend the solution fe5dbfe723
and define HAVE_CPUID_INSTRUCTION that allows us to implement
a necessary and sufficient work-around of the compiler bug.
MDEV-22641 in commit dec3f8ca69
refactored a SIMD implementation of CRC-32 for the ISO 3309 polynomial
that uses the IA-32/AMD64 carry-less multiplication (pclmul)
instructions. The code was previously only available in Mariabackup;
it was changed to be a general replacement of the zlib crc32().
There exist AMD64 systems where CMAKE_SYSTEM_PROCESSOR matches
the pattern i[36]86 but not x86_64 or amd64. This would cause a
link failure, because mysys/checksum.c would basically assume that
the compiler support for instruction is always available on GCC-compatible
compilers on AMD64.
Furthermore, we were unnecessarily disabling the SIMD acceleration
for 32-bit executables.
Note: Until MDEV-22749 has been implemented, the PCLMUL instruction
will not be used on Microsoft Windows.
Closes: #1660
Largely based on MySQL commit
75271e51d6
MySQL Ref:
BUG#24566529: BACKPORT BUG#23575445 TO 5.6
(cut)
Also, the PTR_SANE macro which tries to check if a pointer
is invalid (used when printing pointer values in stack traces)
gave false negatives on OSX/FreeBSD. On these platforms we
now simply check if the pointer is non-null. This also removes
a sbrk() deprecation warning when building on OS X. (It was
before only disabled with building using XCode).
Removed execinfo path of MySQL patch that was already included.
sbrk doesn't exist on FreeBSD aarch64.
Removed HAVE_BSS_START based detection and replaced with __linux__
as it doesn't exist on OSX, Solaris or Windows. __bss_start
exists on mutiple Linux architectures.
Tested on FreeBSD and Linux x86_64. Being in FreeBSD ports for 2
years implies a good testing there on all FreeBSD architectures there
too. MySQL-8.0.21 code is functionally identical to original commit.
Detecting the cpus based on sysconf of the online CPUs can significantly
over estimate the number of cpus available.
Wheither via numactl, cgroups, taskset, systemd constraints, docker
containers and probably other mechanisms, the number of threads mysqld
can be run on can be quite less.
As such we use the pthread_getaffinity_np function on Linux and FreeBSD
(identical API) to get the number of CPUs.
The number of CPUs is the default for the thread_pool_size and a too
high default will resulting in large memory usage and high context
switching overhead.
Closes PR #922
The fix consists of three commits backported from 10.3:
1) Cleanup isnan() portability checks
(cherry picked from commit 7ffd7fe962)
2) Cleanup isinf() portability checks
Original problem reported by Wlad: re-compilation of 10.3 on top of 10.2
build would cache undefined HAVE_ISINF from 10.2, whereas it is expected
to be 1 in 10.3.
std::isinf() seem to be available on all supported platforms.
(cherry picked from commit bc469a0bdf)
3) Use std::isfinite in C++ code
This is addition to parent revision fixing build failures.
(cherry picked from commit 54999f4e75)
On clang, use __builtin_readcyclecounter() when available.
Hinted by Sergey Vojtovich. (This may lead to runtime failure
on ARM systems. The hardware should be available on ARMv8 (AArch64),
but access to it may require special privileges.)
We remove support for the proprietary Sun Microsystems compiler,
and rely on clang or the __GNUC__ assembler syntax instead.
For now, we retain support for IA-64 (Itanium) and 32-bit SPARC,
even though those platforms are likely no longer widely used.
We remove support for clock_gettime(CLOCK_SGI_CYCLE),
because Silicon Graphics ceased supporting IRIX in December 2013.
This was the only cycle timer interface available for MIPS.
On PowerPC, we rely on the GCC 4.8 __builtin_ppc_get_timebase()
(or clang __builtin_readcyclecounter()), which should be equivalent
to the old assembler code on both 64-bit and 32-bit targets.
Starting with the Intel Skylake microarchitecture, the PAUSE
instruction latency is about 140 clock cycles instead of earlier 10.
On AMD processors, the latency could be 10 or 50 clock cycles,
depending on microarchitecture.
Because of this big range of latency, let us scale the loops around
the PAUSE instruction based on timing results at server startup.
my_cpu_relax_multiplier: New variable: How many times to invoke PAUSE
in a loop. Only defined for IA-32 and AMD64.
my_cpu_init(): Determine with RDTSC the time to run 16 PAUSE instructions
in two unrolled loops according, and based on the quicker of the two
runs, initialize my_cpu_relax_multiplier. This form of calibration was
suggested by Mikhail Sinyavin from Intel.
LF_BACKOFF(), ut_delay(): Use my_cpu_relax_multiplier when available.
ut_delay(): Define inline in my_cpu.h.
UT_COMPILER_BARRIER(): Remove. This does not seem to have any effect,
because in our ut_delay() implementation, no computations are being
performed inside the loop. The purpose of UT_COMPILER_BARRIER() was to
prohibit the compiler from reordering computations. It was not
emitting any code.