MariaDB server crashes on ARM (weak memory model architecture) while
concurrently executing l_find to load node->key and add_to_purgatory
to store node->key = NULL. l_find then uses key (which is NULL), to
pass it to a comparison function.
The specific problem is the out-of-order execution that happens on a
weak memory model architecture. Two essential reorderings are possible,
which need to be prevented.
a) As l_find has no barriers in place between the optimistic read of
the key field lf_hash.cc#L117 and the verification of link lf_hash.cc#L124,
the processor can reorder the load to happen after the while-loop.
In that case, a concurrent thread executing add_to_purgatory on the same
node can be scheduled to store NULL at the key field lf_alloc-pin.c#L253
before key is loaded in l_find.
b) A node is marked as deleted by a CAS in l_delete lf_hash.cc#L247 and
taken off the list with an upfollowing CAS lf_hash.cc#L252. Only if both
CAS succeed, the key field is written to by add_to_purgatory. However,
due to a missing barrier, the relaxed store of key lf_alloc-pin.c#L253
can be moved ahead of the two CAS operations, which makes the value of
the local purgatory list stored by add_to_purgatory visible to all threads
operating on the list. As the node is not marked as deleted yet, the
same error occurs in l_find.
This change three accesses to be atomic.
* optimistic read of key in l_find lf_hash.cc#L117
* read of link for verification lf_hash.cc#L124
* write of key in add_to_purgatory lf_alloc-pin.c#L253
Reviewers: Sergei Vojtovich, Sergei Golubchik
Fixes: MDEV-23510 / d30c1331a18d875e553f3fcf544997e4f33fb943
MariaDB server crashes on ARM (weak memory model architecture) while
concurrently executing l_find to load node->key and add_to_purgatory
to store node->key = NULL. l_find then uses key (which is NULL), to
pass it to a comparison function.
The specific problem is the out-of-order execution that happens on a
weak memory model architecture. Two essential reorderings are possible,
which need to be prevented.
a) As l_find has no barriers in place between the optimistic read of
the key field lf_hash.cc#L117 and the verification of link lf_hash.cc#L124,
the processor can reorder the load to happen after the while-loop.
In that case, a concurrent thread executing add_to_purgatory on the same
node can be scheduled to store NULL at the key field lf_alloc-pin.c#L253
before key is loaded in l_find.
b) A node is marked as deleted by a CAS in l_delete lf_hash.cc#L247 and
taken off the list with an upfollowing CAS lf_hash.cc#L252. Only if both
CAS succeed, the key field is written to by add_to_purgatory. However,
due to a missing barrier, the relaxed store of key lf_alloc-pin.c#L253
can be moved ahead of the two CAS operations, which makes the value of
the local purgatory list stored by add_to_purgatory visible to all threads
operating on the list. As the node is not marked as deleted yet, the
same error occurs in l_find.
This change three accesses to be atomic.
* optimistic read of key in l_find lf_hash.cc#L117
* read of link for verification lf_hash.cc#L124
* write of key in add_to_purgatory lf_alloc-pin.c#L253
Reviewers: Sergei Vojtovich, Sergei Golubchik
Fixes: MDEV-23510 / d30c1331a18d875e553f3fcf544997e4f33fb943
Some architectures (mips) require libatomic to support proper
atomic operations. Check first if support is available without
linking, otherwise use the library.
Contributors:
James Cowgill <jcowgill@debian.org>
Jessica Clarke <jrtc27@debian.org>
Vicențiu Ciorbaru <vicentiu@mariadb.org>
Create minidump when server fails to shutdown. If process is being
debugged, cause a debug break.
Moves some code which is part of safe_kill into mysys, as both safe_kill,
and mysqltest produce minidumps on different timeouts.
Small cleanup in wait_until_dead() - replace inefficient loop with a single
wait.
Thanks to Fabian Vogt for noticing the mutual exclusions
of these open flags on tmpfs caused by mariadb opening it
incorrectly.
As such we clear the O_CREAT flag while opening it as O_TMPFILE.
This is a side-effect of my_large_malloc() introduction,MDEV-18851
It removed a cast to size_t to variable 'blocks' in
multiplication blocks * keycache->key_cache_block_size , creating ulong value
instead of correct size_t.
Replaced a couple of ulongs with appropriate data type, which is size_t.
Also, fixed casts to ulongs in crash handler messages, so that people would
not be confused by that, too.
Interestingly, aria did not expose the same problem even if it contains
copied and pasted code in ma_pagecache, because Aria had some ulongs removed
when fixing a similar problem in MDEV-9256.
init_mutex_v1_t: Stop lying that the mutex parameter is const.
GCC 11.2.0 assumes that it is and could complain about any mysql_mutex_t
being uninitialized even after mysql_mutex_init() as long as
PLUGIN_PERFSCHEMA is enabled.
init_rwlock_v1_t, init_cond_v1_t: Remove untruthful const qualifiers.
Note: init_socket_v1_t is expecting that the socket fd has already
been created before PSI_SOCKET_CALL(init_socket), and therefore that
parameter really is being treated as a pointer to const.
my_init_atomic_write(): Detect all forms of SSD, in case multiple
types of devices are installed in the same machine.
This was broken in commit ed008a74cf
and further in commit 70684afef2.
SAME_DEV(): Match block devices, ignoring partition numbers.
Let us use stat() instead of lstat(), in case someone has a symbolic
link in /dev.
Instead of reporting errors with perror(), let us use fprintf(stderr)
with the file name, the impact of the error, and the strerror(errno).
Because this code is specific to Linux, we may depend on the
GNU libc/uClibc/musl extension %m for strerror(errno).
This needs backporting to MariaDB 10.5.
Any changes I submit are freely available under the new BSD
license.
Signed-off-by: Nia Alarie <nia@NetBSD.org>
Removed redundant code for BF abort transaction in `thr_lock.cc`.
TOI operations will ignore provided lock_wait_timeout and use `LONG_TIMEOUT`
until operation is finished.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There is still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.
The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
complains about.
- x86 version of optimized int#korr() methods. UBSAN do not like unaligned
memory access of integers. Fixed by using byte_order_generic.h when
compiling with UBSAN
- We use smaller thread stack with ASAN and UBSAN, which forced me to
disable a few tests that prints the thread stack size.
- Verifying class types does not work for shared libraries. I added
suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
safe to have overflows (two cases, in item_func.cc).
Things fixed:
- Don't left shift signed values
(byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign not non existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
constructors. This was needed as UBSAN checks that these types has
correct values when one copies an object.
(gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not called handler functions on unallocated objects or
deleted objects.
(events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call constructor
on Query_arena object.
- Fixed several cast of objects to an incompatible class!
(Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes over or underflows.
This includes also ++ and -- of integers.
(Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
value_type is initialized to this instead of to -1, which is not a valid
enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
instead of a null string (safer as it ensures we do not do arithmetic
on null strings).
Other things:
- Changed struct st_position to an OBJECT and added an initialization
function to it to ensure that we do not copy or use uninitialized
members. The change to a class was also motived that we used "struct
st_position" and POSITION randomly trough the code which was
confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed in sql_partition to use '^' instead of '-'. This is safe as
the operator is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed bug in find_set() where could could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr. (This variable was before
only in 10.5 and up). It can now have one of two values:
ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
it virtual. This was an effort to get UBSAN to work with loaded storage
engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
to integer arithmetic.
Changes that should not be needed but had to be done to suppress warnings
from UBSAN:
- Added static_cast<<uint16_t>> around shift to get rid of a LOT of
compiler warnings when using UBSAN.
- Had to change some '/' of 2 base integers to shift to get rid of
some compile time warnings.
Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
It seems some overly tolerant compilers (gcc) allow the structure
of IO_CACHE that is defined differently in libmaria to have
members equalivance to the iocache in mysys.
More strict Solaris compilers recognise that rc_pos really
isn't a structure member and won't compile.
In commit d25f806d73 (MDEV-22749)
the CRC-32C implementation of MariaDB was broken on some
IA-32 and AMD64 builds, depending on the compiler version and
build options. This was verified for IA-32 on GCC 10.2.1.
Even though we try to identify the SSE4.2 extensions and the
availaibility of the PCLMULQDQ instruction by executing CPUID,
the fall-back code could be generated with extended instructions,
because the entire file mysys/crc32/crc32c.c was being compiled
with -msse4.2 -mpclmul. This would cause SIGILL on a PINSRD
instruction on affected IA-32 targets (such as some Intel Atom
processors). This might also affect old AMD64 processors
(predating the 2007 Intel Nehalem microarchitecture), if some
compiler chose to emit the offending instructions.
While it is fine to pass a target-specific option to a target-specific
compilation unit (like -mpclmul to a PCLMUL-specific compilation unit),
that is not safe for mixed-architecture compilation units.
For mixed-architecture compilation units, the correct way is to set
target attributes on the target-specific functions.
There does not seem to be a way to pass target attributes to
function template instantiation. Hence, we must replace the
ExtendImpl template with plain functions crc32_sse42() and
crc32_slow().
We will also remove some inconsistency between
my_crc32_implementation() and mysys_namespace::crc32::Choose_Extend().
The function crc32_pclmul_enabled() will be moved to mysys/crc32/crc32c.cc
so that the detection code will be compiled without -msse4.2 -mpclmul.
The AMD64 PCLMUL accelerated crc32c_3way() will be moved to a new
file crc32c_amd64.cc. In this way, only a few functions that depend
on -msse4.2 in mysys/crc32/crc32c.cc can be declared with
__attribute__((target("sse4.2"))), and most of the file can be compiled
for the generic target.
Last, the file mysys/crc32ieee.cc will be omitted on 64-bit POWER,
because it was dead code (no symbols were exported).
Reviewed by: Vladislav Vaintroub
ALIGN was defined already:
mysys/crc32/crc32c.cc:390: warning: "ALIGN" redefined
#define ALIGN(n, m) ((n + ((1 << m) - 1)) & ~((1 << m) - 1))
In file included from /root/aix/build/include/my_global.h:543,
from /root/aix/build/mysys/crc32/crc32c.cc:22:
/opt/freeware/lib/gcc/powerpc-ibm-aix7.1.0.0/8/include-fixed/sys/socket.h:788: note: this is the location of the previous definition
#define ALIGN(p) (ulong)((caddr_t)(p) + MACHINE_ALIGNMENT - 1 - \
The reason for the crash was that there was not a write lock to
protect against file rotations in the server_audit plugin after an
audit plugin patch to changed audit mutexes to read & write locks.
The fixes are:
* Moving server_audit.c to use read & write locks (which improves
performance).
* Added functionality in file_logger.c to not do file rotations until
it is allowed by the caller (done without any interface changes for
the logging service).
* Move checking of file size limit to server_audit.c and if it is time to
do a rotation change the read lock to a write lock and tell file_logger
that it is now allowed to rotate the log files.
Like the 10.2 version 1635686b50,
except C++ on internal functions for my_assume_aligned.
volatile != atomic.
volatile has no memory barrier schemantics, its for mmaped IO
so lets allow some optimizer gains and stop pretending it helps
with memory atomicity.
The MDEV lists a SEGV an assumption is made that an address was
partially read. As C packs structs strictly in order and on arm64 the
cache line size is 128 bits. A pointer (link - 64 bits), followed
by a hashnr (uint32 - 32 bits), leaves the following key (uchar *
64 bits), neither naturally aligned to any pointer and worse, split
across a cache line which is the processors view of an atomic
reservation of memory.
lf_dynarray_lvalue is assumed to return a 64 bit aligned address.
As a solution move the 32bit hashnr to the end so we don't get the
*key pointer split across two cache lines.
Tested by: Krunal Bauskar
Reviewer: Marko Mäkelä
volatile != atomic.
volatile has no memory barrier schemantics, its for mmaped IO
so lets allow some optimizer gains and stop pretending it helps
with memory atomicity.
The MDEV lists a SEGV an assumption is made that an address was
partially read. As C packs structs strictly in order and on arm64 the
cache line size is 128 bits. A pointer (link - 64 bits), followed
by a hashnr (uint32 - 32 bits), leaves the following key (uchar *
64 bits), neither naturally aligned to any pointer and worse, split
across a cache line which is the processors view of an atomic
reservation of memory.
lf_dynarray_lvalue is assumed to return a 64 bit aligned address.
As a solution move the 32bit hashnr to the end so we don't get the
*key pointer split across two cache lines.
Tested by: Krunal Bauskar
Reviewer: Marko Mäkelä
One should not change the program arguments!
This change also reduces warnings from the icc compiler.
Almost all changes are just syntax changes (adding const to
'get_one_option function' declarations).
Other changes:
- Added a few cast of 'argument' from 'const char*' to 'char *'. This
was mainly in calls to 'external' functions we don't have control of.
- Ensure that all reset of 'password command line argument' are similar.
(In almost all cases it was just adding a comment and a cast)
- In mysqlbinlog.cc and mysqld.cc there was a few cases that changed
the command line argument. These places where changed to instead allocate
the option in a MEM_ROOT to avoid changing the argument. Some of this
code was changed to ensure that different programs did parsing the
same way. Added a test case for the changes in mysqlbinlog.cc
- Changed a few variables that took their value from command line options
from 'char *' to 'const char *'.
The test case was setting aria_sort_buffer_size to MAX_ULONGLONG-1
which was not handled gracefully by my_malloc() or safemalloc().
Fixed by ensuring that the malloc functions returns 0 if the size
is too big.
I also added some protection to Aria repair:
- Limit sort_buffer_size to 16G (after that a bigger sort buffer will
not help that much anyway)
- Limit sort_buffer_size also according to sort file size. This will
help by not allocating less memory if someone sets the buffer size too
high.
Encountered the linker failure on Debug build in 10.4:
[53/585] Linking CXX executable unittest/sql/mf_iocache-t
FAILED: unittest/sql/mf_iocache-t
: && /usr/bin/c++ -pie -fPIC -fstack-protector --param=ssp-buffer-size=4 -fPIC -g -DENABLED_DEBUG_SYNC -ggdb3 -DSAFE_MUTEX -DSAFEMALLOC -DTRASH_FREED_MEMORY -Wall -Wextra -Wno-format-truncation -Wno-init-self -Wno-nonnull-compare -Wno-unused-parameter -Woverloaded-virtual -Wnon-virtual-dtor -Wvla -Wwrite-strings -Werror -Wl,-z,relro,-z,now unittest/sql/CMakeFiles/mf_iocache-t.dir/mf_iocache-t.cc.o unittest/sql/CMakeFiles/mf_iocache-t.dir/__/__/sql/mf_iocache_encr.cc.o -o unittest/sql/mf_iocache-t -lpthread mysys/libmysys.a unittest/mytap/libmytap.a mysys_ssl/libmysys_ssl.a mysys/libmysys.a dbug/libdbug.a mysys/libmysys.a dbug/libdbug.a -lz -lm strings/libstrings.a -lpthread -lssl -lcrypto -ldl && :
/usr/bin/ld: mysys/libmysys.a(my_addr_resolve.c.o):/home/dan/repos/mariadb-server-10.4/mysys/my_addr_resolve.c:173: multiple definition of `info'; unittest/sql/CMakeFiles/mf_iocache-t.dir/mf_iocache-t.cc.o:/home/dan/repos/mariadb-server-10.4/unittest/sql/mf_iocache-t.cc:99: first defined here
We make Dl_info static as in MDEV-21646 moving it out of the function
was the main goal and having it scope limited by static doesn't affect
the function.
The clang++ -stdlib=libc++ header file <fstream> depends on
<filesystem> that defines a member function path::root_name(),
which conflicts with the rather unused #define root_name()
that had been introduced in
commit 7c58e97bf6.
Because an instrumented -stdlib=libc++ (rather than the default
-stdlib=libstdc++) is easier to build for a working -fsanitize=memory
(cmake -DWITH_MSAN=ON), let us remove the conflicting #define for now.
Problem:
========
When O_TMPFILE is not supported mysqlbinlog outputs the error to standard
stream as a warning which breaks PITR:
ERROR 1064 (42000) at line 382: You have an error in your SQL syntax; check
the manual that corresponds to your MariaDB server version for the right
syntax to use near 'mysqlbinlog: O_TMPFILE is not supported on /tmp (disabling
future attempts)
Analysis:
=========
'mysqlbinlog' utility is used to perform point-in-time-recovery based on binary
log. It converts the events in the binary log files, from binary format to text
so that they can be viewed or applied. This output can be saved to a file and
it can be sourced back to mysql client. The mysqlbinlog utility stores the
text output into IO_CACHE and when it is full the data is written to a temp
file. The temporary file creation is attempted using 'O_TMPFILE' flag. If the
underlying filesystem doesn't support this operation, a note is printed on to
standard error and file creation is done without O_TMPFILE' flag. If standard
error is redirected to standard output the note gets written to the sql file
as shown below.
/bld/client/mysqlbinlog: O_TMPFILE is not supported on /tmp (disabling future
attempts)
table id 32
When the sql file is used for PITR, it leads to a syntax error as it is not a
valid sql command.
Fix:
====
Make 'my_message_stderr' to ignore messages which are flagged as ME_NOTE and
ME_ERROR_LOG_ONLY. ME_ERROR_LOG_ONLY flag is applicable to server. In order to
print an informational note to stderr stream, ME_NOTE flag without
ME_ERROR_LOG_ONLY flag should be specified. 'my_message_stderr' should print
messages flagged with ME_WARNING or ME_FATAL to stderr stream.
This allows MariaDB to compile on old (limits to >2.6.32)
linux kernel versions.
This warns that attempts to use large pages will rely on
implict kernel determination.
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
According to https://stackoverflow.com/questions/22827510/how-to-avoid-bad-fd-set-buffer-overflow-crash
it seems that using select instead of poll can cause additional memory
allocations. As we are in a crashed state, we must prevent allocating
any memory (if possible). Thus, switch select call to poll.
Also move some bigger datastructures to global space. The code is not
run in a multithreaded context so best we don't use up stack space
if it's not needed.
There are 2 issues here:
Issue #1: memory allocation.
An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data,
decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc).
Issue #2: IO_CACHE::seek_not_done
When IO_CACHE objects are cloned, they still share the file descriptor.
This means, operation on one IO_CACHE may change the file read position
which will confuse other IO_CACHEs using it.
The fix of these issues would be:
Allocate the buffer to also include the extra size needed for encryption.
Perform seek again after one IO_CACHE reads the file.
The presumed reason for the error is that the file was opened
by 3rd party antivirus or backup program, causing ERROR_SHARING_VIOLATION
on rename.
The fix, actually a workaround, is to retry MoveFileEx couple of times
before finally giving up. We expect 3rd party programs not to hold file
for extended time.
Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction
after Intel whitepaper, and is ported from rocksdb code.
Optimized ARM and POWER CRC32 were already present in mysys.
Removed some inine assembly, replaced by code from
https://github.com/intel/soft-crc
Also,replace GCC inline assembly for cpuid in ut0crc32 with __cpuid,
to fix "PIC register clobbered by 'ebx' in 'asm'.
This enables fast CRC32C on 32bit Intel processors with GCC.
Passing a null pointer to a nonnull argument is not only undefined
behaviour, but it also grants the compiler the permission to optimize
away further checks whether the pointer is null. GCC -O2 at least
starting with version 8 may do that, potentially causing SIGSEGV.
These problems were caught in a WITH_UBSAN=ON build with the
Bug#7024 test in main.view.
When MDEV-22669 introduced CRC-32C acceleration to IA-32,
it worked around a compiler bug by disabling the acceleration
on GCC 4 for IA-32 altogether, even though the compiler bug
only affects -fPIC builds that are targeting IA-32.
Let us extend the solution fe5dbfe723
and define HAVE_CPUID_INSTRUCTION that allows us to implement
a necessary and sufficient work-around of the compiler bug.
GCC before version 5 would fail to emit the CPUID instruction
when targeting IA-32 in -fPIC mode. Therefore, we must add the
CPUID instruction to the HAVE_CLMUL_INSTRUCTION check.
This means that the PCLMUL accelerated crc32() function will
not be available on i686 executables that are compiled with
GCC 4. The limitation does not impact AMD64 builds or non-PIC
x86 builds, or other compilers (clang, or GCC 5 or newer).
MDEV-22641 in commit dec3f8ca69
refactored a SIMD implementation of CRC-32 for the ISO 3309 polynomial
that uses the IA-32/AMD64 carry-less multiplication (pclmul)
instructions. The code was previously only available in Mariabackup;
it was changed to be a general replacement of the zlib crc32().
There exist AMD64 systems where CMAKE_SYSTEM_PROCESSOR matches
the pattern i[36]86 but not x86_64 or amd64. This would cause a
link failure, because mysys/checksum.c would basically assume that
the compiler support for instruction is always available on GCC-compatible
compilers on AMD64.
Furthermore, we were unnecessarily disabling the SIMD acceleration
for 32-bit executables.
Note: Until MDEV-22749 has been implemented, the PCLMUL instruction
will not be used on Microsoft Windows.
Closes: #1660
Raspberry Pi 4 supports crc32 but doesn't support pmull (MDEV-23030).
The PR #1645 offers a solution to fix this issue. But it does not consider
the condition that the target platform does support crc32 but not support PMULL.
In this condition, it should leverage the Arm64 crc32 instruction (__crc32c) and
just only skip parallel computation (pmull/vmull) rather than skip all hardware
crc32 instruction of computation.
The PR also removes unnecessary CRC32_ZERO branch in 'crc32c_aarch64' for MariaDB,
formats the indent and coding style.
Change-Id: I76371a6bd767b4985600e8cca10983d71b7e9459
Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
depending on build config the error might be hidded,
in particular liblz4.so and libjemalloc.so make it to disappear,
but with -DWITH_INNODB_LZ4=NO -DWITH_JEMALLOC=NO it reappears.
MariaDB adopted a hardware optimized crc32c approach on ARM64 starting 10.5.
Said implementation of crc32c needs support from target hardware for crc32
and pmull instructions. Existing logic is checking only for crc32 support
from target hardware through a runtime check and so if target hardware
doesn't support pmull it would cause things to fail/crash.
Expanded runtime check to ensure pmull support is also checked on the target
hardware along with existing crc32.
Thanks to Marko and Daniel for review.
I run perf top during ./mtr testing and constantly see times()
function there. It's so slow, that it has no sense to run it
in a loop too many times.
This patch speeds up -suite=innodb for me from 218s to 208s.
9s of times() function!
Small postfix to MDEV-23175 to ensure faster option on FreeBSD
and compatibility to Solaris that isn't high resolution.
ftime is left as a backup in case an implementation doesn't
contain any of these clocks.
FreeBSD
$ ./unittest/mysys/my_rdtsc-t
1..11
# ----- Routine ---------------
# myt.cycles.routine : 5
# myt.nanoseconds.routine : 11
# myt.microseconds.routine : 13
# myt.milliseconds.routine : 11
# myt.ticks.routine : 17
# ----- Frequency -------------
# myt.cycles.frequency : 3610295566
# myt.nanoseconds.frequency : 1000000000
# myt.microseconds.frequency : 1000000
# myt.milliseconds.frequency : 899
# myt.ticks.frequency : 136
# ----- Resolution ------------
# myt.cycles.resolution : 1
# myt.nanoseconds.resolution : 1
# myt.microseconds.resolution : 1
# myt.milliseconds.resolution : 7
# myt.ticks.resolution : 1
# ----- Overhead --------------
# myt.cycles.overhead : 26
# myt.nanoseconds.overhead : 19140
# myt.microseconds.overhead : 19036
# myt.milliseconds.overhead : 578
# myt.ticks.overhead : 21544
ok 1 - my_timer_init() did not crash
ok 2 - The cycle timer is strictly increasing
ok 3 - The cycle timer is implemented
ok 4 - The nanosecond timer is increasing
ok 5 - The nanosecond timer is implemented
ok 6 - The microsecond timer is increasing
ok 7 - The microsecond timer is implemented
ok 8 - The millisecond timer is increasing
ok 9 - The millisecond timer is implemented
ok 10 - The tick timer is increasing
ok 11 - The tick timer is implemented
Largely based on MySQL commit
75271e51d6
MySQL Ref:
BUG#24566529: BACKPORT BUG#23575445 TO 5.6
(cut)
Also, the PTR_SANE macro which tries to check if a pointer
is invalid (used when printing pointer values in stack traces)
gave false negatives on OSX/FreeBSD. On these platforms we
now simply check if the pointer is non-null. This also removes
a sbrk() deprecation warning when building on OS X. (It was
before only disabled with building using XCode).
Removed execinfo path of MySQL patch that was already included.
sbrk doesn't exist on FreeBSD aarch64.
Removed HAVE_BSS_START based detection and replaced with __linux__
as it doesn't exist on OSX, Solaris or Windows. __bss_start
exists on mutiple Linux architectures.
Tested on FreeBSD and Linux x86_64. Being in FreeBSD ports for 2
years implies a good testing there on all FreeBSD architectures there
too. MySQL-8.0.21 code is functionally identical to original commit.
aarch64 timer is available to userspace via arch register.
clang's __builtin_readcyclecounter is wrong for aarch64 (reads the PMU
cycle counter instead of the archi-timer register), so we don't use it.
my_rdtsc unit-test on AWS m6g shows:
frequency: 121830845
resolution: 1
overhead: 1
This counter is not strictly increasing, but it is non-decreasing.
This patch ensures that all identical character sets shares the same
cs->csname.
This allows us to replace strcmp() in my_charset_same() with comparisons
of pointers. This fixes a long standing performance issue that could cause
as strcmp() for every item sent trough the protocol class to the end user.
One consequence of this patch is that we don't allow one to add a character
definition in the Index.xml file that changes the csname of an existing
character set. This is by design as changing character set names of existing
ones is extremely dangerous, especially as some storage engines just records
character set numbers.
As we now have a hash over character set's csname, we can in the future
use that for faster access to a specific character set. This could be done
by changing the hash to non unique and use the hash to find the next
character set with same csname.
Linux glibc has deprecated ftime resutlting in a compile error on Fedora-32.
Per manual clock_gettime is the suggested replacement. Because my_timer_milliseconds
is a relative time used by largely the perfomrance schema, CLOCK_MONOTONIC_COARSE
is used. This has been available since Linux-2.6.32.
The low overhead is shows in the unittest:
$ unittest/mysys/my_rdtsc-t
1..11
# ----- Routine ---------------
# myt.cycles.routine : 5
# myt.nanoseconds.routine : 11
# myt.microseconds.routine : 13
# myt.milliseconds.routine : 18
# myt.ticks.routine : 17
# ----- Frequency -------------
# myt.cycles.frequency : 3596597014
# myt.nanoseconds.frequency : 1000000000
# myt.microseconds.frequency : 1000000
# myt.milliseconds.frequency : 1039
# myt.ticks.frequency : 103
# ----- Resolution ------------
# myt.cycles.resolution : 1
# myt.nanoseconds.resolution : 1
# myt.microseconds.resolution : 1
# myt.milliseconds.resolution : 1
# myt.ticks.resolution : 1
# ----- Overhead --------------
# myt.cycles.overhead : 118
# myt.nanoseconds.overhead : 234
# myt.microseconds.overhead : 222
# myt.milliseconds.overhead : 30
# myt.ticks.overhead : 4946
ok 1 - my_timer_init() did not crash
ok 2 - The cycle timer is strictly increasing
ok 3 - The cycle timer is implemented
ok 4 - The nanosecond timer is increasing
ok 5 - The nanosecond timer is implemented
ok 6 - The microsecond timer is increasing
ok 7 - The microsecond timer is implemented
ok 8 - The millisecond timer is increasing
ok 9 - The millisecond timer is implemented
ok 10 - The tick timer is increasing
ok 11 - The tick timer is implemented
The merge commit 0fd89a1a89
of commit b6ec1e8bbf
seems to cause occasional MemorySanitizer failures,
because it failed to replace some MEM_UNDEFINED() calls
with MEM_MAKE_ADDRESSABLE().
my_large_free(): Correctly invoke MEM_MAKE_ADDRESSABLE() after
freeing memory. Failure to do so could cause bogus
AddressSanitizer failures for memory allocated by my_large_malloc().
On MemorySanitizer, we will do nothing.
buf_pool_t::chunk_t::create(): Replace the MEM_MAKE_ADDRESSABLE()
that had been added in commit 484931325e
to work around the issue.
In AddressSanitizer, we only want memory poisoning to happen
in connection with custom memory allocation or freeing.
The primary use of MEM_UNDEFINED is for declaring memory uninitialized
in Valgrind or MemorySanitizer. We do not want MEM_UNDEFINED to
have the unwanted side effect that AddressSanitizer would no longer
be able to complain about accessing unallocated memory.
MEM_UNDEFINED(): Define as no-op for AddressSanitizer.
MEM_MAKE_ADDRESSABLE(): Define as MEM_UNDEFINED() or
ASAN_UNPOISON_MEMORY_REGION().
MEM_CHECK_ADDRESSABLE(): Wrap also __asan_region_is_poisoned().