The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There is still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.
The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
complains about.
- x86 version of optimized int#korr() methods. UBSAN do not like unaligned
memory access of integers. Fixed by using byte_order_generic.h when
compiling with UBSAN
- We use smaller thread stack with ASAN and UBSAN, which forced me to
disable a few tests that prints the thread stack size.
- Verifying class types does not work for shared libraries. I added
suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
safe to have overflows (two cases, in item_func.cc).
Things fixed:
- Don't left shift signed values
(byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign not non existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
constructors. This was needed as UBSAN checks that these types has
correct values when one copies an object.
(gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not called handler functions on unallocated objects or
deleted objects.
(events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call constructor
on Query_arena object.
- Fixed several cast of objects to an incompatible class!
(Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes over or underflows.
This includes also ++ and -- of integers.
(Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
value_type is initialized to this instead of to -1, which is not a valid
enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
instead of a null string (safer as it ensures we do not do arithmetic
on null strings).
Other things:
- Changed struct st_position to an OBJECT and added an initialization
function to it to ensure that we do not copy or use uninitialized
members. The change to a class was also motived that we used "struct
st_position" and POSITION randomly trough the code which was
confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed in sql_partition to use '^' instead of '-'. This is safe as
the operator is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed bug in find_set() where could could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr. (This variable was before
only in 10.5 and up). It can now have one of two values:
ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
it virtual. This was an effort to get UBSAN to work with loaded storage
engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
to integer arithmetic.
Changes that should not be needed but had to be done to suppress warnings
from UBSAN:
- Added static_cast<<uint16_t>> around shift to get rid of a LOT of
compiler warnings when using UBSAN.
- Had to change some '/' of 2 base integers to shift to get rid of
some compile time warnings.
Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
SI_USER is, however in FreeBSD there are a couple of non-kernel
user signal infomations above SI_KERNEL.
Put a fallback just in case there is nothing available.
This was because of a wrong test in encryption code that wrote random
numbers over the LSN for pages for transactional Aria tables during repair.
The effect was that after an ALTER TABLE ENABLE KEYS of a encrypted
recovery of the tables would not work.
Fixed by changing testing of !share->now_transactional to
!share->base.born_transactional.
Other things:
- Extended Aria check_table() to check for wrong (= too big) LSN numbers.
- If check_table() failed just because of wrong LSN or TRN numbers,
a following repair table will just do a zerofill which is much faster.
- Limit number of LSN errors in one check table to MAX_LSN_ERROR (10).
- Removed old obsolete test of 'if (error_count & 2)'. Changed error_count
and warning_count from bits to numbers of errors/warnings as this is
more useful.
memory barrier on ARM
As suggested in the said JIRA ticket based on the contribution done by
the community (in an attempt to optimize the spin-loop) the said approach
was evaluated against MariaDB Server 10.5 and found to help improve
throughput in the range of 2-5%.
Note: 10.6 timing graph and model are different as home-brew
mutexes are replaced with pthread mutexes. Said patch has mixed
impact on 10.6 so not recommended for 10.6.
As pointed out by Andrei Elkin, the previous fix did not fix one
race condition that may have caused the observed hang.
innodb_log_flush_request(): If we are enqueueing the very first
request at the same time the log write is being completed,
we must ensure that a near-concurrent call to log_flush_notify()
will not result in a missed notification. We guarantee this by
release-acquire operations on log_requests.start and
log_sys.flushed_to_disk_lsn.
log_flush_notify_and_unlock(): Cleanup: Always release the mutex.
log_sys_t::get_flushed_lsn(): Use acquire memory order.
log_sys_t::set_flushed_lsn(): Use release memory order.
log_sys_t::set_lsn(): Use release memory order.
log_sys_t::get_lsn(): Use relaxed memory order by default, and
allow the caller to specify acquire memory order explicitly.
Whenever the log_sys.mutex is being held or when log writes are
prohibited during startup, we can use a relaxed load. Likewise,
in some assertions where reading a stale value of log_sys.lsn
should not matter, we can use a relaxed load.
This will cause some additional instructions to be emitted on
architectures that do not implement Total Store Ordering (TSO),
such as POWER, ARM, and RISC-V Weak Memory Ordering (RVWMO).
HAVE_valgrind_or_MSAN to HAVE_valgrind was incorrect in
af784385b4.
In my_valgrind.h when clang exists (hence no __has_feature(memory_sanitizer),
and -DWITH_VALGRIND=1, but without memcheck.h, we end up with a MEM_CHECK_DEFINED
being empty.
If we are also doing a CMAKE_BUILD_TYPE=Debug this results a number of
[-Werror,-Wunused-variable] errors because MEM_CHECK_DEFINED is empty.
With MEM_CHECK_DEFINED empty, there becomes no uses of this of the
fixed field and innodb variables in this patch.
So we stop using HAVE_valgrind as catchall and use the name
HAVE_CHECK_MEM to indicate that a CHECK_MEM_DEFINED function exists.
Reviewer: Monty
Corrects: af784385b4
Incorrect processing of an auto-incrementing field in the
WSREP-related code during applying transactions results in
a duplicate key being created. This is due to the fact that
at the beginning of the write_row() and update_row() functions,
the values of the auto-increment parameters are used, which
are read from the parameters of the current thread, but further
along the code other values are used, which are read from global
variables (when applying a transaction). This can happen when
the cluster configuration has changed while applying a transaction
(for example in the high_priority_service mode for Galera 4).
Further during IST processing duplicating key is detected, and
processing of the DB_DUPLICATE_KEY return code (inside innodb,
in the write_row() handler) results in a call to the
wsrep_thd_self_abort() function.
Like the 10.2 version 1635686b50,
except C++ on internal functions for my_assume_aligned.
volatile != atomic.
volatile has no memory barrier schemantics, its for mmaped IO
so lets allow some optimizer gains and stop pretending it helps
with memory atomicity.
The MDEV lists a SEGV an assumption is made that an address was
partially read. As C packs structs strictly in order and on arm64 the
cache line size is 128 bits. A pointer (link - 64 bits), followed
by a hashnr (uint32 - 32 bits), leaves the following key (uchar *
64 bits), neither naturally aligned to any pointer and worse, split
across a cache line which is the processors view of an atomic
reservation of memory.
lf_dynarray_lvalue is assumed to return a 64 bit aligned address.
As a solution move the 32bit hashnr to the end so we don't get the
*key pointer split across two cache lines.
Tested by: Krunal Bauskar
Reviewer: Marko Mäkelä
One should not change the program arguments!
This change also reduces warnings from the icc compiler.
Almost all changes are just syntax changes (adding const to
'get_one_option function' declarations).
Other changes:
- Added a few cast of 'argument' from 'const char*' to 'char *'. This
was mainly in calls to 'external' functions we don't have control of.
- Ensure that all reset of 'password command line argument' are similar.
(In almost all cases it was just adding a comment and a cast)
- In mysqlbinlog.cc and mysqld.cc there was a few cases that changed
the command line argument. These places where changed to instead allocate
the option in a MEM_ROOT to avoid changing the argument. Some of this
code was changed to ensure that different programs did parsing the
same way. Added a test case for the changes in mysqlbinlog.cc
- Changed a few variables that took their value from command line options
from 'char *' to 'const char *'.
Since 2017 (c2118a08b1) THD::awake() no longer requires LOCK_thd_data.
It uses LOCK_thd_kill, and this latter mutex is used to prevent
a thread of dying, not LOCK_thd_data as before.
Atomic_relaxed<T>: add fetch_or() and fetch_and()
innodb_init(): rely on a zero-initialization of a global variable
monitor_set_tbl: make Atomic_relaxed<ulint> array and use proper operations
for setting bit, unsetting bit and reading bit
Reviewed by: Marko Mäkelä
The clang++ -stdlib=libc++ header file <fstream> depends on
<filesystem> that defines a member function path::root_name(),
which conflicts with the rather unused #define root_name()
that had been introduced in
commit 7c58e97bf6.
Because an instrumented -stdlib=libc++ (rather than the default
-stdlib=libstdc++) is easier to build for a working -fsanitize=memory
(cmake -DWITH_MSAN=ON), let us remove the conflicting #define for now.
This reverts commit 6cf8f05fd9.
Original patch assumed that MAP_HUGETLB as consistent across
achitectures which isn't the case. Defining it unconditionally
broke large pages on every achitecutre where the value differed
from x86_64.
With the EOL for Centos/RHEL6 announced in 10.5.7, <3.8 linux
kernels are no longer supported.
This PR fixes same issue as MDEV-21577 for TRUNCATE TABLE.
MDEV-21577 fixed TOI replication for OPTIMIZE, REPAIR and ALTER TABLE
operating on FK child table. It was later found out that also TRUNCATE
has similar problem and needs a fix.
The actual fix is to do FK parent table lookup before TRUNCATE TOI
isolation and append found FK parent table names in certification key
list for the write set.
PR contains also new test scenario in galera_ddl_fk_conflict test where
FK child has two FK parent tables and there are two DML transactions operating
on both parent tables.
For development convenience, new TO isolation macro was added:
WSREP_TO_ISOLATION_BEGIN_IF and WSREP_TO_ISOLATION_BEGIN_ALTER macro was changed
to skip the goto statement.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
MDEV-23855 broke the handling of innodb_flush_sync=OFF.
That parameter is supposed to limit the page write rate
in case the log capacity is being exceeded and log checkpoints
are needed.
With this fix, the following should pass:
./mtr --mysqld=--loose-innodb-flush-sync=0
One of our best regression tests for page flushing is
encryption.innochecksum. With innodb_page_size=16k and
innodb_flush_sync=OFF it would likely hang without this fix.
log_sys.last_checkpoint_lsn: Declare as Atomic_relaxed<lsn_t>
so that we are allowed to read the value while not holding
log_sys.mutex.
buf_flush_wait_flushed(): Let the page cleaner perform the flushing
also if innodb_flush_sync=OFF. After the page cleaner has
completed, perform a checkpoint if it is needed, because
buf_flush_sync_for_checkpoint() will not be run if
innodb_flush_sync=OFF.
buf_flush_ahead(): Simplify the condition. We do not really care
whether buf_flush_page_cleaner() is running.
buf_flush_page_cleaner(): Evaluate innodb_flush_sync at the low
level. If innodb_flush_sync=OFF, rate-limit the batches to
innodb_io_capacity_max pages per second.
Reviewed by: Vladislav Vaintroub
Some DDL statements appear to acquire MDL locks for a table referenced by
foreign key constraint from the actual affected table of the DDL statement.
OPTIMIZE, REPAIR and ALTER TABLE belong to this class of DDL statements.
Earlier MariaDB version did not take this in consideration, and appended
only affected table in the certification key list in write set.
Because of missing certification information, it could happen that e.g.
OPTIMIZE table for FK child table could be allowed to apply in parallel
with DML operating on the foreign key parent table, and this could lead to
unhandled MDL lock conflicts between two high priority appliers (BF).
The fix in this patch, changes the TOI replication for OPTIMIZE, REPAIR and
ALTER TABLE statements so that before the execution of respective DDL
statement, there is foreign key parent search round. This FK parent search
contains following steps:
* open and lock the affected table (with permissive shared locks)
* iterate over foreign key contstraints and collect and array of Fk parent
table names
* close all tables open for the THD and release MDL locks
* do the actual TOI replication with the affected table and FK parent
table names as key values
The patch contains also new mtr test for verifying that the above mentioned
DDL statements replicate without problems when operating on FK child table.
The mtr test scenario #1, which can be used to check if some other DDL
(on top of OPTIMIZE, REPAIR and ALTER) could cause similar excessive FK
parent table locking.
Reviewed-by: Aleksey Midenkov <aleksey.midenkov@mariadb.com>
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
There are 2 issues here:
Issue #1: memory allocation.
An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data,
decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc).
Issue #2: IO_CACHE::seek_not_done
When IO_CACHE objects are cloned, they still share the file descriptor.
This means, operation on one IO_CACHE may change the file read position
which will confuse other IO_CACHEs using it.
The fix of these issues would be:
Allocate the buffer to also include the extra size needed for encryption.
Perform seek again after one IO_CACHE reads the file.