Incorrect processing of an auto-incrementing field in the
WSREP-related code during applying transactions results in
a duplicate key being created. This is due to the fact that
at the beginning of the write_row() and update_row() functions,
the values of the auto-increment parameters are used, which
are read from the parameters of the current thread, but further
along the code other values are used, which are read from global
variables (when applying a transaction). This can happen when
the cluster configuration has changed while applying a transaction
(for example in the high_priority_service mode for Galera 4).
Further during IST processing duplicating key is detected, and
processing of the DB_DUPLICATE_KEY return code (inside innodb,
in the write_row() handler) results in a call to the
wsrep_thd_self_abort() function.
Since 2017 (c2118a08b1) THD::awake() no longer requires LOCK_thd_data.
It uses LOCK_thd_kill, and this latter mutex is used to prevent
a thread of dying, not LOCK_thd_data as before.
Atomic_relaxed<T>: add fetch_or() and fetch_and()
innodb_init(): rely on a zero-initialization of a global variable
monitor_set_tbl: make Atomic_relaxed<ulint> array and use proper operations
for setting bit, unsetting bit and reading bit
Reviewed by: Marko Mäkelä
This PR fixes same issue as MDEV-21577 for TRUNCATE TABLE.
MDEV-21577 fixed TOI replication for OPTIMIZE, REPAIR and ALTER TABLE
operating on FK child table. It was later found out that also TRUNCATE
has similar problem and needs a fix.
The actual fix is to do FK parent table lookup before TRUNCATE TOI
isolation and append found FK parent table names in certification key
list for the write set.
PR contains also new test scenario in galera_ddl_fk_conflict test where
FK child has two FK parent tables and there are two DML transactions operating
on both parent tables.
For development convenience, new TO isolation macro was added:
WSREP_TO_ISOLATION_BEGIN_IF and WSREP_TO_ISOLATION_BEGIN_ALTER macro was changed
to skip the goto statement.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Some DDL statements appear to acquire MDL locks for a table referenced by
foreign key constraint from the actual affected table of the DDL statement.
OPTIMIZE, REPAIR and ALTER TABLE belong to this class of DDL statements.
Earlier MariaDB version did not take this in consideration, and appended
only affected table in the certification key list in write set.
Because of missing certification information, it could happen that e.g.
OPTIMIZE table for FK child table could be allowed to apply in parallel
with DML operating on the foreign key parent table, and this could lead to
unhandled MDL lock conflicts between two high priority appliers (BF).
The fix in this patch, changes the TOI replication for OPTIMIZE, REPAIR and
ALTER TABLE statements so that before the execution of respective DDL
statement, there is foreign key parent search round. This FK parent search
contains following steps:
* open and lock the affected table (with permissive shared locks)
* iterate over foreign key contstraints and collect and array of Fk parent
table names
* close all tables open for the THD and release MDL locks
* do the actual TOI replication with the affected table and FK parent
table names as key values
The patch contains also new mtr test for verifying that the above mentioned
DDL statements replicate without problems when operating on FK child table.
The mtr test scenario #1, which can be used to check if some other DDL
(on top of OPTIMIZE, REPAIR and ALTER) could cause similar excessive FK
parent table locking.
Reviewed-by: Aleksey Midenkov <aleksey.midenkov@mariadb.com>
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
There are 2 issues here:
Issue #1: memory allocation.
An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data,
decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc).
Issue #2: IO_CACHE::seek_not_done
When IO_CACHE objects are cloned, they still share the file descriptor.
This means, operation on one IO_CACHE may change the file read position
which will confuse other IO_CACHEs using it.
The fix of these issues would be:
Allocate the buffer to also include the extra size needed for encryption.
Perform seek again after one IO_CACHE reads the file.
Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue
and move condition to correct callers. Add a function to report
BF lock waits and assert if incorrect BF-BF lock wait happens.
wsrep_report_bf_lock_wait
Add a new function to report BF lock wait.
wsrep_assert_no_bf_bf_wait
Add a new function to check do we have a
BF-BF wait and if we have report this case
and assert as it is a bug.
lock_rec_has_to_wait
Use new wsrep_assert_bf_wait to check BF-BF wait.
lock_rec_create_low
lock_table_create
Use new function to report BF lock waits.
lock_rec_insert_by_trx_age
lock_grant_and_move_on_page
lock_grant_and_move_on_rec
Assert that trx is not Galera as VATS is not compatible
with Galera.
lock_rec_add_to_queue
If there is conflicting lock in a queue make sure that
transaction is BF.
lock_rec_has_to_wait_in_queue
Remove incorrect BF handling. If there is conflicting
locks in a queue all transactions must wait.
lock_rec_dequeue_from_page
lock_rec_unlock
If there is conflicting lock make sure it is not
BF-BF case.
lock_rec_queue_validate
Add Galera record locking rules comment and use
new function to report BF lock waits.
All attempts to reproduce the original assertion have been
failed. Therefore, there is no test case on this commit.
This follows up MDEV-14374, which was filed against MariaDB Server 10.3.
Back then, on a 48-core Qualcomm Centriq 2400, the performance of
delay loops for spinloops was tested both with and without the dummy
compare-and-swap operation, and it was decided to keep the dummy
operation.
On target architectures where nothing special is available (other than
x86 (IA-32, AMD64) or POWER), we perform a dummy compare-and-swap operation.
This is contrary to the idea of the x86 PAUSE instruction and the
__ppc_get_timebase(), which aim to keep the memory bus idle for a while,
to allow other cores to better execute code while a spinloop is waiting
for something to be changed.
On MariaDB Server 10.4 and another implementation of the ARMv8 ISA,
omitting the dummy compare-and-swap improved performance by up to 12%.
So, let us avoid the dummy compare-and-swap on ARM.
For now, we are retaining the dummy compare-and-swap on other ISAs
(such as SPARC, MIPS, S390x, RISC-V) because we do not have any
performance data for them.
In 10.3, DBUG_ASSERT() may expand to something that includes
__builtin_expect(), which expects integer arguments, not pointers.
To avoid any compiler warnings, let us use an explicit rather than
implicit comparison to the null pointer.
Due to restricted size of the threadpool, execution of client queries can
be delayed (queued) for a while. This delay was interpreted as client
inactivity, and connection is closed, if client idle time + queue time
exceeds wait_timeout.
But users did not expect queue time to be included into wait_timeout.
This patch changes the behavior. We don't close connection anymore,
if there is some unread data present on connection,
even if wait_timeout is exceeded. Unread data means that client
was not idle, it sent a query, which we did not have time to process yet.
aarch64 timer is available to userspace via arch register.
clang's __builtin_readcyclecounter is wrong for aarch64 (reads the PMU
cycle counter instead of the archi-timer register), so we don't use it.
my_rdtsc unit-test on AWS m6g shows:
frequency: 121830845
resolution: 1
overhead: 1
This counter is not strictly increasing, but it is non-decreasing.
In fsp_path_to_space_name(), we would access a byte right before
the start of the string, tripping AddressSanitizer.
This reverts commit d87006a1c1
and commit a7634281aa.
This version is not optimized yet. It could have bugs because I didn't
check it with unit tests. Also, std::char_traits are not really supported.
So, now it's not possible to create f.ex. a case insensitive string_view.
fil_path_to_space_name(): renamed, moved to another file
and refactored to use string_view
accept might return an error, including SOCKET_EAGAIN/
SOCKET_EINTR. The caller, usually handle_connections_sockets
can these however and invalid file descriptor isn't something
to call fcntl on.
Thanks to Etienne Guesnet (ATOS) for diagnosis,
sample patch description and testing.
In AddressSanitizer, we only want memory poisoning to happen
in connection with custom memory allocation or freeing.
The primary use of MEM_UNDEFINED is for declaring memory uninitialized
in Valgrind or MemorySanitizer. We do not want MEM_UNDEFINED to
have the unwanted side effect that AddressSanitizer would no longer
be able to complain about accessing unallocated memory.
MEM_UNDEFINED(): Define as no-op for AddressSanitizer.
MEM_MAKE_ADDRESSABLE(): Define as MEM_UNDEFINED() or
ASAN_UNPOISON_MEMORY_REGION().
MEM_CHECK_ADDRESSABLE(): Wrap also __asan_region_is_poisoned().
- Some of the bug fixes are backports from 10.5!
- The fix in innobase/fil/fil0fil.cc is just a backport to get less
error messages in mysqld.1.err when running with valgrind.
- Renamed HAVE_valgrind_or_MSAN to HAVE_valgrind
MemorySanitizer (clang -fsanitize=memory) requires that all code
be compiled with instrumentation enabled. The only exception is the
C runtime library. Failure to use instrumented libraries will cause
bogus messages about memory being uninitialized.
In WITH_MSAN builds, we must avoid calling getservbyname(),
because even though it is a standard library function, it is
not instrumented, not even in clang 10.
Note: Before MariaDB Server 10.5, ./mtr will typically fail
due to the old PCRE library, which was updated in MDEV-14024.
The following cmake options were tested on 10.5
in commit 94d0bb4dbe:
cmake \
-DCMAKE_C_FLAGS='-march=native -O2' \
-DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \
-DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \
-DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF \
-DWITH_{ZLIB,SSL,PCRE}=bundled \
-DHAVE_LIBAIO_H=0 \
-DWITH_MSAN=ON
MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED()
and __msan_unpoison().
MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for
VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow().
InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros.
ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in
functions instead of inline assembler when building WITH_MSAN.
This will require at least -msse4.2 when building for IA-32 or AMD64.
The inline assembler would not be instrumented, and would thus cause
bogus failures.
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data
If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.
The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim
This flag is set both when BF is called for from innodb and by KILL command.
Either path of victim killing will bail out if victim's wsrep_killed is already
set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It is also good error logging, to see who is reponsible for the abort.
A new test case was added in galera.galera_bf_kill_debug.test for scenario where
wsrep applier thread and manual KILL command try to kill same idle victim
The idea was borrowed from http://wg21.link/p0052
scope_exit class is a helper, its name is hidden from user in
the namespace detail.
Alternative implementation of scope_exit with std::function
looks slower on goldbolt.org as it may require allocation, etc.
scope_exit doesn't need to own a callable, so beeing a pointer
is enough. And std::decay produces such a pointer from callable.
MDEV-21298: mariabackup doesn't read from the [mariadbd] and [mariadbd-X.Y]
server option groups from configuration files
MDEV-21301: mariabackup doesn't read [mariadb-backup] option group in
configuration file
All three issues require to change the same code, that is why their
fixes are joined in one commit.
The fix is in invoking load_defaults_or_exit() and handle_options() for
backup-specific groups separately from client-server groups to let the last
handle_options() call fail on unknown backup-specific options.
The order of options procesing is the following:
1) Load server groups and process server options, ignore unknown
options
2) Load client groups and process client options, ignore unknown
options
3) Load backup groups and process client-server options, exit on
unknown option
4) Process --mysqld-args command line options, ignore unknown options
New global flag my_handle_options_init_variables was added to have
ability to invoke handle_options() for the same allowed options set
several times without re-initialising previously set option values.
--password value destroying is moved from option processing callback to
mariabackup's handle_options() function to have ability to invoke server's
handle_options() several times for the same possible allowed options
set.
Galera invokes wsrep_sst_mariabackup.sh with mysqld command line
options to configure mariabackup as close to the server as possible.
It is not known what server options are supported by mariabackup when the
script is invoked. That is why new mariabackup option "--mysqld-args" is added,
all unknown options that follow this option will be silently ignored.
wsrep_sst_mariabackup.sh was also changed to:
- use "--mysqld-args" mariabackup option to pass mysqld options,
- remove deprecated innobackupex mode,
- remove unsupported mariabackup options:
--encrypt
--encrypt-key
--rebuild-indexes
--rebuild-threads
Compiler tells something about argument-dependent lookup. I do not
understand how that ADL works. But I know that such operators should
be free functions, instead of methods:
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Ro-symmetric
Such syntax defines 'friend' free functions.
Introduce a new ATTRIBUTE_NOINLINE to
ib::logger member functions, and add UNIV_UNLIKELY hints to callers.
Also, remove some crash reporting output. If needed, the
information will be available using debugging tools.
Furthermore, remove some fts_enable_diag_print output that included
indexed words in raw form. The code seemed to assume that words are
NUL-terminated byte strings. It is not clear whether a NUL terminator
is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
typically contain many NUL bytes.
namespace intrusive: removed
split class into two: ilist<T> and sized_ilist<T> which has a size field.
ilist<T> no more NULLify pointers to bring a slignly better performance.
As a consequence, fil_space_t::is_in_unflushed_spaces and
fil_space_t::is_in_rotation_list boolean members are needed now.
In the merge 9e6e43551f
we made Atomic_counter a more generic wrapper of std::atomic
so that dict_index_t would support the implicit assignment operator.
It is better to revert the changes to Atomic_counter and
instead introduce Atomic_relaxed as a generic wrapper to std::atomic.
Unlike Atomic_counter, we will not define operator++, operator+=
or similar, because we want to make the operations more explicit
in the users of Atomic_wrapper, because unlike loads and stores,
atomic read-modify-write operations always incur some overhead.
When neither MSAN nor Valgrind are enabled, declare
Field::mark_unused_memory_as_defined() as an empty inline function,
instead of declaring it as a virtual function.
MDEV-22073 MSAN use-of-uninitialized-value in
collect_statistics_for_table()
Other things:
innodb.analyze_table was changed to mainly test statistic
collection. Was discussed with Marko.
Fix WolfSSL build:
- Do not build with TLSv1.0,it stopped working,at least with SChannel client
- Disable a test that depends on TLSv1.0
- define FP_MAX_BITS always, to fix 32bit builds.
- Increase MAX_AES_CTX_SIZE, to fix build on Linux
Rowid Filter check is just like Index Condition Pushdown check: before
we check the filter, we must check if we have walked out of the range
we are scanning. (If we did, we should return, and not continue the scan).
Consequences of this:
- Rowid filtering doesn't work for keys that have partially-covered
blob columns (just like Index Condition Pushdown)
- The rowid filter function has three return values: CHECK_POS (passed)
CHECK_NEG (filtered out), CHECK_OUT_OF_RANGE.
All of the above is implemented in this patch
queues.c cleanup and refactoring.
Restore old version of _downhead() (from before cd483c5520)
that works well in an average case. Use it for queue_fix().
Move existing specialized version of _downhead() to queue_replace()
where it'll be handling the case it was specifically optimized for
(moving the element to the end of the queue).
And correct it to fix the heap not only down, but also up
(this fixes BUG#30301356).
Add unit tests.
Collateral cosmetic fixes.
This is a way do disable DBUG_ENTER()/DBUG_EXIT() stuff which is
needed to dbug trace. Those who doesn't need it may avoid tests
slowdown with -DWITH_DBUG_TRACE=OFF
dbug/tests.c: add define which is neede always in this test
innodb.log_file_name_debug.test: do not depend on DBUG trace stuff
in test
Benchmark results: each test eats less CPU and you can have more
parallel jobs in MTR.
patched:
./mtr -mem -par=8 -suite=innodb 185.34s user 86.85s system 133% cpu 3:23.27 total
./mtr -mem -par=8 -suite=main 80.96s user 36.01s system 182% cpu 1:04.07 total
main.select [ pass ] 1660
main.select [ pass ] 1513
main.select [ pass ] 1543
main.select [ pass ] 1660
main.select [ pass ] 1521
main.select [ pass ] 1511
main.select [ pass ] 1508
main.select [ pass ] 1520
main.select [ pass ] 1514
main.select [ pass ] 1522
vanilla:
./mtr -mem -par=8 -suite=innodb 203.61s user 92.16s system 140% cpu 3:30.16 total
./mtr -mem -par=8 -suite=main 94.11s user 35.51s system 206% cpu 1:02.69 total
main.select [ pass ] 2032
main.select [ pass ] 2017
main.select [ pass ] 2040
main.select [ pass ] 2183
main.select [ pass ] 2253
main.select [ pass ] 2075
main.select [ pass ] 2109
main.select [ pass ] 2080
main.select [ pass ] 2098
main.select [ pass ] 2114
KEY_MULTI_RANGE::range_flag does not have correct flag bits for
per-endpoint flags (NEAR_MIN, NEAR_MAX, NO_MIN_RANGE, NO_MAX_RANGE).
It only has bits for flags that describe both endpoints.
So
- Document this.
- Switch optimizer trace to using {start|end}_key.flag values, instead.
This fixes the bug.
- Switch records_in_column_ranges() to doing that too. (This used to
work, because KEY_MULTI_RANGE::range_flag had correct flag value
for the last key component, and EITS only uses one-component
pseudo-indexes)
sig_return: Solaris/OSX returns different function ptr
Move defination to my_alarm.h as its the only use.
prevents compile warnings (copied from 10.3 branch)
mysys/my_sync.c:136:19: error: 'cur_dir_name' defined but not used [-Werror=unused-const-variable=]
136 | static const char cur_dir_name[]= {FN_CURLIB, 0};
| ^~~~~~~~~~~~
fix compile error (DEPRECATED) leaked from ssl headers.
In file included from /export/home/dan/mariadb-server-10.4/sql/sys_vars.cc:37:
/export/home/dan/mariadb-server-10.4/sql/sys_vars.ic:69: error: "DEPRECATED" redefined [-Werror]
69 | #define DEPRECATED(X) X
|
In file included from /export/home/dan/mariadb-server-10.4/include/violite.h:150,
from /export/home/dan/mariadb-server-10.4/sql/sql_class.h:38,
from /export/home/dan/mariadb-server-10.4/sql/sys_vars.cc:36:
/usr/include/openssl/ssl.h:2356: note: this is the location of the previous definition
2356 | # define DEPRECATED __attribute__((deprecated))
|
Avoid Werror condition on non-Linux:
plugin/server_audit/server_audit.c:2267:7: error: variable 'db_len_off' set but not used [-Werror=unused-but-set-variable]
2267 | int db_len_off;
| ^~~~~~~~~~
plugin/server_audit/server_audit.c:2266:7: error: variable 'db_off' set but not used [-Werror=unused-but-set-variable]
2266 | int db_off;
| ^~~~~~
auth_gssapi fix include path for Solaris
Consistent with the upstream packaged patch:
https://github.com/OpenIndiana/oi-userland/blob/oi/hipster/components/database/mariadb-103/patches/06-gssapi.h.patch
compile warnings on Solaris
[ 91%] Building C object plugin/server_audit/CMakeFiles/server_audit.dir/server_audit.c.o
/plugin/server_audit/server_audit.c: In function 'auditing_v8':
/plugin/server_audit/server_audit.c:2194:20: error: unused variable 'db_len_off' [-Werror=unused-variable]
2194 | static const int db_len_off= 128;
| ^~~~~~~~~~
/plugin/server_audit/server_audit.c:2193:20: error: unused variable 'db_off' [-Werror=unused-variable]
2193 | static const int db_off= 120;
| ^~~~~~
/plugin/server_audit/server_audit.c:2192:20: error: unused variable 'cmd_off' [-Werror=unused-variable]
2192 | static const int cmd_off= 4432;
| ^~~~~~~
At top level:
/plugin/server_audit/server_audit.c:2192:20: error: 'cmd_off' defined but not used [-Werror=unused-const-variable=]
/plugin/server_audit/server_audit.c:2193:20: error: 'db_off' defined but not used [-Werror=unused-const-variable=]
2193 | static const int db_off= 120;
| ^~~~~~
/plugin/server_audit/server_audit.c:2194:20: error: 'db_len_off' defined but not used [-Werror=unused-const-variable=]
2194 | static const int db_len_off= 128;
| ^~~~~~~~~~
cc1: all warnings being treated as errors
tested on:
$ uname -a
SunOS openindiana 5.11 illumos-b97b1727bc i86pc i386 i86pc
read TLS with my_thread_var
write TLS with set_mysys_var()
my_thread_var is no longer __attribute__ ((const)): this attribute
is simply incorrect here. Read gcc manual for more information.
sql/threadpool_generic.cc fails with that attribute.
The reason why we have wsrep_on() at all is that the macro WSREP(thd)
depends on the definition of THD, and that is intentionally an opaque
data type for InnoDB. So, we cannot avoid invoking wsrep_on(), but
we can evaluate the less expensive conditions thd && WSREP_ON before
calling the function.
Global_read_lock: Use WSREP_NNULL(thd) instead of wsrep_on(thd)
because we not only know the definition of THD but also that
the pointer is not null.
wsrep_open(): Use WSREP(thd) instead of wsrep_on(thd).
InnoDB: Replace thd && wsrep_on(thd) with wsrep_on(thd), now that
the condition has been merged to the definition of the macro
wsrep_on().
If the server is compiled WITH_WSREP=OFF, we should avoid evaluating
conditions on a global variable that is constant.
WSREP_ON_: Renamed from WSREP_ON. Defined only WITH_WSREP=ON.
WSREP_ON: Defined as unlikely(WSREP_ON_).
wsrep_on(): Defined as WSREP_ON && wsrep_service->wsrep_on_func().
The reason why we have wsrep_on() at all is that the macro WSREP(thd)
depends on the definition of THD, and that is intentionally an opaque
data type for InnoDB. So, we cannot avoid invoking wsrep_on(), but
we can evaluate the less expensive condition WSREP_ON before calling
the function.