The only purpose of ibuf_bitmap_mutex is to prevent a deadlock between
two concurrent invocations of ibuf_update_free_bits_for_two_pages_low()
on the same pair of bitmap pages, but in opposite order.
The mutex is unnecessarily serializing the execution of the function
even when it is being invoked on totally different tablespaces.
To avoid deadlocks, it suffices to ensure that the two page latches
are being acquired in a deterministic (sorted) order.
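
A minimal sketch of the sorted latching order, not the actual InnoDB
code: page_t, page_less() and update_two_pages() are hypothetical
stand-ins, and std::mutex stands in for a page latch. It assumes that
each (space_id, page_no) pair identifies at most one page object.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a bitmap page that carries its own latch.
struct page_t {
  std::uint32_t space_id;
  std::uint32_t page_no;
  std::mutex latch;
  int free_bits;
  page_t(std::uint32_t s, std::uint32_t p)
    : space_id(s), page_no(p), free_bits(0) {}
};

// Total order on pages: by tablespace first, then by page number.
inline bool page_less(const page_t &a, const page_t &b) {
  return std::make_pair(a.space_id, a.page_no) <
         std::make_pair(b.space_id, b.page_no);
}

// Update two pages while holding both latches. The "smaller" page is
// always latched first, so callers passing (a, b) and (b, a) agree on
// the acquisition order and cannot wait for each other.
inline void update_two_pages(page_t &a, page_t &b, int bits_a, int bits_b) {
  if (&a == &b) {
    // Same page twice: latch it once; the last value wins.
    std::lock_guard<std::mutex> g(a.latch);
    a.free_bits = bits_b;
    return;
  }
  const bool a_first = page_less(a, b);
  page_t &first = a_first ? a : b;
  page_t &second = a_first ? b : a;
  std::lock_guard<std::mutex> g1(first.latch);
  std::lock_guard<std::mutex> g2(second.latch);
  a.free_bits = bits_a;
  b.free_bits = bits_b;
}
```

Because the order depends only on the page identifiers, no global
mutex is needed, and calls on unrelated tablespaces do not serialize.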
The fix of MDEV-23456 (commit b1009ae5c1)
introduced a livelock between page flushing and a thread that is
executing buf_page_create().
buf_page_create(): If the current mini-transaction is holding
an exclusive latch on the page, do not attempt to acquire another
one, and do not care about any I/O fix.
mtr_t::have_x_latch(): Replaces mtr_t::get_fix_count().
dyn_buf_t::for_each_block(const Functor&) const: A new variant.
rw_lock_own(): Add a const qualifier.
Reviewed by: Thirunarayanan Balathandayuthapani
This patch removes dict_index_t::stats_latch. Table and index statistics
are now protected by dict_sys->mutex. That way, statistics computation can
happen in parallel in several threads, and dict_sys->mutex will be held
only for a short period of time.
This patch is joint work with Marko Mäkelä.
dict_index_t::lock: make mutable, which allows passing a const pointer
when only the lock is touched in an object
btr_height_get()
btr_get_size(): make index argument const for better type safety
btr_estimate_number_of_different_key_vals(): now returns computed values
instead of setting fields in dict_index_t directly
remove everything related to dict_index_t::stats_latch
dict_stats_index_set_n_diff(): now returns computed values instead
of setting fields in dict_index_t directly
dict_stats_analyze_index(): now returns computed values instead
of setting fields in dict_index_t directly
Reviewed by: Marko Mäkelä
We can simply use C++11 std::atomic for avoiding undefined behaviour
related to concurrent stores to a shared variable. On most if not all
ISAs, std::memory_order_relaxed loads and stores will not really
differ from non-atomic loads or stores.
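
As a minimal illustration (the variable and function names below are
hypothetical, not taken from the patch), a shared variable that
several threads may overwrite can be declared and accessed like this:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared value that several threads may store to.
static std::atomic<std::uint64_t> shared_value(0);

// A relaxed store: no ordering guarantees with respect to other
// memory accesses, but concurrent stores are well defined.
inline void publish_value(std::uint64_t v) {
  shared_value.store(v, std::memory_order_relaxed);
}

// A relaxed load: returns some value that was stored, never a torn one.
inline std::uint64_t read_value() {
  return shared_value.load(std::memory_order_relaxed);
}
```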
The srv_monitor_event and the srv_monitor_thread would not be
created when InnoDB is in read-only mode. Yet, some code would
unconditionally invoke os_event_set(srv_monitor_event).
Starting with MDEV-17441 we would no longer have os_once,
and we would always initialize zip_pad_info_t::mutex and
dict_table_t::autoinc_mutex, even for tables that are not in
ROW_FORMAT=COMPRESSED and do not include any AUTO_INCREMENT column.
mutex_free() on those unnecessary objects would make shutdown very slow
compared to older versions.
Let us use std::mutex for those two mutexes, to reduce the overhead.
The critical sections protected by these mutexes are very small, and
therefore contention or the need for any instrumentation should
be unlikely.
In AddressSanitizer, we only want memory poisoning to happen
in connection with custom memory allocation or freeing.
The primary use of MEM_UNDEFINED is for declaring memory uninitialized
in Valgrind or MemorySanitizer. We do not want MEM_UNDEFINED to
have the unwanted side effect that AddressSanitizer would no longer
be able to complain about accessing unallocated memory.
MEM_UNDEFINED(): Define as no-op for AddressSanitizer.
MEM_MAKE_ADDRESSABLE(): Define as MEM_UNDEFINED() or
ASAN_UNPOISON_MEMORY_REGION().
MEM_CHECK_ADDRESSABLE(): Wrap also __asan_region_is_poisoned().
MemorySanitizer (clang -fsanitize=memory) requires that all code
be compiled with instrumentation enabled. The only exception is the
C runtime library. Failure to use instrumented libraries will cause
bogus messages about memory being uninitialized.
In WITH_MSAN builds, we must avoid calling getservbyname(),
because even though it is a standard library function, it is
not instrumented, not even in clang 10.
Note: Before MariaDB Server 10.5, ./mtr will typically fail
due to the old PCRE library, which was updated in MDEV-14024.
The following cmake options were tested on 10.5
in commit 94d0bb4dbe:
cmake \
-DCMAKE_C_FLAGS='-march=native -O2' \
-DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \
-DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \
-DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF \
-DWITH_{ZLIB,SSL,PCRE}=bundled \
-DHAVE_LIBAIO_H=0 \
-DWITH_MSAN=ON
MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED()
and __msan_unpoison().
MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for
VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow().
InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros.
ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in
functions instead of inline assembler when building WITH_MSAN.
This will require at least -msse4.2 when building for IA-32 or AMD64.
The inline assembler would not be instrumented, and would thus cause
bogus failures.
In the merge 9e6e43551f
we made Atomic_counter a more generic wrapper of std::atomic
so that dict_index_t would support the implicit assignment operator.
It is better to revert the changes to Atomic_counter and
instead introduce Atomic_relaxed as a generic wrapper to std::atomic.
Unlike Atomic_counter, we will not define operator++, operator+=
or similar, because we want to make the operations more explicit
in the users of Atomic_relaxed, because unlike loads and stores,
atomic read-modify-write operations always incur some overhead.
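
A rough sketch of what such a wrapper could look like follows. It is
not the actual MariaDB class: the name Atomic_relaxed_sketch and the
exact member set are assumptions made for illustration.

```cpp
#include <atomic>

// Sketch of a std::atomic wrapper that always uses relaxed ordering.
template <typename T> class Atomic_relaxed_sketch {
  std::atomic<T> m;

public:
  Atomic_relaxed_sketch(T val = T()) : m(val) {}

  // Copying goes through a relaxed load and store, so a struct that
  // contains this wrapper keeps its implicit assignment operator.
  Atomic_relaxed_sketch(const Atomic_relaxed_sketch &rhs) : m(rhs.load()) {}
  Atomic_relaxed_sketch &operator=(const Atomic_relaxed_sketch &rhs) {
    store(rhs.load());
    return *this;
  }

  // Plain loads and stores may be implicit: they are cheap.
  T load() const { return m.load(std::memory_order_relaxed); }
  void store(T val) { m.store(val, std::memory_order_relaxed); }
  operator T() const { return load(); }
  T operator=(T val) {
    store(val);
    return val;
  }

  // Read-modify-write must be spelled out by the caller; there is
  // deliberately no operator++ or operator+=.
  T fetch_add(T val) { return m.fetch_add(val, std::memory_order_relaxed); }
};
```

A caller would write counter.fetch_add(1) instead of ++counter, which
keeps the more expensive operation visible at the call site.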
- The rw-lock stats are calculated in several inconsistent and
incorrect ways.
- shared rw-lock stats:
"rounds" counter is incremented only once for N rounds done
in spin-cycle.
- all rw-lock stats:
If the spin-cycle is short-circuited, then attempts are re-counted.
[If the spin-cycle is interrupted before it completes
srv_n_spin_wait_rounds (default 30) rounds, spin_count is incremented
to account for this. If the thread resumes the spin-cycle (because the
lock is still unavailable) and is again interrupted or completes,
spin_count is again incremented by the total count, without adjusting
for the increment made by the previous attempt.]
- s/x rw-lock stats:
The spin_loop counter is not incremented at all; instead, it is
reported as 0 (in show engine output), and the division that calculates
spin rounds per spin loop is adjusted to compensate.
As per the original semantics, the spin_loop counter should be
incremented once per spin_loop execution.
- sx rw-lock stats:
sx locks increment the spin_loop counter, but instead of incrementing it
once per spin_loop invocation, they do so multiple times, depending on
how many times the spin_loop flow is repeated for the same instance
after an os-wait.
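
The intended counting can be sketched as follows. This is only one
reading of the semantics described above, not the patched InnoDB code;
spin_stats_sketch, lock_with_stats() and the two callbacks are
hypothetical names.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical per-lock-type counters.
struct spin_stats_sketch {
  std::uint64_t spin_loops;   // one per lock request that had to spin
  std::uint64_t spin_rounds;  // one per round actually spun
  std::uint64_t os_waits;     // one per fallback to an OS wait
  spin_stats_sketch() : spin_loops(0), spin_rounds(0), os_waits(0) {}
};

// try_lock returns true once the lock has been acquired.
// os_wait blocks until retrying is worthwhile.
inline void lock_with_stats(const std::function<bool()> &try_lock,
                            const std::function<void()> &os_wait,
                            unsigned max_rounds, spin_stats_sketch &stats) {
  if (try_lock())
    return;                    // uncontended: nothing to count

  ++stats.spin_loops;          // once per request, not once per retry
  for (;;) {
    for (unsigned i = 0; i < max_rounds; i++) {
      ++stats.spin_rounds;     // every round is counted exactly once,
      if (try_lock())          // even if the cycle ends early
        return;
    }
    ++stats.os_waits;
    os_wait();                 // then resume spinning
  }
}
```

With max_rounds = 30 and a lock that becomes available on the 35th
attempt, this records 1 spin loop, 34 rounds and 1 OS wait.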
As part of the SPATIAL INDEX implementation in InnoDB,
dict_index_t was expanded by a rtr_ssn_t field. There are only
3 operations for this field, all protected by rtr_ssn_t::mutex:
* btr_cur_search_to_nth_level() stores the least significant 32 bits
of the 64-bit value that is stored in the index root page.
(This would better be done when the table is opened for the
very first time.)
* rtr_get_new_ssn_id() increments the value by 1.
* rtr_get_current_ssn_id() reads the current value.
All these operations can be implemented equally safely by using
atomic memory access operations.
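
A minimal sketch of the three operations on an atomic field follows.
The struct and member names are hypothetical, and it is an assumption
here that the increment returns the new value.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical replacement for the mutex-protected sequence number.
struct rtr_ssn_sketch {
  std::atomic<std::uint32_t> seq;
  rtr_ssn_sketch() : seq(0) {}

  // Store the least significant 32 bits of the 64-bit value that was
  // read from the index root page.
  void init_from_root(std::uint64_t root_value) {
    seq.store(static_cast<std::uint32_t>(root_value),
              std::memory_order_relaxed);
  }

  // Increment by 1 and return the new value (assumed semantics).
  std::uint32_t get_new() {
    return seq.fetch_add(1, std::memory_order_relaxed) + 1;
  }

  // Read the current value.
  std::uint32_t get_current() const {
    return seq.load(std::memory_order_relaxed);
  }
};
```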
InnoDB RNG maintains global state, causing otherwise unnecessary bus
traffic. Even worse, this is cross-mutex traffic. That is, different
mutexes suffer from contention.
A fixed delay of 4 was verified to give the best throughput in OLTP
update index and read-write benchmarks on Intel Broadwell (2/20/40) and
ARM (1/46/46).
This is a backport of ce04790065 from
MariaDB Server 10.3.
ut_rnd_interval(): Remove the first parameter, which was mostly
passed as 0. Implement as a simple wrapper around ut_rnd_gen().
Trivially return 0 if the size of the interval is smaller than 2.
ut_rnd_ulint_counter, ut_rnd_gen_next_ulint(), ut_rnd_gen_ulint(): Remove.
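
The shape of the simplified interface can be sketched as below. The
function names carry a _sketch suffix because this is not the InnoDB
code: the use of std::minstd_rand and of thread-local generator state
are assumptions chosen to show one way of avoiding shared global state.

```cpp
#include <cstdint>
#include <random>

// Hypothetical stand-in for ut_rnd_gen(). The state is per thread, so
// generating a number does not touch any shared cache line.
inline std::uint32_t rnd_gen_sketch() {
  static thread_local std::minstd_rand gen(std::random_device{}());
  return static_cast<std::uint32_t>(gen());
}

// Return a pseudo-random number in the range [0, n).
// An interval of size 0 or 1 trivially yields 0.
inline unsigned long rnd_interval_sketch(unsigned long n) {
  if (n < 2)
    return 0;
  // The modulo is slightly biased, which is acceptable for choosing
  // spin delays but not for anything that needs uniformity.
  return rnd_gen_sketch() % n;
}
```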