Typically, index_lock and fil_space_t::latch will be held for a longer
time than a spin loop in latch acquisition would wait.
Let us avoid spin loops for those as well as dict_sys.latch, which
could be held in exclusive mode for a longer time (while loading
metadata into the buffer pool and the dictionary cache).
Performance testing on a dual Intel Xeon E5-2630 v4 (2 NUMA nodes)
suggests that the buffer pool page latch (block_lock) benefits from a
spin loop in both read-only and read-write workloads where the working
set is slightly larger than the buffer pool. Presumably, most contention
would occur on leaf page latches. Contention on upper level pages in
the buffer pool should intuitively last longer.
We introduce srw_spin_lock and srw_spin_mutex to allow users of
srw_lock or srw_mutex to opt in for the spin loop.
On Microsoft Windows, a spin loop variant is not and will not be available;
srw_mutex and srw_lock will simply wrap SRWLOCK.
That is, on Microsoft Windows, the parameters innodb_sync_spin_loops
and innodb_spin_wait_delay will only affect block_lock.
srw_mutex::wait_and_lock(): In the spin loop, we will try to poll
for non-conflicting lock word state by reads, avoiding any writes.
We invoke explicit std::atomic_thread_fence(std::memory_order_acquire)
before returning. The individual operations on the lock word
can use memory_order_relaxed.
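A minimal sketch of that spin loop, assuming the HOLDER bit of the
srw_mutex lock word and InnoDB's ut_delay()/srv_spin_wait_delay pause
(names and structure are illustrative, not the exact source):

  bool spin_lock_try(std::atomic<uint32_t> &lk_word, unsigned spin_rounds)
  {
    for (unsigned i= 0; i < spin_rounds; i++)
    {
      uint32_t lk= lk_word.load(std::memory_order_relaxed);
      // Write to the lock word only when it looks acquirable.
      if (!(lk & HOLDER) &&
          !(lk_word.fetch_or(HOLDER, std::memory_order_relaxed) & HOLDER))
      {
        // All operations above were relaxed; publish the acquisition once.
        std::atomic_thread_fence(std::memory_order_acquire);
        return true;
      }
      ut_delay(srv_spin_wait_delay);
    }
    return false;  // caller falls back to the futex-based wait
  }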
srw_mutex::lock: Document that the value for a single writer is
HOLDER+1 instead of HOLDER.
srw_mutex::wr_lock_try(), srw_mutex::wr_unlock(): Adjust the value
of the lock word of a single writer from HOLDER to HOLDER+1.
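A hedged sketch of the adjusted fast paths (member and helper names are
assumptions; the point is that the holder counts itself, so an uncontended
writer owns the value HOLDER + 1):

  bool wr_lock_try()
  {
    uint32_t lk= 0;
    return lock.compare_exchange_strong(lk, HOLDER + 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
  }

  void wr_unlock()
  {
    // If anything remains after subtracting our own HOLDER + 1, at least
    // one waiter is registered and must be woken up (futex wake, assumed).
    if (lock.fetch_sub(HOLDER + 1, std::memory_order_release) != HOLDER + 1)
      wake_one();
  }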
The U-to-X upgrade turned out to be incorrect. A debug assertion
failed in wr_wait(), called from mtr_defer_drop_ahi() in a stress
test with innodb_adaptive_hash_index=ON.
A correct upgrade procedure ought to be readers.fetch_add(WRITER-1)
to register ourselves as a WRITER (or waiting writer) and to release
the reference that was being held for the U lock.
Thanks to Matthias Leich for catching the problem.
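A hedged sketch of that corrected upgrade (the wr_wait() argument and
member names are assumptions; the writer mutex is already held because of
the U lock):

  void u_wr_upgrade()
  {
    // Register as WRITER and drop the single reference held for the U lock.
    uint32_t lk= readers.fetch_add(WRITER - 1, std::memory_order_acquire);
    if (lk != 1)         // other readers exist: wait until they are gone
      wr_wait(lk - 1);
  }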
Having both readers and writers use a single lock word in
futex system calls caused a performance regression compared to
SRW_LOCK_DUMMY (mutex and 2 condition variables).
A contributing factor is that we did not accurately keep
track of the number of waiting threads and thus had to invoke
system calls to wake up any waiting threads.
SUX_LOCK_GENERIC: Renamed from SRW_LOCK_DUMMY. This is the
original implementation, with rw_lock (std::atomic<uint32_t>),
a mutex and two condition variables. Using a separate writer
mutex (as described below) is not possible, because the mutex ownership
in a buf_block_t::lock must be able to transfer from a write submitter
thread to an I/O completion thread, and pthread_mutex_lock() may assume
that the submitter thread is recursively acquiring the mutex that it
already holds, while in reality the I/O completion thread is the real
owner. POSIX does not define an interface for requesting a mutex to
be non-recursive.
On Microsoft Windows, srw_lock_low will remain a simple wrapper of
SRWLOCK. On 32-bit Microsoft Windows, sizeof(SRWLOCK)=4 while
sizeof(srw_lock_low)=8.
On other platforms, srw_lock_low is an alias of ssux_lock_low,
the Simple (non-recursive) Shared/Update/eXclusive lock.
In the futex-based implementation of ssux_lock_low (Linux, OpenBSD,
Microsoft Windows), we shall use a dedicated mutex for exclusive
requests (writer), and have a WRITER flag in the 'readers' lock word
to inform that a writer is holding the lock or waiting for the lock to
be granted. When the WRITER flag is set, all lock requests must acquire
the writer mutex. Normally, shared (S) lock requests simply perform a
compare-and-swap on the 'readers' word.
Update locks are implemented as a combination of the writer mutex
and a normal counter in the 'readers' lock word. The conflict between
U and X locks is guaranteed by the writer mutex.
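A hedged sketch of the fast paths under this design (member and helper
names are illustrative, not the exact source):

  void rd_lock()
  {
    uint32_t lk= 0;
    while (!readers.compare_exchange_weak(lk, lk + 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed))
    {
      if (lk & WRITER)     // a writer holds or is waiting for the lock
        return rd_wait();  // slow path: goes through the writer mutex
      // otherwise retry with the refreshed value of lk
    }
  }

  void u_lock()
  {
    writer.wr_lock();                                 // dedicated mutex
    readers.fetch_add(1, std::memory_order_acquire);  // count like a reader
  }

  void wr_lock()
  {
    writer.wr_lock();
    // Announce the writer; wait for already granted S or U references.
    if (uint32_t lk= readers.fetch_or(WRITER, std::memory_order_acquire))
      wr_wait(lk);
  }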
Unlike SUX_LOCK_GENERIC, wr_u_downgrade() will not wake up any pending
rd_lock() waits. They will wait until u_unlock() releases the writer mutex.
The ssux_lock_low is always wrapped by sux_lock (with a recursion count
of U and X locks), used for dict_index_t::lock and buf_block_t::lock.
Their memory footprint for the futex-based implementation will increase
by sizeof(srw_mutex), or 4 bytes.
This change addresses a performance regression in read-only benchmarks,
such as sysbench oltp_read_only. Write performance was also improved.
On 32-bit Linux and OpenBSD, lock_sys_t::hash_table will allocate
two hash table elements for each srw_lock (14 instead of 15 hash
table cells per 64-byte cache line on IA-32). On Microsoft Windows,
sizeof(SRWLOCK)==sizeof(void*) and there is no change.
Reviewed by: Vladislav Vaintroub
Tested by: Axel Schwenke and Vladislav Vaintroub
On Linux, OpenBSD and Microsoft Windows, srw_mutex was an alias for a
rw-lock while we only need mutex functionality. Let us implement a
futex-based mutex with one bit for HOLDER and 31 bits for counting
waiting requests.
srw_lock::wr_unlock() can avoid waking up a waiter when no waiting
requests exist. (Previously, we only had the 1-bit rw_lock::WRITER_WAITING
flag, which could be wrongly cleared when multiple wr_lock() waiters existed.
Now we have no problem with up to 2,147,483,648 conflicting threads.)
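A hedged sketch of the slow path of such a mutex (futex_wait() stands for
the platform primitive, e.g. SYS_futex or WaitOnAddress; not the exact
source):

  void wait_and_lock()
  {
    // Register in the 31-bit waiting-request counter; this count is what
    // lets wr_unlock() skip the wake-up system call when nobody is waiting.
    uint32_t lk= 1 + lock.fetch_add(1, std::memory_order_relaxed);
    for (;;)
    {
      if (lk & HOLDER)
      {
        futex_wait(&lock, lk);    // sleep until the lock word changes
        lk= lock.load(std::memory_order_relaxed);
      }
      else if (lock.compare_exchange_weak(lk, lk | HOLDER,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed))
        return;                   // acquired; we stay counted in the word
    }
  }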
On 64-bit Microsoft Windows, the advantage is that
sizeof(srw_mutex) is 4, while sizeof(SRWLOCK) would be 8.
Reviewed by: Vladislav Vaintroub
mtr_defer_drop_ahi(): Upgrade the U lock to X lock and downgrade
it back to U lock in case the adaptive hash index needs to be dropped.
This regression was introduced in
commit 03ca6495df (MDEV-24142).
WaitOnAddress() turns out to be too CPU-heavy for the specific scenario,
which makes it prominent in profiler output on several benchmarks with
contended sux_lock.
The condition variable implementation does not show the same behavior.
Thus, let us define SRWLOCK_DUMMY for Microsoft Windows.
srw_mutex should remain mapped to SRWLOCK on Windows (since SRWLOCK is
smaller).
This conceptually reverts commit 1fdc161d8f
and reintroduces an option for srw_lock to wrap a native implementation.
The srw_lock and srw_lock_low differ from ssux_lock and ssux_lock_low
in that Slim SUX locks support three modes (Shared, Update, eXclusive)
while Slim RW locks support only two (Read, Write).
On Microsoft Windows, the srw_lock will be implemented by SRWLOCK.
On Linux and OpenBSD, it will be implemented by rw_lock and the
futex system call, just like earlier.
On other systems, or if SRW_LOCK_DUMMY is defined on anything other
than Microsoft Windows, rw_lock_t will be used.
ssux_lock_low::read_lock(), ssux_lock_low::update_lock(): Correct
the SRW_LOCK_DUMMY implementation to prevent hangs. The intention of
commit 1fdc161d8f seems to have been
to use do ... while loops, but the 'do' keyword was missing. This total
breakage was missed in commit 260161fc9f,
which did reduce the probability of the hangs.
ssux_lock_low::u_unlock(): In the SRW_LOCK_DUMMY implementation
(based on a mutex and two condition variables), always invoke
writer_wake() in order to ensure that a waiting update_lock()
will be woken up.
ssux_lock_low::writer_wait(), ssux_lock_low::readers_wait():
In the SRW_LOCK_DUMMY implementation, keep waiting for the signal
until the lock word has changed. The 'if' has been changed back to 'while'
in order to avoid hangs.
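A hedged sketch of the corrected wait (member names are illustrative; the
point is the 'while', which re-checks the lock word after every wakeup,
including spurious ones):

  void writer_wait(uint32_t lk)
  {
    pthread_mutex_lock(&mutex);
    while (word.load(std::memory_order_relaxed) == lk)
      pthread_cond_wait(&cond_exclusive, &mutex);
    pthread_mutex_unlock(&mutex);
  }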
sux_lock::recursive: Move right after the 32-bit sux_lock::lock.
This will reduce sizeof(block_lock) from 24 to 16 bytes on
64-bit systems with CMAKE_BUILD_TYPE=RelWithDebInfo. This may be
significant, because there will be one buf_block_t::lock for each
buffer pool page descriptor.
We still have some potential for savings, with sizeof(buf_page_t)==112
and sizeof(buf_block_t)==184 on a GNU/Linux AMD64 system.
Note: On GNU/Linux AMD64, sizeof(index_lock) remains 32 bytes
(16 with PLUGIN_PERFSCHEMA=NO) even though it would fit in 24 bytes.
This is because sizeof(srw_lock) includes 4 bytes of padding
(to 16 bytes) that index_lock_t::recursive cannot reuse. So,
in total 4+4 bytes will be lost to padding. This is rather
insignificant compared to sizeof(dict_index_t)==400.
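The saving is plain structure padding; a generic LP64 illustration (member
names and types simplified, not the actual sux_lock definition):

  struct layout_before { std::atomic<uint32_t> lock;
                         std::atomic<uint64_t> writer;  // 4-byte hole before
                         uint32_t recursive; };         // sizeof == 24
  struct layout_after  { std::atomic<uint32_t> lock;
                         uint32_t recursive;            // fills the hole
                         std::atomic<uint64_t> writer; }; // sizeof == 16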
Let us remove sux_lock::waits and the associated bookkeeping.
Starting with commit 1669c8890c,
the PERFORMANCE_SCHEMA instrumentation interface has been keeping
track of lock waits.
The view INFORMATION_SCHEMA.INNODB_MUTEXES only exported counts
of rw-lock waits.
Also, SHOW ENGINE INNODB MUTEX will no longer export any information
about rw-locks.
InnoDB buffer pool block and index tree latches depend on a
special kind of read-update-write lock that allows reentrant
(recursive) acquisition of the 'update' and 'write' locks
as well as an upgrade from 'update' lock to 'write' lock.
The 'update' lock allows any number of reader locks from
other threads, but no concurrent 'update' or 'write' lock.
If there were no requirement to support an upgrade from 'update'
to 'write', we could compose the lock out of two srw_lock
(implemented as any type of native rw-lock, such as SRWLOCK on
Microsoft Windows). Removing this requirement is very difficult,
so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we
implemented an 'update' mode for our srw_lock.
Re-entrant or recursive locking is mostly needed when writing or
freeing BLOB pages, but also in crash recovery or when merging
buffered changes to an index page. The re-entrancy allows us to
attach a previously acquired page to a sub-mini-transaction that
will be committed before whatever else is holding the page latch.
The SUX lock supports Shared ('read'), Update, and eXclusive ('write')
locking modes. The S latches are not re-entrant, but a single S latch
may be acquired even if the thread already holds an U latch.
The idea of the U latch is to allow a write of something that concurrent
readers do not care about (such as the contents of BTR_SEG_LEAF,
BTR_SEG_TOP and other page allocation metadata structures, or
the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field
is only updated when a dict_table_t for the table exists, and only
read when a dict_table_t for the table is being added to dict_sys.)
block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page()
to allow concurrent readers but no concurrent modifications while the
page is being written to the data file. That latch will be released
by buf_page_write_complete() in a different thread. Hence, we use
the special lock owner value FOR_IO.
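A hedged sketch of that FOR_IO pattern (simplified pseudocode of what
buf_flush_page() and buf_page_write_complete() do, not their bodies):

  if (block->lock.u_lock_try(true))  // for_io=true: lock owner becomes FOR_IO
  {
    // Submit the asynchronous page write; concurrent readers remain
    // allowed, but nobody may modify the page until the write completes.
  }
  // ... later, buf_page_write_complete() in an I/O completion thread
  // releases the FOR_IO U latch once the data file write has finished.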
The index_lock::u_lock() improves concurrency on operations that
involve non-leaf index pages.
The interface has been cleaned up a little. We will use
x_lock_recursive() instead of x_lock() when we know that a
lock is already held by the current thread. Similarly,
a lock upgrade from U to X is only allowed via u_x_upgrade()
or x_lock_upgraded() but not via x_lock().
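Illustrative usage only (assuming the member functions named above, plus
u_lock() and x_unlock()):

  index->lock.u_lock();            // allow concurrent readers
  /* modify non-leaf allocation metadata */
  index->lock.u_x_upgrade();       // now exclude readers as well
  /* critical part */
  index->lock.x_unlock();

  // When the current thread is known to hold the latch already:
  index->lock.x_lock_recursive();  // instead of x_lock()
  /* ... */
  index->lock.x_unlock();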
We will disable the LatchDebug and sync_array interfaces to
InnoDB rw-locks.
The SEMAPHORES section of SHOW ENGINE INNODB STATUS output
will no longer include any information about InnoDB rw-locks,
only TTASEventMutex (cmake -DMUTEXTYPE=event) waits.
This will make a part of the 'innotop' script dead code.
The block_lock buf_block_t::lock will not be covered by any
PERFORMANCE_SCHEMA instrumentation.
SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES
will no longer output source code file names or line numbers.
The dict_index_t::lock will be identified by index and table names,
which should be much more useful. PERFORMANCE_SCHEMA is lumping
information about all dict_index_t::lock together as
event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'.
buf_page_free(): Remove the file,line parameters. The sux_lock will
not store such diagnostic information.
buf_block_dbg_add_level(): Define as empty macro, to be removed
in a subsequent commit.
Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO,
the index_lock dict_index_t::lock will be instrumented via
PERFORMANCE_SCHEMA. Similar to
commit 1669c8890c
we will distinguish lock waits by registering shared_lock,exclusive_lock
events instead of try_shared_lock,try_exclusive_lock.
Actual 'try' operations will not be instrumented at all.
rw_lock_list: Remove. After MDEV-24167, this only covered
buf_block_t::lock and dict_index_t::lock. We will output their
information by traversing buf_pool or dict_sys.
The PERFORMANCE_SCHEMA insists on distinguishing read-update-write
locks from read-write locks, so we must add
template<bool support_u_lock> in rd_lock() and wr_lock() operations.
rw_lock::read_trylock(): Add template<bool prioritize_updater=false>
which is used by the srw_lock_low::read_lock() loop. As long as
an UPDATE lock has already been granted to some thread, we will grant
subsequent READ lock requests even if a waiting WRITE lock request
exists. This will be necessary to remain compatible with the existing usage
pattern of InnoDB rw_lock_t, where the holder of an SX-latch (which we
will rename to UPDATE latch) may acquire an additional S-latch
on the same object. For normal read-write locks without update operations
this should make no difference at all, because the rw_lock::UPDATER
flag would never be set.
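A hedged sketch of the prioritize_updater logic (flag and member names
follow the description above; this is not the exact rw_lock source):

  template<bool prioritize_updater= false>
  bool read_trylock(uint32_t &l)
  {
    l= value.load(std::memory_order_relaxed);
    for (;;)
    {
      // A new S lock is refused if a writer holds the lock, or if a writer
      // is waiting and we may not bypass it; bypassing is only allowed when
      // prioritize_updater is set and an UPDATE lock has been granted.
      if ((l & WRITER) ||
          ((l & WRITER_WAITING) && !(prioritize_updater && (l & UPDATER))))
        return false;
      if (value.compare_exchange_weak(l, l + 1, std::memory_order_acquire,
                                      std::memory_order_relaxed))
        return true;
    }
  }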
Let us try to avoid code bloat for the common case that
performance_schema is disabled at runtime, and use
ATTRIBUTE_NOINLINE member functions for instrumented latch acquisition.
Also, let us distinguish lock waits from non-contended lock requests
by using write_lock,read_lock for the requests that lead to waits,
and try_write_lock,try_read_lock for the wait-free lock acquisitions.
Actual 'try' operations are not being instrumented at all.
In commit 1fdc161d8f we introduced
a mutex-and-condition-variable based fallback implementation
for platforms that lack a futex system call. That implementation
is prone to hangs.
Let us use separate condition variables for shared and exclusive requests.
Let us always base srw_lock on our own std::atomic<uint32_t>
based rw_lock. In this way, we can extend the locks in a portable
way across all platforms.
We will use futex system calls where available:
Linux, OpenBSD, and Microsoft Windows.
Elsewhere, we will emulate futex with a mutex and a condition variable.
Thanks to Daniel Black for testing this on OpenBSD.
Many InnoDB rw-locks unnecessarily depend on the complex
InnoDB rw_lock_t implementation that supports the SX lock mode
as well as recursive acquisition of X or SX locks.
One of them is the bunch of adaptive hash index search latches,
instrumented as btr_search_latch in PERFORMANCE_SCHEMA.
Let us introduce a simpler lock for those in order to
reduce overhead.
srw_lock: A simple read-write lock that does not support recursion.
On Microsoft Windows, this wraps SRWLOCK, only adding
runtime overhead if PERFORMANCE_SCHEMA is enabled.
On Linux (all architectures), this is implemented with
std::atomic<uint32_t> and the futex system call.
On other platforms, we will wrap mysql_rwlock_t with
zero runtime overhead.
The PERFORMANCE_SCHEMA instrumentation differs
from InnoDB rw_lock_t in that we will only invoke
PSI_RWLOCK_CALL(start_rwlock_wrwait) or
PSI_RWLOCK_CALL(start_rwlock_rdwait)
if there is an actual conflict.
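A hedged sketch of that conflict-only instrumentation (simplified; the
real wrapper also passes the PSI handle and source location):

  void wr_lock()
  {
    if (lock.wr_lock_try())
      return;             // uncontended: no PERFORMANCE_SCHEMA call at all
    // Only a real conflict reaches this point: register the wait via
    // PSI_RWLOCK_CALL(start_rwlock_wrwait), block in the underlying lock,
    // and finally report completion via end_rwlock_wrwait.
    lock.wr_lock();
  }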