Commit graph

3,694 commits

Author SHA1 Message Date
Marko Mäkelä
607de9c7ac Merge 10.5 into 10.6 2021-10-13 15:17:20 +03:00
Marko Mäkelä
f1acd9f14b MDEV-26819 SET GLOBAL innodb_max_dirty_pages_pct=0 occasionally fails to trigger writes
innodb_max_dirty_pages_pct_update(),
innodb_max_dirty_pages_pct_lwm_update():
Invoke buf_pool.page_cleaner_wakeup() in order to wake up
buf_flush_page_cleaner. This allows the test innodb.page_cleaner
to run without any occasional timeouts.

The occasional hangs were introduced by
commit 7b1252c03d (MDEV-24278).
2021-10-13 15:16:23 +03:00
Marko Mäkelä
ebd5205120 MDEV-26467 fixup for clang-9 and earlier
Before clang-10, asm goto was not supported, so we must use fetch_or().
2021-10-12 07:47:10 +03:00
Marko Mäkelä
668a5f3d12 MDEV-26720: Optimize single-bit atomic operations on IA-32 and AMD64
This is mostly working around a bad compiler optimization.

The Intel 80386 processor introduced some bit operations that would be
the perfect translation for atomic single-bit read-modify-and-write
operations. Alas, even the latest compilers as of today
(GCC 11, clang 13, Microsoft Visual C 19.29) would generate a loop around
LOCK CMPXCHG instead of emitting the instructions
LOCK BTS (fetch_or()), LOCK BTR (fetch_and()), LOCK BTC (fetch_xor()).

fil_space_t::clear_closing(): Clear the CLOSING flag.

fil_space_t::set_stopping_check(): Special variant of
fil_space_t::set_stopping() that will return the old value
of the STOPPING flag after atomically setting it.

fil_space_t::clear_stopping(): Use fetch_sub() to toggle
the STOPPING flag. The flag is guaranteed to be set upon
calling this function, hence we will toggle it to clear it.
On IA-32 and AMD64, this will translate into
the 80486 LOCK XADD instruction.

fil_space_t::check_pending_operations(): Replace a Boolean
variable with a goto label, to allow more compact code
generation for fil_space_t::set_stopping_check().

trx_rseg_t: Define private accessors ref_set() and ref_reset()
for setting and clearing the flags.

trx_lock_t::clear_deadlock_victim(), trx_lock_t::set_wsrep_victim():
Accessors for clearing and setting the flags.
2021-10-03 13:49:40 +03:00
Marko Mäkelä
0144d1d2a6 MDEV-26720: rw_lock: Prefer fetch_sub() to fetch_and()
rw_lock::write_unlock(): Revert part of
commit d46b42489a (MDEV-24142)
to make the IA-32 and AMD64 implementation faster.
2021-10-02 11:29:44 +03:00
Marko Mäkelä
d301cc8edb Merge 10.5 into 10.6 2021-10-02 11:19:55 +03:00
Marko Mäkelä
98a7fa0ce7 MDEV-26720: optimize rw_lock for IA-32, AMD64 2021-10-02 11:14:14 +03:00
Marko Mäkelä
ec619a1def MDEV-26467 fixup: Prefer fetch_add() to fetch_or() on IA-32 and AMD64 2021-10-02 09:25:40 +03:00
Marko Mäkelä
a49e394399 Merge 10.5 into 10.6 2021-09-30 10:38:44 +03:00
Marko Mäkelä
be803f037f MDEV-25215 Excessive logging "InnoDB: Cannot close file"
In commit 45ed9dd957 (MDEV-23855)
when removing fil_system.LRU we failed to rate-limit the output
for reporting violations of innodb_open_files or open_files_limit.

If the server is run with a small limit of open files that is
well below the number of .ibd files that are being accessed by
the workload, and if at the same time innodb_log_file_size is
very small so that log checkpoints will occur frequently,
the process of enforcing the open files limit may be run very often.

fil_space_t::try_to_close(): Display at most one message per call,
and only if at least 5 seconds have elapsed since the last time a
message was output.

fil_node_open_file(): Only output a summary message if
fil_space_t::try_to_close() displayed a message during this run.
(Note: multiple threads may execute fil_node_open_file() on
different files at the same time.)

fil_space_t::get(): Do not dereference a null pointer if n & STOPPING.
This was caught by the test case below.

Unfortunately, it is not possible to create a fully deterministic
test case (expecting exactly 1 message to be emitted). The following with
--innodb-open-files=10 --innodb-log-file-size=4m
would occasionally fail to find the message in the log:

--source include/have_innodb.inc
--source include/have_partition.inc
--source include/have_sequence.inc

call mtr.add_suppression("InnoDB: innodb_open_files=10 is exceeded");

CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB
PARTITION BY key (pk) PARTITIONS 100;

INSERT INTO t1 SELECT * FROM seq_1_to_100;
--disable_query_log
let $n=400;
while ($n)
{
BEGIN; DELETE FROM t1; ROLLBACK;
dec $n;
}
--enable_query_log

let SEARCH_FILE= $MYSQLTEST_VARDIR/log/mysqld.1.err;
let SEARCH_PATTERN= \[Note\] InnoDB: Cannot close file;
-- source include/search_pattern_in_file.inc

DROP TABLE t1;
2021-09-30 10:01:10 +03:00
Marko Mäkelä
e79fa9f542 MDEV-26467: Actually use spinloop on block_lock
In commit 277ba134ad
we accidentally omitted this.
2021-09-28 17:19:26 +03:00
Marko Mäkelä
d0d4ade918 MDEV-26467: Universally implement spin loop
Previously, neither our wrapper of Microsoft Windows SRWLOCK
nor the futex-less implementation SUX_LOCK_GENERIC supported spin loops.

This was suggested by Vladislav Vaintroub.
2021-09-28 17:19:06 +03:00
Marko Mäkelä
d95361107c Merge 10.5 into 10.6 2021-09-24 14:38:52 +03:00
Marko Mäkelä
88f38661b7 Merge 10.4 into 10.5 2021-09-24 12:14:35 +03:00
Marko Mäkelä
d5bd704f4b Merge 10.3 into 10.4 2021-09-24 12:11:52 +03:00
Marko Mäkelä
4bfdba2e89 MDEV-26672 innodb_undo_log_truncate may reset transaction ID sequence
trx_rseg_header_create(): Add a parameter for the value that is
to be written to TRX_RSEG_MAX_TRX_ID. If we omit this write, then
the updated test innodb.undo_truncate will fail for the 4k, 8k, 16k
page sizes. This was broken ever since
commit 947efe17ed (MDEV-15158)
removed the writes of transaction identifiers to the TRX_SYS page.

srv_do_purge(): Truncate undo tablespaces also during slow shutdown
(innodb_fast_shutdown=0).

Thanks to Krunal Bauskar for noticing this problem.
2021-09-24 11:23:37 +03:00
Marko Mäkelä
f5794e1dc6 MDEV-26445 innodb_undo_log_truncate is unnecessarily slow
trx_purge_truncate_history(): Do not force a write of the undo tablespace
that is being truncated. Instead, prevent page writes by acquiring
an exclusive latch on all dirty pages of the tablespace.

fseg_create(): Relax an assertion that could fail if a dirty undo page
is being initialized during undo tablespace truncation (and
trx_purge_truncate_history() already acquired an exclusive latch on it).

fsp_page_create(): If we are truncating a tablespace, try to reuse
a page that we may have already latched exclusively (because it was
in buf_pool.flush_list). To some extent, this helps the test
innodb.undo_truncate,16k to avoid running out of buffer pool.

mtr_t::commit_shrink(): Mark as clean all pages that are outside the
new bounds of the tablespace, and only add the newly reinitialized pages
to the buf_pool.flush_list.

buf_page_create(): Do not unnecessarily invoke change buffer merge on
undo tablespaces.

buf_page_t::clear_oldest_modification(bool temporary): Move some
assertions to the caller buf_page_write_complete().

innodb.undo_truncate: Use a bigger innodb_buffer_pool_size=24M.
On my system, it would otherwise hang 1 out of 1547 attempts
(on the 40th repeat of innodb.undo_truncate,16k).
Other page sizes were not affected.
2021-09-24 08:24:03 +03:00
Marko Mäkelä
f5fddae3cb MDEV-26450: Corruption due to innodb_undo_log_truncate
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.

The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.

This is improved from commit 1cb218c37c
which was applied to earlier-version branches.

fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.

fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.

mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace:
(1) durably write the log
(2) release the page latches of the rebuilt tablespace
(3) release the mutexes
(4) truncate the file
(5) release the tablespace latch
This is refactored from trx_purge_truncate_history().

log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
2021-09-24 08:22:19 +03:00
Marko Mäkelä
9024498e88 Merge 10.3 into 10.4 2021-09-22 18:26:54 +03:00
Marko Mäkelä
b46cf33ab8 Merge 10.2 into 10.3 2021-09-22 18:01:41 +03:00
Marko Mäkelä
1cb218c37c MDEV-26450: Corruption due to innodb_undo_log_truncate
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.

The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.

fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.

fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.

mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace
file. First, durably write the log, then shrink the file, and finally
release the page latches of the rebuilt tablespace. Refactored from
trx_purge_truncate_history().

log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
2021-09-22 14:15:00 +03:00
Marko Mäkelä
b740b2356d MDEV-25919 fixup: Acquire MDL also in defragmentation
dict_stats_process_entry_from_defrag_pool(): Acquire MDL on the table
for which we are invoking dict_stats_save_defrag_stats(), to avoid
any race condition with DROP TABLE or similar operations.
2021-09-18 14:39:32 +03:00
Marko Mäkelä
ae8c8d8874 Merge 10.5 into 10.6 2021-09-17 20:07:38 +03:00
Marko Mäkelä
699de65d5e Merge 10.4 into 10.5 2021-09-17 19:57:13 +03:00
Marko Mäkelä
1e9c922fa7 MDEV-26623 Possible race condition between statistics and bulk insert
When computing statistics, let us play it safe and check whether
an insert into an empty table is in progress, once we have acquired
the root page latch. If yes, let us pretend that the table is empty,
just like MVCC reads would do.

It is unlikely that this race condition could lead into any crashes
before MDEV-24621, because the index tree structure should be protected
by the page latches. But, at least we can avoid some busy work and
return earlier.

As part of this, some code that is only used for statistics calculation
is being moved into static functions in that compilation unit.
2021-09-17 14:40:41 +03:00
Marko Mäkelä
1a4a7dddb6 Cleanup: Make btr_root_block_get() more robust
btr_root_block_get(): Check for index->page == FIL_NULL.

btr_root_get(): Declare static. Other callers can invoke
btr_root_block_get() directly.

btr_get_size(): Remove conditions that are checked in
btr_root_block_get().
2021-09-17 14:40:41 +03:00
Krunal Bauskar
48bbc44733 MDEV-26609 : Avoid deriving ELEMENT_PER_LATCH from cacheline
* buffer pool has latches that protect access to pages.

* there is a latch per N pages.
  (check page_hash_table for more details)

* N is calculated based on the cacheline size.

* for example: if cacheline size is
  : 64 then 7 pages pointers + 1 latch can be hosted on the same cacheline
  : 128 then 15 pages pointers + 1 latch can be hosted on the same cacheline

* arm generally have wider cacheline so with arm 1 latch is used
  to access 15 pages vs with x86 1 latch is used to access 7 pages.
  Naturally, the contention is more with arm case.

* said patch help relax this contention by limiting the elements
  per cacheline to 7 (+ 1 latch slot).
  for wider-cacheline (say 128), the remaining 8 slots are kept empty.
  this ensures there are no 2 latches on the same cacheline to avoid
  latch level contention.

Based on suggestion from Marko, the same logic is now extended to
lock_sys_t::hash_table.
2021-09-17 11:58:49 +03:00
Eugene Kosov
5b0a76078a MDEV-26621 assertion failue "index->table->persistent_autoinc" in /storage/innobase/btr/btr0btr.cc during IMPORT
dict_index_t::clear_instant_alter(): when searhing for an AUTO_INCREMENT column
don't skip the beginning of the list because the field can be at the beginning of the list
2021-09-16 16:52:20 +06:00
Marko Mäkelä
15139964d5 Merge 10.5 into 10.6 2021-09-11 17:55:27 +03:00
Marko Mäkelä
f99cc0d2fc Merge fixup 7c33ecb665
The merge accidentally
omitted the 10.4 commit 472b35c7ef and
reverted the 10.5 commit 6e3bd663d0.
2021-09-11 11:45:17 +03:00
Vicențiu Ciorbaru
7c33ecb665 Merge remote-tracking branch 'upstream/10.4' into 10.5 2021-09-10 17:16:18 +03:00
Marko Mäkelä
6e3bd663d0 MDEV-25776 Race conditions in buf_pool.page_hash around buf_pool.watch
Any modification of buf_pool.page_hash is supposed to be protected
by buf_pool.mutex and page_hash_latch::write_lock(). The buffer pool
watch mechanism of the InnoDB change buffer was violating that
ever since commit b1ab211dee (MDEV-15053).

buf_pool_t::watch_set(): Extend the critical section of buf_pool.mutex.

buf_pool_t::watch_unset(): Define non-inline, because
calls are infrequent and this function became larger.
If we have to detach a sentinel from page_hash,
do it while holding both the mutex and the exclusive hash latch.

buf_pool_t::watch_remove(): Assert that the mutex is being held.

buf_page_init_for_read(): Remove some work-arounds for
previously encountered race conditions related to buf_pool.watch.
2021-09-09 09:05:26 +03:00
Eugene Kosov
d089b51d66 TSAN: unprotected global variable
WARNING: ThreadSanitizer: data race (pid=1510842)
  Write of size 8 at 0x0000067b1e98 by main thread:
    #0 os_file_pwrite(IORequest const&, int, unsigned char const*, unsigned long, unsigned long, dberr_t*) /storage/innobase/os/os0file.cc:2928:2 (mariadbd+0x234c5ac)
    #1 os_file_write_func(IORequest const&, char const*, int, void const*, unsigned long, unsigned long) /storage/innobase/os/os0file.cc:2963:20 (mariadbd+0x234c019)
    #2 file_os_io::write(char const*, unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:320:10 (mariadbd+0x22eaa50)
    #3 log_file_t::write(unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:434:18 (mariadbd+0x22eb1d8)
    #4 log_t::file::write(unsigned long, st_::span<unsigned char>) /storage/innobase/log/log0log.cc:496:29 (mariadbd+0x22ebb55)
    #5 log_write_buf(unsigned char*, unsigned long, unsigned long, unsigned long, unsigned long) /storage/innobase/log/log0log.cc:614:14 (mariadbd+0x22f1b51)
    #6 log_write(bool) /storage/innobase/log/log0log.cc:755:2 (mariadbd+0x22ed2ec)
    #7 log_write_up_to(unsigned long, bool, bool, completion_callback const*) /storage/innobase/log/log0log.cc:817:5 (mariadbd+0x22eca44)
    #8 log_checkpoint_low(unsigned long, unsigned long) /storage/innobase/buf/buf0flu.cc:1734:5 (mariadbd+0x20d37c1)
    #9 log_checkpoint() /storage/innobase/buf/buf0flu.cc:1787:10 (mariadbd+0x20cd155)
    #10 buf_flush_wait_flushed(unsigned long) /storage/innobase/buf/buf0flu.cc:1867:5 (mariadbd+0x20ccf8f)
    #11 log_make_checkpoint() /storage/innobase/buf/buf0flu.cc:1793:3 (mariadbd+0x20cc4c9)
    #12 buf_dblwr_t::create() /storage/innobase/buf/buf0dblwr.cc:216:3 (mariadbd+0x209076a)
    #13 srv_start(bool) /storage/innobase/srv/srv0start.cc:1685:20 (mariadbd+0x256b4aa)
    #14 innodb_init(void*) /storage/innobase/handler/ha_innodb.cc:4188:8 (mariadbd+0x1ed40da)
    #15 ha_initialize_handlerton(st_plugin_int*) /sql/handler.cc:659:31 (mariadbd+0xf7c2b6)
    #16 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) /sql/sql_plugin.cc:1463:9 (mariadbd+0x160fedb)
    #17 plugin_init(int*, char**, int) /sql/sql_plugin.cc:1756:15 (mariadbd+0x160f53f)
    #18 init_server_components() /sql/mysqld.cc:5043:7 (mariadbd+0xd71462)
    #19 mysqld_main(int, char**) /sql/mysqld.cc:5655:7 (mariadbd+0xd6ae87)
    #20 main /sql/main.cc:34:10 (mariadbd+0xd661c8)

  Previous write of size 8 at 0x0000067b1e98 by thread T3:
    #0 os_file_pwrite(IORequest const&, int, unsigned char const*, unsigned long, unsigned long, dberr_t*) /storage/innobase/os/os0file.cc:2928:2 (mariadbd+0x234c5ac)
    #1 os_file_write_func(IORequest const&, char const*, int, void const*, unsigned long, unsigned long) /storage/innobase/os/os0file.cc:2963:20 (mariadbd+0x234c019)
    #2 file_os_io::write(char const*, unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:320:10 (mariadbd+0x22eaa50)
    #3 log_file_t::write(unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:434:18 (mariadbd+0x22eb1d8)
    #4 log_t::file::write(unsigned long, st_::span<unsigned char>) /storage/innobase/log/log0log.cc:496:29 (mariadbd+0x22ebb55)
    #5 log_write_checkpoint_info(unsigned long) /storage/innobase/log/log0log.cc:911:14 (mariadbd+0x22edd4e)
    #6 log_checkpoint_low(unsigned long, unsigned long) /storage/innobase/buf/buf0flu.cc:1755:3 (mariadbd+0x20d3a3d)
    #7 buf_flush_sync_for_checkpoint(unsigned long) /storage/innobase/buf/buf0flu.cc:1947:7 (mariadbd+0x20d4163)
    #8 buf_flush_page_cleaner() /storage/innobase/buf/buf0flu.cc:2186:9 (mariadbd+0x20cdab1)
    #9 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x20c3aaa)
    #10 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x20c39bd)
    #11 void std:🧵:_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x20c3965)
    #12 std:🧵:_Invoker<std::tuple<void (*)()> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x20c3905)
    #13 std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x20c37f9)
    #14 <null> <null> (libstdc++.so.6+0xd230f)

  Location is global 'os_n_file_writes' of size 8 at 0x0000067b1e98 (mariadbd+0x67b1e98)

  Make variable atomic.
2021-09-09 04:08:50 +06:00
Eugene Kosov
748539837e TSAN: unprotected global variable
Write of size 1 at 0x0000067abe08 by thread T3 (mutexes: write M1372):
    #0 buf_flush_page_cleaner() /storage/innobase/buf/buf0flu.cc:2366:29 (mariadbd+0x20cea7c)
    #1 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x20c3a8a)
    #2 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x20c399d)
    #3 void std:🧵:_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x20c3945)
    #4 std:🧵:_Invoker<std::tuple<void (*)()> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x20c38e5)
    #5 std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x20c37d9)
    #6 <null> <null> (libstdc++.so.6+0xd230f)

  Previous read of size 1 at 0x0000067abe08 by main thread:
    #0 logs_empty_and_mark_files_at_shutdown() /storage/innobase/log/log0log.cc:1094:6 (mariadbd+0x22eeff3)
    #1 innodb_shutdown() /storage/innobase/srv/srv0start.cc:1970:3 (mariadbd+0x256ffd6)
    #2 innobase_end(handlerton*, ha_panic_function) /storage/innobase/handler/ha_innodb.cc:4265:3 (mariadbd+0x1ee3fc4)
    #3 ha_finalize_handlerton(st_plugin_int*) /sql/handler.cc:595:5 (mariadbd+0xf7bac9)
    #4 plugin_deinitialize(st_plugin_int*, bool) /sql/sql_plugin.cc:1266:9 (mariadbd+0x1611789)
    #5 reap_plugins() /sql/sql_plugin.cc:1342:7 (mariadbd+0x160e17d)
    #6 plugin_shutdown() /sql/sql_plugin.cc:2050:7 (mariadbd+0x1611f42)
    #7 clean_up(bool) /sql/mysqld.cc:1923:3 (mariadbd+0xd67a4c)
    #8 unireg_abort /sql/mysqld.cc:1835:3 (mariadbd+0xd67605)
    #9 mysqld_main(int, char**) /sql/mysqld.cc:5741:7 (mariadbd+0xd6b36a)
    #10 main /sql/main.cc:34:10 (mariadbd+0xd661a8)

  Location is global 'buf_page_cleaner_is_active' of size 1 at 0x0000067abe08 (mariadbd+0x67abe08)
2021-09-09 04:08:50 +06:00
Eugene Kosov
7f50edb215 TSAN: data race on a global counter
WARNING: ThreadSanitizer: data race (pid=1503350)
  Write of size 8 at 0x0000067b1f20 by thread T3:
    #0 os_file_sync_posix(int) /storage/innobase/os/os0file.cc:895:5 (mariadbd+0x23493f6)
    #1 os_file_flush_func(int) /storage/innobase/os/os0file.cc:983:8 (mariadbd+0x2349204)
    #2 file_os_io::flush() /storage/innobase/log/log0log.cc:326:10 (mariadbd+0x22eaaa9)
    #3 log_file_t::flush() /storage/innobase/log/log0log.cc:440:18 (mariadbd+0x22eb2d0)
    #4 log_t::file::flush() /storage/innobase/log/log0log.cc:507:29 (mariadbd+0x22ebe69)
    #5 log_write_flush_to_disk_low(unsigned long) /storage/innobase/log/log0log.cc:629:17 (mariadbd+0x22ed3f3)
    #6 log_write_up_to(unsigned long, bool, bool, completion_callback const*) /storage/innobase/log/log0log.cc:829:3 (mariadbd+0x22ecb04)
    #7 log_checkpoint_low(unsigned long, unsigned long) /storage/innobase/buf/buf0flu.cc:1734:5 (mariadbd+0x20d37f1)
    #8 buf_flush_sync_for_checkpoint(unsigned long) /storage/innobase/buf/buf0flu.cc:1947:7 (mariadbd+0x20d4193)
    #9 buf_flush_page_cleaner() /storage/innobase/buf/buf0flu.cc:2186:9 (mariadbd+0x20cdad7)
    #10 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x20c3aaa)
    #11 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x20c39bd)
    #12 void std:🧵:_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x20c3965)
    #13 std:🧵:_Invoker<std::tuple<void (*)()> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x20c3905)
    #14 std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x20c37f9)
    #15 <null> <null> (libstdc++.so.6+0xd230f)

  Previous write of size 8 at 0x0000067b1f20 by main thread:
    #0 os_file_sync_posix(int) /storage/innobase/os/os0file.cc:895:5 (mariadbd+0x23493f6)
    #1 os_file_flush_func(int) /storage/innobase/os/os0file.cc:983:8 (mariadbd+0x2349204)
    #2 fil_space_t::flush_low() /storage/innobase/fil/fil0fil.cc:504:5 (mariadbd+0x205cad5)
    #3 fil_flush_file_spaces() /storage/innobase/fil/fil0fil.cc:2947:13 (mariadbd+0x206523f)
    #4 log_checkpoint() /storage/innobase/buf/buf0flu.cc:1777:5 (mariadbd+0x20cd069)
    #5 buf_flush_wait_flushed(unsigned long) /storage/innobase/buf/buf0flu.cc:1867:5 (mariadbd+0x20ccf95)
    #6 log_make_checkpoint() /storage/innobase/buf/buf0flu.cc:1793:3 (mariadbd+0x20cc4c9)
    #7 buf_dblwr_t::create() /storage/innobase/buf/buf0dblwr.cc:216:3 (mariadbd+0x209076a)
    #8 srv_start(bool) /storage/innobase/srv/srv0start.cc:1685:20 (mariadbd+0x256b514)
    #9 innodb_init(void*) /storage/innobase/handler/ha_innodb.cc:4188:8 (mariadbd+0x1ed406a)
    #10 ha_initialize_handlerton(st_plugin_int*) /sql/handler.cc:659:31 (mariadbd+0xf7c246)
    #11 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) /sql/sql_plugin.cc:1463:9 (mariadbd+0x160fe6b)
    #12 plugin_init(int*, char**, int) /sql/sql_plugin.cc:1756:15 (mariadbd+0x160f4cf)
    #13 init_server_components() /sql/mysqld.cc:5043:7 (mariadbd+0xd713f2)
    #14 mysqld_main(int, char**) /sql/mysqld.cc:5655:7 (mariadbd+0xd6ae17)
    #15 main /sql/main.cc:34:10 (mariadbd+0xd66158)

This is a correct report by TSAN for an obvious case: unprotected global
counter. Fix it by making counter std::atomic.
2021-09-09 04:08:50 +06:00
Marko Mäkelä
eb2f2c1e5f MDEV-26547 fixup: Wait for read completion
buf_load(): Wait for the submitted reads to finish before updating
innodb_buffer_pool_load_status.
2021-09-07 08:55:08 +03:00
Marko Mäkelä
277ba134ad MDEV-26467: Avoid futile spin loops
Typically, index_lock and fil_space_t::latch will be held for a longer
time than the spin loop in latch acquisition would be waiting for.
Let us avoid spin loops for those as well as dict_sys.latch, which
could be held in exclusive mode for a longer time (while loading
metadata into the buffer pool and the dictionary cache).

Performance testing on a dual Intel Xeon E5-2630 v4 (2 NUMA nodes)
suggests that the buffer pool page latch (block_lock) benefits from a
spin loop in both read-only and read-write workloads where the working
set is slightly larger than the buffer pool. Presumably, most contention
would occur on leaf page latches. Contention on upper level pages in
the buffer pool should intuitively last longer.

We introduce srw_spin_lock and srw_spin_mutex to allow users of
srw_lock or srw_mutex to opt in for the spin loop.
On Microsoft Windows, a spin loop variant was and will not be available;
srw_mutex and srw_lock will simply wrap SRWLOCK.
That is, on Microsoft Windows, the parameters innodb_sync_spin_loops
and innodb_spin_wait_delay will only affect block_lock.
2021-09-06 12:32:24 +03:00
Marko Mäkelä
a73eedbf3f MDEV-26467 Unnecessary compare-and-swap loop in srw_mutex
srw_mutex::wait_and_lock(): In the spin loop, we will try to poll
for non-conflicting lock word state by reads, avoiding any writes.
We invoke explicit std::atomic_thread_fence(std::memory_order_acquire)
before returning. The individual operations on the lock word
can use memory_order_relaxed.

srw_mutex:🔒 Document that the value for a single writer is
HOLDER+1 instead of HOLDER.

srw_mutex::wr_lock_try(), srw_mutex::wr_unlock(): Adjust the value
of the lock word of a single writer from HOLDER to HOLDER+1.
2021-09-06 12:16:26 +03:00
Marko Mäkelä
7730dd392b Merge 10.5 into 10.6 2021-09-06 10:31:32 +03:00
Marko Mäkelä
84c578c795 MDEV-26547 Restoring InnoDB buffer pool dump is single-threaded for no reason
buf_read_page_background(): Remove the parameter "bool sync"
and always actually initiate a page read in the background.

buf_load(): Always submit asynchronous reads. This allows
page checksums to be verified in concurrent threads as
soon as the reads are completed.
2021-09-06 10:14:24 +03:00
Vladislav Vaintroub
12c3d1e1d7 Fix Windows warnings and tests for -DPLUGIN_PERFSCHEMA=NO 2021-09-05 20:22:39 +02:00
Marko Mäkelä
5ae5453291 MDEV-25919 fixup: MSAN and Valgrind errors related to statistics
dict_table_close(): Fix a race condition around dict_stats_deinit().
This was not observed; it should have been caught by an assertion.

dict_stats_deinit(): Slightly simplify the code.

ha_innobase::info_low(): If the table is unreadable,
initialize some dummy statistics.
2021-09-04 19:08:14 +03:00
Marko Mäkelä
45a05fda27 MDEV-25919: Replace dict_table_t::stats_bg_flag with MDL
The purpose of dict_table_t::stats_bg_flag was to prevent
race conditions between DDL operations and a background thread
that updates persistent statistics for InnoDB tables.

Now that with the parent commit, we started to acquire a
shared meta-data lock (MDL) on the InnoDB persistent statistics tables
in background tasks that access them, we may easily acquire MDL
on the table for which the statistics are being updated. This will by
design prevent race conditions with any DDL operations on that table,
and the stats_bg_flag may be removed.

dict_stats_process_entry_from_recalc_pool(): Complete rewrite.
During the processing, retain the entry in recalc_pool, so
that dict_stats_recalc_pool_del() will be able to request
deletion of the entry, or delete the entry if its caller is
holding MDL_EXCLUSIVE while we are waiting for MDL.

recalc_pool: In addition to the table ID, store a state for
inter-thread communication, so that dict_stats_recalc_pool_del()
can wait until all processing is finished.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:55 +03:00
Marko Mäkelä
c5fd9aa562 MDEV-25919: Lock tables before acquiring dict_sys.latch
In commit 1bd681c8b3 (MDEV-25506 part 3)
we introduced a "fake instant timeout" when a transaction would wait
for a table or record lock while holding dict_sys.latch. This prevented
a deadlock of the server but could cause bogus errors for operations
on the InnoDB persistent statistics tables.

A better fix is to ensure that whenever a transaction is being
executed in the InnoDB internal SQL parser (which will for now
require dict_sys.latch to be held), it will already have acquired
all locks that could be required for the execution. So, we will
acquire the following locks upfront, before acquiring dict_sys.latch:

(1) MDL on the affected user table (acquired by the SQL layer)
(2) If applicable (not for RENAME TABLE): InnoDB table lock
(3) If persistent statistics are going to be modified:
(3.a) MDL_SHARED on mysql.innodb_table_stats, mysql.innodb_index_stats
(3.b) exclusive table locks on the statistics tables
(4) Exclusive table locks on the InnoDB data dictionary tables
(not needed in ANALYZE TABLE and the like)

Note: Acquiring exclusive locks on the statistics tables may cause
more locking conflicts between concurrent DDL operations.
Notably, RENAME TABLE will lock the statistics tables
even if no persistent statistics are enabled for the table.

DROP DATABASE will only acquire locks on statistics tables if
persistent statistics are enabled for the tables on which the
SQL layer is invoking ha_innobase::delete_table().
For any "garbage collection" in innodb_drop_database(), a timeout
while acquiring locks on the statistics tables will result in any
statistics not being deleted for any tables that the SQL layer
did not know about.

If innodb_defragment=ON, information may be written to the statistics
tables even for tables for which InnoDB persistent statistics are
disabled. But, DROP TABLE will no longer attempt to delete that
information if persistent statistics are not enabled for the table.

This change should also fix the hangs related to InnoDB persistent
statistics and STATS_AUTO_RECALC (MDEV-15020) as well as
a bug that running ALTER TABLE on the statistics tables
concurrently with running ALTER TABLE on InnoDB tables could
cause trouble.

lock_rec_enqueue_waiting(), lock_table_enqueue_waiting():
Do not issue a fake instant timeout error when the transaction
is holding dict_sys.latch. Instead, assert that the dict_sys.latch
is never being held here.

lock_sys_tables(): A new function to acquire exclusive locks on all
dictionary tables, in case DROP TABLE or similar operation is
being executed. Locking non-hard-coded tables is optional to avoid
a crash in row_merge_drop_temp_indexes(). The SYS_VIRTUAL table was
introduced in MySQL 5.7 and MariaDB Server 10.2. Normally, we require
all these dictionary tables to exist before executing any DDL, but
the function row_merge_drop_temp_indexes() is an exception.
When upgrading from MariaDB Server 10.1 or MySQL 5.6 or earlier,
the table SYS_VIRTUAL would not exist at this point.

ha_innobase::commit_inplace_alter_table(): Invoke
log_write_up_to() while not holding dict_sys.latch.

dict_sys_t::remove(), dict_table_close(): No longer try to
drop index stubs that were left behind by aborted online ADD INDEX.
Such indexes should be dropped from the InnoDB data dictionary by
row_merge_drop_indexes() as part of the failed DDL operation.
Stubs for aborted indexes may only be left behind in the
data dictionary cache.

dict_stats_fetch_from_ps(): Use a normal read-only transaction.

ha_innobase::delete_table(), ha_innobase::truncate(), fts_lock_table():
While waiting for purge to stop using the table,
do not hold dict_sys.latch.

ha_innobase::delete_table(): Implement a work-around for the rollback
of ALTER TABLE...ADD PARTITION. MDL_EXCLUSIVE would not be held if
ALTER TABLE hits lock_wait_timeout while trying to upgrade the MDL
due to a conflicting LOCK TABLES, such as in the first ALTER TABLE
in the test case of Bug#53676 in parts.partition_special_innodb.
Therefore, we must explicitly stop purge, because it would not be
stopped by MDL.

dict_stats_func(), btr_defragment_chunk(): Allocate a THD so that
we can acquire MDL on the InnoDB persistent statistics tables.

mysqltest_embedded: Invoke ha_pre_shutdown() before free_used_memory()
in order to avoid ASAN heap-use-after-free related to acquire_thd().

trx_t::dict_operation_lock_mode: Changed the type to bool.

row_mysql_lock_data_dictionary(), row_mysql_unlock_data_dictionary():
Implemented as macros.

rollback_inplace_alter_table(): Apply an infinite timeout to lock waits.

innodb_thd_increment_pending_ops(): Wrapper for
thd_increment_pending_ops(). Never attempt async operation for
InnoDB background threads, such as the trx_t::commit() in
dict_stats_process_entry_from_recalc_pool().

lock_sys_t::cancel(trx_t*): Make dictionary transactions immune to KILL.

lock_wait(): Make dictionary transactions immune to KILL, and to
lock wait timeout when waiting for locks on dictionary tables.

parts.partition_special_innodb: Use lock_wait_timeout=0 to instantly
get ER_LOCK_WAIT_TIMEOUT.

main.mdl: Filter out MDL on InnoDB persistent statistics tables

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:44 +03:00
Marko Mäkelä
094de71742 MDEV-25919 preparation: Various cleanup
que_eval_sql(): Remove the parameter lock_dict. The only caller
with lock_dict=true was dict_stats_exec_sql(), which will now
explicitly invoke dict_sys.lock() and dict_sys.unlock() by itself.

row_import_cleanup(): Do not unnecessarily lock the dictionary.
Concurrent access to the table during ALTER TABLE...IMPORT TABLESPACE
is prevented by MDL and the fact that there cannot exist any
undo log or change buffer records that would refer to the table
or tablespace.

row_import_for_mysql(): Do not unnecessarily lock the dictionary
while accessing fil_system. Thanks to MDL_EXCLUSIVE that was acquired
by the SQL layer, only one IMPORT may be in effect for the table name.

row_quiesce_set_state(): Do not unnecessarily lock the dictionary.
The dict_table_t::quiesce state is documented to be protected by
all index latches, which we are acquiring.

dict_table_close(): Introduce a simpler variant with fewer parameters.

dict_table_close(): Reduce the amount of calls.
We can simply invoke dict_table_t::release() on startup or
in DDL operations, or when the table is inaccessible.
In none of these cases, there is no need to invalidate the
InnoDB persistent statistics.

pars_info_t::graph_owns_us: Remove (unused).

pars_info_free(): Define inline.

fts_delete(), trx_t::evict_table(), row_prebuilt_free(),
row_rename_table_for_mysql(): Simplify.

row_mysql_lock_data_dictionary(): Remove some references;
use dict_sys.lock() and dict_sys.unlock() instead.

row_mysql_lock_table(): Remove. Use lock_table_for_trx() instead.

ha_innobase::check_if_supported_inplace_alter(),
row_create_table_for_mysql(): Simply assert dict_sys.sys_tables_exist().
In commit 49e2c8f0a6 and
commit 1bd681c8b3 srv_start()
actually guarantees that the system tables will exist,
or the server is in read-only mode, or startup will fail.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:20 +03:00
Marko Mäkelä
6a2cd6f4b4 MDEV-19505 Do not hold mutex while calling que_graph_free()
sym_tab_free_private(): Do not call dict_table_close(), but
simply invoke dict_table_t::release(), which we can do without
locking the whole dictionary cache. (Note: On user tables it
may still be necessary to invoke dict_table_close(), so that
InnoDB persistent statistics will be deinitialized as expected.)

fts_check_corrupt(), row_fts_merge_insert(): Invoke
aux_table->release() to simplify the code. This is never a user table.

fts_que_graph_free(), fts_que_graph_free_check_lock(): Replaced with
que_graph_free().

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:06 +03:00
Marko Mäkelä
82b7c561b7 MDEV-24258 Merge dict_sys.mutex into dict_sys.latch
In the parent commit, dict_sys.latch could theoretically have been
replaced with a mutex. But, we can do better and merge dict_sys.mutex
into dict_sys.latch. Generally, every occurrence of dict_sys.mutex_lock()
will be replaced with dict_sys.lock().

The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex
will be removed along with dict_sys.mutex. The dict_sys.latch
will remain instrumented as dict_operation_lock.

Some use of dict_sys.lock() will be replaced with dict_sys.freeze(),
which we will reintroduce for the new shared mode. Most notably,
concurrent table lookups are possible as long as the tables are present
in the dict_sys cache. In particular, this will allow more concurrency
among InnoDB purge workers.

Because dict_sys.mutex will no longer 'throttle' the threads that purge
InnoDB transaction history, a performance degradation may be observed
unless innodb_purge_threads=1.

The table cache eviction policy will become FIFO-like,
similar to what happened to fil_system.LRU
in commit 45ed9dd957.
The name of the list dict_sys.table_LRU will become somewhat misleading;
that list contains tables that may be evicted, even though the
eviction policy no longer is least-recently-used but first-in-first-out.
(Note: Tables can never be evicted as long as locks exist on them or
the tables are in use by some thread.)

As demonstrated by the test perfschema.sxlock_func, there
will be less contention on dict_sys.latch, because some previous
use of exclusive latches will be replaced with shared latches.

fts_parse_sql_no_dict_lock(): Replaced with pars_sql().

fts_get_table_name_prefix(): Merged to fts_optimize_create().

dict_stats_update_transient_for_index(): Deduplicated some code.

ha_innobase::info_low(), dict_stats_stop_bg(): Use a combination
of dict_sys.latch and table->stats_mutex_lock() to cover the
changes of BG_STAT_SHOULD_QUIT, because the flag is being read
in dict_stats_update_persistent() while not holding dict_sys.latch.

row_discard_tablespace_for_mysql(): Protect stats_bg_flag by
exclusive dict_sys.latch, like most other code does.

row_quiesce_table_has_fts_index(): Remove unnecessary mutex
acquisition. FLUSH TABLES...FOR EXPORT is protected by MDL.

row_import::set_root_by_heuristic(): Remove unnecessary mutex
acquisition. ALTER TABLE...IMPORT TABLESPACE is protected by MDL.

row_ins_sec_index_entry_low(): Replace a call
to dict_set_corrupted_index_cache_only(). Reads of index->type
were not really protected by dict_sys.mutex, and writes
(flagging an index corrupted) should be extremely rare.

dict_stats_process_entry_from_defrag_pool(): Only freeze the dictionary,
do not lock it exclusively.

dict_stats_wait_bg_to_stop_using_table(), DICT_BG_YIELD: Remove trx.
We can simply invoke dict_sys.unlock() and dict_sys.lock() directly.

dict_acquire_mdl_shared()<trylock=false>: Assert that dict_sys.latch is
only held in shared more, not exclusive mode. Only acquire it in
exclusive mode if the table needs to be loaded to the cache.

dict_sys_t::acquire(): Remove. Relocating elements in dict_sys.table_LRU
would require holding an exclusive latch, which we want to avoid
for performance reasons.

dict_sys_t::allow_eviction(): Add the table first to dict_sys.table_LRU,
to compensate for the removal of dict_sys_t::acquire(). This function
is only invoked by INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS.

dict_table_open_on_id(), dict_table_open_on_name(): If dict_locked=false,
try to acquire dict_sys.latch in shared mode. Only acquire the latch in
exclusive mode if the table is not found in the cache.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:51:35 +03:00
Marko Mäkelä
2e08b6d78c MDEV-24258 preparation: Remove dict_sys.freeze() and unfreeze()
This will essentially make dict_sys.latch a mutex
(it is only acquired in exclusive mode).

The subsequent commit will merge dict_sys.mutex into dict_sys.latch
and reintroduce dict_sys.freeze() for those cases where we currently
acquire only dict_sys.latch but not dict_sys.mutex. The case where
both are acquired will be mapped to dict_sys.lock().

i_s_sys_tables_fill_table_stats(): Invoke dict_sys.prevent_eviction()
and the new function dict_sys.allow_eviction() to avoid table eviction
while a row in INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS is being
produced.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:48:10 +03:00
Michael Widenius
497b694936 Fixed compile errors when compiling with HAVE_valgrind 2021-08-24 23:05:21 +03:00
Marko Mäkelä
49f95c4065 Merge 10.5 into 10.6 2021-08-23 11:21:33 +03:00