mariadb/mysys
Xiaotong Niu 8a505980c5 MDEV-28430: Fix memory barrier missing of lf_alloc on Arm64
When testing MariaDB on Arm64, a stall can occur. Jira link:
https://jira.mariadb.org/browse/MDEV-28430

The stall occurs because of an unexpected circular reference in the
LF_PINS->purgatory list which is traversed in lf_pinbox_real_free().

We found that on Arm64 the ABA problem in the LF_ALLOCATOR->top list was
not solved, so various kinds of undefined behavior can occur, including
a circular reference in the LF_PINS->purgatory list.

The following code is meant to solve the ABA problem. It is copied from
the link below.
cb4c271355/mysys/lf_alloc-pin.c (L501-)#L505

     do
     {
503     node= allocator->top;
504     lf_pin(pins, 0, node);
505  } while (node != allocator->top && LF_BACKOFF());

1. ABA problem on Arm64
The steps below show how the ABA problem occurs on Arm64. The code in
each step is simplified, and the line numbers refer to MariaDB v10.4.
------------------------------------------------------------------------
Abnormal case.
Initial state: pin = 0, top = A, top list: A->B

T1                              T2
                                step1. write top=B //seq-cst, #L517
                                step2. write A->next= "any"
                                step3. read pin==0 //relaxed, #L295
step1. write pin=A  //seq-cst, #L504
step2. read old value of top==A  //relaxed, #L505
step3. next=A->next="any" //#L517
                                step4. write A->next=B,top=A //#L420-435
step4. CAS(top,A,next) //#L517
step5. write pin=0     //#L521
------------------------------------------------------------------------
The above case arises because T1.step2 reads the old value of top, which
allows "T1.step3, T1.step4" and "T2.step4" to run at the same time. In
other words, they are not mutually exclusive.

T2.step4 may then be sandwiched between T1.step3 and T1.step4, which
causes top to be updated to "any", which may be an in-use or invalid
address.
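
The interleaving above can be replayed deterministically in a single thread. The following is only an illustrative sketch, not code from lf_alloc-pin.c: `node_t`, `cas()` and `replay_abnormal_case()` are hypothetical names, and the stale read in T1.step2 is modeled by a value captured before T2 runs.

```c
#include <stddef.h>

/* Hypothetical, simplified list node; not the real LF_ALLOCATOR layout. */
typedef struct node { struct node *next; } node_t;

/* Single-threaded stand-in for compare-and-swap. This is safe only
   because the replay below has no real concurrency. */
static int cas(node_t **loc, node_t *expected, node_t *desired)
{
  if (*loc != expected)
    return 0;
  *loc= desired;
  return 1;
}

/* Replays the abnormal case step by step.
   Returns 1 if top ends up pointing at the stray "any" node. */
int replay_abnormal_case(void)
{
  node_t any= { NULL }, b= { NULL }, a= { &b };
  node_t *top= &a;               /* initial state: top = A, list A->B */
  node_t *pin= NULL;             /* initial state: pin = 0 */
  node_t *stale_top= top;        /* the old value T1 will observe late */

  top= &b;                       /* T2.step1: pop A, top = B */
  a.next= &any;                  /* T2.step2: A->next = "any" */
  int a_unpinned= (pin == NULL); /* T2.step3: pin still reads 0 */

  pin= stale_top;                /* T1.step1: pin = A */
  node_t *node= stale_top;       /* T1.step2: stale read, top == A */
  node_t *next= node->next;      /* T1.step3: next = "any" */

  if (a_unpinned)                /* T2.step4: push A back on the list */
  {
    a.next= top;
    top= &a;
  }

  cas(&top, node, next);         /* T1.step4: succeeds, top = "any" */
  pin= NULL;                     /* T1.step5 */
  (void) pin;
  return top == &any;
}
```

The CAS in T1.step4 succeeds because top is A again, even though the value of `next` that T1 read in step3 is no longer the successor of A.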

2. Analyzing the issue with Dekker's algorithm
The above problem can be mapped to Dekker's algorithm, see
https://en.wikipedia.org/wiki/Dekker%27s_algorithm
The following extracts the read and write operations on 'top' and 'pin'
and maps them to Dekker's algorithm to analyze the root cause.
------------------------------------------------------------------------
Initial state: top = A, pin = 0
T1                                    T2
store_seq_cst(pin, A) // write pin    store_seq_cst(top, B)  //write top
rt= load_relaxed(top) // read top     rp= load_relaxed(pin)  //read pin

if (rt == A && rp == 0) printf("oops\n"); // will "oops" be printed?
------------------------------------------------------------------------
How T1 and T2 enter their critical sections:
(1) T1 writes pin. If T1 then reads that top has not been updated, T1
enters its critical section (T1.step3 and T1.step4, trying to obtain
'A', #L517); otherwise it gives up (T1 has no priority).
(2) T2 writes top. If T2 then reads that pin has not been updated, T2
enters its critical section (T2.step4, trying to add 'A' to the top list
again, #L420-435); otherwise it waits until pin!=A (T2 has priority).

Because the previous code loads 'top' and 'pin' with relaxed semantics,
on Arm and PPC there is no guarantee that the above critical sections
are mutually exclusive. In other words, "oops" can be printed.
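
The litmus test above can be written with C11 atomics and POSIX threads. This is a minimal sketch under assumptions: `top_slot`, `pin_slot`, `NODE_A`, `NODE_B` and `count_oops()` are hypothetical names, and both loads are already seq-cst here, so under the C11 memory model the "oops" outcome is forbidden. Changing the two loads to `memory_order_relaxed` gives the variant discussed above, where the outcome is allowed on Arm and PPC.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for allocator->top and the pin slot. */
enum { NODE_A= 1, NODE_B= 2 };
static _Atomic uintptr_t top_slot;
static _Atomic uintptr_t pin_slot;
static uintptr_t seen_top;  /* rt, written by T1, read after join */
static uintptr_t seen_pin;  /* rp, written by T2, read after join */

/* T1: write pin, then read top. */
static void *t1_write_pin_read_top(void *arg)
{
  (void) arg;
  atomic_store_explicit(&pin_slot, NODE_A, memory_order_seq_cst);
  seen_top= atomic_load_explicit(&top_slot, memory_order_seq_cst);
  return NULL;
}

/* T2: write top, then read pin. */
static void *t2_write_top_read_pin(void *arg)
{
  (void) arg;
  atomic_store_explicit(&top_slot, NODE_B, memory_order_seq_cst);
  seen_pin= atomic_load_explicit(&pin_slot, memory_order_seq_cst);
  return NULL;
}

/* Runs the litmus test `rounds` times. Returns the number of rounds in
   which both threads saw the old value, or -1 if a thread could not be
   created. */
int count_oops(int rounds)
{
  int oops= 0;
  for (int i= 0; i < rounds; i++)
  {
    pthread_t t1, t2;
    atomic_store(&top_slot, NODE_A);  /* initial state: top = A */
    atomic_store(&pin_slot, 0);       /* initial state: pin = 0 */
    if (pthread_create(&t1, NULL, t1_write_pin_read_top, NULL))
      return -1;
    if (pthread_create(&t2, NULL, t2_write_top_read_pin, NULL))
    {
      pthread_join(t1, NULL);
      return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    if (seen_top == NODE_A && seen_pin == 0)
      oops++;
  }
  return oops;
}
```

With seq-cst on all four operations, `count_oops()` is expected to return 0 on any conforming implementation. A nonzero count from the relaxed variant depends on hardware and timing, so a clean run of that variant does not prove the code is correct.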

This bug only happens on Arm and PPC, not on x86. On current x86
implementations, a load is effectively always seq-cst (relaxed and
seq-cst loads generate the same machine code), as shown in
https://godbolt.org/z/sEzMvnjd9

3. Fix method
Read 'top' with sequentially consistent semantics in #L505 (T1.step2).
Read "el->pin[i]" with sequentially consistent semantics in #L295
and #L320.
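
The shape of this fix can be sketched with C11 atomics. This is not the actual patch: the types `node_t`, `allocator_t` and `pins_t` and the functions `pin_top()`, `is_pinned()` and `pin_top_demo()` are hypothetical simplifications, with a single pin slot and no LF_BACKOFF(). It only shows where the seq-cst loads sit relative to the seq-cst stores.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the lf_alloc-pin.c structures. */
typedef struct node { struct node *next; } node_t;
typedef struct { _Atomic(node_t *) top; } allocator_t;
typedef struct { _Atomic(node_t *) pin; } pins_t;

/* Allocation side (T1): publish the pin, then re-read top with seq-cst
   so the re-read cannot be satisfied by a stale value ordered before
   the pin store. */
node_t *pin_top(allocator_t *allocator, pins_t *pins)
{
  node_t *node;
  do
  {
    node= atomic_load_explicit(&allocator->top, memory_order_seq_cst);
    atomic_store_explicit(&pins->pin, node, memory_order_seq_cst);
  } while (node != atomic_load_explicit(&allocator->top,
                                        memory_order_seq_cst));
  return node;
}

/* Free side (T2): read the pin with seq-cst before reusing a node. */
int is_pinned(pins_t *pins, node_t *node)
{
  return atomic_load_explicit(&pins->pin, memory_order_seq_cst) == node;
}

/* Single-threaded usage check: returns 1 if top was pinned as expected. */
int pin_top_demo(void)
{
  node_t b= { NULL }, a= { &b };
  allocator_t allocator;
  pins_t pins;
  atomic_init(&allocator.top, &a);
  atomic_init(&pins.pin, NULL);
  node_t *node= pin_top(&allocator, &pins);
  return node == &a && is_pinned(&pins, &a) && !is_pinned(&pins, &b);
}
```

The single-threaded check only exercises the control flow. The ordering guarantee itself comes from pairing each seq-cst store with a seq-cst load in the other thread.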

4. Reproducing the issue
Add a delay after #L503 in lf_alloc-pin.c. Running unit.lf then quickly
produces a segmentation fault, because "top" points to an invalid
address. For details, see the comments at the link below.
https://jira.mariadb.org/browse/MDEV-28430

5. Further improvement
To make this code more robust and safe on all platforms, we recommend
replacing volatile with C11 atomics and fixing all data races. This will
also make the code easier to reason about.

Signed-off-by: Xiaotong Niu <xiaotong.niu@arm.com>
2024-02-16 17:52:47 +02:00
array.c MDEV-22387: Do not violate __attribute__((nonnull)) 2020-11-02 14:19:21 +02:00
base64.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
ChangeLog
charset-def.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
charset.c MDEV-30556 UPPER() returns an empty string for U+0251 in Unicode-5.2.0+ collations for utf8 2023-02-03 18:18:32 +04:00
checksum.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
CMakeLists.txt Merge 10.3 into 10.4 2021-10-21 14:57:00 +03:00
errors.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
file_logger.c MENT-1098 Crash during update on 10.4.17 after upgrade from 10.4.10 2021-02-25 13:23:59 +02:00
get_password.c MDEV-31461 mariadb SIGSEGV when built with -DCLIENT_PLUGIN_DIALOG=STATIC 2023-06-19 12:12:21 +02:00
guess_malloc_library.c Fixed compiler warnings in guess_malloc_library 2018-01-15 16:44:44 +02:00
hash.c Merge 10.2 into 10.3 2020-10-22 08:26:28 +03:00
lf_alloc-pin.c MDEV-28430: Fix memory barrier missing of lf_alloc on Arm64 2024-02-16 17:52:47 +02:00
lf_dynarray.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
lf_hash.c Merge branch '10.3' into 10.4 2021-12-07 09:47:42 +01:00
list.c Merge 10.1 into 10.2 2020-05-13 11:12:31 +03:00
ma_dyncol.c MDEV-32140: Valgrind/MSAN warnings in dynamic_column_update_move_left 2023-09-26 13:56:05 +02:00
mf_arr_appstr.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_cache.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
mf_dirname.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
mf_fn_ext.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
mf_format.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
mf_getdate.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_iocache.c Merge 10.3 into 10.4 2021-06-21 12:38:25 +03:00
mf_iocache2.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
mf_keycache.c MDEV-29613 Improve WITH_DBUG_TRACE=OFF 2022-09-23 13:40:42 +03:00
mf_keycaches.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_loadpath.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_pack.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_path.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_qsort.c fix clang build: check alignment the other way 2021-07-26 12:37:25 +03:00
mf_qsort2.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_radix.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
mf_same.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_sort.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
mf_soundex.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_tempdir.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
mf_tempfile.c MDEV-26601: mysys - O_TMPFILE ^ O_CREAT 2021-09-14 21:06:34 +10:00
mf_unixpath.c Update FSF Address 2019-05-11 21:29:06 +03:00
mf_wcomp.c Update FSF Address 2019-05-11 21:29:06 +03:00
mulalloc.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
my_access.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
my_addr_resolve.c Backport my_addr_resolve from 10.6 to get latest bug fixes in. 2023-11-27 19:08:14 +02:00
my_alarm.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_alloc.c Fixed memory leak introduces by a fix for MDEV-29932 2023-11-27 19:08:14 +02:00
my_atomic_writes.c Minimize unsafe C functions usage 2023-03-08 10:36:25 +00:00
my_basename.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
my_bit.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
my_bitmap.c Merge 10.3 into 10.4 2020-05-30 11:04:27 +03:00
my_chmod.c Merge branch '5.5' into 10.1 2019-05-11 19:15:57 +03:00
my_chsize.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_compare.c MDEV-30048 Prefix keys for CHAR work differently for MyISAM vs InnoDB 2023-10-24 03:35:48 +04:00
my_compress.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
my_conio.c Minimize unsafe C functions usage 2023-03-08 10:36:25 +00:00
my_context.c Merge 10.2 into 10.3 2021-10-13 11:38:21 +03:00
my_copy.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_cpu.c MDEV-19845: Make my_cpu.h self-contained 2020-02-01 14:56:05 +02:00
my_create.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
my_default.c MDEV-27038 Custom configuration file procedure does not work with Docker Desktop for Windows 10+ 2023-07-11 22:00:14 +10:00
my_delete.c Merge 10.3 into 10.4 2022-12-13 11:37:33 +02:00
my_div.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_dlerror.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
my_error.c remove non-working debug assert 2020-10-29 09:35:39 +01:00
my_file.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
my_fopen.c Shrink my_atomic.h and my_cpu.h scope 2020-04-15 22:23:03 +04:00
my_fstream.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_gethwaddr.c Fix building my_gethwaddr() on OpenBSD 2022-10-27 11:30:45 +11:00
my_getncpus.c Correct FreeBSD cpuset_t type 2020-04-03 15:30:33 +02:00
my_getopt.c MDEV-18215: mariabackup does not report unknown command line options 2020-06-14 13:23:07 +03:00
my_getpagesize.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_getsystime.c MDEV-20079 When setting back the system time while mysqld is running, NOW() and UNIX_TIMESTAMP() results get stuck 2019-09-04 09:30:43 +02:00
my_getwd.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_init.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
my_largepage.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
my_lib.c Merge 10.3 into 10.4 2020-04-16 12:12:26 +03:00
my_libwrap.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_likely.c Minimize unsafe C functions usage 2023-03-08 10:36:25 +00:00
my_lock.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_lockmem.c Merge 10.3 into 10.4 2019-10-10 11:19:25 +03:00
my_malloc.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_memmem.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_mess.c MDEV-23846: O_TMPFILE error in mysqlbinlog stream output breaks restore 2020-11-23 12:16:45 +05:30
my_minidump.cc MDEV-11499 mysqltest, Windows : improve diagnostics if server fails to shutdown 2021-09-24 11:49:28 +02:00
my_mkdir.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_mmap.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_new.cc Update FSF Address 2019-05-11 21:29:06 +03:00
my_once.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_open.c Shrink my_atomic.h and my_cpu.h scope 2020-04-15 22:23:03 +04:00
my_port.c Follow-up to changing FSF address 2019-05-11 18:30:45 +03:00
my_pread.c Merge 10.3 into 10.4 2019-06-19 10:49:00 +03:00
my_pthread.c MDEV-15795 Stack exceeded if pthread_attr_setstacksize(&thr_attr,8196) succeeds 2022-10-22 10:24:14 +02:00
my_quick.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_rdtsc.c MDEV-23175: my_timer_milliseconds clock_gettime for multiple platfomrs 2021-12-22 16:51:22 +01:00
my_read.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_redel.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
my_rename.c Merge 10.3 into 10.4 2022-12-13 11:37:33 +02:00
my_rnd.c MDEV-18531 : Use WolfSSL instead of YaSSL as "bundled" SSL/encryption library 2019-05-22 13:48:25 +02:00
my_safehash.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
my_safehash.h Update FSF address 2019-05-10 20:52:00 +03:00
my_seek.c myseek: AIX has no "tell" 2021-03-19 11:14:53 +11:00
my_setuser.c mysys: rename ME_xxx flags to match plugin api 2018-06-04 12:32:23 +02:00
my_sleep.c Update FSF Address 2019-05-11 21:29:06 +03:00
my_static.c Merge 10.3 into 10.4 2019-08-31 06:53:45 +03:00
my_static.h Update FSF Address 2019-05-11 21:29:06 +03:00
my_symlink.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
my_symlink2.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
my_sync.c MDEV-381: fdatasync() does not correctly flush growing binlog file 2023-08-10 19:52:04 +02:00
my_thr_init.c Minimize unsafe C functions usage 2023-03-08 10:36:25 +00:00
my_uuid.c Merge branch '5.5' into 10.1 2019-05-11 19:15:57 +03:00
my_win_popen.cc Ensure that source files contain only valid UTF8 encodings (#2188) 2023-05-19 13:21:34 +01:00
my_wincond.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
my_winerr.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
my_winfile.c MDEV-30162 Fix occasional "Permission denied" on Windows caused by buggy 3rd party 2022-12-07 14:26:10 +01:00
my_winthread.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
my_write.c Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
mysys_priv.h MDEV-30162 Fix occasional "Permission denied" on Windows caused by buggy 3rd party 2022-12-07 14:26:10 +01:00
psi_noop.c Merge 10.2 into 10.3 2021-08-31 08:36:59 +03:00
ptr_cmp.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
queues.c Merge branch '5.5' into 10.1 2020-04-30 17:36:41 +02:00
safemalloc.c Improve reporting from sf_report_leaked_memory() 2023-11-27 19:08:14 +02:00
stacktrace.c Merge 10.2 into 10.3 2020-08-20 09:12:16 +03:00
string.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
test_charset.c Update FSF Address 2019-05-11 21:29:06 +03:00
test_dir.c Update FSF Address 2019-05-11 21:29:06 +03:00
test_thr_mutex.c Update FSF address 2019-05-10 20:52:00 +03:00
test_xml.c Update FSF Address 2019-05-11 21:29:06 +03:00
testhash.c Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
thr_alarm.c Merge 10.3 into 10.4 2022-10-25 10:04:37 +03:00
thr_lock.c MDEV-22227 Assertion `state_ == s_exec' failed in wsrep::client_state::start_transaction 2021-04-28 11:11:01 +03:00
thr_mutex.c MDEV-20183 data race at safe_mutex_lock() 2019-07-26 12:36:06 +03:00
thr_rwlock.c Update FSF Address 2019-05-11 21:29:06 +03:00
thr_timer.c MDEV-15795 Stack exceeded if pthread_attr_setstacksize(&thr_attr,8196) succeeds 2022-10-22 10:24:14 +02:00
tree.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
typelib.c Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
waiting_threads.c Shrink my_atomic.h and my_cpu.h scope 2020-04-15 22:23:03 +04:00
wqueue.c Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00