mariadb/storage
Marko Mäkelä 577c61e8be MDEV-23888: Potential server hang on replication with InnoDB
In MDEV-21452, SAFE_MUTEX flagged an ordering problem that involved
trx_t::mutex, LOCK_global_system_variables, and LOCK_commit_ordered
when running
./mtr --no-reorder\
 binlog.binlog_checksum,mix binlog.binlog_commit_wait,mix

Because LOCK_commit_ordered is acquired by replication code before
innobase_commit_ordered() is invoked, and because LOCK_commit_ordered
should be below LOCK_global_system_variables in the global latching
order, it turns out that we must avoid acquiring
LOCK_global_system_variables in any low-level code.
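
To illustrate the kind of ordering problem described above, here is a minimal single-threaded sketch of a lock-order recorder. It is not how SAFE_MUTEX is implemented; the class LockOrderChecker and its acquire() method are hypothetical names, and the mutex names are used only as string labels. The checker remembers which mutex was acquired while another was held, and reports an acquisition that would close a cycle.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical lock-order recorder (illustration only, not SAFE_MUTEX).
class LockOrderChecker {
  // after_[a] contains b if b was once acquired while a was held.
  std::map<std::string, std::set<std::string>> after_;

  // Depth-first search: is `to` reachable from `from` via recorded edges?
  bool reachable(const std::string &from, const std::string &to,
                 std::set<std::string> &seen) const {
    if (from == to) return true;
    if (!seen.insert(from).second) return false;  // already visited
    auto it = after_.find(from);
    if (it == after_.end()) return false;
    for (const auto &next : it->second)
      if (reachable(next, to, seen)) return true;
    return false;
  }

public:
  // Record that `wanted` is acquired while every mutex in `held` is held.
  // Returns false if this contradicts an ordering that was recorded earlier.
  bool acquire(const std::vector<std::string> &held,
               const std::string &wanted) {
    bool ok = true;
    for (const auto &h : held) {
      std::set<std::string> seen;
      if (reachable(wanted, h, seen))
        ok = false;               // h -> wanted would close a cycle
      else
        after_[h].insert(wanted); // remember the new ordering edge
    }
    return ok;
  }
};
```

Under the assumed sequence "LOCK_global_system_variables, then LOCK_commit_ordered", followed by "LOCK_commit_ordered, then trx_t::mutex", a later attempt to acquire LOCK_global_system_variables while holding trx_t::mutex is rejected by this sketch, because it would invert the recorded order.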

It also turns out that lock_rec_lock() acquires lock_sys_t::mutex
and then carries on to call lock_rec_enqueue_waiting(), which may
invoke THDVAR() via thd_lock_wait_timeout(). This call is problematic
if THDVAR() had never been invoked in that thread earlier.

innobase_trx_init(): Let us invoke THDVAR() at the start of an InnoDB
transaction so that future invocations of THDVAR() will avoid
LOCK_global_system_variables acquisition on the same THD. Because
the first call to intern_sys_var_ptr() will initialize all session
variables by not passing the offset to sync_dynamic_session_variables(),
this will indeed make any future THDVAR() invocation mutex-free.
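
The effect of the early THDVAR() call can be sketched with a simplified model. This is not the server code: Session, get(), session_trx_init(), global_vars_mutex and global_defaults are hypothetical stand-ins for THD, THDVAR(), innobase_trx_init(), LOCK_global_system_variables and the global variable defaults. The sketch assumes, as described above, that the first access copies all session variables under the global mutex, so that later accesses need no mutex at all.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for LOCK_global_system_variables.
static std::mutex global_vars_mutex;
// Hypothetical global defaults that a session copies on first access.
static std::vector<long> global_defaults = {50, 1, 0};
// Counts global mutex acquisitions, for demonstration only.
static int global_lock_acquisitions = 0;

// Hypothetical stand-in for a THD with lazily populated session variables.
struct Session {
  std::vector<long> vars;  // stays empty until the first access

  // Rough analogue of THDVAR(): the first call copies *all* variables
  // under the global mutex; later calls read session-local memory only.
  long get(std::size_t offset) {
    if (vars.empty()) {
      std::lock_guard<std::mutex> guard(global_vars_mutex);
      ++global_lock_acquisitions;
      vars = global_defaults;
    }
    return vars.at(offset);
  }
};

// Analogue of the early call at transaction start: touch one variable
// while no low-level mutex is held, so that later reads are mutex-free.
void session_trx_init(Session &s) { (void) s.get(0); }
```

After session_trx_init() has run once for a session, any further get() on that session leaves global_lock_acquisitions unchanged, which is the property the fix relies on for code that runs while holding lock_sys_t::mutex.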

There are some THDVAR() calls in other code (related to indexed virtual
columns, fulltext indexes, and DDL operations). No SAFE_MUTEX warning
was known for those, but there does not appear to be any replication
test coverage for indexed virtual columns or fulltext indexes. DDL should
be covered, and perhaps DDL code paths were already invoking THDVAR()
while not holding any InnoDB mutex.

Side note: MySQL should avoid this type of deadlock since
mysql/mysql-server@4d275c8995.
MariaDB never defined alloc_and_copy_thd_dynamic_variables(),
because we prefer to avoid overhead during connection creation.

An important part of the deadlock could be the current handling of
SET GLOBAL binlog_checksum=NONE; and similar assignments.
In binlog_checksum_update(), we would hold LOCK_global_system_variables
while potentially acquiring LOCK_commit_ordered in MYSQL_BIN_LOG::open().
Even if that code were changed later to release
LOCK_global_system_variables during the write to mysql_bin_log,
it could be a good idea for performance to avoid invoking the
expensive code path of THDVAR() while holding any InnoDB mutexes,
such as lock_sys.mutex in lock_rec_enqueue_waiting().

Thanks to Andrei Elkin for debugging the SAFE_MUTEX issue, and to
Sergei Golubchik for the suggestion to invoke THDVAR() early.
2020-10-06 07:47:11 +03:00
archive Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
blackhole MDEV-11094: Blackhole table updates on slave fail when row annotation is enabled 2019-05-29 17:35:29 +05:30
cassandra Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
connect Merge remote-tracking branch 'connect/10.2' into 10.2 2020-08-02 11:14:56 +02:00
csv Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
example Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
federated Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
federatedx Merge 10.1 into 10.2 2019-04-03 19:58:47 +03:00
heap MDEV-21082: isnan/isinf compilation errors, isfinite warnings on MacOS 2019-11-19 16:28:15 +03:00
innobase MDEV-23888: Potential server hang on replication with InnoDB 2020-10-06 07:47:11 +03:00
maria Merge 10.1 into 10.2 2020-09-01 16:20:23 +03:00
mroonga Merge 10.1 into 10.2 2020-06-01 09:33:03 +03:00
myisam Merge 10.1 into 10.2 2020-09-01 16:20:23 +03:00
myisammrg Merge branch '10.1' into 10.2 2020-08-02 11:05:29 +02:00
oqgraph Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
perfschema Fix GCC 10 -Wstringop-truncation 2020-03-13 07:39:14 +02:00
rocksdb Fix a typo in the previous cset 2020-09-04 09:12:27 +00:00
sequence Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
sphinx MDEV-20647 Fix and enable SphinxSE tests 2019-09-30 15:47:09 +03:00
spider MDEV-7098 spider/bg.spider_fixes failed in buildbot with safe_mutex: Trying to unlock mutex conn->mta_conn_mutex that wasn't locked at storage/spider/spd_db_conn.cc, line 671 2020-09-07 10:26:23 +09:00
test_sql_discovery Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
tokudb Merge branch '10.1' into 10.2 2020-08-06 16:47:39 +02:00
xtradb Merge 10.1 into 10.2 2020-09-29 10:04:37 +03:00