mariadb/mysql-test/suite/innodb/t/deadlock_wait_lock_race.test
Sergei Golubchik bead24b7f3 mariadb-test: wait on disconnect
Remove one of the major sources of race condiitons in mariadb-test.
Normally, mariadb_close() sends COM_QUIT to the server and immediately
disconnects. In mariadb-test it means the test can switch to another
connection and sends queries to the server before the server even
started parsing the COM_QUIT packet and these queries can see the
connection as fully active, as it didn't reach dispatch_command yet.

This is a major source of instability in tests and many - but not all,
still less than a half - tests employ workarounds. The correct one
is a pair count_sessions.inc/wait_until_count_sessions.inc.
Also very popular was wait_until_disconnected.inc, which was completely
useless, because it verifies that the connection is closed, and after
disconnect it always is, it didn't verify whether the server processed
COM_QUIT. Sadly the placebo was as widely used as the real thing.

Let's fix this by making mariadb-test `disconnect` command _to wait_ for
the server to confirm. This makes almost all workarounds redundant.

In some cases count_sessions.inc/wait_until_count_sessions.inc is still
needed, though, as only `disconnect` command is changed:

 * after external tools, like `exec $MYSQL`
 * after failed `connect` command
 * replication, after `STOP SLAVE`
 * Federated/CONNECT/SPIDER/etc after `DROP TABLE`

and also in some XA tests, because an XA transaction is dissociated from
the THD very late, after the server has closed the client connection.

Collateral cleanups: fix comments, remove some redundant statements:
 * DROP IF EXISTS if nothing is known to exist
 * DROP table/view before DROP DATABASE
 * REVOKE privileges before DROP USER
 etc
2025-07-16 09:14:33 +07:00

70 lines
3 KiB
Text

--source include/have_innodb.inc
--source include/have_debug_sync.inc
--disable_query_log
call mtr.add_suppression("InnoDB: Transaction was aborted due to ");
--enable_query_log
# Purge can cause deadlock in the test, requesting page's RW_X_LATCH for trx
# ids resetting, after trx 2 acqured RW_S_LATCH and suspended in debug sync point
# lock_trx_handle_wait_enter, waiting for upd_cont signal, which must be
# emitted after the last SELECT in this test. The last SELECT will hang waiting
# for purge RW_X_LATCH releasing, and trx 2 will be rolled back by timeout.
# There is deadlock_report_before_lock_releasing sync point in
# Deadlock::report(), which is waiting for sel_cont signal under
# lock_sys_t lock. The signal must be issued after "UPDATE t SET b = 100"
# rollback, and that rollback is executing undo record, which is blocked on
# dict_sys latch request. dict_sys is locked by the thread of statistics
# update(dict_stats_save()), and during that update lock_sys lock is requested,
# and can't be acquired as Deadlock::report() holds it. We have to disable
# statistics update to make the test stable.
CREATE TABLE t (a int PRIMARY KEY, b int) engine = InnoDB STATS_PERSISTENT=0;
CREATE TABLE t2 (a int PRIMARY KEY) engine = InnoDB STATS_PERSISTENT=0;
INSERT INTO t VALUES (10, 10), (20, 20), (30, 30);
INSERT INTO t2 VALUES (10), (20), (30);
BEGIN; # trx 1
# The following update is necessary to increase the transaction weight, which is
# calculated as the number of locks + the number of undo records during deadlock
# report. Victim's transaction should have minimum weight. We need trx 2 to be
# choosen as victim, that's why we need to increase the current transaction
# weight.
UPDATE t2 SET a = a + 100;
SELECT * FROM t WHERE a = 20 FOR UPDATE;
--connect(con_2,localhost,root,,)
# RC is necessary to do semi-consistent read
SET innodb_snapshot_isolation=OFF;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN; # trx 2
# The first time it will be hit on trying to lock (20,20), the second hit
# will be on (30,30).
SET DEBUG_SYNC = 'lock_trx_handle_wait_before_unlocked_wait_lock_check SIGNAL upd_locked WAIT_FOR upd_cont';
# We must not modify primary key fields to cause rr_sequential() read record
# function choosing in mysql_update(), i.e. both query_plan.using_filesort and
# query_plan.using_io_buffer must be false during init_read_record() call.
--send UPDATE t SET b = 100
--connection default
SET DEBUG_SYNC="now WAIT_FOR upd_locked";
SET DEBUG_SYNC="lock_wait_before_suspend SIGNAL upd_cont";
--send SELECT * FROM t WHERE a = 10 FOR UPDATE
--connection con_2
# If the bug is not fixed, lock_trx_handle_wait() wrongly returns DB_SUCCESS
# instead of DB_DEADLOCK, row_search_mvcc() of trx 2 behaves so as if (20,20)
# was locked. Some debug assertion must crash the server. If the bug is fixed,
# trx 2 must just be rolled back by deadlock detector.
--error ER_LOCK_DEADLOCK
--reap
--disconnect con_2
--connection default
--reap
SET DEBUG_SYNC = 'RESET';
DROP TABLE t;
DROP TABLE t2;