bug #57006 "Deadlock between HANDLER and FLUSH TABLES WITH READ
LOCK" and bug #54673 "It takes too long to get readlock for
'FLUSH TABLES WITH READ LOCK'".
The first bug manifested itself as a deadlock which occurred
when a connection, which had some table open through HANDLER
statement, tried to update some data through DML statement
while another connection tried to execute FLUSH TABLES WITH
READ LOCK concurrently.
What happened was that FTWRL in the second connection managed
to perform first step of GRL acquisition and thus blocked all
upcoming DML. After that it started to wait for table open
through HANDLER statement to be flushed. When the first connection
tried to execute DML it has started to wait for GRL/the second
connection creating deadlock.
The second bug manifested itself as starvation of FLUSH TABLES
WITH READ LOCK statements in cases when there was a constant
stream of concurrent DML statements (in two or more
connections).
This has happened because requests for protection against GRL
which were acquired by DML statements were ignoring presence of
pending GRL and thus the latter was starved.
This patch solves both these problems by re-implementing GRL
using metadata locks.
Similar to the old implementation acquisition of GRL in new
implementation is two-step. During the first step we block
all concurrent DML and DDL statements by acquiring global S
metadata lock (each DML and DDL statement acquires global IX
lock for its duration). During the second step we block commits
by acquiring global S lock in COMMIT namespace (commit code
acquires global IX lock in this namespace).
Note that unlike in old implementation acquisition of
protection against GRL in DML and DDL is semi-automatic.
We assume that any statement which should be blocked by GRL
will either open and acquires write-lock on tables or acquires
metadata locks on objects it is going to modify. For any such
statement global IX metadata lock is automatically acquired
for its duration.
The first problem is solved because waits for GRL become
visible to deadlock detector in metadata locking subsystem
and thus deadlocks like one in the first bug become impossible.
The second problem is solved because global S locks which
are used for GRL implementation are given preference over
IX locks which are acquired by concurrent DML (and we can
switch to fair scheduling in future if needed).
Important change:
FTWRL/GRL no longer blocks DML and DDL on temporary tables.
Before this patch behavior was not consistent in this respect:
in some cases DML/DDL statements on temporary tables were
blocked while in others they were not. Since the main use cases
for FTWRL are various forms of backups and temporary tables are
not preserved during backups we have opted for consistently
allowing DML/DDL on temporary tables during FTWRL/GRL.
Important change:
This patch changes thread state names which are used when
DML/DDL of FTWRL is waiting for global read lock. It is now
either "Waiting for global read lock" or "Waiting for commit
lock" depending on the stage on which FTWRL is.
Incompatible change:
To solve deadlock in events code which was exposed by this
patch we have to replace LOCK_event_metadata mutex with
metadata locks on events. As result we have to prohibit
DDL on events under LOCK TABLES.
This patch also adds extensive test coverage for interaction
of DML/DDL and FTWRL.
Performance of new and old global read lock implementations
in sysbench tests were compared. There were no significant
difference between new and old implementations.
------------------------------------------------------------
revno: 2617.68.10
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg46673
timestamp: Tue 2009-09-01 19:57:05 +0400
message:
Fix for bug #46673 "Deadlock between FLUSH TABLES WITH READ LOCK and DML".
Deadlocks occured when one concurrently executed transactions with
several statements modifying data and FLUSH TABLES WITH READ LOCK
statement or SET READ_ONLY=1 statement.
These deadlocks were introduced by the patch for WL 4284: "Transactional
DDL locking"/Bug 989: "If DROP TABLE while there's an active transaction,
wrong binlog order" which has changed FLUSH TABLES WITH READ LOCK/SET
READ_ONLY=1 to wait for pending transactions.
What happened was that FLUSH TABLES WITH READ LOCK blocked all further
statements changing tables by setting global_read_lock global variable
and has started waiting for all pending transactions to complete.
Then one of those transactions tried to executed DML, detected that
global_read_lock non-zero and tried to wait until global read lock will
be released (i.e. global_read_lock becomes 0), indeed, this led to a
deadlock.
Proper solution for this problem should probably involve full integration
of global read lock with metadata locking subsystem (which will allow to
implement waiting for pending transactions without blocking DML in them).
But since it requires significant changes another, short-term solution
for the problem is implemented in this patch.
Basically, this patch restores behavior of FLUSH TABLES WITH READ LOCK/
SET READ_ONLY=1 before the patch for WL 4284/bug 989. By ensuring that
extra references to TABLE_SHARE are not stored for active metadata locks
it changes these statements not to wait for pending transactions.
As result deadlock is eliminated.
Note that this does not change the fact that active FLUSH TABLES WITH
READ LOCK lock or SET READ_ONLY=1 prevent modifications to tables as
they also block transaction commits.
2617.31.12, 2617.31.15, 2617.31.15, 2617.31.16, 2617.43.1
- initial changeset that introduced the fix for
Bug#989 and follow up fixes for all test suite failures
introduced in the initial changeset.
------------------------------------------------------------
revno: 2617.31.1
committer: Davi Arnaut <Davi.Arnaut@Sun.COM>
branch nick: 4284-6.0
timestamp: Fri 2009-03-06 19:17:00 -0300
message:
Bug#989: If DROP TABLE while there's an active transaction, wrong binlog order
WL#4284: Transactional DDL locking
Currently the MySQL server does not keep metadata locks on
schema objects for the duration of a transaction, thus failing
to guarantee the integrity of the schema objects being used
during the transaction and to protect then from concurrent
DDL operations. This also poses a problem for replication as
a DDL operation might be replicated even thought there are
active transactions using the object being modified.
The solution is to defer the release of metadata locks until
a active transaction is either committed or rolled back. This
prevents other statements from modifying the table for the
entire duration of the transaction. This provides commitment
ordering for guaranteeing serializability across multiple
transactions.
- Incompatible change:
If MySQL's metadata locking system encounters a lock conflict,
the usual schema is to use the try and back-off technique to
avoid deadlocks -- this schema consists in releasing all locks
and trying to acquire them all in one go.
But in a transactional context this algorithm can't be utilized
as its not possible to release locks acquired during the course
of the transaction without breaking the transaction commitments.
To avoid deadlocks in this case, the ER_LOCK_DEADLOCK will be
returned if a lock conflict is encountered during a transaction.
Let's consider an example:
A transaction has two statements that modify table t1, then table
t2, and then commits. The first statement of the transaction will
acquire a shared metadata lock on table t1, and it will be kept
utill COMMIT to ensure serializability.
At the moment when the second statement attempts to acquire a
shared metadata lock on t2, a concurrent ALTER or DROP statement
might have locked t2 exclusively. The prescription of the current
locking protocol is that the acquirer of the shared lock backs off
-- gives up all his current locks and retries. This implies that
the entire multi-statement transaction has to be rolled back.
- Incompatible change:
FLUSH commands such as FLUSH PRIVILEGES and FLUSH TABLES WITH READ
LOCK won't cause locked tables to be implicitly unlocked anymore.
+ Fix for Bug#43114 wait_until_count_sessions too restrictive, random PB failures
+ Removal of a lot of other weaknesses found
+ modifications according to review
(back to behaviour of 4.1.7). Warning was not fatal: mysqldump continued. And the good thing is that it helped spot that starting from 4.1.7,
SHOW CREATE DATABASE failed (if --single-transaction and first db has non-empty InnoDB table and there is a second db) and thus mysqldump
produced CREATE DATABASE statements missing the CHARACTER SET clause. Removing the bug which was in the server, and the warning reporting in
mysqldump (compatibility with old servers).
(originally reported as "second run of innobackup hangs forever and can even hang server").
Plus testcase for the bugfix and comments about global read locks.
in a deadlock-free manner. This splits locking the global read lock in two steps.
This fixes a consequence of this bug, known as:
BUG#4953 'mysqldump --master-data may report incorrect binlog position if using InnoDB'
And a test.