Add a wait-for graph based deadlock detector to the
MDL subsystem.
Fixes bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and
bug #37346 "innodb does not detect deadlock between update and
alter table".
The first bug manifested itself as an unwarranted abort of a
transaction with ER_LOCK_DEADLOCK error by a concurrent ALTER
statement, when this transaction tried to repeat use of a
table, which it has already used in a similar fashion before
ALTER started.
The second bug showed up as a deadlock between table-level
locks and InnoDB row locks, which was "detected" only after
innodb_lock_wait_timeout timeout.
A transaction would start using the table and modify a few
rows.
Then ALTER TABLE would come in, and start copying rows
into a temporary table. Eventually it would stumble on
the modified records and get blocked on a row lock.
The first transaction would try to do more updates, and get
blocked on thr_lock.c lock.
This situation of circular wait would only get resolved
by a timeout.
Both these bugs stemmed from inadequate solutions to the
problem of deadlocks occurring between different
locking subsystems.
In the first case we tried to avoid deadlocks between metadata
locking and table-level locking subsystems, when upgrading shared
metadata lock to exclusive one.
Transactions holding the shared lock on the table and waiting for
some table-level lock used to be aborted too aggressively.
We also allowed ALTER TABLE to start in presence of transactions
that modify the subject table. ALTER TABLE acquires
TL_WRITE_ALLOW_READ lock at start, and that block all writes
against the table (naturally, we don't want any writes to be lost
when switching the old and the new table). TL_WRITE_ALLOW_READ
lock, in turn, would block the started transaction on thr_lock.c
lock, should they do more updates. This, again, lead to the need
to abort such transactions.
The second bug occurred simply because we didn't have any
mechanism to detect deadlocks between the table-level locks
in thr_lock.c and row-level locks in InnoDB, other than
innodb_lock_wait_timeout.
This patch solves both these problems by moving lock conflicts
which are causing these deadlocks into the metadata locking
subsystem, thus making it possible to avoid or detect such
deadlocks inside MDL.
To do this we introduce new type-of-operation-aware metadata
locks, which allow MDL subsystem to know not only the fact that
transaction has used or is going to use some object but also what
kind of operation it has carried out or going to carry out on the
object.
This, along with the addition of a special kind of upgradable
metadata lock, allows ALTER TABLE to wait until all
transactions which has updated the table to go away.
This solves the second issue.
Another special type of upgradable metadata lock is acquired
by LOCK TABLE WRITE. This second lock type allows to solve the
first issue, since abortion of table-level locks in event of
DDL under LOCK TABLES becomes also unnecessary.
Below follows the list of incompatible changes introduced by
this patch:
- From now on, ALTER TABLE and CREATE/DROP TRIGGER SQL (i.e. those
statements that acquire TL_WRITE_ALLOW_READ lock)
wait for all transactions which has *updated* the table to
complete.
- From now on, LOCK TABLES ... WRITE, REPAIR/OPTIMIZE TABLE
(i.e. all statements which acquire TL_WRITE table-level lock) wait
for all transaction which *updated or read* from the table
to complete.
As a consequence, innodb_table_locks=0 option no longer applies
to LOCK TABLES ... WRITE.
- DROP DATABASE, DROP TABLE, RENAME TABLE no longer abort
statements or transactions which use tables being dropped or
renamed, and instead wait for these transactions to complete.
- Since LOCK TABLES WRITE now takes a special metadata lock,
not compatible with with reads or writes against the subject table
and transaction-wide, thr_lock.c deadlock avoidance algorithm
that used to ensure absence of deadlocks between LOCK TABLES
WRITE and other statements is no longer sufficient, even for
MyISAM. The wait-for graph based deadlock detector of MDL
subsystem may sometimes be necessary and is involved. This may
lead to ER_LOCK_DEADLOCK error produced for multi-statement
transactions even if these only use MyISAM:
session 1: session 2:
begin;
update t1 ... lock table t2 write, t1 write;
-- gets a lock on t2, blocks on t1
update t2 ...
(ER_LOCK_DEADLOCK)
- Finally, support of LOW_PRIORITY option for LOCK TABLES ... WRITE
was abandoned.
LOCK TABLE ... LOW_PRIORITY WRITE from now on has the same
priority as the usual LOCK TABLE ... WRITE.
SELECT HIGH PRIORITY no longer trumps LOCK TABLE ... WRITE in
the wait queue.
- We do not take upgradable metadata locks on implicitly
locked tables. So if one has, say, a view v1 that uses
table t1, and issues:
LOCK TABLE v1 WRITE;
FLUSH TABLE t1; -- (or just 'FLUSH TABLES'),
an error is produced.
In order to be able to perform DDL on a table under LOCK TABLES,
the table must be locked explicitly in the LOCK TABLES list.
Bug#42546 Backup: RESTORE fails, thinking it finds an existing table
The problem occured when a MDL locking conflict happened for a non-existent
table between a CREATE and a INSERT statement. The code for CREATE
interpreted this lock conflict to mean that the table existed,
which meant that the statement failed when it should not have.
The problem could occur for CREATE TABLE, CREATE TABLE LIKE and
ALTER TABLE RENAME.
This patch fixes the problem for CREATE TABLE and CREATE TABLE LIKE.
It is based on code backported from the mysql-6.1-fk tree written
by Dmitry Lenev. CREATE now uses normal open_and_lock_tables() code
to acquire exclusive locks. This means that for the test case in the bug
description, CREATE will wait until INSERT completes so that it can
get the exclusive lock. This resolves the reported bug.
The patch also prohibits CREATE TABLE and CREATE TABLE LIKE under
LOCK TABLES. Note that this is an incompatible change and must
be reflected in the documentation. Affected test cases have been
updated.
mdl_sync.test contains tests for CREATE TABLE and CREATE TABLE LIKE.
Fixing the issue for ALTER TABLE RENAME is beyond the scope of this
patch. ALTER TABLE cannot be prohibited from working under LOCK TABLES
as this could seriously impact customers and a proper fix would require
a significant rewrite.
Bug #47249 assert in MDL_global_lock::is_lock_type_compatible
This assert could be triggered if LOCK TABLES were used to lock
both a table and a view that used the same table. The table would have
to be first WRITE locked and then READ locked. So "LOCK TABLES v1
WRITE, t1 READ" would eventually trigger the assert, "LOCK TABLES
v1 READ, t1 WRITE" would not. The reason is that the ordering of locks
in the interal representation made a difference when executing
FLUSH TABLE on the table.
During FLUSH TABLE, a lock was upgraded to exclusive. If this lock
was of type MDL_SHARED and not MDL_SHARED_UPGRADABLE, an internal
counter in the MDL subsystem would get out of sync. This would happen
if the *last* mention of the table in LOCK TABLES was a READ lock.
The counter in question is the number exclusive locks (active or intention).
This is used to make sure a global metadata lock is only taken when the
counter is zero (= no conflicts). The counter is increased when a
MDL_EXCLUSIVE or MDL_SHARED_UPGRADABLE lock is taken, but not when
upgrade_shared_lock_to_exclusive() is used to upgrade directly
from MDL_SHARED to MDL_EXCLUSIVE.
This patch fixes the problem by searching for a TABLE instance locked
with MDL_SHARED_UPGRADABLE or MDL_EXCLUSIVE before calling
upgrade_shared_lock_to_exclusive(). The patch also adds an assert checking
that only MDL_SHARED_UPGRADABLE locks are upgraded to exclusive.
Test case added to lock_multi.test.
------------------------------------------------------------
revno: 3035.4.1
committer: Davi Arnaut <Davi.Arnaut@Sun.COM>
branch nick: 39897-6.0
timestamp: Thu 2009-01-15 12:17:57 -0200
message:
Bug#39897: lock_multi fails in pushbuild: timeout waiting for processlist
The problem is that relying on the "Table lock" thread state in
its current position to detect that a thread is waiting on a lock
is race prone. The "Table lock" state change happens before the
thread actually tries to grab a lock on a table.
The solution is to move the "Table lock" state so that its set
only when a thread is actually going to wait for a lock. The state
change happens after the thread fails to grab the lock (because it
is owned by other thread) and proceeds to wait on a condition.
This is considered part of work related to WL#4284 "Transactional
DDL locking"
Warning: this patch contains an incompatible change.
When waiting on a lock in thr_lock.c, the server used to display "Locked"
processlist state. After this patch, the state is "Table lock".
The new state was actually intended to be display since year 2002,
when Monty added it. But up until removal of thd->locked boolean
member, this state was ignored by SHOW PROCESSLIST code.
------------------------------------------------------------
revno: 2476.1116.1
committer: davi@mysql.com/endora.local
timestamp: Fri 2007-12-14 10:10:19 -0200
message:
DROP TABLE under LOCK TABLES simultaneous to a FLUSH TABLES
WITH READ LOCK (global read lock) can lead to a deadlock.
The solution is to not wait for the global read lock if the
thread is holding any locked tables.
Related to bugs 23713 and 32395. This issues is being fixed
only on 6.0 because it depends on the fix for bug 25858 --
which was fixed only on 6.0.
------------------------------------------------------------
revno: 2476.784.3
committer: davi@moksha.local
timestamp: Tue 2007-10-02 21:27:31 -0300
message:
Bug#25858 Some DROP TABLE under LOCK TABLES can cause deadlocks
When a client (connection) holds a lock on a table and attempts to
drop (obtain a exclusive lock) on a second table that is already
held by a second client and the second client then attempts to
drop the table that is held by the first client, leads to a
circular wait deadlock. This scenario is very similar to trying to
drop (or rename) a table while holding read locks and are
correctly forbidden.
The solution is to allow a drop table operation to continue only
if the table being dropped is write (exclusively) locked, or if
the table is temporary, or if the client is not holding any
locks. Using this scheme prevents the creation of a circular
chain in which each client is waiting for one table that the
next client in the chain is holding.
This is incompatible change, as can be seen by number of tests
cases that needed to be fixed, but is consistent with respect to
behavior of the different scenarios in which the circular wait
might happen.
The problem is that a SELECT .. FOR UPDATE statement might open
a table and later wait for a impeding global read lock without
noticing whether it is holding a table that is being waited upon
the the flush phase of the process that took the global read
lock.
The same problem also affected the following statements:
LOCK TABLES .. WRITE
UPDATE .. SET (update and multi-table update)
TRUNCATE TABLE ..
LOAD DATA ..
The solution is to make the above statements wait for a impending
global read lock before opening the tables. If there is no
impending global read lock, the statement raises a temporary
protection against global read locks and progresses smoothly
towards completion.
Important notice: the patch does not try to address all possible
cases, only those which are common and can be fixed unintrusively
enough for 5.0.
Details for Bug#43015 main.lock_multi: Weak code (sleeps etc.)
-------------------------------------------------------------
- The fix for bug 42003 already removed a lot of the weaknesses mentioned.
- Tests showed that there are unfortunately no improvements of this tests
in MySQL 5.1 which could be ported back to 5.0.
- Remove a superfluous "--sleep 1" around line 195
Details for Bug#43065 main.lock_multi: This test is too big if the disk is slow
-------------------------------------------------------------------------------
- move the subtests for the bugs 38499 and 36691 into separate scripts
- runtime under excessive parallel I/O load after applying the fix
lock_multi [ pass ] 22887
lock_multi_bug38499 [ pass ] 536926
lock_multi_bug38691 [ pass ] 258498
+ Fix for Bug#43114 wait_until_count_sessions too restrictive, random PB failures
+ Removal of a lot of other weaknesses found
+ modifications according to review
derived table cause crash
When a multi-UPDATE command fails to lock some table, and
subsequently succeeds, the tables need to be reopened if
they were altered. But the reopening procedure failed for
derived tables.
Extra cleanup has been added.
``FLUSH TABLES WITH READ LOCK''
Concurrent execution of 1) multitable update with a
NATURAL/USING join and 2) a such query as "FLUSH TABLES
WITH READ LOCK" or "ALTER TABLE" of updating table led
to a server crash.
The mysql_multi_update_prepare() function call is optimized
to lock updating tables only, so it postpones locking to
the last, and if locking fails, it does cleanup of modified
syntax structures and repeats a query analysis. However,
that cleanup procedure was incomplete for NATURAL/USING join
syntax data: 1) some Field_item items pointed into freed
table structures, and 2) the TABLE_LIST::join_columns fields
was not reset.
Major change:
short-living Field *Natural_join_column::table_field has
been replaced with long-living Item*.
The problem is that the Table_locks_waited was incremented only
when the lock request succeed. If a thread waiting for the lock
gets killed or the lock request is aborted, the variable would
not be incremented, leading to inaccurate values in the variable.
The solution is to increment the Table_locks_waited whenever the
lock request is queued. This reflects better the intended behavior
of the variable -- show how many times a lock was waited.
The problem is that some DDL statements (ALTER TABLE, CREATE
TRIGGER, FLUSH TABLES, ...) when under LOCK TABLES need to
momentarily drop the lock, reopen the table and grab the write
lock again (using reopen_tables). When grabbing the lock again,
reopen_tables doesn't pass a flag to mysql_lock_tables in
order to ignore the impending global read lock, which causes a
assertion because LOCK_open is being hold. Also dropping the
lock must not signal to any threads that the table has been
relinquished (related to the locking/flushing protocol).
The solution is to correct the way the table is reopenned
and the locks grabbed. When reopening the table and under
LOCK TABLES, the table version should be set to 0 so other
threads have to wait for the table. When grabbing the lock,
any other flush should be ignored because it's theoretically
a atomic operation. The chosen solution also fixes a potential
discrepancy between binlog and GRL (global read lock) because
table placeholders were being ignored, now a FLUSH TABLES WITH
READ LOCK will properly for table with open placeholders.
It's also important to mention that this patch doesn't fix
a potential deadlock if one uses two GRLs under LOCK TABLES
concurrently.
Kill of a CREATE TABLE source_table LIKE statement waiting for a
name-lock on the source table causes a bad lock interaction.
The mysql_create_like_table() has a bug that if the connection is
killed while waiting for the name-lock on the source table, it will
jump to the wrong error path and try to unlock the source table and
LOCK_open, but both weren't locked.
The solution is to simple return when the name lock request is killed,
it's safe to do so because no lock was acquired and no cleanup is needed.
Original bug report also contains description of other problems
related to this scenario but they either already fixed in 5.1 or
will be addressed separately (see bug report for details).
mysql_ha_open calls mysql_ha_close on the error path (unsupported) to close the (opened) table before inserting it into the tables hash list handler_tables_hash) but mysql_ha_close only closes tables which are on the hash list, causing the table to be left open and locked.
This change moves the table close logic into a separate function that is always called on the error path of mysql_ha_open or on a normal handler close (mysql_ha_close).
The patch changes the test case only.
The fix is to replace all 'sleep's with wait_condition. This makes
the test deterministic and also ~300 times faster.
Addendum fixes after changing the condition variable
for the global read lock.
The stress test suite revealed some deadlocks. Some were
related to the new condition variable (COND_global_read_lock)
and some were general problems with the global read lock.
It is now necessary to signal COND_global_read_lock whenever
COND_refresh is signalled.
We need to wait for the release of a global read lock if one
is set before every operation that requires a write lock.
But we must not wait if we have locked tables by LOCK TABLES.
After setting a global read lock a thread waits until all
write locks are released.
Fixes bug#17264, for alter table on win32 for successfull operation completion
it is used TL_WRITE(=10) lock instead of TL_WRITE_ALLOW_READ(=6), however here
in innodb handler TL_WRTIE is lifted to TL_WRITE_ALLOW_WRITE, which causes
race condition when several clients do alter table simultaneously.
The order of acquiring LOCK_mysql_create_db
and wait_if_global_read_lock() was wrong. It could happen
that a thread held LOCK_mysql_create_db while waiting for
the global read lock to be released. The thread with the
global read lock could try to administrate a database too.
It would first try to lock LOCK_mysql_create_db and hang...
The check if the current thread has the global read lock
is done in wait_if_global_read_lock(), which could not be
reached because of the hang in LOCK_mysql_create_db.
Now I exchanged the order of acquiring LOCK_mysql_create_db
and wait_if_global_read_lock(). This makes
wait_if_global_read_lock() fail with an error message for
the thread with the global read lock. No deadlock happens.
Replaced COND_refresh with COND_global_read_lock becasue of a bug in NTPL threads when using different mutexes as arguments to pthread_cond_wait()
The original code caused a hang in FLUSH TABLES WITH READ LOCK in some circumstances because pthread_cond_broadcast() was not delivered to other threads.
This fixes:
Bug#16986: Deadlock condition with MyISAM tables
Bug#20048: FLUSH TABLES WITH READ LOCK causes a deadlock
Added missing DBUG_xxx_RETURN statements
Fixed some usage of not initialized variables (as found by valgrind)
Ensure that we don't remove locked tables used as name locks from open table cache until unlock_table_names() are called.
This was fixed by having drop_locked_name() returning any table used as a name lock so that we can free it in unlock_table_names()
This will allow Tomas to continue with his work to use namelocks to syncronize things.
Note: valgrind still produces a lot of warnings about using not initialized code and shows memory loss errors when running the ndb tests
Use open_normal_and_derived_tables instead of open_and_lock_tables when reading metadata for a table.
Add two test cases, one for "USE database" and one for "SHOW COLUMNS FROM table"
FOUND is not a reserved keyword anymore
Added Item_field::set_no_const_sub() to be able to mark fields that can't be substituted
Added 'simple_select' method to be able to quickly determinate if a select_result is a normal SELECT
Note that the 5.0 tree is not yet up to date: Sanja will have to fix multi-update-locks for this merge to be complete
bmove_allign -> bmove_align
Added OLAP function ROLLUP
Split mysql_fix_privilege_tables to a script and a .sql data file
Added new (MEMROOT*) functions to avoid calling current_thd() when creating some common objects.
Added table_alias_charset, for easier --lower-case-table-name handling
Better SQL_MODE handling (Setting complex options also sets sub options)
New (faster) assembler string functions for x86