Bug#54678: InnoDB, TRUNCATE, ALTER, I_S SELECT, crash or deadlock
- Incompatible change: truncate no longer resorts to a row by
row delete if the storage engine does not support the truncate
method. Consequently, the count of affected rows does not, in
any case, reflect the actual number of rows.
- Incompatible change: it is no longer possible to truncate a
table that participates as a parent in a foreign key constraint,
unless it is a self-referencing constraint (both parent and child
are in the same table). To work around this incompatible change
and still be able to truncate such tables, disable foreign checks
with SET foreign_key_checks=0 before truncate. Alternatively, if
foreign key checks are necessary, please use a DELETE statement
without a WHERE condition.
Problem description:
The problem was that for storage engines that do not support
truncate table via a external drop and recreate, such as InnoDB
which implements truncate via a internal drop and recreate, the
delete_all_rows method could be invoked with a shared metadata
lock, causing problems if the engine needed exclusive access
to some internal metadata. This problem originated with the
fact that there is no truncate specific handler method, which
ended up leading to a abuse of the delete_all_rows method that
is primarily used for delete operations without a condition.
Solution:
The solution is to introduce a truncate handler method that is
invoked when the engine does not support truncation via a table
drop and recreate. This method is invoked under a exclusive
metadata lock, so that there is only a single instance of the
table when the method is invoked.
Also, the method is not invoked and a error is thrown if
the table is a parent in a non-self-referencing foreign key
relationship. This was necessary to avoid inconsistency as
some integrity checks are bypassed. This is inline with the
fact that truncate is primarily a DDL operation that was
designed to quickly remove all data from a table.
LOAD DATA into partitioned MyISAM table
Problem was that both partitioning and myisam
used the same table_share->mutex for different protections
(auto inc and repair).
Solved by adding a specific mutex for the partitioning
auto_increment.
Also adding destroying the ha_data structure in
free_table_share (which is to be propagated
into 5.5).
This is a 5.1 ONLY patch, already fixed in 5.5+.
Bug#57113: ha_partition::extra(ha_extra_function):
Assertion `m_extra_cache' failed
Fix for bug#55458 included DBUG_ASSERTS causing
debug builds of the server to crash on
another multi-table update.
Removed the asserts since they where wrong.
(updated after testing the patch in 5.5).
Problem was that the handler call ::extra(HA_EXTRA_CACHE) was cached
but the ::extra(HA_EXTRA_PREPARE_FOR_UPDATE) was not.
Solution was to also cache the other call and forward it when moving
to a new partition to scan.
The handler function for reading one row from a specific index
was not optimized in the partitioning handler since it
used the default implementation.
No test case since it is performance only, verified by hand.
Essentially, the problem is that safemalloc is excruciatingly
slow as it checks all allocated blocks for overrun at each
memory management primitive, yielding a almost exponential
slowdown for the memory management functions (malloc, realloc,
free). The overrun check basically consists of verifying some
bytes of a block for certain magic keys, which catches some
simple forms of overrun. Another minor problem is violation
of aliasing rules and that its own internal list of blocks
is prone to corruption.
Another issue with safemalloc is rather the maintenance cost
as the tool has a significant impact on the server code.
Given the magnitude of memory debuggers available nowadays,
especially those that are provided with the platform malloc
implementation, maintenance of a in-house and largely obsolete
memory debugger becomes a burden that is not worth the effort
due to its slowness and lack of support for detecting more
common forms of heap corruption.
Since there are third-party tools that can provide the same
functionality at a lower or comparable performance cost, the
solution is to simply remove safemalloc. Third-party tools
can provide the same functionality at a lower or comparable
performance cost.
The removal of safemalloc also allows a simplification of the
malloc wrappers, removing quite a bit of kludge: redefinition
of my_malloc, my_free and the removal of the unused second
argument of my_free. Since free() always check whether the
supplied pointer is null, redudant checks are also removed.
Also, this patch adds unit testing for my_malloc and moves
my_realloc implementation into the same file as the other
memory allocation primitives.
Conflicts:
Text conflict in mysql-test/r/archive.result
Contents conflict in mysql-test/r/innodb_bug38231.result
Text conflict in mysql-test/r/mdl_sync.result
Text conflict in mysql-test/suite/binlog/t/disabled.def
Text conflict in mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result
Text conflict in mysql-test/t/archive.test
Contents conflict in mysql-test/t/innodb_bug38231.test
Text conflict in mysql-test/t/mdl_sync.test
Text conflict in sql/sp_head.cc
Text conflict in sql/sql_show.cc
Text conflict in sql/table.cc
Text conflict in sql/table.h
SELECT and ALTER TABLE ... REBUILD PARTITION".
ALTER TABLE on InnoDB table (including partitioned tables)
acquired exclusive locks on rows of table being altered.
In cases when there was concurrent transaction which did
locking reads from this table this sometimes led to a
deadlock which was not detected by MDL subsystem nor by
InnoDB engine (and was reported only after exceeding
innodb_lock_wait_timeout).
This problem stemmed from the fact that ALTER TABLE acquired
TL_WRITE_ALLOW_READ lock on table being altered. This lock
was interpreted as a write lock and thus for table being
altered handler::external_lock() method was called with
F_WRLCK as an argument. As result InnoDB engine treated
ALTER TABLE as an operation which is going to change data
and acquired LOCK_X locks on rows being read from old
version of table.
In case when there was a transaction which already acquired
SR metadata lock on table and some LOCK_S locks on its rows
(e.g. by using it in subquery of DML statement) concurrent
ALTER TABLE was blocked at the moment when it tried to
acquire LOCK_X lock before reading one of these rows.
The transaction's attempt to acquire SW metadata lock on
table being altered led to deadlock, since it had to wait
for ALTER TABLE to release SNW lock. This deadlock was not
detected and got resolved only after timeout expiring
because waiting were happening in two different subsystems.
Similar deadlocks could have occured in other situations.
This patch tries to solve the problem by changing ALTER TABLE
implementation to use TL_READ_NO_INSERT lock instead of
TL_WRITE_ALLOW_READ. After this step handler::external_lock()
is called with F_RDLCK as an argument and InnoDB engine
correctly interprets ALTER TABLE as operation which only
reads data from original version of table. Thanks to this
ALTER TABLE acquires only LOCK_S locks on rows it reads.
This, in its turn, causes inter-subsystem deadlocks to go
away, as all potential lock conflicts and thus deadlocks will
be limited to metadata locking subsystem:
- When ALTER TABLE reads rows from table being altered it
can't encounter any locks which conflict with LOCK_S row
locks. There should be no concurrent transactions holding
LOCK_X row locks. Such a transaction should have been
acquired SW metadata lock on table first which would have
conflicted with ALTER's SNW lock.
- Vice versa, when DML which runs concurrently with ALTER
TABLE tries to lock row it should be requesting only LOCK_S
lock which is compatible with locks acquired by ALTER,
as otherwise such DML must own an SW metadata lock on table
which would be incompatible with ALTER's SNW lock.
This patch:
- Moves all definitions from the mysql_priv.h file into
header files for the component where the variable is
defined
- Creates header files if the component lacks one
- Eliminates all include directives from mysql_priv.h
- Eliminates all circular include cycles
- Rename time.cc to sql_time.cc
- Rename mysql_priv.h to sql_priv.h
into partitioned MyISAM table
Problem was that the ha_data structure was introduced in 5.1
and only used for partitioning first, but with the intention
of be of use for others engines as well, and when used by other
engines it would clash if it also was partitioned.
Solution is to move the partitioning specific data to a separate
structure, with its own mutex (which is used for auto_increment).
Also did rename PARTITION_INFO to PARTITION_STATS since there
already exist a class named partition_info, also cleaned up
some related variables.
Conflicts:
Text conflict in client/mysqlbinlog.cc
Text conflict in mysql-test/Makefile.am
Text conflict in mysql-test/collections/default.daily
Text conflict in mysql-test/r/mysqlbinlog_row_innodb.result
Text conflict in mysql-test/suite/rpl/r/rpl_typeconv_innodb.result
Text conflict in mysql-test/suite/rpl/t/rpl_get_master_version_and_clock.test
Text conflict in mysql-test/suite/rpl/t/rpl_row_create_table.test
Text conflict in mysql-test/suite/rpl/t/rpl_slave_skip.test
Text conflict in mysql-test/suite/rpl/t/rpl_typeconv_innodb.test
Text conflict in mysys/charset.c
Text conflict in sql/field.cc
Text conflict in sql/field.h
Text conflict in sql/item.h
Text conflict in sql/item_func.cc
Text conflict in sql/log.cc
Text conflict in sql/log_event.cc
Text conflict in sql/log_event_old.cc
Text conflict in sql/mysqld.cc
Text conflict in sql/rpl_utility.cc
Text conflict in sql/rpl_utility.h
Text conflict in sql/set_var.cc
Text conflict in sql/share/Makefile.am
Text conflict in sql/sql_delete.cc
Text conflict in sql/sql_plugin.cc
Text conflict in sql/sql_select.cc
Text conflict in sql/sql_table.cc
Text conflict in storage/example/ha_example.h
Text conflict in storage/federated/ha_federated.cc
Text conflict in storage/myisammrg/ha_myisammrg.cc
Text conflict in storage/myisammrg/myrg_open.c
Conflicts:
Text conflict in mysql-test/r/partition_innodb.result
Text conflict in sql/field.h
Text conflict in sql/item.h
Text conflict in sql/item_cmpfunc.h
Text conflict in sql/item_sum.h
Text conflict in sql/log_event_old.cc
Text conflict in sql/protocol.cc
Text conflict in sql/sql_select.cc
Text conflict in sql/sql_yacc.yy
concurrent I_S query
There were two problem:
1) MYSQL_LOCK_IGNORE_FLUSH also ignored name locks
2) there was a race between abort_and_upgrade_locks and
alter_close_tables
(i.e. remove_table_from_cache and
close_data_files_and_morph_locks)
Which allowed the table to be opened with MYSQL_LOCK_IGNORE_FLUSH flag
resulting in renaming a partition that was already in use,
which could cause the table to be unusable.
Solution was to not allow IGNORE_FLUSH to skip waiting for
a named locked table.
And to not release the LOCK_open mutex between the
calls to remove_table_from_cache and
close_data_files_and_morph_locks by merging the functions
abort_and_upgrade_locks and alter_close_tables.
auto_increment on duplicate entry
The bug was that when INSERT_ID was used and the storage
engine was told to release any reserved but not used
auto_increment values, it set the highest auto_increment
value to INSERT_ID.
The fix was to check if the auto_increment value was forced
by user (INSERT_ID) or by slave-thread, i.e. not auto-
generated. So that it is only allowed to release generated
values.
Problem was block_size on partitioned tables was not set,
resulting in keys_per_block was not correct which affects
the cost calculation for read time of indexes (including
cost for group min/max).Which resulted in a bad optimizer
decision.
Fixed by setting stats.block_size correctly.
There was two problems:
The first was the symptom, caused by bad error handling in
ha_partition. It did not handle print_error etc. when
having no partitions (when used by dummy handler).
The second was the real problem that when dropping tables
it reused the table type (storage engine) from when the lock
was asked for, not the table type that it had when gaining
the exclusive name lock. So that it tried to delete tables
from wrong storage engines.
Solutions for the first problem was to accept some handler
calls to the partitioning handler even if it was not setup
with any partitions, and also if possible fallback
to use the base handler's default functions.
Solution for the second problem was to remove the optimization
to reuse the definition from the cache, instead always check
the frm-file when holding the LOCK_open mutex
(updated with a fix for a debug print crash and better
comments as required by reviewer, and removed optimization
to avoid reading the frm-file).