STATUS OF ROLLBACKED TRANSACTION" and bug #17054007 - "TRANSACTION
IS NOT FULLY ROLLED BACK IN CASE OF INNODB DEADLOCK".
The problem in the first bug report was that although a deadlock involving
metadata locks was reported using the same error code and message as an InnoDB
deadlock, it did not roll back the transaction the way the latter does. This
confused users, since after ER_LOCK_DEADLOCK the transaction could in some
cases be restarted immediately, while in other cases an explicit rollback was
required.
The problem in the second bug report was that although an InnoDB deadlock
caused a transaction rollback in all storage engines, it did not release
metadata locks. As a result, concurrent DDL on the tables used in the
transaction was blocked until an implicit or explicit COMMIT or ROLLBACK was
issued in the connection that hit the InnoDB deadlock.
The former issue stemmed from the fact that when support for detecting and
reporting metadata lock deadlocks was added, we erroneously assumed that on
deadlock InnoDB rolls back only the last statement and not the whole
transaction (which is actually what happens on an InnoDB lock timeout), and so
we did not implement rollback of transactions on MDL deadlocks.
The latter issue was caused by the fact that rollback of a transaction due to
deadlock is carried out by setting the THD::transaction_rollback_request flag
at the point where the deadlock is detected and performing the rollback inside
the trans_rollback_stmt() call when this flag is set. Since
trans_rollback_stmt() is not aware of MDL locks, no MDL locks are released.
This patch solves these two problems in the following way:
- When an MDL deadlock is detected, transaction rollback is requested by
setting the THD::transaction_rollback_request flag.
- The code that performs the transaction rollback when
THD::transaction_rollback_request is set is moved out of trans_rollback_stmt().
We now handle the rollback request at the same level where we call
trans_rollback_stmt() and release statement/transaction MDL locks (see the
sketch after this list).
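The following is a minimal, self-contained C++ sketch of the control flow
described above; THD, MDL_context, trans_rollback_stmt() and
trans_rollback_implicit() here are simplified stand-ins, not the actual
server definitions:

  // A minimal, self-contained sketch of the control flow described above.
  // THD, MDL_context, trans_rollback_stmt() and trans_rollback_implicit()
  // are simplified stand-ins for the real server objects; the names and
  // signatures are illustrative, not the actual MySQL implementation.
  #include <cstdio>

  struct MDL_context {
    void release_statement_locks()     { std::puts("statement MDL locks released"); }
    void release_transactional_locks() { std::puts("transaction MDL locks released"); }
  };

  struct THD {
    bool transaction_rollback_request = false;  // set where the deadlock is detected
    MDL_context mdl_context;
  };

  static void trans_rollback_stmt(THD *)     { std::puts("statement rolled back"); }
  static void trans_rollback_implicit(THD *) { std::puts("whole transaction rolled back"); }

  // Caller-level cleanup: the rollback request is honoured here, next to the
  // point where statement/transaction MDL locks are released, instead of
  // inside trans_rollback_stmt(), which knows nothing about MDL.
  static void end_of_statement_cleanup(THD *thd) {
    if (thd->transaction_rollback_request) {
      trans_rollback_implicit(thd);                    // roll back the whole transaction
      thd->mdl_context.release_transactional_locks();  // and release its metadata locks
      thd->transaction_rollback_request = false;
    } else {
      trans_rollback_stmt(thd);                        // normal per-statement path
      thd->mdl_context.release_statement_locks();
    }
  }

  int main() {
    THD thd;
    thd.transaction_rollback_request = true;  // e.g. an MDL deadlock was detected
    end_of_statement_cleanup(&thd);
  }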
The following variables do not require LOCK_open protection anymore:
- table_def_cache (renamed to tdc_hash) is protected by rw-lock
LOCK_tdc_hash;
- table_def_shutdown_in_progress doesn't need LOCK_open protection;
- last_table_id uses atomics;
- TABLE_SHARE::ref_count (renamed to TABLE_SHARE::tdc.ref_count)
is protected by TABLE_SHARE::tdc.LOCK_table_share;
- TABLE_SHARE::next, ::prev (renamed to tdc.next and tdc.prev),
oldest_unused_share, end_of_unused_share are protected by
LOCK_unused_shares;
- TABLE_SHARE::m_flush_tickets (renamed to tdc.m_flush_tickets)
is protected by TABLE_SHARE::tdc.LOCK_table_share;
- refresh_version (renamed to tdc_version) uses atomics (see the sketch below).
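Purely as an illustration, here is how a shared counter such as tdc_version or
last_table_id can be maintained with C++ atomics instead of LOCK_open; the
helper names below are placeholders, not the server's actual definitions:

  // Illustration only: maintaining a shared counter such as tdc_version or
  // last_table_id with C++ atomics instead of LOCK_open. The names are
  // placeholders, not the server's actual definitions.
  #include <atomic>
  #include <cstdint>
  #include <cstdio>

  static std::atomic<std::uint64_t> tdc_version{1};

  // Writers bump the version (e.g. on a FLUSH TABLES-like event)...
  static std::uint64_t tdc_increment_version() {
    return tdc_version.fetch_add(1, std::memory_order_relaxed) + 1;
  }

  // ...and readers just load it, with no mutex involved.
  static std::uint64_t tdc_refresh_version() {
    return tdc_version.load(std::memory_order_relaxed);
  }

  int main() {
    std::printf("version before: %llu\n",
                (unsigned long long) tdc_refresh_version());
    tdc_increment_version();
    std::printf("version after:  %llu\n",
                (unsigned long long) tdc_refresh_version());
  }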
includes:
* remove some remnants of "Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING"
* introduce LOCK_share, now LOCK_ha_data is strictly for engines
* rea_create_table() always creates .par file (even in "frm-only" mode)
* fix a 5.6 bug, temp file leak on dummy ALTER TABLE
revno: 4559
committer: Marc Alff <marc.alff@oracle.com>
branch nick: mysql-5.6-bug14741537-v4
timestamp: Thu 2012-11-08 22:40:31 +0100
message:
Bug#14741537 - MYSQL 5.6, GTID AND PERFORMANCE_SCHEMA
Before this fix, statements using performance_schema tables:
- were marked as unsafe for replication,
- caused warnings during execution,
- were written to the binlog, either in STATEMENT or ROW format.
When using replication with the new GTID feature,
unsafe warnings are elevated to errors,
which prevents using the performance_schema and GTID together.
The root cause of the problem is not the raising of warnings/errors
in some special cases; it runs deeper: statements involving the performance
schema should not even be written to the binary log in the first place,
because the content of the performance schema tables is 'local' to a server
instance, and may differ greatly between nodes in a replication
topology.
In particular, the DBA should be able to configure (INSERT, UPDATE, DELETE)
or flush (TRUNCATE) performance schema tables on one node,
without affecting other nodes.
This fix introduces the concept of a 'non-replicated' or 'local' table,
and adjusts the replication logic to ignore tables that are not replicated
when deciding if or how to log a statement to the binlog.
Note that while this issue was detected using the performance_schema,
other tables are also affected by the same problem.
This fix defines the following tables as 'local', which are then never
replicated:
- performance_schema.*
- mysql.general_log
- mysql.slow_log
- mysql.slave_relay_log_info
- mysql.slave_master_info
- mysql.slave_worker_info
Existing behavior for information_schema.* is unchanged by this fix,
to limit the scope of changes.
In terms of code, this fix implements the following changes:
1)
Performance schema tables no longer use any replication flags,
since performance schema tables are not replicated.
2)
In open_table_from_share(),
tables with no replication capabilities (performance_schema.*),
tables with TABLE_CATEGORY_LOG (logs),
and tables with TABLE_CATEGORY_RPL_INFO (replication)
are marked as non-replicated via TABLE::no_replicate.
3)
A new THD member, THD::m_binlog_filter_state,
indicates whether the current statement is written to the binlog
(the normal case for most statements) or is to be discarded
(because the statement affects non-replicated tables).
4)
In THD::decide_logging_format(), the replication logic
is changed to take non-replicated tables into account.
Statements that affect only non-replicated tables are
executed normally (no warnings or errors), but are not written
to the binlog.
Statements that affect (i.e., write to) a replicated table
while also using (i.e., reading from or writing to) a non-replicated table
are executed normally in MIXED and ROW binlog format,
and cause a new error in STATEMENT binlog format.
THD::decide_logging_format() uses THD::m_binlog_filter_state
to indicate whether a statement is to be ignored when writing to
the binlog; a simplified sketch of this decision follows point 6 below.
5)
In THD::binlog_query(), statements marked as ignored
are not written to the binary log.
6)
For row-based replication, the existing test for 'table->no_replicate'
has been moved from binlog_log_row() to check_table_binlog_row_based().
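As a rough illustration of points 3) to 5), here is a self-contained C++
sketch of how such a filtering decision could look; the types, the field
names and the enums below are hypothetical stand-ins for
THD::m_binlog_filter_state and the real server structures, not the actual
implementation:

  // Rough, self-contained sketch of the filtering decision from points 3)-5).
  // The types, the no_replicate/written_to fields and the enums are
  // hypothetical stand-ins for THD::m_binlog_filter_state and the real
  // server structures.
  #include <cstdio>
  #include <vector>

  enum class Binlog_filter_state { NONE, FILTERED };
  enum class Binlog_format { STATEMENT, MIXED, ROW };

  struct Table_ref {
    bool no_replicate;  // set from TABLE::no_replicate when the table is opened
    bool written_to;    // the statement writes to this table
  };

  struct Decision {
    Binlog_filter_state filter = Binlog_filter_state::NONE;
    bool error = false;
  };

  static Decision decide_logging_format(const std::vector<Table_ref> &tables,
                                        Binlog_format format) {
    bool writes_replicated = false, uses_non_replicated = false;
    for (const Table_ref &t : tables) {
      if (t.no_replicate) uses_non_replicated = true;
      else if (t.written_to) writes_replicated = true;
    }

    Decision d;
    if (uses_non_replicated && !writes_replicated) {
      // Only non-replicated tables are affected: run normally, skip the binlog.
      d.filter = Binlog_filter_state::FILTERED;
    } else if (uses_non_replicated && writes_replicated &&
               format == Binlog_format::STATEMENT) {
      // A replicated write mixed with a non-replicated table cannot be logged
      // safely as a statement: raise an error.
      d.error = true;
    }
    return d;
  }

  int main() {
    // e.g. UPDATE performance_schema.setup_instruments ...: local only.
    Decision d = decide_logging_format({{true, true}}, Binlog_format::STATEMENT);
    std::printf("filtered=%d error=%d\n",
                d.filter == Binlog_filter_state::FILTERED, d.error);
  }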
-Note 1031 Table storage engine for 't1' doesn't have this option
+Note 1031 Table storage engine for 'InnoDB' doesn't have this option
These differences were caused by a change in MariaDB which altered the
ER_ILLEGAL_HA message text to the form:
"Storage engine InnoDB of the table `test`.`t1` doesn't have this option"
Some of the error calls were changed to pass the new parameters, but others
were left in the old form. Also, the error text in errmsg-utf8.txt was not
changed.
Implement a facility for the commit in one thread to wait for the commit of
another thread to complete first. The wait is done in a way that does not
prevent a waiter and a waitee from group committing together with a single
fsync() in both the binlog and InnoDB, and the wait is done efficiently with
respect to locking.
The patch was originally made to support TaoBao parallel replication with
in-order commit; now it will be adapted to also be used for parallel
replication of group-committed transactions.
A waiter THD registers itself with a prior waitee THD. The waiter will then
complete its commit at the earliest in the same group commit as the waitee
(when using the binlog). The wait can also be done explicitly by the waitee.
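Below is a minimal C++ sketch of only the waiter/waitee mechanics
(registration and wakeup); it leaves out the integration with group commit
and the binlog, and the names are simplified stand-ins rather than the
actual server API:

  // Minimal sketch of only the waiter/waitee mechanics (registration and
  // wakeup). It leaves out the integration with group commit and the binlog;
  // the names are simplified stand-ins, not the actual server API.
  #include <condition_variable>
  #include <cstdio>
  #include <mutex>
  #include <thread>

  struct Wait_for_commit {
    std::mutex mtx;
    std::condition_variable cond;
    bool committed = false;

    // Called by the waitee once its commit has completed.
    void wakeup_subsequent_commits() {
      std::lock_guard<std::mutex> lock(mtx);
      committed = true;
      cond.notify_all();
    }

    // Called by a waiter that registered against this waitee.
    void wait_for_prior_commit() {
      std::unique_lock<std::mutex> lock(mtx);
      cond.wait(lock, [this] { return committed; });
    }
  };

  int main() {
    Wait_for_commit prior;  // state owned by the waitee (the earlier commit)

    std::thread waiter([&prior] {
      prior.wait_for_prior_commit();  // block until the waitee has committed
      std::puts("waiter: committing after the waitee");
    });

    std::puts("waitee: committing first");
    prior.wakeup_subsequent_commits();  // let registered waiters proceed

    waiter.join();
  }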
1. DROP DATABASE should use ha_discover_table_names(), not look at .frm files.
2. filename_to_tablename() also encodes temp file names #sql- -> #mysql50##sql (see the sketch after this list)
3. no special treatment for #sql- files, no TABLE_LIST::internal_tmp_table
4. also discover table file names that start with #
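Purely as a toy illustration of the mapping in point 2 (this is not the real
filename_to_tablename(), and the example name is made up):

  // Toy illustration of the mapping in point 2 (not the real
  // filename_to_tablename()): temporary file names starting with "#sql-"
  // are given the "#mysql50#" prefix.
  #include <cstdio>
  #include <string>

  static std::string map_temp_file_name(const std::string &file_name) {
    const std::string tmp_prefix = "#sql-";
    if (file_name.compare(0, tmp_prefix.size(), tmp_prefix) == 0)
      return "#mysql50#" + file_name;  // mark as a raw, unencoded file name
    return file_name;
  }

  int main() {
    // Hypothetical temp file name, e.g. one created by ALTER TABLE.
    std::printf("%s\n", map_temp_file_name("#sql-3f2a_1").c_str());  // #mysql50##sql-3f2a_1
  }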
-Added a test and extra code to ensure we don't leave keyread on for a handler table.
-Always create on-disk temporary files with long data pointers unless SQL_SMALL_RESULT is used. This ensures that we can handle temporary files bigger than 4G.
mysql-test/include/default_mysqld.cnf:
Run test suite with smaller aria keybuffer size
mysql-test/suite/maria/maria3.result:
Run test suite with smaller aria keybuffer size
mysql-test/suite/sys_vars/r/aria_pagecache_buffer_size_basic.result:
Run test suite with smaller aria keybuffer size
sql/handler.cc:
Disable key read (extra safety if something went wrong)
sql/multi_range_read.cc:
Ensure we don't leave keyread on for secondary_file
sql/opt_range.cc:
Simplify code with mark_columns_used_by_index_no_reset()
Ensure that read_keys_and_merge() disables keyread if it enabled it
sql/opt_subselect.cc:
Remove no longer used argument for create_internal_tmp_table()
sql/sql_derived.cc:
Remove no longer used argument for create_internal_tmp_table()
sql/sql_select.cc:
Use 'enable_keyread()' instead of calling HA_EXTRA_RESET. (Makes debugging easier)
Always create on-disk temporary files with long data pointers unless SQL_SMALL_RESULT is used. This ensures that we can handle temporary files bigger than 4G.
Remove no longer used argument for create_internal_tmp_table()
More DBUG
sql/sql_select.h:
Remove no longer used argument for create_internal_tmp_table()
* print "table doesn't exist in engine" when a table doesn't exist in the engine,
instead of "file not found" (if no file was involved)
* print a complete filename that cannot be found ('t1.MYI', not 't1')
* it's not an error for DROP if a table doesn't exist in the engine (or some table
files cannot be found), as long as the DROP succeeds regardless