(like DROP TABLE) has been scheduled before conflicting DDL's (like INSERT)
are commited.
What makes these bugs hard to detect is that in most cases any wrong
schduling are caught by MDL locks. It's only when there are timing issues
that the bugs (usually deadlocks) are noticed.
This patch adds a DBUG_ASSERT() that detects, in parallel replication,
if a DDL is scheduled before any depending DML'S are commited.
It does this be checking if there are any conflicting replication locks
when the DDL is about to wait for getting it's MDL lock.
I also did some minor code cleanups in sql_base.cc to make this code
similar to other related code.
THAT ACTUALLY EXISTS
ANALYSIS:
=========
Stored functions updating a view where the view table has a
trigger defined that updates another table, fails reporting
an error that the table doesn't exist.
If there is a trigger defined on a table, a variable
'trg_event_map' will be set to a non-zero value after the
parsed tree creation. This indicates what triggers we need to
pre-load for the TABLE_LIST when opening an associated table.
During the prelocking phase, the variable 'trg_event_map'
will not be set for the view table. This value will be set
after the processing of triggers defined on the table. During
the processing of sub-statements, 'locked_tables_mode' will be
set to 'LTM_PRELOCKED' which denotes that further locking
of tables/functions cannot be done. This results in the other
table not being locked and thus further processing results in
an error getting reported.
FIX:
====
During the prelocking of view, the value of 'trg_event_map'
of the view is copied to 'trg_event_map' of the next table
in the TABLE_LIST. This results in the locking of tables
associated with the trigger as well.
On shutdown feedback was sending a short report without creating
a THD. At that point current_thd was pointing to the already
destroyed THD from the previous full report.
backport from 10.1:
commit bfe703a
Author: Sergei Golubchik <serg@mariadb.org>
Date: Tue Feb 3 18:19:56 2015 +0100
don't let current_thd to point to a destroyed THD
Problem & Analysis: If DML invokes a trigger or a
stored function that inserts into an AUTO_INCREMENT column,
that DML has to be marked as 'unsafe' statement. If the
tables are locked in the transaction prior to DML statement
(using LOCK TABLES), then the same statement is not marked as
'unsafe' statement. The logic of checking whether unsafeness
is protected with if (!thd->locked_tables_mode). Hence if
we lock the tables prior to DML statement, it is *not* entering
into this if condition. Hence the statement is not marked
as unsafe statement.
Fix: Irrespective of locked_tables_mode value, the unsafeness
check should be done. Now with this patch, the code is moved
out to 'decide_logging_format()' function where all these checks
are happening and also with out 'if(!thd->locked_tables_mode)'.
Along with the specified test case in the bug scenario
(BINLOG_STMT_UNSAFE_AUTOINC_COLUMNS), we also identified that
other cases BINLOG_STMT_UNSAFE_AUTOINC_NOT_FIRST,
BINLOG_STMT_UNSAFE_WRITE_AUTOINC_SELECT, BINLOG_STMT_UNSAFE_INSERT_TWO_KEYS
are also protected with thd->locked_tables_mode which is not right. All
of those checks also moved to 'decide_logging_format()' function.
HA_MYISAMMRG.CC:631
Analysis
========
Any attempt to open a temporary MyISAM merge table consisting
of a view in its list of tables (not the last table in the list)
under LOCK TABLES causes the server to exit.
Current implementation doesn't perform sanity checks during
merge table creation. This allows merge table to be created
with incompatible tables (table with non-myisam engine),
views or even with table doesn't exist in the system.
During view open, check to verify whether requested view
is part of a merge table is missing under LOCK TABLES path
in open_table(). This leads to opening of underlying table
with parent_l having NULL value. Later when attaching child
tables to parent, this hits an ASSERT as all child tables
should have parent_l pointing to merge parent. If the operation
does not happen under LOCK TABLES mode, open_table() checks
for view's parent_l and returns error.
Fix:
======
Check added before opening view Under LOCK TABLES in open_table()
to verify whether it is part of merge table. Error is returned
if the view is part of a merge table.
find_item_in_list() now recognize view fields like a fields even if they rever to an expression.
The problem of schema name do not taken into account for field with it and
derived table fixed.
Duplicating code removed
Problem: Not all permanent Item_direct_view_ref was in permanent list of used items of the view.
Solution: Detect creating permenent view/derived table reference and put them in the permanent list at once.
Alternative fix that doesn't cause view.test crash in --ps:
Remember when Item_ref was fixed right in the constructor
and did not have a full Item_ref::fix_fields() call. Later
in PS/SP, after Item_ref::cleanup, we use this knowledge
to avoid doing full fix_fields() for items that were never
supposed to be fix_field'ed.
Simplify the test case.
SELECT ... WHERE XX IN (SELECT YY)
this was transformed to something like:
SELECT ... WHERE IF_EXISTS(SELECT ... HAVING XX=YY)
The bug was that for normal execution XX was fixed in the original outer SELECT context while in PS it was fixed in the sub query context and this confused the optimizer.
Fixed by ensuring that XX is always fixed in the outer context.
AVOID DEADLOCK AFTER RESTORE
Analysis
--------
Accessing the restored NDB table in an active multi-statement
transaction was resulting in deadlock found error.
MySQL Server needs to discover metadata of NDB table from
data nodes after table is restored from backup. Metadata
discovery happens on the first access to restored table.
Current code mandates this statement to be the first one
in the transaction. This is because discover needs exclusive
metadata lock on the table. Lock upgrade at this point can
lead to MDL deadlock and the code was written at the time
when MDL deadlock detector was not present. In case when
discovery attempted in the statement other than the first
one in transaction ER_LOCK_DEADLOCK error is reported
pessimistically.
Fix:
---
Removed the constraint as any potential deadlock will be
handled by deadlock detector. Also changed code in discover
to keep metadata locks of active transaction.
Same issue was present in table auto repair scenario. Same
fix is added in repair path also.
This was a regression from the patch for MDEV-7668.
A test was incorrect, so the slave would not properly handle re-using
temporary tables, which lead to replication failure in this case.
It is possible for Item_field to have a NULL field_name. This is true if
the Item_field is created based on a field in a temporary table that has
no name. It is thus necessary to do a null check before attempting a
strcmp.
Make sure that in parallel replication, we execute wait_for_prior_commit()
before setting table->in_use for a temporary table. Otherwise we can end up
with two parallel replication worker threads competing with each other for
use of a temporary table.
Re-factor the use of find_temporary_table() to be able to handle errors
in the caller (as wait_for_prior_commit() can return error in case of
deadlock kill).
[This commit cherry-picked to be able to merge MDEV-7936, of which it
is a pre-requisite, into both 10.0 and 10.1.]
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.
In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.
More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.
This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.
(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).
Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.
This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
Parallel replication depends on locking (table locks, row locks, etc.) to
prevent two conflicting transactions from running and committing in parallel.
But temporary tables are designed to be visible only to one thread, and have
no such locking.
In the concrete issue, an intermediate master could commit a CREATE TEMPORARY
TABLE in the same group commit as in INSERT into that table. Thus, a
lower-level master could attempt to run them in parallel and get an error.
More generally, we need protection from parallel replication trying to run
transactions in parallel that access a common temporary table.
This patch simply causes use of a temporary table from parallel replication
to wait for all previous transactions to commit, serialising the replication
at that point.
(A more fine-grained locking could be added later, possibly. However,
using temporary tables in statement-based replication is in any case
normally undesirable; for example a restart of the server will lose
temporary tables and can break replication).
Note that row-based replication is not affected, as it does not do any
temporary tables on the slave-side.
This patch also cleans up the locking around protecting the list of
temporary tables in Relay_log_info. This used to take the
rli->data_lock at the end of every statement, which is very bad for
concurrency. With this patch, the lock is not taken unless temporary
tables (with statement-based binlogging) are in use on the slave.
Do not use merge_for_insert for commands which use SELECT because optimizer can't work with such tables.
Fixes which makes multi-delete working with normally merged views.
The reason for the failure was a bug in an include file on debian that causes 'struct stat'
to have different sized depending on the environment.
This patch fixes so that we always include my_global.h or my_config.h before we include any other files.
Other things:
- Removed #include <my_global.h> in some include files; Better to always do this at the top level to have as few
"always-include-this-file-first' files as possible.
- Removed usage of some include files that where already included by my_global.h or by other files.
client/mysql_plugin.c:
Use my_global.h first
client/mysqlslap.c:
Remove duplicated include files
extra/comp_err.c:
Remove duplicated include files
include/m_string.h:
Remove duplicated include files
include/maria.h:
Remove duplicated include files
libmysqld/emb_qcache.cc:
Use my_global.h first
plugin/semisync/semisync.h:
Use my_pthread.h first
sql/datadict.cc:
Use my_global.h first
sql/debug_sync.cc:
Use my_global.h first
sql/derror.cc:
Use my_global.h first
sql/des_key_file.cc:
Use my_global.h first
sql/discover.cc:
Use my_global.h first
sql/event_data_objects.cc:
Use my_global.h first
sql/event_db_repository.cc:
Use my_global.h first
sql/event_parse_data.cc:
Use my_global.h first
sql/event_queue.cc:
Use my_global.h first
sql/event_scheduler.cc:
Use my_global.h first
sql/events.cc:
Use my_global.h first
sql/field.cc:
Use my_global.h first
Remove duplicated include files
sql/field_conv.cc:
Use my_global.h first
sql/filesort.cc:
Use my_global.h first
Remove duplicated include files
sql/gstream.cc:
Use my_global.h first
sql/ha_ndbcluster.cc:
Use my_global.h first
sql/ha_ndbcluster_binlog.cc:
Use my_global.h first
sql/ha_ndbcluster_cond.cc:
Use my_global.h first
sql/ha_partition.cc:
Use my_global.h first
sql/handler.cc:
Use my_global.h first
sql/hash_filo.cc:
Use my_global.h first
sql/hostname.cc:
Use my_global.h first
sql/init.cc:
Use my_global.h first
sql/item.cc:
Use my_global.h first
sql/item_buff.cc:
Use my_global.h first
sql/item_cmpfunc.cc:
Use my_global.h first
sql/item_create.cc:
Use my_global.h first
sql/item_geofunc.cc:
Use my_global.h first
sql/item_inetfunc.cc:
Use my_global.h first
sql/item_row.cc:
Use my_global.h first
sql/item_strfunc.cc:
Use my_global.h first
sql/item_subselect.cc:
Use my_global.h first
sql/item_sum.cc:
Use my_global.h first
sql/item_timefunc.cc:
Use my_global.h first
sql/item_xmlfunc.cc:
Use my_global.h first
sql/key.cc:
Use my_global.h first
sql/lock.cc:
Use my_global.h first
sql/log.cc:
Use my_global.h first
sql/log_event.cc:
Use my_global.h first
sql/log_event_old.cc:
Use my_global.h first
sql/mf_iocache.cc:
Use my_global.h first
sql/mysql_install_db.cc:
Remove duplicated include files
sql/mysqld.cc:
Remove duplicated include files
sql/net_serv.cc:
Remove duplicated include files
sql/opt_range.cc:
Use my_global.h first
sql/opt_subselect.cc:
Use my_global.h first
sql/opt_sum.cc:
Use my_global.h first
sql/parse_file.cc:
Use my_global.h first
sql/partition_info.cc:
Use my_global.h first
sql/procedure.cc:
Use my_global.h first
sql/protocol.cc:
Use my_global.h first
sql/records.cc:
Use my_global.h first
sql/records.h:
Don't include my_global.h
Better to do this at the upper level
sql/repl_failsafe.cc:
Use my_global.h first
sql/rpl_filter.cc:
Use my_global.h first
sql/rpl_gtid.cc:
Use my_global.h first
sql/rpl_handler.cc:
Use my_global.h first
sql/rpl_injector.cc:
Use my_global.h first
sql/rpl_record.cc:
Use my_global.h first
sql/rpl_record_old.cc:
Use my_global.h first
sql/rpl_reporting.cc:
Use my_global.h first
sql/rpl_rli.cc:
Use my_global.h first
sql/rpl_tblmap.cc:
Use my_global.h first
sql/rpl_utility.cc:
Use my_global.h first
sql/set_var.cc:
Added comment
sql/slave.cc:
Use my_global.h first
sql/sp.cc:
Use my_global.h first
sql/sp_cache.cc:
Use my_global.h first
sql/sp_head.cc:
Use my_global.h first
sql/sp_pcontext.cc:
Use my_global.h first
sql/sp_rcontext.cc:
Use my_global.h first
sql/spatial.cc:
Use my_global.h first
sql/sql_acl.cc:
Use my_global.h first
sql/sql_admin.cc:
Use my_global.h first
sql/sql_analyse.cc:
Use my_global.h first
sql/sql_audit.cc:
Use my_global.h first
sql/sql_base.cc:
Use my_global.h first
sql/sql_binlog.cc:
Use my_global.h first
sql/sql_bootstrap.cc:
Use my_global.h first
Use my_global.h first
sql/sql_cache.cc:
Use my_global.h first
sql/sql_class.cc:
Use my_global.h first
sql/sql_client.cc:
Use my_global.h first
sql/sql_connect.cc:
Use my_global.h first
sql/sql_crypt.cc:
Use my_global.h first
sql/sql_cursor.cc:
Use my_global.h first
sql/sql_db.cc:
Use my_global.h first
sql/sql_delete.cc:
Use my_global.h first
sql/sql_derived.cc:
Use my_global.h first
sql/sql_do.cc:
Use my_global.h first
sql/sql_error.cc:
Use my_global.h first
sql/sql_explain.cc:
Use my_global.h first
sql/sql_expression_cache.cc:
Use my_global.h first
sql/sql_handler.cc:
Use my_global.h first
sql/sql_help.cc:
Use my_global.h first
sql/sql_insert.cc:
Use my_global.h first
sql/sql_lex.cc:
Use my_global.h first
sql/sql_load.cc:
Use my_global.h first
sql/sql_locale.cc:
Use my_global.h first
sql/sql_manager.cc:
Use my_global.h first
sql/sql_parse.cc:
Use my_global.h first
sql/sql_partition.cc:
Use my_global.h first
sql/sql_plugin.cc:
Added comment
sql/sql_prepare.cc:
Use my_global.h first
sql/sql_priv.h:
Added error if we use this before including my_global.h
This check is here becasue so many files includes sql_priv.h first.
sql/sql_profile.cc:
Use my_global.h first
sql/sql_reload.cc:
Use my_global.h first
sql/sql_rename.cc:
Use my_global.h first
sql/sql_repl.cc:
Use my_global.h first
sql/sql_select.cc:
Use my_global.h first
sql/sql_servers.cc:
Use my_global.h first
sql/sql_show.cc:
Added comment
sql/sql_signal.cc:
Use my_global.h first
sql/sql_statistics.cc:
Use my_global.h first
sql/sql_table.cc:
Use my_global.h first
sql/sql_tablespace.cc:
Use my_global.h first
sql/sql_test.cc:
Use my_global.h first
sql/sql_time.cc:
Use my_global.h first
sql/sql_trigger.cc:
Use my_global.h first
sql/sql_udf.cc:
Use my_global.h first
sql/sql_union.cc:
Use my_global.h first
sql/sql_update.cc:
Use my_global.h first
sql/sql_view.cc:
Use my_global.h first
sql/sys_vars.cc:
Added comment
sql/table.cc:
Use my_global.h first
sql/thr_malloc.cc:
Use my_global.h first
sql/transaction.cc:
Use my_global.h first
sql/uniques.cc:
Use my_global.h first
sql/unireg.cc:
Use my_global.h first
sql/unireg.h:
Removed inclusion of my_global.h
storage/archive/ha_archive.cc:
Added comment
storage/blackhole/ha_blackhole.cc:
Use my_global.h first
storage/csv/ha_tina.cc:
Use my_global.h first
storage/csv/transparent_file.cc:
Use my_global.h first
storage/federated/ha_federated.cc:
Use my_global.h first
storage/federatedx/federatedx_io.cc:
Use my_global.h first
storage/federatedx/federatedx_io_mysql.cc:
Use my_global.h first
storage/federatedx/federatedx_io_null.cc:
Use my_global.h first
storage/federatedx/federatedx_txn.cc:
Use my_global.h first
storage/heap/ha_heap.cc:
Use my_global.h first
storage/innobase/handler/handler0alter.cc:
Use my_global.h first
storage/maria/ha_maria.cc:
Use my_global.h first
storage/maria/unittest/ma_maria_log_cleanup.c:
Remove duplicated include files
storage/maria/unittest/test_file.c:
Added comment
storage/myisam/ha_myisam.cc:
Move sql_plugin.h first as this includes my_global.h
storage/myisammrg/ha_myisammrg.cc:
Use my_global.h first
storage/oqgraph/oqgraph_thunk.cc:
Use my_config.h and my_global.h first
One could not include my_global.h before oqgraph_thunk.h (don't know why)
storage/spider/ha_spider.cc:
Use my_global.h first
storage/spider/hs_client/config.cpp:
Use my_global.h first
storage/spider/hs_client/escape.cpp:
Use my_global.h first
storage/spider/hs_client/fatal.cpp:
Use my_global.h first
storage/spider/hs_client/hstcpcli.cpp:
Use my_global.h first
storage/spider/hs_client/socket.cpp:
Use my_global.h first
storage/spider/hs_client/string_util.cpp:
Use my_global.h first
storage/spider/spd_conn.cc:
Use my_global.h first
storage/spider/spd_copy_tables.cc:
Use my_global.h first
storage/spider/spd_db_conn.cc:
Use my_global.h first
storage/spider/spd_db_handlersocket.cc:
Use my_global.h first
storage/spider/spd_db_mysql.cc:
Use my_global.h first
storage/spider/spd_db_oracle.cc:
Use my_global.h first
storage/spider/spd_direct_sql.cc:
Use my_global.h first
storage/spider/spd_i_s.cc:
Use my_global.h first
storage/spider/spd_malloc.cc:
Use my_global.h first
storage/spider/spd_param.cc:
Use my_global.h first
storage/spider/spd_ping_table.cc:
Use my_global.h first
storage/spider/spd_sys_table.cc:
Use my_global.h first
storage/spider/spd_table.cc:
Use my_global.h first
storage/spider/spd_trx.cc:
Use my_global.h first
storage/xtradb/handler/handler0alter.cc:
Use my_global.h first
storage/xtradb/handler/i_s.cc:
Use my_global.h first
We should assume that the store engine will report the first duplicate key for this case.
Old code of suppression of unsafe logging error with LIMIT didn't work, because of wrong usage of my_interval_timer().
Suppress unsafe logging errors to the error log if we get too many unsafe logging errors in a short time.
This is to not overflow the error log with meaningless errors.
- Each error code is suppressed and counted separately.
- We do a 5 minute suppression of new errors if we get more than 10 errors in that time.
Only print unsafe logging errors if log_warnings > 1.
mysql-test/suite/binlog/r/binlog_stm_unsafe_warning.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
mysql-test/suite/binlog/r/binlog_unsafe.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
mysql-test/suite/engines/README:
Fixed typos
mysql-test/suite/rpl/r/rpl_known_bugs_detection.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
sql/sql_base.cc:
Don't log warning if there are two unique keys used with INSERT .. ON DUPLICATE KEY UPDATE.
We should assume that the store engine will report the first duplicate key for this case.
sql/sql_class.cc:
Suppress error in binary log if we get too many unsafe logging errors in a short time.
Only print unsafe logging errors if log_warnings > 1
MDEV-6560 Assertion `! is_set() ' failed in Diagnostics_area::set_ok_status on killing CREATE OR REPLACE
MDEV-6525 Assertion `table->pos_in_locked _tables == __null || table->pos_in_locked_tables->table = table' failed in mark_used_tables_as_free_for_reuse, locking problems and binlogging problems on CREATE OR REPLACE under lock.
mysql-test/r/create_or_replace.result:
Added test for MDEV-6560
mysql-test/t/create_or_replace.test:
Added test for MDEV-6560
mysql-test/valgrind.supp:
Added suppression for OpenSuse 12.3
sql/sql_base.cc:
More DBUG
sql/sql_class.cc:
Changed that thd_sqlcom_can_generate_row_events() does not report that CREATE OR REPLACE is generating row events.
This is safe as this function is only used by InnoDB/XtraDB to check if a query is generating row events as part of another transaction. As CREATE is always run as it's own transaction, this isn't a problem.
This fixed MDEV-6525.
sql/sql_table.cc:
Remember if reopen_tables() generates an error (which can only happen in case of KILL).
This fixed MDEV-6560
replication causing replication to fail.
In parallel replication, we run transactions from the master in parallel, but
force them to commit in the same order they did on the master. If we force T1
to commit before T2, but T2 holds eg. a row lock that is needed by T1, we get
a deadlock when T2 waits until T1 has committed.
Usually, we do not run T1 and T2 in parallel if there is a chance that they
can have conflicting locks like this, but there are certain edge cases where
it can occasionally happen (eg. MDEV-5914, MDEV-5941, MDEV-6020). The bug was
that this would cause replication to hang, eventually getting a lock timeout
and causing the slave to stop with error.
With this patch, InnoDB will report back to the upper layer whenever a
transactions T1 is about to do a lock wait on T2. If T1 and T2 are parallel
replication transactions, and T2 needs to commit later than T1, we can thus
detect the deadlock; we then kill T2, setting a flag that causes it to catch
the kill and convert it to a deadlock error; this error will then cause T2 to
roll back and release its locks (so that T1 can commit), and later T2 will be
re-tried and eventually also committed.
The kill happens asynchroneously in a slave background thread; this is
necessary, as the reporting from InnoDB about lock waits happen deep inside
the locking code, at a point where it is not possible to directly call
THD::awake() due to mutexes held.
Deadlock is assumed to be (very) rarely occuring, so this patch tries to
minimise the performance impact on the normal case where no deadlocks occur,
rather than optimise the handling of the occasional deadlock.
Also fix transaction retry due to deadlock when it happens after a transaction
already signalled to later transactions that it started to commit. In this
case we need to undo this signalling (and later redo it when we commit again
during retry), so following transactions will not start too early.
Also add a missing thd->send_kill_message() that got triggered during testing
(this corrects an incorrect fix for MySQL Bug#58933).