including the big commit
commit 305130361bf72726de220f3d2b2787395e10be61
Author: Marc Alff <marc.alff@oracle.com>
Date: Tue Feb 10 11:31:32 2015 +0100
WL#8354 BACKPORT DIGEST IMPROVEMENTS TO MYSQL 5.6
(with the following commits) and related changes in sql/
During slow execution, e.g. under valgrind, there was a chance
that Aria checkpoint would happen while P_S tables were being
queried; it could cause different data in joined P_S, and
thus combinations of results that the test did not expect.
Fixed by disabling Aria checkpoints for the test.
Let TABLE_SHARE::tdc.free_tables, TABLE_SHARE::tdc.all_tables,
TABLE_SHARE::tdc.flushed and corresponding invariants be protected by
per-share TABLE_SHARE::tdc.LOCK_table_share instead of global LOCK_open.
Clean up and improve the parallel implementation code, mainly related to
scheduling of work to threads and handling of stop and errors.
Fix a lot of bugs in various corner cases that could lead to crashes or
corruption.
Fix that a single replication domain could easily grab all worker threads and
stall all other domains; now a configuration variable
--slave-domain-parallel-threads allows to limit the number of
workers.
Allow next event group to start as soon as previous group begins the commit
phase (as opposed to when it ends it); this allows multiple event groups on
the slave to participate in group commit, even when no other opportunities for
parallelism are available.
Various fixes:
- Fix some races in the rpl.rpl_parallel test case.
- Fix an old incorrect assertion in Log_event iocache read.
- Fix repeated malloc/free of wait_for_commit and rpl_group_info objects.
- Simplify wait_for_commit wakeup logic.
- Fix one case in queue_for_group_commit() where killing one thread would
fail to correctly signal the error to the next, causing loss of the
transaction after slave restart.
- Fix leaking of pthreads (and their allocated stack) due to missing
PTHREAD_CREATE_DETACHED attribute.
- Fix how one batch of group-committed transactions wait for the previous
batch before starting to execute themselves. The old code had a very
complex scheduling where the first transaction was handled differently,
with subtle bugs in corner cases. Now each event group is always scheduled
for a new worker (in a round-robin fashion amongst available workers).
Keep a count of how many transactions have started to commit, and wait for
that counter to reach the appropriate value.
- Fix slave stop to wait for all workers to actually complete processing;
before, the wait was for update of last_committed_sub_id, which happens a
bit earlier, and could leave worker threads potentially accessing bits of
the replication state that is no longer valid after slave stop.
- Fix a couple of places where the test suite would kill a thread waiting
inside enter_cond() in connection with debug_sync; debug_sync + kill can
crash in rare cases due to a race with mysys_var_current_mutex in this
case.
- Fix some corner cases where we had enter_cond() but no exit_cond().
- Fix that we could get failure in wait_for_prior_commit() but forget to flag
the error with my_error().
- Fix slave stop (both for normal stop and stop due to error). Now, at stop
we pick a specific safe point (in terms of event groups executed) and make
sure that all event groups before that point are executed to completion,
and that no event group after start executing; this ensures a safe place to
restart replication, even for non-transactional stuff/DDL. In error stop,
make sure that all prior event groups are allowed to execute to completion,
and that any later event groups that have started are rolled back, if
possible. The old code could leave eg. T1 and T3 committed but T2 not, or
it could even leave half a transaction not rolled back in some random
worker, which would cause big problems when that worker was later reused
after slave restart.
- Fix the accounting of amount of events queued for one worker. Before, the
amount was reduced immediately as soon as the events were dequeued (which
happens all at once); this allowed twice the amount of events to be queued
in memory for each single worker, which is not what users would expect.
- Fix that an error set during execution of one event was sometimes not
cleared before executing the next, causing problems with the error
reporting.
- Fix incorrect handling of thd->killed in worker threads.
MASTER_GTID_WAIT() is similar to MASTER_POS_WAIT(), but works with a
GTID position rather than an old-style filename/offset.
@@LAST_GTID gives the GTID assigned to the last transaction written
into the binlog.
Together, the two can be used by applications to obtain the GTID of
an update on the master, and then do a MASTER_GTID_WAIT() for that
position on any read slave where it is important to get results that
are caught up with the master at least to the point of the update.
The implementation of MASTER_GTID_WAIT() is implemented in a way
that tries to minimise the performance impact on the SQL threads,
even in the presense of many waiters on single GTID positions (as
from @@LAST_GTID).
Syntax. Server support. Test cases.
InnoDB bugfixes:
* don't mess around with system sprintf's, always use my_error() for errors.
* don't use InnoDB internal error codes where OS error codes are expected.
* don't say "file not found", when it was.
There were some places where insufficient locking between
parallel threads could cause invalid memory accesses and
possibly other grief.
This patch adds the missing locking, and moves the locking
into the struct rpl_binlog_state methods to make it easier
to see that proper locking is in place everywhere.
The merge is still missing a few hunks related to temporary tables and
InnoDB log file size. The associated code did not seem to exist in
10.0, so the merge of that needs more work. Until this is fixed, there
are a number of test failures as a result.
than an empty host '' is the same as any-host wildcard '%'.
Replace '' with '%' in the parser (for GRANT ... foo@'') and when loading grant tables.
Side effect: one cannot have foo@'' and foo@'%' both at the same time
(but one can have foo@'%' and foo@'%%')
- Made slaves temporary table multi-thread slave safe by adding mutex around save_temporary_table usage.
- rli->save_temporary_tables is the active list of all used temporary tables
- This is copied to THD->temporary_tables when temporary tables are opened and updated when temporary tables are closed
- Added THD->lock_temporary_tables() and THD->unlock_temporary_tables() to simplify this.
- Relay_log_info->sql_thd renamed to Relay_log_info->sql_driver_thd to avoid wrong usage for merged code.
- Added is_part_of_group() to mark functions that are part of the next function. This replaces setting IN_STMT when events are executed.
- Added is_begin(), is_commit() and is_rollback() functions to Query_log_event to simplify code.
- If slave_skip_counter is set run things in single threaded mode. This simplifies code for skipping events.
- Updating state of relay log (IN_STMT and IN_TRANSACTION) is moved to one single function: update_state_of_relay_log()
We can't use OPTION_BEGIN to check for the state anymore as the sql_driver and sql execution threads may be different.
Clear IN_STMT and IN_TRANSACTION in init_relay_log_pos() and Relay_log_info::cleanup_context() to ensure the flags doesn't survive slave restarts
is_in_group() is now independent of state of executed transaction.
- Reset thd->transaction.all.modified_non_trans_table() if we did set it for single table row events.
This was mainly for keeping the flag as documented.
- Changed slave_open_temp_tables to uint32 to be able to use atomic operators on it.
- Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
- Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
- Changed some functions to take rpl_group_info instead of Relay_log_info to make them multi-slave safe and to simplify usage
- do_shall_skip()
- continue_group()
- sql_slave_killed()
- next_event()
- Simplifed arguments to io_salve_killed(), check_io_slave_killed() and sql_slave_killed(); No reason to supply THD as this is part of the given structure.
- set_thd_in_use_temporary_tables() removed as in_use is set on usage
- Added information to thd_proc_info() which thread is waiting for slave mutex to exit.
- In open_table() reuse code from find_temporary_table()
Other things:
- More DBUG statements
- Fixed the rpl_incident.test can be run with --debug
- More comments
- Disabled not used function rpl_connect_master()
mysql-test/suite/perfschema/r/all_instances.result:
Moved sleep_lock and sleep_cond to rpl_group_info
mysql-test/suite/rpl/r/rpl_incident.result:
Updated result
mysql-test/suite/rpl/t/rpl_incident-master.opt:
Not needed anymore
mysql-test/suite/rpl/t/rpl_incident.test:
Fixed that test can be run with --debug
sql/handler.cc:
More DBUG_PRINT
sql/log.cc:
More comments
sql/log_event.cc:
Added DBUG statements
do_shall_skip(), continue_group() now takes rpl_group_info param
Use is_begin(), is_commit() and is_rollback() functions instead of inspecting query string
We don't have set slaves temporary tables 'in_use' as this is now done when tables are opened.
Removed IN_STMT flag setting. This is now done in update_state_of_relay_log()
Use IN_TRANSACTION flag to test state of relay log.
In rows_event_stmt_cleanup() reset thd->transaction.all.modified_non_trans_table if we had set this before.
sql/log_event.h:
do_shall_skip(), continue_group() now takes rpl_group_info param
Added is_part_of_group() to mark events that are part of the next event. This replaces setting IN_STMT when events are executed.
Added is_begin(), is_commit() and is_rollback() functions to Query_log_event to simplify code.
sql/log_event_old.cc:
Removed IN_STMT flag setting. This is now done in update_state_of_relay_log()
do_shall_skip(), continue_group() now takes rpl_group_info param
sql/log_event_old.h:
Added is_part_of_group() to mark events that are part of the next event.
do_shall_skip(), continue_group() now takes rpl_group_info param
sql/mysqld.cc:
Changed slave_open_temp_tables to uint32 to be able to use atomic operators on it.
Relay_log_info::sleep_lock -> Rpl_group_info::sleep_lock
Relay_log_info::sleep_cond -> Rpl_group_info::sleep_cond
sql/mysqld.h:
Updated types and names
sql/rpl_gtid.cc:
More DBUG
sql/rpl_parallel.cc:
Updated TODO section
Set thd for event that is execution
Use new is_begin(), is_commit() and is_rollback() functions.
More comments
sql/rpl_rli.cc:
sql_thd -> sql_driver_thd
Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
Clear IN_STMT and IN_TRANSACTION in init_relay_log_pos() and Relay_log_info::cleanup_context() to ensure the flags doesn't survive slave restarts.
Reset table->in_use for temporary tables as the table may have been used by another THD.
Use IN_TRANSACTION instead of OPTION_BEGIN to check state of relay log.
Removed IN_STMT flag setting. This is now done in update_state_of_relay_log()
sql/rpl_rli.h:
Changed relay log state flags to bit masks instead of bit positions (most other code we have uses bit masks)
Added IN_TRANSACTION to mark if we are in a BEGIN ... COMMIT section.
save_temporary_tables is now thread safe
Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
Relay_log_info->sql_thd renamed to Relay_log_info->sql_driver_thd to avoid wrong usage for merged code
is_in_group() is now independent of state of executed transaction.
sql/slave.cc:
Simplifed arguments to io_salve_killed(), sql_slave_killed() and check_io_slave_killed(); No reason to supply THD as this is part of the given structure.
set_thd_in_use_temporary_tables() removed as in_use is set on usage in sql_base.cc
sql_thd -> sql_driver_thd
More DBUG
Added update_state_of_relay_log() which will calculate the IN_STMT and IN_TRANSACTION state of the relay log after the current element is executed.
If slave_skip_counter is set run things in single threaded mode.
Simplifed arguments to io_salve_killed(), check_io_slave_killed() and sql_slave_killed(); No reason to supply THD as this is part of the given structure.
Added information to thd_proc_info() which thread is waiting for slave mutex to exit.
Disabled not used function rpl_connect_master()
Updated argument to next_event()
sql/sql_base.cc:
Added mutex around usage of slave's temporary tables. The active list is always kept up to date in sql->rgi_slave->save_temporary_tables.
Clear thd->temporary_tables after query (safety)
More DBUG
When using temporary table, set table->in_use to current thd as the THD may be different for slave threads.
Some code is ifdef:ed with REMOVE_AFTER_MERGE_WITH_10 as the given code in 10.0 is not yet in this tree.
In open_table() reuse code from find_temporary_table()
sql/sql_binlog.cc:
rli->sql_thd -> rli->sql_driver_thd
Remove duplicate setting of rgi->rli
sql/sql_class.cc:
Added helper functions rgi_lock_temporary_tables() and rgi_unlock_temporary_tables()
Would have been nicer to have these inline, but there was no easy way to do that
sql/sql_class.h:
Added functions to protect slaves temporary tables
sql/sql_parse.cc:
Added DBUG_PRINT
sql/transaction.cc:
Added comment
Following variables do not require LOCK_open protection anymore:
- table_def_cache (renamed to tdc_hash) is protected by rw-lock
LOCK_tdc_hash;
- table_def_shutdown_in_progress doesn't need LOCK_open protection;
- last_table_id use atomics;
- TABLE_SHARE::ref_count (renamed to TABLE_SHARE::tdc.ref_count)
is protected by TABLE_SHARE::tdc.LOCK_table_share;
- TABLE_SHARE::next, ::prev (renamed to tdc.next and tdc.prev),
oldest_unused_share, end_of_unused_share are protected by
LOCK_unused_shares;
- TABLE_SHARE::m_flush_tickets (renamed to tdc.m_flush_tickets)
is protected by TABLE_SHARE::tdc.LOCK_table_share;
- refresh_version (renamed to tdc_version) use atomics.
* update results
* don't force HA_CREATE_DELAY_KEY_WRITE on all temp tables,
(bad for CREATE ... LIKE) instead imply it in myisam/aria
* restore HA_ERR_TABLE_DEF_CHANGED in archive
* increase the default number of rwlock classes in P_S to fit all our rwlocks
includes:
* remove some remnants of "Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING"
* introduce LOCK_share, now LOCK_ha_data is strictly for engines
* rea_create_table() always creates .par file (even in "frm-only" mode)
* fix a 5.6 bug, temp file leak on dummy ALTER TABLE
revno: 4559
committer: Marc Alff <marc.alff@oracle.com>
branch nick: mysql-5.6-bug14741537-v4
timestamp: Thu 2012-11-08 22:40:31 +0100
message:
Bug#14741537 - MYSQL 5.6, GTID AND PERFORMANCE_SCHEMA
Before this fix, statements using performance_schema tables:
- were marked as unsafe for replication,
- did cause warnings during execution,
- were written to the binlog, either in STATEMENT or ROW format.
When using replication with the new GTID feature,
unsafe warnings are elevated to errors,
which prevents to use both the performance_schema and GTID together.
The root cause of the problem is not related to raising warnings/errors
in some special cases, but deeper: statements involving the performance
schema should not even be written to the binary log in the first place,
because the content of the performance schema tables is 'local' to a server
instance, and may differ greatly between nodes in a replication
topology.
In particular, the DBA should be able to configure (INSERT, UPDATE, DELETE)
or flush (TRUNCATE) performance schema tables on one node,
without affecting other nodes.
This fix introduces the concept of a 'non-replicated' or 'local' table,
and adjusts the replication logic to ignore tables that are not replicated
when deciding if or how to log a statement to the binlog.
Note that while this issue was detected using the performance_schema,
other tables are also affected by the same problem.
This fix define as 'local' the following tables, which are then never
replicated:
- performance_schema.*
- mysql.general_log
- mysql.slow_log
- mysql.slave_relay_log_info
- mysql.slave_master_info
- mysql.slave_worker_info
Existing behavior for information_schema.* is unchanged by this fix,
to limit the scope of changes.
Coding wise, this fix implements the following changes:
1)
Performance schema tables are not using any replication flags,
since performance schema tables are not replicated.
2)
In open_table_from_share(),
tables with no replication capabilities (performance_schema.*),
tables with TABLE_CATEGORY_LOG (logs)
and tables with TABLE_CATEGORY_RPL_INFO (replication)
are marked as non replicated, with TABLE::no_replicate
3)
A new THD member, THD::m_binlog_filter_state,
indicate if the current statement is written to the binlog
(normal cases for most statements), or is to be discarded
(because the statements affects non replicated tables).
4)
In THD::decide_logging_format(), the replication logic
is changed to take into account non replicated tables.
Statements that affect only non replicated tables are
executed normally (no warning or errors), but not written
to the binlog.
Statements that affect (i.e., write to) a replicated table
while also using (i.e., reading from or writing to) a non replicated table
are executed normally in MIXED and ROW binlog format,
and cause a new error in STATEMENT binlog format.
THD::decide_logging_format() uses THD::m_binlog_filter_state
to indicate if a statement is to be ignored, when writing to
the binlog.
5)
In THD::binlog_query(), statements marked as ignored
are not written to the binary log.
6)
For row based replication, the existing test for 'table->no_replicate',
has been moved from binlog_log_row() to check_table_binlog_row_based().