Support for galera GTID consistency thru cluster. All nodes in cluster
should have same GTID for replicated events which are originating from cluster.
Cluster originating commands need to contain sequential WSREP GTID seqno
Ignore manual setting of gtid_seq_no=X.
In master-slave scenario where master is non galera node replicated GTID is
replicated and is preserved in all nodes.
To have this - domain_id, server_id and seqnos should be same on all nodes.
Node which bootstraps the cluster, to achieve this, sends domain_id and
server_id to other nodes and this combination is used to write GTID for events
that are replicated inside cluster.
Cluster nodes that are executing non replicated events are going to have different
GTID than replicated ones, difference will be visible in domain part of gtid.
With wsrep_gtid_domain_id you can set domain_id for WSREP cluster.
Functions WSREP_LAST_WRITTEN_GTID, WSREP_LAST_SEEN_GTID and
WSREP_SYNC_WAIT_UPTO_GTID now works with "native" GTID format.
Fixed galera tests to reflect this chances.
Add variable to manually update WSREP GTID seqno in cluster
Add variable to manipulate and change WSREP GTID seqno. Next command
originating from cluster and on same thread will have set seqno and
cluster should change their internal counter to it's value.
Behavior is same as using @@gtid_seq_no for non WSREP transaction.
Problem:
-------
Accessing a member within 'xid_count_per_binlog' structure results in
following error when 'UBSAN' is enabled.
member access within address 0xXXX which does not point to an object of type
'xid_count_per_binlog'
Analysis:
---------
The problem appears to be that no constructor for 'xid_count_per_binlog' is
being called, and thus the vtable will not be initialized.
Fix:
---
Defined a parameterized constructor for 'xid_count_per_binlog' class.
Fixed by caching last binary log number used in last_used_log_number
Other things:
- Moved locking of LOCK_log form new_file_impl() to new_file(). This fixed
a bug where LOCK_log could have been unlocked even if 'need_lock' was
not set. Removed not anymore used argument need_lock.
- Made generate_new_name() virtual to simplify the code between
other logs and binary log.
Reviewed by Andrei Elkin
LOG_INFO::lock was useless. It could've only protect against concurrent
iterators execution, which was already protected by LOCK_thread_count.
Use LOCK_thd_data instead of LOCK_thread_count as a protection against
THD::current_linfo reset.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
MyRocks internally will print non-critical messages to
sql_print_verbose_info() which will do what InnoDB does in similar cases:
check if (global_system_variables.log_warnings > 2).
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would spend
a lot of time rolling back writes to the intermediate copy of the table.
To reduce the amount of busy work done, a work-around was introduced in
commit fd069e2bb3 in MySQL 4.1.8 and 5.0.2,
to commit the transaction after every 10,000 inserted rows.
A proper fix would have been to disable the undo logging altogether and
to simply drop the intermediate copy of the table on subsequent server
startup. This is what happens in MariaDB 10.3 with MDEV-14717,MDEV-14585.
In MariaDB 10.2, the intermediate copy of the table would be left behind
with a name starting with the string #sql.
This is a backport of a bug fix from MySQL 8.0.0 to MariaDB,
contributed by jixianliang <271365745@qq.com>.
Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation
InnoDB must for now keep the undo logging enabled, so that the latest
row can be rolled back in case of an error.
In Galera cluster, the LOAD DATA statement will retain the existing
behaviour and commit the transaction after every 10,000 rows if
the parameter wsrep_load_data_splitting=ON is set. The logic to do
so (the wsrep_load_data_split() function and the call
handler::extra(HA_EXTRA_FAKE_START_STMT)) are joint work
by Ji Xianliang and Marko Mäkelä.
The original fix:
Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date: Wed Dec 2 16:09:15 2015 +0530
Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY
Problem:
During ALTER TABLE, we commit and restart the transaction for every
10,000 rows, so that the rollback after recovery would not take so long.
Fix:
Suppress the undo logging during copy alter operation. If fts_index is
present then insert directly into fts auxiliary table rather
than doing at commit time.
ha_innobase::num_write_row: Remove the variable.
ha_innobase::write_row(): Remove the hack for committing every 10000 rows.
row_lock_table_for_mysql(): Remove the extra 2 parameters.
lock_get_src_table(), lock_is_table_exclusive(): Remove.
Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
Problem:- Gtid are not transferred in Galera Cluster.
Solution:- We need to transfer gtid in the case on either when cluster is
slave/master in async replication. In normal Gtid replication gtid are generated on
recieving node itself and it is always on sync with other nodes. Because galera keeps
node in sync , So all nodes get same no of event groups. So the issue arises when
say galera is slave in async replication.
A
| (Async replication)
D <-> E <-> F {Galera replication}
So what should happen is that all node should apply the master gtid but this does
node happen, becuase node E, F does not recieve gtid from D in write set , So what E(or F)
does is that it applies wsrep_gtid_domain_id, D server-id , E gtid next seq no. This
generated gtid does not always work when say A has different domain id.
So In this commit, on galera node when we see that this event is recieved from master
we simply write Gtid_Log_Event in write_set and send it to other nodes.
and specifically the ack receiving functionality.
Semisync is turned to be static instead of plugin so its functions
are invoked at the same points as RUN_HOOKS.
The RUN_HOOKS and the observer interface remain to be removed by later
patch.
Todo:
React on killed status by repl_semisync_master.wait_after_sync(). Currently
Repl_semi_sync_master::commit_trx does not check the killed status.
There were few bugfixes found that are present in mysql and its unclear
whether/how they are covered. Those include:
Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS
Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL
HAS BAD PERFORMANCE
Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS
Part of MDEV-13073 AliSQL Optimize performance of semisync
The idea it to use a dedicated lock detecting if there is new data in
the master's binary log instead of the overused LOCK_log.
Changes:
- Use dedicated COND variables for the relay and binary log signaling.
This was needed as we where the old 'update_cond' variable was used
with different mutex's, which could cause deadlocks.
- Relay log uses now COND_relay_log_updated and LOCK_log
- Binary log uses now COND_bin_log_updated and LOCK_binlog_end_pos
- Renamed signal_cnt to relay_signal_cnt (as we now have two signals)
- Added some missing error handling in MYSQL_BIN_LOG::new_file_impl()
- Reformatted some comments with old style
- Renamed m_key_LOCK_binlog_end_pos to key_LOCK_binlog_end_pos
- Changed 'signal_update()' to update_binlog_end_pos() which works for
both relay and binary log
The crash (sometimes assert) in MYSQL_BIN_LOG::mark_xid_done was caused by a
fact that log.cc:binlog_background_thread_queue could become a cyclic list.
This possibility becomes real with two checkpoint capable engines that
may execute TC_LOG_BINLOG::commit_checkpoint_notify() in succession before
binlog_background thread gets control and eventually finds a freed memory
while otherwise endlessly looping in while(queue).
It is fixed with counting the notificaion kind instead of en-listing the same notificaion kind in commit_checkpoint_notify as formerly. The while(queue) of binlog background thread is refined to pay attention to the new counter. In effectno more access to free memory is possible.
As reported in MDEV-11969 "there's no way to ditch knowledge" about some
domain that is no longer updated on a server. Besides being of annoyance to
clutter output in DBA console stale domains can prevent the slave
to connect the master as MDEV-12012 witnesses.
What domain is obsolete must be evaluated by the user (DBA) according
to whether the domain info is still relevant and will the domain ever
receive any update.
This patch introduces a method to discard obsolete gtid domains from
the server binlog state. The removal requires no event group from such
domain present in existing binlog files though. If there are any the
containing logs must be first PURGEd in order for
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
succeed. Otherwise the command returns an error.
The list of obsolete domains can be computed through
intersecting two sets - the earliest (first) binlog's Gtid_list
and the current value of @@global.gtid_binlog_state - and extracting
the domain id components from the intersection list items.
The new DELETE_DOMAIN_ID featured FLUSH continues to rotate binlog
omitting the deleted domains from the active binlog file's Gtid_list.
Notice though when the command is ineffective - that none of requested to delete
domain exists in the binlog state - rotation does not occur.
Obsolete domain deletion is not harmful for connected slaves as long
as master side binlog files *purge* is synchronized with FLUSH-DELETE_DOMAIN_ID.
The slaves must have the last event from purged files processed as usual,
in order not to bump later into requesting a gtid from a file which
was already gone.
While the command is not replicated (as ordinary FLUSH BINLOG LOGS is)
slaves, even though having extra domains, won't suffer from reconnection errors
thanks to master-slave gtid connection protocol allowing the master
to be ignorant about a gtid domain.
Should at failover such slave to be promoted into master role it may run
the ex-master's
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
to clean its own binlog state.
NOTES.
suite/perfschema/r/start_server_low_digest.result
is re-recorded as consequence of internal parser codes changes.
- Simplified use_trans_cache() to return at once if is_transactional is set
- Indentation and spelling errors fixed
- Don't call signal_update() if update_binlog_end_pos() is called as the
function already calls signal_update()
- Removed not used function wait_for_update_bin_log(), which would cause
errors if ever used.
- Simplified handler::clone() by always allocating 'ref' in ha_open(). To do
this I added an optional MEM_ROOT argument to ha_open() to be used when
allocating 'ref'
- Changed arguments to get_system_var() from LEX_CSTRING to LEX_CSTRING*
- Added THD as argument to create_select_for_variable(). Changed also char*
argument to LEX_CSTRING to avoid strlen() call.
- Change calls to append() to use LEX_CSTRING
Intermediate commit.
Implement status variables to aid the DBA in determining the need
and/or effectiveness of the per-engine mylsq.gtid_slave_pos feature:
transactions_multi_engine
Number of transactions that changed data in multiple (transactional)
storage engines.
rpl_transactions_multi_engine
Number of replicated transactions that involved changes in multiple
(transactional) storage engines, before considering the update of the
mysql.gtid_slave_posXXX table.
transactions_gtid_foreign_engine
Number of replicated transactions where the update of the
mysql.gtid_slave_posXXX table had to choose a storage engine that did not
otherwise participate in the transaction.
This happens because the master writes a table_map event to the binary log, but no row event.
The slave has a check that there should always be a row event if there was a table_map event, which
causes a crash.
Fixed by remembering in the cache what kind of events are logged
and ignore cached statements which is just a table map event.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
.. file '/var/log/mysql/mariadb-bin.000001' not found in binlog
index, needed for recovery. Aborting.
In Galera cluster, while preparing for rsync/xtrabackup based
SST, the donor node takes an FTWRL followed by (REFRESH_ENGINE_LOG
in rsync based state transfer and) REFRESH_BINARY_LOG. The latter
rotates the binary log and logs Binlog_checkpoint_log_event
corresponding to the penultimate binary log file into the new file.
The checkpoint event for the current file is later logged
synchronously by binlog_background_thread.
Now, since in rsync/xtrabackup based snapshot state transfer methods,
only the last binary log file is transferred to the joiner node; the
file could get transferred even before the checkpoint event for the
same file gets written to it. As a result, the joiner node would fail
to start complaining about the missing binlog file needed for recovery.
In order to fix this, a mechanism has been put in place to make
REFRESH_BINARY_LOG operation wait for Binlog_checkpoint_log_event
to be logged for the current binary log file if the node is part of
a Galera cluster. As further safety, during rsync based state transfer
the donor node now acquires and owns LOCK_log for the duration of file
transfer during SST.
Revert following bug fix:
Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE
This fix results in a deadlock between slave IO thread
and SQL thread.
(cherry picked from commit e3fea6c6dbb36c6ab21c4ab777224560e9608b53)
- Change some static variables to dynamic to ensure that we don't do any memory
allocations before server starts or stops
- Print more memory information on SIGHUP. Fixed output.
- Write out if memory was lost if run with --debug-at-exit
- Fixed wrong #ifdef in sql_cache.cc
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE
Problem:
========
Currently SHOW SLAVE STATUS blocks if IO thread waits for
disk space. This makes automation tools verifying
server health block on taking relevant action. Finally this
will create SHOW SLAVE STATUS piles.
Analysis:
=========
SHOW SLAVE STATUS hangs on mi->data_lock if relay log write
is waiting for free disk space while holding mi->data_lock.
mi->data_lock is needed to protect the format description
event (mi->format_description_event) which is accessed by
the clients running FLUSH LOGS and slave IO thread. Note
relay log writes don't need to be protected by
mi->data_lock, LOCK_log is used to protect relay log between
IO and SQL thread (see MYSQL_BIN_LOG::append_event). The
code takes mi->data_lock to protect
mi->format_description_event during relay log rotate which
might get triggered right after relay log write.
Fix:
====
Release the data_lock just for the duration of writing into
relay log.
Made change to ensure the following lock order is maintained
to avoid deadlocks.
data_lock, LOCK_log
data_lock is held during relay log rotations to protect
the description event.
Introduce Log_event_writer() that encapsulates
writing data to an IO_CACHE with automatic checksum calculation.
Now all events properly checksum themselves as needed.
Use Log_event_writer in MYSQL_BIN_LOG::write_cache() instead
of copy-pasting its logic all over.
Later Log_event_writer will also do encryption.
Other things:
- Avoid calling init_and_set_log_file_name() when opening binary log.
- Remove newlines early when reading from index file.
- Ensure that reset_logs() will work even if thd is 0 (Can happen on startup)
- Added thd to sart_slave_threads() for better error handling.
Add log_bin_index, log_bin_basename and relay_log_basename system
variables. Also, convert relay_log_index system variable to
NO_CMD_LINE and implement --relay-log-index as a command line
option.
Do not use format function attribute for sql_print_xxx() family of
functions as they use a MariaDB-specific extension of printf instead
of one provided by the system.
remove group_commit_reason_immediate
rename group_commit_reason_transaction to group_commit_trigger_lock_wait
rename group_commit_reason_usec to group_commit_trigger_timeout
rename group_commit_reason_count to group_commit_triggger_count
The following global status variables where added:
* binlog_group_commit_reason_count
* binlog_group_commit_reason_usec
* binlog_group_commit_reason_transaction
* binlog_group_commit_reason_immediate
binlog_group_commit_reason_count corresponds to group commits made by
virtue of the binlog_commit_wait_count variable.
binlog_group_commit_reason_usec corresponds to the binlog_commit_wait_usec
variable.
binlog_group_commit_reason_transaction is a result of ordered
transaction that need to occur in the same order on the slave and can't
be parallelised.
binlog_group_commit_reason_immediate is caused to prevent stalls with
row locks as described in log.cc:binlog_report_wait_for. This immediate
count is also counted a second time in binlog_group_commit_reason_transaction.
Overall binlog_group_commits = binlog_group_commit_reason_count +
binlog_group_commit_reason_usec + binlog_group_commit_reason_transaction
This work was funded thanks to Open Source Developers Club Australia.
Parallel replication (in 10.0 / "conservative" mode) relies on binlog group
commits to group transactions that can be safely run in parallel on the
slave. The --binlog-commit-wait-count and --binlog-commit-wait-usec options
exist to increase the number of commits per group. But in case of conflicts
between transactions, this can cause unnecessary delay and reduced througput,
especially on a slave where commit order is fixed.
This patch adds a heuristics to reduce this problem. When transaction T1 goes
to commit, it will first wait for N transactions to queue up for a group
commit. However, if we detect that another transaction T2 is waiting for a row
lock held by T1, then we will skip the wait and let T1 commit immediately,
releasing locks and let T2 continue.
On a slave, this avoids the unfortunate situation where T1 is waiting for T2
to join the group commit, but T2 is waiting for T1 to release locks, causing
no work to be done for the duration of the --binlog-commit-wait-usec timeout.
(The heuristic seems reasonable on the master as well, so it is enabled for
all transactions, not just replication transactions).
Using a boolean flag for 'there is a RESET MASTER in progress' doesn't
work very well for multiple concurrent RESET MASTER statements.
Changed to a counter.
log-tc-size is 24K by default. Page size is 64K on PPC64. But log-tc-size
must be at least 3 x page size. This is enforced by TC_LOG_MMAP::open()
with a comment: to guarantee non-empty pool.
This all makes server not startable in default configuration on PPC64.
Autosize log-tc-size, so that it's min value= page size * 3, default
value= page size * 6, block size= page size.
Implement --semi-sync-master-wait-point=AFTER_SYNC|AFTER_COMMIT.
When AFTER_SYNC, the semi-sync wait will be done earlier, before the storage
engine commit rather than after. This means that a transaction will not be
visible on the master until at least one slave has received it.
Make the binlog dump threads not need to take LOCK_log while sending
binlog events to slave. Instead, a new LOCK_binlog_end_pos is used
just to coordinate tracking the current end-of-log.
This is a pre-requisite for MDEV-162, "Enhanced semisync
replication". It should also help reduce the contention on LOCK_log on
a busy master.
Also does some much-needed refactoring/cleanup of the related code in
the binlog dump thread.
Merged lp:maria/maria-10.0-galera up to revision 3879.
Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.
We don't support changing tc_log implementation at run time.
If the first XA-capable engine is loaded with INSTALL PLUGIN - disable its
XA capabilities with a warning
Now if CREATE OR REPLACE fails but we have deleted a table already, we will generate a DROP TABLE in the binary log.
This fixes this issue.
In addition, for a failing CREATE OR REPLACE TABLE ... SELECT we don't generate a log of all the inserted rows, only the DROP TABLE.
I added code for not logging DROP TEMPORARY TABLE for tables where the CREATE TABLE was not logged. This code will be activated in 10.1
by removing the code protected by DONT_LOG_DROP_OF_TEMPORARY_TABLES.
mysql-test/suite/rpl/r/create_or_replace_mix.result:
More test cases
mysql-test/suite/rpl/r/create_or_replace_row.result:
More test cases
mysql-test/suite/rpl/r/create_or_replace_statement.result:
More test cases
mysql-test/suite/rpl/t/create_or_replace.inc:
More test cases
sql/log.cc:
Added binlog_reset_cache() to clear the binary log.
sql/log.h:
Added prototype
sql/sql_insert.cc:
If CREATE OR REPLACE TABLE ... SELECT fails:
- Don't log anything if nothing changed
- If table was deleted, log a DROP TABLE.
Remember if we table creation of temporary tables was logged.
sql/sql_table.cc:
Added log_drop_table()
Remember if we table creation of temporary tables was logged.
If CREATE OR REPLACE TABLE ... SELECT fails and a table was deleted, log a DROP TABLE.
sql/sql_table.h:
Added prototype
sql/sql_truncate.cc:
Remember if we table creation of temporary tables was logged.
sql/table.h:
Added table_creation_was_logged
The problem is a deadlock between MYSQL_BIN_LOG::reset_logs() and
MYSQL_BIN_LOG::mark_xid_done(). The former takes LOCK_log and waits for the
latter to complete. But the latter also tries to take LOCK_log; this can lead
to a deadlock.
There was already code that tries to deal with this, with the flag
reset_master_pending. However, there was still a small opportunity for
deadlock, when an previous mark_xid_done() is still running when reset_logs()
is called and is at the precise point where it first releases LOCK_xid_list
and then re-aquires both LOCK_log and LOCK_xid_list.
Solve by setting reset_master_pending in reset_logs() before taking
LOCK_log. And also count how many invocations of LOCK_xid_list are in the
progress of releasing and re-aquiring locks, and in reset_logs() wait for that
number to drop to zero after setting reset_master_pending and before taking
LOCK_log.
Using CREATE OR REPLACE TABLE is be identical to
DROP TABLE IF EXISTS table_name;
CREATE TABLE ...;
Except that:
* CREATE OR REPLACE is be atomic (now one can create the same table between drop and create).
* Temporary tables will not shadow the table name for the DROP as the CREATE TABLE tells us already if we are using a temporary table or not.
* If the table was locked with LOCK TABLES, the new table will be locked with the same lock after it's created.
Implementation details:
- We don't anymore open the to-be-created table during CREATE TABLE, which the original code did.
- There is no need to open a table we are planning to create. It's enough to check if the table exists or not.
- Removed some of duplicated code for CREATE IF NOT EXISTS.
- Give an error when using CREATE OR REPLACE with IF NOT EXISTS (conflicting options).
- As a side effect of the code changes, we don't anymore have to internally re-prepare prepared statements with CREATE TABLE if the table exists.
- Made one code path for all testing if log table are in use.
- Better error message if one tries to create/drop/alter a log table in use
- Added back disabled rpl_row_create_table test as it now seams to work and includes a lot of interesting tests.
- Added HA_LEX_CREATE_REPLACE to mark if we are using CREATE OR REPLACE
- Aligned CREATE OR REPLACE parsing code in sql_yacc.yy for TABLE and VIEW
- Changed interface for drop_temporary_table() to make it more reusable
- Changed Locked_tables_list::init_locked_tables() to work on the table object instead of the table list object. Before this it used a mix of both, which was not good.
- Locked_tables_list::unlock_locked_tables(THD *thd) now requires a valid thd argument. Old usage of calling this with 0 i changed to instead call Locked_tables_list::reset()
- Added functions Locked_tables_list:restore_lock() and Locked_tables_list::add_back_last_deleted_lock() to be able to easily add back a locked table to the lock list.
- Added restart_trans_for_tables() to be able to restart a transaction.
- DROP_ACL is required if one uses CREATE TABLE OR REPLACE.
- Added drop of normal and temporary tables in create_table_imp() if CREATE OR REPLACE was used.
- Added reacquiring of table locks in mysql_create_table() and mysql_create_like_table()
mysql-test/include/commit.inc:
With new code we get fewer status increments
mysql-test/r/commit_1innodb.result:
With new code we get fewer status increments
mysql-test/r/create.result:
Added testing of create or replace with timeout
mysql-test/r/create_or_replace.result:
Basic testing of CREATE OR REPLACE TABLE
mysql-test/r/partition_exchange.result:
New error message
mysql-test/r/ps_ddl.result:
Fewer reprepares with new code
mysql-test/suite/archive/discover.result:
Don't rediscover archive tables if the .frm file exists
(Sergei will look at this if there is a better way...)
mysql-test/suite/archive/discover.test:
Don't rediscover archive tables if the .frm file exists
(Sergei will look at this if there is a better way...)
mysql-test/suite/funcs_1/r/innodb_views.result:
New error message
mysql-test/suite/funcs_1/r/memory_views.result:
New error message
mysql-test/suite/rpl/disabled.def:
rpl_row_create_table should now be safe to use
mysql-test/suite/rpl/r/rpl_row_create_table.result:
Updated results after adding back disabled test
mysql-test/suite/rpl/t/rpl_create_if_not_exists.test:
Added comment
mysql-test/suite/rpl/t/rpl_row_create_table.test:
Added CREATE OR REPLACE TABLE test
mysql-test/t/create.test:
Added CREATE OR REPLACE TABLE test
mysql-test/t/create_or_replace-master.opt:
Create logs
mysql-test/t/create_or_replace.test:
Basic testing of CREATE OR REPLACE TABLE
mysql-test/t/partition_exchange.test:
Error number changed as we are now using same code for all log table change issues
mysql-test/t/ps_ddl.test:
Fewer reprepares with new code
sql/handler.h:
Moved things around a bit in a structure to get better alignment.
Added HA_LEX_CREATE_REPLACE to mark if we are using CREATE OR REPLACE
Added 3 elements to end of HA_CREATE_INFO to be able to store state to add backs locks in case of LOCK TABLES.
sql/log.cc:
Reimplemented check_if_log_table():
- Simpler and faster usage
- Can give error messages
This gives us one code path for allmost all error messages if log tables are in use
sql/log.h:
New interface for check_if_log_table()
sql/slave.cc:
More logging
sql/sql_alter.cc:
New interface for check_if_log_table()
sql/sql_base.cc:
More documentation
Changed interface for drop_temporary_table() to make it more reusable
Changed Locked_tables_list::init_locked_tables() to work on the table object instead of the table list object. Before this it used a mix of both, which was not good.
Locked_tables_list::unlock_locked_tables(THD *thd) now requires a valid thd argument. Old usage of calling this with 0 i changed to instead call Locked_tables_list::reset()
Added functions Locked_tables_list:restore_lock() and Locked_tables_list::add_back_last_deleted_lock() to be able to easily add back a locked table to the lock list.
Check for command number instead of open_strategy of CREATE TABLE was used.
Added restart_trans_for_tables() to be able to restart a transaction. This was needed in "create or replace ... select" between the drop table and the select.
sql/sql_base.h:
Added and updated function prototypes
sql/sql_class.h:
Added new prototypes to Locked_tables_list class
Added extra argument to select_create to avoid double call to eof() or send_error()
- I needed this in some edge case where the table was not created against expections.
sql/sql_db.cc:
New interface for check_if_log_table()
sql/sql_insert.cc:
Remember position to lock information so that we can reaquire table lock for LOCK TABLES + CREATE OR REPLACE TABLE SELECT. Later add back the lock by calling restore_lock().
Removed one not needed indentation level in create_table_from_items()
Ensure we don't call send_eof() or abort_result_set() twice.
sql/sql_lex.h:
Removed variable that I temporarly added in an earlier changeset
sql/sql_parse.cc:
Removed old test code (marked with QQ)
Ensure that we have open_strategy set as TABLE_LIST::OPEN_STUB in CREATE TABLE
Removed some IF NOT EXISTS code as this is now handled in create_table_table_impl().
Set OPTION_KEEP_LOGS later. This code had to be moved as the test for IF EXISTS has changed place.
DROP_ACL is required if one uses CREATE TABLE OR REPLACE.
sql/sql_partition_admin.cc:
New interface for check_if_log_table()
sql/sql_rename.cc:
New interface for check_if_log_table()
sql/sql_table.cc:
New interface for check_if_log_table()
Moved some code in mysql_rm_table() under a common test.
- Safe as temporary tables doesn't have statistics.
- !is_temporary_table(table) test was moved out from drop_temporary_table() and merged with upper level code.
- Added drop of normal and temporary tables in create_table_imp() if CREATE OR REPLACE was used.
- Added reacquiring of table locks in mysql_create_table() and mysql_create_like_table()
- In mysql_create_like_table(), restore table->open_strategy() if it was changed.
- Re-test if table was a view after opening it.
sql/sql_table.h:
New prototype for mysql_create_table_no_lock()
sql/sql_yacc.yy:
Added syntax for CREATE OR REPLACE TABLE
Reuse new code for CREATE OR REPLACE VIEW
sql/table.h:
Added name for enum type
sql/table_cache.cc:
More DBUG
MDEV-4725: Incorrect binlog state recovery if crash while writing event group
The binlog state was not recovered correctly if XA is not used (eg. InnoDB
disabled), or if server crashed in the middle of writing an event group to the
binlog.
With this patch, we ensure that recovery of binlog state is done even if we do
not do the full XA binlog recovery, and we ensure that we only recover fully
written event groups into the binlog state.
Implement @@gtid_binlog_state. This is the internal state of the binlog
(most recent GTID logged for every domain_id and server_id). This allows
to save the state before RESET MASTER and restore it afterwards.
Fix a bunch of issues found with locking, ordering, and non-thread-safe stuff
in Relay_log_info.
Now able to do a simple benchmark, showing 4.5 times speedup for applying a
binlog with 10000 REPLACE statements.
Impement options --binlog-commit-wait-count and
--binlog-commit-wait-usec.
These options permit the DBA to deliberately increase latency
of an individual commit to get more transactions in each
binlog group commit. This increases the opportunity for
parallel replication on the slave, and can also decrease I/O
load on the master.
The options also make it easier to test the parallel
replication with mysql-test-run.
Implement facility for the commit in one thread to wait for the commit of
another to complete first. The wait is done in a way that does not hinder
that a waiter and a waitee can group commit together with a single fsync()
in both binlog and InnoDB. The wait is done efficiently with respect to
locking.
The patch was originally made to support TaoBao parallel replication with
in-order commit; now it will be adapted to also be used for parallel
replication of group-committed transactions.
A waiter THD registers itself with a prior waitee THD. The waiter will then
complete its commit at the earliest in the same group commit of the waitee
(when using binlog). The wait can also be done explicitly by the waitee.
When @@GLOBAL.gtid_strict_mode=1, then certain operations result
in error that would otherwise result in out-of-order binlog files
between servers.
GTID sequence numbers are now allocated independently per domain;
this results in less/no holes in GTID sequences, increasing the
likelyhood that diverging binlogs will be caught by the slave when
GTID strict mode is enabled.
Change of user interface to be more logical and more in line with expectations
to work similar to old-style replication.
User can now explicitly choose in CHANGE MASTER whether binlog position is
taken into account (master_gtid_pos=current_pos) or not (master_gtid_pos=
slave_pos) when slave connects to master.
@@gtid_pos is replaced by three separate variables @@gtid_slave_pos (can
be set by user, replicated GTIDs only), @@gtid_binlog_pos (read only), and
@@gtid_current_pos (a combination of the two, most recent GTID within each
domain). mysql.rpl_slave_state is renamed to mysql.gtid_slave_pos to match.
This fixes MDEV-4474.
Merge of 10.0-mdev26 feature tree into 10.0-base.
Global transaction ID is prepended to each event group in the binlog.
Slave connect can request to start from GTID position instead of specifying
file name/offset of master binlog. This facilitates easy switch to a new
master.
Slave GTID state is stored in a table mysql.rpl_slave_state, which can be
InnoDB to get crash-safe slave state.
GTID includes a replication domain ID, allowing to keep track of distinct
positions for each of multiple masters.
Test crashing the master, check that it recovers the binlog state.
Fix one bug introduced by previous commit (crash-recoved binlog state was
overwritten by loading stale binlog state file).
Fix Windows build error.
Implement test case rpl_gtid_stop_start.test to test normal stop and restart
of master and slave mysqld servers.
Fix a couple bugs found with the test:
- When InnoDB is disabled (no XA), the binlog state was not read when master
mysqld starts.
- Remove old code that puts a bogus D-S-0 into the initial binlog state, it
is not correct in current design.
- Fix memory leak in gtid_find_binlog_file().
During server shutdown, we need to wait for binlog checkpointing to
finish in the binlog background thread before closing the binlog.
This was not done, so we could get assert and failure to finish the
final binlog checkpoint if shutdown happened in the middle.
Fix things so that a master can switch with MASTER_GTID_POS=AUTO to a slave
that was previously running with log_slave_updates=0, by looking into the
slave replication state on the master when the slave requests something not
present in the binlog.
Be a bit more strict about what position the slave can ask for, to avoid some
easy-to-hit misconfiguration errors.
Start over with seq_no counter when RESET MASTER.
When starting slave, check binlog state in addition to mysql.rpl_slave.state.
This allows to switch a previous master to be a slave directly
with MASTER_GTID_POS=AUTO.
Make the commit checkpoint inside InnoDB be asynchroneous.
Implement a background thread in binlog to do the writing and flushing of
binlog checkpoint events to disk.