Commit graph

415 commits

Author SHA1 Message Date
Marko Mäkelä
5ff66fb0b9 Merge 10.2 into 10.3 2020-01-31 11:37:12 +02:00
Marko Mäkelä
2daf3b14fe Merge 10.1 into 10.2 2020-01-31 10:53:56 +02:00
mkaruza
41bc736871 Galera GTID support
Support for galera GTID consistency thru cluster. All nodes in cluster
should have same GTID for replicated events which are originating from cluster.
Cluster originating commands need to contain sequential WSREP GTID seqno
Ignore manual setting of gtid_seq_no=X.

In master-slave scenario where master is non galera node replicated GTID is
replicated and is preserved in all nodes.

To have this - domain_id, server_id and seqnos should be same on all nodes.
Node which bootstraps the cluster, to achieve this, sends domain_id and
server_id to other nodes and this combination is used to write GTID for events
that are replicated inside cluster.

Cluster nodes that are executing non replicated events are going to have different
GTID than replicated ones, difference will be visible in domain part of gtid.

With wsrep_gtid_domain_id you can set domain_id for WSREP cluster.

Functions WSREP_LAST_WRITTEN_GTID, WSREP_LAST_SEEN_GTID and
WSREP_SYNC_WAIT_UPTO_GTID now works with "native" GTID format.

Fixed galera tests to reflect this chances.

Add variable to manually update WSREP GTID seqno in cluster

Add variable to manipulate and change WSREP GTID seqno. Next command
originating from cluster and on same thread will have set seqno and
cluster should change their internal counter to it's value.
Behavior is same as using @@gtid_seq_no for non WSREP transaction.
2020-01-29 15:06:06 +02:00
Sujatha
d89bb88674 MDEV-20923:UBSAN: member access within address … which does not point to an object of type 'xid_count_per_binlog'
Problem:
-------
Accessing a member within 'xid_count_per_binlog' structure results in
following error when 'UBSAN' is enabled.

member access within address 0xXXX which does not point to an object of type
'xid_count_per_binlog'

Analysis:
---------
The problem appears to be that no constructor for 'xid_count_per_binlog' is
being called, and thus the vtable will not be initialized.

Fix:
---
Defined a parameterized constructor for 'xid_count_per_binlog' class.
2020-01-29 16:33:05 +05:30
Sergey Vojtovich
6b0b25a25b Cleanup log_type_arg of MYSQL_BIN_LOG::open()
It is always LOG_BIN anyway.
2019-08-22 13:20:30 +04:00
Sergey Vojtovich
e976d95614 Cleanup MYSQL_LOG
Embed MYSQL_LOG::init().
Reduce visibility of MYSQL_LOG::init_and_set_log_file_name().
Cleanup unused mysql_bin_log_file_name() and mysql_bin_log_file_pos().
2019-08-22 13:20:30 +04:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru
5543b75550 Update FSF Address
* Update wrong zip-code
2019-05-11 21:29:06 +03:00
Sergei Golubchik
88961a28e2 MDEV-17710 "unknown error" with FLUSH LOGS if log directory is not writeable 2019-05-07 18:40:36 +02:00
Monty
48810a0014 MDEV-19116 Speed up rotation of binary logs
Fixed by caching last binary log number used in last_used_log_number

Other things:
- Moved locking of LOCK_log form new_file_impl() to new_file(). This fixed
  a bug where LOCK_log could have been unlocked even if 'need_lock' was
  not set.  Removed not anymore used argument need_lock.
- Made generate_new_name() virtual to simplify the code between
  other logs and binary log.

Reviewed by Andrei Elkin
2019-04-01 19:47:24 +03:00
Sergey Vojtovich
891be49a36 Simplified THD::current_linfo locking
LOG_INFO::lock was useless. It could've only protect against concurrent
iterators execution, which was already protected by LOCK_thread_count.

Use LOCK_thd_data instead of LOCK_thread_count as a protection against
THD::current_linfo reset.

Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
2019-01-28 17:39:07 +04:00
Brave Galera Crew
36a2a185fe Galera4 2019-01-23 15:30:00 +04:00
Marko Mäkelä
c6ba758d1d Merge 10.2 into 10.3 2018-04-23 09:49:58 +03:00
Sergei Petrunia
0c02c91bc1 MyRocks: MDEV-15911: Reduce debug logging on default levels in error log
MyRocks internally will print non-critical messages to
sql_print_verbose_info() which will do what InnoDB does in similar cases:
check if (global_system_variables.log_warnings > 2).
2018-04-19 14:13:28 +03:00
Vladislav Vaintroub
6c279ad6a7 MDEV-15091 : Windows, 64bit: reenable and fix warning C4267 (conversion from 'size_t' to 'type', possible loss of data)
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.

This fix excludes rocksdb, spider,spider, sphinx and connect for now.
2018-02-06 12:55:58 +00:00
Marko Mäkelä
0ba6aaf030 MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY
If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would spend
a lot of time rolling back writes to the intermediate copy of the table.
To reduce the amount of busy work done, a work-around was introduced in
commit fd069e2bb3 in MySQL 4.1.8 and 5.0.2,
to commit the transaction after every 10,000 inserted rows.

A proper fix would have been to disable the undo logging altogether and
to simply drop the intermediate copy of the table on subsequent server
startup. This is what happens in MariaDB 10.3 with MDEV-14717,MDEV-14585.
In MariaDB 10.2, the intermediate copy of the table would be left behind
with a name starting with the string #sql.

This is a backport of a bug fix from MySQL 8.0.0 to MariaDB,
contributed by jixianliang <271365745@qq.com>.

Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation
InnoDB must for now keep the undo logging enabled, so that the latest
row can be rolled back in case of an error.

In Galera cluster, the LOAD DATA statement will retain the existing
behaviour and commit the transaction after every 10,000 rows if
the parameter wsrep_load_data_splitting=ON is set. The logic to do
so (the wsrep_load_data_split() function and the call
handler::extra(HA_EXTRA_FAKE_START_STMT)) are joint work
by Ji Xianliang and Marko Mäkelä.

The original fix:

Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date:   Wed Dec 2 16:09:15 2015 +0530

Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY

Problem:

During ALTER TABLE, we commit and restart the transaction for every
10,000 rows, so that the rollback after recovery would not take so long.

Fix:

Suppress the undo logging during copy alter operation. If fts_index is
present then insert directly into fts auxiliary table rather
than doing at commit time.

ha_innobase::num_write_row: Remove the variable.

ha_innobase::write_row(): Remove the hack for committing every 10000 rows.

row_lock_table_for_mysql(): Remove the extra 2 parameters.

lock_get_src_table(), lock_is_table_exclusive(): Remove.

Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
2018-01-30 20:24:23 +02:00
Marko Mäkelä
145ae15a33 Merge bb-10.2-ext into 10.3 2018-01-04 09:22:59 +02:00
Monty
fbab79c9b8 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext
Conflicts:
	cmake/make_dist.cmake.in
	mysql-test/r/func_json.result
	mysql-test/r/ps.result
	mysql-test/t/func_json.test
	mysql-test/t/ps.test
	sql/item_cmpfunc.h
2018-01-01 19:39:59 +02:00
Vicențiu Ciorbaru
9aeb5d01d6 Merge remote-tracking branch 'origin/10.1' into bb-10.2-vicentiu 2017-12-28 19:27:00 +02:00
Sachin Setiya
2fe6186124 MDEV-10715 Galera: Replicate MariaDB GTID to other nodes in the cluster
Problem:- Gtid are not transferred in Galera Cluster.

Solution:- We need to transfer gtid in the case on either when cluster is
slave/master in async replication. In normal Gtid replication gtid are generated on
recieving node itself and it is always on sync with other nodes. Because galera keeps
node in sync , So all nodes get same no of event groups. So the issue arises when
say galera is slave in async replication.
A
|    (Async replication)
D <-> E <-> F  {Galera replication}
So what should happen is that all node should apply the master gtid but this does
node happen, becuase node E, F does not recieve gtid from D in write set , So what E(or F)
does is that it applies wsrep_gtid_domain_id, D server-id , E gtid next seq no. This
generated gtid does not always work when say A has different domain id.

So In this commit, on galera node when we see that this event is recieved from master
we simply write Gtid_Log_Event in write_set and send it to other nodes.
2017-12-25 13:57:42 +05:30
Andrei Elkin
e972125f11 MDEV-13073 This part merges the Ali semisync related changes
and specifically the ack receiving functionality.
Semisync is turned to be static instead of plugin so its functions
are invoked at the same points as RUN_HOOKS.
The RUN_HOOKS and the observer interface remain to be removed by later
patch.

Todo:
  React on killed status by repl_semisync_master.wait_after_sync(). Currently
  Repl_semi_sync_master::commit_trx does not check the killed status.

  There were few bugfixes found that are present in mysql and its unclear
  whether/how they are covered. Those include:

  Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS
  Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL
                 HAS BAD PERFORMANCE
  Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS
2017-12-18 13:43:37 +02:00
Monty
13770edbcb Changed from using LOCK_log to LOCK_binlog_end_pos for binary log
Part of MDEV-13073 AliSQL Optimize performance of semisync

The idea it to use a dedicated lock detecting if there is new data in
the master's binary log instead of the overused LOCK_log.

Changes:
- Use dedicated COND variables for the relay and binary log signaling.
  This was needed as we where the old 'update_cond' variable was used
  with different mutex's, which could cause deadlocks.
   - Relay log uses now COND_relay_log_updated and LOCK_log
   - Binary log uses now COND_bin_log_updated and LOCK_binlog_end_pos
- Renamed signal_cnt to relay_signal_cnt (as we now have two signals)
- Added some missing error handling in MYSQL_BIN_LOG::new_file_impl()
- Reformatted some comments with old style
- Renamed m_key_LOCK_binlog_end_pos to key_LOCK_binlog_end_pos
- Changed 'signal_update()' to update_binlog_end_pos() which works for
  both relay and binary log
2017-12-18 13:43:37 +02:00
Andrei Elkin
15219eb08a MDEV-14290 Binlog rotate crashes when two commit_checkpoint_notify capable engines.
The crash (sometimes assert) in MYSQL_BIN_LOG::mark_xid_done was caused by a
fact that log.cc:binlog_background_thread_queue could become a cyclic list.
This possibility becomes real with two checkpoint capable engines that
may execute TC_LOG_BINLOG::commit_checkpoint_notify() in succession before
binlog_background thread gets control and eventually finds a freed memory
while otherwise endlessly looping in while(queue).

It is fixed with counting the notificaion kind instead of en-listing the same notificaion kind in commit_checkpoint_notify as formerly. The while(queue) of binlog background thread is refined to pay attention to the new counter. In effectno more access to free memory is possible.
2017-12-11 12:41:45 +02:00
Marko Mäkelä
7cb3520c06 Merge bb-10.2-ext into 10.3 2017-11-30 08:16:37 +02:00
Alexander Barkov
5b697c5a23 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext 2017-11-29 12:06:48 +04:00
Sergei Golubchik
7f1900705b Merge branch '10.1' into 10.2 2017-11-21 19:47:46 +01:00
Andrei Elkin
aae4932775 MDEV-12012/MDEV-11969 Can't remove GTIDs for a stale GTID Domain ID
As reported in MDEV-11969 "there's no way to ditch knowledge" about some
domain that is no longer updated on a server. Besides being of annoyance to
clutter output in DBA console stale domains can prevent the slave
to connect the master as MDEV-12012 witnesses.
What domain is obsolete must be evaluated by the user (DBA) according
to whether the domain info is still relevant and will the domain ever
receive any update.

This patch introduces a method to discard obsolete gtid domains from
the server binlog state. The removal requires no event group from such
domain present in existing binlog files though. If there are any the
containing logs must be first PURGEd in order for

  FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

succeed. Otherwise the command returns an error.

The list of obsolete domains can be computed through
intersecting two sets - the earliest (first) binlog's Gtid_list
and the current value of @@global.gtid_binlog_state - and extracting
the domain id components from the intersection list items.
The new DELETE_DOMAIN_ID featured FLUSH continues to rotate binlog
omitting the deleted domains from the active binlog file's Gtid_list.
Notice though when the command is ineffective - that none of requested to delete
domain exists in the binlog state - rotation does not occur.

Obsolete domain deletion is not harmful for connected slaves as long
as master side binlog files *purge* is synchronized with FLUSH-DELETE_DOMAIN_ID.
The slaves must have the last event from purged files processed as usual,
in order not to bump later into requesting a gtid from a file which
was already gone.
While the command is not replicated (as ordinary FLUSH BINLOG LOGS is)
slaves, even though having extra domains, won't suffer from reconnection errors
thanks to master-slave gtid connection protocol allowing the master
to be ignorant about a gtid domain.
Should at failover such slave to be promoted into master role it may run
the ex-master's

 FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

to clean its own binlog state.

NOTES.
  suite/perfschema/r/start_server_low_digest.result
is re-recorded as consequence of internal parser codes changes.
2017-11-15 22:26:32 +02:00
Michael Widenius
458d5ed8aa Lots of small cleanups
- Simplified use_trans_cache() to return at once if is_transactional is set
- Indentation and spelling errors fixed
- Don't call signal_update() if update_binlog_end_pos() is called as the
  function already calls signal_update()
- Removed not used function wait_for_update_bin_log(), which would cause
  errors if ever used.
- Simplified handler::clone() by always allocating 'ref' in ha_open(). To do
  this I added an optional MEM_ROOT argument to ha_open() to be used when
  allocating 'ref'
- Changed arguments to get_system_var() from LEX_CSTRING to LEX_CSTRING*
- Added THD as argument to create_select_for_variable(). Changed also char*
  argument to LEX_CSTRING to avoid strlen() call.
- Change calls to append() to use LEX_CSTRING
2017-08-24 01:05:49 +02:00
Kristian Nielsen
1d91910b94 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Merge into MariaDB 10.3.
2017-07-03 09:33:41 +02:00
Kristian Nielsen
c174718aed MDEV-12179: Per-engine mysql.gtid_slave_pos table
Intermediate commit.

Implement status variables to aid the DBA in determining the need
and/or effectiveness of the per-engine mylsq.gtid_slave_pos feature:

transactions_multi_engine

  Number of transactions that changed data in multiple (transactional)
  storage engines.

rpl_transactions_multi_engine

  Number of replicated transactions that involved changes in multiple
  (transactional) storage engines, before considering the update of the
  mysql.gtid_slave_posXXX table.

transactions_gtid_foreign_engine

  Number of replicated transactions where the update of the
  mysql.gtid_slave_posXXX table had to choose a storage engine that did not
  otherwise participate in the transaction.
2017-04-25 19:08:45 +02:00
Alexander Barkov
949faa2ec2 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext 2017-04-13 05:52:44 +04:00
Monty
546e7aa96f MDEV-8203 Assert in Query_log_event::do_apply_event()
This happens because the master writes a table_map event to the binary log, but no row event.
The slave has a check that there should always be a row event if there was a table_map event, which
causes a crash.

Fixed by remembering in the cache what kind of events are logged
and ignore cached statements which is just a table map event.
2017-04-07 15:58:17 +04:00
Sergei Golubchik
cd79be82d1 cleanup: unused method LOGGER::flush_logs 2017-04-07 09:55:54 +02:00
Marko Mäkelä
89d80c1b0b Fix many -Wconversion warnings.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong.  Change some parameters to this type.

Use size_t in a few more places.

Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.

When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.

In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
2017-03-07 19:07:27 +02:00
Sergei Golubchik
06b7fce9f2 Merge branch '10.1' into 10.2 2016-09-09 08:33:08 +02:00
Sergei Golubchik
932646b1ff Merge branch '10.1' into 10.2 2016-06-30 16:38:05 +02:00
Nirbhay Choubey
3fd214c8be MDEV-9423: cannot add new node to the cluser: Binlog..
.. file '/var/log/mysql/mariadb-bin.000001' not found in binlog
index, needed for recovery. Aborting.

In Galera cluster, while preparing for rsync/xtrabackup based
SST, the donor node takes an FTWRL followed by (REFRESH_ENGINE_LOG
in rsync based state transfer and) REFRESH_BINARY_LOG. The latter
rotates the binary log and logs Binlog_checkpoint_log_event
corresponding to the penultimate binary log file into the new file.
The checkpoint event for the current file is later logged
synchronously by binlog_background_thread.

Now, since in rsync/xtrabackup based snapshot state transfer methods,
only the last binary log file is transferred to the joiner node; the
file could get transferred even before the checkpoint event for the
same file gets written to it. As a result, the joiner node would fail
to start complaining about the missing binlog file needed for recovery.

In order to fix this, a mechanism has been put in place to make
REFRESH_BINARY_LOG operation wait for Binlog_checkpoint_log_event
to be logged for the current binary log file if the node is part of
a Galera cluster. As further safety, during rsync based state transfer
the donor node now acquires and owns LOCK_log for the duration of file
transfer during SST.
2016-06-29 16:50:53 -04:00
Sergei Golubchik
3361aee591 Merge branch '10.0' into 10.1 2016-06-28 22:01:55 +02:00
Sergei Golubchik
c081c978a2 Merge branch '5.5' into bb-10.0 2016-06-21 14:11:02 +02:00
Sergei Golubchik
ae29ea2d86 Merge branch 'mysql/5.5' into 5.5 2016-06-14 13:55:28 +02:00
Sergei Golubchik
4a0612ed2a stop binlog background thread together with others
that fixes many rpl tests failures
2016-06-04 09:06:00 +02:00
Sujatha Sivakumar
ef3f09f0c9 Bug#23251517: SEMISYNC REPLICATION HANGING
Revert following bug fix:

Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE

This fix results in a deadlock between slave IO thread
and SQL thread.

(cherry picked from commit e3fea6c6dbb36c6ab21c4ab777224560e9608b53)
2016-05-16 11:34:20 +02:00
Monty
636bb59034 Final fixes for Memory_used
- Change some static variables to dynamic to ensure that we don't do any memory
  allocations before server starts or stops
- Print more memory information on SIGHUP. Fixed output.
- Write out if memory was lost if run with --debug-at-exit
- Fixed wrong #ifdef in sql_cache.cc
2016-04-28 17:15:38 +03:00
Sujatha Sivakumar
8361151765 Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE

Problem:
========
Currently SHOW SLAVE STATUS blocks if IO thread waits for
disk space. This makes automation tools verifying
server health block on taking relevant action. Finally this
will create SHOW SLAVE STATUS piles.

Analysis:
=========
SHOW SLAVE STATUS hangs on mi->data_lock if relay log write
is waiting for free disk space while holding mi->data_lock.
mi->data_lock is needed to protect the format description
event (mi->format_description_event) which is accessed by
the clients running FLUSH LOGS and slave IO thread. Note
relay log writes don't need to be protected by
mi->data_lock, LOCK_log is used to protect relay log between
IO and SQL thread (see MYSQL_BIN_LOG::append_event). The
code takes mi->data_lock to protect
mi->format_description_event during relay log rotate which
might get triggered right after relay log write.

Fix:
====
Release the data_lock just for the duration of writing into
relay log.

Made change to ensure the following lock order is maintained
to avoid deadlocks.

data_lock, LOCK_log

data_lock is held during relay log rotations to protect
the description event.
2016-03-01 12:29:51 +05:30
Sergei Golubchik
fb8713385f remove unneded #include's that had a dubious explanation 2015-10-24 19:58:34 +02:00
Sergei Golubchik
b85a00161e MDEV-8264 encryption for binlog
* Start_encryption_log_event
* --encrypt-binlog command line option

based on google patches.
2015-09-04 10:33:55 +02:00
Sergei Golubchik
41d68cabee cleanup: Log_event::write() and MYSQL_BIN_LOG::write_cache()
Introduce Log_event_writer() that encapsulates
writing data to an IO_CACHE with automatic checksum calculation.

Now all events properly checksum themselves as needed.

Use Log_event_writer in MYSQL_BIN_LOG::write_cache() instead
of copy-pasting its logic all over.

Later Log_event_writer will also do encryption.
2015-09-04 10:33:55 +02:00
Sergei Golubchik
2d2286faf3 cleanup: use enum_binlog_checksum_alg, not uint8
* fix unireg.h includes
* use enum_binlog_checksum_alg for binlog checksum variables,
  not uint8
2015-09-04 10:33:52 +02:00
Monty
872a953b22 MDEV-8469 Add RESET MASTER TO x to allow specification of binlog file nr
Other things:
- Avoid calling init_and_set_log_file_name() when opening binary log.
- Remove newlines early when reading from index file.
- Ensure that reset_logs() will work even if thd is 0 (Can happen on startup)
- Added thd to sart_slave_threads() for better error handling.
2015-07-16 10:36:58 +03:00
Sergei Golubchik
658992699b Merge tag 'mariadb-10.0.20' into 10.1 2015-06-27 20:35:26 +02:00
Sergei Golubchik
810cf362ea Merge branch '5.5' into 10.0 2015-06-11 20:20:35 +02:00
Nirbhay Choubey
f965cae5fb MDEV-7110 : Add missing MySQL variable log_bin_basename and log_bin_index
Add log_bin_index, log_bin_basename and relay_log_basename system
variables. Also, convert relay_log_index system variable to
NO_CMD_LINE and implement --relay-log-index as a command line
option.
2015-06-09 13:38:29 -04:00
Sergei Golubchik
5091a4ba75 Merge tag 'mariadb-10.0.19' into 10.1 2015-06-01 15:51:25 +02:00
Nirbhay Choubey
6f8558bbd4 Fix for debug build failure
Do not use format function attribute for sql_print_xxx() family of
functions as they use a MariaDB-specific extension of printf instead
of one provided by the system.
2015-05-12 14:19:30 -04:00
Kristian Nielsen
9088f26f20 MDEV-7802: group commit status variable addition
Backport into 10.0
2015-04-29 11:29:25 +02:00
Kristian Nielsen
a15a4d674d Merge MDEV-7802 into 10.1 2015-04-20 13:22:51 +02:00
Kristian Nielsen
791b0ab5db Merge 10.0 -> 10.1 2015-04-20 13:21:58 +02:00
Daniel Black
1d5220d112 binlog_group_commit_* status variables update
remove group_commit_reason_immediate
rename group_commit_reason_transaction to group_commit_trigger_lock_wait
rename group_commit_reason_usec to group_commit_trigger_timeout
rename group_commit_reason_count to group_commit_triggger_count
2015-04-01 22:47:36 +11:00
Daniel Black
54287adc27 MDEV-7802 Add status binlog_group_commit_reason_*
The following global status variables where added:
* binlog_group_commit_reason_count
* binlog_group_commit_reason_usec
* binlog_group_commit_reason_transaction
* binlog_group_commit_reason_immediate

binlog_group_commit_reason_count corresponds to group commits made by
virtue of the binlog_commit_wait_count variable.

binlog_group_commit_reason_usec corresponds to the binlog_commit_wait_usec
variable.

binlog_group_commit_reason_transaction is a result of ordered
transaction that need to occur in the same order on the slave and can't
be parallelised.

binlog_group_commit_reason_immediate is caused to prevent stalls with
row locks as described in log.cc:binlog_report_wait_for. This immediate
count is also counted a second time in binlog_group_commit_reason_transaction.

Overall binlog_group_commits = binlog_group_commit_reason_count +
binlog_group_commit_reason_usec + binlog_group_commit_reason_transaction

This work was funded thanks to Open Source Developers Club Australia.
2015-03-19 15:26:58 +11:00
Kristian Nielsen
184f718fef MDEV-7249: Performance problem in parallel replication with multi-level slaves
Parallel replication (in 10.0 / "conservative" mode) relies on binlog group
commits to group transactions that can be safely run in parallel on the
slave. The --binlog-commit-wait-count and --binlog-commit-wait-usec options
exist to increase the number of commits per group. But in case of conflicts
between transactions, this can cause unnecessary delay and reduced througput,
especially on a slave where commit order is fixed.

This patch adds a heuristics to reduce this problem. When transaction T1 goes
to commit, it will first wait for N transactions to queue up for a group
commit. However, if we detect that another transaction T2 is waiting for a row
lock held by T1, then we will skip the wait and let T1 commit immediately,
releasing locks and let T2 continue.

On a slave, this avoids the unfortunate situation where T1 is waiting for T2
to join the group commit, but T2 is waiting for T1 to release locks, causing
no work to be done for the duration of the --binlog-commit-wait-usec timeout.

(The heuristic seems reasonable on the master as well, so it is enabled for
all transactions, not just replication transactions).
2015-03-13 14:01:52 +01:00
Sergei Golubchik
863cfb3fa5 small cleanup, remove a useless function 2015-01-31 21:51:45 +01:00
Sergei Golubchik
4b21cd21fe Merge branch '10.0' into merge-wip 2015-01-31 21:48:47 +01:00
Sergei Golubchik
510ca9b697 MDEV-7402 'reset master' hangs, waits for signalled COND_xid_list
Using a boolean flag for 'there is a RESET MASTER in progress' doesn't
work very well for multiple concurrent RESET MASTER statements.
Changed to a counter.
2015-01-19 14:32:28 +01:00
Sergey Vojtovich
dc92032fa3 Fixed sysvars_server_embedded test result to reflect new values for
query_prealloc_size, query_alloc_block_size and log_tc_size.

Fixed incorrect registration of LOCK_binlog_end_pos in PFS.
2014-12-29 15:41:08 +04:00
Sergey Vojtovich
f65901eef2 MDEV-7273 - 10.1 fails to start up during tc_log initializations on PPC64
log-tc-size is 24K by default. Page size is 64K on PPC64. But log-tc-size
must be at least 3 x page size. This is enforced by TC_LOG_MMAP::open()
with a comment: to guarantee non-empty pool.

This all makes server not startable in default configuration on PPC64.

Autosize log-tc-size, so that it's min value= page size * 3, default
value= page size * 6, block size= page size.
2014-12-26 23:38:45 +04:00
Jonas Oreland
0b87de124d MDEV-162 Enhanced semisync replication
Implement --semi-sync-master-wait-point=AFTER_SYNC|AFTER_COMMIT.

When AFTER_SYNC, the semi-sync wait will be done earlier, before the storage
engine commit rather than after. This means that a transaction will not be
visible on the master until at least one slave has received it.
2014-12-23 14:16:32 +01:00
Jonas Oreland
4d8b346e07 MDEV-7257: Dump Thread Enhancements
Make the binlog dump threads not need to take LOCK_log while sending
binlog events to slave. Instead, a new LOCK_binlog_end_pos is used
just to coordinate tracking the current end-of-log.

This is a pre-requisite for MDEV-162, "Enhanced semisync
replication". It should also help reduce the contention on LOCK_log on
a busy master.

Also does some much-needed refactoring/cleanup of the related code in
the binlog dump thread.
2014-12-23 14:16:13 +01:00
Sergei Golubchik
4b9bf9d3b8 bugfix: remove the code that broke XA recovery 2014-10-01 23:38:27 +02:00
Jan Lindström
595bcb7947 Fix merge error on binlog_remove_pending_rows causing failure
on binlog_innodb_row test.
2014-09-10 18:48:26 +03:00
Jan Lindström
df4dd593f2 MDEV-6247: Merge 10.0-galera to 10.1.
Merged lp:maria/maria-10.0-galera up to revision 3879.

Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.
2014-08-26 15:43:46 +03:00
Sergei Golubchik
a6071cc596 MDEV-6082 Assertion `0' fails in TC_LOG_DUMMY::log_and_order on DML after installing TokuDB at runtime on server with disabled InnoDB
We don't support changing tc_log implementation at run time.
If the first XA-capable engine is loaded with INSTALL PLUGIN - disable its
XA capabilities with a warning
2014-07-27 21:02:00 +02:00
Sergei Golubchik
6fb17a0601 5.5.39 merge 2014-08-07 18:06:56 +02:00
Michael Widenius
c4bb7cd6dc Fix for MDEV-5589: "Discrepancy in binlog on half-failed CREATE OR REPLACE"
Now if CREATE OR REPLACE fails but we have deleted a table already, we will generate a DROP TABLE in the binary log.
This fixes this issue.

In addition, for a failing CREATE OR REPLACE TABLE ... SELECT we don't generate a log of all the inserted rows, only the DROP TABLE.

I added code for not logging DROP TEMPORARY TABLE for tables where the CREATE TABLE was not logged. This code will be activated in 10.1
by removing the code protected by DONT_LOG_DROP_OF_TEMPORARY_TABLES.





mysql-test/suite/rpl/r/create_or_replace_mix.result:
  More test cases
mysql-test/suite/rpl/r/create_or_replace_row.result:
  More test cases
mysql-test/suite/rpl/r/create_or_replace_statement.result:
  More test cases
mysql-test/suite/rpl/t/create_or_replace.inc:
  More test cases
sql/log.cc:
  Added binlog_reset_cache() to clear the binary log.
sql/log.h:
  Added prototype
sql/sql_insert.cc:
  If CREATE OR REPLACE TABLE ... SELECT fails:
  - Don't log anything if nothing changed
  - If table was deleted, log a DROP TABLE.
  Remember if we table creation of temporary tables was logged.
sql/sql_table.cc:
  Added log_drop_table()
  Remember if we table creation of temporary tables was logged.
  If CREATE OR REPLACE TABLE ... SELECT fails and a table was deleted, log a DROP TABLE.
sql/sql_table.h:
  Added prototype
sql/sql_truncate.cc:
  Remember if we table creation of temporary tables was logged.
sql/table.h:
  Added table_creation_was_logged
2014-03-20 00:59:13 +02:00
unknown
dd93ec5633 Merge MariaDB 10.0-base to 10.0. 2014-02-10 15:12:17 +01:00
unknown
07eaf6ea76 MDEV-5636: Deadlock in RESET MASTER
The problem is a deadlock between MYSQL_BIN_LOG::reset_logs() and
MYSQL_BIN_LOG::mark_xid_done(). The former takes LOCK_log and waits for the
latter to complete. But the latter also tries to take LOCK_log; this can lead
to a deadlock.

There was already code that tries to deal with this, with the flag
reset_master_pending. However, there was still a small opportunity for
deadlock, when an previous mark_xid_done() is still running when reset_logs()
is called and is at the precise point where it first releases LOCK_xid_list
and then re-aquires both LOCK_log and LOCK_xid_list.

Solve by setting reset_master_pending in reset_logs() before taking
LOCK_log. And also count how many invocations of LOCK_xid_list are in the
progress of releasing and re-aquiring locks, and in reset_logs() wait for that
number to drop to zero after setting reset_master_pending and before taking
LOCK_log.
2014-02-09 00:56:18 +01:00
Michael Widenius
10001c8e4f Automatic merge 2014-02-05 19:23:11 +02:00
Michael Widenius
7ffc9da093 Implementation of MDEV-5491: CREATE OR REPLACE TABLE
Using CREATE OR REPLACE TABLE is be identical to

DROP TABLE IF EXISTS table_name;
CREATE TABLE ...;

Except that:

* CREATE OR REPLACE is be atomic (now one can create the same table between drop and create).
* Temporary tables will not shadow the table name for the DROP as the CREATE TABLE tells us already if we are using a temporary table or not.
* If the table was locked with LOCK TABLES, the new table will be locked with the same lock after it's created.

Implementation details:
- We don't anymore open the to-be-created table during CREATE TABLE, which the original code did.
  - There is no need to open a table we are planning to create. It's enough to check if the table exists or not.
- Removed some of duplicated code for CREATE IF NOT EXISTS.
- Give an error when using CREATE OR REPLACE with IF NOT EXISTS (conflicting options).
- As a side effect of the code changes, we don't anymore have to internally re-prepare prepared statements with CREATE TABLE if the table exists.
- Made one code path for all testing if log table are in use.
- Better error message if one tries to create/drop/alter a log table in use
- Added back disabled rpl_row_create_table test as it now seams to work and includes a lot of interesting tests.
- Added HA_LEX_CREATE_REPLACE to mark if we are using CREATE OR REPLACE
- Aligned CREATE OR REPLACE parsing code in sql_yacc.yy for TABLE and VIEW
- Changed interface for drop_temporary_table() to make it more reusable
- Changed Locked_tables_list::init_locked_tables() to work on the table object instead of the table list object. Before this it used a mix of both, which was not good.
- Locked_tables_list::unlock_locked_tables(THD *thd) now requires a valid thd argument. Old usage of calling this with 0 i changed to instead call Locked_tables_list::reset()
- Added functions Locked_tables_list:restore_lock() and Locked_tables_list::add_back_last_deleted_lock() to be able to easily add back a locked table to the lock list.
- Added restart_trans_for_tables() to be able to restart a transaction.
- DROP_ACL is required if one uses CREATE TABLE OR REPLACE.
- Added drop of normal and temporary tables in create_table_imp() if CREATE OR REPLACE was used.
- Added reacquiring of table locks in mysql_create_table() and mysql_create_like_table()




mysql-test/include/commit.inc:
  With new code we get fewer status increments
mysql-test/r/commit_1innodb.result:
  With new code we get fewer status increments
mysql-test/r/create.result:
  Added testing of create or replace with timeout
mysql-test/r/create_or_replace.result:
  Basic testing of CREATE OR REPLACE TABLE
mysql-test/r/partition_exchange.result:
  New error message
mysql-test/r/ps_ddl.result:
  Fewer reprepares with new code
mysql-test/suite/archive/discover.result:
  Don't rediscover archive tables if the .frm file exists
  (Sergei will look at this if there is a better way...)
mysql-test/suite/archive/discover.test:
  Don't rediscover archive tables if the .frm file exists
  (Sergei will look at this if there is a better way...)
mysql-test/suite/funcs_1/r/innodb_views.result:
  New error message
mysql-test/suite/funcs_1/r/memory_views.result:
  New error message
mysql-test/suite/rpl/disabled.def:
  rpl_row_create_table should now be safe to use
mysql-test/suite/rpl/r/rpl_row_create_table.result:
  Updated results after adding back disabled test
mysql-test/suite/rpl/t/rpl_create_if_not_exists.test:
  Added comment
mysql-test/suite/rpl/t/rpl_row_create_table.test:
  Added CREATE OR REPLACE TABLE test
mysql-test/t/create.test:
  Added CREATE OR REPLACE TABLE test
mysql-test/t/create_or_replace-master.opt:
  Create logs
mysql-test/t/create_or_replace.test:
  Basic testing of CREATE OR REPLACE TABLE
mysql-test/t/partition_exchange.test:
  Error number changed as we are now using same code for all log table change issues
mysql-test/t/ps_ddl.test:
  Fewer reprepares with new code
sql/handler.h:
  Moved things around a bit in a structure to get better alignment.
  Added HA_LEX_CREATE_REPLACE to mark if we are using CREATE OR REPLACE
  Added 3 elements to end of HA_CREATE_INFO to be able to store state to add backs locks in case of LOCK TABLES.
sql/log.cc:
  Reimplemented check_if_log_table():
  - Simpler and faster usage
  - Can give error messages
  
  This gives us one code path for allmost all error messages if log tables are in use
sql/log.h:
  New interface for check_if_log_table()
sql/slave.cc:
  More logging
sql/sql_alter.cc:
  New interface for check_if_log_table()
sql/sql_base.cc:
  More documentation
  Changed interface for drop_temporary_table() to make it more reusable
  Changed Locked_tables_list::init_locked_tables() to work on the table object instead of the table list object. Before this it used a mix of both, which was not good.
  Locked_tables_list::unlock_locked_tables(THD *thd) now requires a valid thd argument.  Old usage of calling this with 0 i changed to instead call Locked_tables_list::reset()
  Added functions Locked_tables_list:restore_lock() and Locked_tables_list::add_back_last_deleted_lock() to be able to easily add back a locked table to the lock list.
  Check for command number instead of open_strategy of CREATE TABLE was used.
  Added restart_trans_for_tables() to be able to restart a transaction.  This was needed in "create or replace ... select" between the drop table and the select.
sql/sql_base.h:
  Added and updated function prototypes
sql/sql_class.h:
  Added new prototypes to Locked_tables_list class
  Added extra argument to select_create to avoid double call to eof() or send_error()
  - I needed this in some edge case where the table was not created against expections.
sql/sql_db.cc:
  New interface for check_if_log_table()
sql/sql_insert.cc:
  Remember position to lock information so that we can reaquire table lock for LOCK TABLES + CREATE OR REPLACE TABLE SELECT. Later add back the lock by calling restore_lock().
  Removed one not needed indentation level in create_table_from_items()
  Ensure we don't call send_eof() or abort_result_set() twice.
sql/sql_lex.h:
  Removed variable that I temporarly added in an earlier changeset
sql/sql_parse.cc:
  Removed old test code (marked with QQ)
  Ensure that we have open_strategy set as TABLE_LIST::OPEN_STUB in CREATE TABLE
  Removed some IF NOT EXISTS code as this is now handled in create_table_table_impl().
  Set OPTION_KEEP_LOGS later. This code had to be moved as the test for IF EXISTS has changed place.
  DROP_ACL is required if one uses CREATE TABLE OR REPLACE.
sql/sql_partition_admin.cc:
  New interface for check_if_log_table()
sql/sql_rename.cc:
  New interface for check_if_log_table()
sql/sql_table.cc:
  New interface for check_if_log_table()
  Moved some code in mysql_rm_table() under a common test.
  - Safe as temporary tables doesn't have statistics.
  - !is_temporary_table(table) test was moved out from drop_temporary_table() and merged with upper level code.
  - Added drop of normal and temporary tables in create_table_imp() if CREATE OR REPLACE was used.
  - Added reacquiring of table locks in mysql_create_table() and mysql_create_like_table()
  - In mysql_create_like_table(), restore table->open_strategy() if it was changed.
  - Re-test if table was a view after opening it.
sql/sql_table.h:
  New prototype for mysql_create_table_no_lock()
sql/sql_yacc.yy:
  Added syntax for CREATE OR REPLACE TABLE
  Reuse new code for CREATE OR REPLACE VIEW
sql/table.h:
  Added name for enum type
sql/table_cache.cc:
  More DBUG
2014-01-29 15:37:17 +02:00
unknown
4d6ee2d119 MDEV-5363: Make parallel replication waits killable
Make wait_for_prior_commit killable, and handle the error if
killed.
2013-12-05 14:36:09 +01:00
unknown
55a7159f53 MDEV-4982: GTID looses all binlog state after crash if InnoDB is disabled
MDEV-4725: Incorrect binlog state recovery if crash while writing event group

The binlog state was not recovered correctly if XA is not used (eg. InnoDB
disabled), or if server crashed in the middle of writing an event group to the
binlog.

With this patch, we ensure that recovery of binlog state is done even if we do
not do the full XA binlog recovery, and we ensure that we only recover fully
written event groups into the binlog state.
2013-11-21 14:42:25 +01:00
unknown
cb86ce60b9 Merge MDEV-4506: Parallel replication into 10.0-base. 2013-11-01 09:17:06 +01:00
unknown
d107bdaa01 MDEV-4506, parallel replication.
Some after-review fixes.
2013-09-13 15:09:57 +02:00
unknown
ada15c7a0f Fix various places where code would work incorrectly if the common_header_len of events is different on master and slave
Patch developed with the help of Pavel Ivanov.

Also fix an uninitialised variable in queue_event().
2013-09-04 12:22:09 +02:00
unknown
f9c2b402f4 MDEV-26: Global transaction ID.
Implement @@gtid_binlog_state. This is the internal state of the binlog
(most recent GTID logged for every domain_id and server_id). This allows
to save the state before RESET MASTER and restore it afterwards.
2013-08-23 14:02:13 +02:00
unknown
a99356fbe7 MDEV-4506: Parallel replication: intermediate commit.
Fix a bunch of issues found with locking, ordering, and non-thread-safe stuff
in Relay_log_info.

Now able to do a simple benchmark, showing 4.5 times speedup for applying a
binlog with 10000 REPLACE statements.
2013-07-08 16:47:07 +02:00
unknown
e654be3865 MDEV-4506: Parallel replication: Intermediate commit.
Impement options --binlog-commit-wait-count and
--binlog-commit-wait-usec.

These options permit the DBA to deliberately increase latency
of an individual commit to get more transactions in each
binlog group commit. This increases the opportunity for
parallel replication on the slave, and can also decrease I/O
load on the master.

The options also make it easier to test the parallel
replication with mysql-test-run.
2013-07-05 00:26:15 +02:00
unknown
7e5dc4f074 MDEV-4506: Parallel replication. Intermediate commit.
Implement facility for the commit in one thread to wait for the commit of
another to complete first. The wait is done in a way that does not hinder
that a waiter and a waitee can group commit together with a single fsync()
in both binlog and InnoDB. The wait is done efficiently with respect to
locking.

The patch was originally made to support TaoBao parallel replication with
in-order commit; now it will be adapted to also be used for parallel
replication of group-committed transactions.

A waiter THD registers itself with a prior waitee THD. The waiter will then
complete its commit at the earliest in the same group commit of the waitee
(when using binlog). The wait can also be done explicitly by the waitee.
2013-06-26 12:10:35 +02:00
unknown
26a9fbc416 MDEV-4506: Parallel replication of group-committed transactions: Intermediate commit
First very rough sketch. We spawn and retire a pool of slave threads.
Test main.alias works, most likely not much else does.
2013-06-24 10:50:25 +02:00
unknown
ee2b7db3f8 MDEV-4478: Implement GTID "strict mode"
When @@GLOBAL.gtid_strict_mode=1, then certain operations result
in error that would otherwise result in out-of-order binlog files
between servers.

GTID sequence numbers are now allocated independently per domain;
this results in less/no holes in GTID sequences, increasing the
likelyhood that diverging binlogs will be caught by the slave when
GTID strict mode is enabled.
2013-05-28 13:28:31 +02:00
unknown
1cd6eb5f94 MDEV-26: Global transaction ID.
Change of user interface to be more logical and more in line with expectations
to work similar to old-style replication.

User can now explicitly choose in CHANGE MASTER whether binlog position is
taken into account (master_gtid_pos=current_pos) or not (master_gtid_pos=
slave_pos) when slave connects to master.

@@gtid_pos is replaced by three separate variables @@gtid_slave_pos (can
be set by user, replicated GTIDs only), @@gtid_binlog_pos (read only), and
@@gtid_current_pos (a combination of the two, most recent GTID within each
domain). mysql.rpl_slave_state is renamed to mysql.gtid_slave_pos to match.

This fixes MDEV-4474.
2013-05-22 17:36:48 +02:00
unknown
665a31af2b MDEV-26: Global transaction ID. First alpha release.
Merge of 10.0-mdev26 feature tree into 10.0-base.

Global transaction ID is prepended to each event group in the binlog.

Slave connect can request to start from GTID position instead of specifying
file name/offset of master binlog. This facilitates easy switch to a new
master.

Slave GTID state is stored in a table mysql.rpl_slave_state, which can be
InnoDB to get crash-safe slave state.

GTID includes a replication domain ID, allowing to keep track of distinct
positions for each of multiple masters.
2013-04-15 10:55:27 +02:00
unknown
b0389850a5 MDEV-26: Global transaction ID.
Test crashing the master, check that it recovers the binlog state.

Fix one bug introduced by previous commit (crash-recoved binlog state was
overwritten by loading stale binlog state file).

Fix Windows build error.
2013-03-27 19:29:59 +01:00
unknown
0fdbdde474 MDEV-26: Global transaction ID.
Implement test case rpl_gtid_stop_start.test to test normal stop and restart
of master and slave mysqld servers.

Fix a couple bugs found with the test:

 - When InnoDB is disabled (no XA), the binlog state was not read when master
   mysqld starts.

 - Remove old code that puts a bogus D-S-0 into the initial binlog state, it
   is not correct in current design.

 - Fix memory leak in gtid_find_binlog_file().
2013-03-27 16:06:45 +01:00
unknown
22f91eddb1 MDEV-4322: Race in binlog checkpointing during server shutdown.
During server shutdown, we need to wait for binlog checkpointing to
finish in the binlog background thread before closing the binlog.

This was not done, so we could get assert and failure to finish the
final binlog checkpoint if shutdown happened in the middle.
2013-03-25 12:05:27 +01:00
unknown
9d9ddad759 MDEV-26: Global transaction ID.
Fix things so that a master can switch with MASTER_GTID_POS=AUTO to a slave
that was previously running with log_slave_updates=0, by looking into the
slave replication state on the master when the slave requests something not
present in the binlog.

Be a bit more strict about what position the slave can ask for, to avoid some
easy-to-hit misconfiguration errors.

Start over with seq_no counter when RESET MASTER.
2013-03-18 15:09:36 +01:00
unknown
1d35777647 MDEV-26: Global transaction ID.
When starting slave, check binlog state in addition to mysql.rpl_slave.state.

This allows to switch a previous master to be a slave directly
with MASTER_GTID_POS=AUTO.
2013-01-25 15:21:49 +01:00
Igor Babaev
7760efad74 Merge mariadb-5.5 -> 10.0-base. 2012-12-16 16:49:19 -08:00
unknown
40bbf697aa MDEV-532: Async InnoDB commit checkpoint.
Make the commit checkpoint inside InnoDB be asynchroneous.
Implement a background thread in binlog to do the writing and flushing of
binlog checkpoint events to disk.
2012-12-14 15:38:07 +01:00