Commit graph

62 commits

Author SHA1 Message Date
Andrei
8d238d4726 MDEV-28609 refine gtid-strict-mode to ignore same server-id gtid from the past
... on semisync slave

To support semisync master crash recovery, transactions carrying the slave's
own server-id were made acceptable for execution on a semisync slave in
gtid strict mode (see MDEV-27760).
That, however, caused an out-of-order error on the server that originated
the transaction in a circular setup.
The error was justified under the gtid strict mode rule, since in a
circular setup the replicated transaction already exists in the local
binlog.

This commit fixes it: a semisync slave in gtid strict mode now ignores those
gtids that already exist in its own binlog, which effectively restores
the default same-server-id ignore policy.
At the same time the fix still complies with MDEV-21117 semisync slave recovery,
which accepts same-server-id transactions that do not exist in the local binlog.
2022-07-26 16:01:14 +03:00
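
Illustration for MDEV-28609 (not part of the commit): a minimal Python sketch of the decision the message describes, under stated assumptions. The function name, its arguments and the "ignore"/"execute" return values are hypothetical, and the server's real check is more involved than a set lookup.

```python
def handle_own_server_id_gtid(gtid, local_binlog_gtids,
                              semisync_slave, gtid_strict_mode):
    """Decide what to do with a GTID that carries this server's own server-id.

    Hypothetical model: gtid is a (domain_id, server_id, seq_no) tuple and
    local_binlog_gtids is the set of GTIDs already present in the local binlog.
    """
    if not (semisync_slave and gtid_strict_mode):
        # Default policy: events with our own server-id are ignored.
        return "ignore"
    if gtid in local_binlog_gtids:
        # Circular setup: the transaction has come back to its originator.
        return "ignore"
    # Recovery case: the transaction is missing from the local binlog.
    return "execute"
```

In this model only a transaction that is missing from the local binlog is executed; one that has merely travelled around the ring is dropped without an out-of-order error.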
Sergei Golubchik
ef781162ff Merge branch '10.4' into 10.5 2022-05-09 22:04:06 +02:00
Sergei Golubchik
a70a1cf3f4 Merge branch '10.3' into 10.4 2022-05-08 23:03:08 +02:00
Oleksandr Byelkin
9614fde1aa Merge branch '10.2' into 10.3 2022-05-03 10:59:54 +02:00
Andrei
1bcdc3e9eb MDEV-27697 slave must recognize incomplete replication event group
When a faulty master, or an incorrect binlog event producer that the slave is working with,
sends an incomplete group of events, the slave must react with an error so that it does not
write to the relay log any new events that do not belong to the incomplete group.

Fixed by extending the check of received event properties when the slave connects to the
master in gtid mode.
Specifically, an event that can be part of a group is written to the relay log
only when its position within the group has been validated.
Otherwise the slave IO thread stops with ER_SLAVE_RELAY_LOG_WRITE_FAILURE.
2022-04-25 16:00:35 +03:00
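
Illustration for MDEV-27697 (not part of the commit): a toy Python state machine for validating an event's position within a group before it is written to the relay log. The event names, the `IncompleteGroupError` class and the rule set are simplifying assumptions; the real check works on binlog event types and flags.

```python
# Hypothetical, simplified event vocabulary.
GROUP_START = "GTID"
GROUP_END = {"XID", "COMMIT"}
GROUP_MEMBERS = {"QUERY", "TABLE_MAP", "WRITE_ROWS"}


class IncompleteGroupError(Exception):
    """Stands in for stopping the IO thread with an error."""


def relay_log_events(events):
    """Return the events accepted for relay logging, or raise on a bad group."""
    relay_log = []
    in_group = False
    for event in events:
        if event == GROUP_START:
            if in_group:
                # The previous group never terminated.
                raise IncompleteGroupError("new group started inside an open group")
            in_group = True
        elif event in GROUP_END or event in GROUP_MEMBERS:
            if not in_group:
                raise IncompleteGroupError(f"{event} received outside of a group")
            if event in GROUP_END:
                in_group = False
        relay_log.append(event)
    return relay_log
```

The point of the sketch is that the error is raised before the offending event is appended, so nothing that follows an incomplete group reaches the relay log.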
Marko Mäkelä
3b25083785 Merge 10.4 into 10.5 2020-03-23 10:50:14 +02:00
Sergey Vojtovich
a39d92ca57 gtid_pos_table: my_atomic to std::atomic 2020-03-21 17:36:38 +04:00
Sergey Vojtovich
4d9977e5ff default_gtid_pos_table: my_atomic to std::atomic 2020-03-21 15:55:00 +04:00
Sergei Golubchik
2ac3121af2 perfschema - various collateral cleanups and small changes 2020-03-10 19:24:22 +01:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Kristian Nielsen
34f11b06e6 Move deletion of old GTID rows to slave background thread
This patch changes how old rows in mysql.gtid_slave_pos* tables are deleted.
Instead of doing it as part of every replicated transaction in
record_gtid(), it is done periodically (every @@gtid_cleanup_batch_size
transactions) in the slave background thread.

This removes the deletion step from the replication process in SQL or worker
threads, which could speed up replication with many small transactions. It
also decreases contention on the global mutex LOCK_slave_state. And it
simplifies the logic, eg. when a replicated transaction fails after having
deleted old rows.

With this patch, the deletion of old GTID rows happens asynchronously and
slightly non-deterministically. Thus the number of old rows in
mysql.gtid_slave_pos can temporarily exceed @@gtid_cleanup_batch_size. But
all old rows will be deleted eventually after sufficiently many new GTIDs
have been replicated.
2018-12-07 07:10:40 +01:00
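
Illustration for the batched cleanup above (not part of the commit): a minimal Python model in which recording a GTID only counts transactions, and a separate background step deletes old rows. The class and method names are hypothetical, and "old" is assumed to mean any row that does not hold the highest sub_id of its domain.

```python
class GtidPosTable:
    """Toy model of a gtid_slave_pos table with batched cleanup."""

    def __init__(self, cleanup_batch_size):
        self.rows = []  # (domain_id, sub_id, server_id, seq_no)
        self.cleanup_batch_size = cleanup_batch_size
        self.since_cleanup = 0
        self.cleanup_requested = False

    def record_gtid(self, domain_id, sub_id, server_id, seq_no):
        # The replication thread only inserts; it never deletes.
        self.rows.append((domain_id, sub_id, server_id, seq_no))
        self.since_cleanup += 1
        if self.since_cleanup >= self.cleanup_batch_size:
            self.since_cleanup = 0
            self.cleanup_requested = True

    def background_cleanup(self):
        """Run by the background thread; returns the number of rows deleted."""
        if not self.cleanup_requested:
            return 0
        self.cleanup_requested = False
        newest = {}
        for domain_id, sub_id, _, _ in self.rows:
            newest[domain_id] = max(newest.get(domain_id, sub_id), sub_id)
        before = len(self.rows)
        self.rows = [r for r in self.rows if r[1] == newest[r[0]]]
        return before - len(self.rows)
```

Between two cleanup runs the table can hold up to a batch of old rows, or more if the background step is delayed, which matches the temporary overshoot the message mentions.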
Marko Mäkelä
32062cc61c Merge 10.1 into 10.2 2018-11-06 08:41:48 +02:00
Kristian Nielsen
3eb2c46644 Merge branch 'gtid_table_garbage_rows' into gtid_table_garbage_rows_10.3 2018-10-07 23:40:32 +02:00
Kristian Nielsen
2f4a0c5be2 Fix accumulation of old rows in mysql.gtid_slave_pos
This would happen especially in optimistic parallel replication, where there
is a good chance that a transaction will be rolled back (due to conflicts)
after it has executed record_gtid(). If the transaction did any deletions of
old rows as part of record_gtid(), those deletions will be undone as well.
And the code did not properly ensure that the deletions would be re-tried.

This patch makes record_gtid() remember the list of deletions done as part
of a transaction. Then in rpl_slave_state::update() when the changes have
been committed, we discard the list. However, in case of error and rollback,
in cleanup_context() we will instead put the list back into
rpl_global_gtid_slave_state so that the deletions will be re-tried later.

Probably fixes part of the cause of MDEV-12147 as well.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2018-10-07 18:59:52 +02:00
Monty
a7e352b54d Changed database, tablename and alias to be LEX_CSTRING
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db

Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
  for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because process list_db wasn't always
  correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
  handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed unused version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
  NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
2018-01-30 21:33:55 +02:00
Marko Mäkelä
7cb3520c06 Merge bb-10.2-ext into 10.3 2017-11-30 08:16:37 +02:00
Alexander Barkov
5b697c5a23 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext 2017-11-29 12:06:48 +04:00
Sergei Golubchik
7f1900705b Merge branch '10.1' into 10.2 2017-11-21 19:47:46 +01:00
Andrei Elkin
aae4932775 MDEV-12012/MDEV-11969 Can't remove GTIDs for a stale GTID Domain ID
As reported in MDEV-11969 "there's no way to ditch knowledge" about some
domain that is no longer updated on a server. Besides the annoyance of
cluttering output in the DBA console, stale domains can prevent a slave
from connecting to the master, as MDEV-12012 shows.
Which domains are obsolete must be decided by the user (DBA), according
to whether the domain info is still relevant and whether the domain will
ever receive another update.

This patch introduces a method to discard obsolete gtid domains from
the server binlog state. The removal requires, though, that no event group
from such a domain is present in the existing binlog files. If there are any,
the logs containing them must first be PURGEd in order for

  FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

to succeed. Otherwise the command returns an error.

The list of obsolete domains can be computed through
intersecting two sets - the earliest (first) binlog's Gtid_list
and the current value of @@global.gtid_binlog_state - and extracting
the domain id components from the intersection list items.
FLUSH with the new DELETE_DOMAIN_ID option still rotates the binlog,
omitting the deleted domains from the new active binlog file's Gtid_list.
Note, though, that when the command is ineffective, that is, when none of the
domains requested for deletion exists in the binlog state, rotation does not occur.

Obsolete domain deletion is not harmful for connected slaves as long
as the *purge* of master-side binlog files is synchronized with FLUSH-DELETE_DOMAIN_ID.
The slaves must have processed the last event from the purged files as usual,
so that they do not later request a gtid from a file which
is already gone.
Although the command, like ordinary FLUSH BINARY LOGS, is not replicated,
slaves that still have the extra domains won't suffer from reconnection errors,
because the master-slave gtid connection protocol allows the master
to be ignorant of a gtid domain.
If such a slave is promoted to the master role at failover, it may run
the ex-master's

 FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

to clean its own binlog state.

NOTES.
  suite/perfschema/r/start_server_low_digest.result
is re-recorded as a consequence of changes to internal parser codes.
2017-11-15 22:26:32 +02:00
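
Illustration for DELETE_DOMAIN_ID (not part of the commit): a minimal Python sketch of the intersection described above, assuming both inputs are comma-separated GTID lists in domain-server-sequence form. The function names are hypothetical, and the result is only a list of candidates; whether a domain is really obsolete remains the DBA's decision.

```python
def parse_gtid_list(text):
    """Parse 'D-S-N,D-S-N' into a set of (domain_id, server_id, seq_no)."""
    gtids = set()
    for item in text.split(","):
        item = item.strip()
        if item:
            domain_id, server_id, seq_no = (int(p) for p in item.split("-"))
            gtids.add((domain_id, server_id, seq_no))
    return gtids


def obsolete_domain_candidates(first_binlog_gtid_list, gtid_binlog_state):
    """Domains whose GTID is identical in the oldest binlog and the current state."""
    unchanged = (parse_gtid_list(first_binlog_gtid_list)
                 & parse_gtid_list(gtid_binlog_state))
    return sorted({domain_id for domain_id, _, _ in unchanged})


# Domain 0 advanced from 100 to 250, domain 1 did not move.
print(obsolete_domain_candidates("0-1-100,1-2-50", "0-1-250,1-2-50"))
```

A GTID that is the same in both places means that nothing in that domain was logged since the oldest binlog file started, which is the precondition the command checks.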
Kristian Nielsen
c36620ddc3 MDEV-12179 post-merge fixes.
Fix LEX_STRING -> LEX_CSTRING issues.
2017-07-03 10:36:09 +02:00
Kristian Nielsen
1d91910b94 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Merge into MariaDB 10.3.
2017-07-03 09:33:41 +02:00
Kristian Nielsen
0db2cd7c76 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Intermediate commit.

Fix compilation failure with different my_atomic implementation.

The my_atomic_loadptr* functions take void ** as first argument, so variables
updated with them need to be void * (it is not legal C to cast
some_type ** to void **).
2017-05-10 09:56:31 +02:00
Monty
5a759d31f7 Changing field::field_name and Item::name to LEX_CSTRING
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by parser are now const strings, less chance of
  accidentally changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
  access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)

Changes:
- This caused some ABI changes
  - lex_string_set now uses LEX_CSTRING
  - Some functions are now taking const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
  errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
  parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
  code
- Added item_empty_name and item_used_name to be able to distinguish between
  items that were given an empty name and items that were not given a name.
  This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*" is no longer the same as table_name.*
- Removed unused function Item::rename()
- Added comparison of item->name_length before some calls to
  my_strcasecmp() to speed up comparison
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
  set when it shouldn't
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
  give the error.

TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
  (as part of lower_case_table_names)
2017-04-23 22:35:46 +03:00
Kristian Nielsen
00eebb2243 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Intermediate commit.

Fix incorrect assertion. The hton in the list of pending GTIDs can be
NULL, in the special case where we failed to load the
mysql.gtid_slave_pos table at server startup, but nevertheless allow
non-GTID replication to proceed.
2017-04-21 10:30:17 +02:00
Kristian Nielsen
fdf2d40770 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Intermediate commit.

Implement auto-creation of mysql.gtid_slave_pos* tables with needed engines,
if listed in --gtid-pos-auto-engines.

Uses an asynchronous approach to minimise locking overhead.

The list of available tables is extended with a flag. Extra entries are
added for --gtid-pos-auto-engines tables that do not exist yet, marked as
not existing but ready for auto-creation.

If record_gtid() needs a table marked for auto-creation, it sends a request
to the slave background thread to create the table, and continues to use an
existing table for the current and immediately coming transactions.

As soon as the slave background thread has made the new table available, it
will be used for all subsequent relevant transactions in record_gtid().

This asynchronous approach also avoids a lot of complex issues around trying
to do DDL in the middle of an on-going transaction.
2017-04-21 10:30:16 +02:00
Kristian Nielsen
6a84473c28 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Intermediate commit.

This commit makes record_gtid() select a gtid_slave_posXXX table
with a storage engine already in use by the current transaction, if any.

The default table mysql.gtid_slave_pos is used if no match can be found on
storage engine, or for GTID position updates with no specific storage
engine.

Table discovery of mysql.gtid_slave_pos* happens on initial GTID state load
as well as on every START SLAVE. Some effort is made to make this possible
without additional locking. New tables are added using lock-free atomics.
Removing tables requires stopping all slaves first. A warning is given in
the error log when a table is removed but a non-stopped slave still has a
reference to it.

If multiple mysql.gtid_slave_posXXX tables with same storage engine exist,
one is chosen arbitrarily to be used, with a warning in the error log. GTID
data from all tables is still read, but only one among redundant tables with
same storage engine will be updated.
2017-04-21 10:30:14 +02:00
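
Illustration for the per-engine table selection above (not part of the commit): a minimal Python sketch, assuming the discovered tables are kept in a mapping from storage engine name to table name. The function name, the mapping and the example table names are hypothetical.

```python
DEFAULT_TABLE = "mysql.gtid_slave_pos"


def select_gtid_pos_table(tables_by_engine, transaction_engines):
    """Pick a table whose engine the current transaction already uses."""
    for engine in transaction_engines:
        if engine in tables_by_engine:
            return tables_by_engine[engine]
    # No engine match, or a position update with no specific engine.
    return DEFAULT_TABLE


# Hypothetical discovery result: one table per engine.
tables = {
    "InnoDB": "mysql.gtid_slave_pos",
    "RocksDB": "mysql.gtid_slave_pos_rocksdb",
}
print(select_gtid_pos_table(tables, ["RocksDB"]))
print(select_gtid_pos_table(tables, []))
```

Choosing a table in an engine that the transaction already touches presumably keeps the position update inside that same engine's transaction, while the fallback preserves the behaviour from before the patch.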
Kristian Nielsen
c995ecbe98 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Intermediate commit.

For each GTID recorded in mysql.gtid_slave_pos, keep track of which
engine the update was made in.

This will be later used to know which rows can be deleted in the table
of a given engine.
2017-04-21 08:00:06 +02:00
Kristian Nielsen
087cf02328 MDEV-12179: Per-engine mysql.gtid_slave_pos table
Intermediate commit.

Keep track of which mysql.gtid_slave_posXXX tables are available for each
engine, by searching for all tables in the mysql schema with names that
start with "gtid_slave_pos".

The list is computed at server start when the GTID position is loaded, and
it is re-computed on every START SLAVE command. This way, the DBA can
manually add a table for a new engine, and it will be automatically picked
up on next START SLAVE, so a full server restart is not needed.

The list is not yet actually used in the code.
2017-04-21 08:00:06 +02:00
Michael Widenius
d82ac8eaaf Change "static int" to enum in classes
This was done when static int were used as bit fields or enums
2017-04-18 12:23:40 +03:00
Monty
636bb59034 Final fixes for Memory_used
- Change some static variables to dynamic to ensure that we don't do any memory
  allocations before server starts or stops
- Print more memory information on SIGHUP. Fixed output.
- Write out if memory was lost if run with --debug-at-exit
- Fixed wrong #ifdef in sql_cache.cc
2016-04-28 17:15:38 +03:00
Sergei Golubchik
a2bcee626d Merge branch '10.0' into 10.1 2015-12-21 21:24:22 +01:00
Monty
c3018b0ff4 Fixes to get all tests to run on MacosX Lion 10.7
This includes fixing all utilities to not have any memory leaks,
as safemalloc warnings stopped tests from passing on MacOSX.

- Ensure that all clients take character-set-dir, as the
  libmysqlclient library will use it.
- mysql-test-run now passes character-set-dir to all external clients.
- Changed dynstr_free() so that it can be called twice (made freeing code easier)
- Changed rpl_global_gtid_slave_state to be allocated dynamically as it
  includes a mutex that needs to be initialized/destroyed before my_end() is called.
- Removed rpl_slave_state::init() and rpl_slave_state::deinit() as
  their jobs are better handled by constructor and delete.
- Print alias instead of table_name in check_duplicate_key as
  table_name may have been converted to lower case.

Other things:
- Fixed a case in time_to_datetime_with_warn() where we were
  using && instead of & in tests
2015-11-29 17:51:23 +02:00
Kristian Nielsen
95d7208859 Merge MDEV-6589 and MDEV-6403 into 10.1.
Conflicts:
	sql/log.cc
	sql/rpl_rli.cc
	sql/sql_repl.cc
2015-03-04 13:49:37 +01:00
Kristian Nielsen
78c74dbe30 MDEV-6403: Temporary tables lost at STOP SLAVE in GTID mode if master has not rotated binlog since restart
The binlog contains specially marked format description events to mark
when a master restart happened (which could have caused temporary
tables to be silently dropped). Such events also cause slave to close
temporary tables.

However, there was a bug that if after this, slave re-connects to the
master in GTID mode, the master can send an old format description
event again. If temporary tables are closed when such event is seen
for the second time, it might drop temporary tables created after that
event, and cause replication failure.

With this patch, the restart flag of the format description event is
cleared by the master when it is sent to the slave in a subsequent
connection, to avoid the erroneous temp table close.
2015-03-04 13:36:29 +01:00
Kristian Nielsen
ad0d203f2e MDEV-6589: Incorrect relay log start position when restarting SQL thread after error in parallel replication
The problem occurs in parallel replication in GTID mode, when we are using
multiple replication domains. In this case, if the SQL thread stops, the
slave GTID position may refer to a different point in the relay log for each
domain.

The bug was that when the SQL thread was stopped and restarted (but the IO
thread was kept running), the SQL thread would resume applying the relay log
from the point of the most advanced replication domain, silently skipping all
earlier events within other domains. This caused replication corruption.

This patch solves the problem by storing, when the SQL thread stops with
multiple parallel replication domains active, the current GTID
position. Additionally, the current position in the relay logs is moved back
to a point known to be earlier than the current position of any replication
domain. Then when the SQL thread restarts from the earlier position, GTIDs
encountered are compared against the stored GTID position. Any GTID that was
already applied before the stop is skipped to avoid duplicate apply.

This patch should have no effect if multi-domain GTID parallel replication is
not used. Similarly, if both SQL and IO thread are stopped and restarted, the
patch has no effect, as in this case the existing relay logs are removed and
re-fetched from the master at the current global @@gtid_slave_pos.
2015-03-04 13:36:04 +01:00
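
Illustration for MDEV-6589 (not part of the commit): a minimal Python sketch of replaying the relay log from an earlier point while skipping GTIDs that were applied before the stop. It assumes the stored position maps each domain to the last applied sequence number and that a plain sequence-number comparison is enough; the function name and data shapes are hypothetical.

```python
def gtids_to_apply(relay_log_gtids, stored_pos):
    """Filter a replayed relay log against the GTID position saved at stop.

    relay_log_gtids: (domain_id, server_id, seq_no) tuples in relay log order.
    stored_pos: {domain_id: last seq_no applied before the SQL thread stopped}.
    """
    to_apply = []
    for domain_id, server_id, seq_no in relay_log_gtids:
        if domain_id in stored_pos and seq_no <= stored_pos[domain_id]:
            # Already applied before the stop; skip to avoid a duplicate apply.
            continue
        to_apply.append((domain_id, server_id, seq_no))
    return to_apply


# Domain 0 had reached 6 and domain 1 had reached 8 when the thread stopped.
relay_log = [(0, 1, 5), (1, 2, 8), (0, 1, 6), (1, 2, 9)]
print(gtids_to_apply(relay_log, {0: 6, 1: 8}))
```

Restarting from the position of the least advanced domain and filtering per domain is what avoids both of the failure modes the message describes: skipped events and duplicate applies.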
Nirbhay Choubey
75a27eeaf7 MDEV-4987: Sort by domain_id when list of GTIDs are output
Added logic to sort the GTID list by domain_id before
writing it to a string. Added a test case.
2015-02-27 23:33:22 -05:00
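
Illustration for MDEV-4987 (not part of the commit): a short Python sketch of sorting a GTID list by domain_id before formatting it. The function name is hypothetical, and keeping the input order within a domain is an assumption of this sketch, not something the message states.

```python
def format_gtid_list(gtids):
    """Format (domain_id, server_id, seq_no) tuples, ordered by domain_id."""
    # sorted() is stable, so entries of the same domain keep their input order.
    ordered = sorted(gtids, key=lambda gtid: gtid[0])
    return ",".join(f"{d}-{s}-{n}" for d, s, n in ordered)


print(format_gtid_list([(2, 1, 5), (0, 3, 7), (1, 1, 1)]))
```
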
Nirbhay Choubey
a50ddebb5c MDEV-6593 : domain_id based replication filters
Implementation for domain ID based filtering of replication events.
2014-12-03 22:30:48 -05:00
unknown
8b9b7ec395 MDEV-5804: If same GTID is received on multiple master connections in multi-source replication, the event is double-executed causing corruption or replication failure
Some fixes, mainly to make it work in non-parallel replication mode also
(--slave-parallel-threads=0).

Patch should be fairly complete now.
2014-03-12 00:14:49 +01:00
unknown
2c2478b822 MDEV-5804: If same GTID is received on multiple master connections in multi-source replication, the event is double-executed causing corruption or replication failure
Before, the arrival of same GTID twice in multi-source replication
would cause double-apply or in gtid strict mode an error.

Keep the behaviour, but add an option --gtid-ignore-duplicates which
allows duplicates to be handled correctly, ignoring all but the first.
This relies on the user ensuring correct configuration so that
sequence numbers are strictly increasing within each replication
domain; then duplicates can be detected simply by comparing the
sequence numbers against what is already applied.

Only one master connection (but possibly multiple parallel worker
threads within that connection) is allowed to apply events within
one replication domain at a time; any other connection that
receives a GTID in the same domain either discards it (if it is
already applied) or waits for the other connection to not have
any events to apply.

Intermediate patch, as proof-of-concept for testing. The main limitation
is that currently it is only implemented for parallel replication,
@@slave_parallel_threads > 0.
2014-03-09 10:27:38 +01:00
unknown
76e929a92e MDEV-4984: Implement MASTER_GTID_WAIT() and @@LAST_GTID.
Rewrite the gtid_waiting::wait_for_gtid() function.
The code was rubbish (and buggy). Now the logic is
much clearer.

Also fix a missing slave sync that could cause test failure.
2014-02-08 22:28:41 +01:00
unknown
4e6606acad MDEV-4984: Implement MASTER_GTID_WAIT() and @@LAST_GTID.
MASTER_GTID_WAIT() is similar to MASTER_POS_WAIT(), but works with a
GTID position rather than an old-style filename/offset.

@@LAST_GTID gives the GTID assigned to the last transaction written
into the binlog.

Together, the two can be used by applications to obtain the GTID of
an update on the master, and then do a MASTER_GTID_WAIT() for that
position on any read slave where it is important to get results that
are caught up with the master at least to the point of the update.

MASTER_GTID_WAIT() is implemented in a way
that tries to minimise the performance impact on the SQL threads,
even in the presence of many waiters on single GTID positions (as
from @@LAST_GTID).
2014-02-07 19:15:28 +01:00
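
Illustration for MASTER_GTID_WAIT() (not part of the commit): a minimal Python sketch of the condition such a wait has to evaluate, namely whether the slave has caught up with a given GTID position in every domain it names. The function names are hypothetical, positions are assumed to be comma-separated domain-server-sequence strings, and the blocking and wakeup logic of the real function is not modelled.

```python
def parse_gtid_pos(text):
    """Parse 'D-S-N,D-S-N' into {domain_id: seq_no}."""
    pos = {}
    for item in text.split(","):
        item = item.strip()
        if item:
            domain_id, _server_id, seq_no = (int(p) for p in item.split("-"))
            pos[domain_id] = seq_no
    return pos


def position_reached(slave_pos, wait_pos):
    """True if the slave has applied at least wait_pos in every listed domain."""
    applied = parse_gtid_pos(slave_pos)
    wanted = parse_gtid_pos(wait_pos)
    return all(d in applied and applied[d] >= n for d, n in wanted.items())


# An application read @@LAST_GTID = 0-1-99 on the master after its update.
print(position_reached("0-1-100,1-2-50", "0-1-99"))
print(position_reached("0-1-98", "0-1-99"))
```

An application would take the GTID of its own update from the master and wait on a read slave until this condition holds before running the query that must see the update.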
unknown
170e9e593d MDEV-5306: Missing locking around rpl_global_gtid_binlog_state
There were some places where insufficient locking between
parallel threads could cause invalid memory accesses and
possibly other grief.

This patch adds the missing locking, and moves the locking
into the struct rpl_binlog_state methods to make it easier
to see that proper locking is in place everywhere.
2013-11-18 15:22:50 +01:00
unknown
cb86ce60b9 Merge MDEV-4506: Parallel replication into 10.0-base. 2013-11-01 09:17:06 +01:00
unknown
6a38b59475 MDEV-5189: Incorrect parallel apply in parallel replication
Two problems were fixed:

1. When not in GTID mode (master_use_gtid=no), then we must not apply events
   in different domains in parallel (in non-GTID mode we are not capable of
   restarting at different points in different domains).

2. When transactions B and C group commit together, but after and separate
   from A, we can apply B and C in parallel, but both B and C must not start
   until A has committed. Fix sub_id to be globally increasing (not just
   per-domain increasing) so that this wait (which is based on sub_id) can be
   done correctly.
2013-10-25 21:17:14 +02:00
unknown
f9c2b402f4 MDEV-26: Global transaction ID.
Implement @@gtid_binlog_state. This is the internal state of the binlog
(most recent GTID logged for every domain_id and server_id). This allows
the state to be saved before RESET MASTER and restored afterwards.
2013-08-23 14:02:13 +02:00
unknown
f74c745a99 MDEV-4488: When master is on the list of ignore_server_ids, GTID position on slave is not updated
The ignored events are not written to the relay log, but instead a fake
Rotate event is generated to handle update of position.

Extend this for Gtid so we similarly generate a fake Gtid_list event
to update the GTID position.

Also fix an unrelated test issue that got triggered by the added test cases.
2013-08-22 12:36:42 +02:00
unknown
f0deff867a MDEV-4820: Empty master does not give error for slave GTID position that does not exist in the binlog
The main bug here was the following situation:

Suppose we set up a completely new master2 as an extra multi-master to an
existing slave that already has a different master1 for domain_id=0. When the
slave tries to connect to master2, master2 will not have anything that slave
requests in domain_id=0, but that is fine as master2 is supposedly meant to
serve eg. domain_id=1. (This is MDEV-4485).

But suppose that master2 then actually starts sending events from
domain_id=0. In this case, the fix for MDEV-4485 was incomplete, and the code
would fail to give the error that the position requested by the slave in
domain_id=0 was missing from the binlogs of master2. This could lead to lost
events or completely wrong replication.

The patch for this bug fixes this issue.

In addition, it cleans up the code a bit, getting rid of the fake_gtid_hash in
the code. And the error message when slave and master have diverged due to
alternate future is clarified, as requested in the bug description.
2013-08-16 15:10:25 +02:00