... on semisync slave
To provide semisync master crash-recovery the same server-id transactions
were made to accept for execution on the semisync slave when the strict gtid
mode (see MDEV-27760).
That however caused out-of-order error on a master's transaction
server of the circular setup.
The error was fair in the sense of the gtid strict mode rule as indeed
under the condition of the circular setup the replicated transaction
already exists in the local binlog.
This is fixed by the commit to ignore on the gtid strict mode semisync
slave those gtids that exist in the slave's binlog that effectively restores
the default same-server-id ignore policy.
At the same time the fixes complies with MDEV-21117 semisync slave recovery
to accept the same server-id transactions that do not exist in local binlog.
New Feature:
============
Extend mariadb-binlog command-line tool to allow for filtering
events using GTID domain and server ids. The functionality mimics
that of a replica server’s DO_DOMAIN_IDS, IGNORE_DOMAIN_IDS, and
IGNORE_SERVER_IDS from CHANGE MASTER TO. For completeness, this
patch additionally adds the option --do-server-ids as an alias for
--server-id, which now accepts a list of server ids instead of a
single one.
Example usage:
mariadb-binlog --do-domain-ids=2,3,4 --do-server-ids=1,3
master-bin.000001
Functional Notes:
1. --do-domain-ids cannot be combined with --ignore-domain-ids
2. --do-server-ids cannot be combined with --ignore-server-ids
3. A domain id filter can be combined with a server id filter
4. When any new filter options are combined with the
--gtid-strict-mode option, events from excluded domains/servers are
not validated.
5. Domain/server id filters can be combined with GTID ranges (i.e.
specifications of --start-position and --stop-position). However,
because the --stop-position option implicitly undertakes filtering
to only output events within its range of domains, when combined
with --do-domain-ids or --ignore-domain-ids, output will consist of
the intersection between the filters. Specifically, with
--do-domain-ids and --stop-position, only events with domain ids
present in both argument lists will be output. Conversely, with
--ignore-domain-ids and --stop-position, only events with domain ids
present in the --stop-position and absent from the
--ignore-domain-ids options will be output.
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
This patch fixes two issues:
First, it fixes test failure due to GTID List events
having inconsistent ordering of domain ids. In
particular, this patch ensures that a GTID list log
event will have its GTIDs ordered by domain id
(ascending) followed by sequence number (ascending).
Second, it fixes an assert which could use an
unintialized variable.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
This commit implements two phase binloggable ALTER.
When a new
@@session.binlog_alter_two_phase = YES
ALTER query gets logged in two parts, the START ALTER and the COMMIT
or ROLLBACK ALTER. START Alter is written in binlog as soon as
necessary locks have been acquired for the table. The timing is
such that any concurrent DML:s that update the same table are either
committed, thus logged into binary log having done work on the old
version of the table, or will be queued for execution on its new
version.
The "COMPLETE" COMMIT or ROLLBACK ALTER are written at the very point
of a normal "single-piece" ALTER that is after the most of
the query work is done. When its result is positive COMMIT ALTER is
written, otherwise ROLLBACK ALTER is written with specific error
happened after START ALTER phase.
Replication of two-phase binloggable ALTER is
cross-version safe. Specifically the OLD slave merely does not
recognized the start alter part, still being able to process and
memorize its gtid.
Two phase logged ALTER is read from binlog by mysqlbinlog to produce
BINLOG 'string', where 'string' contains base64 encoded
Query_log_event containing either the start part of ALTER, or a
completion part. The Query details can be displayed with `-v` flag,
similarly to ROW format events. Notice, mysqlbinlog output containing
parts of two-phase binloggable ALTER is processable correctly only by
binlog_alter_two_phase server.
@@log_warnings > 2 can reveal details of binlogging and slave side
processing of the ALTER parts.
The current commit also carries fixes to the following list of
reported bugs:
MDEV-27511, MDEV-27471, MDEV-27349, MDEV-27628, MDEV-27528.
Thanks to all people involved into early discussion of the feature
including Kristian Nielsen, those who helped to design, implement and
test: Sergei Golubchik, Andrei Elkin who took the burden of the
implemenation completion, Sujatha Sivakumar, Brandon
Nesterenko, Alice Sherepa, Ramesh Sivaraman, Jan Lindstrom.
New Feature:
===========
This commit extends the mariadb-binlog capabilities to allow events
to be filtered by GTID ranges. More specifically, the
--start-position and --stop-position arguments have been extended to
accept values formatted as a list of GTID positions, e.g.
--start-position=0-1-0,1-2-55. The following specific capabilities
are addressed:
1) GTIDs can be used to filter results on local binlog files
2) GTIDs can be used to filter results from remote servers
3) Implemented --gtid-strict-mode that ensures the GTID event
stream in each domain is monotonically increasing
4) Added new level of verbosity in mysqlbinlog -vvv to print
additional diagnostic information/warnings about invalid GTID
states
5) For a given GTID range, its start and stop position parameters
aim to mimic the behaviors of
CHANGE MASTER TO MASTER_USE_GTID=slave_pos and
START SLAVE UNTIL master_gtid_pos=<GTID>, respectively. In
particular, the start-position list expresses a gtid state of
the server, similarly to how @@global.gtid_slave_pos expresses
the gtid state of a slave server when connecting to a master
with MASTER_USE_GTID=slave_pos.
The GTID start-position list is exclusive and the
stop-position list is inclusive. This allows users to receive
events strictly after those that they already have, and is
useful in cases of point in (logical) time recovery including
1) events were received out of order and should be re-sent, or
2) specifying the gtid state of a slave to get events newer
than their current state. If a seq_no is 0 for start-position,
it means to include the entirety of the domain. If a seq_no is
0 for stop-position, it means to exclude all events from that
domain. The GTIDs provided in a start position argument must
match with the GTID state of the first processed log (i.e.
those listed in the Gtid_list event). If a stop position is
provided, the events that are output are limited to only those
with domain ids listed in the argument. When specifying
combinations of start and stop positions, the following
behaviors are expected:
[--start-position without --stop-position]: Events that have domain
ids in the start position are output if their seq_no occurs after
the respective start position. Events with domain ids that are
unspecified in the start position list are also output. Note that if
the Gtid_list event of the first binary log is populated (i.e.
non-empty), each domain in the Gtid_list must be present in the
start-position list with a seq_no at or after the listed value.
This behavior mimics how a slave only processes events after the
state provided by @@global.gtid_slave_pos when connecting to a
master with CHANGE MASTER TO MASTER_USE_GTID=slave_pos.
[--stop-position without --start-position]: Output is limited to
only events with both 1) domain ids that are present in the given
stop position list and 2) seq_nos that are less than or equal to
their respective stop GTID. Once all GTIDs in the stop position
list have been processed, the program will stop processing log
files. This behavior mimics how
START SLAVE UNTIL master_gtid_pos=<G>
has a slave only process events with domain ids present in G with
their seq_nos at or before the respective gtid.
[--start-position and --stop-position]: Output consists of the
intersection between the events permitted by both the start and stop
position rules. More concretely, the output can be defined by a
union of the following rules:
1. For domains which exist in both the start and stop position
lists, the events which exist in-between these positions
(exclusive start, inclusive stop) are output
2. For all other events, the rules of
[--stop-position without --start-position] are followed
This is due to the implicit filtering within each individual rule.
Even though the start position rule always includes events from
unspecified domains, the stop position rule takes precedence because
it always excludes events from unspecified domains. In other words,
events which the start position rule would have included would then
always be excluded by the stop position rule.
[neither --start-position nor --stop-position]: Events are not
omitted based on GTID positioning; however, --gtid-strict-mode and
-vvv can still analyze gtid correctness for warning and error
reporting.
[repeated specification of --start-position or --stop-position]:
Subsequent specifications of start and stop positions completely
override previous ones. E.g., if invoked as
mysqlbinlog --start-position=<G1> --start-position=<G2> ...
All GTIDs specified in G1 are ignored and only those specified in G2
are used for the start position.
A few additional notes:
1) this commit squashes together the commits:
f4319661120e-78a9d49907ba
2) Changed rpl.rpl_blackhole_row_annotate test because it has
out of order GTIDs in its binlog, so I added
--skip-gtid-strict-mode
3) After all binlog events have been written, the session server
id and domain id are reset to their values in the global state
Reviewed By:
===========
Andrei Elkin: <andrei.elkin@mariadb.com>
Per https://bugs.gentoo.org/807995
The test failed with:
CURRENT_TEST: binlog.binlog_flush_binlogs_delete_domain
— /tmp/mariadb-10.5.11/mysql-test/suite/binlog/r/binlog_flush_binlogs_delete_domain.result 2021-06-18 18:19:11.000000000 +0800
+++ /tmp/mariadb-10.5.11/mysql-test/suite/binlog/r/binlog_flush_binlogs_delete_domain.reject 2021-09-01 22:55:29.406655479 +0800
@@ -85,6 +85,6 @@
ERROR HY000: The value of gtid domain being deleted ('4294967296') exceeds its maximum size of 32 bit unsigned integer
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (4294967295);
Warnings:
-Warning 1076 The gtid domain being deleted ('4294967295') is not in the current binlog state
+Warning 1076 The gtid domain being deleted ('18446744073709551615') is not in the current binlog state
DROP TABLE t;
RESET MASTER;
mysqltest: Result length mismatch
ptr_domain_id is a uint32* so explicitly cast this when printing it out.
Thanks Marek Szuba for the bug report and testing the patch.
Changes:
- To detect automatic strlen() I removed the methods in String that
uses 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usage of append(const char*) is changed to either use
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most usage of the above functions never tested the
result for -1 and would have given wrong results or crashes in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- Changed for some functions the name argument from const char * to
to const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removes String_space() as it was only used in one cases and that
could be simplified to not use String_space(), thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Rename of protocol function:
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done to both clarify the difference between this 'store' function
and also to make it easier to find unoptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now used LEX_CSTRING for name.
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtain but was not used.
- Removed some not needed 'virtual' definition for functions that was
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate needed buffer. Old code could
case memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
The reason for the failure is that
thd->mdl_context.release_transactional_locks()
was called after commit & rollback even in cases where the current
transaction is still active.
For 10.2, 10.3 and 10.4 the fix is simple:
- Replace all calls to thd->mdl_context.release_transactional_locks() with
thd->release_transactional_locks(). The thd function will only call
the mdl_context function if there are no active transactional locks.
In 10.6 we will better fix where we will change the return value for
some trans_xxx() functions to indicate if transaction did close the
transaction or not. This will avoid the need of the indirect call.
Other things:
- trans_xa_commit() and trans_xa_rollback() will automatically
call release_transactional_locks() if the transaction is closed.
- We can't do that for the other functions as the caller of many of these
are doing additional work (like close_thread_tables) before calling
release_transactional_locks().
- Added missing abort_result_set() and missing DBUG_RETURN in
select_create::send_eof()
- Fixed wrong indentation in injector::transaction::commit()
All changes (except one) is of type
thd->transaction. -> thd->transaction->
thd->transaction points by default to 'thd->default_transaction'
This allows us to 'easily' have multiple active transactions for a
THD object, like when reading data from the mysql.proc table
TDC_RT_REMOVE_ALL -> tdc_remove_table(). Some occurrences replaced with
TDC_element::flush() (whenver TABLE_SHARE is available).
TDC_RT_REMOVE_NOT_OWN[_KEEP_SHARE] -> TDC_element::flush(). These modes
assume that current thread owns TABLE_SHARE reference, which means we can
avoid hash lookup and flush unused TABLE instances directly.
TDC_RT_REMOVE_UNUSED -> TDC_element::flush_unused(). Only [ab]used by
mysql_admin_table() currently. Should be removed eventually.
Part of MDEV-17882 - Cleanup refresh version
Aim of this patch is to remove tdc_remove_table(TDC_RT_REMOVE_UNUSED),
which was mistakenly introduced by 055a3334a.
InnoDB allows only one open TABLE instance while performing table
truncation. To fulfill this requirement:
1. MDL_EXCLUSIVE has to be acquired to block concurrent threads from
accessing given table
2. cached TABLE instances have to be flushed
3. another InnoDB requirement is such that TABLE_SHARE and remaining
TABLE instance have to be invalidated and re-opened after truncation
This goes more or less inline with what regular TRUNCATE TABLE does.
Alternative solution would be handler::ha_delete_all_rows(), but InnoDB
doesn't implement it unfortunately.
Part of MDEV-17882 - Cleanup refresh version
MDEV-21605 Clean up and speed up interfaces for binary row logging
MDEV-21617 Bug fix for previous version of this code
The intention is to have as few 'if' as possible in ha_write() and
related functions. This is done by pre-calculating once per statement the
row_logging state for all tables.
Benefits are simpler and faster code both when binary logging is disabled
and when it's enabled.
Changes:
- Added handler->row_logging to make it easy to check it table should be
row logged. This also made it easier to disabling row logging for system,
internal and temporary tables.
- The tables row_logging capabilities are checked once per "statements
that updates tables" in THD::binlog_prepare_for_row_logging() which
is called when needed from THD::decide_logging_format().
- Removed most usage of tmp_disable_binlog(), reenable_binlog() and
temporary saving and setting of thd->variables.option_bits.
- Moved checks that can't change during a statement from
check_table_binlog_row_based() to check_table_binlog_row_based_internal()
- Removed flag row_already_logged (used by sequence engine)
- Moved binlog_log_row() to a handler::
- Moved write_locked_table_maps() to THD::binlog_write_table_maps() as
most other related binlog functions are in THD.
- Removed binlog_write_table_map() and binlog_log_row_internal() as
they are now obsolete as 'has_transactions()' is pre-calculated in
prepare_for_row_logging().
- Remove 'is_transactional' argument from binlog_write_table_map() as this
can now be read from handler.
- Changed order of 'if's in handler::external_lock() and wsrep_mysqld.h
to first evaluate fast and likely cases before more complex ones.
- Added error checking in ha_write_row() and related functions if
binlog_log_row() failed.
- Don't clear check_table_binlog_row_based_result in
clear_cached_table_binlog_row_based_flag() as it's not needed.
- THD::clear_binlog_table_maps() has been replaced with
THD::reset_binlog_for_next_statement()
- Added 'MYSQL_OPEN_IGNORE_LOGGING_FORMAT' flag to open_and_lock_tables()
to avoid calculating of binary log format for internal opens. This flag
is also used to avoid reading statistics tables for internal tables.
- Added OPTION_BINLOG_LOG_OFF as a simple way to turn of binlog temporary
for create (instead of using THD::sql_log_bin_off.
- Removed flag THD::sql_log_bin_off (not needed anymore)
- Speed up THD::decide_logging_format() by remembering if blackhole engine
is used and avoid a loop over all tables if it's not used
(the common case).
- THD::decide_logging_format() is not called anymore if no tables are used
for the statement. This will speed up pure stored procedure code with
about 5%+ according to some simple tests.
- We now get annotated events on slave if a CREATE ... SELECT statement
is transformed on the slave from statement to row logging.
- In the original code, the master could come into a state where row
logging is enforced for all future events if statement could be used.
This is now partly fixed.
Other changes:
- Ensure that all tables used by a statement has query_id set.
- Had to restore the row_logging flag for not used tables in
THD::binlog_write_table_maps (not normal scenario)
- Removed injector::transaction::use_table(server_id_type sid, table tbl)
as it's not used.
- Cleaned up set_slave_thread_options()
- Some more DBUG_ENTER/DBUG_RETURN, code comments and minor indentation
changes.
- Ensure we only call THD::decide_logging_format_low() once in
mysql_insert() (inefficiency).
- Don't annotate INSERT DELAYED
- Removed zeroing pos_in_table_list in THD::open_temporary_table() as it's
already 0
After 7fb9d64 it is used only by ALTER/DROP SERVER, which most probably
wasn't intentional as Federated never supported delayed inserts anyway.
If delayed inserts will ever become an issue with ALTER/DROP SERVER, we
should kill them by acquiring X-lock instead.
Part of MDEV-17882 - Cleanup refresh version
- Initialize variables that could be used uninitialized
- Added extra end space to DbugStringItemTypeValue to get rid of warnings
from c_ptr()
- Session_sysvars_tracker::update() accessed unitialized memory if called
with NULL value.
- get_schema_stat_record() accessed unitialized memory if HA_KEY_LONG_HASH
was used
- parse_vcol_defs() accessed random memory for tables without keys.
Problem:
========
gcc 8 -O2 seems to indicate a real error for this code:
direct_pos= table->file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_POSITION;
the warning: /mariadb/10.4/sql/rpl_gtid.cc:980:7:
warning: 'direct_pos' may be used uninitialized in this function [-Wmaybe-uninitialized]
Analysis:
=========
'direct_pos' is a variable which holds 'table_flags'. If this flag is set it means
that a record within a table can be directly located by using its position. If
this flag is set to '0' means there is no direct access is available, hence
index scan must be initiated to locate the record. This direct_pos is used to
locate a row within mysql.gtid_slave_pos table for deletion.
Prior to the initialization of 'direct_pos' following steps take place.
1. mysql.gtid_slave_pos table is opened and 'table_opened' flag is set to true.
2. State check for mysql.gtid_slave_pos table is initiated.
If there is a failure during step2 code will be redirected to the error handling
part. This error handling code will access uninitialized value of 'direct_pos'.
This results in above mentioned warning.
Another issue found during analysis is the error handling code uses '!direct_pos'
to identify if the index is initialized or not. This is incorrect.
The index initialization code is shown below.
if (!direct_pos && (err= table->file->ha_index_init(0, 0)))
{
table->file->print_error(err, MYF(0));
goto end;
}
In case there is a failure during ha_index_init code will be redirected to end
part which tries to close the uninitialized index. It will result in an assert
10.4/sql/handler.h:3186: int handler::ha_index_end(): Assertion `inited==INDEX'
failed.
Fix:
===
Introduce a new variable named 'index_inited'. Set this variable upon successful
initialization of index initialization otherwise by default it is false. Use
this variable during error handling.
This patch changes how old rows in mysql.gtid_slave_pos* tables are deleted.
Instead of doing it as part of every replicated transaction in
record_gtid(), it is done periodically (every @@gtid_cleanup_batch_size
transaction) in the slave background thread.
This removes the deletion step from the replication process in SQL or worker
threads, which could speed up replication with many small transactions. It
also decreases contention on the global mutex LOCK_slave_state. And it
simplifies the logic, eg. when a replicated transaction fails after having
deleted old rows.
With this patch, the deletion of old GTID rows happens asynchroneously and
slightly non-deterministic. Thus the number of old rows in
mysql.gtid_slave_pos can temporarily exceed @@gtid_cleanup_batch_size. But
all old rows will be deleted eventually after sufficiently many new GTIDs
have been replicated.
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
The test and also rpl_gtid_delete_domain failed on PPC64 platform
due to an incorrectly specified actual key for searching
in a gtid domain system hash. While the correct size is 32 bits
the supplied value was 8 bytes of long int size on the platform.
The problem became evident thanks to the big endiness which
cut off the *least* significant part of the value field.
Fixed with correcting a dynamic array initialization to hold
now uint32 values as well as the values extraction for
searching in the gtid domain system hash.
A new added test ensures no overflowed values are accepted
for deletion which prevents inadvertent action. Notice though
MariaDB [test]> set @@session.gtid_domain_id=(1 << 32) + 1;
MariaDB [test]> show warnings;
+---------+------+--------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------+
| Warning | 1292 | Truncated incorrect gtid_domain_id value: '4294967297' |
+---------+------+--------------------------------------------------------+
MariaDB [test]> select @@session.gtid_domain_id;
+--------------------------+
| @@session.gtid_domain_id |
+--------------------------+
| 4294967295 |
+--------------------------+
This would happen especially in optimistic parallel replication, where there
is a good chance that a transaction will be rolled back (due to conflicts)
after it has executed record_gtid(). If the transaction did any deletions of
old rows as part of record_gtid(), those deletions will be undone as well.
And the code did not properly ensure that the deletions would be re-tried.
This patch makes record_gtid() remember the list of deletions done as part
of a transaction. Then in rpl_slave_state::update() when the changes have
been committed, we discard the list. However, in case of error and rollback,
in cleanup_context() we will instead put the list back into
rpl_global_gtid_slave_state so that the deletions will be re-tried later.
Probably fixes part of the cause of MDEV-12147 as well.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>