Similar to #2480.
567b681 introduced safe_strcpy() to minimize the use of C with
potentially unsafe memory overflow with strcpy() whose use is
discouraged.
Replace instances of strcpy() with safe_strcpy() where possible, limited
here to files in the `sql/` directory.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
When rolling back and retrying a transaction in parallel replication, don't
release the domain ownership (for --gtid-ignore-duplicates) as part of the
rollback. Otherwise another master connection could grab the ownership and
double-apply the transaction in parallel with the retry.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Improve the performance of slave connect using B+-Tree indexes on each binlog
file. The index allows fast lookup of a GTID position to the corresponding
offset in the binlog file, as well as lookup of a position to find the
corresponding GTID position.
This eliminates a costly sequential scan of the starting binlog file
to find the GTID starting position when a slave connects. This is
especially costly if the binlog file is not cached in memory (IO
cost), or if it is encrypted or a lot of slaves connect simultaneously
(CPU cost).
The size of the index files is generally less than 1% of the binlog data, so
not expected to be an issue.
Most of the work writing the index is done as a background task, in
the binlog background thread. This minimises the performance impact on
transaction commit. A simple global mutex is used to protect index
reads and (background) index writes; this is fine as slave connect is
a relatively infrequent operation.
Here are the user-visible options and status variables. The feature is on by
default and is expected to need no tuning or configuration for most users.
binlog_gtid_index
On by default. Can be used to disable the indexes for testing purposes.
binlog_gtid_index_page_size (default 4096)
Page size to use for the binlog GTID index. This is the size of the nodes
in the B+-tree used internally in the index. A very small page-size (64 is
the minimum) will be less efficient, but can be used to stress the
BTree-code during testing.
binlog_gtid_index_span_min (default 65536)
Control sparseness of the binlog GTID index. If set to N, at most one
index record will be added for every N bytes of binlog file written.
This can be used to reduce the number of records in the index, at
the cost only of having to scan a few more events in the binlog file
before finding the target position
Two status variables are available to monitor the use of the GTID indexes:
Binlog_gtid_index_hit
Binlog_gtid_index_miss
The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. binlog written by old server version
without GTID index support), or corrupt.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
New Feature:
============
This patch extends the START SLAVE UNTIL command with options
SQL_BEFORE_GTIDS and SQL_AFTER_GTIDS to allow user control of
whether the replica stops before or after a provided GTID state. Its
syntax is:
START SLAVE UNTIL (SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS)=”<gtid_list>”
When providing SQL_BEFORE_GTIDS=”<gtid_list>”, for each domain
specified in the gtid_list, the replica will execute transactions up
to the GTID found, and immediately stop processing events in that
domain (without executing the transaction of the specified GTID).
Once all domains have stopped, the replica will stop. Events
originating from domains that are not specified in the list are not
replicated.
START SLAVE UNTIL SQL_AFTER_GTIDS=”<gtid_list>” is an alias to the
default behavior of START SLAVE UNTIL master_gtid_pos=”<gtid_list>”.
That is, the replica will only execute transactions originating from
domain ids provided in the list, and will stop once all transactions
provided in the UNTIL list have all been executed.
Example:
=========
If a primary server has a binary log consisting of the following GTIDs:
0-1-1
1-1-1
0-1-2
1-1-2
0-1-3
1-1-3
If a fresh replica (i.e. one with an empty GTID position,
@@gtid_slave_pos='') is started with SQL_BEFORE_GTIDS, i.e.
START SLAVE UNTIL SQL_BEFORE_GTIDS=”1-1-2”
The resulting gtid_slave_pos of the replica will be “1-1-1”.
This is because the replica will execute only events from domain 1
until it sees the transaction with sequence number 2, and
immediately stop without executing it.
If the replica is started with SQL_AFTER_GTIDS, i.e.
START SLAVE UNTIL SQL_AFTER_GTIDS=”1-1-2”
then the resulting gtid_slave_pos of the replica will be “1-1-2”.
This is because it will only execute events from domain 1 until it
has executed the provided GTID.
Reviewed By:
============
Kristian Nielson <knielsen@knielsen-hq.org>
state() == s_prepared || state() == s_must_abort || state() == s_aborting ||
state() == s_cert_failed || state() == s_must_replay' failed
When applier tries to execute write rows event it find out
in table_def::compatible_with that value is not compatible
and sets error and thd->is_slave_error but thd->is_error()
is false. Later in rpl_group_info::slave_close_thread_tables
we commit stmt. This is bad for Galera because later in
apply_write_set we notice that event apply was not successful
and try to rollback transaction, but wsrep transaction
is already in s_committed state.
This is fixed on rpl_group_info::slave_close_thread_tables
so that in Galera case we rollback stmt if thd->is_slave_error
or thd->is_error() is set. Then later we can rollback wsrep
transaction.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Also fixes MDEV-32025 Crashes in MDL_key::mdl_key_init with lower-case-table-names=2
Change overview:
- In changes made in MDEV-31948, MDEV-31982 the code path
which originaly worked only in case of lower-case-table-names==1
also started to work in case of lower-case-table-names==2 in a mistake.
Restoring the original check_db_name() compatible behavior
(but without re-using check_db_name() itself).
- MDEV-31978 erroneously added a wrong DBUG_ASSERT. Removing.
Details:
- In mysql_change_db() the database name should be lower-cased only
in case of lower_case_table_names==1. It should not be lower-cased
for lower_case_table_names==2. The problem was caused by MDEV-31948.
The new code version restored the pre-MDEV-31948 behavior, which
used check_db_name() behavior.
- Passing lower_case_table_names==1 instead of just lower_case_table_names
to the "casedn" parameter to DBNameBuffer constructor in sql_parse.cc
The database name should not be lower-cased for lower_case_table_names==2.
This restores pre-MDEV-31982 behavioir which used check_db_name() here.
- Adding a new data type Lex_ident_db_normalized, it stores database
names which are both checked and normalized to lower case
in case lower_case_table_names==1 and lower_case_table_names==2.
- Changing the data type for the "db" parameter to Lex_ident_db_normalized in
lock_schema_name(), lock_db_routines(), find_db_tables_and_rm_known_files().
This is to avoid incorrectly passing a non-normalized name in the future.
- Restoring the database name normalization in mysql_create_db_internal()
and mysql_rm_db_internal() before calling lock_schema_name().
The problem was caused MDEV-31982.
- Adding database name normalization in mysql_alter_db_internal()
and mysql_upgrade_db(). This fixes MDEV-32026.
- Removing a wrong assert in Create_sp_func::create_with_db() was incorrect:
DBUG_ASSERT(Lex_ident_fs(*db).ok_for_lower_case_names());
The database name comes to here checked, but not normalized
to lower case with lower-case-table-names=2.
The assert was erroneously added by MDEV-31978.
- Recording lowercase_tables2.results and lowercase_tables4.results
according to
MDEV-29446 Change SHOW CREATE TABLE to display default collations
These tests are skipped on buildbot on all platforms, so this change
was forgotten in the patch for MDEV-29446.
We can't rely on keys formed with columns that were added during this
ALTER. These columns can be set with non-deterministic values, which can
end up with broken or incorrect search.
The same applies to the keys that contain reliable columns, but also have
bogus ones. Using them can narrow the search, but they're also ignored.
Also, added columns shouldn't be considered during the record match. To
determine them, table->has_value_set bitmap is used.
To fill has_value_set bitmap in the find_key call, extra unpack_row call
has been added.
For replication case, extra replica columns can be considered for this
case. We try to ignore them, too.
Fix that rpl_slave_state::load() was calling rpl_slave_state::update() without
holding LOCK_slave_state.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The hang could be seen as show slave status displaying an error like
Last_Error: Could not execute Write_rows_v1
along with
Slave_SQL_Running: Yes
accompanied with one of the replication threads in show-processlist
characteristically having status like
2394 | system user | | NULL | Slave_worker | 50852| closing tables
It turns out that closing tables worker got entrapped in endless looping
in mark_start_commit_inner() across already garbage-collected gco items.
The reclaimed gco links are explained with actually possible
out-of-order groups of events termination due to the Last_Error.
This patch reinforces the correct ordering to perform
finish_event_group's cleanup actions, incl unlinking gco:s
from the active list.
The user XA commit execution branch was caught not have been covered
with MDEV-21953 fixes.
The XA involved deadlock is resolved now to apply the former fixes
pattern.
Along the fixes the following changes have been implemented.
- MDL lock attribute correction
- dissociation of the externally completed XA from the current
thread's xid_state in the error branches
- cleanup_context() preseves the prepared XA
- wait_for_prior_commit() is relocated to satisfy both
the binlog ON (log-slave-updates and skip-log-bin)
and OFF slave execution branches.