This commit adds 3 new status variables to 'show all slaves status':
- Master_last_event_time ; timestamp of the last event read from the
master by the IO thread.
- Slave_last_event_time ; Master timestamp of the last event committed
on the slave.
- Master_Slave_time_diff: The difference of the above two timestamps.
All the above variables are NULL until the slave has started and the
slave has read one query event from the master that changes data.
- Added information_schema.slave_status, which allows us to remove:
- show_master_info(), show_master_info_get_fields(),
send_show_master_info_data(), show_all_master_info()
- class Sql_cmd_show_slave_status.
- Protocol::store(I_List<i_string_pair>* str_list) as it is not
used anymore.
- Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to
use the SELECT code path, as all other SHOW ... STATUS commands.
Other things:
- Xid_log_time is set to time of commit to allow slave that reads the
binary log to calculate Master_last_event_time and
Slave_last_event_time.
This is needed as there is not 'exec_time' for row events.
- Fixed that Load_log_event calculates exec_time identically to
Query_event.
- Updated RESET SLAVE to reset Master/Slave_last_event_time
- Updated SQL thread's update on first transaction read-in to
only update Slave_last_event_time on group events.
- Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions
if allocation of 'field' would fail.
Reviewed By:
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
Two new variables added:
- max_tmp_space_usage : Limits the the temporary space allowance per user
- max_total_tmp_space_usage: Limits the temporary space allowance for
all users.
New status variables: tmp_space_used & max_tmp_space_used
New field in information_schema.process_list: TMP_SPACE_USED
The temporary space is counted for:
- All SQL level temporary files. This includes files for filesort,
transaction temporary space, analyze, binlog_stmt_cache etc.
It does not include engine internal temporary files used for repair,
alter table, index pre sorting etc.
- All internal on disk temporary tables created as part of resolving a
SELECT, multi-source update etc.
Special cases:
- When doing a commit, the last flush of the binlog_stmt_cache
will not cause an error even if the temporary space limit is exceeded.
This is to avoid giving errors on commit. This means that a user
can temporary go over the limit with up to binlog_stmt_cache_size.
Noteworthy issue:
- One has to be careful when using small values for max_tmp_space_limit
together with binary logging and with non transactional tables.
If a the binary log entry for the query is bigger than
binlog_stmt_cache_size and one hits the limit of max_tmp_space_limit
when flushing the entry to disk, the query will abort and the
binary log will not contain the last changes to the table.
This will also stop the slave!
This is also true for all Aria tables as Aria cannot do rollback
(except in case of crashes)!
One way to avoid it is to use @@binlog_format=statement for
queries that updates a lot of rows.
Implementation:
- All writes to temporary files or internal temporary tables, that
increases the file size, are routed through temp_file_size_cb_func()
which updates and checks the temp space usage.
- Most of the temporary file monitoring is done inside IO_CACHE.
Temporary file monitoring is done inside the Aria engine.
- MY_TRACK and MY_TRACK_WITH_LIMIT are new flags for ini_io_cache().
MY_TRACK means that we track the file usage. TRACK_WITH_LIMIT means
that we track the file usage and we give an error if the limit is
breached. This is used to not give an error on commit when
binlog_stmp_cache is flushed.
- global_tmp_space_used contains the total tmp space used so far.
This is needed quickly check against max_total_tmp_space_usage.
- Temporary space errors are using EE_LOCAL_TMP_SPACE_FULL and
handler errors are using HA_ERR_LOCAL_TMP_SPACE_FULL.
This is needed until we move general errors to it's own error space
so that they cannot conflict with system error numbers.
- Return value of my_chsize() and mysql_file_chsize() has changed
so that -1 is returned in the case my_chsize() could not decrease
the file size (very unlikely and will not happen on modern systems).
All calls to _chsize() are updated to check for > 0 as the error
condition.
- At the destruction of THD we check that THD::tmp_file_space == 0
- At server end we check that global_tmp_space_used == 0
- As a precaution against errors in the tmp_space_used code, one can set
max_tmp_space_usage and max_total_tmp_space_usage to 0 to disable
the tmp space quota errors.
- truncate_io_cache() function added.
- Aria tables using static or dynamic row length are registered in 8K
increments to avoid some calls to update_tmp_file_size().
Other things:
- Ensure that all handler errors are registered. Before, some engine
errors could be printed as "Unknown error".
- Fixed bug in filesort() that causes a assert if there was an error
when writing to the temporay file.
- Fixed that compute_window_func() now takes into account write errors.
- In case of parallel replication, rpl_group_info::cleanup_context()
could call trans_rollback() with thd->error set, which would cause
an assert. Fixed by resetting the error before calling trans_rollback().
- Fixed bug in subselect3.inc which caused following test to use
heap tables with low value for max_heap_table_size
- Fixed bug in sql_expression_cache where it did not overflow
heap table to Aria table.
- Added Max_tmp_disk_space_used to slow query log.
- Fixed some bugs in log_slow_innodb.test
Similar to #2480.
567b681 introduced safe_strcpy() to minimize the use of C with
potentially unsafe memory overflow with strcpy() whose use is
discouraged.
Replace instances of strcpy() with safe_strcpy() where possible, limited
here to files in the `sql/` directory.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
When rolling back and retrying a transaction in parallel replication, don't
release the domain ownership (for --gtid-ignore-duplicates) as part of the
rollback. Otherwise another master connection could grab the ownership and
double-apply the transaction in parallel with the retry.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Under terms of MDEV 27490 we'll add support for non-BMP identifiers
and upgrade casefolding information to Unicode version 14.0.0.
In Unicode-14.0.0 conversion to lower and upper cases can increase octet length
of the string, so conversion won't be possible in-place any more.
This patch removes virtual functions performing in-place casefolding:
- my_charset_handler_st::casedn_str()
- my_charset_handler_st::caseup_str()
and fixes the code to use the non-inplace functions instead:
- my_charset_handler_st::casedn()
- my_charset_handler_st::caseup()
Improve the performance of slave connect using B+-Tree indexes on each binlog
file. The index allows fast lookup of a GTID position to the corresponding
offset in the binlog file, as well as lookup of a position to find the
corresponding GTID position.
This eliminates a costly sequential scan of the starting binlog file
to find the GTID starting position when a slave connects. This is
especially costly if the binlog file is not cached in memory (IO
cost), or if it is encrypted or a lot of slaves connect simultaneously
(CPU cost).
The size of the index files is generally less than 1% of the binlog data, so
not expected to be an issue.
Most of the work writing the index is done as a background task, in
the binlog background thread. This minimises the performance impact on
transaction commit. A simple global mutex is used to protect index
reads and (background) index writes; this is fine as slave connect is
a relatively infrequent operation.
Here are the user-visible options and status variables. The feature is on by
default and is expected to need no tuning or configuration for most users.
binlog_gtid_index
On by default. Can be used to disable the indexes for testing purposes.
binlog_gtid_index_page_size (default 4096)
Page size to use for the binlog GTID index. This is the size of the nodes
in the B+-tree used internally in the index. A very small page-size (64 is
the minimum) will be less efficient, but can be used to stress the
BTree-code during testing.
binlog_gtid_index_span_min (default 65536)
Control sparseness of the binlog GTID index. If set to N, at most one
index record will be added for every N bytes of binlog file written.
This can be used to reduce the number of records in the index, at
the cost only of having to scan a few more events in the binlog file
before finding the target position
Two status variables are available to monitor the use of the GTID indexes:
Binlog_gtid_index_hit
Binlog_gtid_index_miss
The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. binlog written by old server version
without GTID index support), or corrupt.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
New Feature:
============
This patch extends the START SLAVE UNTIL command with options
SQL_BEFORE_GTIDS and SQL_AFTER_GTIDS to allow user control of
whether the replica stops before or after a provided GTID state. Its
syntax is:
START SLAVE UNTIL (SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS)=”<gtid_list>”
When providing SQL_BEFORE_GTIDS=”<gtid_list>”, for each domain
specified in the gtid_list, the replica will execute transactions up
to the GTID found, and immediately stop processing events in that
domain (without executing the transaction of the specified GTID).
Once all domains have stopped, the replica will stop. Events
originating from domains that are not specified in the list are not
replicated.
START SLAVE UNTIL SQL_AFTER_GTIDS=”<gtid_list>” is an alias to the
default behavior of START SLAVE UNTIL master_gtid_pos=”<gtid_list>”.
That is, the replica will only execute transactions originating from
domain ids provided in the list, and will stop once all transactions
provided in the UNTIL list have all been executed.
Example:
=========
If a primary server has a binary log consisting of the following GTIDs:
0-1-1
1-1-1
0-1-2
1-1-2
0-1-3
1-1-3
If a fresh replica (i.e. one with an empty GTID position,
@@gtid_slave_pos='') is started with SQL_BEFORE_GTIDS, i.e.
START SLAVE UNTIL SQL_BEFORE_GTIDS=”1-1-2”
The resulting gtid_slave_pos of the replica will be “1-1-1”.
This is because the replica will execute only events from domain 1
until it sees the transaction with sequence number 2, and
immediately stop without executing it.
If the replica is started with SQL_AFTER_GTIDS, i.e.
START SLAVE UNTIL SQL_AFTER_GTIDS=”1-1-2”
then the resulting gtid_slave_pos of the replica will be “1-1-2”.
This is because it will only execute events from domain 1 until it
has executed the provided GTID.
Reviewed By:
============
Kristian Nielson <knielsen@knielsen-hq.org>
state() == s_prepared || state() == s_must_abort || state() == s_aborting ||
state() == s_cert_failed || state() == s_must_replay' failed
When applier tries to execute write rows event it find out
in table_def::compatible_with that value is not compatible
and sets error and thd->is_slave_error but thd->is_error()
is false. Later in rpl_group_info::slave_close_thread_tables
we commit stmt. This is bad for Galera because later in
apply_write_set we notice that event apply was not successful
and try to rollback transaction, but wsrep transaction
is already in s_committed state.
This is fixed on rpl_group_info::slave_close_thread_tables
so that in Galera case we rollback stmt if thd->is_slave_error
or thd->is_error() is set. Then later we can rollback wsrep
transaction.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Also fixes MDEV-32025 Crashes in MDL_key::mdl_key_init with lower-case-table-names=2
Change overview:
- In changes made in MDEV-31948, MDEV-31982 the code path
which originaly worked only in case of lower-case-table-names==1
also started to work in case of lower-case-table-names==2 in a mistake.
Restoring the original check_db_name() compatible behavior
(but without re-using check_db_name() itself).
- MDEV-31978 erroneously added a wrong DBUG_ASSERT. Removing.
Details:
- In mysql_change_db() the database name should be lower-cased only
in case of lower_case_table_names==1. It should not be lower-cased
for lower_case_table_names==2. The problem was caused by MDEV-31948.
The new code version restored the pre-MDEV-31948 behavior, which
used check_db_name() behavior.
- Passing lower_case_table_names==1 instead of just lower_case_table_names
to the "casedn" parameter to DBNameBuffer constructor in sql_parse.cc
The database name should not be lower-cased for lower_case_table_names==2.
This restores pre-MDEV-31982 behavioir which used check_db_name() here.
- Adding a new data type Lex_ident_db_normalized, it stores database
names which are both checked and normalized to lower case
in case lower_case_table_names==1 and lower_case_table_names==2.
- Changing the data type for the "db" parameter to Lex_ident_db_normalized in
lock_schema_name(), lock_db_routines(), find_db_tables_and_rm_known_files().
This is to avoid incorrectly passing a non-normalized name in the future.
- Restoring the database name normalization in mysql_create_db_internal()
and mysql_rm_db_internal() before calling lock_schema_name().
The problem was caused MDEV-31982.
- Adding database name normalization in mysql_alter_db_internal()
and mysql_upgrade_db(). This fixes MDEV-32026.
- Removing a wrong assert in Create_sp_func::create_with_db() was incorrect:
DBUG_ASSERT(Lex_ident_fs(*db).ok_for_lower_case_names());
The database name comes to here checked, but not normalized
to lower case with lower-case-table-names=2.
The assert was erroneously added by MDEV-31978.
- Recording lowercase_tables2.results and lowercase_tables4.results
according to
MDEV-29446 Change SHOW CREATE TABLE to display default collations
These tests are skipped on buildbot on all platforms, so this change
was forgotten in the patch for MDEV-29446.
We can't rely on keys formed with columns that were added during this
ALTER. These columns can be set with non-deterministic values, which can
end up with broken or incorrect search.
The same applies to the keys that contain reliable columns, but also have
bogus ones. Using them can narrow the search, but they're also ignored.
Also, added columns shouldn't be considered during the record match. To
determine them, table->has_value_set bitmap is used.
To fill has_value_set bitmap in the find_key call, extra unpack_row call
has been added.
For replication case, extra replica columns can be considered for this
case. We try to ignore them, too.
Fix that rpl_slave_state::load() was calling rpl_slave_state::update() without
holding LOCK_slave_state.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The hang could be seen as show slave status displaying an error like
Last_Error: Could not execute Write_rows_v1
along with
Slave_SQL_Running: Yes
accompanied with one of the replication threads in show-processlist
characteristically having status like
2394 | system user | | NULL | Slave_worker | 50852| closing tables
It turns out that closing tables worker got entrapped in endless looping
in mark_start_commit_inner() across already garbage-collected gco items.
The reclaimed gco links are explained with actually possible
out-of-order groups of events termination due to the Last_Error.
This patch reinforces the correct ordering to perform
finish_event_group's cleanup actions, incl unlinking gco:s
from the active list.