Commit graph

516 commits

Author SHA1 Message Date
Monty
25b5c63905 MDEV-33856: Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps
This commit adds 3 new status variables to 'show all slaves status':

- Master_last_event_time ; timestamp of the last event read from the
  master by the IO thread.
- Slave_last_event_time ; Master timestamp of the last event committed
  on the slave.
- Master_Slave_time_diff: The difference of the above two timestamps.

All the above variables are NULL until the slave has started and the
slave has read one query event from the master that changes data.

- Added information_schema.slave_status, which allows us to remove:
   - show_master_info(), show_master_info_get_fields(),
     send_show_master_info_data(), show_all_master_info()
   - class Sql_cmd_show_slave_status.
   - Protocol::store(I_List<i_string_pair>* str_list) as it is not
     used anymore.
- Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to
  use the SELECT code path, as all other SHOW ... STATUS commands.

Other things:
- Xid_log_time is set to time of commit to allow slave that reads the
  binary log to calculate Master_last_event_time and
  Slave_last_event_time.
  This is needed as there is not 'exec_time' for row events.
- Fixed that Load_log_event calculates exec_time identically to
  Query_event.
- Updated RESET SLAVE to reset Master/Slave_last_event_time
- Updated SQL thread's update on first transaction read-in to
  only update Slave_last_event_time on group events.
- Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions
  if allocation of 'field' would fail.

Reviewed By:
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-07-25 08:57:27 -06:00
Alexander Barkov
8f4ec79d09 Merge remote-tracking branch 'origin/11.4' into 11.5 2024-07-08 12:25:04 +04:00
Alexander Barkov
c4bf4ce948 Merge remote-tracking branch 'origin/11.2' into 11.4 2024-06-17 15:46:39 +04:00
Marko Mäkelä
a21e49cbcc Merge 11.1 into 11.2 2024-06-17 12:02:03 +03:00
Yuchen Pei
2d3e2c58b6
Merge branch '10.11' into 11.1 2024-05-31 10:54:31 +10:00
Marko Mäkelä
22ba7e4ff8 Merge 10.6 into 10.11 2024-05-30 16:04:00 +03:00
Marko Mäkelä
5ba542e9ee Merge 10.5 into 10.6 2024-05-30 14:27:07 +03:00
Monty
b9f5793176 MDEV-9101 Limit size of created disk temporary files and tables
Two new variables added:
- max_tmp_space_usage : Limits the the temporary space allowance per user
- max_total_tmp_space_usage: Limits the temporary space allowance for
  all users.

New status variables: tmp_space_used & max_tmp_space_used
New field in information_schema.process_list: TMP_SPACE_USED

The temporary space is counted for:
- All SQL level temporary files. This includes files for filesort,
  transaction temporary space, analyze, binlog_stmt_cache etc.
  It does not include engine internal temporary files used for repair,
  alter table, index pre sorting etc.
- All internal on disk temporary tables created as part of resolving a
  SELECT, multi-source update etc.

Special cases:
- When doing a commit, the last flush of the binlog_stmt_cache
  will not cause an error even if the temporary space limit is exceeded.
  This is to avoid giving errors on commit. This means that a user
  can temporary go over the limit with up to binlog_stmt_cache_size.

Noteworthy issue:
- One has to be careful when using small values for max_tmp_space_limit
  together with binary logging and with non transactional tables.
  If a the binary log entry for the query is bigger than
  binlog_stmt_cache_size and one hits the limit of max_tmp_space_limit
  when flushing the entry to disk, the query will abort and the
  binary log will not contain the last changes to the table.
  This will also stop the slave!
  This is also true for all Aria tables as Aria cannot do rollback
  (except in case of crashes)!
  One way to avoid it is to use @@binlog_format=statement for
  queries that updates a lot of rows.

Implementation:
- All writes to temporary files or internal temporary tables, that
  increases the file size, are routed through temp_file_size_cb_func()
  which updates and checks the temp space usage.
- Most of the temporary file monitoring is done inside IO_CACHE.
  Temporary file monitoring is done inside the Aria engine.
- MY_TRACK and MY_TRACK_WITH_LIMIT are new flags for ini_io_cache().
  MY_TRACK means that we track the file usage. TRACK_WITH_LIMIT means
  that we track the file usage and we give an error if the limit is
  breached. This is used to not give an error on commit when
  binlog_stmp_cache is flushed.
- global_tmp_space_used contains the total tmp space used so far.
  This is needed quickly check against max_total_tmp_space_usage.
- Temporary space errors are using EE_LOCAL_TMP_SPACE_FULL and
  handler errors are using HA_ERR_LOCAL_TMP_SPACE_FULL.
  This is needed until we move general errors to it's own error space
  so that they cannot conflict with system error numbers.
- Return value of my_chsize() and mysql_file_chsize() has changed
  so that -1 is returned in the case my_chsize() could not decrease
  the file size (very unlikely and will not happen on modern systems).
  All calls to _chsize() are updated to check for > 0 as the error
  condition.
- At the destruction of THD we check that THD::tmp_file_space == 0
- At server end we check that global_tmp_space_used == 0
- As a precaution against errors in the tmp_space_used code, one can set
  max_tmp_space_usage and max_total_tmp_space_usage to 0 to disable
  the tmp space quota errors.
- truncate_io_cache() function added.
- Aria tables using static or dynamic row length are registered in 8K
  increments to avoid some calls to update_tmp_file_size().

Other things:
- Ensure that all handler errors are registered.  Before, some engine
  errors could be printed as "Unknown error".
- Fixed bug in filesort() that causes a assert if there was an error
  when writing to the temporay file.
- Fixed that compute_window_func() now takes into account write errors.
- In case of parallel replication, rpl_group_info::cleanup_context()
  could call trans_rollback() with thd->error set, which would cause
  an assert. Fixed by resetting the error before calling trans_rollback().
- Fixed bug in subselect3.inc which caused following test to use
  heap tables with low value for max_heap_table_size
- Fixed bug in sql_expression_cache where it did not overflow
  heap table to Aria table.
- Added Max_tmp_disk_space_used to slow query log.
- Fixed some bugs in log_slow_innodb.test
2024-05-27 12:39:04 +02:00
Monty
d2e6fe02d7 Generate a warning(note) and write to error log if master_pos_wait() fails.
This is to make it easier to understand why master_pos_wait() fails in mtr.
2024-05-27 12:39:02 +02:00
Oleksandr Byelkin
dd7d9d7fb1 Merge branch '11.4' into 11.5 2024-05-23 17:01:43 +02:00
Oleksandr Byelkin
99b370e023 Merge branch '11.2' into 11.4 2024-05-21 19:38:51 +02:00
Robin Newhouse
dc38d8ea80 Minimize unsafe C functions with safe_strcpy()
Similar to #2480.
567b681 introduced safe_strcpy() to minimize the use of C with
potentially unsafe memory overflow with strcpy() whose use is
discouraged.
Replace instances of strcpy() with safe_strcpy() where possible, limited
here to files in the `sql/` directory.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
2024-05-17 13:33:16 +01:00
Sergei Golubchik
bf5da43e50 Merge branch '11.1' into 11.2 2024-05-13 10:00:26 +02:00
Sergei Golubchik
f9807aadef Merge branch '10.11' into 11.0 2024-05-12 12:18:28 +02:00
Sergei Golubchik
7b53672c63 Merge branch '10.5' into 10.6 2024-05-08 20:06:00 +02:00
Sergei Golubchik
cea083af9f cleanup: use THD_STAGE_INFO, not thd_proc_info
and put master-slave.inc *last* in the series of includes
2024-05-05 21:37:07 +02:00
Sergei Golubchik
9cf718859f cleanup: use THD_STAGE_INFO, not thd_proc_info
and put master-slave.inc *last* in the series of includes
2024-04-24 22:08:52 +02:00
Sergei Golubchik
018d537ec1 Merge branch '10.6' into 10.11 2024-04-22 15:23:10 +02:00
Marko Mäkelä
829cb1a49c Merge 10.5 into 10.6 2024-04-17 14:14:58 +03:00
Kristian Nielsen
16aa4b5f59 Merge from 10.4 to 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-15 17:46:49 +02:00
Kristian Nielsen
0a6f46965a MDEV-33475: --gtid-ignore-duplicate can double-apply event in case of parallel replication retry
When rolling back and retrying a transaction in parallel replication, don't
release the domain ownership (for --gtid-ignore-duplicates) as part of the
rollback. Otherwise another master connection could grab the ownership and
double-apply the transaction in parallel with the retry.

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-13 16:59:10 +01:00
Alexander Barkov
929c2e06aa MDEV-31531 Remove my_casedn_str() and my_caseup_str()
Under terms of MDEV 27490 we'll add support for non-BMP identifiers
and upgrade casefolding information to Unicode version 14.0.0.
In Unicode-14.0.0 conversion to lower and upper cases can increase octet length
of the string, so conversion won't be possible in-place any more.

This patch removes virtual functions performing in-place casefolding:
  - my_charset_handler_st::casedn_str()
  - my_charset_handler_st::caseup_str()
and fixes the code to use the non-inplace functions instead:
  - my_charset_handler_st::casedn()
  - my_charset_handler_st::caseup()
2024-02-28 22:20:29 +04:00
Kristian Nielsen
d039346a7a MDEV-4991: GTID binlog indexing
Improve the performance of slave connect using B+-Tree indexes on each binlog
file. The index allows fast lookup of a GTID position to the corresponding
offset in the binlog file, as well as lookup of a position to find the
corresponding GTID position.

This eliminates a costly sequential scan of the starting binlog file
to find the GTID starting position when a slave connects. This is
especially costly if the binlog file is not cached in memory (IO
cost), or if it is encrypted or a lot of slaves connect simultaneously
(CPU cost).

The size of the index files is generally less than 1% of the binlog data, so
not expected to be an issue.

Most of the work writing the index is done as a background task, in
the binlog background thread. This minimises the performance impact on
transaction commit. A simple global mutex is used to protect index
reads and (background) index writes; this is fine as slave connect is
a relatively infrequent operation.

Here are the user-visible options and status variables. The feature is on by
default and is expected to need no tuning or configuration for most users.

binlog_gtid_index
  On by default. Can be used to disable the indexes for testing purposes.

binlog_gtid_index_page_size (default 4096)
  Page size to use for the binlog GTID index. This is the size of the nodes
  in the B+-tree used internally in the index. A very small page-size (64 is
  the minimum) will be less efficient, but can be used to stress the
  BTree-code during testing.

binlog_gtid_index_span_min (default 65536)
  Control sparseness of the binlog GTID index. If set to N, at most one
  index record will be added for every N bytes of binlog file written.
  This can be used to reduce the number of records in the index, at
  the cost only of having to scan a few more events in the binlog file
  before finding the target position

Two status variables are available to monitor the use of the GTID indexes:

  Binlog_gtid_index_hit
  Binlog_gtid_index_miss

The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. binlog written by old server version
without GTID index support), or corrupt.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-01-27 12:09:54 +01:00
Oleksandr Byelkin
34272bd6a5 Merge branch '11.2' into 11.3 2023-11-14 18:33:03 +01:00
Oleksandr Byelkin
0427c4739e MariaDB 11.1.3 release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEF39AEP5WyjM2MAMF8WVvJMdM0dgFAmVSLiUACgkQ8WVvJMdM
 0dhj4A/7B2GIx75Mv4IcExE2s4bfR7sOKZzvjWbMHysovMHhsHAV5fHN7dRQojyV
 HxmSY8lxykm/LMoJ8RASmrojRZsgvkJ84z+fLK7is327Vms7fW7ZWc3eqotIgs7I
 m9Dz3+wiexvl6NKeHnafTZtkJOe8MEqZEGPV1e8V4I3SAJQWyQLnRr4si/VmjAMi
 miKuieTuKoZUYSkdNLwicEFHXysgg6/U8367sgMsJe9V3HYSD3pVQJ/nboTL5uZL
 vTbmEPS1pICKPvPu75DdedSdxSASMyXis9/IWtk13NqPPzX16uHtjkhffAuBT3+k
 CUgRggTYAuoF3MjvyspIS3pdC/73PBb1O+w/9vlHPiwSXVn3d48Ay55uvFgM/pVB
 UKLorw+As0oH2N1HWUp/d4Rbvrnjdq5OgzhmMTrWDAtYyrNU9Jw5S1CAp+G/s2dD
 5j+FUPBBnHo5UfxI+EVTqUggm56R+vJTx4H3q82n05bdJTJYNJ+nixvsYuf7hS3f
 oEqJAUizgGI3h6FGPD9bN0HSYGblEeNgAYv1YogfVX/Eq10RriWic9PtxxOxgOmE
 n+UhdH4YTTyaTv0jssWTJVmVNzjjXMvI4aB8A1FkXeIz2iohSziSkJzaBuzNq2QY
 kKHr8XqiyNnckcoRxfoxNPtrWcmiykpHOBFnuyMRWoXPKcr7idc=
 =ShdC
 -----END PGP SIGNATURE-----

Merge tag '11.1' into 11.2

MariaDB 11.1.3 release
2023-11-14 18:28:37 +01:00
Oleksandr Byelkin
48af85db21 Merge branch '10.11' into 11.0 2023-11-08 17:09:44 +01:00
Oleksandr Byelkin
04d9a46c41 Merge branch '10.6' into 10.10 2023-11-08 16:23:30 +01:00
Oleksandr Byelkin
b83c379420 Merge branch '10.5' into 10.6 2023-11-08 15:57:05 +01:00
Oleksandr Byelkin
6cfd2ba397 Merge branch '10.4' into 10.5 2023-11-08 12:59:00 +01:00
Brandon Nesterenko
0c1bf5e247 MDEV-27247: Add keywords "SQL_BEFORE_GTIDS" and "SQL_AFTER_GTIDS" for START SLAVE UNTIL
New Feature:
============
This patch extends the START SLAVE UNTIL command with options
SQL_BEFORE_GTIDS and SQL_AFTER_GTIDS to allow user control of
whether the replica stops before or after a provided GTID state. Its
syntax is:

START SLAVE UNTIL (SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS)=”<gtid_list>”

When providing SQL_BEFORE_GTIDS=”<gtid_list>”, for each domain
specified in the gtid_list, the replica will execute transactions up
to the GTID found, and immediately stop processing events in that
domain (without executing the transaction of the specified GTID).
Once all domains have stopped, the replica will stop. Events
originating from domains that are not specified in the list are not
replicated.

START SLAVE UNTIL SQL_AFTER_GTIDS=”<gtid_list>” is an alias to the
default behavior of START SLAVE UNTIL master_gtid_pos=”<gtid_list>”.
That is, the replica will only execute transactions originating from
domain ids provided in the list, and will stop once all transactions
provided in the UNTIL list have all been executed.

Example:
=========
If a primary server has a binary log consisting of the following GTIDs:

0-1-1
1-1-1
0-1-2
1-1-2
0-1-3
1-1-3

If a fresh replica (i.e. one with an empty GTID position,
@@gtid_slave_pos='') is started with SQL_BEFORE_GTIDS, i.e.

START SLAVE UNTIL SQL_BEFORE_GTIDS=”1-1-2”

The resulting gtid_slave_pos of the replica will be “1-1-1”.
This is because the replica will execute only events from domain 1
until it sees the transaction with sequence number 2, and
immediately stop without executing it.

If the replica is started with SQL_AFTER_GTIDS, i.e.

START SLAVE UNTIL SQL_AFTER_GTIDS=”1-1-2”

then the resulting gtid_slave_pos of the replica will be “1-1-2”.
This is because it will only execute events from domain 1 until it
has executed the provided GTID.

Reviewed By:
============
Kristian Nielson <knielsen@knielsen-hq.org>
2023-10-23 06:40:05 -06:00
Jan Lindström
3c65434b78 MDEV-31285 : Assertion `state() == s_executing || state() == s_preparing ||
state() == s_prepared || state() == s_must_abort || state() == s_aborting ||
state() == s_cert_failed || state() == s_must_replay' failed

When applier tries to execute write rows event it find out
in table_def::compatible_with that value is not compatible
and sets error and thd->is_slave_error but thd->is_error()
is false. Later in rpl_group_info::slave_close_thread_tables
we commit stmt. This is bad for Galera because later in
apply_write_set we notice that event apply was not successful
and try to rollback transaction, but wsrep transaction
is already in s_committed state.

This is fixed on rpl_group_info::slave_close_thread_tables
so that in Galera case we rollback stmt if thd->is_slave_error
or thd->is_error() is set. Then later we can rollback wsrep
transaction.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-09-28 19:28:25 +02:00
Nikita Malyavin
28b4037242 Merge branch '11.2' into 11.3 2023-09-21 14:15:04 +04:00
Alexander Barkov
9cb75f333f MDEV-32026 lowercase_table2.test failures in 11.3
Also fixes MDEV-32025 Crashes in MDL_key::mdl_key_init with lower-case-table-names=2

Change overview:
- In changes made in MDEV-31948, MDEV-31982 the code path
  which originaly worked only in case of lower-case-table-names==1
  also started to work in case of lower-case-table-names==2 in a mistake.

  Restoring the original check_db_name() compatible behavior
  (but without re-using check_db_name() itself).
- MDEV-31978 erroneously added a wrong DBUG_ASSERT. Removing.

Details:

- In mysql_change_db() the database name should be lower-cased only
  in case of lower_case_table_names==1. It should not be lower-cased
  for lower_case_table_names==2. The problem was caused by MDEV-31948.
  The new code version restored the pre-MDEV-31948 behavior, which
  used check_db_name() behavior.

- Passing lower_case_table_names==1 instead of just lower_case_table_names
  to the "casedn" parameter to DBNameBuffer constructor in sql_parse.cc
  The database name should not be lower-cased for lower_case_table_names==2.
  This restores pre-MDEV-31982 behavioir which used check_db_name() here.

- Adding a new data type Lex_ident_db_normalized, it stores database
  names which are both checked and normalized to lower case
  in case lower_case_table_names==1 and lower_case_table_names==2.

- Changing the data type for the "db" parameter to Lex_ident_db_normalized in
  lock_schema_name(), lock_db_routines(), find_db_tables_and_rm_known_files().

  This is to avoid incorrectly passing a non-normalized name in the future.

- Restoring the database name normalization in mysql_create_db_internal()
  and mysql_rm_db_internal() before calling lock_schema_name().
  The problem was caused MDEV-31982.

- Adding database name normalization in mysql_alter_db_internal()
  and mysql_upgrade_db(). This fixes MDEV-32026.

- Removing a wrong assert in Create_sp_func::create_with_db() was incorrect:

    DBUG_ASSERT(Lex_ident_fs(*db).ok_for_lower_case_names());

  The database name comes to here checked, but not normalized
  to lower case with lower-case-table-names=2.
  The assert was erroneously added by MDEV-31978.

- Recording lowercase_tables2.results and lowercase_tables4.results
  according to
    MDEV-29446 Change SHOW CREATE TABLE to display default collations
  These tests are skipped on buildbot on all platforms, so this change
  was forgotten in the patch for MDEV-29446.
2023-08-29 14:19:38 +04:00
Sergei Golubchik
18ddde4826 Merge branch '11.1' into 11.2 2023-08-18 00:59:16 +02:00
Nikita Malyavin
2be4c836e5 MDEV-29069 ER_KEY_NOT_FOUND on online autoinc addition + concurrent DELETE
We can't rely on keys formed with columns that were added during this
ALTER. These columns can be set with non-deterministic values, which can
end up with broken or incorrect search.

The same applies to the keys that contain reliable columns, but also have
bogus ones. Using them can narrow the search, but they're also ignored.

Also, added columns shouldn't be considered during the record match. To
determine them, table->has_value_set bitmap is used.

To fill has_value_set bitmap in the find_key call, extra unpack_row call
has been added.

For replication case, extra replica columns can be considered for this
case. We try to ignore them, too.
2023-08-15 10:16:13 +02:00
Oleksandr Byelkin
51f9d62005 Merge branch '10.11' into 11.0 2023-08-09 07:53:48 +02:00
Oleksandr Byelkin
34a8e78581 Merge branch '10.6' into 10.9 2023-08-04 08:01:06 +02:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
f52954ef42 Merge commit '10.4' into 10.5 2023-07-20 11:54:52 +02:00
Kristian Nielsen
9856bb4245 MDEV-31602: Race on rpl_global_gtid_slave_state when starting IO thread
Fix that rpl_slave_state::load() was calling rpl_slave_state::update() without
holding LOCK_slave_state.

Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-04 22:18:31 +02:00
Sergei Petrunia
c7fe8e51de Merge 10.11 into 11.0 2023-04-17 16:50:01 +03:00
Marko Mäkelä
1d1e0ab2cc Merge 10.6 into 10.8 2023-04-12 15:50:08 +03:00
Marko Mäkelä
5bada1246d Merge 10.5 into 10.6 2023-04-11 16:15:19 +03:00
Oleksandr Byelkin
ac5a534a4c Merge remote-tracking branch '10.4' into 10.5 2023-03-31 21:32:41 +02:00
Andrei
d4339620be MDEV-30780 optimistic parallel slave hangs after hit an error
The hang could be seen as show slave status displaying an error like
    Last_Error: Could not execute Write_rows_v1
along with
    Slave_SQL_Running: Yes

accompanied with one of the replication threads in show-processlist
characteristically having status like

   2394 | system user  |    | NULL | Slave_worker | 50852| closing tables

It turns out that closing tables worker got entrapped in endless looping
in mark_start_commit_inner() across already garbage-collected gco items.

The reclaimed gco links are explained with actually possible
out-of-order groups of events termination due to the Last_Error.
This patch reinforces the correct ordering to perform
finish_event_group's cleanup actions, incl unlinking gco:s
from the active list.
2023-03-16 18:55:19 +02:00
Marko Mäkelä
2e431ff7e6 Merge 10.11 into 11.0 2023-02-16 13:34:45 +02:00
Sergei Golubchik
760d149067 MDEV-30128 remove support for 5.1- replication events
including patches from Andrei Elkin
2023-02-05 22:02:30 +01:00
Oleksandr Byelkin
638625278e Merge branch '10.7' into 10.8 2023-01-31 09:57:52 +01:00
Oleksandr Byelkin
b923b80cfd Merge branch '10.6' into 10.7 2023-01-31 09:33:58 +01:00
Oleksandr Byelkin
c3a5cf2b5b Merge branch '10.5' into 10.6 2023-01-31 09:31:42 +01:00