Commit graph

1312 commits

Author SHA1 Message Date
Alexander Barkov
a2a5ba14a8 Merge remote-tracking branch 'origin/11.5' into 11.6 2024-07-10 19:18:52 +04:00
Alexander Barkov
4e805aed85 Merge remote-tracking branch 'origin/11.4' into 11.5 2024-07-10 12:17:09 +04:00
Alexander Barkov
5fb07d942b Merge remote-tracking branch 'origin/11.2' into 11.4 2024-07-09 21:45:37 +04:00
Alexander Barkov
8aad19ddfc Merge remote-tracking branch 'origin/11.1' into 11.2 2024-07-09 14:04:11 +04:00
Oleksandr Byelkin
2447dda2c0 Merge branch '10.11' into 11.1 2024-07-08 22:40:16 +02:00
Alexander Barkov
027f137741 Merge remote-tracking branch 'origin/11.5' into 11.6 2024-07-08 14:15:04 +04:00
Alexander Barkov
8f4ec79d09 Merge remote-tracking branch 'origin/11.4' into 11.5 2024-07-08 12:25:04 +04:00
Oleksandr Byelkin
034a175982 Merge branch '10.6' into 10.11 2024-07-04 11:52:07 +02:00
Oleksandr Byelkin
dcd8a64892 Merge branch '10.5' into 10.6 2024-07-03 13:27:23 +02:00
Monty
2739b5f5f8 MDEV-34494 Add server_uid global variable and add it to error log at startup
The feedback plugin server_uid variable and the calculate_server_uid()
function is moved from feedback/utils.cc to sql/mysqld.cc

server_uid is added as a global variable (shown in 'show variables') and
is written to the error log on server startup together with server version
and server commit id.
2024-07-02 11:26:13 +03:00
Monty
d8c9c5ead6 MDEV-34491 Setting log_slow_admin="" at startup should be converted to log_slow_admin=ALL
We have an issue if a user have the following in a configuration file:
log_slow_filter=""                  # Log everything to slow query log
log_queries_not_using_indexes=ON

This set log_slow_filter to 'not_using_index' which disables
slow_query_logging of most queries.
In effect, on should never use log_slow_filter="" in config files but
instead use log_slow_filter=ALL.

Fixed by changing log_slow_filter="" that comes either from a
configuration file or from the command line, when starting to the server,
to log_slow_filter=ALL.
A warning will be printed when this happens.

Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
  (backported from 10.6)
2024-07-02 11:26:13 +03:00
Daniel Black
e7b76f87c4 MDEV-34437 restrict port and extra-port to tcp valid values
extra_port and port are 16 bit numbers and not 32 bit as they are
tcp ports.

Restrict their value.
2024-07-01 17:43:12 +10:00
Alexander Barkov
c4bf4ce948 Merge remote-tracking branch 'origin/11.2' into 11.4 2024-06-17 15:46:39 +04:00
Marko Mäkelä
a21e49cbcc Merge 11.1 into 11.2 2024-06-17 12:02:03 +03:00
Marko Mäkelä
d34289a3e2 Merge 10.11 into 11.1 2024-06-17 09:21:50 +03:00
Alexey Yurchenko
a1e5a284fc MDEV-31809 Automatic SST user account management
Implement automatic creation of temporary accounts for SST and pass
account credentials to SST script via socket as opposed to environment
variables. Delete the user after the SST script returns,

Respect wsrep_sst_auth set by the adminitrator in case some additional
privilege grants are needed for particular SST method.

mysqldump SST requires significant change to make use of the new
automatic user generation facility. For now just make it compatible
by ignoring automatically generated user and rely only on wsrep_sst_auth
setting on the joiner node to keep backward compatibility.

Adapt mysqldump SST to automatic SST user generation changes:
 - disable special treatment for mysqldump SST on donor
 - make mysqldump SST script compatible with the new SST script
   interface.

Differentiate user privileges for different SST methods:
 - grant minimum required privileges for clone and xtrabackup SST
   accounts
 - grant all privileges to custom SST accounts as it is not known what
   is needed.
 - disable SST account generation for rsync SST since it is not needed.

MTR tests:
 - add MTR tests for clone and xtrabackup SSTs without wsrep_sst_auth,
 - add MTR test for testing masking of wsrep_sst_auth.
 - don't attmept to restore original wsrep_sst_auth in MTR tests as it
   is always masked.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-06-10 23:29:05 +02:00
Alexander Barkov
76e0dc18b6 MDEV-34288 SET NAMES DEFAULT crashes mariadbd --collation-server=utf8mb4_unicode_ci
The @@global.character_set_client variable could erroneously be set
to a non-default collation of its character set, which further made
the `SET NAMES DEFAULT` statement crash the server.

Fixing the code to make sure that the global value these variables:
  @@character_set_client
  @@character_set_connection
  @@character_set_server
  @@character_set_database
  @@character_set_connection
point to the default compiled collations of the character set.
2024-06-04 12:38:43 +04:00
Yuchen Pei
2d3e2c58b6
Merge branch '10.11' into 11.1 2024-05-31 10:54:31 +10:00
Marko Mäkelä
22ba7e4ff8 Merge 10.6 into 10.11 2024-05-30 16:04:00 +03:00
Marko Mäkelä
5ba542e9ee Merge 10.5 into 10.6 2024-05-30 14:27:07 +03:00
Monty
b9f5793176 MDEV-9101 Limit size of created disk temporary files and tables
Two new variables added:
- max_tmp_space_usage : Limits the the temporary space allowance per user
- max_total_tmp_space_usage: Limits the temporary space allowance for
  all users.

New status variables: tmp_space_used & max_tmp_space_used
New field in information_schema.process_list: TMP_SPACE_USED

The temporary space is counted for:
- All SQL level temporary files. This includes files for filesort,
  transaction temporary space, analyze, binlog_stmt_cache etc.
  It does not include engine internal temporary files used for repair,
  alter table, index pre sorting etc.
- All internal on disk temporary tables created as part of resolving a
  SELECT, multi-source update etc.

Special cases:
- When doing a commit, the last flush of the binlog_stmt_cache
  will not cause an error even if the temporary space limit is exceeded.
  This is to avoid giving errors on commit. This means that a user
  can temporary go over the limit with up to binlog_stmt_cache_size.

Noteworthy issue:
- One has to be careful when using small values for max_tmp_space_limit
  together with binary logging and with non transactional tables.
  If a the binary log entry for the query is bigger than
  binlog_stmt_cache_size and one hits the limit of max_tmp_space_limit
  when flushing the entry to disk, the query will abort and the
  binary log will not contain the last changes to the table.
  This will also stop the slave!
  This is also true for all Aria tables as Aria cannot do rollback
  (except in case of crashes)!
  One way to avoid it is to use @@binlog_format=statement for
  queries that updates a lot of rows.

Implementation:
- All writes to temporary files or internal temporary tables, that
  increases the file size, are routed through temp_file_size_cb_func()
  which updates and checks the temp space usage.
- Most of the temporary file monitoring is done inside IO_CACHE.
  Temporary file monitoring is done inside the Aria engine.
- MY_TRACK and MY_TRACK_WITH_LIMIT are new flags for ini_io_cache().
  MY_TRACK means that we track the file usage. TRACK_WITH_LIMIT means
  that we track the file usage and we give an error if the limit is
  breached. This is used to not give an error on commit when
  binlog_stmp_cache is flushed.
- global_tmp_space_used contains the total tmp space used so far.
  This is needed quickly check against max_total_tmp_space_usage.
- Temporary space errors are using EE_LOCAL_TMP_SPACE_FULL and
  handler errors are using HA_ERR_LOCAL_TMP_SPACE_FULL.
  This is needed until we move general errors to it's own error space
  so that they cannot conflict with system error numbers.
- Return value of my_chsize() and mysql_file_chsize() has changed
  so that -1 is returned in the case my_chsize() could not decrease
  the file size (very unlikely and will not happen on modern systems).
  All calls to _chsize() are updated to check for > 0 as the error
  condition.
- At the destruction of THD we check that THD::tmp_file_space == 0
- At server end we check that global_tmp_space_used == 0
- As a precaution against errors in the tmp_space_used code, one can set
  max_tmp_space_usage and max_total_tmp_space_usage to 0 to disable
  the tmp space quota errors.
- truncate_io_cache() function added.
- Aria tables using static or dynamic row length are registered in 8K
  increments to avoid some calls to update_tmp_file_size().

Other things:
- Ensure that all handler errors are registered.  Before, some engine
  errors could be printed as "Unknown error".
- Fixed bug in filesort() that causes a assert if there was an error
  when writing to the temporay file.
- Fixed that compute_window_func() now takes into account write errors.
- In case of parallel replication, rpl_group_info::cleanup_context()
  could call trans_rollback() with thd->error set, which would cause
  an assert. Fixed by resetting the error before calling trans_rollback().
- Fixed bug in subselect3.inc which caused following test to use
  heap tables with low value for max_heap_table_size
- Fixed bug in sql_expression_cache where it did not overflow
  heap table to Aria table.
- Added Max_tmp_disk_space_used to slow query log.
- Fixed some bugs in log_slow_innodb.test
2024-05-27 12:39:04 +02:00
Sergei Golubchik
9293d40fa7 MDEV-33145 support for old-mode=OLD_FLUSH_STATUS
add old-mode that restores inconsistent legacy behavior for FLUSH STATUS.
It doesn't affect FLUSH { SESSION | GLOBAL } STATUS.
2024-05-27 12:39:03 +02:00
Sergei Golubchik
9ecec1f730 cleanup: old_mode_deprecated
make it easier to add more old-mode values
2024-05-27 12:39:03 +02:00
Monty
2464ee758a MDEV-33655 Remove alter_algorithm
Remove alter_algorithm but keep the variable as no-op (with a warning).

The reasons for removing alter_algorithm are:
- alter_algorithm was introduced as a replacement for the
  old_alter_table that was used to force the usage of the original
  alter table algorithm (copy) in the cases where the new alter
  algorithm did not work. The new option was added as a way to force
  the usage of a specific algorithm when it should instead have made
  it possible to disable algorithms that would not work for some
  reason.
- alter_algorithm introduced some cases where ALTER TABLE would not
  work without specifying the ALGORITHM=XXX option together with
  ALTER TABLE.
- Having different values of alter_algorithm on master and slave could
  cause slave to stop unexpectedly.
- ALTER TABLE FORCE, as used by mariadb-upgrade, would not always work
  if alter_algorithm was set for the server.
- As part of the MDEV-33449 "improving repair of tables" it become
  clear that alter- algorithm made it harder to provide a better and
  more consistent ALTER TABLE FORCE and REPAIR TABLE and it would be
  better to remove it.
2024-05-27 12:39:03 +02:00
Monty
8af7a99443 Fixed warnings when using deprecated variables
Also fixed that all unused variables are using the same variable comment.
The warning will be tested with the next commit that deprecates the
variable alter_algorithm.
2024-05-27 12:39:02 +02:00
Sergei Golubchik
5296f908ed MDEV-28671 post-testing fixes
Various help message improvements:
* MySQL->MariaDB, mysqld->mariadbd, "mysqld daemon" -> "mariadbd process"
* typos
* don't specify defaults directly in the help message
* don't say that an option is deprecated, mark is as such
* missing spaces in the middle of the text
etc
2024-05-27 12:39:02 +02:00
Sergei Golubchik
df10a945fc MDEV-28671 post-merge fixes
* use new deprecated printer for all deprecated server options
* restore alphabetic option sorting order
* move deprecated printer from mysqld.cc to my_getopt.c
* in --help print deprecation message at the end of the option help
* move 'ALL' help text where it belongs - to other SET options, and
  with a correct indentation.
* consistently end all or none command-line option help strings
  with a dot - my_print_help() needs that.
  It's about 50/50 now, so let's do none, less line wraps in --help
* remove trailing spaces from command-line option help strings
2024-05-27 12:39:02 +02:00
Christian Gonzalez
4186fa72fb MDEV-28671 Enable var deprecation for mysqld help output
Currently there are mechanism to mark a system variable as
deprecated, but they are only used to print warning messages
when a deprecated variable is set.

Leverage the existing mechanisms in order to make the
deprecation information available at the --help output of mysqld by:

* Moving the deprecation information (i.e `deprecation_substitute`
  attribute) from the `sys_var` class into the `my_option` struct.
  As every `sys_var` contains its own `my_option` struct, the access
  to the deprecation information remains available to `sys_var`
  objects. `my_getotp` functions, which works directly with
  `my_option` structs, gain access to this information while building
  the --help output.

* For plugin variables, leverages the `PLUGIN_VAR_DEPRECATED` flag
  and set the `deprecation_substitute` attribute  accordingly when
  building the `my_option` objects.

* Change the `option_cmp` function to use the `deprecation_substitute`
  attribute instead of the name when sorting the options. This way
  deprecated options and the substitutes will be grouped together.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
2024-05-27 12:39:02 +02:00
Vladislav Vaintroub
736449d30f MDEV-34205: ASAN stack buffer overflow in strxnmov() in frm_file_exists
Correct the second parameter for strxnmov to prevent potential buffer
overflows. The second parameter must be one less than the size of the
input buffer to avoid writing past the end of the buffer.

While the second parameter is usually correct, there are exceptions
that need fixing.

This commit addresses the issue within frm_file_exists() and other
affected places.
2024-05-23 22:08:27 +02:00
Alexander Barkov
fd247cc21f MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp()
This patch also fixes:
  MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
  MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
  MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
  MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
  MDEV-33088 Cannot create triggers in the database `MYSQL`
  MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
  MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
  MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
  MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
  MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0

- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER

- Adding a wrapper function CHARSET_INFO::streq(), to compare
  two strings for equality. For now it calls strnncoll() internally.
  In the future it will turn into a virtual function.

- Adding new accent sensitive case insensitive collations:
    - utf8mb4_general1400_as_ci
    - utf8mb3_general1400_as_ci
  They implement accent sensitive case insensitive comparison.
  The weight of a character is equal to the code point of its
  upper case variant. These collations use Unicode-14.0.0 casefolding data.

  The result of
     my_charset_utf8mb3_general1400_as_ci.strcoll()
  is very close to the former
     my_charset_utf8mb3_general_ci.strcasecmp()

  There is only a difference in a couple dozen rare characters, because:
    - the switch from "tolower" to "toupper" comparison, to make
      utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
    - the switch from Unicode-3.0.0 to Unicode-14.0.0
  This difference should be tolarable. See the list of affected
  characters in the MDEV description.

  Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
  Unlike utf8mb4_general_ci, it does not treat all BMP characters
  as equal.

- Adding classes representing names of the file based database objects:

    Lex_ident_db
    Lex_ident_table
    Lex_ident_trigger

  Their comparison collation depends on the underlying
  file system case sensitivity and on --lower-case-table-names
  and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.

- Adding classes representing names of other database objects,
  whose names have case insensitive comparison style,
  using my_charset_utf8mb3_general1400_as_ci:

  Lex_ident_column
  Lex_ident_sys_var
  Lex_ident_user_var
  Lex_ident_sp_var
  Lex_ident_ps
  Lex_ident_i_s_table
  Lex_ident_window
  Lex_ident_func
  Lex_ident_partition
  Lex_ident_with_element
  Lex_ident_rpl_filter
  Lex_ident_master_info
  Lex_ident_host
  Lex_ident_locale
  Lex_ident_plugin
  Lex_ident_engine
  Lex_ident_server
  Lex_ident_savepoint
  Lex_ident_charset
  engine_option_value::Name

- All the mentioned Lex_ident_xxx classes implement a method streq():

  if (ident1.streq(ident2))
     do_equal();

  This method works as a wrapper for CHARSET_INFO::streq().

- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
  in class members and in function/method parameters.

- Replacing all calls like
    system_charset_info->coll->strcasecmp(ident1, ident2)
  to
    ident1.streq(ident2)

- Taking advantage of the c++11 user defined literal operator
  for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
  data types. Use example:

  const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;

  is now a shorter version of:

  const Lex_ident_column primary_key_name=
    Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
2024-04-18 15:22:10 +04:00
Sergei Golubchik
eeba940311 remove deprecated since 10.4 2024-02-17 17:10:25 +01:00
Oleksandr Byelkin
fa69b085b1 Merge branch '11.3' into 11.4 2024-02-15 13:53:21 +01:00
Marko Mäkelä
64cce8d5bf Merge 10.6 into 10.11 2024-02-14 16:12:53 +02:00
Monty
18dfcfdecf MDEV-31404 Implement binlog_space_limit
binlog_space_limit is a variable in Percona server used to limit the total
size of all binary logs.

This implementation is based on code from Percona server 5.7.

In MariaDB we decided to call the variable max-binlog-total-size to be
similar to max-binlog-size. This makes it easier to find in the output
from 'mariadbd --help --verbose'). MariaDB will also support
binlog_space_limit for compatibility with Percona.

Some internal notes to explain implementation notes:

- When running MariaDB does not delete binary logs that are either
  used by slaves or have active xid that are not yet committed.

Some implementation notes:

- max-binlog-total-size is by default 0 (no limit).
- max-binlog-total-size can be changed without server restart.
- Binlog file sizes are checked on startup, or if
  max-binlog-total-size is set to a value > 0, not for every log write.
  The total size of all binary logs is cached and dynamically updated
  when updating the binary log on binary log rotation.
- max-binlog-total-size is checked against existing log files during
  serverstart, binlog rotation, FLUSH LOGS, when writing to binary log
  or when max-binlog-total-size changes value.
- Option --slave-connections-needed-for-purge with 1 as default added.
  This allows one to ensure that we do not delete binary logs if there
  is less than 'slave-connections-needed-for-purge' connected.
  Without this option max-binlog-total-size would potentially delete
  binlogs needed by slaves on server startup or when a slave disconnects
  as there are then no connected slaves to protect active binlogs.
- PURGE BINARY LOGS TO ... will be executed as if
  slave-connectitons-needed-for-purge would be zero. In other words
  it will do the purge even if there is no slaves connected. If there
  are connected slaves working on the logs, these will be protected.
- If binary log is on and max-binlog-total_size <> 0 then the status
  variable 'Binlog_disk_use' shows the current size of all old binary
  logs + the state of the current one.
- Removed test of strcmp(log_file_name, log_info.log_file_name) in
  purge_logs_before_date() as this is tested in can_purge_logs()
- To avoid expensive calls of log_in_use() we cache the result for the
  last log that is in use by a slave. Future calls to can_purge_logs()
  for this binary log will be quickly detected and false will be returned
  until a slave starts working on a new log.
- Note that after a binary log rotation caused by max_binlog_size,
  the last log will not be purged directly as it is still in use
  internally. The next binary log write will purge binlogs if needed.

Reviewer:Kristian Nielsen <knielsen@knielsen-hq.org>
2024-02-14 15:02:21 +01:00
Monty
3907345e22 MDEV-33306 Optimizer choosing incorrect index in 10.6, 10.5 but not in 10.4
In MariaDB up to 10.11, the test_if_cheaper_ordering() code (that tries
to optimizer how GROUP BY is executed) assumes that if a table scan is used
then if there is any index usable by GROUP BY it will be used.

The reason MySQL 10.4 provides a better plan is because of two differences:
- Plans using 'ref' has a cost of 1/10 of what it should be (as a
  protection against table scans). This is why 'ref' is used in 10.4
  and not in 10.5.
- When 'ref' is used, then GROUP BY will not use an index for GROUP BY.

In MariaDB 10.5 the chosen plan is a table scan (as it calculated to be
faster) but as 'ref' is not used, the test_if_cheaper_ordering()
optimizer phase decides (as ref is not usd) to use an index for GROUP BY,
which has bad performance.

Description of fix:
- All new code is protected by the "optimizer_adjust_secondary_key_costs"
  variable, which is now a bit map, and is only executed if the option
  "disable_forced_index_in_group_by" set.
- Corrects GROUP BY handling in test_if_cheaper_ordering() by making
  the choise of using and index with GROUP BY cost based instead of rule
  based.
- Adds TIME_FOR_COMPARE to all costs, when using group by, to make
  read_time, index_scan_time and range_cost comparable.

Other things:
- Made optimizer_adjust_secondary_key_costs a bit map (compatible with old
  code).

Notes:
Current code ignores costs for the algorithm used when doing GROUP
BY on the first table:
  - Create an in-memory temporary table for handling group by and doing a
    filesort of the result file
We can probably in 10.6 continue to ignore this cost.

This patch should NOT be merged to 11.0 series (not needed in 11.0).
2024-02-12 16:43:00 +02:00
Oleksandr Byelkin
d21cb43db1 Merge branch '11.2' into 11.3 2024-02-04 16:42:31 +01:00
Sergei Golubchik
75bfb4b8a3 deprecate SQL_NOTES variable in favor of NOTE_VERBOSITY
as suggested by Monty
2024-02-03 11:22:20 +01:00
Sergei Golubchik
79580f4f96 Merge branch '11.1' into 11.2 2024-02-02 17:43:57 +01:00
Sergei Golubchik
b6680e0101 Merge branch '11.0' into 11.1 2024-02-02 11:30:47 +01:00
Sergei Golubchik
87e13722a9 Merge branch '10.6' into 10.11 2024-02-01 18:36:14 +01:00
Oleksandr Byelkin
fe490f85bb Merge branch '10.11' into 11.0 2024-01-30 08:54:10 +01:00
Oleksandr Byelkin
14d930db5d Merge branch '10.6' into 10.11 2024-01-30 08:17:58 +01:00
Kristian Nielsen
d039346a7a MDEV-4991: GTID binlog indexing
Improve the performance of slave connect using B+-Tree indexes on each binlog
file. The index allows fast lookup of a GTID position to the corresponding
offset in the binlog file, as well as lookup of a position to find the
corresponding GTID position.

This eliminates a costly sequential scan of the starting binlog file
to find the GTID starting position when a slave connects. This is
especially costly if the binlog file is not cached in memory (IO
cost), or if it is encrypted or a lot of slaves connect simultaneously
(CPU cost).

The size of the index files is generally less than 1% of the binlog data, so
not expected to be an issue.

Most of the work writing the index is done as a background task, in
the binlog background thread. This minimises the performance impact on
transaction commit. A simple global mutex is used to protect index
reads and (background) index writes; this is fine as slave connect is
a relatively infrequent operation.

Here are the user-visible options and status variables. The feature is on by
default and is expected to need no tuning or configuration for most users.

binlog_gtid_index
  On by default. Can be used to disable the indexes for testing purposes.

binlog_gtid_index_page_size (default 4096)
  Page size to use for the binlog GTID index. This is the size of the nodes
  in the B+-tree used internally in the index. A very small page-size (64 is
  the minimum) will be less efficient, but can be used to stress the
  BTree-code during testing.

binlog_gtid_index_span_min (default 65536)
  Control sparseness of the binlog GTID index. If set to N, at most one
  index record will be added for every N bytes of binlog file written.
  This can be used to reduce the number of records in the index, at
  the cost only of having to scan a few more events in the binlog file
  before finding the target position

Two status variables are available to monitor the use of the GTID indexes:

  Binlog_gtid_index_hit
  Binlog_gtid_index_miss

The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. binlog written by old server version
without GTID index support), or corrupt.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-01-27 12:09:54 +01:00
Monty
6f65e08277 MDEV-33118 optimizer_adjust_secondary_key_costs variable
optimizer-adjust_secondary_key_costs is added to provide 2 small
adjustments to the 10.x optimizer cost model. This can be used in the
case where the optimizer wrongly uses a secondary key instead of a
clustered primary key.

The reason behind this change is that MariaDB 10.x does not take into
account that for engines like InnoDB, that scanning a primary key can be
up to 7x faster than scanning a secondary key + read the row data trough
the primary key.

The different values for optimizer_adjust_secondary_key_costs are:

optimizer_adjust_secondary_key_costs=0
- No changes to current model

optimizer_adjust_secondary_key_costs=1
- Ensure that the cost of of secondary indexes has a cost of at
  least 5x times the cost of a clustered primary key (if one exists).
  This disables part of the worst_seek optimization described below.

optimizer_adjust_secondary_key_costs=2
- Disable "worst_seek optimization" and adjust filter cost slightly
  (add cost of 1 if filter is used).

The idea behind 'worst_seek optimization' is that we limit the
cost for all non clustered ref access to the least of:
- best-rows-by-range (or all rows in no range found) / 10
- scan-time-table (roughly number of file blocks to scan table) * 3

In addition we also do not try to use rowid_filter if number of rows
estimated for 'ref' access is less than the worst_seek limitation.

The idea is that worst_seek is trying to take into account that if
we do a lot of accesses through a key, this is likely to be cached.
However it only does this for secondary keys, and not for clustered
keys or index only reads.

The effect of the worst_seek are:
- In some cases 'ref' will have a much lower cost than range or using
  a clustered key.
- Some possible rowid filters for secondary keys will be ignored.

When implementing optimizer_adjust_secondary_key_costs=2, I noticed
that there is a slightly different costs for how ref+filter and
range+filter are calculated.  This caused a lot of range and
range+filter to change to ref+filter, which is not good as
range+filter provides the optimizer a better estimate of how many
accepted rows there will be in the result set.
Adding a extra small cost (1 seek) when using filter mitigated the
above problems in almost all cases.

This patch should not be applied to MariaDB 11.0 as worst_seeks is
removed in 11.0 and the cost calculation for clustered keys, secondary
keys, index scan and filter is more exact.

Test case changes for --optimizer-adjust_secondary_key_costs=1
(Fix secondary key costs to be 5x of primary key):

- stat_tables_innodb:
  - Complex change (probably ok as number of rows are really small)
    - ref over 1 row changed to range over 10 rows with join buffer
    - ref over 5 rows changed to eq_ref
    - secondary ref over 1 row changed to ref of primary key over 4 rows
    - Change of key to use longer key with index pushdown (a little
      bit worse but not significant).
  - Change to use secondary (1 row) -> primary (4 rows)
- rowid_filter_innodb:
  - index_merge (2 rows) & ref (1) -> all (23 rows) -> primary eq_ref.

Test case changes for --optimizer-adjust_secondary_key_costs=2
(remove of worst_seeks & adjust filter cost):

- stat_tables_innodb:
  - Join order change (probably ok as number of rows are really small)
  - ref (5 rows) & ref(1 row) changed to range (10 rows & join buffer)
    & eq_ref.
- selectivity_innodb:
  - ref -> ref|filter  (ok)
- rowid_filter_innodb:
  - ref -> ref|filter (ok)
  - range|filter (64 rows) changed to ref|filter (128 rows).
    ok as ref|filter outputs wrong number of rows in explain.
- range, range_mrr_icp:
  -ref (500 rows -> ALL (1000 rows) (ok)
- select_pkeycache, select, select_jcl6:
  - ref|filter (2 rows) -> ref (2 rows) (ok)
- selectivity:
  - ref -> ref_filter (ok)
- range:
  - Change of 'filtered' but no stat or plan change (ok)
- selectivity:
 - ref -> ref+filter (ok)
 - Change of filtered but no plan change (ok)
- join_nested_jcl6:
  - range -> ref|filter (ok as only 2 rows)
- subselect3, subselect3_jcl6:
  - ref_or_null (4 rows) -> ALL (10 rows) (ok)
  - Index_subquery (4 rows) -> ALL (10 rows)  (ok)
- partition_mrr_myisam, partition_mrr_aria and partition_mrr_innodb:
  - Uses ALL instead of REF for a key value that is the same for > 50%
    of rows.  (good)
order_by_innodb:
  - range (200 rows) -> ref (20 rows)+filesort (ok)
- subselect_sj2_mat:
  - One test changed. One ALL removed and replaced with eq_ref. Likely
    to be better.
- join_cache:
  - Changed ref over 60% of the rows to use hash join (ok)
- opt_tvc:
  - Changed to use eq_ref instead of ref with plan change (probably ok)
- opt_trace:
  - No worst/max seeks clipping (good).
  - Almost double range_scan_time and index_scan_time (ok).
- rowid_filter:
  - ref -> ref|filtered (ok)
  - range|filter (77 rows) changed to ref|filter (151 rows).  Proably
    ok as ref|filter outputs wrong number of rows in explain.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
2024-01-23 13:03:11 +02:00
Michael Widenius
7af50e4df4 MDEV-32551: "Read semi-sync reply magic number error" warnings on master
rpl_semi_sync_slave_enabled_consistent.test and the first part of
the commit message comes from Brandon Nesterenko.

A test to show how to induce the "Read semi-sync reply magic number
error" message on a primary. In short, if semi-sync is turned on
during the hand-shake process between a primary and replica, but
later a user negates the rpl_semi_sync_slave_enabled variable while
the replica's IO thread is running; if the io thread exits, the
replica can skip a necessary call to kill_connection() in
repl_semisync_slave.slave_stop() due to its reliance on a global
variable. Then, the replica will send a COM_QUIT packet to the
primary on an active semi-sync connection, causing the magic number
error.

The test in this patch exits the IO thread by forcing an error;
though note a call to STOP SLAVE could also do this, but it ends up
needing more synchronization. That is, the STOP SLAVE command also
tries to kill the VIO of the replica, which makes a race with the IO
thread to try and send the COM_QUIT before this happens (which would
need more debug_sync to get around). See THD::awake_no_mutex for
details as to the killing of the replica’s vio.

Notes:
- The MariaDB documentation does not make it clear that when one
  enables semi-sync replication it does not matter if one enables
  it first in the master or slave. Any order works.

Changes done:
- The rpl_semi_sync_slave_enabled variable is now a default value for
  when semisync is started. The variable does not anymore affect
  semisync if it is already running. This fixes the original reported
  bug.  Internally we now use repl_semisync_slave.get_slave_enabled()
  instead of rpl_semi_sync_slave_enabled. To check if semisync is
  active on should check the @@rpl_semi_sync_slave_status variable (as
  before).
- The semisync protocol conflicts in the way that the original
  MySQL/MariaDB client-server protocol was designed (client-server
  send and reply packets are strictly ordered and includes a packet
  number to allow one to check if a packet is lost). When using
  semi-sync the master and slave can send packets at 'any time', so
  packet numbering does not work. The 'solution' has been that each
  communication starts with packet number 1, but in some cases there
  is still a chance that the packet number check can fail.  Fixed by
  adding a flag (pkt_nr_can_be_reset) in the NET struct that one can
  use to signal that packet number checking should not be done. This
  is flag is set when semi-sync is used.
- Added Master_info::semi_sync_reply_enabled to allow one to configure
  some slaves with semisync and other other slaves without semisync.
  Removed global variable semi_sync_need_reply that would not work
  with multi-master.
- Repl_semi_sync_master::report_reply_packet() can now recognize
  the COM_QUIT packet from semisync slave and not give a
  "Read semi-sync reply magic number error" error for this case.
  The slave will be removed from the Ack listener.
- On Windows, don't stop semisync Ack listener just because one
  slave connection is using socket_id > FD_SETSIZE.
- Removed busy loop in Ack_receiver::run() by using
 "Self-pipe trick" to signal new slave and stop Ack_receiver.
- Changed some Repl_semi_sync_slave functions that always returns 0
  from int to void.
- Added Repl_semi_sync_slave::slave_reconnect().
- Removed dummy_function Repl_semi_sync_slave::reset_slave().
- Removed some duplicate semisync notes from the error log.
- Add test of "if (get_slave_enabled() && semi_sync_need_reply)"
  before calling Repl_semi_sync_slave::slave_reply().
  (Speeds up the code as we can skip all initializations).
- If epl_semisync_slave.slave_reply() fails, we disable semisync
  for that connection.
- We do not call semisync.switch_off() if there are no active slaves.
  Instead we check in Repl_semi_sync_master::commit_trx() if there are
  no active threads. This simplices the code.
- Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is
  flushed in case of asserts.
- Removed the internal rpl_semi_sync_slave_status as it is not needed
  anymore. The @@rpl_semi_sync_slave_status status variable is now
  mapped to rpl_semi_sync_enabled.
- Removed rpl_semi_sync_slave_enabled  as it is not needed anymore.
  Repl_semi_sync_slave::get_slave_enabled() contains the active status.
- Added checking that we do not add a slave twice with
  Ack_receiver::add_slave(). This could happen with old code.
- Removed Repl_semi_sync_master::check_and_switch() as it is not
  needed anymore.
- Ensure that when we call Ack_receiver::remove_slave() that the slave
  is removed from the listener before function returns.
- Call listener.listen_on_sockets() outside of mutex for better
  performance and less contested mutex.
- Ensure that listening is ignoring newly added slaves when checking for
  responses.
- Fixed the master ack_receiver listener is not killed if there are no
  connected slaves (and thus stop semisync handling of future
  connections). This could happen if all slaves sockets where would be
  marked as unreliable.
- Added unlink() to base_ilist_iterator and remove() to
  I_List_iterator. This enables us to remove 'dead' slaves in
  Ack_recever::run().
- kill_zombie_dump_threads() now does killing of dump threads properly.
  - It can now kill several threads (should be impossible but could
    happen if IO slaves reconnects very fast).
  - We now wait until the dump thread is done before starting the
    dump.
- Added an error if kill_zombie_dump_threads() fails.
- Set thd->variables.server_id before calling
  kill_zombie_dump_threads(). This simplies the code.
- Added a lot of comments both in code and tests.
- Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used.

Test changes:
- rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with
  semisync enabled.
- Some timings changed slight with startup of slave which caused
  rpl_binlog_dump_slave_gtid_state_info.text to fail as it checked the
  error log file before the slave had started properly. Fixed by
  adding wait_for_pattern_in_file.inc that allows waiting for the
  pattern to appear in the log file.
- Tests have been updated so that we first set
  rpl_semi_sync_master_enabled on the master and then set
  rpl_semi_sync_slave_enabled on the slaves (this is according to how
  the MariaDB documentation document how to setup semi-sync).
- Error text "Master server does not have semi-sync enabled" has been
  replaced with "Master server does not support semi-sync" for the
  case when the master supports semi-sync but semi-sync is not
  enabled.

Other things:
- Some trivial cleanups in Repl_semi_sync_master::update_sync_header().
- We should in 11.3 changed the default value for
  rpl-semi-sync-master-wait-no-slave from TRUE to FALSE as the TRUE
  does not make much sense as default. The main difference with using
  FALSE is that we do not wait for semisync Ack if there are no slave
  threads.  In the case of TRUE we wait once, which did not bring any
  notable benefits except slower startup of master configured for
  using semisync.

Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com>

This solves the problem reported in MDEV-32960 where a new
slave may not be registered in time and the master disables
semi sync because of that.
2024-01-23 13:03:11 +02:00
Sergei Golubchik
c154aafe1a Merge remote-tracking branch '11.3' into 11.4 2023-12-21 15:40:55 +01:00
Sergei Golubchik
7f0094aac8 Merge branch '11.2' into 11.3 2023-12-21 02:14:59 +01:00
Sergei Golubchik
fef31a26f3 Merge branch '11.1' into 11.2 2023-12-20 23:43:05 +01:00
Sergei Golubchik
7a5448f8da Merge branch '11.0' into 11.1 2023-12-19 20:11:54 +01:00
Sergei Golubchik
8c8bce05d2 Merge branch '10.11' into 11.0 2023-12-19 15:53:18 +01:00