Improve the performance of slave connect using B+-Tree indexes on each binlog
file. The index allows fast lookup of a GTID position to the corresponding
offset in the binlog file, as well as lookup of a position to find the
corresponding GTID position.
This eliminates a costly sequential scan of the starting binlog file
to find the GTID starting position when a slave connects. This is
especially costly if the binlog file is not cached in memory (IO
cost), or if it is encrypted or a lot of slaves connect simultaneously
(CPU cost).
The size of the index files is generally less than 1% of the binlog data, so
not expected to be an issue.
Most of the work writing the index is done as a background task, in
the binlog background thread. This minimises the performance impact on
transaction commit. A simple global mutex is used to protect index
reads and (background) index writes; this is fine as slave connect is
a relatively infrequent operation.
Here are the user-visible options and status variables. The feature is on by
default and is expected to need no tuning or configuration for most users.
binlog_gtid_index
On by default. Can be used to disable the indexes for testing purposes.
binlog_gtid_index_page_size (default 4096)
Page size to use for the binlog GTID index. This is the size of the nodes
in the B+-tree used internally in the index. A very small page-size (64 is
the minimum) will be less efficient, but can be used to stress the
BTree-code during testing.
binlog_gtid_index_span_min (default 65536)
Control sparseness of the binlog GTID index. If set to N, at most one
index record will be added for every N bytes of binlog file written.
This can be used to reduce the number of records in the index, at
the cost only of having to scan a few more events in the binlog file
before finding the target position
Two status variables are available to monitor the use of the GTID indexes:
Binlog_gtid_index_hit
Binlog_gtid_index_miss
The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. binlog written by old server version
without GTID index support), or corrupt.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The existing INFORMATION_SCHEMA.TABLE_PRIVILEGES displays only those
privileges that were specifically granted on the table level,
whereas it may be useful to see privileges granted at the database
and global level.
This commit adds a new view `table_privileges` to the `sys` schema
for that purpose. The view shows privileges on existing tables
and views, combining all possible levels:
- user_privileges
- schema_privileges
- table_privileges
Compute binlog checksums (when enabled) already when writing events
into the statement or transaction caches, where before it was done
when the caches are copied to the real binlog file. This moves the
checksum computation outside of holding LOCK_log, improving
scalabitily.
At stmt/trx cache write time, the final end_log_pos values are not
known, so with this patch these will be set to 0. Events that are
written directly to the binlog file (not through stmt/trx cache) keep
the correct end_log_pos value. The GTID and COMMIT/XID events at the
start and end of event groups are written directly, so the zero
end_log_pos is only for events in the middle of event groups, which
do not negatively affect replication.
An option --binlog-legacy-event-pos, off by default, is provided to
disable this behavior to provide backwards compatibility with any
external applications that might rely on end_log_pos in events in the
middle of event groups.
Checksums cannot be pre-computed when binlog encryption is enabled, as
encryption relies on correct end_log_pos to provide part of the
nonce/IV.
Checksum pre-computation is also disabled for WSREP/Galera, as it uses
events differently in its write-sets and so on. Extending pre-computation of
checksums to Galera where it makes sense could be added in a future patch.
The current --binlog-checksum configuration is saved in
binlog_cache_data at transaction start and used to pre-compute
checksums in cache, if applicable. When the cache is later copied to
the binlog, a check is made if the saved value still matches the
configured global value; if so, the events are block-copied directly
into the binlog file. If --binlog-checksum was changed during the
transaction, events are re-written to the binlog file one-by-one and
the checksums recomputed/discarded as appropriate.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Turn the remaining three `binlog*` options binlog_do_db, binlog_ignore_db,
binlog_rows_event_max_size into global variables so that they can be
visible from the SQL user level. This is for audit / secure
configuration check purposes.
Create new MTR tests to make sure that the newly created global
variables can be visible from the command line interface.
Behavior before the code change:
MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE
-> Variable_name LIKE 'binlog_do_db' OR
-> Variable_name LIKE 'binlog_ignore_db' OR
-> Variable_name LIKE 'binlog_row_event_max_size';
Empty set (0.001 sec)
Behavior after the code change:
MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE
-> Variable_name LIKE 'binlog_do_db' OR
-> Variable_name LIKE 'binlog_ignore_db' OR
-> Variable_name LIKE 'binlog_row_event_max_size';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| binlog_do_db | |
| binlog_ignore_db | |
| binlog_row_event_max_size | 8192 |
+---------------------------+-------+
3 rows in set (0.001 sec)
Note:
For `binlog_do_db` and `binlog_ignore_db`, we add a new class
`Sys_var_binlog_filter` to handle the dynamically-composable command line
options for `binlog_do_db` and `binlog_ignore_db`. Below
is the motivation:
When the users start the server with the option
--binlog-do-db="database1" --binlog-do-db="database2"
The expected behavior is that the system should allow replication for
both `database1` and `database2`, which is the logic of the original
code.
However, when turning the variables into system variables, the
functionality does not exist any more, since system variables will only
handle the last occurrence of the option, and in this case, the system
will only be able to handle `database2`.
Copyright:
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services, Inc.
These are mainly internal files so is a low impact change.
The few scripts/mysql*sql where renames to mariadb_* prefix
on the name.
mysql-test renamed to mariadb-test in the final packages
SUPER privilege used to allow various actions that were alternatively
allowed by one of BINLOG ADMIN, BINLOG MONITOR, BINLOG REPLAY,
CONNECTION ADMIN, FEDERATED ADMIN, REPL MASTER ADMIN, REPL SLAVE ADMIN,
SET USER, SLAVE MONITOR.
Now SUPER no longer does that, one has to grant one of the fine-grained
privileges above to be to perform corresponding actions.
On upgrade from MariaDB versions 10.11 and below all the privileges
above are granted automatically if the user has SUPER.
As a side-effect, such an upgrade will allow SUPER-user to run SHOW
BINLOG EVENTS, SHOW RELAYLOG EVENTS, SHOW SLAVE HOSTS, even if he wasn't
able to do it before the upgrade.
To prevent ASAN heap-use-after-poison in the MDEV-16549 part of
./mtr --repeat=6 main.derived
the initialization of Name_resolution_context was cleaned up.
- Add `replicate_rewrite_db` status variable, that may accept comma
separated key-value pairs.
- Note that option `OPT_REPLICATE_REWRITE_DB` already existed in `mysqld.h`
from this commit 23d8586dbf
Reviewer:Brandon Nesterenko <brandon.nesterenko@mariadb.com>
The benefit of this is that one can remove the READ ONLY ADMIN privilege
from all users and this way ensure that no one can do any changes on
any non-temporary tables.
This is good option to use on slaves when one wants to ensure that the
slave is kept identical to the master.
New Feature:
============
This patch adds a new system variable, @@slave_max_statement_time,
which limits the execution time of s slave’s events that implements
an equivalent to @@max_statement_time for slave applier.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Add a new privilege "SLAVE MONITOR" which will grant user the permission
to execute "SHOW SLAVE STATUS" and "SHOW RELAYLOG EVENTS" commands.
SHOW SLAVE STATUS requires either SLAVE MONITOR/SUPER
SHOW RELAYLOG EVENTS requires SLAVE MONITOR privilege.