MDEV-29533 Crash when MariaDB is replica of MySQL 8.0
MySQL 8.0 has added the following new events in the MySQL binary log
PARTIAL_UPDATE_ROWS_EVENT
TRANSACTION_PAYLOAD_EVENT
HEARTBEAT_LOG_EVENT_V2
- PARTIAL_UPDATE_ROWS_EVENT is used by MySQL to generate update
statements using JSON_SET, JSON_REPLACE and JSON_REMOVE to make
update of JSON columns more efficient. These events can be
disabled by setting 'binlog-row-value-options=""'
- TRANSACTION_PAYLOAD_EVENT is used by MySQL to signal that a
row event is compressed. It an be disably by setting
'binlog_transaction_compression=0'.
- HEARTBEAT_LOG_EVENT_V2 is written to the binary log many times
per seconds. It can be ignored by the server.
What this patch does:
- If PARTIAL_UPDATE_ROWS_EVENT or TRANSACTION_PAYLOAD_EVENT is found,
the server will stop with an error message of how to disable the
MySQL server to generate such events.
- HEARTBEAT_LOG_EVENT_V2 events are ignored.
- mariadb-binlog will write the name of the new events.
- mariadb-binlog will stop if PARTIAL_UPDATE_ROWS_EVENT or
TRANSACTION_PAYLOAD_EVENT is found, unless --force is given.
- Fixes a crash in mariadb-binlog if a character set unknown to
MariaDB is found. (MDEV-29533)
From Kristian Nielsen:
- Add test case for MySQL 8.0 to MariaDB replication and fixed a
a small typo in post_header_len initialization.
Reviewer: knielsen@mariadb.org
The Write_rows_log_event originally allocated the m_rows_buf up-front, and
thus is_valid() checks that the buffer is allocated correctly. But at some
point this was changed to allocate the buffer lazily on demand. This means
that a a valid event can now have m_rows_buf==NULL. The is_valid() code was
not changed, and thus is_valid() could return false on a valid event.
This caused a bug for REPLACE INTO t() VALUES(), () which generates a
write_rows event with no after image; then the m_rows_buf was never
allocated and is_valid() incorrectly returned false, causing an error in
some other parts of the code.
Also fix a couple of missing special cases in the code for mysqlbinlog to
correctly decode (in comments) row events with missing after image.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Fractional part < 100000 microseconds was printed without leading zeros,
causing such timestamps to be applied incorrectly in mariadb-binlog | mysql
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
There are two problems.
First, replication fails when XA transactions are used where the
slave has replicate_do_db set and the client has touched a different
database when running DML such as inserts. This is because XA
commands are not treated as keywords, and are thereby not exempt
from the replication filter. The effect of this is that during an XA
transaction, if its logged “use db” from the master is filtered out
by the replication filter, then XA END will be ignored, yet its
corresponding XA PREPARE will be executed in an invalid state,
thereby breaking replication.
Second, if the slave replicates an XA transaction which results in
an empty transaction, the XA START through XA PREPARE first phase of
the transaction won’t be binlogged, yet the XA COMMIT will be
binlogged. This will break replication in chain configurations.
The first problem is fixed by treating XA commands in
Query_log_event as keywords, thus allowing them to bypass the
replication filter. Note that Query_log_event::is_trans_keyword() is
changed to accept a new parameter to define its mode, to either
check for XA commands or regular transaction commands, but not both.
In addition, mysqlbinlog is adapted to use this mode so its
--database filter does not remove XA commands from its output.
The second problem fixed by overwriting the XA state in the XID
cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to
rollback the transaction and skip its binlogging. If the xid cache
is cleared before an XA transaction receives its completion command
(e.g. on server shutdown), then before reporting ER_XAER_NOTA when
the completion command is executed, the filter is first checked if
the database is ignored, and if so, the error is ignored.
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
Two new variables added:
- max_tmp_space_usage : Limits the the temporary space allowance per user
- max_total_tmp_space_usage: Limits the temporary space allowance for
all users.
New status variables: tmp_space_used & max_tmp_space_used
New field in information_schema.process_list: TMP_SPACE_USED
The temporary space is counted for:
- All SQL level temporary files. This includes files for filesort,
transaction temporary space, analyze, binlog_stmt_cache etc.
It does not include engine internal temporary files used for repair,
alter table, index pre sorting etc.
- All internal on disk temporary tables created as part of resolving a
SELECT, multi-source update etc.
Special cases:
- When doing a commit, the last flush of the binlog_stmt_cache
will not cause an error even if the temporary space limit is exceeded.
This is to avoid giving errors on commit. This means that a user
can temporary go over the limit with up to binlog_stmt_cache_size.
Noteworthy issue:
- One has to be careful when using small values for max_tmp_space_limit
together with binary logging and with non transactional tables.
If a the binary log entry for the query is bigger than
binlog_stmt_cache_size and one hits the limit of max_tmp_space_limit
when flushing the entry to disk, the query will abort and the
binary log will not contain the last changes to the table.
This will also stop the slave!
This is also true for all Aria tables as Aria cannot do rollback
(except in case of crashes)!
One way to avoid it is to use @@binlog_format=statement for
queries that updates a lot of rows.
Implementation:
- All writes to temporary files or internal temporary tables, that
increases the file size, are routed through temp_file_size_cb_func()
which updates and checks the temp space usage.
- Most of the temporary file monitoring is done inside IO_CACHE.
Temporary file monitoring is done inside the Aria engine.
- MY_TRACK and MY_TRACK_WITH_LIMIT are new flags for ini_io_cache().
MY_TRACK means that we track the file usage. TRACK_WITH_LIMIT means
that we track the file usage and we give an error if the limit is
breached. This is used to not give an error on commit when
binlog_stmp_cache is flushed.
- global_tmp_space_used contains the total tmp space used so far.
This is needed quickly check against max_total_tmp_space_usage.
- Temporary space errors are using EE_LOCAL_TMP_SPACE_FULL and
handler errors are using HA_ERR_LOCAL_TMP_SPACE_FULL.
This is needed until we move general errors to it's own error space
so that they cannot conflict with system error numbers.
- Return value of my_chsize() and mysql_file_chsize() has changed
so that -1 is returned in the case my_chsize() could not decrease
the file size (very unlikely and will not happen on modern systems).
All calls to _chsize() are updated to check for > 0 as the error
condition.
- At the destruction of THD we check that THD::tmp_file_space == 0
- At server end we check that global_tmp_space_used == 0
- As a precaution against errors in the tmp_space_used code, one can set
max_tmp_space_usage and max_total_tmp_space_usage to 0 to disable
the tmp space quota errors.
- truncate_io_cache() function added.
- Aria tables using static or dynamic row length are registered in 8K
increments to avoid some calls to update_tmp_file_size().
Other things:
- Ensure that all handler errors are registered. Before, some engine
errors could be printed as "Unknown error".
- Fixed bug in filesort() that causes a assert if there was an error
when writing to the temporay file.
- Fixed that compute_window_func() now takes into account write errors.
- In case of parallel replication, rpl_group_info::cleanup_context()
could call trans_rollback() with thd->error set, which would cause
an assert. Fixed by resetting the error before calling trans_rollback().
- Fixed bug in subselect3.inc which caused following test to use
heap tables with low value for max_heap_table_size
- Fixed bug in sql_expression_cache where it did not overflow
heap table to Aria table.
- Added Max_tmp_disk_space_used to slow query log.
- Fixed some bugs in log_slow_innodb.test
MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range
- Changed usage of timeval to my_timeval as the timeval parts on windows
are 32-bit long, which causes some compiler issues on windows.
This patch augments Gtid_log_event with the user thread-id.
In particular that compensates for the loss of this info in
Rows_log_events.
Gtid_log_event::thread_id gets visible in mysqlbinlog output like
#231025 16:21:45 server id 1 end_log_pos 537 CRC32 0x1cf1d963 GTID 0-1-2 ddl thread_id=10
as a 32 bit unsigned integer. Note this is a 32-bit value, as
the connection id can only be 32 bits (see MDEV-15089 for
details).
While the size of Gtid event has grown by 4 bytes
replication from OLD <-> NEW is not affected by it. This patch
also slightly changes the logic to convert Gtid events to Query
events for older replicas which don't support Gtid. Instead of
hard-coding the padding of the sys var section of the generated
Query event, the length to pad is dynamically calculated based
on the length of the Gtid event.
This work was started by the late Sujatha Sivakumar.
Brandon Nesterenko took it over, reviewed initial patches and
extended the work.
Also thanks to Andrei for his help in finalizing the fixes for
MDEV-33924, which were squashed into this patch.
Reviewed-by:
=============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
This reverts commit c37b2087b4.
In c37b20887, when re-binlogging a GTID event on a replica,
it will overwrite the thread_id from the primary to be the
value of the slave applier (SQL thread or parallel worker).
This should be the value of the original thread_id on the
master connection though, to both help track temporary
tables, and be consistent with Query_log_event.
Reverting the commit to re-target 11.5, so we can re-test
with the corrected thread_id.
Most things where wrong in the test suite.
The one thing that was a bug was that table_map_id was in some places
defined as ulong and in other places as ulonglong. On Linux 64 bit this
is not a problem as ulong == ulonglong, but on windows this caused failures.
Fixed by ensuring that all instances of table_map_id are ulonglong.
This patch augments Gtid_log_event with the user thread-id.
In particular that compensates for the loss of this info in
Rows_log_events.
Gtid_log_event::thread_id gets visible in mysqlbinlog output like
#231025 16:21:45 server id 1 end_log_pos 537 CRC32 0x1cf1d963 GTID 0-1-2 ddl thread_id=10
as 64 bit unsigned integer.
While the size of Gtid event has grown by 8-9 bytes
replication from OLD <-> NEW is not affected by it.
This work was started by the late Sujatha Sivakumar.
Brandon Nesterenko took it over, reviewed initial patches and extended
the work.
Reviewed-by: <andrei.elkin@mariadb.com>
Summary
=======
With FULL_NODUP mode, before image inclues all columns and after
image inclues only the changed columns. flashback will swap the
value of changed columns from after image to before image.
For example:
BI: c1, c2, c3_old, c4_old
AI: c3_new, c4_new
flashback will reconstruct the before and after images to
BI: c1, c2, c3_new, c4_new
AI: c3_old, c4_old
Implementation
==============
When parsing the before and after image, position and length of
the fields are collected into ai_fields and bi_fields, if it is an
Update_rows_event and the after image doesn't includes all columns.
The changed fields are swapped between bi_fields and ai_fields.
Then it recreates the before image and after image by using
bi_fields and ai_fields. nullbit will be set to 1 if the
field is NULL, otherwise nullbit will be 0.
It also optimized flashback a little bit.
- calc_row_event_length is used instead of print_verbose_one_row
- swap_buff1 and swap_buff2 are removed.
This is a preparatory commit for pre-computing checksums outside of
holding LOCK_log, no functional changes.
Which checksum algorithm is used (if any) when writing an event does not
belong in the event, it is a property of the log being written to.
Instead decide the checksum algorithm when constructing the
Log_event_writer object, and store it there.
Introduce a client-only Log_event::read_checksum_alg to be able to
print the checksum read, and a
Format_description_log_event::source_checksum_alg which is the
checksum algorithm (if any) to use when reading events from a log.
Also eliminate some redundant `enum` keywords on the enum_binlog_checksum_alg
type.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This patch adds a way to override default collations
(or "character set collations") for desired character sets.
The SQL standard says:
> Each collation known in an SQL-environment is applicable to one
> or more character sets, and for each character set, one or more
> collations are applicable to it, one of which is associated with
> it as its character set collation.
In MariaDB, character set collations has been hard-coded so far,
e.g. utf8mb4_general_ci has been a hard-coded character set collation
for utf8mb4.
This patch allows to override (globally per server, or per session)
character set collations, so for example, uca1400_ai_ci can be set as a
character set collation for Unicode character sets
(instead of compiled xxx_general_ci).
The array of overridden character set collations is stored in a new
(session and global) system variable @@character_set_collations and
can be set as a comma separated list of charset=collation pairs, e.g.:
SET @@character_set_collations='utf8mb3=uca1400_ai_ci,utf8mb4=uca1400_ai_ci';
The variable is empty by default, which mean use the hard-coded
character set collations (e.g. utf8mb4_general_ci for utf8mb4).
The variable can also be set globally by passing to the server startup command
line, and/or in my.cnf.