MDEV-29533 Crash when MariaDB is replica of MySQL 8.0
MySQL 8.0 has added the following new events in the MySQL binary log
PARTIAL_UPDATE_ROWS_EVENT
TRANSACTION_PAYLOAD_EVENT
HEARTBEAT_LOG_EVENT_V2
- PARTIAL_UPDATE_ROWS_EVENT is used by MySQL to generate update
statements using JSON_SET, JSON_REPLACE and JSON_REMOVE to make
update of JSON columns more efficient. These events can be
disabled by setting 'binlog-row-value-options=""'
- TRANSACTION_PAYLOAD_EVENT is used by MySQL to signal that a
row event is compressed. It an be disably by setting
'binlog_transaction_compression=0'.
- HEARTBEAT_LOG_EVENT_V2 is written to the binary log many times
per seconds. It can be ignored by the server.
What this patch does:
- If PARTIAL_UPDATE_ROWS_EVENT or TRANSACTION_PAYLOAD_EVENT is found,
the server will stop with an error message of how to disable the
MySQL server to generate such events.
- HEARTBEAT_LOG_EVENT_V2 events are ignored.
- mariadb-binlog will write the name of the new events.
- mariadb-binlog will stop if PARTIAL_UPDATE_ROWS_EVENT or
TRANSACTION_PAYLOAD_EVENT is found, unless --force is given.
- Fixes a crash in mariadb-binlog if a character set unknown to
MariaDB is found. (MDEV-29533)
From Kristian Nielsen:
- Add test case for MySQL 8.0 to MariaDB replication and fixed a
a small typo in post_header_len initialization.
Reviewer: knielsen@mariadb.org
The Write_rows_log_event originally allocated the m_rows_buf up-front, and
thus is_valid() checks that the buffer is allocated correctly. But at some
point this was changed to allocate the buffer lazily on demand. This means
that a a valid event can now have m_rows_buf==NULL. The is_valid() code was
not changed, and thus is_valid() could return false on a valid event.
This caused a bug for REPLACE INTO t() VALUES(), () which generates a
write_rows event with no after image; then the m_rows_buf was never
allocated and is_valid() incorrectly returned false, causing an error in
some other parts of the code.
Also fix a couple of missing special cases in the code for mysqlbinlog to
correctly decode (in comments) row events with missing after image.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Fractional part < 100000 microseconds was printed without leading zeros,
causing such timestamps to be applied incorrectly in mariadb-binlog | mysql
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
There are two problems.
First, replication fails when XA transactions are used where the
slave has replicate_do_db set and the client has touched a different
database when running DML such as inserts. This is because XA
commands are not treated as keywords, and are thereby not exempt
from the replication filter. The effect of this is that during an XA
transaction, if its logged “use db” from the master is filtered out
by the replication filter, then XA END will be ignored, yet its
corresponding XA PREPARE will be executed in an invalid state,
thereby breaking replication.
Second, if the slave replicates an XA transaction which results in
an empty transaction, the XA START through XA PREPARE first phase of
the transaction won’t be binlogged, yet the XA COMMIT will be
binlogged. This will break replication in chain configurations.
The first problem is fixed by treating XA commands in
Query_log_event as keywords, thus allowing them to bypass the
replication filter. Note that Query_log_event::is_trans_keyword() is
changed to accept a new parameter to define its mode, to either
check for XA commands or regular transaction commands, but not both.
In addition, mysqlbinlog is adapted to use this mode so its
--database filter does not remove XA commands from its output.
The second problem fixed by overwriting the XA state in the XID
cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to
rollback the transaction and skip its binlogging. If the xid cache
is cleared before an XA transaction receives its completion command
(e.g. on server shutdown), then before reporting ER_XAER_NOTA when
the completion command is executed, the filter is first checked if
the database is ignored, and if so, the error is ignored.
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
Most things where wrong in the test suite.
The one thing that was a bug was that table_map_id was in some places
defined as ulong and in other places as ulonglong. On Linux 64 bit this
is not a problem as ulong == ulonglong, but on windows this caused failures.
Fixed by ensuring that all instances of table_map_id are ulonglong.
Commit a923d6f49c disabled numeric setting
of character_set_* variables with non-default values:
MariaDB [(none)]> set character_set_client=224;
ERROR 1115 (42000): Unknown character set: '224'
However the corresponding binlog functionality still write numeric
values for log event, and this will break binlog replay if the value is
not default. Now make the server use 'String' type for
'character_set_client' when generating binlog events
Before:
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=224,@@session.collation_connection=224,@@session.collation_server=33/*!*/;
After:
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=utf8mb4,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
Note: prior to the previous commit, setting with '224' or '45' or
'utf8mb4' have the same effect, as they all set the parameter to
'utf8mb4'.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
don't assume anymore that OPTIONS_WRITTEN_TO_BIN_LOG is fixed once
and forever. Instead, deduct master's OPTIONS_WRITTEN_TO_BIN_LOG
from the master's version in binlog.
- Major rewrite of ddl_log.cc and ddl_log.h
- ddl_log.cc described in the beginning how the recovery works.
- ddl_log.log has unique signature and is dynamic. It's easy to
add more information to the header and other ddl blocks while still
being able to execute old ddl entries.
- IO_SIZE for ddl blocks is now dynamic. Can be changed without affecting
recovery of old logs.
- Code is more modular and is now usable outside of partition handling.
- Renamed log file to dll_recovery.log and added option --log-ddl-recovery
to allow one to specify the path & filename.
- Added ddl_log_entry_phase[], number of phases for each DDL action,
which allowed me to greatly simply set_global_from_ddl_log_entry()
- Changed how strings are stored in log entries, which allows us to
store much more information in a log entry.
- ddl log is now always created at start and deleted on normal shutdown.
This simplices things notable.
- Added probes debug_crash_here() and debug_simulate_error() to simply
crash testing and allow crash after a given number of times a probe
is executed. See comments in debug_sync.cc and rename_table.test for
how this can be used.
- Reverting failed table and view renames is done trough the ddl log.
This ensures that the ddl log is tested also outside of recovery.
- Added helper function 'handler::needs_lower_case_filenames()'
- Extend binary log with Q_XID events. ddl log handling is using this
to check if a ddl log entry was logged to the binary log (if yes,
it will be deleted from the log during ddl_log_close_binlogged_events()
- If a DDL entry fails 3 time, disable it. This is to ensure that if
we have a crash in ddl recovery code the server will not get stuck
in a forever crash-restart-crash loop.
mysqltest.cc changes:
- --die will now replace $variables with their values
- $error will contain the error of the last failed statement
storage engine changes:
- maria_rename() was changed to be more robust against crashes during
rename.
This change is to get rid of randomly failing tests, especially those
that reads random position of the binary log. From looking at the logs
it's clear that some failures is because of a read char (with value >= 128)
is converted to a big long value. Using uchar everywhere makes this much
less likely to happen.
Another benefit is that a lot of cast of char to uchar could be removed.
Other things:
- Removed some extra space before '=' and '+=' in assignments
- Fixed indentations and lines > 80 characters
- Replace '16' with 'element_size' (from class definition) in
Gtid_list_log_event()
This change removed 68 explict strlen() calls from the code.
The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name
Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible
Other things:
- There where compiler issues with ensuring that all character set names
points to the same string: gcc doesn't allow one to use integer constants
when defining global structures (constant char * pointers works fine).
To get around this, I declared defines for each character set name
length.
The cause was an uninitalized variable on the slave when reading a dummy
event that can only be generated by the test.
Fixed by ensuring that flag2 is always initialized.
Fixed also some indentation issues and improved comments.
MDEV-19964 S3 replication support
Added new configure options:
s3_slave_ignore_updates
"If the slave has shares same S3 storage as the master"
s3_replicate_alter_as_create_select
"When converting S3 table to local table, log all rows in binary log"
This allows on to configure slaves to have the S3 storage shared or
independent from the master.
Other thing:
Added new session variable '@@sql_if_exists' to force IF_EXIST to DDL's.
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.
Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of
- binary logging extension to write prepared XA part of
transaction signified with
its XID in a new XA_prepare_log_event. The concusion part -
with Commit or Rollback decision - is logged separately as
Query_log_event.
That is in the binlog the XA consists of two separate group of
events.
That makes the whole XA possibly interweaving in binlog with
other XA:s or regular transaction but with no harm to
replication and data consistency.
Gtid_log_event receives two more flags to identify which of the
two XA phases of the transaction it represents. With either flag
set also XID info is added to the event.
When binlog is ON on the server XID::formatID is
constrained to 4 bytes.
- engines are made aware of the server policy to keep up user
prepared XA:s so they (Innodb, rocksdb) don't roll them back
anymore at their disconnect methods.
- slave applier is refined to cope with two phase logged XA:s
including parallel modes of execution.
This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.
CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc
Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
completion with ER_XA_RBROLLBACK.
2. binlog-* filtered XA when it changes engine data is regarded as
loggable even when nothing got cached for binlog. An empty
XA-prepare group is recorded. Consequent Commit-or-Rollback
succeeds in the Engine(s) as well as recorded into binlog.
3. The same applies to the non-transactional engine XA.
4. @@skip_log_bin=OFF does not record anything at XA-prepare
(obviously), but the completion event is recorded into binlog to
admit inconsistency with slave.
The following actions are taken by the patch.
At XA-prepare:
when empty binlog cache - don't do anything to binlog if RO,
otherwise write empty XA_prepare (assert(binlog-filter case)).
At Disconnect:
when Prepared && RO (=> no binlogging was done)
set Xid_cache_element::error := ER_XA_RBROLLBACK
*keep* XID in the cache, and rollback the transaction.
At XA-"complete":
Discover the error, if any don't binlog the "complete",
return the error to the user.
Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
calc_field_event_length should accurately calculate the size of BLOB type
fields, Instead of returning just the bytes taken by length it should return
length bytes + actual length.
Cherry-pick the commits the mysql and some changes.
WL#4618 RBR: extended table metadata in the binary log
This patch extends Table Map Event. It appends some new fields for
more metadata. The new metadata includes:
- Signedness of Numberic Columns
- Character Set of Character Columns and Binary Columns
- Column Name
- String Value of SET Columns
- String Value of ENUM Columns
- Primary Key
- Character Set of SET Columns and ENUM Columns
- Geometry Type
Some of them are optional, the patch introduces a GLOBAL system
variable to control it. It is binlog_row_metadata.
- Scope: GLOBAL
- Dynamic: Yes
- Type: ENUM
- Values: {NO_LOG, MINIMAL, FULL}
- Default: NO_LOG
Only Signedness, character set and geometry type are logged if it is MINIMAL.
Otherwise all of them are logged.
Also add a binlog_type_info() to field, So that we can have extract
relevant binlog info from field.