For GTID consistency, GTID events were artificially added before
replication happened. Such an event should not contain a calculated CHECKSUM.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Problem:
========
A slave’s relay log format description event is used when
calculating Seconds_Behind_Master (SBM). This forces the SBM
value to spike when processing these events, as their creation
date is set to the timestamp that the IO thread begins.
Solution:
========
When the slave generates a format description event, mark the
event as a relay log event so it does not update the
rli->last_master_timestamp variable.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
This could cause out-of-order wsrep checkpoints due to the wsrep-specific leader
code not being executed in `MYSQL_BIN_LOG::write_transaction_to_binlog_events`.
Move the original result assignment before the wsrep logic to prevent that.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
When a transaction creates or drops temporary tables and a later statement
fails with an error, even that statement's cached ROW format events for
transactional tables end up in the binlog and are visible after the
transaction's commit.
Fixed with proper analysis of whether the errored-out statement needs
to be rolled back in binlog.
For instance, the mere fact that previous statements have already cached
a CREATE or DROP for temporary tables does not cause the errored-out
statement's events to be retained in the cache.
Conversely, if the statement itself creates or drops a temporary table,
it can't be rolled back; this rule remains.
Problem:
=======
There are two issues that are addressed in this patch:
1) SHOW BINARY LOGS uses caching to store the binary logs that exist
in the log directory; however, if new events are written to the logs,
the caching strategy is unaware. This is okay for users, as it is
okay for SHOW to return slightly old data. The test, however, can
result in inconsistent data. It runs two connections concurrently,
where one shows the logs, and the other adds a new file. The output
of SHOW BINARY LOGS then depends on when the cache is built, with
respect to the time that the second connection rotates the logs.
2) There is a race condition between RESET MASTER and SHOW BINARY
LOGS. More specifically, where they both need the binary log lock to
begin, SHOW BINARY LOGS only needs the lock to build its cache. If
RESET MASTER is issued after SHOW BINARY LOGS has built its cache and
before it has returned the results, the presented data may be
incorrect.
Solution:
========
1) As it is okay for users to see stale data, to make the test
consistent, use DEBUG_SYNC to force the race condition (problem 2) to
make SHOW BINARY LOGS build a cache before RESET MASTER is called.
Then, use additional logic from the next part of the solution to
rebuild the cache.
2) Use an Atomic_counter to keep track of the number of times RESET
MASTER has been called. If the value of the counter changes after
building the cache, the cache should be rebuilt and the analysis
should be restarted.
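
The counter-based rebuild in part 2 can be sketched roughly as follows. This is
only an illustration under stated assumptions: `BinlogIndex`,
`show_binary_logs()`, `reset_master()` and the `after_cache_built` hook are
hypothetical names, `std::atomic` stands in for the server's Atomic_counter,
and the hook plays the role that DEBUG_SYNC plays in the test. Locking of the
file list is left out.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the binary log index; not the server's real type.
struct BinlogIndex {
  std::atomic<unsigned long> reset_master_count{0};
  std::vector<std::string> files;  // assumed to be read under the log lock
};

// Simplified RESET MASTER: start over with a single file and bump the counter.
void reset_master(BinlogIndex &idx) {
  idx.files.assign(1, "binlog.000001");
  idx.reset_master_count.fetch_add(1);
}

// Simplified SHOW BINARY LOGS: build a cache, then rebuild it if RESET MASTER
// ran in the meantime. The optional hook runs right after the cache is built,
// which is where the test would force the race.
std::vector<std::string> show_binary_logs(
    BinlogIndex &idx, const std::function<void()> &after_cache_built = {}) {
  for (;;) {
    const unsigned long seen = idx.reset_master_count.load();
    std::vector<std::string> cache = idx.files;  // build the cache
    if (after_cache_built)
      after_cache_built();
    if (idx.reset_master_count.load() == seen)
      return cache;  // no RESET MASTER since the cache was built
    // The counter changed: the cache is stale, so restart the analysis.
  }
}
```

If the hook issues a RESET MASTER once, the first cache is discarded and the
returned list reflects the state after the reset.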
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
This patch changes statement rollback for streaming replication.
Previously, a statement rollback was turned into full transaction
rollback in the case where the transaction had already replicated a
fragment. This was introduced in the initial implementation of
streaming replication due to the fact that we do not have a mechanism
to perform a statement rollback on the applying side.
This policy is however overly pessimistic, causing full rollbacks even
in cases where a local statement rollback would not require a
statement rollback on the applying side. This happens to be the case when
the statement itself has not replicated any fragments.
So the patch changes the condition that determines if a statement
rollback should be turned into a full rollback accordingly.
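
As a rough illustration of the changed condition (not the actual wsrep code),
the old and new policies can be compared like this. `StreamingTrx`, its fields
and both function names are hypothetical.

```cpp
// Hypothetical model of a streaming replication transaction.
struct StreamingTrx {
  unsigned fragments_by_transaction;  // fragments replicated so far in total
  unsigned fragments_by_statement;    // fragments replicated by the current statement
};

enum class RollbackScope { Statement, FullTransaction };

// Old policy: any replicated fragment turns a statement rollback into a
// full transaction rollback.
RollbackScope old_policy(const StreamingTrx &trx) {
  return trx.fragments_by_transaction > 0 ? RollbackScope::FullTransaction
                                          : RollbackScope::Statement;
}

// New policy: a full rollback is needed only if the failing statement itself
// has replicated a fragment, because only then would the applying side need
// a statement rollback it cannot perform.
RollbackScope new_policy(const StreamingTrx &trx) {
  return trx.fragments_by_statement > 0 ? RollbackScope::FullTransaction
                                        : RollbackScope::Statement;
}
```

The two policies differ only when earlier statements replicated fragments but
the failing statement did not.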
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
There was a race between a committing transaction and a FLUSH LOGS that
follows it in binlog order. FLUSH LOGS could create a 2nd Binlog checkpoint
(BCP) event in the new file *before* the first transaction, logged in the old
binlog, gets committed in InnoDB. That would cause the loss of the transaction
at recovery, should the server stop right after the BCP.
The race is tackled by enforcing the necessary set of mutexes to be acquired
by the FLUSH LOGS handler in the correct order (that of the group commit leader
pattern).
Note, there remain two cases where a similar race is still possible:
- the above race, unchanged, when the server is run with the ("unlikely")
non-default `--binlog-optimize-thread-scheduling=0` (MDEV-24530), and
- in the unlikely event of binlogging an Incident event (MDEV-24531), which
also triggers binlog rotation,
though in both cases with lesser chances after the current fixes.
The assert fired falsely, as its condition did not capture two more
non-obvious possibilities.
They are a masked-out hton error from REPLACE execution (so at the later xa-prepare
that engine is still present as read-write) and a prepare-capable engine
which may also not be an actual participant in the xa transaction. Such an
engine, for example SEQUENCE, does however create its own event block.
Binlog group commit could lead to a situation where the group commit leader
accesses a participant thd's wsrep client state concurrently with the
thread executing the participant thd.
This is because of a race condition in
MYSQL_BIN_LOG::write_transaction_to_binlog_events(),
and was fixed by moving wsrep_ordered_commit() to happen in
MYSQL_BIN_LOG::queue_for_group_commit() under the protection of the
LOCK_prepare_ordered mutex.
2 different problems:
- MYSQL_BIN_LOG::write() did not check if mdl_context.acquire_lock() failed
- Sql_cmd_optimize_table::execute() and Sql_cmd_repair_table::execute()
called write_bin_log(), which could fail if sql_admin() had already
called my_eof()
Fixed by adding a check for the acquire_lock() return status and protecting
write_bin_log() in the above two functions with set_overwrite_status().
Analysis:
========
Writes to 'rli->log_space_total' need to be synchronized, otherwise both
SQL_THREAD and IO_THREAD can try to modify the variable simultaneously,
resulting in an incorrect rli->log_space_total. In the current test scenario
SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log'
and IO_THREAD is trying to increment 'rli->log_space_total' in
'queue_event' simultaneously. Hence the test occasionally fails with a result
mismatch.
Fix:
===
Convert 'rli->log_space_total' variable to atomic type.
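
A minimal sketch of the idea, assuming `std::atomic` as the atomic type. The
`RelayLogInfo` struct, the simplified `queue_event()` and `purge_first_log()`
signatures and the `simulate()` helper are hypothetical stand-ins, not the
server's code.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical, reduced stand-in for the relay log info structure.
struct RelayLogInfo {
  std::atomic<std::uint64_t> log_space_total{0};
};

// IO thread side: a queued event grows the relay log space.
void queue_event(RelayLogInfo &rli, std::uint64_t event_len) {
  rli.log_space_total.fetch_add(event_len);
}

// SQL thread side: purging a relay log frees space.
void purge_first_log(RelayLogInfo &rli, std::uint64_t freed) {
  rli.log_space_total.fetch_sub(freed);
}

// Runs both sides concurrently and returns the final counter value.
std::uint64_t simulate(std::uint64_t initial, int events, std::uint64_t len) {
  RelayLogInfo rli;
  rli.log_space_total.store(initial);
  std::thread io([&] {
    for (int i = 0; i < events; ++i) queue_event(rli, len);
  });
  std::thread sql([&] {
    for (int i = 0; i < events; ++i) purge_first_log(rli, len);
  });
  io.join();
  sql.join();
  return rli.log_space_total.load();
}
```

Because each increment and decrement is a single atomic read-modify-write, no
update is lost, and equal amounts added and removed leave the total unchanged.
With a plain integer the same loop could lose updates.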
This commit fixed the problems with S3 after the "DROP TABLE FORCE" changes.
It also fixes all failing replication S3 tests.
A slave is delayed if it is trying to execute replicated queries on a
table that is already converted to S3 by the master later in the binlog.
Fixes for replication events on S3 tables for delayed slaves:
- INSERT, INSERT ... SELECT and CREATE TABLE are ignored but written
to the binary log. UPDATE & DELETE will be fixed in a future commit.
Other things:
- On slaves with --s3-slave-ignore-updates set, allow S3 tables to be
opened in read-write mode. This was done to be able to
ignore-but-replicate queries like insert. Without this change any
open of an S3 table failed with 'Table is read only' which is too
early to be able to replicate the original query.
- Errors are now printed if handler::extra() call fails in
wait_while_tables_are_used().
- The error for row changes is changed from HA_ERR_WRONG_COMMAND
to HA_ERR_TABLE_READONLY.
- Disable some maria_extra() calls for S3 tables. This could cause
S3 tables to fail in some cases.
- Added missing thr_lock_delete() to ma_open() in case of failure.
- Removed from mysql_prepare_insert() the not needed argument 'table'.
EVEN IF I LOG TO FILE.
Analysis:
----------
MYSQL_UPGRADE of the master breaks the replication when
the query logging is enabled with FILE/NONE 'log-output'
option on the slave.
mysql_upgrade modifies the 'general_log' and 'slow_log'
tables after the logging is disabled as below:
    SET @old_log_state = @@global.general_log;
    SET GLOBAL general_log = 'OFF';
    ALTER TABLE general_log
      MODIFY event_time TIMESTAMP NOT NULL,
      ( .... );
    SET GLOBAL general_log = @old_log_state;
and
    SET @old_log_state = @@global.slow_query_log;
    SET GLOBAL slow_query_log = 'OFF';
    ALTER TABLE slow_log
      MODIFY start_time TIMESTAMP NOT NULL,
      ( .... );
    SET GLOBAL slow_query_log = @old_log_state;
In the binary log, only the ALTER statements are logged,
but not the SET statements which turn the logging ON/OFF.
So when the slave replays the binary log, the ALTER of the LOG
tables throws an error since the logging is enabled. Also,
the 'log-output' option is not checked to determine
whether to allow/disallow the ALTER operation.
Fix:
----
The 'log-output' option is included in the check while
determining whether the query logging happens using the
log tables.
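
The check described in the fix can be sketched as below. This is an
illustrative model only: `LoggerState`, the `kLog*` flag values and both
function names are assumptions made for the example, not the server's
definitions.

```cpp
// Hypothetical bit flags for the 'log-output' destinations.
enum : unsigned { kLogNone = 1, kLogFile = 2, kLogTable = 4 };

enum class LogTable { General, Slow };

// Hypothetical snapshot of the relevant logging settings.
struct LoggerState {
  bool general_log_enabled;
  bool slow_log_enabled;
  unsigned log_output;  // combination of the kLog* flags
};

// A log table is in use only if its log is enabled AND 'log-output'
// actually includes TABLE. Before the fix only the first half was checked.
bool log_table_in_use(const LoggerState &s, LogTable t) {
  const bool enabled =
      (t == LogTable::General) ? s.general_log_enabled : s.slow_log_enabled;
  return enabled && (s.log_output & kLogTable) != 0;
}

// ALTER of a log table is allowed whenever the table is not being written to.
bool may_alter_log_table(const LoggerState &s, LogTable t) {
  return !log_table_in_use(s, t);
}
```

With logging enabled but 'log-output' set to FILE or NONE, the ALTER replayed
on the slave is allowed in this model.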
Picked from mysql repository at 0daaf8aecd8f84ff1fb400029139222ea1f0d812
MDEV-21953 deadlock between BACKUP STAGE BLOCK_COMMIT and parallel
replication
Fixed by partly reverting MDEV-21953 to put back MDL_BACKUP_COMMIT locking
before log_and_order.
The original problem for MDEV-21953 was that while a thread was waiting in
'log_and_order' for other threads to commit, it held the
MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
This causes a deadlock as the waited-for thread can never get past the
MDL_BACKUP_COMMIT lock in ha_commit_trans.
The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
a thread is waiting for other 'previous' threads to commit. This ensures
that no transactional thread keeps MDL_BACKUP_COMMIT while waiting, which
ensures that there are no deadlocks anymore.
When converting a table (test.s3_table) from S3 to another engine, the
following will be logged to the binary log:
DROP TABLE IF EXISTS test.t1;
CREATE OR REPLACE TABLE test.t1 (...) ENGINE=new_engine
INSERT rows to test.t1 in binary-row-log-format
The bug is that the above statements are logged one by one to the binary
log. This means that a fast slave, configured to use the same S3 storage
as the master, would be able to execute the DROP and CREATE from the
binary log before the master has finished the ALTER TABLE.
In this case the slave would ignore the DROP (as it's on an S3 table) but
it would stop on the CREATE of the local table, as the table still exists in
S3. The REPLACE part will be ignored by the slave as it can't touch the
S3 table.
The fix is to ensure that all the above statements are written to the binary
log AFTER the table has been deleted from S3.
- Rewrote bool Query_compressed_log_event::write() to make it more readable
(no logic changes).
- Changed DBUG_PRINT of 'is_error:' to 'is_error():' to make it easier to
find error: in traces.
- Ensure that 'db' is never null in Query_log_event (Simplified code).