The problem was when a GTID event was part of a group commit, and so contained
a commit id. The code that replaces GTID with a BEGIN event for old slaves did
not correctly handle this case.
Fix the code so that the GTID with commit id can also be properly replaced
with a BEGIN query event. The extra two bytes are in the BEGIN event replaced
with a dummy, empty time zone string.
As a side-effect of purge_relay_logs(), sql_slave_skip_counter
was silently ignored in GTID mode.
But sql_slave_skip_counter in fact is not a good match with GTID.
And it is not really needed either, as users can explicitly set
@@gtid_slave_pos to skip specific GTIDs, in a way that matches
well how GTID replication works.
So with this patch, we give an error on attempts to set
sql_slave_skip_counter when using GTID, with a suggestion to use
gtid_slave_pos instead, if needed.
AUTO_INC COLUMN ONLY ON SLAVE
In RBR, if the slave's table as one additional auto_inc column,
then, it will insert the value 0 instead of generating the next
auto_inc number.
We fix this by checking that if an auto_inc extra column exists,
when compared to column data of the row event, we explicitly set
it to NULL and flag the engine that a nulled auto_inc column will
be inserted.
(MYSQLBINLOG -V CRASHES WITH THAT BINLOG)
Problem: If slave receives a corrupted row event,
slave server is crashing.
Analysis: When slave is unpacking the row event, it is
not validating the data before applying the event. If the
data is corrupted for eg: the length of a field is wrong,
it could end up reading wrong data leading to a crash.
A similar problem happens when mysqlbinlog tool is used
against a corrupted binlog using '-v' option. Due to -v
option, the tool tries to print the values of all the
fields. Corrupted field length could lead to a crash.
Fix: Before unpacking the field, a verification
will be made on the length. If it falls into the event
range, only then it will be unpacked. Otherwise,
"ER_SLAVE_CORRUPT_EVENT" error will be thrown.
Incase mysqlbinlog -v case, the field value will not be
printed and the processing of the file will be stopped.
sql/field.h:
Removed a function which is not required anymore
sql/log_event.cc:
Adding a validation on the field length before
the tool tries to print the value.
sql/log_event.h:
Changing unpack_row call according to the new arguments
sql/log_event_old.h:
Changing unpack_row call according to the new arguments
sql/rpl_record.cc:
Adding a new argument 'row_end' which tells
the end position of the complete data in the
row event. It will be used to do validation
before doing 'unpack' field.
sql/rpl_record.h:
Adding a new argument 'row_end' which tells
the end position of the complete data in the
row event. It will be used to do validation
before doing 'unpack' field.
sql/rpl_utility.cc:
Now calc_field_size() is required for client too.
- Made slaves temporary table multi-thread slave safe by adding mutex around save_temporary_table usage.
- rli->save_temporary_tables is the active list of all used temporary tables
- This is copied to THD->temporary_tables when temporary tables are opened and updated when temporary tables are closed
- Added THD->lock_temporary_tables() and THD->unlock_temporary_tables() to simplify this.
- Relay_log_info->sql_thd renamed to Relay_log_info->sql_driver_thd to avoid wrong usage for merged code.
- Added is_part_of_group() to mark functions that are part of the next function. This replaces setting IN_STMT when events are executed.
- Added is_begin(), is_commit() and is_rollback() functions to Query_log_event to simplify code.
- If slave_skip_counter is set run things in single threaded mode. This simplifies code for skipping events.
- Updating state of relay log (IN_STMT and IN_TRANSACTION) is moved to one single function: update_state_of_relay_log()
We can't use OPTION_BEGIN to check for the state anymore as the sql_driver and sql execution threads may be different.
Clear IN_STMT and IN_TRANSACTION in init_relay_log_pos() and Relay_log_info::cleanup_context() to ensure the flags doesn't survive slave restarts
is_in_group() is now independent of state of executed transaction.
- Reset thd->transaction.all.modified_non_trans_table() if we did set it for single table row events.
This was mainly for keeping the flag as documented.
- Changed slave_open_temp_tables to uint32 to be able to use atomic operators on it.
- Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
- Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
- Changed some functions to take rpl_group_info instead of Relay_log_info to make them multi-slave safe and to simplify usage
- do_shall_skip()
- continue_group()
- sql_slave_killed()
- next_event()
- Simplifed arguments to io_salve_killed(), check_io_slave_killed() and sql_slave_killed(); No reason to supply THD as this is part of the given structure.
- set_thd_in_use_temporary_tables() removed as in_use is set on usage
- Added information to thd_proc_info() which thread is waiting for slave mutex to exit.
- In open_table() reuse code from find_temporary_table()
Other things:
- More DBUG statements
- Fixed the rpl_incident.test can be run with --debug
- More comments
- Disabled not used function rpl_connect_master()
mysql-test/suite/perfschema/r/all_instances.result:
Moved sleep_lock and sleep_cond to rpl_group_info
mysql-test/suite/rpl/r/rpl_incident.result:
Updated result
mysql-test/suite/rpl/t/rpl_incident-master.opt:
Not needed anymore
mysql-test/suite/rpl/t/rpl_incident.test:
Fixed that test can be run with --debug
sql/handler.cc:
More DBUG_PRINT
sql/log.cc:
More comments
sql/log_event.cc:
Added DBUG statements
do_shall_skip(), continue_group() now takes rpl_group_info param
Use is_begin(), is_commit() and is_rollback() functions instead of inspecting query string
We don't have set slaves temporary tables 'in_use' as this is now done when tables are opened.
Removed IN_STMT flag setting. This is now done in update_state_of_relay_log()
Use IN_TRANSACTION flag to test state of relay log.
In rows_event_stmt_cleanup() reset thd->transaction.all.modified_non_trans_table if we had set this before.
sql/log_event.h:
do_shall_skip(), continue_group() now takes rpl_group_info param
Added is_part_of_group() to mark events that are part of the next event. This replaces setting IN_STMT when events are executed.
Added is_begin(), is_commit() and is_rollback() functions to Query_log_event to simplify code.
sql/log_event_old.cc:
Removed IN_STMT flag setting. This is now done in update_state_of_relay_log()
do_shall_skip(), continue_group() now takes rpl_group_info param
sql/log_event_old.h:
Added is_part_of_group() to mark events that are part of the next event.
do_shall_skip(), continue_group() now takes rpl_group_info param
sql/mysqld.cc:
Changed slave_open_temp_tables to uint32 to be able to use atomic operators on it.
Relay_log_info::sleep_lock -> Rpl_group_info::sleep_lock
Relay_log_info::sleep_cond -> Rpl_group_info::sleep_cond
sql/mysqld.h:
Updated types and names
sql/rpl_gtid.cc:
More DBUG
sql/rpl_parallel.cc:
Updated TODO section
Set thd for event that is execution
Use new is_begin(), is_commit() and is_rollback() functions.
More comments
sql/rpl_rli.cc:
sql_thd -> sql_driver_thd
Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
Clear IN_STMT and IN_TRANSACTION in init_relay_log_pos() and Relay_log_info::cleanup_context() to ensure the flags doesn't survive slave restarts.
Reset table->in_use for temporary tables as the table may have been used by another THD.
Use IN_TRANSACTION instead of OPTION_BEGIN to check state of relay log.
Removed IN_STMT flag setting. This is now done in update_state_of_relay_log()
sql/rpl_rli.h:
Changed relay log state flags to bit masks instead of bit positions (most other code we have uses bit masks)
Added IN_TRANSACTION to mark if we are in a BEGIN ... COMMIT section.
save_temporary_tables is now thread safe
Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
Relay_log_info->sql_thd renamed to Relay_log_info->sql_driver_thd to avoid wrong usage for merged code
is_in_group() is now independent of state of executed transaction.
sql/slave.cc:
Simplifed arguments to io_salve_killed(), sql_slave_killed() and check_io_slave_killed(); No reason to supply THD as this is part of the given structure.
set_thd_in_use_temporary_tables() removed as in_use is set on usage in sql_base.cc
sql_thd -> sql_driver_thd
More DBUG
Added update_state_of_relay_log() which will calculate the IN_STMT and IN_TRANSACTION state of the relay log after the current element is executed.
If slave_skip_counter is set run things in single threaded mode.
Simplifed arguments to io_salve_killed(), check_io_slave_killed() and sql_slave_killed(); No reason to supply THD as this is part of the given structure.
Added information to thd_proc_info() which thread is waiting for slave mutex to exit.
Disabled not used function rpl_connect_master()
Updated argument to next_event()
sql/sql_base.cc:
Added mutex around usage of slave's temporary tables. The active list is always kept up to date in sql->rgi_slave->save_temporary_tables.
Clear thd->temporary_tables after query (safety)
More DBUG
When using temporary table, set table->in_use to current thd as the THD may be different for slave threads.
Some code is ifdef:ed with REMOVE_AFTER_MERGE_WITH_10 as the given code in 10.0 is not yet in this tree.
In open_table() reuse code from find_temporary_table()
sql/sql_binlog.cc:
rli->sql_thd -> rli->sql_driver_thd
Remove duplicate setting of rgi->rli
sql/sql_class.cc:
Added helper functions rgi_lock_temporary_tables() and rgi_unlock_temporary_tables()
Would have been nicer to have these inline, but there was no easy way to do that
sql/sql_class.h:
Added functions to protect slaves temporary tables
sql/sql_parse.cc:
Added DBUG_PRINT
sql/transaction.cc:
Added comment
Currently several places use description_event->common_header_len instead of
LOG_EVENT_MINIMAL_HEADER_LEN when parsing events with "frozen" headers (such
as Start_event_v3 and its subclasses such as Format_description_log_event, as
well as Rotate_event). This causes events with extra headers (which would otherwise
be valid and those headers ignored) to be corrupted due to over-reading or skipping
into the data portion of the log events.
It is rewritten in some details patch of Jeremy Cole (See MDEV):
- The virtual function returns length to avoid IFs (and only one call of the virtual function made)
- Printing function avoids printing strings
The ignored events are not written to the relay log, but instead a fake
Rotate event is generated to handle update of position.
Extend this for Gtid so we similarly generate a fake Gtid_list event
to update the GTID position.
Also fix an unrelated test issue that got triggered by the added test cases.
Hook in the wait-for-prior-commit logic (not really tested yet).
Clean up some resource maintenance around rpl_group_info (may still be some
smaller issues there though).
Add a ToDo list at the top of rpl_parallel.cc
The problem was the Gtid_list event which is logged to the binlog in
10.0 and is not understood by the 5.5 server.
This event is supposed to be replaced with a dummy event for 5.5
servers. But the very first event logged in the very first binlog
has an empty list of GTID, which makes the event too short to be
replacable with an empty event.
The fix is to pad the empty Gtid_list event to be big enough to
be replacable by a dummy event.
Implement START SLAVE UNTIL master_gtid_pos = "<GTID position>".
Add test cases, including a test showing how to use this to promote
a new master among a set of slaves.
In log_event.h
DESCRIPTION:
Due to inclusion of an implementation file, namely 'rpl_tblmap.cc'
in a header file, namely 'log_event.h'; linker errors occur if
log_event.h is included in an application containing multiple source
files, such as in the case of Binlog API.
Binlog API requires including log_event.h in its source files;
which leads to multiple definition errors, for functions defined
in rpl_tblmap.cc for class 'table_mapping'.
FIX:
Change the inclusion from header file(log_event.h) to source files
using this header and have flag MYSQL_CLIENT set. The only file in
the current server repository is mysqlbinlog.cc.
client/mysqlbinlog.cc:
Pulled in code of rpl_tblmap.cc
sql/log_event.h:
Removed inclusion of the implementation file from this header file
When the slave connects, the master skips binlog event groups
until it reaches the position requested by the slave. To
identify event groups, it needs to detect COMMIT events. But
this detection did not correctly handle binlog checksums, so
could incorrectly skip extra groups due to not detecting the
end of an event group.
Merge of 10.0-mdev26 feature tree into 10.0-base.
Global transaction ID is prepended to each event group in the binlog.
Slave connect can request to start from GTID position instead of specifying
file name/offset of master binlog. This facilitates easy switch to a new
master.
Slave GTID state is stored in a table mysql.rpl_slave_state, which can be
InnoDB to get crash-safe slave state.
GTID includes a replication domain ID, allowing to keep track of distinct
positions for each of multiple masters.
At logging a first Query referring a user var, the slave missed to log the user var.
It appears that at execution of a Uservar event the slaver applier
thought of the variable as already logged.
The reason of misjudgement is in coincidence of query id:s: of one that the thread
holds at Uservar execution and another one that the thread sees at the Query applying.
While the two are naturally different in the regular execution branch (as two computational
events are separated as individual events), in the deferred applying case the User var execution
effectively belongs to its Query processing.
Fixed with storing the Uservar parsing time (where desicion to defer is taken) query id
to temporarily substitute with it the actual query id at the Uservar execution time
(along with its query).
Such manipulation mimics behaviour of the regular applying branch.
sql/log_event.cc:
Storing the Uservar parsing time query id into a new member of the event
to to temporarily substitute
with it the actual thread id at the Uservar execution time.
sql/log_event.h:
Storage for keeping query-id in User-var intance is added.
Fix MDEV-4275 - I/O thread restart duplicates events in the relay log.
The first time we connect to master after CHANGE MASTER or restart, we connect
from the GTID position. But then subsequent reconnects or IO thread restarts
reconnect with the old-style file/offset binlog pos from where it left off at
last disconnect. This is necessary to avoid duplicate events in the relay
logs, as there is nothing that synchronises the SQL thread update of GTID
state (multiple threads in case of multi-source) with IO thread reconnects.
Test cases.
Some small cleanups and fixes.
- Fix skipping initial MyISAM DML when connecting using GTID.
- Fix RESET MASTER not clearing in-memory binlog state.
- Fix not reading standalone flag in Gtid_log_event::peek().
- Fix skipping DDL that the slave has already seen when using GTID position.
- Fix that slave GTID state was updated from the wrong place in the code,
causing random crashing and other misery.
- Fix updates to mysql.rpl_slave_state to not go to binlog (this would cause
duplicate key errors on the slave and is generally the wrong thing to do).
In method mysql_binlog_send, right after detecting a EOF in the
read event loop, and before deciding if we should change to a new
binlog file there is a execution window where new events can be
written to the binlog and a rotation can happen. When reaching
the test, the function will then change to a new binlog file
ignoring all the events written in this window. This will result
in events not being replicated.
Only when the binlog is detected as deactivated in the event loop
of the dump thread, can we really know that no more events
remain. For this reason, this test is now made under the log lock
in the beginning of the event loop when reading the events.
Slave now loads the GTID state from the master when connecting with
old-style filename/offset position.
This allows the user to use MASTER_GTID_POS=AUTO on next CHANGE MASTER
without any other action needed.
Implement binlog_gtid_pos() function. This will be used so that
the slave can obtain the gtid position automatically from first
connect with old-style position - then MASTER_GTID_POS=AUTO will
work the next time. Can also be used by mysqldump --master-data
to give the current gtid position directly.
When starting slave, check binlog state in addition to mysql.rpl_slave.state.
This allows to switch a previous master to be a slave directly
with MASTER_GTID_POS=AUTO.
FORMAT_DESCRIPTION_LOG_EVENT::CALC_SERVER_VERSION_SPLIT
Problem: When reading a Format_description_log_event, it supposes MySQL
version is always valid and DBUG_ASSERTION is used check the version number.
However, user may give a wrong binlog offset, even give a faked binary event
which includes an invalid MySQL version. This will cause server crash.
Fix: The assertions are removed and an error will be reported if MySQL
version in Format_description_log_event is invalid.
Now slave can connect to master, sending start position as slave state
rather than old-style binlog name/position.
This enables to switch to a new master by changing just connection
information, replication slave GTID state ensures that slave starts
at the correct point in the new master.
MDEV-26: Global transaction id, partial commit
Change server_id to be a session variable.
User with SUPER can set it to binlog with different server_id.
Implement backward-compatible ::server_id mirror for plugins.
This bug had two problems:
P1) Reads out of bounds;
P2) Writes out of bounds.
PROBLEM P1
----------
User_var_log_event unmarshalling from binlog was not performing range
checks when using name_len and val_len variables to walk on event
buffer.
Added range checks to User_var_log_event unmarshalling to prevent
unmarshalling errors.
PROBLEM P2
----------
User_var_log_event value was allocated on thread stack, what caused
stack frame errors when User_var_log_event value was bigger than thread
stack size.
Currently value is allocated on heap memory.
QUOTING IN REPLICATION
Problem: Misquoting or unquoted identifiers may lead to
incorrect statements to be logged to the binary log.
Fix: we use specialized functions to append quoted identifiers in
the statements generated by the server.
Introduce a new storage engine API method commit_checkpoint_request().
This is used to replace the fsync() at the end of every storage engine
commit with a single fsync() when a binlog is rotated.
Binlog rotation is now done during group commit instead of being
delayed until unlog(), removing some server stall and avoiding an
expensive lock/unlock of LOCK_log inside unlog().
An "orthographic" typo in User_var::set_deferred() was made in fixes for
bug@14275000. While editing the signature of the initial patch to remove
the only argument, the assigned value of the argument remained in the body ...
to be successfully compiled (!) thanks to names coincidence:
the arg to User_var method and its member.
Fixed with correcting the typo.
two tests still fail:
main.innodb_icp and main.range_vs_index_merge_innodb
call records_in_range() with both range ends being open
(which triggers an assert)
MASTER-MASTER AND USING SET USE
Problem:
=======
In a master-master set-up, a master can show a wrong
'SHOW SLAVE STATUS' output.
Requirements:
- master-master
- log_slave_updates
This is caused when using SET user-variables and then using
it to perform writes. From then on the master that performed
the insert will have a SHOW SLAVE STATUS that is wrong and
it will never get updated until a write happens on the other
master. On"Master A" the "exec_master_log_pos" is not
getting updated.
Analysis:
========
Slave receives a "User_var" event from the master and after
applying the event, when "log_slave_updates" option is
enabled the slave tries to write this applied event into
its own binary log. At the time of writing this event the
slave should use the "originating server-id". But in the
above case the sever always logs the "user var events"
by using its global server-id. Due to this in a
"master-master" replication when the event comes back to the
originating server the "User_var_event" doesn't get skipped.
"User_var_events" are context based events and they always
follow with a query event which marks their end of group.
Due to the above mentioned problem with "User_var_event"
logging the "User_var_event" never gets skipped where as
its corresponding "query_event" gets skipped. Hence the
"User_var" event always waits for the next "query event"
and the "Exec_master_log_position" does not get updated
properly.
Fix:
===
`MYSQL_BIN_LOG::write' function is used to write events
into binary log. Within this function a new object for
"User_var_log_event" is created and this new object is used
to write the "User_var" event in the binlog. "User var"
event is inherited from "Log_event". This "Log_event" has
different overloaded constructors. When a "THD" object
is present "Log_event(thd,...)" constructor should be used
to initialise the objects and in the absence of a valid
"THD" object "Log_event()" minimal constructor should be
used. In the above mentioned problem always default minimal
constructor was used which is incorrect. This minimal
constructor is replaced with "Log_event(thd,...)".
sql/log_event.h:
Replaced the default constructor with another constructor
which takes "THD" object as an argument.
Fixes for BUG11761686 left a flaw that managed to slip away from testing.
Only effective filtering branch was actually tested with a regression test
added to rpl_filter_tables_not_exist.
The reason of the failure is destuction of too early mem-root-allocated memory
at the end of the deferred User-var's do_apply_event().
Fixed with bypassing free_root() in the deferred execution branch.
Deallocation of created in do_apply_event() items is done by the base code
through THD::cleanup_after_query() -> free_items() that the parent Query
can't miss.
sql/log_event.cc:
Do not call free_root() in case the deferred User-var event.
Necessary methods to the User-var class are added, do_apply_event() refined.
sql/log_event.h:
Necessary methods to avoid destoying mem-root-based memory at
User-var applying are defined.
Keep track of how many pending XIDs (transactions that are prepared in
storage engine and written into binlog, but not yet durably committed
on disk in the engine) there are in each binlog.
When the count of one binlog drops to zero, write a new binlog checkpoint
event, telling which is the oldest binlog with pending XIDs.
When doing XA recovery after a crash, check the last binlog checkpoint
event, and scan all binlog files from that point onwards for XIDs that
must be committed if found in prepared state inside engine.
Remove the code in binlog rotation that waits for all prepared XIDs to
be committed before writing a new binlog file (this is no longer necessary
when recovery can scan multiple binlog files).
Add function to replace arbitrary event with dummy event.
Add code which uses this to fix the bug that enabling row_annotate events
on the master breaks slaves which do not request such events.
Add that slaves set a variable @mariadb_slave_capability to inform the
master in a robust way about which events it can, and cannot, handle.
Add tests.
Problem
========
SQL statements close to the size of max_allowed_packet produce binary
log events larger than max_allowed_packet.
The reason why this failure is occuring is because the event length is
more than the total size of the max_allowed_packet + max_event_header
length. Now since the event length exceeds this size master Dump
thread is unable to send the packet on to the slave.
That can happen e.g with row-based replication in Update_rows event.
Fix
====
The problem was fixed by increasing the max_allowed_packet for the
slave's threads (IO/SQL) by increasing it to 1GB.
This is done using the new server option included which is used to
regulate the max_allowed_packet of the slave thread (IO/SQL).
This causes the large packets to be received by the slave and apply
it successfully.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value ,
i.e. not taking global(max_allowed) into consideration, for the slave's threads.
BUG#11761686 insert_id event is not filtered.
Two issues are covered.
INSERT into autoincrement field which is not the first part in the composed primary key
is unsafe by autoincrement logging design. The case is specific to MyISAM engine
because Innodb does not allow such table definition.
However no warnings and row-format logging in the MIXED mode was done, and
that is fixed.
Int-, Rand-, User-var log-events were not filtered along with their parent
query that made possible them to screw up execution context of the following
query.
Fixed with deferring their execution until the parent query.
******
Bug#11754117
Post review fixes.
mysql-test/suite/rpl/r/rpl_auto_increment_bug45679.result:
a new result file is added.
mysql-test/suite/rpl/r/rpl_filter_tables_not_exist.result:
results updated.
mysql-test/suite/rpl/t/rpl_auto_increment_bug45679.test:
regression test for BUG#11754117-45670 is added.
mysql-test/suite/rpl/t/rpl_filter_tables_not_exist.test:
regression test for filtering issue of BUG#11754117 - 45670 is added.
sql/log_event.cc:
Logics are added for deferring and executing events associated
with the Query event.
sql/log_event.h:
Interface to deferred events batch execution is added.
sql/rpl_rli.cc:
initialization for new RLI members is added.
sql/rpl_rli.h:
New members to RLI are added to facilitate deferred events gathering
and execution control;
two general character RLI cleanup methods are constructed.
sql/rpl_utility.cc:
Deferred_log_events methods are difined.
sql/rpl_utility.h:
A new class Deferred_log_events is defined to implement
IRU events gathering, execution and cleanup.
sql/slave.cc:
Necessary changes to initialize `rli->deferred_events' and prevent
deferred event deletion in the main read-exec branch.
sql/sql_base.cc:
A new safe-check function for multi-part pk with auto-increment is defined
and deployed in lock_tables().
sql/sql_class.cc:
Initialization for a new member and replication cleanups are added
to THD class.
sql/sql_class.h:
THD class receives a new member to hold a specific execution
context for slave applier.
sql/sql_parse.cc:
Execution of the deferred event in started prior to its parent query.