Approximative, because it's using our binlogging way (what we call "query"-level) and this is not as good as record-level binlog (5.1) would be. It imposes several
limitations to routines, and has caveats (which I'll document, and for which the server will try to issue errors but that is not always possible).
Reason I don't propagate caller info to the binlog as planned is that on master and slave
users may be different; even with that some caveats would remain.
This saves one byte per Query_log_event on disk compared to 5.0.[0..3]. Compatibility problems with 5.0.x where x<4
are explained in the comments in log_event.cc. Putting back s/my_open(O_TRUNC)/(my_delete+my_create) change which had
been wiped away by somebody doing a wrong 4.1->5.0 merge (which happened just
before 5.0.3 :( ). Applying it to new events for LOAD DATA INFILE.
If slave fails in Execute_load_query_log_event::exec_event(),
don't delete the file (so that it's re-usable at next START SLAVE).
And (youpi!) fix for BUG#3247 "a partially completed LOAD DATA INFILE is not
executed at all on the slave" (storing an Execute_load_query_log_event
to binlog, with its error code, instead of Delete_file_log_event).
The idea is to use TABLE_LIST::lock_type for passing type of lock for
target table to mysql_load() instead of using LEX::lock_option
(which were rewritten by first subselect in SET clause).
This should also fix potential problem with LOAD DATA in SP
(it is important for them to have right lock_type in the table
list by the end of statement parsing).
Now one can use user variables as target for data loaded from file
(besides table's columns). Also LOAD DATA got new SET-clause in which
one can specify values for table columns as expressions.
For example the following is possible:
LOAD DATA INFILE 'words.dat' INTO TABLE t1 (a, @b) SET c = @b + 1;
This patch also implements new way of replicating LOAD DATA.
Now we do it similarly to other queries.
We store LOAD DATA query in new Execute_load_query event
(which is last in the sequence of events representing LOAD DATA).
When we are executing this event we simply rewrite part of query which
holds name of file (we use name of temporary file) and then execute it
as usual query. In the beggining of this sequence we use Begin_load_query
event which is almost identical to Append_file event
cast sql_mode to ulonglong before int8store, otherwise the implementation of int8store
based on >> complains that we do 32-bit var >> 32 which is undefined according to C standard.
(compiler warning appeared on SGI Irix)
Lots of small fixes to multi-precision-math path
Give Note for '123.4e'
Added helper functions type 'val_string_from_real()
Don't give warnings for end space for string2decimal()
Changed storage of values for SP so that we can detect length of argument without strlen()
Changed interface for str2dec() so that we must supple the pointer to the last character in the buffer
we store 7 bytes (1 + 2*3) in every Query_log_event.
In the future if users want binlog optimized for small size and less safe,
we could add --binlog-no-charset (and binlog-no-sql-mode etc): charset info
is something by design optional (even if for now we don't offer possibility to disable it):
it's not a binlog format change.
We try to reduce the number of get_charset() calls in the slave SQL thread to a minimum
by caching the charset read from the previous event (which will often be equal to the one of the current event).
We don't use SET ONE_SHOT for charset-aware repl (we still do for timezones, will be fixed later).
No more errors if one changes the global value of charset vars on master or slave
(as we log charset info in all Query_log_event).
Not fixing Load_log_event as it will be rewritten soon by Dmitri.
Testing how mysqlbinlog behaves in rpl_charset.test.
mysqlbinlog needs to know where charset file is (to be able to convert a charset number found
in binlog (e.g. in User_var_log_event) to a charset name); mysql-test-run needs to pass
the correct value for this option to mysqlbinlog.
Many result udpates (adding charset info into every event shifts log_pos in SHOW BINLOG EVENTS).
Roughly the same job is to be done for timezones :)
Split TABLE to TABLE and TABLE_SHARE (TABLE_SHARE is still allocated as part of table, will be fixed soon)
Created Field::make_field() and made Field_num::make_field() to call this
Added 'TABLE_SHARE->db' that points to database name; Changed all usage of table_cache_key as database name to use this instead
Changed field->table_name to point to pointer to alias. This allows us to change alias for a table by just updating one pointer.
Renamed TABLE_SHARE->real_name to table_name
Renamed TABLE->table_name to alias
Renamed TABLE_LIST->real_name to table_name
CREATE DATABASE statement used the current database instead of the
database created when checking conditions for replication.
CREATE/DROP/ALTER DATABASE statements are now replicated based on
the manipulated database.
Replication using replicate-rewrite-db did not work for LOAD DATA INFILE.
Now is does. There was one place in the code that used current database
instead of the rewrite database.
Now thd->mem_root is a pointer to thd->main_mem_root and THR_MALLOC is a pointer to thd->mem_root.
This gives us the following benefits:
- Allow us to easily detect if arena has already been swapped before (this fixes a bug in setup_conds() where arena was swaped twice in some cases)
- Faster swaps of arenas (as we don't have to copy the whole MEM_ROOT)
- We don't anymore have to call my_pthread_setspecific_ptr(THR_MALLOC,...) to change where memory is alloced. Now it's enough to set thd->mem_root
as we already have db_len in Log_event. Only if rewrite_db() changed the db we need a strlen
(so we now do the strlen() in rewrite_db). Plus a test (we had none for --replicate-rewrite-db :( ).
This allows one to setup a master <-> master replication with non conflicting auto-increment series.
Cleaned up binary log code to make it easyer to add new state variables.
Added simpler 'upper level' logic for artificial events (events that should not cause cleanups on slave).
Simplified binary log handling.
Changed how auto_increment works together with to SET INSERT_ID=# to make it more predictable: Now the inserted rows in a multi-row statement are set independent of the existing rows in the table. (Before only InnoDB did this correctly)
Fixed (together with Guilhem) bugs in mysqlbinlog regarding --offset
Prefix addresses with 0x for easier comparisons of debug logs
Fixed problem where MySQL choosed index-read even if there would be a much better range on the same index
This fix changed some 'index' queries to 'range' queries in the test suite
Don't create 'dummy' WHERE clause for trivial WHERE clauses where we can remove the WHERE clause.
This fix removed of a lot of 'Using where' notes in the test suite.
Give NOTE instead of WARNING if table/function doesn't exists when using DROP IF EXISTS
Give NOTE instead of WARNING for safe field-type conversions
We must not reset the charset in slave after each statement, otherwise the SET CHARACTER SET is cancelled immediately.
Instead, we write a SET CHARACTER SET DEFAULT to the master's binlog when needed (like we already do for SET FOREIGN_KEY_CHECKS);
such writing is not necessary in 4.1 (in 4.1 the bug does not exist, as the SET ONE_SHOT syntax is used).
I have written a test and it works, but I'm not pushing the test as it requires building with all charsets.
I have noticed differences between what is inserted in the master's table in 4.0 and 4.1, and alerted Bar.
client code and replication slave code, as far as LOAD DATA INFILE and
other queries' execution is concerned. Duplication of code leads to
replication bugs, because the replication duplicate lags much behind.
Fix for 2 Valgrind errors on slave replicating LOAD DATA INFILE
- one serious (causing a random test failure in rpl_loaddata in 5.0)
- one not serious (theoretically a bug but not dangerous): uninited thd->row_count
processlist on slave":
we now report in SHOW PROCESSLIST that we are writing to the temp
files or loading the table. When we are writing to the tmp file:
| 3 | system user | | | Connect | 6 | Making temp file /tmp/SQL_LOAD-2-1-2.data |
and when we are actually loading the .data temp file into the table:
| 3 | system user | | test | Connect | 2 | | LOAD DATA INFILE '/tmp/SQL_LOAD-2-1-2.data' INTO TABLE `t` <...> |
In mysqlbinlog, there was a problem with how we escaped the content of a string user variable.
To be perfect, we should have escaped with character_set_client. But this charset is unknown
to mysqlbinlog. So the simplest is to print the string in hex. This is unreadable but
100% safe with any charset (checked with Bar), no more need to bother with character_set_client.
by binlogging some SET ONE_SHOT CHARACTER_SETetc,
which will be enough until we have it more compact and more complete in 5.0. With the present patch,
replication will work ok between 4.1.3 master and slaves, as long as:
- master and slave have the same GLOBAL.COLLATION_SERVER
- COLLATION_DATABASE and CHARACTER_SET_DATABASE are not used
- application does not use the fact that table is created with charset of the USEd db (BUG#2326).
all of which are not too hard to fulfill.
ONE_SHOT is reserved for internal use of mysqlbinlog|mysql and works only for charsets,
so we give error if used for non-charset vars.
Fix for BUG#3875 "mysqlbinlog produces wrong ouput if query uses
variables containing quotes" and BUG#3943 "Queries with non-ASCII literals are not replicated
properly after SET NAMES".
Detecting that master and slave have different global charsets or server ids.
Fix remaining cases of Bug #3596: fix possible races caused by an obsolete value of thd->query_length in SHOW PROCESSLIST and SHOW INNODB STATUS; this fix depends on the fact that thd->query is always set to NULL before setting it to point to a new query
in hard-coded replication messages, always put small-length info (error codes, explanation of the error) at the beginning,
so that it is not cut by truncation if the query is very long (which happens if the query goes first).
too big by 6 bytes. So I add code to substract 6 bytes if the master is 3.23.
This is not perfect (because it won't work if the slave I/O thread has not
noticed yet that the master is 3.23), but as long as the slave I/O thread
starts Exec_master_log_pos will be ok.
It must be merged to 4.1 but not to 5.0 (or it can be, because of #if MYSQL_VERSION_ID),
because 5.0 already works if the master is 3.23 (and in a more natural way:
in 5.0 we store the end_log_pos in the binlog and relay log).
I had to move functions from slave.h to slave.cc to satisfy gcc.
Fixed bugs in group_concat with ORDER BY and DISTINCT (Bugs #2695, #3381 and #3319)
Fixed crash when doing rollback in slave and the io thread catched up with the sql thread
Set locked_in_memory properly
We introduce a new function mysql_test_parse_for_slave().
If the slave sees that the query got a really bad error on master
(killed e.g.), then it calls this function to know if this query
can be ignored because of replicate-*-table rules (do not worry
about replicate-*-db rules: they are checked so early that they have
no bug). If the answer is yes, it skips the query and continues. If
it's no, then it stops and say "fix your slave data manually" (like it
did before this change).
- the one about BUG#2921
- the one about relay log flushing
Both will be rewritten in a next changeset
(this one will not be pushed before the next changeset).
INSERT DELAYED works only for one-row inserts (in latest 4.0 versions
at least). So killing a delayed_insert thread does not spoil replication:
the rows which actually went into the table are exactly those listed
in the binlog. So when the delayed_insert thread is killed, don't log
it as 'killed', because it causes superfluous stops on the slave.
"(binlog, position) stored by InnoDB for a replication slave can be wrong".
This code contains conditional #if to distinguish between versions;
it should be merged into 4.1 and 5.0.
Added more DBUG statements
Ensure that we are comparing end space with BINARY strings
Use 'any_db' instead of '' to mean any database. (For HANDLER command)
Only strip ' ' when comparing CHAR, not other space-like characters (like \t)
reset errors (in thd) before executing the event. Otherwise if an event is ignored
because of replicate-*-table rules (error ER_SLAVE_IGNORED_TABLE) this error code
may remain in thd->net and the next event may pick it.
For previous commit I had run only rpl* tests, here the other ones had a
few surprises. Latest status:
- all tests pass
- all replication tests pass with Valgrind
This is the final-final commit & push.
Doc remains.
* A more dynamic binlog format which allows small changes (1064)
* Log session variables in Query_log_event (1063)
It contains a few bugfixes (which I made when running the testsuite).
I carefully updated the results of the testsuite (i.e. I checked for every one,
if the difference between .reject and .result could be explained).
Apparently mysql-test-run --manager is broken in 4.1 and 5.0 currently,
so I could neither run the few tests which require --manager, nor check
that they pass nor modify their .result. But for builds, we don't run
with --manager.
Apart from --manager, the full testsuite passes, with Valgrind too (no errors).
I'm going to push in the next minutes. Remains: update the manual.
Note: by chance I saw that (in 4.1, in 5.0) rpl_get_lock fails when run alone;
this is normal at it makes assumptions on thread ids. I will fix this one day
in 4.1.
This is the main commit for Worklog tasks:
* A more dynamic binlog format which allows small changes (1064)
* Log session variables in Query_log_event (1063)
Below 5.0 means 5.0.0.
MySQL 5.0 is able to replicate FOREIGN_KEY_CHECKS, UNIQUE_KEY_CHECKS (for speed),
SQL_AUTO_IS_NULL, SQL_MODE. Not charsets (WL#1062), not some vars (I can only think
of SQL_SELECT_LIMIT, which deserves a special treatment). Note that this
works for queries, except LOAD DATA INFILE (for this it would have to wait
for Dmitri's push of WL#874, which in turns waits for the present push, so...
the deadlock must be broken!). Note that when Dmitri pushes WL#874 in 5.0.1,
5.0.0 won't be able to replicate a LOAD DATA INFILE from 5.0.1.
Apart from that, the new binlog format is designed so that it can tolerate
a little variation in the events (so that a 5.0.0 slave could replicate a
5.0.1 master, except for LOAD DATA INFILE unfortunately); that is, when I
later add replication of charsets it should break nothing. And when I later
add a UID to every event, it should break nothing.
The main change brought by this patch is a new type of event, Format_description_log_event,
which describes some lengthes in other event types. This event is needed for
the master/slave/mysqlbinlog to understand a 5.0 log. Thanks to this event,
we can later add more bytes to the header of every event without breaking compatibility.
Inside Query_log_event, we have some additional dynamic format, as every Query_log_event
can have a different number of status variables, stored as pairs (code, value); that's
how SQL_MODE and session variables and catalog are stored. Like this, we can later
add count of affected rows, charsets... and we can have options --don't-log-count-affected-rows
if we want.
MySQL 5.0 is able to run on 4.x relay logs, 4.x binlogs.
Upgrading a 4.x master to 5.0 is ok (no need to delete binlogs),
upgrading a 4.x slave to 5.0 is ok (no need to delete relay logs);
so both can be "hot" upgrades.
Upgrading a 3.23 master to 5.0 requires as much as upgrading it to 4.0.
3.23 and 4.x can't be slaves of 5.0.
So downgrading from 5.0 to 4.x may be complicated.
Log_event::log_pos is now the position of the end of the event, which is
more useful than the position of the beginning. We take care about compatibility
with <5.0 (in which log_pos is the beginning).
I added a short test for replication of SQL_MODE and some other variables.
TODO:
- after committing this, merge the latest 5.0 into it
- fix all tests
- update the manual with upgrade notes.
"EE_ error codes (EE_DELETE, EE_WRITE) end up in the binlog, making slave stop".
The problem was that during execution of the command on the master, an error
can occur (for example, not space left on device, then mysqld waits and when
there is space it completes successfully: so finally it worked but the error
EE_WRITE remains in thd->net.last_errno and thd->net.last_error).
To know if finally the command succeeded, we test the 'error' variable in
every place, and if it shows no failure we reset thd->net.last_err* using
the function THD::clear_error() which is backported from 4.1.
A new test to see if now only real errors get to the binlog (note: the test
uses "rm").
Also a bit of memory free/alloc saving in log_event.cc (do not free the whole
mem_root after every query in the slave SQL thread: we can keep the initial
block of it; which will be freed when the thread terminates).
The constructor of Rotate_log_event used when we are rotating our binlog or
relay log, should not assume that there is a nonzero THD available.
For example, when we are reacting to SIGHUP, the THD is 0.
In fact we don't need to use the THD in this constructor;
we can do like for Stop_log_event, and use the minimal Log_event
constructor.
If we were allowed to put Unix-specific commands in the testsuite,
I'd add a test for this (<sigh>).
- when we don't have in_addr_t, use uint32.
- a forgotten initialization of slave_proxy_id in sql/log_event.cc (was not really "forgot", was
"we needn't init it there", but there was one case where we needed...).
- made slave_proxy_id always meaningful in THD and Log_event, so we can
rely more on it (no need to test if it's meaningful). THD::slave_proxy_id
is equal to THD::thread_id except for the slave SQL thread.
- clean up the slave's temporary table (i.e. free their memory) when slave
server shuts down.
"If 2 master threads with same-name temp table, slave makes bad binlog"
and (two birds with one stone) for
BUG#1240 "slave of slave breaks when STOP SLAVE was issud on parent slave
and temp tables".
Here is the design change:
in a slave running with --log-slave-updates, events are now logged with the
thread id they had on the master. So no more id conflicts between master threads,
but introduces id conflicts between one master thread and one normal
client thread connected to the slave. This is solved by storing the server id
in the temp table's name.
New test which requires mysql-test-run to be run with --manager,
otherwise it will be skipped.
Undoing a Monty's change (hum, a chill runs down my spine ;) which was
"Cleanup temporary tables when slave ends" in ChangeSet 1.1572.1.1.
query_alloc_block_size, query_prealloc_size, range_alloc_block_size,transaction_alloc_block_size and transaction_prealloc_size
Add more checks for "out of memory" detection in range optimization
"Add a column "Timestamp_of_last_master_event_executed" in SHOW SLAVE STATUS".
Finally this is adding
- Slave_IO_State (a copy of the State column of SHOW PROCESSLIST for the I/O thread,
so that the users, most of the time, has enough info with only SHOW SLAVE STATUS).
- Seconds_behind_master. When the slave connects to the master it does SELECT UNIX_TIMESTAMP()
on the master, computes the absolute difference between the master's and the slave's clock.
It records the timestamp of the last event executed by the SQL thread, and does a
small computation to find the number of seconds by which the slave is late.