in slave SQL thread: if a transaction fails because of InnoDB deadlock or innodb_lock_wait_timeout exceeded,
optionally retry the transaction a certain number of times (new variable --slave_transaction_retries).
we store 7 bytes (1 + 2*3) in every Query_log_event.
In the future if users want binlog optimized for small size and less safe,
we could add --binlog-no-charset (and binlog-no-sql-mode etc): charset info
is something by design optional (even if for now we don't offer possibility to disable it):
it's not a binlog format change.
We try to reduce the number of get_charset() calls in the slave SQL thread to a minimum
by caching the charset read from the previous event (which will often be equal to the one of the current event).
We don't use SET ONE_SHOT for charset-aware repl (we still do for timezones, will be fixed later).
No more errors if one changes the global value of charset vars on master or slave
(as we log charset info in all Query_log_event).
Not fixing Load_log_event as it will be rewritten soon by Dmitri.
Testing how mysqlbinlog behaves in rpl_charset.test.
mysqlbinlog needs to know where charset file is (to be able to convert a charset number found
in binlog (e.g. in User_var_log_event) to a charset name); mysql-test-run needs to pass
the correct value for this option to mysqlbinlog.
Many result udpates (adding charset info into every event shifts log_pos in SHOW BINLOG EVENTS).
Roughly the same job is to be done for timezones :)
Split TABLE to TABLE and TABLE_SHARE (TABLE_SHARE is still allocated as part of table, will be fixed soon)
Created Field::make_field() and made Field_num::make_field() to call this
Added 'TABLE_SHARE->db' that points to database name; Changed all usage of table_cache_key as database name to use this instead
Changed field->table_name to point to pointer to alias. This allows us to change alias for a table by just updating one pointer.
Renamed TABLE_SHARE->real_name to table_name
Renamed TABLE->table_name to alias
Renamed TABLE_LIST->real_name to table_name
because old behaviour was somewhat nonsensical (kind of bug). Changes are that if repl threads are
down or disconnected the column will be NULL, and if master is idle the column will not grow indefinitely anymore.
All our programs which use mysql_real_connect() and mysql_connect() are updated accordingly, though I have deliberately
made mysqlimport not reconnect anymore (already true for mysqldump >= 4.1.8).
All Connector devs have been warned about the change I'm doing here - which was agreed with Monty,
and fixes BUG#2555.
1 if the return type is int or int_fast8_t. The test case that showed
this problem is rpl000001 and the tested version was MySQL 5.0.2. The
compiler with the problem is GCC 3.0.4 runing on "Linux bitch 2.4.18
#2 Thu Apr 11 14:37:17 EDT 2002 sparc64 unknown".
By changing the return type to bool the problem disappear. (Another
way to make the problem disappear is to simply print the returned
value with printf("%d",?). The printed returned value is always 0 in
the test cases I have run.) This is only a partial solution to the
problem, since someone could later change the return type of the
function back to int or some other type that does not work.
as we already have db_len in Log_event. Only if rewrite_db() changed the db we need a strlen
(so we now do the strlen() in rewrite_db). Plus a test (we had none for --replicate-rewrite-db :( ).
This allows one to setup a master <-> master replication with non conflicting auto-increment series.
Cleaned up binary log code to make it easyer to add new state variables.
Added simpler 'upper level' logic for artificial events (events that should not cause cleanups on slave).
Simplified binary log handling.
Changed how auto_increment works together with to SET INSERT_ID=# to make it more predictable: Now the inserted rows in a multi-row statement are set independent of the existing rows in the table. (Before only InnoDB did this correctly)
client.c:
Removed call to clear_slave_vio in end_server(). Removed header declaration of clear_slave_vio
slave.cc:
Removed clear_slave_vio function and added calls to thd->clear_active_vio before each call to end_server()
client.c:
Added call to clear_slave_vio inside end_server only when under Windows with repliaction
slave.cc:
Added clear_slave_vio function for clearing active vio on THD under Windows replication
do not use '' as user in tests, because it picks the Unix login (which gives unexpected results if it is 'root')
(such behaviour is a feature of mysql_real_connect(), see the manual).
- client side part is simple and may be considered stable
- server side part now just joggles with THD state to save execution
state and has no additional locking wisdom.
Lot's of it are to be rewritten.
Bug #4810 "deadlock with KILL when the victim was in a wait state"
(I included mutex unlock into exit_cond() for future safety)
and BUG#4827 "KILL while START SLAVE may lead to replication slave crash"
Added basic per-thread time zone functionality (based on public
domain elsie-code). Now user can select current time zone
(from the list of time zones described in system tables).
All NOW-like functions honor this time zone, values of TIMESTAMP
type are interpreted as values in this time zone, so now
our TIMESTAMP type behaves similar to Oracle's TIMESTAMP WITH
LOCAL TIME ZONE (or proper PostgresSQL type).
WL#1266 "CONVERT_TZ() - basic time with time zone conversion
function".
Fixed problems described in Bug #2336 (Different number of warnings
when inserting bad datetime as string or as number). This required
reworking of datetime realted warning hadling (they now generated
at Field object level not in conversion functions).
Optimization: Now Field class descendants use table->in_use member
instead of current_thd macro.
error messages when a query goes wrong.
Note that from now on, if you run with --slave-skip-error=xx, then nothing will
be printed to the error log when the slave is having this error xx and
skipping it (but you don't care as you want to skip it).
by binlogging some SET ONE_SHOT CHARACTER_SETetc,
which will be enough until we have it more compact and more complete in 5.0. With the present patch,
replication will work ok between 4.1.3 master and slaves, as long as:
- master and slave have the same GLOBAL.COLLATION_SERVER
- COLLATION_DATABASE and CHARACTER_SET_DATABASE are not used
- application does not use the fact that table is created with charset of the USEd db (BUG#2326).
all of which are not too hard to fulfill.
ONE_SHOT is reserved for internal use of mysqlbinlog|mysql and works only for charsets,
so we give error if used for non-charset vars.
Fix for BUG#3875 "mysqlbinlog produces wrong ouput if query uses
variables containing quotes" and BUG#3943 "Queries with non-ASCII literals are not replicated
properly after SET NAMES".
Detecting that master and slave have different global charsets or server ids.
in tests after the last 4.1->5.0 merge, and:
*** The same as ChangeSet@1.1822.1.1, 2004-05-14 23:08:03+02:00, guilhem@mysql.com of MySQL 4.0 ***
Replication testsuite: making the master-slave synchronization less likely to fail,
by adding sleep-and-retries (max 4 times) if MASTER_POS_WAIT() returns NULL
in sync_with_master and sync_slave_with_master.
The problem showed up only today, in MySQL 5.0 in rpl_server_id2.test,
but may affect 4.x as well, so I fixed 4.x too. Note that I am also fixing
5.0, with the same exact patch, because I don't want to leave 5.0 broken
until the next 4.0->4.1->5.0 merge.
Fix remaining cases of Bug #3596: fix possible races caused by an obsolete value of thd->query_length in SHOW PROCESSLIST and SHOW INNODB STATUS; this fix depends on the fact that thd->query is always set to NULL before setting it to point to a new query
(WL#794). This can be of interest in some recovery-from-backup scenarios, and also when you have
two databases in one mysqld, having a certain similarity and you want one db to be updated when the other is
(some sort of trigger).
Plus small fix for BUG#3568 "MySQL server crashes when built --with-debug and CHANGE MASTER +MASTER_POS_WAIT"
In tables_ok(), when there is no table having "updating==TRUE" in the list,
return that we don't replicate this statement (the slave is supposed to
replicate *changes* only).
In practice, the case can only happen for this statement:
DELETE t FROM t,u WHERE ... ;
tables_ok(t,u) will now return 0, which (check all_tables_not_ok())
will give a chance to tables_ok(t) to run.
as we transform the 3.23 Load_log_event into a 4.0 Create_file_log_event which is one
byte longer, we need to increment event_len. The bug was that we did not increment it,
so later in code the end 0 was not seen so there was for example a segfault in
strlen(fname) because fname was not 0-terminated.
Other problems remain in 3.23->4.0 replication of LOAD DATA INFILE but they are less serious:
Exec_master_log_pos and Relay_log_space are incorrect. I'll document them.
They are not fixable without significant code changes (if you fix those problems in 4.0,
you get assertion failures somewhere else etc), * which are already done in 5.0.0 *.
too big by 6 bytes. So I add code to substract 6 bytes if the master is 3.23.
This is not perfect (because it won't work if the slave I/O thread has not
noticed yet that the master is 3.23), but as long as the slave I/O thread
starts Exec_master_log_pos will be ok.
It must be merged to 4.1 but not to 5.0 (or it can be, because of #if MYSQL_VERSION_ID),
because 5.0 already works if the master is 3.23 (and in a more natural way:
in 5.0 we store the end_log_pos in the binlog and relay log).
I had to move functions from slave.h to slave.cc to satisfy gcc.
Fixed bugs in group_concat with ORDER BY and DISTINCT (Bugs #2695, #3381 and #3319)
Fixed crash when doing rollback in slave and the io thread catched up with the sql thread
Set locked_in_memory properly
We introduce a new function mysql_test_parse_for_slave().
If the slave sees that the query got a really bad error on master
(killed e.g.), then it calls this function to know if this query
can be ignored because of replicate-*-table rules (do not worry
about replicate-*-db rules: they are checked so early that they have
no bug). If the answer is yes, it skips the query and continues. If
it's no, then it stops and say "fix your slave data manually" (like it
did before this change).
re-using unused LOCK_active_mi to serialize all administrative
commands related to replication:
START SLAVE, STOP SLAVE, RESET SLAVE, CHANGE MASTER, init_slave()
(replication autostart at server startup), end_slave() (replication
autostop at server shutdown), LOAD DATA FROM MASTER.
This protects us against a handful of deadlocks (like BUG#2921
when two START SLAVE, but when two STOP SLAVE too).
Removing unused variables.
ChangeSet 1.1620.12.1 and ChangeSet 1.1625.2.1
from 4.1. This makes the slave I/O thread flush the relay log
after every event, which provides additional safety in case
of brutal crash (reduces chances to lose a part of the relay log).
- the one about BUG#2921
- the one about relay log flushing
Both will be rewritten in a next changeset
(this one will not be pushed before the next changeset).
BUG#2826 "Seconds behind master weirdness"
(sometimes this column of SHOW SLAVE STATUS shows a very big value,
in fact a small negative number casted to ulonglong).
This problem was reported by only one user, but which uses
synchronized time between his servers.
As suggested by the user, to hide this I display max(0, the value)
so that it will be less confusing. For a user, seeing 0 is probably
better than seeing -1 (both tell you that the slave is very close
to the master).
* A more dynamic binlog format which allows small changes (1064)
* Log session variables in Query_log_event (1063)
It contains a few bugfixes (which I made when running the testsuite).
I carefully updated the results of the testsuite (i.e. I checked for every one,
if the difference between .reject and .result could be explained).
Apparently mysql-test-run --manager is broken in 4.1 and 5.0 currently,
so I could neither run the few tests which require --manager, nor check
that they pass nor modify their .result. But for builds, we don't run
with --manager.
Apart from --manager, the full testsuite passes, with Valgrind too (no errors).
I'm going to push in the next minutes. Remains: update the manual.
Note: by chance I saw that (in 4.1, in 5.0) rpl_get_lock fails when run alone;
this is normal at it makes assumptions on thread ids. I will fix this one day
in 4.1.
This is the main commit for Worklog tasks:
* A more dynamic binlog format which allows small changes (1064)
* Log session variables in Query_log_event (1063)
Below 5.0 means 5.0.0.
MySQL 5.0 is able to replicate FOREIGN_KEY_CHECKS, UNIQUE_KEY_CHECKS (for speed),
SQL_AUTO_IS_NULL, SQL_MODE. Not charsets (WL#1062), not some vars (I can only think
of SQL_SELECT_LIMIT, which deserves a special treatment). Note that this
works for queries, except LOAD DATA INFILE (for this it would have to wait
for Dmitri's push of WL#874, which in turns waits for the present push, so...
the deadlock must be broken!). Note that when Dmitri pushes WL#874 in 5.0.1,
5.0.0 won't be able to replicate a LOAD DATA INFILE from 5.0.1.
Apart from that, the new binlog format is designed so that it can tolerate
a little variation in the events (so that a 5.0.0 slave could replicate a
5.0.1 master, except for LOAD DATA INFILE unfortunately); that is, when I
later add replication of charsets it should break nothing. And when I later
add a UID to every event, it should break nothing.
The main change brought by this patch is a new type of event, Format_description_log_event,
which describes some lengthes in other event types. This event is needed for
the master/slave/mysqlbinlog to understand a 5.0 log. Thanks to this event,
we can later add more bytes to the header of every event without breaking compatibility.
Inside Query_log_event, we have some additional dynamic format, as every Query_log_event
can have a different number of status variables, stored as pairs (code, value); that's
how SQL_MODE and session variables and catalog are stored. Like this, we can later
add count of affected rows, charsets... and we can have options --don't-log-count-affected-rows
if we want.
MySQL 5.0 is able to run on 4.x relay logs, 4.x binlogs.
Upgrading a 4.x master to 5.0 is ok (no need to delete binlogs),
upgrading a 4.x slave to 5.0 is ok (no need to delete relay logs);
so both can be "hot" upgrades.
Upgrading a 3.23 master to 5.0 requires as much as upgrading it to 4.0.
3.23 and 4.x can't be slaves of 5.0.
So downgrading from 5.0 to 4.x may be complicated.
Log_event::log_pos is now the position of the end of the event, which is
more useful than the position of the beginning. We take care about compatibility
with <5.0 (in which log_pos is the beginning).
I added a short test for replication of SQL_MODE and some other variables.
TODO:
- after committing this, merge the latest 5.0 into it
- fix all tests
- update the manual with upgrade notes.
we change THD::system_thread from a 'bool' to a bitmap to be able to
distinguish between delayed-insert threads and slave threads.
- Fix for BUG#1701 "Update from multiple tables" (one line in sql_parse.cc,
plus a new test rpl_multi_update.test). That's just adding an initialization.
The problem was that when the slave SQL thread reads a hot relay log (hot = the one being written to by the
slave I/O thread), it must have the LOCK_log. It already took it for read_log_event(), but needs
it also for check_binlog_magic().
This should fix all recently reported failures of the rpl_max_relay_size test in 4.1 and 5.0
(though the bug exists since 4.0, it showed up first in 5.0).
the relay log before flushing master.info.
Doing 'before' leads to duplicate event, doing after leads to missing event.
Both can be as destructive, but 'duplicate' enables us to later add detection
code to catch it. Whereas 'missing' can't be caught (it can't, because
the I/O thread can produce legal position jumps, for example if it has
ignored an event coming from this slave (rememember that starting from 4.1.1,
the I/O thread filters the server id).
Now the I/O thread (in flush_master_info()) flushes the relay log to disk
after reading every event. Slower but provides additionnal safety in case
of brutal crash.
I had to make the flush optional (i.e. add a if(some_bool_argument) in the function)
because sometimes flush_master_info() is called when there is no usable
relay log (the relay log's IO_CACHE is not initialized so can't be flushed).
(Initial caps for each word.) For example, instead of writing
Until_condition, Until_Log_File, and Until_log_pos, write
Until_Condition, Until_Log_File, and Until_Log_pos.
rli->save_temporary_tables and slave_open_temp_tables
(in old 4.0 you could make "SHOW STATUS LIKE 'slave_open_temp_tables'" grow
indefinitely by doing RESET SLAVE and replicating always the same CREATE
TEMPORARY TABLE).
It's critical to reset save_temporary_tables to 0 (otherwise you may later
read memory which has been freed) so this changeset should go into 4.1.
- when we don't have in_addr_t, use uint32.
- a forgotten initialization of slave_proxy_id in sql/log_event.cc (was not really "forgot", was
"we needn't init it there", but there was one case where we needed...).
- made slave_proxy_id always meaningful in THD and Log_event, so we can
rely more on it (no need to test if it's meaningful). THD::slave_proxy_id
is equal to THD::thread_id except for the slave SQL thread.
- clean up the slave's temporary table (i.e. free their memory) when slave
server shuts down.
"If 2 master threads with same-name temp table, slave makes bad binlog"
and (two birds with one stone) for
BUG#1240 "slave of slave breaks when STOP SLAVE was issud on parent slave
and temp tables".
Here is the design change:
in a slave running with --log-slave-updates, events are now logged with the
thread id they had on the master. So no more id conflicts between master threads,
but introduces id conflicts between one master thread and one normal
client thread connected to the slave. This is solved by storing the server id
in the temp table's name.
New test which requires mysql-test-run to be run with --manager,
otherwise it will be skipped.
Undoing a Monty's change (hum, a chill runs down my spine ;) which was
"Cleanup temporary tables when slave ends" in ChangeSet 1.1572.1.1.
a Format_description_log_event (or maybe it will be named
Description_log_event) which is not recognized by 4.0, so
a 4.0 can't be a slave of 5.0. We detect it early to produce
a helpful message instead of "corrupted relay log" later.
"Add a column "Timestamp_of_last_master_event_executed" in SHOW SLAVE STATUS".
Finally this is adding
- Slave_IO_State (a copy of the State column of SHOW PROCESSLIST for the I/O thread,
so that the users, most of the time, has enough info with only SHOW SLAVE STATUS).
- Seconds_behind_master. When the slave connects to the master it does SELECT UNIX_TIMESTAMP()
on the master, computes the absolute difference between the master's and the slave's clock.
It records the timestamp of the last event executed by the SQL thread, and does a
small computation to find the number of seconds by which the slave is late.