We introduce a new function mysql_test_parse_for_slave().
If the slave sees that the query got a really bad error on master
(killed e.g.), then it calls this function to know if this query
can be ignored because of replicate-*-table rules (do not worry
about replicate-*-db rules: they are checked so early that they have
no bug). If the answer is yes, it skips the query and continues. If
it's no, then it stops and say "fix your slave data manually" (like it
did before this change).
re-using unused LOCK_active_mi to serialize all administrative
commands related to replication:
START SLAVE, STOP SLAVE, RESET SLAVE, CHANGE MASTER, init_slave()
(replication autostart at server startup), end_slave() (replication
autostop at server shutdown), LOAD DATA FROM MASTER.
This protects us against a handful of deadlocks (like BUG#2921
when two START SLAVE, but when two STOP SLAVE too).
Removing unused variables.
ChangeSet 1.1620.12.1 and ChangeSet 1.1625.2.1
from 4.1. This makes the slave I/O thread flush the relay log
after every event, which provides additional safety in case
of brutal crash (reduces chances to lose a part of the relay log).
- the one about BUG#2921
- the one about relay log flushing
Both will be rewritten in a next changeset
(this one will not be pushed before the next changeset).
BUG#2826 "Seconds behind master weirdness"
(sometimes this column of SHOW SLAVE STATUS shows a very big value,
in fact a small negative number casted to ulonglong).
This problem was reported by only one user, but which uses
synchronized time between his servers.
As suggested by the user, to hide this I display max(0, the value)
so that it will be less confusing. For a user, seeing 0 is probably
better than seeing -1 (both tell you that the slave is very close
to the master).
* A more dynamic binlog format which allows small changes (1064)
* Log session variables in Query_log_event (1063)
It contains a few bugfixes (which I made when running the testsuite).
I carefully updated the results of the testsuite (i.e. I checked for every one,
if the difference between .reject and .result could be explained).
Apparently mysql-test-run --manager is broken in 4.1 and 5.0 currently,
so I could neither run the few tests which require --manager, nor check
that they pass nor modify their .result. But for builds, we don't run
with --manager.
Apart from --manager, the full testsuite passes, with Valgrind too (no errors).
I'm going to push in the next minutes. Remains: update the manual.
Note: by chance I saw that (in 4.1, in 5.0) rpl_get_lock fails when run alone;
this is normal at it makes assumptions on thread ids. I will fix this one day
in 4.1.
This is the main commit for Worklog tasks:
* A more dynamic binlog format which allows small changes (1064)
* Log session variables in Query_log_event (1063)
Below 5.0 means 5.0.0.
MySQL 5.0 is able to replicate FOREIGN_KEY_CHECKS, UNIQUE_KEY_CHECKS (for speed),
SQL_AUTO_IS_NULL, SQL_MODE. Not charsets (WL#1062), not some vars (I can only think
of SQL_SELECT_LIMIT, which deserves a special treatment). Note that this
works for queries, except LOAD DATA INFILE (for this it would have to wait
for Dmitri's push of WL#874, which in turns waits for the present push, so...
the deadlock must be broken!). Note that when Dmitri pushes WL#874 in 5.0.1,
5.0.0 won't be able to replicate a LOAD DATA INFILE from 5.0.1.
Apart from that, the new binlog format is designed so that it can tolerate
a little variation in the events (so that a 5.0.0 slave could replicate a
5.0.1 master, except for LOAD DATA INFILE unfortunately); that is, when I
later add replication of charsets it should break nothing. And when I later
add a UID to every event, it should break nothing.
The main change brought by this patch is a new type of event, Format_description_log_event,
which describes some lengthes in other event types. This event is needed for
the master/slave/mysqlbinlog to understand a 5.0 log. Thanks to this event,
we can later add more bytes to the header of every event without breaking compatibility.
Inside Query_log_event, we have some additional dynamic format, as every Query_log_event
can have a different number of status variables, stored as pairs (code, value); that's
how SQL_MODE and session variables and catalog are stored. Like this, we can later
add count of affected rows, charsets... and we can have options --don't-log-count-affected-rows
if we want.
MySQL 5.0 is able to run on 4.x relay logs, 4.x binlogs.
Upgrading a 4.x master to 5.0 is ok (no need to delete binlogs),
upgrading a 4.x slave to 5.0 is ok (no need to delete relay logs);
so both can be "hot" upgrades.
Upgrading a 3.23 master to 5.0 requires as much as upgrading it to 4.0.
3.23 and 4.x can't be slaves of 5.0.
So downgrading from 5.0 to 4.x may be complicated.
Log_event::log_pos is now the position of the end of the event, which is
more useful than the position of the beginning. We take care about compatibility
with <5.0 (in which log_pos is the beginning).
I added a short test for replication of SQL_MODE and some other variables.
TODO:
- after committing this, merge the latest 5.0 into it
- fix all tests
- update the manual with upgrade notes.
we change THD::system_thread from a 'bool' to a bitmap to be able to
distinguish between delayed-insert threads and slave threads.
- Fix for BUG#1701 "Update from multiple tables" (one line in sql_parse.cc,
plus a new test rpl_multi_update.test). That's just adding an initialization.
The problem was that when the slave SQL thread reads a hot relay log (hot = the one being written to by the
slave I/O thread), it must have the LOCK_log. It already took it for read_log_event(), but needs
it also for check_binlog_magic().
This should fix all recently reported failures of the rpl_max_relay_size test in 4.1 and 5.0
(though the bug exists since 4.0, it showed up first in 5.0).
the relay log before flushing master.info.
Doing 'before' leads to duplicate event, doing after leads to missing event.
Both can be as destructive, but 'duplicate' enables us to later add detection
code to catch it. Whereas 'missing' can't be caught (it can't, because
the I/O thread can produce legal position jumps, for example if it has
ignored an event coming from this slave (rememember that starting from 4.1.1,
the I/O thread filters the server id).
Now the I/O thread (in flush_master_info()) flushes the relay log to disk
after reading every event. Slower but provides additionnal safety in case
of brutal crash.
I had to make the flush optional (i.e. add a if(some_bool_argument) in the function)
because sometimes flush_master_info() is called when there is no usable
relay log (the relay log's IO_CACHE is not initialized so can't be flushed).
(Initial caps for each word.) For example, instead of writing
Until_condition, Until_Log_File, and Until_log_pos, write
Until_Condition, Until_Log_File, and Until_Log_pos.
rli->save_temporary_tables and slave_open_temp_tables
(in old 4.0 you could make "SHOW STATUS LIKE 'slave_open_temp_tables'" grow
indefinitely by doing RESET SLAVE and replicating always the same CREATE
TEMPORARY TABLE).
It's critical to reset save_temporary_tables to 0 (otherwise you may later
read memory which has been freed) so this changeset should go into 4.1.
- when we don't have in_addr_t, use uint32.
- a forgotten initialization of slave_proxy_id in sql/log_event.cc (was not really "forgot", was
"we needn't init it there", but there was one case where we needed...).
- made slave_proxy_id always meaningful in THD and Log_event, so we can
rely more on it (no need to test if it's meaningful). THD::slave_proxy_id
is equal to THD::thread_id except for the slave SQL thread.
- clean up the slave's temporary table (i.e. free their memory) when slave
server shuts down.
"If 2 master threads with same-name temp table, slave makes bad binlog"
and (two birds with one stone) for
BUG#1240 "slave of slave breaks when STOP SLAVE was issud on parent slave
and temp tables".
Here is the design change:
in a slave running with --log-slave-updates, events are now logged with the
thread id they had on the master. So no more id conflicts between master threads,
but introduces id conflicts between one master thread and one normal
client thread connected to the slave. This is solved by storing the server id
in the temp table's name.
New test which requires mysql-test-run to be run with --manager,
otherwise it will be skipped.
Undoing a Monty's change (hum, a chill runs down my spine ;) which was
"Cleanup temporary tables when slave ends" in ChangeSet 1.1572.1.1.
a Format_description_log_event (or maybe it will be named
Description_log_event) which is not recognized by 4.0, so
a 4.0 can't be a slave of 5.0. We detect it early to produce
a helpful message instead of "corrupted relay log" later.
"Add a column "Timestamp_of_last_master_event_executed" in SHOW SLAVE STATUS".
Finally this is adding
- Slave_IO_State (a copy of the State column of SHOW PROCESSLIST for the I/O thread,
so that the users, most of the time, has enough info with only SHOW SLAVE STATUS).
- Seconds_behind_master. When the slave connects to the master it does SELECT UNIX_TIMESTAMP()
on the master, computes the absolute difference between the master's and the slave's clock.
It records the timestamp of the last event executed by the SQL thread, and does a
small computation to find the number of seconds by which the slave is late.
Now LOAD DATA FROM MASTER does not drop the database, instead it only tries to
create it, and drops/creates table-by-table.
* replicate_wild_ignore_table='db1.%' is now considered as "ignore the 'db1'
database as a whole", as it already works for CREATE DATABASE and DROP DATABASE.
Added proper options to CHANGE MASTER TO, new fields to SHOW SLAVE STATUS,
Honoring this parameters during connection to master.
Introduced new format of master.info file
For example the Binlog_dump thread (on the master) sometimes showed "Slave:".
And there were confusing messages where "binlog" was employed instead
of "relay log".
when the SQL thread stops, set rli->inside_transaction to 0. This is needed if the user
later restarts replication from a completely different place where there are only autocommit
statements.
* Detect the case where the master died while flushing the binlog cache to the binlog
and stop with error. Cannot add a testcase for this in 4.0 (I tested it manually)
as the slave always runs with --skip-innodb.
Let's say the lack of comments did not help me ;)
Copying it back again and adding comments; now 3.23->4.0
replication of LOAD DATA INFILE works again.
"LOAD DATA INFILE is badly filtered by binlog-*-db rules".
There will probably be a second final one to merge Dmitri's changes
to rpl_log.result and mine.
2 new tests:
rpl_loaddata_rule_m : test of logging of LOAD DATA INFILE when the master has binlog-*-db rules,
rpl_loaddata_rule_s : test of logging of LOAD DATA INFILE when the slave has binlog-*-db rules and --log-slave-updates.
log-slave-updates since this causes unexpected values in
Exec_master_log_pos in A->B->C replication setup, synchronization
problems in master_pos_wait()...
Still this brokes some functionality in sql/repl_failsafe.cc
(but this file is not used now)
- Bug #985: "Between RESET SLAVE and START SLAVE, SHOW SLAVE STATUS is wrong."
Now RESET SLAVE puts correct info in mi->host etc. A new test rpl_reset_slave
for that.
- Bug #986: "CHANGE MASTER & START SLAVE do not reset error columns in SHOW
SLAVE STATUS". Now these reset the errors.
I extended the task to cleaning error messages, making them look nicer,
and making the output of SHOW SLAVE STATUS (column Last_error) be as complete
as what's printed on the .err file;
previously we would have, for a failure of a replicated LOAD DATA INFILE:
- in the .err, 2 lines:
"duplicate entry 2708 for key 1"
"failed loading SQL_LOAD-5-2-2.info"
- and in SHOW SLAVE STATUS, only:
"failed loading SQL_LOAD-5-2-2.info".
Now SHOW SLAVE STATUS will contain the concatenation of the 2 messages.
and other replicate-*-table options in SHOW SLAVE STATUS.
Seems like it had not been done, so I push it now:
there's 4 new columns to SHOW SLAVE STATUS.
so I commit again in a fresh tree.
Fix for bug#763 (Relay_log_space too big by 4 bytes),
plus comments and DBUG_PRINT, and we don't start replication
if --bootstrap.
For now following tasks have been done:
- PASSWORD() function was rewritten. PASSWORD() now returns SHA1
hash_stage2; for new passwords user.password contains '*'hash_stage2; sql_yacc.yy also fixed;
- password.c: new functions were implemented, old rolled back to 4.0 state
- server code was rewritten to use new authorization algorithm (check_user(), change
user, and other stuff in sql/sql_parse.cc)
- client code was rewritten to use new authorization algorithm
(mysql_real_connect, myslq_authenticate in sql-common/client.c)
- now server barks on 45-byte-length 4.1.0 passwords and refuses 4.1.0-style
authentification. Users with 4.1.0 passwords are blocked (sql/sql_acl.cc)
- mysqladmin.c was fixed to work correctly with new passwords
Tests for 4.0-4.1.1, 4.1.1-4.1.1 (with or without db/password) logons was performed;
mysqladmin also was tested. Additional check are nevertheless necessary.
thd->enter_cond() and exit_cond(), so that the I/O thread accepts to stop
when it's waiting for relay log space.
Reset ignore_log_space_limit to 0 when the SQL thread terminates.
Could not add a testcase for this: if the test goes into a MASTER_POS_WAIT, it waits
until this terminates (even doing "connection other_con" to launch "stop slave" is blocked).
- In MASTER_POS_WAIT() don't test if the I/O slave is running, but if the SQL thread
is running.
- Some DBUG info for this bugfix.
- Comments for future devs.
- Start_log_event::exec_event() : when we hit it, do a rollback.
- We don't need LOG_EVENT_FORCED_ROTATE_F.
- Stop_log_event::exec_event() : when we hit it, we needn't clean anything.
- Removed LOG_EVENT_TIME_F and LOG_EVENT_FORCED_ROTATE_F.
- We don't need Stop events in the relay log.
- Now filtering of server id is done in the I/O thread first.
support issue with an unclear message which can have N reasons for appearing.
This should help us know at which point it failed, and get the errno when
my_open was involved (as the reason for the unclear message is often a
permission problem).
RESET SLAVE resets last_error and last_errno in SHOW SLAVE STATUS (without this,
rpl_loaddata.test, which is expected to generate an error in last_error, influenced
rpl_log_pos.test).
A small test update.
Added STOP SLAVE to mysql-test-run to get rid of several stupid error messages
which are printed while the master restarts and the slave attempts/manages to
connect to it and sends it nonsense binlog requests.
we now make a distinction between if the master is < 3.23.57, 3.23 && >=57, and 4.x
(before the 2 3.23 were one). This is because in 3.23.57 we have a way to distinguish between
a Start_log_event written at server startup and one written at FLUSH LOGS, so we
have a way to know if the slave must drop old temp tables or not.
Change: mi->old_format was bool, now it's enum (to handle 3 cases). However, functions
which had 'bool old_format' as an argument have their prototypes unchanged, because
the old old_format == 0 now corresponds to the enum value BINLOG_FORMAT_CURRENT which
is equal to 0, so boolean tests are left untouched. The only case were we use mi->old_format
as an enum instead of casting it implicitly to a bool, is in Start_log_event::exec_event,
where we want to distinguish between the 3 possible enum values.
Plus a changeset which I had committed but forgot to push (and this changeset is lost on
another computer, so I recreate it here). This changeset is "user-friendly SHOW BINLOG EVENTS
and CHANGE MASTER TO when log positions < 4 are used.
Here is another pack of changes about gathering common client code in
sql-common/client.c.
Now i symlink the client.c from sql/ and libmysql/. These directories
have client_settings.h files to be included to client.c. It contains
defines and declarations to compile client.c in appropriate manner.
Also i've added include/sql_common.h, containing declarations of what
is exported from client.c
I removed as many #ifdef-s from client.c as i dared to. I think it's better
push it with some extra #ifdef-s now (of course, if everythihg besides it is
ok) so other people can check the code.
- A few more mutex locks and unlocks of rli.log_space_lock for doing clean reads of
rli.ignore_log_space_limit
- Broadcast after unlock, not before (small speed optimisation).
now we'll have something like this :
030308 18:46:58 Slave I/O thread: connected to master 'gb@localhost:3306', replication started in log 'FIRST' at position 4
030308 18:46:58 While trying to obtain the list of slaves from the master 'localhost:3306', user 'gb' got the following error: 'Access denied. You need the REPLICATION SLAVE privilege for this operation'
030308 18:46:58 Slave I/O thread exiting, read up to log 'FIRST', position 4
instead of "Error updating slave list: Query error".
This fixes bug #80.
Added ALL as parameter option for all group functions.
Make join handling uniform. This allows us to use ',', JOIN and INNER JOIN the same way.
Sort NULL last if DESC is used (ANSI SQL 99 requirement)
Call pthread_mutex_destroy() on not used mutex.
Changed comments in .h and .c files from // -> /* */
Added detection of mutex on which one didn't call pthread_mutex_destroy()
Fixed bug in create_tmp_field() which causes a memory overrun in queries that uses "ORDER BY constant_expression"
Added optimisation for ORDER BY NULL