SHOW PROCESSLIST, SHOW BINLOGS
Problem: A deadlock was occurring when 4 threads were
involved in acquiring locks in the following way
Thread 1: Dump thread ( Slave is reconnecting, so on
Master, a new dump thread is trying kill
zombie dump threads. It acquired thread's
LOCK_thd_data and it is about to acquire
mysys_var->current_mutex ( which LOCK_log)
Thread 2: Application thread is executing show binlogs and
acquired LOCK_log and it is about to acquire
LOCK_index.
Thread 3: Application thread is executing Purge binary logs
and acquired LOCK_index and it is about to
acquire LOCK_thread_count.
Thread 4: Application thread is executing show processlist
and acquired LOCK_thread_count and it is
about to acquire zombie dump thread's
LOCK_thd_data.
Deadlock Cycle:
Thread 1 -> Thread 2 -> Thread 3-> Thread 4 ->Thread 1
The same above deadlock was observed even when thread 4 is
executing 'SELECT * FROM information_schema.processlist' command and
acquired LOCK_thread_count and it is about to acquire zombie
dump thread's LOCK_thd_data.
Analysis:
There are four locks involved in the deadlock. LOCK_log,
LOCK_thread_count, LOCK_index and LOCK_thd_data.
LOCK_log, LOCK_thread_count, LOCK_index are global mutexes
where as LOCK_thd_data is local to a thread.
We can divide these four locks in two groups.
Group 1 consists of LOCK_log and LOCK_index and the order
should be LOCK_log followed by LOCK_index.
Group 2 consists of other two mutexes
LOCK_thread_count, LOCK_thd_data and the order should
be LOCK_thread_count followed by LOCK_thd_data.
Unfortunately, there is no specific predefined lock order defined
to follow in the MySQL system when it comes to locks across these
two groups. In the above problematic example,
there is no problem in the way we are acquiring the locks
if you see each thread individually.
But If you combine all 4 threads, they end up in a deadlock.
Fix:
Since everything seems to be fine in the way threads are taking locks,
In this patch We are changing the duration of the locks in Thread 4
to break the deadlock. i.e., before the patch, Thread 4
('show processlist' command) mysqld_list_processes()
function acquires LOCK_thread_count for the complete duration
of the function and it also acquires/releases
each thread's LOCK_thd_data.
LOCK_thread_count is used to protect addition and
deletion of threads in global threads list. While show
process list is looping through all the existing threads,
it will be a problem if a thread is exited but there is no problem
if a new thread is added to the system. Hence a new mutex is
introduced "LOCK_thd_remove" which will protect deletion
of a thread from global threads list. All threads which are
getting exited should acquire LOCK_thd_remove
followed by LOCK_thread_count. (It should take LOCK_thread_count
also because other places of the code still thinks that exit thread
is protected with LOCK_thread_count. In this fix, we are changing
only 'show process list' query logic )
(Eg: unlink_thd logic will be protected with
LOCK_thd_remove).
Logic of mysqld_list_processes(or file_schema_processlist)
will now be protected with 'LOCK_thd_remove' instead of
'LOCK_thread_count'.
Now the new locking order after this patch is:
LOCK_thd_remove -> LOCK_thd_data -> LOCK_log ->
LOCK_index -> LOCK_thread_count
Since log_throttle is not available in 5.5. Logging of
error message for failure of thread to create new connection
in "create_thread_to_handle_connection" is not backported.
Since, function "my_plugin_log_message" is not available in
5.5 version and since there is incompatibility between
sql_print_XXX function compiled with g++ and alog files with
gcc to use sql_print_error, changes related to audit log
plugin is not backported.
Problem - When the slave was disconnected from the master, under certain
conditions, upon reconnect, it will report that it received a
packet larger the slave_max_allowed_packet which causes the
replication to stop.
Analysis -The reason of this failure is that on reconnect
the slave sets the max_allowed_packet from the master's mi->mysql
object which keeps the max_allowed_packet as 1MB. This causes the
slave to report such error on recieving packet bigger than 1MB.
START SLAVE on the slave fixes the problem since it restarts
slave threads which initializes the max_allowed_packet to
slave_max_allowed_packet.
Fix - The problem is fixed by some code refactoring and introduction of a new
function which updates the max_allowed_packet for the THD object of the
slave thread and the mysql->options max_allowed_packet.
Problem
========
Replication breaks in the cases if the event length exceeds
the size of master Dump thread's max_allowed_packet.
The reason why this failure is occuring is because the event length is
more than the total size of the max_allowed_packet, on addition of the
max_event_header length exceeds the max_allowed_packet of the DUMP thread.
This causes the Dump thread to break replication and throw an error.
That can happen e.g with row-based replication in Update_rows event.
Fix
====
The problem is fixed in 2 steps:
1.) The Dump thread limit to read event is increased to the upper limit
i.e. Dump thread reads whatever gets logged in the binary log.
2.) On the slave side we increase the the max_allowed_packet for the
slave's threads (IO/SQL) by increasing it to 1GB.
This is done using the new server option (slave_max_allowed_packet)
included, is used to regulate the max_allowed_packet of the
slave thread (IO/SQL) by the DBA, and facilitates the sending of
large packets from the master to the slave.
This causes the large packets to be received by the slave and apply
it successfully.
sql/log_event.cc:
The max_allowed_packet is not evaluated to the new option
slave_max_allowed_packet after the fix.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value,
i.e. not taking global(max_allowed) into consideration, for the slave's threads.
sql/sql_repl.cc:
The dump thread's max_allowed_packet is set to the upper limit
which makes it independent and it now reads whatever gets
logged in the binary log.
Problem
========
SQL statements close to the size of max_allowed_packet produce binary
log events larger than max_allowed_packet.
The reason why this failure is occuring is because the event length is
more than the total size of the max_allowed_packet + max_event_header
length. Now since the event length exceeds this size master Dump
thread is unable to send the packet on to the slave.
That can happen e.g with row-based replication in Update_rows event.
Fix
====
The problem was fixed by increasing the max_allowed_packet for the
slave's threads (IO/SQL) by increasing it to 1GB.
This is done using the new server option included which is used to
regulate the max_allowed_packet of the slave thread (IO/SQL).
This causes the large packets to be received by the slave and apply
it successfully.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value ,
i.e. not taking global(max_allowed) into consideration, for the slave's threads.
PROBLEM:
Threads end-up in deadlock due to locks acquired as described
below,
con1: Run Query on a table.
It is important that this SELECT must back-off while
trying to open the t1 and enter into wait_for_condition().
The SELECT then is blocked trying to lock mysys_var->mutex
which is held by con3. The very significant fact here is
that mysys_var->current_mutex will still point to LOCK_open,
even if LOCK_open is no longer held by con1 at this point.
con2: Try dropping table used in con1 or query some table.
It will hold LOCK_open and be blocked trying to lock
kernel_mutex held by con4.
con3: Try killing the query run by con1.
It will hold THD::LOCK_thd_data belonging to con1 while
trying to lock mysys_var->current_mutex belonging to con1.
But current_mutex will point to LOCK_open which is held
by con2.
con4: Get innodb engine status
It will hold kernel_mutex, trying to lock THD::LOCK_thd_data
belonging to con1 which is held by con3.
So while technically only con2, con3 and con4 participate in the
deadlock, con1's mysys_var->current_mutex pointing to LOCK_open
is a vital component of the deadlock.
CYCLE = (THD::LOCK_thd_data -> LOCK_open ->
kernel_mutex -> THD::LOCK_thd_data)
FIX:
LOCK_thd_data has responsibility of protecting,
1) thd->query, thd->query_length
2) VIO
3) thd->mysys_var (used by KILL statement and shutdown)
4) THD during thread delete.
Among above responsibilities, 1), 2)and (3,4) seems to be three
independent group of responsibility. If there is different LOCK
owning responsibility of (3,4), the above mentioned deadlock cycle
can be avoid. This fix introduces LOCK_thd_kill to handle
responsibility (3,4), which eliminates the deadlock issue.
Note: The problem is not found in 5.5. Introduction MDL subsystem
caused metadata locking responsibility to be moved from TDC/TC to
MDL subsystem. Due to this, responsibility of LOCK_open is reduced.
As the use of LOCK_open is removed in open_table() and
mysql_rm_table() the above mentioned CYCLE does not form.
Revision ID for changes,
open_table() = dlenev@mysql.com-20100727133458-m3ua9oslnx8fbbvz
mysql_rm_table() = jon.hauglid@oracle.com-20101116100012-kxep9txz2fxy3nmw
BUG#11761686 insert_id event is not filtered.
Two issues are covered.
INSERT into autoincrement field which is not the first part in the composed primary key
is unsafe by autoincrement logging design. The case is specific to MyISAM engine
because Innodb does not allow such table definition.
However no warnings and row-format logging in the MIXED mode was done, and
that is fixed.
Int-, Rand-, User-var log-events were not filtered along with their parent
query that made possible them to screw up execution context of the following
query.
Fixed with deferring their execution until the parent query.
******
Bug#11754117
Post review fixes.
mysql-test/suite/rpl/r/rpl_auto_increment_bug45679.result:
a new result file is added.
mysql-test/suite/rpl/r/rpl_filter_tables_not_exist.result:
results updated.
mysql-test/suite/rpl/t/rpl_auto_increment_bug45679.test:
regression test for BUG#11754117-45670 is added.
mysql-test/suite/rpl/t/rpl_filter_tables_not_exist.test:
regression test for filtering issue of BUG#11754117 - 45670 is added.
sql/log_event.cc:
Logics are added for deferring and executing events associated
with the Query event.
sql/log_event.h:
Interface to deferred events batch execution is added.
sql/rpl_rli.cc:
initialization for new RLI members is added.
sql/rpl_rli.h:
New members to RLI are added to facilitate deferred events gathering
and execution control;
two general character RLI cleanup methods are constructed.
sql/rpl_utility.cc:
Deferred_log_events methods are difined.
sql/rpl_utility.h:
A new class Deferred_log_events is defined to implement
IRU events gathering, execution and cleanup.
sql/slave.cc:
Necessary changes to initialize `rli->deferred_events' and prevent
deferred event deletion in the main read-exec branch.
sql/sql_base.cc:
A new safe-check function for multi-part pk with auto-increment is defined
and deployed in lock_tables().
sql/sql_class.cc:
Initialization for a new member and replication cleanups are added
to THD class.
sql/sql_class.h:
THD class receives a new member to hold a specific execution
context for slave applier.
sql/sql_parse.cc:
Execution of the deferred event in started prior to its parent query.
BUG#64503: mysql frequently ignores --relay-log-space-limit
When the SQL thread goes to sleep, waiting for more events, it sets
the flag ignore_log_space_limit to true. This gives the IO thread a
chance to queue some more events and ultimately the SQL thread will be
able to purge the log once it is rotated. By then the SQL thread
resets the ignore_log_space_limit to false. However, between the time
the SQL thread has set the ignore flag and the time it resets it, the
IO thread will be queuing events in the relay log, possibly going way
over the limit.
This patch makes the IO and SQL thread to synchronize when they reach
the space limit and only ask for one event at a time. Thus the SQL
thread sets ignore_log_space_limit flag and the IO thread resets it to
false everytime it processes one more event. In addition, everytime
the SQL thread processes the next event, and the limit has been
reached, it checks if the IO thread should rotate. If it should, it
instructs the IO thread to rotate, giving the SQL thread a chance to
purge the logs (freeing space). Finally, this patch removes the
resetting of the ignore_log_space_limit flag from purge_first_log,
because this is now reset by the IO thread every time it processes the
next event when the limit has been reached.
If the SQL thread is in a transaction, it cannot purge so, there is no
point in asking the IO thread to rotate. The only thing it can do is
to ask for more events until the transaction is over (then it can ask
the IO to rotate and purge the log right away). Otherwise, there would
be a deadlock (SQL would not be able to purge and IO thread would not
be able to queue events so that the SQL would finish the transaction).
Problem : The basic problem is the way the thread sleeps in mysql-5.5 and also in mysql-5.1
when we execute a stop slave on windows platform.
On windows platform if the stop slave is executed after the master dies, we have
this long wait before the stop slave return a value. This is because there is a
sleep of the thread. The sleep is uninterruptable in the two above version,
which was fixed by Davi patch for the BUG#11765860 for mysql-trunk. Backporting
his patch for mysql-5.5 fixes the problem.
Solution : A new pair of mutex and condition variable is introduced to synchronize thread
sleep and finalization. A new mutex is required because the slave threads are
terminated while holding the slave thread locks (run_lock), which can not be
relinquished during termination as this would affect the lock order.
mysql-test/suite/rpl/r/rpl_start_stop_slave.result:
The result file associated with the test added.
mysql-test/suite/rpl/t/rpl_start_stop_slave.test:
A test to check the new functionality.
sql/rpl_mi.cc:
The constructor using the new mutex and condition variables for the master_info.
sql/rpl_mi.h:
The condition variable and mutex have been added for the master_info.
sql/rpl_rli.cc:
The constructor using the new mutex and condition variables for the realy_log_info.
sql/rpl_rli.h:
The condition variable and mutex have been added for the relay_log_info.
sql/slave.cc:
Use a timed wait on a condition variable to implement a interruptible sleep.
The wait is registered with the THD object so that the thread will be woken
up if killed.
When passing an empty user to the connect function will cause
valgrind warnings. Seems that the client code is not prepared
to handle empty users. On 5.6 this can even be triggered by
START SLAVE PASSWORD='...'; i.e., without setting USER='...' on
the START SLAVE command (see WL#4143 for details on the new
additional START SLAVE commands).
To fix this, we disallow empty users when configuring the slave
connection parameters (this decision might be revisited if the
client code accepts empty users in the future).
sql/slave.cc:
We throw an error if an empty user is supplied to the connection
function.
Connection of slave to master using a replication account which authenticates
with an external plugin was not possible.
Fixed by making sure that the CLIENT_PLUGIN_AUTH capability is set when client connects using mysql_real_connect(). Also, a plugin-dir path used by client library to locate authentication plugins is set based on the analogous server setting. This is done in connect_to_master() function before a call to mysql_real_connect().
Background: Backporting fix for BUG 11752963 to Mysql5.1 branch.
Problem: Fix of bug 11752963 was only available for trunk and 5.5 branch.
Partial fix has been pushed to 5.1 branch as well.
Fix: backporting the fixes of bug 11752963 to 5.1 branch.
1. Made all major changes to make 5.1 branch in line with 5.5 and the trunk.
2. skipped the partial patch that was already applied to the 5.1 branch.
sql/rpl_rli.h:
Made inited Volatile (find inline comments)
sql/slave.cc:
backported all changes from the fix of BUG#11752963.
Manual merge from mysql-5.1 into mysql-5.5.
Conflicts
=========
Text conflict in mysql-test/suite/rpl/t/rpl_row_until.test
Text conflict in sql/handler.h
Text conflict in storage/archive/ha_archive.cc
Currently, rpl_semi_sync is failing in PB due to the warning message:
"Slave SQL: slave SQL thread is being stopped in the middle of "
"applying of a group having updated a non-transaction table; "
"waiting for the group completion ..."
The problem started happening after the fix for BUG#11762407 what was
automatically suppressing some warning messages.
To fix the current issue, we suppress the aforementioned warning message
and exploit the opportunity to make the sentence clearer.
rpl_packet got a timeout failure sporadically on PB when stopping
slave. The real reason of this bug is that STOP SLAVE stopped
IO thread first and then stopped SQL thread. It was
possible that IO thread stopped after replicating part of a
transaction which SQL thread was executing. SQL thread would
be hung if the transaction could not be rolled back safely.
After this patch, STOP SLAVE will stop SQL thread first and then stop IO
thread, which guarantees that IO thread will fetch the reset of the
events of the transaction that SQL thread is executing, so that SQL
thread can finish the transaction if it cannot be rolled back safely.
Added below auxiliary files to make the test code neater.
restart_slave_sql.inc
rpl_connection_master.inc
rpl_connection_slave.inc
rpl_connection_slave1.inc
Manual merge from mysql-5.1-bugteam into mysql-5.5-bugteam.
Conflicts
=========
Text conflict in sql/log.cc
Text conflict in sql/log.h
Text conflict in sql/slave.cc
Text conflict in sql/sql_parse.cc
Text conflict in sql/sql_priv.h
when generating new name.
If find_uniq_filename returns an error, then this error is not
being propagated upwards, and execution does not report error to
the user (although a entry in the error log is generated).
Additionally, some more errors were ignored in new_file_impl:
- when writing the rotate event
- when reopening the index and binary log file
This patch addresses this by propagating the error up in the
execution stack. Furthermore, when rotation of the binary log
fails, an incident event is written, because there may be a
chance that some changes for a given statement, were not properly
logged. For example, in SBR, LOAD DATA INFILE statement requires
more than one event to be logged, should rotation fail while
logging part of the LOAD DATA events, then the logged data would
become inconsistent with the data in the storage engine.
mysql-test/include/restart_mysqld.inc:
Refactored restart_mysqld so that it is not hardcoded for
mysqld.1, but rather for the current server.
mysql-test/suite/binlog/t/binlog_index.test:
The error on open of index and binary log on new_file_impl
is now caught. Thence the user will get an error message.
We need to accomodate this change in the test case for the
failing FLUSH LOGS.
mysql-test/suite/rpl/t/rpl_binlog_errors-master.opt:
Sets max_binlog_size to 4096.
mysql-test/suite/rpl/t/rpl_binlog_errors.test:
Added some test cases for asserting that the error is found
and reported.
sql/handler.cc:
Catching error now returned by unlog (in ha_commit_trans) and
returning it.
sql/log.cc:
Propagating errors from new_file_impl upwards. The errors that
new_file_impl catches now are:
- error on generate_new_name
- error on writing the rotate event
- error when opening the index or the binary log file.
sql/log.h:
Changing declaration of:
- rotate_and_purge
- new_file
- new_file_without_locking
- new_file_impl
- unlog
They now return int instead of void.
sql/mysql_priv.h:
Change signature of reload_acl_and_cache so that write_to_binlog
is an int instead of bool.
sql/mysqld.cc:
Redeclaring not_used var as int instead of bool.
sql/rpl_injector.cc:
Changes to catch the return from rotate_and_purge.
sql/slave.cc:
Changes to catch the return values for new_file and rotate_relay_log.
sql/slave.h:
Changes to rotate_relay_log declaration (now returns int
instead of void).
sql/sql_load.cc:
In SBR, some logging of LOAD DATA events goes through
IO_CACHE_CALLBACK invocation at mf_iocache.c:_my_b_get. The
IO_CACHE implementation is ignoring the return value for from
these callbacks (pre_read and post_read), so we need to find out
at the end of the execution if the error is set or not in THD.
sql/sql_parse.cc:
Catching the rotate_relay_log and rotate_and_purge return values.
Semantic change in reload_acl_and_cache so that we report errors
in binlog interactions through the write_to_binlog output parameter.
If there was any failure while rotating the binary log, we should
then report the error to the client when handling SQLCOMM_FLUSH.
Problem: Extended characters outside of ASCII range where not displayed
properly in SHOW PROCESSLIST, because thd_info->query was always sent as
system_character_set (utf8). This was wrong, because query buffer
is never converted to utf8 - it is always have client character set.
Fix: sending query buffer using query character set
@ sql/sql_class.cc
@ sql/sql_class.h
Introducing a new class CSET_STRING, a LEX_STRING with character set.
Adding set_query(&CSET_STRING)
Adding reset_query(), to use instead of set_query(0, NULL).
@ sql/event_data_objects.cc
Using reset_query()
@ sql/log_event.cc
Using reset_query()
Adding charset argument to set_query_and_id().
@ sql/slave.cc
Using reset_query().
@ sql/sp_head.cc
Changing backing up and restore code to use CSET_STRING.
@ sql/sql_audit.h
Using CSET_STRING.
In the "else" branch it's OK not to use
global_system_variables.character_set_client.
&my_charset_latin1, which is set in constructor, is fine
(verified with Sergey Vojtovich).
@ sql/sql_insert.cc
Using set_query() with proper character set: table_name is utf8.
@ sql/sql_parse.cc
Adding character set argument to set_query_and_id().
(This is the main point where thd->charset() is stored
into thd->query_string.cs, for use in "SHOW PROCESSLIST".)
Using reset_query().
@ sql/sql_prepare.cc
Storing client character set into thd->query_string.cs.
@ sql/sql_show.cc
Using CSET_STRING to fetch and send charset-aware query information
from threads.
@ storage/myisam/ha_myisam.cc
Using set_query() with proper character set: table_name is utf8.
@ mysql-test/r/show_check.result
@ mysql-test/t/show_check.test
Adding tests
Bug#57995: Compiler flag change build error on OSX 10.4: my_getncpus.c
Bug#57996: Compiler flag change build error on OSX 10.5 : bind.c
Bug#57994: Compiler flag change build error : my_redel.c
Bug#57993: Compiler flag change build error on FreeBsd 7.0 : regexec.c
Bug#57992: Compiler flag change build error on FreeBsd : mf_keycache.c
Bug#57997: Compiler flag change build error on OSX 10.6: debug_sync.cc
Fix assorted compiler generated warnings.
cmd-line-utils/readline/bind.c:
Bug#57996: Compiler flag change build error on OSX 10.5 : bind.c
Initialize variable to work around a false positive warning.
include/m_string.h:
Bug#57994: Compiler flag change build error : my_redel.c
The expansion of stpcpy (in glibc) causes warnings if the
return value of strmov is not being used. Since stpcpy is
a GNU extension and the expansion ends up using a built-in
provided by GCC, use the compiler provided built-in directly
when possible.
include/my_compiler.h:
Define a dummy MY_GNUC_PREREQ when not compiling with GCC.
libmysql/libmysql.c:
Bug#58057: 5.1 libmysql/libmysql.c unused variable/compile failure
Variable might not be used in some cases. So, tag it as unused.
mysys/mf_keycache.c:
Bug#57992: Compiler flag change build error on FreeBsd : mf_keycache.c
Use UNINIT_VAR to work around a false positive warning.
mysys/my_getncpus.c:
Bug#57995: Compiler flag change build error on OSX 10.4: my_getncpus.c
Declare variable in the same block where it is used.
regex/regexec.c:
Bug#57993: Compiler flag change build error on FreeBsd 7.0 : regexec.c
Work around a compiler bug which causes the cast to not be enforced.
sql/debug_sync.cc:
Bug#57997: Compiler flag change build error on OSX 10.6: debug_sync.cc
Use UNINIT_VAR to work around a false positive warning.
sql/handler.cc:
Use UNINIT_VAR to work around a false positive warning.
sql/slave.cc:
Use UNINIT_VAR to work around a false positive warning.
sql/sql_partition.cc:
Use UNINIT_VAR to work around a false positive warning.
storage/myisam/ft_nlq_search.c:
Use UNINIT_VAR to work around a false positive warning.
storage/myisam/mi_create.c:
Use UNINIT_VAR to work around a false positive warning.
storage/myisammrg/myrg_open.c:
Use UNINIT_VAR to work around a false positive warning.
tests/mysql_client_test.c:
Change function to take a pointer to const, no need for a cast.
- A prerequisite cleanup patch for making KILL reliable.
The test case main.kill did not work reliably.
The following problems have been identified:
1. A kill signal could go lost if it came in, short before a
thread went reading on the client connection.
2. A kill signal could go lost if it came in, short before a
thread went waiting on a condition variable.
These problems have been solved as follows. Please see also
added code comments for more details.
1. There is no safe way to detect, when a thread enters the
blocking state of a read(2) or recv(2) system call, where it
can be interrupted by a signal. Hence it is not possible to
wait for the right moment to send a kill signal. It has been
decided, not to fix it in the code. Instead, the test case
repeats the KILL statement until the connection terminates.
2. Before waiting on a condition variable, we register it
together with a synchronizating mutex in THD::mysys_var. After
this, we need to test THD::killed again. At some places we did
only test it in a loop condition before the registration. When
THD::killed had been set between this test and the registration,
we entered waiting without noticing the killed flag. Additional
checks ahve been introduced where required.
In addition to the above, a re-write of the main.kill test
case has been done. All sleeps have been replaced by Debug
Sync Facility synchronization. A couple of sync points have
been added to the server code.
To avoid further problems, if the test case fails in spite of
the fixes, the test case has been added to the "experimental"
list for now.
- Most of the work on this patch is authored by Ingo Struewing
mysql-test/t/kill.test:
Re-wrote test case to use Debug Sync points instead of sleeps
sql/event_queue.cc:
Fixed kill detection in Event_queue::cond_wait() by adding a check
after enter_cond().
sql/lock.cc:
Moved Debug Sync points behind enter_cond().
Fixed comments.
sql/slave.cc:
Fixed kill detection in start_slave_thread() by adding a check
after enter_cond().
sql/sql_class.cc:
Swapped order of kill and close in THD::awake().
Added comments.
sql/sql_class.h:
Added a comment to THD::killed.
sql/sql_parse.cc:
Added a sync point in do_command().
sql/sql_select.cc:
Added a sync point in JOIN::optimize().
replication aborts
When recieving a 'SLAVE STOP' command, slave SQL thread will roll back the
transaction and stop immidiately if there is only transactional table updated,
even through 'CREATE|DROP TEMPOARY TABLE' statement are in it. But These
statements can never be rolled back. Because the temporary tables to the user
session mapping remain until 'RESET SLAVE', Therefore it will abort SQL thread
with an error that the table already exists or doesn't exist, when it restarts
and executes the whole transaction again.
After this patch, SQL thread always waits till the transaction ends and then stops,
if 'CREATE|DROP TEMPOARY TABLE' statement are in it.
mysql-test/extra/rpl_tests/rpl_stop_slave.test:
Auxiliary file which is used to test this bug.
mysql-test/suite/rpl/t/rpl_stop_slave.test:
Test case for this bug.
sql/slave.cc:
Checking if OPTION_KEEP_LOG is set. If it is set, SQL thread should wait
until the transaction ends.
sql/sql_parse.cc:
Add a debug point for testing this bug.
When slave executes a transaction bigger than slave's max_binlog_cache_size,
slave will crash. It is caused by the assert that server should only roll back
the statement but not the whole transaction if the error ER_TRANS_CACHE_FULL
happens. But slave sql thread always rollbacks the whole transaction when
an error happens.
Ather this patch, we always clear any error set in sql thread(it is different
from the error in 'SHOW SLAVE STATUS') and it is cleared before rolling back
the transaction.
mysql-test/suite/rpl/r/rpl_binlog_max_cache_size.result:
SET binlog_cache_size and max_binlog_cache_size for all test cases.
Add test case for bug#55375.
mysql-test/suite/rpl/t/rpl_binlog_max_cache_size-master.opt:
binlog_cache_size and max_binlog_cache_size can be set in the client connection.
so remove this option file.
mysql-test/suite/rpl/t/rpl_binlog_max_cache_size.test:
SET binlog_cache_size and max_binlog_cache_size for all test cases.
Add test case for bug#55375.
sql/log_event.cc:
Some functions don't return the error code, so it is a wrong error code.
The error should always be set into thd->main_da. So we use
slave_rows_error_report to report the right error.
sql/slave.cc:
exec_relay_log_event() need call cleanup_context() to clear context.
clearup_context() will call end_trans().
Clear thd's error before cleanup_context. It avoid to trigger the assert
which cause this bug.
The procedure for setting up a fake binary log, by changing the
relay log files manually, is run twice when we issue mtr with
--repeat=2. However, when setting it up for the second time,
neither the sql thread is reset nor the server is restarted. This
means that previous stale relay log IO_CACHE metadata - from
first run - is left around. As a consequence, when the test is
run for the second time, the IO_CACHE for the relay log has
position in file different than zero, triggering the assertion.
We fix this by deploying a call to my_b_seek before calling
check_binlog_magic in next_event. This prevents the server
from asserting, in the cases that the SQL thread was reads
from a hot log (relay.NNNNN), then is stopped, then is restarted
from a previous cold log (relay.MMMMM), and then it reaches
the hot log relay.NNNNN again, in which case, the read
coordinates are not set to zero, but to the old values.
Additionally, some comments to the source code were added.
This patch also fixes Bug#55452 "SET PASSWORD is
replicated twice in RBR mode".
The goal of this patch is to remove the release of
metadata locks from close_thread_tables().
This is necessary to not mistakenly release
the locks in the course of a multi-step
operation that involves multiple close_thread_tables()
or close_tables_for_reopen().
On the same token, move statement commit outside
close_thread_tables().
Other cleanups:
Cleanup COM_FIELD_LIST.
Don't call close_thread_tables() in COM_SHUTDOWN -- there
are no open tables there that can be closed (we leave
the locked tables mode in THD destructor, and this
close_thread_tables() won't leave it anyway).
Make open_and_lock_tables() and open_and_lock_tables_derived()
call close_thread_tables() upon failure.
Remove the calls to close_thread_tables() that are now
unnecessary.
Simplify the back off condition in Open_table_context.
Streamline metadata lock handling in LOCK TABLES
implementation.
Add asserts to ensure correct life cycle of
statement transaction in a session.
Remove a piece of dead code that has also become redundant
after the fix for Bug 37521.
mysql-test/r/variables.result:
Update results: set @@autocommit and statement transaction/
prelocked mode.
mysql-test/r/view.result:
A harmless change in CHECK TABLE <view> status for a broken view.
If previously a failure to prelock all functions used in a view
would leave the connection in LTM_PRELOCKED mode, now we call
close_thread_tables() from open_and_lock_tables()
and leave prelocked mode, thus some check in mysql_admin_table() that
works only in prelocked/locked tables mode is no longer activated.
mysql-test/suite/rpl/r/rpl_row_implicit_commit_binlog.result:
Fixed Bug#55452 "SET PASSWORD is replicated twice in
RBR mode": extra binlog events are gone from the
binary log.
mysql-test/t/variables.test:
Add a test case: set autocommit and statement transaction/prelocked
mode.
sql/event_data_objects.cc:
Simplify code in Event_job_data::execute().
Move sp_head memory management to lex_end().
sql/event_db_repository.cc:
Move the release of metadata locks outside
close_thread_tables().
Make sure we call close_thread_tables() when
open_and_lock_tables() fails and remove extra
code from the events data dictionary.
Use close_mysql_tables(), a new internal
function to properly close mysql.* tables
in the data dictionary.
Contract Event_db_repository::drop_events_by_field,
drop_schema_events into one function.
When dropping all events in a schema,
make sure we don't mistakenly release all
locks acquired by DROP DATABASE. These
include locks on the database name
and the global intention exclusive
metadata lock.
sql/event_db_repository.h:
Function open_event_table() does not require an instance
of Event_db_repository.
sql/events.cc:
Use close_mysql_tables() instead of close_thread_tables()
to bootstrap events, since the latter no longer
releases metadata locks.
sql/ha_ndbcluster.cc:
- mysql_rm_table_part2 no longer releases
acquired metadata locks. Do it in the caller.
sql/ha_ndbcluster_binlog.cc:
Deploy the new protocol for closing thread
tables in run_query() and ndb_binlog_index
code.
sql/handler.cc:
Assert that we never call ha_commit_trans/
ha_rollback_trans in sub-statement, which
is now the case.
sql/handler.h:
Add an accessor to check whether THD_TRANS object
is empty (has no transaction started).
sql/log.cc:
Update a comment.
sql/log_event.cc:
Since now we commit/rollback statement transaction in
mysql_execute_command(), we need a mechanism to communicate
from Query_log_event::do_apply_event() to mysql_execute_command()
that the statement transaction should be rolled back, not committed.
Ideally it would be a virtual method of THD. I hesitate
to make THD a virtual base class in this already large patch.
Use a thd->variables.option_bits for now.
Remove a call to close_thread_tables() from the slave IO
thread. It doesn't open any tables, and the protocol
for closing thread tables is more complicated now.
Make sure we properly close thread tables, however,
in Load_data_log_event, which doesn't
follow the standard server execution procedure
with mysql_execute_command().
@todo: this piece should use Server_runnable
framework instead.
Remove an unnecessary call to mysql_unlock_tables().
sql/rpl_rli.cc:
Update Relay_log_info::slave_close_thread_tables()
to follow the new close protocol.
sql/set_var.cc:
Remove an unused header.
sql/slave.cc:
Remove an unnecessary call to
close_thread_tables().
sql/sp.cc:
Remove unnecessary calls to close_thread_tables()
from SP DDL implementation. The tables will
be closed by the caller, in mysql_execute_command().
When dropping all routines in a database, make sure
to not mistakenly drop all metadata locks acquired
so far, they include the scoped lock on the schema.
sql/sp_head.cc:
Correct the protocol that closes thread tables
in an SP instruction.
Clear lex->sphead before cleaning up lex
with lex_end to make sure that we don't
delete the sphead twice. It's considered
to be "cleaner" and more in line with
future changes than calling delete lex->sphead
in other places that cleanup the lex.
sql/sp_head.h:
When destroying m_lex_keeper of an instruction,
don't delete the sphead that all lex objects
share.
@todo: don't store a reference to routine's sp_head
instance in instruction's lex.
sql/sql_acl.cc:
Don't call close_thread_tables() where the caller will
do that for us.
Fix Bug#55452 "SET PASSWORD is replicated twice in RBR
mode" by disabling RBR replication in change_password()
function.
Use close_mysql_tables() in bootstrap and ACL reload
code to make sure we release all metadata locks.
sql/sql_base.cc:
This is the main part of the patch:
- remove manipulation with thd->transaction
and thd->mdl_context from close_thread_tables().
Now this function is only responsible for closing
tables, nothing else.
This is necessary to be able to easily use
close_thread_tables() in procedures, that
involve multiple open/close tables, which all
need to be protected continuously by metadata
locks.
Add asserts ensuring that TABLE object
is only used when is protected by a metadata lock.
Simplify the back off condition of Open_table_context,
we no longer need to look at the autocommit mode.
Make open_and_lock_tables() and open_normal_and_derived_tables()
close thread tables and release metadata locks acquired so-far
upon failure. This simplifies their usage.
Implement close_mysql_tables().
sql/sql_base.h:
Add declaration for close_mysql_tables().
sql/sql_class.cc:
Remove a piece of dead code that has also become redundant
after the fix for Bug 37521.
The code became dead when my_eof() was made a non-protocol method,
but a method that merely modifies the diagnostics area.
The code became redundant with the fix for Bug#37521, when
we started to cal close_thread_tables() before
Protocol::end_statement().
sql/sql_do.cc:
Do nothing in DO if inside a substatement
(the assert moved out of trans_rollback_stmt).
sql/sql_handler.cc:
Add comments.
sql/sql_insert.cc:
Remove dead code.
Release metadata locks explicitly at the
end of the delayed insert thread.
sql/sql_lex.cc:
Add destruction of lex->sphead to lex_end(),
lex "reset" method called at the end of each statement.
sql/sql_parse.cc:
Move close_thread_tables() and other related
cleanups to mysql_execute_command()
from dispatch_command(). This has become
possible after the fix for Bug#37521.
Mark federated SERVER statements as DDL.
Next step: make sure that we don't store
eof packet in the query cache, and move
the query cache code outside mysql_parse.
Brush up the code of COM_FIELD_LIST.
Remove unnecessary calls to close_thread_tables().
When killing a query, don't report "OK"
if it was a suicide.
sql/sql_parse.h:
Remove declaration of a function that is now static.
sql/sql_partition.cc:
Remove an unnecessary call to close_thread_tables().
sql/sql_plugin.cc:
open_and_lock_tables() will clean up
after itself after a failure.
Move close_thread_tables() above
end: label, and replace with close_mysql_tables(),
which will also release the metadata lock
on mysql.plugin.
sql/sql_prepare.cc:
Now that we no longer release locks in close_thread_tables()
statement prepare code has become more straightforward.
Remove the now redundant check for thd->killed() (used
only by the backup project) from Execute_server_runnable.
Reorder code to take into account that now mysql_execute_command()
performs lex->unit.cleanup() and close_thread_tables().
sql/sql_priv.h:
Add a new option to server options to interact
between the slave SQL thread and execution
framework (hack). @todo: use a virtual
method of class THD instead.
sql/sql_servers.cc:
Due to Bug 25705 replication of
DROP/CREATE/ALTER SERVER is broken.
Make sure at least we do not attempt to
replicate these statements using RBR,
as this violates the assert in close_mysql_tables().
sql/sql_table.cc:
Do not release metadata locks in mysql_rm_table_part2,
this is done by the caller.
Do not call close_thread_tables() in mysql_create_table(),
this is done by the caller.
Fix a bug in DROP TABLE under LOCK TABLES when,
upon error in wait_while_table_is_used() we would mistakenly
release the metadata lock on a non-dropped table.
Explicitly release metadata locks when doing an implicit
commit.
sql/sql_trigger.cc:
Now that we delete lex->sphead in lex_end(),
zero the trigger's sphead in lex after loading
the trigger, to avoid double deletion.
sql/sql_udf.cc:
Use close_mysql_tables() instead of close_thread_tables().
sql/sys_vars.cc:
Remove code added in scope of WL#4284 which would
break when we perform set @@session.autocommit along
with setting other variables and using tables or functions.
A test case added to variables.test.
sql/transaction.cc:
Add asserts.
sql/tztime.cc:
Use close_mysql_tables() rather than close_thread_tables().