The problem was that wait_for_slave_io_to_start reported that the io thread
was ready, when it was still initializing. This caused test suite to
continue too early, for example before the semi sync plugin was properly
enabled.
Fixed by introducing a new internal stage: "Preparing". Slave_IO_Running is
now set to "Yes" only when all initializing is done and the IO thread is
ready to read things from the master.
The only test affected by this change is rpl_flsh_tbls, which got stuck in
the preparing phase while trying to read the GTID position from a table.
Fixed by having this test waiting for Preparing instead of Yes.
- Added testing if connection is killed to shortcut reading of connection data
This will allow us later in 10.2 to do a cleaner shutdown of slaves (less errors in the log)
- Add new status variables: Slaves_connected, Slaves_running and Slave_connections.
- Use MYSQL_SLAVE_NOT_RUN instead of 0 with slave_running.
- Don't print obvious extra warnings to the error log when slave is shut down normally.
On shutdown feedback was sending a short report without creating
a THD. At that point current_thd was pointing to the already
destroyed THD from the previous full report.
backport from 10.1:
commit bfe703a
Author: Sergei Golubchik <serg@mariadb.org>
Date: Tue Feb 3 18:19:56 2015 +0100
don't let current_thd to point to a destroyed THD
As a fix for MDEV-8208, for initial wsrep threads, the
invocation of init_for_queries() was moved after plugins
were initialized. Due to which, OPTION_BEGIN bit of wsrep
applier THD (originally set in wsrep_replication_process)
got reset due to implicit commit within init_for_queries().
As a result, events from a multi-statement transaction from
another node were committed separately by the applier thread,
which leads to an assertion as they all carry same seqno.
Fixed by making sure that variable.option_bits are restored
post init_for_queries(). Also restored server_status.
Added a test case.
Problem was that we used same condition variable with 2 different mutex.
Fixed by changing to use COND_rpl_thread_stop instead of COND_parallel_entry
for stopping threads.
Patch by Kristian Nielsen
Problem is that FLUSH TABLES WITH READ LOCK first blocks threads from
starting new commits, then waits for running commits to complete. But
in-order parallel replication needs commits to happen in a particular
order, so this can easily deadlock.
To fix this problem, this patch introduces a way to temporarily pause
the parallel replication worker threads. Before starting FTWRL, we let
all worker threads complete in-progress transactions, and then
wait. Then we proceed to take the global read lock. Once the lock is
obtained, we unpause the worker threads. Now commits are blocked from
starting by the global read lock, so the deadlock will no longer occur.
Patch backported from MariaDB 10.1
- Ensure that we wait with cleanup() until slave thread has stopped.
- Added signal_thd_deleted() to signal close_connections() that all THD's has been freed.
Other things
- Removed not needed calls to THD_CHECK_SENTRY() when we are calling 'delete thd'.
new features:
set event_scheduler=ON|OFF will now try to init event scheduler
if it's not enabled
set event_scheduler=default will try to enable it based on
the value of the event_scheduler when mysqld was started
Addendum:
* Before calling THD::init_for_queries(), flip the current_thd to wsrep
thread so that memory gets allocated for the right THD.
* Use wsrep_creating_startup_threads instead of plugins_are_initialized
as the condition for the execution of THD::init_for_queries() within
start_wsrep_THD(), as use of latter could still leave some room for
race.
Problem:
When mysqld starts as a galera node, it creates 2 system threads
(applier & rollbacker) using start_wsrep_THD(). These threads are
created before plugin initialization (plugin_init()) for SST methods
like rsync and xtrabackup.
The threads' initialization itself can proceed in parallel to mysqld's
main thread of execution. As a result, the thread initialization code
(start_wsrep_THD()) can end up accessing some un/partially initialized
structures (like maria_hton, in this particular case) resulting in
segfault.
Solution:
Fixed by calling THD::init_for_queries() (which accesses maria_hton)
only after the plugins have been initialized.
PROBLEMS
Description:- Server variable "--lower_case_tables_names"
when set to "0" on windows platform which does not support
case sensitive file operations leads to problems. A warning
message is printed in the error log while starting the
server with "--lower_case_tables_names=0". Also according to
the documentation, seting "lower_case_tables_names" to "0"
on a case-insensitive filesystem might lead to index
corruption.
Analysis:- The problem reported in the bug is:-
Creating an INNODB table 'a' and executing a query, "INSERT
INTO a SELECT a FROM A;" on a server started with
"--lower_case_tables_names=0" and running on a
case-insensitive filesystem leads innodb to flat spin.
Optimizer thinks that "a" and "A" are two different tables
as the variable "lower_case_table_names" is set to "0". As a
result, optimizer comes up with a plan which does not need a
temporary table. If the same table is used in select and
insert, a temporary table is needed. This incorrect
optimizer plan leads to infinite insertions.
Fix:- If the server is started with
"--lower_case_tables_names" set to 0 on a case-insensitive
filesystem, an error, "The server option
'lower_case_table_names'is configured to use case sensitive
table names but the data directory is on a case-insensitive
file system which is an unsupported combination. Please
consider either using a case sensitive file system for your
data directory or switching to a case-insensitive table name
mode.", is printed in the server error log and the server
exits.
field.cc
- Fixed warning about overlapping memory copy (backport from 10.0)
Item_subselect.cc
- Fixed core dump in main.view
- Problem was that thd->lex->current_select->master_unit()->item was not set, which caused crash in maxr_as_dependent
sql/mysqld.cc
- Got error on shutdown as we where freeing mutex before all THD objects was freed
(~THD uses some mutex). Fixed by during shutdown freeing THD inside mutex.
sql/log.cc
- log_space_lock and LOCK_log where locked in inconsistenly. Fixed by not having a log_space_lock around purge_logs.
sql/slave.cc
- Remove unnecessary log_space_lock
- Move cond_broadcast inside lock to ensure we don't miss the signal
when --bind-address is not specificed explicitly (or set to '*')
MariaDB tries all wildcard addresses. Print a warning (not an error)
if a socket cannot be created for some of them.
Still print an error if a socket cannot be created for an address
that a user has specified expicitly with --bind-address.
When the slave processes the master restart format_description event,
parallel replication needs to complete any prior events before processing
the restart event (which closes temporary tables and such stuff).
This happens in wait_for_workers_idle(), however it was not waiting long
enough. The wait was using wait_for_prior_commit(), but at that points table
can still be open. This lead to assertion in this case.
So change wait_for_workers_idle() to wait until all worker threads have
reached finish_event_group(), at which point all tables should have been
closed.
During server start, as wsrep initialization happens before
plugin_init(), segfault may occur if wsrep THDs try to access
the uninitialized maria_hton.