The reason for this is that stop slave takes LOCK_active_mi over the
whole operation while some slave operations will also need LOCK_active_mi
which causes deadlocks.
Fixed by introducing object counting for Master_info and not taking
LOCK_active_mi over stop slave or even stop_all_slaves()
Another benefit of this approach is that it allows:
- Multiple threads can run SHOW SLAVE STATUS at the same time
- START/STOP/RESET/SLAVE STATUS on a slave will not block other slaves
- Simpler interface for handling get_master_info()
- Added some missing unlock of 'log_lock' in error condtions
- Moved rpl_parallel_inactivate_pool(&global_rpl_thread_pool) to end
of stop_slave() to not have to use LOCK_active_mi inside
terminate_slave_threads()
- Changed argument for remove_master_info() to Master_info, as we always
have this available
- Fixed core dump when doing FLUSH TABLES WITH READ LOCK and parallel
replication. Problem was that waiting for pause_for_ftwrl was not done
when deleting rpt->current_owner after a force_abort.
Tasks:-
Changes in wsrep_dirty_reads variable
1.) Global + Session scope (Current: session-only)
2.) Can be set using command line.
3.) Allow all commands that do not change data (besides SELECT)
4.) Allow prepared Statements that do not change data
5.) Works with wsrep_sync_wait enabled
Increase max number of possible table_open_cache instances from 512K
to 1024K. This only affects user who are trying to set the variable over
the old limit.
Delete not used test table_open_cache_instances_basic
(Need to be added back and rewritten in 10.2)
This has no functional changes, but it helps avoid merge problems from 10.0
to 10.1. In 10.0, code that checks for parallel replication uses
opt_slave_parallel_threads > 0, but this check needs to be
mi->using_parallel() in 10.1. By using the same check in 10.0 (with
unchanged semantics), merge problems to 10.1 are avoided.
In well defined C code, the "this" pointer is never NULL. Currently, we
were potentially dereferencing a NULL pointer (master_info_index). GCC v6
removes any "if (!this)" conditions as it assumes this is always a
non-null pointer. In order to prevent undefined behaviour, check the
pointer before dereferencing and remove the check within member
functions.
This changes variable wsrep_max_ws_size so that its value
is linked to the value of provider option repl.max_ws_size.
That is, changing the value of variable wsrep_max_ws_size
will change the value of provider option repl.max_ws_size,
and viceversa.
The writeset size limit is always enforced in the provider,
regardless of which option is used.
Since wsrep_sync_wait & wsrep_causal_reads variables are related,
they are always kept in sync whenever one of them changes.
Same is tried on server start, where wsrep_sync_wait get updated
based on wsrep_causal_reads' value. But, since wsrep_causal_reads
is OFF by default, wsrep_sync_wait's value gets modified and loses
its WSREP_SYNC_WAIT_BEFORE_READ bit.
Fixed by syncing wsrep_sync_wait & wsrep_causal_reads values
individually on server start in mysqld_get_one_option() based
on command line arguments used.
Post-fix #2:
- Update test results
- Make the optimization conditional under @@optimizer_switch flag.
- The optimization is now disabled by default, so .result files
are changed back to be what they were before the MDEV-8989 patch.
LOG-SLOW-SLAVE-STATEMENTS NOT DISPLAYED.
These parameters were moved from the command line options to
the system variables section. Treatment of the
opt_log_slow_slave_statements changed to let the
dynamic change of the variable.
* make a local variable for target_table->field[col]
* move an often-used bit function to my_bit.h
* remove a non-static and not really needed trivial comparison
function with a very generic name
This occurs when replication stops with an error, domain-based parallel
replication is used, and the GTID position contains more than one domain.
Furthermore, it relates to the case where the SQL thread is restarted
without first stopping the IO thread.
In this case, the file/offset relay-log position does not correctly
represent the slave's multi-dimensional position, because other domains may
be far ahead of, or behind, the domain with the failing event. So the code
reverts the relay log position back to the start of a relay log file that is
known to be before all active domains.
There was a bug that when the SQL thread was restarted, the
rli->relay_log_state was incorrectly initialised from @@gtid_slave_pos. This
position will likely be too far ahead, due to reverting the relay log
position. Thus, if the replication fails again after the SQL thread restart,
the rli->restart_gtid_pos might be updated incorrectly. This in turn would
cause a second SQL thread restart to replicate from the wrong position, if
the IO thread was still left running.
The fix is to initialise rli->relay_log_state from @@gtid_slave_pos only
when we actually purge and re-fetch relay logs from the master, not at every
SQL thread start.
A related problem is the use of sql_slave_skip_counter to resolve
replication failures in this kind of scenario. Since the slave position is
multi-dimensional, sql_slave_skip_counter can not work properly - it is
indeterminate exactly which event is to be skipped, and is unlikely to work
as expected for the user. So make this an error in the case where
domain-based parallel replication is used with multiple domains, suggesting
instead the user to set @@gtid_slave_pos to reliably skip the desired event.
This includes fixing all utilities to not have any memory leaks,
as safemalloc warnings stopped tests from passing on MacOSX.
- Ensure that all clients takes character-set-dir, as the
libmysqlclient library will use it.
- mysql-test-run now passes character-set-dir to all external clients.
- Changed dynstr_free() so that it can be called twice (made freeing code easier)
- Changed rpl_global_gtid_slave_state to be allocated dynamicly as it
includes a mutex that needs to be initizlied/destroyed before my_end() is called.
- Removed rpl_slave_state::init() and rpl_slave_stage::deinit() as
their job are better handling by constructor and delete.
- Print alias instead of table_name in check_duplicate_key as
table_name may have been converted to lower case.
Other things:
- Fixed a case in time_to_datetime_with_warn() where we where
using && instead of & in tests
rpl/rpl_mdev382 ; Wrong replace in show_binlog_events2.inc
binlog/database ; Different error on Solaris if file exists
mroonga/repair_table_no_index_file ; Different system error on Solaris
partition_not_blackhole ; Different error on Solaris
partition_myisam ; Different error on Solaris
Some other failures in mroonga was because have_32bit.inc didn't correctly
detect 64 bits on Solaris. Fixed using DEFAULT_MACHINE instead of MACHINE_TYPE
for Sys_version_compile_machine.
new features:
set event_scheduler=ON|OFF will now try to init event scheduler
if it's not enabled
set event_scheduler=default will try to enable it based on
the value of the event_scheduler when mysqld was started