fix a few cases where a successful ALTER was not binlogged:
* on an error after the ALTER has completed, binlog the ALTER first,
  then return the error
* don't let thd->killed abort open_table() after a completed
  online ALTER.
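A minimal standalone sketch of the intended control flow; finalize_online_alter()
and write_alter_to_binlog() are hypothetical stand-ins, not the actual server
functions:

    // Hypothetical sketch, not the actual server code: once the ALTER has
    // completed, it must reach the binlog even if a later step fails or
    // the thread was killed, otherwise slaves silently diverge.
    struct THD { bool killed; };

    static void write_alter_to_binlog(THD *) {}   // stand-in for the binlog write

    int finalize_online_alter(THD *thd, bool alter_completed, int post_error)
    {
      if (alter_completed)
      {
        write_alter_to_binlog(thd);  // binlog first, even on error/kill ...
        thd->killed= false;          // ... and don't let a pending kill
                                     // abort the following open_table()
      }
      return post_error;             // the error, if any, is still returned
    }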
create_partition_index_description() had wrong logic for calculating the
length of the key value buffer that is used by the range optimizer.
For some reason it used MAX(partitioning_columns_len,
subpartitioning_columns_len) while it should use the SUM of these values.
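A one-function sketch of the corrected calculation (the parameter names are
illustrative; the real code lives in create_partition_index_description()):

    #include <cstddef>

    // The buffer must hold the key values of the partitioning columns AND
    // the subpartitioning columns, so the lengths add up; taking the
    // maximum undersizes the buffer whenever both sets are present.
    size_t key_value_buffer_len(size_t part_cols_len, size_t subpart_cols_len)
    {
      return part_cols_len + subpart_cols_len;   // SUM, not MAX
    }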
This occurs when replication stops with an error, domain-based parallel
replication is used, and the GTID position contains more than one domain.
Furthermore, it relates to the case where the SQL thread is restarted
without first stopping the IO thread.
In this case, the file/offset relay-log position does not correctly
represent the slave's multi-dimensional position, because other domains may
be far ahead of, or behind, the domain with the failing event. So the code
reverts the relay log position back to the start of a relay log file that is
known to be before all active domains.
There was a bug that when the SQL thread was restarted, the
rli->relay_log_state was incorrectly initialised from @@gtid_slave_pos. This
position will likely be too far ahead, due to reverting the relay log
position. Thus, if the replication fails again after the SQL thread restart,
the rli->restart_gtid_pos might be updated incorrectly. This in turn would
cause a second SQL thread restart to replicate from the wrong position, if
the IO thread was still left running.
The fix is to initialise rli->relay_log_state from @@gtid_slave_pos only
when we actually purge and re-fetch relay logs from the master, not at every
SQL thread start.
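A simplified sketch of the new rule; the types and the flag are illustrative
stand-ins for the rli internals:

    struct Gtid_state { /* per-domain GTID positions */ };

    struct Relay_log_info_sketch
    {
      Gtid_state relay_log_state;   // the slave's multi-dimensional position
    };

    void on_sql_thread_start(Relay_log_info_sketch *rli,
                             const Gtid_state &gtid_slave_pos,
                             bool purged_and_refetched_relay_logs)
    {
      // Only when the relay logs were purged and re-fetched from the master
      // does @@gtid_slave_pos match the start of the relay logs; on a plain
      // SQL thread restart the (possibly reverted) old state must be kept.
      if (purged_and_refetched_relay_logs)
        rli->relay_log_state= gtid_slave_pos;
    }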
A related problem is the use of sql_slave_skip_counter to resolve
replication failures in this kind of scenario. Since the slave position is
multi-dimensional, sql_slave_skip_counter cannot work properly - it is
indeterminate exactly which event is to be skipped, and it is unlikely to
work as expected for the user. So make this an error in the case where
domain-based parallel replication is used with multiple domains, suggesting
instead that the user set @@gtid_slave_pos to reliably skip the desired
event.
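A sketch of the shape of the new check; the names and condition layout are
illustrative, not the actual patch:

    bool refuse_slave_skip_counter(unsigned long skip_counter,
                                   bool domain_parallel_mode,
                                   unsigned int domains_in_gtid_pos)
    {
      // With more than one domain there is no single linear position, so
      // it is indeterminate which event the counter would skip; refuse and
      // point the user to @@gtid_slave_pos instead.
      return skip_counter != 0 && domain_parallel_mode &&
             domains_in_gtid_pos > 1;            // true => raise an error
    }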
The problem was that wait_for_slave_io_to_start reported that the IO thread
was ready while it was still initializing. This caused the test suite to
continue too early, for example before the semi-sync plugin was properly
enabled.
Fixed by introducing a new internal stage: "Preparing". Slave_IO_Running is
now set to "Yes" only when all initializing is done and the IO thread is
ready to read events from the master.
The only test affected by this change is rpl_flsh_tbls, which got stuck in
the preparing phase while trying to read the GTID position from a table.
Fixed by having this test wait for Preparing instead of Yes.
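A sketch of the resulting state progression; the enum is a stand-in for the
server's internal running state, not the actual declaration:

    enum slave_io_state { IO_NOT_RUNNING, IO_PREPARING, IO_RUNNING };

    // What SHOW SLAVE STATUS reports in Slave_IO_Running:
    const char *show_io_state(enum slave_io_state s)
    {
      switch (s)
      {
      case IO_PREPARING: return "Preparing";  // still initializing, e.g.
                                              // reading the GTID position
      case IO_RUNNING:   return "Yes";        // only after all init is done
      default:           return "No";
      }
    }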
- Added a check of whether the connection is killed, to shortcut reading of
  connection data. This will allow us later in 10.2 to do a cleaner
  shutdown of slaves (fewer errors in the log).
- Add new status variables: Slaves_connected, Slaves_running and Slave_connections.
- Use MYSQL_SLAVE_NOT_RUN instead of 0 with slave_running.
- Don't print obvious extra warnings to the error log when slave is shut down normally.
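For the new status variables, a sketch of how such counters are typically
exposed, using a simplified stand-in for the server's SHOW_VAR array (not
the exact declarations):

    typedef enum { SHOW_LONG } SHOW_TYPE;
    struct SHOW_VAR { const char *name; char *value; SHOW_TYPE type; };

    static unsigned long slave_connections;   // total connection attempts
    static unsigned long slaves_connected;    // currently connected slaves
    static unsigned long slaves_running;      // slave threads running

    static SHOW_VAR slave_status_vars[]=
    {
      {"Slave_connections", (char*) &slave_connections, SHOW_LONG},
      {"Slaves_connected",  (char*) &slaves_connected,  SHOW_LONG},
      {"Slaves_running",    (char*) &slaves_running,    SHOW_LONG},
    };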
On shutdown, the feedback plugin was sending a short report without
creating a THD. At that point, current_thd was pointing to the already
destroyed THD from the previous full report.
backport from 10.1:
commit bfe703a
Author: Sergei Golubchik <serg@mariadb.org>
Date: Tue Feb 3 18:19:56 2015 +0100
don't let current_thd to point to a destroyed THD
that was mistakenly merged from mysql-5.5.47
(introduces valgrind failures in main.sp, because Field_varstring
columns are created as FIELD_NORMAL and that causes aria to
read bytes between the actual value length and field max length)
Item_func_coalesce::fix_length_and_dec() calls
Item_func::count_string_result_length(), which called agg_arg_charsets()
with wrong flags, so the collation derivation of the COALESCE result was
not properly set to DERIVATION_COERCIBLE. It erroneously stayed
DERIVATION_NUMERIC. So GREATEST() misinterpreted the argument as
a number rather than a string and did not calculate its own length
properly.
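A sketch of why the derivation matters, with simplified stand-ins for the
server's collation bookkeeping (illustrative only, not the patch itself):

    enum Derivation { DERIVATION_COERCIBLE, DERIVATION_NUMERIC };
    struct DTCollation { Derivation derivation; };

    // GREATEST() looks at the derivation of its arguments; a COALESCE
    // result left at DERIVATION_NUMERIC is treated as a number, so its
    // string length is never aggregated properly.
    bool treated_as_number(const DTCollation &c)
    {
      return c.derivation == DERIVATION_NUMERIC;
    }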
Due to a bug in the Visual Studio 2015 runtime, some newlines get lost,
which makes bootstrapping fail (and also makes the MSI installer
non-functional).
This does not have a visible effect on packages we produce so far,
because we do not use VS2015 yet for building them.
This includes fixing all utilities to not have any memory leaks,
as safemalloc warnings stopped tests from passing on MacOSX.
- Ensure that all clients take character-set-dir, as the
  libmysqlclient library will use it.
- mysql-test-run now passes character-set-dir to all external clients.
- Changed dynstr_free() so that it can be called twice (made freeing code
  easier); see the sketch after this list.
- Changed rpl_global_gtid_slave_state to be allocated dynamically, as it
  includes a mutex that needs to be initialized/destroyed before my_end()
  is called.
- Removed rpl_slave_state::init() and rpl_slave_state::deinit(), as
  their jobs are better handled by the constructor and destructor.
- Print the alias instead of table_name in check_duplicate_key(), as
  table_name may have been converted to lower case.
Other things:
- Fixed a case in time_to_datetime_with_warn() where we were
  using && instead of & in tests
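A standalone sketch of the dynstr_free() idea, with DYNAMIC_STRING reduced
to a minimal stand-in for the mysys type:

    #include <cstdlib>

    struct DYNAMIC_STRING { char *str; size_t length, max_length; };

    void dynstr_free(DYNAMIC_STRING *ds)
    {
      free(ds->str);
      ds->str= nullptr;              // a second call frees nullptr: a no-op
      ds->length= ds->max_length= 0;
    }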
The problem was that we used the same condition variable with two different
mutexes. Fixed by changing the code to use COND_rpl_thread_stop instead of
COND_parallel_entry for stopping threads.
Patch by Kristian Nielsen
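A standalone sketch of the underlying pthread rule (the variable names
mirror the commit, the rest is illustrative):

    #include <pthread.h>

    // POSIX requires that concurrent waits on one condition variable all
    // use the same mutex; sharing COND_parallel_entry across two mutexes
    // broke that, hence the dedicated COND_rpl_thread_stop.
    static pthread_mutex_t LOCK_rpl_thread= PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  COND_rpl_thread_stop= PTHREAD_COND_INITIALIZER;
    static bool stop_requested= false;

    void wait_for_stop()
    {
      pthread_mutex_lock(&LOCK_rpl_thread);
      while (!stop_requested)                 // re-check: spurious wakeups
        pthread_cond_wait(&COND_rpl_thread_stop, &LOCK_rpl_thread);
      pthread_mutex_unlock(&LOCK_rpl_thread);
    }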
The RENAME TABLE code tries to update EITS statistics. It hung because
it used an index on (db_name, table_name) to find the table, and attempted
to update these values at the same time. The fix is to do what the SQL
UPDATE statement does when updating an index that is used for scanning:
- First, buffer the rowids of rows to be updated,
- then make the second pass to actually update the rows
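A minimal standalone model of the two-pass scheme; an std::vector stands in
for the statistics table and its index, while the real code drives the
handler interface:

    #include <string>
    #include <vector>

    struct Row { std::string db_name, table_name; };

    void rename_in_stats(std::vector<Row> &stat_table, const std::string &db,
                         const std::string &from, const std::string &to)
    {
      // Pass 1: scan and only buffer the positions ("rowids") of matching
      // rows. Updating the key columns mid-scan would move rows inside the
      // very index being scanned, so rows get skipped or visited twice.
      std::vector<size_t> rowids;
      for (size_t i= 0; i < stat_table.size(); i++)
        if (stat_table[i].db_name == db && stat_table[i].table_name == from)
          rowids.push_back(i);

      // Pass 2: apply the updates by rowid once the scan has finished.
      for (size_t id : rowids)
        stat_table[id].table_name= to;
    }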
Also fixed the call to rename_table_in_stat_tables() in sql_rename.cc
to pass the correct new database (before, it passed the old db_name, so
cross-database renames were not handled correctly).
Variant #2, with review feedback addressed.
rpl/rpl_mdev382 ; Wrong replace in show_binlog_events2.inc
binlog/database ; Different error on Solaris if file exists
mroonga/repair_table_no_index_file ; Different system error on Solaris
partition_not_blackhole ; Different error on Solaris
partition_myisam ; Different error on Solaris
Some other failures in mroonga were because have_32bit.inc didn't correctly
detect 64 bits on Solaris. Fixed by using DEFAULT_MACHINE instead of
MACHINE_TYPE for Sys_version_compile_machine.
Problem is that FLUSH TABLES WITH READ LOCK first blocks threads from
starting new commits, then waits for running commits to complete. But
in-order parallel replication needs commits to happen in a particular
order, so this can easily deadlock.
To fix this problem, this patch introduces a way to temporarily pause
the parallel replication worker threads. Before starting FTWRL, we let
all worker threads complete in-progress transactions, and then
wait. Then we proceed to take the global read lock. Once the lock is
obtained, we unpause the worker threads. Now commits are blocked from
starting by the global read lock, so the deadlock will no longer occur.
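A sketch of the resulting ordering; the three helpers are hypothetical
stand-ins for the actual pause and lock primitives:

    // Hypothetical stand-ins:
    void pause_parallel_workers()   { /* let in-progress transactions
                                         finish, then hold the workers */ }
    void take_global_read_lock()    { /* the FTWRL part */ }
    void unpause_parallel_workers() { /* let the workers continue */ }

    void flush_tables_with_read_lock()
    {
      pause_parallel_workers();     // 1. no replication commit is in flight
      take_global_read_lock();      // 2. so FTWRL cannot block behind an
                                    //    in-order commit that never completes
      unpause_parallel_workers();   // 3. new commits now simply wait on the
                                    //    global read lock, as intended
    }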
Before, Seconds_behind_master was already updated when an event
was queued for a worker thread to execute later. This might lead users
to interpret a low value as the slave being almost up to date with the
master, while in reality there might still be lots of events
queued up, waiting to be applied by the slave.
See https://lists.launchpad.net/maria-developers/msg08958.html for
more detailed discussions.
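A sketch of the change in where the timestamp is taken; the struct is a
stand-in for the relevant rli fields:

    #include <ctime>

    struct Relay_log_info_sketch { time_t last_master_timestamp; };

    // Seconds_Behind_Master is derived from last_master_timestamp, roughly
    // time(0) - last_master_timestamp. Updating it here, after the event
    // has been executed -- rather than when it was queued for a worker --
    // makes the value reflect applied work, not merely fetched work.
    void after_event_executed(Relay_log_info_sketch *rli, time_t event_when)
    {
      rli->last_master_timestamp= event_when;
    }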
Patch backported from MariaDB 10.1
- Ensure that we wait with cleanup() until the slave thread has stopped.
- Added signal_thd_deleted() to signal close_connections() that all THDs
  have been freed.
Other things
- Removed unneeded calls to THD_CHECK_SENTRY() when calling 'delete thd'.
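A standalone sketch of the signalling idea, simplified to a bare counter
plus condition variable (the server side is hooked into THD destruction and
close_connections()):

    #include <pthread.h>

    static pthread_mutex_t LOCK_thread_count= PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  COND_thread_count= PTHREAD_COND_INITIALIZER;
    static unsigned int thread_count;

    void signal_thd_deleted()                    // after each 'delete thd'
    {
      pthread_mutex_lock(&LOCK_thread_count);
      if (--thread_count == 0)
        pthread_cond_signal(&COND_thread_count); // wake close_connections()
      pthread_mutex_unlock(&LOCK_thread_count);
    }

    void wait_until_all_thds_freed()             // close_connections() side
    {
      pthread_mutex_lock(&LOCK_thread_count);
      while (thread_count != 0)
        pthread_cond_wait(&COND_thread_count, &LOCK_thread_count);
      pthread_mutex_unlock(&LOCK_thread_count);
    }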