Commit graph

2400 commits

Author SHA1 Message Date
Vicențiu Ciorbaru
6e55236c0a Merge branch '10.0-galera' into 10.1 2018-06-12 19:39:37 +03:00
Jan Lindström
648cf7176c Merge remote-tracking branch 'origin/5.5-galera' into 10.0-galera 2018-05-07 13:49:14 +03:00
sjaakola
2f0b8f3e02 MDEV-16005 sporadic failures with galera tests MW-328B and MW-328C
These tests can sporadically show mutex deadlock warnings between the LOCK_wsrep_thd
and LOCK_thd_data mutexes. This means that these mutexes can be locked in opposite
order by different threads, which can result in a deadlock.
To fix this issue, the locking policy of these mutexes should be revised and
enforced to be uniform. However, a quick code review shows that the number of
lock/unlock operations for these mutexes combined is between 100 and 200, and all these
mutex invocations would have to be checked and fixed.

On the other hand, it turns out that LOCK_wsrep_thd is used for protecting access to
wsrep variables of THD (wsrep_conflict_state, wsrep_query_state), whereas LOCK_thd_data
protects query, db and mysys_var variables in THD. Extending LOCK_thd_data to protect
also wsrep variables looks like a viable solution, as there should not be a use case
where separate threads need simultaneous access to wsrep variables and THD data variables.

In this commit the LOCK_wsrep_thd mutex is replaced by LOCK_thd_data.
Bluntly replacing LOCK_wsrep_thd with LOCK_thd_data would result in double locking
of LOCK_thd_data, so some adjustments have been made to fix such situations.
2018-04-24 16:57:39 +03:00
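The MDEV-16005 approach above (one mutex guarding both groups of variables, plus adjustments to avoid locking it twice) can be illustrated with a minimal sketch. This is not the MariaDB code: SessionSketch and every function name below are hypothetical, and only the mutex and variable names in the comments come from the commit message.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for THD; the real class is far larger.
struct SessionSketch {
    std::mutex lock_thd_data;      // after the change, guards both fields below
    std::string query;             // already guarded by LOCK_thd_data
    int wsrep_conflict_state = 0;  // guarded by LOCK_wsrep_thd before the change
};

// For callers that do NOT hold lock_thd_data.
void set_conflict_state(SessionSketch &s, int state) {
    std::lock_guard<std::mutex> guard(s.lock_thd_data);
    s.wsrep_conflict_state = state;
}

// For callers that already hold lock_thd_data. Calling set_conflict_state()
// from such a caller would lock the same std::mutex twice, which is the
// "double locking" the commit message mentions.
void set_conflict_state_locked(SessionSketch &s, int state) {
    s.wsrep_conflict_state = state;
}

// Touches both groups of variables under a single lock acquisition, so there
// is no second mutex whose ordering could be inverted by another thread.
void reset_query_and_state(SessionSketch &s) {
    std::lock_guard<std::mutex> guard(s.lock_thd_data);
    s.query.clear();
    set_conflict_state_locked(s, 0);
}
```

With a single mutex there is no lock order left to get wrong; the cost is that every former LOCK_wsrep_thd site has to know whether LOCK_thd_data is already held.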
Sachin Setiya
3cecb1bab3 Merge tag 'mariadb-10.0.33' into bb-10.0-galera 2017-11-03 12:34:05 +05:30
Sergei Golubchik
9d2e2d7533 Merge branch '10.0' into 10.1 2017-10-22 13:03:41 +02:00
Alexey Yurchenko
86d31ce9f1 MW-384 protect access to wsrep_ready variable with mutex 2017-10-19 09:34:09 +03:00
Jan Lindström
8da6b4ef52 Merge tag 'mariadb-5.5.58' into 5.5-galera 2017-10-19 09:06:17 +03:00
Sergei Golubchik
da4503e956 Merge branch '5.5' into 10.0 2017-10-18 15:14:39 +02:00
Sergei Golubchik
df5f25fa7a Merge branch 'mysql/5.5' into 5.5 2017-10-17 10:18:17 +02:00
Venkatesh Duggirala
d75f8a1742 Bug#24763131 LOCAL-INFILE DEFAULT SHOULD BE DISABLED
Problem & Analysis: The slave's Receiver thread, Applier thread and worker
    threads are created with the LOCAL-INFILE option enabled. As the documentation
    at https://dev.mysql.com/doc/refman/5.7/en/load-data-local.html says,
    there are some issues if a thread enables local infile.
    This flag should be enabled with care. But for the above mentioned
    internal threads, the server enables it at the time of creation.

Fix: Further analysis of the code shows that none of these threads really
    needs this flag to be enabled at any time, as the slave never executes
    "LOAD DATA LOCAL INFILE" after reading it from the relay log.
    The Applier thread removes "LOCAL" before it starts executing the query.
2017-08-23 09:16:12 +05:30
Jan Lindström
56b03e308f Merge tag 'mariadb-10.0.32' into 10.0-galera 2017-08-09 08:56:11 +03:00
Sergei Golubchik
8e8d42ddf0 Merge branch '10.0' into 10.1 2017-08-08 10:18:43 +02:00
Monty
19f2b3d02f Fixed compiler warnings 2017-08-07 03:48:58 +03:00
Sergei Golubchik
c784277590 move the error message where it belongs 2017-07-27 12:43:03 +02:00
Vicențiu Ciorbaru
786ad0a158 Merge remote-tracking branch 'origin/5.5' into 10.0 2017-07-25 00:41:54 +03:00
Jan Lindström
a481de30bb Merge tag 'mariadb-5.5.57' into 5.5-galera 2017-07-20 08:56:09 +03:00
Sergei Golubchik
9a5fe1f4ea Merge remote-tracking branch 'mysql/5.5' into 5.5 2017-07-18 14:59:10 +02:00
Sergei Golubchik
9e11e055ce Merge branch '10.0' into 10.1 2017-07-07 11:30:03 +02:00
Andrei Elkin
946a07e8a8 Fix for MDEV-9670 server_id mysteriously set to 0
The problem was that in a circular replication setup the master remembers
the position of events it has generated itself when reading from a slave.
If there are no new events in the queue from the slave, a
Gtid_list_log_event is generated to remember the last skipped event.
The problem happens if there is a network delay and we generate a
Gtid_list_log_event in the middle of the transaction, in which case there
will be an implicit commit and a new transaction with serverid=0 will be
logged.

The fix was to not generate any Gtid_list_log_events in the middle of a
transaction.
2017-07-02 19:47:30 +03:00
Sachin Setiya
92209ac6f6 Merge tag 'mariadb-10.0.31' into 10.0-galera
Signed-off-by: Sachin Setiya <sachin.setiya@mariadb.com>
2017-05-30 15:28:52 +05:30
Marko Mäkelä
13a350ac29 Merge 10.0 into 10.1 2017-05-19 12:29:37 +03:00
Marko Mäkelä
71cd205956 Silence bogus GCC 7 warnings -Wimplicit-fallthrough
Do not silence uncertain cases, or fix any bugs.

The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
2017-05-17 08:27:04 +03:00
Marko Mäkelä
7972da8aa1 Silence bogus GCC 7 warnings -Wimplicit-fallthrough
Do not silence uncertain cases, or fix any bugs.

The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
2017-05-17 08:07:02 +03:00
Sergei Golubchik
71b4503242 MDEV-9998 Fix issues caught by Clang's -Wpointer-bool-conversion warning
remove useless checks
and a couple of others
2017-05-15 22:23:10 +02:00
Marko Mäkelä
8c38147cdd Merge 10.0 into 10.1 2017-04-21 12:46:12 +03:00
Kristian Nielsen
88613e1df6 MDEV-11201: gtid_ignore_duplicates incorrectly ignores statements when GTID replication is not enabled
When master_use_gtid=no, the IO thread loads the slave GTID state from
the master during connect. This races with the SQL thread when
gtid_ignore_duplicates=1. If an event is in the relay log from before
the new connect and has not been applied yet, moving the slave
position causes the SQL thread to think that event should be skipped
due to gtid_ignore_duplicates=1.

This patch simply disables gtid_ignore_duplicates when not using GTID,
which seems to be what one would expect.
2017-04-10 07:53:27 +02:00
Sergei Golubchik
09a2107b1b Merge branch '10.0' into 10.1 2017-03-21 19:20:44 +01:00
Sachin Setiya
9cf499724f Merge branch '10.0' into bb-10.0-galera 2017-03-20 18:11:56 +05:30
Sachin Setiya
f66395f7c0 Merge tag 'mariadb-10.0.30' into bb-sachin-10.0-galera-merge
Signed-off-by: Sachin Setiya <sachin.setiya@mariadb.com>
2017-03-17 02:05:20 +05:30
Monty
2d0c579a86 Wait for slave threads to start during startup
- Before this patch, during startup all slave threads were started without
  any check that they had started properly.
- If one did a START SLAVE, STOP SLAVE or CHANGE MASTER as the first command to the server
  there was a chance that the server could access structures that were not
  properly initialized, which could lead to crashes in
  Log_event::read_log_event
- Fixed by waiting for slave threads to start up properly also during
  server startup, like we do with START SLAVE.
2017-03-16 14:21:33 +02:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Monty
f3c65ce951 Add protection to not access is_open() without LOCK_log mutex
Protection added to reopen_file() and new_file_impl().

Without this we could get an assert in fn_format() as name == 0,
because the file was closed and name reset at the same time as
new_file_impl() was called.
2017-02-28 16:10:47 +01:00
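The race described above (a log being closed while another thread checks is_open() and then uses the name) can be sketched as follows. This is only an illustration under assumptions: LogSketch and its members are hypothetical, and lock_ merely plays the role that LOCK_log has in the commit message.

```cpp
#include <mutex>
#include <string>

// Hypothetical log object; lock_ stands in for LOCK_log.
class LogSketch {
public:
    void open(const std::string &name) {
        std::lock_guard<std::mutex> guard(lock_);
        name_ = name;
        open_ = true;
    }

    void close() {
        std::lock_guard<std::mutex> guard(lock_);
        name_.clear();  // the name is reset on close, as in the commit message
        open_ = false;
    }

    // The open check and the use of name_ happen under one lock acquisition,
    // so close() cannot run between them and leave an empty name behind.
    bool rotate(const std::string &suffix) {
        std::lock_guard<std::mutex> guard(lock_);
        if (!open_)
            return false;
        name_ += suffix;
        return true;
    }

    std::string name() {
        std::lock_guard<std::mutex> guard(lock_);
        return name_;
    }

private:
    std::mutex lock_;
    std::string name_;
    bool open_ = false;
};
```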
Monty
b624b41abb Don't allow one to kill START SLAVE while the slave's IO_THREAD or SQL_THREAD
is starting.

This is needed because if we kill the START SLAVE thread too early during
shutdown, then the IO_THREAD or SQL_THREAD will not have time to properly
initialize its replication or THD structures and clean_up() will try to
delete master_info structures that are still in use.
2017-02-28 16:10:47 +01:00
Monty
4bad74e139 Added error checking for all calls to flush_relay_log_info() and stmt_done() 2017-02-28 16:10:47 +01:00
Monty
c5e25c8b40 Added a separate lock for start/stop/reset slave.
This solves some possible deadlocks when one calls stop slave while the slave
is starting.
2017-02-28 16:10:46 +01:00
Monty
e65f667bb6 MDEV-9573 'Stop slave' hangs on replication slave
The reason for this is that stop slave takes LOCK_active_mi over the
whole operation while some slave operations will also need LOCK_active_mi
which causes deadlocks.

Fixed by introducing object counting for Master_info and not taking
LOCK_active_mi over stop slave or even stop_all_slaves()

Another benefit of this approach is that it allows:
- Multiple threads can run SHOW SLAVE STATUS at the same time
- START/STOP/RESET/SLAVE STATUS on a slave will not block other slaves
- Simpler interface for handling get_master_info()
- Added some missing unlocks of 'log_lock' in error conditions
- Moved rpl_parallel_inactivate_pool(&global_rpl_thread_pool) to end
  of stop_slave() to not have to use LOCK_active_mi inside
  terminate_slave_threads()
- Changed argument for remove_master_info() to Master_info, as we always
  have this available
- Fixed core dump when doing FLUSH TABLES WITH READ LOCK and parallel
  replication. Problem was that waiting for pause_for_ftwrl was not done
  when deleting rpt->current_owner after a force_abort.
2017-02-28 16:10:46 +01:00
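The "object counting" idea in MDEV-9573 above can be sketched roughly as follows: the global lock is held only long enough to find and pin an object, and the slow work then runs without it. All names here (MasterInfoSketch, RegistrySketch, stop_slave_sketch) are hypothetical; the real Master_info code is considerably more involved.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for Master_info.
struct MasterInfoSketch {
    int users = 0;         // threads currently using the object (guarded by the registry lock)
    bool stopped = false;  // real code would guard per-object state with its own locks
};

class RegistrySketch {
public:
    void add(const std::string &name) {
        std::lock_guard<std::mutex> guard(lock_);
        items_[name];  // default-constructs an entry if it is missing
    }

    // Pin the object under a briefly held lock; nullptr if it does not exist.
    MasterInfoSketch *acquire(const std::string &name) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = items_.find(name);
        if (it == items_.end())
            return nullptr;
        ++it->second.users;
        return &it->second;
    }

    void release(MasterInfoSketch *mi) {
        std::lock_guard<std::mutex> guard(lock_);
        assert(mi->users > 0);
        --mi->users;
    }

    // Removal is refused while any thread still has the object pinned.
    bool remove(const std::string &name) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = items_.find(name);
        if (it == items_.end() || it->second.users > 0)
            return false;
        items_.erase(it);
        return true;
    }

private:
    std::mutex lock_;  // plays the role of LOCK_active_mi
    std::map<std::string, MasterInfoSketch> items_;
};

// The long-running part runs without the registry lock, so operations on
// other entries are not blocked while it is in progress.
bool stop_slave_sketch(RegistrySketch &reg, const std::string &name) {
    MasterInfoSketch *mi = reg.acquire(name);
    if (!mi)
        return false;
    mi->stopped = true;  // stands in for the slow stop operation
    reg.release(mi);
    return true;
}
```

A std::map is used in the sketch because pointers to its elements stay valid when other entries are inserted or erased.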
Sujatha Sivakumar
e619295e1b Bug#24901077: RESET SLAVE ALL DOES NOT ALWAYS RESET SLAVE
Description:
============
If you have a relay log index file that has ended up with
some relay log files that do not exist, then RESET SLAVE
ALL is not enough to get back to a clean state.

Analysis:
=========
In the bug scenario slave server is in stopped state and
some of the relay logs got deleted but the relay log index
file is not updated.

During slave server restart replication initialization fails
as some of the required relay logs are missing. User
executes RESET SLAVE/RESET SLAVE ALL command to start a
clean slave. As per the documentation RESET SLAVE command
clears the master info and relay log info repositories,
deletes all the relay log files, and starts a new relay log
file. But in a scenario where the slave server's
Relay_log_info object is not initialized slave will not
purge the existing relay logs. Hence the index file still
remains in a bad state. Users will not be able to start
the slave unless these files are cleared.

Fix:
===
RESET SLAVE/RESET SLAVE ALL commands should do the cleanup
even in a scenario where Relay_log_info object
initialization failed.

Backported a flag named 'error_on_rli_init_info' which is
required to identify slave's Relay_log_info object
initialization failure. This flag exists in MySQL-5.6
onwards as part of BUG#14021292 fix.

During RESET SLAVE/RESET SLAVE ALL execution this flag
indicates the Relay_log_info initialization failure.
In such a case open the relay log index/relay log files
and do the required clean up.
2017-02-28 10:00:51 +05:30
Nirbhay Choubey
ee8b5c305a Merge tag 'mariadb-10.0.29' into 10.0-galera 2017-01-13 13:53:59 -05:00
Marko Mäkelä
5044dae239 Merge 10.0 into 10.1 2017-01-10 14:30:11 +02:00
Kristian Nielsen
43378f367c MDEV-10271: Stopped SQL slave thread doesn't print a message to error log like IO thread does
Make the slave SQL thread always output to the error log the message "Slave
SQL thread exiting, replication stopped in ..." whenever it previously
outputted "Slave SQL thread initialized, starting replication ...".

Before this patch, it was somewhat inconsistent in which cases the message
would be output and in which not, depending on the exact time and cause of
the condition that caused the SQL thread to stop.
2017-01-06 10:46:20 +01:00
Sergei Golubchik
2f20d297f8 Merge branch '10.0' into 10.1 2016-12-11 09:53:42 +01:00
Kristian Nielsen
390f2a013b Fix incorrect reading of events from relaylog in parallel replication.
The SQL thread keeps track of the position in the current relay log from
which to read the next event. This position is not normally used, but a
certain interaction with the IO thread can cause the SQL thread to re-open
the relay log and seek to the stored position.

In parallel replication, there were a couple of places where the position
was not updated. This created a race where a re-open of the relay log could
seek to the wrong position and start re-reading and processing events
already handled once, causing various kinds of problems.

Fix this by moving the position update into a single place in
apply_event_and_update_pos(), which should ensure that the position is
always updated in the parallel replication case.

This problem was found from the testcase of MDEV-10863, but it is logically
a separate problem.
2016-11-16 11:00:38 +01:00
Kristian Nielsen
f1fcc1fc10 Back-port Master_info::using_parallel() to 10.0.
This has no functional changes, but it helps avoid merge problems from 10.0
to 10.1. In 10.0, code that checks for parallel replication uses
opt_slave_parallel_threads > 0, but this check needs to be
mi->using_parallel() in 10.1. By using the same check in 10.0 (with
unchanged semantics), merge problems to 10.1 are avoided.
2016-11-15 23:00:11 +01:00
Kristian Nielsen
bccd0b5e0e Merge branch 'mdev10863' into 10.1 2016-11-15 13:10:21 +01:00
Kristian Nielsen
717f212840 MDEV-10863: parallel replication tries to continue from wrong position
This occurred when the SQL thread (but not the IO thread) stops while
GTID and parallel replication are used with multiple domain ids in the
GTID position, and is restarted.

In this case, the SQL thread needs to start some way back in the relay log,
applying or skipping events within each replication domain as
appropriate.

The SQL thread starts at the beginning of an old relay log file, and
this position may be in the middle of an event group. The bug was that
such a partial event group could be re-applied, causing replication
corruption.

This patch fixes the issue, by making sure to skip any initial events
that were part of an earlier (already applied) event group.
2016-11-04 12:33:42 +01:00
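The skipping rule from MDEV-10863 above can be illustrated with a small model. It assumes, for the purpose of the sketch only, that every event group begins with an explicit start marker; EventKind, EventSketch and first_applicable are hypothetical names, not MariaDB identifiers.

```cpp
#include <cstddef>
#include <vector>

// Simplified event model: each group is GroupStart, zero or more Row, GroupEnd.
enum class EventKind { GroupStart, Row, GroupEnd };

struct EventSketch {
    EventKind kind;
    int id;
};

// Returns the index of the first event that may be applied when reading
// starts at the beginning of a relay log file. Leading events that belong to
// a group begun in an earlier file are skipped, because that group is
// assumed to have been applied already.
std::size_t first_applicable(const std::vector<EventSketch> &log) {
    std::size_t i = 0;
    while (i < log.size() && log[i].kind != EventKind::GroupStart)
        ++i;
    return i;  // equals log.size() if the file holds only a group tail
}
```

For a file that begins with the tail of a group (a Row and a GroupEnd) followed by a complete group, the function returns 2, so the partial group is never re-applied.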
Sergei Golubchik
a98c85bb50 Merge branch '10.0-galera' into 10.1 2016-11-02 13:44:07 +01:00
Nirbhay Choubey
5db2195a35 Merge tag 'mariadb-10.0.28' into 10.0-galera 2016-10-28 15:50:13 -04:00
Sergei Golubchik
22490a0d70 MDEV-8345 STOP SLAVE should not cause an ERROR to be logged to the error log
cherry-pick from 5.7:
  commit 6b24763
  Author: Manish Kumar <manish.4.kumar@oracle.com>
  Date:   Tue Mar 27 13:10:42 2012 +0530

  BUG#12977988 - ON STOP SLAVE: ERROR READING PACKET FROM SERVER: LOST CONNECTION
                 TO MYSQL SERVER
  BUG#11761457 - ERROR 2013 + "ERROR READING RELAY LOG EVENT" ON STOP SLAVE
2016-10-26 18:44:34 +02:00
Kristian Nielsen
50f19ca809 Remove unnecessary global mutex in parallel replication.
The function apply_event_and_update_pos() is called with the
rli->data_lock mutex held. However, there seems to be nothing in the
function actually needing the mutex to be held. Certainly not in the
parallel replication case, where sql_slave_skip_counter is always 0
since the non-zero case is handled by the SQL driver thread.

So this patch makes parallel replication use a variant of
apply_event_and_update_pos() without the need to take the
rli->data_lock mutex. This avoids one contended global mutex for each
event executed, which might improve performance on CPU-bound workloads
somewhat.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2016-10-14 22:44:40 +02:00
Kristian Nielsen
7e0c9de864 Parallel replication async deadlock kill
When a deadlock kill is detected inside the storage engine, the kill
is not done immediately, to avoid calling back into the storage engine
kill_query method with various lock subsystem mutexes held. Instead the
kill is queued and done later by a slave background thread.

This patch is in preparation for fixing TokuDB optimistic parallel
replication, as well as for removing locking hacks in InnoDB/XtraDB in
10.2.

Signed-off-by: Kristian Nielsen <knielsen at knielsen-hq.org>
2016-09-08 15:25:40 +02:00
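The deferred kill described in the commit above follows a common pattern: the detecting code only records a request, and a background thread carries it out later, outside the detector's locks. A minimal sketch, with the hypothetical names KillQueueSketch, request_kill and drain (none of them from the MariaDB source):

```cpp
#include <deque>
#include <mutex>
#include <vector>

class KillQueueSketch {
public:
    // Called where the deadlock is detected. It only records the request, so
    // nothing calls back into the storage engine while its locks are held.
    void request_kill(int thread_id) {
        std::lock_guard<std::mutex> guard(lock_);
        pending_.push_back(thread_id);
    }

    // Called later by a background thread. The queue lock is released before
    // any request is processed; the ids are returned in request order.
    std::vector<int> drain() {
        std::deque<int> batch;
        {
            std::lock_guard<std::mutex> guard(lock_);
            batch.swap(pending_);
        }
        std::vector<int> killed;
        for (int id : batch)
            killed.push_back(id);  // real code would perform the kill here
        return killed;
    }

private:
    std::mutex lock_;
    std::deque<int> pending_;
};
```

Swapping the queue out under the lock keeps the critical section short, so the detecting side is never blocked behind a slow kill.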