mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 12:02:42 +01:00

Author	SHA1	Message	Date
Oleksandr Byelkin	edd0b03e60	Merge branch '10.3' into 10.4	2023-05-02 10:09:27 +02:00
Andrei	55a53949be	MDEV-29621: Replica stopped by locks on sequence When using binlog_row_image=FULL with sequence table inserts, a replica can deadlock because it treats full inserts in a sequence as DDL statements by getting an exclusive lock on the sequence table. It has been observed that with parallel replication, this exclusive lock on the sequence table can lead to a deadlock where one transaction has the exclusive lock and is waiting on a prior transaction to commit, whereas this prior transaction is waiting on the MDL lock. This fix for this is on the master side, to raise FL_DDL flag on the GTID of a full binlog_row_image write of a sequence table. This forces the slave to execute the statement serially so a deadlock cannot happen. A test verifies the deadlock also to prove it happen on the OLD (pre-fixes) slave. OLD (buggy master) -replication-> NEW (fixed slave) is provided. As the pre-fixes master's full row-image may represent both SELECT NEXT VALUE and INSERT, the parallel slave pessimistically waits for the prior transaction to have committed before to take on the critical part of the second (like INSERT in the test) event execution. The waiting exploits a parallel slave's retry mechanism which is controlled by `@@global.slave_transaction_retries`. Note that in order to avoid any persistent 'Deadlock found' 2013 error in OLD -> NEW, `slave_transaction_retries` may need to be set to a higher than the default value. START-SLAVE is an effective work-around if this still happens.	2023-04-27 21:55:45 +03:00
Oleksandr Byelkin	a977054ee0	Merge branch '10.3' into 10.4	2023-01-28 18:22:55 +01:00
Brandon Nesterenko	d69e835787	MDEV-29639: Seconds_Behind_Master is incorrect for Delayed, Parallel Replicas Problem ======== On a parallel, delayed replica, Seconds_Behind_Master will not be calculated until after MASTER_DELAY seconds have passed and the event has finished executing, resulting in potentially very large values of Seconds_Behind_Master (which could be much larger than the MASTER_DELAY parameter) for the entire duration the event is delayed. This contradicts the documented MASTER_DELAY behavior, which specifies how many seconds to withhold replicated events from execution. Solution ======== After a parallel replica idles, the first event after idling should immediately update last_master_timestamp with the time that it began execution on the primary. Reviewed By =========== Andrei Elkin <andrei.elkin@mariadb.com>	2023-01-24 08:11:35 -07:00
Oleksandr Byelkin	c07325f932	Merge branch '10.3' into 10.4	2019-05-19 20:55:37 +02:00
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	cb248f8806	Merge branch '5.5' into 10.1	2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru	5543b75550	Update FSF Address * Update wrong zip-code	2019-05-11 21:29:06 +03:00
Sergey Vojtovich	88d89ee0ba	Less abort_loop references Removed redundant initialisation in unireg_init(): already done by mysql_init_variables(). Slave threads already check THD::killed, which eliminates the need to check abort_loop. Removed unused wsrep_kill_mysql().	2019-03-09 20:22:24 +04:00
Kristian Nielsen	34f11b06e6	Move deletion of old GTID rows to slave background thread This patch changes how old rows in mysql.gtid_slave_pos* tables are deleted. Instead of doing it as part of every replicated transaction in record_gtid(), it is done periodically (every @@gtid_cleanup_batch_size transaction) in the slave background thread. This removes the deletion step from the replication process in SQL or worker threads, which could speed up replication with many small transactions. It also decreases contention on the global mutex LOCK_slave_state. And it simplifies the logic, eg. when a replicated transaction fails after having deleted old rows. With this patch, the deletion of old GTID rows happens asynchroneously and slightly non-deterministic. Thus the number of old rows in mysql.gtid_slave_pos can temporarily exceed @@gtid_cleanup_batch_size. But all old rows will be deleted eventually after sufficiently many new GTIDs have been replicated.	2018-12-07 07:10:40 +01:00
Marko Mäkelä	145ae15a33	Merge bb-10.2-ext into 10.3	2018-01-04 09:22:59 +02:00
Monty	fbab79c9b8	Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext Conflicts: cmake/make_dist.cmake.in mysql-test/r/func_json.result mysql-test/r/ps.result mysql-test/t/func_json.test mysql-test/t/ps.test sql/item_cmpfunc.h	2018-01-01 19:39:59 +02:00
Monty	6e7ca6b0b2	MDEV-13719 Assertion `bit < (map)->n_bits' failed in bitmap_is_set Problem was that MAX_SLAVE_ERROR didn't cover all possible errors.	2017-12-23 14:53:12 +02:00
Monty	b016e1ba7f	MDEV-7702 Spiral patch 004_mariadb-10.0.15.slave-trx-retry.diff This is about adding more options to force slave retries Two new variables has been added: slave_transaction_retry_errors - Tells the slave thread to retry transaction for replication when a query event returns an error from the provided list. Deadlock and elapsed lock wait timeout errors are automatically added to this list slave-transaction-retry-interval - Interval of the slave SQL thread will retry a transaction in case it failed with a deadlock or elapsed lock wait timeout or listed in slave_transaction_retry_errors Other changes: - Simplifed code for slave_skip_errors (to be aligned with slave_transaction_retry_errors) - Renamed print_slave_skip_errors() to make_slave_skip_errors_printable() - Remove printing error from init_slave_skip_errors as my_bitmap_init() will do that if needed. - Generalize has_temporary_error()	2017-12-03 13:58:35 +02:00
Kristian Nielsen	1d91910b94	MDEV-12179: Per-engine mysql.gtid_slave_pos table Merge into MariaDB 10.3.	2017-07-03 09:33:41 +02:00
Monty	5a759d31f7	Changing field::field_name and Item::name to LEX_CSTRING Benefits of this patch: - Removed a lot of calls to strlen(), especially for field_string - Strings generated by parser are now const strings, less chance of accidently changing a string - Removed a lot of calls with LEX_STRING as parameter (changed to pointer) - More uniform code - Item::name_length was not kept up to date. Now fixed - Several bugs found and fixed (Access to null pointers, access of freed memory, wrong arguments to printf like functions) - Removed a lot of casts from (const char) to (char) Changes: - This caused some ABI changes - lex_string_set now uses LEX_CSTRING - Some fucntions are now taking const char* instead of char* - Create_field::change and after changed to LEX_CSTRING - handler::connect_string, comment and engine_name() changed to LEX_CSTRING - Checked printf() related calls to find bugs. Found and fixed several errors in old code. - A lot of changes from LEX_STRING to LEX_CSTRING, especially related to parsing and events. - Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING* - Some changes for char* to const char* - Added printf argument checking for my_snprintf() - Introduced null_clex_str, star_clex_string, temp_lex_str to simplify code - Added item_empty_name and item_used_name to be able to distingush between items that was given an empty name and items that was not given a name This is used in sql_yacc.yy to know when to give an item a name. - select table_name."' is not anymore same as table_name. - removed not used function Item::rename() - Added comparision of item->name_length before some calls to my_strcasecmp() to speed up comparison - Moved Item_sp_variable::make_field() from item.h to item.cc - Some minimal code changes to avoid copying to const char * - Fixed wrong error message in wsrep_mysql_parse() - Fixed wrong code in find_field_in_natural_join() where real_item() was set when it shouldn't - ER_ERROR_ON_RENAME was used with extra arguments. - Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already give the error. TODO: - Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c - Change code to not modify LEX_CSTRING for database name (as part of lower_case_table_names)	2017-04-23 22:35:46 +03:00
Kristian Nielsen	fdf2d40770	MDEV-12179: Per-engine mysql.gtid_slave_pos table Intermediate commit. Implement auto-creation of mysql.gtid_slave_pos* tables with needed engines, if listed in --gtid-pos-auto-engines. Uses an asynchronous approach to minimise locking overhead. The list of available tables is extended with a flag. Extra entries are added for --gtid-pos-auto-engines tables that do not exist yet, marked as not existing but ready for auto-creation. If record_gtid() needs a table marked for auto-creation, it sends a request to the slave background thread to create the table, and continues to use an existing table for the current and immediately coming transactions. As soon as the slave background thread has made the new table available, it will be used for all subsequent relevant transactions in record_gtid(). This asynchronous approach also avoids a lot of complex issues around trying to do DDL in the middle of an on-going transaction.	2017-04-21 10:30:16 +02:00
Sergei Golubchik	da4d71d10d	Merge branch '10.1' into 10.2	2017-03-30 12:48:42 +02:00
Marko Mäkelä	adc91387e3	Merge 10.0 into 10.1	2017-03-03 13:27:12 +02:00
Monty	c5e25c8b40	Added a separate lock for start/stop/reset slave. This solves some possible dead locks when one calls stop slave while slave is starting.	2017-02-28 16:10:46 +01:00
Monty	e65f667bb6	MDEV-9573 'Stop slave' hangs on replication slave The reason for this is that stop slave takes LOCK_active_mi over the whole operation while some slave operations will also need LOCK_active_mi which causes deadlocks. Fixed by introducing object counting for Master_info and not taking LOCK_active_mi over stop slave or even stop_all_slaves() Another benefit of this approach is that it allows: - Multiple threads can run SHOW SLAVE STATUS at the same time - START/STOP/RESET/SLAVE STATUS on a slave will not block other slaves - Simpler interface for handling get_master_info() - Added some missing unlock of 'log_lock' in error condtions - Moved rpl_parallel_inactivate_pool(&global_rpl_thread_pool) to end of stop_slave() to not have to use LOCK_active_mi inside terminate_slave_threads() - Changed argument for remove_master_info() to Master_info, as we always have this available - Fixed core dump when doing FLUSH TABLES WITH READ LOCK and parallel replication. Problem was that waiting for pause_for_ftwrl was not done when deleting rpt->current_owner after a force_abort.	2017-02-28 16:10:46 +01:00
Nirbhay Choubey	3435e8a515	MDEV-7635: Part 1 innodb_autoinc_lock_mode = 2 innodb_buffer_pool_dump_at_shutdown = ON innodb_buffer_pool_dump_pct = 25 innodb_buffer_pool_load_at_startup = ON innodb_checksum_algorithm = CRC32 innodb_file_format = Barracuda innodb_large_prefix = ON innodb_log_compressed_pages = ON innodb_purge_threads = 4 innodb_strict_mode = ON binlog_annotate_row_events = ON binlog_format = MIXED binlog-row-event-max-size = 8192 group_concat_max_len = 1M lock_wait_timeout = 86400 log_slow_admin_statements = ON log_slow_slave_statements = ON log_warnings = 2 max_allowed_packet = 16M replicate_annotate_row_events = ON slave_net_timeout = 60 sync_binlog = 1 aria_recover = BACKUP,QUICK myisam_recover_options = BACKUP,QUICK	2017-02-10 06:30:42 -05:00
vinchen	8eb0f5ca1a	Control the Maximum speed(KB/s) to read binlog from master	2016-10-19 13:51:08 +02:00
Kristian Nielsen	3011060b2a	MDEV-7145: Delayed slave. Extend to work also for parallel replication. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2016-10-14 23:15:59 +02:00
Kristian Nielsen	b2bc6dadee	MDEV-7145: Delayed replication, cleanup some code The original MySQL patch left some refactoring todo's, possibly because of known conflicts with other parallel development (like info-repository feature perhaps). This patch fixes those todos/refactorings. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2016-10-14 23:15:59 +02:00
Kristian Nielsen	19abe79fd1	MDEV-7145: Delayed replication, intermediate commit. Initial merge of delayed replication from MySQL git. The code from the initial push into MySQL is merged, and the associated test case passes. A number of tasks are still pending: 1. Check full test suite run for any regressions or .result file updates. 2. Extend the feature to also work for parallel replication. 3. There are some todo-comments about future refactoring left from MySQL, these should be located and merged on top. 4. There are some later related MySQL commits, these should be checked and merged. These include: e134b9362ba0b750d6ac1b444780019622d14aa5 b38f0f7857c073edfcc0a64675b7f7ede04be00f fd2b210383358fe7697f201e19ac9779879ba72a afc397376ec50e96b2918ee64e48baf4dda0d37d 5. The testcase from MySQL relies heavily on sleep and timing for testing, and seems likely to sporadically fail on heavily loaded test servers in buildbot or distro build farms. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2016-10-14 23:15:58 +02:00
Kristian Nielsen	50f19ca809	Remove unnecessary global mutex in parallel replication. The function apply_event_and_update_pos() is called with the rli->data_lock mutex held. However, there seems to be nothing in the function actually needing the mutex to be held. Certainly not in the parallel replication case, where sql_slave_skip_counter is always 0 since the non-zero case is handled by the SQL driver thread. So this patch makes parallel replication use a variant of apply_event_and_update_pos() without the need to take the rli->data_lock mutex. This avoids one contended global mutex for each event executed, which might improve performance on CPU-bound workloads somewhat. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2016-10-14 22:44:40 +02:00
Kristian Nielsen	7e0c9de864	Parallel replication async deadlock kill When a deadlock kill is detected inside the storage engine, the kill is not done immediately, to avoid calling back into the storage engine kill_query method with various lock subsystem mutexes held. Instead the kill is queued and done later by a slave background thread. This patch in preparation for fixing TokuDB optimistic parallel replication, as well as for removing locking hacks in InnoDB/XtraDB in 10.2. Signed-off-by: Kristian Nielsen <knielsen at knielsen-hq.org>	2016-09-08 15:25:40 +02:00
Sergei Golubchik	3361aee591	Merge branch '10.0' into 10.1	2016-06-28 22:01:55 +02:00
Sergei Golubchik	c081c978a2	Merge branch '5.5' into bb-10.0	2016-06-21 14:11:02 +02:00
Sergei Golubchik	ae29ea2d86	Merge branch 'mysql/5.5' into 5.5	2016-06-14 13:55:28 +02:00
Sujatha Sivakumar	ef3f09f0c9	Bug#23251517: SEMISYNC REPLICATION HANGING Revert following bug fix: Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS FULL Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO THREAD WAITS FOR DISK SPACE This fix results in a deadlock between slave IO thread and SQL thread. (cherry picked from commit e3fea6c6dbb36c6ab21c4ab777224560e9608b53)	2016-05-16 11:34:20 +02:00
Sujatha Sivakumar	8361151765	Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS FULL Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO THREAD WAITS FOR DISK SPACE Problem: ======== Currently SHOW SLAVE STATUS blocks if IO thread waits for disk space. This makes automation tools verifying server health block on taking relevant action. Finally this will create SHOW SLAVE STATUS piles. Analysis: ========= SHOW SLAVE STATUS hangs on mi->data_lock if relay log write is waiting for free disk space while holding mi->data_lock. mi->data_lock is needed to protect the format description event (mi->format_description_event) which is accessed by the clients running FLUSH LOGS and slave IO thread. Note relay log writes don't need to be protected by mi->data_lock, LOCK_log is used to protect relay log between IO and SQL thread (see MYSQL_BIN_LOG::append_event). The code takes mi->data_lock to protect mi->format_description_event during relay log rotate which might get triggered right after relay log write. Fix: ==== Release the data_lock just for the duration of writing into relay log. Made change to ensure the following lock order is maintained to avoid deadlocks. data_lock, LOCK_log data_lock is held during relay log rotations to protect the description event.	2016-03-01 12:29:51 +05:30
Alexey Botchkov	efb36ac5d5	MDEV-5273 Prepared statement doesn't return metadata after prepare. SHOW MASTER STATUS fixed.	2016-01-28 11:12:03 +04:00
Alexey Botchkov	75a1d866dd	MDEV-5273 Prepared statement doesn't return metadata after prepare. SHOW SLAVE STATUS fixed.	2016-01-28 11:12:03 +04:00
Sergei Golubchik	f4faac4d6a	Merge branch '10.0' into 10.1	2016-01-25 22:58:57 +01:00
Monty	8fcc0bfefa	Fixed bug in semi_sync replication tests. The problem was that wait_for_slave_io_to_start reported that the io thread was ready, when it was still initializing. This caused test suite to continue too early, for example before the semi sync plugin was properly enabled. Fixed by introducing a new internal stage: "Preparing". Slave_IO_Running is now set to "Yes" only when all initializing is done and the IO thread is ready to read things from the master. The only test affected by this change is rpl_flsh_tbls, which got stuck in the preparing phase while trying to read the GTID position from a table. Fixed by having this test waiting for Preparing instead of Yes.	2016-01-03 13:27:59 +02:00
Monty	872a953b22	MDEV-8469 Add RESET MASTER TO x to allow specification of binlog file nr Other things: - Avoid calling init_and_set_log_file_name() when opening binary log. - Remove newlines early when reading from index file. - Ensure that reset_logs() will work even if thd is 0 (Can happen on startup) - Added thd to sart_slave_threads() for better error handling.	2015-07-16 10:36:58 +03:00
Nirbhay Choubey	f965cae5fb	MDEV-7110 : Add missing MySQL variable log_bin_basename and log_bin_index Add log_bin_index, log_bin_basename and relay_log_basename system variables. Also, convert relay_log_index system variable to NO_CMD_LINE and implement --relay-log-index as a command line option.	2015-06-09 13:38:29 -04:00
Daniel Black	eac71ced18	Add Slave_skipped_errors to global status This counts the number of times a replication event is ignored due to slave_skip_errors.	2015-03-12 05:23:05 +11:00
Kristian Nielsen	8a3e2f29bb	MDEV-6718: Server crashed in Gtid_log_event::Gtid_log_event with parallel replication The bug occured in parallel replication when re-trying transactions that failed due to deadlock. In this case, the relay log file is re-opened and the events are read out again. This reading requires a format description event of the appropriate version. But the code was using a description event stored in rli, which is not thread-safe. This could lead to various rare races if the format description event was replaced by the SQL driver thread at the exact moment where a worker thread was trying to use it. The fix is to instead make the retry code create and maintain its own format description event. When the relay log file is opened, we first read the format description event from the start of the file, before seeking to the current position. This now uses the same code as when the SQL driver threads starts from a given relay log position. This also makes sure that the correct format description event version will be used in cases where the version of the binlog could change during replication.	2014-11-13 10:09:46 +01:00
Sergei Golubchik	06d6552192	5.5 merge	2014-09-23 23:55:29 +02:00
Sergei Golubchik	34f3aa9915	remove unused (obsolete) declarations from slave.h	2014-09-19 09:21:51 +02:00
Kristian Nielsen	98fc5b3af8	MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail. After-review changes. For this patch in 10.0, we do not introduce a new public storage engine API, we just fix the InnoDB/XtraDB issues. In 10.1, we will make a better public API that can be used for all storage engines (MDEV-6429). Eliminate the background thread that did deadlock kills asynchroneously. Instead, we ensure that the InnoDB/XtraDB code can handle doing the kill from inside the deadlock detection code (when thd_report_wait_for() needs to kill a later thread to resolve a deadlock). (We preserve the part of the original patch that introduces dedicated mutex and condition for the slave init thread, to remove the abuse of LOCK_thread_count for start/stop synchronisation of the slave init thread).	2014-07-08 12:54:47 +02:00
unknown	bd4153a8c2	MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail. Remove the temporary fix for MDEV-5914, which used READ COMMITTED for parallel replication worker threads. Replace it with a better, more selective solution. The issue is with certain edge cases of InnoDB gap locks, for example between INSERT and ranged DELETE. It is possible for the gap lock set by the DELETE to block the INSERT, if the DELETE runs first, while the record lock set by INSERT does not block the DELETE, if the INSERT runs first. This can cause a conflict between the two in parallel replication on the slave even though they ran without conflicts on the master. With this patch, InnoDB will ask the server layer about the two involved transactions before blocking on a gap lock. If the server layer tells InnoDB that the transactions are already fixed wrt. commit order, as they are in parallel replication, InnoDB will ignore the gap lock and allow the two transactions to proceed in parallel, avoiding the conflict. Improve the fix for MDEV-6020. When InnoDB itself detects a deadlock, it now asks the server layer for any preferences about which transaction to roll back. In case of parallel replication with two transactions T1 and T2 fixed to commit T1 before T2, the server layer will ask InnoDB to roll back T2 as the deadlock victim, not T1. This helps in some cases to avoid excessive deadlock rollback, as T2 will in any case need to wait for T1 to complete before it can itself commit. Also some misc. fixes found during development and testing: - Remove thd_rpl_is_parallel(), it is not used or needed. - Use KILL_CONNECTION instead of KILL_QUERY when a parallel replication worker thread is killed to resolve a deadlock with fixed commit ordering. There are some cases, eg. in sql/sql_parse.cc, where a KILL_QUERY can be ignored if the query otherwise completed successfully, and this could cause the deadlock kill to be lost, so that the deadlock was not correctly resolved. - Fix random test failure due to missing wait_for_binlog_checkpoint.inc. - Make sure that deadlock or other temporary errors during parallel replication are not printed to the the error log; there were some places around the replication code with extra error logging. These conditions can occur occasionally and are handled automatically without breaking replication, so they should not pollute the error log. - Fix handling of rgi->gtid_sub_id. We need to be able to access this also at the end of a transaction, to be able to detect and resolve deadlocks due to commit ordering. But this value was also used as a flag to mark whether record_gtid() had been called, by being set to zero, losing the value. Now, introduce a separate flag rgi->gtid_pending, so rgi->gtid_sub_id remains valid for the entire duration of the transaction. - Fix one place where the code to handle ignored errors called reset_killed() unconditionally, even if no error was caught that should be ignored. This could cause loss of a deadlock kill signal, breaking deadlock detection and resolution. - Fix a couple of missing mysql_reset_thd_for_next_command(). This could cause a prior error condition to remain for the next event executed, causing assertions about errors already being set and possibly giving incorrect error handling for following event executions. - Fix code that cleared thd->rgi_slave in the parallel replication worker threads after each event execution; this caused the deadlock detection and handling code to not be able to correctly process the associated transactions as belonging to replication worker threads. - Remove useless error code in slave_background_kill_request(). - Fix bug where wfc->wakeup_error was not cleared at wait_for_commit::unregister_wait_for_prior_commit(). This could cause the error condition to wrongly propagate to a later wait_for_prior_commit(), causing spurious ER_PRIOR_COMMIT_FAILED errors. - Do not put the binlog background thread into the processlist. It causes too many result differences in mtr, but also it probably is not useful for users to pollute the process list with a system thread that does not really perform any user-visible tasks...	2014-06-10 10:13:15 +02:00
unknown	629b822913	MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail. In parallel replication, we run transactions from the master in parallel, but force them to commit in the same order they did on the master. If we force T1 to commit before T2, but T2 holds eg. a row lock that is needed by T1, we get a deadlock when T2 waits until T1 has committed. Usually, we do not run T1 and T2 in parallel if there is a chance that they can have conflicting locks like this, but there are certain edge cases where it can occasionally happen (eg. MDEV-5914, MDEV-5941, MDEV-6020). The bug was that this would cause replication to hang, eventually getting a lock timeout and causing the slave to stop with error. With this patch, InnoDB will report back to the upper layer whenever a transactions T1 is about to do a lock wait on T2. If T1 and T2 are parallel replication transactions, and T2 needs to commit later than T1, we can thus detect the deadlock; we then kill T2, setting a flag that causes it to catch the kill and convert it to a deadlock error; this error will then cause T2 to roll back and release its locks (so that T1 can commit), and later T2 will be re-tried and eventually also committed. The kill happens asynchroneously in a slave background thread; this is necessary, as the reporting from InnoDB about lock waits happen deep inside the locking code, at a point where it is not possible to directly call THD::awake() due to mutexes held. Deadlock is assumed to be (very) rarely occuring, so this patch tries to minimise the performance impact on the normal case where no deadlocks occur, rather than optimise the handling of the occasional deadlock. Also fix transaction retry due to deadlock when it happens after a transaction already signalled to later transactions that it started to commit. In this case we need to undo this signalling (and later redo it when we commit again during retry), so following transactions will not start too early. Also add a missing thd->send_kill_message() that got triggered during testing (this corrects an incorrect fix for MySQL Bug#58933).	2014-06-03 10:31:11 +02:00
unknown	b0b60f2498	MDEV-5262: Missing retry after temp error in parallel replication Start implementing that an event group can be re-tried in parallel replication if it fails with a temporary error (like deadlock). Patch is very incomplete, just some very basic retry works. Stuff still missing (not complete list): - Handle moving to the next relay log file, if event group to be retried spans multiple relay log files. - Handle refcounting of relay log files, to ensure that we do not purge a relay log file and then later attempt to re-execute events out of it. - Handle description_event_for_exec - we need to save this somehow for the possible retry - and use the correct one in case it differs between relay logs. - Do another retry attempt in case the first retry also fails. - Limit the max number of retries. - Lots of testing will be needed for the various edge cases.	2014-05-08 14:20:18 +02:00
Kristian Nielsen	86362129a2	MDEV-6120: When slave stops with error, error message should indicate the failing GTID If replication breaks in GTID mode, it is not trivial to determine the GTID of the failing event group. This is a problem, as such GTID is needed eg. to explicitly set @@gtid_slave_pos to skip to after that event group, or to compare errors on different servers, etc. Fix by ensuring that relevant slave errors logged to the error log include the GTID of the event group containing the problem event.	2014-06-25 15:17:03 +02:00
unknown	010971a761	MDEV-6156: Parallel replication incorrectly caches charset between worker threads Replication caches the character sets used in a query, to be able to quickly reuse them for the next query in the common case of them not having changed. In parallel replication, this caching needs to be per-worker-thread. The code was not modified to handle this correctly, so the caching in one worker could cause another worker to run a query using the wrong character set, causing replication corruption.	2014-04-23 16:06:06 +02:00

1 2 3 4 5 ...

502 commits