mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 20:12:31 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	dad7a8ee7d	Merge 10.2 into 10.3	2020-05-27 17:10:39 +03:00
Andrei Elkin	0c1f97b3ab	MDEV-15152 Optimistic parallel slave doesn't cope well with START SLAVE UNTIL The immediate bug was caused by a failure to recognize the correct position at which to stop the slave applier running in optimistic parallel mode. The analysis unveiled the following set of issues. 1 incorrect estimate for the event binlog position passed to is_until_satisfied 2 wait for workers to complete by the driver thread did not account for non-group events that could be left unprocessed and thus mixed up the last executed binlog group's file and position: the file remained old and the position related to the new rotated file 3 incorrect 'slave reached file:pos' by the parallel slave report in the error log 4 relay log UNTIL missed out the parallel slave branch in is_until_satisfied. The patch addresses all of them to simplify the logic of log change notification in both the master and relay-log until cases. P.1 is addressed by passing the event into is_until_satisfied() for proper analysis by the function. P.2 is fixed by changes in handle_queued_pos_update(). P.4 required removing relay-log change notification by workers. Instead the driver thread updates the notion of the current relay-log fully itself with the aid of the newly introduced bool Relay_log_info::until_relay_log_names_defer. An extra printout of the requested until file:pos is arranged with --log-warning=3.	2020-05-26 18:49:43 +03:00
Andrei Elkin	dbe447a789	MDEV-15152 Optimistic parallel slave doesn't cope well with START SLAVE UNTIL The immediate bug was caused by a failure to recognize the correct position at which to stop the slave applier running in optimistic parallel mode. The analysis unveiled the following set of issues. 1 incorrect estimate for the event binlog position passed to is_until_satisfied 2 wait for workers to complete by the driver thread did not account for non-group events that could be left unprocessed and thus mixed up the last executed binlog group's file and position: the file remained old and the position related to the new rotated file 3 incorrect 'slave reached file:pos' by the parallel slave report in the error log 4 relay log UNTIL missed out the parallel slave branch in is_until_satisfied. The patch addresses all of them to simplify the logic of log change notification in both the master and relay-log until cases. P.1 is addressed by passing the event into is_until_satisfied() for proper analysis by the function. P.2 is fixed by changes in handle_queued_pos_update(). P.4 required removing relay-log change notification by workers. Instead the driver thread updates the notion of the current relay-log fully itself with the aid of the newly introduced bool Relay_log_info::until_relay_log_names_defer. An extra printout of the requested until file:pos is arranged with --log-warning=3.	2020-05-26 18:26:50 +03:00
Marko Mäkelä	fbe2712705	Merge 10.4 into 10.5 The functional changes of commit `5836191c8f` (MDEV-21168) are omitted due to MDEV-742 having addressed the issue.	2020-04-25 21:57:52 +03:00
Sergey Vojtovich	5876ed9e5b	Relay_log_info::executed_entries to Atomic_counter	2020-04-15 18:36:07 +04:00
Marko Mäkelä	53aabda6b5	Merge 10.4 into 10.5	2020-03-27 09:39:15 +02:00
Sergey Vojtovich	1c8de231a3	dequeued_count my_atomic to Atomic_counter Also allocate inuse_relaylog with new rather than my_malloc(MY_ZEROFILL).	2020-03-25 23:49:38 +04:00
Otto Kekäläinen	c8388de2fd	Fix various spelling errors e.g. - dont -> don't - occurence -> occurrence - succesfully -> successfully - easyly -> easily Also remove trailing space in selected files. These changes span: - server core - Connect and Innobase storage engine code - OQgraph, Sphinx and TokuDB storage engines Related to MDEV-21769.	2020-03-16 00:10:50 +02:00
Andrei Elkin	c8ae357341	MDEV-742 XA PREPAREd transaction survive disconnect/server restart Lifted long standing limitation to the XA of rolling it back at the transaction's connection close even if the XA is prepared. Prepared XA-transaction is made to sustain connection close or server restart. The patch consists of - binary logging extension to write prepared XA part of transaction signified with its XID in a new XA_prepare_log_event. The conclusion part - with Commit or Rollback decision - is logged separately as Query_log_event. That is in the binlog the XA consists of two separate groups of events. That makes the whole XA possibly interweaving in binlog with other XA:s or regular transaction but with no harm to replication and data consistency. Gtid_log_event receives two more flags to identify which of the two XA phases of the transaction it represents. With either flag set also XID info is added to the event. When binlog is ON on the server XID::formatID is constrained to 4 bytes. - engines are made aware of the server policy to keep up user prepared XA:s so they (Innodb, rocksdb) don't roll them back anymore at their disconnect methods. - slave applier is refined to cope with two phase logged XA:s including parallel modes of execution. This patch does not address crash-safe logging of the new events which is being addressed by MDEV-21469. CORNER CASES: read-only, pure myisam, binlog-, @@skip_log_bin, etc. are addressed along the following policies. 1. The read-only at reconnect marks XID to fail for future completion with ER_XA_RBROLLBACK. 2. binlog- filtered XA when it changes engine data is regarded as loggable even when nothing got cached for binlog. An empty XA-prepare group is recorded. Consequent Commit-or-Rollback succeeds in the Engine(s) as well as recorded into binlog. 3. The same applies to the non-transactional engine XA. 4. @@skip_log_bin=OFF does not record anything at XA-prepare (obviously), but the completion event is recorded into binlog to admit inconsistency with slave. The following actions are taken by the patch. At XA-prepare: when empty binlog cache - don't do anything to binlog if RO, otherwise write empty XA_prepare (assert(binlog-filter case)). At Disconnect: when Prepared && RO (=> no binlogging was done) set Xid_cache_element::error := ER_XA_RBROLLBACK keep XID in the cache, and rollback the transaction. At XA-"complete": Discover the error, if any don't binlog the "complete", return the error to the user. Kudos ----- Alexey Botchkov took to drive this work initially. Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of good recommendations. Sergei Voitovich made a magnificent review and improvements to the code. They all deserve a bunch of thanks for making this work done!	2020-03-14 22:45:48 +02:00
Sergei Golubchik	c1c5222cae	cleanup: PSI key is always the first argument	2020-03-10 19:24:23 +01:00
Sergei Golubchik	7c58e97bf6	perfschema memory related instrumentation changes	2020-03-10 19:24:22 +01:00
seppo	421d52e896	MDEV-6860 Parallel async replication hangs (#1400) Instrumenting parallel slave worker thread with wsrep replication hooks. Added mtr test for testing parallel slave support. The test is based on the test attached in MDEV-6860 jira tracker.	2019-10-16 07:51:36 +03:00
Alexander Barkov	dc588e3d3f	Merge remote-tracking branch 'origin/10.3' into 10.4	2019-10-01 10:45:52 +04:00
Alexander Barkov	7e44c455f4	Merge remote-tracking branch 'origin/10.2' into 10.3	2019-10-01 09:37:40 +04:00
Alexander Barkov	f203245e9e	Merge remote-tracking branch 'origin/10.1' into 10.2	2019-10-01 07:11:54 +04:00
Sujatha	9b80f9300d	MDEV-20645: Replication consistency is broken as workers miss the error notification from an earlier failed group. Analysis: ======== In general if there are three groups. 1 - Inserts 32 which fails due to local entry '32' on slave. 2 - Inserts 33 3 - Inserts 34 Each group considers itself as a waiter and it waits for prior group 'waitee'. This is done in 'register_wait_for_prior_event_group_commit'. If there is no other parallel group being scheduled then no waitee will be there. Let us assume 3 groups are being scheduled in parallel. 3-> waits for 2-> waits for->1 '1' upon completion checks whether there is any registered subsequent waiter. If so it wakes up the subsequent waiter with its execution status. This execution status is stored in wakeup_error. If '1' failed then it sends corresponding wakeup_error to 2. Then '2' aborts and it propagates error to '3'. So all further commits are aborted. This mechanism works only when all transactions reach a stage where they are waiting for their prior commit to complete. In case of optimistic the following scenario occurs. 1,2,3 are scheduled in parallel. 3 - Reaches group_commit_code waits for 2 to complete. 1 - errors out sets stop_on_error_sub_id=1. When a group execution results in error its corresponding sub_id is set to 'stop_on_error_sub_id'. Any new groups queued for execution will check if their sub_id is > stop_on_error_sub_id. If it is true their execution will be skipped as prior group execution failed. 'skip_event_group=1' will be set. Since the execution of SQL thread is about to stop we just skip execution of all the following event groups. We still do all the normal waiting and wakeup processing between the event groups as a simple way to ensure that everything is stopped and cleaned up correctly. Upon error '1' transaction checks for registered waiters. Since no one is there it simply goes away. 2 - Starts the execution. It checks whether it has a waitee. Since wait_commit_sub_id == entry->last_committed_sub_id no waitee is set. Secondly: 'entry->stop_on_error_sub_id' is set by '1'st execution. Now 'handle_parallel_thread' code checks if the current group 'sub_id' is greater than the 'sub_id' set within 'stop_on_error_sub_id'. Since the above is true 'skip_event_group=true' is set. Simply call 'wait_for_prior_commit' to wakeup all waiters. Group '2' didn't have any waitee and its execution is skipped. Hence its wakeup_error=0. It sends a positive wakeup signal to '3', which commits. This results in a missed transaction, i.e. 33 is missed and 34 is committed. Fix: === When a worker learns that an earlier transaction execution has failed, and it should not proceed for further execution, it should mark its own execution status as failed so that it alerts its followers to abort as well.	2019-09-30 13:22:37 +05:30
Sergey Vojtovich	3503fbbebf	Move THD list handling to THD_list Implemented and integrated THD_list as a replacement for the global thread list. It uses own mutex instead of LOCK_thread_count for THD list protection. Removed unused first_global_thread() and next_global_thread(). delayed_insert_threads is now protected by LOCK_delayed_insert. Although this patch doesn't fix very wrong synchronization of this variable. After this patch there are only 2 legitimate uses of LOCK_thread_count left, both in mysqld.cc: thread_count and ready_to_exit. Aim is to reduce usage of LOCK_thread_count and COND_thread_count. Part of MDEV-15135.	2019-01-28 17:39:07 +04:00
Marko Mäkelä	ae9d82c9f8	Merge 10.2 into 10.3	2018-10-11 08:22:08 +03:00
Marko Mäkelä	07815d9555	Merge 10.1 into 10.2	2018-10-11 08:16:08 +03:00
Andrei Elkin	f517d8c742	MDEV-17346 parallel slave start and stop races to workers disappeared The bug appears as a slave SQL thread hanging in rpl_parallel_thread_pool::get_thread() while there are no slave worker threads to awake it. The reason for the hang is that at parallel slave worker pool activation the SQL thread being started could read the worker pool size concurrently with pool deactivation. When reading, the SQL thread did not employ the necessary protection from a race. Fixed by making the SQL thread, at pool activation, first grab the same lock that a potential deactivator also takes before accessing the pool size.	2018-10-08 19:46:34 +03:00
Monty	58721c3e38	MDEV-16286 Killed CREATE SEQUENCE leaves sequence in unusable state Fixed by deleting the sequence if we were not able to initialize it. I also noticed that we didn't always set the error message when check_killed(), which could lead to aborted queries without error being properly set. Fixed by default setting error message if check_error() noticed that killed had been called. This allowed me to remove a lot of calls to thd->send_kill_message().	2018-05-27 19:47:17 +03:00
Thirunarayanan Balathandayuthapani	85cc6b70bd	MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT Introduced new alter algorithm types called NOCOPY & INSTANT for inplace alter operation. NOCOPY - Algorithm refuses any alter operation that would rebuild the clustered index. It is a subset of INPLACE algorithm. INSTANT - Algorithm allows any alter operation that would modify only meta data. It is a subset of NOCOPY algorithm. Introduce new variable called alter_algorithm. The values are DEFAULT(0), COPY(1), INPLACE(2), NOCOPY(3), INSTANT(4) Message to deprecate old_alter_table variable and make it alias for alter_algorithm variable. alter_algorithm variable for slave is always set to default.	2018-05-07 14:58:11 +05:30
Monty	30ebc3ee9e	Add likely/unlikely to speed up execution Added to: - if (error) - Lex - sql_yacc.yy and sql_yacc_ora.yy - In header files to alloc() calls - Added thd argument to thd_net_is_killed()	2018-05-07 00:07:32 +03:00
Sergei Golubchik	b1818dccf7	Merge branch '10.2' into 10.3	2018-03-28 17:31:57 +02:00
Andrei Elkin	30019a48bf	MDEV-12746 rpl.rpl_parallel_optimistic_nobinlog fails committing out of order at retry The test failures were of two sorts. One is that the number of retries what the slave thought as a temporary error exceeded the default value of the slave retry option. The 2nd issue was an out of order commit by transactions that were supposed to error out instead. Both issues are caused by the same reason that the post-temporary-error retry did not check possibly already existing error status. This is mended with refining conditions to retry. Specifically, a retrying worker checks `rpl_parallel_entry::stop_on_error_sub_id` that a potential failing predecessor could set to its own sub id. Now should the member be set the retrying follower errors out with ER_PRIOR_COMMIT_FAILED.	2018-03-13 12:46:07 +02:00
Monty	a7e352b54d	Changed database, tablename and alias to be LEX_CSTRING This was done in, among other things: - thd->db and thd->db_length - TABLE_LIST tablename, db, alias and schema_name - Audit plugin database name - lex->db - All db and table names in Alter_table_ctx - st_select_lex db Other things: - Changed a lot of functions to take const LEX_CSTRING* as argument for db, table_name and alias. See init_one_table() as an example. - Changed some function arguments from LEX_CSTRING to const LEX_CSTRING - Changed some lists from LEX_STRING to LEX_CSTRING - threads_mysql.result changed because process list_db wasn't always correctly updated - New append_identifier() function that takes LEX_CSTRING* as arguments - Added new element tmp_buff to Alter_table_ctx to separate temp name handling from temporary space - Ensure we store the length after my_casedn_str() of table/db names - Removed not used version of rename_table_in_stat_tables() - Changed Natural_join_column::table_name and db_name() to never return NULL (used for print) - thd->get_db() now returns db as a printable string (thd->db.str or "")	2018-01-30 21:33:55 +02:00
Alexander Barkov	c7a2f23a7b	Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3	2018-01-29 12:44:20 +04:00
Monty	7fc25cfbca	Fix for MDEV-12730 Assertion `count > 0' failed in rpl_parallel_thread_pool::get_thread, rpl.rpl_parallel failed in buildbot The reason for this is that one thread can call rpl_parallel_resize_pool_if_no_slaves() while another thread calls at the same time rpl_parallel_activate_pool(). If rpl_parallel_activate_pool() is called before rpl_parallel_resize_pool_if_no_slaves() has finished, pool->count will be set to 0 even if there exist active slave threads. Added a mutex lock in rpl_parallel_activate_pool() to protect against this scenario, which seems to fix this issue.	2018-01-24 23:33:21 +02:00
Monty	13770edbcb	Changed from using LOCK_log to LOCK_binlog_end_pos for binary log Part of MDEV-13073 AliSQL Optimize performance of semisync The idea is to use a dedicated lock detecting if there is new data in the master's binary log instead of the overused LOCK_log. Changes: - Use dedicated COND variables for the relay and binary log signaling. This was needed as the old 'update_cond' variable was used with different mutexes, which could cause deadlocks. - Relay log uses now COND_relay_log_updated and LOCK_log - Binary log uses now COND_bin_log_updated and LOCK_binlog_end_pos - Renamed signal_cnt to relay_signal_cnt (as we now have two signals) - Added some missing error handling in MYSQL_BIN_LOG::new_file_impl() - Reformatted some comments with old style - Renamed m_key_LOCK_binlog_end_pos to key_LOCK_binlog_end_pos - Changed 'signal_update()' to update_binlog_end_pos() which works for both relay and binary log	2017-12-18 13:43:37 +02:00
Monty	ea37c129f9	Removed not used lock argument from read_log_event	2017-12-18 13:43:36 +02:00
Marko Mäkelä	2c1067166d	Merge bb-10.2-ext into 10.3	2017-10-04 08:24:06 +03:00
Vladislav Vaintroub	7354dc6773	MDEV-13384 - misc Windows warnings fixed	2017-09-28 17:20:46 +00:00
Sergei Golubchik	bb8e99fdc3	Merge branch 'bb-10.2-ext' into 10.3	2017-08-26 00:34:43 +02:00
Monty	21518ab2e4	New option for slow logging (log_slow_disable_statements) This fixes MDEV-7742 and MDEV-8305 (Allow user to specify if stored procedures should be logged in the slow and general log) New functionality: - Added new variables log_slow_disable_statements and log_disable_statements that can be used to disable logging of certain queries to slow and general log. Currently supported options are 'admin', 'call', 'slave' and 'sp'. Defaults are as before. Only 'sp' (stored procedure statements) is disabled for slow and general_log. - Slow log to files now includes the following new information: - When logging stored procedure statements the name of stored procedure is logged. - Number of created tmp_tables, tmp_disk_tables and the space used by temporary tables. - When logging 'call', the logged status now contains the sum of all included statements. Before only 'time' was correct. - Added filesort_priority_queue as an option for log_slow_filter (this variable existed before, but was not exposed) - Added support for BIT types in my_getopt() Mapped some old variables to bitmaps (old variables can still be used) - Variable 'log_queries_not_using_indexes' is mapped to log_slow_filter='not_using_index' - Variable 'log_slow_slave_statements' is mapped to log_slow_disabled_statements='slave' - Variable 'log_slow_admin_statements' is mapped to log_slow_disabled_statements='admin' - All the above variables are changed to session variables from global variables Other things: - Simplified LOGGER::log_command. We don't need to check for super if OPTION_LOG_OFF is set as this flag can only be set if one is a super user. - Removed some setting of enable_slow_log as it's guaranteed to be set by mysql_parse() - mysql_admin_table() now sets thd->enable_slow_log - Added prepare_logs_for_admin_command() to reset thd->enable_slow_log if needed. - Added new functions to store, restore and add slow query status - Added new functions to store and restore query start time - Reorganized Sub_statement_state according to types - Added code in dispatch_command() to ensure that thd->reset_for_next_command() is always called for a query. - Added thd->last_sql_command to simplify checking of what was the type of the last command. Needed when logging to slow log as lex->sql_command may have changed before slow logging is called. - Moved QPLAN_TMP_... to where status for tmp tables are updated - Added new THD variable, affected_rows, to be able to correctly log number of affected rows to slow log.	2017-08-24 01:05:51 +02:00
Michael Widenius	f71bed08ca	Safety fix: lock binlog_end_pos before calling signal_update The mutex is needed to ensure that the SQL thread does not miss the error signal.	2017-08-24 01:05:50 +02:00
Michael Widenius	4aaa38d26e	Ensure that my_global.h is included first - Added sql/mariadb.h file that should be included first by files in sql directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables that must be done before my_global.h is included) - Removed a lot of include my_global.h from include files - Removed include's of some files that my_global.h automatically includes - Removed duplicated include's of my_sys.h - Replaced include my_config.h with my_global.h	2017-08-24 01:05:44 +02:00
Sergei Golubchik	cb1e76e4de	Merge branch '10.1' into 10.2	2017-08-17 11:38:34 +02:00
Monty	74543698a7	MDEV-13179 main.errors fails with wrong errno The problem was that the introduction of max-thread-mem-used can cause an allocation error very early, even before mysql_parse() is called. As mysql_parse() calls thd->reset_for_next_command(), which called clear_error(), the error number was lost. Fixed by adding an option to have unique messages for each KILL signal and change max-thread-mem-used to use this new feature. This removes a lot of problems with the original approach, where one could get errors signaled silently almost any time. Fixed by moving clear_error() from reset_for_next_command() to do_command(), before any memory allocation for the thread. Related changes: - reset_for_next_command() now has an optional parameter if we should call clear_error() or not. By default it's called, but not anymore from dispatch_command() which was the original problem. - Added optional parameter to clear_error() to force calling of reset_diagnostics_area(). Before clear_error() only called reset_diagnostics_area() if there was no error, so we normally called reset_diagnostics_area() twice. - This change removed several duplicated calls to clear_error() when starting a query. - Reset max_mem_used on COM_QUIT, to protect against kill during quit. - Use fatal_error() instead of setting is_fatal_error (cleanup) - Set fatal_error if max_thread_mem_used is signaled. (Same logic we use for other places where we are out of resources)	2017-08-07 03:48:58 +03:00
Kristian Nielsen	1d91910b94	MDEV-12179: Per-engine mysql.gtid_slave_pos table Merge into MariaDB 10.3.	2017-07-03 09:33:41 +02:00
Marko Mäkelä	f740d23ce6	Merge 10.1 into 10.2	2017-04-28 12:22:32 +03:00
Kristian Nielsen	89aad233de	MDEV-12179: Per-engine mysql.gtid_slave_pos table Intermediate commit. Move the discovery of mysql.gtid_slave_pos* tables into the SQL thread. This avoids doing things like opening tables and scanning the mysql schema for tables inside of the START SLAVE statement, which might interact badly with existing transaction or table locks. (Even though START SLAVE is documented to implicitly commit any active transactions, this appears not to be the case in current code). Table discovery fits naturally in the SQL thread init code, next to the loading of mysql.gtid_slave_pos state.	2017-04-23 10:49:58 +02:00
Marko Mäkelä	8c38147cdd	Merge 10.0 into 10.1	2017-04-21 12:46:12 +03:00
Kristian Nielsen	88613e1df6	MDEV-11201: gtid_ignore_duplicates incorrectly ignores statements when GTID replication is not enabled When master_use_gtid=no, the IO thread loads the slave GTID state from the master during connect. This races with the SQL thread when gtid_ignore_duplicates=1. If an event is in the relay log from before the new connect and has not been applied yet, moving the slave position causes the SQL thread to think that event should be skipped due to gtid_ignore_duplicates=1. This patch simply disables gtid_ignore_duplicates when not using GTID, which seems to be what one would expect.	2017-04-10 07:53:27 +02:00
Sergei Golubchik	da4d71d10d	Merge branch '10.1' into 10.2	2017-03-30 12:48:42 +02:00
Sergei Golubchik	09a2107b1b	Merge branch '10.0' into 10.1	2017-03-21 19:20:44 +01:00
Monty	e7f55fde88	Removed wrong assert The following is an updated commit message for the following commit that was pushed before I had a chance to update the commit message: `c5e25c8b40` Fixed deadlocks when doing stop slave while slave was starting. - Added a separate lock for protecting start/stop/reset of a specific slave. This solves some possible deadlocks when one calls stop slave while the slave is starting as the old run_locks was overused for other things. - Set hash->records to 0 before calling free of all hash elements. This was set to stop concurrent threads from looping over hash elements and accessing members that were already freed. This was a problem especially in start_all_slaves/stop_all_slaves as the mutex protecting the hash was temporarily released while a slave was started/stopped. - Because of change to hash->records during hash_reset(), any_slave_sql_running() will return 1 during shutdown as one can't loop over master_info_index->master_info_hash while hash_reset() of it is in progress. This also fixes a potential old bug in any_slave_sql_running() where during shutdown and ~Master_info_index(), my_hash_free() we could potentially try to access elements that were already freed.	2017-03-16 14:21:32 +02:00
Marko Mäkelä	adc91387e3	Merge 10.0 into 10.1	2017-03-03 13:27:12 +02:00
Monty	84ed5e1d5f	Fixed hang doing FLUSH TABLES WITH READ LOCK and parallel replication The problem was that waiting for pause_for_ftwrl was done before event_group was completed. This caused rpl_pause_for_ftwrl() to wait forever during FLUSH TABLES WITH READ LOCK. Now we only wait for FLUSH TABLES WITH READ LOCK when we are changing to a new event group.	2017-02-28 16:10:47 +01:00
Monty	c5e25c8b40	Added a separate lock for start/stop/reset slave. This solves some possible deadlocks when one calls stop slave while slave is starting.	2017-02-28 16:10:46 +01:00
Monty	e65f667bb6	MDEV-9573 'Stop slave' hangs on replication slave The reason for this is that stop slave takes LOCK_active_mi over the whole operation while some slave operations will also need LOCK_active_mi which causes deadlocks. Fixed by introducing object counting for Master_info and not taking LOCK_active_mi over stop slave or even stop_all_slaves() Another benefit of this approach is that it allows: - Multiple threads can run SHOW SLAVE STATUS at the same time - START/STOP/RESET/SLAVE STATUS on a slave will not block other slaves - Simpler interface for handling get_master_info() - Added some missing unlock of 'log_lock' in error conditions - Moved rpl_parallel_inactivate_pool(&global_rpl_thread_pool) to end of stop_slave() to not have to use LOCK_active_mi inside terminate_slave_threads() - Changed argument for remove_master_info() to Master_info, as we always have this available - Fixed core dump when doing FLUSH TABLES WITH READ LOCK and parallel replication. Problem was that waiting for pause_for_ftwrl was not done when deleting rpt->current_owner after a force_abort.	2017-02-28 16:10:46 +01:00
Sergei Golubchik	4a5d25c338	Merge branch '10.1' into 10.2	2016-12-29 13:23:18 +01:00
Sergei Golubchik	2f20d297f8	Merge branch '10.0' into 10.1	2016-12-11 09:53:42 +01:00
Kristian Nielsen	390f2a013b	Fix incorrect reading of events from relaylog in parallel replication. The SQL thread keeps track of the position in the current relay log from which to read the next event. This position is not normally used, but a certain interaction with the IO thread can cause the SQL thread to re-open the relay log and seek to the stored position. In parallel replication, there were a couple of places where the position was not updated. This created a race where a re-open of the relay log could seek to the wrong position and start re-reading and processing events already handled once, causing various kinds of problems. Fix this by moving the position update into a single place in apply_event_and_update_pos(), which should ensure that the position is always updated in the parallel replication case. This problem was found from the testcase of MDEV-10863, but it is logically a separate problem.	2016-11-16 11:00:38 +01:00
Kristian Nielsen	c06bc66816	MDEV-11065: Compressed binary log Minor review comments/changes: - A bunch of style-fixes. - Change macros to static inline functions. - Update check_event_type() with compressed event types. - Small .result file update.	2016-10-20 18:00:59 +02:00
Kristian Nielsen	e1ef99c3dc	MDEV-7145: Delayed replication Merge feature into 10.2 from feature branch. Delayed replication adds an option CHANGE MASTER TO master_delay=<seconds> Replication will then delay applying events with that many seconds. This creates a replication slave that reflects the state of the master some time in the past. Feature is ported from MySQL source tree. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2016-10-16 23:44:44 +02:00
Kristian Nielsen	3011060b2a	MDEV-7145: Delayed slave. Extend to work also for parallel replication. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2016-10-14 23:15:59 +02:00
Kristian Nielsen	50f19ca809	Remove unnecessary global mutex in parallel replication. The function apply_event_and_update_pos() is called with the rli->data_lock mutex held. However, there seems to be nothing in the function actually needing the mutex to be held. Certainly not in the parallel replication case, where sql_slave_skip_counter is always 0 since the non-zero case is handled by the SQL driver thread. So this patch makes parallel replication use a variant of apply_event_and_update_pos() without the need to take the rli->data_lock mutex. This avoids one contended global mutex for each event executed, which might improve performance on CPU-bound workloads somewhat. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2016-10-14 22:44:40 +02:00
Kristian Nielsen	ec47beaba6	Merge parallel replication async deadlock kill into 10.2. Conflicts: sql/mysqld.cc sql/slave.cc	2016-09-09 12:15:53 +02:00
Kristian Nielsen	7e0c9de864	Parallel replication async deadlock kill When a deadlock kill is detected inside the storage engine, the kill is not done immediately, to avoid calling back into the storage engine kill_query method with various lock subsystem mutexes held. Instead the kill is queued and done later by a slave background thread. This patch is in preparation for fixing TokuDB optimistic parallel replication, as well as for removing locking hacks in InnoDB/XtraDB in 10.2. Signed-off-by: Kristian Nielsen <knielsen at knielsen-hq.org>	2016-09-08 15:25:40 +02:00
Monty	96e95b5465	Better SHOW PROCESSLIST for replication - When waiting for events, start time is now counted from start of wait - Instead of having "Connect" as "Command" for all replication threads we now have: - Slave_IO for Slave thread reading relay log - Slave_SQL for slave executing SQL commands or distribution queries to Slave workers - Slave_worker for slave threads executing SQL commands in parallel replication	2016-08-29 13:10:17 +03:00
Monty	89685d55d7	Reuse THD for new user connections - To ensure that mallocs are marked for the correct THD, even if it's allocated in another thread, I added the thread_id to the THD constructor - Added st_my_thread_var to thr_lock_info_init() to avoid a call to my_thread_var - Moved things from THD::THD() to THD::init() - Moved some things to THD::cleanup() - Added THD::free_connection() and THD::reset_for_reuse() - Added THD to CONNECT::create_thd() - Added THD::thread_dbug_id and st_my_thread_var->dbug_id. These are needed to ensure that we have a constant thread_id used for debugging with a THD, even if it changes thread_id (=connection_id) - Set variables.pseudo_thread_id in constructor. Removed not needed sets.	2016-06-04 09:06:00 +02:00
Monty	732adec0a4	Removed some not needed when doing delete thd, which caused warnings about wrong mutex usage from safe_mutex. Ensure that LOCK_status is always taken before LOCK_thread_count	2016-04-28 13:39:55 +03:00
Monty	cdd4043117	Cleanups: - Removed some QQ markers - Removed some rows not compatible with valgrind 3.9.0 - Made mysql_install_db.sh more silent by default. --verbose now gives more information - Added assert that auto-increment doesn't generate 0 (safety) - Removed thd->set_time() in some places as it's set in init_for_queries() - Fixed some --big tests in tokudb - Fixed a bug in mysql_client_test.cc where sql_mode was not properly reset	2016-04-05 18:00:04 +03:00
Sergei Golubchik	f67a2211ec	Merge branch '10.1' into 10.2	2016-03-23 22:36:46 +01:00
Sergei Golubchik	3b0c7ac1f9	Merge branch '10.0' into 10.1	2016-03-21 13:02:53 +01:00
Otto Kekäläinen	1777fd5f55	Fix spelling: occurred, execute, which etc	2016-03-04 02:09:37 +02:00
Monty	3d4a7390c1	MDEV-6150 Speed up connection speed by moving creation of THD to new thread Creating a CONNECT object on client connect and pass this to the working thread which creates the THD. Split LOCK_thread_count to different mutexes Added LOCK_thread_start to synchronize threads Moved most usage of LOCK_thread_count to dedicated functions Use next_thread_id() instead of thread_id++ Other things: - Thread id now starts from 1 instead of 2 - Added cast for thread_id as thread id is now of type my_thread_id - Made THD->host const (To ensure it's not changed) - Removed some DBUG_PRINT() about entering/exiting mutex as these were already logged by mutex code - Fixed that aborted_connects and connection_errors_internal are counted in all cases - Don't take locks for current_linfo when we set it (not needed as it was 0 before)	2016-02-07 10:34:03 +02:00
Sergei Golubchik	a2bcee626d	Merge branch '10.0' into 10.1	2015-12-21 21:24:22 +01:00
Monty	c3018b0ff4	Fixes to get all tests to run on MacOSX Lion 10.7 This includes fixing all utilities to not have any memory leaks, as safemalloc warnings stopped tests from passing on MacOSX. - Ensure that all clients take character-set-dir, as the libmysqlclient library will use it. - mysql-test-run now passes character-set-dir to all external clients. - Changed dynstr_free() so that it can be called twice (made freeing code easier) - Changed rpl_global_gtid_slave_state to be allocated dynamically as it includes a mutex that needs to be initialized/destroyed before my_end() is called. - Removed rpl_slave_state::init() and rpl_slave_state::deinit() as their jobs are better handled by constructor and delete. - Print alias instead of table_name in check_duplicate_key as table_name may have been converted to lower case. Other things: - Fixed a case in time_to_datetime_with_warn() where we were using && instead of & in tests	2015-11-29 17:51:23 +02:00
Monty	b30a768e7b	Fixed failures in rpl_parallel2 Problem was that we used same condition variable with 2 different mutex. Fixed by changing to use COND_rpl_thread_stop instead of COND_parallel_entry for stopping threads. Patch by Kristian Nielsen	2015-11-23 19:58:30 +02:00
Kristian Nielsen	8f2e05f41c	Merge branch 'mdev7818-4' into 10.1 Conflicts: mysql-test/suite/perfschema/r/stage_mdl_global.result sql/rpl_rli.cc sql/sql_parse.cc	2015-11-13 14:24:40 +01:00
Kristian Nielsen	ba02550166	MDEV-7818: Deadlock occurring with parallel replication and FTWRL Problem is that FLUSH TABLES WITH READ LOCK first blocks threads from starting new commits, then waits for running commits to complete. But in-order parallel replication needs commits to happen in a particular order, so this can easily deadlock. To fix this problem, this patch introduces a way to temporarily pause the parallel replication worker threads. Before starting FTWRL, we let all worker threads complete in-progress transactions, and then wait. Then we proceed to take the global read lock. Once the lock is obtained, we unpause the worker threads. Now commits are blocked from starting by the global read lock, so the deadlock will no longer occur.	2015-11-13 14:02:15 +01:00
Kristian Nielsen	6d96fab7dd	MDEV-7818: Deadlock occurring with parallel replication and FTWRL Preparation patch, moving the GCO wait into a separate function, in preparation for adding a separate wait phase for FLUSH TABLES WITH READ LOCK.	2015-11-13 14:02:14 +01:00
Kristian Nielsen	75dc267101	Change Seconds_behind_master to be updated only at commit in parallel replication Before, the Seconds_behind_master was updated already when an event was queued for a worker thread to execute later. This might lead users to interpret a low value as the slave being almost up to date with the master, while in reality there might still be lots and lots of events still queued up waiting to be applied by the slave. See https://lists.launchpad.net/maria-developers/msg08958.html for more detailed discussions.	2015-11-13 10:24:53 +01:00
Kristian Nielsen	df9b8aee58	Merge MDEV-8193 into 10.1 Conflicts: sql/rpl_rli.cc	2015-09-11 12:01:48 +02:00
Kristian Nielsen	51eaa7fe53	MDEV-8193: UNTIL clause in START SLAVE is sporadically disobeyed by parallel replication The code was using the wrong variable when comparing the binlog name for the UNTIL position. This could cause the comparison to fail after binlog rotation, in turn causing the UNTIL clause to not trigger slave stop.	2015-09-11 10:51:56 +02:00
Sergei Golubchik	b85a00161e	MDEV-8264 encryption for binlog * Start_encryption_log_event * --encrypt-binlog command line option based on google patches.	2015-09-04 10:33:55 +02:00
Kristian Nielsen	ef82cb7c2c	Merge MDEV-8725 into 10.1	2015-09-02 10:53:37 +02:00
Kristian Nielsen	999c43aeb7	MDEV-8725: Assertion `!(thd->rgi_slave && thd-> rgi_slave->did_mark_start_commit)' failed in ha_rollback_trans The assertion is there to catch cases where we rollback while mark_start_commit() is active. This can allow following event groups to be replicated too early, causing conflicts. But in this case, we have an _explicit_ ROLLBACK event in the binlog, which should not assert. We fix this by delaying the mark_start_commit() in the explicit ROLLBACK case. It seems safest to delay this in ROLLBACK case anyway, and there should be no reason to try to optimise this corner case.	2015-09-02 09:57:18 +02:00
Kristian Nielsen	dbd205797b	Merge MDEV-8302 into 10.1	2015-08-04 12:39:22 +02:00
Kristian Nielsen	9b9c5e890c	MDEV-8302: Duplicate key with parallel replication This bug is essentially another variant of MDEV-7458. If a transaction conflict caused a deadlock kill of T2 in record_gtid() during commit, the code would do a rollback _before_ running rgi->unmark_start_commit(). This creates a race where following transactions could start too early (before T2 has completed its transaction retry). This in turn could lead to replication failure, if there was a conflict that caused eg. duplicate key error or similar. The fix is to remove these rollbacks (in Query_log_event::do_apply_event() and Xid_log_event::do_apply_event()). They seem out-of-place; code in log_event.cc generally does not roll back on error, this is handled higher up. In addition, because of the extreme difficulty of reproducing bugs like MDEV-7458 and MDEV-8302, this patch adds some extra precautions to try to detect (in debug builds) or prevent (in release builds) similar bugs. ha_rollback_trans() will now call unmark_start_commit() if needed (and assert in debug build when a caller does rollback without unmark first). We also add an extra check for thd->killed() so that we avoid doing mark_start_commit() if we already have a pending deadlock kill. And we add a missing unmark_start_commit() call in the error case, found by the above assertion.	2015-08-04 11:40:19 +02:00
Kristian Nielsen	903f8dc72d	Merge MDEV-8147 into 10.1	2015-05-26 15:03:22 +02:00
Kristian Nielsen	e5f1e841dc	MDEV-8147: Assertion `m_lock_type == 2' failed in handler::ha_close() during parallel replication When the slave processes the master restart format_description event, parallel replication needs to complete any prior events before processing the restart event (which closes temporary tables and such stuff). This happens in wait_for_workers_idle(), however it was not waiting long enough. The wait was using wait_for_prior_commit(), but at that point tables can still be open. This led to an assertion in this case. So change wait_for_workers_idle() to wait until all worker threads have reached finish_event_group(), at which point all tables should have been closed.	2015-05-26 13:04:15 +02:00
Sergey Vojtovich	9851a8193f	MDEV-8001 - mysql_reset_thd_for_next_command() takes 0.04% in OLTP RO Removed yet more mysql_reset_thd_for_next_command(). Call THD::reset_for_next_command() directly instead.	2015-05-13 15:28:34 +04:00
Kristian Nielsen	8bedb638d7	MDEV-8113: Parallel slave: slave hangs on ALTER TABLE (or other DDL) as the first event after slave start In optimistic parallel replication, it is not safe to try to run a following transaction in parallel with a DDL statement, and there is code to prevent this. However, the code was missing the case where the DDL is the very first event after slave start. In this case, following transactions could run in parallel with the DDL, which can cause the slave to hang or even corrupt slave in unlucky cases.	2015-05-11 12:43:38 +02:00
Kristian Nielsen	c2dd88ac85	Merge MDEV-8031 into 10.1	2015-04-23 14:40:10 +02:00
Kristian Nielsen	b616991a68	MDEV-8031: Parallel replication stops on "connection killed" error (probably incorrectly handled deadlock kill) There was a rare race, where a deadlock error might not be correctly handled, causing the slave to stop with something like this in the error log: 150423 14:04:10 [ERROR] Slave SQL: Connection was killed, Gtid 0-1-2, Internal MariaDB error code: 1927 150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927 150423 14:04:10 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213 150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927 150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927 150423 14:04:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001 position 1234 The problem was incorrect error handling. When a deadlock is detected, it causes a KILL CONNECTION on the offending thread. This error is then later converted to a deadlock error, and the transaction is retried. However, the deadlock error was not cleared at the start of the retry, nor was the lingering kill signal. So it was possible to get another deadlock kill early during retry. If this happened with particular thread scheduling/timing, it was possible that the new KILL CONNECTION error was masked by the earlier deadlock error, so that the second kill was not properly converted into a deadlock error and retry. This patch adds code that clears the old error and killed flag before starting the retry. It also adds code to handle a deadlock kill caught in a couple of places where it was not handled before.	2015-04-23 14:09:15 +02:00
Kristian Nielsen	167332597f	Merge 10.0 -> 10.1. Conflicts: mysql-test/suite/multi_source/multisource.result sql/sql_base.cc	2015-04-17 15:18:44 +02:00
Kristian Nielsen	accdabd668	Merge MDEV-7888 and MDEV-7929 into 10.0.	2015-04-08 13:19:22 +02:00
Kristian Nielsen	48c10fb5f7	Merge MDEV-7888 and MDEV-7929 into 10.1.	2015-04-08 11:04:24 +02:00
Kristian Nielsen	3b961347db	MDEV-7888, MDEV-7929: Parallel replication hangs sometimes on ANALYZE TABLE or DDL The hangs occur when the group_commit_orderer object is freed before the last mark_start_commit() call on it - this loses the wakeup to other waiting worker threads, causing them to hang until killed manually. The object was freed because wakeup_subsequent_commits() was called too early in two places. For MDEV-7888, during ANALYZE TABLE, and for MDEV-7929 during record_gtid() after processing a DDL event. The group_commit_orderer object can be freed when its last transaction has called wait_for_prior_commit(). Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits() that can be used in places where a transaction is committed without this being the commit of the actual replication event group. Also add a protection mechanism (that asserts in debug builds) which can prevent the too-early free and hang if other similar bugs should remain in other parts of the code.	2015-04-08 11:01:18 +02:00
Kristian Nielsen	f573b65e41	Merge MDEV-7847 and MDEV-7882 into 10.0. Conflicts: mysql-test/suite/rpl/r/rpl_parallel.result sql/rpl_parallel.cc	2015-03-30 15:10:29 +02:00
Kristian Nielsen	c41e4d3b49	Merge MDEV-7847 and MDEV-7882 into 10.0. Conflicts: mysql-test/suite/rpl/r/rpl_parallel.result mysql-test/suite/rpl/t/rpl_parallel.test	2015-03-30 14:51:25 +02:00
Kristian Nielsen	880f2273fd	MDEV-7847: "Slave worker thread retried transaction 10 time(s) in vain, giving up", followed by replication hanging This patch fixes a bug in the error handling in parallel replication, when one worker thread gets a failure and other worker threads processing later transactions have to rollback and abort. The problem was with the lifetime of group_commit_orderer objects (GCOs). A GCO is freed when we register that its last event group has committed. This relies on register_wait_for_prior_commit() and wait_for_prior_commit() to ensure that the fact that T2 has committed implies that any earlier T1 has also committed, and can thus no longer execute mark_start_commit(). However, in the error case, the code was skipping the register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus commit ordering was not guaranteed, and a GCO could be freed too early. Then a later mark_start_commit() would reference deallocated GCO, which could lead to lost wakeup (causing slave threads to hang) or other corruption. This patch makes also the error case respect commit order. This way, also the error case gets the GCO lifetime correct, and the hang no longer occurs.	2015-03-30 14:33:44 +02:00
Kristian Nielsen	a4082918c8	MDEV-7882: Excessive transaction retry in parallel replication When a transaction in parallel replication needs to retry (eg. because of deadlock kill), first wait for all prior transactions to commit before doing the retry. This way, we avoid the retry once again conflicting with a prior transaction, requiring yet another retry. Without this patch, we saw "in the wild" that transactions had to be retried more than 10 times to succeed, which exceeds the default --slave_transaction_retries value and is in any case undesirable. (We already do this in 10.1 in "optimistic" parallel replication mode; this patch just makes the code use the same logic for "conservative" mode (only mode in 10.0)).	2015-03-30 14:16:57 +02:00
Kristian Nielsen	bd2ae787ea	MDEV-7825: Parallel replication race condition on gco->flags, possibly resulting in slave hang The patch for optimistic parallel replication as a memory optimisation moved the gco->installed field into a bit in gco->flags. However, that is just plain wrong. The gco->flags field is owned by the SQL driver thread, but gco->installed is used by the worker threads, so this will cause a race condition. The user-visible problem might be conflicts between transactions and/or slave threads hanging. So revert this part of the optimistic parallel replication patch, going back to using a separate field gco->installed like in 10.0.	2015-03-24 16:33:51 +01:00
Kristian Nielsen	ed04c40b01	MDEV-5289: master server starts slave parallel threads Delay spawning parallel replication worker threads until a slave SQL thread is running, and de-spawn them when the last SQL thread stops. This is especially useful to avoid needless threads on a master in a setup where same my.cnf is used on masters and slaves.	2015-03-11 09:18:16 +01:00
Sergei Golubchik	2db62f686e	Merge branch '10.0' into 10.1	2015-03-07 13:21:02 +01:00
Kristian Nielsen	95d7208859	Merge MDEV-6589 and MDEV-6403 into 10.1. Conflicts: sql/log.cc sql/rpl_rli.cc sql/sql_repl.cc	2015-03-04 13:49:37 +01:00
Kristian Nielsen	3ef0b9b235	Merge MDEV-6589 and MDEV-6403 into 10.0.	2015-03-04 13:36:54 +01:00


243 commits