mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-03-01 00:33:09 +01:00

Author	SHA1	Message	Date
Oleksandr Byelkin	9ed8deb656	Merge branch '10.6' into 10.7	2022-02-04 14:11:46 +01:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Oleksandr Byelkin	a576a1cea5	Merge branch '10.3' into 10.4	2022-01-30 09:46:52 +01:00
Oleksandr Byelkin	41a163ac5c	Merge branch '10.2' into 10.3	2022-01-29 15:41:05 +01:00
Brandon Nesterenko	96de6bfd5e	MDEV-16091: Seconds_Behind_Master spikes to millions of seconds Problem: ======== A slave’s relay log format description event is used when calculating Seconds_Behind_Master (SBM). This forces the SBM value to spike when processing these events, as their creation date is set to the timestamp that the IO thread begins. Solution: ======== When the slave generates a format description event, mark the event as a relay log event so it does not update the rli->last_master_timestamp variable. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2022-01-04 11:21:33 -07:00
sjaakola	ef2dbb8dbc	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potetial mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 20:40:35 +02:00
Jan Lindström	d5bc05798f	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 20:38:11 +02:00
sjaakola	5c230b21bf	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potetial mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 09:52:52 +03:00
Jan Lindström	aa7ca987db	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 09:52:40 +03:00
Marko Mäkelä	d7af7bfc2b	Merge 10.6 into 10.7	2021-10-28 09:14:51 +03:00
Marko Mäkelä	d8c6c53a06	Merge 10.5 into 10.6	2021-10-28 09:08:58 +03:00
Marko Mäkelä	a8ded39557	Merge 10.4 into 10.5	2021-10-28 08:48:36 +03:00
Julius Goryavsky	7948a1dc53	MDEV-26914: Unreleased mutex in the exec_relay_log_event() function In the replication-related code, in the exec_relay_log_event() (slave.cc) function, where the "data_lock" mutex is captured, this mutex is then not released on one of the early return branches within a specific insert for WSREP, namely under the branch: "if (wsrep_before_statement(thd))". As a result, the mutex remains captured, resulting in errors or hangs. This commit fixes this issue, which is now showing up as intermittent failures in mtr tests for galera and galera_sr suites.	2021-10-28 03:17:12 +02:00
Aleksey Midenkov	d324c03d0c	Vanilla cleanups and refactorings Dead code cleanup: part_info->num_parts usage was wrong and working incorrectly in mysql_drop_partitions() because num_parts is already updated in prep_alter_part_table(). We don't have to update part_info->partitions because part_info is destroyed at alter_partition_lock_handling(). Cleanups: - DBUG_EVALUATE_IF() macro replaced by shorter form DBUG_IF(); - Typo in ER_KEY_COLUMN_DOES_NOT_EXITS. Refactorings: - Splitted write_log_replace_delete_frm() into write_log_delete_frm() and write_log_replace_frm(); - partition_info via DDL_LOG_STATE; - set_part_info_exec_log_entry() removed. DBUG_EVALUATE removed DBUG_EVALUTATE was only added for consistency together with DBUG_EVALUATE_IF. It is not used anywhere in the code. DBUG_SUICIDE() fix on release build On release DBUG_SUICIDE() was statement. It was wrong as DBUG_SUICIDE() is used in expression context.	2021-10-26 17:07:46 +02:00
Sergei Golubchik	0299ec29d4	cleanup: MY_BITMAP mutex in about a hundred of users of MY_BITMAP, only two were using its built-in mutex, and only one of those two was actually needing it. Remove the mutex from MY_BITMAP, remove all associated conditions and checks in bitmap functions. Use an external LOCK_temp_pool mutex and temp_pool_set_next/temp_pool_clear_bit acccessors. Remove bitmap_init/bitmap_free, always use my_* versions.	2021-08-26 23:39:52 +02:00
Sujatha	6c39eaeb12	MDEV-21117: refine the server binlog-based recovery for semisync Problem: ======= When the semisync master is crashed and restarted as slave it could recover transactions that former slaves may never have seen. A known method existed to clear out all prepared transactions with --tc-heuristic-recover=rollback does not care to adjust binlog accordingly. Fix: === The binlog-based recovery is made to concern of the slave semisync role of post-crash restarted server. No changes in behavior is done to the "normal" binloggging server and the semisync master. When the restarted server is configured with --rpl-semi-sync-slave-enabled=1 the refined recovery attempts to roll back prepared transactions and truncate binlog accordingly. In case of a partially committed (that is committed at least in one of the engine participants) such transaction gets committed. It's guaranteed no (partially as well) committed transactions exist beyond the truncate position. In case there exists a non-transactional replication event (being in a way a committed transaction) past the computed truncate position the recovery ends with an error. As after master crash and failover to slave, the demoted-to-slave ex-master must be ready to face and accept its own (generated by) events, without generally necessary --replicate-same-server-id. So the acceptance conditions are relaxed for the semisync slave to accept own events without that option. While gtid_strict_mode ON ensures no duplicate transaction can be (re-)executed the master_use_gtid=none slave has to be configured with --replicate-same-server-id. NOTE for reviewers. This patch does not handle the user XA which is done in next git commit.	2021-06-11 19:49:39 +03:00
Rucha Deodhar	4e19539c14	MDEV-22189: Change error messages inside code to have mariadb instead of mysql Fix: Changed error messages, rerecorded results and changed other relevant files.	2021-05-24 11:38:13 +05:30
Monty	85d6278fed	Change replication to use uchar for all buffers instead of char This change is to get rid of randomly failing tests, especially those that reads random position of the binary log. From looking at the logs it's clear that some failures is because of a read char (with value >= 128) is converted to a big long value. Using uchar everywhere makes this much less likely to happen. Another benefit is that a lot of cast of char to uchar could be removed. Other things: - Removed some extra space before '=' and '+=' in assignments - Fixed indentations and lines > 80 characters - Replace '16' with 'element_size' (from class definition) in Gtid_list_log_event()	2021-05-19 22:54:12 +02:00
Monty	a206658b98	Change CHARSET_INFO character set and collaction names to LEX_CSTRING This change removed 68 explict strlen() calls from the code. The following renames was done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for print function could result in crashes: - charset->csname renamed to charset->cs_name - charset->name renamed to charset->coll_name Almost everything where mechanical changes except: - Changed to use the new Protocol::store(LEX_CSTRING..) when possible - Changed to use field->store(LEX_CSTRING, CHARSET_INFO) when possible - Changed to use String->append(LEX_CSTRING&) when possible Other things: - There where compiler issues with ensuring that all character set names points to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers works fine). To get around this, I declared defines for each character set name length.	2021-05-19 22:54:07 +02:00
Monty	b6ff139aa3	Reduce usage of strlen() Changes: - To detect automatic strlen() I removed the methods in String that uses 'const char ' without a length: - String::append(const char) - Binary_string(const char str) - String(const char str, CHARSET_INFO cs) - append_for_single_quote(const char ) All usage of append(const char) is changed to either use String::append(char), String::append(const char, size_t length) or String::append(LEX_CSTRING) - Added STRING_WITH_LEN() around constant string arguments to String::append() - Added overflow argument to escape_string_for_mysql() and escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow. This was needed as most usage of the above functions never tested the result for -1 and would have given wrong results or crashes in case of overflows. - Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING. Changed all Item_func::func_name()'s to func_name_cstring()'s. The old Item_func_or_sum::func_name() is now an inline function that returns func_name_cstring().str. - Changed Item::mode_name() and Item::func_name_ext() to return LEX_CSTRING. - Changed for some functions the name argument from const char * to to const LEX_CSTRING &: - Item::Item_func_fix_attributes() - Item::check_type_...() - Type_std_attributes::agg_item_collations() - Type_std_attributes::agg_item_set_converter() - Type_std_attributes::agg_arg_charsets...() - Type_handler_hybrid_field_type::aggregate_for_result() - Type_handler_geometry::check_type_geom_or_binary() - Type_handler::Item_func_or_sum_illegal_param() - Predicant_to_list_comparator::add_value_skip_null() - Predicant_to_list_comparator::add_value() - cmp_item_row::prepare_comparators() - cmp_item_row::aggregate_row_elements_for_comparison() - Cursor_ref::print_func() - Removes String_space() as it was only used in one cases and that could be simplified to not use String_space(), thanks to the fixed my_vsnprintf(). - Added some const LEX_CSTRING's for common strings: - NULL_clex_str, DATA_clex_str, INDEX_clex_str. - Changed primary_key_name to a LEX_CSTRING - Renamed String::set_quick() to String::set_buffer_if_not_allocated() to clarify what the function really does. - Rename of protocol function: bool store(const char from, CHARSET_INFO cs) to bool store_string_or_null(const char from, CHARSET_INFO cs). This was done to both clarify the difference between this 'store' function and also to make it easier to find unoptimal usage of store() calls. - Added Protocol::store(const LEX_CSTRING, CHARSET_INFO) - Changed some 'const char' arrays to instead be of type LEX_CSTRING. - class Item_func_units now used LEX_CSTRING for name. Other things: - Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character in the prompt would cause some part of the prompt to be duplicated. - Fixed a lot of instances where the length of the argument to append is known or easily obtain but was not used. - Removed some not needed 'virtual' definition for functions that was inherited from the parent. I added override to these. - Fixed Ordered_key::print() to preallocate needed buffer. Old code could case memory overruns. - Simplified some loops when adding char to a String with delimiters.	2021-05-19 22:27:48 +02:00
Marko Mäkelä	ca3f497564	Merge 10.2 into 10.3, except MDEV-25682	2021-05-18 08:40:19 +03:00
Sachin Kumar	355dc74b76	MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL Problem:- When we issue FTWRL with shutdown in parallel, there is race between FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool) before FTWRL can lock it. So we can get crash on FTWRL thread Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for FTWRL thread to complete its work , and then destroy. So slave_prepare_for_shutdown will just deactivate the pool, and mutex is destroyed later in end_slave()	2021-05-14 11:49:46 +01:00
Andrei Elkin	3616640a31	MDEV-20821 parallel slave server shutdown hang Parallel slave server shutdown found to be hanging in close_connections() triggered by shutdown due to a slave worker thread would not be notified to exit in case the worker was sitting idle. Fixed with destroying the worker pool earlier that is in slave_prepare_for_shutdown() when all their driver threads have already left. A test file is added to simulate the bug condition as well as check multi-sourced and not-idle worker cases.	2021-05-14 11:49:26 +01:00
Sujatha	70642871bc	MDEV-16437: merge 5.7 P_S replication instrumentation and tables Merge 'replication_applier_status_by_coordinator' table. This table captures SQL_THREAD status in case of both single threaded and multi threaded slave configuration. When multi_source replication is enabled this table will display each source specific SQL_THREAD status. Added new columns for: - LAST_SEEN_TRANSACTION - LAST_TRANS_RETRY_COUNT	2021-04-16 09:02:00 +05:30
Marko Mäkelä	94b4578704	Merge 10.5 into 10.6	2021-02-17 19:39:05 +02:00
Sergei Golubchik	25d9d2e37f	Merge branch 'bb-10.4-release' into bb-10.5-release	2021-02-15 16:43:15 +01:00
Sergei Golubchik	eac8341df4	MDEV-23328 Server hang due to Galera lock conflict resolution adaptation of `29bbcac0ee` for 10.4	2021-02-12 18:17:06 +01:00
Sergei Golubchik	9703cffa8c	don't take mutexes conditionally	2021-02-12 18:14:20 +01:00
Sergei Golubchik	00a313ecf3	Merge branch 'bb-10.3-release' into bb-10.4-release Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution" was null-merged. 10.4 version of the fix is coming up separately	2021-02-12 17:44:22 +01:00
Sergei Golubchik	60ea09eae6	Merge branch '10.2' into 10.3	2021-02-01 13:49:33 +01:00
Sergei Golubchik	6a1cb449fe	cleanup: remove slave background thread, use handle_manager thread instead	2021-01-24 11:35:55 +01:00
Daniel Black	29d9897fe2	MDEV-10272: add master host/port info to slave thread exit messages Sample log error message generated: 2021-01-21 2:33:24 139912137520896 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 369 33:24 139912137520896 [Note] master was 127.0.0.1:16400 2021-01-21 2:33:24 139912137828096 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 369 2021-01-21 2:33:24 139912137828096 [Note] master was 127.0.0.1:16400 Based on work by Hartmut Holzgraefe. Reviewer: knielsen@knielsen-hq.org, Andrei, Sachin	2021-01-22 10:06:33 +11:00
Hartmut Holzgraefe	fa14c423cd	MDEV-10271: add master host/port info to slave thread exit messages Sample log error message generated: mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 8 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329, master: 127.0.0.1:16000 mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 7 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329, master 127.0.0.1:16000 mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 12 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329; GTID position '', master: 127.0.0.1:16000 Reviewer: knielsen@knielsen-hq.org, Andrei and Sachin	2021-01-21 13:03:54 +11:00
Oleksandr Byelkin	10aa576483	Merge branch '10.5' into 10.6	2020-11-14 20:05:35 +01:00
Marko Mäkelä	d7a5824899	Merge 10.4 into 10.5	2020-11-13 21:54:21 +02:00
Sujatha	b2029c0300	Merge branch '10.3' into 10.4	2020-11-12 15:39:02 +05:30
Sujatha	bafb011a82	Merge branch '10.2' into 10.3	2020-11-12 14:10:05 +05:30
Sujatha	984a06db2c	MDEV-4633: multi_source.simple test fails sporadically Analysis: ======== Writes to 'rli->log_space_total' needs to be synchronized, otherwise both SQL_THREAD and IO_THREAD can try to modify the variable simultaneously resulting in incorrect rli->log_space_total. In the current test scenario SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log' and IO_THREAD is trying to increment the 'rli->log_space_total' in 'queue_event' simultaneously. Hence test occasionally fails with result mismatch. Fix: === Convert 'rli->log_space_total' variable to atomic type.	2020-11-12 13:04:39 +05:30
Oleksandr Byelkin	5edf3e0388	Merge branch '10.5' into 10.6	2020-09-02 14:36:14 +02:00
Alexander Barkov	e96f66b93d	MDEV-23270 Remove a String parameter from Protocol::store(double/float)	2020-08-14 09:14:07 +04:00
Marko Mäkelä	9a7948e3f6	Merge 10.5 into 10.6	2020-08-04 07:55:16 +03:00
Marko Mäkelä	50a11f396a	Merge 10.4 into 10.5	2020-08-01 14:42:51 +03:00
Marko Mäkelä	9216114ce7	Merge 10.3 into 10.4	2020-07-31 18:09:08 +03:00
Marko Mäkelä	66ec3a770f	Merge 10.2 into 10.3	2020-07-31 13:51:28 +03:00
Sujatha	b3dd95e035	MDEV-14203: rpl.rpl_extra_col_master_myisam, rpl.rpl_slave_load_tmpdir_not_exist failed in buildbot with a warning Problem: ======= rpl.rpl_slave_load_tmpdir_not_exist 'stmt' w3 [ fail ] Found warnings/errors in server log file! Test ended at 2017-09-27 20:34:55 [Warning] Master is configured to log replication events with checksum, but will not send such events to slaves that cannot process them ^ Found warnings in /mnt/buildbot/build/mariadb-10.2.10/mysql-test/var/3/log/mysqld.1.err ok Analysis: ======== When slave tries to connect to master 'get_master_version_and_clock' function is invoked to perform elaborated slave-master handshake. During this process slave server queries master server, to know if it is checksum aware and at the same time master is notified about its CRC-awareness. The master's side instant value of @@global.binlog_checksum is stored in the dump thread's uservar area as well as cached locally to become known in consensus by master and slave. Post hand-shake slave requests master for binlog dump. It sends 'COM_BINLOG_DUMP'. This command is sent to master by 'cli_advanced_command' call. If there is some temporary network failure during this request_dump call, 'end_server' is invoked to close the current connection between master and slave. Upon connection close the dump thread on the master gets terminated and it clears the 'uservar' data it got through master-slave handshake. The 'COM_BINLOG_DUMP' command is sent once again without master-slave handshake. Since the checksum data is not available with new dump thread a warning gets reported. Fix: === Upon network write error donot attempt reconnect, proceed to master-slave handshake. This ensures that master is aware of slave's capability to use checksums.	2020-07-23 12:54:40 +05:30
Vladislav Vaintroub	b0d2a59d9a	MDEV-21612 - remove COM_MULTI from server and C/C The COM_MULTI did not take off. No connector is using it. Remove related code from server, and client. If anything it is a step simplification of already-bloated dispatch_command(), and related code.	2020-07-14 11:16:24 +02:00
Marko Mäkelä	c515b1d092	Merge 10.4 into 10.5	2020-06-18 13:58:54 +03:00
Sachin	592a10d079	MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL Problem:- When we issue FTWRL with shutdown in parallel, there is race between FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool) before FTWRL can lock it. So we can get crash on FTWRL thread Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for FTWRL thread to complete its work , and then destroy. So slave_prepare_for_shutdown will just deactivate the pool, and mutex is destroyed later in end_slave()	2020-06-17 02:22:46 +05:30
Marko Mäkelä	4a0b56f604	Merge 10.4 into 10.5	2020-05-31 10:28:59 +03:00

1 2 3 4 5 ...

2626 commits