mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 03:52:35 +01:00

Author	SHA1	Message	Date
ParadoxV5	d5f16d6305	Extract some of #3360 fixes to 10.6.x That PR uncovered countless issues on `my_snprintf` uses. This commit backports a squashed subset of their fixes (excludes #3485).	2024-11-18 13:29:04 +11:00
Brandon Nesterenko	1ed30e08af	MDEV-34122: Assertion `entry' failed in Active_tranx::assert_thd_is_waiter If semi-sync is switched off then on while a transaction is in-between binlogging and waiting for an ACK, the semi-sync state of the transaction is removed, leading to a debug assertion that indicates the transaction tried to wait, but cannot receive an ACK signal. More specifically, when semi-sync is switched off, the Active_tranx list is cleared (where a transaction adds an entry to this list during binlogging), and each entry in this list saves the thread which will wait for an ACK, and the thread has the COND variable to signal to wake itself. So if the entry is lost, the Ack_receiver thread won’t be able to find the thread to wake up when an ACK comes in The fix is to ensure that the entry exists before awaiting the ACK, and if there is no entry, skip the wait. In debug builds, an informative message is written explaining that the transaction is skipping its wait. Additional debug-build only logic is added to ensure that the cause of the missing entry is due to semi-sync being turned off and on Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-10-21 15:35:54 -06:00
Brandon Nesterenko	d58975bbef	MDEV-9159: Bring back assert in semisync_master.cc In 10.0 there was an assert to ensure that there were semi sync clients before removing one, but it was removed in 10.1. This patch adds the assertion back.	2024-07-03 14:10:14 -06:00
Marko Mäkelä	5ba542e9ee	Merge 10.5 into 10.6	2024-05-30 14:27:07 +03:00
Robin Newhouse	dc38d8ea80	Minimize unsafe C functions with safe_strcpy() Similar to #2480. `567b681` introduced safe_strcpy() to minimize the use of C with potentially unsafe memory overflow with strcpy() whose use is discouraged. Replace instances of strcpy() with safe_strcpy() where possible, limited here to files in the `sql/` directory. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.	2024-05-17 13:33:16 +01:00
Brandon Nesterenko	75c7c6dc39	MDEV-33551: Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency When using semi-sync replication with rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the primary can significantly reduce compared to AFTER_SYNC's performance for workloads with many concurrent users executing transactions. This is because all connections on the primary share the same cond_wait variable/mutex pair, so any time an ACK is received from a replica, all waiting connections are awoken to check if the ACK was for itself, which is done in mutual exclusion. This patch changes this such that the waiting THD will use its own local condition variable, and the ACK receiver thread only signals connections which have been ACKed for wakeup. That is, the THD::LOCK_wakeup_ready condition variable is re-used for this purpose, and the Active_tranx queue nodes are extended to hold the waiting thread, so it can be signalled once ACKed. Additionally: 1) Removed part of MDEV-11853 additions, which allowed suspended connection threads awaiting their semi-sync ACKs to live until their ACKs had been received. This part, however, wasn't needed. That is, all that was needed was for the Ack_thread to survive. So now the connection threads are killed during phase 1. Thereby THD::is_awaiting_semisync_ack, and all its related code was removed. 2) COND_binlog_send is repurposed to signal on the condition when Active_tranx is emptied during clear_active_tranx_nodes. 3) At master shutdown (when waiting for slaves), instead of the main loop individually waiting for each ACK, await_slave_reply() (renamed await_all_slave_replies()) just waits once for the repurposed COND_binlog_send to signal it is empty. 4) Test rpl_semi_sync_shutdown_await_ack is updates as following: 4.1) Added test case (adapted from Kristian Nielsen) to ensure that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR ALL SLAVES is issued, the primary will still wait for the ACK from the killed thread. 4.2) As connections which by-passed phase 1 of thread killing no longer are delayed for kill until phase 2, we can no longer query yes/no tx after receiving an ACK/timeout. The check for these variables is removed. 4.3) Comment descriptions are updated which mention that the connection is alive; and adjusted to be the Ack_thread. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-03-21 08:42:18 -06:00
Monty	4106974b9d	Fixed some compiler warnings	2024-02-12 15:06:20 +02:00
Michael Widenius	7af50e4df4	MDEV-32551: "Read semi-sync reply magic number error" warnings on master rpl_semi_sync_slave_enabled_consistent.test and the first part of the commit message comes from Brandon Nesterenko. A test to show how to induce the "Read semi-sync reply magic number error" message on a primary. In short, if semi-sync is turned on during the hand-shake process between a primary and replica, but later a user negates the rpl_semi_sync_slave_enabled variable while the replica's IO thread is running; if the io thread exits, the replica can skip a necessary call to kill_connection() in repl_semisync_slave.slave_stop() due to its reliance on a global variable. Then, the replica will send a COM_QUIT packet to the primary on an active semi-sync connection, causing the magic number error. The test in this patch exits the IO thread by forcing an error; though note a call to STOP SLAVE could also do this, but it ends up needing more synchronization. That is, the STOP SLAVE command also tries to kill the VIO of the replica, which makes a race with the IO thread to try and send the COM_QUIT before this happens (which would need more debug_sync to get around). See THD::awake_no_mutex for details as to the killing of the replica’s vio. Notes: - The MariaDB documentation does not make it clear that when one enables semi-sync replication it does not matter if one enables it first in the master or slave. Any order works. Changes done: - The rpl_semi_sync_slave_enabled variable is now a default value for when semisync is started. The variable does not anymore affect semisync if it is already running. This fixes the original reported bug. Internally we now use repl_semisync_slave.get_slave_enabled() instead of rpl_semi_sync_slave_enabled. To check if semisync is active on should check the @@rpl_semi_sync_slave_status variable (as before). - The semisync protocol conflicts in the way that the original MySQL/MariaDB client-server protocol was designed (client-server send and reply packets are strictly ordered and includes a packet number to allow one to check if a packet is lost). When using semi-sync the master and slave can send packets at 'any time', so packet numbering does not work. The 'solution' has been that each communication starts with packet number 1, but in some cases there is still a chance that the packet number check can fail. Fixed by adding a flag (pkt_nr_can_be_reset) in the NET struct that one can use to signal that packet number checking should not be done. This is flag is set when semi-sync is used. - Added Master_info::semi_sync_reply_enabled to allow one to configure some slaves with semisync and other other slaves without semisync. Removed global variable semi_sync_need_reply that would not work with multi-master. - Repl_semi_sync_master::report_reply_packet() can now recognize the COM_QUIT packet from semisync slave and not give a "Read semi-sync reply magic number error" error for this case. The slave will be removed from the Ack listener. - On Windows, don't stop semisync Ack listener just because one slave connection is using socket_id > FD_SETSIZE. - Removed busy loop in Ack_receiver::run() by using "Self-pipe trick" to signal new slave and stop Ack_receiver. - Changed some Repl_semi_sync_slave functions that always returns 0 from int to void. - Added Repl_semi_sync_slave::slave_reconnect(). - Removed dummy_function Repl_semi_sync_slave::reset_slave(). - Removed some duplicate semisync notes from the error log. - Add test of "if (get_slave_enabled() && semi_sync_need_reply)" before calling Repl_semi_sync_slave::slave_reply(). (Speeds up the code as we can skip all initializations). - If epl_semisync_slave.slave_reply() fails, we disable semisync for that connection. - We do not call semisync.switch_off() if there are no active slaves. Instead we check in Repl_semi_sync_master::commit_trx() if there are no active threads. This simplices the code. - Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is flushed in case of asserts. - Removed the internal rpl_semi_sync_slave_status as it is not needed anymore. The @@rpl_semi_sync_slave_status status variable is now mapped to rpl_semi_sync_enabled. - Removed rpl_semi_sync_slave_enabled as it is not needed anymore. Repl_semi_sync_slave::get_slave_enabled() contains the active status. - Added checking that we do not add a slave twice with Ack_receiver::add_slave(). This could happen with old code. - Removed Repl_semi_sync_master::check_and_switch() as it is not needed anymore. - Ensure that when we call Ack_receiver::remove_slave() that the slave is removed from the listener before function returns. - Call listener.listen_on_sockets() outside of mutex for better performance and less contested mutex. - Ensure that listening is ignoring newly added slaves when checking for responses. - Fixed the master ack_receiver listener is not killed if there are no connected slaves (and thus stop semisync handling of future connections). This could happen if all slaves sockets where would be marked as unreliable. - Added unlink() to base_ilist_iterator and remove() to I_List_iterator. This enables us to remove 'dead' slaves in Ack_recever::run(). - kill_zombie_dump_threads() now does killing of dump threads properly. - It can now kill several threads (should be impossible but could happen if IO slaves reconnects very fast). - We now wait until the dump thread is done before starting the dump. - Added an error if kill_zombie_dump_threads() fails. - Set thd->variables.server_id before calling kill_zombie_dump_threads(). This simplies the code. - Added a lot of comments both in code and tests. - Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used. Test changes: - rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with semisync enabled. - Some timings changed slight with startup of slave which caused rpl_binlog_dump_slave_gtid_state_info.text to fail as it checked the error log file before the slave had started properly. Fixed by adding wait_for_pattern_in_file.inc that allows waiting for the pattern to appear in the log file. - Tests have been updated so that we first set rpl_semi_sync_master_enabled on the master and then set rpl_semi_sync_slave_enabled on the slaves (this is according to how the MariaDB documentation document how to setup semi-sync). - Error text "Master server does not have semi-sync enabled" has been replaced with "Master server does not support semi-sync" for the case when the master supports semi-sync but semi-sync is not enabled. Other things: - Some trivial cleanups in Repl_semi_sync_master::update_sync_header(). - We should in 11.3 changed the default value for rpl-semi-sync-master-wait-no-slave from TRUE to FALSE as the TRUE does not make much sense as default. The main difference with using FALSE is that we do not wait for semisync Ack if there are no slave threads. In the case of TRUE we wait once, which did not bring any notable benefits except slower startup of master configured for using semisync. Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com> This solves the problem reported in MDEV-32960 where a new slave may not be registered in time and the master disables semi sync because of that.	2024-01-23 13:03:11 +02:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Andrei	9c43343213	MDEV-32365 detailize the semisync replication magic number error Semisync ack (master side) receiver thread is made to report details of faced errors. In case of 'magic byte' error, a hexdump of the received packet is always (level) NOTEd into the error log. In other cases an exact server level error is print out as a warning (as it may not be critical) under log_warnings > 2. An MTR test added for the magic byte error. For others existing mtr tests cover that, provided log_warnings > 2 is set.	2023-10-26 20:24:44 +03:00
Oleksandr Byelkin	ac5a534a4c	Merge remote-tracking branch '10.4' into 10.5	2023-03-31 21:32:41 +02:00
Anel Husakovic	c596ad734d	MDEV-30269: Remove rpl_semi_sync_[slave,master] usage in code - Description: - Before 10.3.8 semisync was a plugin that is built into the server with MDEV-13073,starting with commit `cbc71485e2`. There are still some usage of `rpl_semi_sync_master` in mtr. Note: - To recognize the replica in the `dump_thread`, replica is creating local variable `rpl_semi_sync_slave` (the keyword of plugin) in function `request_transmit`, that is catched by primary in `is_semi_sync_slave()`. This is the user variable and as such not related to the obsolete plugin. - Found in `sys_vars.all_vars` and `rpl_semi_sync_wait_point` tests, usage of plugins `rpl_semi_sync_master`, `rpl_semi_sync_slave`. The former test is disabled by default (`sys_vars/disabled.def`) and marked as `obsolete`, however this patch will remove the queries. - Add cosmetic fixes to semisync codebase Reviewer: <brandon.nesterenko@mariadb.com> Closes PR #2528, PR #2380	2023-03-23 13:39:46 +01:00
Sergei Golubchik	3a2116241b	Merge branch '10.4' into 10.5	2022-10-02 14:38:13 +02:00
Marko Mäkelä	e3fdabd501	MDEV-29613 fixup: clang -Wunused-but-set-variable	2022-09-26 15:16:51 +03:00
Sergei Golubchik	ef781162ff	Merge branch '10.4' into 10.5	2022-05-09 22:04:06 +02:00
Brandon Nesterenko	a83c7ab1ea	MDEV-11853: semisync thread can be killed after sync binlog but before ACK in the sync state Problem: ======== If a primary is shutdown during an active semi-sync connection during the period when the primary is awaiting an ACK, the primary hard kills the active communication thread and does not ensure the transaction was received by a replica. This can lead to an inconsistent replication state. Solution: ======== During shutdown, the primary should wait for an ACK or timeout before hard killing a thread which is awaiting a communication. We extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore any threads waiting for a semi-sync ACK in phase 1. Then, before stopping the ack receiver thread, the shutdown is delayed until all waiting semi-sync connections receive an ACK or time out. The connections are then killed in phase 2. Notes: 1) There remains an unresolved corner case that affects this patch. MDEV-28141: Slave crashes with Packets out of order when connecting to a shutting down master. Specifically, If a slave is connecting to a master which is actively shutting down, the slave can crash with a "Packets out of order" assertion error. To get around this issue in the MTR tests, the primary will wait a small amount of time before phase 1 killing threads to let the replicas safely stop (if applicable). 2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver Thread Can Error on COM_QUIT Reviewed By ============ Andrei Elkin <andrei.elkin@mariadb.com>	2022-04-22 12:59:54 -06:00
Marko Mäkelä	d62b0368ca	Merge 10.4 into 10.5	2022-03-29 12:59:18 +03:00
Brandon Nesterenko	32ab6219be	MDEV-25580: rpl.rpl_semi_sync_slave_compressed_protocol crashes because of wrong packet Problem: ======== When both semi-sync and slave compression are enabled, the numbering on packet headers can become out of sync between the primary and replica servers. More specifically, after the master flushes its write, it should increment the counters that track packets. The bug is such that the master only updates the normal packet counter and leaves the compressed packet counter alone. Solution: ======== After the master flushes, additionally increment the compressed packet counter. Reviewed By: ============ Andrei Elkin: <andrei.elkin@mariadb.com>	2022-03-24 07:25:22 -06:00
Monty	d1d472646d	Change THD->transaction to a pointer to enable multiple transactions All changes (except one) is of type thd->transaction. -> thd->transaction-> thd->transaction points by default to 'thd->default_transaction' This allows us to 'easily' have multiple active transactions for a THD object, like when reading data from the mysql.proc table	2020-05-23 12:29:10 +03:00
Sergei Golubchik	7c58e97bf6	perfschema memory related instrumentation changes	2020-03-10 19:24:22 +01:00
Michael Widenius	716d396bb3	Remove \n from DBUG_PRINT statements	2019-10-21 18:41:58 +03:00
Sergey Vojtovich	779fb636da	Revert THD::THD(skip_global_sys_var_lock) argument Originally introduced by `e972125f1` to avoid harmless wait for LOCK_global_system_variables in a newly created thread, which creation was initiated by system variable update. At the same time it opens dangerous hole, when system variable update thread already released LOCK_global_system_variables and ack_receiver thread haven't yet completed new THD construction. In this case THD constructor goes completely unprotected. Since ack_receiver.stop() waits for the thread to go down, we have to temporarily release LOCK_global_system_variables so that it doesn't deadlock with ack_receiver.run(). Unfortunately it breaks atomicity of rpl_semi_sync_master_enabled updates and makes them not serialized. LOCK_rpl_semi_sync_master_enabled was introduced to workaround the above. TODO: move ack_receiver start/stop into repl_semisync_master enable_master/disable_master under LOCK_binlog protection? Part of MDEV-14984 - regression in connect performance	2019-05-03 16:46:11 +04:00
Andrei Elkin	42c58b87da	MDEV-18096 The server would crash when has configs rpl_semi_sync_master_enabled = OFF rpl_semi_sync_master_wait_no_slave = OFF The patch fixes a fired assert in the semisync master module. The assert caught attempt to switch semisync off (per rpl_semi_sync_master_wait_no_slave = OFF) when it was not even initialized (per rpl_semi_sync_master_enabled = OFF). The switching-off execution branch is relocated under one that executes enable_master() first. A minor cleaup is done to remove the int return from two functions that did not return anything but an error which could not happen in the functions.	2019-04-19 18:48:16 +03:00
Monty	30ebc3ee9e	Add likely/unlikely to speed up execution Added to: - if (error) - Lex - sql_yacc.yy and sql_yacc_ora.yy - In header files to alloc() calls - Added thd argument to thd_net_is_killed()	2018-05-07 00:07:32 +03:00
Vladislav Vaintroub	6c279ad6a7	MDEV-15091 : Windows, 64bit: reenable and fix warning C4267 (conversion from 'size_t' to 'type', possible loss of data) Handle string length as size_t, consistently (almost always:)) Change function prototypes to accept size_t, where in the past ulong or uint were used. change local/member variables to size_t when appropriate. This fix excludes rocksdb, spider,spider, sphinx and connect for now.	2018-02-06 12:55:58 +00:00
Monty	a7e352b54d	Changed database, tablename and alias to be LEX_CSTRING This was done in, among other things: - thd->db and thd->db_length - TABLE_LIST tablename, db, alias and schema_name - Audit plugin database name - lex->db - All db and table names in Alter_table_ctx - st_select_lex db Other things: - Changed a lot of functions to take const LEX_CSTRING* as argument for db, table_name and alias. See init_one_table() as an example. - Changed some function arguments from LEX_CSTRING to const LEX_CSTRING - Changed some lists from LEX_STRING to LEX_CSTRING - threads_mysql.result changed because process list_db wasn't always correctly updated - New append_identifier() function that takes LEX_CSTRING* as arguments - Added new element tmp_buff to Alter_table_ctx to separate temp name handling from temporary space - Ensure we store the length after my_casedn_str() of table/db names - Removed not used version of rename_table_in_stat_tables() - Changed Natural_join_column::table_name and db_name() to never return NULL (used for print) - thd->get_db() now returns db as a printable string (thd->db.str or "")	2018-01-30 21:33:55 +02:00
Andrei Elkin	529120e1cb	MDEV-13073. This patch is a followup of the previous one to convert the trailing underscore identifier to mariadb standard. For identifier representing class private members the underscore is replaced with a `m_` prefix. Otherwise `_` is just removed.	2017-12-18 13:43:38 +02:00
Andrei Elkin	f279d3c43a	MDEV-13073. This part converts the Ali patch`s identifiers to the MariaDB standard. Also some renaming is done as well as white spaces removal.	2017-12-18 13:43:38 +02:00
Andrei Elkin	6a84e3407d	MDEV-13073. This patch replaces semisync's native function_enter,exit and its custom trace faciltiy with standard DBUG_ based equivalents.	2017-12-18 13:43:38 +02:00
Andrei Elkin	74b35b6874	MDEV-13073. This part patch weeds out RUN_HOOK from the server as semisync is defined statically. Consequently the observer interfaces are removed as well.	2017-12-18 13:43:37 +02:00
Andrei Elkin	e972125f11	MDEV-13073 This part merges the Ali semisync related changes and specifically the ack receiving functionality. Semisync is turned to be static instead of plugin so its functions are invoked at the same points as RUN_HOOKS. The RUN_HOOKS and the observer interface remain to be removed by later patch. Todo: React on killed status by repl_semisync_master.wait_after_sync(). Currently Repl_semi_sync_master::commit_trx does not check the killed status. There were few bugfixes found that are present in mysql and its unclear whether/how they are covered. Those include: Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL HAS BAD PERFORMANCE Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS	2017-12-18 13:43:37 +02:00
Monty	2e53b96a0a	Moved semisync from a plugin to normal server Part of MDEV-13073 AliSQL Optimize performance of semisync Did the following renames to match other similar variables key_ss_mutex_LOCK_binlog_ > key_LOCK_bing key_ss_cond_COND_binlog_send_ -> key_COND_binlog_send COND_binlog_send_ -> COND_binlog_send LOCK_binlog_ -> LOCK_binlog debian/mariadb-server-10.2.install does not install semisync libs.	2017-12-18 13:43:36 +02:00

32 commits