mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-27 09:14:17 +01:00

Author	SHA1	Message	Date
Oleksandr Byelkin	f00711bba2	Merge branch '10.5' into 10.6	2024-10-29 14:20:03 +01:00
Monty	bddbef3573	MDEV-34533 asan error about stack overflow when writing record in Aria The problem was that when using clang + asan, we do not get a correct value for the thread stack as some local variables are not allocated at the normal stack. It looks like that for example clang 18.1.3, when compiling with -O2 -fsanitize=addressan it puts local variables and things allocated by alloca() in other areas than on the stack. The following code shows the issue Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection (connect=0x5080000027b8, put_in_cache=<optimized out>) at sql/sql_connect.cc:1399 THD thd; 1399 thd->thread_stack= (char) &thd; (gdb) p &thd (THD *) 0x7fffedee7060 (gdb) p $sp (void ) 0x7fffef4e7bc0 The address of thd is 24M away from the stack pointer (gdb) info reg ... rsp 0x7fffef4e7bc0 0x7fffef4e7bc0 ... r13 0x7fffedee7060 140737185214560 r13 is pointing to the address of the thd. Probably some kind of "local stack" used by the sanitizer I have verified this with gdb on a recursive call that calls alloca() in a loop. In this case all objects was stored in a local heap, not on the stack. To solve this issue in a portable way, I have added two functions: my_get_stack_pointer() returns the address of the current stack pointer. The code is using asm instructions for intel 32/64 bit, powerpc, arm 32/64 bit and sparc 32/64 bit. Supported compilers are gcc, clang and MSVC. For MSVC 64 bit we are using _AddressOfReturnAddress() As a fallback for other compilers/arch we use the address of a local variable. my_get_stack_bounds() that will return the address of the base stack and stack size using pthread_attr_getstack() or NtCurrentTed() with fallback to using the address of a local variable and user provided stack size. Server changes are: - Moving setting of thread_stack to THD::store_globals() using my_get_stack_bounds(). - Removing setting of thd->thread_stack, except in functions that allocates a lot on the stack before calling store_globals(). When using estimates for stack start, we reduce stack_size with MY_STACK_SAFE_MARGIN (8192) to take into account the stack used before calling store_globals(). I also added a unittest, stack_allocation-t, to verify the new code. Reviewed-by: Sergei Golubchik <serg@mariadb.org>	2024-10-16 17:24:46 +03:00
Monty	567c097359	MDEV-33582 Add more warnings to be able to better diagnose network issues Warnings are added to net_server.cc when global_system_variables.log_warnings >= 4. When the above condition holds then: - All communication errors from net_serv.cc is also written to the error log. - In case of a of not being able to read or write a packet, a more detailed error is given. Other things: - Added detection of slaves that has hangup to Ack_receiver::run() - vio_close() is now first marking the socket closed before closing it. The reason for this is to ensure that the connection that gets a read error can check if the reason was that the socket was closed. - Add a new state to vio to be able to detect if vio is acive, shutdown or closed. This is used to detect if socket is closed by another thread. - Testing of the new warnings is done in rpl_get_lock.test - Suppress some of the new warnings in mtr to allow one to run some of the tests with -mysqld=--log-warnings=4. All test in the 'rpl' suite can now be run with this option. - Ensure that global.log_warnings are restored at test end in a way that allows one to use mtr --mysqld=--log-warnings=4. Reviewed-by: <serg@mariadb.org>,<brandon.nesterenko@mariadb.com>	2024-03-05 20:19:49 +02:00
Monty	1ca813bf76	Added socketpair.c as a replacement for 'pipe()' call for Windows. This was needed to get semisync to work on Windows.	2024-01-23 13:03:11 +02:00
Michael Widenius	7af50e4df4	MDEV-32551: "Read semi-sync reply magic number error" warnings on master rpl_semi_sync_slave_enabled_consistent.test and the first part of the commit message comes from Brandon Nesterenko. A test to show how to induce the "Read semi-sync reply magic number error" message on a primary. In short, if semi-sync is turned on during the hand-shake process between a primary and replica, but later a user negates the rpl_semi_sync_slave_enabled variable while the replica's IO thread is running; if the io thread exits, the replica can skip a necessary call to kill_connection() in repl_semisync_slave.slave_stop() due to its reliance on a global variable. Then, the replica will send a COM_QUIT packet to the primary on an active semi-sync connection, causing the magic number error. The test in this patch exits the IO thread by forcing an error; though note a call to STOP SLAVE could also do this, but it ends up needing more synchronization. That is, the STOP SLAVE command also tries to kill the VIO of the replica, which makes a race with the IO thread to try and send the COM_QUIT before this happens (which would need more debug_sync to get around). See THD::awake_no_mutex for details as to the killing of the replica’s vio. Notes: - The MariaDB documentation does not make it clear that when one enables semi-sync replication it does not matter if one enables it first in the master or slave. Any order works. Changes done: - The rpl_semi_sync_slave_enabled variable is now a default value for when semisync is started. The variable does not anymore affect semisync if it is already running. This fixes the original reported bug. Internally we now use repl_semisync_slave.get_slave_enabled() instead of rpl_semi_sync_slave_enabled. To check if semisync is active on should check the @@rpl_semi_sync_slave_status variable (as before). - The semisync protocol conflicts in the way that the original MySQL/MariaDB client-server protocol was designed (client-server send and reply packets are strictly ordered and includes a packet number to allow one to check if a packet is lost). When using semi-sync the master and slave can send packets at 'any time', so packet numbering does not work. The 'solution' has been that each communication starts with packet number 1, but in some cases there is still a chance that the packet number check can fail. Fixed by adding a flag (pkt_nr_can_be_reset) in the NET struct that one can use to signal that packet number checking should not be done. This is flag is set when semi-sync is used. - Added Master_info::semi_sync_reply_enabled to allow one to configure some slaves with semisync and other other slaves without semisync. Removed global variable semi_sync_need_reply that would not work with multi-master. - Repl_semi_sync_master::report_reply_packet() can now recognize the COM_QUIT packet from semisync slave and not give a "Read semi-sync reply magic number error" error for this case. The slave will be removed from the Ack listener. - On Windows, don't stop semisync Ack listener just because one slave connection is using socket_id > FD_SETSIZE. - Removed busy loop in Ack_receiver::run() by using "Self-pipe trick" to signal new slave and stop Ack_receiver. - Changed some Repl_semi_sync_slave functions that always returns 0 from int to void. - Added Repl_semi_sync_slave::slave_reconnect(). - Removed dummy_function Repl_semi_sync_slave::reset_slave(). - Removed some duplicate semisync notes from the error log. - Add test of "if (get_slave_enabled() && semi_sync_need_reply)" before calling Repl_semi_sync_slave::slave_reply(). (Speeds up the code as we can skip all initializations). - If epl_semisync_slave.slave_reply() fails, we disable semisync for that connection. - We do not call semisync.switch_off() if there are no active slaves. Instead we check in Repl_semi_sync_master::commit_trx() if there are no active threads. This simplices the code. - Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is flushed in case of asserts. - Removed the internal rpl_semi_sync_slave_status as it is not needed anymore. The @@rpl_semi_sync_slave_status status variable is now mapped to rpl_semi_sync_enabled. - Removed rpl_semi_sync_slave_enabled as it is not needed anymore. Repl_semi_sync_slave::get_slave_enabled() contains the active status. - Added checking that we do not add a slave twice with Ack_receiver::add_slave(). This could happen with old code. - Removed Repl_semi_sync_master::check_and_switch() as it is not needed anymore. - Ensure that when we call Ack_receiver::remove_slave() that the slave is removed from the listener before function returns. - Call listener.listen_on_sockets() outside of mutex for better performance and less contested mutex. - Ensure that listening is ignoring newly added slaves when checking for responses. - Fixed the master ack_receiver listener is not killed if there are no connected slaves (and thus stop semisync handling of future connections). This could happen if all slaves sockets where would be marked as unreliable. - Added unlink() to base_ilist_iterator and remove() to I_List_iterator. This enables us to remove 'dead' slaves in Ack_recever::run(). - kill_zombie_dump_threads() now does killing of dump threads properly. - It can now kill several threads (should be impossible but could happen if IO slaves reconnects very fast). - We now wait until the dump thread is done before starting the dump. - Added an error if kill_zombie_dump_threads() fails. - Set thd->variables.server_id before calling kill_zombie_dump_threads(). This simplies the code. - Added a lot of comments both in code and tests. - Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used. Test changes: - rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with semisync enabled. - Some timings changed slight with startup of slave which caused rpl_binlog_dump_slave_gtid_state_info.text to fail as it checked the error log file before the slave had started properly. Fixed by adding wait_for_pattern_in_file.inc that allows waiting for the pattern to appear in the log file. - Tests have been updated so that we first set rpl_semi_sync_master_enabled on the master and then set rpl_semi_sync_slave_enabled on the slaves (this is according to how the MariaDB documentation document how to setup semi-sync). - Error text "Master server does not have semi-sync enabled" has been replaced with "Master server does not support semi-sync" for the case when the master supports semi-sync but semi-sync is not enabled. Other things: - Some trivial cleanups in Repl_semi_sync_master::update_sync_header(). - We should in 11.3 changed the default value for rpl-semi-sync-master-wait-no-slave from TRUE to FALSE as the TRUE does not make much sense as default. The main difference with using FALSE is that we do not wait for semisync Ack if there are no slave threads. In the case of TRUE we wait once, which did not bring any notable benefits except slower startup of master configured for using semisync. Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com> This solves the problem reported in MDEV-32960 where a new slave may not be registered in time and the master disables semi sync because of that.	2024-01-23 13:03:11 +02:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Andrei	9c43343213	MDEV-32365 detailize the semisync replication magic number error Semisync ack (master side) receiver thread is made to report details of faced errors. In case of 'magic byte' error, a hexdump of the received packet is always (level) NOTEd into the error log. In other cases an exact server level error is print out as a warning (as it may not be critical) under log_warnings > 2. An MTR test added for the magic byte error. For others existing mtr tests cover that, provided log_warnings > 2 is set.	2023-10-26 20:24:44 +03:00
Marko Mäkelä	559efad44e	Merge 10.4 into 10.5	2021-04-27 09:10:47 +03:00
Marko Mäkelä	90a306a7ab	Merge 10.3 into 10.4	2021-04-27 08:53:50 +03:00
Sujatha	391f1aa6ee	MDEV-24773: slave_compressed_protocol doesn't work properly with semi-sync replication Back port upstream fix commit 1800b015a1d487330f7b15f2020b887be348a66b Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com> Date: Fri Sep 8 20:29:22 2017 +0530 Bug#26027024 SLAVE_COMPRESSED_PROTOCOL DOESN'T WORK WITH SEMI-SYNC REPLICATION IN MYSQL-5.7 Analysis: In mysql-5.6, dump thread (the thread that is created on Master after Slave requested for a binlog dump) is also used to receive acknowledgements from the Slave and act on them accordingly. For performance reasons, a special thread called Ack Receiver thread is added in mysql-5.7 Semi synchronous replication plugin. This thread does not have special handling to receive acknowledgements if Slave has enabled compression in the protocol. Hence Master is unable to handle any slave if Slave_compressed_protocol is enabled on it. Fix: Enable compress flag on the communication channels if the Slave has Slave_compressed_protocol ON.	2021-04-26 11:09:39 +05:30
Sergei Golubchik	7af733a5a2	perfschema compilation, test and misc fixes	2020-03-10 19:24:23 +01:00
Marko Mäkelä	d3dcec5d65	Merge 10.3 into 10.4	2019-05-05 15:06:44 +03:00
Sergey Vojtovich	779fb636da	Revert THD::THD(skip_global_sys_var_lock) argument Originally introduced by `e972125f1` to avoid harmless wait for LOCK_global_system_variables in a newly created thread, which creation was initiated by system variable update. At the same time it opens dangerous hole, when system variable update thread already released LOCK_global_system_variables and ack_receiver thread haven't yet completed new THD construction. In this case THD constructor goes completely unprotected. Since ack_receiver.stop() waits for the thread to go down, we have to temporarily release LOCK_global_system_variables so that it doesn't deadlock with ack_receiver.run(). Unfortunately it breaks atomicity of rpl_semi_sync_master_enabled updates and makes them not serialized. LOCK_rpl_semi_sync_master_enabled was introduced to workaround the above. TODO: move ack_receiver start/stop into repl_semisync_master enable_master/disable_master under LOCK_binlog protection? Part of MDEV-14984 - regression in connect performance	2019-05-03 16:46:11 +04:00
Sergey Vojtovich	9824ec81aa	Removed redundant service_thread_count In contrast to thread_count, which is decremented by THD destructor, this one was most probably intended to be decremented after all THD destructors are done. THD_count class was added to achieve similar effect with thread_count. Aim is to reduce usage of LOCK_thread_count and COND_thread_count. Part of MDEV-15135.	2019-01-28 17:39:08 +04:00
Andrei Elkin	573c4db57a	MDEV-17437 followup: fixing compilation on non-HAVE_POLL.	2018-11-13 14:54:33 +02:00
Andrei Elkin	6db773a542	MDEV-17437 Semisync master fires invalid fd value assert The semisync ack collector hits fd's out-of-bound value assert through a stack of /usr/sbin/mysqld(_ZN12Ack_receiver17get_slave_socketsEP6fd_setPj+0x70)[0x7fa3bbe27400] /usr/sbin/mysqld(_ZN12Ack_receiver3runEv+0x540)[0x7fa3bbe27980] /usr/sbin/mysqld(ack_receive_handler+0x19)[0x7fa3bbe27a79] The reason of the failure must be the same as in https://bugs.mysql.com/bug.php?id=79865 whose fixes are applied with minor changes. Specifically, the semisync ack thread is changed to use poll() instead of select() on platforms where the former is defined. On the systems that still use select(), Ack receive thread will generate an error and semi sync will be switched off. Windows systems is exception case because on windows this limitation does not exists. The sustain manual testing with `mysqlslap --concurrency > 1024' in "background" while the slave io thread is restarting multiple times.	2018-11-13 10:29:18 +02:00
Sergei Golubchik	08b91548db	MDEV-16495 mariadb segfaults at start on FreeBSD don't use MY_MUTEX_INIT_FAST in constructors of statically allocated objects.	2018-08-12 11:37:42 +02:00
Vladislav Vaintroub	6c279ad6a7	MDEV-15091 : Windows, 64bit: reenable and fix warning C4267 (conversion from 'size_t' to 'type', possible loss of data) Handle string length as size_t, consistently (almost always:)) Change function prototypes to accept size_t, where in the past ulong or uint were used. change local/member variables to size_t when appropriate. This fix excludes rocksdb, spider,spider, sphinx and connect for now.	2018-02-06 12:55:58 +00:00
Aleksey Midenkov	c59c1a0736	System Versioning 1.0 pre8 Merge branch '10.3' into trunk	2018-01-10 12:36:55 +03:00
Sergei Golubchik	e577b5667a	fix compilation w/o P_S	2018-01-09 14:21:08 +03:00
Vladislav Vaintroub	a603b46593	Fix warnings	2018-01-07 11:37:38 +00:00
Andrei Elkin	f279d3c43a	MDEV-13073. This part converts the Ali patch`s identifiers to the MariaDB standard. Also some renaming is done as well as white spaces removal.	2017-12-18 13:43:38 +02:00
Andrei Elkin	6a84e3407d	MDEV-13073. This patch replaces semisync's native function_enter,exit and its custom trace faciltiy with standard DBUG_ based equivalents.	2017-12-18 13:43:38 +02:00
Andrei Elkin	e972125f11	MDEV-13073 This part merges the Ali semisync related changes and specifically the ack receiving functionality. Semisync is turned to be static instead of plugin so its functions are invoked at the same points as RUN_HOOKS. The RUN_HOOKS and the observer interface remain to be removed by later patch. Todo: React on killed status by repl_semisync_master.wait_after_sync(). Currently Repl_semi_sync_master::commit_trx does not check the killed status. There were few bugfixes found that are present in mysql and its unclear whether/how they are covered. Those include: Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL HAS BAD PERFORMANCE Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS	2017-12-18 13:43:37 +02:00

24 commits