mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-30 18:41:56 +01:00

Author	SHA1	Message	Date
Sachin Kumar	355dc74b76	MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL Problem:- When we issue FTWRL with shutdown in parallel, there is race between FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool) before FTWRL can lock it. So we can get crash on FTWRL thread Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for FTWRL thread to complete its work , and then destroy. So slave_prepare_for_shutdown will just deactivate the pool, and mutex is destroyed later in end_slave()	2021-05-14 11:49:46 +01:00
Andrei Elkin	3616640a31	MDEV-20821 parallel slave server shutdown hang Parallel slave server shutdown found to be hanging in close_connections() triggered by shutdown due to a slave worker thread would not be notified to exit in case the worker was sitting idle. Fixed with destroying the worker pool earlier that is in slave_prepare_for_shutdown() when all their driver threads have already left. A test file is added to simulate the bug condition as well as check multi-sourced and not-idle worker cases.	2021-05-14 11:49:26 +01:00
Sergei Golubchik	6a1cb449fe	cleanup: remove slave background thread, use handle_manager thread instead	2021-01-24 11:35:55 +01:00
Daniel Black	29d9897fe2	MDEV-10272: add master host/port info to slave thread exit messages Sample log error message generated: 2021-01-21 2:33:24 139912137520896 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 369 33:24 139912137520896 [Note] master was 127.0.0.1:16400 2021-01-21 2:33:24 139912137828096 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 369 2021-01-21 2:33:24 139912137828096 [Note] master was 127.0.0.1:16400 Based on work by Hartmut Holzgraefe. Reviewer: knielsen@knielsen-hq.org, Andrei, Sachin	2021-01-22 10:06:33 +11:00
Sujatha	984a06db2c	MDEV-4633: multi_source.simple test fails sporadically Analysis: ======== Writes to 'rli->log_space_total' needs to be synchronized, otherwise both SQL_THREAD and IO_THREAD can try to modify the variable simultaneously resulting in incorrect rli->log_space_total. In the current test scenario SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log' and IO_THREAD is trying to increment the 'rli->log_space_total' in 'queue_event' simultaneously. Hence test occasionally fails with result mismatch. Fix: === Convert 'rli->log_space_total' variable to atomic type.	2020-11-12 13:04:39 +05:30
Sujatha	b3dd95e035	MDEV-14203: rpl.rpl_extra_col_master_myisam, rpl.rpl_slave_load_tmpdir_not_exist failed in buildbot with a warning Problem: ======= rpl.rpl_slave_load_tmpdir_not_exist 'stmt' w3 [ fail ] Found warnings/errors in server log file! Test ended at 2017-09-27 20:34:55 [Warning] Master is configured to log replication events with checksum, but will not send such events to slaves that cannot process them ^ Found warnings in /mnt/buildbot/build/mariadb-10.2.10/mysql-test/var/3/log/mysqld.1.err ok Analysis: ======== When slave tries to connect to master 'get_master_version_and_clock' function is invoked to perform elaborated slave-master handshake. During this process slave server queries master server, to know if it is checksum aware and at the same time master is notified about its CRC-awareness. The master's side instant value of @@global.binlog_checksum is stored in the dump thread's uservar area as well as cached locally to become known in consensus by master and slave. Post hand-shake slave requests master for binlog dump. It sends 'COM_BINLOG_DUMP'. This command is sent to master by 'cli_advanced_command' call. If there is some temporary network failure during this request_dump call, 'end_server' is invoked to close the current connection between master and slave. Upon connection close the dump thread on the master gets terminated and it clears the 'uservar' data it got through master-slave handshake. The 'COM_BINLOG_DUMP' command is sent once again without master-slave handshake. Since the checksum data is not available with new dump thread a warning gets reported. Fix: === Upon network write error donot attempt reconnect, proceed to master-slave handshake. This ensures that master is aware of slave's capability to use checksums.	2020-07-23 12:54:40 +05:30
Andrei Elkin	0c1f97b3ab	MDEV-15152 Optimistic parallel slave doesnt cope well with START SLAVE UNTIL The immediate bug was caused by a failure to recognize a correct position to stop the slave applier run in optimistic parallel mode. There were the following set of issues that the analysis unveil. 1 incorrect estimate for the event binlog position passed to is_until_satisfied 2 wait for workers to complete by the driver thread did not account non-group events that could be left unprocessed and thus to mix up the last executed binlog group's file and position: the file remained old and the position related to the new rotated file 3 incorrect 'slave reached file:pos' by the parallel slave report in the error log 4 relay log UNTIL missed out the parallel slave branch in is_until_satisfied. The patch addresses all of them to simplify logics of log change notification in either the master and relay-log until case. P.1 is addressed with passing the event into is_until_satisfied() for proper analisis by the function. P.2 is fixed by changes in handle_queued_pos_update(). P.4 required removing relay-log change notification by workers. Instead the driver thread updates the notion of the current relay-log fully itself with aid of introduced bool Relay_log_info::until_relay_log_names_defer. An extra print out of the requested until file:pos is arranged with --log-warning=3.	2020-05-26 18:49:43 +03:00
Oleksandr Byelkin	ca091e6372	Merge branch '10.1' into 10.2	2020-05-02 08:44:17 +02:00
Oleksandr Byelkin	23c6fb3e62	Merge branch '5.5' into 10.1	2020-04-30 17:36:41 +02:00
Sergei Golubchik	6bb28e0bc5	Bug#29915479 RUNNING COM_REGISTER_SLAVE WITHOUT COM_BINLOG_DUMP CAN RESULTS IN SERVER EXIT in fact, in MariaDB it cannot, but it can show spurious slaves in SHOW SLAVE HOSTS. slave was registered in COM_REGISTER_SLAVE and un-registered after COM_BINLOG_DUMP. If there was no COM_BINLOG_DUMP, it would never unregister.	2020-04-30 10:13:18 +02:00
Sergei Golubchik	8c534bdeb8	cleanup: remove dbug keywords that are never used	2020-04-29 18:17:08 +02:00
Marko Mäkelä	c06845d6f0	Merge 10.1 into 10.2	2020-04-27 13:28:13 +03:00
Marko Mäkelä	6be05ceb05	MDEV-22203: WSREP_ON is unnecessarily expensive to evaluate This is a backport of the applicable part of commit `93475aff8d` and commit `2c39f69d34` from 10.4. Before 10.4 and Galera 4, WSREP_ON is a macro that points to a global Boolean variable, so it is not that expensive to evaluate, but we will add an unlikely() hint around it. WSREP_ON_NEW: Remove. This macro was introduced in commit `c863159c32` when reverting WSREP_ON to its previous definition. We replace some use of WSREP_ON with WSREP(thd), like it was done in `93475aff8d`. Note: the macro WSREP() in 10.1 is equivalent to WSREP_NNULL() in 10.4. Item_func_rand::seed_random(): Avoid invoking current_thd when WSREP is not enabled.	2020-04-27 09:40:51 +03:00
seppo	5918b17004	MDEV-21473 conflicts with async slave BF aborting (#1475 ) If async slave thread (slave SQL handler), becomes a BF victim, it may occasionally happen that rollbacker thread is used to carry out the rollback instead of the async slave thread. This can happen, if async slave thread has flagged "idle" state when BF thread tries to figure out how to kill the victim. The issue was possible to test by using a galera cluster as slave for external master, and issuing high load of conflicting writes through async replication and directly against galera cluster nodes. However, a deterministic mtr test for the "conflict window" has not yet been worked on. The fix, in this patch makes sure that async slave thread state is never set to IDLE. This prevents the rollbacker thread to intervene. The wsrep_query_state change was refactored to happen by dedicated function to make controlling the idle state change in one place.	2020-03-24 11:01:42 +02:00
seppo	4618c974e4	MDEV-21723 Async slave thread BF abort and replaying fixes (#1448 ) If async replication slave thread conflicts with cluster replication, then the async slave transaction should be BF aborted, and depending on the state of async slave transaction execution, potentially also replayed. There were problems in such BF abort implementation and the replaying was not started. This pull request contains fixes which make sure that if async slave thread is marked to abort and replay, it will complete carry out the rollback and release all locks and resources before starting the replaying. After replaying, async slave transactions is treated as successful, so the slave thread will continue as usual, handling next replication event. There is also new mtr test: galera.galera_slave_replay, which stresses both a certification failure for async slave thread and a successful BF abort followed by replaying.	2020-02-23 10:29:42 +02:00
Oleksandr Byelkin	3a1716a7e7	Merge branch '10.1' into 10.2	2020-01-20 16:15:05 +01:00
Oleksandr Byelkin	f31bf6f094	Merge branch '5.5' into 10.1	2020-01-19 12:22:12 +01:00
Markus Mäkelä	5683c113b8	Use get_ident_len in heartbeat event error messages The string doesn't appear to be null-terminated when binlog checksums are enabled. This causes a corrupt binlog name in the error message when a slave is ahead of the master.	2020-01-13 14:50:02 +02:00
Jan Lindström	c9b9eb3315	MDEV-18497 : CTAS async replication from mariadb master crashes galera nodes (#1410 ) In MariaDB 10.2 master could have been configured so that there is extra annotate events. When we peak next event type for CTAS we need to skip annotate events.	2019-12-04 11:46:37 +02:00
Oleksandr Byelkin	f8b5e147da	Merge branch '10.1' into 10.2	2019-12-03 14:45:06 +01:00
seppo	38839854b7	MDEV-19572 async slave node fails to apply MyISAM only writes (#1418 ) The problem happens when MariaDB master replicates writes for only non InnoDB tables (e.g. writes to MyISAM table(s)). Async slave node, in Galera cluster, can apply these writes successfully, but it will, in the end, write gtid position in mysql.gtid_slave_pos table. mysql.gtid_slave_pos table is InnoDB engine, and this write makes innodb handlerton part of the replicated "transaction". Note that wsrep patch identifies that write to gtid_slave_pos should not be replicated and skips appending wsrep keys for these writes. However, as InnoDB was present in the transaction, and there are replication events (for MyISAM table) in transaction cache, but there are no appended keys, wsrep raises an error, and this makes the söave thread to stop. The fix is simply to not treat it as an error if async slave tries to replicate a write set with binlog events, but no keys. We just skip wsrep replication and return successfully. This commit contains also a mtr test which forces mysql.gtid_slave_pos table isto be of InnoDB engine, and executes MyISAM only write through asyn replication. There is additional fix for declaring IO and background slave threads as non wsrep. These threads should not write anything for wsrep replication, and this is just a safeguard to make sure nothing leaks into cluster from these slave threads.	2019-11-26 08:49:50 +02:00
seppo	5c68343db7	MDEV-18497 CTAS async replication from mariadb master crashes galera nodes (#1410 ) This PR contains a mtr test for reproducing a failure with replicating create table as select statement (CTAS) through asynchronous mariadb replication to mariadb galera cluster. The problem happens when CTAS replication contains both create table statement followed by row events for populating the table. In such situation, the galera node operating as mariadb replication slave, will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and row events. This will lead all other nodes to fail for duplicate table create attempt, and crash due to this failure. PR contains also a fix, which identifies the situation when CTAS has been replicated, and makes further scan in async replication stream to see if there are following row events. The slave node will replicate either single TOI in case the CTAS table is empty, or if CTAS table contains rows, then single bundled write set with create table and row events is replicated to galera cluster. This fix should keep master server's GTID's for CTAS replication in sync with GTID's in galera cluster.	2019-11-18 15:18:00 +02:00
Sachin Setiya	27664ef29d	MDEV-20574 Position of events reported by mysqlbinlog is wrong with encrypted binlogs, SHOW BINLOG EVENTS reports the correct one. Analysis Mysqlbinlog output for encrypted binary log #Q> insert into tab1 values (3,'row 003') #190912 17:36:35 server id 10221 end_log_pos 980 CRC32 0x53bcb3d3 Table_map: `test`.`tab1` mapped to number 19 # at 940 #190912 17:36:35 server id 10221 end_log_pos 1026 CRC32 0xf2ae5136 Write_rows: table id 19 flags: STMT_END_F Here we can see Table_map_log_event ends at 980 but Next event starts at 940. And the reason for that is we do not send START_ENCRYPTION_EVENT to the slave Solution:- Send Start_encryption_log_event as Ignorable_log_event to slave(mysqlbinlog), So that mysqlbinlog can update its log_pos. Since Slave can request multiple FORMAT_DESCRIPTION_EVENT while master does not have so We only update slave master pos when master actually have the FORMAT_DESCRIPTION_EVENT. Similar logic should be applied for START_ENCRYPTION_EVENT. Also added the test case when new server reads the data from old server which does not send START_ENCRYPTION_EVENT to slave. Master Slave Upgrade Scenario. When Slave is updated first, Slave will have extra logic of handling START_ENCRYPTION_EVENT But master willnot be sending START_ENCRYPTION_EVENT. So there will be no issue. When Master is updated first, It will send START_ENCRYPTION_EVENT to slave , But slave will ignore this event in queue_event.	2019-10-08 14:35:34 +05:30
Eugene Kosov	d36c107a6b	imporve clang build cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about deprecated `register` keyword. Too much warnings came from Mroonga and I gave up on it.	2019-06-25 13:21:36 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	cb248f8806	Merge branch '5.5' into 10.1	2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru	5543b75550	Update FSF Address * Update wrong zip-code	2019-05-11 21:29:06 +03:00
Sujatha Sivakumar	e42192d7b3	MDEV-13895: GTID and Master_Delay causes excessive initial delay Problem: ======== When attempting to delay a Slave attached with GTID, there appears to be an extra delay applied initially. For example, this output reflects a Slave that is already delayed by 43200 seconds. When switching to GTID replication, replication is paused until SQL_Remaining_Delay counts down to 0: CHANGE MASTER TO master_use_gtid=current_pos; CHANGE MASTER TO MASTER_DELAY=43200; Seconds_Behind_Master: 44847 Using_Gtid: Current_Pos SQL_Delay: 43200 SQL_Remaining_Delay: 43089 Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event Analysis: ========= When slave initiates a GTID based connection request to master, the master sends two GTID_LIST events. The first one is actual GTID_LIST event and the second one is a fake GTID_LIST event. This is sent by master to provide its current binlary log file position. The fake GTID_LIST events will have their ev->when=0. 'when' (the timestamp) is set to 0 so that slave could distinguish between real and fake Rotate events. On slave side when MASTER_DELAY is configured to "X" the applier will ensure that there is a time delay of "X" seconds before the event is applied. General behaviour of MASTER_DELAY example:- Master timestamp of event e1=10 timestamp of event e2=11 On slave MASTER_DELAY=5 Event e1 will be applied at = 15 e2 will be applied at =16 In bug scenario:- On Master: With GTIDs timestamp of event e1=10 timestamp of event e2=0 On Slave: e1 will be applied at = 10 + 5 =15 For e2, since "e2->when=0" e2->when is set to current timestamp. i.e since the e2->when and current timestamp on slave is the same applier waits for additional master_delay=5 seconds. the ev->when contributes to "rli->last_master_timestamp". rli->last_master_timestamp= ev->when + (time_t) ev->exec_time; Fake events should not update the "ev->when" to "current timestamp" on slave. Fix: === Remove the assignment of current timestamp to "ev->when" when "ev->when=0".	2019-03-28 20:35:39 +05:30
Andrei Elkin	c565622c6c	MDEV-14528 followup. There was a failure in rpl_delayed_slave after recent MDEV-14528 commit. The parallel applier should not set its Relay_log::last_master_timestamp from Format-descriptor log event. The latter may reflect a deep past so Seconds-behind-master will be computed through it and displayed all time while the first possibly "slow" group of events is executed. The main MDEV-14528 is refined, rpl_delayed_slave now passes also in the parallel mode.	2018-11-07 21:13:50 +02:00
Igor Mazur	54b8856b87	MDEV-14528 Track master timestamp in case rolling back to serial replication When replicated events are from Master unaware of MariaDB GTID their handling by the Parallel slave misses Seconds_Behind_Master updating. In the bug condition the Show-Slave-Status' field remains unchanged. Because in such case event execution is sequential the bug is fixed with deploying the same logics as in the explicit single-threaded mode with is to set Relay_log_event::last_master_timestamp member early at the end of event reading from the relay log.	2018-11-06 21:11:49 +02:00
Sergei Golubchik	b942aa34c1	Merge branch '10.1' into 10.2	2018-06-21 23:47:39 +02:00
Vicențiu Ciorbaru	6e55236c0a	Merge branch '10.0-galera' into 10.1	2018-06-12 19:39:37 +03:00
Sergei Golubchik	9b1824dcd2	Merge branch '10.1' into 10.2	2018-05-10 13:01:42 +02:00
Jan Lindström	648cf7176c	Merge remote-tracking branch 'origin/5.5-galera' into 10.0-galera	2018-05-07 13:49:14 +03:00
sjaakola	2f0b8f3e02	MDEV-16005 sporadic failures with galera tests MW-328B and MW-328C These test can sporadically show mutex deadlock warnings between LOCK_wsrep_thd and LOCK_thd_data mutexes. This means that these mutexes can be locked in opposite order by different threads, and thus result in deadlock situation. To fix such issue, the locking policy of these mutexes should be revised and enforced to be uniform. However, a quick code review shows that the number of lock/unlock operations for these mutexes combined is between 100-200, and all these mutex invocations should be checked/fixed. On the other hand, it turns out that LOCK_wsrep_thd is used for protecting access to wsrep variables of THD (wsrep_conflict_state, wsrep_query_state), whereas LOCK_thd_data protects query, db and mysys_var variables in THD. Extending LOCK_thd_data to protect also wsrep variables looks like a viable solution, as there should not be a use case where separate threads need simultaneous access to wsrep variables and THD data variables. In this commit LOCK_wsrep_thd mutex is refactored to be replaced by LOCK_thd_data. By bluntly replacing LOCK_wsrep_thd by LOCK_thd_data, will result in double locking of LOCK_thd_data, and some adjustements have been performed to fix such situations.	2018-04-24 16:57:39 +03:00
Marko Mäkelä	39a4985520	Remove most 'register' use in C++ Modern compilers (such as GCC 8) emit warnings that the 'register' keyword is deprecated and not valid C++17. Let us remove most use of the 'register' keyword. Code in 'extra/' is not touched.	2018-04-24 12:48:27 +03:00
Vladislav Vaintroub	84e8e07e8e	Revert "Fix 2 more VS2015 warnings" This reverts commit `0ea45725d8`, since it breaks clang	2018-02-19 16:38:21 +00:00
Vladislav Vaintroub	0ea45725d8	Fix 2 more VS2015 warnings	2018-02-19 15:22:24 +00:00
Monty	5e0b13d173	Fixed wrong arguments to printf and related functions Other things, mainly to get create_mysqld_error_find_printf_error tool to work: - Added protection to not include mysqld_error.h twice - Include "unireg.h" instead of "mysqld_error.h" in server - Added protection if ER_XX messages are already defined - Removed wrong calls to my_error(ER_OUTOFMEMORY) as my_malloc() and my_alloc will do this automatically - Added missing %s to ER_DUP_QUERY_NAME - Removed old and wrong calls to my_strerror() when using MY_ERROR_ON_RENAME (wrong merge) - Fixed deadlock error message from Galera. Before the extra information given to ER_LOCK_DEADLOCK was missing because ER_LOCK_DEADLOCK doesn't provide any extra information. I kept #ifdef mysqld_error_find_printf_error_used in sql_acl.h to make it easy to do this kind of check again in the future	2018-01-04 16:24:09 +02:00
Sachin Setiya	3cecb1bab3	Merge tag 'mariadb-10.0.33' into bb-10.0-galera	2017-11-03 12:34:05 +05:30
Sergei Golubchik	e0a1c745ec	Merge branch '10.1' into 10.2	2017-10-24 14:53:18 +02:00
Sergei Golubchik	9d2e2d7533	Merge branch '10.0' into 10.1	2017-10-22 13:03:41 +02:00
Alexey Yurchenko	86d31ce9f1	MW-384 protect access to wsrep_ready variable with mutex	2017-10-19 09:34:09 +03:00
Jan Lindström	8da6b4ef52	Merge tag 'mariadb-5.5.58' into 5.5-galera	2017-10-19 09:06:17 +03:00
Sergei Golubchik	da4503e956	Merge branch '5.5' into 10.0	2017-10-18 15:14:39 +02:00
Sergei Golubchik	df5f25fa7a	Merge branch 'mysql/5.5' into 5.5	2017-10-17 10:18:17 +02:00
Vladislav Vaintroub	7354dc6773	MDEV-13384 - misc Windows warnings fixed	2017-09-28 17:20:46 +00:00
Vladislav Vaintroub	eba44874ca	MDEV-13844 : Fix Windows warnings. Fix DBUG_PRINT. - Fix win64 pointer truncation warnings (usually coming from misusing 0x%lx and long cast in DBUG) - Also fix printf-format warnings Make the above mentioned warnings fatal. - fix pthread_join on Windows to set return value.	2017-09-28 17:20:46 +00:00
Venkatesh Duggirala	d75f8a1742	Bug#24763131 LOCAL-INFILE DEFAULT SHOULD BE DISABLED Problem & Analysis: Slave's Receiver thread, Applier thread and worker threads are created with LOCAL-INFILE option enabled. As the document says https://dev.mysql.com/doc/refman/5.7/en/load-data-local.html, there are some issues if a thread enables local infile. This flag should be enabled with care. But for the above mentioned internal threads, server is enabling it at the time of creation. Fix: Further analysis on the code shows that none of threads really need this flag to be enabled at any time as Slave never executes "LOAD DATA LOCAL INFILE" after reading it from Relay log. Applier thread removes "LOCAL" before start executing the query.	2017-08-23 09:16:12 +05:30
Sergei Golubchik	cb1e76e4de	Merge branch '10.1' into 10.2	2017-08-17 11:38:34 +02:00

1 2 3 4 5 ...

2479 commits