The error handling for internal 2pc transactions (e.g. RocksDB/Spider) would
incorrectly try to handle the engine binlog_unlog() during rollback, in
binlog_post_rollback(); this should instead be handled solely in
log_and_order() and unlog(). This could trigger for example during parallel
replication error handling, causing assertions when XA code paths were
wrongly entered.
Also fix a couple of bugs found during debugging:
- Don't send a format description event to the slave from before the
  starting GTID position, as that can cause the slave to wrongly drop
  temporary tables.
- When looking up the initial GTID position for a new dump thread, wait for
  the necessary part of the binlog to become durable before reading it.
- Don't error when searching for the initial GTID position if reaching EOF
  of the durable portion; instead search back to an earlier GTID state
  record.
- Fix a rare race in the test framework that could fail to kill off
  lingering dump threads before RESET MASTER.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Following rebase on latest 11.4, a few compile and test errors need
fixing. For now, these fixes are not distributed on the individual
patches in the series, but just on top here.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Whenever a record is written to the binlog, it must be entered into the
pending LSN fifo. This was missing for XA PREPARE and XA ROLLBACK. If a
prepare or rollback record was at the end of the binlog, the tablespace
close during shutdown would hang waiting for the record to be marked
durable, which never happened as it was missing from the LSN fifo.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
1. Fix the GTID lookup of a connecting slave/dump thread to not look at
parts of the binlog that are not yet durable on disk on the master. This
could cause the dump thread to read ahead of the valid durable end-point,
triggering an assertion.
2. Fix a bug in the flushing of binlog pages. The background flush thread
would incorrectly flush at most one page per pthread_cond wakeup, causing
it to fall behind and delaying binlog page flushes to disk.
3. Fix incorrect check during InnoDB recovery scan of redo log; binlog
redo records are allowed to be larger than InnoDB tablespace page size.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
At XA PREPARE, spill all events (including COMMIT end event) as OOB, and
call into the engine to binlog a PREPARE record. Store the OOB reference
along with the XID in an engine-binlog internal hash.
At XA COMMIT, fetch the OOB reference from the internal hash and put it into
a COMMIT record for the transaction.
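As a rough illustration of the XID-to-OOB-reference hash described above, a
minimal sketch might look as follows. The names (`xa_oob_hash`, `oob_ref`)
are invented for this example and an OOB reference is simplified to a
(file_no, offset) pair; the real engine-internal structures differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical sketch (invented names, not the actual InnoDB structures).
// An OOB reference here is just (file_no, offset) of the spilled event data.
struct oob_ref { uint64_t file_no; uint64_t offset; };

class xa_oob_hash {
  std::unordered_map<std::string, oob_ref> map_;
public:
  // At XA PREPARE: remember where the spilled events live.
  void record_prepare(const std::string &xid, oob_ref ref) { map_[xid] = ref; }

  // At XA COMMIT: fetch the reference so it can be put into the COMMIT
  // record, and drop the hash entry. Returns false for an unknown XID.
  bool fetch_for_commit(const std::string &xid, oob_ref *out) {
    auto it = map_.find(xid);
    if (it == map_.end()) return false;
    *out = it->second;
    map_.erase(it);
    return true;
  }
};
```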
For both user XA and internal two-phase commit between the binlog and
another storage engine, write the XID into an XA complete event in the
same mtr as the commit record. This record will later be used to
consistently recover (commit or rollback) prepared transactions in the
other engines, depending on whether the binlog write became durable
before the crash or not.
At XA ROLLBACK, merely put in an XA complete event.
Maintain reference counts for pending prepared XA transactions, and
for pending two-phase commit records, to make sure binlog files
containing these will not be purged while those transactions are
active.
Implement the necessary "unlog" mechanisms so that the reference
counts can be released only after all other participating engines have
durably committed (respectively XA prepared/rolled back) their part of
the transaction.
This commit does not handle XA/binlog crash recovery; that will come in
a later patch.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
When finding the midpoint for each step in the binary search, that
midpoint was not correctly rounded to the nearest page containing a
GTID state record (when the range from the low to the high point is an
odd multiple of innodb_binlog_state_interval bytes). This caused the
search to look at the wrong page (and assert in debug builds).
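As a rough illustration of the fix (the function name is invented; this is
not the actual MariaDB code), the midpoint can be rounded down to a multiple
of the state interval so every probe lands on a page holding a GTID state
record:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: round the binary-search midpoint down to a multiple
// of the GTID state interval, so each probe lands on a page that carries a
// GTID state record. Without the rounding, an odd-multiple-sized [lo, hi)
// range yields a midpoint that falls between two state records.
static uint64_t search_midpoint(uint64_t lo, uint64_t hi,
                                uint64_t state_interval)
{
  uint64_t mid = lo + (hi - lo) / 2;
  mid -= mid % state_interval;   // round down to a state-record boundary
  if (mid <= lo)                 // still make progress past the low end
    mid = lo + state_interval;
  return mid;
}
```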
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Fix a couple of bugs in the checks in
ha_innodb_binlog_reader::wait_available() for whether new data is
available. These could cause the reader (e.g. the binlog dump thread on the
master) to busy-loop instead of properly pthread_cond_wait()'ing for new
data to become durable and available for sending to the slave.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
In most cases, only one of the stmt and trx caches will contain data when
binlogging an event group. However, it is possible to have data in both
when using autocommit and combining non-transactional and transactional
changes in the same statement.
This patch implements handling this case, the main issue being the existence
of two independent out-of-band references. The commit record is extended to
contain up to two oob references, and the reader is extended to be able to
read both of them.
For simplicity, when this (rare) case occurs, we always spill the full
content of both caches (except the GTID event) as oob.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Improve SHOW BINLOG EVENTS FROM when the specified position does not
correspond exactly to a valid binlog record position. Scan the page
containing the requested position, and start from the first valid point at
or after that position.
If a position is specified that is past the end of data available in the
binlog file, SHOW BINLOG EVENTS returns empty.
Make the format description event written at server startup also have
end_log_pos=0, for consistency.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Fix race where trx_group_commit_leader() was accessing the group commit
queue after waking up participants, which can invalidate the queue. Instead
do the remaining operations in the individual thread for each group commit
participant.
Also fix a problem where entries could be inserted out-of-order in the
pending LSN fifo, when the queue was empty after removing a later LSN, and
then an earlier LSN got inserted. This could move back the durable binlog
offset, causing slaves to not receive events.
Seen as sporadic failures of test case
binlog_in_engine.mariabackup_slave_provision_nolock.
A few other test tweaks to make them robust to sporadic failures.
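The pending-LSN fifo ordering invariant above might be sketched as follows.
This is an invented illustration (names like `pending_lsn_fifo` are not from
the actual code): the point is that an entry at or below the already-durable
offset must be dropped on insert, so the durable offset can never move
backwards.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical sketch (invented names). The fifo must stay sorted by
// offset, and the durable offset must be monotonic, even if an entry with
// an earlier offset arrives after a later one was already removed as
// durable (the empty-queue race described above).
struct pending_entry { uint64_t offset; uint64_t lsn; };

class pending_lsn_fifo {
  std::deque<pending_entry> q_;
  uint64_t durable_offset_ = 0;
public:
  void insert(pending_entry e) {
    // Drop entries already covered by the durable point; otherwise an
    // earlier offset inserted into an empty queue could later move the
    // durable offset backwards.
    if (e.offset <= durable_offset_) return;
    q_.push_back(e);
  }
  // Called when the redo log is known durable up to durable_lsn.
  uint64_t harvest(uint64_t durable_lsn) {
    while (!q_.empty() && q_.front().lsn <= durable_lsn) {
      durable_offset_ = q_.front().offset;
      q_.pop_front();
    }
    return durable_offset_;
  }
};
```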
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Write a single format description event to the engine binlog at server
startup.
This format description event - like for the legacy binlog - informs the
slave server about the master restart. The slave uses it to drop any
temporary tables that were binlogged by the master before the restart and
are now implicitly dropped by it.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Refactor the code to use binlog_chunk_reader for reading a GTID state
record, getting rid of the duplicate logic in the old special-purpose GTID
state reading code. This also removes the assumption that the GTID state
fits in a single page (though this is untested for now).
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
When updating non-transactional tables inside a multi-statement transaction,
and binlog_direct_non_transactional_updates=1, then the non-transactional
updates are binlogged directly through the statement cache while the
transaction cache is still being added to in the main transaction.
Thus, move the engine_binlog_info out from binlog_cache_mngr and into the
individual stmt/trx binlog_cache_data, so that we can have separate
engine_binlog_info active for the statement and the transaction cache.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Support for SAVEPOINT, ROLLBACK TO SAVEPOINT, rolling back a failed
statement (keeping active transaction), and rolling back transaction.
For savepoints (and start-of-statement), if the binlog data to be rolled
back is still in the in-memory part of the trx cache, we can simply
truncate the cache to that point.
But if we need to spill cache contents as out-of-band data containing one
or more savepoint/start-of-statement points, then split the spill at each
point and inform the engine of the savepoints.
In InnoDB, at savepoint set, save the state of the forest of perfect binary
trees being built. Then at rollback, restore the appropriate state.
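The in-memory truncation case above might be sketched like this (a
hypothetical illustration with invented names such as `trx_cache`; the
spilled/out-of-band case is deliberately not handled here):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch (invented names): while the trx cache is still fully
// in memory, setting a savepoint just remembers the current cache length,
// and rollback-to-savepoint truncates the cache back to that length.
class trx_cache {
  std::vector<uint8_t> buf_;
  std::map<std::string, size_t> savepoints_;
public:
  void write(const uint8_t *p, size_t len) {
    buf_.insert(buf_.end(), p, p + len);
  }
  void set_savepoint(const std::string &name) {
    savepoints_[name] = buf_.size();
  }
  bool rollback_to(const std::string &name) {
    auto it = savepoints_.find(name);
    if (it == savepoints_.end()) return false;  // spilled case not covered
    buf_.resize(it->second);                    // truncate to the point
    return true;
  }
  size_t size() const { return buf_.size(); }
};
```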
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Instead of returning only one chunk at a time, make
ha_innodb_binlog_reader::read_data() try to read all chunks on the page.
This reduces the number of times each reader has to latch pages in the page
fifo, which contends for a global mutex also shared with the writer.
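The "drain the whole page per latch" idea can be sketched as below. The
chunk format here is invented for illustration (a 1-byte length prefix,
with 0 terminating the page); the real on-page format differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (invented chunk format): the caller latches the page
// once and drains every chunk on it, instead of re-latching for each chunk.
static size_t read_all_chunks(const uint8_t *page, size_t page_size,
                              std::vector<std::vector<uint8_t>> *out)
{
  size_t pos = 0, n = 0;
  while (pos < page_size && page[pos] != 0) {
    uint8_t len = page[pos++];
    if (pos + len > page_size) break;          // truncated chunk: stop
    out->emplace_back(page + pos, page + pos + len);
    pos += len;
    ++n;
  }
  return n;  // number of chunks drained under a single page latch
}
```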
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This patch makes replication crash-safe with the new binlog implementation,
even when --innodb-flush-log-at-trx-commit=0|2. The point is to not send
any binlog events to the slave until they have become durable on the
master, thus avoiding the case where a slave replicates a transaction that
is later lost during master crash recovery, making the slave diverge from
the master.
Keep track of which point in the binlog has been durably synced to disk
(meaning the corresponding LSN has been durably synced to disk in the InnoDB
redo log). Each write to the binlog inserts an entry with offset and
corresponding LSN in a FIFO. Dump threads will first read only up to the
durable point in the binlog. A dump thread will then check the LSN fifo, and
do an InnoDB redo log sync if anything is pending. Then the FIFO is emptied
of any LSNs that have now become durable, and the durable point in the
binlog is updated and reading the binlog can continue.
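The write/sync/harvest cycle above might be sketched as follows. Names like
`durable_tracker` are invented, and the redo log flush is a stand-in
callback rather than the real InnoDB call:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>

// Hypothetical sketch (invented names): every binlog write enqueues
// (end_offset, lsn); a dump thread needing data past the current durable
// point syncs the redo log, then pops all entries whose LSN is now durable,
// advancing the durable binlog offset.
struct binlog_write { uint64_t end_offset; uint64_t lsn; };

class durable_tracker {
  std::deque<binlog_write> fifo_;
  uint64_t durable_offset_ = 0;
  uint64_t synced_lsn_ = 0;
public:
  void on_binlog_write(uint64_t end_offset, uint64_t lsn) {
    fifo_.push_back({end_offset, lsn});
  }
  // sync_redo(lsn) is a stand-in for an InnoDB redo log flush up to lsn.
  uint64_t wait_durable(uint64_t need_offset,
                        const std::function<void(uint64_t)> &sync_redo) {
    if (need_offset > durable_offset_ && !fifo_.empty()) {
      sync_redo(fifo_.back().lsn);          // make pending writes durable
      synced_lsn_ = fifo_.back().lsn;
    }
    while (!fifo_.empty() && fifo_.front().lsn <= synced_lsn_) {
      durable_offset_ = fifo_.front().end_offset;
      fifo_.pop_front();
    }
    return durable_offset_;  // the dump thread may read up to here
  }
};
```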
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
If the event group fit in the binlog cache without the GTID event but not
with it, the code would attempt to spill part of the GTID event as
out-of-band data, which is not correct. In release builds this would hang
the server, as the spilling would try to lock an already-owned mutex.
Fix by checking if the GTID event fits, and spilling any non-GTID data as
oob if it does not.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
After temporarily releasing the mutex during wait in
fsp_binlog_page_fifo::do_fdatasync(), the state may have changed, so be
sure to re-check to avoid fdatasync() on a now stale fh.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Some basic improvements to the binlog-specific page fifo to hopefully get
reasonable scalability as a starting point.
The fifo is still protected by a global mutex, but some effort is taken to
reduce the duration a thread is holding the mutex.
Use a cyclic array instead of a linked list so pages can be looked up in
constant time, and cache allocated page objects to avoid repeated
malloc/free while holding the mutex.
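The cyclic-array lookup could be sketched roughly like this (invented names;
the page-object reuse cache and contiguity checks are omitted for brevity):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch (invented names): pages first_page_no ..
// first_page_no+count-1 live at slot (page_no % capacity), so lookup is
// O(1) instead of a linked-list scan.
class page_fifo {
  std::vector<std::unique_ptr<uint8_t[]>> slots_;
  uint64_t first_page_no_ = 0;
  size_t count_ = 0;
public:
  explicit page_fifo(size_t capacity) : slots_(capacity) {}

  bool append(uint64_t page_no, std::unique_ptr<uint8_t[]> page) {
    if (count_ == slots_.size()) return false;   // fifo full
    if (count_ == 0) first_page_no_ = page_no;
    slots_[page_no % slots_.size()] = std::move(page);
    ++count_;
    return true;
  }

  uint8_t *lookup(uint64_t page_no) {            // O(1) lookup
    if (page_no < first_page_no_ || page_no >= first_page_no_ + count_)
      return nullptr;
    return slots_[page_no % slots_.size()].get();
  }
};
```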
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The file_no and page_no values are not really needed in the page object,
so remove them to save a bit of memory.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
InnoDB binlog files are now backed up along with other InnoDB data by
mariadb-backup.
The files are copied after backup locks have been released. Binlog files
created later than the backup LSN are skipped. Then during --prepare, any
data missing from the hot-copied binlog files will be restored by the
binlog recovery code, and any excess data written after the backup LSN will
be zeroed out.
A couple of test cases verify taking a consistent backup of a server with active
traffic during the backup, by provisioning a slave from the restored binlog
position and checking that the slave can replicate from the original master
and get identical data.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Keep track of, for each binlog file, how many open transactions have
out-of-band data starting in that file. Then at the start of each new binlog
file, in the header page, record the file_no of the earliest file that this
file might contain commit records with references back to OOB records in
that earlier file.
Use this in PURGE BINARY LOGS, so that when a dump thread (slave
connection) is active in file number N, and that file (or a later one) may
require looking back in an earlier file number M for out-of-band records,
purge will stop already at file number M. This way, we avoid purge
accidentally deleting a binlog file that a dump thread would later fail on
because it needs to read out-of-band data from it.
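The purge-limit computation above might be sketched as follows (an invented
illustration: `oob_back_ref[N]` stands for the earliest file_no recorded in
file N's header that its commit records may reference for OOB data):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch (invented names): with a dump thread active in file
// N, purge must stop at N's recorded OOB back-reference, not at N itself,
// so files still reachable through commit records are never deleted.
static uint64_t purge_limit(const std::vector<uint64_t> &oob_back_ref,
                            uint64_t dump_thread_file_no)
{
  // Files with file_no below the returned limit may be purged.
  return std::min(dump_thread_file_no,
                  oob_back_ref[dump_thread_file_no]);
}
```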
This patch also includes placeholder data for a similar facility for XA
references. The actual implementation of XA support is left for a later
patch.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Mostly various fixes to avoid initializing or creating any data or files for
the legacy binlog.
A possible later refinement could be to sub-class the binlog class
differently for legacy and in-engine binlogs, writing separate virtual
functions for behaviour that differs and extracting common functionality
into sub-methods. This could remove some if (opt_binlog_engine_hton)
conditionals.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Now the first page of each binlog tablespace file is reserved as a file
header, replacing the use of extra fields in the first gtid state record of
the file. The header is primarily used during recovery, especially to get
the file LSN before which no redo should be applied to the file.
Using a dedicated page makes it possible to durably sync the file header to
disk after RESET MASTER (and at first server startup) and not have it
overwritten (and potentially corrupted) later; this guarantees that the
recovery will have at least one file header to look at to determine from
which LSN to apply redo records.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Some smaller refactoring and additions to prepare for new approach to
recovery of binlog tablespaces.
Store at the head of each binlog file the start LSN and the file size.
The final page of a binlog file is now not released from the page fifo
until the mtr is committed. This ensures that all changes to a binlog file
are redo logged when the tablespace is closed, which simplifies things, as
then at most the two most recent binlog files will need redo records to be
re-applied during recovery.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
In preparation for a simplified, lower-level recovery of binlog files
implemented in InnoDB, remove use of InnoDB tablespaces and buffer pool from
the binlog code. Instead, a custom binlog page fifo replaces the general
buffer pool for binlog pages, and tablespaces are replaced by simple file_no
references.
The new binlog page fifo is deliberately written naively in this commit for
simplicity, until the new recovery is complete and proven with tests; later
it can be improved for better efficiency and scalability. This first
version uses a simple global mutex, linear scans of linked lists, repeated
alloc/free of pages, and a simple background flush thread that does
synchronous pwrite() of one page after another. Error handling is also
mostly omitted in this first version.
The page header/footer is not changed in this commit, nor is the pagesize,
to be done in a later patch.
The call to mtr_t::write_binlog() is currently commented-out in function
fsp_log_binlog_write() as it asserts in numerous places. To be enabled when
those asserts are fixed. For the same reason, the code does not yet
implement binlog_write_up_to(lsn_t lsn), to be done once mtr_t operations
are working.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Still ToDo: restrict auto-purge so that it does not purge any binlog file
with out-of-band data that might still be needed by a connected slave.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Fix missing WORDS_BIGENDIAN define in ut0compr_int.cc.
Fix misaligned read buffer for O_DIRECT.
Fix wrong/missing update_binlog_end_pos() in binlog group commit.
Fix race where active_binlog_file_no incremented too early.
Fix wrong assertion when reader reaches the very start of (active+1).
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Enable binlog_in_engine as a default suite.
Fix embedded and Windows build failures.
Use sql_print_(error|warning) over ib::error() and ib::warn().
Use small_vector<> for the innodb_binlog_oob_reader instead of a custom
implementation.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>