A comment in the test says:
# do not clean up - we do not know which of the three has been released
# so the --reap command may hang because the command that is being executed
# in that connection is still running/waiting
SET GLOBAL innodb_adaptive_hash_index_cells may be executed
while the server is running. This parameter will be effectively
multiplied by innodb_adaptive_hash_index_parts, because each partition will
contain its own hash table.
Previously, the number of hash table cells in the InnoDB adaptive hash index
depended on the initial innodb_buffer_pool_size and was insufficient
for some workloads, leading to excessively long hash bucket chains.
If innodb_adaptive_hash_index_cells is at its minimum and default value
16381 at startup, it will be derived from the innodb_buffer_pool_size,
for backward compatibility.
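For illustration, a hypothetical session might look like this (the value
is an example, not a recommendation):

  SET GLOBAL innodb_adaptive_hash_index_cells = 1048576;
  -- the effective total is roughly this value multiplied by
  -- innodb_adaptive_hash_index_parts, one hash table per partition
  SELECT @@innodb_adaptive_hash_index_cells,
         @@innodb_adaptive_hash_index_parts;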
When a query uses several window functions:
SELECT
WIN_FUNC1() OVER (ORDER BY 'const', col1),
WIN_FUNC2() OVER (ORDER BY col1 RANGE BETWEEN CURRENT ROW
AND 5 FOLLOWING)
compare_window_funcs_by_window_specs() tries to make window specs reuse
each other's ORDER BY lists. If the lists produce the same order (as
above), the window spec of WIN_FUNC2 will reuse the ORDER BY list of
WIN_FUNC1. However, WIN_FUNC2 has a RANGE-type window frame. It expects
an ORDER BY list with one element, which it uses to compute frame bounds.
Providing it with the ORDER BY list from WIN_FUNC1 ('const', col1) caused
an assertion failure.
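A concrete query of this shape (a hypothetical example; any RANGE frame
combined with a constant-prefixed ORDER BY in another window should do)
would be:

  SELECT
    ROW_NUMBER() OVER (ORDER BY 'const', col1),
    SUM(col1) OVER (ORDER BY col1 RANGE BETWEEN CURRENT ROW
                                            AND 5 FOLLOWING)
  FROM t1;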
The fix is to:
Use the original ORDER BY list when constructing RANGE-type frames
Fix an apparent typo bug in compare_window_funcs_by_window_specs(): the
assignment
win_spec1->save_order_list= win_spec2->order_list;
saved the order list from the wrong spec. Instead, take the one from
win_spec1.
Let us remove the thread-local variable mariadb_stats and introduce
trx_t::pages_accessed, trx_t::active_handler_stats for more
efficiently maintaining some statistics inside InnoDB.
buf_pool.stat.n_page_gets: Reimplemented as Atomic_counter<ulint>.
This will no longer track some accesses in the background where
!current_thd() || !thd_to_trx(current_thd).
trx_t::free(), trx_t::commit_cleanup(): Apply pages_accessed
to buf_pool.stat.n_page_gets.
buf_read_ahead_report(): Report a completed read-ahead batch.
ha_innobase::estimate_rows_upper_bound(): Do not bother updating
trx_t::op_info around some quick arithmetic.
ha_innobase::records_in_range(): Do invoke mariadb_set_stats.
This will change some ANALYZE FORMAT=JSON SELECT results of the test
main.rowid_filter_innodb.
Reviewed by: Vladislav Lesin
Tested by: Saahil Alam
The problem was that wsrep was disconnected and new slave threads tried
to connect to the cluster but failed, as we were in a disconnected state.
Allow changing wsrep_slave_threads only when wsrep is enabled
and we are connected to a cluster. In other cases, report an
error and issue a warning.
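A sketch of the new behavior (the exact error text is an assumption):

  -- while the node is disconnected from the cluster:
  SET GLOBAL wsrep_slave_threads = 8;
  -- now fails with an error instead of spawning applier threads
  -- that cannot connect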
Tests on clang-20/21 had the tests below overrunning the stack.
check_stack_overrun() checked the stack earlier in the function with a
2*STACK_MIN_SIZE margin, but execution within the processing goes deeper
than where check_stack_overrun() was called.
Raising STACK_MIN_SIZE to 44k was sufficient (40k wasn't).
execution_constants was also tested; however, the tests mentioned here
are bigger.
Perfschema tests:
* perfschema.statement_program_nesting_event_check
* perfschema.statement_program_nested
* perfschema.max_program_zero
A small increase to the test thread-stack-size on statement_program_lost_inst
allows this test to continue to pass.
The problem was that for partitioned tables the base table storage engine
is DB_TYPE_PARTITION_DB, which naturally differs from DB_TYPE_INNODB,
so the operation was not allowed in Galera.
Fixed by requesting the implementing storage engine for partitioned
tables, i.e. table->file->partition_ht(), or, if that does not exist,
using the base table storage engine. The resulting storage engine
type is then used to decide whether the operation is allowed when
wsrep_mode=DISALLOW_LOCAL_GTID. Operations on the InnoDB
storage engine, i.e. DB_TYPE_INNODB, should be allowed.
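A minimal sketch (hypothetical table name): with this fix, the operation
below resolves the implementing engine to InnoDB and is therefore allowed
under wsrep_mode=DISALLOW_LOCAL_GTID:

  SET GLOBAL wsrep_mode = 'DISALLOW_LOCAL_GTID';
  CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB
    PARTITION BY HASH (a) PARTITIONS 4;
  INSERT INTO t1 VALUES (1);  -- partition_ht() resolves to InnoDB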
Problem:
=======
- InnoDB statistics calculation for a table is done every 10 seconds
by default in the background thread dict_stats_thread()
- Doing multiple ALTER TABLE..ALGORITHM=COPY causes
dict_stats_thread() to lag behind; therefore the calculation of stats
for the newly created intermediate table gets delayed
Fix:
====
- Stats calculation for the newly created intermediate table is made
independent of the background thread. After copying completes,
stats for the new table are calculated as part of
ALTER TABLE ... ALGORITHM=COPY.
dict_stats_rename_table(): Rename the table statistics from the
intermediate table to the new table.
alter_stats_rebuild(): Remove the table name from the warning,
because this warning can be printed for the intermediate table as well.
Alter table using copy algorithm now calls alter_stats_rebuild()
under a shared MDL lock on a temporary #sql-alter- table,
differing from its previous use only during ALGORITHM=INPLACE
operations on user-visible tables.
dict_stats_schema_check(): Added a separate check for table
readability before checking for tablespace existence.
This allows detecting the existence of the persistent statistics
storage earlier and falling back to transient statistics.
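A sketch of the user-visible effect (hypothetical table name;
mysql.innodb_table_stats is the persistent statistics storage):

  ALTER TABLE t1 ENGINE=InnoDB, ALGORITHM=COPY;
  -- stats for t1 are recalculated as part of the ALTER itself,
  -- without waiting for the background dict_stats_thread()
  SELECT * FROM mysql.innodb_table_stats WHERE table_name = 't1';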
This is a cherry-pick fix of mysql commit@cfe5f287ae99d004e8532a30003a7e8e77d379e3
Modified srv_start to call fil_crypt_threads_init() only
when srv_read_only_mode is not set.
Modified encryption.innodb-read-only to capture the number of
encryption threads created in both scenarios: when the server is
not read-only and when it is.
Timestamp-versioned row deletion was exposed to a collision problem: if
the current timestamp hadn't changed, a sequence of row delete+insert
could get a duplicate-key error. The row delete would find another,
conflicting history row and return an error.
This is true both for REPLACE and DELETE statements; however, in REPLACE
the "optimized" path is usually taken, especially in the tests. There, a
single versioned row update is substituted for delete+insert. In the end,
both paths end up as ha_update_row + ha_write_row.
The solution is to handle a history collision explicitly.
From the design perspective, the user shouldn't experience loss of
history rows, unless there's a technical limitation.
In contrast, trxid-based changes should never generate history for the
same transaction; see MDEV-15427.
If two operations on the same row happen so quickly that they get the
same timestamp, the history row shouldn't be lost. We can still write a
history row, though it'll have row_start == row_end.
We cannot store more than one such historical row, as this would violate
the unique constraint on row_end. So we will have to physically delete
the row if the history row is already available.
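A hypothetical repro sketch with timestamp-based system versioning,
assuming both REPLACE statements execute within the same timestamp:

  CREATE TABLE t1 (a INT PRIMARY KEY) WITH SYSTEM VERSIONING;
  INSERT INTO t1 VALUES (1);
  REPLACE INTO t1 VALUES (1);
  REPLACE INTO t1 VALUES (1);  -- could fail with a duplicate-key error
                               -- on the history row's row_end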
In this commit:
1. Improve TABLE::delete_row to handle the history collision: if the
update results in a duplicate-key error, delete the row for real.
2. Use TABLE::delete_row in the non-optimized path of REPLACE, where the
system-versioned case now belongs entirely.
We had a protection against it, by allowing versioned delete only if:
trx->id != table->vers_start_id()
For REPLACE this check fails: REPLACE calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is
irrelevant. The same problem hits Field::is_max, which may have checked
the wrong record.
Fix:
* Refactor Field::is_max to optionally accept a pointer as an argument.
* Refactor vers_start_id and vers_end_id to always accept a pointer to
the record. The difference from is_max is that is_max accepts a pointer
to the field data, rather than to the record.
Refactoring val_int() to accept the argument would be too much effort,
so instead the value in the record is fetched directly, as is done in
Field_longlong.
It appears that some error conditions don't store error information in
the Diagnostics_area. For example, when the table_def::compatible_with()
check fails, the error message is stored in Relay_log_info instead.
This results in seemingly identical votes, and a zero error buffer size
breaks wsrep-lib logic, as it relies on the error buffer size to decide
whether voting took place.
To account for this, first try to obtain error info from the
Diagnostics_area, then fall back to Relay_log_info. If that fails, use
some "random" data to distinguish this condition from success in
production.
Instead of using DBUG_EXECUTE_IF fault injection, let us construct
a minimal corrupted log file that will produce an OPT_PAGE_CHECKSUM
mismatch without depending on CMAKE_BUILD_TYPE=Debug.
The issue was that unpack_vcol_info_from_frm() wrongly linked the used
sequence tables into tables->internal_tables when more than one sequence
table was used.
Other things:
- Fixed internal_table_exists() to take db into account.
(This makes the code easier to read; since we were comparing
pointers, the old code also worked.)
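A minimal sketch of the affected pattern (hypothetical names), where one
table definition uses more than one sequence:

  CREATE SEQUENCE s1;
  CREATE SEQUENCE s2;
  CREATE TABLE t1 (
    a INT DEFAULT (NEXT VALUE FOR s1),
    b INT DEFAULT (NEXT VALUE FOR s2)
  );
  INSERT INTO t1 VALUES (DEFAULT, DEFAULT);
  -- both sequence tables must be linked into internal_tables correctly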
Problem:
=======
When InnoDB encounters a corrupted page during crash recovery, the
server would abort due to improper handling of page locks
and space references. The recovery process was not properly
cleaning up resources when corruption was detected,
leading to an inconsistent state and server termination.
Solution:
=========
recover_low(): Move page lock recursive acquisition
after deferred/non-deferred page creation logic to
ensure consistent locking behavior for both code paths.
Ensure proper block recursive unlock for non-deferred tablespaces
recv_recover_page(): Simplify corrupted page cleanup by
removing redundant space reference handling.
After 633417308f (MDEV-37312), lookup_handler is locked with F_WRLCK
because it may be used for deleting rows.
lookup_handler is locked with F_WRLCK after prune_partitions(), but the
main handler is locked before that, and it might expect all partitions
to be in the read list, non-pruned.
Let's prepare the lookup handler before prune_partitions().
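A hypothetical shape of the affected workload (assuming a partitioned
table with a long unique key, where REPLACE may use the lookup handler
to delete a duplicate row):

  CREATE TABLE t1 (a INT, b TEXT, UNIQUE KEY (a, b)) ENGINE=InnoDB
    PARTITION BY HASH (a) PARTITIONS 4;
  REPLACE INTO t1 VALUES (1, 'x');
  REPLACE INTO t1 VALUES (1, 'x');  -- the second REPLACE deletes the
                                    -- conflicting row via lookup_handler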
Fixed the following issues:
- aria_read_index() and aria_read_data(), used by mariabackup, checked
the wrong status from maria_page_crc_check().
- Both functions retried infinitely if the CRC did not match.
- Wrong usage of ma_check_if_zero() in maria_page_crc_check()
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
- Removed duplicate words, like "the the" and "to to"
- Removed duplicate lines (one double sort line found in mysql.cc)
- Fixed some typos found while searching for duplicate words.
Command used to find duplicate words:
egrep -rI "\s([a-zA-Z]+)\s+\1\s" | grep -v param
Thanks to Artjoms Rimdjonoks for the command and pointing out the
spelling errors.
Ensure that Annotate_rows is always written directly after the GTID
information, before any table_map events.
Before this patch, the following problems existed when mixing
transactional and non-transactional tables in the same statement:
- Annotate_rows could be written after row events or in the next GTID
event.
- See rpl_row_mixing_engines
- Annotate_rows was not always written to the binary log when a
transactional table was rolled back due to an error but a
non-transactional table was updated.
- See sp_trans_log, binlog_row_mix_innodb_myisam
Fixed by writing the Annotate_rows event into the non-transactional
cache if non-transactional tables are used; otherwise, write the
event into the transactional cache.
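For example (a minimal sketch with hypothetical table names), a single
statement touching both engines could previously emit Annotate_rows in
the wrong place:

  CREATE TABLE t_innodb (a INT) ENGINE=InnoDB;
  CREATE TABLE t_myisam (a INT) ENGINE=MyISAM;
  SET SESSION binlog_annotate_row_events = 1;
  INSERT INTO t_myisam SELECT a FROM t_innodb;
  -- the Annotate_rows event now goes into the non-transactional cache,
  -- so it is binlogged right after the GTID event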
These changes were done as part of fixing
MDEV-36858 MariaDB MyISAM secondary indexes silently break for
tables > 10B rows
Changes done in myisamchk:
- Tables that are checked are opened in readonly mode if --force is not
used.
- *.MYD files will be opened in readonly mode for repair if --quick
is used.
- Added information about check progress if --verbose is used.
- Output information about repaired/checked rows every 10000 rows instead
of every 1000 rows. Note that this also affects aria_chk
- Store open file mode in share->index_mode and share->data_mode instead
of in share->mode.
- Added new option --keys-active= as a simpler version of keys-used.
- Changed output for "myisamchk -dvv" to get nicer output for tables
with 10 billion rows.
This was caused by a wrong handling of bitmaps in
copy_not_changed_fields() that did not work on big endian machines.
This bug caused recovery of Aria files to fail on big endian machines
like s390x or Sparc.
This issue was noticed by the bulk_insert_crash.test on the
s390x builder.
The function row_purge_reset_trx_id(), which had been introduced in
commit 3c09f148f3 (MDEV-12288),
causes some extra buffer pool and redo log activity, leading to a
significant performance regression under some workloads.
This is currently the most significant performance issue, after
commit acd071f599 (MDEV-21923)
fixed the InnoDB LSN allocation and MDEV-19749 the MDL bottleneck in 12.1.
The purpose of row_purge_reset_trx_id() was to ensure that we can
easily identify records for which no history exists. If DB_TRX_ID
is 0, we could avoid looking up the transaction to see if the
history is accessible or the record is implicitly locked.
To avoid trx_sys_t::find() for stale DB_TRX_ID values, we can refer
to trx_t::max_inactive_id, which was introduced in
commit 4105017a58 (MDEV-30357).
Instead of comparing DB_TRX_ID to 0, we may compare it to this
cached value. The cache would be updated by
trx_sys_t::find_same_or_older(), which is invoked for some operations
on secondary indexes.
row_purge_reset_trx_id(): Remove. We will no longer reset the
DB_TRX_ID to 0 after an INSERT. We will retain a single undo log
for all operations, though. Before MDEV-12288, there had been
separate insert_undo and update_undo logs.
row_check_index(): No longer warn
"InnoDB: Clustered index record with stale history in table".
lock_rec_queue_validate(), lock_rec_convert_impl_to_expl(),
row_vers_impl_x_locked_low(): Instead of comparing the DB_TRX_ID
to 0, compare it to trx_t::max_inactive_id.
In dict0load.cc we will not spend any effort to avoid extra
trx_sys.find() calls for stale DB_TRX_ID in dictionary tables.
This code does not currently use trx_t objects, and therefore
we cannot easily access trx_t::max_inactive_id. Loading table
definitions into the InnoDB data dictionary cache (dict_sys)
should be a very rare operation.
Reviewed by: Vladislav Lesin
The innodb_encrypt_log=ON subformat of FORMAT_10_8 is inefficient,
because a new encryption or decryption context is being set up for
every log record payload snippet.
An in-place conversion between the old and new innodb_encrypt_log=ON
format is technically possible. No such conversion has been
implemented, though. There is some overhead with respect to the
unencrypted format (innodb_encrypt_log=OFF): At the end of each
mini-transaction, right before the CRC-32C, additional 8 bytes will be
reserved for a nonce (really, log_sys.get_flushed_lsn()), which forms
a part of an initialization vector.
log_t::FORMAT_ENC_11: The new format identifier, a UTF-8 encoding of
🗝 U+1F5DD OLD KEY (encryption). In this format, everything except the
types and lengths of log records will be encrypted. Thus, unlike in
FORMAT_10_8, also page identifiers and FILE_ records will be encrypted.
The initialization vector (IV) consists of the 8-byte nonce as well as
the type and length byte(s) of the first record of the mini-transaction.
Page identifiers will no longer form any part of the IV.
The old log_t::FORMAT_ENC_10_8 (innodb_encrypt_log=ON) will be supported
both by mariadb-backup and by crash recovery. Downgrade from the new
format will only be possible if the new server has been running or
restarted with innodb_encrypt_log=OFF. If innodb_encrypt_log=ON,
only the new log_t::FORMAT_ENC_11 will be written.
log_t::is_recoverable(): A new predicate, which holds for all 3
formats.
recv_sys_t::tmp_buf: A heap-allocated buffer for decrypting a
mini-transaction, or for making the wrap-around of a memory-mapped
log file contiguous.
recv_sys_t::start_lsn: The start of the mini-transaction.
Updated at the start of parse_tail().
log_decrypt_mtr(): Decrypt a mini-transaction in recv_sys.tmp_buf.
Theoretically, when reading the log via pread() rather than a read-only
memory mapping, we could modify the contents of log_sys.buf in place.
If we did that, we would have to re-read the last log block into
log_sys.buf before resuming writes, because otherwise that block could be
re-written as a mix of old decrypted data and new encrypted data, which
would cause a subsequent recovery failure unless the log checkpoint had
been advanced beyond this point.
log_decrypt_legacy(): Decrypt a log_t::FORMAT_ENC_10_8 record snippet
on stack. Replaces recv_buf::copy_if_needed().
recv_sys_t::get_backup_parser(): Return a recv_sys_t::parser, that is,
a pointer to an instantiation of parse_mmap or parse_mtr for the current
log format.
recv_sys_t::parse_mtr(), recv_sys_t::parse_mmap(): Add a parameter
template<uint32_t> for the current log_sys.format.
log_parse_start(): Validate the CRC-32C of a mini-transaction.
This has been split from the recv_sys_t::parse() template to
reduce code duplication. These two are the lowest-level functions
that will be instantiated for both recv_buf and recv_ring.
recv_sys_t::parse(): Split into ::log_parse_start() and parse_tail().
Add a parameter template<uint32_t format> to specialize for
log_sys.format at compilation time.
recv_sys_t::parse_tail(): Operate on pointers to contiguous
mini-transaction data. Use a parameter template<bool ENC_10_8>
for special handling of the old innodb_encrypt_log=ON format.
The former recv_buf::get_buf() is being inlined here.
Much of the logic is split into non-inline functions, to avoid
duplicating a lot of code for every template expansion.
log_crypt: Encrypt or decrypt a mini-transaction in place in the
new innodb_encrypt_log=ON format. We will use temporary buffers
so that encryption_ctx_update() can be invoked on integer multiples
of MY_AES_BLOCK_SIZE, except for the last bytes of the encrypted
payload, which will be encrypted or decrypted in place thanks to
ENCRYPTION_FLAG_NOPAD.
log_crypt::append(): Invoke encryption_ctx_update() in MY_AES_BLOCK_SIZE
(16-byte) blocks and scatter/gather shorter data blocks as needed.
log_crypt::finish(): Handle the last (possibly incomplete) block as a
special case, with ENCRYPTION_FLAG_NOPAD.
mtr_t::parse_length(): Parse the length of a log record.
mtr_t::encrypt(): Use log_crypt instead of the old log_encrypt_buf().
recv_buf::crc32c(): Add a parameter for the initial CRC-32C value.
recv_sys_t::rewind(): Operate on pointers to the start of the
mini-transaction and to the first skipped record.
recv_sys_t::trim(): Declare as ATTRIBUTE_COLD so that this rarely
invoked function will not be expanded inline in parse_tail().
recv_sys_t::parse_init(): Handle INIT_PAGE or FREE_PAGE while scanning
to the end of the log.
recv_sys_t::parse_page0(): Handle WRITE to FSP_SPACE_SIZE and
FSP_SPACE_FLAGS.
recv_sys_t::parse_store_if_exists(), recv_sys_t::parse_store(),
recv_sys_t::parse_oom(): Handle page-level log records.
mlog_decode_varint_length(): Make use of __builtin_clz() to avoid a loop
when possible.
mlog_decode_varint(): Define only on const byte*, as
ATTRIBUTE_NOINLINE static because it is a rather large function.
recv_buf::decode_varint(): Trivial wrapper for mlog_decode_varint().
recv_ring::decode_varint(): Special implementation.
log_page_modify(): Note that a page will be modified in recovery.
Split from recv_sys_t::parse_tail().
log_parse_file(): Handle non-page log records.
log_record_corrupted(), log_unknown(), log_page_id_corrupted():
Common error reporting functions.
Replication can stop with an error if a Heartbeat log event is sent to a
replica during rotation. There are two bugs at play:
1. Prior to MDEV-30128 (added in 11.0), there is a bug when checking
legacy events. When the replica rotates its relay logs, it
initializes its Format_description_log_event with binlog version 3
(this is hard-coded). So immediately after rotation (and until a
new Format_descriptor with binlog_format 4 is sent from the
master), the IO thread is expecting binlog_format 3 (i.e. it will
call queue_old_event() for incoming events). This invalidates any
events that are sent with an event type higher than 14. In theory,
we wouldn't expect any events to be sent in between a rotate and
the next format descriptor log event, but if a long enough period
of time passes in between, the primary will generate and send a
Heartbeat event (of type 27). In that case, the slave will see the
heartbeat event of type 27, see that it is higher than 14, and fail
with an error mentioning 'Found invalid event in binary log', with
the expected log coordinates of the new log (which are
optimistically populated from the Rotate log event, not the new
event).
2. In all versions of MariaDB (11.0+), there is a bug when checking
the state of a Heartbeat log event, in that it doesn't consider a
rotated binary log. The check is meant to ensure that the
heartbeat provided by the master (i.e. the state of the master) is
greater than or equal to the state of the slave. In other words,
it checks that the slave isn't ahead of the master. However, if
the filename provided by the master heartbeat event is different
than the filename saved for the slave's state, the check always
fails. This is broken, because when the master rotates its logs,
the new binary log file will have a different filename (i.e. an
incremented index counter suffix). For example, if the master
rotates its binary logs from master-bin.000002 to
master-bin.000003, master-bin.000003 is ahead of
master-bin.000002, but the slave will see a difference between the
filenames and fail the check.
To fix the first problem, this patch disallows passing a heartbeat
event into queue_old_event (which is the source of the error, as it
tries to parse a heartbeat log event). This function (queue_old_event)
was removed with MDEV-30128, so bypassing it for heartbeat events is
not consequential (and it is already also done for
Format_description_events, which are not supported in old binlog file
versions). Note that backporting all of MDEV-30128 was also considered,
but this is less risky for GA.
To fix the second problem, we simply ignore heartbeat events on the
slave if the filenames don't match. This is because during rotation,
it can appear that the slave is ahead of the master, which breaks the
validity of the check (i.e. the check is to ensure the master is
ahead of the slave).
Additionally note that this patch restores a heartbeat check that was
incorrectly removed in 780db8e252
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Ever since commit 685d958e38
(MDEV-14425), mariadb-backup --backup has had some trouble keeping up
with write workloads of the mariadbd server.
Debarun Banerjee found out that mariadb-backup --backup was
copying the log in the wrong way and not pausing when it made
sense to do so. This change includes his fix as well as some
dead code removal from xtrabackup_copy_mmap_logfile().
Some earlier changes to the default behaviour of mariadb-backup --backup
will be reverted, by making the configuration parameters OFF by default.
These parameters were basically working around this bug:
* commit 652f33e0a4 (MDEV-30000)
introduced --innodb-log-checkpoint-now and made it ON by default.
Making the server execute a log checkpoint can be really I/O intensive.
* commit 6acada713a (MDEV-34062)
introduced --innodb-log-file-mmap and made it ON by default on
Linux and FreeBSD. There are no documented semantics what should
happen to a memory mapping when there are concurrent pwrite(2)
operations by other processes. While it appears to work, it is safer
to default to clearly documented semantics.
xtrabackup_copy_logfile(): Add a parameter early_exit.
Always read a log snippet to the start of recv_sys.buf and assign
recv_sys.len to the read length. We used to shift recv_sys.buf
with memmove(). However, on recv_sys_t::PREMATURE_EOF we cannot know
which part of the mini-transaction was correctly read, because that
part of the ib_logfile0 may be concurrently modified by the server.
So, we will reread everything from the start of the mini-transaction.
xtrabackup_backup_func(): Invoke xtrabackup_copy_logfile(true),
allowing it to stop on every recv_sys_t::PREMATURE_EOF.
This will also avoid repeated "Retry" messages when there is no
more redo log to copy.
get_current_lsn(): Execute FLUSH ENGINE LOGS to ensure that
InnoDB will complete any buffered writes to the ib_logfile0
and ensure that everything up to the current LSN has been
written.
backup_wait_for_commit_lsn(): Wait for as much as is really needed.
This avoids an extra 5-second wait at the end of the backup.
xtrabackup_copy_mmap_logfile(): Remove some dead code, and add
debug assertions to demonstrate that the parser can only return
recv_sys_t::OK or recv_sys_t::GOT_EOF.