Commit graph

18,973 commits

Author SHA1 Message Date
Marko Mäkelä
d71b2a7412 Merge 11.4 into 11.8 2025-10-01 10:32:47 +03:00
Marko Mäkelä
5cf9d846ea Merge 10.11 into 11.4 2025-10-01 07:24:54 +03:00
Sergei Golubchik
35767042e5 MDEV-37743 Frequent timeouts of the test innodb.innodb_bug38231
a comment in the test says

 # do not clean up - we do not know which of the three has been released
 # so the --reap command may hang because the command that is being executed
 # in that connection is still running/waiting
2025-09-30 16:38:29 +02:00
Marko Mäkelä
78aa5fb623 MDEV-37299 fixup: cmake -DPLUGIN_PERFSCHEMA=NO 2025-09-30 16:42:58 +03:00
Marko Mäkelä
3cc9ac0b30 MDEV-37482: Introduce innodb_adaptive_hash_index_cells
SET GLOBAL innodb_adaptive_hash_index_cells may be executed
while the server is running. This parameter will be effectively
multiplied by innodb_adaptive_hash_index_parts, because each partition will
contain its own hash table.

Previously, the number of hash table cells in the InnoDB adaptive hash index
depended on the initial innodb_buffer_pool_size and was insufficient
for some workloads, leading to excessively long hash bucket chains.
If innodb_adaptive_hash_index_cells is at its minimum and default value
16381 at startup, it will be derived from the innodb_buffer_pool_size,
for backward compatibility.
2025-09-30 10:15:09 +03:00
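The sizing rule described in MDEV-37482 above can be illustrated with a little arithmetic. This is a sketch, not InnoDB code: the function names are hypothetical, and the 8 partitions and one million entries are made-up inputs. It only shows that the per-partition cell count is multiplied by the number of partitions, and how the average bucket chain length follows from that.

```python
def total_ahi_cells(cells_per_partition: int, partitions: int) -> int:
    """Each partition has its own hash table, so the totals multiply."""
    return cells_per_partition * partitions


def average_chain_length(entries: int, cells_per_partition: int,
                         partitions: int) -> float:
    """Mean number of entries per hash bucket, assuming a uniform hash."""
    return entries / total_ahi_cells(cells_per_partition, partitions)


if __name__ == "__main__":
    # 16381 is the minimum and default value named in the commit message;
    # 8 partitions and 1,000,000 entries are illustrative assumptions.
    print(total_ahi_cells(16381, 8))
    print(average_chain_length(1_000_000, 16381, 8))
```

Raising either parameter shortens the average chain, which is the effect the commit message is after.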
Marko Mäkelä
643d365ced Merge 11.4 into 11.8 2025-09-30 09:28:08 +03:00
bsrikanth-mariadb
6aa7498313 MDEV-31744: Assertion with COUNT(*) OVER (ORDER BY const RANGE BETWEEN...)
When the query uses several Window Functions:
SELECT
WIN_FUNC1() OVER (ORDER BY 'const', col1),
WIN_FUNC2() OVER (ORDER BY col1 RANGE BETWEEN CURRENT ROW
AND 5 FOLLOWING)
compare_window_funcs_by_window_specs() will try to get the Window Specs to
reuse the ORDER BY lists. If the lists produce the same order (like above)
Window Spec of the WIN_FUNC2 will reuse the ORDER BY list of WIN_FUNC1.

However, WIN_FUNC2 has a RANGE-type window frame. It expects to get
ORDER BY list with one element, which it will use to compute frame bounds.
Providing it with the ORDER BY list from WIN_FUNC1 ('const', col1) caused an
assertion failure.

The fix is to:

- Use the original ORDER BY list when constructing RANGE-type frames.
- Fix an apparent typo bug in compare_window_funcs_by_window_specs():
  the assignment
  win_spec1->save_order_list= win_spec2->order_list;
  saved the order list from the wrong spec. Instead, take one from win_spec1.
2025-09-30 08:33:00 +05:30
Aleksey Midenkov
ff33f49d9a Merge 11.4 into 11.8 2025-09-29 18:25:09 +03:00
Marko Mäkelä
13076351f1 MDEV-37152: Reimplement innodb_buffer_pool_read_requests
Let us remove the thread-local variable mariadb_stats and introduce
trx_t::pages_accessed, trx_t::active_handler_stats for more
efficiently maintaining some statistics inside InnoDB.

buf_pool.stat.n_page_gets: Reimplemented as Atomic_counter<ulint>.
This will no longer track some accesses in the background where
!current_thd() || !thd_to_trx(current_thd).

trx_t::free(), trx_t::commit_cleanup(): Apply pages_accessed
to buf_pool.stat.n_page_gets.

buf_read_ahead_report(): Report a completed read-ahead batch.

ha_innobase::estimate_rows_upper_bound(): Do not bother updating
trx_t::op_info around some quick arithmetic.

ha_innobase::records_in_range(): Do invoke mariadb_set_stats.
This will change some ANALYZE FORMAT=JSON SELECT results of the test
main.rowid_filter_innodb.

Reviewed by: Vladislav Lesin
Tested by: Saahil Alam
2025-09-29 14:13:27 +03:00
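The counting scheme in MDEV-37152 above (count per transaction, then apply the total to the shared counter at cleanup) can be sketched as follows. This is a simplified model under stated assumptions, not the InnoDB implementation: `GlobalPageGets` and `Transaction` are hypothetical stand-ins for buf_pool.stat.n_page_gets and trx_t, and a lock is used here in place of an atomic counter.

```python
import threading


class GlobalPageGets:
    """Stand-in for a shared counter such as buf_pool.stat.n_page_gets."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n: int) -> None:
        with self._lock:
            self.value += n


class Transaction:
    """Hypothetical transaction object that counts page accesses locally."""

    def __init__(self, shared: GlobalPageGets) -> None:
        self.shared = shared
        self.pages_accessed = 0

    def access_page(self) -> None:
        # Hot path: only the transaction-local field is touched.
        self.pages_accessed += 1

    def commit_cleanup(self) -> None:
        # Fold the local count into the shared counter once, then reset.
        self.shared.add(self.pages_accessed)
        self.pages_accessed = 0
```

The shared counter lags behind until cleanup runs, which is the trade-off for keeping contention off the page access path.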
Marko Mäkelä
a742fb7bce Merge 10.11 into 11.4 2025-09-29 08:25:37 +03:00
Thirunarayanan Balathandayuthapani
bef32e4bbe MDEV-37083 Fixup to trigger ahi for encryption.innochecksum 2025-09-26 18:33:26 +05:30
Jan Lindström
dd159aeb1b MDEV-30418 : Setting wsrep_slave_threads causes thread hang
The problem was that wsrep was disconnected, and new slave
threads tried to connect to the cluster but failed because
we were in a disconnected state.

Allow changing wsrep_slave_threads only when wsrep is enabled
and we are connected to a cluster. In other cases report
error and issue a warning.
2025-09-26 15:33:56 +03:00
Marko Mäkelä
e8ef8c0055 Merge 10.11 into 11.4 2025-09-24 13:40:09 +03:00
Marko Mäkelä
990b44495c Merge 10.6 into 10.11 2025-09-24 12:48:56 +03:00
Daniel Black
f2ef683b7a MDEV-37705 main.lotofstack /main.sp-error fails in MSAN+Debug
Tests on clang-20/21 had both of these tests overrunning the
stack. The check_stack_overrun function checked the function
earlier with a 2*STACK_MIN_SIZE margin. The execution within
the processing is deeper than when check_stack_overrun was
called.

Raising STACK_MIN_SIZE to 44k was sufficient (and 40k wasn't
sufficient). execution_constants was also tested; however,
the tests mentioned in the subject are bigger.

Perfschema tests
* perfschema.statement_program_nesting_event_check
* perfschema.statement_program_nested
* perfschema.max_program_zero

A small increase to the test thread-stack-size on statement_program_lost_inst
allows this test to continue to pass.
2025-09-24 09:08:16 +10:00
Jan Lindström
f9bdff6162 MDEV-37373 : InnoDB partition table disallow local GTIDs in galera
The problem was that for partitioned tables the base table storage engine
is DB_TYPE_PARTITION_DB, which naturally differs from DB_TYPE_INNODB,
so the operation was not allowed in Galera.

Fixed by requesting the implementing storage engine for partitioned
tables, i.e. table->file->partition_ht(), or, if that does not exist,
using the base table storage engine. The resulting storage engine
type is then used to decide whether the operation is allowed when
wsrep_mode=DISALLOW_LOCAL_GTID. Operations on the InnoDB
storage engine, i.e. DB_TYPE_INNODB, should be allowed.
2025-09-23 13:17:00 +03:00
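The engine resolution described in MDEV-37373 above can be modelled in a few lines. This is a sketch under assumptions: the `Table` class, its fields, and the "OTHER_ENGINE" label are hypothetical, and only the two DB_TYPE_* names come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

INNODB = "DB_TYPE_INNODB"
PARTITION = "DB_TYPE_PARTITION_DB"


@dataclass
class Table:
    base_engine: str
    # Engine that implements the partitions, if the table is partitioned.
    partition_engine: Optional[str] = None


def effective_engine(table: Table) -> str:
    """Prefer the implementing engine; fall back to the base engine."""
    return table.partition_engine or table.base_engine


def allowed_with_disallow_local_gtid(table: Table) -> bool:
    """Only operations that resolve to InnoDB are allowed in this model."""
    return effective_engine(table) == INNODB
```

Before the fix, the check in this model would have compared `base_engine` directly, so a partitioned InnoDB table would have been rejected.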
Thirunarayanan Balathandayuthapani
687b18648c MDEV-35163 InnoDB persistent statistics fail to update after ALTER TABLE...ALGORITHM=COPY
Problem:
=======
- InnoDB statistics calculation for the table is done
every 10 seconds by default in the background thread dict_stats_thread()

- Doing multiple ALTER TABLE..ALGORITHM=COPY causes
dict_stats_thread() to lag behind, so the calculation of stats
for the newly created intermediate table gets delayed

Fix:
====
- Stats calculation for the newly created intermediate table is made
independent of the background thread. After copying is completed,
stats for the new table are calculated as part of ALTER TABLE ... ALGORITHM=COPY.

dict_stats_rename_table(): Rename the table statistics from
intermediate table to new table

alter_stats_rebuild(): Remove the table name from the warning,
because this warning can be printed for the intermediate table as well.

Alter table using copy algorithm now calls alter_stats_rebuild()
under a shared MDL lock on a temporary #sql-alter- table,
differing from its previous use only during ALGORITHM=INPLACE
operations on user-visible tables.

dict_stats_schema_check(): Added a separate check for table
readability before checking for tablespace existence.
This could lead to earlier detection of whether persistent statistics
storage exists, and an earlier fallback to transient statistics.

This is a cherry-pick fix of mysql commit@cfe5f287ae99d004e8532a30003a7e8e77d379e3
2025-09-22 17:39:47 +05:30
Sergei Golubchik
c0233a09ee MDEV-37600 Backport MDEV-9804 Implement a caching_sha2_password plugin
but without caching
2025-09-21 13:13:30 +02:00
mariadb-satishkumar
1454d28cf8 MDEV-37299: Fix crash when server read-only and encryption ON
Modified srv_start to call fil_crypt_threads_init() only
when srv_read_only_mode is not set.

Modified encryption.innodb-read-only to capture number of
encryption threads created for both scenarios when
server is not read only as well as when server is read only.
2025-09-19 11:55:43 +05:30
Arcadiy Ivanov
62b21714d0 Reproducible test case for MDEV-37434
Add debug logging to help with tracing

Add the fix
2025-09-18 18:01:33 +02:00
Nikita Malyavin
28472359b1 MDEV-15990 versioning: don't allow changes in the past 2025-09-17 18:47:25 +03:00
Nikita Malyavin
8001679af6 MDEV-15990 handle timestamp-based collisions as well
Timestamp-versioned row deletion was exposed to a collision problem: if the
current timestamp hadn't changed, a sequence of row delete+insert could
get a duplicate-key error. A row delete would find another conflicting history row
and return an error.

This is true both for REPLACE and DELETE statements, however in REPLACE, the
"optimized" path is usually taken, especially in the tests. There, delete+insert
is substituted for a single versioned row update. In the end, both paths end up
as ha_update_row + ha_write_row.

The solution is to handle a history collision somehow.

From the design perspective, the user shouldn't experience history rows loss,
unless there's a technical limitation.

To the contrary, trxid-based changes should never generate history for the same
transaction, see MDEV-15427.

If two operations on the same row happened too quickly, so that they happen at
the same timestamp, the history row shouldn't be lost. We can still write a
history row, though it'll have row_start == row_end.

We cannot store more than one such historical row, as this will violate the
unique constraint on row_end. So we will have to physically delete the row if
the history row is already available.

In this commit:
1. Improve TABLE::delete_row to handle the history collision: if an update
   results in a duplicate error, delete a row for real.
2. use TABLE::delete_row in a non-optimistic path of REPLACE, where the
   system-versioned case now belongs entirely.
2025-09-17 18:29:47 +03:00
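The collision handling described in the MDEV-15990 commit above can be modelled with a toy table. This is a sketch under assumptions, not server code: `VersionedTable`, its dictionary layout, and the integer timestamps are hypothetical. The unique key is (id, row_end), a versioned delete is an update that sets row_end, and a duplicate on that update falls back to a physical delete.

```python
MAX_TS = float("inf")  # row_end value that marks the current version


class DuplicateKey(Exception):
    pass


class VersionedTable:
    """Toy system-versioned table with a unique key on (id, row_end)."""

    def __init__(self) -> None:
        self.rows = {}  # (id, row_end) -> (row_start, value)

    def insert(self, id_, value, now) -> None:
        key = (id_, MAX_TS)
        if key in self.rows:
            raise DuplicateKey(key)
        self.rows[key] = (now, value)

    def _close_version(self, id_, now) -> None:
        # A versioned delete is an update that sets row_end = now.
        if (id_, now) in self.rows:
            raise DuplicateKey((id_, now))
        self.rows[(id_, now)] = self.rows.pop((id_, MAX_TS))

    def delete_row(self, id_, now) -> None:
        try:
            self._close_version(id_, now)
        except DuplicateKey:
            # A history row with this row_end already exists:
            # delete the current row for real.
            del self.rows[(id_, MAX_TS)]
```

In this model the first delete at a given timestamp leaves a history row with row_start == row_end, and any further delete at the same timestamp removes the current row without adding a second history row.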
Nikita Malyavin
aeb25743af MDEV-15990 REPLACE on a precise-versioned table returns ER_DUP_ENTRY
We had a protection against it, by allowing versioned delete if:
trx->id != table->vers_start_id()

For replace this check fails: replace calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is irrelevant.

The same problem hits Field::is_max, which may have checked the wrong record.

Fix:
* Refactor Field::is_max to optionally accept a pointer as an argument.
* Refactor vers_start_id and vers_end_id to always accept a pointer to the
record. The difference from is_max is that is_max accepts a pointer to the
field data, rather than to the record.

Method val_int() would be too effortful to refactor to accept the argument, so
instead the value in record is fetched directly, like it is done in
Field_longlong.
2025-09-17 11:38:55 +03:00
Marko Mäkelä
acd3db4e44 Merge 10.11 into 11.4 2025-09-16 17:01:39 +03:00
Alexey Yurchenko
e238246872 MDEV-37494 Diagnostics_area does not always contain apply error info
It appears that some error conditions don't store error information in the
Diagnostics_area. For example, when the table_def::compatible_with() check fails,
the error message is stored in Relay_log_info instead.
This results in optimistically identical votes, and the zero error buffer size
breaks wsrep-lib logic, as it relies on the error buffer size to decide whether
voting took place.
To account for this, first try to obtain error info from Diagnostics_area,
then fall back to Relay_log_info. If that fails, use some "random" data to
distinguish this condition from success in production.
2025-09-15 16:48:10 +02:00
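The fallback order described in MDEV-37494 above can be sketched as a small function. This is an illustration under assumptions: the function name, the string arguments, and the 8-byte length of the filler are hypothetical. The only point is that the returned buffer is never empty when an error occurred.

```python
import os
from typing import Optional


def apply_error_buffer(diagnostics_msg: Optional[str],
                       relay_log_msg: Optional[str]) -> bytes:
    """Pick error data for voting: first source with a message wins."""
    for msg in (diagnostics_msg, relay_log_msg):
        if msg:
            return msg.encode("utf-8")
    # Neither source has a message: return arbitrary non-empty data so
    # that a zero-length buffer is never mistaken for success.
    return os.urandom(8)
```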
Oleksandr Byelkin
15b1426c3a Merge branch '10.11' into bb-11.4-release 2025-09-15 16:17:33 +02:00
Sergei Golubchik
886a51d956 MDEV-35875 Misleading error message for non-existing ENCRYPTION_KEY_ID
update the test case
2025-09-15 11:00:02 +02:00
Sergei Golubchik
ed81e5f456 MDEV-37375 engines/iuds suite fails with ps-protocol
and collateral cleanup
2025-09-15 11:00:02 +02:00
Marko Mäkelä
fe59b4ce96 MDEV-37412: Better test case
Instead of using DBUG_EXECUTE_IF fault injection, let us construct
a minimal corrupted log file that will produce an OPT_PAGE_CHECKSUM
mismatch without depending on CMAKE_BUILD_TYPE=Debug.
2025-09-15 08:44:26 +03:00
Monty
6058e02732 MDEV-37172 Server crashes in Item_func_nextval::update_table after INSERT to the table, that uses expression with nextval() as default
The issue was that unpack_vcol_info_from_frm() wrongly linked the used
sequence tables into tables->internal_tables when more than one sequence
table was used.

Other things:
- Fixed internal_table_exists() to take db into account.
  (This makes the code easier to read. As we were comparing
   pointers, the old code also worked.)
2025-09-14 19:24:07 +03:00
Oleksandr Byelkin
0707dac202 Merge branch '10.6' into 10.11 2025-09-12 13:08:40 +02:00
Dave Gosselin
47df0ba17c Cherry-pick of 'mariadb-test: wait on disconnect' from 12.1
Cherry-picks mysqltest.cc and rpl_semi_sync_shutdown_await_ack changes
from 12.1 to fix a race condition on disconnect.
2025-09-10 13:57:07 -04:00
Thirunarayanan Balathandayuthapani
4dcd2d8513 MDEV-37412 Corrupted page during recovery aborts the server
Problem:
=======
When InnoDB encountered a corrupted page during crash recovery,
the server would abort due to improper handling of page locks
and space references. The recovery process was not properly
cleaning up resources when corruption was detected,
leading to an inconsistent state and server termination.

Solution:
=========
recover_low(): Move page lock recursive acquisition
after deferred/non-deferred page creation logic to
ensure consistent locking behavior for both code paths.
Ensure proper block recursive unlock for non-deferred tablespaces

recv_recover_page(): Simplify corrupted page cleanup by
removing redundant space reference handling.
2025-09-10 15:25:58 +03:00
Elena Stepanova
c40402e4a9 MDEV-37618 galera.MDEV-26266 fails with ER_OPTION_PREVENTS_STATEMENT with PS protocol 2025-09-10 14:12:10 +03:00
Marko Mäkelä
7e76a58e0b Merge 10.11 into 11.4 2025-09-09 14:09:10 +03:00
Sergei Golubchik
5743435954 MDEV-37397 Assertion `bitmap_is_set(&read_partitions, next->id)' failed in int partition_info::vers_set_hist_part(THD *)
After 633417308f (MDEV-37312), lookup_handler is locked with F_WRLCK,
because it may be used for deleting rows.

And lookup_handler is locked with F_WRLCK after prune_partitions(),
but the main handler is locked before, and might expect all
partitions to be in the read list, non-pruned.

Let's prepare the lookup handler before prune_partitions().
2025-09-04 17:20:02 +02:00
Monty
fd39c63b41 MDEV-37520 Failure to detect corruption during backups of Aria table
Fixed the following issues:
- aria_read_index() and aria_read_data(), used by mariabackup, checked
  the wrong status from maria_page_crc_check().
- Both functions did infinite retries if crc did not match.
- Wrong usage of ma_check_if_zero() in maria_page_crc_check()

Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
2025-09-04 18:08:39 +03:00
Monty
882f6fa3aa Fixed typos
- Removed duplicate words, like "the the" and "to to"
- Removed duplicate lines (one double sort line found in mysql.cc)
- Fixed some typos found while searching for duplicate words.

Command used to find duplicate words:
egrep -rI "\s([a-zA-Z]+)\s+\1\s" | grep -v param

Thanks to Artjoms Rimdjonoks for the command and pointing out the
spelling errors.
2025-09-04 18:08:39 +03:00
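For readers without egrep at hand, the duplicate-word search from the commit above can be approximated with Python's `re` module. This is a sketch: the function name is hypothetical, and it scans a string line by line rather than walking a directory tree. The pattern itself is the one quoted in the commit message.

```python
import re
from typing import List

# Same pattern as in the egrep command: whitespace, a word, whitespace,
# the same word again, whitespace.
DUPLICATE_WORD = re.compile(r"\s([a-zA-Z]+)\s+\1\s")


def find_duplicate_words(text: str) -> List[str]:
    """Return the repeated word from every line that contains one."""
    found = []
    for line in text.splitlines():
        match = DUPLICATE_WORD.search(line)
        if match:
            found.append(match.group(1))
    return found
```

Like the original command, this requires whitespace on both sides of the pair, so a duplicate at the very start or end of a line is not reported.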
Monty
d2ce0650ad MDEV-37356 Annotate_rows written in a 'random' position
Ensure that Annotate_rows is always written directly after GTID information,
before any table_map events.

Before this patch, the following problems existed when mixing
transactional and non-transactional tables in the same statement:
- Annotate rows could be written after row events or in the next GTID
  event.
  - See rpl_row_mixing_engines

- Annotate_rows was not always written to the binary log in case of an error
  with a transactional table (rolled back) while a non-transactional
  table was updated.
  - See sp_trans_log, binlog_row_mix_innodb_myisam

Fixed by writing the Annotate_rows event into the non-transactional
cache if any non-transactional tables are used. Otherwise, write the
event into the transactional cache.
2025-09-04 18:08:39 +03:00
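The cache selection rule from MDEV-37356 above fits in a few lines. This is a sketch of the rule as the commit message states it, not the server code; the function name and the string labels are hypothetical.

```python
from typing import Iterable


def annotate_rows_cache(tables_are_transactional: Iterable[bool]) -> str:
    """Choose the binlog cache that receives the Annotate_rows event.

    Each element says whether one table used by the statement is
    transactional.
    """
    if any(not is_trans for is_trans in tables_are_transactional):
        return "non_transactional_cache"
    return "transactional_cache"
```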
Monty
6a4fe9923d Improvements for myisamchk
These changes were done as part of fixing
MDEV-36858 MariaDB MyISAM secondary indexes silently break for
           tables > 10B rows

Changes done in myisamchk:
- Tables that are checked are opened in readonly mode if --force is not
  used.
- *.MYD files will be opened in readonly mode for repair if --quick
  is used.
- Added information about check progress if --verbose is used.
- Output information about repaired/checked rows every 10000 rows instead
  of every 1000 rows. Note that this also affects aria_chk
- Store open file mode in share->index_mode and share->data_mode instead
  of in share->mode.
- Added new option --keys-active= as a simpler version of keys-used.
- Changed output for "myisamchk -dvv" to get nicer output for tables
  with 10 billion rows.
2025-09-04 18:08:39 +03:00
Monty
8f771b28a1 MDEV-34914 maria.bulk_insert_crash fails on s390x (10.6+, Debug)
This was caused by a wrong handling of bitmaps in
copy_not_changed_fields() that did not work on big endian machines.
This bug caused recovery of Aria files to fail on big endian machines
like s390x or Sparc.

This issue was noticed by the bulk_insert_crash.test on the
s390x builder.
2025-09-04 17:15:50 +03:00
Marko Mäkelä
ef2f3d207b MDEV-16168: Performance regression after MDEV-12288
The function row_purge_reset_trx_id() that had been introduced in
commit 3c09f148f3 (MDEV-12288)
introduces some extra buffer pool and redo log activity that will
cause a significant performance regression under some workloads.

This is currently the most significant performance issue, after
commit acd071f599 (MDEV-21923)
fixed the InnoDB LSN allocation and MDEV-19749 the MDL bottleneck in 12.1.

The purpose of row_purge_reset_trx_id() was to ensure that we can
easily identify records for which no history exists. If DB_TRX_ID
is 0, we could avoid looking up the transaction to see if the
history is accessible or the record is implicitly locked.

To avoid trx_sys_t::find() for stale DB_TRX_ID values, we can refer
to trx_t::max_inactive_id, which was introduced in
commit 4105017a58 (MDEV-30357).
Instead of comparing DB_TRX_ID to 0, we may compare it to this
cached value. The cache would be updated by
trx_sys_t::find_same_or_older(), which is invoked for some operations
on secondary indexes.

row_purge_reset_trx_id(): Remove. We will no longer reset the
DB_TRX_ID to 0 after an INSERT. We will retain a single undo log
for all operations, though. Before MDEV-12288, there had been
separate insert_undo and update_undo logs.

row_check_index(): No longer warn
"InnoDB: Clustered index record with stale history in table".

lock_rec_queue_validate(), lock_rec_convert_impl_to_expl(),
row_vers_impl_x_locked_low(): Instead of comparing the DB_TRX_ID
to 0, compare it to trx_t::max_inactive_id.

In dict0load.cc we will not spend any effort to avoid extra
trx_sys.find() calls for stale DB_TRX_ID in dictionary tables.
This code does not currently use trx_t objects, and therefore
we cannot easily access trx_t::max_inactive_id. Loading table
definitions into the InnoDB data dictionary cache (dict_sys)
should be a very rare operation.

Reviewed by: Vladislav Lesin
2025-09-04 08:40:40 +03:00
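The idea in MDEV-16168 above, comparing DB_TRX_ID to a cached bound instead of to 0, can be sketched as follows. This is a model under assumptions, not InnoDB code: `TrxSys` and `may_be_active` are hypothetical, and the exact comparison (less than or equal) is an assumption about how the cached value is interpreted.

```python
class TrxSys:
    """Toy registry of active transaction ids that counts lookups."""

    def __init__(self, active_ids) -> None:
        self.active_ids = set(active_ids)
        self.lookups = 0

    def find(self, trx_id: int) -> bool:
        self.lookups += 1
        return trx_id in self.active_ids


def may_be_active(trx_id: int, max_inactive_id: int, trx_sys: TrxSys) -> bool:
    """Skip the registry lookup for ids known to be inactive."""
    if trx_id <= max_inactive_id:
        # Covered by the cached bound: no lookup is needed.
        return False
    return trx_sys.find(trx_id)
```

An id of 0 is always below the bound, so the old special case is covered without resetting DB_TRX_ID.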
Marko Mäkelä
257f4b30ef Merge 10.11 into 11.4 2025-09-03 10:32:56 +03:00
Nikita Malyavin
0108664a8a Merge branch 10.11 into 11.4
# Conflicts:
#	sql/handler.h
#	sql/log_event.h
#	sql/log_event_server.cc
2025-09-02 15:58:39 +02:00
Marko Mäkelä
cc277a7d24 MDEV-36024: Redesign innodb_encrypt_log=ON
The innodb_encrypt_log=ON subformat of FORMAT_10_8 is inefficient,
because a new encryption or decryption context is being set up for
every log record payload snippet.

An in-place conversion between the old and new innodb_encrypt_log=ON
format is technically possible. No such conversion has been
implemented, though. There is some overhead with respect to the
unencrypted format (innodb_encrypt_log=OFF): At the end of each
mini-transaction, right before the CRC-32C, additional 8 bytes will be
reserved for a nonce (really, log_sys.get_flushed_lsn()), which forms
a part of an initialization vector.

log_t::FORMAT_ENC_11: The new format identifier, a UTF-8 encoding of
🗝 U+1F5DD OLD KEY (encryption). In this format, everything except the
types and lengths of log records will be encrypted. Thus, unlike in
FORMAT_10_8, also page identifiers and FILE_ records will be encrypted.
The initialization vector (IV) consists of the 8-byte nonce as well as
the type and length byte(s) of the first record of the mini-transaction.
Page identifiers will no longer form any part of the IV.

The old log_t::FORMAT_ENC_10_8 (innodb_encrypt_log=ON) will be supported
both by mariadb-backup and by crash recovery. Downgrade from the new
format will only be possible if the new server has been running or
restarted with innodb_encrypt_log=OFF. If innodb_encrypt_log=ON,
only the new log_t::FORMAT_ENC_11 will be written.

log_t::is_recoverable(): A new predicate, which holds for all 3
formats.

recv_sys_t::tmp_buf: A heap-allocated buffer for decrypting a
mini-transaction, or for making the wrap-around of a memory-mapped
log file contiguous.

recv_sys_t::start_lsn: The start of the mini-transaction.
Updated at the start of parse_tail().

log_decrypt_mtr(): Decrypt a mini-transaction in recv_sys.tmp_buf.
Theoretically, when reading the log via pread() rather than a read-only
memory mapping, we could modify the contents of log_sys.buf in place.
If we did that, we would have to re-read the last log block into
log_sys.buf before resuming writes, because otherwise that block could be
re-written as a mix of old decrypted data and new encrypted data, which
would cause a subsequent recovery failure unless the log checkpoint had
been advanced beyond this point.

log_decrypt_legacy(): Decrypt a log_t::FORMAT_ENC_10_8 record snippet
on stack. Replaces recv_buf::copy_if_needed().

recv_sys_t::get_backup_parser(): Return a recv_sys_t::parser, that is,
a pointer to an instantiation of parse_mmap or parse_mtr for the current
log format.

recv_sys_t::parse_mtr(), recv_sys_t::parse_mmap(): Add a parameter
template<uint32_t> for the current log_sys.format.

log_parse_start(): Validate the CRC-32C of a mini-transaction.
This has been split from the recv_sys_t::parse() template to
reduce code duplication. These two are the lowest-level functions
that will be instantiated for both recv_buf and recv_ring.

recv_sys_t::parse(): Split into ::log_parse_start() and parse_tail().
Add a parameter template<uint32_t format> to specialize for
log_sys.format at compilation time.

recv_sys_t::parse_tail(): Operate on pointers to contiguous
mini-transaction data. Use a parameter template<bool ENC_10_8>
for special handling of the old innodb_encrypt_log=ON format.
The former recv_buf::get_buf() is being inlined here.
Much of the logic is split into non-inline functions, to avoid
duplicating a lot of code for every template expansion.

log_crypt: Encrypt or decrypt a mini-transaction in place in the
new innodb_encrypt_log=ON format. We will use temporary buffers
so that encryption_ctx_update() can be invoked on integer multiples
of MY_AES_BLOCK_SIZE, except for the last bytes of the encrypted
payload, which will be encrypted or decrypted in place thanks to
ENCRYPTION_FLAG_NOPAD.

log_crypt::append(): Invoke encryption_ctx_update() in MY_AES_BLOCK_SIZE
(16-byte) blocks and scatter/gather shorter data blocks as needed.

log_crypt::finish(): Handle the last (possibly incomplete) block as a
special case, with ENCRYPTION_FLAG_NOPAD.

mtr_t::parse_length(): Parse the length of a log record.

mtr_t::encrypt(): Use log_crypt instead of the old log_encrypt_buf().

recv_buf::crc32c(): Add a parameter for the initial CRC-32C value.

recv_sys_t::rewind(): Operate on pointers to the start of the
mini-transaction and to the first skipped record.

recv_sys_t::trim(): Declare as ATTRIBUTE_COLD so that this rarely
invoked function will not be expanded inline in parse_tail().

recv_sys_t::parse_init(): Handle INIT_PAGE or FREE_PAGE while scanning
to the end of the log.

recv_sys_t::parse_page0(): Handle WRITE to FSP_SPACE_SIZE and
FSP_SPACE_FLAGS.

recv_sys_t::parse_store_if_exists(), recv_sys_t::parse_store(),
recv_sys_t::parse_oom(): Handle page-level log records.

mlog_decode_varint_length(): Make use of __builtin_clz() to avoid a loop
when possible.

mlog_decode_varint(): Define only on const byte*, as
ATTRIBUTE_NOINLINE static because it is a rather large function.

recv_buf::decode_varint(): Trivial wrapper for mlog_decode_varint().

recv_ring::decode_varint(): Special implementation.

log_page_modify(): Note that a page will be modified in recovery.
Split from recv_sys_t::parse_tail().

log_parse_file(): Handle non-page log records.

log_record_corrupted(), log_unknown(), log_page_id_corrupted():
Common error reporting functions.
2025-09-02 13:28:34 +03:00
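The gather step that MDEV-36024 above describes for log_crypt::append() and log_crypt::finish() can be illustrated without any cryptography. This is a sketch under assumptions: `BlockGatherer` and its callback are hypothetical, and no encryption is performed. It only shows how arbitrarily sized pieces are regrouped so that the callback always receives a multiple of 16 bytes, with the remainder left for a final special case.

```python
from typing import Callable

BLOCK_SIZE = 16  # MY_AES_BLOCK_SIZE, per the commit message


class BlockGatherer:
    """Regroup appended data into whole 16-byte blocks."""

    def __init__(self, process_blocks: Callable[[bytes], None]) -> None:
        # process_blocks stands in for a block-wise update call.
        self._process = process_blocks
        self._pending = bytearray()

    def append(self, data: bytes) -> None:
        self._pending += data
        whole = len(self._pending) - len(self._pending) % BLOCK_SIZE
        if whole:
            self._process(bytes(self._pending[:whole]))
            del self._pending[:whole]

    def finish(self) -> bytes:
        """Return the last, possibly incomplete, block for special handling."""
        tail = bytes(self._pending)
        self._pending.clear()
        return tail
```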
mariadb-satishkumar
ad44e1b964 MDEV-36993: Format log for srv_mon_reset_all 2025-09-02 15:31:34 +05:30
Brandon Nesterenko
a394fc0270 MDEV-29981: Replica stops with "Found invalid event in binary log"
Replication can stop in error if a Heartbeat log event is sent to a
replica during rotation. There are two bugs at play:

  1. Prior to MDEV-30128 (added in 11.0), there is a bug when checking
     legacy events. When the replica rotates its relay logs, it
     initializes its Format_description_log_event with binlog version 3
     (this is hard-coded). So immediately after rotation (and until a
     new Format_descriptor with binlog_format 4 is sent from the
     master), the IO thread is expecting binlog_format 3 (i.e. it will
     call queue_old_event() for incoming events). This invalidates any
     events that are sent with an event type higher than 14. In theory,
     we wouldn't expect any events to be sent in-between a rotate and
     the next format descriptor log event, but if a long enough period
     of time passes between then, the primary will generate and send a
     Heartbeat event (of type 27). In such case, the slave will see the
     heartbeat event of type 27, see it is higher than 14, and result
     in an error mentioning 'Found invalid event in binary log', with
     the expected log coordinates of the new log (which is
     optimistically populated from the Rotate log event, not the new
     event).

  2. In all versions of MariaDB (11.0+), there is a bug when checking
     the state of a Heartbeat log event, in that it doesn't consider a
     rotated binary log. The check is meant to ensure that the
     heartbeat provided by the master (i.e. the state of the master) is
     greater than or equal to the state of the slave. In other words,
     it checks that the slave isn't ahead of the master. However, if
     the filename provided by the master heartbeat event is different
     than the filename saved for the slave's state, the check always
     fails. This is broken, because when the master rotates its logs,
     the new binary log file will have a different filename (i.e. an
     incremented index counter suffix). For example, if the master
     rotates its binary logs from master-bin.000002 to
     master-bin.000003, master-bin.000003 is ahead of
     master-bin.000002, but the slave will see a difference between the
     filenames and fail the check.

To fix the first problem, this patch disallows passing a heartbeat
event into queue_old_event (which is the source of the error, as it
tries to parse a heartbeat log event). This function (queue_old_event)
was removed with MDEV-30128, so bypassing it for heartbeat events is
not consequential (and it is already also done for
Format_description_events, which are not supported in old binlog file
versions). Note that backporting all of MDEV-30128 was also considered,
but this is less risky for GA.

To fix the second problem, we simply ignore heartbeat events on the
slave if the filenames don't match. This is because during rotation,
it can appear that the slave is ahead of the master, which breaks the
validity of the check (i.e. the check is to ensure the master is
ahead of the slave).

Additionally note that this patch restores a heartbeat check that was
incorrectly removed in 780db8e252

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
2025-08-22 15:04:02 -06:00
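The second fix in MDEV-29981 above can be sketched as a small decision function. This is an illustration under assumptions: the enum, the function name, and the example coordinates are hypothetical. It models only the rule stated in the commit message: a heartbeat is ignored when the file names differ, and the position check applies only within the same file.

```python
from enum import Enum


class HeartbeatVerdict(Enum):
    IGNORE = "ignore"  # file names differ, e.g. around a log rotation
    OK = "ok"          # primary is at or ahead of the replica
    ERROR = "error"    # replica appears ahead within the same file


def check_heartbeat(hb_file: str, hb_pos: int,
                    replica_file: str, replica_pos: int) -> HeartbeatVerdict:
    if hb_file != replica_file:
        # Positions in different files are not comparable here.
        return HeartbeatVerdict.IGNORE
    if hb_pos >= replica_pos:
        return HeartbeatVerdict.OK
    return HeartbeatVerdict.ERROR
```

With the file names from the commit message, a heartbeat for master-bin.000003 arriving while the replica still records master-bin.000002 is ignored rather than treated as a failed check.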
Marko Mäkelä
3ee1991645 MDEV-36159 mariabackup failed after upgrade
Ever since commit 685d958e38
(MDEV-14425), mariadb-backup --backup has had trouble keeping up
with write workloads of the mariadbd server.

Debarun Banerjee found out that mariadb-backup --backup was
copying the log in the wrong way and not pausing when it made
sense to do so. This change includes his fix as well as some
dead code removal from xtrabackup_copy_mmap_logfile().

Some earlier changes to the default behaviour of mariadb-backup --backup
will be reverted, by making the configuration parameters OFF by default.
These parameters were basically working around this bug:

* commit 652f33e0a4 (MDEV-30000)
introduced --innodb-log-checkpoint-now and made it ON by default.
Making the server execute a log checkpoint can be really I/O intensive.
* commit 6acada713a (MDEV-34062)
introduced --innodb-log-file-mmap and made it ON by default on
Linux and FreeBSD. There are no documented semantics for what should
happen to a memory mapping when there are concurrent pwrite(2)
operations by other processes. While it appears to work, it is safer
to default to clearly documented semantics.

xtrabackup_copy_logfile(): Add a parameter early_exit.
Always read a log snippet to the start of recv_sys.buf and assign
recv_sys.len to the read length. We used to shift recv_sys.buf
with memmove(). However, on recv_sys_t::PREMATURE_EOF we cannot know
which part of the mini-transaction was correctly read, because that
part of the ib_logfile0 may be concurrently modified by the server.
So, we will reread everything from the start of the mini-transaction.

xtrabackup_backup_func(): Invoke xtrabackup_copy_logfile(true),
allowing it to stop on every recv_sys_t::PREMATURE_EOF.
This will also avoid repeated "Retry" messages when there is no
more redo log to copy.

get_current_lsn(): Execute FLUSH ENGINE LOGS to ensure that
InnoDB will complete any buffered writes to the ib_logfile0
and ensure that everything up to the current LSN has been
written.

backup_wait_for_commit_lsn(): Wait for as much as is really needed.
This avoids an extra 5-second wait at the end of the backup.

xtrabackup_copy_mmap_logfile(): Remove some dead code, and add
debug assertions to demonstrate that the parser can only return
recv_sys_t::OK or recv_sys_t::GOT_EOF.
2025-08-20 15:30:49 +03:00
Julius Goryavsky
aa3dd63d40 Merge branch '10.6' into '10.11' 2025-08-14 22:10:45 +02:00
Alexey Yurchenko
8dae7150b2 MTR test to verify that Galera gcs.stateless flag works
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2025-08-14 21:59:11 +02:00