Commit graph

28,672 commits

Author SHA1 Message Date
Monty
f65dda628d Fixed compiling MariaDB with ASAN and -Wframe-larger-than=16384
Added PRAGMA_DISABLE_CHECK_STACK_FRAME around some functions
2025-09-04 18:08:38 +03:00
Monty
0ec675ce89 Fixed compiler issues when compiling with UBSAN
Updated also BUILD/compile-pentium64-ubsan to use -Wno-unused-parameter
to get rid of compiler warnings
2025-09-04 18:08:38 +03:00
Monty
8f771b28a1 MDEV-34914 maria.bulk_insert_crash fails on s390x (10.6+, Debug)
This was caused by a wrong handling of bitmaps in
copy_not_changed_fields() that did not work on big endian machines.
This bug caused recovery of Aria files to fail on big endian machines
like s390x or Sparc.

This issue was noticed by the bulk_insert_crash.test on the
s390x builder.
2025-09-04 17:15:50 +03:00
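
The message of commit 8f771b28a1 does not show the faulty code in copy_not_changed_fields(), so the following is only a generic illustration of how bitmap handling can depend on byte order. All function names are hypothetical. The byte-wise accessor gives the same answer everywhere, while the word-wise one only agrees with it on little-endian hosts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable: pick the byte first, then the bit inside that byte.
inline bool bit_is_set_bytewise(const unsigned char *map, unsigned bit)
{
  assert(map != nullptr);
  return (map[bit / 8] >> (bit % 8)) & 1;
}

// Fragile: load 4 bitmap bytes as one host-order word. Which byte ends up
// holding "bit 0" now depends on the byte order of the machine.
inline bool bit_is_set_wordwise(const unsigned char *map, unsigned bit)
{
  assert(map != nullptr);
  std::uint32_t word;
  std::memcpy(&word, map + (bit / 32) * 4, sizeof word);
  return (word >> (bit % 32)) & 1;
}

// True on little-endian hosts such as x86, false on s390x or SPARC.
inline bool host_is_little_endian()
{
  const std::uint16_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}
```

With a bitmap whose first byte is 0x01, the byte-wise check reports bit 0 as set on any host; the word-wise check reports it as set only where host_is_little_endian() is true.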
Marko Mäkelä
ef2f3d207b MDEV-16168: Performance regression after MDEV-12288
The function row_purge_reset_trx_id() that had been introduced in
commit 3c09f148f3 (MDEV-12288)
introduces some extra buffer pool and redo log activity that will
cause a significant performance regression under some workloads.

This is currently the most significant performance issue, after
commit acd071f599 (MDEV-21923)
fixed the InnoDB LSN allocation and MDEV-19749 the MDL bottleneck in 12.1.

The purpose of row_purge_reset_trx_id() was to ensure that we can
easily identify records for which no history exists. If DB_TRX_ID
is 0, we could avoid looking up the transaction to see if the
history is accessible or the record is implicitly locked.

To avoid trx_sys_t::find() for stale DB_TRX_ID values, we can refer
to trx_t::max_inactive_id, which was introduced in
commit 4105017a58 (MDEV-30357).
Instead of comparing DB_TRX_ID to 0, we may compare it to this
cached value. The cache would be updated by
trx_sys_t::find_same_or_older(), which is invoked for some operations
on secondary indexes.

row_purge_reset_trx_id(): Remove. We will no longer reset the
DB_TRX_ID to 0 after an INSERT. We will retain a single undo log
for all operations, though. Before MDEV-12288, there had been
separate insert_undo and update_undo logs.

row_check_index(): No longer warn
"InnoDB: Clustered index record with stale history in table".

lock_rec_queue_validate(), lock_rec_convert_impl_to_expl(),
row_vers_impl_x_locked_low(): Instead of comparing the DB_TRX_ID
to 0, compare it to trx_t::max_inactive_id.
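
A minimal sketch of the changed comparison, not the InnoDB code. The type and function names are hypothetical, and it assumes that every transaction ID at or below the cached trx_t::max_inactive_id is known to belong to no active transaction.

```cpp
#include <cstdint>

typedef std::uint64_t trx_id_sketch;  // hypothetical stand-in for the ID type

// Old rule: only a DB_TRX_ID that purge had reset to 0 could skip the lookup.
constexpr bool needs_lookup_old(trx_id_sketch db_trx_id)
{
  return db_trx_id != 0;
}

// New rule (assumed semantics): anything at or below the cached bound is
// stale, so the transaction lookup can be skipped without resetting to 0.
constexpr bool needs_lookup_new(trx_id_sketch db_trx_id,
                                trx_id_sketch max_inactive_id)
{
  return db_trx_id > max_inactive_id;
}

static_assert(needs_lookup_old(5), "a nonzero ID always forced a lookup");
static_assert(!needs_lookup_new(5, 10), "a stale ID is now skipped");
static_assert(needs_lookup_new(11, 10), "a newer ID still needs a lookup");
```

Because the cached bound is never below 0, the new rule also covers the old DB_TRX_ID = 0 case.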

In dict0load.cc we will not spend any effort to avoid extra
trx_sys.find() calls for stale DB_TRX_ID in dictionary tables.
This code does not currently use trx_t objects, and therefore
we cannot easily access trx_t::max_inactive_id. Loading table
definitions into the InnoDB data dictionary cache (dict_sys)
should be a very rare operation.

Reviewed by: Vladislav Lesin
2025-09-04 08:40:40 +03:00
Marko Mäkelä
257f4b30ef Merge 10.11 into 11.4 2025-09-03 10:32:56 +03:00
Marko Mäkelä
a304282782 MDEV-37553: Assertion failure lsn - get_flushed_lsn(...) < capacity()
log_t::append_prepare_wait(): Relax the debug assertion in case
log_overwrite_warning() has been called. In this case, the
contents of log_sys.buf (and the ib_logfile0) are basically
unrecoverable garbage, and it does not matter which write was
last persisted.

This assertion would easily fail in the 11.4 branch in the test
encryption.innochecksum after merging MDEV-36024.
2025-09-03 08:45:45 +03:00
Nikita Malyavin
0108664a8a Merge branch 10.11 into 11.4
# Conflicts:
#	sql/handler.h
#	sql/log_event.h
#	sql/log_event_server.cc
2025-09-02 15:58:39 +02:00
Marko Mäkelä
6619e1d01a Always inline simple mach_read/write
At least in some ATTRIBUTE_COLD code, mach_read_from_8() could be
invoked as a function call that could be as simple as wrapping
one or two instructions. Let us declare __attribute__((always_inline))
on those memory accessor functions that operate on 1, 2, 4, or 8 bytes
and are therefore likely to translate into few instructions, such as
mov;bswap or movbe on x86.
2025-09-02 13:30:39 +03:00
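
For commit 6619e1d01a, a rough sketch of a forcibly inlined 8-byte accessor. The names read_be64, write_be64 and SKETCH_ALWAYS_INLINE are hypothetical and this is not the definition of mach_read_from_8(); it assumes a big-endian storage format and a GCC-compatible compiler for the attribute.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
# define SKETCH_ALWAYS_INLINE inline
#endif

// Read a 64-bit big-endian value from 8 bytes of memory.
SKETCH_ALWAYS_INLINE std::uint64_t read_be64(const unsigned char *b)
{
  std::uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | b[i];
  return v;
}

// Store a 64-bit value as 8 big-endian bytes.
SKETCH_ALWAYS_INLINE void write_be64(unsigned char *b, std::uint64_t v)
{
  for (int i = 7; i >= 0; i--, v >>= 8)
    b[i] = static_cast<unsigned char>(v);
}

// Usage: a value written to a buffer reads back unchanged.
inline void round_trip_example()
{
  unsigned char buf[8];
  write_be64(buf, 0x0102030405060708ULL);
  assert(buf[0] == 0x01 && buf[7] == 0x08);
  assert(read_be64(buf) == 0x0102030405060708ULL);
}
```

An optimizing compiler may reduce such loops to a single load plus byte swap, which is the kind of code that is not worth a function call.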
Marko Mäkelä
cc277a7d24 MDEV-36024: Redesign innodb_encrypt_log=ON
The innodb_encrypt_log=ON subformat of FORMAT_10_8 is inefficient,
because a new encryption or decryption context is being set up for
every log record payload snippet.

An in-place conversion between the old and new innodb_encrypt_log=ON
format is technically possible. No such conversion has been
implemented, though. There is some overhead with respect to the
unencrypted format (innodb_encrypt_log=OFF): At the end of each
mini-transaction, right before the CRC-32C, additional 8 bytes will be
reserved for a nonce (really, log_sys.get_flushed_lsn()), which forms
a part of an initialization vector.

log_t::FORMAT_ENC_11: The new format identifier, a UTF-8 encoding of
🗝 U+1F5DD OLD KEY (encryption). In this format, everything except the
types and lengths of log records will be encrypted. Thus, unlike in
FORMAT_10_8, also page identifiers and FILE_ records will be encrypted.
The initialization vector (IV) consists of the 8-byte nonce as well as
the type and length byte(s) of the first record of the mini-transaction.
Page identifiers will no longer form any part of the IV.

The old log_t::FORMAT_ENC_10_8 (innodb_encrypt_log=ON) will be supported
both by mariadb-backup and by crash recovery. Downgrade from the new
format will only be possible if the new server has been running or
restarted with innodb_encrypt_log=OFF. If innodb_encrypt_log=ON,
only the new log_t::FORMAT_ENC_11 will be written.

log_t::is_recoverable(): A new predicate, which holds for all 3
formats.

recv_sys_t::tmp_buf: A heap-allocated buffer for decrypting a
mini-transaction, or for making the wrap-around of a memory-mapped
log file contiguous.

recv_sys_t::start_lsn: The start of the mini-transaction.
Updated at the start of parse_tail().

log_decrypt_mtr(): Decrypt a mini-transaction in recv_sys.tmp_buf.
Theoretically, when reading the log via pread() rather than a read-only
memory mapping, we could modify the contents of log_sys.buf in place.
If we did that, we would have to re-read the last log block into
log_sys.buf before resuming writes, because otherwise that block could be
re-written as a mix of old decrypted data and new encrypted data, which
would cause a subsequent recovery failure unless the log checkpoint had
been advanced beyond this point.

log_decrypt_legacy(): Decrypt a log_t::FORMAT_ENC_10_8 record snippet
on stack. Replaces recv_buf::copy_if_needed().

recv_sys_t::get_backup_parser(): Return a recv_sys_t::parser, that is,
a pointer to an instantiation of parse_mmap or parse_mtr for the current
log format.

recv_sys_t::parse_mtr(), recv_sys_t::parse_mmap(): Add a parameter
template<uint32_t> for the current log_sys.format.

log_parse_start(): Validate the CRC-32C of a mini-transaction.
This has been split from the recv_sys_t::parse() template to
reduce code duplication. These two are the lowest-level functions
that will be instantiated for both recv_buf and recv_ring.

recv_sys_t::parse(): Split into ::log_parse_start() and parse_tail().
Add a parameter template<uint32_t format> to specialize for
log_sys.format at compilation time.

recv_sys_t::parse_tail(): Operate on pointers to contiguous
mini-transaction data. Use a parameter template<bool ENC_10_8>
for special handling of the old innodb_encrypt_log=ON format.
The former recv_buf::get_buf() is being inlined here.
Much of the logic is split into non-inline functions, to avoid
duplicating a lot of code for every template expansion.

log_crypt: Encrypt or decrypt a mini-transaction in place in the
new innodb_encrypt_log=ON format. We will use temporary buffers
so that encryption_ctx_update() can be invoked on integer multiples
of MY_AES_BLOCK_SIZE, except for the last bytes of the encrypted
payload, which will be encrypted or decrypted in place thanks to
ENCRYPTION_FLAG_NOPAD.

log_crypt::append(): Invoke encryption_ctx_update() in MY_AES_BLOCK_SIZE
(16-byte) blocks and scatter/gather shorter data blocks as needed.

log_crypt::finish(): Handle the last (possibly incomplete) block as a
special case, with ENCRYPTION_FLAG_NOPAD.
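
The gathering described for log_crypt::append() and log_crypt::finish() can be sketched as follows. This is an illustration with hypothetical names, not the real class: no cipher is invoked, and the sketch merely records the size of each chunk that would be handed to one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t BLOCK = 16;  // MY_AES_BLOCK_SIZE per the commit message

class block_gatherer_sketch
{
  unsigned char pending_[BLOCK];  // bytes gathered for the next full block
  std::size_t fill_ = 0;          // number of valid bytes in pending_
public:
  std::vector<std::size_t> emitted;  // sizes of chunks passed to the "cipher"

  // Gather arbitrarily sized snippets; emit only complete 16-byte blocks.
  void append(const unsigned char *data, std::size_t len)
  {
    assert(fill_ < BLOCK);
    while (len)
    {
      const std::size_t n = std::min(len, BLOCK - fill_);
      std::memcpy(pending_ + fill_, data, n);
      fill_ += n;
      data += n;
      len -= n;
      if (fill_ == BLOCK)
      {
        emitted.push_back(BLOCK);  // a real version would encrypt pending_
        fill_ = 0;
      }
    }
  }

  // Emit the last, possibly incomplete block as a special case.
  void finish()
  {
    if (fill_)
      emitted.push_back(fill_);
    fill_ = 0;
  }
};
```

Appending snippets of 10, 10 and 20 bytes and then calling finish() yields chunks of 16, 16 and 8 bytes.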

mtr_t::parse_length(): Parse the length of a log record.

mtr_t::encrypt(): Use log_crypt instead of the old log_encrypt_buf().

recv_buf::crc32c(): Add a parameter for the initial CRC-32C value.
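
Why an initial value is useful can be seen in a small bitwise CRC-32C sketch. The function name crc32c_sketch is hypothetical and this slow loop is not what the server uses; it only shows that a checksum over split data can be continued from the previous result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// `init` is the CRC of the data that came before; 0 starts a new checksum.
inline std::uint32_t crc32c_sketch(std::uint32_t init, const void *data,
                                   std::size_t len)
{
  assert(data != nullptr || len == 0);
  const unsigned char *p = static_cast<const unsigned char*>(data);
  std::uint32_t crc = ~init;
  while (len--)
  {
    crc ^= *p++;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1)));
  }
  return ~crc;
}
```

Computing the CRC of "1234" and passing it as the initial value for "56789" gives the same result as one pass over "123456789", namely the standard check value 0xE3069283.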

recv_sys_t::rewind(): Operate on pointers to the start of the
mini-transaction and to the first skipped record.

recv_sys_t::trim(): Declare as ATTRIBUTE_COLD so that this rarely
invoked function will not be expanded inline in parse_tail().

recv_sys_t::parse_init(): Handle INIT_PAGE or FREE_PAGE while scanning
to the end of the log.

recv_sys_t::parse_page0(): Handle WRITE to FSP_SPACE_SIZE and
FSP_SPACE_FLAGS.

recv_sys_t::parse_store_if_exists(), recv_sys_t::parse_store(),
recv_sys_t::parse_oom(): Handle page-level log records.

mlog_decode_varint_length(): Make use of __builtin_clz() to avoid a loop
when possible.
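
A sketch of the idea, under the assumption that the number of leading 1 bits in the first byte gives the number of additional bytes. That encoding and both function names are assumptions for illustration; the actual InnoDB varint format is not spelled out in this message.

```cpp
#include <cassert>
#include <cstdint>

// Loop version: one iteration per leading 1 bit of the first byte.
inline unsigned varint_length_loop(unsigned char first)
{
  unsigned len = 1;
  for (unsigned mask = 0x80; first & mask; mask >>= 1)
    len++;
  return len;
}

// Loop-free version: invert the byte so that leading ones become leading
// zeros, move it to the top of a 32-bit word, and count the leading zeros.
inline unsigned varint_length_clz(unsigned char first)
{
#if defined(__GNUC__) || defined(__clang__)
  const std::uint32_t inverted =
    static_cast<std::uint32_t>(static_cast<unsigned char>(~first)) << 24;
  // __builtin_clz(0) is undefined, so 0xFF (no 0 bit at all) is handled apart.
  const unsigned len =
    inverted ? 1U + static_cast<unsigned>(__builtin_clz(inverted)) : 9U;
  assert(len == varint_length_loop(first));  // debug cross-check
  return len;
#else
  return varint_length_loop(first);  // no builtin available: keep the loop
#endif
}
```

The fallback branch mirrors the "when possible" wording: the builtin is used only where the compiler provides it.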

mlog_decode_varint(): Define only on const byte*, as
ATTRIBUTE_NOINLINE static because it is a rather large function.

recv_buf::decode_varint(): Trivial wrapper for mlog_decode_varint().

recv_ring::decode_varint(): Special implementation.

log_page_modify(): Note that a page will be modified in recovery.
Split from recv_sys_t::parse_tail().

log_parse_file(): Handle non-page log records.

log_record_corrupted(), log_unknown(), log_page_id_corrupted():
Common error reporting functions.
2025-09-02 13:28:34 +03:00
Marko Mäkelä
1afc682932 MDEV-36024 preparation: Shrink mtr_buf_t
mtr_t::get_log_size(): Remove.

mtr_t::crc32c(): New function: compute CRC-32C and determine the size,
including the sequence byte and the CRC-32C.

mtr_t::encrypt(): Return the size, similar to crc32c().

mtr_t::log_file_op(): Return the size written.

fil_name_write(): Remove. Let us invoke mtr_t::log_file_op() directly.

fil_names_clear(): Keep track of the available size without
invoking mtr_t::get_log_size().

mtr_buf_t::m_size: Remove.

mtr_buf_t::list_t: Use ilist instead of sized_ilist.

mtr_buf_t::for_each_block(): Remove. Let us allow iteration via
begin() and end(), without any lambda function objects.
2025-09-02 13:21:36 +03:00
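
For commit 1afc682932, the difference between a for_each_block() callback and plain iteration can be sketched like this. The types below are hypothetical stand-ins built on std::list, not mtr_buf_t or ilist.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct block_sketch { std::size_t used; };  // hypothetical log block

class buf_sketch
{
  std::list<block_sketch> blocks_;
public:
  void add(std::size_t used) { blocks_.push_back(block_sketch{used}); }
  std::list<block_sketch>::const_iterator begin() const
  { return blocks_.begin(); }
  std::list<block_sketch>::const_iterator end() const
  { return blocks_.end(); }
};

// With begin() and end(), a range-based for loop replaces the lambda that
// a for_each_block() style interface would need. Without a stored size
// member, the total is computed by the caller when it is needed.
inline std::size_t total_used(const buf_sketch &buf)
{
  std::size_t size = 0;
  for (const block_sketch &b : buf)
    size += b.used;
  return size;
}

// Usage: two blocks of 3 and 4 bytes add up to 7.
inline void iteration_example()
{
  buf_sketch buf;
  buf.add(3);
  buf.add(4);
  assert(total_used(buf) == 7);
}
```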
mariadb-satishkumar
ad44e1b964 MDEV-36993: Format log for srv_mon_reset_all 2025-09-02 15:31:34 +05:30
Daniel Black
f4d203ca2c rocksdb: tests timeout under MSAN+Debug
But rocksdb.bulk_load_unsorted_rev and rocksdb.bulk_load_unsorted
succeed under non-debug builds, and because they were slow (at 87 seconds),
there is a --big-test criterion for these tests.
2025-08-29 20:40:26 +10:00
Daniel Black
725874941d MDEV-37504 Rocks replace __unused__ attribute with nameless parameters
Per C++98.
2025-08-29 20:40:26 +10:00
Daniel Black
a0384c2f88 MDEV-37504 MemorySanitizer: use-of-uninitialized-value myrocks::Rdb_key_def::pack_field
m_charset_codec is uninitialized when calling m_make_unpack_info_func.

In the cases where m_make_unpack_info_func is one of:
* Rdb_key_def::make_unpack_unknown_varchar
* Rdb_key_def::make_unpack_unknown
* Rdb_key_def::dummy_make_unpack_info

the m_charset_codec that forms the first argument to this function
is unused.

In these limited cases we initialize the m_charset_codec member
as the only use is to pass through to the m_make_unpack_info_func.

Ultimately MemorySanitizer shouldn't error on this as all
of these 3 functions clearly have the attribute
__unused__ on their first argument where the m_charset_codec is
passed.
2025-08-29 20:40:26 +10:00
Marko Mäkelä
21bb6a3e34 MDEV-37447: Race condition between buf_pool_t::shrink() and page_guess()
buf_pool_t::shrink(): When relocating a buffer page, invalidate
the page identifier of the original page so that buf_pool_t::page_guess()
will not accidentally match it.

Before commit b6923420f3 (MDEV-29445)
introduced buf_pool_t::page_guess(), the validity of block descriptor
pointers was checked by buf_pool_t::is_uncompressed(const buf_block_t*).
Therefore, any block descriptors that used to be part of a larger buffer
pool would not be accessed at all.

This race condition is very hard to reproduce. To reproduce it,
an optimistic btr_pcur_t::restore_position() or similar will have to
be invoked on a block that has been relocated by buf_pool_t::shrink()
and that had not meanwhile been replaced with another page with a
different identifier.

Reviewed by: Vladislav Lesin
2025-08-27 11:02:19 +03:00
Marko Mäkelä
47fefd4a96 Merge 10.6 into 10.11 2025-08-22 06:47:54 +03:00
Marko Mäkelä
1d84cb272f Fix clang-21 -Wunnecessary-virtual-specifier
Member functions of a final class cannot be virtual.
2025-08-21 15:17:44 +03:00
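
For commit 1d84cb272f, a small sketch of the pattern that the warning is understood to target: a new virtual member function declared in a final class, which nothing could ever override. The class names are hypothetical.

```cpp
#include <cassert>

struct handler_base_sketch
{
  virtual ~handler_base_sketch() {}
  virtual int rows() const { return 0; }
};

struct handler_impl_sketch final : handler_base_sketch
{
  // Overriding an inherited virtual function is still fine in a final class.
  int rows() const override { return 42; }

  // A new virtual function here could never be overridden, so the
  // specifier would be pointless:
  //   virtual int pages() const;
  // Declaring it as an ordinary member function avoids the warning.
  int pages() const { return 7; }
};

// Usage: dynamic dispatch through the base class is unaffected.
inline void final_class_example()
{
  handler_impl_sketch impl;
  const handler_base_sketch &base = impl;
  assert(base.rows() == 42);
  assert(impl.pages() == 7);
}
```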
Marko Mäkelä
4f7a5f7477 Fix clang-21 -Wuninitialized-const-pointer
pfs_get_thread_file_name_locker_v1(): Note that the last parameter
is unused. Let us pass it as NULL to avoid the warning.
2025-08-21 14:38:48 +03:00
Ruiqiang Hao
e8026a5019 MDEV-35566 Ensure compatibility with ARMv9 by updating .arch directive
The pmem_cvap() function currently uses the '.arch armv8.2-a' directive
for the 'dc cvap' instruction. This causes the build errors below when
compiling for ARMv9 systems. Update the '.arch' directive to 'armv9.4-a'
to ensure compatibility with ARMv9 architectures.

{standard input}: Assembler messages:
{standard input}:169: Error: selected processor does not support `retaa'
{standard input}:286: Error: selected processor does not support `retaa'
make[2]: *** [storage/innobase/CMakeFiles/innobase_embedded.dir/build.make:
1644: storage/innobase/CMakeFiles/innobase_embedded.dir/sync/cache.cc.o]
Error 1

Signed-off-by: Ruiqiang Hao <Ruiqiang.Hao@windriver.com>
2025-08-15 09:47:40 +03:00
Thirunarayanan Balathandayuthapani
e46c9a0152 MDEV-37296 ALTER TABLE allows adding unique hash key with duplicate values
Problem:
=======
- During the copy algorithm, InnoDB fails to detect the duplicate
key error for a unique hash key blob index. A unique HASH index is
treated as a virtual index inside InnoDB.
When a table has a unique hash key, the server searches on
the hash key before doing any insert operation and
finds the duplicate value in check_duplicate_long_entry_key().
Bulk insert performs all the inserts together when the copy of the
intermediate table is finished. As a result, the duplicate key
error goes undetected while building the index.

Solution:
========
- Avoid the bulk insert operation when the table has a unique
hash key blob index.

dict_table_t::can_bulk_insert(): Check whether the table
is eligible for the bulk insert operation during the ALTER copy algorithm.
Check whether any virtual column name starts with DB_ROW_HASH_ to
know whether a blob column has a unique index on it.
2025-08-11 13:29:32 +05:30
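
For commit e46c9a0152, the eligibility check can be pictured as a prefix test over the virtual column names. This is a sketch with hypothetical function names and a plain vector of strings; the real dict_table_t::can_bulk_insert() works on InnoDB's own column metadata.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// True if the column name starts with the DB_ROW_HASH_ prefix.
inline bool is_hash_column_sketch(const char *name)
{
  assert(name != nullptr);
  static const char prefix[] = "DB_ROW_HASH_";
  return std::strncmp(name, prefix, sizeof prefix - 1) == 0;
}

// Bulk insert is allowed only if no virtual column backs a unique hash key.
inline bool can_bulk_insert_sketch(
  const std::vector<std::string> &virtual_column_names)
{
  for (const std::string &name : virtual_column_names)
    if (is_hash_column_sketch(name.c_str()))
      return false;
  return true;
}
```

A table with virtual columns "v1" and "DB_ROW_HASH_1" would be excluded from bulk insert, while a table with only "v1" would remain eligible.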
Marko Mäkelä
8c3f6a1b85 MDEV-30289 fixup: Clean up mtr_buf_t further
mtr_buf_t::at(), mtr_buf_t::find(): Unused functions, removed.

mtr_buf_t::push(): Invoke memcpy(), not memmove().

Fixes up commit 67dc8af2a7 (MDEV-30289).
2025-08-08 16:35:58 +03:00
Marko Mäkelä
d66a74acb8 Merge 10.6 into 10.11 2025-08-08 14:15:04 +03:00
Marko Mäkelä
4164f17ba7 MDEV-37360: SIGSEGV in srv_printf_innodb_monitor
srv_printf_innodb_monitor(): After acquiring a latch,
abort the iteration if innodb_adaptive_hash_index=OFF.
If the adaptive hash index was disabled in a concurrently
executing thread, btr_search_sys_t::partition::clear() would have
freed part->heap, leading to us dereferencing a null pointer.
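
The fix follows a common pattern: test the enabling flag only after the latch is held. A minimal sketch with hypothetical names, using std::mutex in place of the InnoDB latch and a heap-allocated counter in place of part->heap:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

struct search_partition_sketch
{
  std::mutex latch;
  bool enabled = true;  // stand-in for innodb_adaptive_hash_index
  std::unique_ptr<std::size_t> heap{new std::size_t(4096)};  // part->heap

  // Disabling frees the heap while holding the latch.
  void disable()
  {
    std::lock_guard<std::mutex> guard(latch);
    enabled = false;
    heap.reset();
  }

  // Reporting re-checks the flag only after the latch is held; a check made
  // before locking could be stale by the time the heap is dereferenced.
  bool report(std::size_t &heap_size)
  {
    std::lock_guard<std::mutex> guard(latch);
    if (!enabled)
      return false;  // abort instead of dereferencing a freed heap
    assert(heap != nullptr);
    heap_size = *heap;
    return true;
  }
};
```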

Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Saahil Alam
2025-08-08 14:13:27 +03:00
Marko Mäkelä
9ffec4c1f3 MDEV-37360: SIGSEGV in srv_printf_innodb_monitor
srv_printf_innodb_monitor(): After acquiring a latch,
abort the iteration if innodb_adaptive_hash_index=OFF.
If the adaptive hash index was disabled in a concurrently
executing thread, btr_search_sys_t::partition::clear() would have
freed part->heap, leading to us dereferencing a null pointer.

Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Saahil Alam
2025-08-08 14:11:45 +03:00
Marko Mäkelä
5b12799bd2 Cleanup: Remove an unused header 2025-08-07 14:27:14 +03:00
Thirunarayanan Balathandayuthapani
05f9fd3dd2 MDEV-37192 Crash recovery reports corruption after bulk load
Problem:
=======
- InnoDB modifies the PAGE_ROOT_AUTO_INC value on the clustered index
root page. But before committing the mini-transaction that changes
PAGE_ROOT_AUTO_INC, InnoDB performs the bulk insert operation,
calculates the page checksum, and stores it as a part of the redo log in
the mini-transaction. During recovery, InnoDB fails to validate the
page checksum.

Solution:
========
- Avoid writing the persistent auto increment value before doing
bulk insert operation.

- For bulk insert operation, persistent auto increment value
is written via btr_write_autoinc while applying the buffered
insert operation.
2025-08-06 16:15:35 +05:30
Monty
df1eb3fbb2 Fixed mi_test1 and mi_test_all.sh
The mi_rsame() test in mi_test has been broken since MDEV-15458 in 2019.
Fixed the same way as ma_test1 was fixed.
2025-08-05 11:19:31 +03:00
Monty
6435aeb241 Fixed "frame size..larger than 16384" error in MyISAM
Fixed by using my_alloc() and by using 'needed allocation size' instead
of maximum possible.
2025-08-05 11:19:30 +03:00
Nikita Malyavin
c4b76b984f MDEV-15990 innodb: change DB_FOREIGN_DUPLICATE_KEY to DB_DUPLICATE_KEY
during row insert

DB_FOREIGN_DUPLICATE_KEY in row_ins_duplicate_error_in_clust was
motivated by handling the cascade changes during versioned operations.

It was found though, that certain row_update_for_mysql calls could
return DB_FOREIGN_DUPLICATE_KEY, even if there are no foreign relations.

Change DB_FOREIGN_DUPLICATE_KEY to DB_DUPLICATE_KEY in
row_ins_duplicate_error_in_clust.

It will be later converted to DB_FOREIGN_DUPLICATE_KEY in
row_ins_check_foreign_constraint if needed.

Additionally, ha_delete_row should return neither. Ensure it by an
assertion.
2025-08-04 17:44:05 +02:00
Nikita Malyavin
6353a80ef5 MDEV-15990 REPLACE on a precise-versioned table returns ER_DUP_ENTRY
We had a protection against it, by allowing versioned delete if:
trx->id != table->vers_start_id()

For replace this check fails: replace calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is irrelevant.

The same problem hits Field::is_max, which may have checked the wrong record.

Fix:
* Refactor Field::is_max to optionally accept a pointer as an argument.
* Refactor vers_start_id and vers_end_id to always accept a pointer to the
record. The difference from is_max is that is_max accepts the pointer to the
field data, rather than to the record.

Method val_int() would be too effortful to refactor to accept the argument, so
instead the value in record is fetched directly, like it is done in
Field_longlong.
2025-08-04 17:44:05 +02:00
Sergei Golubchik
55a39f13e4 Merge remote-tracking branch 'github/10.11' into 10.11 2025-08-03 10:30:02 +02:00
Sergei Golubchik
59f9ef24ea Merge remote-tracking branch 'github/10.6' into 10.6 2025-08-03 09:51:53 +02:00
Daniel Black
0de58ecbd5 connect engine: correct two uninitialized variable errors
storage/connect/tabxml.cpp:1616:46: warning: ‘*this.XMLCOL::Long’ may be used uninitialized [-Wmaybe-uninitialized]
 1616 |   Valbuf = (char*)PlugSubAlloc(g, NULL, n * (Long + 1));

In this case we are overriding the class 3 lines earlier. Add
some constructs to preserve the value of Long as the old class
is being replaced with a new subclass.

storage/connect/filter.cpp:1594:13: warning: ‘*this.FILTERCMP::FILTERX.FILTERX::FILTER.FILTER::Opc’ is used uninitialized [-Wuninitialized]
 1594 |   Bt = OpBmp(g, Opc);
The construction of FILTERCMP has an Opc(ode) and this should be passed
rather than relying on the uninitialized value of the parent class.

Also save its value in the class.
2025-07-29 13:15:02 +10:00
Daniel Black
4f9221ae88 MDEV-36542: remove _lint macro which is unused
Attribute noreturn functions don't need
a return afterwards.

aria_pack was missing the noreturn attribute
on its my_end function.
2025-07-29 13:15:02 +10:00
Daniel Black
6fd57f478f MDEV-36542 Remove UNINIT_VAR(x)=x under UBSAN
Clang processes the "int x=x" code from UNINIT_VAR
literally resulting in an uninitialized read and write.
This is something we want to avoid. Gcc does the same
without emitting warnings.

As UNINIT_VAR was about avoiding false detection by the compiler,
and clang doesn't falsely detect, its default action is a
no-op.

Static analysers (examined Infer and SonarQube) are
clang based and have the same detection.

Using __clang__ instead of WITH_UBSAN would have achieved
a better result; however, the reviewer wanted to keep WITH_UBSAN
only.

LINT_INIT_STRUCT is no longer required, even a gcc-4.8.5
doesn't warn with this construct removed which matches
the comment that it was fixed in gcc ~4.7.

mysql.cc - all paths in com_go populate buff before use.

json: Item_func_json_merge::val_str
  LINT_INIT(js2) unneeded as usage in the previous statements
  it is explicitly initialized to NULL.

Item_func_json_contains_path::val_bool n_found is guarded
by an uninitialized read by mode_one and from
gcc-13.3.0 in Ubuntu 24.04 this is detected. As the only
remaining use of LINT_INIT, this usage has been applied
with the expanded macro with the unused _lint define removed.

The LINT_INIT macro is removed.

_ma_ck_delete - org_key only valid under share->now_transactional
likewise with _ma_ck_write_btree_with_log

connect engine never used anything that FORCE_INIT_OF_VARS
would change.

Reviewer: Monty
2025-07-29 13:15:02 +10:00
Sergei Golubchik
c4ed889b74 Merge branch '10.11' into 11.4 2025-07-28 19:40:10 +02:00
Sergei Golubchik
053f9bcb5b Merge branch '10.6' into 10.11 2025-07-28 18:06:31 +02:00
Sergei Golubchik
ca91bf5d2a Update ColumnStore 23.10.5-1 2025-07-27 15:44:07 +02:00
Michael Widenius
49febfad21 Fixed compiler error with framesize=16384 in InnoDB log0sync.cc
When compiling with debug and valgrind on Ubuntu 22.04 with gcc 11.4.0,
the frame size used was 16400 bytes because of the code:
completion_callback callbacks[1000];

Reducing the array to 950 fixes the issue
2025-07-27 16:35:12 +03:00
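
For commit 49febfad21, the arithmetic behind the array size can be sketched with compile-time checks. The struct below is a hypothetical stand-in that assumes completion_callback holds one function pointer and one argument pointer; the overhead of the rest of the frame is inferred from the reported 16400 bytes, not measured.

```cpp
#include <cstddef>

// Hypothetical stand-in: a function pointer plus an argument pointer.
struct completion_callback_sketch
{
  void (*callback)(void*);
  void *arg;
};

constexpr std::size_t frame_limit = 16384;     // -Wframe-larger-than=16384
constexpr std::size_t reported_frame = 16400;  // from the commit message

constexpr std::size_t array_bytes(std::size_t n)
{
  return n * sizeof(completion_callback_sketch);
}

// Everything else in the frame, inferred from the reported total.
constexpr std::size_t other_bytes = reported_frame - array_bytes(1000);

constexpr std::size_t estimated_frame(std::size_t n)
{
  return array_bytes(n) + other_bytes;
}

static_assert(estimated_frame(1000) > frame_limit,
              "1000 elements exceed the limit");
static_assert(estimated_frame(950) < frame_limit,
              "950 elements fit within the limit");
```

On a typical 64-bit target the element is 16 bytes, so under these assumptions 50 fewer elements save 800 bytes, bringing the estimate from 16400 down to 15600.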
Sergei Golubchik
b0a2b921cc ColumnStore 6.4.11-1 2025-07-25 12:28:30 +02:00
Marko Mäkelä
55e0c34f4f MDEV-37263 Hang or crash when shrinking innodb_buffer_pool_size
buf_pool_t::shrink(): If we run out of pages to evict from buf_pool.LRU,
abort the operation. Also, do not leak the spare block that we may have
allocated.
2025-07-18 10:06:33 +03:00
Marko Mäkelä
cedfe8eca4 MDEV-37250 buf_pool_t::shrink() assertion failure
buf_pool_t::shrink(): When relocating a dirty page of the temporary
tablespace, reset the oldest_modification() on the discarded block,
like we do for persistent pages in buf_flush_relocate_on_flush_list().

buf_pool_t::resize(): Add debug assertions to catch this error earlier.

This bug does not seem to affect non-debug builds.

Reviewed by: Thirunarayanan Balathandayuthapani
2025-07-17 12:24:25 +03:00
Aleksey Midenkov
9a51709dba MDEV-29001 DROP DEFAULT makes SHOW CREATE non-idempotent
DROP DEFAULT adds DEFAULT NULL in case of nullable column. In case of
NOT NULL column it drops default expression if any exists.
2025-07-17 09:18:18 +02:00
Sergei Golubchik
9306353d2d MDEV-36753 Assertion `str[strlen(str)-1] != '\n'' failed in my_message_sql upon REPAIR .. USE_FRM with encryption enabled
remove '\n' from error log messages
2025-07-17 09:18:17 +02:00
Sergei Golubchik
9703c90712 MDEV-37199 UNIQUE KEY USING HASH accepting duplicate records
Server-level UNIQUE constraints (namely, WITHOUT OVERLAPS and USING HASH)
only worked with InnoDB in REPEATABLE READ isolation mode, when the
constraint was checked first and then the row was inserted or updated.
Gap locks prevented race conditions when a concurrent connection
could've also checked the constraint and inserted/updated a row
at the same time.

In READ COMMITTED there are no gap locks. To avoid race conditions,
we now check the constraint *after* the row operation. This is
enabled by the HA_CHECK_UNIQUE_AFTER_WRITE table flag that InnoDB
sets in the READ COMMITTED transactions.

Checking the constraint after the row operation is more complex.
First, the constraint will see the current (inserted/updated) row,
and needs to skip it. Second, IGNORE operations become tricky,
as we need to revert the insert/update and continue statement execution.

write_row() (INSERT IGNORE) is reverted with delete_row(). Conveniently
it deletes the current row, that is, the last inserted row.

update_row(a,b) (UPDATE IGNORE) is reverted with a reversed update,
update_row(b,a). Conveniently, it updates the current row too.
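
The two revert paths can be sketched on a toy in-memory table with a single UNIQUE column. All names are hypothetical and nothing here is the handler API; the point is only the order of operations: write first, check afterwards while skipping the current row, and undo on a duplicate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct table_sketch
{
  std::vector<int> rows;  // one UNIQUE column

  // After-write check: the row at `current` is already stored, so skip it.
  bool has_duplicate(std::size_t current) const
  {
    assert(current < rows.size());
    for (std::size_t i = 0; i < rows.size(); i++)
      if (i != current && rows[i] == rows[current])
        return true;
    return false;
  }

  // INSERT IGNORE: revert by deleting the current (last inserted) row.
  bool insert_ignore(int value)
  {
    rows.push_back(value);  // plays the role of write_row()
    if (!has_duplicate(rows.size() - 1))
      return true;
    rows.pop_back();        // plays the role of delete_row()
    return false;
  }

  // UPDATE IGNORE: revert with the reversed update.
  bool update_ignore(std::size_t pos, int value)
  {
    assert(pos < rows.size());
    const int old = rows[pos];
    rows[pos] = value;      // plays the role of update_row(a,b)
    if (!has_duplicate(pos))
      return true;
    rows[pos] = old;        // plays the role of update_row(b,a)
    return false;
  }
};
```

In this sketch a rejected insert or update leaves the table exactly as it was, so the statement can continue with the next row.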

Except in InnoDB when the PK is updated - in this case InnoDB internally
performs delete+insert, but does not move the cursor, so the "current"
row is the deleted one and the reverse update doesn't work.
This combination now throws an "unsupported" error and will
be fixed in MDEV-37233
2025-07-16 13:02:44 +02:00
Marko Mäkelä
024c7e881f MDEV-37103 innodb_immediate_scrub_data_uncompressed=ON may break innodb_undo_log_truncate=ON
The test innodb.undo_truncate occasionally demonstrates a race condition
where scrubbing is writing zeroes to a freed undo page, and
innodb_undo_log_truncate=ON truncating the same tablespace. The
truncation is an exception to the rule that InnoDB tablespace file sizes
can only grow, never shrink.

The fields fil_space_t::size and fil_node_t::size are protected by
fil_system.mutex, which used to be a highly contended resource. We
do not want to revert back to acquiring the mutex in fil_space_t::io()
because that would introduce an obvious scalability bottleneck.

fil_space_t::flush_freed(): Do not try to scrub pages of the undo
tablespace in order to prevent a race condition between io()
and undo tablespace truncation.

fil_space_t::io(): Prevent a null pointer dereference when reporting
an out-of-bounds access to the non-first file of the system or
temporary tablespace. Do not invoke set_corrupted() after an
out-of-bounds asynchronous read.

Note: fil_space_t::flush_freed() may only invoke PUNCH_RANGE on
page_compressed tablespaces, never on an undo tablespace.
2025-07-16 12:01:59 +03:00
Marko Mäkelä
e3c5565dfb MDEV-36330 fixup: Only fix innodb_snapshot_isolation=ON
ha_innobase::store_lock(): Do not create a read view or start the
transaction if innodb_snapshot_isolation=OFF. This should save some
resources with the default settings.
2025-07-15 16:26:16 +03:00
Marko Mäkelä
b7b2e009b3 MDEV-37215 SELECT FOR UPDATE crash in SERIALIZABLE
ha_innobase::store_lock(): Set also trx->will_lock when starting
a transaction at SERIALIZABLE isolation level. This fixes up
commit 7fbbbc983f (MDEV-36330).
2025-07-14 10:31:56 +03:00
Marko Mäkelä
499fa24d63 MDEV-27058 fixup: Fix a bogus assertion
buf_page_get_low(): Do not expect a valid state of
buf_page_t::in_zip_hash for blocks that are not file pages.
This debug assertion had been misplaced in
commit aaef2e1d8c (MDEV-27058)
that removed the condition
block->page.state() == BUF_BLOCK_FILE_PAGE.
2025-07-14 10:31:48 +03:00
Yuchen Pei
ea962ca495 MDEV-30436 [fixup] Add missing check for HAVE_PSI_INTERFACE
A fixup of 3e9aa07cce, with thanks to
Daniel Black.
2025-07-14 15:45:28 +10:00