log_file_t::read(), log_file_t::write(): Invoke pread() or pwrite()
directly, so that we can give more accurate diagnostics in case of
a failure, and so that we will avoid the overhead of setting up 5(!)
stack frames and related objects.
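For illustration, a minimal sketch of the direct approach (the log_read()
helper and its error message are assumptions, not the actual log_file_t
code):

  #include <cerrno>
  #include <cstdio>
  #include <cstring>
  #include <unistd.h>

  /* Hypothetical helper: read from the log file with pread() directly,
     so that errno can be reported in the diagnostic right away. */
  static bool log_read(int fd, void *buf, size_t len, off_t offset)
  {
    while (len)
    {
      ssize_t n= pread(fd, buf, len, offset);
      if (n <= 0)
      {
        if (n < 0 && errno == EINTR)
          continue;
        fprintf(stderr, "pread(%zu bytes at %lld) failed: %s\n",
                len, (long long) offset,
                n ? strerror(errno) : "unexpected end of file");
        return false;
      }
      len-= size_t(n);
      offset+= n;
      buf= (char*) buf + n;
    }
    return true;
  }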
tpool::pwrite(): Add a missing const qualifier.
A server that was running with innodb_log_file_size=96M and
innodb_buffer_pool_size=6M had inserted some data into a table
that was subsequently dropped. When the server was killed and
restarted, an assertion failed in recv_sys_t::parse() while
a FSP_SIZE change was unnecessarily being processed during
the skip_the_rest: loop in recv_scan_log().
The ib_logfile0 contents were as follows:
1. The checkpoint start LSN points to the start of some mini-transaction.
2. There may be log records for modifying files for which a FILE_MODIFY
had been written before the checkpoint. These records were "purged"
by advancing the checkpoint.
3. At some point during the initial parsing with store=true the space
reserved for recv_sys.pages will run out and recv_scan_log() would switch
to the skip_the_rest: mode.
4. We encounter a log record for extending a tablespace that will be
deleted a bit later. This would trip the bogus debug assertion.
5. Later on, there would be a FILE_DELETE record for this tablespace.
6. The checkpoint end LSN points to a possibly empty sequence of
FILE_MODIFY records and a FILE_CHECKPOINT record. Recovery had parsed these
records first, before rewinding to the checkpoint start LSN.
7. There could be further records following the FILE_CHECKPOINT record.
Recovery will process all records until an inconsistency is found and
it is assumed that the end of the circular ib_logfile0 was reached.
recv_sys_t::parse(): For the template instantiation with store=false,
remove a debug assertion that could fail in a multi-batch recovery,
while recv_scan_log(false) would be in the skip_the_rest: loop.
It is quite possible that we have not encountered all FILE_ records
yet, and therefore we should not complain about unknown tablespaces.
Reviewed by: Debarun Banerjee
When using the default innodb_log_buffer_size=2m, mariadb-backup --backup
would spend a lot of time re-reading and re-parsing the log. For reads,
it would be beneficial to memory-map the entire ib_logfile0 to the
address space (typically 48 bits or 256 TiB) and read it from there,
both during --backup and --prepare.
We will introduce the Boolean read-only parameter innodb_log_file_mmap
that will be OFF by default on most platforms, to avoid aggressive
read-ahead of the entire ib_logfile0 when only a tiny portion would be
accessed. On Linux and FreeBSD the default is innodb_log_file_mmap=ON,
because those platforms define a specific mmap(2) option for enabling
such read-ahead and therefore it can be assumed that the default would
be on-demand paging. This parameter will only have an impact on the initial
InnoDB startup and recovery. Any writes to the log will use regular I/O,
except when the ib_logfile0 is stored in a specially configured file system
that is backed by persistent memory (Linux "mount -o dax").
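A minimal sketch of the read-ahead trade-off (the map_log() wrapper and
its parameters are hypothetical; MAP_POPULATE is the Linux flag for
eager read-ahead, and FreeBSD offers the analogous MAP_PREFAULT_READ):

  #include <sys/mman.h>

  /* Hypothetical helper, not InnoDB code: map the log read-only and
     optionally request eager read-ahead of the whole file. Without the
     flag, pages are faulted in on demand as the parser touches them. */
  void *map_log(int fd, size_t size, bool eager_readahead)
  {
    int flags= MAP_SHARED;
  #ifdef MAP_POPULATE              /* Linux; FreeBSD: MAP_PREFAULT_READ */
    if (eager_readahead)
      flags|= MAP_POPULATE;
  #else
    (void) eager_readahead;        /* no read-ahead flag on this platform */
  #endif
    return mmap(nullptr, size, PROT_READ, flags, fd, 0);
  }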
We also experimented with allowing writes of the ib_logfile0 via a
memory mapping and decided against it. A fundamental problem would be
unnecessary read-before-write in case of a major page fault, that is,
when a new, not yet cached, virtual memory page in the circular
ib_logfile0 is being written to. There appears to be no way to tell
the operating system that we do not care about the previous contents of
the page, or that the page fault handler should just zero it out.
Many references to HAVE_PMEM have been replaced with references to
HAVE_INNODB_MMAP.
The predicate log_sys.is_pmem() has been replaced with
log_sys.is_mmap() && !log_sys.is_opened().
Memory-mapped regular files differ from MAP_SYNC (PMEM) mappings in
that an open file handle to ib_logfile0 will be retained. In both
code paths, log_sys.is_mmap() will hold. Holding a file handle open will
allow log_t::clear_mmap() to disable the interface with fewer operations.
It should be noted that ever since
commit 685d958e38 (MDEV-14425)
most 64-bit Linux platforms in our CI
(s390x a.k.a. IBM System Z being a notable exception) read and write
/dev/shm/*/ib_logfile0 via a memory mapping, pretending that it is
persistent memory (mount -o dax). So, the memory mapping based log
parsing that this change is enabling by default on Linux and FreeBSD
has already been extensively tested on Linux.
::log_mmap(): If a log cannot be opened as PMEM and the desired access
is read-only, try to open a read-only memory mapping.
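A minimal sketch of that fallback order, with a hypothetical
map_log_file() helper (the real ::log_mmap() differs):

  #include <sys/mman.h>

  /* Hypothetical helper: first try a PMEM (MAP_SYNC) read-write mapping;
     if that is not possible and only read access is needed, fall back to
     a plain read-only mapping of the regular file. Returns MAP_FAILED
     when no mapping could be set up, in which case the caller would use
     regular file I/O. */
  void *map_log_file(int fd, size_t size, bool read_only)
  {
    void *ptr= MAP_FAILED;
  #ifdef MAP_SYNC
    ptr= mmap(nullptr, size, PROT_READ | PROT_WRITE,
              MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
  #endif
    if (ptr == MAP_FAILED && read_only)
      ptr= mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    return ptr;
  }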
xtrabackup_copy_mmap_snippet(), xtrabackup_copy_mmap_logfile():
Copy the InnoDB log in mariadb-backup --backup from a memory
mapped file.
The issue is caused by a race between buf_page_create_low getting the
page from the buffer pool hash and buf_LRU_free_page evicting it from the LRU.
The issue is introduced in 10.6 by MDEV-27058
commit aaef2e1d8c
MDEV-27058: Reduce the size of buf_block_t and buf_page_t
The solution is to buffer-fix the page before releasing the buffer pool mutex
in buf_page_create_low when x_lock_try fails to acquire the page latch.
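A minimal, self-contained sketch of the pattern with standard-library
stand-ins (std::mutex instead of the buffer pool mutex and page latch;
this is not the actual buf_page_create_low() code):

  #include <atomic>
  #include <cstdint>
  #include <mutex>

  struct page
  {
    std::atomic<uint32_t> fix_count{0}; /* pinned pages must not be evicted */
    std::mutex latch;                   /* stand-in for the page latch */
  };

  std::mutex pool_mutex;                /* stand-in for buf_pool.mutex */

  void create_low(page &p)
  {
    std::unique_lock<std::mutex> pool(pool_mutex);
    if (!p.latch.try_lock())            /* x_lock_try() failed */
    {
      /* Buffer-fix the page while still holding the pool mutex, so that
         a concurrent eviction (buf_LRU_free_page() in InnoDB) must skip it. */
      p.fix_count.fetch_add(1);
      pool.unlock();
      p.latch.lock();                   /* now it is safe to wait */
      p.fix_count.fetch_sub(1);
    }
    if (pool.owns_lock())
      pool.unlock();
    /* ... (re)initialize the page under its exclusive latch ... */
    p.latch.unlock();
  }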
log_t::persist(): Add a parameter holding_latch to specify
whether the caller is already holding exclusive log_sys.latch,
like log_write_and_flush() always is.
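A minimal sketch of the idea, with a std::shared_mutex standing in for
log_sys.latch (simplified; the real log_t::persist() does more):

  #include <cstdint>
  #include <shared_mutex>

  std::shared_mutex log_latch;    /* stand-in for exclusive log_sys.latch */

  void persist(uint64_t lsn, bool holding_latch)
  {
    if (!holding_latch)
      log_latch.lock();           /* standalone callers acquire it here */
    /* ... make the memory-mapped log durable up to lsn ... */
    (void) lsn;
    if (!holding_latch)
      log_latch.unlock();
  }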
The loose regex for the MDEV-34539 test ended up
matching the 'opensuse' in the build path on buildbot.
Adjust it to a more complete regex including space,
backtick and \n, which are much less likely to occur
in a path name.
The lsof utility is prone to blocking on system calls that
it uses to obtain information about sockets (or files, devices,
etc.). This behavior is described in its own documentation.
It has a '-b' option (in combination with warning suppression
via '-w') that reduces the probability of blocking, although it
introduces new problems (luckily probably not relevant to our use case).
However, there is no guarantee that it will not hang on some
distributions, with some TCP/IP stack implementations, or with
some filesystems, etc. Also, of the three utilities that are
suitable for our purposes, lsof is the slowest. So if there
are other utilities that we use during SST, such as 'ss' or
'sockstat', it is reasonable to use them instead of lsof.
This commit changes the prioritization of utilities; it does
not need additional tests (besides the numerous SST tests
already available in the galera suites). If the system still
needs to use lsof, this commit adds the '-b' and '-w' options
to its command line to reduce the likelihood of blocking.
Removed handling of the long-unsupported xtrabackup_pid file,
as it is not even created by modern versions of mariabackup.
Instead, added stopping of the asynchronous process that
mariabackup runs (if it is still active) to the exception
handler.
This commit makes the SST script for mariabackup more
resilient to unexpected terminations or hangs while
mariabackup is running, or when SST scripts from a previous
session are still running (in reality they were hung while
waiting for something).
GCC 12.2.0 could issue -Wnonnull for an unreachable call to
strlen(new_path). Let us prevent that by replacing the condition
(type == FILE_RENAME) with the equivalent (new_path).
This should also optimize the generated code, because the lifetime
of the parameter "type" will be reduced.
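A hedged sketch of the equivalent conditions (names simplified; this is
not the actual InnoDB function):

  #include <cstddef>
  #include <cstring>

  /* new_path is nonnull exactly for FILE_RENAME records, so the two
     conditions are equivalent, but only the pointer test lets GCC prove
     that strlen(new_path) is never reached with a null argument. */
  size_t name_length(const char *path, const char *new_path)
  {
    size_t len= strlen(path) + 1;
    if (new_path)                 /* was: if (type == FILE_RENAME) */
      len+= strlen(new_path) + 1;
    return len;
  }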
log_t::resize_write_buf(): If d<0 and d>-length, d will fit in ssize_t,
which is a signed 32-bit or 64-bit integer. Cast from int64_t to ssize_t
to make this clear and to silence a compiler warning.
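For illustration, a minimal sketch of that reasoning (the narrow_offset()
helper is hypothetical):

  #include <cassert>
  #include <cstdint>
  #include <sys/types.h>

  /* Given -length < d < 0 with length being a buffer size that fits in
     size_t, the int64_t value d is representable in ssize_t even on a
     32-bit build, so the explicit cast is lossless. */
  inline ssize_t narrow_offset(int64_t d, size_t length)
  {
    assert(d < 0 && d > -int64_t(length));
    return ssize_t(d);
  }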
my_b_encr_write(): Initialize also block_length, and at the same time
last_block_length, so that all 128 bits can be initialized with fewer
writes. This fixes an error that was caught in the test
encryption.tempfiles_encrypted.
test_my_safe_print_str(): Skip a test that would attempt to
display uninitialized data in the test unit.stacktrace.
Previously, our CI did not build unit tests with MemorySanitizer.
handle_delayed_insert(): Remove a redundant call to pthread_exit(0),
which would for some reason cause MemorySanitizer in clang-19 to
report a stack overflow in a RelWithDebInfo build. This fixes a
failure of several tests.
Reviewed by: Vladislav Vaintroub
buf_page_t::read_complete(): Fix an incorrect condition that had been
added in commit aaef2e1d8c (MDEV-27058).
Also for compressed-only pages we must remember that buffered changes
may exist.
buf_read_page(): Correct the function comment; this is for a synchronous
and not asynchronous read. Pass the parameter unzip=true to
buf_read_page_low(), because each of our callers will be interested in
the uncompressed page frame. This will cause the test
encryption.innodb-compressed-blob to emit more errors when the
correct keys for decrypting the clustered index root page are unavailable.
Reviewed by: Debarun Banerjee
buf_pool_t::page_fix(): If a change buffer merge may be needed on a
ROW_FORMAT=COMPRESSED page that exists in compressed-only format in
the buffer pool, go ahead to decompress the block. This fixes an
infinite loop.
Reviewed by: Debarun Banerjee
Add support to the S3 engine for removing the Content-Type header. This
is required for compatibility with some S3 providers.
This also adds a provider option to the S3 engine which will turn on
relevant compatibility options for specific providers.
This was required to get the MariaDB S3 engine to work with "Huawei
Cloud S3".
To get Huawei S3 storage to work, one has to set one of the following
S3 options:
s3_provider=Huawei
s3_ssl_no_verify=1
Author: Andrew Hutchings <andrew@mariadb.org>
The 10.5->10.6 merge commit 3bc98a4ec4 casts the arg to an int16
pointer in set_extraction_flag_processor(). This matched the previous
commit c76eabfb5e where set_extraction_flag was changed to take an int16 arg
instead of int.
The commit a5e4c34991 for MDEV-29363 added a call to
set_extraction_flag_processor on IMMUTABLE_FL (MARKER_IMMUTABLE in 10.6).
The subsequent 10.5->10.6 merge f071b7620b did not cast the flag
to int16 when merging this change.
The result is that big-endian processors cleared the immutable
flag rather than setting it, leaving MDEV-29363
unfixed on big-endian processors.
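A minimal standalone demonstration of the endianness effect (this is an
illustration, not server code; the flag value is arbitrary):

  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  int main()
  {
    int flag= 0x4000;             /* some marker bit passed as an int */
    int16_t narrow;               /* what the int16 argument ends up seeing */
    memcpy(&narrow, &flag, sizeof narrow);
    /* little-endian: narrow == 0x4000; big-endian: narrow == 0, so the
       processor clears the flag instead of setting it */
    printf("%#x\n", unsigned(uint16_t(narrow)));
    return 0;
  }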