The log_sys.lsn_lock is a very contended resource with a small
critical section in log_sys.append_prepare(). On many processor
microarchitectures, replacing the system call based log_sys.lsn_lock
with a pure spin lock would fare worse during high concurrency workloads,
wasting a significant amount of CPU cycles in the spin loop.
On other microarchitectures, we would see a significant amount of time
being spent in native_queued_spin_lock_slowpath() in the Linux kernel,
plus context switching between user and kernel address space. This was
pointed out by Steve Shaw from Intel Corporation.
Depending on the workload and the hardware implementation, it may be
useful to use a pure spin lock in log_sys.append_prepare().
We will introduce a parameter. The statement
SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=50;
would enable a spin lock that will execute that many MY_RELAX_CPU()
operations (such as the x86 PAUSE instruction) between successive
attempts of acquiring the spin lock. The use of a system call based
log_sys.lsn_lock (which is the default setting) can be enabled by
SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=0;
This patch will also introduce #ifdef LOG_LATCH_DEBUG
(part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON) for more accurate
tracking of log_sys.latch ownership and reorganize the fields of
log_sys to improve the locality of reference and to reduce the
chances of false sharing.
When a spin lock is being used, it will be maintained in the
most significant bit of log_sys.buf_free. This is useful, because that is
one of the fields that is covered by the lock. For IA-32 or AMD64, we
implement the spin lock specially via log_t::lsn_lock_bts(), employing the
i386 LOCK BTS instruction. A straightforward std::atomic::fetch_or() would
translate into an inefficient loop around LOCK CMPXCHG.
mtr_t::spin_wait_delay: The value of innodb_log_spin_wait_delay.
mtr_t::finisher: Pointer to the currently used mtr_t::finish_write()
implementation. This allows to avoid introducing conditional branches.
We no longer invoke log_sys.is_pmem() at the mini-transaction level,
but we would do that in log_write_up_to().
mtr_t::finisher_update(): Update finisher when spin_wait_delay is
changed from or to 0 (the spin lock is changed to log_sys.lsn_lock or
vice versa).
If mariabackup with backup locks is used on SST we do not
pause and desync galera provider at all. If WSREP_MODE_BF_MARIABACKUP
case provider is paused and desync at BLOCK_COMMIT phase. In
other cases provider is paused and desync at BLOCK_DDL phase.
Added support to BACKUP STAGE to maria-backup
This is a port of the code from ES 10.6
See MDEV-5336 for backup stages description.
The following old options are not supported by the new code:
--rsync ; This is because rsync will not work on tables
that are in used.
--no-backup-locks ; This is disabled as mariadb-backup will always
use backup locks for better performance.
Apparently, invoking fcntl(fd, F_SETFL, O_DIRECT) will lead to
unexpected behaviour on Linux bcachefs and possibly other file systems,
depending on the operating system version. So, let us avoid doing that,
and instead just attempt to pass the O_DIRECT flag to open(). This should
make us compatible with NetBSD, IBM AIX, as well as Solaris and its
derivatives.
This fix does not change the fact that we had only implemented
innodb_log_file_buffering=OFF on systems where we can determine the
physical block size (typically 512 or 4096 bytes).
Currently, those operating systems are Linux and Microsoft Windows.
HAVE_FCNTL_DIRECT, os_file_set_nocache(): Remove.
OS_FILE_OVERWRITE, OS_FILE_CREATE_PATH: Remove (never used parameters).
os_file_log_buffered(), os_file_log_maybe_unbuffered(): Helper functions.
os_file_create_simple_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
os_file_create_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
For type==OS_LOG_FILE && create_mode != OS_FILE_CREATE
we will first invoke stat(2) on the file name to find out if the size
is compatible with O_DIRECT. If create_mode == OS_FILE_CREATE, we will
invoke fstat(2) on the created log file afterwards, and may close and
reopen the file in O_DIRECT mode if applicable.
create_temp_file(): Support O_DIRECT. This is only used if O_TMPFILE is
available and innodb_disable_sort_file_cache=ON (non-default value).
Notably, that setting never worked on Microsoft Windows.
row_merge_file_create_mode(): Split from row_merge_file_create_low().
Create a temporary file in the specified mode.
Reviewed by: Vladislav Vaintroub
The `unused-but-set-variable` warning is raised on MacOS from the
`posix_fadvise` standin macro, since offset is often otherwise unused. Add a
cast to absorb this warning.
Signed-off-by: Trevor Gross <tmgross@umich.edu>
The innodb_changed_pages plugin only was part of XtraDB, never InnoDB.
It would be useful for incremental backups.
We will remove the code from mariadb-backup for now, because it cannot
serve any useful purpose until the server part has been implemented.
This is the prerequisite patch to refactor the method
Item_default_value::fix_fields.
The former implementation of this method was extracted and placed
into the standalone function make_default_field() and the method
Item_default_value::tie_field(). The motivation for this modification
is upcoming changes for core implementation of the task MDEV-15703
since these functions will be used from several places within
the source code.
Postfix for a6290a5bc5, in 10.11
where OS_DATA_FILE_NO_O_DIRECT gets used. Same #ifdef conditions
as other uses of OS_DATA_FILE_NO_O_DIRECT.
Noticed on aarch64-macos builder.
mariadb-backup:
Adding a function get_os_user() to detect the OS user name
if the user name is not specified, to make mariadb-backup:
- work like MariaDB client tools work
- match its --help page, which says:
-u, --user=name This option specifies the username used when
connecting to the server, if that's not the current user.
Most things where wrong in the test suite.
The one thing that was a bug was that table_map_id was in some places
defined as ulong and in other places as ulonglong. On Linux 64 bit this
is not a problem as ulong == ulonglong, but on windows this caused failures.
Fixed by ensuring that all instances of table_map_id are ulonglong.
The directio(3C) function on Solaris is supported on NFS and UFS
while the majority of users should be on ZFS, which is a copy-on-write
file system that implements transparent compression and therefore
cannot support unbuffered I/O.
Let us remove the call to directio() and simply treat
innodb_flush_method=O_DIRECT in the same way as the previous
default value innodb_flush_method=fsync on Solaris. Also, let us
remove some dead code around calls to os_file_set_nocache() on
platforms where fcntl(2) is not usable with O_DIRECT.
On IBM AIX, O_DIRECT is not documented for fcntl(2), only for open(2).
This commit fixes GTID inconsistency which was injected by mariabackup SST.
Donor node now writes new info file: donor_galera_info, which is streamed
along the mariabackup donation to the joiner node. The donor_galera_info
file contains both GTID and gtid domain_id, and joiner will use these to
initialize the GTID state.
Commit has new mtr test case: galera_3nodes.galera_gtid_consistency, which
exercises potentially harmful mariabackup SST scenarios. The test has also
scenario with IST joining.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
opt_kill_long_query_type being an enum could be 0 corresponding
to ALL. When ALL is specified, the CONNECTION ADMIN is still
required.
Also check REPLICA MONITOR privilege and make the tests
find the results by recording stderr.
Noticed thanks to bug report by Tim van Dijen.
Fixes: 79b58f1ca8
This is a port of the Percona Server commit 5265f42e290573e9591f8ca28ab66afc051f89a3
which is the same as their bug PXB-1807: xtrabackup does not accept fractional values for
innodb_max_dirty_pages_pct
Problem:
Variable specified as double in MySQL server, but read as long in the
xtrabackup. This causes xtrabackup to fail at startup when the value
contains decimal point.
Fix:
Make xtrabackup to interpret the value as double to be compatible with
server.
Problem:
The file backup-my.cnf from the backup directory was loaded by
"mariabackup --prepare" only in case of the explicit --target-dir given.
It was not loaded from the default directory ./xtrabackup_backupfiles/
in case if the explicit --target-dir was missing.
In other words, it worked as follows:
1. When started as "mariabackup --prepare --target-dir=DIR", mariabackup:
a. loads defaults from "DIR/backup-my.cnf"
b. processes data files in the specified directory DIR
2. When started as "mariabackup --prepare", mariabackup:
a. does not load defaults from "./xtrabackup_backupfiles/backup-my.cnf"
b. processes data files in the default directory "./xtrabackup_backupfiles/"
This patch fixes the second scenario, so it works as follows:
2. When started as "mariabackup --prepare", mariabackup:
a. loads defaults from "./xtrabackup_backupfiles/backup-my.cnf"
b. processes data files in the default directory "./xtrabackup_backupfiles/"
This change fixes (among others) the problem with the
"Can't open shared library '/file_key_management.so'" error
reported when "mariabackup --prepare" is used without --target-dir
in combinaton with the encryption plugin.
Revert the patch for MDEV-18917, which removed this functionality.
This restores that mariabackup --prepare recovers the transactional
binlog position from the redo log, and writes it to the file
xtrabackup_binlog_pos_innodb.
This position is updated only on every InnoDB commit. This means that
if the last event in the binlog at the time of backup is a DDL or
non-transactional update, the recovered position from --no-lock will
be behind the state of the backup.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.
Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.
The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.
purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).
purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.
purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().
purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().
Reviewed by: Vladislav Lesin