Fixing buildbot failures on mariabackup.aria_log_dir_path_rel.
The problem was that directory_exists() was called with the
relative aria_log_dir_path value, while the current directory
in mariadb-backup is not necessarily equal to datadir when MTR is running.
Fix:
- Moving building the absolute path un level upper:
from the function copy_back_aria_logs() to the function copy_back().
- Passing the built absolute path to both directory_exists() and
copy_back_aria_logs() as a parameter.
- `mariadb-backup --backup` was fixed to fetch the value of the
@@aria_log_dir_path server variable and copy aria_log* files
from @@aria_log_dir_path directory to the backup directory.
Absolute and relative (to --datadir) paths are supported.
Before this change aria_log* files were copied to the backup
only if they were in the default location in @@datadir.
- `mariadb-backup --copy-back` now understands a new my.cnf and command line
parameter --aria-log-dir-path.
`mariadb-backup --copy-back` in the main loop in copy_back()
(when copying back from the backup directory to --datadir)
was fixed to ignore all aria_log* files.
A new function copy_back_aria_logs() was added.
It consists of a separate loop copying back aria_log* files from
the backup directory to the directory specified in --aria-log-dir-path.
Absolute and relative (to --datadir) paths are supported.
If --aria-log-dir-path is not specified,
aria_log* files are copied to --datadir by default.
- The function is_absolute_path() was fixed to understand MTR style
paths on Windows with forward slashes, e.g.
--aria-log-dir-path=D:/Buildbot/amd64-windows/build/mysql-test/var/...
fil_space_t::create(), fil_space_t::add(): Expect the caller to
acquire and release fil_system.mutex. In this way, creating a tablespace
and adding the first (usually only) data file will be atomic.
recv_sys_t::recover_deferred(): Correctly protect some changes by
holding fil_system.mutex.
Tested by: Matthias Leich
This is a non-functional change.
simplifying the code logic:
- removing global variables ds_data and ds_meta
- passing these variables as parameters to functions instead
- adding helper classes: Datasink_free_list and Backup_datasinks
- moving some function accepting a ds_ctxt parameter
as methods to ds_ctxt.
server has systemd support and calls sd_notify() to communicate
the status to systemd.
mariabackup links the whole server in, but it should not notify
systemd, because it's not started or managed by systemd.
The solution is to suppress error messages for missing tablespaces if
mariabackup is launched with "--prepare --export" options.
"mariabackup --prepare --export" invokes itself with --mysqld parameter.
If the parameter is set, then it starts server to feed "FLUSH TABLES ...
FOR EXPORT;" queries for exported tablespaces. This is "normal" server
start, that's why new srv_operation value is introduced.
Reviewed by Marko Makela.
fil_node_open_file_low() tries to close files from the top of
fil_system.space_list if the number of opened files is exceeded.
It invokes fil_space_t::try_to_close(), which iterates the list searching
for the first opened space. Then it just closes the space, leaving it in
the same position in fil_system.space_list.
On heavy files opening, like during 'SHOW TABLE STATUS ...' execution,
if the number of opened files limit is reached,
fil_space_t::try_to_close() iterates more and more closed spaces before
reaching any opened space for each fil_node_open_file_low() call. What
causes performance regression if the number of spaces is big enough.
The fix is to keep opened spaces at the top of fil_system.space_list,
and move closed files at the end of the list.
For this purpose fil_space_t::space_list_last_opened pointer is
introduced. It points to the last inserted opened space in
fil_space_t::space_list. When space is opened, it's inserted to the
position just after the pointer points to in fil_space_t::space_list to
preserve the logic, inroduced in MDEV-23855. Any closed space is added
to the end of fil_space_t::space_list.
As opened spaces are located at the top of fil_space_t::space_list,
fil_space_t::try_to_close() finds opened space faster.
There can be the case when opened and closed spaces are mixed in
fil_space_t::space_list if fil_system.freeze_space_list was set during
fil_node_open_file_low() execution. But this should not cause any error,
as fil_space_t::try_to_close() still iterates spaces in the list.
There is no need in any test case for the fix, as it does not change any
functionality, but just fixes performance regression.
Renaming the default MariaDB backup directory from
xtrabackup_backupfiles to mariadb_backup_files.
Renaming files:
- xtrabackup_binlog_info to mariadb_backup_binlog_info
- xtrabackup_checkpoints to mariadb_backup_checkpoints
- xtrabackup_galera_info to mariadb_backup_galera_info
- xtrabackup_info to mariadb_backup_info
- xtrabackup_slave_info to mariadb_backup_slave_info
While performing SAST scanning using Cppcheck against source code of
commit 81196469, several code vulnerabilities were found.
Fix following issues:
1. Parameters of `snprintf` function are incorrect.
Cppcheck error:
client/mysql_plugin.c:1228: error: snprintf format string requires 6 parameters but only 5 are given.
It is due to commit 630d7229 introduced option `--lc-messages-dir`
in the bootstrap command. However the parameter was not even given
in the `snprintf` after changing the format string.
Fix:
Restructure the code logic and correct the function parameters for
`snprintf`.
2. Null pointer is used in a `snprintf` which could cause a crash.
Cppcheck error:
extra/mariabackup/xbcloud.cc:2534: error: Null pointer dereference
The code intended to print the swift_project name, if the
opt_swift_project_id is NULL but opt_swift_project is not NULL.
However the parameter of `snprintf` was mistakenly using
`opt_swift_project_id`.
Fix:
Change to use the correct string from `opt_swift_project`.
3. Potential double release of a memory
Cppcheck error:
plugin/auth_pam/testing/pam_mariadb_mtr.c:69: error: Memory pointed to by 'resp' is freed twice.
A pointer `resp` is reused and allocated new memory after it has been
freed. However, `resp` was not set to NULL after freed.
Potential double release of the same pointer if the call back
function doesn't allocate new memory for `resp` pointer.
Fix:
Set the `resp` pointer to NULL after the first free() to make sure
the same address is not freed twice.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
The MariaDB code base uses strcat() and strcpy() in several
places. These are known to have memory safety issues and their usage is
discouraged. Common security scanners like Flawfinder flags them. In MariaDB we
should start using modern and safer variants on these functions.
This is similar to memory issues fixes in 19af1890b5
and 9de9f105b5 but now replace use of strcat()
and strcpy() with safer options strncat() and strncpy().
However, add '\0' forcefully to make sure the result string is correct since
for these two functions it is not guaranteed what new string will be null-terminated.
Example:
size_t dest_len = sizeof(g->Message);
strncpy(g->Message, "Null json tree", dest_len); strncat(g->Message, ":",
sizeof(g->Message) - strlen(g->Message)); size_t wrote_sz = strlen(g->Message);
size_t cur_len = wrote_sz >= dest_len ? dest_len - 1 : wrote_sz;
g->Message[cur_len] = '\0';
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services
-- Reviewer and co-author Vicențiu Ciorbaru <vicentiu@mariadb.org>
-- Reviewer additions:
* The initial function implementation was flawed. Replaced with a simpler
and also correct version.
* Simplified code by making use of snprintf instead of chaining strcat.
* Simplified code by removing dynamic string construction in the first
place and using static strings if possible. See connect storage engine
changes.
Starting with commit baf276e6d4 (MDEV-19229)
the parameter innodb_undo_tablespaces can be increased from its
previous default value 0 while allowing an upgrade from old databases.
We will change the default setting to innodb_undo_tablespaces=3
so that the space occupied by possible bursts of undo log records
can be reclaimed after SET GLOBAL innodb_undo_log_truncate=ON.
We will not enable innodb_undo_log_truncate by default, because it
causes some observable performance degradation.
Special thanks to Thirunarayanan Balathandayuthapani for diagnosing
and fixing a number of bugs related to this new default setting.
Tested by: Matthias Leich, Axel Schwenke, Vladislav Vaintroub
(with both values of innodb_undo_log_truncate)
The purpose of the change buffer was to reduce random disk access,
which could be useful on rotational storage, but maybe less so on
solid-state storage.
When we wished to
(1) insert a record into a non-unique secondary index,
(2) delete-mark a secondary index record,
(3) delete a secondary index record as part of purge (but not ROLLBACK),
and the B-tree leaf page where the record belongs to is not in the buffer
pool, we inserted a record into the change buffer B-tree, indexed by
the page identifier. When the page was eventually read into the buffer
pool, we looked up the change buffer B-tree for any modifications to the
page, applied these upon the completion of the read operation. This
was called the insert buffer merge.
We remove the change buffer, because it has been the source of
various hard-to-reproduce corruption bugs, including those fixed in
commit 5b9ee8d819 and
commit 165564d3c3 but not limited to them.
A downgrade will fail with a clear message starting with
commit db14eb16f9 (MDEV-30106).
buf_page_t::state: Merge IBUF_EXIST to UNFIXED and
WRITE_FIX_IBUF to WRITE_FIX.
buf_pool_t::watch[]: Remove.
trx_t: Move isolation_level, check_foreigns, check_unique_secondary,
bulk_insert into the same bit-field. The only purpose of
trx_t::check_unique_secondary is to enable bulk insert into an
empty table. It no longer enables insert buffering for UNIQUE INDEX.
btr_cur_t::thr: Remove. This field was originally needed for change
buffering. Later, its use was extended to cover SPATIAL INDEX.
Much of the time, rtr_info::thr holds this field. When it does not,
we will add parameters to SPATIAL INDEX specific functions.
ibuf_upgrade_needed(): Check if the change buffer needs to be updated.
ibuf_upgrade(): Merge and upgrade the change buffer after all redo log
has been applied. Free any pages consumed by the change buffer, and
zero out the change buffer root page to mark the upgrade completed,
and to prevent a downgrade to an earlier version.
dict_load_tablespaces(): Renamed from
dict_check_tablespaces_and_store_max_id(). This needs to be invoked
before ibuf_upgrade().
btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.
btr_page_alloc(): Renamed from btr_page_alloc_low(). We no longer
allocate any change buffer pages.
btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.
row_search_index_entry(), btr_lift_page_up(): Add a parameter thr
for the SPATIAL INDEX case.
rtr_page_split_and_insert(): Specialized from btr_page_split_and_insert().
rtr_root_raise_and_insert(): Specialized from btr_root_raise_and_insert().
Note: The support for upgrading from the MySQL 3.23 or MySQL 4.0
change buffer format that predates the MySQL 4.1 introduction of
the option innodb_file_per_table was removed in MySQL 5.6.5
as part of mysql/mysql-server@69b6241a79
and MariaDB 10.0.11 as part of 1d0f70c2f8.
In the tests innodb.log_upgrade and innodb.log_corruption, we create
valid (upgraded) change buffer pages.
Tested by: Matthias Leich
We introduce the following settable Boolean global variables:
innodb_log_file_write_through: Whether writes to ib_logfile0 are
write-through (disabling any caching, as in O_SYNC or O_DSYNC).
innodb_data_file_write_through: Whether writes to any InnoDB data files
(including the temporary tablespace) are write-through.
innodb_data_file_buffering: Whether the file system cache is enabled
for InnoDB data files.
All these parameters are OFF by default, that is, the file system cache
will be disabled, but any hardware caching is enabled, that is,
explicit calls to fsync(), fdatasync() or similar functions are needed.
On systems that support FUA it may make sense to enable write-through,
to avoid extra system calls.
If the deprecated read-only start-up parameter is set to one of the
following values, then the values of the 4 Boolean flags (the above 3
plus innodb_log_file_buffering) will be set as follows:
O_DSYNC:
innodb_log_file_write_through=ON, innodb_data_file_write_through=ON,
innodb_data_file_buffering=OFF, and
(if supported) innodb_log_file_buffering=OFF.
fsync, littlesync, nosync, or (Microsoft Windows specific) normal:
innodb_log_file_write_through=OFF, innodb_data_file_write_through=OFF,
and innodb_data_file_buffering=ON.
Note: fsync() or fdatasync() will only be disabled if the separate
parameter debug_no_sync (in the code, my_disable_sync) is set.
In mariadb-backup, the parameter innodb_flush_method will be ignored.
The Boolean parameters can be modified by SET GLOBAL while the
server is running. This will require reopening the ib_logfile0
or all currently open InnoDB data files.
We will open files straight in O_DSYNC or O_SYNC mode when applicable.
Data files we will try to open straight in O_DIRECT mode when the
page size is at least 4096 bytes. For atomically creating data files,
we will invoke os_file_set_nocache() to enable O_DIRECT afterwards,
because O_DIRECT is not supported on some file systems. We will also
continue to invoke os_file_set_nocache() on ib_logfile0 when
innodb_log_file_buffering=OFF can be fulfilled.
For reopening the ib_logfile0, we use the same logic that was developed
for online log resizing and reused for updates of
innodb_log_file_buffering.
Reopening all data files is implemented in the new function
fil_space_t::reopen_all().
Reviewed by: Vladislav Vaintroub
Tested by: Matthias Leich
The MDEV-25004 test innodb_fts.versioning is omitted because ever since
commit 685d958e38 InnoDB would not allow
writes to a database where the redo log file ib_logfile0 is missing.
to copy datafile
- Mariabackup fails to copy the undo log tablespace when it undergoes
truncation. So Mariabackup should detect the redo log which does
undo tablespace truncation and also backup should read the minimum
file size of the tablespace and ignore the error while reading.
- Throw error when innodb undo tablespace read failed, but backup
doesn't find the redo log for undo tablespace truncation
io_watching_thread(): Declare as a detachable thread, similar to
log_copying_thread().
stop_backup_threads(): Wait for both log_copying_thread and
io_watching_thread to clear their flags. Expect log_sys.mutex
to be held by the caller.
xtrabackup_backup_func(): Initialize log_copying_stop before
creating io_watching_thread. This prevents a race condition
where io_watching_thread() could wait on the condition variable
before it had been fully initialized. This race condition would
cause a hang in the GNU libc implementation of pthread_cond_destroy()
at the end of stop_backup_threads().
This race condition was introduced in
commit 38fd7b7d91 (MDEV-21452).
The variable was not really being used for anything. The parameters
innodb_read_io_threads, innodb_write_io_threads have replaced
innodb_file_io_threads.
- Mariabackup fails to open the undo tablespaces while applying delta
files to the corresponding data file. Mariabackup opens the
undo tablespaces first time in srv_undo_tablespaces_init() and does
tries to open the undo tablespaces in xtrabackup_apply_deltas() with
conflicting mode and leads to the failure.
- Mariabackup should close the undo tablespaces before applying
the incremental delta files.
os_file_read(): Merged with os_file_read_no_error_handling().
Crashing on a partial page read is as unhelpful as crashing on a
corrupted page read (commit 0b47c126e3).
Report the file name if it is available via IORequest.
trx_sys_t::undo_log_nonempty: Set to true if there are undo logs
to rollback and purge.
The algorithm for re-creating the undo tablespace when
trx_sys_t::undo_log_nonempty is disabled:
1) trx_sys_t::reset_page(): Reset the TRX_SYS page and assign all
rollback segment slots from 1..127 to FIL_NULL
2) Free the rollback segment header page of system tablespace
for the slots 1..127
3) Update the binlog and WSREP information in system tablespace
rollback segment header
Step (1), (2) and Step (3) should happen atomically within a
single mini-transaction.
4) srv_undo_delete_old_tablespaces(): Delete the old undo tablespaces
present in the undo log directory
5) Make checkpoint to get rid of old undo log tablespaces redo logs
6) Assign new start space id for the undo log tablespaces
7) Re-create the specified undo log tablespaces. InnoDB uses same
mtr for this one and step (6)
8) Make checkpoint again, so that server or mariabackup
can read the undo log tablespace page0 before applying
the redo logs
srv_undo_tablespaces_reinit(): Recreate the undo log tablespaces.
It does reset trx_sys page, delete the old undo tablespaces,
update the binlog offset, write set replication checkpoint
in system rollback segment page
trx_rseg_update_binlog_offset(): Added 2 new parameters to pass
binlog file name and binlog offset
trx_rseg_array_init(): Return error if the rollback segment
slot points to non-existent tablespace
srv_undo_tablespaces_init(): Added new parameter mtr
to initialize all undo tablespaces
trx_assign_rseg_low(): Allow the transaction to use the rollback
segment slots(1..127) even if InnoDB failed to change to the
requested innodb_undo_tablespaces=0
srv_start(): Override the user specified value of
innodb_undo_tablespaces variable with already existing actual
undo tablespaces
wf_incremental_process(): Detects whether TRX_SYS page has been
modified since last backup. If it is then incremental backup
fails and throws the information about taking full backup again
xb_assign_undo_space_start(): Removed the function. Because
undo001 has first undo space id value in page0
Added test case to test the scenario during startup and mariabackup
incremental process too.
Reviewed-by : Marko Mäkelä
Tested-by : Matthias Leich
mariadb-backup: Add the Boolean option --innodb-log-file-buffering
(default ON) to control whether the server's ib_logfile0 should be
accessed via the file system cache during --backup. We may be retrying
reads of the last log block very frequently, which may cause I/O stalls
when the file system cache is being bypassed.
This addresses a regression that was introduced in
commit 4c0cd953ab (MDEV-28766).
On some affected systems, it may make sense to additionally
SET GLOBAL innodb_log_file_buffering=OFF on the server for the
duration of making a backup.
In commit 28325b0863
a compile-time option was introduced to disable the macros
DBUG_ENTER and DBUG_RETURN or DBUG_VOID_RETURN.
The parameter name WITH_DBUG_TRACE would hint that it also
covers DBUG_PRINT statements. Let us do that: WITH_DBUG_TRACE=OFF
shall disable DBUG_PRINT() as well.
A few InnoDB recovery tests used to check that some output from
DBUG_PRINT("ib_log", ...) is present. We can live without those checks.
Reviewed by: Vladislav Vaintroub
Let us use the normal platform-specific preprocessor symbols
__linux__, __sun__, _AIX instead of some homebrew ones.
The preprocessor symbol UNIV_HPUX must have lost its meaning
by f6deb00a56 (note: the symbol
UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).
xb_read_delta_metadata(): For ROW_FORMAT=COMPRESSED tables, initialize
the info.zip_size with the physical page size and let info.page_size
remain the logical page size, like xb_delta_open_matching_space()
expects it to be ever since
commit 0a1c3477bf (MDEV-18493).
Changing the mariabackup history table from PERCONA_SCHEMA.xtrabackup_history
to mysql.mariabackup_history.
Additionally, extending xb_history.test for better coverage:
- Recording the fact that the history table is created during
"mariabackup --history" invocation when it does not exist.
- Recording the history table structure (adding SHOW CREATE TABLE)
- Recording how --history vs --history=foo affect the "name" column
of the history table.
- Recording the fact that two consequent executions of
"mariabackup --history[=foo]" insert into the history table
incrementally, without truncating it on every execution.
Since the 10.5 split of the privileges, the required GRANTs
for various mariabackup operations has changed.
In the addition of tests, a number of mappings where incorrect:
The option --lock-ddl-per-table didn't require connection admin.
The option --safe-slave-backup requires SLAVE MONITOR even without
the --no-lock option.
Even though commit b817afaa1c passed
the test mariabackup.compress_qpress, that test turned out to be
too small to reveal one more problem that had previously been prevented
by the existence of ctrl_mutex. I did not realize that there can be
multiple concurrent callers to compress_write(). One of them is the
log copying thread; further callers are data file copying threads
(default: --parallel=1).
By default, there is only one compression worker thread
(--compress-threads=1).
compress_write(): Fix a race condition between threads that would
use the same worker thread object. Make thd->data_avail contain the
thread identifier of the submitter, and add thd->avail_cond to
notify other compress_write() threads that are waiting for a slot.
This reverts the revert 4f62dfe676
and fixes the hang that was introduced when ctrl_mutex was removed.
The test mariabackup.compress_qpress covers this code, but the
test is skipped if a stand-alone qpress executable is not available.
It is not available in many software repositories, possibly because
the code base has not been updated since 2010.
This was tested with an executable that was compile from the source
code at http://www.quicklz.com/qpress-11-source.zip (after adding
a missing #include <unistd.h> for the definition of isatty()).
Compared to the grandparent commit (before the revert), the changes
are as follows:
comp_thread_ctxt_t::done_cond: A separate condition for completed
compression, signaling that thd->to_len has been updated.
compress_write(): Replace some threads[i] with thd.
Reset thd->to_len = 0 after consuming the compressed data.
compress_worker_thread_func(): After consuming the uncompressed
data, set thd->data_avail = FALSE. After compressing, signal
thd->done_cond.
An interface to use memory-mapped I/O on the InnoDB redo log that
is stored in persistent memory was introduced
in commit 685d958e38 (MDEV-14425).
log_t::attach(): In mariadb-backup --backup, never attempt to
use memory-mapped I/O for reading the log file of the server.
xtrabackup_copy_logfile(): Assert !log_sys.is_pmem() and remove
the code to deal with a memory-mapped log.
This fixes a race condition scenario of the following type:
1. Backup parsed a mini-transaction from the memory-mapped buffer.
This took some time.
2. Meanwhile, the server might have overwritten this portion
of the circular log_sys.buf.
3. Backup copied the data to the output file while or after
the server had overwritten this portion of the file.
4. Backup failed to notice that a log overrun occurred.
The symptom of this was that a mariadb-backup --prepare of the
log failed. In the analyzed case, the error message was:
[ERROR] InnoDB: Missing FILE_CHECKPOINT(...)
This will also make it possible to run mariadb-backup --backup
under "rr replay".
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it is making things worse by blocking attempts to
rescue data from or rebuild a partially readable table.
We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.
This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.
We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in system tablespace in case something
was corrupted.
Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.
buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.
btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.
btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().
btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().
FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.
dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.
trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().
trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.
trx_sys_create_sys_pages(): Merged with trx_sysf_create().
dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.
dir_pathname(): Replaces os_file_make_new_pathname().
row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.
btr_decryption_failed(): Report decryption failures
dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.
dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.
dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.
dict_table_is_corrupted(): Remove.
dict_index_t::is_btree(): Determine if the index is a valid B-tree.
BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.
UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.
buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().
fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.
btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.
opt_search_plan_for_table(): Simplify the code.
row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.
row_undo_mod_upd_exist_sec(): Simplify the code.
row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().
fut_get_ptr(): Replace with buf_page_get_gen() calls.
buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.
buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.
fseg_page_is_allocated(): Replaces fseg_page_is_free().
fts_drop_common_tables(): Return an error if the transaction
was rolled back.
fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.
fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.
Clean up mtr_t::page_lock()
buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.
buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.
mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().
recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.
recv_recover_page(): Return whether the operation succeeded.
recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.
Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
INNODB_VERSION_STR: Replaced with PACKAGE_VERSION (non-functional change).
INNODB_VERSION_SHORT: Replaced with direct use of
MYSQL_VERSION_MAJOR << 8 | MYSQL_VERSION_MINOR.
check_version(): Simplify the mariadb-backup version check,
and require the server version to be MariaDB 10.8 or later,
because that is when the InnoDB redo log format was last changed.
comp_thread_ctxt_t: Remove ctrl_mutex, ctrl_cond, started. We do not
actually need them for anything.
destroy_worker_thread(): Split from destroy_worker_threads().
create_worker_threads(): We already initialize
thd->data_avail=FALSE and thd->cancelled=FALSE before
invoking pthread_create(). If any thread creation fails,
clean up by destroy_worker_thread().
compress_worker_thread_func(): Assume that thd->started and
thd->data_avail are already initialized.
Reviewed by: Vladislav Vaintroub
When "mariabackup --target-dir=$basedir --incremental-dir=$incremental_dir"
is running and is moving a new table file (e.g. `db1/t1.new`) from the
incremental directory to the base directory, it needs to verify that the base
backup database directory (e.g. `$basedir/db1`) really exists
(or create it otherwise).
The table `db1/t1` can come from a new database `db1` which
was created during the base mariabackup execution time.
In such case the directory `db1` exists only in the incremental directory,
but does not exist in the base directory.
don't initialize error_log_handler_list in set_handlers()
* error_log_handler_list is initialized to LOG_FILE early, in init_base()
* set_handlers always reinitializes it to LOG_FILE, so it's pointless
* after init_base() concurrent threads start using sql_log_warning,
so following set_handlers() shouldn't modify error_log_handler_list
without some protection
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.
During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:
sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.
sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.
The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.
Reviewed by: Jan Lindström
When the log is stored in persistent memory, log_sys.buf[] is
a ring buffer that directly maps to the circular ib_logfile0 file.
There were several errors that could occur in the special case
when a log record ends exactly at the end of the log file and the
next record would start at log_sys.buf[log_sys.START_OFFSET].
mariabackup.huge_lsn,strict_full_crc32: Write the first record
at the very end of the circular file, to reproduce the failure
scenarios.
recv_sys_t::parse(): On PMEM, wrap the end offset of the record
from log_sys.file_size to log_sys.START_OFFSET if needed.
Otherwise, both InnoDB recovery and mariadb-backup would try
to parse the next record from an invalid address.
filename_to_spacename(): Remove an assumption about the format
of file names. While the server currently writes file names like
./databasename/tablename.ibd we might want to stop writing the
redundant ./ prefix in the future. The test mariabackup.huge_lsn
is generating such file names.
xtrabackup_copy_logfile(): Correctly copy a record that ends at
the very end of the log_sys.buf[].
The errors in mariadb-backup were reproduced with the test
mariabackup.huge_lsn,strict_full_crc32 and an additional patch
to use the start checkpoint of the test:
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 27dce5fa17d..e17a1692d6f 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -1796,7 +1796,8 @@ dberr_t recv_sys_t::find_checkpoint()
continue;
}
- if (checkpoint_lsn >= log_sys.next_checkpoint_lsn)
+ if (checkpoint_lsn >= log_sys.next_checkpoint_lsn &&
+ checkpoint_lsn != 0x1000fffffe10)
{
log_sys.next_checkpoint_lsn= checkpoint_lsn;
log_sys.next_checkpoint_no= field == log_t::CHECKPOINT_1;
This commit adds correct handling of binlogs for SST using rsync
or mariabackup. Before this fix, binlogs were handled incorrectly -
- only one (last) binary log file was transferred during SST, which
then led to various failures (for example, when trying to list all
events from the binary log). These bugs were long masked by flaws
in the primitive binlogs handling code in the SST scripts, which
causing binary logs files to be erased after transfer or not added
to the binlog index on the joiner node. Now the correct transfer
of all binary logs (not just the last of the binary log files) has
been implemented both for the rsync (at the script level) and for
the mariabackup (at the level of the main utility code).
This commit also adds a new sst_max_binlogs=<n> parameter, which
can be located in the [sst] section or in the [xtrabackup] section
(historically, supported for mariabackup only, not for rsync), or
in one of the server sections. This parameter specifies the number
of binary log files to be sent to the joiner node during SST. This
option is added for compatibility with old SST scripting behavior,
which can be emulated by setting the sst_max_binlogs=1 (although
in general this can cause problems for the reasons described above).
In addition, setting the sst_max_binlogs=0 can be used to suppress
the transmission of binary logs to the joiner nodes during SST
(although sometimes a single file with the current binary log can
still be transmitted to the joiner, even with sst_max_binlogs=0,
because this sometimes necessary in modes that involve the use of
GTIDs with Galera).
Also, this commit ensures correct handling of paths to various
innodb files and directories in the SST scripts, and fixes some
problems with this that existed in mariabackup utility (which
were associated with incorrect handling of the innodb_data_dir
parameter in some scenarios).
In addition, this commit contains the following enhancements:
1) Added tests for mtr, which check the correct work with binlogs
after SST (using rsync and mariabackup);
2) Added correct handling of slashes at the end of all paths that
the SST script receives as parameters;
3) Improved parsing code for --mysqld-args parameters. Now it
correctly processes the sequence "--" after the name of the
one-letter option;
4) Checking the secret signature during joiner authentication
is made independent of presence of bash (as a unix shell)
in the system and diff utility no longer needed to check
certificates compliance;
5) All directories that are necessary for the correct placement
of various logs are automatically created by SST scripts in
advance (before running mariabackup on the joiner node);
6) Removal of old binary logs on joiner is done using the binlog
index (if it exists) (not only by fixed pattern that based
on the current binlog name, as before);
7) Paths for placing binary logs are correctly processed if they
are set as relative paths (to the datadir);
8) SST scripts are made even more resistant to spaces in filenames
(now for binlogs);
9) In case of failure, SST scripts now always end with an exit
code other than zero;
10) SST script for rsync now correctly create a tar file with
the binlogs, even if the paths to them (in the binlog index
file) are specified as a mix of absolute and relative paths,
and even if they do not match with the datadir path specified
in the current configuration settings.
The performance_schema counter wait/io/file/innodb/innodb_log_file
is always reported as 0.
The way how redo log writes are being waited for was refactored in
commit 30ea63b7d2 by the introduction
of flush_lock and write_lock. Even before that change, all the
wait/io/file/innodb/ counters were always 0 in my tests.
Moreover, if the PMEM interface that was introduced in
commit 3daef523af
is being used, writes to the InnoDB log file will completely avoid
any system calls and performance_schema instrumentation.
In commit 685d958e38 also the reads
of the redo log (during recovery) would bypass any system calls.
A prominent bottleneck in mtr_t::commit() is log_sys.mutex between
log_sys.append_prepare() and log_close().
User-visible change: The minimum innodb_log_file_size will be
increased from 1MiB to 4MiB so that some conditions can be
trivially satisfied.
log_sys.latch (log_latch): Replaces log_sys.mutex and
log_sys.flush_order_mutex. Copying mtr_t::m_log to
log_sys.buf is protected by a shared log_sys.latch.
Writes from log_sys.buf to the file system will be protected
by an exclusive log_sys.latch.
log_sys.lsn_lock: Protects the allocation of log buffer
in log_sys.append_prepare().
sspin_lock: A simple spin lock, for log_sys.lsn_lock.
Thanks to Vladislav Vaintroub for suggesting this idea, and for
reviewing these changes.
mariadb-backup: Replace some use of log_sys.mutex with recv_sys.mutex.
buf_pool_t::insert_into_flush_list(): Implement sorting of flush_list
because ordering is otherwise no longer guaranteed. Ordering by LSN
is needed for the proper operation of redo log checkpoints.
log_sys.append_prepare(): Advance log_sys.lsn and log_sys.buf_free by
the length, and return the old values. Also increment write_to_buf,
which was previously done in log_close().
mtr_t::finish_write(): Obtain the buffer pointer from
log_sys.append_prepare().
log_sys.buf_free: Make the field Atomic_relaxed,
to simplify log_flush_margin(). Use only loads and stores
to avoid costly read-modify-write atomic operations.
buf_pool.flush_list_requests: Replaces
export_vars.innodb_buffer_pool_write_requests
and srv_stats.buf_pool_write_requests.
Protected by buf_pool.flush_list_mutex.
buf_pool_t::insert_into_flush_list(): Do not invoke page_cleaner_wakeup().
Let the caller do that after a batch of calls.
recv_recover_page(): Invoke a minimal part of
buf_pool.insert_into_flush_list().
ReleaseBlocks::modified: A number of pages added to buf_pool.flush_list.
ReleaseBlocks::operator(): Merge buf_flush_note_modification() here.
log_t::set_capacity(): Renamed from log_set_capacity().
In commit 685d958e38 (MDEV-14425),
the log parsing in mariadb-backup --backup was rewritten.
The parameter STORE_IF_EXISTS that is being passed to recv_sys.parse_mtr()
or recv_sys.parse_pmem() instead of STORE_NO caused unnecessary additional
memory allocation for redo log records.
- Store the deferred tablespace name while loading the tablespace
for backup process.
- Mariabackup stores the list of space ids which has page0 INIT_PAGE
records. backup_first_page_op() and first_page_init() was introduced
to track the page0 INIT_PAGE records.
- backup_file_op() and log_file_op() was changed to handle
FILE_MODIFY redo log records. It is used to identify the
deferred tablespace space id.
- Whenever file operation redo log was processed by backup,
backup_file_op() should check whether the space name exist
in deferred tablespace. If it is then it needs to store the
space id, name when FILE_MODIFY, FILE_RENAME redo log processed
and it should delete the tablespace name from defer list in other
cases.
- backup_fix_ddl() should check whether deferred tablespace has
any page0 init records. If it is then consider the tablespace
as newly created tablespace. If not then backup should try
to reload the tablespace with SRV_BACKUP_NO_DEFER mode to
avoid the deferring of tablespace.
In commit 685d958e38 (MDEV-14425)
a bug was introduced to mariadb-backup --backup for the case when
the log is wrapping around to log_sys.START_OFFSET (12288).
This could also cause a "Missing FILE_CHECKPOINT" error during
mariadb-backup --prepare, in case the log copying resumed after
the server had produced a multiple of innodb_log_file_size-12288
bytes of more log so that the last mini-transaction would end
exactly at the end of the log file.
xtrabackup_copy_logfile(): If the log wraps around, read everything
to the end of the log file, and then the rest from log_sys.START_OFFSET.
The InnoDB redo log used to be formatted in blocks of 512 bytes.
The log blocks were encrypted and the checksum was calculated while
holding log_sys.mutex, creating a serious scalability bottleneck.
We remove the fixed-size redo log block structure altogether and
essentially turn every mini-transaction into a log block of its own.
This allows encryption and checksum calculations to be performed
on local mtr_t::m_log buffers, before acquiring log_sys.mutex.
The mutex only protects a memcpy() of the data to the shared
log_sys.buf, as well as the padding of the log, in case the
to-be-written part of the log would not end in a block boundary of
the underlying storage. For now, the "padding" consists of writing
a single NUL byte, to allow recovery and mariadb-backup to detect
the end of the circular log faster.
Like the previous implementation, we will overwrite the last log block
over and over again, until it has been completely filled. It would be
possible to write only up to the last completed block (if no more
recent write was requested), or to write dummy FILE_CHECKPOINT records
to fill the incomplete block, by invoking the currently disabled
function log_pad(). This would require adjustments to some logic around
log checkpoints, page flushing, and shutdown.
An upgrade after a crash of any previous version is not supported.
Logically empty log files from a previous version will be upgraded.
An attempt to start up InnoDB without a valid ib_logfile0 will be
refused. Previously, the redo log used to be created automatically
if it was missing. Only with with innodb_force_recovery=6, it is
possible to start InnoDB in read-only mode even if the log file
does not exist. This allows the contents of a possibly corrupted
database to be dumped.
Because a prepared backup from an earlier version of mariadb-backup
will create a 0-sized log file, we will allow an upgrade from such
log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system
tablespace looks valid.
The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced
with 64-byte log checkpoint blocks at 0x1000 and 0x2000.
The start of log records will move from 0x800 to 0x3000. This allows us
to use 4096-byte aligned blocks for all I/O in a future revision.
We extend the MDEV-12353 redo log record format as follows.
(1) Empty mini-transactions or extra NUL bytes will not be allowed.
(2) The end-of-minitransaction marker (a NUL byte) will be replaced
with a 1-bit sequence number, which will be toggled each time when the
circular log file wraps back to the beginning.
(3) After the sequence bit, a CRC-32C checksum of all data
(excluding the sequence bit) will written.
(4) If the log is encrypted, 8 bytes will be written before
the checksum and included in it. This is part of the
initialization vector (IV) of encrypted log data.
(5) File names, page numbers, and checkpoint information will not be
encrypted. Only the payload bytes of page-level log will be encrypted.
The tablespace ID and page number will form part of the IV.
(6) For padding, arbitrary-length FILE_CHECKPOINT records may be written,
with all-zero payload, and with the normal end marker and checksum.
The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON.
In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will
no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup
will require a valid log file. When resizing the log, we will create
a logically empty ib_logfile101 at the current LSN and use an atomic rename
to replace ib_logfile0 with it. See the test innodb.log_file_size.
Because there is no mandatory padding in the log file, we are able
to create a dummy log file as of an arbitrary log sequence number.
See the test mariabackup.huge_lsn.
The parameter innodb_log_write_ahead_size and the
INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed.
The minimum value of innodb_log_buffer_size will be increased to 2MiB
(because log_sys.buf will replace recv_sys.buf) and the increment
adjusted to 4096 bytes (the maximum log block size).
The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed:
os_log_fsyncs
os_log_pending_fsyncs
log_pending_log_flushes
log_pending_checkpoint_writes
The following status variables will be removed:
Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs)
Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design)
log_sys.get_block_size(): Return the physical block size of the log file.
This is only implemented on Linux and Microsoft Windows for now, and for
the power-of-2 block sizes between 64 and 4096 bytes (the minimum and
maximum size of a checkpoint block). If the block size is anything else,
the traditional 512-byte size will be used via normal file system
buffering.
If the file system buffers can be bypassed, a message like the following
will be issued:
InnoDB: File system buffers for log disabled (block size=512 bytes)
InnoDB: File system buffers for log disabled (block size=4096 bytes)
This has been tested on Linux and Microsoft Windows with both sizes.
On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC.
Tests in 3 different environments where the log is stored in a device
with a physical block size of 512 bytes are yielding better throughput
without O_DIRECT. This could be due to the fact that in the event the
last log block is being overwritten (if multiple transactions would
become durable at the same time, and each of will write a small
number of bytes to the last log block), it should be faster to re-copy
data from log_sys.buf or log_sys.flush_buf to the kernel buffer,
to be finally written at fdatasync() time.
The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for
data files. This option will enable O_DIRECT on the log file on Linux.
It may be unsafe to use when the storage device does not support
FUA (Force Unit Access) mode.
When the server is compiled WITH_PMEM=ON, we will use memory-mapped
I/O for the log file if the log resides on a "mount -o dax" device.
We will identify PMEM in a start-up message:
InnoDB: log sequence number 0 (memory-mapped); transaction id 3
On Linux, we will also invoke mmap() on any ib_logfile0 that resides
in /dev/shm, effectively treating the log file as persistent memory.
This should speed up "./mtr --mem" and increase the test coverage of
PMEM on non-PMEM hardware. It also allows users to estimate how much
the performance would be improved by installing persistent memory.
On other tmpfs file systems such as /run, we will not use mmap().
mariadb-backup: Eliminated several variables. We will refer
directly to recv_sys and log_sys.
backup_wait_for_lsn(): Detect non-progress of
xtrabackup_copy_logfile(). In this new log format with
arbitrary-sized blocks, we can only detect log file overrun
indirectly, by observing that the scanned log sequence number
is not advancing.
xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit,
because we are not allowed to modify the server's log file, and our
memory mapping is read-only.
trx_flush_log_if_needed_low(): Do not use the callback on pmem.
Using neither flush_lock nor write_lock around PMEM writes seems
to yield the best performance. The pmem_persist() calls may
still be somewhat slower than the pwrite() and fdatasync() based
interface (PMEM mounted without -o dax).
recv_sys_t::buf: Remove. We will use log_sys.buf for parsing.
recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE.
recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn.
recv_sys_t, log_sys_t: Removed many data members.
recv_sys.lsn: Renamed from recv_sys.recovered_lsn.
recv_sys.offset: Renamed from recv_sys.recovered_offset.
log_sys.buf_size: Replaces srv_log_buffer_size.
recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset]
when the buffer is being allocated from the memory heap.
recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is
backed by ib_logfile0. The pointer will wrap from recv_sys.len
(log_sys.file_size) to log_sys.START_OFFSET. For the record that
wraps around, we may copy file name or record payload data to
the auxiliary buffer decrypt_buf in order to have a contiguous
block of memory. The maximum size of a record is less than
innodb_page_size bytes.
recv_sys_t::parse(): Take the smart pointer as a template parameter.
Do not temporarily add a trailing NUL byte to FILE_ records, because
we are not supposed to modify the memory-mapped log file. (It is
attached in read-write mode already during recovery.)
recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse().
recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be
returned on PMEM, use recv_ring to wrap around the buffer to the start.
mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free
on PMEM, because it has no meaning on the mmap-based log.
log_sys.write_to_buf: Count writes to log_sys.buf. Replaces
srv_stats.log_write_requests and export_vars.innodb_log_write_requests.
Protected by log_sys.mutex. Updated consistently in log_close().
Previously, mtr_t::commit() conditionally updated the count,
which was inconsistent.
log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf,
for writing to log_sys.log (the ib_logfile0). Replaces
srv_stats.log_writes and export_vars.innodb_log_writes.
Protected by log_sys.mutex.
log_sys.waits: Count waits in append_prepare(). Replaces
srv_stats.log_waits and export_vars.innodb_log_waits.
recv_recover_page(): Do not unnecessarily acquire
log_sys.flush_order_mutex. We are inserting the blocks in arbitary
order anyway, to be adjusted in recv_sys.apply(true).
We will change the definition of flush_lock and write_lock to
avoid potential false sharing. Depending on sizeof(log_sys) and
CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could
share a cache line with each other or with the last data members
of log_sys.
Thanks to Matthias Leich for providing https://rr-project.org traces
for various failures during the development, and to
Thirunarayanan Balathandayuthapani for his help in debugging
some of the recovery code. And thanks to the developers of the
rr debugger for a tool without which extensive changes to InnoDB
would be very challenging to get right.
Thanks to Vladislav Vaintroub for useful feedback and
to him, Axel Schwenke and Krunal Bauskar for testing the performance.
The previous default innodb_buffer_pool_chunk_size of 128M
made sense when the innodb buffer pool size was a few GB.
When the pool size is 128GB this means the chunk size is 0.1%
of this. Fine tuning the buffer pool size on such a fine
increment doesn't make practical sense. Also on extremely
large buffer pool systems, initializing on the default 128M can
also take a considerable amount of time.
When large pages are enabled, the chunk size has to be a multiple
of an available large page size or memory allocation without
use can occur.
Previously the default 0 was documented as disabling resizing.
With srv_buf_pool_chunk_unit > 0 assertions in the code and the
minimium value set, I doubt this was ever the case.
As such the autosizing (based on default 0) takes place as follows:
* a 64th of the innodb_buffer_pool_size
* if large pages, this is rounded down the the nearest multiple
of the large page size.
* If less than 1MB, set to 1MB.
This does mean the new default innodb_buffer_pool_chunk size is
2MB, derived form the above formular with 128MB as the buffer pool
size.
The innodb_buffer_pool_chunk_size is changed to a size_t for
better compatiblity with the memory allocations which use size_t.
The previous upper limit is changed to the maxium of a size_t. The
maximium value used is the buffer pool size anyway.
Getting this default value of the chunk size to a more practical
size facilitates further development of more automated resizing
without significant overhead or memory fragmentation.
innodb_buffer_pool_resize test adjusted based on 1M default
chunk size thanks Wlad.
The previous threads locked need to be released too.
This occurs if the initialization of any of the non-first
mutex/conditition variables errors occurs.
This is follow-up to commit 1193a793c4.
We will set innodb_use_native_aio=OFF by default also in mariadb-backup
when running on a potentially affected kernel.
because plugin code is not only about encryption anymore
(also loads provider plugins), and xb_ prefix prevents name
clashes with the server code (that mariabackup links with).
bzip2/lz4/lzma/lzo/snappy compression is now provided via *services*
they're almost like normal services, but in include/providers/
and they're supposed to provide exactly the same interface
as original compression libraries (but not everything,
only enough of if for the code to compile).
the services are implemented via dummy functions that return
corresponding error values (LZMA_PROG_ERROR, LZO_E_INTERNAL_ERROR, etc).
the actual compression libraries are linked into corresponding
provider plugins. Providers are daemon plugins that when loaded
replace service pointers to point to actual compression functions.
That is, run-time dependency on compression libraries is now on plugins,
and the server doesn't need any compression libraries to run, but
will automatically support the compression when a plugin is loaded.
InnoDB and Mroonga use compression plugins now. RocksDB doesn't,
because it comes with standalone utility binaries that cannot
load plugins.
make BACKUP STAGE behave as FTWRL, desyncing and pausing the node
to prevent BF threads (appliers) from interfering with blocking stages.
This is needed because BF threads don't respect BACKUP MDL locks.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>