The motivation for introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.
Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which shrinks the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.
The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.
purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).
purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.
purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().
purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().
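As an illustration of the intended overlap, here is a minimal sketch that
uses std::thread and a hypothetical purge_batch type instead of the real
InnoDB purge structures and task queue; it only shows the hand-over
pattern, not the actual implementation:

  #include <condition_variable>
  #include <mutex>
  #include <thread>
  #include <vector>

  // Hypothetical stand-in for the pages processed by one purge batch.
  struct purge_batch { std::vector<unsigned> pages; };

  static std::mutex truncate_mutex;
  static std::condition_variable truncate_done;
  static bool truncate_running;

  // Analogous to purge_truncation_callback(): free the undo log pages of
  // the previous batch and shrink the history list, off the coordinator.
  static void truncation_task(purge_batch batch)
  {
    // ... free batch.pages and shrink the history list here ...
    std::lock_guard<std::mutex> g(truncate_mutex);
    truncate_running = false;
    truncate_done.notify_one();
  }

  // Analogous to purge_coordinator_state::do_purge(): hand the previous
  // batch to an asynchronous task and immediately start on the next one.
  static void coordinator_step(purge_batch &&previous)
  {
    {
      std::unique_lock<std::mutex> lk(truncate_mutex);
      // At most one truncation task runs at a time.
      truncate_done.wait(lk, [] { return !truncate_running; });
      truncate_running = true;
    }
    std::thread(truncation_task, std::move(previous)).detach();
    // ... parse and purge the next innodb_purge_batch_size pages here ...
  }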
Reviewed by: Vladislav Lesin
There are many filesystem-related errors that can occur with
MariaBackup. These are already output to stderr with a good description of
the error. Many of these are permission or resource (file descriptor)
limits where the assertion and resulting core crash do not offer
developers anything more than the log message. To the user, assertions
and core crashes come across as poor error handling.
As such, we now return an error and handle it all the way up the stack.
copy_back(): Also copy the dummy empty ib_logfile0 so that
MariaDB Server 10.8 or later can be started after
--copy-back or --move-back.
Thanks to Daniel Black for reporting this.
This is a 10.5 version of
commit ebf3649259
copy_back(): Also copy the dummy empty ib_logfile0 so that
MariaDB Server 10.8 or later can be started after
--copy-back or --move-back.
Thanks to Daniel Black for reporting this.
The crash happened in filename_to_spacename() when using it on a
filename that is not in the format of "./database/table.ibd".
According to Marko, it is possible that the function is called with
the path to an undo file, which would cause a crash.
This patch fixes this by not crashing on unexpected filenames; instead,
they are returned as-is, except that all '\' are changed to '/'.
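A minimal sketch of that behaviour (hypothetical code, not the real
filename_to_spacename()), assuming names of the form "./database/table.ibd":

  #include <string>

  // Convert "./db/t1.ibd" to "db/t1"; return any other name, such as the
  // path to an undo tablespace file, unchanged apart from replacing '\'
  // with '/'.
  static std::string filename_to_spacename_sketch(std::string path)
  {
    for (char &c : path)
      if (c == '\\')
        c = '/';
    if (path.compare(0, 2, "./") == 0 && path.size() > 6 &&
        path.compare(path.size() - 4, 4, ".ibd") == 0 &&
        path.find('/', 2) != std::string::npos)
      return path.substr(2, path.size() - 6); // strip "./" and ".ibd"
    return path; // unexpected name: return it as-is
  }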
Fixing buildbot failures on mariabackup.aria_log_dir_path_rel.
The problem was that directory_exists() was called with the
relative aria_log_dir_path value, while the current directory
in mariadb-backup is not necessarily equal to datadir when MTR is running.
Fix:
- Moving the building of the absolute path one level up:
from the function copy_back_aria_logs() to the function copy_back().
- Passing the built absolute path to both directory_exists() and
copy_back_aria_logs() as a parameter.
- `mariadb-backup --backup` was fixed to fetch the value of the
@@aria_log_dir_path server variable and copy aria_log* files
from @@aria_log_dir_path directory to the backup directory.
Absolute and relative (to --datadir) paths are supported.
Before this change aria_log* files were copied to the backup
only if they were in the default location in @@datadir.
- `mariadb-backup --copy-back` now understands a new my.cnf and command line
parameter --aria-log-dir-path.
`mariadb-backup --copy-back` in the main loop in copy_back()
(when copying back from the backup directory to --datadir)
was fixed to ignore all aria_log* files.
A new function copy_back_aria_logs() was added.
It consists of a separate loop copying back aria_log* files from
the backup directory to the directory specified in --aria-log-dir-path.
Absolute and relative (to --datadir) paths are supported.
If --aria-log-dir-path is not specified,
aria_log* files are copied to --datadir by default.
- The function is_absolute_path() was fixed to understand MTR-style
paths on Windows with forward slashes, e.g.
--aria-log-dir-path=D:/Buildbot/amd64-windows/build/mysql-test/var/...
(a minimal sketch of the extended check follows after this list).
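The extended check could look roughly like this (a sketch only; the real
is_absolute_path() is part of the server's portability layer and handles
more cases):

  #include <cctype>

  // Accept Unix-style "/..." and "\..." as before, and on Windows also
  // accept drive-letter paths written with either separator, such as
  // "D:\dir\file" or the MTR-style "D:/dir/file".
  static bool is_absolute_path_sketch(const char *path)
  {
    if (path[0] == '/' || path[0] == '\\')
      return true;
  #ifdef _WIN32
    if (std::isalpha(static_cast<unsigned char>(path[0])) &&
        path[1] == ':' && (path[2] == '\\' || path[2] == '/'))
      return true;
  #endif
    return false;
  }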
fil_space_t::create(), fil_space_t::add(): Expect the caller to
acquire and release fil_system.mutex. In this way, creating a tablespace
and adding the first (usually only) data file will be atomic.
recv_sys_t::recover_deferred(): Correctly protect some changes by
holding fil_system.mutex.
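A minimal sketch of the pattern with hypothetical stand-ins for the real
fil_system types, showing why the caller-held mutex makes the two steps
atomic:

  #include <mutex>
  #include <string>
  #include <vector>

  // Hypothetical stand-ins; the real code uses fil_system.mutex,
  // fil_space_t::create() and fil_space_t::add().
  struct data_file { std::string name; };
  struct tablespace { std::vector<data_file> files; };
  static std::mutex fil_system_mutex;

  static tablespace *create_space_with_first_file(const std::string &file)
  {
    std::lock_guard<std::mutex> g(fil_system_mutex);
    tablespace *space = new tablespace();   // like fil_space_t::create()
    space->files.push_back({file});         // like fil_space_t::add()
    // Both steps happen under one mutex acquisition, so no other thread
    // can observe the tablespace without its first data file.
    return space;
  }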
Tested by: Matthias Leich
This is a non-functional change that simplifies the code logic:
- removing global variables ds_data and ds_meta
- passing these variables as parameters to functions instead
- adding helper classes: Datasink_free_list and Backup_datasinks
- moving some functions that accept a ds_ctxt parameter
into ds_ctxt as methods.
The server has systemd support and calls sd_notify() to communicate
its status to systemd.
mariabackup links in the whole server, but it should not notify
systemd, because it is not started or managed by systemd.
The solution is to suppress error messages for missing tablespaces if
mariabackup is launched with "--prepare --export" options.
"mariabackup --prepare --export" invokes itself with --mysqld parameter.
If the parameter is set, then it starts the server to feed "FLUSH TABLES ...
FOR EXPORT;" queries for the exported tablespaces. This is a "normal" server
start, which is why a new srv_operation value is introduced.
Reviewed by Marko Makela.
fil_node_open_file_low() tries to close files from the top of
fil_system.space_list if the limit on the number of opened files is exceeded.
It invokes fil_space_t::try_to_close(), which iterates the list searching
for the first opened space. Then it just closes the space, leaving it in
the same position in fil_system.space_list.
Under heavy file opening, as during 'SHOW TABLE STATUS ...' execution,
once the limit on the number of opened files is reached,
fil_space_t::try_to_close() iterates over more and more closed spaces
before reaching any opened space on each fil_node_open_file_low() call.
This causes a performance regression if the number of spaces is big enough.
The fix is to keep opened spaces at the top of fil_system.space_list
and to move closed files to the end of the list.
For this purpose the fil_space_t::space_list_last_opened pointer is
introduced. It points to the last inserted opened space in
fil_space_t::space_list. When a space is opened, it is inserted into
fil_space_t::space_list just after the position this pointer refers to,
in order to preserve the logic introduced in MDEV-23855. Any closed space
is added to the end of fil_space_t::space_list.
As opened spaces are located at the top of fil_space_t::space_list,
fil_space_t::try_to_close() finds opened space faster.
There can be a case when opened and closed spaces are mixed in
fil_space_t::space_list, if fil_system.freeze_space_list was set during
fil_node_open_file_low() execution. But this should not cause any error,
as fil_space_t::try_to_close() still iterates over the spaces in the list.
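The ordering invariant can be illustrated with a small sketch (hypothetical
types; the real code uses fil_system.space_list and
fil_space_t::space_list_last_opened rather than std::list):

  #include <iterator>
  #include <list>

  struct space { bool is_open; };

  struct space_list_sketch
  {
    std::list<space*> spaces;
    // Points to the last inserted opened space, or end() if none.
    std::list<space*>::iterator last_opened = spaces.end();

    void add(space *s)
    {
      if (s->is_open)
        // Insert right after the last opened space, keeping all opened
        // spaces grouped at the front of the list.
        last_opened = spaces.insert(last_opened == spaces.end()
                                    ? spaces.begin()
                                    : std::next(last_opened), s);
      else
        spaces.push_back(s); // closed spaces go to the end
    }

    // try_to_close() only has to scan the opened prefix.
    space *find_first_open()
    {
      for (space *s : spaces)
        if (s->is_open)
          return s;
      return nullptr;
    }
  };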
There is no need for any test case for the fix, as it does not change any
functionality but only fixes a performance regression.
While performing SAST scanning with Cppcheck against the source code of
commit 81196469, several code vulnerabilities were found.
Fix the following issues:
1. The parameters of an `snprintf` call are incorrect.
Cppcheck error:
client/mysql_plugin.c:1228: error: snprintf format string requires 6 parameters but only 5 are given.
This is because commit 630d7229 introduced the option `--lc-messages-dir`
in the bootstrap command, but the corresponding parameter was not added
to the `snprintf` call after the format string was changed.
Fix:
Restructure the code logic and correct the function parameters for
`snprintf`.
2. A null pointer is used in an `snprintf` call, which could cause a crash.
Cppcheck error:
extra/mariabackup/xbcloud.cc:2534: error: Null pointer dereference
The code intended to print the swift_project name if
opt_swift_project_id is NULL but opt_swift_project is not NULL.
However, the `snprintf` parameter mistakenly used
`opt_swift_project_id`.
Fix:
Change to use the correct string from `opt_swift_project`.
3. Potential double release of memory
Cppcheck error:
plugin/auth_pam/testing/pam_mariadb_mtr.c:69: error: Memory pointed to by 'resp' is freed twice.
The pointer `resp` is reused and allocated new memory after it has been
freed. However, `resp` was not set to NULL after being freed, which
allows a potential double release of the same pointer if the callback
function does not allocate new memory for `resp`.
Fix:
Set the `resp` pointer to NULL after the first free() to make sure
the same address is not freed twice (a minimal illustration follows
after this list).
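A self-contained illustration of the fix for issue 3 (hypothetical code,
not the actual pam_mariadb_mtr.c):

  #include <cstdlib>
  #include <cstring>

  int main()
  {
    char *resp = static_cast<char*>(malloc(16));
    if (resp)
      strcpy(resp, "first answer");
    // ... first use and release of the buffer ...
    free(resp);
    resp = nullptr;  // the fix: forget the stale address
    // If the callback does not allocate a new buffer, a later cleanup is
    // now a harmless free(nullptr) instead of a double free.
    free(resp);
    return 0;
  }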
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
The MariaDB code base uses strcat() and strcpy() in several
places. These are known to have memory safety issues and their usage is
discouraged; common security scanners like Flawfinder flag them. In MariaDB we
should start using modern and safer variants of these functions.
This is similar to the memory issue fixes in 19af1890b5
and 9de9f105b5, but now the uses of strcat()
and strcpy() are replaced with the safer strncat() and strncpy().
However, a '\0' terminator is added explicitly to make sure the resulting
string is correct, since these two functions do not guarantee that the new
string will be null-terminated.
Example:
  size_t dest_len = sizeof(g->Message);
  strncpy(g->Message, "Null json tree", dest_len);
  strncat(g->Message, ":", sizeof(g->Message) - strlen(g->Message));
  size_t wrote_sz = strlen(g->Message);
  size_t cur_len = wrote_sz >= dest_len ? dest_len - 1 : wrote_sz;
  g->Message[cur_len] = '\0';
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services
-- Reviewer and co-author Vicențiu Ciorbaru <vicentiu@mariadb.org>
-- Reviewer additions:
* The initial function implementation was flawed. Replaced with a simpler
and also correct version.
* Simplified code by making use of snprintf instead of chaining strcat
(see the sketch after this list).
* Simplified code by removing dynamic string construction in the first
place and using static strings if possible. See connect storage engine
changes.
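For reference, the snprintf-based shape of the example above can be as
simple as this (a sketch with a local buffer standing in for g->Message):

  #include <cstdio>

  int main()
  {
    char message[64];                      // stand-in for g->Message
    int n = std::snprintf(message, sizeof message, "%s:", "Null json tree");
    // snprintf never writes past the buffer and always NUL-terminates
    // (for a non-zero size); n reports the untruncated length, so
    // truncation can be detected by checking n >= (int) sizeof message.
    return n < 0;
  }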
- Mariabackup fails to copy an undo log tablespace while it undergoes
truncation. Mariabackup should therefore detect the redo log record that
performs the undo tablespace truncation; the backup should also read at
least the minimum file size of the tablespace and ignore the error
encountered while reading.
- Throw an error when reading an InnoDB undo tablespace fails but the
backup does not find the redo log record for undo tablespace truncation.
io_watching_thread(): Declare as a detachable thread, similar to
log_copying_thread().
stop_backup_threads(): Wait for both log_copying_thread and
io_watching_thread to clear their flags. Expect log_sys.mutex
to be held by the caller.
xtrabackup_backup_func(): Initialize log_copying_stop before
creating io_watching_thread. This prevents a race condition
where io_watching_thread() could wait on the condition variable
before it had been fully initialized. This race condition would
cause a hang in the GNU libc implementation of pthread_cond_destroy()
at the end of stop_backup_threads().
This race condition was introduced in
commit 38fd7b7d91 (MDEV-21452).
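The initialization-order aspect of the fix can be sketched like this
(hypothetical names; the real code uses log_copying_stop,
log_copying_thread() and io_watching_thread()):

  #include <condition_variable>
  #include <mutex>
  #include <thread>

  // The stop event must be fully constructed before any thread that may
  // wait on it is started, and all waiters must have returned before it
  // is destroyed.
  struct stop_event
  {
    std::mutex m;
    std::condition_variable cond;
    bool stopped = false;
  };

  static void io_watching_thread_sketch(stop_event *ev)
  {
    std::unique_lock<std::mutex> lk(ev->m);
    ev->cond.wait(lk, [ev] { return ev->stopped; });
  }

  int main()
  {
    stop_event ev;                                        // initialize first ...
    std::thread watcher(io_watching_thread_sketch, &ev);  // ... then start
    {
      std::lock_guard<std::mutex> g(ev.m);
      ev.stopped = true;
    }
    ev.cond.notify_all();
    watcher.join();       // only destroy ev after the waiter has finished
    return 0;
  }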
The variable was not really being used for anything. The parameters
innodb_read_io_threads and innodb_write_io_threads have replaced
innodb_file_io_threads.
- Mariabackup fails to open the undo tablespaces while applying delta
files to the corresponding data file. Mariabackup opens the
undo tablespaces for the first time in srv_undo_tablespaces_init() and
then tries to open them again in xtrabackup_apply_deltas() with a
conflicting mode, which leads to the failure.
- Mariabackup should close the undo tablespaces before applying
the incremental delta files.
os_file_read(): Merged with os_file_read_no_error_handling().
Crashing on a partial page read is as unhelpful as crashing on a
corrupted page read (commit 0b47c126e3).
Report the file name if it is available via IORequest.
In commit 28325b0863
a compile-time option was introduced to disable the macros
DBUG_ENTER and DBUG_RETURN or DBUG_VOID_RETURN.
The parameter name WITH_DBUG_TRACE would hint that it also
covers DBUG_PRINT statements. Let us do that: WITH_DBUG_TRACE=OFF
shall disable DBUG_PRINT() as well.
A few InnoDB recovery tests used to check that some output from
DBUG_PRINT("ib_log", ...) is present. We can live without those checks.
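The effect can be sketched with a plain preprocessor guard (hypothetical
_SKETCH names; the real definitions live in the dbug library and are driven
by the CMake option WITH_DBUG_TRACE):

  #include <cstdio>

  // With tracing enabled the macro forwards to a printf-style helper;
  // with WITH_DBUG_TRACE=OFF it expands to nothing at all, so the
  // arguments are not even evaluated at run time.
  #ifdef WITH_DBUG_TRACE_SKETCH
  # define DBUG_PRINT_SKETCH(keyword, arglist) std::printf arglist
  #else
  # define DBUG_PRINT_SKETCH(keyword, arglist) do {} while (0)
  #endif

  int main()
  {
    DBUG_PRINT_SKETCH("ib_log", ("checkpoint %u at %llu\n", 1U, 42ULL));
    return 0;
  }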
Reviewed by: Vladislav Vaintroub
Let us use the normal platform-specific preprocessor symbols
__linux__, __sun__, _AIX instead of some homebrew ones.
The preprocessor symbol UNIV_HPUX must have lost its meaning
by f6deb00a56 (note: the symbol
UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).
xb_read_delta_metadata(): For ROW_FORMAT=COMPRESSED tables, initialize
the info.zip_size with the physical page size and let info.page_size
remain the logical page size, like xb_delta_open_matching_space()
expects it to be ever since
commit 0a1c3477bf (MDEV-18493).
Since the 10.5 split of the privileges, the required GRANTs
for various mariabackup operations have changed.
In the addition of tests, a number of mappings were incorrect:
The option --lock-ddl-per-table didn't require connection admin.
The option --safe-slave-backup requires SLAVE MONITOR even without
the --no-lock option.
Even though commit b817afaa1c passed
the test mariabackup.compress_qpress, that test turned out to be
too small to reveal one more problem that had previously been prevented
by the existence of ctrl_mutex. I did not realize that there can be
multiple concurrent callers to compress_write(). One of them is the
log copying thread; further callers are data file copying threads
(default: --parallel=1).
By default, there is only one compression worker thread
(--compress-threads=1).
compress_write(): Fix a race condition between threads that would
use the same worker thread object. Make thd->data_avail contain the
thread identifier of the submitter, and add thd->avail_cond to
notify other compress_write() threads that are waiting for a slot.
This reverts the revert 4f62dfe676
and fixes the hang that was introduced when ctrl_mutex was removed.
The test mariabackup.compress_qpress covers this code, but the
test is skipped if a stand-alone qpress executable is not available.
It is not available in many software repositories, possibly because
the code base has not been updated since 2010.
This was tested with an executable that was compiled from the source
code at http://www.quicklz.com/qpress-11-source.zip (after adding
a missing #include <unistd.h> for the definition of isatty()).
Compared to the grandparent commit (before the revert), the changes
are as follows:
comp_thread_ctxt_t::done_cond: A separate condition for completed
compression, signaling that thd->to_len has been updated.
compress_write(): Replace some threads[i] with thd.
Reset thd->to_len = 0 after consuming the compressed data.
compress_worker_thread_func(): After consuming the uncompressed
data, set thd->data_avail = FALSE. After compressing, signal
thd->done_cond.
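The slot hand-over between multiple compress_write() callers and a worker
thread can be sketched as follows (hypothetical field names; the real code
uses comp_thread_ctxt_t and its condition variables):

  #include <condition_variable>
  #include <mutex>

  struct worker_slot
  {
    std::mutex m;
    std::condition_variable avail_cond; // a submitter released the slot
    std::condition_variable data_cond;  // data was handed to the worker
    std::condition_variable done_cond;  // the worker finished compressing
    unsigned long owner = 0;            // submitter thread id, 0 = free
    bool done = false;
  };

  // Called by a data file copying thread or the log copying thread.
  static void submit(worker_slot &slot, unsigned long my_id)
  {
    std::unique_lock<std::mutex> lk(slot.m);
    slot.avail_cond.wait(lk, [&] { return slot.owner == 0; }); // claim a slot
    slot.owner = my_id;
    slot.done = false;
    slot.data_cond.notify_one();                        // wake the worker
    slot.done_cond.wait(lk, [&] { return slot.done; }); // wait for the result
    slot.owner = 0;                                     // release the slot ...
    slot.avail_cond.notify_one();                       // ... for the next caller
  }

  // One iteration of the compression worker.
  static void worker_step(worker_slot &slot)
  {
    std::unique_lock<std::mutex> lk(slot.m);
    slot.data_cond.wait(lk, [&] { return slot.owner != 0 && !slot.done; });
    // ... compress the buffer submitted by slot.owner ...
    slot.done = true;
    slot.done_cond.notify_one();
  }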
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it is making things worse by blocking attempts to
rescue data from or rebuild a partially readable table.
We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.
This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.
We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in system tablespace in case something
was corrupted.
Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.
buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.
btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.
btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().
btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().
FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.
dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.
trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().
trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.
trx_sys_create_sys_pages(): Merged with trx_sysf_create().
dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.
dir_pathname(): Replaces os_file_make_new_pathname().
row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.
btr_decryption_failed(): Report decryption failures.
dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.
dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.
dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.
dict_table_is_corrupted(): Remove.
dict_index_t::is_btree(): Determine if the index is a valid B-tree.
BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.
UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.
buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().
fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.
btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.
opt_search_plan_for_table(): Simplify the code.
row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.
row_undo_mod_upd_exist_sec(): Simplify the code.
row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().
fut_get_ptr(): Replace with buf_page_get_gen() calls.
buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.
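A hypothetical caller sketch of this convention (stand-in types and
functions, not the real buf_page_get_gen() signature):

  #include <cstdint>

  // Simplified stand-ins for the real dberr_t and page types.
  enum dberr_t { DB_SUCCESS, DB_CORRUPTION, DB_PAGE_CORRUPTED };
  struct page { /* latched page contents */ };

  // Stand-in for buf_page_get_gen(): returns nullptr and sets *err for a
  // corrupted or freed page instead of crashing inside the buffer pool.
  static page *get_page_sketch(uint32_t page_no, dberr_t *err)
  {
    static page dummy;
    if (page_no == 0) { *err = DB_CORRUPTION; return nullptr; }
    return &dummy;
  }

  // The caller checks the result and keeps unwinding on failure.
  static dberr_t use_page_sketch(uint32_t page_no)
  {
    dberr_t err = DB_SUCCESS;
    if (page *p = get_page_sketch(page_no, &err))
    {
      // ... read or modify the page under its latch ...
      (void) p;
      return DB_SUCCESS;
    }
    return err; // e.g. DB_CORRUPTION
  }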
buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.
fseg_page_is_allocated(): Replaces fseg_page_is_free().
fts_drop_common_tables(): Return an error if the transaction
was rolled back.
fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.
fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.
Clean up mtr_t::page_lock().
buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.
buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.
mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().
recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.
recv_recover_page(): Return whether the operation succeeded.
recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.
Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
comp_thread_ctxt_t: Remove ctrl_mutex, ctrl_cond, started. We do not
actually need them for anything.
destroy_worker_thread(): Split from destroy_worker_threads().
create_worker_threads(): We already initialize
thd->data_avail=FALSE and thd->cancelled=FALSE before
invoking pthread_create(). If any thread creation fails,
clean up by destroy_worker_thread().
compress_worker_thread_func(): Assume that thd->started and
thd->data_avail are already initialized.
Reviewed by: Vladislav Vaintroub
When "mariabackup --target-dir=$basedir --incremental-dir=$incremental_dir"
is running and is moving a new table file (e.g. `db1/t1.new`) from the
incremental directory to the base directory, it needs to verify that the base
backup database directory (e.g. `$basedir/db1`) really exists
(or create it otherwise).
The table `db1/t1` can come from a new database `db1` which
was created during the base mariabackup execution time.
In such a case the directory `db1` exists only in the incremental directory,
but does not exist in the base directory.
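A minimal sketch of the idea using std::filesystem (the real mariadb-backup
code uses its own datasink and file APIs):

  #include <filesystem>

  // Ensure the base backup database directory exists before moving a
  // .new file from the incremental directory into it; the paths and the
  // helper name are illustrative only.
  void move_new_file_sketch(const std::filesystem::path &basedir,
                            const std::filesystem::path &incdir,
                            const std::filesystem::path &rel_name) // e.g. "db1/t1.new"
  {
    std::filesystem::create_directories((basedir / rel_name).parent_path());
    std::filesystem::rename(incdir / rel_name, basedir / rel_name);
  }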