This reverts the revert 4f62dfe676
and fixes the hang that was introduced when ctrl_mutex was removed.
The test mariabackup.compress_qpress covers this code, but the
test is skipped if a stand-alone qpress executable is not available.
It is not available in many software repositories, possibly because
the code base has not been updated since 2010.
This was tested with an executable that was compiled from the source
code at http://www.quicklz.com/qpress-11-source.zip (after adding
a missing #include <unistd.h> for the definition of isatty()).
Compared to the grandparent commit (before the revert), the changes
are as follows:
comp_thread_ctxt_t::done_cond: A separate condition for completed
compression, signaling that thd->to_len has been updated.
compress_write(): Replace some threads[i] with thd.
Reset thd->to_len = 0 after consuming the compressed data.
compress_worker_thread_func(): After consuming the uncompressed
data, set thd->data_avail = FALSE. After compressing, signal
thd->done_cond.
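The following is a minimal C++ sketch of why a separate done_cond is needed in the producer/worker handshake. It uses std::mutex and std::condition_variable instead of the pthread primitives in the actual code. The struct only mirrors the field names mentioned above (data_avail, to_len, done_cond). The cancellation handling and the byte copy that stands in for qpress compression are assumptions made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for comp_thread_ctxt_t; only the fields needed to
// show the handshake are modeled.
struct comp_thread_ctxt {
  std::mutex mutex;
  std::condition_variable data_cond;  // signaled when input is available
  std::condition_variable done_cond;  // signaled when to_len has been updated
  bool data_avail = false;
  bool cancelled = false;
  const char *from = nullptr;
  size_t from_len = 0;
  std::vector<char> to;
  size_t to_len = 0;
};

// Worker: wait for input, mark it consumed, "compress" it, signal done_cond.
void compress_worker(comp_thread_ctxt *thd) {
  std::unique_lock<std::mutex> lock(thd->mutex);
  for (;;) {
    thd->data_cond.wait(lock, [thd] { return thd->data_avail || thd->cancelled; });
    if (thd->cancelled) return;
    thd->data_avail = false;                               // input consumed
    thd->to.assign(thd->from, thd->from + thd->from_len);  // placeholder for compression
    thd->to_len = thd->to.size();
    thd->done_cond.notify_one();                           // to_len is now valid
  }
}

// Producer: hand one non-empty buffer to the worker and wait for the result.
size_t compress_write(comp_thread_ctxt *thd, const char *buf, size_t len) {
  std::unique_lock<std::mutex> lock(thd->mutex);
  thd->from = buf;
  thd->from_len = len;
  thd->data_avail = true;
  thd->data_cond.notify_one();
  thd->done_cond.wait(lock, [thd] { return thd->to_len != 0; });
  size_t written = thd->to_len;  // the compressed data would be written out here
  thd->to_len = 0;               // reset after consuming the compressed data
  return written;
}
```

The producer waits on done_cond with the predicate to_len != 0, so it never reads a stale result. The sketch assumes that every input buffer is non-empty. An empty buffer would leave to_len == 0, and the producer would then wait forever.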
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it is making things worse by blocking attempts to
rescue data from or rebuild a partially readable table.
We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.
This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.
We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on a double free,
such as during the recovery of DDL in the system tablespace in case
something was corrupted.
Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.
buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.
btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.
btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().
btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().
FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.
dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.
trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().
trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.
trx_sys_create_sys_pages(): Merged with trx_sysf_create().
dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.
dir_pathname(): Replaces os_file_make_new_pathname().
row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.
btr_decryption_failed(): Report decryption failures.
dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.
dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.
dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.
dict_table_is_corrupted(): Remove.
dict_index_t::is_btree(): Determine if the index is a valid B-tree.
BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.
UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures; error codes will be returned instead.
buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().
fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.
btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.
opt_search_plan_for_table(): Simplify the code.
row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.
row_undo_mod_upd_exist_sec(): Simplify the code.
row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().
fut_get_ptr(): Replace with buf_page_get_gen() calls.
buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For modes other than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.
buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.
fseg_page_is_allocated(): Replaces fseg_page_is_free().
fts_drop_common_tables(): Return an error if the transaction
was rolled back.
fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.
fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.
mtr_t::page_lock(): Clean up.
buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.
buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.
mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().
recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.
recv_recover_page(): Return whether the operation succeeded.
recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.
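Throughout this change, assertion failures are replaced with one pattern: the callee returns nullptr and sets *err, and every caller propagates the error. The sketch below illustrates that pattern with toy_buf_pool, page_get() and sum_page_numbers(). All three names are hypothetical and much simpler than buf_page_get_gen(), whose real signature takes more parameters.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical subset of InnoDB's error codes.
enum dberr_t { DB_SUCCESS, DB_CORRUPTION };

struct page_t { uint32_t page_no; };

// Toy buffer pool: out-of-range or freed pages yield nullptr and
// *err = DB_CORRUPTION instead of a crash.
struct toy_buf_pool {
  static constexpr uint32_t n_pages = 8;
  std::set<uint32_t> freed;
  page_t pages[n_pages];

  page_t *page_get(uint32_t page_no, dberr_t *err) {
    if (page_no >= n_pages || freed.count(page_no)) {
      *err = DB_CORRUPTION;
      return nullptr;
    }
    *err = DB_SUCCESS;
    pages[page_no].page_no = page_no;
    return &pages[page_no];
  }
};

// A caller that needs two pages propagates the first error up the stack.
dberr_t sum_page_numbers(toy_buf_pool &pool, uint32_t a, uint32_t b,
                         uint32_t *sum) {
  dberr_t err;
  const page_t *pa = pool.page_get(a, &err);
  if (!pa) return err;
  const page_t *pb = pool.page_get(b, &err);
  if (!pb) return err;
  *sum = pa->page_no + pb->page_no;
  return DB_SUCCESS;
}
```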
Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
comp_thread_ctxt_t: Remove ctrl_mutex, ctrl_cond, started. We do not
actually need them for anything.
destroy_worker_thread(): Split from destroy_worker_threads().
create_worker_threads(): We already initialize
thd->data_avail=FALSE and thd->cancelled=FALSE before
invoking pthread_create(). If any thread creation fails,
clean up by destroy_worker_thread().
compress_worker_thread_func(): Assume that thd->started and
thd->data_avail are already initialized.
Reviewed by: Vladislav Vaintroub
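Here is a minimal C++ sketch of the cleanup-on-failure pattern used in create_worker_threads(). Each context is initialized before its thread starts. If starting the i-th thread fails, the threads that were already started are destroyed. The names worker_ctxt and destroy_worker() are hypothetical, as is the can_spawn callback, which simulates a pthread_create() failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread context; the real comp_thread_ctxt_t holds more.
struct worker_ctxt {
  bool data_avail = false;
  bool cancelled = false;
  std::thread thread;
};

// Tear down a single worker (the role of destroy_worker_thread()).
void destroy_worker(worker_ctxt &w) {
  w.cancelled = true;
  if (w.thread.joinable()) w.thread.join();
}

// Create n workers; ws must not hold running workers on entry. The fields are
// initialized before each thread starts, so the thread function may assume
// they are valid. On the first failure, every worker that was already started
// is destroyed and ws is left empty.
bool create_workers(std::vector<worker_ctxt> &ws, size_t n,
                    const std::function<bool(size_t)> &can_spawn) {
  ws = std::vector<worker_ctxt>(n);
  for (size_t i = 0; i < n; i++) {
    ws[i].data_avail = false;
    ws[i].cancelled = false;
    if (!can_spawn(i)) {
      for (size_t j = i; j-- > 0;) destroy_worker(ws[j]);
      ws.clear();
      return false;
    }
    ws[i].thread = std::thread([] { /* the worker loop would run here */ });
  }
  return true;
}
```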
When "mariabackup --target-dir=$basedir --incremental-dir=$incremental_dir"
is running and is moving a new table file (e.g. `db1/t1.new`) from the
incremental directory to the base directory, it needs to verify that the base
backup database directory (e.g. `$basedir/db1`) really exists
(or create it otherwise).
The table `db1/t1` can come from a new database `db1` which
was created during the base mariabackup execution time.
In such case the directory `db1` exists only in the incremental directory,
but does not exist in the base directory.
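The fix boils down to creating the database directory on demand before the move. Below is a minimal C++17 sketch of that step using std::filesystem. move_into_base() is a hypothetical helper, not the actual mariabackup code, which does not necessarily use std::filesystem.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: move a table file (a relative path such as
// "db1/t1.new") from the incremental directory into the base backup. The
// database directory may exist only in the incremental backup, so create it.
bool move_into_base(const fs::path &incremental_dir, const fs::path &base_dir,
                    const fs::path &relative) {
  std::error_code ec;
  const fs::path target = base_dir / relative;
  fs::create_directories(target.parent_path(), ec);  // no error if it exists
  if (ec) return false;
  fs::rename(incremental_dir / relative, target, ec);
  return !ec;
}
```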
This follows up the previous fix in
commit c3c53926c4 (MDEV-26554).
ha_innobase::delete_table(): Work around the insufficient
metadata locking (MDL) during DML operations by acquiring exclusive
InnoDB table locks on all child tables. Previously, this was only
done on TRUNCATE and ALTER.
ibuf_delete_rec(), btr_cur_optimistic_delete(): Do not invoke
lock_update_delete() during change buffer operations.
The revised trx_t::commit(std::vector<pfs_os_file_t>&) will
hold exclusive lock_sys.latch while invoking fil_delete_tablespace(),
which in turn may invoke ibuf_delete_rec().
dict_index_t::has_locking(): A new predicate, replacing the dummy
!dict_table_is_locking_disabled(index->table). Used for skipping lock
operations during ibuf_delete_rec().
trx_t::commit(std::vector<pfs_os_file_t>&): Release the locks
and remove the table from the cache while holding exclusive
lock_sys.latch.
trx_t::commit_in_memory(): Skip release_locks() if dict_operation holds.
trx_t::commit(): Reset dict_operation before invoking commit_in_memory()
via commit_persist().
lock_release_on_drop(): Release locks while lock_sys.latch is
exclusively locked.
lock_table(): Add a parameter for a pointer to the table.
We must not dereference the table before lock_sys.latch has
been acquired. If the pointer to the table does not match the table
at that point, the table is invalid and DB_DEADLOCK will be returned.
row_ins_foreign_check_on_constraint(): Improve the checks.
Remove a bogus DB_LOCK_WAIT_TIMEOUT return that was needed
before commit c5fd9aa562 (MDEV-25919).
row_upd_check_references_constraints(),
wsrep_row_upd_check_foreign_constraints(): Simplify checks.
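This minimal sketch shows the check described for lock_table(). The pointer comparison happens only after the latch is acquired, and the table is dereferenced only if the pointers match. The types and the two-argument lock_table() are hypothetical simplifications; the real function takes further arguments.

```cpp
#include <cassert>
#include <mutex>

enum dberr_t { DB_SUCCESS, DB_DEADLOCK };

// Hypothetical table object; 'locks' counts granted table locks.
struct table_t {
  int locks = 0;
};

// Hypothetical stand-in for lock_sys; 'latch' plays the role of lock_sys.latch.
struct toy_lock_sys {
  std::mutex latch;

  // 'slot' is where the current table pointer is published (for example a
  // foreign key's referenced-table pointer). It is compared with 'table' only
  // while holding the latch, and 'table' is dereferenced only if they match.
  dberr_t lock_table(table_t *table, table_t *const *slot) {
    std::lock_guard<std::mutex> guard(latch);
    if (slot && *slot != table) return DB_DEADLOCK;  // the table was detached
    table->locks++;
    return DB_SUCCESS;
  }

  // Concurrent DDL would detach the table while holding the latch.
  void detach(table_t **slot) {
    std::lock_guard<std::mutex> guard(latch);
    *slot = nullptr;
  }
};
```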
don't initialize error_log_handler_list in set_handlers()
* error_log_handler_list is initialized to LOG_FILE early, in init_base()
* set_handlers always reinitializes it to LOG_FILE, so it's pointless
* after init_base() concurrent threads start using sql_log_warning,
so following set_handlers() shouldn't modify error_log_handler_list
without some protection
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.
During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:
sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.
sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.
The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.
Reviewed by: Jan Lindström
Because main() invokes parse_page() when either -S or -D is set,
parse_page() may be invoked when no -D file name has been set. That is why
any attempt to write to the page dump file must be made only if the file
name was set with -D.
The bug is caused by 2ef7a5a13a
(MDEV-13443).
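A minimal sketch of the guard follows. The options struct and write_page_info() are hypothetical and model only the -S/-D interaction described above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical slice of innochecksum's options.
struct options {
  const char *page_dump_filename = nullptr;  // set only by -D
  bool page_type_summary = false;            // -S
};

// Returns the number of bytes written to the page dump file. parse_page()
// may run because of -S alone, in which case there is no dump file.
size_t write_page_info(const options &opt, std::FILE *dump, const char *text) {
  if (!opt.page_dump_filename || !dump) return 0;
  if (std::fputs(text, dump) < 0) return 0;
  return std::strlen(text);
}
```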
In commit 7a4fbb55b0 (MDEV-25105)
the innochecksum option --write (-w) was removed altogether.
It should have been made a Boolean option, so that old data files
may be converted to a format that is compatible with
innodb_checksum_algorithm=strict_crc32 by executing the following:
innochecksum -n -w ibdata* */*.ibd
It would be better to use an older-version innochecksum
for such a conversion, so that page checksums will be validated
before updating the checksum.
It never was possible for innochecksum to convert files to the
innodb_checksum_algorithm=full_crc32 format that is the default
for new InnoDB data files.
This commit adds correct handling of binlogs for SST using rsync
or mariabackup. Before this fix, binlogs were handled incorrectly:
only one (the last) binary log file was transferred during SST, which
then led to various failures (for example, when trying to list all
events from the binary log). These bugs were long masked by flaws
in the primitive binlog handling code in the SST scripts, which
caused binary log files to be erased after transfer or not added
to the binlog index on the joiner node. Now the correct transfer
of all binary logs (not just the last of the binary log files) has
been implemented both for the rsync (at the script level) and for
the mariabackup (at the level of the main utility code).
This commit also adds a new sst_max_binlogs=<n> parameter, which
can be located in the [sst] section or in the [xtrabackup] section
(historically, supported for mariabackup only, not for rsync), or
in one of the server sections. This parameter specifies the number
of binary log files to be sent to the joiner node during SST. This
option is added for compatibility with the old SST scripting behavior,
which can be emulated by setting sst_max_binlogs=1 (although
in general this can cause problems for the reasons described above).
In addition, setting sst_max_binlogs=0 can be used to suppress
the transmission of binary logs to the joiner nodes during SST
(although a single file with the current binary log may sometimes
still be transmitted to the joiner, even with sst_max_binlogs=0,
because this is necessary in some modes that involve the use of
GTIDs with Galera).
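The selection rule amounts to keeping the newest N entries of the binlog index. Below is a hypothetical C++ model of it; the rsync SST script implements this logic in shell. Treating an unset parameter as "send everything" is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: 'index' lists binlog files oldest first, as in the binlog
// index file. A negative max_binlogs stands for "not set": send all files.
std::vector<std::string> binlogs_to_send(const std::vector<std::string> &index,
                                         long max_binlogs) {
  if (max_binlogs < 0 || static_cast<size_t>(max_binlogs) >= index.size())
    return index;
  // Keep only the newest max_binlogs files; 0 yields an empty list. The
  // GTID-related exception mentioned above is not modeled.
  return std::vector<std::string>(index.end() - max_binlogs, index.end());
}
```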
Also, this commit ensures correct handling of paths to various
innodb files and directories in the SST scripts, and fixes some
problems with this that existed in mariabackup utility (which
were associated with incorrect handling of the innodb_data_dir
parameter in some scenarios).
In addition, this commit contains the following enhancements:
1) Added tests for mtr, which check the correct work with binlogs
after SST (using rsync and mariabackup);
2) Added correct handling of slashes at the end of all paths that
the SST script receives as parameters;
3) Improved parsing code for --mysqld-args parameters. Now it
correctly processes the sequence "--" after the name of the
one-letter option;
4) Checking the secret signature during joiner authentication
is made independent of the presence of bash (as a Unix shell)
in the system, and the diff utility is no longer needed to check
certificate compliance;
5) All directories that are necessary for the correct placement
of various logs are automatically created by SST scripts in
advance (before running mariabackup on the joiner node);
6) Removal of old binary logs on the joiner is done using the binlog
index (if it exists), not only by a fixed pattern based
on the current binlog name, as before;
7) Paths for placing binary logs are correctly processed if they
are set as relative paths (to the datadir);
8) SST scripts are made even more resistant to spaces in filenames
(now for binlogs);
9) In case of failure, SST scripts now always end with an exit
code other than zero;
10) The SST script for rsync now correctly creates a tar file with
the binlogs, even if the paths to them (in the binlog index
file) are specified as a mix of absolute and relative paths,
and even if they do not match with the datadir path specified
in the current configuration settings.
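Items 2) and 7) above concern path handling: stripping trailing slashes and resolving binlog paths relative to the datadir. The SST scripts do this in shell. The following is a hypothetical C++17 sketch of the same rules, assuming POSIX paths.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: strip trailing slashes from a directory parameter,
// then resolve a binlog path that may be relative to the datadir.
// Absolute paths are only normalized. POSIX paths are assumed.
std::string resolve_binlog_path(std::string datadir, const std::string &binlog) {
  while (datadir.size() > 1 && datadir.back() == '/') datadir.pop_back();
  const fs::path p(binlog);
  const fs::path full = p.is_absolute() ? p : fs::path(datadir) / p;
  return full.lexically_normal().string();
}
```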
- Store the deferred tablespace name while loading the tablespace
for backup process.
- Mariabackup stores the list of space ids that have page0 INIT_PAGE
records. backup_first_page_op() and first_page_init() were introduced
to track the page0 INIT_PAGE records.
- backup_file_op() and log_file_op() were changed to handle
FILE_MODIFY redo log records, which are used to identify the
space id of a deferred tablespace.
- Whenever a file operation redo log record is processed by the backup,
backup_file_op() should check whether the space name exists
in the deferred tablespace list. If so, it needs to store the
space id and name when a FILE_MODIFY or FILE_RENAME record is processed,
and it should delete the tablespace name from the defer list in other
cases.
- backup_fix_ddl() should check whether a deferred tablespace has
any page0 init records. If so, the tablespace is considered
newly created. If not, the backup should try
to reload the tablespace in SRV_BACKUP_NO_DEFER mode to
avoid deferring the tablespace.
- compile wolfcrypt with kdf.c, to avoid undefined symbols in tls13.c
- define WOLFSSL_HAVE_ERROR_QUEUE to avoid an endless loop in SSL_get_error
- Do not use SSL_CTX_set_tmp_dh/get_dh2048, this would require additional
compilation options in WolfSSL. Disable it for WolfSSL build, it works
without it anyway.
- fix "macro already defined" Windows warning.
1) Removed symlinks that are not very well supported in tar under Windows.
2) Added comment + changed code formatting in viosslfactories.c
3) Fixed a small bug in the yassl code.
4) Fixed a typo in the script code.
The locks of the previous threads need to be released too.
This occurs if the initialization of any mutex or condition variable
other than the first one fails.
This is follow-up to commit 1193a793c4.
We will set innodb_use_native_aio=OFF by default also in mariadb-backup
when running on a potentially affected kernel.
Add a couple of NO_XXX preprocessor constants to the wolfssl build.
Looked into the cmake defaults; those are set there too. Some of
these are (supposedly) weak ciphers, and some have just fallen out of wide
use.
Work around WolfSSL bug https://github.com/wolfSSL/wolfssl/issues/4242
(heap overflow) by using the fastmath library everywhere, except with clang
on Windows. Before the patch, the default math library was used on all
32-bit platforms.
make BACKUP STAGE behave as FTWRL, desyncing and pausing the node
to prevent BF threads (appliers) from interfering with blocking stages.
This is needed because BF threads don't respect BACKUP MDL locks.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Analysis: get_defaults_options() rewrites the value for my_defaults_file,
my_defaults_extra_file and my_defaults_group_suffix to 0. So the config file
can't be read.
Fix: Let handle_options() handle --defaults* option and create a new command
line to pass it to load_defaults().
because the name was misleading, it counts not threads, but THDs,
and as THD_count is the only way to increment/decrement it, it
could as well be declared inside THD_count.
InnoDB tablespace identifiers and page numbers are 32-bit numbers.
Let us use a 32-bit type for them in innochecksum.
The changes in commit 1918bdf32c
broke the build on 32-bit Windows.
Thanks to Vicențiu Ciorbaru for an initial version of this fixup.
It is implementation-defined whether alignment requirements
that are larger than std::max_align_t (typically 8 or 16 bytes)
will be honored by the compiler and linker.
It turns out that on IBM AIX, both alignas() and MY_ALIGNED()
only guarantee alignment up to 16 bytes.
For some data structures, specifying alignment to the CPU
cache line size (typically 64 or 128 bytes) is a mere performance
optimization, and we do not really care whether the requested
alignment is guaranteed.
But, for the correct operation of direct I/O, we do require that
the buffers be aligned at a block size boundary.
field_ref_zero: Define as a pointer, not an array.
For innochecksum, we can make this point to unaligned memory;
for anything else, we will allocate an aligned buffer from the heap.
This buffer will be used for overwriting freed data pages when
innodb_immediate_scrub_data_uncompressed=ON. And exactly that code
hit an assertion failure on AIX, in the test innodb.innodb_scrub.
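This is a minimal sketch of the heap-based approach. It assumes C++17 std::aligned_alloc(), which MSVC does not provide, and a 4096-byte block size. It is not necessarily how the server allocates these buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Allocate a zero-filled buffer from the heap, aligned to a block-size
// boundary, instead of relying on alignas() for a static or stack object.
// std::aligned_alloc() requires the size to be a multiple of the alignment,
// so the size is rounded up. Release the buffer with std::free().
unsigned char *alloc_zeroed_aligned(std::size_t alignment, std::size_t size) {
  const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
  void *p = std::aligned_alloc(alignment, rounded);
  if (p) std::memset(p, 0, rounded);
  return static_cast<unsigned char *>(p);
}
```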
log_sys.checkpoint_buf: Define as a pointer to aligned memory
that is allocated from heap.
log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf
instead of trying to allocate an aligned buffer from the stack.
This gives a short overview over found/missing dependencies as well
as enabled/disabled features.
Initial author Heinz Wiesinger <heinz@m2mobi.com>
Additions by Vicențiu Ciorbaru <vicentiu@mariadb.org>
* Report all plugins enabled via MYSQL_ADD_PLUGIN
* Simplify code. Eliminate duplication by making use of WITH_xxx
variable values to set feature "ON" / "OFF" state.
Reviewed by: wlad@mariadb.com (code details) serg@mariadb.com (the idea)
Always store and maintain xdes pages, and do not verify checksums for
freed pages.
innochecksum can work only with the first file of a multi-file tablespace.
Report this and abort if the given file is not the first one.
fil_ibd_create(): Remove code that should have been removed in
commit 86dc7b4d4c already.
We no longer wrote an initialized page to the file, but we would
still allocate a page image in memory and write it.
xb_space_create_file(): Remove an unnecessary page write.
(This is a functional change for Mariabackup.)
Let us simply refuse an upgrade from earlier versions if the
upgrade procedure was not followed. This simplifies the purge,
commit, and rollback of transactions.
Before upgrading to MariaDB 10.3 or later, a clean shutdown
of the server (with innodb_fast_shutdown=1 or 0) is necessary,
to ensure that any incomplete transactions are rolled back.
The undo log format was changed in MDEV-12288. There is only
one persistent undo log for each transaction.
In commit 1c5ae99194 (MDEV-25666)
we had changed Mariabackup so that it would no longer skip files
whose names start with #sql. This turned out to be wrong.
Because operations on files with such names are not protected by any
locks in the server, it is not safe to copy them.
Not copying the files may make the InnoDB data dictionary
inconsistent with the file system. So, we must do something
in InnoDB to adjust for that.
If InnoDB is being started up without the redo log (ib_logfile0)
or with a zero-length log file, we will assume that the server
was restored from a backup, and adjust things as follows:
dict_check_sys_tables(), fil_ibd_open(): Do not complain about
missing #sql files if they would be dropped a little later.
dict_stats_update_if_needed(): Never add #sql tables to
the recomputing queue. This avoids a potential race condition when
dropping the garbage tables.
drop_garbage_tables_after_restore(): Try to drop any garbage tables.
innodb_ddl_recovery_done(): Invoke drop_garbage_tables_after_restore()
if srv_start_after_restore (a new flag) was set and we are not in
read-only mode (innodb_read_only=ON or innodb_force_recovery>3).
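The conditions above can be summarized as three small predicates. The following sketch uses hypothetical functions that only model the rules stated in this message, not the actual InnoDB code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Started without ib_logfile0, or with a zero-length one: assume a restore.
bool start_after_restore(bool log_file_exists, uint64_t log_file_size) {
  return !log_file_exists || log_file_size == 0;
}

// InnoDB uses the internal "database/table" name format.
bool is_sql_name(const std::string &name) {
  const std::string::size_type slash = name.find('/');
  const std::string::size_type start = slash == std::string::npos ? 0 : slash + 1;
  return name.compare(start, 4, "#sql") == 0;
}

// Drop garbage tables only after a restore and only if writes are allowed.
bool should_drop_garbage(bool after_restore, bool innodb_read_only,
                         unsigned long innodb_force_recovery) {
  const bool read_only = innodb_read_only || innodb_force_recovery > 3;
  return after_restore && !read_only;
}
```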
The tests and dbug_mariabackup_event() instrumentation
were developed by Vladislav Vaintroub, who also reviewed this.
In commit 49e2c8f0a6 (MDEV-25743)
we made dict_sys_t::find() incompatible with the rest of the
table name hash table operations in case the table name contains
non-ASCII octets (using a compatibility mode that facilitates the
upgrade into the MySQL 5.0 filename-safe encoding) and the target
platform implements signed char.
ut_fold_string(): Remove; replace with my_crc32c(). This also makes
table name hash value calculations independent of whether char
is unsigned or signed.
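To see why the signedness of char matters, compare two hashes. The first is a fold that mixes char values; it stands in for the removed ut_fold_string(), and its mixing formula is made up. The second is a CRC-32C over the raw bytes. The crc32c() below is a plain bitwise implementation of the Castagnoli polynomial. It serves as a hypothetical stand-in for my_crc32c() and is not that function's actual implementation or signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fold that reads each byte as Char; with plain char, the result depends on
// whether the platform's char is signed (bytes >= 0x80 get sign-extended).
template <typename Char>
uint32_t fold_as(const char *s) {
  uint32_t h = 0;
  for (; *s; s++) h = h * 31 + static_cast<uint32_t>(static_cast<Char>(*s));
  return h;
}

// Bitwise CRC-32C (reflected polynomial 0x82F63B78) over unsigned bytes,
// so the result does not depend on the signedness of char.
uint32_t crc32c(const void *data, std::size_t len) {
  const unsigned char *p = static_cast<const unsigned char *>(data);
  uint32_t crc = 0xFFFFFFFFu;
  while (len--) {
    crc ^= *p++;
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}
```

The template parameter makes both signedness variants testable on one platform. With plain char, the compiler uses only one of them.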
This fixes MySQL bug #20338 about the misuse of the double-underscore
prefix __WIN__, which was old MySQL's way of identifying Windows.
Replace it with the standard _WIN32 symbol for targeting Windows
(both 32-bit and 64-bit).
Note that the CONNECT storage engine is not fixed in this patch (it must be
fixed in the "upstream" branch).
Many InnoDB data dictionary cache operations require that the
table name be copied so that it will be NUL terminated.
(For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.)
dict_table_t::is_garbage_name(): Check if a name belongs to
the background drop table queue.
dict_check_if_system_table_exists(): Remove.
dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables
SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup.
dict_sys_t::create_or_check_sys_tables(): Replaces
dict_create_or_check_foreign_constraint_tables() and
dict_create_or_check_sys_virtual().
dict_sys_t::load_table(): Replaces dict_table_get_low()
and dict_load_table().
dict_sys_t::find_table(): Renamed from get_table().
dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded
tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist.
trx_t::has_stats_table_lock(): Moved to dict0stats.cc.
Some error messages will now report table names in the internal
databasename/tablename format, instead of `databasename`.`tablename`.
Changes:
- To detect implicit strlen() calls, I removed the methods in String that
use 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usage of append(const char*) is changed to either use
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most callers of the above functions never tested the
result for -1 and would have produced wrong results or crashed in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- For some functions, changed the name argument from const char * to
const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removed String_space(), as it was only used in one case, and that
could be simplified to not use String_space(), thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Rename of protocol function:
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done both to clarify how this function differs from the other
'store' functions and to make it easier to find suboptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now uses LEX_CSTRING for name.
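This minimal sketch illustrates the calling convention described above for escape_string_for_mysql() and escape_quotes_for_mysql(). Overflow is reported through a separate flag instead of an in-band (size_t) -1. The function name, the parameter order, and the simplified quoting rule are hypothetical and are not the real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified escaper that doubles every single quote. Instead
// of returning (size_t) -1 when the output does not fit, it stops, sets
// *overflow, and returns the number of bytes actually written, which is
// always safe to use. No terminating NUL is appended.
size_t escape_quotes(char *to, size_t to_length, const char *from,
                     size_t length, bool *overflow) {
  size_t n = 0;
  *overflow = false;
  for (size_t i = 0; i < length; i++) {
    const size_t need = from[i] == '\'' ? 2 : 1;
    if (n + need > to_length) {
      *overflow = true;
      break;
    }
    if (need == 2) to[n++] = '\'';
    to[n++] = from[i];
  }
  return n;
}
```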
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtained but was not used.
- Removed some unneeded 'virtual' definitions for functions that were
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate the needed buffer. The old code
could cause memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
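A hypothetical model of the alias resolution follows. It covers only the character set name, not collation names such as utf8_general_ci, and it is not the server's actual lookup code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Resolve the "utf8" alias depending on whether old_mode contains
// UTF8_IS_UTF8MB3. Character set names are compared case-insensitively.
std::string resolve_charset_name(std::string name, bool utf8_is_utf8mb3) {
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  if (name == "utf8") return utf8_is_utf8mb3 ? "utf8mb3" : "utf8mb4";
  return name;
}
```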
The implementation of handlerton::drop_database in InnoDB is
unnecessarily complex. The minimal implementation should check
that no conflicting locks or references exist on the tables,
delete all table metadata in a single transaction, and finally
delete the tablespaces.
Note: DROP DATABASE will delete each individual table that the
SQL layer knows about, one table per transaction.
The handlerton::drop_database is basically a final cleanup step
for removing any garbage that could have been left behind
in InnoDB due to some bug, or not having atomic DDL in the past.
hash_node_t: Remove. Use the proper data type name in pointers.
dict_drop_index_tree(): Do not take the table as a parameter.
Instead, return the tablespace ID if the tablespace should be dropped
(we are dropping a clustered index tree).
fil_delete_tablespace(), fil_system_t::detach(): Return a single
detached file handle. Multi-file tablespaces cannot be deleted
via this interface.
ha_innobase::delete_table(): Remove a work-around for non-atomic DDL
and do not try to drop tables with similar-looking name.
innodb_drop_database(): Complete rewrite.
innobase_drop_database(), dict_get_first_table_name_in_db(),
row_drop_database_for_mysql(), drop_all_foreign_keys_in_db(): Remove.
row_purge_remove_clust_if_poss_low(), row_undo_ins_remove_clust_rec():
If the tablespace is to be deleted, try to evict the table definition
from the cache. Failing that, set dict_table_t::space to nullptr.
lock_release_on_rollback(): On the rollback of CREATE TABLE, release all
locks that the transaction had on the table, to avoid heap-use-after-free.
The functions fil_file_readdir_next_file(), os_file_opendir(),
os_file_closedir() became dead code in the server in MariaDB 10.4.0
with commit 09af00cbde (the removal of
the crash recovery logic for the TRUNCATE TABLE implementation that
was replaced in MDEV-13564).
os_file_opendir(), os_file_closedir(): Define as macros.
During data file creation, InnoDB holds the dict_sys mutex, tries to
write page 0 of the file, and flushes the file. This not only causes
unnecessary contention but also deviates from the write-ahead
logging protocol.
The clean sequence of operations is that we first start a dictionary
transaction and write SYS_TABLES and SYS_INDEXES records that identify
the tablespace. Then, we durably write a FILE_CREATE record to the
write-ahead log and create the file.
Recovery should not unnecessarily insist that the first page of each
data file that is referred to by the redo log is valid. It must be
enough that page 0 of the tablespace can be initialized based on the
redo log contents.
We introduce a new data structure deferred_spaces that keeps track
of corrupted-looking files during recovery. The data structure holds
the last LSN of a FILE_ record referring to the data file, the
tablespace identifier, and the last known file name.
Two scenarios can happen during recovery:
i) Sufficient memory: InnoDB can reconstruct the
tablespace after parsing all redo log records.
ii) Insufficient memory (multiple apply phases): InnoDB should
store the redo log records of the deferred tablespace even though
the tablespace is not present. InnoDB should start constructing
the tablespace when it first encounters the deferred tablespace
id.
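The bookkeeping described for deferred_spaces can be modeled as a map from tablespace id to the latest FILE_ record LSN and file name. The class below is a hypothetical C++ model, not the InnoDB data structure. Records are parsed in LSN order, so the LSN comparison is only a defensive check.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct deferred_space {
  uint64_t lsn;           // LSN of the latest FILE_ record for the space
  std::string file_name;  // last known file name
};

class deferred_spaces_t {
  std::map<uint32_t, deferred_space> defers;

public:
  // Called while parsing FILE_ records (cf. fil_name_process()).
  void add(uint32_t space_id, uint64_t lsn, const std::string &name) {
    auto it = defers.find(space_id);
    if (it == defers.end())
      defers.emplace(space_id, deferred_space{lsn, name});
    else if (lsn >= it->second.lsn)
      it->second = deferred_space{lsn, name};
  }
  void remove(uint32_t space_id) { defers.erase(space_id); }
  // Redo records for a deferred space are buffered rather than discarded.
  bool is_deferred(uint32_t space_id) const { return defers.count(space_id) != 0; }
  const deferred_space *find(uint32_t space_id) const {
    auto it = defers.find(space_id);
    return it == defers.end() ? nullptr : &it->second;
  }
};
```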
Mariabackup copies the zero-filled .ibd file in backup_fix_ddl() to
a file with the .new extension. The Mariabackup test case performs page
flushing when it deals with DDL operations during the backup.
fil_ibd_create(): Remove the write of page 0 and the flushing of the file.
fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has a
zero-filled page 0.
Datafile: Clean up the error handling, and do not report errors
if we are in the middle of recovery. The caller will check
Datafile::m_defer.
fil_node_t::deferred: Indicates whether the tablespace loading was
deferred during recovery.
FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that the tablespace
file could not be loaded yet and its loading was deferred.
recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to
initialize fil_space_t based on buffered metadata and records to
initialize page 0. Ignore the flags in fil_name_t, because they are
intentionally invalid.
fil_name_process(): Update deferred_spaces.
recv_sys_t::parse(): Store the redo log if the tablespace id
is present in deferred_spaces.
recv_sys_t::recover_low(): Recover the first page of
the tablespace even if the tablespace instance is not
present.
recv_sys_t::apply(): Initialize the deferred tablespace
before applying the deferred tablespace records.
recv_validate_tablespace(): Skip the validation for deferred_spaces.
recv_rename_files(): Moved and revised from recv_sys_t::apply().
For deferred-recovery tablespaces, do not attempt to rename the
file if a deferred-recovery tablespace is associated with the name.
recv_recovery_from_checkpoint_start(): Invoke recv_rename_files()
and initialize all deferred tablespaces before applying redo log.
fil_node_t::read_page0(): Skip page 0 validation if the tablespace
is deferred.
buf_page_create_deferred(): A variant of buf_page_create() for use when
the fil_space_t is not available yet.
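The FIL_LOAD_DEFER decision in fil_ibd_load() can be sketched as follows, assuming that "zero-filled page 0" is tested byte by byte and that a zero page outside recovery is simply treated as invalid. The load_status enumeration, classify_page0() and the page_valid flag (standing in for checksum validation) are all hypothetical; the real fil_load_status has other members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical status codes; only DEFER mirrors FIL_LOAD_DEFER.
enum class load_status { OK, DEFER, INVALID };

// True if every byte of the page is zero.
bool page_is_zero_filled(const unsigned char *page, std::size_t size)
{
  return std::all_of(page, page + size,
                     [](unsigned char b) { return b == 0; });
}

// During recovery, a zero-filled page 0 is not an error: the redo log
// is expected to initialize it, so loading is deferred.
load_status classify_page0(const unsigned char *page, std::size_t size,
                           bool in_recovery, bool page_valid)
{
  if (page_is_zero_filled(page, size))
    return in_recovery ? load_status::DEFER : load_status::INVALID;
  return page_valid ? load_status::OK : load_status::INVALID;
}
```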
This is joint work with Thirunarayanan Balathandayuthapani,
who implemented an initial prototype.
Ever since MDEV-18518 made DDL operations mostly crash-safe inside InnoDB,
it became obvious that Mariabackup might not be entirely safe with regard to
concurrent DDL operations.
check_if_skip_table(): Do not skip files whose name starts with #sql.
We cannot know whether a DDL operation is in progress and the table
might in fact be needed later.
When CMAKE_CROSSCOMPILING_EMULATOR is defined, a cross-compiled build
can still execute its own targets natively (through the emulator).
This commit takes those points in the build system that
execute built targets natively and allows them to be executed
in a cross-compile if CMAKE_CROSSCOMPILING_EMULATOR is defined.
Closes #1805
SST scripts for Galera should use the new mariabackup interface
instead of the innobackupex interface, which is currently only
supported for compatibility reasons.
This commit converts the SST script for mariabackup to use the
new interface. It does not need separate tests, as any problems
will be seen as failures when running multiple tests for the
mariabackup-based SST.
This patch fixes an issue with launching mariabackup during SST
(when used with Galera), when during bootstrap mariabackup receives
the "--innodb" option, which is incorrectly interpreted as a shortcut
for "--innodb-force-recovery". This patch does not require a separate
mtr test, as the problem is visible in general testing on
buildbot.
A consistency check for fil_space_t::name is causing recovery failures
in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field
altogether.
fil_space_t::name was more or less a copy of dict_table_t::name
(except for some special cases), and it was not being used for
anything useful.
There used to be a name_hash, but it had been removed already in
commit a75dbfd718 (MDEV-12266).
We will also remove os_normalize_path(), OS_PATH_SEPARATOR, and
OS_PATH_SEPARATOR_ALT. On Microsoft Windows, we will treat \ and /
roughly in the same way. The intention is that for per-table
tablespaces, the filenames will always follow the pattern
prefix/databasename/tablename.ibd. (Any \ in the prefix must not
be converted.)
ut_basename_noext(): Remove (unused function).
read_link_file(): Replaces RemoteDatafile::read_link_file().
We will ensure that the last two path component separators are
forward slashes (converting up to 2 trailing backslashes on
Microsoft Windows), so that everywhere else we can
assume that data file names end in "/databasename/tablename.ibd".
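The separator rule described for read_link_file() can be sketched as a scan from the end of the path that rewrites at most the last two separators to forward slashes and leaves any backslashes in the prefix alone. The function name normalize_last_two_separators() is hypothetical, and the real code applies this conversion only on Microsoft Windows.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: make the last two path component separators '/'
// so that the name ends in "/databasename/tablename.ibd".
// Backslashes earlier in the prefix are deliberately not converted.
std::string normalize_last_two_separators(std::string path)
{
  int found = 0;
  for (std::size_t i = path.size(); i-- > 0 && found < 2; )
  {
    if (path[i] == '\\' || path[i] == '/')
    {
      path[i] = '/';
      ++found;
    }
  }
  return path;
}
```

For example, C:\data\db\t.ibd becomes C:\data/db/t.ibd, while a POSIX path is returned unchanged.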
Note: On Microsoft Windows, path names that start with \\?\ must
not contain / as path component separators. Previously, such paths
did work in the DATA DIRECTORY argument of InnoDB tables.
Reviewed by: Vladislav Vaintroub
The mariabackup interface currently supports passing a password
through an explicit command line variable, but does not support
passing a password through the MYSQL_PWD environment variable.
At the same time, the Galera SST script for mariabackup uses
the environment variable to pass the password, which leads
(in some cases) to an unsuccessful launch of mariabackup and
to the inability to start the cluster. This patch fixes this
issue. It does not need a separate test, as the problem is
visible in general testing on buildbot.
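A minimal sketch of the password lookup, assuming that an explicit --password option takes precedence and MYSQL_PWD is only a fallback (the usual behavior of MariaDB client programs). The function resolve_password() is hypothetical and is not the actual mariabackup code.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical lookup: explicit option first, then MYSQL_PWD.
std::optional<std::string> resolve_password(const char *opt_password)
{
  if (opt_password)
    return std::string(opt_password);
  if (const char *env = std::getenv("MYSQL_PWD"))
    return std::string(env);
  return std::nullopt;  // no password supplied at all
}
```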
During the prepare phase of restoring backups, "mariabackup" does
not seem to allow (or recognize) the option "innodb_force_recovery"
for the embedded InnoDB server instance that it starts.
If page corruption is observed during page recovery, the prepare step
fails. While this is ideally the correct behavior, allowing
this option to be set in case of emergencies might be useful when
the current backup is the only copy available. Some error messages
during "--prepare" suggest setting "innodb_force_recovery" to 1:
[ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
For backwards compatibility, "mariabackup --innobackupex --apply-log"
should also have this option.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Based on the man page:
```
The replace program is used by msql2mysql. See msql2mysql(1).
```
msql2mysql is labeled as a Client component, so the replace dependency should be labeled the same way.
Closes PR #900
The std version has the advantage of the more convenient unit handling
in std::chrono. There is no longer any need to multiply or divide to
convert anything to microseconds.
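A short illustration of the point about units, using only standard std::chrono facilities; the helper names are hypothetical and do not correspond to the replaced MariaDB timer code.

```cpp
#include <cassert>
#include <chrono>

// The unit is part of the type, so no hand-written * 1000 or / 1000.
std::chrono::microseconds elapsed_since(std::chrono::steady_clock::time_point start)
{
  return std::chrono::duration_cast<std::chrono::microseconds>(
      std::chrono::steady_clock::now() - start);
}

// Converting a coarser unit to a finer one is implicit and lossless.
std::chrono::microseconds timeout_in_us(std::chrono::milliseconds timeout)
{
  return timeout;
}
```

A call such as timeout_in_us(std::chrono::seconds(2)) also compiles, because seconds convert implicitly to milliseconds.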
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.
aio_uring: class which wraps io_uring stuff
aio_uring::bind()/unbind(): optional optimization
aio_uring::submit_io(): A mutex prevents a data race. liburing calls are
not thread-safe. If you look into its implementation, you will see
atomic operations, but they are used only for synchronization between
the kernel and user space. That is why our own synchronization is still needed.
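The locking pattern in submit_io() can be sketched without liburing itself, assuming a stand-in fake_ring whose "fill a submission entry" and "submit" steps must not run concurrently. Both fake_ring and uring_sketch are hypothetical; the point is only that a single std::mutex serializes every access to the shared submission queue.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a ring whose calls are not thread-safe.
struct fake_ring
{
  std::vector<int> sqes;  // queued, not yet submitted requests
  long submitted = 0;
  void get_sqe_and_fill(int req) { sqes.push_back(req); }
  int submit()
  {
    int n = int(sqes.size());
    submitted += n;
    sqes.clear();
    return n;
  }
};

class uring_sketch
{
  fake_ring ring;
  std::mutex mutex;  // serializes all access to the submission queue
public:
  int submit_io(int req)
  {
    std::lock_guard<std::mutex> lock(mutex);
    ring.get_sqe_and_fill(req);
    return ring.submit();
  }
  long total()
  {
    std::lock_guard<std::mutex> lock(mutex);
    return ring.submitted;
  }
};
```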
For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.
aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).
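One way to tolerate -EINTR is to retry the wait, sketched below under the assumption that the wait function returns 0 or a negative errno value (as io_uring_wait_cqe() does). The callable parameter stands in for that call, and wait_retrying_on_eintr() is a hypothetical name; the actual thread_routine() may handle the case differently.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Retry a wait that was interrupted by a signal; other errors are returned.
int wait_retrying_on_eintr(const std::function<int()> &wait)
{
  int ret;
  do
    ret = wait();
  while (ret == -EINTR);
  return ret;
}
```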
This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
Historically, InnoDB supported a buggy page checksum algorithm that did not
compute a checksum over the full page. Later, well before MySQL 4.1
introduced .ibd files and the innodb_file_per_table option, the algorithm
was corrected and the first 4 bytes of each page were redefined to be
a checksum.
The original checksum was so slow that an option to disable page checksum
was introduced for benchmarketing purposes.
The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set
extension, which includes instructions for faster computation of CRC-32C.
In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was
implemented to make use of that. As that option was changed to be the default
in MySQL 5.7, a bug was found on big-endian platforms and some work-around
code was added to weaken that checksum further. MariaDB disables that
work-around by default since MDEV-17958.
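For reference, CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) can be computed bit by bit as below. This is only a slow reference sketch of the checksum that the SSE4.2 crc32 instruction accelerates; InnoDB uses table-driven or SIMD implementations instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-at-a-time CRC-32C with the usual initial and final inversion.
uint32_t crc32c(const unsigned char *buf, std::size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; i++)
  {
    crc ^= buf[i];
    for (int k = 0; k < 8; k++)
      // Shift right; XOR in the polynomial if the low bit was set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}
```

The standard check value is crc32c("123456789") == 0xE3069283.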
Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER
and ARM and also for IA-32/AMD64, making use of carry-less multiplication
where available.
Long story short, innodb_checksum_algorithm=crc32 is faster and more secure
than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb.
It should have removed any need to use innodb_checksum_algorithm=none.
The setting innodb_checksum_algorithm=crc32 is the default in
MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5,
MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default.
It is even faster and more secure.
The default settings in MariaDB do allow old data files to be read,
even if a weaker checksum algorithm had been used.
(Unfortunately, before innodb_checksum_algorithm=full_crc32,
the data files did not identify which checksum algorithm is being used.)
The non-default settings innodb_checksum_algorithm=strict_crc32 or
innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C
checksums. The incompatibility with old data files is why they are
not the default.
The newest servers not to support innodb_checksum_algorithm=crc32
were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life.
A valid reason for using innodb_checksum_algorithm=innodb could have
been the ability to downgrade. If it is really needed, data files
can be converted with an older version of the innochecksum utility.
Because there is no good reason to allow data files to be written
with insecure checksums, we will reject those option values:
innodb_checksum_algorithm=none
innodb_checksum_algorithm=innodb
innodb_checksum_algorithm=strict_none
innodb_checksum_algorithm=strict_innodb
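The resulting rule can be sketched as a whitelist of the values that remain accepted. The function checksum_algorithm_allowed() is hypothetical; the server validates the option through its own enumeration machinery, and only the value names come from the text.

```cpp
#include <cassert>
#include <string>

// Hypothetical check: only CRC-32C based algorithms remain accepted.
bool checksum_algorithm_allowed(const std::string &value)
{
  static const char *const allowed[] = {
    "crc32", "strict_crc32", "full_crc32", "strict_full_crc32"
  };
  for (const char *a : allowed)
    if (value == a)
      return true;
  return false;  // e.g. none, innodb, strict_none, strict_innodb
}
```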
Furthermore, the following innochecksum options will be removed,
because only strict crc32 will be supported:
innochecksum --strict-check=crc32
innochecksum -C crc32
innochecksum --write=crc32
innochecksum -w crc32
If a user wishes to convert a data file to use a different checksum
(so that it might be used with the no-longer-supported
MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE
nor system tablespace format changes that were made in MariaDB 10.3),
then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or
MySQL 5.7 can be used.
Reviewed by: Thirunarayanan Balathandayuthapani
We have innodb_use_native_aio=ON by default since the introduction of
that parameter in commit 2f9fb41b05
(MySQL 5.5 and MariaDB 5.5).
However, to really benefit from the setting, the files should be
opened in O_DIRECT mode, to bypass the file system cache.
In this way, the reads and writes can be submitted with DMA, using
the InnoDB buffer pool directly, and no processor cycles need to be
used for copying data. The use of O_DIRECT benefits not only the
current libaio implementation, but also liburing.
os_file_set_nocache(): Test innodb_flush_method in the function,
not in the callers.
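A minimal sketch of keeping the innodb_flush_method test inside one function, assuming Linux, fcntl() with O_DIRECT, and that only the O_DIRECT and O_DIRECT_NO_FSYNC settings request bypassing the file system cache. The enumeration and both function names are hypothetical, and which settings imply O_DIRECT is an assumption here.

```cpp
#include <cassert>
#include <fcntl.h>

// Hypothetical representation of innodb_flush_method values.
enum class flush_method { fsync, o_dsync, littlesync, nosync,
                          o_direct, o_direct_no_fsync };

// The decision lives in one place, so callers need not repeat it.
bool wants_nocache(flush_method m)
{
  switch (m) {
  case flush_method::o_direct:
  case flush_method::o_direct_no_fsync:
    return true;
  default:
    return false;
  }
}

// Request O_DIRECT on an open descriptor. Returns false on failure;
// some file systems may reject O_DIRECT.
bool set_nocache(int fd, flush_method m)
{
  if (!wants_nocache(m))
    return true;  // nothing to do for buffered I/O
  int flags = fcntl(fd, F_GETFL);
  return flags != -1 && fcntl(fd, F_SETFL, flags | O_DIRECT) != -1;
}
```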
One should not change the program arguments!
This change also reduces warnings from the icc compiler.
Almost all changes are just syntax changes (adding const to
'get_one_option' function declarations).
Other changes:
- Added a few casts of 'argument' from 'const char*' to 'char *'. This
was mainly in calls to 'external' functions we have no control over.
- Ensured that all resets of the 'password' command line argument are
done the same way. (In almost all cases it was just adding a comment and a cast.)
- In mysqlbinlog.cc and mysqld.cc there were a few cases that changed
the command line argument. These places were changed to instead allocate
the option in a MEM_ROOT to avoid changing the argument. Some of this
code was changed to ensure that different programs parse options the
same way. Added a test case for the changes in mysqlbinlog.cc.
- Changed a few variables that took their value from command line options
from 'char *' to 'const char *'.
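The idea of never modifying a const argument can be sketched with a hypothetical option handler that normalizes a copy instead of editing argv in place. The options struct and handle_database_option() are illustrative only, and std::string stands in for the MEM_ROOT allocation used in the server.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical option storage; the value is an owned copy.
struct options
{
  std::string database;
};

// The argument is const, so any normalization works on a copy.
bool handle_database_option(options &opts, const char *argument)
{
  if (!argument)
    return false;
  std::string value(argument);
  // Example normalization: drop trailing '/' without touching argv.
  while (!value.empty() && value.back() == '/')
    value.pop_back();
  opts.database = value;
  return true;
}
```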
In commit 3a9a3be1c6 (MDEV-23855)
some previous logic was replaced with the condition
dirty_pct < srv_max_dirty_pages_pct_lwm, which caused
the default value of the parameter innodb_max_dirty_pages_pct_lwm=0
to lose its special meaning: 'refer to innodb_max_dirty_pages_pct instead'.
This implicit special meaning was visible in the function
af_get_pct_for_dirty(), which was removed in
commit f0c295e2de (MDEV-24369).
page_cleaner_flush_pages_recommendation(): Restore the special
meaning that was removed in MDEV-24369.
buf_flush_page_cleaner(): If srv_max_dirty_pages_pct_lwm==0.0,
refer to srv_max_buf_pool_modified_pct. This fixes the observed
performance regression due to excessive page flushing.
buf_pool_t::page_cleaner_wakeup(): Revise the wakeup condition.
innodb_init(): Do initialize srv_max_io_capacity in Mariabackup.
It was previously always 0, which caused mariadb-backup --prepare
to hang in buf_flush_sync(), making no progress.
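The special meaning restored in buf_flush_page_cleaner() can be sketched as follows, assuming the values are percentages held as double. The function names are hypothetical; in the server the inputs are srv_max_dirty_pages_pct_lwm and srv_max_buf_pool_modified_pct.

```cpp
#include <cassert>

// innodb_max_dirty_pages_pct_lwm=0 means "use innodb_max_dirty_pages_pct".
double effective_dirty_pct_lwm(double lwm, double max_dirty_pct)
{
  return lwm == 0.0 ? max_dirty_pct : lwm;
}

// Simplified check: flush once the dirty share reaches the watermark.
bool should_flush(double dirty_pct, double lwm, double max_dirty_pct)
{
  return dirty_pct >= effective_dirty_pct_lwm(lwm, max_dirty_pct);
}
```

Without the substitution, a watermark of 0 makes the condition true whenever any page is dirty, which matches the excessive flushing described above.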