Commit graph

1095 commits

Author SHA1 Message Date
Marko Mäkelä
e71aca8200 Merge 10.9 into 10.10 2022-08-30 13:33:02 +03:00
Marko Mäkelä
c8cd162a0a Merge 10.7 into 10.8 2022-08-30 13:04:17 +03:00
Marko Mäkelä
b86be02ecf Merge 10.6 into 10.7 2022-08-30 13:02:42 +03:00
Marko Mäkelä
f410974f0f Merge 10.5 into 10.6 2022-08-30 13:01:16 +03:00
Marko Mäkelä
259050f864 Merge 10.9 into 10.10 2022-08-29 14:04:25 +03:00
Daniel Black
0324bde846 mariabackup: remove MySQL wording 2022-08-26 11:52:53 +10:00
Daniel Black
79b58f1ca8 MDEV-23607 MariaBackup - align required GRANTS to cmd options
Since the 10.5 split of the privileges, the required GRANTs
for various mariabackup operations has changed.

In the addition of tests, a number of mappings where incorrect:

The option --lock-ddl-per-table didn't require connection admin.

The option --safe-slave-backup requires SLAVE MONITOR even without
the --no-lock option.
2022-08-26 11:52:53 +10:00
Marko Mäkelä
2bddc5d045 Merge 10.7 into 10.8 2022-08-24 10:22:37 +03:00
Marko Mäkelä
bdd80e3fb1 Merge 10.6 into 10.7 2022-08-24 09:22:34 +03:00
Marko Mäkelä
d65a2b7bde Merge 10.5 into 10.6 2022-08-22 14:02:43 +03:00
Marko Mäkelä
1d90d6874d Merge 10.4 into 10.5 2022-08-22 13:38:40 +03:00
Marko Mäkelä
36d173e523 Merge 10.3 into 10.4 2022-08-22 12:34:42 +03:00
Marko Mäkelä
c2df3d30c0 MDEV-21452 fixup: Avoid an unnecessary mutex operation 2022-08-19 09:21:02 +03:00
Marko Mäkelä
a1055ab35d MDEV-29043 mariabackup --compress hangs
Even though commit b817afaa1c passed
the test mariabackup.compress_qpress, that test turned out to be
too small to reveal one more problem that had previously been prevented
by the existence of ctrl_mutex. I did not realize that there can be
multiple concurrent callers to compress_write(). One of them is the
log copying thread; further callers are data file copying threads
(default: --parallel=1).

By default, there is only one compression worker thread
(--compress-threads=1).

compress_write(): Fix a race condition between threads that would
use the same worker thread object. Make thd->data_avail contain the
thread identifier of the submitter, and add thd->avail_cond to
notify other compress_write() threads that are waiting for a slot.
2022-08-19 09:18:24 +03:00
Oleksandr Byelkin
75d631f333 Merge branch '10.7' into 10.8 2022-08-09 09:52:15 +02:00
Oleksandr Byelkin
4c18f68d59 Merge branch '10.9' into 10.10 2022-08-09 09:47:16 +02:00
Oleksandr Byelkin
50b270525a Merge branch '10.7' into 10.8 2022-08-08 17:15:13 +02:00
Oleksandr Byelkin
1d48041982 Merge branch '10.6' into 10.7 2022-08-08 17:12:32 +02:00
Oleksandr Byelkin
d2f1c3ed6c Merge branch '10.5' into bb-10.6-release 2022-08-03 12:19:59 +02:00
Oleksandr Byelkin
af143474d8 Merge branch '10.4' into 10.5 2022-08-03 07:12:27 +02:00
Oleksandr Byelkin
48e35b8cf6 Merge branch '10.3' into 10.4 2022-08-02 14:15:39 +02:00
Sergei Golubchik
5b4154373a only copy buffer pool dump in SST galera mode
and then only into the default name, so that the joiner could find it
2022-08-01 15:53:14 +02:00
Sergei Golubchik
5197519f4f revert mariabackup part of MDEV-27524, fix the test 2022-08-01 15:53:13 +02:00
Sergei Golubchik
e1caa4bd5e don't use ssl for windows named pipes - it doesn't work 2022-07-28 17:18:40 +02:00
Marko Mäkelä
4ce6e78059 Merge 10.9 into 10.10 2022-07-28 11:25:21 +03:00
Marko Mäkelä
f79cebb4d0 Merge 10.7 into 10.8 2022-07-28 10:33:26 +03:00
Marko Mäkelä
742e1c727f Merge 10.6 into 10.7 2022-07-27 18:26:21 +03:00
Marko Mäkelä
30914389fe Merge 10.5 into 10.6 2022-07-27 17:52:37 +03:00
Marko Mäkelä
098c0f2634 Merge 10.4 into 10.5 2022-07-27 17:17:24 +03:00
Oleksandr Byelkin
3bb36e9495 Merge branch '10.3' into 10.4 2022-07-27 11:02:57 +02:00
Thirunarayanan Balathandayuthapani
1d3629875e MDEV-29137 mariabackup excessive logging of ddl tracking
- Remove the FILE_MODIFY message in backup_file_op()
2022-07-26 11:33:52 +05:30
Thirunarayanan Balathandayuthapani
6156a2be30 MDEV-29137 mariabackup excessive logging of ddl tracking
- Remove the FILE_MODIFY message from mariabackup which was
displaying the list of file names which were modified since
the previous checkpoint.
2022-07-25 17:03:40 +05:30
Marko Mäkelä
b817afaa1c MDEV-28689, MDEV-28690: Remove ctrl_mutex
This reverts the revert 4f62dfe676
and fixes the hang that was introduced when ctrl_mutex was removed.

The test mariabackup.compress_qpress covers this code, but the
test is skipped if a stand-alone qpress executable is not available.
It is not available in many software repositories, possibly because
the code base has not been updated since 2010.

This was tested with an executable that was compile from the source
code at http://www.quicklz.com/qpress-11-source.zip (after adding
a missing #include <unistd.h> for the definition of isatty()).

Compared to the grandparent commit (before the revert), the changes
are as follows:

comp_thread_ctxt_t::done_cond: A separate condition for completed
compression, signaling that thd->to_len has been updated.

compress_write(): Replace some threads[i] with thd.
Reset thd->to_len = 0 after consuming the compressed data.

compress_worker_thread_func(): After consuming the uncompressed
data, set thd->data_avail = FALSE. After compressing, signal
thd->done_cond.
2022-07-11 21:00:18 +03:00
Vladislav Vaintroub
4f62dfe676 Revert "MDEV-28689, MDEV-28690: Incorrect error handling for ctrl_mutex"
This reverts commit 863c3eda87.
2022-07-11 15:00:34 +02:00
Marko Mäkelä
155019b96b MDEV-28994 Backup of memory-mapped log is corrupted
An interface to use memory-mapped I/O on the InnoDB redo log that
is stored in persistent memory was introduced
in commit 685d958e38 (MDEV-14425).

log_t::attach(): In mariadb-backup --backup, never attempt to
use memory-mapped I/O for reading the log file of the server.

xtrabackup_copy_logfile(): Assert !log_sys.is_pmem() and remove
the code to deal with a memory-mapped log.

This fixes a race condition scenario of the following type:
1. Backup parsed a mini-transaction from the memory-mapped buffer.
This took some time.
2. Meanwhile, the server might have overwritten this portion
of the circular log_sys.buf.
3. Backup copied the data to the output file while or after
the server had overwritten this portion of the file.
4. Backup failed to notice that a log overrun occurred.

The symptom of this was that a mariadb-backup --prepare of the
log failed. In the analyzed case, the error message was:
[ERROR] InnoDB: Missing FILE_CHECKPOINT(...)

This will also make it possible to run mariadb-backup --backup
under "rr replay".
2022-07-01 18:07:07 +03:00
Marko Mäkelä
d371e35257 Merge 10.9 into 10.10 2022-06-17 11:31:53 +03:00
Marko Mäkelä
cb19e211ec Merge 10.7 into 10.8 2022-06-16 11:15:21 +03:00
Marko Mäkelä
a8c22dae8b Merge 10.6 into 10.7 2022-06-16 10:50:58 +03:00
Marko Mäkelä
5bb90cb2ac Merge 10.5 into 10.6 2022-06-16 10:01:29 +03:00
Vlad Lesin
27309fc6b0 MDEV-28832 infinite loop in mariabackup if log LOG_HEADER_FORMAT field is 0
Avoid the loop with getting rid of back and forth jumping.
2022-06-15 13:30:42 +03:00
Marko Mäkelä
32edabd1f2 Merge 10.9 into 10.10 2022-06-09 15:26:09 +03:00
Marko Mäkelä
57d4a242da Merge 10.7 into 10.8 2022-06-06 16:22:09 +03:00
Marko Mäkelä
7e39470e33 Merge 10.6 into 10.7 2022-06-06 14:56:20 +03:00
Marko Mäkelä
0b47c126e3 MDEV-13542: Crashing on corrupted page is unhelpful
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it is making things worse by blocking attempts to
rescue data from or rebuild a partially readable table.

We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.

This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.

We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in system tablespace in case something
was corrupted.

Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.

buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.

btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.

btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().

btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().

FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.

dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.

trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().

trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.

trx_sys_create_sys_pages(): Merged with trx_sysf_create().

dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.

dir_pathname(): Replaces os_file_make_new_pathname().

row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.

btr_decryption_failed(): Report decryption failures

dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.

dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.

dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.

dict_table_is_corrupted(): Remove.

dict_index_t::is_btree(): Determine if the index is a valid B-tree.

BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.

UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.

buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().

fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.

btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.

opt_search_plan_for_table(): Simplify the code.

row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.

row_undo_mod_upd_exist_sec(): Simplify the code.

row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().

fut_get_ptr(): Replace with buf_page_get_gen() calls.

buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.

buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.

fseg_page_is_allocated(): Replaces fseg_page_is_free().

fts_drop_common_tables(): Return an error if the transaction
was rolled back.

fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.

fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.

Clean up mtr_t::page_lock()

buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.

buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.

mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().

recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.

recv_recover_page(): Return whether the operation succeeded.

recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.

Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
Marko Mäkelä
6b9bba41e8 MDEV-28554: Remove innodb_version
INNODB_VERSION_STR: Replaced with PACKAGE_VERSION (non-functional change).

INNODB_VERSION_SHORT: Replaced with direct use of
MYSQL_VERSION_MAJOR << 8 | MYSQL_VERSION_MINOR.

check_version(): Simplify the mariadb-backup version check,
and require the server version to be MariaDB 10.8 or later,
because that is when the InnoDB redo log format was last changed.
2022-06-03 12:20:19 +03:00
Marko Mäkelä
2f8d0af883 Merge 10.5 into 10.6 2022-06-02 17:39:13 +03:00
Marko Mäkelä
4b3c3e526e Merge 10.4 into 10.5 2022-06-02 16:51:13 +03:00
Marko Mäkelä
96f4b4a55b Merge 10.3 into 10.4 2022-06-02 16:34:17 +03:00
Marko Mäkelä
91d5fffa07 MDEV-28719: compress_write() leaks data_mutex on error 2022-06-01 11:20:47 +03:00
Marko Mäkelä
863c3eda87 MDEV-28689, MDEV-28690: Incorrect error handling for ctrl_mutex
comp_thread_ctxt_t: Remove ctrl_mutex, ctrl_cond, started. We do not
actually need them for anything.

destroy_worker_thread(): Split from destroy_worker_threads().

create_worker_threads(): We already initialize
thd->data_avail=FALSE and thd->cancelled=FALSE before
invoking pthread_create(). If any thread creation fails,
clean up by destroy_worker_thread().

compress_worker_thread_func(): Assume that thd->started and
thd->data_avail are already initialized.

Reviewed by: Vladislav Vaintroub
2022-05-30 15:49:45 +03:00
Sergei Golubchik
b7ffccf49b Merge branch '10.7' into 10.8 2022-05-18 13:26:48 +02:00
Sergei Golubchik
99a433ed1c Merge branch '10.6' into 10.7 2022-05-18 10:34:38 +02:00
Marko Mäkelä
daa2680c78 Merge 10.5 into 10.6 2022-05-12 08:11:57 +03:00
Vlad Lesin
3fabdc3ca8 MDEV-28473 field_ref_zero is not initialized in xtrabackup_prepare_func()
The solution is to initialize field_ref_zero in main_low() before
xtrabackup_backup_func() and xtrabackup_prepare_func() calls.
2022-05-11 17:20:31 +03:00
Sergei Golubchik
443c2a715d Merge branch '10.7' into 10.8 2022-05-11 12:21:36 +02:00
Sergei Golubchik
fd132be117 Merge branch '10.6' into 10.7 2022-05-11 11:25:33 +02:00
Sergei Golubchik
3bc98a4ec4 Merge branch '10.5' into 10.6 2022-05-10 14:01:23 +02:00
Sergei Golubchik
ef781162ff Merge branch '10.4' into 10.5 2022-05-09 22:04:06 +02:00
Sergei Golubchik
a70a1cf3f4 Merge branch '10.3' into 10.4 2022-05-08 23:03:08 +02:00
Oleksandr Byelkin
9614fde1aa Merge branch '10.2' into 10.3 2022-05-03 10:59:54 +02:00
Alexander Barkov
680ca15269 MDEV-28446 mariabackup prepare fails for incrementals if a new schema is created after full backup is taken
When "mariabackup --target-dir=$basedir --incremental-dir=$incremental_dir"
is running and is moving a new table file (e.g. `db1/t1.new`) from the
incremental directory to the base directory, it needs to verify that the base
backup database directory (e.g. `$basedir/db1`) really exists
(or create it otherwise).

The table `db1/t1` can come from a new database `db1` which
was created during the base mariabackup execution time.

In such case the directory `db1` exists only in the incremental directory,
but does not exist in the base directory.
2022-05-02 11:21:10 +04:00
Marko Mäkelä
133c2129cd Merge 10.7 into 10.8 2022-04-27 10:43:00 +03:00
Marko Mäkelä
638afc4acf Merge 10.6 into 10.7 2022-04-26 18:59:40 +03:00
Alexander Barkov
907e4c62ce MDEV-21037 mariabackup does not detect multi-source replication slave 2022-04-25 15:00:09 +04:00
Marko Mäkelä
fae0ccad6e Merge 10.5 into 10.6 2022-04-21 17:46:40 +03:00
Marko Mäkelä
620c55e708 Merge 10.4 into 10.5 2022-04-21 15:33:50 +03:00
Vlad Lesin
1b558dd462 MDEV-27919 mariabackup --log-copy-interval is measured in millisecondss in 10.5 and in microseconds in 10.6
Multiply polling interval by 1000.
2022-04-21 15:24:59 +03:00
Marko Mäkelä
394784095e Merge 10.3 into 10.4 2022-04-21 11:33:59 +03:00
Sergei Golubchik
bbdec04d59 MDEV-24317 Data race in LOGGER::init_error_log at sql/log.cc:1443 and in LOGGER::error_log_print at sql/log.cc:1181
don't initialize error_log_handler_list in set_handlers()
* error_log_handler_list is initialized to LOG_FILE early, in init_base()
* set_handlers always reinitializes it to LOG_FILE, so it's pointless
* after init_base() concurrent threads start using sql_log_warning,
  so following set_handlers() shouldn't modify error_log_handler_list
  without some protection
2022-04-12 13:07:20 +02:00
Marko Mäkelä
5d8dcfd86c MDEV-25975: Merge 10.4 into 10.5 2022-04-06 10:30:49 +03:00
Marko Mäkelä
d172df9913 MDEV-25975: Merge 10.3 into 10.4 2022-04-06 09:18:38 +03:00
Marko Mäkelä
e9735a8185 MDEV-25975 innodb_disallow_writes causes shutdown to hang
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.

During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:

sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.

sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.

The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.

Reviewed by: Jan Lindström
2022-04-06 08:06:49 +03:00
Marko Mäkelä
dce8a846ae Merge 10.7 into 10.8 2022-03-03 11:34:58 +02:00
Marko Mäkelä
64ea3eab8f Merge 10.6 into 10.7 2022-03-03 11:11:00 +02:00
Otto Kekäläinen
1fa872f6ef Fix various spelling errors
Among others:
existance -> existence
reinitialze -> reinitialize
successfuly -> successfully
2022-03-03 13:42:49 +11:00
Marko Mäkelä
32d741b5b0 Merge 10.7 into 10.8 2022-02-25 16:24:13 +02:00
Marko Mäkelä
3d88f9f34c Merge 10.6 into 10.7 2022-02-25 16:09:16 +02:00
Marko Mäkelä
a23414dd32 MDEV-27939 Log buffer wrap-around errors on PMEM
When the log is stored in persistent memory, log_sys.buf[] is
a ring buffer that directly maps to the circular ib_logfile0 file.

There were several errors that could occur in the special case
when a log record ends exactly at the end of the log file and the
next record would start at log_sys.buf[log_sys.START_OFFSET].

mariabackup.huge_lsn,strict_full_crc32: Write the first record
at the very end of the circular file, to reproduce the failure
scenarios.

recv_sys_t::parse(): On PMEM, wrap the end offset of the record
from log_sys.file_size to log_sys.START_OFFSET if needed.
Otherwise, both InnoDB recovery and mariadb-backup would try
to parse the next record from an invalid address.

filename_to_spacename(): Remove an assumption about the format
of file names. While the server currently writes file names like
./databasename/tablename.ibd we might want to stop writing the
redundant ./ prefix in the future. The test mariabackup.huge_lsn
is generating such file names.

xtrabackup_copy_logfile(): Correctly copy a record that ends at
the very end of the log_sys.buf[].

The errors in mariadb-backup were reproduced with the test
mariabackup.huge_lsn,strict_full_crc32 and an additional patch
to use the start checkpoint of the test:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 27dce5fa17d..e17a1692d6f 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -1796,7 +1796,8 @@ dberr_t recv_sys_t::find_checkpoint()
         continue;
       }

-      if (checkpoint_lsn >= log_sys.next_checkpoint_lsn)
+      if (checkpoint_lsn >= log_sys.next_checkpoint_lsn &&
+          checkpoint_lsn != 0x1000fffffe10)
       {
         log_sys.next_checkpoint_lsn= checkpoint_lsn;
         log_sys.next_checkpoint_no= field == log_t::CHECKPOINT_1;
2022-02-25 15:50:09 +02:00
Marko Mäkelä
6daf8f8a0d Merge 10.5 into 10.6 2022-02-25 13:48:47 +02:00
Marko Mäkelä
b791b942e1 Merge 10.4 into 10.5 2022-02-25 13:27:41 +02:00
Marko Mäkelä
f5ff7d09c7 Merge 10.3 into 10.4 2022-02-25 13:00:48 +02:00
Marko Mäkelä
00b70bbb51 Merge 10.2 into 10.3 2022-02-25 10:43:38 +02:00
Julius Goryavsky
17e0f5224c MDEV-27524: Incorrect binlogs after Galera SST using rsync and mariabackup
This commit adds correct handling of binlogs for SST using rsync
or mariabackup. Before this fix, binlogs were handled incorrectly -
- only one (last) binary log file was transferred during SST, which
then led to various failures (for example, when trying to list all
events from the binary log). These bugs were long masked by flaws
in the primitive binlogs handling code in the SST scripts, which
causing binary logs files to be erased after transfer or not added
to the binlog index on the joiner node. Now the correct transfer
of all binary logs (not just the last of the binary log files) has
been implemented both for the rsync (at the script level) and for
the mariabackup (at the level of the main utility code).

This commit also adds a new sst_max_binlogs=<n> parameter, which
can be located in the [sst] section or in the [xtrabackup] section
(historically, supported for mariabackup only, not for rsync), or
in one of the server sections. This parameter specifies the number
of binary log files to be sent to the joiner node during SST. This
option is added for compatibility with old SST scripting behavior,
which can be emulated by setting the sst_max_binlogs=1 (although
in general this can cause problems for the reasons described above).
In addition, setting the sst_max_binlogs=0 can be used to suppress
the transmission of binary logs to the joiner nodes during SST
(although sometimes a single file with the current binary log can
still be transmitted to the joiner, even with sst_max_binlogs=0,
because this sometimes necessary in modes that involve the use of
GTIDs with Galera).

Also, this commit ensures correct handling of paths to various
innodb files and directories in the SST scripts, and fixes some
problems with this that existed in mariabackup utility (which
were associated with incorrect handling of the innodb_data_dir
parameter in some scenarios).

In addition, this commit contains the following enhancements:

 1) Added tests for mtr, which check the correct work with binlogs
    after SST (using rsync and mariabackup);
 2) Added correct handling of slashes at the end of all paths that
    the SST script receives as parameters;
 3) Improved parsing code for --mysqld-args parameters. Now it
    correctly processes the sequence "--" after the name of the
    one-letter option;
 4) Checking the secret signature during joiner authentication
    is made independent of presence of bash (as a unix shell)
    in the system and diff utility no longer needed to check
    certificates compliance;
 5) All directories that are necessary for the correct placement
    of various logs are automatically created by SST scripts in
    advance (before running mariabackup on the joiner node);
 6) Removal of old binary logs on joiner is done using the binlog
    index (if it exists) (not only by fixed pattern that based
    on the current binlog name, as before);
 7) Paths for placing binary logs are correctly processed if they
    are set as relative paths (to the datadir);
 8) SST scripts are made even more resistant to spaces in filenames
    (now for binlogs);
 9) In case of failure, SST scripts now always end with an exit
    code other than zero;
10) SST script for rsync now correctly create a tar file with
    the binlogs, even if the paths to them (in the binlog index
    file) are specified as a mix of absolute and relative paths,
    and even if they do not match with the datadir path specified
    in the current configuration settings.
2022-02-22 10:45:06 +01:00
Julius Goryavsky
571eb9d775 mariabackup: cosmetic changes (whitespaces and indentation) 2022-02-22 10:20:58 +01:00
Marko Mäkelä
f1beeb58e6 MDEV-27848: Remove unused wait/io/file/innodb/innodb_log_file
The performance_schema counter wait/io/file/innodb/innodb_log_file
is always reported as 0.

The way how redo log writes are being waited for was refactored in
commit 30ea63b7d2 by the introduction
of flush_lock and write_lock. Even before that change, all the
wait/io/file/innodb/ counters were always 0 in my tests.

Moreover, if the PMEM interface that was introduced in
commit 3daef523af
is being used, writes to the InnoDB log file will completely avoid
any system calls and performance_schema instrumentation.
In commit 685d958e38 also the reads
of the redo log (during recovery) would bypass any system calls.
2022-02-15 15:03:15 +02:00
Marko Mäkelä
a635c40648 MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit()
A prominent bottleneck in mtr_t::commit() is log_sys.mutex between
log_sys.append_prepare() and log_close().

User-visible change: The minimum innodb_log_file_size will be
increased from 1MiB to 4MiB so that some conditions can be
trivially satisfied.

log_sys.latch (log_latch): Replaces log_sys.mutex and
log_sys.flush_order_mutex. Copying mtr_t::m_log to
log_sys.buf is protected by a shared log_sys.latch.
Writes from log_sys.buf to the file system will be protected
by an exclusive log_sys.latch.

log_sys.lsn_lock: Protects the allocation of log buffer
in log_sys.append_prepare().

sspin_lock: A simple spin lock, for log_sys.lsn_lock.

Thanks to Vladislav Vaintroub for suggesting this idea, and for
reviewing these changes.

mariadb-backup: Replace some use of log_sys.mutex with recv_sys.mutex.

buf_pool_t::insert_into_flush_list(): Implement sorting of flush_list
because ordering is otherwise no longer guaranteed. Ordering by LSN
is needed for the proper operation of redo log checkpoints.

log_sys.append_prepare(): Advance log_sys.lsn and log_sys.buf_free by
the length, and return the old values. Also increment write_to_buf,
which was previously done in log_close().

mtr_t::finish_write(): Obtain the buffer pointer from
log_sys.append_prepare().

log_sys.buf_free: Make the field Atomic_relaxed,
to simplify log_flush_margin(). Use only loads and stores
to avoid costly read-modify-write atomic operations.

buf_pool.flush_list_requests: Replaces
export_vars.innodb_buffer_pool_write_requests
and srv_stats.buf_pool_write_requests.
Protected by buf_pool.flush_list_mutex.

buf_pool_t::insert_into_flush_list(): Do not invoke page_cleaner_wakeup().
Let the caller do that after a batch of calls.

recv_recover_page(): Invoke a minimal part of
buf_pool.insert_into_flush_list().

ReleaseBlocks::modified: A number of pages added to buf_pool.flush_list.

ReleaseBlocks::operator(): Merge buf_flush_note_modification() here.

log_t::set_capacity(): Renamed from log_set_capacity().
2022-02-10 16:37:12 +02:00
Marko Mäkelä
8c7c92adf3 MDEV-27787 mariadb-backup --backup is allocating extra memory for log records
In commit 685d958e38 (MDEV-14425),
the log parsing in mariadb-backup --backup was rewritten.
The parameter STORE_IF_EXISTS that is being passed to recv_sys.parse_mtr()
or recv_sys.parse_pmem() instead of STORE_NO caused unnecessary additional
memory allocation for redo log records.
2022-02-10 15:39:27 +02:00
Oleksandr Byelkin
4fb2cb1a30 Merge branch '10.7' into 10.8 2022-02-04 14:50:25 +01:00
Oleksandr Byelkin
9ed8deb656 Merge branch '10.6' into 10.7 2022-02-04 14:11:46 +01:00
Thirunarayanan Balathandayuthapani
8d742fe4ac MDEV-26326 mariabackup skip valid ibd file
- Store the deferred tablespace name while loading the tablespace
for backup process.

- Mariabackup stores the list of space ids which has page0 INIT_PAGE
records. backup_first_page_op() and first_page_init() was introduced
to track the page0 INIT_PAGE records.

- backup_file_op() and log_file_op() was changed to handle
FILE_MODIFY redo log records. It is used to identify the
deferred tablespace space id.

- Whenever file operation redo log was processed by backup,
backup_file_op() should check whether the space name exist
in deferred tablespace. If it is then it needs to store the
space id, name when FILE_MODIFY, FILE_RENAME redo log processed
and it should delete the tablespace name from defer list in other
cases.

- backup_fix_ddl() should check whether deferred tablespace has
any page0 init records. If it is then consider the tablespace
as newly created tablespace. If not then backup should try
to reload the tablespace with SRV_BACKUP_NO_DEFER mode to
avoid the deferring of tablespace.
2022-02-01 19:50:08 +05:30
Marko Mäkelä
c64e507fad MDEV-27621 Backup fails with FATAL ERROR: Was only able to copy log
In commit 685d958e38 (MDEV-14425)
a bug was introduced to mariadb-backup --backup for the case when
the log is wrapping around to log_sys.START_OFFSET (12288).

This could also cause a "Missing FILE_CHECKPOINT" error during
mariadb-backup --prepare, in case the log copying resumed after
the server had produced a multiple of innodb_log_file_size-12288
bytes of more log so that the last mini-transaction would end
exactly at the end of the log file.

xtrabackup_copy_logfile(): If the log wraps around, read everything
to the end of the log file, and then the rest from log_sys.START_OFFSET.
2022-01-27 16:17:40 +02:00
Marko Mäkelä
685d958e38 MDEV-14425 Improve the redo log for concurrency
The InnoDB redo log used to be formatted in blocks of 512 bytes.
The log blocks were encrypted and the checksum was calculated while
holding log_sys.mutex, creating a serious scalability bottleneck.

We remove the fixed-size redo log block structure altogether and
essentially turn every mini-transaction into a log block of its own.
This allows encryption and checksum calculations to be performed
on local mtr_t::m_log buffers, before acquiring log_sys.mutex.
The mutex only protects a memcpy() of the data to the shared
log_sys.buf, as well as the padding of the log, in case the
to-be-written part of the log would not end in a block boundary of
the underlying storage. For now, the "padding" consists of writing
a single NUL byte, to allow recovery and mariadb-backup to detect
the end of the circular log faster.

Like the previous implementation, we will overwrite the last log block
over and over again, until it has been completely filled. It would be
possible to write only up to the last completed block (if no more
recent write was requested), or to write dummy FILE_CHECKPOINT records
to fill the incomplete block, by invoking the currently disabled
function log_pad(). This would require adjustments to some logic around
log checkpoints, page flushing, and shutdown.

An upgrade after a crash of any previous version is not supported.
Logically empty log files from a previous version will be upgraded.

An attempt to start up InnoDB without a valid ib_logfile0 will be
refused. Previously, the redo log used to be created automatically
if it was missing. Only with with innodb_force_recovery=6, it is
possible to start InnoDB in read-only mode even if the log file
does not exist. This allows the contents of a possibly corrupted
database to be dumped.

Because a prepared backup from an earlier version of mariadb-backup
will create a 0-sized log file, we will allow an upgrade from such
log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system
tablespace looks valid.

The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced
with 64-byte log checkpoint blocks at 0x1000 and 0x2000.

The start of log records will move from 0x800 to 0x3000. This allows us
to use 4096-byte aligned blocks for all I/O in a future revision.

We extend the MDEV-12353 redo log record format as follows.

(1) Empty mini-transactions or extra NUL bytes will not be allowed.
(2) The end-of-minitransaction marker (a NUL byte) will be replaced
with a 1-bit sequence number, which will be toggled each time when the
circular log file wraps back to the beginning.
(3) After the sequence bit, a CRC-32C checksum of all data
(excluding the sequence bit) will written.
(4) If the log is encrypted, 8 bytes will be written before
the checksum and included in it. This is part of the
initialization vector (IV) of encrypted log data.
(5) File names, page numbers, and checkpoint information will not be
encrypted. Only the payload bytes of page-level log will be encrypted.
The tablespace ID and page number will form part of the IV.
(6) For padding, arbitrary-length FILE_CHECKPOINT records may be written,
with all-zero payload, and with the normal end marker and checksum.
The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON.

In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will
no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup
will require a valid log file. When resizing the log, we will create
a logically empty ib_logfile101 at the current LSN and use an atomic rename
to replace ib_logfile0 with it. See the test innodb.log_file_size.

Because there is no mandatory padding in the log file, we are able
to create a dummy log file as of an arbitrary log sequence number.
See the test mariabackup.huge_lsn.

The parameter innodb_log_write_ahead_size and the
INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed.

The minimum value of innodb_log_buffer_size will be increased to 2MiB
(because log_sys.buf will replace recv_sys.buf) and the increment
adjusted to 4096 bytes (the maximum log block size).

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed:

os_log_fsyncs
os_log_pending_fsyncs
log_pending_log_flushes
log_pending_checkpoint_writes

The following status variables will be removed:

Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs)
Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design)

log_sys.get_block_size(): Return the physical block size of the log file.
This is only implemented on Linux and Microsoft Windows for now, and for
the power-of-2 block sizes between 64 and 4096 bytes (the minimum and
maximum size of a checkpoint block). If the block size is anything else,
the traditional 512-byte size will be used via normal file system
buffering.

If the file system buffers can be bypassed, a message like the following
will be issued:

InnoDB: File system buffers for log disabled (block size=512 bytes)
InnoDB: File system buffers for log disabled (block size=4096 bytes)

This has been tested on Linux and Microsoft Windows with both sizes.

On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC.
Tests in 3 different environments where the log is stored in a device
with a physical block size of 512 bytes are yielding better throughput
without O_DIRECT. This could be due to the fact that in the event the
last log block is being overwritten (if multiple transactions would
become durable at the same time, and each of will write a small
number of bytes to the last log block), it should be faster to re-copy
data from log_sys.buf or log_sys.flush_buf to the kernel buffer,
to be finally written at fdatasync() time.

The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for
data files. This option will enable O_DIRECT on the log file on Linux.
It may be unsafe to use when the storage device does not support
FUA (Force Unit Access) mode.

When the server is compiled WITH_PMEM=ON, we will use memory-mapped
I/O for the log file if the log resides on a "mount -o dax" device.
We will identify PMEM in a start-up message:

InnoDB: log sequence number 0 (memory-mapped); transaction id 3

On Linux, we will also invoke mmap() on any ib_logfile0 that resides
in /dev/shm, effectively treating the log file as persistent memory.
This should speed up "./mtr --mem" and increase the test coverage of
PMEM on non-PMEM hardware. It also allows users to estimate how much
the performance would be improved by installing persistent memory.
On other tmpfs file systems such as /run, we will not use mmap().

mariadb-backup: Eliminated several variables. We will refer
directly to recv_sys and log_sys.

backup_wait_for_lsn(): Detect non-progress of
xtrabackup_copy_logfile(). In this new log format with
arbitrary-sized blocks, we can only detect log file overrun
indirectly, by observing that the scanned log sequence number
is not advancing.

xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit,
because we are not allowed to modify the server's log file, and our
memory mapping is read-only.

trx_flush_log_if_needed_low(): Do not use the callback on pmem.
Using neither flush_lock nor write_lock around PMEM writes seems
to yield the best performance. The pmem_persist() calls may
still be somewhat slower than the pwrite() and fdatasync() based
interface (PMEM mounted without -o dax).

recv_sys_t::buf: Remove. We will use log_sys.buf for parsing.

recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE.

recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn.

recv_sys_t, log_sys_t: Removed many data members.

recv_sys.lsn: Renamed from recv_sys.recovered_lsn.
recv_sys.offset: Renamed from recv_sys.recovered_offset.
log_sys.buf_size: Replaces srv_log_buffer_size.

recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset]
when the buffer is being allocated from the memory heap.

recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is
backed by ib_logfile0. The pointer will wrap from recv_sys.len
(log_sys.file_size) to log_sys.START_OFFSET. For the record that
wraps around, we may copy file name or record payload data to
the auxiliary buffer decrypt_buf in order to have a contiguous
block of memory. The maximum size of a record is less than
innodb_page_size bytes.

recv_sys_t::parse(): Take the smart pointer as a template parameter.
Do not temporarily add a trailing NUL byte to FILE_ records, because
we are not supposed to modify the memory-mapped log file. (It is
attached in read-write mode already during recovery.)

recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse().

recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be
returned on PMEM, use recv_ring to wrap around the buffer to the start.

mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free
on PMEM, because it has no meaning on the mmap-based log.

log_sys.write_to_buf: Count writes to log_sys.buf. Replaces
srv_stats.log_write_requests and export_vars.innodb_log_write_requests.
Protected by log_sys.mutex. Updated consistently in log_close().
Previously, mtr_t::commit() conditionally updated the count,
which was inconsistent.

log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf,
for writing to log_sys.log (the ib_logfile0). Replaces
srv_stats.log_writes and export_vars.innodb_log_writes.
Protected by log_sys.mutex.

log_sys.waits: Count waits in append_prepare(). Replaces
srv_stats.log_waits and export_vars.innodb_log_waits.

recv_recover_page(): Do not unnecessarily acquire
log_sys.flush_order_mutex. We are inserting the blocks in arbitary
order anyway, to be adjusted in recv_sys.apply(true).

We will change the definition of flush_lock and write_lock to
avoid potential false sharing. Depending on sizeof(log_sys) and
CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could
share a cache line with each other or with the last data members
of log_sys.

Thanks to Matthias Leich for providing https://rr-project.org traces
for various failures during the development, and to
Thirunarayanan Balathandayuthapani for his help in debugging
some of the recovery code. And thanks to the developers of the
rr debugger for a tool without which extensive changes to InnoDB
would be very challenging to get right.

Thanks to Vladislav Vaintroub for useful feedback and
to him, Axel Schwenke and Krunal Bauskar for testing the performance.
2022-01-21 16:03:47 +02:00
Daniel Black
d434250ee1 MDEV-25342: autosize innodb_buffer_pool_chunk_size
The previous default innodb_buffer_pool_chunk_size of 128M
made sense when the innodb buffer pool size was a few GB.

When the pool size is 128GB this means the chunk size is 0.1%
of this. Fine tuning the buffer pool size on such a fine
increment doesn't make practical sense. Also on extremely
large buffer pool systems, initializing on the default 128M can
also take a considerable amount of time.

When large pages are enabled, the chunk size has to be a multiple
of an available large page size or memory allocation without
use can occur.

Previously the default 0 was documented as disabling resizing.
With srv_buf_pool_chunk_unit > 0 assertions in the code and the
minimium value set, I doubt this was ever the case.

As such the autosizing (based on default 0) takes place as follows:
* a 64th of the innodb_buffer_pool_size
* if large pages, this is rounded down the the nearest multiple
  of the large page size.
* If less than 1MB, set to 1MB.

This does mean the new default innodb_buffer_pool_chunk size is
2MB, derived form the above formular with 128MB as the buffer pool
size.

The innodb_buffer_pool_chunk_size is changed to a size_t for
better compatiblity with the memory allocations which use size_t.
The previous upper limit is changed to the maxium of a size_t. The
maximium value used is the buffer pool size anyway.

Getting this default value of the chunk size to a more practical
size facilitates further development of more automated resizing
without significant overhead or memory fragmentation.

innodb_buffer_pool_resize test adjusted based on 1M default
chunk size thanks Wlad.
2022-01-18 14:20:57 +02:00
Marko Mäkelä
c22107fd90 Merge 10.6 into 10.7 2021-11-29 11:42:07 +02:00
Marko Mäkelä
51c89849d1 Merge 10.5 into 10.6 2021-11-29 11:39:34 +02:00
Marko Mäkelä
d4cb177603 Merge 10.4 into 10.5 2021-11-29 11:16:20 +02:00
Marko Mäkelä
4da2273876 Merge 10.3 into 10.4 2021-11-29 10:59:22 +02:00
Marko Mäkelä
289721de9a Merge 10.2 into 10.3 2021-11-29 10:33:06 +02:00
ryancaicse
f809a4fbd0 MDEV-26558 Fix a deadlock due to cyclic dependence
Fix a potential deadlock bug between locks ctrl_mutex and entry->mutex
2021-11-24 12:57:44 +02:00
Alexey Bychko
fe065f8d90 MDEV-22522 RPM packages have meaningless summary/description
this patch moves cpack summury and description for optional packages
to the appropriate CMakeLists.txt files
2021-11-23 11:29:24 +07:00
Vladislav Vaintroub
009f3e06f3 improve build, allow sql library to be built in parallel with builtins 2021-11-09 17:02:45 +02:00
Sergei Krivonos
f7c6c02a06 Revert "improve build, allow sql library to be built in parallel with builtins"
This reverts commit 1a3570dec3.
2021-11-09 15:44:07 +02:00
Vladislav Vaintroub
1a3570dec3 improve build, allow sql library to be built in parallel with builtins 2021-11-09 12:06:49 +02:00
Marko Mäkelä
06988bdcaa Merge 10.6 into 10.7 2021-11-09 09:40:29 +02:00
Marko Mäkelä
25ac047baf Merge 10.5 into 10.6 2021-11-09 09:11:50 +02:00
Marko Mäkelä
9c18b96603 Merge 10.4 into 10.5 2021-11-09 08:50:33 +02:00
Marko Mäkelä
47ab793d71 Merge 10.3 into 10.4 2021-11-09 08:40:14 +02:00
Marko Mäkelä
524b4a89da Merge 10.2 into 10.3 2021-11-09 08:26:59 +02:00
Daniel Black
7c30bc38a5 MDEV-26561 mariabackup release locks
The previous threads locked need to be released too.

This occurs if the initialization of any of the non-first
mutex/conditition variables errors occurs.
2021-11-09 17:05:55 +11:00
ryancaicse
e1eb39a446 MDEV-26561 Fix a bug due to unreleased lock
Fix a bug of unreleased lock ctrl_mutex in the method create_worker_threads
2021-11-09 17:05:55 +11:00
Oleksandr Byelkin
167eac7d43 Merge branch '10.6' into 10.7 2021-11-03 11:30:27 +01:00
Marko Mäkelä
3dc0d884ec MDEV-26674 workaround for mariadb-backup
This is follow-up to commit 1193a793c4.
We will set innodb_use_native_aio=OFF by default also in mariadb-backup
when running on a potentially affected kernel.
2021-11-02 15:24:20 +02:00
Sergei Golubchik
db20c77782 mariabackup: rename encryption_plugin -> xb_plugin
because plugin code is not only about encryption anymore
(also loads provider plugins), and xb_ prefix prevents name
clashes with the server code (that mariabackup links with).
2021-10-27 15:55:14 +02:00
Sergei Golubchik
8c806c4152 MDEV-26794 MariaBackup does not recognize added providers upon prepare of incremental backup
prefer backup-my.cnf from the incremental-dir over the one in target-dir
2021-10-27 15:55:14 +02:00
Sergei Golubchik
2be6804650 MDEV-26791 MariaBackup logs compression provider plugins as encryption plugin 2021-10-27 15:55:14 +02:00
Sergei Golubchik
b91acd405a MDEV-26773 MariaBackup backup does not work with compression providers
make mariabackup to load not only encryption but also provider plugins.
2021-10-27 15:55:14 +02:00
Kartik Soneji
bf8b699f64 MDEV-12933 sort out the compression library chaos
bzip2/lz4/lzma/lzo/snappy compression is now provided via *services*

they're almost like normal services, but in include/providers/
and they're supposed to provide exactly the same interface
as original compression libraries (but not everything,
only enough of if for the code to compile).

the services are implemented via dummy functions that return
corresponding error values (LZMA_PROG_ERROR, LZO_E_INTERNAL_ERROR, etc).

the actual compression libraries are linked into corresponding
provider plugins. Providers are daemon plugins that when loaded
replace service pointers to point to actual compression functions.

That is, run-time dependency on compression libraries is now on plugins,
and the server doesn't need any compression libraries to run, but
will automatically support the compression when a plugin is loaded.

InnoDB and Mroonga use compression plugins now. RocksDB doesn't,
because it comes with standalone utility binaries that cannot
load plugins.
2021-10-27 15:55:14 +02:00
Marko Mäkelä
79185bd056 Merge 10.6 into 10.7 2021-09-24 15:32:39 +03:00
Marko Mäkelä
d95361107c Merge 10.5 into 10.6 2021-09-24 14:38:52 +03:00
Marko Mäkelä
7e2b42324c Merge 10.4 into 10.5 2021-09-24 08:42:23 +03:00
Marko Mäkelä
9024498e88 Merge 10.3 into 10.4 2021-09-22 18:26:54 +03:00
Marko Mäkelä
b46cf33ab8 Merge 10.2 into 10.3 2021-09-22 18:01:41 +03:00
Vladislav Vaintroub
b1351c1594 MDEV-26574 An improper locking bug due to unreleased lock in the ds_xbstream.cc
release lock in all as cases n xbstream_open, also fix the case where malloc would return NULL.
2021-09-15 14:55:45 +02:00
Vladislav Vaintroub
8937762ead MDEV-26573 : A static analyzer warning about ds_archive.cc
This file had not been compiled for long time.
Remove this from the tree.
2021-09-15 14:19:24 +02:00
Marko Mäkelä
4be366111b Merge 10.6 into 10.7 2021-09-11 18:01:31 +03:00
Marko Mäkelä
15139964d5 Merge 10.5 into 10.6 2021-09-11 17:55:27 +03:00
Vicențiu Ciorbaru
7c33ecb665 Merge remote-tracking branch 'upstream/10.4' into 10.5 2021-09-10 17:16:18 +03:00
Vicențiu Ciorbaru
de7e027d5e Merge remote-tracking branch 'upstream/10.3' into 10.4 2021-09-09 09:23:35 +03:00
Vicențiu Ciorbaru
b85b8348e7 Merge branch '10.2' into 10.3 2021-09-07 16:32:35 +03:00
Vladislav Vaintroub
d6b7738dcc Fix potential null pointer access after the allocation error 2021-09-01 18:21:34 +02:00
Marko Mäkelä
58aaa67064 Merge 10.6 into 10.7 2021-08-31 11:01:19 +03:00
Marko Mäkelä
e94172c2a0 Merge 10.5 into 10.6 2021-08-31 11:00:41 +03:00
Marko Mäkelä
e62120cec7 Merge 10.4 into 10.5 2021-08-31 10:04:56 +03:00
Marko Mäkelä
0464761126 Merge 10.3 into 10.4 2021-08-31 09:22:21 +03:00
Marko Mäkelä
e835cc851e Merge 10.2 into 10.3 2021-08-31 08:36:59 +03:00
Sergei Golubchik
fe2a7048e7 typo fixed 2021-08-26 15:13:48 +02:00
Marko Mäkelä
64f7dffcc7 Merge 10.6 into 10.7 2021-08-23 11:28:08 +03:00
Marko Mäkelä
49f95c4065 Merge 10.5 into 10.6 2021-08-23 11:21:33 +03:00
Marko Mäkelä
2c9f2a4c8c Merge 10.4 into 10.5 2021-08-23 11:10:59 +03:00
Vladislav Vaintroub
1002703baa MDEV-19712 backup stages commented out.
Remove commented out code, so that occasional reader is not confused.
2021-08-20 00:25:43 +02:00
Marko Mäkelä
3bf42eb21b Merge 10.6 into 10.7 2021-08-19 13:03:48 +03:00
Marko Mäkelä
f3fcf5f45c Merge 10.5 to 10.6 2021-08-19 12:25:00 +03:00
Marko Mäkelä
4a25957274 Merge 10.4 into 10.5 2021-08-18 18:22:35 +03:00
Vladislav Vaintroub
582cf12f94 mariabackup - fix string format in error message 2021-08-11 11:40:56 +02:00
Marko Mäkelä
ddcb242b3c Merge 10.6 into 10.7 2021-08-04 11:52:39 +03:00
Oleksandr Byelkin
6efb5e9f5e Merge branch '10.5' into 10.6 2021-08-02 10:11:41 +02:00
Oleksandr Byelkin
ae6bdc6769 Merge branch '10.4' into 10.5 2021-07-31 23:19:51 +02:00
Oleksandr Byelkin
7841a7eb09 Merge branch '10.3' into 10.4 2021-07-31 22:59:58 +02:00
Leandro Pacheco
2b84e1c966 MDEV-23080: desync and pause node on BACKUP STAGE BLOCK_DDL
make BACKUP STAGE behave as FTWRL, desyncing and pausing the node
to prevent BF threads (appliers) from interfering with blocking stages.
This is needed because BF threads don't respect BACKUP MDL locks.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-07-27 08:11:41 +03:00
Marko Mäkelä
a880ef57ef MDEV-26195 fixup: Format mismatch in mariabackup
write_backup_config_file(): Use the correct format for
innodb_undo_tablespaces. The data type was changed in
commit ca501ffb04.
2021-07-24 21:42:03 +03:00
Sergei Golubchik
b34cafe9d9 cleanup: move thread_count to THD_count::value()
because the name was misleading, it counts not threads, but THDs,
and as THD_count is the only way to increment/decrement it, it
could as well be declared inside THD_count.
2021-07-24 12:37:50 +02:00
Marko Mäkelä
b50ea90063 Merge 10.2 into 10.3 2021-07-22 18:57:54 +03:00
Marko Mäkelä
ca501ffb04 MDEV-26195: Use a 32-bit data type for some tablespace fields
In the InnoDB data files, we allocate 32 bits for tablespace identifiers
and page numbers as well as tablespace flags. But, in main memory
data structures we allocate 32 or 64 bits, depending on the register
width of the processor. Let us always use 32-bit fields to eliminate
a mismatch and reduce the memory footprint on 64-bit systems.
2021-07-22 11:22:47 +03:00
Marko Mäkelä
cee37b5d26 Merge 10.6 into 10.7 2021-07-22 11:22:09 +03:00
Marko Mäkelä
641f09398f Merge 10.5 into 10.6 2021-07-22 10:11:08 +03:00
Marko Mäkelä
82d5994520 MDEV-26110: Do not rely on alignment on static allocation
It is implementation-defined whether alignment requirements
that are larger than std::max_align_t (typically 8 or 16 bytes)
will be honored by the compiler and linker.

It turns out that on IBM AIX, both alignas() and MY_ALIGNED()
only guarantees alignment up to 16 bytes.

For some data structures, specifying alignment to the CPU
cache line size (typically 64 or 128 bytes) is a mere performance
optimization, and we do not really care whether the requested
alignment is guaranteed.

But, for the correct operation of direct I/O, we do require that
the buffers be aligned at a block size boundary.

field_ref_zero: Define as a pointer, not an array.
For innochecksum, we can make this point to unaligned memory;
for anything else, we will allocate an aligned buffer from the heap.
This buffer will be used for overwriting freed data pages when
innodb_immediate_scrub_data_uncompressed=ON. And exactly that code
hit an assertion failure on AIX, in the test innodb.innodb_scrub.

log_sys.checkpoint_buf: Define as a pointer to aligned memory
that is allocated from heap.

log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf
instead of trying to allocate an aligned buffer from the stack.
2021-07-22 10:05:13 +03:00
Sergei Golubchik
6190a02f35 Merge branch '10.2' into 10.3 2021-07-21 20:11:07 +02:00
Heinz Wiesinger
751ebe44fd Add feature summary at the end of cmake.
This gives a short overview over found/missing dependencies as well
as enabled/disabled features.

Initial author Heinz Wiesinger <heinz@m2mobi.com>
Additions by Vicențiu Ciorbaru <vicentiu@mariadb.org>
* Report all plugins enabled via MYSQL_ADD_PLUGIN
* Simplify code. Eliminate duplication by making use of WITH_xxx
  variable values to set feature "ON" / "OFF" state.

Reviewed by: wlad@mariadb.com (code details) serg@mariadb.com (the idea)
2021-07-21 10:22:56 +03:00
Marko Mäkelä
ed0a7b1b3f MDEV-24626 fixup: Remove useless code
fil_ibd_create(): Remove code that should have been removed in
commit 86dc7b4d4c already.
We no longer wrote an initialized page to the file, but we would
still allocate a page image in memory and write it.

xb_space_create_file(): Remove an unnecessary page write.
(This is a functional change for Mariabackup.)
2021-07-20 17:35:03 +03:00
Oli Sennhauser
da495b1b69 Typo fix in extrabackup.cc and innobackupex.cc
Thanks to @shinguz for helping with this.
This a backport commit from 10.7
2021-07-15 10:42:54 +03:00
Oli Sennhauser
2c4d1fb544 Typo fix in extrabackup.cc and innobackupex.cc
Thanks to @shinguz for helping with this.
2021-07-12 13:29:55 +03:00
Marko Mäkelä
f778a5d5e2 MDEV-25854: Remove garbage tables after restoring a backup
In commit 1c5ae99194 (MDEV-25666)
we had changed Mariabackup so that it would no longer skip files
whose names start with #sql. This turned out to be wrong.
Because operations on such named files are not protected by any
locks in the server, it is not safe to copy them.

Not copying the files may make the InnoDB data dictionary
inconsistent with the file system. So, we must do something
in InnoDB to adjust for that.

If InnoDB is being started up without the redo log (ib_logfile0)
or with a zero-length log file, we will assume that the server
was restored from a backup, and adjust things as follows:

dict_check_sys_tables(), fil_ibd_open(): Do not complain about
missing #sql files if they would be dropped a little later.

dict_stats_update_if_needed(): Never add #sql tables to
the recomputing queue. This avoids a potential race condition when
dropping the garbage tables.

drop_garbage_tables_after_restore(): Try to drop any garbage tables.

innodb_ddl_recovery_done(): Invoke drop_garbage_tables_after_restore()
if srv_start_after_restore (a new flag) was set and we are not in
read-only mode (innodb_read_only=ON or innodb_force_recovery>3).

The tests and dbug_mariabackup_event() instrumentation
were developed by Vladislav Vaintroub, who also reviewed this.
2021-06-17 13:46:16 +03:00
Marko Mäkelä
6ba938af62 MDEV-25905: Assertion table2==NULL in dict_sys_t::add()
In commit 49e2c8f0a6 (MDEV-25743)
we made dict_sys_t::find() incompatible with the rest of the
table name hash table operations in case the table name contains
non-ASCII octets (using a compatibility mode that facilitates the
upgrade into the MySQL 5.0 filename-safe encoding) and the target
platform implements signed char.

ut_fold_string(): Remove; replace with my_crc32c(). This also makes
table name hash value calculations independent on whether char
is unsigned or signed.
2021-06-14 12:38:56 +03:00
Vladislav Vaintroub
3d6eb7afcf MDEV-25602 get rid of __WIN__ in favor of standard _WIN32
This fixed the MySQL bug# 20338 about misuse of double underscore
prefix __WIN__, which was old MySQL's idea of identifying Windows
Replace it by _WIN32 standard symbol for targeting Windows OS
(both 32 and 64 bit)

Not that connect storage engine is not fixed in this patch (must be
fixed in "upstream" branch)
2021-06-06 13:21:03 +02:00
Marko Mäkelä
a722ee88f3 Merge 10.5 into 10.6 2021-06-01 11:39:38 +03:00
Marko Mäkelä
9c7a456a92 Merge 10.4 into 10.5 2021-06-01 10:38:09 +03:00
Marko Mäkelä
77d8da57d7 Merge 10.3 into 10.4 2021-06-01 09:14:59 +03:00
Marko Mäkelä
950a220060 Merge 10.2 into 10.3 2021-06-01 08:40:59 +03:00
Vladislav Vaintroub
d3c77e08ae MDEV-20556 Remove references to "xtrabackup" and "innobackupex" in mariabackup --help 2021-05-31 12:54:21 +02:00
Vladislav Vaintroub
5bd517259f MDEV-25815 mariabackup crash or debug assert with --backup --databases-exclude
Fix regression (debug assertion or division by 0)
caused by cfd3d70ccb
2021-05-29 06:32:40 +02:00
Rucha Deodhar
4e19539c14 MDEV-22189: Change error messages inside code to have mariadb instead of
mysql

Fix: Changed error messages, rerecorded results and changed other relevant
files.
2021-05-24 11:38:13 +05:30
Marko Mäkelä
49e2c8f0a6 MDEV-25743: Unnecessary copying of table names in InnoDB dictionary
Many InnoDB data dictionary cache operations require that the
table name be copied so that it will be NUL terminated.
(For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.)

dict_table_t::is_garbage_name(): Check if a name belongs to
the background drop table queue.

dict_check_if_system_table_exists(): Remove.

dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables
SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup.

dict_sys_t::create_or_check_sys_tables(): Replaces
dict_create_or_check_foreign_constraint_tables() and
dict_create_or_check_sys_virtual().

dict_sys_t::load_table(): Replaces dict_table_get_low()
and dict_load_table().

dict_sys_t::find_table(): Renamed from get_table().

dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded
tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist.

trx_t::has_stats_table_lock(): Moved to dict0stats.cc.

Some error messages will now report table names in the internal
databasename/tablename format, instead of `databasename`.`tablename`.
2021-05-21 18:03:40 +03:00
Marko Mäkelä
5d495fc44b Merge 10.5 into 10.6 2021-05-19 09:15:54 +03:00
Marko Mäkelä
db8fb40824 Merge 10.4 into 10.5 2021-05-19 08:39:39 +03:00
Marko Mäkelä
c366845a0b MDEV-25691: Simplify handlerton::drop_database for InnoDB
The implementation of handlerton::drop_database in InnoDB is
unnecessarily complex. The minimal implementation should check
that no conflicting locks or references exist on the tables,
delete all table metadata in a single transaction, and finally
delete the tablespaces.

Note: DROP DATABASE will delete each individual table that the
SQL layer knows about, one table per transaction.
The handlerton::drop_database is basically a final cleanup step
for removing any garbage that could have been left behind
in InnoDB due to some bug, or not having atomic DDL in the past.

hash_node_t: Remove. Use the proper data type name in pointers.

dict_drop_index_tree(): Do not take the table as a parameter.
Instead, return the tablespace ID if the tablespace should be dropped
(we are dropping a clustered index tree).

fil_delete_tablespace(), fil_system_t::detach(): Return a single
detached file handle. Multi-file tablespaces cannot be deleted
via this interface.

ha_innobase::delete_table(): Remove a work-around for non-atomic DDL
and do not try to drop tables with similar-looking name.

innodb_drop_database(): Complete rewrite.

innobase_drop_database(), dict_get_first_table_name_in_db(),
row_drop_database_for_mysql(), drop_all_foreign_keys_in_db(): Remove.

row_purge_remove_clust_if_poss_low(), row_undo_ins_remove_clust_rec():
If the tablespace is to be deleted, try to evict the table definition
from the cache. Failing that, set dict_table_t::space to nullptr.

lock_release_on_rollback(): On the rollback of CREATE TABLE, release all
locks that the transaction had on the table, to avoid heap-use-after-free.
2021-05-18 12:53:40 +03:00
Marko Mäkelä
08b6fd9395 MDEV-25710: Dead code os_file_opendir() in the server
The functions fil_file_readdir_next_file(), os_file_opendir(),
os_file_closedir() became dead code in the server in MariaDB 10.4.0
with commit 09af00cbde (the removal of
the crash recovery logic for the TRUNCATE TABLE implementation that
was replaced in MDEV-13564).

os_file_opendir(), os_file_closedir(): Define as macros.
2021-05-18 12:13:18 +03:00
Marko Mäkelä
86dc7b4d4c MDEV-24626 Remove synchronous write of page0 file during file creation
During data file creation, InnoDB holds dict_sys mutex, tries to
write page 0 of the file and flushes the file. This not only causing
unnecessary contention but also a deviation from the write-ahead
logging protocol.

The clean sequence of operations is that we first start a dictionary
transaction and write SYS_TABLES and SYS_INDEXES records that identify
the tablespace. Then, we durably write a FILE_CREATE record to the
write-ahead log and create the file.

Recovery should not unnecessarily insist that the first page of each
data file that is referred to by the redo log is valid. It must be
enough that page 0 of the tablespace can be initialized based on the
redo log contents.

We introduce a new data structure deferred_spaces that keeps track
of corrupted-looking files during recovery. The data structure holds
the last LSN of a FILE_ record referring to the data file, the
tablespace identifier, and the last known file name.

There are two scenarios can happen during recovery:
i) Sufficient memory: InnoDB can reconstruct the
tablespace after parsing all redo log records.

ii) Insufficient memory(multiple apply phase): InnoDB should
store the deferred tablespace redo logs even though
tablespace is not present. InnoDB should start constructing
the tablespace when it first encounters deferred tablespace
id.

Mariabackup copies the zero filled ibd file in backup_fix_ddl() as
the extension of .new file. Mariabackup test case does page flushing
when it deals with DDL operation during backup operation.

fil_ibd_create(): Remove the write of page0 and flushing of file

fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has
zero filled page0

Datafile: Clean up the error handling, and do not report errors
if we are in the middle of recovery. The caller will check
Datafile::m_defer.

fil_node_t::deferred: Indicates whether the tablespace loading was
deferred during recovery

FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that tablespace
file was cannot be loaded.

recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to
initialize fil_space_t based on buffered metadata and records to
initialize page 0. Ignore the flags in fil_name_t, because they are
intentionally invalid.

fil_name_process(): Update deferred_spaces.

recv_sys_t::parse(): Store the redo log if the tablespace id
is present in deferred spaces

recv_sys_t::recover_low(): Should recover the first page of
the tablespace even though the tablespace instance is not
present

recv_sys_t::apply(): Initialize the deferred tablespace
before applying the deferred tablespace records

recv_validate_tablespace(): Skip the validation for deferred_spaces.

recv_rename_files(): Moved and revised from recv_sys_t::apply().
For deferred-recovery tablespaces, do not attempt to rename the
file if a deferred-recovery tablespace is associated with the name.

recv_recovery_from_checkpoint_start(): Invoke recv_rename_files()
and initialize all deferred tablespaces before applying redo log.

fil_node_t::read_page0(): Skip page0 validation if the tablespace
is deferred

buf_page_create_deferred(): A variant of buf_page_create() when
the fil_space_t is not available yet

This is joint work with Thirunarayanan Balathandayuthapani,
who implemented an initial prototype.
2021-05-17 18:12:33 +03:00
Marko Mäkelä
1c5ae99194 MDEV-25666: After backup, "Could not find a valid tablespace file"
Ever since MDEV-18518 made DDL operations mostly crash-safe inside InnoDB,
it became obvious that Mariabackup might not be entirely safe with regard to
concurrent DDL operations.

check_if_skip_table(): Do not skip files whose name starts with #sql.
We cannot know whether a DDL operation is in progress and the table
might in fact be needed later.
2021-05-14 08:36:14 +03:00
Marko Mäkelä
a0b133f431 MDEV-25312 fixup: Invoke fil_space_t::rename() correctly 2021-05-14 08:24:43 +03:00
Marko Mäkelä
916b237b3f Merge 10.5 into 10.6 2021-05-07 15:00:27 +03:00
Vladislav Vaintroub
a5b3982585 MDEV-25613 assertion (file_system.n_open > 0) failed
Remove operations on fil_system.n_open from mariabackup, as they are not
protected by the mutex, and serve no higher purpose anyway.
2021-05-07 08:13:28 +02:00
Marko Mäkelä
d2e2d32933 Merge 10.5 into 10.6 2021-04-14 12:32:27 +03:00
Marko Mäkelä
6c3e860cbf Merge 10.4 into 10.5 2021-04-14 11:35:39 +03:00
Marko Mäkelä
5008171b05 Merge 10.3 into 10.4 2021-04-14 10:33:59 +03:00
Marko Mäkelä
6e6318b29b Merge 10.2 into 10.3 2021-04-13 10:26:01 +03:00
Julius Goryavsky
3eecb8db22 MDEV-25356: SST scripts should use the new mariabackup interface
SST scripts for Galera should use the new mariabackup interface
instead of the innobackupex interface, which is currently only
supported for compatibility reasons.

This commit converts the SST script for mariabackup to use the
new interface. It does not need separate tests, as any problems
will be seen as failures when running multiple tests for the
mariabackup-based SST.
2021-04-11 17:07:36 +02:00
Julius Goryavsky
bf1e09e0c4 Removed extra spaces in generated command lines (minor "cosmetic"
change after MDEV-24197)
2021-04-11 02:27:03 +02:00
Julius Goryavsky
8ff0ac45dc MDEV-25328: --innodb command line option causes mariabackup to fail
This patch fixes an issue with launching mariabackup during SST
(when used with Galera), when during bootstrap mariabackup receives
the "--innodb" option, which is incorrectly interpreted as shortcut
for "--innodb-force-recovery". This patch does not require separate
test for mtr, as the problem is visible in general testing on
buildbot.
2021-04-11 02:26:52 +02:00
Marko Mäkelä
450c017c2d Merge 10.2 into 10.3 2021-04-09 14:32:06 +03:00
Marko Mäkelä
cf552f5886 MDEV-25312 Replace fil_space_t::name with fil_space_t::name()
A consistency check for fil_space_t::name is causing recovery failures
in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field
altogether.

fil_space_t::name was more or less a copy of dict_table_t::name
(except for some special cases), and it was not being used for
anything useful.

There used to be a name_hash, but it had been removed already in
commit a75dbfd718 (MDEV-12266).

We will also remove os_normalize_path(), OS_PATH_SEPARATOR,
OS_PATH_SEPATOR_ALT. On Microsoft Windows, we will treat \ and /
roughly in the same way. The intention is that for per-table
tablespaces, the filenames will always follow the pattern
prefix/databasename/tablename.ibd. (Any \ in the prefix must not
be converted.)

ut_basename_noext(): Remove (unused function).

read_link_file(): Replaces RemoteDatafile::read_link_file().
We will ensure that the last two path component separators are
forward slashes (converting up to 2 trailing backslashes on
Microsoft Windows), so that everywhere else we can
assume that data file names end in "/databasename/tablename.ibd".

Note: On Microsoft Windows, path names that start with \\?\ must
not contain / as path component separators. Previously, such paths
did work in the DATA DIRECTORY argument of InnoDB tables.

Reviewed by: Vladislav Vaintroub
2021-04-07 18:01:13 +03:00
Julius Goryavsky
fb9d151934 MDEV-25321: mariabackup failed if password is passed via environment variable
The mariabackup interface currently supports passing a password
through an explicit command line variable, but does not support
passing a password through the MYSQL_PWD environment variable.
At the same time, the Galera SST script for mariabackup uses
the environment variable to pass the password, which leads
(in some cases) to an unsuccessful launch of mariabackup and
to the inability to start the cluster. This patch fixes this
issue. It does not need a separate test, as the problem is
visible in general testing on buildbot.
2021-04-01 21:47:30 +02:00
Srinidhi Kaushik
5bc5ecce08 MDEV-24197: Add "innodb_force_recovery" for "mariabackup --prepare"
During the prepare phase of restoring backups, "mariabackup" does
not seem to allow (or recognize) the option "innodb_force_recovery"
for the embedded InnoDB server instance that it starts.

If page corruption observed during page recovery, the prepare step
fails. While this is indeed the correct behavior ideally, allowing
this option to be set in case of emergencies might be useful when
the current backup is the only copy available. Some error messages
during "--prepare" suggest to set "innodb_force_recovery" to 1:

  [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.

For backwards compatibility, "mariabackup --innobackupex --apply-log"
should also have this option.

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
2021-04-01 13:34:40 +03:00
Vladislav Vaintroub
08cb5d8483 MDEV-25221 Do not remove source file, if copy_file() fails in mariabackup --move-back
Remove an incompletely copied destination file.
2021-03-31 14:23:56 +02:00
Eugene Kosov
1274bb7729 MDEV-25223 change fil_system_t::space_list and fil_system_t::named_spaces from UT_LIST to ilist
Mostly a refactoring. Also, debug functions were added for ease of life while debugging
2021-03-24 15:15:18 +03:00
Marko Mäkelä
1055cf967f Merge 10.5 into 10.6 2021-03-20 18:40:07 +02:00
David Carlier
1bacab8ab9 mariabackup little FreeBSD update support.
In this platform, it s better not to rely on optional proc filesystem presence.
 Using native API to retrieve binary absolute path instead.
2021-03-20 15:29:56 +11:00
Eugene Kosov
dbe941e06f cleanup: os_thread_create -> std::thread 2021-03-19 11:44:28 +03:00
Eugene Kosov
62e4aaa240 cleanup: os_thread_sleep() -> std::this_thread::sleep_for()
std version has an advantage of a more convenient units implementation from
std::chrono. Now it's no need to multipy/divide to bring anything to
micro seconds.
2021-03-19 11:44:03 +03:00
Marko Mäkelä
783625d78f MDEV-24883 add io_uring support for tpool
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.

aio_uring: class which wraps io_uring stuff

aio_uring::bind()/unbind(): optional optimization

aio_uring::submit_io(): mutex prevents data race. liburing calls are
thread-unsafe. But if you look into it's implementation you'll see
atomic operations. They're used for synchronization between kernel and
user-space only. That's why our own synchronization is still needed.

For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.

aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).

This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
2021-03-15 11:30:17 +02:00
Marko Mäkelä
a43ff483fa Merge 10.5 into 10.6 2021-03-11 20:20:07 +02:00
Marko Mäkelä
a4b7232b2c Merge 10.4 into 10.5 2021-03-11 20:09:34 +02:00
Marko Mäkelä
7a4fbb55b0 MDEV-25105 Remove innodb_checksum_algorithm values none,innodb,...
Historically, InnoDB supported a buggy page checksum algorithm that did not
compute a checksum over the full page. Later, well before MySQL 4.1
introduced .ibd files and the innodb_file_per_table option, the algorithm
was corrected and the first 4 bytes of each page were redefined to be
a checksum.

The original checksum was so slow that an option to disable page checksum
was introduced for benchmarketing purposes.

The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set
extension, which includes instructions for faster computation of CRC-32C.
In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was
implemented to make of that. As that option was changed to be the default
in MySQL 5.7, a bug was found on big-endian platforms and some work-around
code was added to weaken that checksum further. MariaDB disables that
work-around by default since MDEV-17958.

Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER
and ARM and also for IA-32/AMD64, making use of carry-less multiplication
where available.

Long story short, innodb_checksum_algorithm=crc32 is faster and more secure
than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb.
It should have removed any need to use innodb_checksum_algorithm=none.

The setting innodb_checksum_algorithm=crc32 is the default in
MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5,
MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default.
It is even faster and more secure.

The default settings in MariaDB do allow old data files to be read,
no matter if a worse checksum algorithm had been used.
(Unfortunately, before innodb_checksum_algorithm=full_crc32,
the data files did not identify which checksum algorithm is being used.)

The non-default settings innodb_checksum_algorithm=strict_crc32 or
innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C
checksums. The incompatibility with old data files is why they are
not the default.

The newest server not to support innodb_checksum_algorithm=crc32
were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life.
A valid reason for using innodb_checksum_algorithm=innodb could have
been the ability to downgrade. If it is really needed, data files
can be converted with an older version of the innochecksum utility.

Because there is no good reason to allow data files to be written
with insecure checksums, we will reject those option values:

    innodb_checksum_algorithm=none
    innodb_checksum_algorithm=innodb
    innodb_checksum_algorithm=strict_none
    innodb_checksum_algorithm=strict_innodb

Furthermore, the following innochecksum options will be removed,
because only strict crc32 will be supported:

    innochecksum --strict-check=crc32
    innochecksum -C crc32
    innochecksum --write=crc32
    innochecksum -w crc32

If a user wishes to convert a data file to use a different checksum
(so that it might be used with the no-longer-supported
MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE
nor system tablespace format changes that were made in MariaDB 10.3),
then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or
MySQL 5.7 can be used.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-03-11 12:46:18 +02:00
David CARLIER
1dff411e84 arguments overflow fix proposal. the list is assumed to be implictly null terminated at usage time. 2021-03-09 15:51:38 +02:00
David CARLIER
e3a597378e mariabackup utility, binary path implementation for Mac.
implements in a native way get_exepath which gives reliably the full
path.
2021-03-09 15:51:38 +02:00
Marko Mäkelä
d346763479 Merge 10.5 into 10.6 2021-03-08 10:51:31 +02:00
Marko Mäkelä
a5d3c1c819 Merge 10.4 into 10.5 2021-03-08 10:16:20 +02:00
Marko Mäkelä
a26e7a3726 Merge 10.3 into 10.4 2021-03-08 09:39:54 +02:00
Marko Mäkelä
bcd160753c Merge 10.2 into 10.3 2021-03-05 10:06:42 +02:00
Vladislav Vaintroub
545cba13eb MDEV-22929 fixup. Print "completed OK!" if page corruption and --log-innodb-page-corruption
Since we do not stop at corrupted page error, there is no reason to log
a backup error.
2021-03-05 09:04:30 +01:00
Marko Mäkelä
420f8e24ab MDEV-24854: Change innodb_flush_method=O_DIRECT by default
We have innodb_use_native_aio=ON by default since the introduction of
that parameter in commit 2f9fb41b05
(MySQL 5.5 and MariaDB 5.5).

However, to really benefit from the setting, the files should be
opened in O_DIRECT mode, to bypass the file system cache.
In this way, the reads and writes can be submitted with DMA, using
the InnoDB buffer pool directly, and no processor cycles need to be
used for copying data. The use of O_DIRECT benefits not only the
current libaio implementation, but also liburing.

os_file_set_nocache(): Test innodb_flush_method in the function,
not in the callers.
2021-02-20 11:58:58 +02:00
Marko Mäkelä
b19ec8848c Merge 10.5 into 10.6 2021-02-11 09:26:53 +02:00
Monty
5d6ad2ad66 Added 'const' to arguments in get_one_option and find_typeset()
One should not change the program arguments!
This change also reduces warnings from the icc compiler.

Almost all changes are just syntax changes (adding const to
'get_one_option function' declarations).

Other changes:
- Added a few cast of 'argument' from 'const char*' to 'char *'. This
  was mainly in calls to 'external' functions we don't have control of.
- Ensure that all reset of 'password command line argument' are similar.
  (In almost all cases it was just adding a comment and a cast)
- In mysqlbinlog.cc and mysqld.cc there was a few cases that changed
  the command line argument. These places where changed to instead allocate
  the option in a MEM_ROOT to avoid changing the argument. Some of this
  code was changed to ensure that different programs did parsing the
  same way. Added a test case for the changes in mysqlbinlog.cc
- Changed a few variables that took their value from command line options
  from 'char *' to 'const char *'.
2021-02-08 12:16:29 +02:00
Marko Mäkelä
92abdcca5a Merge 10.5 into 10.6 2021-01-07 09:08:09 +02:00
Marko Mäkelä
a993310593 MDEV-24537 innodb_max_dirty_pages_pct_lwm=0 lost its special meaning
In commit 3a9a3be1c6 (MDEV-23855)
some previous logic was replaced with the condition
dirty_pct < srv_max_dirty_pages_pct_lwm, which caused
the default value of the parameter innodb_max_dirty_pages_pct_lwm=0
to lose its special meaning: 'refer to innodb_max_dirty_pages_pct instead'.

This implicit special meaning was visible in the function
af_get_pct_for_dirty(), which was removed in
commit f0c295e2de (MDEV-24369).

page_cleaner_flush_pages_recommendation(): Restore the special
meaning that was removed in MDEV-24369.

buf_flush_page_cleaner(): If srv_max_dirty_pages_pct_lwm==0.0,
refer to srv_max_buf_pool_modified_pct. This fixes the observed
performance regression due to excessive page flushing.

buf_pool_t::page_cleaner_wakeup(): Revise the wakeup condition.

innodb_init(): Do initialize srv_max_io_capacity in Mariabackup.
It was previously constantly 0, which caused mariadb-backup --prepare
to hang in buf_flush_sync(), making no progress.
2021-01-06 13:53:14 +02:00
Oleksandr Byelkin
02e7bff882 Merge commit '10.4' into 10.5 2021-01-06 10:53:00 +01:00
Oleksandr Byelkin
478b83032b Merge branch '10.3' into 10.4 2020-12-25 09:13:28 +01:00
Oleksandr Byelkin
25561435e0 Merge branch '10.2' into 10.3 2020-12-23 19:28:02 +01:00
Marko Mäkelä
0aa02567dd Merge 10.3 into 10.4 2020-12-23 14:52:59 +02:00
Marko Mäkelä
fa1aef39eb Merge 10.2 into 10.3 2020-12-23 14:25:45 +02:00
Vlad Lesin
719da2c4cc MDEV-22810 mariabackup does not honor open_files_limit from option during backup prepare
open_files_limit option was processed only for --backup, but not for
--prepare.
2020-12-16 10:23:41 +03:00
Marko Mäkelä
ff5d306e29 MDEV-21452: Replace ib_mutex_t with mysql_mutex_t
SHOW ENGINE INNODB MUTEX functionality is completely removed,
as are the InnoDB latching order checks.

We will enforce innodb_fatal_semaphore_wait_threshold
only for dict_sys.mutex and lock_sys.mutex.

dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex.

lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex.

FIXME: srv_sys should be removed altogether; it is duplicating tpool
functionality.

fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must
not hold fil_system.mutex.

fil_close_all_files(): To prevent SAFE_MUTEX warnings for
fil_space_destroy_crypt_data(), we must not hold fil_system.mutex
while invoking fil_space_free_low() on a detached tablespace.
2020-12-15 17:56:18 +02:00
Marko Mäkelä
38fd7b7d91 MDEV-21452: Replace all direct use of os_event_t
Let us replace os_event_t with mysql_cond_t, and replace the
necessary ib_mutex_t with mysql_mutex_t so that they can be
used with condition variables.

Also, let us replace polling (os_thread_sleep() or timed waits)
with plain mysql_cond_wait() wherever possible.

Furthermore, we will use the lightweight srw_mutex for trx_t::mutex,
to hopefully reduce contention on lock_sys.mutex.

FIXME: Add test coverage of
mariabackup --backup --kill-long-queries-timeout
2020-12-15 17:56:17 +02:00
Marko Mäkelä
9ecd766526 Merge 10.5 into 10.6 2020-12-14 18:02:40 +02:00
Marko Mäkelä
f24b738318 MDEV-24313 (2 of 2): Silently ignored innodb_use_native_aio=1
In commit 5e62b6a5e0 (MDEV-16264)
the logic of os_aio_init() was changed so that it will never fail,
but instead automatically disable innodb_use_native_aio (which is
enabled by default) if the io_setup() system call would fail due
to resource limits being exceeded. This is questionable, especially
because falling back to simulated AIO may lead to significantly
reduced performance.

srv_n_file_io_threads, srv_n_read_io_threads, srv_n_write_io_threads:
Change the data type from ulong to uint.

os_aio_init(): Remove the parameters, and actually return an error code.

thread_pool::configure_aio(): Do not silently fall back to simulated AIO.

Reviewed by: Vladislav Vaintroub
2020-12-14 15:27:03 +02:00
Marko Mäkelä
8677c14e65 MDEV-24391 heap-use-after-free in fil_space_t::flush_low()
We observed a race condition that involved two threads
executing fil_flush_file_spaces() and one thread
executing fil_delete_tablespace(). After one of the
fil_flush_file_spaces() observed that
space.needs_flush_not_stopping() is set and was
releasing the fil_system.mutex, the other fil_flush_file_spaces()
would complete the execution of fil_space_t::flush_low() on
the same tablespace. Then, fil_delete_tablespace() would
destroy the object, because the value of fil_space_t::n_pending
did not prevent that. Finally, the fil_flush_file_spaces() would
resume execution and invoke fil_space_t::flush_low() on the freed
object.

This race condition was introduced in
commit 118e258aaa of MDEV-23855.

fil_space_t::flush(): Add a template parameter that indicates
whether the caller is holding a reference to prevent the
tablespace from being freed.

buf_dblwr_t::flush_buffered_writes_completed(),
row_quiesce_table_start(): Acquire a reference for the duration
of the fil_space_t::flush_low() operation. It should be impossible
for the object to be freed in these code paths, but we want to
satisfy the debug assertions.

fil_space_t::flush_low(): Do not increment or decrement the
reference count, but instead assert that the caller is holding
a reference.

fil_space_extend_must_retry(), fil_flush_file_spaces():
Acquire a reference before releasing fil_system.mutex.
This is what will fix the race condition.
2020-12-11 09:05:26 +02:00
Marko Mäkelä
1eb59c307d MDEV-24340 Unique final message of InnoDB during shutdown
innobase_space_shutdown(): Remove. We want this step to be executed
before the message "InnoDB: Shutdown completed; log sequence number "
is output by innodb_shutdown(). It used to be executed after that step.

innodb_shutdown(): Duplicate the code that used to live in
innobase_space_shutdown().

innobase_init_abort(): Merge with innobase_space_shutdown().
2020-12-04 11:46:47 +02:00
Marko Mäkelä
a13fac9eee Merge 10.5 into 10.6 2020-12-03 08:12:47 +02:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Vlad Lesin
e30a05f454 MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered
Post-push Windows compilation errors fix.
2020-12-01 18:33:10 +03:00
Marko Mäkelä
81ab9ea63f Merge 10.2 into 10.3 2020-12-01 14:55:46 +02:00
Vlad Lesin
e6b3e38d62 MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered
The new option --log-innodb-page-corruption is introduced.

When this option is set, backup is not interrupted if innodb corrupted
page is detected. Instead it logs all found corrupted pages in
innodb_corrupted_pages file in backup directory and finishes with error.

For incremental backup corrupted pages are also copied to .delta file,
because we can't do LSN check for such pages during backup,
innodb_corrupted_pages will also be created in incremental backup
directory.

During --prepare, corrupted pages list is read from the file just after
redo log is applied, and each page from the list is checked if it is allocated
in it's tablespace or not. If it is not allocated, then it is zeroed out,
flushed to the tablespace and removed from the list. If all pages are removed
from the list, then --prepare is finished successfully and
innodb_corrupted_pages file is removed from backup directory. Otherwise
--prepare is finished with error message and innodb_corrupted_pages contains
the list of the pages, which are detected as corrupted during backup, and are
allocated in their tablespaces, what means backup directory contains corrupted
innodb pages, and backup can not be considered as consistent.

For incremental --prepare corrupted pages from .delta files are applied
to the base backup, innodb_corrupted_pages is read from both base in
incremental directories, and the same action is proceded for corrupted
pages list as for full --prepare. innodb_corrupted_pages file is
modified or removed only in base directory.

If DDL happens during backup, it is also processed at the end of backup
to have correct tablespace names in innodb_corrupted_pages.
2020-12-01 08:08:57 +03:00
Marko Mäkelä
533a13af06 Merge 10.3 into 10.4 2020-11-03 14:49:17 +02:00
Marko Mäkelä
09a1f0075a Merge 10.5 into 10.6 2020-11-02 12:49:19 +02:00
Oleksandr Byelkin
8e1e2856f2 Merge branch '10.4' into 10.5 2020-11-01 14:26:15 +01:00
Oleksandr Byelkin
80c951ce28 Merge branch '10.3' into 10.4 2020-10-31 21:06:49 +01:00
Oleksandr Byelkin
794f665139 Merge branch '10.2' into 10.3 2020-10-30 17:23:53 +01:00
Marko Mäkelä
898521e2dd Merge 10.4 into 10.5 2020-10-30 11:15:30 +02:00
Marko Mäkelä
7b2bb67113 Merge 10.3 into 10.4 2020-10-29 13:38:38 +02:00
Vlad Lesin
6cb88685c4 MDEV-24026: InnoDB: Failing assertion: os_total_large_mem_allocated >= size upon incremental backup
mariabackup deallocated uninitialized
write_filt_ctxt.u.wf_incremental_ctxt in xtrabackup_copy_datafile() when
some table should be skipped due to parsed DDL redo log record.
2020-10-29 07:39:43 +01:00
Marko Mäkelä
a8de8f261d Merge 10.2 into 10.3 2020-10-28 10:01:50 +02:00
Marko Mäkelä
c27e53f459 MDEV-23855: Use normal mutex for log_sys.mutex, log_sys.flush_order_mutex
With an unreasonably small innodb_log_file_size, the page cleaner
thread would frequently acquire log_sys.flush_order_mutex and spend
a significant portion of CPU time spinning on that mutex when
determining the checkpoint LSN.
2020-10-26 17:53:55 +02:00
Marko Mäkelä
118e258aaa MDEV-23855: Shrink fil_space_t
Merge n_pending_ios, n_pending_ops to std::atomic<uint32_t> n_pending.
Change some more fil_space_t members to uint32_t to reduce
the memory footprint.

fil_space_t::add(), fil_ibd_create(): Attach the already opened
handle to the tablespace, and enforce the fil_system.n_open limit.

dict_boot(): Initialize fil_system.max_assigned_id.

srv_boot(): Call srv_thread_pool_init() before anything else,
so that files should be opened in the correct mode on Windows.

fil_ibd_create(): Create the file in OS_FILE_AIO mode, just like
fil_node_open_file_low() does it.

dict_table_t::is_accessible(): Replaces fil_table_accessible().

Reviewed by: Vladislav Vaintroub
2020-10-26 17:53:54 +02:00
Marko Mäkelä
45ed9dd957 MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention
Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored
for system tablespace on SSD

When the maximum configured number of file is exceeded, InnoDB will
close data files. We used to maintain a fil_system.LRU list and
a counter fil_node_t::n_pending to achieve this, at the huge cost
of multiple fil_system.mutex operations per I/O operation.

fil_node_open_file_low(): Implement a FIFO replacement policy:
The last opened file will be moved to the end of fil_system.space_list,
and files will be closed from the start of the list. However, we will
not move tablespaces in fil_system.space_list while
i_s_tablespaces_encryption_fill_table() is executing
(producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION)
because it may cause information of some tablespaces to go missing.
We also avoid this in mariabackup --backup because datafiles_iter_next()
assumes that the ordering is not changed.

IORequest: Fold more parameters to IORequest::type.

fil_space_t::io(): Replaces fil_io().

fil_space_t::flush(): Replaces fil_flush().

OS_AIO_IBUF: Remove. We will always issue synchronous reads of the
change buffer pages in buf_read_page_low().

We will always ignore some errors for background reads.

This should reduce fil_system.mutex contention a little.

fil_node_t::complete_write(): Replaces fil_node_t::complete_io().
On both read and write completion, fil_space_t::release_for_io()
will have to be called.

fil_space_t::io(): Do not acquire fil_system.mutex in the normal
code path.

xb_delta_open_matching_space(): Do not try to open the system tablespace
which was already opened. This fixes a file sharing violation in
mariabackup --prepare --incremental.

Reviewed by: Vladislav Vaintroub
2020-10-26 17:09:01 +02:00
Marko Mäkelä
3a9a3be1c6 MDEV-23855: Improve InnoDB log checkpoint performance
After MDEV-15053, MDEV-22871, MDEV-23399 shifted the scalability
bottleneck, log checkpoints became a new bottleneck.

If innodb_io_capacity is set low or innodb_max_dirty_pct_lwm is
set high and the workload fits in the buffer pool, the page cleaner
thread will perform very little flushing. When we reach the capacity
of the circular redo log file ib_logfile0 and must initiate a checkpoint,
some 'furious flushing' will be necessary. (If innodb_flush_sync=OFF,
then flushing would continue at the innodb_io_capacity rate, and
writers would be throttled.)

We have the best chance of advancing the checkpoint LSN immediately
after a page flush batch has been completed. Hence, it is best to
perform checkpoints after every batch in the page cleaner thread,
attempting to run once per second.

By initiating high-priority flushing in the page cleaner as early
as possible, we aim to make the throughput more stable.

The function buf_flush_wait_flushed() used to sleep for 10ms, hoping
that the page cleaner thread would do something during that time.
The observed end result was that a large number of threads that call
log_free_check() would end up sleeping while nothing useful is happening.

We will revise the design so that in the default innodb_flush_sync=ON
mode, buf_flush_wait_flushed() will wake up the page cleaner thread
to perform the necessary flushing, and it will wait for a signal from
the page cleaner thread.

If innodb_io_capacity is set to a low value (causing the page cleaner to
throttle its work), a write workload would initially perform well, until
the capacity of the circular ib_logfile0 is reached and log_free_check()
will trigger checkpoints. At that point, the extra waiting in
buf_flush_wait_flushed() will start reducing throughput.

The page cleaner thread will also initiate log checkpoints after each
buf_flush_lists() call, because that is the best point of time for
the checkpoint LSN to advance by the maximum amount.

Even in 'furious flushing' mode we invoke buf_flush_lists() with
innodb_io_capacity_max pages at a time, and at the start of each
batch (in the log_flush() callback function that runs in a separate
task) we will invoke os_aio_wait_until_no_pending_writes(). This
tweak allows the checkpoint to advance in smaller steps and
significantly reduces the maximum latency. On an Intel Optane 960
NVMe SSD on Linux, it reduced from 4.6 seconds to 74 milliseconds.
On Microsoft Windows with a slower SSD, it reduced from more than
180 seconds to 0.6 seconds.

We will make innodb_adaptive_flushing=OFF simply flush innodb_io_capacity
per second whenever the dirty proportion of buffer pool pages exceeds
innodb_max_dirty_pages_pct_lwm. For innodb_adaptive_flushing=ON we try
to make page_cleaner_flush_pages_recommendation() more consistent and
predictable: if we are below innodb_adaptive_flushing_lwm, let us flush
pages according to the return value of af_get_pct_for_dirty().

innodb_max_dirty_pages_pct_lwm: Revert the change of the default value
that was made in MDEV-23399. The value innodb_max_dirty_pages_pct_lwm=0
guarantees that a shutdown of an idle server will be fast. Users might
be surprised if normal shutdown suddenly became slower when upgrading
within a GA release series.

innodb_checkpoint_usec: Remove. The master task will no longer perform
periodic log checkpoints. It is the duty of the page cleaner thread.

log_sys.max_modified_age: Remove. The current span of the
buf_pool.flush_list expressed in LSN only matters for adaptive
flushing (outside the 'furious flushing' condition).
For the correctness of checkpoints, the only thing that matters is
the checkpoint age (log_sys.lsn - log_sys.last_checkpoint_lsn).
This run-time constant was also reported as log_max_modified_age_sync.

log_sys.max_checkpoint_age_async: Remove. This does not serve any
purpose, because the checkpoints will now be triggered by the page
cleaner thread. We will retain the log_sys.max_checkpoint_age limit
for engaging 'furious flushing'.

page_cleaner.slot: Remove. It turns out that
page_cleaner_slot.flush_list_time was duplicating
page_cleaner.slot.flush_time and page_cleaner.slot.flush_list_pass
was duplicating page_cleaner.flush_pass.
Likewise, there were some redundant monitor counters, because the
page cleaner thread no longer performs any buf_pool.LRU flushing, and
because there only is one buf_flush_page_cleaner thread.

buf_flush_sync_lsn: Protect writes by buf_pool.flush_list_mutex.

buf_pool_t::get_oldest_modification(): Add a parameter to specify the
return value when no persistent data pages are dirty. Require the
caller to hold buf_pool.flush_list_mutex.

log_buf_pool_get_oldest_modification(): Take the fall-back LSN
as a parameter. All callers will also invoke log_sys.get_lsn().

log_preflush_pool_modified_pages(): Replaced with buf_flush_wait_flushed().

buf_flush_wait_flushed(): Implement two limits. If not enough buffer pool
has been flushed, signal the page cleaner (unless innodb_flush_sync=OFF)
and wait for the page cleaner to complete. If the page cleaner
thread is not running (which can be the case durign shutdown),
initiate the flush and wait for it directly.

buf_flush_ahead(): If innodb_flush_sync=ON (the default),
submit a new buf_flush_sync_lsn target for the page cleaner
but do not wait for the flushing to finish.

log_get_capacity(), log_get_max_modified_age_async(): Remove, to make
it easier to see that af_get_pct_for_lsn() is not acquiring any mutexes.

page_cleaner_flush_pages_recommendation(): Protect all access to
buf_pool.flush_list with buf_pool.flush_list_mutex. Previously there
were some race conditions in the calculation.

buf_flush_sync_for_checkpoint(): New function to process
buf_flush_sync_lsn in the page cleaner thread. At the end of
each batch, we try to wake up any blocked buf_flush_wait_flushed().
If everything up to buf_flush_sync_lsn has been flushed, we will
reset buf_flush_sync_lsn=0. The page cleaner thread will keep
'furious flushing' until the limit is reached. Any threads that
are waiting in buf_flush_wait_flushed() will be able to resume
as soon as their own limit has been satisfied.

buf_flush_page_cleaner: Prioritize buf_flush_sync_lsn and do not
sleep as long as it is set. Do not update any page_cleaner statistics
for this special mode of operation. In the normal mode
(buf_flush_sync_lsn is not set for innodb_flush_sync=ON),
try to wake up once per second. No longer check whether
srv_inc_activity_count() has been called. After each batch,
try to perform a log checkpoint, because the best chances for
the checkpoint LSN to advance by the maximum amount are upon
completing a flushing batch.

log_t: Move buf_free, max_buf_free possibly to the same cache line
with log_sys.mutex.

log_margin_checkpoint_age(): Simplify the logic, and replace
a 0.1-second sleep with a call to buf_flush_wait_flushed() to
initiate flushing. Moved to the same compilation unit
with the only caller.

log_close(): Clean up the calculations. (Should be no functional
change.) Return whether flush-ahead is needed. Moved to the same
compilation unit with the only caller.

mtr_t::finish_write(): Return whether flush-ahead is needed.

mtr_t::commit(): Invoke buf_flush_ahead() when needed. Let us avoid
external calls in mtr_t::commit() and make the logic easier to follow
by having related code in a single compilation unit. Also, we will
invoke srv_stats.log_write_requests.inc() only once per
mini-transaction commit, while not holding mutexes.

log_checkpoint_margin(): Only care about log_sys.max_checkpoint_age.
Upon reaching log_sys.max_checkpoint_age where we must wait to prevent
the log from getting corrupted, let us wait for at most 1MiB of LSN
at a time, before rechecking the condition. This should allow writers
to proceed even if the redo log capacity has been reached and
'furious flushing' is in progress. We no longer care about
log_sys.max_modified_age_sync or log_sys.max_modified_age_async.
The log_sys.max_modified_age_sync could be a relic from the time when
there was a srv_master_thread that wrote dirty pages to data files.
Also, we no longer have any log_sys.max_checkpoint_age_async limit,
because log checkpoints will now be triggered by the page cleaner
thread upon completing buf_flush_lists().

log_set_capacity(): Simplify the calculations of the limit
(no functional change).

log_checkpoint_low(): Split from log_checkpoint(). Moved to the
same compilation unit with the caller.

log_make_checkpoint(): Only wait for everything to be flushed until
the current LSN.

create_log_file(): After checkpoint, invoke log_write_up_to()
to ensure that the FILE_CHECKPOINT record has been written.
This avoids ut_ad(!srv_log_file_created) in create_log_file_rename().

srv_start(): Do not call recv_recovery_from_checkpoint_start()
if the log has just been created. Set fil_system.space_id_reuse_warned
before dict_boot() has been executed, and clear it after recovery
has finished.

dict_boot(): Initialize fil_system.max_assigned_id.

srv_check_activity(): Remove. The activity count is counting transaction
commits and therefore mostly interesting for the purge of history.

BtrBulk::insert(): Do not explicitly wake up the page cleaner,
but do invoke srv_inc_activity_count(), because that counter is
still being used in buf_load_throttle_if_needed() for some
heuristics. (It might be cleaner to execute buf_load() in the
page cleaner thread!)

Reviewed by: Vladislav Vaintroub
2020-10-26 17:09:01 +02:00
Marko Mäkelä
5999d5120e MDEV-23399 fixup: Avoid crash on Mariabackup shutdown
innodb_preshutdown(): Terminate the encryption threads before
the page cleaner thread can be shut down.

innodb_shutdown(): Always wait for the encryption threads and
page cleaner to shut down.

srv_shutdown_all_bg_threads(): Wait for the encryption threads and
the page cleaner to shut down. (After an aborted startup,
innodb_shutdown() would not be called.)

row_get_background_drop_list_len_low(): Remove.

os_thread_count: Remove. Alternatively, at the end of
srv_shutdown_all_bg_threads() we could try to wait longer
for the count to reach 0. On some platforms, an assertion
os_thread_count==0 could fail even after a small delay,
even though in the core dump all threads would have exited.

srv_shutdown_threads(): Renamed from srv_shutdown_all_bg_threads().
Do not wait for the page cleaner to shut down, because the later
innodb_shutdown(), which may invoke
logs_empty_and_mark_files_at_shutdown(), assumes that it exists.
2020-10-26 15:05:41 +02:00
Vlad Lesin
985ede9203 MDEV-20755 InnoDB: Database page corruption on disk or a failed file read of tablespace upon prepare of mariabackup incremental backup
The problem:

When incremental backup is taken, delta files are created for innodb tables
which are marked as new tables during innodb ddl tracking. When such
tablespace is tried to be opened during prepare in
xb_delta_open_matching_space(), it is "created", i.e.
xb_space_create_file() is invoked, instead of opening, even if
a tablespace with the same name exists in the base backup directory.

xb_space_create_file() writes page 0 header the tablespace.
This header does not contain crypt data, as mariabackup does not have
any information about crypt data in delta file metadata for
tablespaces.

After delta file is applied, recovery process is started. As the
sequence of recovery for different pages is not defined, there can be
the situation when crypt data redo log event is executed after some
other page is read for recovery. When some page is read for recovery, it's
decrypted using crypt data stored in tablespace header in page 0, if
there is no crypt data, the page is not decryped and does not pass corruption
test.

This causes error for incremental backup --prepare for encrypted
tablespaces.

The error is not stable because crypt data redo log event updates crypt
data on page 0, and recovery for different pages can be executed in
undefined order.

The fix:

When delta file is created, the corresponding write filter copies only
the pages which LSN is greater then some incremental LSN. When new file
is created during incremental backup, the LSN of all it's pages must be
greater then incremental LSN, so there is no need to create delta for
such table, we can just copy it completely.

The fix is to copy the whole file which was tracked during incremental backup
with innodb ddl tracker, and copy it to base directory during --prepare
instead of delta applying.

There is also DBUG_EXECUTE_IF() in innodb code to avoid writing redo log
record for crypt data updating on page 0 to make the test case stable.

Note:

The issue is not reproducible in 10.5 as optimized DDL's are deprecated
in 10.5. But the fix is still useful because it allows to decrease
data copy size during backup, as delta file contains some extra info.
The test case should be removed for 10.5 as it will always pass.
2020-10-23 11:02:25 +03:00
Marko Mäkelä
1657b7a583 Merge 10.4 to 10.5 2020-10-22 17:08:49 +03:00
Marko Mäkelä
46957a6a77 Merge 10.3 into 10.4 2020-10-22 13:27:18 +03:00
Marko Mäkelä
e3d692aa09 Merge 10.2 into 10.3 2020-10-22 08:26:28 +03:00
Julius Goryavsky
888010d9dd MDEV-21951: mariabackup SST fail if data-directory have lost+found directory
To fix this, it is necessary to add an option to exclude the
database with the name "lost+found" from processing (the database
name will be checked by the check_if_skip_database_by_path() or
by the check_if_skip_database() function, and as a result
"lost+found" will be skipped).

In addition, it is necessary to slightly modify the verification
logic in the check_if_skip_database() function.

Also added a new test galera_sst_mariabackup_lost_found.test
2020-10-20 12:41:06 +02:00
Marko Mäkelä
1066312a12 MDEV-23982: Mariabackup hangs on backup
MDEV-13318 introduced a condition to Mariabackup that can cause it to
hang if the server goes idle after writing a log block that has no
payload after the 12-byte header. Normal recovery in log0recv.cc would
allow blocks with exactly 12 bytes of length, and only reject blocks
where the length field is shorter than that.
2020-10-19 20:36:05 +03:00
Marko Mäkelä
a0113683d7 Fixup 9028cc6b86
We forgot to change innodb_autoextend_increment from ULONG to
UINT (always 32-bit) in Mariabackup.
2020-10-16 07:51:37 +03:00
Marko Mäkelä
9028cc6b86 Cleanup: Make InnoDB page numbers uint32_t
InnoDB stores a 32-bit page number in page headers and in some
data structures, such as FIL_ADDR (consisting of a 32-bit page number
and a 16-bit byte offset within a page). For better compile-time
error detection and to reduce the memory footprint in some data
structures, let us use a uint32_t for the page number, instead
of ulint (size_t) which can be 64 bits.
2020-10-15 17:06:17 +03:00
Marko Mäkelä
7cffb5f6e8 MDEV-23399: Performance regression with write workloads
The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted
the performance bottleneck to the page flushing.

The configuration parameters will be changed as follows:

innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction)
innodb_lru_scan_depth=1536 (old: 1024)
innodb_max_dirty_pages_pct=90 (old: 75)
innodb_max_dirty_pages_pct_lwm=75 (old: 0)

Note: The parameter innodb_lru_scan_depth will only affect LRU
eviction of buffer pool pages when a new page is being allocated. The
page cleaner thread will no longer evict any pages. It used to
guarantee that some pages will remain free in the buffer pool. Now, we
perform that eviction 'on demand' in buf_LRU_get_free_block().
The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows:
 * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks()
 * As a buf_pool.free limit in buf_LRU_list_batch() for terminating
   the flushing that is initiated e.g., by buf_LRU_get_free_block()
The parameter also used to serve as an initial limit for unzip_LRU
eviction (evicting uncompressed page frames while retaining
ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit
of 100 or unlimited for invoking buf_LRU_scan_and_free_block().

The status variables will be changed as follows:

innodb_buffer_pool_pages_flushed: This includes also the count of
innodb_buffer_pool_pages_LRU_flushed and should work reliably,
updated one by one in buf_flush_page() to give more real-time
statistics. The function buf_flush_stats(), which we are removing,
was not called in every code path. For both counters, we will use
regular variables that are incremented in a critical section of
buf_pool.mutex. Note that show_innodb_vars() directly links to the
variables, and reads of the counters will *not* be protected by
buf_pool.mutex, so you cannot get a consistent snapshot of both variables.

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be
removed, because the page cleaner no longer deals with writing or
evicting least recently used pages, and because the single-page writes
have been removed:
* buffer_LRU_batch_flush_avg_time_slot
* buffer_LRU_batch_flush_avg_time_thread
* buffer_LRU_batch_flush_avg_time_est
* buffer_LRU_batch_flush_avg_pass
* buffer_LRU_single_flush_scanned
* buffer_LRU_single_flush_num_scan
* buffer_LRU_single_flush_scanned_per_call

When moving to a single buffer pool instance in MDEV-15058, we missed
some opportunity to simplify the buf_flush_page_cleaner thread. It was
unnecessarily using a mutex and some complex data structures, even
though we always have a single page cleaner thread.

Furthermore, the buf_flush_page_cleaner thread had separate 'recovery'
and 'shutdown' modes where it was waiting to be triggered by some
other thread, adding unnecessary latency and potential for hangs in
relatively rarely executed startup or shutdown code.

The page cleaner was also running two kinds of batches in an
interleaved fashion: "LRU flush" (writing out some least recently used
pages and evicting them on write completion) and the normal batches
that aim to increase the MIN(oldest_modification) in the buffer pool,
to help the log checkpoint advance.

The buf_pool.flush_list flushing was being blocked by
buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN
of a page is ahead of log_sys.get_flushed_lsn(), that is, what has
been persistently written to the redo log, we would trigger a log
flush and then resume the page flushing. This would unnecessarily
limit the performance of the page cleaner thread and trigger the
infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms.
The settings might not be optimal" that were suppressed in
commit d1ab89037a unless log_warnings>2.

Our revised algorithm will make log_sys.get_flushed_lsn() advance at
the start of buf_flush_lists(), and then execute a 'best effort' to
write out all pages. The flush batches will skip pages that were modified
since the log was written, or are are currently exclusively locked.
The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message
will be removed, because by design, the buf_flush_page_cleaner() should
not be blocked during a batch for extended periods of time.

We will remove the single-page flushing altogether. Related to this,
the debug parameter innodb_doublewrite_batch_size will be removed,
because all of the doublewrite buffer will be used for flushing
batches. If a page needs to be evicted from the buffer pool and all
100 least recently used pages in the buffer pool have unflushed
changes, buf_LRU_get_free_block() will execute buf_flush_lists() to
write out and evict innodb_lru_flush_size pages. At most one thread
will execute buf_flush_lists() in buf_LRU_get_free_block(); other
threads will wait for that LRU flushing batch to finish.

To improve concurrency, we will replace the InnoDB ib_mutex_t and
os_event_t native mutexes and condition variables in this area of code.
Most notably, this means that the buffer pool mutex (buf_pool.mutex)
is no longer instrumented via any InnoDB interfaces. It will continue
to be instrumented via PERFORMANCE_SCHEMA.

For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be
declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical
sections of buf_pool.flush_list_mutex should be shorter than those for
buf_pool.mutex, because in the worst case, they cover a linear scan of
buf_pool.flush_list, while the worst case of a critical section of
buf_pool.mutex covers a linear scan of the potentially much longer
buf_pool.LRU list.

mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable
with SAFE_MUTEX. Some InnoDB debug assertions need this predicate
instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner().

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[].
The number of active flush operations.

buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t
instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA
and SAFE_MUTEX instrumentation.

buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU.

buf_pool_t::done_flush_list: Condition variable for !n_flush_list.

buf_pool_t::do_flush_list: Condition variable to wake up the
buf_flush_page_cleaner when a log checkpoint needs to be written
or the server is being shut down. Replaces buf_flush_event.
We will keep using timed waits (the page cleaner thread will wake
_at least_ once per second), because the calculations for
innodb_adaptive_flushing depend on fixed time intervals.

buf_dblwr: Allocate statically, and move all code to member functions.
Use a native mutex and condition variable. Remove code to deal with
single-page flushing.

buf_dblwr_check_block(): Make the check debug-only. We were spending
a significant amount of execution time in page_simple_validate_new().

flush_counters_t::unzip_LRU_evicted: Remove.

IORequest: Make more members const. FIXME: m_fil_node should be removed.

buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex
(which we are removing).

page_cleaner_slot_t, page_cleaner_t: Remove many redundant members.

pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot().

recv_writer_thread: Remove. Recovery works just fine without it, if we
simply invoke buf_flush_sync() at the end of each batch in
recv_sys_t::apply().

recv_recovery_from_checkpoint_finish(): Remove. We can simply call
recv_sys.debug_free() directly.

srv_started_redo: Replaces srv_start_state.

SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown()
can communicate with the normal page cleaner loop via the new function
flush_buffer_pool().

buf_flush_remove(): Assert that the calling thread is holding
buf_pool.flush_list_mutex. This removes unnecessary mutex operations
from buf_flush_remove_pages() and buf_flush_dirty_pages(),
which replace buf_LRU_flush_or_remove_pages().

buf_flush_lists(): Renamed from buf_flush_batch(), with simplified
interface. Return the number of flushed pages. Clarified comments and
renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions
buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this
function, which was their only caller, and remove 2 unnecessary
buf_pool.mutex release/re-acquisition that we used to perform around
the buf_flush_batch() call. At the start, if not all log has been
durably written, wait for a background task to do it, or start a new
task to do it. This allows the log write to run concurrently with our
page flushing batch. Any pages that were skipped due to too recent
FIL_PAGE_LSN or due to them being latched by a writer should be flushed
during the next batch, unless there are further modifications to those
pages. It is possible that a page that we must flush due to small
oldest_modification also carries a recent FIL_PAGE_LSN or is being
constantly modified. In the worst case, all writers would then end up
waiting in log_free_check() to allow the flushing and the checkpoint
to complete.

buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_flush_space(): Auxiliary function to look up a tablespace for
page flushing.

buf_flush_page(): Defer the computation of space->full_crc32(). Never
call log_write_up_to(), but instead skip persistent pages whose latest
modification (FIL_PAGE_LSN) is newer than the redo log. Also skip
pages on which we cannot acquire a shared latch without waiting.

buf_flush_try_neighbors(): Do not bother checking buf_fix_count
because buf_flush_page() will no longer wait for the page latch.
Take the tablespace as a parameter, and only execute this function
when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold().

buf_flush_relocate_on_flush_list(): Declare as cold, and push down
a condition from the callers.

buf_flush_check_neighbor(): Take id.fold() as a parameter.

buf_flush_sync(): Ensure that the buf_pool.flush_list is empty,
because the flushing batch will skip pages whose modifications have
not yet been written to the log or were latched for modification.

buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables.

buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize
the counters, and report n->evicted.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_do_LRU_batch(): Return the number of pages flushed.

buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if
adaptive hash index entries are pointing to the block.

buf_LRU_get_free_block(): Do not wake up the page cleaner, because it
will no longer perform any useful work for us, and we do not want it
to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0)
writes out and evicts at most innodb_lru_flush_size pages. (The
function buf_do_LRU_batch() may complete after writing fewer pages if
more than innodb_lru_scan_depth pages end up in buf_pool.free list.)
Eliminate some mutex release-acquire cycles, and wait for the LRU
flush batch to complete before rescanning.

buf_LRU_check_size_of_non_data_objects(): Simplify the code.

buf_page_write_complete(): Remove the parameter evict, and always
evict pages that were part of an LRU flush.

buf_page_create(): Take a pre-allocated page as a parameter.

buf_pool_t::free_block(): Free a pre-allocated block.

recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block
while not holding recv_sys.mutex. During page allocation, we may
initiate a page flush, which in turn may initiate a log flush, which
would require acquiring log_sys.mutex, which should always be acquired
before recv_sys.mutex in order to avoid deadlocks. Therefore, we must
not be holding recv_sys.mutex while allocating a buffer pool block.

BtrBulk::logFreeCheck(): Skip a redundant condition.

row_undo_step(): Do not invoke srv_inc_activity_count() for every row
that is being rolled back. It should suffice to invoke the function in
trx_flush_log_if_needed() during trx_t::commit_in_memory() when the
rollback completes.

sync_check_enable(): Remove. We will enable innodb_sync_debug from the
very beginning.

Reviewed by: Vladislav Vaintroub
2020-10-15 17:04:56 +03:00
Marko Mäkelä
a9550c47e4 MDEV-16264 fixup: Remove unused code and data
LATCH_ID_OS_AIO_READ_MUTEX,
LATCH_ID_OS_AIO_WRITE_MUTEX,
LATCH_ID_OS_AIO_LOG_MUTEX,
LATCH_ID_OS_AIO_IBUF_MUTEX,
LATCH_ID_OS_AIO_SYNC_MUTEX: Remove. The tpool is not instrumented.

lock_set_timeout_event(): Remove.

srv_sys_mutex_key, srv_sys_t::mutex, SYNC_THREADS: Remove.

srv_slot_t::suspended: Remove. We only ever assigned this data member
true, so it is redundant.

ib_wqueue_wait(), ib_wqueue_timedwait(): Remove.

os_thread_join(): Remove.

os_thread_create(), os_thread_exit(): Remove redundant parameters.

These were missed in commit 5e62b6a5e0.
2020-09-30 14:28:11 +03:00
Marko Mäkelä
6ce0a6f9ad Merge 10.5 into 10.6 2020-09-24 10:21:26 +03:00
Marko Mäkelä
882ce206db Merge 10.4 into 10.5 2020-09-23 11:32:43 +03:00
Marko Mäkelä
952a028a52 Merge 10.3 into 10.4
We omit commit a3bdce8f1e
and commit a0e2a293bc
because they would make the test galera_3nodes.galera_gtid_2_cluster
fail and disable it.
2020-09-21 17:42:02 +03:00
Marko Mäkelä
2cf489d430 Merge 10.2 into 10.3 2020-09-21 16:39:23 +03:00
Marko Mäkelä
407d170c92 MDEV-23711 fixup: GCC -Og -Wmaybe-uninitialized 2020-09-21 16:29:29 +03:00
Vlad Lesin
0a224edc3e MDEV-23711 make mariabackup innodb redo log read error message more clear
log_group_read_log_seg() returns error when:

1) Calculated log block number does not correspond to read log block
number. This can be caused by:
  a) Garbage or an incompletely written log block. We can exclude this
  case by checking log block checksum if it's enabled(see innodb-log-checksums,
  encrypted log block contains checksum always).
  b) The log block is overwritten. In this case checksum will be correct and
  read log block number will be greater then requested one.

2) When log block length is wrong. In this case recv_sys->found_corrupt_log
is set.

3) When redo log block checksum is wrong. In this case innodb code
writes messages to error log with the following prefix: "Invalid log
block checksum."

The fix processes all the cases above.
2020-09-21 12:29:52 +03:00
Marko Mäkelä
3a423088ac Merge 10.3 into 10.4 2020-09-21 12:29:00 +03:00
Marko Mäkelä
cbcb4ecabb Merge 10.2 into 10.3 2020-09-21 11:04:04 +03:00
Vladislav Vaintroub
ccbe6bb6fc MDEV-19935 Create unified CRC-32 interface
Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction
after Intel whitepaper, and is ported from rocksdb code.

Optimized ARM and POWER CRC32 were already present in mysys.
2020-09-17 16:07:37 +02:00
Vlad Lesin
80075ba011 MDEV-19264 Better support MariaDB GTID for Mariabackup's --slave-info option
Parse SHOW SLAVE STATUS output for the "Using_Gtid" column. If the value
is "No", then old log file and position is backed up, otherwise gtid_slave_pos
is backed up.
2020-09-14 11:14:50 +03:00
Marko Mäkelä
938db04898 Cleanup: Remove os0proc.* 2020-09-03 16:40:42 +03:00
Oleksandr Byelkin
5edf3e0388 Merge branch '10.5' into 10.6 2020-09-02 14:36:14 +02:00
Marko Mäkelä
2fa9f8c53a Merge 10.3 into 10.4 2020-08-20 11:01:47 +03:00
Marko Mäkelä
de0e7cd72a Merge 10.2 into 10.3 2020-08-20 09:12:16 +03:00
Marko Mäkelä
309302a3da MDEV-23475 InnoDB performance regression for write-heavy workloads
In commit fe39d02f51 (MDEV-20638)
we removed some wake-up signaling of the master thread that should
have been there, to ensure a steady log checkpointing workload.

Common sense suggests that the commit omitted some necessary calls
to srv_inc_activity_count(). But, an attempt to add the call to
trx_flush_log_if_needed_low() as well as to reinstate the function
innobase_active_small() did not restore the performance for the
case where sync_binlog=1 is set.

Therefore, we will revert the entire commit in MariaDB Server 10.2.
In MariaDB Server 10.5, adding a srv_inc_activity_count() call to
trx_flush_log_if_needed_low() did restore the performance, so we
will not revert MDEV-20638 across all versions.
2020-08-19 11:18:56 +03:00
Marko Mäkelä
4c50120d14 MDEV-23474 InnoDB fails to restart after SET GLOBAL innodb_log_checksums=OFF
Regretfully, the parameter innodb_log_checksums was introduced
in MySQL 5.7.9 (the first GA release of that series) by
mysql/mysql-server@af0acedd88
which partly replaced a parameter that had been introduced in 5.7.8
mysql/mysql-server@22ba38218e
as innodb_log_checksum_algorithm.

Given that the CRC-32C operations are accelerated on many processor
implementations (AMD64 with SSE4.2; since MDEV-22669 also on IA-32
with SSE4.2, POWER 8 and later, ARMv8 with some extensions)
and by lookup tables when only generic SISD instructions are available,
there should be no valid reason to disable checksums.

In MariaDB 10.5.2, as a preparation for MDEV-12353, MDEV-19543 deprecated
and ignored the parameter innodb_log_checksums altogether. This should
imply that after a clean shutdown with innodb_log_checksums=OFF one
cannot upgrade to MariaDB Server 10.5 at all.

Due to these problems, let us deprecate the parameter innodb_log_checksums
and honor it only during server startup.
The command SET GLOBAL innodb_log_checksums will always set the
parameter to ON.
2020-08-18 16:46:07 +03:00
Marko Mäkelä
cf87f3e08c Merge 10.4 into 10.5 2020-08-14 11:33:35 +03:00
Marko Mäkelä
2f7b37b021 Merge 10.3 into 10.4, except MDEV-22543
Also, fix GCC -Og -Wmaybe-uninitialized in run_backup_stage()
2020-08-13 18:48:41 +03:00
Marko Mäkelä
4bd56a697f Merge 10.2 into 10.3 2020-08-13 18:18:25 +03:00
Marko Mäkelä
31aef3ae99 Fix GCC 10.2.0 -Og -Wmaybe-uninitialized
For some reason, GCC emits more -Wmaybe-uninitialized warnings
when using the flag -Og than when using -O2. Many of the warnings
look genuine.
2020-08-11 15:58:16 +03:00
Marko Mäkelä
e685809a3b MDEV-23397 Remove deprecated InnoDB options in 10.6 2020-08-04 12:51:59 +03:00
Marko Mäkelä
9a7948e3f6 Merge 10.5 into 10.6 2020-08-04 07:55:16 +03:00
Marko Mäkelä
bbd70fcc43 MDEV-23379 Deprecate&ignore InnoDB concurrency throttling parameters
The parameters innodb_thread_concurrency and innodb_commit_concurrency
were useful years ago when both computing resources and the implementation
of some shared data structures were limited. MySQL 5.0 or 5.1 had trouble
scaling beyond 8 concurrent connections. Most of the scalability bottlenecks
have been removed since then, and the transactions per second delivered
by MariaDB Server 10.5 should not dramatically drop upon exceeding the
'optimal' number of connections.

Hence, enabling any concurrency throttling for InnoDB actually makes
things worse. We have seen many customers mistakenly setting this to a
small value like 16 or 64 and then complaining the server was slow.

Ignoring the parameters allows us to remove some normally unused code
and data structures, which could slightly improve performance.

innodb_thread_concurrency, innodb_commit_concurrency,
innodb_replication_delay, innodb_concurrency_tickets,
innodb_thread_sleep_delay, innodb_adaptive_max_sleep_delay:
Deprecate and ignore; hard-wire to 0.

The column INFORMATION_SCHEMA.INNODB_TRX.trx_concurrency_tickets
will always report 0.
2020-08-04 06:59:29 +03:00
Marko Mäkelä
50a11f396a Merge 10.4 into 10.5 2020-08-01 14:42:51 +03:00
Marko Mäkelä
9216114ce7 Merge 10.3 into 10.4 2020-07-31 18:09:08 +03:00
Marko Mäkelä
66ec3a770f Merge 10.2 into 10.3 2020-07-31 13:51:28 +03:00
Thirunarayanan Balathandayuthapani
fe39d02f51 MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low()
- Due to commit fe95cb2e40 (MDEV-16125),
InnoDB master thread does not need to call srv_resume_thread()
and therefore there is no need to wake up the thread.
Due to the above patch, InnoDB should remove the following dead code.

srv_check_activity(): Makes the parameter as in,out and returns the
recent activity value

innobase_active_small(): Removed

srv_active_wake_master_thread(): Removed

srv_wake_master_thread(): Removed

srv_active_wake_master_thread_low(): Removed

Simplify srv_master_thread() and remove switch cases, added the assert.

Replace srv_wake_master_thread() with srv_inc_activity_count()

INNOBASE_WAKE_INTERVAL: Removed
2020-07-23 16:23:20 +05:30
Vladislav Vaintroub
9701759b3d MDEV-23043 Refactor Windows service handling
Removed the existing nt_service classes - they provide little
abstraction, and only obscure a relatively simple service handling.
This replaces by similar code inspired by MS docs samples.

Service handling is now moved into winmain.cc, which contains
the main() function for Windows.

winmain provides reporting callbacks, which should be used by external code
,to report transitions from starting to running to shutting down to stopped.

Removed a  do-nothing ServiceMain thread, and the
non-working service "pause/continue".  Removed a lot of #ifdef __WIN__
code from mysqld.cc
2020-07-04 18:24:40 +02:00
Vladislav Vaintroub
272828a171 Merge branch '10.5' into 10.6 2020-07-04 11:53:26 +02:00
Marko Mäkelä
1813d92d0c Merge 10.4 into 10.5 2020-07-02 09:41:44 +03:00
Oleksandr Byelkin
b0f836053b MDEV-22983: Mariabackup's --help option disappeared
return --help option
2020-07-01 23:30:44 +03:00
Oleksandr Byelkin
5018b998a7 return --help option 2020-06-23 17:07:03 +02:00
Marko Mäkelä
cfd3d70ccb MDEV-22871: Remove pointer indirection for InnoDB hash_table_t
hash_get_n_cells(): Remove. Access n_cells directly.

hash_get_nth_cell(): Remove. Access array directly.

hash_table_clear(): Replaced with hash_table_t::clear().

hash_table_create(), hash_table_free(): Remove.

hash0hash.cc: Remove.
2020-06-18 14:16:01 +03:00
Marko Mäkelä
c515b1d092 Merge 10.4 into 10.5 2020-06-18 13:58:54 +03:00
Vlad Lesin
205b0ce6ad MDEV-22894: Mariabackup should not read [mariadb-client] option group
from configuration files
2020-06-18 12:19:48 +03:00
Vlad Lesin
9bdf35e90f MDEV-18215: mariabackup does not report unknown command line options
MDEV-21298: mariabackup doesn't read from the [mariadbd] and [mariadbd-X.Y]
server option groups from configuration files
MDEV-21301: mariabackup doesn't read [mariadb-backup] option group in
configuration file

All three issues require to change the same code, that is why their
fixes are joined in one commit.

The fix is in invoking load_defaults_or_exit() and handle_options() for
backup-specific groups separately from client-server groups to let the last
handle_options() call fail on unknown backup-specific options.

The order of options procesing is the following:
1) Load server groups and process server options, ignore unknown
options
2) Load client groups and process client options, ignore unknown
options
3) Load backup groups and process client-server options, exit on
unknown option
4) Process --mysqld-args command line options, ignore unknown options

New global flag my_handle_options_init_variables was added to have
ability to invoke handle_options() for the same allowed options set
several times without re-initialising previously set option values.

--password value destroying is moved from option processing callback to
mariabackup's handle_options() function to have ability to invoke server's
handle_options() several times for the same possible allowed options
set.

Galera invokes wsrep_sst_mariabackup.sh with mysqld command line
options to configure mariabackup as close to the server as possible.
It is not known what server options are supported by mariabackup when the
script is invoked. That is why new mariabackup option "--mysqld-args" is added,
all unknown options that follow this option will be silently ignored.

wsrep_sst_mariabackup.sh was also changed to:
- use "--mysqld-args" mariabackup option to pass mysqld options,
- remove deprecated innobackupex mode,
- remove unsupported mariabackup options:
    --encrypt
    --encrypt-key
    --rebuild-indexes
    --rebuild-threads
2020-06-14 13:23:07 +03:00
mysqlonarm
dec3f8ca69
MDEV-22641: Provide SIMD optimized wrapper for zlib crc32() (#1558)
Existing implementation used my_checksum (from mysys)
for calculating table checksum and binlog checksum.

This implementation was optimized for powerpc only and lacked
SIMD implementation for x86 (using clmul) and ARM
(using ACLE) instead used zlib-crc32.

mariabackup had its own copy of the crc32 implementation
using hardware optimized implementation only for x86 and lagged
hardware based implementation for powerpc and ARM.

Patch helps unifies all such calls and help aggregate all of them
using an unified interface my_checksum().

Said unification also enables hardware optimized calls for all
architecture viz. x86, ARM, POWERPC.
Default always fallback to zlib crc32.

Thanks to Daniel Black for reviewing, fixing and testing
PowerPC changes. Thanks to Marko and Daniel for early code feedback.
2020-06-01 11:34:06 +03:00
Marko Mäkelä
5ece2155cb Merge 10.4 into 10.5 2020-05-20 17:46:05 +03:00
Marko Mäkelä
2bf93a8fd6 Merge 10.3 into 10.4 2020-05-19 21:18:15 +03:00
Marko Mäkelä
79ed33c184 Merge 10.2 into 10.3 2020-05-19 17:05:05 +03:00
Vlad Lesin
0f9bfcc323 MDEV-22554: "mariabackup --prepare" exits with code 0 even though innodb
error is logged

The fix is to set flag in ib::error::~error() and check it in
mariabackup.

ib::error::error() is replaced with ib::warn::warn() in
AIO::linux_create_io_ctx() because of two reasons:

1) if we leave it as is, then mariabackup MTR tests will fail with --mem
option, because Linux AIO can not be used on tmpfs,

2) when Linux AIO can not be initialized, InnoDB falls back to simulated
AIO, so such sutiation is not fatal error, it should be treated as warning.
2020-05-19 11:25:56 +03:00
Vladislav Vaintroub
e2bc029211 MDEV-7021 Pass directory security descriptor from mysql_install_db.exe to bootstrap
This ensures that directory permissions are correct in all cases, even if
boostrap is passed non-standard locations for innodb.

Directory permissions are copied from the datadir.
2020-05-18 18:11:40 +02:00
Vladislav Vaintroub
8cb3060c5c MDEV-22272 Windows installer - run service unter virtual service account
Change mysql_install_db.exe to run service under virtual account.
Set directory permissions so that service has full access to data files.

mariabackup --copy-back permission handling (MDEV-17008) needs to be
changed as well.
Now, whenever a directory is created in course of copy-back,
its permissions are copied from the datadir. This handling assumes,
that datadir already has the correct permissions for the Windows service.
2020-05-18 18:07:26 +02:00
Marko Mäkelä
18a62eb76d MDEV-21133 follow-up: Use fil_page_get_type()
Let us use the common accessor function fil_page_get_type()
instead of accessing the page header field FIL_PAGE_TYPE directly.
2020-05-07 17:15:34 +03:00
Eugene Kosov
89ff4176c1 MDEV-22437 make THR_THD* variable thread_local
Now all access goes through _current_thd() and set_current_thd()
functions.

Some functions like THD::store_globals() can not fail now.
2020-05-05 18:13:31 +03:00
Marko Mäkelä
496d0372ef Merge 10.4 into 10.5 2020-04-29 15:40:51 +03:00
Marko Mäkelä
0632b8034b Merge 10.3 into 10.4 2020-04-29 09:05:15 +03:00
Marko Mäkelä
1fbdcada73 Merge 10.2 into 10.3 2020-04-28 22:29:13 +03:00
Vlad Lesin
d0150dc14e MDEV-20230: mariabackup --ftwrl-wait-timeout never times out on explicit
lock

--ftwrl-wait-timeout does not finish mariabackup execution when acquired
backup lock can't be grabbed for the certain amount of time, it just
waits for a long queries finishing before acquiring the lock to avoid
unnecessary locking.

This commit extends --ftwrl-wait-timeout so, that mariabackup execution
is finished if it waits for backup lock during certain amount of time.
2020-04-27 22:10:50 +03:00
Marko Mäkelä
fbe2712705 Merge 10.4 into 10.5
The functional changes of commit 5836191c8f
(MDEV-21168) are omitted due to MDEV-742 having addressed the issue.
2020-04-25 21:57:52 +03:00
Marko Mäkelä
88cf6f1c7f Merge 10.3 into 10.4 2020-04-22 18:18:51 +03:00
Marko Mäkelä
455cf6196c Merge 10.2 into 10.3 2020-04-22 14:45:55 +03:00
Vlad Lesin
0efe1971c6 MDEV-19347: Mariabackup does not honor ignore_db_dirs from server
config.

The solution is to read the system variable value on startup and to fill
databases_exclude_hash.

xb_load_list_string() became non-static and was reformatted. The system
variable value is read and processed in get_mysql_vars(), which was also
reformatted.
2020-04-21 10:34:37 +03:00
Marko Mäkelä
af91266498 Merge 10.3 into 10.4
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
2020-04-16 12:12:26 +03:00
Marko Mäkelä
84db10f27b Merge 10.2 into 10.3 2020-04-15 09:56:03 +03:00
Thirunarayanan Balathandayuthapani
6bbc0eedc6 MDEV-22193 Avoid un-necessary page initialization during recovery
- InnoDB is doing un-necessary redo log page initialisation during
recovery and unnecessary traversal of redo log during last phase.
This patch does the optimization of removing unnecessary redo log page
initialisation and detects the memory exhaust earlier.
2020-04-09 21:25:31 +05:30
Vlad Lesin
5836191c8f MDEV-21168: Active XA transactions stop slave from working after backup
was restored.

Optionally rollback prepared XA's on "mariabackup --prepare".

The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for
slaves.
2020-04-07 15:05:38 +03:00
Daniel Black
e8351934b6
Merge pull request #1221 from grooverdan/10.4-MDEV-18851-multiple-sized-large-page-support
MDEV-18851: multiple sized large page support (linux)
2020-04-02 23:54:08 +04:00
Marko Mäkelä
aae3f921ad Cleanup recv_sys: Move things to members
recv_sys.recovery_on: Replaces recv_recovery_on.

recv_sys_t::apply(): Replaces recv_apply_hashed_log_recs().

recv_sys_var_init(): Remove.

recv_sys_t::recover_low(): Attempt to initialize a page based
on buffered redo log records.
2020-03-30 18:45:09 +03:00
Vladislav Vaintroub
1b58ef7d3f Build cleanups.
Fix clang-cl built
2020-03-25 15:53:38 +01:00
Rasmus Johansson
9e1b3af4a4 MDEV-21303 Make executables MariaDB named
To change all executables to have a mariadb name I had to:
- Do name changes in every CMakeLists.txt that produces executables
- CREATE_MARIADB_SYMLINK was removed and GET_SYMLINK added by Wlad to reuse the function in other places also
- The scripts/CMakeLists.txt could make use of GET_SYMLINK instead of introducing redundant code, but I thought I'll leave that for next release
- A lot of changes to debian/.install and debian/.links files due to swapping of real executable and symlink. I did not however change the name of the manpages, so the real name is still mysql there and mariadb are symlinks.
- The Windows part needed a change now when we made the executables mariadb -named. MSI (and ZIP) do not support symlinks and to not break backward compatibility we had to include mysql named binaries also. Done by Wlad
2020-03-21 20:20:29 +01:00
Sergei Golubchik
91d1588d30 Merge branch 'github/10.5' into 10.5 2020-03-14 09:52:35 +01:00
Marko Mäkelä
f224525204 MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.

Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.

GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
about x|=y even when x and y are compatible types that are narrower
than int.  Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.

In GCC 6 and later, the warning for assigning wider to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
cases.

The bitwise negation operator appears to promote short integers
to a wider type, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.

This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
2020-03-12 19:46:41 +02:00
Oleksandr Byelkin
fad47df995 Merge branch '10.4' into 10.5 2020-03-11 17:52:49 +01:00
Oleksandr Byelkin
b7362d5fbc Merge branch '10.3' into 10.4 2020-03-11 14:28:24 +01:00
Oleksandr Byelkin
3c9bc0ce19 Merge branch '10.2' into 10.3 2020-03-11 14:05:41 +01:00
Marko Mäkelä
574d8b2940 MDEV-21907: Fix most clang -Wconversion in InnoDB
Declare innodb_purge_threads as 4-byte integer (UINT)
instead of 4-or-8-byte (ULONG) and adjust the documentation string.
2020-03-11 08:29:48 +02:00
Sergei Golubchik
c1c5222cae cleanup: PSI key is *always* the first argument 2020-03-10 19:24:23 +01:00
Sergei Golubchik
7c58e97bf6 perfschema memory related instrumentation changes 2020-03-10 19:24:22 +01:00
Eugene Kosov
2b8b85bd0a fix use-after-free 2020-03-10 15:14:53 +03:00
Marko Mäkelä
4383897a01 MDEV-14425 preparation: Remove log_header_read()
The function log_header_read() was only used during server startup,
and it will mostly be used only for reading checkpoint information
from pre-MDEV-14425 format redo log files.

Let us replace the function with more direct calls, so that
it is clearer what is going on. It is not strictly necessary to
hold any mutex during this operation, and because there will be
only a limited number of operations during early server startup,
it is not necessary to increment any I/O counters.
2020-03-04 10:08:33 +02:00
Marko Mäkelä
8511f04fdb Cleanup: Remove srv_start_lsn
Most of the time, we can refer to recv_sys.recovered_lsn.
2020-03-02 15:01:46 +02:00
Eugene Kosov
9ef2d29ff4 MDEV-14425 deprecate and ignore innodb_log_files_in_group
Now there can be only one log file instead of several which
logically work as a single file.

Possible names of redo log files: ib_logfile0,
ib_logfile101 (for just created one)

innodb_log_fiels_in_group: value of this variable is not used
by InnoDB. Possible values are still 1..100, to not break upgrade

LOG_FILE_NAME: add constant of value "ib_logfile0"
LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile"

get_log_file_path(): convenience function that returns full
path of a redo log file

SRV_N_LOG_FILES_MAX: removed

srv_n_log_files: we can't remove this for compatibility reasons,
but now server doesn't use this variable

log_sys_t::file::fd: now just one, not std::vector

log_sys_t::log_capacity: removed word 'group'

find_and_check_log_file(): part of logic from huge srv_start()
moved here

recv_sys_t::files: file descriptors of redo log files.
There can be several of those in case we're upgrading
from older MariaDB version.

recv_sys_t::remove_extra_log_files: whether to remove
ib_logfile{1,2,3...} after successfull upgrade.

recv_sys_t::read(): open if needed and read from one
of several log files

recv_sys_t::files_size(): open if needed and return files count

redo_file_sizes_are_correct(): check that redo log files
sizes are equal. Just to log an error for a user.
Corresponding check was moved from srv0start.cc

namespace deprecated: put all deprecated variables here to
prevent usage of it by us, developers
2020-02-19 12:21:59 +03:00
Marko Mäkelä
f8a9f90667 MDEV-12353: Remove support for crash-upgrade
We tighten some assertions regarding dict_index_t::is_dummy
and crash recovery, now that redo log processing will
no longer create dummy objects.
2020-02-13 19:13:45 +02:00
Marko Mäkelä
7ae21b18a6 MDEV-12353: Change the redo log encoding
log_t::FORMAT_10_5: physical redo log format tag

log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
If there are multiple records for the same page, also those
may be appended to an existing log_phys_t object if the memory
is available.

In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.

When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.

Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.

Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.

MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE (along with
MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).

mtr_buf_t::empty(): Check if the buffer is empty.

mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.

mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.

page_recv_t::last_offset: Reflects mtr_t::m_last_offset.

Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither are ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.

mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always followed by a subtype byte, because same_page
records must be longer than 1 byte.

trx_undo_page_init(): Combine the writes in WRITE record.

trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 bytes of length and 2 bytes of payload.

flst_write_addr(): Define as a static function. Combine the writes.

flst_zero_both(): Replaces two flst_zero_addr() calls.

flst_init(): Do not inline the function.

fsp_free_seg_inode(): Zerofill the whole inode.

fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
to FIL_NULL when using the physical format.

btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
must have been invoked.

fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.

fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.

recv_recover_page(): Add a fil_space_t* parameter.
After applying log to the a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.

buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.

page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.

trx_undo_header_add_space_for_xid(): Remove.

trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().

btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.

page_rec_find_owner_rec(): Merge with the only callers.

page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).

page_cur_insert_rec_zip(): Combine writes to page header fields.

PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.

PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.

mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.

For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.

mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.

In fil_space_crypt_t::write_page0(), we must log also any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().

Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
2020-02-13 19:12:17 +02:00
Marko Mäkelä
67c76704a8 MDEV-12353: Remove MLOG_INDEX_LOAD (innodb_log_optimize_ddl)
NOTE: This may break crash-upgrade from a dataset that was
created with innodb_log_optimize_ddl=ON. Also due to
ROW_FORMAT=COMPRESSED pages, it will be easiest to disallow
crash-upgrade.

It would be more robust to disable the MDEV-12699 logic when
crash-upgrading from old redo log format.

log_optimized_ddl_op: Remove.

fil_space_t::enable_lsn, file_name_t::enable_lsn: Remove.

ddl_tracker_t::optimized_ddl: Remove.

TODO: Remove ddl_tracker
2020-02-13 18:19:15 +02:00
Marko Mäkelä
1a6f708ec5 MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances
Our benchmarking efforts indicate that the reasons for splitting the
buf_pool in commit c18084f71b
have mostly gone away, possibly as a result of
mysql/mysql-server@ce6109ebfd
or similar work.

Only in one write-heavy benchmark where the working set size is
ten times the buffer pool size, the buf_pool->mutex would be
less contended with 4 buffer pool instances than with 1 instance,
in buf_page_io_complete(). That contention could be alleviated
further by making more use of std::atomic and by splitting
buf_pool_t::mutex further (MDEV-15053).

We will deprecate and ignore the following parameters:

	innodb_buffer_pool_instances
	innodb_page_cleaners

There will be only one buffer pool and one page cleaner task.

In a number of INFORMATION_SCHEMA views, columns that indicated
the buffer pool instance will be removed:

	information_schema.innodb_buffer_page.pool_id
	information_schema.innodb_buffer_page_lru.pool_id
	information_schema.innodb_buffer_pool_stats.pool_id
	information_schema.innodb_cmpmem.buffer_pool_instance
	information_schema.innodb_cmpmem_reset.buffer_pool_instance
2020-02-12 14:45:21 +02:00
Eugene Kosov
691c691adc clean up redo log
main change: rename first redo log without file close

second change: use os_offset_t to represent offset in a file

third change: fix log texts
2020-02-01 23:58:24 +08:00
Marko Mäkelä
50324ce624 MDEV-21351 Replace recv_sys.heap with list of buf_block_t
InnoDB crash recovery used a special type of mem_heap_t that
allocates backing store from the buffer pool. That incurred
a significant overhead, leading to underutilization of memory,
and limiting the maximum contiguous allocated size of a log record.

recv_sys_t::blocks: A linked list of buf_block_t that are allocated
by buf_block_alloc() for redo log records. Replaces recv_sys_t::heap.
We repurpose buf_block_t::unzip_LRU for linking the elements.

recv_sys_t::max_log_blocks: Renamed from recv_n_pool_free_frames.

recv_sys_t::max_blocks(): Accessor for max_log_blocks.

recv_sys_t::alloc(): Allocate memory from the current recv_sys_t::blocks
element, or allocate another block.  In debug builds, various free()
member functions must be invoked, because we repurpose
buf_page_t::buf_fix_count for tracking allocations.

recv_sys_t::free_corrupted_page(): Renamed from recv_recover_corrupt_page()

recv_sys_t::is_memory_exhausted(): Renamed from recv_sys_heap_check()

recv_sys_t::pages and its elements are allocated directly by the
system memory allocator.

recv_parse_log_recs(): Remove the parameter available_memory.

We rename some variables 'store_to_hash' to 'store', because
recv_sys.pages is not actually a hash table.

This is joint work with Thirunarayanan Balathandayuthapani.
2020-01-29 12:53:39 +02:00
Marko Mäkelä
a983b24407 Merge 10.4 into 10.5 2020-01-28 14:17:09 +02:00
Oleksandr Byelkin
bfc24bb2ec Merge branch '10.3' into 10.4 2020-01-24 14:50:23 +01:00
Oleksandr Byelkin
ceda5f724f Merge branch '10.2' into 10.3 2020-01-24 14:16:20 +01:00
Oleksandr Byelkin
f2ccfcaca1 Merge branch '10.1' into 10.2 2020-01-24 13:46:49 +01:00
Julius Goryavsky
982294ac16 MDEV-17601: MariaDB Galera does not expect 'mbstream' as streamfmt
Setting "streamfmt=mbstream" in the "[sst]" section causes SST to fail
because the format automatically switches to 'tar' by default (insead
of mbstream).

To fix this, we need to add mbstream to the list of valid values for
the format, making it synonymous with xbstream. This must be done both
in the SST script and when parsing the options of the corresponding
utilities.
2020-01-21 10:50:48 +01:00
Eugene Kosov
e9de6386ad MDEV-18115 remove now unneeded constraint
log_group_max_size: is not needed because redo log do not use fil_io() now
2020-01-18 23:42:55 +08:00
Sergei Golubchik
ff5a528f26 mysqltest crashes on Debian
Debian is apparently offended that pcre2-posix implements POSIX API,
thus it renames all posix-compatible symbols in libpcre2-posix to have the
PCRE2 prefix. But Debian doesn't do anything to pcre2posix.h header,
so any unaware application will get POSIX compatible type names
and function prototypes from pcre2, but actual symbols will come
from libc.

To remedy this enormous incongruity we have to redefine POSIX-compatible
function names in pcre2posix to match Debian's hack.
2020-01-16 18:13:55 +01:00
Eugene Kosov
562c037b48 MDEV-18115 Remove dummy tablespace for the redo log
Redo log subsystem was decoupled from tablespace subsystem. It now manages file
descriptors for redo log files by itself.

FIL_TYPE_LOG: removed, code in various places was simplified

SRV_LOG_SPACE_FIRST_ID: renamed to SRV_SPACE_ID_UPPER_BOUND
  to better match its purpose. Code in various places was simplified

fil_n_log_flushes: replaced with log_sys::flushes
fil_n_pending_log_flushes: replaced with log_sys::pending_flushes

log_t::files::files: redo log file descriptors
log_t::files::file_names: redo log file names

log_t::files::set_file_names(): set file names without opening them
log_t::files::open_files(): opens redo log files
log_t::files::read(): treats several files as one big
log_t::files::write(): treats several files as one big
log_t::files::fsync(): flushes page cache to disk
log_t::files::close_files(): closes redo log files

fil_open_log_and_system_tablespace_files(): renamed to
  fil_open_system_tablespace_files()
  and obviously it now doesn't open redo log files

global files[1000]: removed. Why it was needed at all?
2020-01-01 22:09:51 +08:00
Marko Mäkelä
8cc15c036d Merge 10.4 into 10.5 2019-12-27 21:17:16 +02:00
Marko Mäkelä
4c25e75ce7 Merge 10.3 into 10.4 2019-12-27 18:20:28 +02:00
Marko Mäkelä
5ab70e7f68 Merge 10.2 into 10.3 2019-12-27 15:14:48 +02:00
Thirunarayanan Balathandayuthapani
bba59abb03 MDEV-19176 Reduce the memory usage during recovery
- Moved the recv_sys->heap memory condition inside recv_parse_log_recs().
So that, InnoDB can mark the status as STORE_NO earlier.

- InnoDB uses one third of buffer pool chunk size for reading the redo
log records. In that case, we can avoid the scenario where buffer ran
out of memory issue during recovery.
2019-12-23 15:51:02 +05:30
Alexey Botchkov
9dadfdcde5 MDEV-14024 PCRE2.
Related changes in the server code.
2019-12-21 10:34:02 +01:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Marko Mäkelä
8fa759a576 Merge 10.3 into 10.4
We disable the MDEV-21189 test galera.galera_partition
because it times out.
2019-12-13 17:30:37 +02:00