mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 03:52:35 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	e71aca8200	Merge 10.9 into 10.10	2022-08-30 13:33:02 +03:00
Marko Mäkelä	c8cd162a0a	Merge 10.7 into 10.8	2022-08-30 13:04:17 +03:00
Marko Mäkelä	b86be02ecf	Merge 10.6 into 10.7	2022-08-30 13:02:42 +03:00
Marko Mäkelä	f410974f0f	Merge 10.5 into 10.6	2022-08-30 13:01:16 +03:00
Marko Mäkelä	259050f864	Merge 10.9 into 10.10	2022-08-29 14:04:25 +03:00
Daniel Black	0324bde846	mariabackup: remove MySQL wording	2022-08-26 11:52:53 +10:00
Daniel Black	79b58f1ca8	MDEV-23607 MariaBackup - align required GRANTS to cmd options Since the 10.5 split of the privileges, the required GRANTs for various mariabackup operations have changed. While adding tests, a number of mappings were found to be incorrect: The option --lock-ddl-per-table didn't require connection admin. The option --safe-slave-backup requires SLAVE MONITOR even without the --no-lock option.	2022-08-26 11:52:53 +10:00
Marko Mäkelä	2bddc5d045	Merge 10.7 into 10.8	2022-08-24 10:22:37 +03:00
Marko Mäkelä	bdd80e3fb1	Merge 10.6 into 10.7	2022-08-24 09:22:34 +03:00
Marko Mäkelä	d65a2b7bde	Merge 10.5 into 10.6	2022-08-22 14:02:43 +03:00
Marko Mäkelä	1d90d6874d	Merge 10.4 into 10.5	2022-08-22 13:38:40 +03:00
Marko Mäkelä	36d173e523	Merge 10.3 into 10.4	2022-08-22 12:34:42 +03:00
Marko Mäkelä	c2df3d30c0	MDEV-21452 fixup: Avoid an unnecessary mutex operation	2022-08-19 09:21:02 +03:00
Marko Mäkelä	a1055ab35d	MDEV-29043 mariabackup --compress hangs Even though commit `b817afaa1c` passed the test mariabackup.compress_qpress, that test turned out to be too small to reveal one more problem that had previously been prevented by the existence of ctrl_mutex. I did not realize that there can be multiple concurrent callers to compress_write(). One of them is the log copying thread; further callers are data file copying threads (default: --parallel=1). By default, there is only one compression worker thread (--compress-threads=1). compress_write(): Fix a race condition between threads that would use the same worker thread object. Make thd->data_avail contain the thread identifier of the submitter, and add thd->avail_cond to notify other compress_write() threads that are waiting for a slot.	2022-08-19 09:18:24 +03:00
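The slot hand-off described in MDEV-29043 can be illustrated with a minimal sketch. It is not the mariabackup code: the class WorkerSlot and its method names are hypothetical, and only the idea is taken from the commit message (the slot records the identifier of the submitter, and a condition variable wakes up other submitters that are waiting for the slot).

```python
import threading


class WorkerSlot:
    """Hypothetical model of one compression worker shared by several submitters."""

    def __init__(self):
        self.mutex = threading.Lock()
        # Plays the role of thd->avail_cond in the commit message.
        self.avail_cond = threading.Condition(self.mutex)
        # Plays the role of thd->data_avail: the submitter's id, or None if free.
        self.owner = None

    def acquire(self, submitter_id):
        """Block until the slot is free, then claim it for submitter_id."""
        with self.avail_cond:
            while self.owner is not None:
                self.avail_cond.wait()
            self.owner = submitter_id

    def release(self, submitter_id):
        """Free the slot and wake up the submitters waiting for it."""
        with self.avail_cond:
            if self.owner != submitter_id:
                raise RuntimeError("slot released by a thread that does not own it")
            self.owner = None
            self.avail_cond.notify_all()
```

Each caller would wrap its use of the worker in acquire() and release(), so the log copying thread and a data file copying thread never hand data to the same worker at the same time.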
Oleksandr Byelkin	75d631f333	Merge branch '10.7' into 10.8	2022-08-09 09:52:15 +02:00
Oleksandr Byelkin	4c18f68d59	Merge branch '10.9' into 10.10	2022-08-09 09:47:16 +02:00
Oleksandr Byelkin	50b270525a	Merge branch '10.7' into 10.8	2022-08-08 17:15:13 +02:00
Oleksandr Byelkin	1d48041982	Merge branch '10.6' into 10.7	2022-08-08 17:12:32 +02:00
Oleksandr Byelkin	d2f1c3ed6c	Merge branch '10.5' into bb-10.6-release	2022-08-03 12:19:59 +02:00
Oleksandr Byelkin	af143474d8	Merge branch '10.4' into 10.5	2022-08-03 07:12:27 +02:00
Oleksandr Byelkin	48e35b8cf6	Merge branch '10.3' into 10.4	2022-08-02 14:15:39 +02:00
Sergei Golubchik	5b4154373a	only copy buffer pool dump in SST galera mode and then only into the default name, so that the joiner could find it	2022-08-01 15:53:14 +02:00
Sergei Golubchik	5197519f4f	revert mariabackup part of MDEV-27524, fix the test	2022-08-01 15:53:13 +02:00
Sergei Golubchik	e1caa4bd5e	don't use ssl for windows named pipes - it doesn't work	2022-07-28 17:18:40 +02:00
Marko Mäkelä	4ce6e78059	Merge 10.9 into 10.10	2022-07-28 11:25:21 +03:00
Marko Mäkelä	f79cebb4d0	Merge 10.7 into 10.8	2022-07-28 10:33:26 +03:00
Marko Mäkelä	742e1c727f	Merge 10.6 into 10.7	2022-07-27 18:26:21 +03:00
Marko Mäkelä	30914389fe	Merge 10.5 into 10.6	2022-07-27 17:52:37 +03:00
Marko Mäkelä	098c0f2634	Merge 10.4 into 10.5	2022-07-27 17:17:24 +03:00
Oleksandr Byelkin	3bb36e9495	Merge branch '10.3' into 10.4	2022-07-27 11:02:57 +02:00
Thirunarayanan Balathandayuthapani	1d3629875e	MDEV-29137 mariabackup excessive logging of ddl tracking - Remove the FILE_MODIFY message in backup_file_op()	2022-07-26 11:33:52 +05:30
Thirunarayanan Balathandayuthapani	6156a2be30	MDEV-29137 mariabackup excessive logging of ddl tracking - Remove the FILE_MODIFY message from mariabackup which was displaying the list of file names which were modified since the previous checkpoint.	2022-07-25 17:03:40 +05:30
Marko Mäkelä	b817afaa1c	MDEV-28689, MDEV-28690: Remove ctrl_mutex This reverts the revert `4f62dfe676` and fixes the hang that was introduced when ctrl_mutex was removed. The test mariabackup.compress_qpress covers this code, but the test is skipped if a stand-alone qpress executable is not available. It is not available in many software repositories, possibly because the code base has not been updated since 2010. This was tested with an executable that was compiled from the source code at http://www.quicklz.com/qpress-11-source.zip (after adding a missing #include <unistd.h> for the definition of isatty()). Compared to the grandparent commit (before the revert), the changes are as follows: comp_thread_ctxt_t::done_cond: A separate condition for completed compression, signaling that thd->to_len has been updated. compress_write(): Replace some threads[i] with thd. Reset thd->to_len = 0 after consuming the compressed data. compress_worker_thread_func(): After consuming the uncompressed data, set thd->data_avail = FALSE. After compressing, signal thd->done_cond.	2022-07-11 21:00:18 +03:00
Vladislav Vaintroub	4f62dfe676	Revert "MDEV-28689, MDEV-28690: Incorrect error handling for ctrl_mutex" This reverts commit `863c3eda87`.	2022-07-11 15:00:34 +02:00
Marko Mäkelä	155019b96b	MDEV-28994 Backup of memory-mapped log is corrupted An interface to use memory-mapped I/O on the InnoDB redo log that is stored in persistent memory was introduced in commit `685d958e38` (MDEV-14425). log_t::attach(): In mariadb-backup --backup, never attempt to use memory-mapped I/O for reading the log file of the server. xtrabackup_copy_logfile(): Assert !log_sys.is_pmem() and remove the code to deal with a memory-mapped log. This fixes a race condition scenario of the following type: 1. Backup parsed a mini-transaction from the memory-mapped buffer. This took some time. 2. Meanwhile, the server might have overwritten this portion of the circular log_sys.buf. 3. Backup copied the data to the output file while or after the server had overwritten this portion of the file. 4. Backup failed to notice that a log overrun occurred. The symptom of this was that a mariadb-backup --prepare of the log failed. In the analyzed case, the error message was: [ERROR] InnoDB: Missing FILE_CHECKPOINT(...) This will also make it possible to run mariadb-backup --backup under "rr replay".	2022-07-01 18:07:07 +03:00
Marko Mäkelä	d371e35257	Merge 10.9 into 10.10	2022-06-17 11:31:53 +03:00
Marko Mäkelä	cb19e211ec	Merge 10.7 into 10.8	2022-06-16 11:15:21 +03:00
Marko Mäkelä	a8c22dae8b	Merge 10.6 into 10.7	2022-06-16 10:50:58 +03:00
Marko Mäkelä	5bb90cb2ac	Merge 10.5 into 10.6	2022-06-16 10:01:29 +03:00
Vlad Lesin	27309fc6b0	MDEV-28832 infinite loop in mariabackup if log LOG_HEADER_FORMAT field is 0 Avoid the loop by getting rid of the back-and-forth jumping.	2022-06-15 13:30:42 +03:00
Marko Mäkelä	32edabd1f2	Merge 10.9 into 10.10	2022-06-09 15:26:09 +03:00
Marko Mäkelä	57d4a242da	Merge 10.7 into 10.8	2022-06-06 16:22:09 +03:00
Marko Mäkelä	7e39470e33	Merge 10.6 into 10.7	2022-06-06 14:56:20 +03:00
Marko Mäkelä	0b47c126e3	MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit `177d8b0c12` is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. 
trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. 
row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.	2022-06-06 14:03:22 +03:00
Marko Mäkelä	6b9bba41e8	MDEV-28554: Remove innodb_version INNODB_VERSION_STR: Replaced with PACKAGE_VERSION (non-functional change). INNODB_VERSION_SHORT: Replaced with direct use of MYSQL_VERSION_MAJOR << 8 \| MYSQL_VERSION_MINOR. check_version(): Simplify the mariadb-backup version check, and require the server version to be MariaDB 10.8 or later, because that is when the InnoDB redo log format was last changed.	2022-06-03 12:20:19 +03:00
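The version arithmetic mentioned in MDEV-28554 can be sketched as follows. The function names are hypothetical and the real check_version() may test more than this (for example, that the server is MariaDB at all); only the packing MAJOR << 8 | MINOR and the 10.8 minimum come from the commit message.

```python
def version_short(major: int, minor: int) -> int:
    """Pack a version the way MYSQL_VERSION_MAJOR << 8 | MYSQL_VERSION_MINOR does."""
    return major << 8 | minor


# 10.8 is the minimum named in the commit message (last redo log format change).
MIN_SERVER_VERSION = version_short(10, 8)


def server_is_supported(major: int, minor: int) -> bool:
    """Assumed comparison: accept any server at or above the minimum version."""
    return version_short(major, minor) >= MIN_SERVER_VERSION
```

Packing the two numbers into one integer keeps the comparison a single >= test: 10.8 becomes 0x0A08, so 10.7 is rejected while 10.10 and 11.0 are accepted.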
Marko Mäkelä	2f8d0af883	Merge 10.5 into 10.6	2022-06-02 17:39:13 +03:00
Marko Mäkelä	4b3c3e526e	Merge 10.4 into 10.5	2022-06-02 16:51:13 +03:00
Marko Mäkelä	96f4b4a55b	Merge 10.3 into 10.4	2022-06-02 16:34:17 +03:00
Marko Mäkelä	91d5fffa07	MDEV-28719: compress_write() leaks data_mutex on error	2022-06-01 11:20:47 +03:00
Marko Mäkelä	863c3eda87	MDEV-28689, MDEV-28690: Incorrect error handling for ctrl_mutex comp_thread_ctxt_t: Remove ctrl_mutex, ctrl_cond, started. We do not actually need them for anything. destroy_worker_thread(): Split from destroy_worker_threads(). create_worker_threads(): We already initialize thd->data_avail=FALSE and thd->cancelled=FALSE before invoking pthread_create(). If any thread creation fails, clean up by destroy_worker_thread(). compress_worker_thread_func(): Assume that thd->started and thd->data_avail are already initialized. Reviewed by: Vladislav Vaintroub	2022-05-30 15:49:45 +03:00
Sergei Golubchik	b7ffccf49b	Merge branch '10.7' into 10.8	2022-05-18 13:26:48 +02:00
Sergei Golubchik	99a433ed1c	Merge branch '10.6' into 10.7	2022-05-18 10:34:38 +02:00
Marko Mäkelä	daa2680c78	Merge 10.5 into 10.6	2022-05-12 08:11:57 +03:00
Vlad Lesin	3fabdc3ca8	MDEV-28473 field_ref_zero is not initialized in xtrabackup_prepare_func() The solution is to initialize field_ref_zero in main_low() before xtrabackup_backup_func() and xtrabackup_prepare_func() calls.	2022-05-11 17:20:31 +03:00
Sergei Golubchik	443c2a715d	Merge branch '10.7' into 10.8	2022-05-11 12:21:36 +02:00
Sergei Golubchik	fd132be117	Merge branch '10.6' into 10.7	2022-05-11 11:25:33 +02:00
Sergei Golubchik	3bc98a4ec4	Merge branch '10.5' into 10.6	2022-05-10 14:01:23 +02:00
Sergei Golubchik	ef781162ff	Merge branch '10.4' into 10.5	2022-05-09 22:04:06 +02:00
Sergei Golubchik	a70a1cf3f4	Merge branch '10.3' into 10.4	2022-05-08 23:03:08 +02:00
Oleksandr Byelkin	9614fde1aa	Merge branch '10.2' into 10.3	2022-05-03 10:59:54 +02:00
Alexander Barkov	680ca15269	MDEV-28446 mariabackup prepare fails for incrementals if a new schema is created after full backup is taken When "mariabackup --target-dir=$basedir --incremental-dir=$incremental_dir" is running and is moving a new table file (e.g. `db1/t1.new`) from the incremental directory to the base directory, it needs to verify that the base backup database directory (e.g. `$basedir/db1`) really exists (or create it otherwise). The table `db1/t1` can come from a new database `db1` which was created during the base mariabackup execution time. In such case the directory `db1` exists only in the incremental directory, but does not exist in the base directory.	2022-05-02 11:21:10 +04:00
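A minimal sketch of the fix idea in MDEV-28446, assuming a hypothetical helper move_new_table_file(): before a file is moved from the incremental directory into the base backup, the database directory is created in the base directory if it is missing. The real code also handles the file name suffixes, which is left out here.

```python
import os
import shutil


def move_new_table_file(incremental_dir: str, base_dir: str, rel_path: str) -> str:
    """Move rel_path (e.g. 'db1/t1.new') from the incremental to the base backup."""
    src = os.path.join(incremental_dir, rel_path)
    dst = os.path.join(base_dir, rel_path)
    # The schema may have been created after the full backup was taken, so
    # its directory can be absent from the base backup; create it if needed.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    return dst
```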
Marko Mäkelä	133c2129cd	Merge 10.7 into 10.8	2022-04-27 10:43:00 +03:00
Marko Mäkelä	638afc4acf	Merge 10.6 into 10.7	2022-04-26 18:59:40 +03:00
Alexander Barkov	907e4c62ce	MDEV-21037 mariabackup does not detect multi-source replication slave	2022-04-25 15:00:09 +04:00
Marko Mäkelä	fae0ccad6e	Merge 10.5 into 10.6	2022-04-21 17:46:40 +03:00
Marko Mäkelä	620c55e708	Merge 10.4 into 10.5	2022-04-21 15:33:50 +03:00
Vlad Lesin	1b558dd462	MDEV-27919 mariabackup --log-copy-interval is measured in milliseconds in 10.5 and in microseconds in 10.6 Multiply polling interval by 1000.	2022-04-21 15:24:59 +03:00
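The unit fix in MDEV-27919 amounts to one conversion. A sketch with a hypothetical helper name, assuming the option value is given in milliseconds and the polling wait takes microseconds:

```python
def log_copy_interval_to_micros(interval_ms: int) -> int:
    """Convert a --log-copy-interval value in milliseconds to microseconds."""
    return interval_ms * 1000
```

With this conversion, a value of 1000 again means a one-second polling interval rather than one millisecond.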
Marko Mäkelä	394784095e	Merge 10.3 into 10.4	2022-04-21 11:33:59 +03:00
Sergei Golubchik	bbdec04d59	MDEV-24317 Data race in LOGGER::init_error_log at sql/log.cc:1443 and in LOGGER::error_log_print at sql/log.cc:1181 don't initialize error_log_handler_list in set_handlers() * error_log_handler_list is initialized to LOG_FILE early, in init_base() * set_handlers always reinitializes it to LOG_FILE, so it's pointless * after init_base() concurrent threads start using sql_log_warning, so following set_handlers() shouldn't modify error_log_handler_list without some protection	2022-04-12 13:07:20 +02:00
Marko Mäkelä	5d8dcfd86c	MDEV-25975: Merge 10.4 into 10.5	2022-04-06 10:30:49 +03:00
Marko Mäkelä	d172df9913	MDEV-25975: Merge 10.3 into 10.4	2022-04-06 09:18:38 +03:00
Marko Mäkelä	e9735a8185	MDEV-25975 innodb_disallow_writes causes shutdown to hang We will remove the parameter innodb_disallow_writes because it is badly designed and implemented. The parameter was never allowed at startup. It was only internally used by Galera snapshot transfer. If a user executed SET GLOBAL innodb_disallow_writes=ON; the server could hang even on subsequent read operations. During Galera snapshot transfer, we will block writes to implement an rsync friendly snapshot, as follows: sst_flush_tables() will acquire a global lock by executing FLUSH TABLES WITH READ LOCK, which will block any writes at the high level. sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true), will suspend or disable InnoDB background tasks or threads that could initiate writes. As part of this, log_make_checkpoint() will be invoked to ensure that anything in the InnoDB buf_pool.flush_list will be written to the data files. This has the nice side effect that the Galera joiner will avoid crash recovery. The changes to sql/wsrep.cc and to the tests are based on a prototype that was developed by Jan Lindström. Reviewed by: Jan Lindström	2022-04-06 08:06:49 +03:00
Marko Mäkelä	dce8a846ae	Merge 10.7 into 10.8	2022-03-03 11:34:58 +02:00
Marko Mäkelä	64ea3eab8f	Merge 10.6 into 10.7	2022-03-03 11:11:00 +02:00
Otto Kekäläinen	1fa872f6ef	Fix various spelling errors Among others: existance -> existence reinitialze -> reinitialize successfuly -> successfully	2022-03-03 13:42:49 +11:00
Marko Mäkelä	32d741b5b0	Merge 10.7 into 10.8	2022-02-25 16:24:13 +02:00
Marko Mäkelä	3d88f9f34c	Merge 10.6 into 10.7	2022-02-25 16:09:16 +02:00
Marko Mäkelä	a23414dd32	MDEV-27939 Log buffer wrap-around errors on PMEM When the log is stored in persistent memory, log_sys.buf[] is a ring buffer that directly maps to the circular ib_logfile0 file. There were several errors that could occur in the special case when a log record ends exactly at the end of the log file and the next record would start at log_sys.buf[log_sys.START_OFFSET]. mariabackup.huge_lsn,strict_full_crc32: Write the first record at the very end of the circular file, to reproduce the failure scenarios. recv_sys_t::parse(): On PMEM, wrap the end offset of the record from log_sys.file_size to log_sys.START_OFFSET if needed. Otherwise, both InnoDB recovery and mariadb-backup would try to parse the next record from an invalid address. filename_to_spacename(): Remove an assumption about the format of file names. While the server currently writes file names like ./databasename/tablename.ibd we might want to stop writing the redundant ./ prefix in the future. The test mariabackup.huge_lsn is generating such file names. xtrabackup_copy_logfile(): Correctly copy a record that ends at the very end of the log_sys.buf[]. The errors in mariadb-backup were reproduced with the test mariabackup.huge_lsn,strict_full_crc32 and an additional patch to use the start checkpoint of the test: diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 27dce5fa17d..e17a1692d6f 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -1796,7 +1796,8 @@ dberr_t recv_sys_t::find_checkpoint() continue; } - if (checkpoint_lsn >= log_sys.next_checkpoint_lsn) + if (checkpoint_lsn >= log_sys.next_checkpoint_lsn && + checkpoint_lsn != 0x1000fffffe10) { log_sys.next_checkpoint_lsn= checkpoint_lsn; log_sys.next_checkpoint_no= field == log_t::CHECKPOINT_1;	2022-02-25 15:50:09 +02:00
Marko Mäkelä	6daf8f8a0d	Merge 10.5 into 10.6	2022-02-25 13:48:47 +02:00
Marko Mäkelä	b791b942e1	Merge 10.4 into 10.5	2022-02-25 13:27:41 +02:00
Marko Mäkelä	f5ff7d09c7	Merge 10.3 into 10.4	2022-02-25 13:00:48 +02:00
Marko Mäkelä	00b70bbb51	Merge 10.2 into 10.3	2022-02-25 10:43:38 +02:00
Julius Goryavsky	17e0f5224c	MDEV-27524: Incorrect binlogs after Galera SST using rsync and mariabackup This commit adds correct handling of binlogs for SST using rsync or mariabackup. Before this fix, binlogs were handled incorrectly: only one (last) binary log file was transferred during SST, which then led to various failures (for example, when trying to list all events from the binary log). These bugs were long masked by flaws in the primitive binlog handling code in the SST scripts, which caused binary log files to be erased after transfer or not added to the binlog index on the joiner node. Now the correct transfer of all binary logs (not just the last of the binary log files) has been implemented both for rsync (at the script level) and for mariabackup (at the level of the main utility code). This commit also adds a new sst_max_binlogs=<n> parameter, which can be located in the [sst] section or in the [xtrabackup] section (historically, supported for mariabackup only, not for rsync), or in one of the server sections. This parameter specifies the number of binary log files to be sent to the joiner node during SST. This option is added for compatibility with old SST scripting behavior, which can be emulated by setting sst_max_binlogs=1 (although in general this can cause problems for the reasons described above). In addition, setting sst_max_binlogs=0 can be used to suppress the transmission of binary logs to the joiner nodes during SST (although sometimes a single file with the current binary log can still be transmitted to the joiner, even with sst_max_binlogs=0, because this is sometimes necessary in modes that involve the use of GTIDs with Galera). Also, this commit ensures correct handling of paths to various innodb files and directories in the SST scripts, and fixes some problems with this that existed in the mariabackup utility (which were associated with incorrect handling of the innodb_data_dir parameter in some scenarios). 
In addition, this commit contains the following enhancements: 1) Added tests for mtr, which check the correct work with binlogs after SST (using rsync and mariabackup); 2) Added correct handling of slashes at the end of all paths that the SST script receives as parameters; 3) Improved parsing code for --mysqld-args parameters. Now it correctly processes the sequence "--" after the name of the one-letter option; 4) Checking the secret signature during joiner authentication is made independent of the presence of bash (as a unix shell) in the system, and the diff utility is no longer needed to check certificate compliance; 5) All directories that are necessary for the correct placement of various logs are automatically created by SST scripts in advance (before running mariabackup on the joiner node); 6) Removal of old binary logs on the joiner is done using the binlog index (if it exists) (not only by a fixed pattern that is based on the current binlog name, as before); 7) Paths for placing binary logs are correctly processed if they are set as relative paths (to the datadir); 8) SST scripts are made even more resistant to spaces in filenames (now for binlogs); 9) In case of failure, SST scripts now always end with an exit code other than zero; 10) The SST script for rsync now correctly creates a tar file with the binlogs, even if the paths to them (in the binlog index file) are specified as a mix of absolute and relative paths, and even if they do not match the datadir path specified in the current configuration settings.	2022-02-22 10:45:06 +01:00
Julius Goryavsky	571eb9d775	mariabackup: cosmetic changes (whitespaces and indentation)	2022-02-22 10:20:58 +01:00
Marko Mäkelä	f1beeb58e6	MDEV-27848: Remove unused wait/io/file/innodb/innodb_log_file The performance_schema counter wait/io/file/innodb/innodb_log_file is always reported as 0. The way how redo log writes are being waited for was refactored in commit `30ea63b7d2` by the introduction of flush_lock and write_lock. Even before that change, all the wait/io/file/innodb/ counters were always 0 in my tests. Moreover, if the PMEM interface that was introduced in commit `3daef523af` is being used, writes to the InnoDB log file will completely avoid any system calls and performance_schema instrumentation. In commit `685d958e38` also the reads of the redo log (during recovery) would bypass any system calls.	2022-02-15 15:03:15 +02:00
Marko Mäkelä	a635c40648	MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit() A prominent bottleneck in mtr_t::commit() is log_sys.mutex between log_sys.append_prepare() and log_close(). User-visible change: The minimum innodb_log_file_size will be increased from 1MiB to 4MiB so that some conditions can be trivially satisfied. log_sys.latch (log_latch): Replaces log_sys.mutex and log_sys.flush_order_mutex. Copying mtr_t::m_log to log_sys.buf is protected by a shared log_sys.latch. Writes from log_sys.buf to the file system will be protected by an exclusive log_sys.latch. log_sys.lsn_lock: Protects the allocation of log buffer in log_sys.append_prepare(). sspin_lock: A simple spin lock, for log_sys.lsn_lock. Thanks to Vladislav Vaintroub for suggesting this idea, and for reviewing these changes. mariadb-backup: Replace some use of log_sys.mutex with recv_sys.mutex. buf_pool_t::insert_into_flush_list(): Implement sorting of flush_list because ordering is otherwise no longer guaranteed. Ordering by LSN is needed for the proper operation of redo log checkpoints. log_sys.append_prepare(): Advance log_sys.lsn and log_sys.buf_free by the length, and return the old values. Also increment write_to_buf, which was previously done in log_close(). mtr_t::finish_write(): Obtain the buffer pointer from log_sys.append_prepare(). log_sys.buf_free: Make the field Atomic_relaxed, to simplify log_flush_margin(). Use only loads and stores to avoid costly read-modify-write atomic operations. buf_pool.flush_list_requests: Replaces export_vars.innodb_buffer_pool_write_requests and srv_stats.buf_pool_write_requests. Protected by buf_pool.flush_list_mutex. buf_pool_t::insert_into_flush_list(): Do not invoke page_cleaner_wakeup(). Let the caller do that after a batch of calls. recv_recover_page(): Invoke a minimal part of buf_pool.insert_into_flush_list(). ReleaseBlocks::modified: A number of pages added to buf_pool.flush_list. 
ReleaseBlocks::operator(): Merge buf_flush_note_modification() here. log_t::set_capacity(): Renamed from log_set_capacity().	2022-02-10 16:37:12 +02:00
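The reservation step described for log_sys.append_prepare() in MDEV-27774 can be sketched as below. The class LogBuffer is a hypothetical toy model: it keeps only the lsn_lock part (advance lsn and buf_free by the length, return the old values) and leaves out the shared log_sys.latch, the write to the file system, and the handling of a full buffer.

```python
import threading


class LogBuffer:
    """Toy stand-in for log_sys: an LSN counter plus a fixed-size buffer."""

    def __init__(self, capacity: int, start_lsn: int = 0):
        self.lsn_lock = threading.Lock()  # protects lsn and buf_free only
        self.lsn = start_lsn
        self.buf_free = 0
        self.buf = bytearray(capacity)

    def append_prepare(self, length: int):
        """Reserve length bytes; return the LSN and offset before the reservation."""
        with self.lsn_lock:
            if self.buf_free + length > len(self.buf):
                raise BufferError("log buffer full; a real server would write it out")
            old_lsn, old_free = self.lsn, self.buf_free
            self.lsn += length
            self.buf_free += length
        return old_lsn, old_free

    def append(self, record: bytes) -> int:
        """Copy record into its reserved range, outside lsn_lock."""
        lsn, offset = self.append_prepare(len(record))
        self.buf[offset:offset + len(record)] = record
        return lsn
```

Because each caller owns a disjoint range after append_prepare() returns, the copy itself does not need the reservation lock, which is the point of splitting the two steps.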
Marko Mäkelä	8c7c92adf3	MDEV-27787 mariadb-backup --backup is allocating extra memory for log records In commit `685d958e38` (MDEV-14425), the log parsing in mariadb-backup --backup was rewritten. The parameter STORE_IF_EXISTS that is being passed to recv_sys.parse_mtr() or recv_sys.parse_pmem() instead of STORE_NO caused unnecessary additional memory allocation for redo log records.	2022-02-10 15:39:27 +02:00
Oleksandr Byelkin	4fb2cb1a30	Merge branch '10.7' into 10.8	2022-02-04 14:50:25 +01:00
Oleksandr Byelkin	9ed8deb656	Merge branch '10.6' into 10.7	2022-02-04 14:11:46 +01:00
Thirunarayanan Balathandayuthapani	8d742fe4ac	MDEV-26326 mariabackup skip valid ibd file - Store the deferred tablespace name while loading the tablespace for the backup process. - Mariabackup stores the list of space ids which have page0 INIT_PAGE records. backup_first_page_op() and first_page_init() were introduced to track the page0 INIT_PAGE records. - backup_file_op() and log_file_op() were changed to handle FILE_MODIFY redo log records. They are used to identify the deferred tablespace space id. - Whenever a file operation redo log record is processed by backup, backup_file_op() should check whether the space name exists in the deferred tablespace list. If it does, it needs to store the space id and name when a FILE_MODIFY or FILE_RENAME redo log record is processed, and it should delete the tablespace name from the defer list in other cases. - backup_fix_ddl() should check whether the deferred tablespace has any page0 init records. If it does, consider the tablespace as a newly created tablespace. If not, backup should try to reload the tablespace with SRV_BACKUP_NO_DEFER mode to avoid the deferring of the tablespace.	2022-02-01 19:50:08 +05:30
Marko Mäkelä	c64e507fad	MDEV-27621 Backup fails with FATAL ERROR: Was only able to copy log In commit `685d958e38` (MDEV-14425) a bug was introduced to mariadb-backup --backup for the case when the log is wrapping around to log_sys.START_OFFSET (12288). This could also cause a "Missing FILE_CHECKPOINT" error during mariadb-backup --prepare, in case the log copying resumed after the server had produced a multiple of innodb_log_file_size-12288 bytes of more log so that the last mini-transaction would end exactly at the end of the log file. xtrabackup_copy_logfile(): If the log wraps around, read everything to the end of the log file, and then the rest from log_sys.START_OFFSET.	2022-01-27 16:17:40 +02:00
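The wrap-around rule stated for xtrabackup_copy_logfile() in MDEV-27621 (read to the end of the log file, then the rest from log_sys.START_OFFSET) can be sketched like this. The function read_circular() is hypothetical and works on an in-memory bytes object; only the value 12288 for START_OFFSET is taken from the commit message.

```python
START_OFFSET = 12288  # log_sys.START_OFFSET, as given in the commit message


def read_circular(log: bytes, offset: int, length: int) -> bytes:
    """Read length bytes at offset from a circular log that wraps to START_OFFSET."""
    file_size = len(log)
    if not START_OFFSET <= offset < file_size:
        raise ValueError("offset outside the circular part of the log")
    if length > file_size - START_OFFSET:
        raise ValueError("cannot read more than one full cycle of the log")
    # First read everything up to the end of the file...
    first = min(length, file_size - offset)
    data = log[offset:offset + first]
    # ...then continue with the rest from START_OFFSET.
    if first < length:
        data += log[START_OFFSET:START_OFFSET + (length - first)]
    return data
```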
Marko Mäkelä	685d958e38	MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with innodb_force_recovery=6 is it possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. 
Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will be written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number.
See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of them will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time.
The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. 
recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. 
We are inserting the blocks in arbitrary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.	2022-01-21 16:03:47 +02:00
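The block size rule stated for log_sys.get_block_size() in the MDEV-14425 message (power-of-2 sizes between 64 and 4096 bytes are used as is, anything else falls back to 512 bytes) could be sketched as follows. This is an assumption-laden illustration: the name `effective_log_block_size` is hypothetical, and the real function also has to query the operating system for the physical block size, which is not modelled here.

```python
def effective_log_block_size(physical: int) -> int:
    """Pick the log block size from a reported physical block size
    (hypothetical helper that mirrors the rule in the commit message)."""
    is_power_of_two = physical > 0 and physical & (physical - 1) == 0
    if is_power_of_two and 64 <= physical <= 4096:
        return physical
    # Anything else: the traditional 512-byte size, with normal
    # file system buffering.
    return 512
```

Under this rule a device that reports 8192 bytes, or a size that is not a power of 2, is treated like a traditional 512-byte device.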
Daniel Black	d434250ee1	MDEV-25342: autosize innodb_buffer_pool_chunk_size The previous default innodb_buffer_pool_chunk_size of 128M made sense when the innodb buffer pool size was a few GB. When the pool size is 128GB this means the chunk size is 0.1% of this. Fine tuning the buffer pool size on such a fine increment doesn't make practical sense. Also on extremely large buffer pool systems, initializing on the default 128M can also take a considerable amount of time. When large pages are enabled, the chunk size has to be a multiple of an available large page size or memory allocation without use can occur. Previously the default 0 was documented as disabling resizing. With srv_buf_pool_chunk_unit > 0 assertions in the code and the minimum value set, I doubt this was ever the case. As such the autosizing (based on default 0) takes place as follows: * a 64th of the innodb_buffer_pool_size * if large pages, this is rounded down to the nearest multiple of the large page size. * If less than 1MB, set to 1MB. This does mean the new default innodb_buffer_pool_chunk_size is 2MB, derived from the above formula with 128MB as the buffer pool size. The innodb_buffer_pool_chunk_size is changed to a size_t for better compatibility with the memory allocations which use size_t. The previous upper limit is changed to the maximum of a size_t. The maximum value used is the buffer pool size anyway. Getting this default value of the chunk size to a more practical size facilitates further development of more automated resizing without significant overhead or memory fragmentation. innodb_buffer_pool_resize test adjusted based on 1M default chunk size thanks Wlad.	2022-01-18 14:20:57 +02:00
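The three autosizing steps listed in the MDEV-25342 message can be written out as a short calculation. This is a sketch under assumptions, not the server code: the function name `autosize_chunk` and the `large_page_size` parameter (0 meaning large pages are not in use) are hypothetical, and the steps are applied in the order the message lists them.

```python
MiB = 1 << 20


def autosize_chunk(buffer_pool_size: int, large_page_size: int = 0) -> int:
    """Derive a chunk size from the buffer pool size (illustration only)."""
    # Step 1: a 64th of the buffer pool size.
    chunk = buffer_pool_size // 64
    # Step 2: with large pages, round down to a multiple of the page size.
    if large_page_size:
        chunk -= chunk % large_page_size
    # Step 3: never go below 1 MiB.
    return max(chunk, MiB)
```

With a 128 MiB buffer pool this yields 2 MiB, which matches the new default quoted in the message.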
Marko Mäkelä	c22107fd90	Merge 10.6 into 10.7	2021-11-29 11:42:07 +02:00
Marko Mäkelä	51c89849d1	Merge 10.5 into 10.6	2021-11-29 11:39:34 +02:00
Marko Mäkelä	d4cb177603	Merge 10.4 into 10.5	2021-11-29 11:16:20 +02:00
Marko Mäkelä	4da2273876	Merge 10.3 into 10.4	2021-11-29 10:59:22 +02:00
Marko Mäkelä	289721de9a	Merge 10.2 into 10.3	2021-11-29 10:33:06 +02:00
ryancaicse	f809a4fbd0	MDEV-26558 Fix a deadlock due to cyclic dependence Fix a potential deadlock bug between locks ctrl_mutex and entry->mutex	2021-11-24 12:57:44 +02:00
Alexey Bychko	fe065f8d90	MDEV-22522 RPM packages have meaningless summary/description this patch moves cpack summary and description for optional packages to the appropriate CMakeLists.txt files	2021-11-23 11:29:24 +07:00
Vladislav Vaintroub	009f3e06f3	improve build, allow sql library to be built in parallel with builtins	2021-11-09 17:02:45 +02:00
Sergei Krivonos	f7c6c02a06	Revert "improve build, allow sql library to be built in parallel with builtins" This reverts commit `1a3570dec3`.	2021-11-09 15:44:07 +02:00
Vladislav Vaintroub	1a3570dec3	improve build, allow sql library to be built in parallel with builtins	2021-11-09 12:06:49 +02:00
Marko Mäkelä	06988bdcaa	Merge 10.6 into 10.7	2021-11-09 09:40:29 +02:00
Marko Mäkelä	25ac047baf	Merge 10.5 into 10.6	2021-11-09 09:11:50 +02:00
Marko Mäkelä	9c18b96603	Merge 10.4 into 10.5	2021-11-09 08:50:33 +02:00
Marko Mäkelä	47ab793d71	Merge 10.3 into 10.4	2021-11-09 08:40:14 +02:00
Marko Mäkelä	524b4a89da	Merge 10.2 into 10.3	2021-11-09 08:26:59 +02:00
Daniel Black	7c30bc38a5	MDEV-26561 mariabackup release locks The previous threads locked need to be released too. This occurs if an error occurs during the initialization of any of the non-first mutex/condition variables.	2021-11-09 17:05:55 +11:00
ryancaicse	e1eb39a446	MDEV-26561 Fix a bug due to unreleased lock Fix a bug of unreleased lock ctrl_mutex in the method create_worker_threads	2021-11-09 17:05:55 +11:00
Oleksandr Byelkin	167eac7d43	Merge branch '10.6' into 10.7	2021-11-03 11:30:27 +01:00
Marko Mäkelä	3dc0d884ec	MDEV-26674 workaround for mariadb-backup This is follow-up to commit `1193a793c4`. We will set innodb_use_native_aio=OFF by default also in mariadb-backup when running on a potentially affected kernel.	2021-11-02 15:24:20 +02:00
Sergei Golubchik	db20c77782	mariabackup: rename encryption_plugin -> xb_plugin because plugin code is not only about encryption anymore (also loads provider plugins), and xb_ prefix prevents name clashes with the server code (that mariabackup links with).	2021-10-27 15:55:14 +02:00
Sergei Golubchik	8c806c4152	MDEV-26794 MariaBackup does not recognize added providers upon prepare of incremental backup prefer backup-my.cnf from the incremental-dir over the one in target-dir	2021-10-27 15:55:14 +02:00
Sergei Golubchik	2be6804650	MDEV-26791 MariaBackup logs compression provider plugins as encryption plugin	2021-10-27 15:55:14 +02:00
Sergei Golubchik	b91acd405a	MDEV-26773 MariaBackup backup does not work with compression providers make mariabackup to load not only encryption but also provider plugins.	2021-10-27 15:55:14 +02:00
Kartik Soneji	bf8b699f64	MDEV-12933 sort out the compression library chaos bzip2/lz4/lzma/lzo/snappy compression is now provided via services they're almost like normal services, but in include/providers/ and they're supposed to provide exactly the same interface as original compression libraries (but not everything, only enough of it for the code to compile). the services are implemented via dummy functions that return corresponding error values (LZMA_PROG_ERROR, LZO_E_INTERNAL_ERROR, etc). the actual compression libraries are linked into corresponding provider plugins. Providers are daemon plugins that when loaded replace service pointers to point to actual compression functions. That is, run-time dependency on compression libraries is now on plugins, and the server doesn't need any compression libraries to run, but will automatically support the compression when a plugin is loaded. InnoDB and Mroonga use compression plugins now. RocksDB doesn't, because it comes with standalone utility binaries that cannot load plugins.	2021-10-27 15:55:14 +02:00
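The provider mechanism described in MDEV-12933 (service pointers that start out as dummy functions returning an error, and are replaced when a provider plugin is loaded) can be illustrated with a toy model. Everything named here is hypothetical: `CompressionService`, `load_provider` and the `PROG_ERROR` value are inventions for this sketch, and Python's standard `bz2` module merely stands in for a real compression library.

```python
import bz2

PROG_ERROR = -1  # hypothetical stand-in for values like LZMA_PROG_ERROR


class CompressionService:
    """Toy service table: one 'pointer' that initially targets a dummy."""

    def __init__(self):
        self.compress = self._dummy_compress

    @staticmethod
    def _dummy_compress(data: bytes):
        # Without a provider, every call reports an error.
        return PROG_ERROR, b""


def load_provider(service: CompressionService) -> None:
    """Simulate loading a provider plugin: repoint the service."""
    def real_compress(data: bytes):
        return 0, bz2.compress(data)

    service.compress = real_compress
```

Callers always go through `service.compress`, so they need no compression library themselves; they simply start getting successful results once a provider has been loaded.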
Marko Mäkelä	79185bd056	Merge 10.6 into 10.7	2021-09-24 15:32:39 +03:00
Marko Mäkelä	d95361107c	Merge 10.5 into 10.6	2021-09-24 14:38:52 +03:00
Marko Mäkelä	7e2b42324c	Merge 10.4 into 10.5	2021-09-24 08:42:23 +03:00
Marko Mäkelä	9024498e88	Merge 10.3 into 10.4	2021-09-22 18:26:54 +03:00
Marko Mäkelä	b46cf33ab8	Merge 10.2 into 10.3	2021-09-22 18:01:41 +03:00
Vladislav Vaintroub	b1351c1594	MDEV-26574 An improper locking bug due to unreleased lock in the ds_xbstream.cc release lock in all cases in xbstream_open, also fix the case where malloc would return NULL.	2021-09-15 14:55:45 +02:00
Vladislav Vaintroub	8937762ead	MDEV-26573 : A static analyzer warning about ds_archive.cc This file had not been compiled for long time. Remove this from the tree.	2021-09-15 14:19:24 +02:00
Marko Mäkelä	4be366111b	Merge 10.6 into 10.7	2021-09-11 18:01:31 +03:00
Marko Mäkelä	15139964d5	Merge 10.5 into 10.6	2021-09-11 17:55:27 +03:00
Vicențiu Ciorbaru	7c33ecb665	Merge remote-tracking branch 'upstream/10.4' into 10.5	2021-09-10 17:16:18 +03:00
Vicențiu Ciorbaru	de7e027d5e	Merge remote-tracking branch 'upstream/10.3' into 10.4	2021-09-09 09:23:35 +03:00
Vicențiu Ciorbaru	b85b8348e7	Merge branch '10.2' into 10.3	2021-09-07 16:32:35 +03:00
Vladislav Vaintroub	d6b7738dcc	Fix potential null pointer access after the allocation error	2021-09-01 18:21:34 +02:00
Marko Mäkelä	58aaa67064	Merge 10.6 into 10.7	2021-08-31 11:01:19 +03:00
Marko Mäkelä	e94172c2a0	Merge 10.5 into 10.6	2021-08-31 11:00:41 +03:00
Marko Mäkelä	e62120cec7	Merge 10.4 into 10.5	2021-08-31 10:04:56 +03:00
Marko Mäkelä	0464761126	Merge 10.3 into 10.4	2021-08-31 09:22:21 +03:00
Marko Mäkelä	e835cc851e	Merge 10.2 into 10.3	2021-08-31 08:36:59 +03:00
Sergei Golubchik	fe2a7048e7	typo fixed	2021-08-26 15:13:48 +02:00
Marko Mäkelä	64f7dffcc7	Merge 10.6 into 10.7	2021-08-23 11:28:08 +03:00
Marko Mäkelä	49f95c4065	Merge 10.5 into 10.6	2021-08-23 11:21:33 +03:00
Marko Mäkelä	2c9f2a4c8c	Merge 10.4 into 10.5	2021-08-23 11:10:59 +03:00
Vladislav Vaintroub	1002703baa	MDEV-19712 backup stages commented out. Remove commented out code, so that occasional reader is not confused.	2021-08-20 00:25:43 +02:00
Marko Mäkelä	3bf42eb21b	Merge 10.6 into 10.7	2021-08-19 13:03:48 +03:00
Marko Mäkelä	f3fcf5f45c	Merge 10.5 to 10.6	2021-08-19 12:25:00 +03:00
Marko Mäkelä	4a25957274	Merge 10.4 into 10.5	2021-08-18 18:22:35 +03:00
Vladislav Vaintroub	582cf12f94	mariabackup - fix string format in error message	2021-08-11 11:40:56 +02:00
Marko Mäkelä	ddcb242b3c	Merge 10.6 into 10.7	2021-08-04 11:52:39 +03:00
Oleksandr Byelkin	6efb5e9f5e	Merge branch '10.5' into 10.6	2021-08-02 10:11:41 +02:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
Oleksandr Byelkin	7841a7eb09	Merge branch '10.3' into 10.4	2021-07-31 22:59:58 +02:00
Leandro Pacheco	2b84e1c966	MDEV-23080: desync and pause node on BACKUP STAGE BLOCK_DDL make BACKUP STAGE behave as FTWRL, desyncing and pausing the node to prevent BF threads (appliers) from interfering with blocking stages. This is needed because BF threads don't respect BACKUP MDL locks. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-07-27 08:11:41 +03:00
Marko Mäkelä	a880ef57ef	MDEV-26195 fixup: Format mismatch in mariabackup write_backup_config_file(): Use the correct format for innodb_undo_tablespaces. The data type was changed in commit `ca501ffb04`.	2021-07-24 21:42:03 +03:00
Sergei Golubchik	b34cafe9d9	cleanup: move thread_count to THD_count::value() because the name was misleading, it counts not threads, but THDs, and as THD_count is the only way to increment/decrement it, it could as well be declared inside THD_count.	2021-07-24 12:37:50 +02:00
Marko Mäkelä	b50ea90063	Merge 10.2 into 10.3	2021-07-22 18:57:54 +03:00
Marko Mäkelä	ca501ffb04	MDEV-26195: Use a 32-bit data type for some tablespace fields In the InnoDB data files, we allocate 32 bits for tablespace identifiers and page numbers as well as tablespace flags. But, in main memory data structures we allocate 32 or 64 bits, depending on the register width of the processor. Let us always use 32-bit fields to eliminate a mismatch and reduce the memory footprint on 64-bit systems.	2021-07-22 11:22:47 +03:00
Marko Mäkelä	cee37b5d26	Merge 10.6 into 10.7	2021-07-22 11:22:09 +03:00
Marko Mäkelä	641f09398f	Merge 10.5 into 10.6	2021-07-22 10:11:08 +03:00
Marko Mäkelä	82d5994520	MDEV-26110: Do not rely on alignment on static allocation It is implementation-defined whether alignment requirements that are larger than std::max_align_t (typically 8 or 16 bytes) will be honored by the compiler and linker. It turns out that on IBM AIX, both alignas() and MY_ALIGNED() only guarantee alignment up to 16 bytes. For some data structures, specifying alignment to the CPU cache line size (typically 64 or 128 bytes) is a mere performance optimization, and we do not really care whether the requested alignment is guaranteed. But, for the correct operation of direct I/O, we do require that the buffers be aligned at a block size boundary. field_ref_zero: Define as a pointer, not an array. For innochecksum, we can make this point to unaligned memory; for anything else, we will allocate an aligned buffer from the heap. This buffer will be used for overwriting freed data pages when innodb_immediate_scrub_data_uncompressed=ON. And exactly that code hit an assertion failure on AIX, in the test innodb.innodb_scrub. log_sys.checkpoint_buf: Define as a pointer to aligned memory that is allocated from heap. log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf instead of trying to allocate an aligned buffer from the stack.	2021-07-22 10:05:13 +03:00
Sergei Golubchik	6190a02f35	Merge branch '10.2' into 10.3	2021-07-21 20:11:07 +02:00
Heinz Wiesinger	751ebe44fd	Add feature summary at the end of cmake. This gives a short overview over found/missing dependencies as well as enabled/disabled features. Initial author Heinz Wiesinger <heinz@m2mobi.com> Additions by Vicențiu Ciorbaru <vicentiu@mariadb.org> * Report all plugins enabled via MYSQL_ADD_PLUGIN * Simplify code. Eliminate duplication by making use of WITH_xxx variable values to set feature "ON" / "OFF" state. Reviewed by: wlad@mariadb.com (code details) serg@mariadb.com (the idea)	2021-07-21 10:22:56 +03:00
Marko Mäkelä	ed0a7b1b3f	MDEV-24626 fixup: Remove useless code fil_ibd_create(): Remove code that should have been removed in commit `86dc7b4d4c` already. We no longer wrote an initialized page to the file, but we would still allocate a page image in memory and write it. xb_space_create_file(): Remove an unnecessary page write. (This is a functional change for Mariabackup.)	2021-07-20 17:35:03 +03:00
Oli Sennhauser	da495b1b69	Typo fix in extrabackup.cc and innobackupex.cc Thanks to @shinguz for helping with this. This is a backport commit from 10.7	2021-07-15 10:42:54 +03:00
Oli Sennhauser	2c4d1fb544	Typo fix in extrabackup.cc and innobackupex.cc Thanks to @shinguz for helping with this.	2021-07-12 13:29:55 +03:00
Marko Mäkelä	f778a5d5e2	MDEV-25854: Remove garbage tables after restoring a backup In commit `1c5ae99194` (MDEV-25666) we had changed Mariabackup so that it would no longer skip files whose names start with #sql. This turned out to be wrong. Because operations on such named files are not protected by any locks in the server, it is not safe to copy them. Not copying the files may make the InnoDB data dictionary inconsistent with the file system. So, we must do something in InnoDB to adjust for that. If InnoDB is being started up without the redo log (ib_logfile0) or with a zero-length log file, we will assume that the server was restored from a backup, and adjust things as follows: dict_check_sys_tables(), fil_ibd_open(): Do not complain about missing #sql files if they would be dropped a little later. dict_stats_update_if_needed(): Never add #sql tables to the recomputing queue. This avoids a potential race condition when dropping the garbage tables. drop_garbage_tables_after_restore(): Try to drop any garbage tables. innodb_ddl_recovery_done(): Invoke drop_garbage_tables_after_restore() if srv_start_after_restore (a new flag) was set and we are not in read-only mode (innodb_read_only=ON or innodb_force_recovery>3). The tests and dbug_mariabackup_event() instrumentation were developed by Vladislav Vaintroub, who also reviewed this.	2021-06-17 13:46:16 +03:00
Marko Mäkelä	6ba938af62	MDEV-25905: Assertion table2==NULL in dict_sys_t::add() In commit `49e2c8f0a6` (MDEV-25743) we made dict_sys_t::find() incompatible with the rest of the table name hash table operations in case the table name contains non-ASCII octets (using a compatibility mode that facilitates the upgrade into the MySQL 5.0 filename-safe encoding) and the target platform implements signed char. ut_fold_string(): Remove; replace with my_crc32c(). This also makes table name hash value calculations independent on whether char is unsigned or signed.	2021-06-14 12:38:56 +03:00
Vladislav Vaintroub	3d6eb7afcf	MDEV-25602 get rid of __WIN__ in favor of standard _WIN32 This fixed the MySQL bug# 20338 about misuse of double underscore prefix __WIN__, which was old MySQL's idea of identifying Windows. Replace it by _WIN32 standard symbol for targeting Windows OS (both 32 and 64 bit) Note that connect storage engine is not fixed in this patch (must be fixed in "upstream" branch)	2021-06-06 13:21:03 +02:00
Marko Mäkelä	a722ee88f3	Merge 10.5 into 10.6	2021-06-01 11:39:38 +03:00
Marko Mäkelä	9c7a456a92	Merge 10.4 into 10.5	2021-06-01 10:38:09 +03:00
Marko Mäkelä	77d8da57d7	Merge 10.3 into 10.4	2021-06-01 09:14:59 +03:00
Marko Mäkelä	950a220060	Merge 10.2 into 10.3	2021-06-01 08:40:59 +03:00
Vladislav Vaintroub	d3c77e08ae	MDEV-20556 Remove references to "xtrabackup" and "innobackupex" in mariabackup --help	2021-05-31 12:54:21 +02:00
Vladislav Vaintroub	5bd517259f	MDEV-25815 mariabackup crash or debug assert with --backup --databases-exclude Fix regression (debug assertion or division by 0) caused by `cfd3d70ccb`	2021-05-29 06:32:40 +02:00
Rucha Deodhar	4e19539c14	MDEV-22189: Change error messages inside code to have mariadb instead of mysql Fix: Changed error messages, rerecorded results and changed other relevant files.	2021-05-24 11:38:13 +05:30
Marko Mäkelä	49e2c8f0a6	MDEV-25743: Unnecessary copying of table names in InnoDB dictionary Many InnoDB data dictionary cache operations require that the table name be copied so that it will be NUL terminated. (For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.) dict_table_t::is_garbage_name(): Check if a name belongs to the background drop table queue. dict_check_if_system_table_exists(): Remove. dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup. dict_sys_t::create_or_check_sys_tables(): Replaces dict_create_or_check_foreign_constraint_tables() and dict_create_or_check_sys_virtual(). dict_sys_t::load_table(): Replaces dict_table_get_low() and dict_load_table(). dict_sys_t::find_table(): Renamed from get_table(). dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist. trx_t::has_stats_table_lock(): Moved to dict0stats.cc. Some error messages will now report table names in the internal databasename/tablename format, instead of `databasename`.`tablename`.	2021-05-21 18:03:40 +03:00
Marko Mäkelä	5d495fc44b	Merge 10.5 into 10.6	2021-05-19 09:15:54 +03:00
Marko Mäkelä	db8fb40824	Merge 10.4 into 10.5	2021-05-19 08:39:39 +03:00
Marko Mäkelä	c366845a0b	MDEV-25691: Simplify handlerton::drop_database for InnoDB The implementation of handlerton::drop_database in InnoDB is unnecessarily complex. The minimal implementation should check that no conflicting locks or references exist on the tables, delete all table metadata in a single transaction, and finally delete the tablespaces. Note: DROP DATABASE will delete each individual table that the SQL layer knows about, one table per transaction. The handlerton::drop_database is basically a final cleanup step for removing any garbage that could have been left behind in InnoDB due to some bug, or not having atomic DDL in the past. hash_node_t: Remove. Use the proper data type name in pointers. dict_drop_index_tree(): Do not take the table as a parameter. Instead, return the tablespace ID if the tablespace should be dropped (we are dropping a clustered index tree). fil_delete_tablespace(), fil_system_t::detach(): Return a single detached file handle. Multi-file tablespaces cannot be deleted via this interface. ha_innobase::delete_table(): Remove a work-around for non-atomic DDL and do not try to drop tables with similar-looking name. innodb_drop_database(): Complete rewrite. innobase_drop_database(), dict_get_first_table_name_in_db(), row_drop_database_for_mysql(), drop_all_foreign_keys_in_db(): Remove. row_purge_remove_clust_if_poss_low(), row_undo_ins_remove_clust_rec(): If the tablespace is to be deleted, try to evict the table definition from the cache. Failing that, set dict_table_t::space to nullptr. lock_release_on_rollback(): On the rollback of CREATE TABLE, release all locks that the transaction had on the table, to avoid heap-use-after-free.	2021-05-18 12:53:40 +03:00
Marko Mäkelä	08b6fd9395	MDEV-25710: Dead code os_file_opendir() in the server The functions fil_file_readdir_next_file(), os_file_opendir(), os_file_closedir() became dead code in the server in MariaDB 10.4.0 with commit `09af00cbde` (the removal of the crash recovery logic for the TRUNCATE TABLE implementation that was replaced in MDEV-13564). os_file_opendir(), os_file_closedir(): Define as macros.	2021-05-18 12:13:18 +03:00
Marko Mäkelä	86dc7b4d4c	MDEV-24626 Remove synchronous write of page0 file during file creation During data file creation, InnoDB holds dict_sys mutex, tries to write page 0 of the file and flushes the file. This not only causes unnecessary contention but is also a deviation from the write-ahead logging protocol. The clean sequence of operations is that we first start a dictionary transaction and write SYS_TABLES and SYS_INDEXES records that identify the tablespace. Then, we durably write a FILE_CREATE record to the write-ahead log and create the file. Recovery should not unnecessarily insist that the first page of each data file that is referred to by the redo log is valid. It must be enough that page 0 of the tablespace can be initialized based on the redo log contents. We introduce a new data structure deferred_spaces that keeps track of corrupted-looking files during recovery. The data structure holds the last LSN of a FILE_ record referring to the data file, the tablespace identifier, and the last known file name. Two scenarios can happen during recovery: i) Sufficient memory: InnoDB can reconstruct the tablespace after parsing all redo log records. ii) Insufficient memory (multiple apply phase): InnoDB should store the deferred tablespace redo logs even though tablespace is not present. InnoDB should start constructing the tablespace when it first encounters deferred tablespace id. Mariabackup copies the zero filled ibd file in backup_fix_ddl() as the extension of .new file. Mariabackup test case does page flushing when it deals with DDL operation during backup operation. fil_ibd_create(): Remove the write of page0 and flushing of file fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has zero filled page0 Datafile: Clean up the error handling, and do not report errors if we are in the middle of recovery. The caller will check Datafile::m_defer.
fil_node_t::deferred: Indicates whether the tablespace loading was deferred during recovery FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that the tablespace file cannot be loaded. recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to initialize fil_space_t based on buffered metadata and records to initialize page 0. Ignore the flags in fil_name_t, because they are intentionally invalid. fil_name_process(): Update deferred_spaces. recv_sys_t::parse(): Store the redo log if the tablespace id is present in deferred spaces recv_sys_t::recover_low(): Should recover the first page of the tablespace even though the tablespace instance is not present recv_sys_t::apply(): Initialize the deferred tablespace before applying the deferred tablespace records recv_validate_tablespace(): Skip the validation for deferred_spaces. recv_rename_files(): Moved and revised from recv_sys_t::apply(). For deferred-recovery tablespaces, do not attempt to rename the file if a deferred-recovery tablespace is associated with the name. recv_recovery_from_checkpoint_start(): Invoke recv_rename_files() and initialize all deferred tablespaces before applying redo log. fil_node_t::read_page0(): Skip page0 validation if the tablespace is deferred buf_page_create_deferred(): A variant of buf_page_create() when the fil_space_t is not available yet This is joint work with Thirunarayanan Balathandayuthapani, who implemented an initial prototype.	2021-05-17 18:12:33 +03:00
Marko Mäkelä	1c5ae99194	MDEV-25666: After backup, "Could not find a valid tablespace file" Ever since MDEV-18518 made DDL operations mostly crash-safe inside InnoDB, it became obvious that Mariabackup might not be entirely safe with regard to concurrent DDL operations. check_if_skip_table(): Do not skip files whose name starts with #sql. We cannot know whether a DDL operation is in progress and the table might in fact be needed later.	2021-05-14 08:36:14 +03:00
Marko Mäkelä	a0b133f431	MDEV-25312 fixup: Invoke fil_space_t::rename() correctly	2021-05-14 08:24:43 +03:00
Marko Mäkelä	916b237b3f	Merge 10.5 into 10.6	2021-05-07 15:00:27 +03:00
Vladislav Vaintroub	a5b3982585	MDEV-25613 assertion (file_system.n_open > 0) failed Remove operations on fil_system.n_open from mariabackup, as they are not protected by the mutex, and serve no higher purpose anyway.	2021-05-07 08:13:28 +02:00
Marko Mäkelä	d2e2d32933	Merge 10.5 into 10.6	2021-04-14 12:32:27 +03:00
Marko Mäkelä	6c3e860cbf	Merge 10.4 into 10.5	2021-04-14 11:35:39 +03:00
Marko Mäkelä	5008171b05	Merge 10.3 into 10.4	2021-04-14 10:33:59 +03:00
Marko Mäkelä	6e6318b29b	Merge 10.2 into 10.3	2021-04-13 10:26:01 +03:00
Julius Goryavsky	3eecb8db22	MDEV-25356: SST scripts should use the new mariabackup interface SST scripts for Galera should use the new mariabackup interface instead of the innobackupex interface, which is currently only supported for compatibility reasons. This commit converts the SST script for mariabackup to use the new interface. It does not need separate tests, as any problems will be seen as failures when running multiple tests for the mariabackup-based SST.	2021-04-11 17:07:36 +02:00
Julius Goryavsky	bf1e09e0c4	Removed extra spaces in generated command lines (minor "cosmetic" change after MDEV-24197)	2021-04-11 02:27:03 +02:00
Julius Goryavsky	8ff0ac45dc	MDEV-25328: --innodb command line option causes mariabackup to fail This patch fixes an issue with launching mariabackup during SST (when used with Galera), when during bootstrap mariabackup receives the "--innodb" option, which is incorrectly interpreted as shortcut for "--innodb-force-recovery". This patch does not require separate test for mtr, as the problem is visible in general testing on buildbot.	2021-04-11 02:26:52 +02:00
Marko Mäkelä	450c017c2d	Merge 10.2 into 10.3	2021-04-09 14:32:06 +03:00
Marko Mäkelä	cf552f5886	MDEV-25312 Replace fil_space_t::name with fil_space_t::name() A consistency check for fil_space_t::name is causing recovery failures in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field altogether. fil_space_t::name was more or less a copy of dict_table_t::name (except for some special cases), and it was not being used for anything useful. There used to be a name_hash, but it had been removed already in commit `a75dbfd718` (MDEV-12266). We will also remove os_normalize_path(), OS_PATH_SEPARATOR, OS_PATH_SEPARATOR_ALT. On Microsoft Windows, we will treat \ and / roughly in the same way. The intention is that for per-table tablespaces, the filenames will always follow the pattern prefix/databasename/tablename.ibd. (Any \ in the prefix must not be converted.) ut_basename_noext(): Remove (unused function). read_link_file(): Replaces RemoteDatafile::read_link_file(). We will ensure that the last two path component separators are forward slashes (converting up to 2 trailing backslashes on Microsoft Windows), so that everywhere else we can assume that data file names end in "/databasename/tablename.ibd". Note: On Microsoft Windows, path names that start with \\?\ must not contain / as path component separators. Previously, such paths did work in the DATA DIRECTORY argument of InnoDB tables. Reviewed by: Vladislav Vaintroub	2021-04-07 18:01:13 +03:00
Julius Goryavsky	fb9d151934	MDEV-25321: mariabackup failed if password is passed via environment variable The mariabackup interface currently supports passing a password through an explicit command line variable, but does not support passing a password through the MYSQL_PWD environment variable. At the same time, the Galera SST script for mariabackup uses the environment variable to pass the password, which leads (in some cases) to an unsuccessful launch of mariabackup and to the inability to start the cluster. This patch fixes this issue. It does not need a separate test, as the problem is visible in general testing on buildbot.	2021-04-01 21:47:30 +02:00
Srinidhi Kaushik	5bc5ecce08	MDEV-24197: Add "innodb_force_recovery" for "mariabackup --prepare" During the prepare phase of restoring backups, "mariabackup" does not seem to allow (or recognize) the option "innodb_force_recovery" for the embedded InnoDB server instance that it starts. If page corruption is observed during page recovery, the prepare step fails. While this is indeed the correct behavior ideally, allowing this option to be set in case of emergencies might be useful when the current backup is the only copy available. Some error messages during "--prepare" suggest to set "innodb_force_recovery" to 1: [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. For backwards compatibility, "mariabackup --innobackupex --apply-log" should also have this option. Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>	2021-04-01 13:34:40 +03:00
Vladislav Vaintroub	08cb5d8483	MDEV-25221 Do not remove source file, if copy_file() fails in mariabackup --move-back Remove an incompletely copied destination file.	2021-03-31 14:23:56 +02:00
Eugene Kosov	1274bb7729	MDEV-25223 change fil_system_t::space_list and fil_system_t::named_spaces from UT_LIST to ilist Mostly a refactoring. Also, debug functions were added for ease of life while debugging	2021-03-24 15:15:18 +03:00
Marko Mäkelä	1055cf967f	Merge 10.5 into 10.6	2021-03-20 18:40:07 +02:00
David Carlier	1bacab8ab9	mariabackup: minor FreeBSD support update. On this platform, it is better not to rely on the optional proc filesystem being present. Use the native API to retrieve the absolute path of the binary instead.	2021-03-20 15:29:56 +11:00
Eugene Kosov	dbe941e06f	cleanup: os_thread_create -> std::thread	2021-03-19 11:44:28 +03:00
Eugene Kosov	62e4aaa240	cleanup: os_thread_sleep() -> std::this_thread::sleep_for() The std version has the advantage of more convenient unit handling from std::chrono. There is no longer any need to multiply/divide to convert anything to microseconds.	2021-03-19 11:44:03 +03:00
Marko Mäkelä	783625d78f	MDEV-24883 add io_uring support for tpool liburing is a new optional dependency (WITH_URING=auto|yes|no) that replaces libaio when it is available. aio_uring: class which wraps io_uring stuff aio_uring::bind()/unbind(): optional optimization aio_uring::submit_io(): mutex prevents data race. liburing calls are thread-unsafe. But if you look into its implementation you'll see atomic operations. They're used for synchronization between kernel and user-space only. That's why our own synchronization is still needed. For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288) because the io_uring_setup system call that is invoked by io_uring_queue_init() requests locked memory. The value was found empirically; with 262144, we would occasionally fail to enable io_uring when using the maximum values of innodb_read_io_threads=64 and innodb_write_io_threads=64. aio_uring::thread_routine(): Tolerate -EINTR return from io_uring_wait_cqe(), because it may occur on shutdown on Ubuntu 20.10 (Groovy Gorilla). This was mostly implemented by Eugene Kosov. Systemd integration and improved startup/shutdown error handling by Marko Mäkelä.	2021-03-15 11:30:17 +02:00
Marko Mäkelä	a43ff483fa	Merge 10.5 into 10.6	2021-03-11 20:20:07 +02:00
Marko Mäkelä	a4b7232b2c	Merge 10.4 into 10.5	2021-03-11 20:09:34 +02:00
Marko Mäkelä	7a4fbb55b0	MDEV-25105 Remove innodb_checksum_algorithm values none,innodb,... Historically, InnoDB supported a buggy page checksum algorithm that did not compute a checksum over the full page. Later, well before MySQL 4.1 introduced .ibd files and the innodb_file_per_table option, the algorithm was corrected and the first 4 bytes of each page were redefined to be a checksum. The original checksum was so slow that an option to disable page checksum was introduced for benchmarketing purposes. The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set extension, which includes instructions for faster computation of CRC-32C. In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was implemented to make use of that. As that option was changed to be the default in MySQL 5.7, a bug was found on big-endian platforms and some work-around code was added to weaken that checksum further. MariaDB disables that work-around by default since MDEV-17958. Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER and ARM and also for IA-32/AMD64, making use of carry-less multiplication where available. Long story short, innodb_checksum_algorithm=crc32 is faster and more secure than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb. It should have removed any need to use innodb_checksum_algorithm=none. The setting innodb_checksum_algorithm=crc32 is the default in MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5, MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default. It is even faster and more secure. The default settings in MariaDB do allow old data files to be read, no matter if a worse checksum algorithm had been used. (Unfortunately, before innodb_checksum_algorithm=full_crc32, the data files did not identify which checksum algorithm is being used.) 
The non-default settings innodb_checksum_algorithm=strict_crc32 or innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C checksums. The incompatibility with old data files is why they are not the default. The newest servers not to support innodb_checksum_algorithm=crc32 were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life. A valid reason for using innodb_checksum_algorithm=innodb could have been the ability to downgrade. If it is really needed, data files can be converted with an older version of the innochecksum utility. Because there is no good reason to allow data files to be written with insecure checksums, we will reject those option values: innodb_checksum_algorithm=none innodb_checksum_algorithm=innodb innodb_checksum_algorithm=strict_none innodb_checksum_algorithm=strict_innodb Furthermore, the following innochecksum options will be removed, because only strict crc32 will be supported: innochecksum --strict-check=crc32 innochecksum -C crc32 innochecksum --write=crc32 innochecksum -w crc32 If a user wishes to convert a data file to use a different checksum (so that it might be used with the no-longer-supported MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE nor system tablespace format changes that were made in MariaDB 10.3), then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or MySQL 5.7 can be used. Reviewed by: Thirunarayanan Balathandayuthapani	2021-03-11 12:46:18 +02:00
David CARLIER	1dff411e84	Arguments overflow fix proposal. The list is assumed to be implicitly null-terminated at usage time.	2021-03-09 15:51:38 +02:00
David CARLIER	e3a597378e	mariabackup utility, binary path implementation for Mac. Implements get_exepath natively, which reliably gives the full path.	2021-03-09 15:51:38 +02:00
Marko Mäkelä	d346763479	Merge 10.5 into 10.6	2021-03-08 10:51:31 +02:00
Marko Mäkelä	a5d3c1c819	Merge 10.4 into 10.5	2021-03-08 10:16:20 +02:00
Marko Mäkelä	a26e7a3726	Merge 10.3 into 10.4	2021-03-08 09:39:54 +02:00
Marko Mäkelä	bcd160753c	Merge 10.2 into 10.3	2021-03-05 10:06:42 +02:00
Vladislav Vaintroub	545cba13eb	MDEV-22929 fixup. Print "completed OK!" if page corruption and --log-innodb-page-corruption Since we do not stop at corrupted page error, there is no reason to log a backup error.	2021-03-05 09:04:30 +01:00
Marko Mäkelä	420f8e24ab	MDEV-24854: Change innodb_flush_method=O_DIRECT by default We have innodb_use_native_aio=ON by default since the introduction of that parameter in commit `2f9fb41b05` (MySQL 5.5 and MariaDB 5.5). However, to really benefit from the setting, the files should be opened in O_DIRECT mode, to bypass the file system cache. In this way, the reads and writes can be submitted with DMA, using the InnoDB buffer pool directly, and no processor cycles need to be used for copying data. The use of O_DIRECT benefits not only the current libaio implementation, but also liburing. os_file_set_nocache(): Test innodb_flush_method in the function, not in the callers.	2021-02-20 11:58:58 +02:00
Marko Mäkelä	b19ec8848c	Merge 10.5 into 10.6	2021-02-11 09:26:53 +02:00
Monty	5d6ad2ad66	Added 'const' to arguments in get_one_option and find_typeset() One should not change the program arguments! This change also reduces warnings from the icc compiler. Almost all changes are just syntax changes (adding const to 'get_one_option function' declarations). Other changes: - Added a few casts of 'argument' from 'const char' to 'char '. This was mainly in calls to 'external' functions we don't have control of. - Ensure that all resets of 'password command line argument' are similar. (In almost all cases it was just adding a comment and a cast) - In mysqlbinlog.cc and mysqld.cc there were a few cases that changed the command line argument. These places were changed to instead allocate the option in a MEM_ROOT to avoid changing the argument. Some of this code was changed to ensure that different programs do the parsing the same way. Added a test case for the changes in mysqlbinlog.cc - Changed a few variables that took their value from command line options from 'char ' to 'const char '.	2021-02-08 12:16:29 +02:00
Marko Mäkelä	92abdcca5a	Merge 10.5 into 10.6	2021-01-07 09:08:09 +02:00
Marko Mäkelä	a993310593	MDEV-24537 innodb_max_dirty_pages_pct_lwm=0 lost its special meaning In commit `3a9a3be1c6` (MDEV-23855) some previous logic was replaced with the condition dirty_pct < srv_max_dirty_pages_pct_lwm, which caused the default value of the parameter innodb_max_dirty_pages_pct_lwm=0 to lose its special meaning: 'refer to innodb_max_dirty_pages_pct instead'. This implicit special meaning was visible in the function af_get_pct_for_dirty(), which was removed in commit `f0c295e2de` (MDEV-24369). page_cleaner_flush_pages_recommendation(): Restore the special meaning that was removed in MDEV-24369. buf_flush_page_cleaner(): If srv_max_dirty_pages_pct_lwm==0.0, refer to srv_max_buf_pool_modified_pct. This fixes the observed performance regression due to excessive page flushing. buf_pool_t::page_cleaner_wakeup(): Revise the wakeup condition. innodb_init(): Do initialize srv_max_io_capacity in Mariabackup. It was previously constantly 0, which caused mariadb-backup --prepare to hang in buf_flush_sync(), making no progress.	2021-01-06 13:53:14 +02:00
Oleksandr Byelkin	02e7bff882	Merge commit '10.4' into 10.5	2021-01-06 10:53:00 +01:00
Oleksandr Byelkin	478b83032b	Merge branch '10.3' into 10.4	2020-12-25 09:13:28 +01:00
Oleksandr Byelkin	25561435e0	Merge branch '10.2' into 10.3	2020-12-23 19:28:02 +01:00
Marko Mäkelä	0aa02567dd	Merge 10.3 into 10.4	2020-12-23 14:52:59 +02:00
Marko Mäkelä	fa1aef39eb	Merge 10.2 into 10.3	2020-12-23 14:25:45 +02:00
Vlad Lesin	719da2c4cc	MDEV-22810 mariabackup does not honor open_files_limit from option during backup prepare open_files_limit option was processed only for --backup, but not for --prepare.	2020-12-16 10:23:41 +03:00
Marko Mäkelä	ff5d306e29	MDEV-21452: Replace ib_mutex_t with mysql_mutex_t SHOW ENGINE INNODB MUTEX functionality is completely removed, as are the InnoDB latching order checks. We will enforce innodb_fatal_semaphore_wait_threshold only for dict_sys.mutex and lock_sys.mutex. dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex. lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex. FIXME: srv_sys should be removed altogether; it is duplicating tpool functionality. fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must not hold fil_system.mutex. fil_close_all_files(): To prevent SAFE_MUTEX warnings for fil_space_destroy_crypt_data(), we must not hold fil_system.mutex while invoking fil_space_free_low() on a detached tablespace.	2020-12-15 17:56:18 +02:00
Marko Mäkelä	38fd7b7d91	MDEV-21452: Replace all direct use of os_event_t Let us replace os_event_t with mysql_cond_t, and replace the necessary ib_mutex_t with mysql_mutex_t so that they can be used with condition variables. Also, let us replace polling (os_thread_sleep() or timed waits) with plain mysql_cond_wait() wherever possible. Furthermore, we will use the lightweight srw_mutex for trx_t::mutex, to hopefully reduce contention on lock_sys.mutex. FIXME: Add test coverage of mariabackup --backup --kill-long-queries-timeout	2020-12-15 17:56:17 +02:00
Marko Mäkelä	9ecd766526	Merge 10.5 into 10.6	2020-12-14 18:02:40 +02:00
Marko Mäkelä	f24b738318	MDEV-24313 (2 of 2): Silently ignored innodb_use_native_aio=1 In commit `5e62b6a5e0` (MDEV-16264) the logic of os_aio_init() was changed so that it will never fail, but instead automatically disable innodb_use_native_aio (which is enabled by default) if the io_setup() system call would fail due to resource limits being exceeded. This is questionable, especially because falling back to simulated AIO may lead to significantly reduced performance. srv_n_file_io_threads, srv_n_read_io_threads, srv_n_write_io_threads: Change the data type from ulong to uint. os_aio_init(): Remove the parameters, and actually return an error code. thread_pool::configure_aio(): Do not silently fall back to simulated AIO. Reviewed by: Vladislav Vaintroub	2020-12-14 15:27:03 +02:00
Marko Mäkelä	8677c14e65	MDEV-24391 heap-use-after-free in fil_space_t::flush_low() We observed a race condition that involved two threads executing fil_flush_file_spaces() and one thread executing fil_delete_tablespace(). After one of the fil_flush_file_spaces() observed that space.needs_flush_not_stopping() is set and was releasing the fil_system.mutex, the other fil_flush_file_spaces() would complete the execution of fil_space_t::flush_low() on the same tablespace. Then, fil_delete_tablespace() would destroy the object, because the value of fil_space_t::n_pending did not prevent that. Finally, the fil_flush_file_spaces() would resume execution and invoke fil_space_t::flush_low() on the freed object. This race condition was introduced in commit `118e258aaa` of MDEV-23855. fil_space_t::flush(): Add a template parameter that indicates whether the caller is holding a reference to prevent the tablespace from being freed. buf_dblwr_t::flush_buffered_writes_completed(), row_quiesce_table_start(): Acquire a reference for the duration of the fil_space_t::flush_low() operation. It should be impossible for the object to be freed in these code paths, but we want to satisfy the debug assertions. fil_space_t::flush_low(): Do not increment or decrement the reference count, but instead assert that the caller is holding a reference. fil_space_extend_must_retry(), fil_flush_file_spaces(): Acquire a reference before releasing fil_system.mutex. This is what will fix the race condition.	2020-12-11 09:05:26 +02:00
Marko Mäkelä	1eb59c307d	MDEV-24340 Unique final message of InnoDB during shutdown innobase_space_shutdown(): Remove. We want this step to be executed before the message "InnoDB: Shutdown completed; log sequence number " is output by innodb_shutdown(). It used to be executed after that step. innodb_shutdown(): Duplicate the code that used to live in innobase_space_shutdown(). innobase_init_abort(): Merge with innobase_space_shutdown().	2020-12-04 11:46:47 +02:00
Marko Mäkelä	a13fac9eee	Merge 10.5 into 10.6	2020-12-03 08:12:47 +02:00
Marko Mäkelä	6a1e655cb0	Merge 10.4 into 10.5	2020-12-02 18:29:49 +02:00
Marko Mäkelä	589cf8dbf3	Merge 10.3 into 10.4	2020-12-01 19:51:14 +02:00
Vlad Lesin	e30a05f454	MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered Post-push Windows compilation errors fix.	2020-12-01 18:33:10 +03:00
Marko Mäkelä	81ab9ea63f	Merge 10.2 into 10.3	2020-12-01 14:55:46 +02:00
Vlad Lesin	e6b3e38d62	MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered The new option --log-innodb-page-corruption is introduced. When this option is set, the backup is not interrupted if a corrupted InnoDB page is detected. Instead, it logs all corrupted pages found in the innodb_corrupted_pages file in the backup directory and finishes with an error. For incremental backup, corrupted pages are also copied to the .delta file, because we can't do an LSN check for such pages during backup; innodb_corrupted_pages will also be created in the incremental backup directory. During --prepare, the corrupted pages list is read from the file just after the redo log is applied, and each page from the list is checked for whether it is allocated in its tablespace or not. If it is not allocated, then it is zeroed out, flushed to the tablespace and removed from the list. If all pages are removed from the list, then --prepare finishes successfully and the innodb_corrupted_pages file is removed from the backup directory. Otherwise --prepare finishes with an error message and innodb_corrupted_pages contains the list of the pages that were detected as corrupted during backup and are allocated in their tablespaces, which means that the backup directory contains corrupted InnoDB pages and the backup cannot be considered consistent. For incremental --prepare, corrupted pages from .delta files are applied to the base backup, innodb_corrupted_pages is read from both the base and incremental directories, and the same action is performed for the corrupted pages list as for full --prepare. The innodb_corrupted_pages file is modified or removed only in the base directory. If DDL happens during backup, it is also processed at the end of backup to have correct tablespace names in innodb_corrupted_pages.	2020-12-01 08:08:57 +03:00
Marko Mäkelä	533a13af06	Merge 10.3 into 10.4	2020-11-03 14:49:17 +02:00
Marko Mäkelä	09a1f0075a	Merge 10.5 into 10.6	2020-11-02 12:49:19 +02:00
Oleksandr Byelkin	8e1e2856f2	Merge branch '10.4' into 10.5	2020-11-01 14:26:15 +01:00
Oleksandr Byelkin	80c951ce28	Merge branch '10.3' into 10.4	2020-10-31 21:06:49 +01:00
Oleksandr Byelkin	794f665139	Merge branch '10.2' into 10.3	2020-10-30 17:23:53 +01:00
Marko Mäkelä	898521e2dd	Merge 10.4 into 10.5	2020-10-30 11:15:30 +02:00
Marko Mäkelä	7b2bb67113	Merge 10.3 into 10.4	2020-10-29 13:38:38 +02:00
Vlad Lesin	6cb88685c4	MDEV-24026: InnoDB: Failing assertion: os_total_large_mem_allocated >= size upon incremental backup mariabackup deallocated uninitialized write_filt_ctxt.u.wf_incremental_ctxt in xtrabackup_copy_datafile() when some table should be skipped due to parsed DDL redo log record.	2020-10-29 07:39:43 +01:00
Marko Mäkelä	a8de8f261d	Merge 10.2 into 10.3	2020-10-28 10:01:50 +02:00
Marko Mäkelä	c27e53f459	MDEV-23855: Use normal mutex for log_sys.mutex, log_sys.flush_order_mutex With an unreasonably small innodb_log_file_size, the page cleaner thread would frequently acquire log_sys.flush_order_mutex and spend a significant portion of CPU time spinning on that mutex when determining the checkpoint LSN.	2020-10-26 17:53:55 +02:00
Marko Mäkelä	118e258aaa	MDEV-23855: Shrink fil_space_t Merge n_pending_ios, n_pending_ops to std::atomic<uint32_t> n_pending. Change some more fil_space_t members to uint32_t to reduce the memory footprint. fil_space_t::add(), fil_ibd_create(): Attach the already opened handle to the tablespace, and enforce the fil_system.n_open limit. dict_boot(): Initialize fil_system.max_assigned_id. srv_boot(): Call srv_thread_pool_init() before anything else, so that files should be opened in the correct mode on Windows. fil_ibd_create(): Create the file in OS_FILE_AIO mode, just like fil_node_open_file_low() does it. dict_table_t::is_accessible(): Replaces fil_table_accessible(). Reviewed by: Vladislav Vaintroub	2020-10-26 17:53:54 +02:00
Marko Mäkelä	45ed9dd957	MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD When the maximum configured number of files is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation. fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed. IORequest: Fold more parameters to IORequest::type. fil_space_t::io(): Replaces fil_io(). fil_space_t::flush(): Replaces fil_flush(). OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low(). We will always ignore some errors for background reads. This should reduce fil_system.mutex contention a little. fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called. fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path. xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental. Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	3a9a3be1c6	MDEV-23855: Improve InnoDB log checkpoint performance After MDEV-15053, MDEV-22871, MDEV-23399 shifted the scalability bottleneck, log checkpoints became a new bottleneck. If innodb_io_capacity is set low or innodb_max_dirty_pct_lwm is set high and the workload fits in the buffer pool, the page cleaner thread will perform very little flushing. When we reach the capacity of the circular redo log file ib_logfile0 and must initiate a checkpoint, some 'furious flushing' will be necessary. (If innodb_flush_sync=OFF, then flushing would continue at the innodb_io_capacity rate, and writers would be throttled.) We have the best chance of advancing the checkpoint LSN immediately after a page flush batch has been completed. Hence, it is best to perform checkpoints after every batch in the page cleaner thread, attempting to run once per second. By initiating high-priority flushing in the page cleaner as early as possible, we aim to make the throughput more stable. The function buf_flush_wait_flushed() used to sleep for 10ms, hoping that the page cleaner thread would do something during that time. The observed end result was that a large number of threads that call log_free_check() would end up sleeping while nothing useful is happening. We will revise the design so that in the default innodb_flush_sync=ON mode, buf_flush_wait_flushed() will wake up the page cleaner thread to perform the necessary flushing, and it will wait for a signal from the page cleaner thread. If innodb_io_capacity is set to a low value (causing the page cleaner to throttle its work), a write workload would initially perform well, until the capacity of the circular ib_logfile0 is reached and log_free_check() will trigger checkpoints. At that point, the extra waiting in buf_flush_wait_flushed() will start reducing throughput. 
The page cleaner thread will also initiate log checkpoints after each buf_flush_lists() call, because that is the best point of time for the checkpoint LSN to advance by the maximum amount. Even in 'furious flushing' mode we invoke buf_flush_lists() with innodb_io_capacity_max pages at a time, and at the start of each batch (in the log_flush() callback function that runs in a separate task) we will invoke os_aio_wait_until_no_pending_writes(). This tweak allows the checkpoint to advance in smaller steps and significantly reduces the maximum latency. On an Intel Optane 960 NVMe SSD on Linux, it reduced from 4.6 seconds to 74 milliseconds. On Microsoft Windows with a slower SSD, it reduced from more than 180 seconds to 0.6 seconds. We will make innodb_adaptive_flushing=OFF simply flush innodb_io_capacity per second whenever the dirty proportion of buffer pool pages exceeds innodb_max_dirty_pages_pct_lwm. For innodb_adaptive_flushing=ON we try to make page_cleaner_flush_pages_recommendation() more consistent and predictable: if we are below innodb_adaptive_flushing_lwm, let us flush pages according to the return value of af_get_pct_for_dirty(). innodb_max_dirty_pages_pct_lwm: Revert the change of the default value that was made in MDEV-23399. The value innodb_max_dirty_pages_pct_lwm=0 guarantees that a shutdown of an idle server will be fast. Users might be surprised if normal shutdown suddenly became slower when upgrading within a GA release series. innodb_checkpoint_usec: Remove. The master task will no longer perform periodic log checkpoints. It is the duty of the page cleaner thread. log_sys.max_modified_age: Remove. The current span of the buf_pool.flush_list expressed in LSN only matters for adaptive flushing (outside the 'furious flushing' condition). For the correctness of checkpoints, the only thing that matters is the checkpoint age (log_sys.lsn - log_sys.last_checkpoint_lsn). This run-time constant was also reported as log_max_modified_age_sync. 
log_sys.max_checkpoint_age_async: Remove. This does not serve any purpose, because the checkpoints will now be triggered by the page cleaner thread. We will retain the log_sys.max_checkpoint_age limit for engaging 'furious flushing'. page_cleaner.slot: Remove. It turns out that page_cleaner_slot.flush_list_time was duplicating page_cleaner.slot.flush_time and page_cleaner.slot.flush_list_pass was duplicating page_cleaner.flush_pass. Likewise, there were some redundant monitor counters, because the page cleaner thread no longer performs any buf_pool.LRU flushing, and because there only is one buf_flush_page_cleaner thread. buf_flush_sync_lsn: Protect writes by buf_pool.flush_list_mutex. buf_pool_t::get_oldest_modification(): Add a parameter to specify the return value when no persistent data pages are dirty. Require the caller to hold buf_pool.flush_list_mutex. log_buf_pool_get_oldest_modification(): Take the fall-back LSN as a parameter. All callers will also invoke log_sys.get_lsn(). log_preflush_pool_modified_pages(): Replaced with buf_flush_wait_flushed(). buf_flush_wait_flushed(): Implement two limits. If not enough buffer pool has been flushed, signal the page cleaner (unless innodb_flush_sync=OFF) and wait for the page cleaner to complete. If the page cleaner thread is not running (which can be the case during shutdown), initiate the flush and wait for it directly. buf_flush_ahead(): If innodb_flush_sync=ON (the default), submit a new buf_flush_sync_lsn target for the page cleaner but do not wait for the flushing to finish. log_get_capacity(), log_get_max_modified_age_async(): Remove, to make it easier to see that af_get_pct_for_lsn() is not acquiring any mutexes. page_cleaner_flush_pages_recommendation(): Protect all access to buf_pool.flush_list with buf_pool.flush_list_mutex. Previously there were some race conditions in the calculation. buf_flush_sync_for_checkpoint(): New function to process buf_flush_sync_lsn in the page cleaner thread. 
At the end of each batch, we try to wake up any blocked buf_flush_wait_flushed(). If everything up to buf_flush_sync_lsn has been flushed, we will reset buf_flush_sync_lsn=0. The page cleaner thread will keep 'furious flushing' until the limit is reached. Any threads that are waiting in buf_flush_wait_flushed() will be able to resume as soon as their own limit has been satisfied. buf_flush_page_cleaner: Prioritize buf_flush_sync_lsn and do not sleep as long as it is set. Do not update any page_cleaner statistics for this special mode of operation. In the normal mode (buf_flush_sync_lsn is not set for innodb_flush_sync=ON), try to wake up once per second. No longer check whether srv_inc_activity_count() has been called. After each batch, try to perform a log checkpoint, because the best chances for the checkpoint LSN to advance by the maximum amount are upon completing a flushing batch. log_t: Move buf_free, max_buf_free possibly to the same cache line with log_sys.mutex. log_margin_checkpoint_age(): Simplify the logic, and replace a 0.1-second sleep with a call to buf_flush_wait_flushed() to initiate flushing. Moved to the same compilation unit with the only caller. log_close(): Clean up the calculations. (Should be no functional change.) Return whether flush-ahead is needed. Moved to the same compilation unit with the only caller. mtr_t::finish_write(): Return whether flush-ahead is needed. mtr_t::commit(): Invoke buf_flush_ahead() when needed. Let us avoid external calls in mtr_t::commit() and make the logic easier to follow by having related code in a single compilation unit. Also, we will invoke srv_stats.log_write_requests.inc() only once per mini-transaction commit, while not holding mutexes. log_checkpoint_margin(): Only care about log_sys.max_checkpoint_age. Upon reaching log_sys.max_checkpoint_age where we must wait to prevent the log from getting corrupted, let us wait for at most 1MiB of LSN at a time, before rechecking the condition. 
This should allow writers to proceed even if the redo log capacity has been reached and 'furious flushing' is in progress. We no longer care about log_sys.max_modified_age_sync or log_sys.max_modified_age_async. The log_sys.max_modified_age_sync could be a relic from the time when there was a srv_master_thread that wrote dirty pages to data files. Also, we no longer have any log_sys.max_checkpoint_age_async limit, because log checkpoints will now be triggered by the page cleaner thread upon completing buf_flush_lists(). log_set_capacity(): Simplify the calculations of the limit (no functional change). log_checkpoint_low(): Split from log_checkpoint(). Moved to the same compilation unit with the caller. log_make_checkpoint(): Only wait for everything to be flushed until the current LSN. create_log_file(): After checkpoint, invoke log_write_up_to() to ensure that the FILE_CHECKPOINT record has been written. This avoids ut_ad(!srv_log_file_created) in create_log_file_rename(). srv_start(): Do not call recv_recovery_from_checkpoint_start() if the log has just been created. Set fil_system.space_id_reuse_warned before dict_boot() has been executed, and clear it after recovery has finished. dict_boot(): Initialize fil_system.max_assigned_id. srv_check_activity(): Remove. The activity count is counting transaction commits and therefore mostly interesting for the purge of history. BtrBulk::insert(): Do not explicitly wake up the page cleaner, but do invoke srv_inc_activity_count(), because that counter is still being used in buf_load_throttle_if_needed() for some heuristics. (It might be cleaner to execute buf_load() in the page cleaner thread!) Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
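The log_checkpoint_margin() change in the commit above (waiting for at most 1MiB of LSN at a time before rechecking) can be pictured with a minimal sketch. The function name, the constant, and the exact arithmetic below are illustrative assumptions, not the InnoDB code:

```cpp
#include <algorithm>
#include <cstdint>

using lsn_t = uint64_t;

// Assumed step size: wait for at most 1 MiB of LSN per iteration.
constexpr lsn_t WAIT_STEP = lsn_t{1} << 20;

// Hypothetical helper (not an InnoDB function): returns the LSN that a
// writer would wait to be flushed before rechecking, or 0 when the
// checkpoint age is still within max_checkpoint_age.
lsn_t checkpoint_wait_target(lsn_t checkpoint_lsn, lsn_t current_lsn,
                             lsn_t max_checkpoint_age)
{
  if (current_lsn - checkpoint_lsn <= max_checkpoint_age)
    return 0;  // no wait needed
  // Oldest LSN that must be flushed to get back within the limit.
  const lsn_t needed = current_lsn - max_checkpoint_age;
  // Never wait for more than one step at a time.
  return std::min(needed, checkpoint_lsn + WAIT_STEP);
}
```

A caller would loop on this target, waiting and rechecking, so that writers whose own limit is already satisfied can resume early.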
Marko Mäkelä	5999d5120e	MDEV-23399 fixup: Avoid crash on Mariabackup shutdown innodb_preshutdown(): Terminate the encryption threads before the page cleaner thread can be shut down. innodb_shutdown(): Always wait for the encryption threads and page cleaner to shut down. srv_shutdown_all_bg_threads(): Wait for the encryption threads and the page cleaner to shut down. (After an aborted startup, innodb_shutdown() would not be called.) row_get_background_drop_list_len_low(): Remove. os_thread_count: Remove. Alternatively, at the end of srv_shutdown_all_bg_threads() we could try to wait longer for the count to reach 0. On some platforms, an assertion os_thread_count==0 could fail even after a small delay, even though in the core dump all threads would have exited. srv_shutdown_threads(): Renamed from srv_shutdown_all_bg_threads(). Do not wait for the page cleaner to shut down, because the later innodb_shutdown(), which may invoke logs_empty_and_mark_files_at_shutdown(), assumes that it exists.	2020-10-26 15:05:41 +02:00
Vlad Lesin	985ede9203	MDEV-20755 InnoDB: Database page corruption on disk or a failed file read of tablespace upon prepare of mariabackup incremental backup The problem: When incremental backup is taken, delta files are created for innodb tables which are marked as new tables during innodb ddl tracking. When such tablespace is tried to be opened during prepare in xb_delta_open_matching_space(), it is "created", i.e. xb_space_create_file() is invoked, instead of opening, even if a tablespace with the same name exists in the base backup directory. xb_space_create_file() writes the page 0 header of the tablespace. This header does not contain crypt data, as mariabackup does not have any information about crypt data in delta file metadata for tablespaces. After delta file is applied, recovery process is started. As the sequence of recovery for different pages is not defined, there can be the situation when crypt data redo log event is executed after some other page is read for recovery. When some page is read for recovery, it's decrypted using crypt data stored in tablespace header in page 0, if there is no crypt data, the page is not decrypted and does not pass corruption test. This causes error for incremental backup --prepare for encrypted tablespaces. The error is not stable because crypt data redo log event updates crypt data on page 0, and recovery for different pages can be executed in undefined order. The fix: When delta file is created, the corresponding write filter copies only the pages whose LSN is greater than some incremental LSN. When new file is created during incremental backup, the LSN of all its pages must be greater than incremental LSN, so there is no need to create delta for such table, we can just copy it completely. The fix is to copy the whole file which was tracked during incremental backup with innodb ddl tracker, and copy it to base directory during --prepare instead of delta applying. 
There is also DBUG_EXECUTE_IF() in innodb code to avoid writing redo log record for crypt data updating on page 0 to make the test case stable. Note: The issue is not reproducible in 10.5 as optimized DDL's are deprecated in 10.5. But the fix is still useful because it allows to decrease data copy size during backup, as delta file contains some extra info. The test case should be removed for 10.5 as it will always pass.	2020-10-23 11:02:25 +03:00
Marko Mäkelä	1657b7a583	Merge 10.4 to 10.5	2020-10-22 17:08:49 +03:00
Marko Mäkelä	46957a6a77	Merge 10.3 into 10.4	2020-10-22 13:27:18 +03:00
Marko Mäkelä	e3d692aa09	Merge 10.2 into 10.3	2020-10-22 08:26:28 +03:00
Julius Goryavsky	888010d9dd	MDEV-21951: mariabackup SST fail if data-directory have lost+found directory To fix this, it is necessary to add an option to exclude the database with the name "lost+found" from processing (the database name will be checked by the check_if_skip_database_by_path() or by the check_if_skip_database() function, and as a result "lost+found" will be skipped). In addition, it is necessary to slightly modify the verification logic in the check_if_skip_database() function. Also added a new test galera_sst_mariabackup_lost_found.test	2020-10-20 12:41:06 +02:00
Marko Mäkelä	1066312a12	MDEV-23982: Mariabackup hangs on backup MDEV-13318 introduced a condition to Mariabackup that can cause it to hang if the server goes idle after writing a log block that has no payload after the 12-byte header. Normal recovery in log0recv.cc would allow blocks with exactly 12 bytes of length, and only reject blocks where the length field is shorter than that.	2020-10-19 20:36:05 +03:00
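The length rule that MDEV-23982 describes (a block whose length field is exactly the 12-byte header is valid, only shorter values are rejected) can be sketched as follows. The function name is hypothetical, and the 512-byte block size and the upper bound are assumptions added for illustration:

```cpp
#include <cstdint>

// 12-byte log block header, as stated in the commit message.
constexpr uint32_t HDR_SIZE = 12;
// Assumed log block size; used here only as an upper bound.
constexpr uint32_t BLOCK_SIZE = 512;

// Hypothetical check: a block with no payload (length == HDR_SIZE)
// is accepted; only a length shorter than the header is rejected.
bool log_block_length_ok(uint32_t data_len)
{
  return data_len >= HDR_SIZE && data_len <= BLOCK_SIZE;
}
```

The hang described above would correspond to using `>` instead of `>=` in such a comparison, so that an idle server's empty block is never accepted.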
Marko Mäkelä	a0113683d7	Fixup `9028cc6b86` We forgot to change innodb_autoextend_increment from ULONG to UINT (always 32-bit) in Mariabackup.	2020-10-16 07:51:37 +03:00
Marko Mäkelä	9028cc6b86	Cleanup: Make InnoDB page numbers uint32_t InnoDB stores a 32-bit page number in page headers and in some data structures, such as FIL_ADDR (consisting of a 32-bit page number and a 16-bit byte offset within a page). For better compile-time error detection and to reduce the memory footprint in some data structures, let us use a uint32_t for the page number, instead of ulint (size_t) which can be 64 bits.	2020-10-15 17:06:17 +03:00
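To make the sizes in the commit above concrete, here is a minimal sketch of a FIL_ADDR-like pair (a 32-bit page number plus a 16-bit byte offset) packed into 6 bytes. The type and function names are hypothetical, and the big-endian byte order is an assumption of this sketch:

```cpp
#include <array>
#include <cstdint>

// Illustrative stand-in for a file address: fixed-width fields make
// accidental 64-bit values a compile-time concern rather than a silent
// truncation.
struct fil_addr_sketch {
  uint32_t page;     // 32-bit page number
  uint16_t boffset;  // 16-bit byte offset within the page
};

static_assert(sizeof(fil_addr_sketch::page) == 4, "page number is 32 bits");

// Pack into 6 bytes, most significant byte first (assumed order).
std::array<uint8_t, 6> encode_fil_addr(const fil_addr_sketch &a)
{
  return {{uint8_t(a.page >> 24), uint8_t(a.page >> 16),
           uint8_t(a.page >> 8), uint8_t(a.page),
           uint8_t(a.boffset >> 8), uint8_t(a.boffset)}};
}

// Inverse of encode_fil_addr().
fil_addr_sketch decode_fil_addr(const std::array<uint8_t, 6> &b)
{
  return {uint32_t(b[0]) << 24 | uint32_t(b[1]) << 16 |
              uint32_t(b[2]) << 8 | uint32_t(b[3]),
          uint16_t(b[4] << 8 | b[5])};
}
```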
Marko Mäkelä	7cffb5f6e8	MDEV-23399: Performance regression with write workloads The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will not be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. 
The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit `d1ab89037a` unless log_warnings>2. 
Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. 
Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. 
logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. 
Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. 
buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub	2020-10-15 17:04:56 +03:00
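The skip rule that the MDEV-23399 commit above gives for buf_flush_page() (skip persistent pages whose FIL_PAGE_LSN is newer than the durably written redo log, and skip pages whose latch cannot be acquired without waiting) can be summarized in a small sketch. The struct and function below are hypothetical and only model the decision, not the actual buffer pool code:

```cpp
#include <cstdint>

using lsn_t = uint64_t;

// Illustrative snapshot of the facts the decision depends on.
struct page_state_sketch {
  lsn_t page_lsn;    // latest modification of the page (FIL_PAGE_LSN)
  bool persistent;   // false for temporary tablespace pages (assumed)
  bool latch_free;   // a shared latch could be taken without waiting
};

// Hypothetical predicate: true if a flush batch may write the page now;
// false means the page is skipped and left for a later batch.
bool can_flush_now(const page_state_sketch &p, lsn_t flushed_lsn)
{
  if (p.persistent && p.page_lsn > flushed_lsn)
    return false;  // the log for this page has not been written yet
  return p.latch_free;  // never wait for a page latch
}
```

Because the batch never blocks on a log write or a latch, a skipped page is simply expected to be picked up by the next batch, as the commit message notes.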
Marko Mäkelä	a9550c47e4	MDEV-16264 fixup: Remove unused code and data LATCH_ID_OS_AIO_READ_MUTEX, LATCH_ID_OS_AIO_WRITE_MUTEX, LATCH_ID_OS_AIO_LOG_MUTEX, LATCH_ID_OS_AIO_IBUF_MUTEX, LATCH_ID_OS_AIO_SYNC_MUTEX: Remove. The tpool is not instrumented. lock_set_timeout_event(): Remove. srv_sys_mutex_key, srv_sys_t::mutex, SYNC_THREADS: Remove. srv_slot_t::suspended: Remove. We only ever assigned this data member true, so it is redundant. ib_wqueue_wait(), ib_wqueue_timedwait(): Remove. os_thread_join(): Remove. os_thread_create(), os_thread_exit(): Remove redundant parameters. These were missed in commit `5e62b6a5e0`.	2020-09-30 14:28:11 +03:00
Marko Mäkelä	6ce0a6f9ad	Merge 10.5 into 10.6	2020-09-24 10:21:26 +03:00
Marko Mäkelä	882ce206db	Merge 10.4 into 10.5	2020-09-23 11:32:43 +03:00
Marko Mäkelä	952a028a52	Merge 10.3 into 10.4 We omit commit `a3bdce8f1e` and commit `a0e2a293bc` because they would make the test galera_3nodes.galera_gtid_2_cluster fail and disable it.	2020-09-21 17:42:02 +03:00
Marko Mäkelä	2cf489d430	Merge 10.2 into 10.3	2020-09-21 16:39:23 +03:00
Marko Mäkelä	407d170c92	MDEV-23711 fixup: GCC -Og -Wmaybe-uninitialized	2020-09-21 16:29:29 +03:00
Vlad Lesin	0a224edc3e	MDEV-23711 make mariabackup innodb redo log read error message more clear log_group_read_log_seg() returns error when: 1) Calculated log block number does not correspond to read log block number. This can be caused by: a) Garbage or an incompletely written log block. We can exclude this case by checking log block checksum if it's enabled (see innodb-log-checksums, encrypted log block contains checksum always). b) The log block is overwritten. In this case checksum will be correct and read log block number will be greater than requested one. 2) When log block length is wrong. In this case recv_sys->found_corrupt_log is set. 3) When redo log block checksum is wrong. In this case innodb code writes messages to error log with the following prefix: "Invalid log block checksum." The fix processes all the cases above.	2020-09-21 12:29:52 +03:00
Marko Mäkelä	3a423088ac	Merge 10.3 into 10.4	2020-09-21 12:29:00 +03:00
Marko Mäkelä	cbcb4ecabb	Merge 10.2 into 10.3	2020-09-21 11:04:04 +03:00
Vladislav Vaintroub	ccbe6bb6fc	MDEV-19935 Create unified CRC-32 interface Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction after Intel whitepaper, and is ported from rocksdb code. Optimized ARM and POWER CRC32 were already present in mysys.	2020-09-17 16:07:37 +02:00
Vlad Lesin	80075ba011	MDEV-19264 Better support MariaDB GTID for Mariabackup's --slave-info option Parse SHOW SLAVE STATUS output for the "Using_Gtid" column. If the value is "No", then old log file and position is backed up, otherwise gtid_slave_pos is backed up.	2020-09-14 11:14:50 +03:00
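The rule in MDEV-19264 above comes down to a single decision on the "Using_Gtid" column of SHOW SLAVE STATUS. A minimal sketch, with a hypothetical enum and function name:

```cpp
#include <string>

// What --slave-info would record, per the commit message above.
enum class slave_info_kind {
  FILE_AND_POSITION,  // old-style log file name and position
  GTID_SLAVE_POS      // value of gtid_slave_pos
};

// Hypothetical helper: "No" selects file and position; any other
// value of the Using_Gtid column selects gtid_slave_pos.
slave_info_kind choose_slave_info(const std::string &using_gtid)
{
  return using_gtid == "No" ? slave_info_kind::FILE_AND_POSITION
                            : slave_info_kind::GTID_SLAVE_POS;
}
```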
Marko Mäkelä	938db04898	Cleanup: Remove os0proc.*	2020-09-03 16:40:42 +03:00
Oleksandr Byelkin	5edf3e0388	Merge branch '10.5' into 10.6	2020-09-02 14:36:14 +02:00
Marko Mäkelä	2fa9f8c53a	Merge 10.3 into 10.4	2020-08-20 11:01:47 +03:00
Marko Mäkelä	de0e7cd72a	Merge 10.2 into 10.3	2020-08-20 09:12:16 +03:00
Marko Mäkelä	309302a3da	MDEV-23475 InnoDB performance regression for write-heavy workloads In commit `fe39d02f51` (MDEV-20638) we removed some wake-up signaling of the master thread that should have been there, to ensure a steady log checkpointing workload. Common sense suggests that the commit omitted some necessary calls to srv_inc_activity_count(). But, an attempt to add the call to trx_flush_log_if_needed_low() as well as to reinstate the function innobase_active_small() did not restore the performance for the case where sync_binlog=1 is set. Therefore, we will revert the entire commit in MariaDB Server 10.2. In MariaDB Server 10.5, adding a srv_inc_activity_count() call to trx_flush_log_if_needed_low() did restore the performance, so we will not revert MDEV-20638 across all versions.	2020-08-19 11:18:56 +03:00
Marko Mäkelä	4c50120d14	MDEV-23474 InnoDB fails to restart after SET GLOBAL innodb_log_checksums=OFF Regretfully, the parameter innodb_log_checksums was introduced in MySQL 5.7.9 (the first GA release of that series) by mysql/mysql-server@af0acedd88 which partly replaced a parameter that had been introduced in 5.7.8 mysql/mysql-server@22ba38218e as innodb_log_checksum_algorithm. Given that the CRC-32C operations are accelerated on many processor implementations (AMD64 with SSE4.2; since MDEV-22669 also on IA-32 with SSE4.2, POWER 8 and later, ARMv8 with some extensions) and by lookup tables when only generic SISD instructions are available, there should be no valid reason to disable checksums. In MariaDB 10.5.2, as a preparation for MDEV-12353, MDEV-19543 deprecated and ignored the parameter innodb_log_checksums altogether. This should imply that after a clean shutdown with innodb_log_checksums=OFF one cannot upgrade to MariaDB Server 10.5 at all. Due to these problems, let us deprecate the parameter innodb_log_checksums and honor it only during server startup. The command SET GLOBAL innodb_log_checksums will always set the parameter to ON.	2020-08-18 16:46:07 +03:00
Marko Mäkelä	cf87f3e08c	Merge 10.4 into 10.5	2020-08-14 11:33:35 +03:00
Marko Mäkelä	2f7b37b021	Merge 10.3 into 10.4, except MDEV-22543 Also, fix GCC -Og -Wmaybe-uninitialized in run_backup_stage()	2020-08-13 18:48:41 +03:00
Marko Mäkelä	4bd56a697f	Merge 10.2 into 10.3	2020-08-13 18:18:25 +03:00
Marko Mäkelä	31aef3ae99	Fix GCC 10.2.0 -Og -Wmaybe-uninitialized For some reason, GCC emits more -Wmaybe-uninitialized warnings when using the flag -Og than when using -O2. Many of the warnings look genuine.	2020-08-11 15:58:16 +03:00
Marko Mäkelä	e685809a3b	MDEV-23397 Remove deprecated InnoDB options in 10.6	2020-08-04 12:51:59 +03:00
Marko Mäkelä	9a7948e3f6	Merge 10.5 into 10.6	2020-08-04 07:55:16 +03:00
Marko Mäkelä	bbd70fcc43	MDEV-23379 Deprecate&ignore InnoDB concurrency throttling parameters The parameters innodb_thread_concurrency and innodb_commit_concurrency were useful years ago when both computing resources and the implementation of some shared data structures were limited. MySQL 5.0 or 5.1 had trouble scaling beyond 8 concurrent connections. Most of the scalability bottlenecks have been removed since then, and the transactions per second delivered by MariaDB Server 10.5 should not dramatically drop upon exceeding the 'optimal' number of connections. Hence, enabling any concurrency throttling for InnoDB actually makes things worse. We have seen many customers mistakenly setting this to a small value like 16 or 64 and then complaining the server was slow. Ignoring the parameters allows us to remove some normally unused code and data structures, which could slightly improve performance. innodb_thread_concurrency, innodb_commit_concurrency, innodb_replication_delay, innodb_concurrency_tickets, innodb_thread_sleep_delay, innodb_adaptive_max_sleep_delay: Deprecate and ignore; hard-wire to 0. The column INFORMATION_SCHEMA.INNODB_TRX.trx_concurrency_tickets will always report 0.	2020-08-04 06:59:29 +03:00
Marko Mäkelä	50a11f396a	Merge 10.4 into 10.5	2020-08-01 14:42:51 +03:00
Marko Mäkelä	9216114ce7	Merge 10.3 into 10.4	2020-07-31 18:09:08 +03:00
Marko Mäkelä	66ec3a770f	Merge 10.2 into 10.3	2020-07-31 13:51:28 +03:00
Thirunarayanan Balathandayuthapani	fe39d02f51	MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low() - Due to commit `fe95cb2e40` (MDEV-16125), InnoDB master thread does not need to call srv_resume_thread() and therefore there is no need to wake up the thread. Due to the above patch, InnoDB should remove the following dead code. srv_check_activity(): Makes the parameter as in,out and returns the recent activity value innobase_active_small(): Removed srv_active_wake_master_thread(): Removed srv_wake_master_thread(): Removed srv_active_wake_master_thread_low(): Removed Simplify srv_master_thread() and remove switch cases, added the assert. Replace srv_wake_master_thread() with srv_inc_activity_count() INNOBASE_WAKE_INTERVAL: Removed	2020-07-23 16:23:20 +05:30
Vladislav Vaintroub	9701759b3d	MDEV-23043 Refactor Windows service handling Removed the existing nt_service classes - they provide little abstraction, and only obscure a relatively simple service handling. They are replaced by similar code inspired by MS docs samples. Service handling is now moved into winmain.cc, which contains the main() function for Windows. winmain provides reporting callbacks, which should be used by external code, to report transitions from starting to running to shutting down to stopped. Removed a do-nothing ServiceMain thread, and the non-working service "pause/continue". Removed a lot of #ifdef __WIN__ code from mysqld.cc	2020-07-04 18:24:40 +02:00
Vladislav Vaintroub	272828a171	Merge branch '10.5' into 10.6	2020-07-04 11:53:26 +02:00
Marko Mäkelä	1813d92d0c	Merge 10.4 into 10.5	2020-07-02 09:41:44 +03:00
Oleksandr Byelkin	b0f836053b	MDEV-22983: Mariabackup's --help option disappeared return --help option	2020-07-01 23:30:44 +03:00
Oleksandr Byelkin	5018b998a7	return --help option	2020-06-23 17:07:03 +02:00
Marko Mäkelä	cfd3d70ccb	MDEV-22871: Remove pointer indirection for InnoDB hash_table_t hash_get_n_cells(): Remove. Access n_cells directly. hash_get_nth_cell(): Remove. Access array directly. hash_table_clear(): Replaced with hash_table_t::clear(). hash_table_create(), hash_table_free(): Remove. hash0hash.cc: Remove.	2020-06-18 14:16:01 +03:00
Marko Mäkelä	c515b1d092	Merge 10.4 into 10.5	2020-06-18 13:58:54 +03:00
Vlad Lesin	205b0ce6ad	MDEV-22894: Mariabackup should not read [mariadb-client] option group from configuration files	2020-06-18 12:19:48 +03:00
Vlad Lesin	9bdf35e90f	MDEV-18215: mariabackup does not report unknown command line options MDEV-21298: mariabackup doesn't read from the [mariadbd] and [mariadbd-X.Y] server option groups from configuration files MDEV-21301: mariabackup doesn't read [mariadb-backup] option group in configuration file All three issues require changing the same code, that is why their fixes are joined in one commit. The fix is in invoking load_defaults_or_exit() and handle_options() for backup-specific groups separately from client-server groups to let the last handle_options() call fail on unknown backup-specific options. The order of options processing is the following: 1) Load server groups and process server options, ignore unknown options 2) Load client groups and process client options, ignore unknown options 3) Load backup groups and process client-server options, exit on unknown option 4) Process --mysqld-args command line options, ignore unknown options New global flag my_handle_options_init_variables was added to have ability to invoke handle_options() for the same allowed options set several times without re-initialising previously set option values. --password value destroying is moved from option processing callback to mariabackup's handle_options() function to have ability to invoke server's handle_options() several times for the same possible allowed options set. Galera invokes wsrep_sst_mariabackup.sh with mysqld command line options to configure mariabackup as close to the server as possible. It is not known what server options are supported by mariabackup when the script is invoked. That is why new mariabackup option "--mysqld-args" is added, all unknown options that follow this option will be silently ignored. 
wsrep_sst_mariabackup.sh was also changed to: - use "--mysqld-args" mariabackup option to pass mysqld options, - remove deprecated innobackupex mode, - remove unsupported mariabackup options: --encrypt --encrypt-key --rebuild-indexes --rebuild-threads	2020-06-14 13:23:07 +03:00
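The four-step option processing order listed in the commit above, where only the backup-specific step exits on an unknown option, can be modelled with a short sketch. The struct, the function, and the option names used in the usage note are all hypothetical; this does not call or imitate the real handle_options() interface:

```cpp
#include <set>
#include <string>
#include <vector>

// One processing step: a label, the options seen, and whether an
// unknown option is fatal at this step.
struct option_stage {
  std::string name;
  std::vector<std::string> args;
  bool exit_on_unknown;
};

// Hypothetical model: returns the first unknown option that would
// abort processing, or an empty string if every stage passes.
std::string first_fatal_option(const std::vector<option_stage> &stages,
                               const std::set<std::string> &known)
{
  for (const option_stage &stage : stages)
    for (const std::string &arg : stage.args)
      if (stage.exit_on_unknown && known.count(arg) == 0)
        return arg;  // only strict stages can fail
  return std::string();
}
```

With stages for the server groups, client groups, backup groups, and --mysqld-args in that order, and only the backup stage marked strict, a misspelled backup option is reported while unknown server options are silently ignored.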
mysqlonarm	dec3f8ca69	MDEV-22641: Provide SIMD optimized wrapper for zlib crc32() (#1558 ) Existing implementation used my_checksum (from mysys) for calculating table checksum and binlog checksum. This implementation was optimized for powerpc only and lacked SIMD implementation for x86 (using clmul) and ARM (using ACLE), instead using zlib-crc32. mariabackup had its own copy of the crc32 implementation using hardware optimized implementation only for x86 and lacked hardware based implementation for powerpc and ARM. Patch helps unify all such calls and aggregate all of them using a unified interface my_checksum(). Said unification also enables hardware optimized calls for all architectures, viz. x86, ARM, POWERPC. Default always falls back to zlib crc32. Thanks to Daniel Black for reviewing, fixing and testing PowerPC changes. Thanks to Marko and Daniel for early code feedback.	2020-06-01 11:34:06 +03:00
Marko Mäkelä	5ece2155cb	Merge 10.4 into 10.5	2020-05-20 17:46:05 +03:00
Marko Mäkelä	2bf93a8fd6	Merge 10.3 into 10.4	2020-05-19 21:18:15 +03:00
Marko Mäkelä	79ed33c184	Merge 10.2 into 10.3	2020-05-19 17:05:05 +03:00
Vlad Lesin	0f9bfcc323	MDEV-22554: "mariabackup --prepare" exits with code 0 even though innodb error is logged The fix is to set flag in ib::error::~error() and check it in mariabackup. ib::error::error() is replaced with ib::warn::warn() in AIO::linux_create_io_ctx() because of two reasons: 1) if we leave it as is, then mariabackup MTR tests will fail with --mem option, because Linux AIO can not be used on tmpfs, 2) when Linux AIO can not be initialized, InnoDB falls back to simulated AIO, so such situation is not fatal error, it should be treated as warning.	2020-05-19 11:25:56 +03:00
Vladislav Vaintroub	e2bc029211	MDEV-7021 Pass directory security descriptor from mysql_install_db.exe to bootstrap This ensures that directory permissions are correct in all cases, even if bootstrap is passed non-standard locations for innodb. Directory permissions are copied from the datadir.	2020-05-18 18:11:40 +02:00
Vladislav Vaintroub	8cb3060c5c	MDEV-22272 Windows installer - run service under virtual service account Change mysql_install_db.exe to run service under virtual account. Set directory permissions so that service has full access to data files. mariabackup --copy-back permission handling (MDEV-17008) needs to be changed as well. Now, whenever a directory is created in course of copy-back, its permissions are copied from the datadir. This handling assumes that datadir already has the correct permissions for the Windows service.	2020-05-18 18:07:26 +02:00
Marko Mäkelä	18a62eb76d	MDEV-21133 follow-up: Use fil_page_get_type() Let us use the common accessor function fil_page_get_type() instead of accessing the page header field FIL_PAGE_TYPE directly.	2020-05-07 17:15:34 +03:00
Eugene Kosov	89ff4176c1	MDEV-22437 make THR_THD* variable thread_local Now all access goes through _current_thd() and set_current_thd() functions. Some functions like THD::store_globals() can not fail now.	2020-05-05 18:13:31 +03:00
Marko Mäkelä	496d0372ef	Merge 10.4 into 10.5	2020-04-29 15:40:51 +03:00
Marko Mäkelä	0632b8034b	Merge 10.3 into 10.4	2020-04-29 09:05:15 +03:00
Marko Mäkelä	1fbdcada73	Merge 10.2 into 10.3	2020-04-28 22:29:13 +03:00
Vlad Lesin	d0150dc14e	MDEV-20230: mariabackup --ftwrl-wait-timeout never times out on explicit lock --ftwrl-wait-timeout does not terminate mariabackup when the backup lock cannot be acquired within a certain amount of time; it only waits for long-running queries to finish before acquiring the lock, to avoid unnecessary locking. This commit extends --ftwrl-wait-timeout so that mariabackup terminates if it waits for the backup lock longer than a certain amount of time.	2020-04-27 22:10:50 +03:00
Marko Mäkelä	fbe2712705	Merge 10.4 into 10.5 The functional changes of commit `5836191c8f` (MDEV-21168) are omitted due to MDEV-742 having addressed the issue.	2020-04-25 21:57:52 +03:00
Marko Mäkelä	88cf6f1c7f	Merge 10.3 into 10.4	2020-04-22 18:18:51 +03:00
Marko Mäkelä	455cf6196c	Merge 10.2 into 10.3	2020-04-22 14:45:55 +03:00
Vlad Lesin	0efe1971c6	MDEV-19347: Mariabackup does not honor ignore_db_dirs from server config. The solution is to read the system variable value on startup and to fill databases_exclude_hash. xb_load_list_string() became non-static and was reformatted. The system variable value is read and processed in get_mysql_vars(), which was also reformatted.	2020-04-21 10:34:37 +03:00
Marko Mäkelä	af91266498	Merge 10.3 into 10.4 In main.index_merge_myisam we remove the test that was added in commit `a2d24def8c` because it duplicates the test case that was added in commit `5af12e4635`.	2020-04-16 12:12:26 +03:00
Marko Mäkelä	84db10f27b	Merge 10.2 into 10.3	2020-04-15 09:56:03 +03:00
Thirunarayanan Balathandayuthapani	6bbc0eedc6	MDEV-22193 Avoid un-necessary page initialization during recovery - InnoDB is doing un-necessary redo log page initialisation during recovery and unnecessary traversal of redo log during last phase. This patch does the optimization of removing unnecessary redo log page initialisation and detects the memory exhaust earlier.	2020-04-09 21:25:31 +05:30
Vlad Lesin	5836191c8f	MDEV-21168: Active XA transactions stop slave from working after backup was restored. Optionally rollback prepared XA's on "mariabackup --prepare". The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for slaves.	2020-04-07 15:05:38 +03:00
Daniel Black	e8351934b6	Merge pull request #1221 from grooverdan/10.4-MDEV-18851-multiple-sized-large-page-support MDEV-18851: multiple sized large page support (linux)	2020-04-02 23:54:08 +04:00
Marko Mäkelä	aae3f921ad	Cleanup recv_sys: Move things to members recv_sys.recovery_on: Replaces recv_recovery_on. recv_sys_t::apply(): Replaces recv_apply_hashed_log_recs(). recv_sys_var_init(): Remove. recv_sys_t::recover_low(): Attempt to initialize a page based on buffered redo log records.	2020-03-30 18:45:09 +03:00
Vladislav Vaintroub	1b58ef7d3f	Build cleanups. Fix clang-cl built	2020-03-25 15:53:38 +01:00
Rasmus Johansson	9e1b3af4a4	MDEV-21303 Make executables MariaDB named To change all executables to have a mariadb name I had to: - Do name changes in every CMakeLists.txt that produces executables - CREATE_MARIADB_SYMLINK was removed and GET_SYMLINK added by Wlad to reuse the function in other places also - The scripts/CMakeLists.txt could make use of GET_SYMLINK instead of introducing redundant code, but I thought I'll leave that for next release - A lot of changes to debian/.install and debian/.links files due to swapping of real executable and symlink. I did not however change the name of the manpages, so the real name is still mysql there and mariadb are symlinks. - The Windows part needed a change now when we made the executables mariadb -named. MSI (and ZIP) do not support symlinks and to not break backward compatibility we had to include mysql named binaries also. Done by Wlad	2020-03-21 20:20:29 +01:00
Sergei Golubchik	91d1588d30	Merge branch 'github/10.5' into 10.5	2020-03-14 09:52:35 +01:00
Marko Mäkelä	f224525204	MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC The -Wconversion in GCC seems to be stricter than in clang. GCC at least since version 4.4.7 issues truncation warnings for assignments to bitfields, while clang 10 appears to only issue warnings when the sizes in bytes rounded to the nearest integer powers of 2 are different. Before GCC 10.0.0, -Wconversion required more casts and would not allow some operations, such as x<<=1 or x+=1 on a data type that is narrower than int. GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining about x\|=y even when x and y are compatible types that are narrower than int. Hence, we must rewrite some x\|=y as x=static_cast<byte>(x\|y) or similar, or we must disable -Wconversion. In GCC 6 and later, the warning for assigning wider to bitfields that are narrower than 8, 16, or 32 bits can be suppressed by applying a bitwise & with the exact bitmask of the bitfield. For older GCC, we must disable -Wconversion for GCC 4 or 5 in such cases. The bitwise negation operator appears to promote short integers to a wider type, and hence we must add explicit truncation casts around them. Microsoft Visual C does not allow a static_cast to truncate a constant, such as static_cast<byte>(1) truncating int. Hence, we will use the constructor-style cast byte(~1) for such cases. This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0, clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019) on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).	2020-03-12 19:46:41 +02:00
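The rewrites named in the commit above (replacing `x|=y` with an explicit cast, masking before a bitfield assignment, and the constructor-style cast `byte(~1)`) can be illustrated with the short sketch below. The `byte` alias, the helper names and the 10-bit field are assumptions made for the example and are not taken from the InnoDB sources.

```cpp
#include <cassert>

typedef unsigned char byte;  // alias assumed here for illustration

// x |= y promotes both operands to int before the result is truncated on
// assignment. Writing the truncation as an explicit cast is the rewrite
// the commit message describes for compilers that warn about x |= y.
inline byte set_bits(byte x, byte y)
{
  x = static_cast<byte>(x | y);
  return x;
}

// ~1 has type int (value -2). The constructor-style cast byte(~1)
// truncates it to 0xFE, the form the commit message says is used where
// MSVC rejects static_cast on a constant.
inline byte clear_lowest_bit(byte x)
{
  return static_cast<byte>(x & byte(~1));
}

// Hypothetical 10-bit bitfield. Masking with the exact bitmask of the
// field (0x3FF) makes the intended truncation explicit.
struct flags_t
{
  unsigned n_fields : 10;
};

inline unsigned store_n_fields(unsigned long n)
{
  flags_t f = {0};
  f.n_fields = n & 0x3FF;
  return f.n_fields;
}
```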
Oleksandr Byelkin	fad47df995	Merge branch '10.4' into 10.5	2020-03-11 17:52:49 +01:00
Oleksandr Byelkin	b7362d5fbc	Merge branch '10.3' into 10.4	2020-03-11 14:28:24 +01:00
Oleksandr Byelkin	3c9bc0ce19	Merge branch '10.2' into 10.3	2020-03-11 14:05:41 +01:00
Marko Mäkelä	574d8b2940	MDEV-21907: Fix most clang -Wconversion in InnoDB Declare innodb_purge_threads as 4-byte integer (UINT) instead of 4-or-8-byte (ULONG) and adjust the documentation string.	2020-03-11 08:29:48 +02:00
Sergei Golubchik	c1c5222cae	cleanup: PSI key is always the first argument	2020-03-10 19:24:23 +01:00
Sergei Golubchik	7c58e97bf6	perfschema memory related instrumentation changes	2020-03-10 19:24:22 +01:00
Eugene Kosov	2b8b85bd0a	fix use-after-free	2020-03-10 15:14:53 +03:00
Marko Mäkelä	4383897a01	MDEV-14425 preparation: Remove log_header_read() The function log_header_read() was only used during server startup, and it will mostly be used only for reading checkpoint information from pre-MDEV-14425 format redo log files. Let us replace the function with more direct calls, so that it is clearer what is going on. It is not strictly necessary to hold any mutex during this operation, and because there will be only a limited number of operations during early server startup, it is not necessary to increment any I/O counters.	2020-03-04 10:08:33 +02:00
Marko Mäkelä	8511f04fdb	Cleanup: Remove srv_start_lsn Most of the time, we can refer to recv_sys.recovered_lsn.	2020-03-02 15:01:46 +02:00
Eugene Kosov	9ef2d29ff4	MDEV-14425 deprecate and ignore innodb_log_files_in_group Now there can be only one log file instead of several which logically work as a single file. Possible names of redo log files: ib_logfile0, ib_logfile101 (for just created one) innodb_log_files_in_group: value of this variable is not used by InnoDB. Possible values are still 1..100, to not break upgrade LOG_FILE_NAME: add constant of value "ib_logfile0" LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile" get_log_file_path(): convenience function that returns full path of a redo log file SRV_N_LOG_FILES_MAX: removed srv_n_log_files: we can't remove this for compatibility reasons, but now server doesn't use this variable log_sys_t::file::fd: now just one, not std::vector log_sys_t::log_capacity: removed word 'group' find_and_check_log_file(): part of logic from huge srv_start() moved here recv_sys_t::files: file descriptors of redo log files. There can be several of those in case we're upgrading from older MariaDB version. recv_sys_t::remove_extra_log_files: whether to remove ib_logfile{1,2,3...} after successful upgrade. recv_sys_t::read(): open if needed and read from one of several log files recv_sys_t::files_size(): open if needed and return files count redo_file_sizes_are_correct(): check that redo log files sizes are equal. Just to log an error for a user. Corresponding check was moved from srv0start.cc namespace deprecated: put all deprecated variables here to prevent usage of it by us, developers	2020-02-19 12:21:59 +03:00
Marko Mäkelä	f8a9f90667	MDEV-12353: Remove support for crash-upgrade We tighten some assertions regarding dict_index_t::is_dummy and crash recovery, now that redo log processing will no longer create dummy objects.	2020-02-13 19:13:45 +02:00
Marko Mäkelä	7ae21b18a6	MDEV-12353: Change the redo log encoding log_t::FORMAT_10_5: physical redo log format tag log_phys_t: Buffered records in the physical format. The log record bytes will follow the last data field, making use of alignment padding that would otherwise be wasted. If there are multiple records for the same page, also those may be appended to an existing log_phys_t object if the memory is available. In the physical format, the first byte of a record identifies the record and its length (up to 15 bytes). For longer records, the immediately following bytes will encode the remaining length in a variable-length encoding. Usually, a variable-length-encoded page identifier will follow, followed by optional payload, whose length is included in the initially encoded total record length. When a mini-transaction is updating multiple fields in a page, it can avoid repeating the tablespace identifier and page number by setting the same_page flag (most significant bit) in the first byte of the log record. The byte offset of the record will be relative to where the previous record for that page ended. Until MDEV-14425 introduces a separate file-level log for redo log checkpoints and file operations, we will write the file-level records in the page-level redo log file. The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT) will be removed in MDEV-14425, and one sequential scan of the page recovery log will suffice. Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags. If the information is needed, it can be parsed from WRITE records that modify FSP_SPACE_FLAGS. MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily as part of this work, before being replaced with WRITE (along with MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES). mtr_buf_t::empty(): Check if the buffer is empty. mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty. 
mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record, for the same_page encoding. page_recv_t::last_offset: Reflects mtr_t::m_last_offset. Valid values for last_offset during recovery should be 0 or above 8. (The first 8 bytes of a page are the checksum and the page number, and neither are ever updated directly by log records.) Internally, the special value 1 indicates that the same_page form will not be allowed for the subsequent record. mtr_t::page_create(): Take the block descriptor as parameter, so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE record will always be followed by a subtype byte, because same_page records must be longer than 1 byte. trx_undo_page_init(): Combine the writes in WRITE record. trx_undo_header_create(): Write 4 bytes using a special MEMSET record that includes 1 byte of length and 2 bytes of payload. flst_write_addr(): Define as a static function. Combine the writes. flst_zero_both(): Replaces two flst_zero_addr() calls. flst_init(): Do not inline the function. fsp_free_seg_inode(): Zerofill the whole inode. fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT to FIL_NULL when using the physical format. btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page() must have been invoked. fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE. fil_names_dirty_and_write(): Remove the parameter mtr. Write the records using a separate mini-transaction object, because any FILE_ records must be at the start of a mini-transaction log. recv_recover_page(): Add a fil_space_t* parameter. After applying log to a ROW_FORMAT=COMPRESSED page, invoke buf_zip_decompress() to restore the uncompressed page. buf_page_io_complete(): Remove the temporary hack to discard the uncompressed page of a ROW_FORMAT=COMPRESSED page. page_zip_write_header(): Remove. Use mtr_t::write() or mtr_t::memset() instead, and update the compressed page frame separately. 
trx_undo_header_add_space_for_xid(): Remove. trx_undo_seg_create(): Perform the changes that were previously made by trx_undo_header_add_space_for_xid(). btr_reset_instant(): New function: Reset the table to MariaDB 10.2 or 10.3 format when rolling back an instant ALTER TABLE operation. page_rec_find_owner_rec(): Merge with the only callers. page_cur_insert_rec_low(): Combine writes by using a local buffer. MEMMOVE data from the preceding record whenever feasible (copying at least 3 bytes). page_cur_insert_rec_zip(): Combine writes to page header fields. PageBulk::insertPage(): Issue MEMMOVE records to copy a matching part from the preceding record. PageBulk::finishPage(): Combine the writes to the page header and to the sparse page directory slots. mtr_t::write(): Only log the least significant (last) bytes of multi-byte fields that actually differ. For updating FSP_SIZE, we must always write all 4 bytes to the redo log, so that the fil_space_set_recv_size() logic in recv_sys_t::parse() will work. mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument instead of a numeric offset to the page frame. Only log the last bytes of multi-byte fields that actually differ. In fil_space_crypt_t::write_page0(), we must log also any unchanged bytes, so that recovery will recognize the record and invoke fil_crypt_parse(). Future work: MDEV-21724 Optimize page_cur_insert_rec_low() redo logging MDEV-21725 Optimize btr_page_reorganize_low() redo logging MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED	2020-02-13 19:12:17 +02:00
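The commit above says that the first byte of a physical redo log record identifies the record and its length (up to 15 bytes), and that the most significant bit of that byte is the same_page flag. A minimal sketch of packing and unpacking such a byte follows. The exact split of the remaining seven bits (three type bits above four length bits) is an assumption made for illustration, as are all names; the variable-length encoding used for longer records is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of the first byte of a log record.
struct rec_header
{
  bool same_page;  // most significant bit, as stated in the commit message
  uint8_t type;    // assumed: bits 6..4 identify the record type
  uint8_t len;     // assumed: bits 3..0 hold a length of up to 15 bytes
};

// Split the first byte into its three parts under the assumed layout.
inline rec_header decode_first_byte(uint8_t b)
{
  rec_header h;
  h.same_page = (b & 0x80) != 0;
  h.type = static_cast<uint8_t>(b & 0x70);
  h.len = static_cast<uint8_t>(b & 0x0F);
  return h;
}

// Pack the three parts back into one byte. `type` must already be
// positioned in bits 6..4 and `len` must fit in four bits.
inline uint8_t encode_first_byte(bool same_page, uint8_t type, uint8_t len)
{
  assert((type & ~0x70) == 0);
  assert(len <= 15);
  return static_cast<uint8_t>((same_page ? 0x80 : 0) | type | len);
}
```

With the same_page bit set, a record can omit the tablespace identifier and page number, which is the saving the commit message describes for mini-transactions that update several fields in one page.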
Marko Mäkelä	67c76704a8	MDEV-12353: Remove MLOG_INDEX_LOAD (innodb_log_optimize_ddl) NOTE: This may break crash-upgrade from a dataset that was created with innodb_log_optimize_ddl=ON. Also due to ROW_FORMAT=COMPRESSED pages, it will be easiest to disallow crash-upgrade. It would be more robust to disable the MDEV-12699 logic when crash-upgrading from old redo log format. log_optimized_ddl_op: Remove. fil_space_t::enable_lsn, file_name_t::enable_lsn: Remove. ddl_tracker_t::optimized_ddl: Remove. TODO: Remove ddl_tracker	2020-02-13 18:19:15 +02:00
Marko Mäkelä	1a6f708ec5	MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances Our benchmarking efforts indicate that the reasons for splitting the buf_pool in commit `c18084f71b` have mostly gone away, possibly as a result of mysql/mysql-server@ce6109ebfd or similar work. Only in one write-heavy benchmark where the working set size is ten times the buffer pool size, the buf_pool->mutex would be less contended with 4 buffer pool instances than with 1 instance, in buf_page_io_complete(). That contention could be alleviated further by making more use of std::atomic and by splitting buf_pool_t::mutex further (MDEV-15053). We will deprecate and ignore the following parameters: innodb_buffer_pool_instances innodb_page_cleaners There will be only one buffer pool and one page cleaner task. In a number of INFORMATION_SCHEMA views, columns that indicated the buffer pool instance will be removed: information_schema.innodb_buffer_page.pool_id information_schema.innodb_buffer_page_lru.pool_id information_schema.innodb_buffer_pool_stats.pool_id information_schema.innodb_cmpmem.buffer_pool_instance information_schema.innodb_cmpmem_reset.buffer_pool_instance	2020-02-12 14:45:21 +02:00
Eugene Kosov	691c691adc	clean up redo log main change: rename first redo log without file close second change: use os_offset_t to represent offset in a file third change: fix log texts	2020-02-01 23:58:24 +08:00
Marko Mäkelä	50324ce624	MDEV-21351 Replace recv_sys.heap with list of buf_block_t InnoDB crash recovery used a special type of mem_heap_t that allocates backing store from the buffer pool. That incurred a significant overhead, leading to underutilization of memory, and limiting the maximum contiguous allocated size of a log record. recv_sys_t::blocks: A linked list of buf_block_t that are allocated by buf_block_alloc() for redo log records. Replaces recv_sys_t::heap. We repurpose buf_block_t::unzip_LRU for linking the elements. recv_sys_t::max_log_blocks: Renamed from recv_n_pool_free_frames. recv_sys_t::max_blocks(): Accessor for max_log_blocks. recv_sys_t::alloc(): Allocate memory from the current recv_sys_t::blocks element, or allocate another block. In debug builds, various free() member functions must be invoked, because we repurpose buf_page_t::buf_fix_count for tracking allocations. recv_sys_t::free_corrupted_page(): Renamed from recv_recover_corrupt_page() recv_sys_t::is_memory_exhausted(): Renamed from recv_sys_heap_check() recv_sys_t::pages and its elements are allocated directly by the system memory allocator. recv_parse_log_recs(): Remove the parameter available_memory. We rename some variables 'store_to_hash' to 'store', because recv_sys.pages is not actually a hash table. This is joint work with Thirunarayanan Balathandayuthapani.	2020-01-29 12:53:39 +02:00
Marko Mäkelä	a983b24407	Merge 10.4 into 10.5	2020-01-28 14:17:09 +02:00
Oleksandr Byelkin	bfc24bb2ec	Merge branch '10.3' into 10.4	2020-01-24 14:50:23 +01:00
Oleksandr Byelkin	ceda5f724f	Merge branch '10.2' into 10.3	2020-01-24 14:16:20 +01:00
Oleksandr Byelkin	f2ccfcaca1	Merge branch '10.1' into 10.2	2020-01-24 13:46:49 +01:00
Julius Goryavsky	982294ac16	MDEV-17601: MariaDB Galera does not expect 'mbstream' as streamfmt Setting "streamfmt=mbstream" in the "[sst]" section causes SST to fail because the format automatically switches to 'tar' by default (instead of mbstream). To fix this, we need to add mbstream to the list of valid values for the format, making it synonymous with xbstream. This must be done both in the SST script and when parsing the options of the corresponding utilities.	2020-01-21 10:50:48 +01:00
Eugene Kosov	e9de6386ad	MDEV-18115 remove now unneeded constraint log_group_max_size: is not needed because redo log do not use fil_io() now	2020-01-18 23:42:55 +08:00
Sergei Golubchik	ff5a528f26	mysqltest crashes on Debian Debian is apparently offended that pcre2-posix implements POSIX API, thus it renames all posix-compatible symbols in libpcre2-posix to have the PCRE2 prefix. But Debian doesn't do anything to pcre2posix.h header, so any unaware application will get POSIX compatible type names and function prototypes from pcre2, but actual symbols will come from libc. To remedy this enormous incongruity we have to redefine POSIX-compatible function names in pcre2posix to match Debian's hack.	2020-01-16 18:13:55 +01:00
Eugene Kosov	562c037b48	MDEV-18115 Remove dummy tablespace for the redo log Redo log subsystem was decoupled from tablespace subsystem. It now manages file descriptors for redo log files by itself. FIL_TYPE_LOG: removed, code in various places was simplified SRV_LOG_SPACE_FIRST_ID: renamed to SRV_SPACE_ID_UPPER_BOUND to better match its purpose. Code in various places was simplified fil_n_log_flushes: replaced with log_sys::flushes fil_n_pending_log_flushes: replaced with log_sys::pending_flushes log_t::files::files: redo log file descriptors log_t::files::file_names: redo log file names log_t::files::set_file_names(): set file names without opening them log_t::files::open_files(): opens redo log files log_t::files::read(): treats several files as one big log_t::files::write(): treats several files as one big log_t::files::fsync(): flushes page cache to disk log_t::files::close_files(): closes redo log files fil_open_log_and_system_tablespace_files(): renamed to fil_open_system_tablespace_files() and obviously it now doesn't open redo log files global files[1000]: removed. Why it was needed at all?	2020-01-01 22:09:51 +08:00
Marko Mäkelä	8cc15c036d	Merge 10.4 into 10.5	2019-12-27 21:17:16 +02:00
Marko Mäkelä	4c25e75ce7	Merge 10.3 into 10.4	2019-12-27 18:20:28 +02:00
Marko Mäkelä	5ab70e7f68	Merge 10.2 into 10.3	2019-12-27 15:14:48 +02:00
Thirunarayanan Balathandayuthapani	bba59abb03	MDEV-19176 Reduce the memory usage during recovery - Moved the recv_sys->heap memory condition inside recv_parse_log_recs(). So that, InnoDB can mark the status as STORE_NO earlier. - InnoDB uses one third of buffer pool chunk size for reading the redo log records. In that case, we can avoid the scenario where buffer ran out of memory issue during recovery.	2019-12-23 15:51:02 +05:30
Alexey Botchkov	9dadfdcde5	MDEV-14024 PCRE2. Related changes in the server code.	2019-12-21 10:34:02 +01:00
Marko Mäkelä	28c89b7151	Merge 10.4 into 10.5	2019-12-16 07:47:17 +02:00
Marko Mäkelä	8fa759a576	Merge 10.3 into 10.4 We disable the MDEV-21189 test galera.galera_partition because it times out.	2019-12-13 17:30:37 +02:00

1095 commits