mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 19:11:46 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	32edabd1f2	Merge 10.9 into 10.10	2022-06-09 15:26:09 +03:00
Marko Mäkelä	57d4a242da	Merge 10.7 into 10.8	2022-06-06 16:22:09 +03:00
Marko Mäkelä	7e39470e33	Merge 10.6 into 10.7	2022-06-06 14:56:20 +03:00
Marko Mäkelä	0b47c126e3	MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit `177d8b0c12` is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.	2022-06-06 14:03:22 +03:00
Marko Mäkelä	6b9bba41e8	MDEV-28554: Remove innodb_version INNODB_VERSION_STR: Replaced with PACKAGE_VERSION (non-functional change). INNODB_VERSION_SHORT: Replaced with direct use of MYSQL_VERSION_MAJOR << 8 \| MYSQL_VERSION_MINOR. check_version(): Simplify the mariadb-backup version check, and require the server version to be MariaDB 10.8 or later, because that is when the InnoDB redo log format was last changed.	2022-06-03 12:20:19 +03:00
Marko Mäkelä	2f8d0af883	Merge 10.5 into 10.6	2022-06-02 17:39:13 +03:00
Marko Mäkelä	4b3c3e526e	Merge 10.4 into 10.5	2022-06-02 16:51:13 +03:00
Marko Mäkelä	96f4b4a55b	Merge 10.3 into 10.4	2022-06-02 16:34:17 +03:00
Marko Mäkelä	91d5fffa07	MDEV-28719: compress_write() leaks data_mutex on error	2022-06-01 11:20:47 +03:00
Marko Mäkelä	863c3eda87	MDEV-28689, MDEV-28690: Incorrect error handling for ctrl_mutex comp_thread_ctxt_t: Remove ctrl_mutex, ctrl_cond, started. We do not actually need them for anything. destroy_worker_thread(): Split from destroy_worker_threads(). create_worker_threads(): We already initialize thd->data_avail=FALSE and thd->cancelled=FALSE before invoking pthread_create(). If any thread creation fails, clean up by destroy_worker_thread(). compress_worker_thread_func(): Assume that thd->started and thd->data_avail are already initialized. Reviewed by: Vladislav Vaintroub	2022-05-30 15:49:45 +03:00
Sergei Golubchik	b7ffccf49b	Merge branch '10.7' into 10.8	2022-05-18 13:26:48 +02:00
Sergei Golubchik	99a433ed1c	Merge branch '10.6' into 10.7	2022-05-18 10:34:38 +02:00
Marko Mäkelä	daa2680c78	Merge 10.5 into 10.6	2022-05-12 08:11:57 +03:00
Vlad Lesin	3fabdc3ca8	MDEV-28473 field_ref_zero is not initialized in xtrabackup_prepare_func() The solution is to initialize field_ref_zero in main_low() before xtrabackup_backup_func() and xtrabackup_prepare_func() calls.	2022-05-11 17:20:31 +03:00
Sergei Golubchik	443c2a715d	Merge branch '10.7' into 10.8	2022-05-11 12:21:36 +02:00
Sergei Golubchik	fd132be117	Merge branch '10.6' into 10.7	2022-05-11 11:25:33 +02:00
Sergei Golubchik	3bc98a4ec4	Merge branch '10.5' into 10.6	2022-05-10 14:01:23 +02:00
Sergei Golubchik	ef781162ff	Merge branch '10.4' into 10.5	2022-05-09 22:04:06 +02:00
Sergei Golubchik	a70a1cf3f4	Merge branch '10.3' into 10.4	2022-05-08 23:03:08 +02:00
Oleksandr Byelkin	9614fde1aa	Merge branch '10.2' into 10.3	2022-05-03 10:59:54 +02:00
Alexander Barkov	680ca15269	MDEV-28446 mariabackup prepare fails for incrementals if a new schema is created after full backup is taken When "mariabackup --target-dir=$basedir --incremental-dir=$incremental_dir" is running and is moving a new table file (e.g. `db1/t1.new`) from the incremental directory to the base directory, it needs to verify that the base backup database directory (e.g. `$basedir/db1`) really exists (or create it otherwise). The table `db1/t1` can come from a new database `db1` which was created during the base mariabackup execution time. In such case the directory `db1` exists only in the incremental directory, but does not exist in the base directory.	2022-05-02 11:21:10 +04:00
Marko Mäkelä	133c2129cd	Merge 10.7 into 10.8	2022-04-27 10:43:00 +03:00
Marko Mäkelä	638afc4acf	Merge 10.6 into 10.7	2022-04-26 18:59:40 +03:00
Alexander Barkov	907e4c62ce	MDEV-21037 mariabackup does not detect multi-source replication slave	2022-04-25 15:00:09 +04:00
Marko Mäkelä	fae0ccad6e	Merge 10.5 into 10.6	2022-04-21 17:46:40 +03:00
Marko Mäkelä	620c55e708	Merge 10.4 into 10.5	2022-04-21 15:33:50 +03:00
Vlad Lesin	1b558dd462	MDEV-27919 mariabackup --log-copy-interval is measured in millisecondss in 10.5 and in microseconds in 10.6 Multiply polling interval by 1000.	2022-04-21 15:24:59 +03:00
Marko Mäkelä	394784095e	Merge 10.3 into 10.4	2022-04-21 11:33:59 +03:00
Sergei Golubchik	bbdec04d59	MDEV-24317 Data race in LOGGER::init_error_log at sql/log.cc:1443 and in LOGGER::error_log_print at sql/log.cc:1181 don't initialize error_log_handler_list in set_handlers() * error_log_handler_list is initialized to LOG_FILE early, in init_base() * set_handlers always reinitializes it to LOG_FILE, so it's pointless * after init_base() concurrent threads start using sql_log_warning, so following set_handlers() shouldn't modify error_log_handler_list without some protection	2022-04-12 13:07:20 +02:00
Marko Mäkelä	5d8dcfd86c	MDEV-25975: Merge 10.4 into 10.5	2022-04-06 10:30:49 +03:00
Marko Mäkelä	d172df9913	MDEV-25975: Merge 10.3 into 10.4	2022-04-06 09:18:38 +03:00
Marko Mäkelä	e9735a8185	MDEV-25975 innodb_disallow_writes causes shutdown to hang We will remove the parameter innodb_disallow_writes because it is badly designed and implemented. The parameter was never allowed at startup. It was only internally used by Galera snapshot transfer. If a user executed SET GLOBAL innodb_disallow_writes=ON; the server could hang even on subsequent read operations. During Galera snapshot transfer, we will block writes to implement an rsync friendly snapshot, as follows: sst_flush_tables() will acquire a global lock by executing FLUSH TABLES WITH READ LOCK, which will block any writes at the high level. sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true), will suspend or disable InnoDB background tasks or threads that could initiate writes. As part of this, log_make_checkpoint() will be invoked to ensure that anything in the InnoDB buf_pool.flush_list will be written to the data files. This has the nice side effect that the Galera joiner will avoid crash recovery. The changes to sql/wsrep.cc and to the tests are based on a prototype that was developed by Jan Lindström. Reviewed by: Jan Lindström	2022-04-06 08:06:49 +03:00
Marko Mäkelä	dce8a846ae	Merge 10.7 into 10.8	2022-03-03 11:34:58 +02:00
Marko Mäkelä	64ea3eab8f	Merge 10.6 into 10.7	2022-03-03 11:11:00 +02:00
Otto Kekäläinen	1fa872f6ef	Fix various spelling errors Among others: existance -> existence reinitialze -> reinitialize successfuly -> successfully	2022-03-03 13:42:49 +11:00
Marko Mäkelä	32d741b5b0	Merge 10.7 into 10.8	2022-02-25 16:24:13 +02:00
Marko Mäkelä	3d88f9f34c	Merge 10.6 into 10.7	2022-02-25 16:09:16 +02:00
Marko Mäkelä	a23414dd32	MDEV-27939 Log buffer wrap-around errors on PMEM When the log is stored in persistent memory, log_sys.buf[] is a ring buffer that directly maps to the circular ib_logfile0 file. There were several errors that could occur in the special case when a log record ends exactly at the end of the log file and the next record would start at log_sys.buf[log_sys.START_OFFSET]. mariabackup.huge_lsn,strict_full_crc32: Write the first record at the very end of the circular file, to reproduce the failure scenarios. recv_sys_t::parse(): On PMEM, wrap the end offset of the record from log_sys.file_size to log_sys.START_OFFSET if needed. Otherwise, both InnoDB recovery and mariadb-backup would try to parse the next record from an invalid address. filename_to_spacename(): Remove an assumption about the format of file names. While the server currently writes file names like ./databasename/tablename.ibd we might want to stop writing the redundant ./ prefix in the future. The test mariabackup.huge_lsn is generating such file names. xtrabackup_copy_logfile(): Correctly copy a record that ends at the very end of the log_sys.buf[]. The errors in mariadb-backup were reproduced with the test mariabackup.huge_lsn,strict_full_crc32 and an additional patch to use the start checkpoint of the test: diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 27dce5fa17d..e17a1692d6f 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -1796,7 +1796,8 @@ dberr_t recv_sys_t::find_checkpoint() continue; } - if (checkpoint_lsn >= log_sys.next_checkpoint_lsn) + if (checkpoint_lsn >= log_sys.next_checkpoint_lsn && + checkpoint_lsn != 0x1000fffffe10) { log_sys.next_checkpoint_lsn= checkpoint_lsn; log_sys.next_checkpoint_no= field == log_t::CHECKPOINT_1;	2022-02-25 15:50:09 +02:00
Marko Mäkelä	6daf8f8a0d	Merge 10.5 into 10.6	2022-02-25 13:48:47 +02:00
Marko Mäkelä	b791b942e1	Merge 10.4 into 10.5	2022-02-25 13:27:41 +02:00
Marko Mäkelä	f5ff7d09c7	Merge 10.3 into 10.4	2022-02-25 13:00:48 +02:00
Marko Mäkelä	00b70bbb51	Merge 10.2 into 10.3	2022-02-25 10:43:38 +02:00
Julius Goryavsky	17e0f5224c	MDEV-27524: Incorrect binlogs after Galera SST using rsync and mariabackup This commit adds correct handling of binlogs for SST using rsync or mariabackup. Before this fix, binlogs were handled incorrectly - - only one (last) binary log file was transferred during SST, which then led to various failures (for example, when trying to list all events from the binary log). These bugs were long masked by flaws in the primitive binlogs handling code in the SST scripts, which causing binary logs files to be erased after transfer or not added to the binlog index on the joiner node. Now the correct transfer of all binary logs (not just the last of the binary log files) has been implemented both for the rsync (at the script level) and for the mariabackup (at the level of the main utility code). This commit also adds a new sst_max_binlogs=<n> parameter, which can be located in the [sst] section or in the [xtrabackup] section (historically, supported for mariabackup only, not for rsync), or in one of the server sections. This parameter specifies the number of binary log files to be sent to the joiner node during SST. This option is added for compatibility with old SST scripting behavior, which can be emulated by setting the sst_max_binlogs=1 (although in general this can cause problems for the reasons described above). In addition, setting the sst_max_binlogs=0 can be used to suppress the transmission of binary logs to the joiner nodes during SST (although sometimes a single file with the current binary log can still be transmitted to the joiner, even with sst_max_binlogs=0, because this sometimes necessary in modes that involve the use of GTIDs with Galera). Also, this commit ensures correct handling of paths to various innodb files and directories in the SST scripts, and fixes some problems with this that existed in mariabackup utility (which were associated with incorrect handling of the innodb_data_dir parameter in some scenarios). In addition, this commit contains the following enhancements: 1) Added tests for mtr, which check the correct work with binlogs after SST (using rsync and mariabackup); 2) Added correct handling of slashes at the end of all paths that the SST script receives as parameters; 3) Improved parsing code for --mysqld-args parameters. Now it correctly processes the sequence "--" after the name of the one-letter option; 4) Checking the secret signature during joiner authentication is made independent of presence of bash (as a unix shell) in the system and diff utility no longer needed to check certificates compliance; 5) All directories that are necessary for the correct placement of various logs are automatically created by SST scripts in advance (before running mariabackup on the joiner node); 6) Removal of old binary logs on joiner is done using the binlog index (if it exists) (not only by fixed pattern that based on the current binlog name, as before); 7) Paths for placing binary logs are correctly processed if they are set as relative paths (to the datadir); 8) SST scripts are made even more resistant to spaces in filenames (now for binlogs); 9) In case of failure, SST scripts now always end with an exit code other than zero; 10) SST script for rsync now correctly create a tar file with the binlogs, even if the paths to them (in the binlog index file) are specified as a mix of absolute and relative paths, and even if they do not match with the datadir path specified in the current configuration settings.	2022-02-22 10:45:06 +01:00
Julius Goryavsky	571eb9d775	mariabackup: cosmetic changes (whitespaces and indentation)	2022-02-22 10:20:58 +01:00
Marko Mäkelä	f1beeb58e6	MDEV-27848: Remove unused wait/io/file/innodb/innodb_log_file The performance_schema counter wait/io/file/innodb/innodb_log_file is always reported as 0. The way how redo log writes are being waited for was refactored in commit `30ea63b7d2` by the introduction of flush_lock and write_lock. Even before that change, all the wait/io/file/innodb/ counters were always 0 in my tests. Moreover, if the PMEM interface that was introduced in commit `3daef523af` is being used, writes to the InnoDB log file will completely avoid any system calls and performance_schema instrumentation. In commit `685d958e38` also the reads of the redo log (during recovery) would bypass any system calls.	2022-02-15 15:03:15 +02:00
Marko Mäkelä	a635c40648	MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit() A prominent bottleneck in mtr_t::commit() is log_sys.mutex between log_sys.append_prepare() and log_close(). User-visible change: The minimum innodb_log_file_size will be increased from 1MiB to 4MiB so that some conditions can be trivially satisfied. log_sys.latch (log_latch): Replaces log_sys.mutex and log_sys.flush_order_mutex. Copying mtr_t::m_log to log_sys.buf is protected by a shared log_sys.latch. Writes from log_sys.buf to the file system will be protected by an exclusive log_sys.latch. log_sys.lsn_lock: Protects the allocation of log buffer in log_sys.append_prepare(). sspin_lock: A simple spin lock, for log_sys.lsn_lock. Thanks to Vladislav Vaintroub for suggesting this idea, and for reviewing these changes. mariadb-backup: Replace some use of log_sys.mutex with recv_sys.mutex. buf_pool_t::insert_into_flush_list(): Implement sorting of flush_list because ordering is otherwise no longer guaranteed. Ordering by LSN is needed for the proper operation of redo log checkpoints. log_sys.append_prepare(): Advance log_sys.lsn and log_sys.buf_free by the length, and return the old values. Also increment write_to_buf, which was previously done in log_close(). mtr_t::finish_write(): Obtain the buffer pointer from log_sys.append_prepare(). log_sys.buf_free: Make the field Atomic_relaxed, to simplify log_flush_margin(). Use only loads and stores to avoid costly read-modify-write atomic operations. buf_pool.flush_list_requests: Replaces export_vars.innodb_buffer_pool_write_requests and srv_stats.buf_pool_write_requests. Protected by buf_pool.flush_list_mutex. buf_pool_t::insert_into_flush_list(): Do not invoke page_cleaner_wakeup(). Let the caller do that after a batch of calls. recv_recover_page(): Invoke a minimal part of buf_pool.insert_into_flush_list(). ReleaseBlocks::modified: A number of pages added to buf_pool.flush_list. ReleaseBlocks::operator(): Merge buf_flush_note_modification() here. log_t::set_capacity(): Renamed from log_set_capacity().	2022-02-10 16:37:12 +02:00
Marko Mäkelä	8c7c92adf3	MDEV-27787 mariadb-backup --backup is allocating extra memory for log records In commit `685d958e38` (MDEV-14425), the log parsing in mariadb-backup --backup was rewritten. The parameter STORE_IF_EXISTS that is being passed to recv_sys.parse_mtr() or recv_sys.parse_pmem() instead of STORE_NO caused unnecessary additional memory allocation for redo log records.	2022-02-10 15:39:27 +02:00
Oleksandr Byelkin	4fb2cb1a30	Merge branch '10.7' into 10.8	2022-02-04 14:50:25 +01:00
Oleksandr Byelkin	9ed8deb656	Merge branch '10.6' into 10.7	2022-02-04 14:11:46 +01:00
Thirunarayanan Balathandayuthapani	8d742fe4ac	MDEV-26326 mariabackup skip valid ibd file - Store the deferred tablespace name while loading the tablespace for backup process. - Mariabackup stores the list of space ids which has page0 INIT_PAGE records. backup_first_page_op() and first_page_init() was introduced to track the page0 INIT_PAGE records. - backup_file_op() and log_file_op() was changed to handle FILE_MODIFY redo log records. It is used to identify the deferred tablespace space id. - Whenever file operation redo log was processed by backup, backup_file_op() should check whether the space name exist in deferred tablespace. If it is then it needs to store the space id, name when FILE_MODIFY, FILE_RENAME redo log processed and it should delete the tablespace name from defer list in other cases. - backup_fix_ddl() should check whether deferred tablespace has any page0 init records. If it is then consider the tablespace as newly created tablespace. If not then backup should try to reload the tablespace with SRV_BACKUP_NO_DEFER mode to avoid the deferring of tablespace.	2022-02-01 19:50:08 +05:30

... 2 3 4 5 6 ...

905 commits