mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-30 18:41:56 +01:00

Author	SHA1	Message	Date
Eugene Kosov	31d0727a10	MDEV-18235: Changes related to fsync() Remove fil_node_t::sync_event. I had a discussion with kernel fellows and they said it's safe to call fsync() simultaneously at least on VFS and ext4. So initially I wanted to disable check for recent Linux but than I realized code is buggy. Consider a case when one thread is inside fsync() and two others are waiting inside os_event. First thread after fsync() calls os_event_set() which is a broadcast! So two waiting threads will awake and may call fsync() at the same time. One fix is to add a notify_one() functionality to os_event but I decided to remove incorrect check completely. Note, it works for one waiting thread but not for more than one. IMO it's ok to avoid existing bugs but there is not too much sense in avoiding possible(!) bugs as this code does. fil_space_t::is_in_rotation_list(), fil_space_t::is_in_unflushed_spaces(): Replace redundant bool fields with member functions. fil_node_t::needs_flush: Replaces fil_node_t::modification_counter and fil_node_t::flush_counter. We need to know whether there _are_ some unflushed writes and we do not need to know _how many_ writes. fil_system_t::modification_counter: Remove as not needed. Even if we needed fil_node_t::modification_counter, every file could have its own counter that would be incremented on each write. fil_system_t::modification_counter is a global modification counter for all files. It was incremented on every write. But whether some file was flushed or not is an internal fil_node_t deal/state and this makes fil_system_t::modification_counter useless. Closes #1061	2019-01-25 15:40:04 +02:00
Marko Mäkelä	8e80fd6bfd	Merge 10.1 into 10.2	2019-01-17 11:24:38 +02:00
Marko Mäkelä	71eb762611	Merge 10.0 into 10.1	2019-01-17 06:40:24 +02:00
Eugene Kosov	a06a3e4670	MDEV-18233 Moving the hash_node_t to improve locality of reference When performing a hash search via HASH_SEARCH we first look at a key of a node and then at its pointer to the next node in chain. If we have those in one cache line instead of a two we reduce memory reads. I found dict_table_t, fil_space_t and buf_page_t suitable for such improvement.	2019-01-14 22:14:56 +03:00
Marko Mäkelä	447e493179	Remove some unnecessary InnoDB #include	2018-11-29 12:53:44 +02:00
Marko Mäkelä	e82e216e37	MDEV-17849 Undo tablespace truncation recovery fails to shrink file fil_space_t::add(): Replaces fil_node_create(), fil_node_create_low(). Let the caller pass fil_node_t::handle, to avoid having to close and re-open files. fil_node_t::read_page0(): Refactored from fil_node_open_file(). Read the first page of a data file. fil_node_open_file(): Open the file only once. srv_undo_tablespace_open(): Set the file handle for the opened undo tablespace. This should ensure that ut_ad(file->is_open()) no longer fails in recv_add_trim(). xtrabackup_backup_func(): Remove some dead code. xb_fil_cur_open(): Open files only if needed. Undo tablespaces should already have been opened.	2018-11-27 14:49:39 +02:00
Marko Mäkelä	eb6364619f	Remove the redundant variable fil_n_file_opened	2018-11-27 14:30:39 +02:00
Marko Mäkelä	b3009059d0	Minor cleanup	2018-10-29 12:05:39 +02:00
Eugene Kosov	14be814380	MDEV-17491 micro optimize page_id_t page_id_t: remove m_fold member various places: pass page_id_t by value instead of by reference	2018-10-25 18:46:27 +03:00
Marko Mäkelä	3448ceb02a	MDEV-13564: Implement innodb_unsafe_truncate=ON for compatibility While MariaDB Server 10.2 is not really guaranteed to be compatible with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format change that could be present in XtraBackup, but was reverted from MariaDB in MDEV-12289), we do not want to disrupt users who have deployed xtrabackup and MariaDB Server 10.2 in their environments. With this change, MariaDB 10.2 will continue to use the backup-unsafe TRUNCATE TABLE code, so that neither the undo log nor the redo log formats will change in an incompatible way. Undo tablespace truncation will keep using the redo log only. Recovery or backup with old code will fail to shrink the undo tablespace files, but the contents will be recovered just fine. In the MariaDB Server 10.2 series only, we introduce the configuration parameter innodb_unsafe_truncate and make it ON by default. To allow MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE operations, use loose_innodb_unsafe_truncate=OFF. MariaDB Server 10.3.10 and later releases will always use the backup-safe TRUNCATE TABLE, and this parameter will not be added there. recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables() unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan tables if RENAME operations are not transactional within InnoDB. LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT. log_init(), log_group_file_header_flush(), srv_prepare_to_delete_redo_log_files(), innobase_start_or_create_for_mysql(): Choose the redo log format and subformat based on the value of innodb_unsafe_truncate.	2018-10-11 08:17:04 +03:00
Marko Mäkelä	75f8e86f57	MDEV-17158 TRUNCATE is not atomic after MDEV-13564 It turned out that ha_innobase::truncate() would prematurely commit the transaction already before the completion of the ha_innobase::create(). All of this must be atomic. innodb.truncate_crash: Use the correct DEBUG_SYNC point, and tolerate non-truncation of the table, because the redo log for the TRUNCATE transaction commit might be flushed due to some InnoDB background activity. dict_build_tablespace_for_table(): Merge to the function dict_build_table_def_step(). dict_build_table_def_step(): If a table is being created during an already started data dictionary transaction (such as TRUNCATE), persistently write the table_id to the undo log header before creating any file. In this way, the recovery of TRUNCATE will be able to delete the new file before rolling back the rename of the original table. dict_table_rename_in_cache(): Add the parameter replace_new_file, used as part of rolling back a TRUNCATE operation. fil_rename_tablespace_check(): Add the parameter replace_new. If the parameter is set and a file identified by new_path exists, remove a possible tablespace and also the file. create_table_info_t::create_table_def(): Remove some debug assertions that no longer hold. During TRUNCATE, the transaction will already have been started (and performed a rename operation) before the table is created. Also, remove a call to dict_build_tablespace_for_table(). create_table_info_t::create_table(): Add the parameter create_fk=true. During TRUNCATE TABLE, do not add FOREIGN KEY constraints to the InnoDB data dictionary, because they will also not be removed. row_table_add_foreign_constraints(): If trx=NULL, do not modify the InnoDB data dictionary, but only load the FOREIGN KEY constraints from the data dictionary. ha_innobase::create(): Lock the InnoDB data dictionary cache only if no transaction was passed by the caller. Unlock it in any case. innobase_rename_table(): Add the parameter commit = true. If !commit, do not lock or unlock the data dictionary cache. ha_innobase::truncate(): Lock the data dictionary before invoking rename or create, and let ha_innobase::create() unlock it and also commit or roll back the transaction. trx_undo_mark_as_dict(): Renamed from trx_undo_mark_as_dict_operation() and declared global instead of static. row_undo_ins_parse_undo_rec(): If table_id is set, this must be rolling back the rename operation in TRUNCATE TABLE, and therefore replace_new_file=true.	2018-09-10 14:59:58 +03:00
Marko Mäkelä	cf2a4426a2	MDEV-14717 RENAME TABLE in InnoDB is not crash-safe This is a backport of commit `0bc36758ba` and commit `9eb3fcc9fb`. InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2 redo log records during table-rebuilding ALGORITHM=INPLACE operations. We must write the records for any .ibd file renames, so that the operations are crash-safe. If InnoDB is killed during a RENAME TABLE operation, it can happen that the transaction for updating the data dictionary will be rolled back. But, nothing will roll back the renaming of the .ibd file (the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter, the renaming of the dict_table_t::name in the dict_sys cache. We introduce the undo log record TRX_UNDO_RENAME_TABLE to fix this. fil_space_for_table_exists_in_mem(): Remove the parameters adjust_space, table_id and some code that was trying to work around these deficiencies. fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record. dict_table_rename_in_cache(): Invoke fil_name_write_rename(). trx_undo_rec_copy(): Set the first 2 bytes to the length of the copied undo log record. trx_undo_page_report_rename(), trx_undo_report_rename(): Write a TRX_UNDO_RENAME_TABLE record with the old table name. row_rename_table_for_mysql(): Invoke trx_undo_report_rename() before modifying any data dictionary tables. row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE by invoking dict_table_rename_in_cache(), which will take care of both renaming the table and the file. ha_innobase::truncate(): Remove a work-around.	2018-09-07 22:10:02 +03:00
Marko Mäkelä	055a3334ad	MDEV-13564 Mariabackup does not work with TRUNCATE Implement undo tablespace truncation via normal redo logging. Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP. Note: Orphan #sql-ib.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib.ibd but the data dictionary will refer to the table using the original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3. ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them. rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating. == TRUNCATE TABLE == WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs. In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE. A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717. ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails. ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints. ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table. create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx. row_drop_table_for_mysql(): Replace a bool parameter with sqlcom. row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql(). dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove. row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place. The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization. We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations. == Undo tablespace truncation == MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint. We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4. recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen. namespace undo: Remove some unnecessary declarations. fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references. fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more. buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces. fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced. fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number. os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function. fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[]. recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them. trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log. trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed. recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file. recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point. buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0). trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.	2018-09-07 22:10:02 +03:00
Marko Mäkelä	2ca904f0ca	MDEV-13103 Deal with page_compressed page corruption fil_page_decompress(): Replaces fil_decompress_page(). Allow the caller detect errors. Remove duplicated code. Use the "safe" instead of "fast" variants of decompression routines. fil_page_compress(): Replaces fil_compress_page(). The length of the input buffer always was srv_page_size (innodb_page_size). Remove printouts, and remove the fil_space_t* parameter. buf_tmp_buffer_t::reserved: Make private; the accessors acquire() and release() will use atomic memory access. buf_pool_reserve_tmp_slot(): Make static. Remove the second parameter. Do not acquire any mutex. Remove the allocation of the buffers. buf_tmp_reserve_crypt_buf(), buf_tmp_reserve_compression_buf(): Refactored away from buf_pool_reserve_tmp_slot(). buf_page_decrypt_after_read(): Make static, and simplify the logic. Use the encryption buffer also for decompressing. buf_page_io_complete(), buf_dblwr_process(): Check more failures. fil_space_encrypt(): Simplify the debug checks. fil_space_t::printed_compression_failure: Remove. fil_get_compression_alg_name(): Remove. fil_iterate(): Allocate a buffer for compression and decompression only once, instead of allocating and freeing it for every page that uses compression, during IMPORT TABLESPACE. Also, validate the page checksum before decryption, and reduce the scope of some variables. fil_page_is_index_page(), fil_page_is_lzo_compressed(): Remove (unused). AbstractCallback::operator()(): Remove the parameter 'offset'. The check for it in FetchIndexRootPages::operator() was basically redundant and dead code since the previous refactoring.	2018-06-14 14:23:01 +03:00
Marko Mäkelä	f5eb37129f	MDEV-13103 Deal with page_compressed page corruption fil_page_decompress(): Replaces fil_decompress_page(). Allow the caller detect errors. Remove duplicated code. Use the "safe" instead of "fast" variants of decompression routines. fil_page_compress(): Replaces fil_compress_page(). The length of the input buffer always was srv_page_size (innodb_page_size). Remove printouts, and remove the fil_space_t* parameter. buf_tmp_buffer_t::reserved: Make private; the accessors acquire() and release() will use atomic memory access. buf_pool_reserve_tmp_slot(): Make static. Remove the second parameter. Do not acquire any mutex. Remove the allocation of the buffers. buf_tmp_reserve_crypt_buf(), buf_tmp_reserve_compression_buf(): Refactored away from buf_pool_reserve_tmp_slot(). buf_page_decrypt_after_read(): Make static, and simplify the logic. Use the encryption buffer also for decompressing. buf_page_io_complete(), buf_dblwr_process(): Check more failures. fil_space_encrypt(): Simplify the debug checks. fil_space_t::printed_compression_failure: Remove. fil_get_compression_alg_name(): Remove. fil_iterate(): Allocate a buffer for compression and decompression only once, instead of allocating and freeing it for every page that uses compression, during IMPORT TABLESPACE. fil_node_get_space_id(), fil_page_is_index_page(), fil_page_is_lzo_compressed(): Remove (unused code).	2018-06-14 13:46:07 +03:00
Marko Mäkelä	3fcc11fbb4	Remove traces of the non-working MDEV-6354 MariaDB never supported the MySQL 5.7 compression format. FIL_PAGE_TYPE_COMPRESSED: Remove. This was originally added as FIL_PAGE_COMPRESSED.	2018-06-13 16:02:40 +03:00
Marko Mäkelä	df42830b28	Merge 10.1 into 10.2	2018-06-06 11:25:33 +03:00
Marko Mäkelä	1d4e1d3263	Merge 10.0 to 10.1	2018-06-06 11:04:17 +03:00
Marko Mäkelä	55abcfa7b7	MDEV-16124 fil_rename_tablespace() times out and crashes server during table-rebuilding ALTER TABLE InnoDB insisted on closing the file handle before renaming a file. Renaming a file should never be a problem on POSIX systems. Also on Windows it should work if the file was opened in FILE_SHARE_DELETE mode. fil_space_t::stop_ios: Remove. We no longer need to stop file access during rename operations. fil_mutex_enter_and_prepare_for_io(): Remove the wait for stop_ios. fil_rename_tablespace(): Remove the retry logic; do not close the file handle. Remove the unused fault injection that was added along with the DATA DIRECTORY functionality (MySQL WL#5980). os_file_create_simple_func(), os_file_create_func(), os_file_create_simple_no_error_handling_func(): Include FILE_SHARE_DELETE in the share_mode. (We will still prevent multiple InnoDB instances from using the same files by not setting FILE_SHARE_WRITE.)	2018-06-05 18:16:12 +03:00
Marko Mäkelä	3d7915f000	Merge 10.1 into 10.2	2018-03-21 22:58:52 +02:00
Marko Mäkelä	6247c64c2a	MDEV-12396 IMPORT TABLESPACE cleanup Reduce unnecessary inter-module calls for IMPORT TABLESPACE. Move some IMPORT-related code from fil0fil.cc to row0import.cc. PageCallback: Remove. Make AbstractCallback the base class. PageConverter: Define some member functions inline.	2018-03-20 15:31:39 +02:00
Marko Mäkelä	112df06996	MDEV-15529 IMPORT TABLESPACE unnecessarily uses the doublewrite buffer fil_space_t::atomic_write_supported: Always set this flag for TEMPORARY TABLESPACE and during IMPORT TABLESPACE. The page writes during these operations are by definition not crash-safe because they are not written to the redo log. fil_space_t::use_doublewrite(): Determine if doublewrite should be used. buf_dblwr_update(): Add assertions, and let the caller check whether doublewrite buffering is desired. buf_flush_write_block_low(): Disable the doublewrite buffer for the temporary tablespace and for IMPORT TABLESPACE. fil_space_set_imported(), fil_node_open_file(), fil_space_create(): Initialize or revise the space->atomic_write_supported flag. buf_page_io_complete(), buf_flush_write_complete(): Add the parameter dblwr, to indicate whether doublewrite was used for writes. buf_dblwr_sync_datafiles(): Remove an unnecessary flush of persistent tablespaces when flushing temporary tablespaces. (Move the call to buf_dblwr_flush_buffered_writes().)	2018-03-10 11:54:34 +02:00
Sergei Golubchik	d4df7bc9b1	Merge branch 'github/10.0' into 10.1	2018-02-02 10:09:44 +01:00
Marko Mäkelä	431607237d	MDEV-12173 "Error: trying to do an operation on a dropped tablespace" InnoDB is issuing a 'noise' message that is not a sign of abnormal operation. The only issuers of it are the debug function lock_rec_block_validate() and the change buffer merge. While the error should ideally never occur in transactional locking, we happen to know that DISCARD TABLESPACE and TRUNCATE TABLE and possibly DROP TABLE are breaking InnoDB table locks. When it comes to the change buffer merge, the message simply is useless noise. We know perfectly well that a tablespace can be dropped while a change buffer merge is pending. And the code is prepared to handle that, which is demonstrated by the fact that whenever the message was issued, InnoDB did not crash. fil_inc_pending_ops(): Remove the parameter print_err.	2018-01-22 16:58:13 +02:00
Marko Mäkelä	ec062c6181	MDEV-12121 follow-up: Unbreak the WITH_INNODB_AHI=OFF build	2018-01-15 15:40:28 +02:00
Marko Mäkelä	843e4508c0	Merge 10.1 into 10.2	2017-11-07 23:02:39 +02:00
Marko Mäkelä	51b4366bfb	MDEV-13328 ALTER TABLE…DISCARD TABLESPACE takes a lot of time With a big buffer pool that contains many data pages, DISCARD TABLESPACE took a long time, because it would scan the entire buffer pool to remove any pages that belong to the tablespace. With a large buffer pool, this would take a lot of time, especially when the table-to-discard is empty. The minimum amount of work that DISCARD TABLESPACE must do is to remove the pages of the to-be-discarded table from the buf_pool->flush_list because any writes to the data file must be prevented before the file is deleted. If DISCARD TABLESPACE does not evict the pages from the buffer pool, then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD, not-yet-evicted pages from being mistaken for pages of the imported tablespace. It would not be a useful fix to simply move the buffer pool scan to the IMPORT TABLESPACE step. What we can do is to actively evict those pages that could be mistaken for imported pages. In this way, when importing a small table into a big buffer pool, the import should still run relatively fast. Import is bypassing the buffer pool when reading pages for the adjustment phase. In the adjustment phase, if a page exists in the buffer pool, we could replace it with the page from the imported file. Unfortunately I did not get this to work properly, so instead we will simply evict any matching page from the buffer pool. buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode where the requested page will be evicted if it is found. There must be no unwritten changes for the page. buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write to file is desired, and use a separate parameter bool drop_ahi. buf_LRU_flush_or_remove_pages(), fil_delete_tablespace(): Replace buf_remove_t. buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove. PageConverter::m_mtr: A dummy mini-transaction buffer PageConverter::PageConverter(): Complete the member initialization list. PageConverter::operator()(): Evict any 'shadow' pages from the buffer pool so that pre-existing (garbage) pages cannot be mistaken for pages that exist in the being-imported file. row_discard_tablespace(): Remove a bogus comment that seems to refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.	2017-11-06 18:08:33 +02:00
Marko Mäkelä	dd35fb35fa	Return uint16_t instead of ulint	2017-09-13 16:02:31 +03:00
Jan Lindström	aa22981dd2	MDEV-12741: innodb.ibuf_not_empty failed in buildbot with "InnoDB: Trying to do I/O to a tablespace which does not exist" Background thread is doing ibuf merge, in buf0rea.cc buf_read_ibuf_merge_pages(). It first tries to get page_size and if space is not found it deletes them, but as we do not hold any mutexes, space can be marked as stopped between that and buf_read_page_low() for same space. This naturally leads seen error message on log. buf_read_page_low(): Add parameter ignore_missing_space = false that is passed to fil_io() buf_read_ibuf_merge_pages(): call buf_read_page_low with ignore_missing_space = true, this function will handle missing space error code after buf_read_page_low returns. fil_io(): if ignore_missing_space = true do not print error message about trying to do I/0 for missing space, just return correct error code that is handled later.	2017-08-31 12:34:05 +03:00
Jan Lindström	61096ff214	MDEV-13591: InnoDB: Database page corruption on disk or a failed file read and assertion failure Problem is that page 0 and its possible enrryption information is not read for undo tablespaces. fil_crypt_get_latest_key_version(): Do not send event to encryption threads if event does not yet exists. Seen on regression testing. fil_read_first_page: Add new parameter does page belong to undo tablespace and if it does, we do not read FSP_HEADER. srv_undo_tablespace_open : Read first page of the tablespace to get crypt_data if it exists and pass it to fil_space_create. Tested using innodb_encryption with combinations with innodb-undo-tablespaces.	2017-08-28 09:49:30 +03:00
Thirunarayanan Balathandayuthapani	9d57468dde	Bug #25357789 INNODB: LATCH ORDER VIOLATION DURING TRUNCATE TABLE IF INNODB_SYNC_DEBUG ENABLED Analysis: ======== (1) During TRUNCATE of file_per_table tablespace, dict_operation_lock is released before eviction of dirty pages of a tablespace from the buffer pool. After eviction, we try to re-acquire dict_operation_lock (higher level latch) but we already hold lower level latch (index->lock). This causes latch order violation (2) Deadlock issue is present if child table is being truncated and it holds index lock. At the same time, cascade dml happens and it took dict_operation_lock and waiting for index lock. Fix: ==== 1) Release the indexes lock before releasing the dict operation lock. 2) Ignore the cascading dml operation on the parent table, for the cascading foreign key, if the child table is truncated or if it is in the process of being truncated. Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com> Reviewed-by: Kevin Lewis <kevin.lewis@oracle.com> RB: 16122	2017-08-09 22:28:30 +03:00
Jan Lindström	34eef269eb	MDEV-11939: innochecksum mistakes a file for an encrypted one (page 0 invalid) Always read full page 0 to determine does tablespace contain encryption metadata. Tablespaces that are page compressed or page compressed and encrypted do not compare checksum as it does not exists. For encrypted tables use checksum verification written for encrypted tables and normal tables use normal method. buf_page_is_checksum_valid_crc32 buf_page_is_checksum_valid_innodb buf_page_is_checksum_valid_none Modify Innochecksum logging to file to avoid compilation warnings. fil0crypt.cc fil0crypt.h Modify to be able to use in innochecksum compilation and move fil_space_verify_crypt_checksum to end of the file. Add innochecksum logging to file. univ.i Add innochecksum strict_verify, log_file and cur_page_num variables as extern. page_zip_verify_checksum Add innochecksum logging to file and remove unnecessary code. innochecksum.cc Lot of changes most notable able to read encryption metadata from page 0 of the tablespace. Added test case where we corrupt intentionally FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION (encryption key version) FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION+4 (post encryption checksum) FIL_DATA+10 (data)	2017-08-08 09:41:09 +03:00
Jan Lindström	8b019f87dd	MDEV-11939: innochecksum mistakes a file for an encrypted one (page 0 invalid) Always read full page 0 to determine does tablespace contain encryption metadata. Tablespaces that are page compressed or page compressed and encrypted do not compare checksum as it does not exists. For encrypted tables use checksum verification written for encrypted tables and normal tables use normal method. buf_page_is_checksum_valid_crc32 buf_page_is_checksum_valid_innodb buf_page_is_checksum_valid_none Add Innochecksum logging to file buf_page_is_corrupted Remove ib_logf and page_warn_strict_checksum calls in innochecksum compilation. Add innochecksum logging to file. fil0crypt.cc fil0crypt.h Modify to be able to use in innochecksum compilation and move fil_space_verify_crypt_checksum to end of the file. Add innochecksum logging to file. univ.i Add innochecksum strict_verify, log_file and cur_page_num variables as extern. page_zip_verify_checksum Add innochecksum logging to file. innochecksum.cc Lot of changes most notable able to read encryption metadata from page 0 of the tablespace. Added test case where we corrupt intentionally FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION (encryption key version) FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION+4 (post encryption checksum) FIL_DATA+10 (data)	2017-08-03 08:29:36 +03:00
Marko Mäkelä	8c71c6aa8b	MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2 InnoDB I/O and buffer pool interfaces and the redo log format have been changed between MariaDB 10.1 and 10.2, and the backup code has to be adjusted accordingly. The code has been simplified, and many memory leaks have been fixed. Instead of the file name xtrabackup_logfile, the file name ib_logfile0 is being used for the copy of the redo log. Unnecessary InnoDB startup and shutdown and some unnecessary threads have been removed. Some help was provided by Vladislav Vaintroub. Parameters have been cleaned up and aligned with those of MariaDB 10.2. The --dbug option has been added, so that in debug builds, --dbug=d,ib_log can be specified to enable diagnostic messages for processing redo log entries. By default, innodb_doublewrite=OFF, so that --prepare works faster. If more crash-safety for --prepare is needed, double buffering can be enabled. The parameter innodb_log_checksums=OFF can be used to ignore redo log checksums in --backup. Some messages have been cleaned up. Unless --export is specified, Mariabackup will not deal with undo log. The InnoDB mini-transaction redo log is not only about user-level transactions; it is actually about mini-transactions. To avoid confusion, call it the redo log, not transaction log. We disable any undo log processing in --prepare. Because MariaDB 10.2 supports indexed virtual columns, the undo log processing would need to be able to evaluate virtual column expressions. To reduce the amount of code dependencies, we will not process any undo log in prepare. This means that the --export option must be disabled for now. This also means that the following options are redundant and have been removed: xtrabackup --apply-log-only innobackupex --redo-only In addition to disabling any undo log processing, we will disable any further changes to data pages during --prepare, including the change buffer merge. This means that restoring incremental backups should reliably work even when change buffering is being used on the server. Because of this, preparing a backup will not generate any further redo log, and the redo log file can be safely deleted. (If the --export option is enabled in the future, it must generate redo log when processing undo logs and buffered changes.) In --prepare, we cannot easily know if a partial backup was used, especially when restoring a series of incremental backups. So, we simply warn about any missing files, and ignore the redo log for them. FIXME: Enable the --export option. FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write a test that initiates a backup while an ALGORITHM=INPLACE operation is creating indexes or rebuilding a table. An error should be detected when preparing the backup. FIXME: In --incremental --prepare, xtrabackup_apply_delta() should ensure that if FSP_SIZE is modified, the file size will be adjusted accordingly.	2017-07-05 11:43:28 +03:00
Marko Mäkelä	615b1f4189	Merge 10.1 into 10.2 innodb.table_flags: Adjust the test case. Due to the MDEV-12873 fix in 10.2, the corrupted flags for table test.td would be converted, and a tablespace flag mismatch will occur when trying to open the file.	2017-06-15 14:35:51 +03:00
Marko Mäkelä	58f87a41bd	Remove some fields from dict_table_t dict_table_t::thd: Remove. This was only used by btr_root_block_get() for reporting decryption failures, and it was only assigned by ha_innobase::open(), and never cleared. This could mean that if a connection is closed, the pointer would become stale, and the server could crash while trying to report the error. It could also mean that an error is being reported to the wrong client. It is better to use current_thd in this case, even though it could mean that if the code is invoked from an InnoDB background operation, there would be no connection to which to send the error message. Remove dict_table_t::crypt_data and dict_table_t::page_0_read. These fields were never read. fil_open_single_table_tablespace(): Remove the parameter "table".	2017-06-15 12:41:02 +03:00
Marko Mäkelä	a78476d342	Merge 10.1 into 10.2	2017-06-12 17:43:07 +03:00
Jan Lindström	58c56dd7f8	MDEV-12610: MariaDB start is slow Problem appears to be that the function fsp_flags_try_adjust() is being unconditionally invoked on every .ibd file on startup. Based on performance investigation also the top function fsp_header_get_crypt_offset() needs to addressed. Ported implementation of fsp_header_get_encryption_offset() function from 10.2 to fsp_header_get_crypt_offset(). Introduced a new function fil_crypt_read_crypt_data() to read page 0 if it is not yet read. fil_crypt_find_space_to_rotate(): Now that page 0 for every .ibd file is not read on startup we need to check has page 0 read from space that we investigate for key rotation, if it is not read we read it. fil_space_crypt_get_status(): Now that page 0 for every .ibd file is not read on startup here also we need to read page 0 if it is not yet read it. This is needed as tests use IS query to wait until background encryption or decryption has finished and this function is used to produce results. fil_crypt_thread(): Add is_stopping condition for tablespace so that we do not rotate pages if usage of tablespace should be stopped. This was needed for failure seen on regression testing. fil_space_create: Remove page_0_crypt_read and extra unnecessary info output. fil_open_single_table_tablespace(): We call fsp_flags_try_adjust only when when no errors has happened and server was not started on read only mode and tablespace validation was requested or flags contain other table options except low order bits to FSP_FLAGS_POS_PAGE_SSIZE position. fil_space_t::page_0_crypt_read removed. Added test case innodb-first-page-read to test startup when encryption is on and when encryption is off to check that not for all tables page 0 is read on startup.	2017-06-09 13:15:39 +03:00
Jan Lindström	1af8bf39ca	MDEV-12113: install_db shows corruption for rest encryption with innodb_data_file_path=ibdata1:3M; Problem was that FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION field that for encrypted pages even in system datafiles should contain key_version except very first page (0:0) is after encryption overwritten with flush lsn. Ported WL#7990 Repurpose FIL_PAGE_FLUSH_LSN to 10.1 The field FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION is consulted during InnoDB startup. At startup, InnoDB reads the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION from the first page of each file in the InnoDB system tablespace. If there are multiple files, the minimum and maximum LSN can differ. These numbers are passed to InnoDB startup. Having the number in other files than the first file of the InnoDB system tablespace is not providing much additional value. It is conflicting with other use of the field, such as on InnoDB R-tree index pages and encryption key_version. This worklog will stop writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to other files than the first file of the InnoDB system tablespace (page number 0:0) when system tablespace is encrypted. If tablespace is not encrypted we continue writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to all first pages of system tablespace to avoid unnecessary warnings on downgrade. open_or_create_data_files(): pass only one flushed_lsn parameter xb_load_tablespaces(): pass only one flushed_lsn parameter. buf_page_create(): Improve comment about where FIL_PAGE_FIL_FLUSH_LSN_OR_KEY_VERSION is set. fil_write_flushed_lsn(): A new function, merged from fil_write_lsn_and_arch_no_to_file() and fil_write_flushed_lsn_to_data_files(). Only write to the first page of the system tablespace (page 0:0) if tablespace is encrypted, or write all first pages of system tablespace and invoke fil_flush_file_spaces(FIL_TYPE_TABLESPACE) afterwards. fil_read_first_page(): read flush_lsn and crypt_data only from first datafile. fil_open_single_table_tablespace(): Remove output of LSN, because it was only valid for the system tablespace and the undo tablespaces, not user tablespaces. fil_validate_single_table_tablespace(): Remove output of LSN. checkpoint_now_set(): Use fil_write_flushed_lsn and output a error if operation fails. Remove lsn variable from fsp_open_info. recv_recovery_from_checkpoint_start(): Remove unnecessary second flush_lsn parameter. log_empty_and_mark_files_at_shutdown(): Use fil_writte_flushed_lsn and output error if it fails. open_or_create_data_files(): Pass only one flushed_lsn variable.	2017-06-01 14:07:48 +03:00
Marko Mäkelä	8f643e2063	Merge 10.1 into 10.2	2017-05-23 11:09:47 +03:00
Marko Mäkelä	70505dd45b	Merge 10.1 into 10.2	2017-05-22 09:46:51 +03:00
Marko Mäkelä	65e1399e64	Merge 10.0 into 10.1 Significantly reduce the amount of InnoDB, XtraDB and Mariabackup code changes by defining pfs_os_file_t as something that is transparently compatible with os_file_t.	2017-05-20 08:41:20 +03:00
Marko Mäkelä	13a350ac29	Merge 10.0 into 10.1	2017-05-19 12:29:37 +03:00
Vicențiu Ciorbaru	45898c2092	Merge remote-tracking branch 'origin/10.0' into 10.0	2017-05-18 15:45:55 +03:00
Vicențiu Ciorbaru	b87873b221	Merge branch 'merge-innodb-5.6' into bb-10.0-vicentiu This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293 from current 5.6.36 innodb. Bug #23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE INDEX APPLIED BY UNCOMMITTED ROW Problem: ======== row_search_for_mysql() does whole table traversal for range query even though the end range is passed. Whole table traversal happens when the record is not with in transaction read view. Solution: ========= Convert the innodb last record of page to mysql format and compare with end range if the traversal of row_search_mvcc() exceeds 100, no ICP involved. If it is out of range then InnoDB can avoid the whole table traversal. Need to refactor the code little bit to make it compile. Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com> Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com> Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com> RB: 14660	2017-05-17 14:53:28 +03:00
Marko Mäkelä	e22d86a3eb	fil_create_new_single_table_tablespace(): Correct a bogus nonnull attribute The parameter path can be passed as NULL. This error was reported by GCC 7.1.0 when compiling CMAKE_BUILD_TYPE=Debug with -O3.	2017-05-17 13:49:51 +03:00
Vicențiu Ciorbaru	0af9818240	5.6.36	2017-05-15 17:17:16 +03:00
Marko Mäkelä	021d636551	Fix some integer type mismatch. Use uint32_t for the encryption key_id. When filling unsigned integer values into INFORMATION_SCHEMA tables, use the method Field::store(longlong, bool unsigned) instead of using Field::store(double). Fix also some miscellanous type mismatch related to ulint (size_t).	2017-05-10 12:45:46 +03:00
Marko Mäkelä	f9cc391863	Merge 10.1 into 10.2 This only merges MDEV-12253, adapting it to MDEV-12602 which is already present in 10.2 but not yet in the 10.1 revision that is being merged. TODO: Error handling in crash recovery needs to be improved. If a page cannot be decrypted (or read), we should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply redo log to other pages. Currently, the test encryption.innodb-redo-badkey randomly fails like this (the last messages are from cmake -DWITH_ASAN): 2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994 2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1 2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption 2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown... i================================================================= ==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0 #0 0x736750 in operator delete(void) (/mariadb/server/build/sql/mysqld+0x736750) #1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4 #2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17 #3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3 #4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2 #5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2 #6 0x197e5e6 in innobase_init(void) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3	2017-05-05 10:38:53 +03:00
Marko Mäkelä	b82c602db5	MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().	2017-04-28 14:12:52 +03:00

1 2 3 4

155 commits