mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	680463a8d9	Merge 10.2 into 10.3	2020-06-05 16:51:26 +03:00
Marko Mäkelä	efc70da5fd	MDEV-22769 Shutdown hang or crash due to XA breaking locks The background drop table queue in InnoDB is a work-around for cases where the SQL layer is requesting DDL on tables on which transactional locks exist. One such case are XA transactions. Our test case exploits the fact that the recovery of XA PREPARE transactions will only resurrect InnoDB table locks, but not MDL that should block any concurrent DDL. srv_shutdown_t: Introduce the srv_shutdown_state=SRV_SHUTDOWN_INITIATED for the initial part of shutdown, to wait for the background drop table queue to be emptied. srv_shutdown_bg_undo_sources(): Assign srv_shutdown_state=SRV_SHUTDOWN_INITIATED before waiting for the background drop table queue to be emptied. row_drop_tables_for_mysql_in_background(): On slow shutdown, if no active transactions exist (excluding ones that are in XA PREPARE state), skip any tables on which locks exist. row_drop_table_for_mysql(): Do not unnecessarily attempt to drop InnoDB persistent statistics for tables that have already been added to the background drop table queue. row_mysql_close(): Relax an assertion, and free all memory even if innodb_force_recovery=2 would prevent the background drop table queue from being emptied.	2020-06-05 15:22:46 +03:00
Marko Mäkelä	3d0bb2b7f1	Merge 10.2 into 10.3	2020-05-15 19:11:57 +03:00
Marko Mäkelä	ad6171b91c	MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time. The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked. It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object. Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool. rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB. fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped. btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index. buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code. dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called. dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition. dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count(). dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table(). buf_page_mtr_lock(): Drop a stale adaptive hash index if needed. We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.	2020-05-15 17:23:08 +03:00
Marko Mäkelä	15fa70b840	Merge 10.2 into 10.3	2020-05-13 11:45:05 +03:00
Marko Mäkelä	6bc4444d7c	Merge 10.1 into 10.2	2020-05-13 11:12:31 +03:00
Marko Mäkelä	26aab96ecf	MDEV-22497 [ERROR] InnoDB: Unable to purge a record The InnoDB insert buffer was upgraded in MySQL 5.5 into a change buffer that also covers delete-mark and delete (purge) operations. There is an important constraint for delete operations: a B-tree leaf page must not become empty unless the entire tree becomes empty, consisting of an empty root page. Because change buffer merges only occur on a single leaf page at a time, delete operations must not be buffered if it is possible that the last record of the page could be deleted. (In that case, we would refuse to use the change buffer, and if we really delete the last record, we would shrink the index tree.) The function ibuf_get_volume_buffered_hash() is part of our insurance that the page would not become empty. It is supposed to map each buffered INSERT or DELETE_MARK record payload into a hash value. We will only count each such record as a distinct key if there is no hash collision. DELETE operations will always decrement the predicted number fo records in the page. Due to a bug in the function, we would actually compute the hash value not only on the record payload, but also on some following bytes, in case the record contains NULL values. In MySQL Bug #61104, we had some examples of this dating back to 2012. But back then, we failed to reproduce the bug, and in commit `d84c95579b` we simply demoted the hard assertion to a message printout and a debug assertion failure. ibuf_get_volume_buffered_hash(): Correctly compute the hash value of the payload bytes only. Note: we will consider ('foo','bar'),(NULL,'foobar'),('foob','ar') to be equal, but this is not a problem, because in case of a hash collision, we could also consider ('boo','far') to be equal, and underestimate the number of records in the page, leading to refusing to buffer a DELETE.	2020-05-07 20:44:33 +03:00
Oleksandr Byelkin	7fb73ed143	Merge branch '10.2' into 10.3	2020-05-04 16:47:11 +02:00
Daniel Black	ba2061da52	MDEV-21595: innodb offset_t rename to rec_offs thanks to: perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)	2020-04-29 12:02:47 +03:00
Marko Mäkelä	1a9b6c4c7f	Merge 10.2 into 10.3	2020-03-30 11:12:56 +03:00
Eugene Kosov	884d22f288	remove fishy reinterpret_cast from buf_page_is_zeroes() In my micro-benchmarks memcmp(4196) 3 times faster than old implementation. Also, it's generally better to use as less reinterpret_casts<> as possible. buf_is_zeroes(): renamed from buf_page_is_zeroes() and argument changed to span<> for convenience. st_::span<T>::const_iterator: fixed page_zip-verify_checksum(): make argument byte* instead of void*	2020-03-20 21:35:42 +03:00
Marko Mäkelä	3466b47b0d	Merge 10.2 into 10.3	2019-12-13 10:08:57 +02:00
Eugene Kosov	f0aa073f2b	MDEV-20950 Reduce size of record offsets offset_t: this is a type which represents one record offset. It's unsigned short int. a lot of functions: replace ulint with offset_t btr_pcur_restore_position_func(), page_validate(), row_ins_scan_sec_index_for_duplicate(), row_upd_clust_rec_by_insert_inherit_func(), row_vers_impl_x_locked_low(), trx_undo_prev_version_build(): allocate record offsets on the stack instead of waiting for rec_get_offsets() to allocate it from mem_heap_t. So, reducing memory allocations. RECORD_OFFSET, INDEX_OFFSET: now it's less convenient to store pointers in offset_t* array. One pointer occupies now several offset_t. And those constant are start indexes into array to places where to store pointer values REC_OFFS_HEADER_SIZE: adjusted for the new reality REC_OFFS_NORMAL_SIZE: increase size from 100 to 300 which means less heap allocations. And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which is smaller than previous 800 bytes. REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality rem0rec.h, rem0rec.ic, rem0rec.cc: various arguments, return values and local variables types were changed to fix numerous integer conversions issues. enum field_type_t: offset types concept was introduces which replaces old offset flags stuff. Like in earlier version, 2 upper bits are used to store offset type. And this enum represents those types. REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed get_type(), set_type(), get_value(), combine(): these are convenience functions to work with offsets and it's types rec_offs_base()[0]: still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL rec_offs_base()[i]: these have type offset_t now. Two upper bits contains type.	2019-12-13 00:26:50 +07:00
Jan Lindström	9d9a2253c6	Merge remote-tracking branch 10.2 into 10.3 Conflicts: mysql-test/suite/galera/t/galera_binlog_event_max_size_max-master.opt mysql-test/suite/innodb/r/innodb-mdev-7513.result mysql-test/suite/innodb/t/innodb-mdev-7513.test mysql-test/suite/wsrep/disabled.def storage/innobase/ibuf/ibuf0ibuf.cc	2019-12-02 14:35:10 +02:00
Marko Mäkelä	a8395853cc	MDEV-21152 Bogus debug assertion btr_pcur_is_after_last_in_tree() in ibuf code As noted in commit `abd45cdc38` a search with PAGE_CUR_GE may land on the supremum record on a leaf page that is not the rightmost leaf page. This could occur when all keys on the current page are smaller than the search key, and the smallest key on the successor page is larger than the search key. Hence, after a failed PAGE_CUR_GE search, assertions btr_pcur_is_after_last_in_tree() are bogus and should be replaced with btr_pcur_is_after_last_on_page().	2019-11-26 16:32:51 +02:00
Marko Mäkelä	613e13072c	Merge 10.2 into 10.3	2019-11-19 00:38:33 +02:00
Marko Mäkelä	b80df9eba2	MDEV-21069 Crash on DROP TABLE if the data file is corrupted buf_read_ibuf_merge_pages(): Discard any page numbers that are outside the current bounds of the tablespace, by invoking the function ibuf_delete_recs() that was introduced in MDEV-20934. This could avoid an infinite change buffer merge loop on innodb_fast_shutdown=0, because normally the change buffer merge would only be attempted if a page was successfully loaded into the buffer pool. dict_drop_index_tree(): Add the parameter trx_t*. To prevent the DROP TABLE crash, do not invoke btr_free_if_exists() if the entire .ibd file will be dropped. Thus, we will avoid a crash if the BTR_SEG_LEAF or BTR_SEG_TOP of the index is corrupted, and we will also avoid unnecessarily accessing the to-be-dropped tablespace via the buffer pool. In MariaDB 10.2, we disable the DROP TABLE fix if innodb_safe_truncate=0, because the backup-unsafe MySQL 5.7 WL#6501 form of TRUNCATE TABLE requires that the individual pages be freed inside the tablespace.	2019-11-19 00:07:06 +02:00
Marko Mäkelä	3d4a801533	MDEV-12353 preparation: Replace mtr_x_lock() and friends Apart from page latches (buf_block_t::lock), mini-transactions are keeping track of at most one dict_index_t::lock and fil_space_t::latch at a time, and in a rare case, purge_sys.latch. Let us introduce interfaces for acquiring an index latch or a tablespace latch. In a later version, we may want to introduce mtr_t members for holding a latched dict_index_t* and fil_space_t, and replace the remaining use of mtr_t::m_memo with std::set<buf_block_t> or with a map<buf_block_t,byte> pointing to log records.	2019-11-14 11:40:33 +02:00
Marko Mäkelä	abd45cdc38	MDEV-20934: Correct a debug assertion A search with PAGE_CUR_GE may land on the supremum record on a leaf page that is not the rightmost leaf page. This could occur when all keys on the current page are smaller than the search key, and the smallest key on the successor page is larger than the search key. ibuf_delete_recs(): Correct the debug assertion accordingly.	2019-11-13 09:26:10 +02:00
Marko Mäkelä	4fcfdb60e7	Merge 10.2 into 10.3	2019-11-11 14:56:51 +02:00
Marko Mäkelä	29d67d051a	Cleanup btr_page_get_prev(), btr_page_get_next() Remove the redundant parameter mtr_t*. Make use of page_has_prev(), page_has_next() whenever possible.	2019-11-11 13:36:21 +02:00
Marko Mäkelä	908ca4668d	Merge 10.2 into 10.3	2019-11-06 13:14:31 +02:00
Marko Mäkelä	d7a2401750	MDEV-20934 Infinite loop on innodb_fast_shutdown=0 with inconsistent change buffer Due to a data corruption bug that may have occurred a long time earlier (possibly involving physical backup and MySQL Bug #69122, which was addressed in commit `f166ec71b7`) it seems possible that the InnoDB change buffer might end up containing entries, while no buffered changes exist according to the change buffer bitmap pages in the .ibd files. ibuf_delete_recs(): New function, to be invoked on slow shutdown only. Remove all buffered changes for a specific page. ibuf_merge_or_delete_for_page(): If the change buffer bitmap is clean and a slow shutdown is in progress, invoke ibuf_delete_recs(). We do not want to do that during normal operation, due to the additional overhead that is involved. The bitmap page should be consistent with the change buffer in the first place.	2019-11-06 08:48:48 +02:00
Oleksandr Byelkin	55b2281a5d	Merge branch '10.2' into 10.3	2019-10-31 10:58:06 +01:00
Marko Mäkelä	2809842031	MDEV-20864 Introduce debug option innodb_change_buffer_dump To diagnose a hang in slow shutdown (innodb_fast_shutdown=0), let us introduce a Boolean startup option in debug builds that will cause the contents of the InnoDB change buffer to be dumped to the server error log at startup.	2019-10-19 15:16:47 +03:00
Marko Mäkelä	8e3d85e112	Merge 10.2 into 10.3	2019-10-12 06:34:09 +03:00
Marko Mäkelä	966d97b5f9	Merge 10.1 into 10.2	2019-10-11 18:38:18 +03:00
Marko Mäkelä	cbfd6882f4	Merge 5.5 into 10.1	2019-10-11 15:19:55 +03:00
Marko Mäkelä	ea61b79694	MDEV-20805 ibuf_add_free_page() is not initializing FIL_PAGE_TYPE first In the function recv_parse_or_apply_log_rec_body() there are debug checks for validating the state of the page when redo log records are being applied. Most notably, FIL_PAGE_TYPE should be set before anything else is being written to the page. ibuf_add_free_page(): Set FIL_PAGE_TYPE before performing any other changes.	2019-10-11 14:12:36 +03:00
Marko Mäkelä	892378fb9d	Merge 10.2 into 10.3	2019-10-09 13:25:11 +03:00
Thirunarayanan Balathandayuthapani	c65cb244b3	MDEV-19335 Remove buf_page_t::encrypted The field buf_page_t::encrypted was added in MDEV-8588. It was made mostly redundant in MDEV-12699. Remove the field.	2019-10-09 13:13:12 +03:00
Marko Mäkelä	d480d28f4f	Add page_has_prev(), page_has_next(), page_has_siblings() Until now, InnoDB inefficiently compared the aligned fields FIL_PAGE_PREV, FIL_PAGE_NEXT to the byte-order-agnostic value FIL_NULL. This is a backport of `32170f8c6d` from MariaDB Server 10.3.	2019-10-09 08:29:26 +03:00
Marko Mäkelä	da9201dd5b	Merge 10.2 into 10.3	2019-09-10 09:25:20 +03:00
Marko Mäkelä	43a6e81ccb	MDEV-19514 preparation: Remove innodb_change_buffering_debug=2 The setting innodb_change_buffering_debug=2 was supposed to inject a crash during change buffer merge. There is no public test for that functionality, and even if there were, it would be better to use DEBUG_SYNC to halt the thread that does change buffer merge, force a redo log flush from another thread, and finally kill the server externally.	2019-09-09 18:18:52 +03:00
Marko Mäkelä	e82fe21e3a	Merge 10.2 into 10.3	2019-07-02 17:46:22 +03:00
Thirunarayanan Balathandayuthapani	723a4b1d78	MDEV-17228 Encrypted temporary tables are not encrypted - Introduce a new variable called innodb_encrypt_temporary_tables which is a boolean variable. It decides whether to encrypt the temporary tablespace. - Encrypts the temporary tablespace based on full checksum format. - Introduced a new counter to track encrypted and decrypted temporary tablespace pages. - Warnings issued if temporary table creation has conflict value with innodb_encrypt_temporary_tables - Added a new test case which reads and writes the pages from/to temporary tablespace.	2019-06-28 19:07:59 +05:30
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	f177f125d4	Merge branch '5.5' into 10.1	2019-05-11 19:15:57 +03:00
Vicențiu Ciorbaru	15f1e03d46	Follow-up to changing FSF address Some places didn't match the previous rules, making the Floor address wrong. Additional sed rules: sed -i -e 's/Place.Suite ., Boston/Street, Fifth Floor, Boston/g' sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'	2019-05-11 18:30:45 +03:00
Marko Mäkelä	acf6f92aa9	Merge 10.2 into 10.3	2019-04-25 09:05:52 +03:00
Marko Mäkelä	d315b4ff39	Remove IBUF_COUNT_DEBUG The compile-time option IBUF_COUNT_DEBUG has not been used for years. It would only work with up to 3 created .ibd files, with no buffered changes existing while InnoDB is started up.	2019-04-19 12:44:46 +03:00
Marko Mäkelä	250799f961	Merge 10.2 into 10.3	2019-04-17 15:26:17 +03:00
Marko Mäkelä	169c00994b	MDEV-12699 Improve crash recovery of corrupted data pages InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records. To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will fix also that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1. Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes). The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record. For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit `aa3f7a107c`. Hence, we will skip these optimizations for tables whose name starts with FTS_. This is joint work with Thirunarayanan Balathandayuthapani. fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace. mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176. recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents. recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables. recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the fil_name_t::enable_lsn. recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn to fil_space_t::enable_lsn. recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations. buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery. recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch. ibuf_merge_or_delete_for_page(): Relax a debug assertion. innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).	2019-04-17 13:58:41 +03:00
Marko Mäkelä	ee7a4f4462	MDEV-12266: Pass fil_space_t* to fseg_free_page() fseg_free_page_func(): Avoid an unnecessary tablespace ID lookup. The callers should pass the tablespace that they already know.	2019-04-08 21:38:43 +03:00
Marko Mäkelä	45531949ae	Merge 10.2 into 10.3	2018-12-18 09:15:41 +02:00
Marko Mäkelä	7d245083a4	Merge 10.1 into 10.2	2018-12-17 20:15:38 +02:00
Marko Mäkelä	06e5f28f9f	MDEV-12266: Remove a level of pointer indirection Replace table->space->id with table->space_id.	2018-11-22 17:10:26 +02:00
Marko Mäkelä	fd58bb71e2	Merge 10.2 into 10.3	2018-11-19 18:45:53 +02:00
Marko Mäkelä	ff88e4bb8a	Remove many redundant #include from InnoDB	2018-11-19 11:42:14 +02:00

1 2 3 4 5

205 commits