mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-16 20:07:13 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	745fd4b39f	MDEV-21174: Remove some mlog_write_initial_log_record_fast() Pass buf_block_t* to more functions that write redo log. page_zip_write_node_ptr(), page_zip_write_blob_ptr(), page_zip_compress_write_log_no_data(): Take buf_block_t* as a parameter, and do not tolerate mtr=NULL. page_zip_compress(): Do not tolerate mtr=NULL. page_zip_dir_insert(): Take page_cur_t* as a parameter. mlog_write_initial_log_record(): Remove. This function was unused. RecIterator::remove(): Remove the redundant page_zip parameter. PageConverter::m_page_zip_ptr: Remove.	2019-12-13 18:15:51 +02:00
Marko Mäkelä	8fa759a576	Merge 10.3 into 10.4 We disable the MDEV-21189 test galera.galera_partition because it times out.	2019-12-13 17:30:37 +02:00
Marko Mäkelä	3466b47b0d	Merge 10.2 into 10.3	2019-12-13 10:08:57 +02:00
Eugene Kosov	f0aa073f2b	MDEV-20950 Reduce size of record offsets offset_t: a type that represents one record offset; it is unsigned short int. Many functions: replace ulint with offset_t. btr_pcur_restore_position_func(), page_validate(), row_ins_scan_sec_index_for_duplicate(), row_upd_clust_rec_by_insert_inherit_func(), row_vers_impl_x_locked_low(), trx_undo_prev_version_build(): allocate record offsets on the stack instead of waiting for rec_get_offsets() to allocate them from mem_heap_t, reducing memory allocations. RECORD_OFFSET, INDEX_OFFSET: it is now less convenient to store pointers in an offset_t* array, because one pointer occupies several offset_t elements. These constants are the start indexes of the array positions where the pointer values are stored. REC_OFFS_HEADER_SIZE: adjusted accordingly. REC_OFFS_NORMAL_SIZE: increase from 100 to 300, which means fewer heap allocations; sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) is now 600 bytes, smaller than the previous 800 bytes. REC_OFFS_SEC_INDEX_SIZE: adjusted accordingly. rem0rec.h, rem0rec.ic, rem0rec.cc: the types of various arguments, return values and local variables were changed to fix numerous integer conversion issues. enum field_type_t: introduces the concept of offset types, which replaces the old offset flags. As in earlier versions, the 2 upper bits store the offset type, and this enum represents those types. REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed. get_type(), set_type(), get_value(), combine(): convenience functions for working with offsets and their types. rec_offs_base()[0]: still uses the old scheme with the flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL. rec_offs_base()[i]: these now have type offset_t; the two upper bits contain the type.	2019-12-13 00:26:50 +07:00
Marko Mäkelä	d3b2625ba0	MDEV-21259 Assertion failed in mtr_t::write() btr_free_externally_stored_field(): Pass w=mtr_t::OPT to note that the BTR_EXTERN_LEN is not necessarily changing when a multi-page ROW_FORMAT=COMPRESSED off-page column is being freed, and to allow redundant writes to the redo log to be optimized away. Ever since commit `56f6dab1d0`, the refactored function mtr_t::write() asserts by default that the page contents are being changed.	2019-12-09 21:11:08 +02:00
Marko Mäkelä	af5947f433	MDEV-21174: Replace mlog_write_string() with mtr_t::memcpy() mtr_t::memcpy(): Replaces mlog_write_string(), mlog_log_string(). The buf_block_t is passed as a parameter, so that mlog_write_initial_log_record_low() can be used instead of mlog_write_initial_log_record_fast(). fil_space_crypt_t::write_page0(): Remove the fil_space_t* parameter.	2019-12-03 11:05:19 +02:00
Marko Mäkelä	87839258f8	MDEV-21174: Replace mlog_memset() with mtr_t::memset() Passing buf_block_t helps us avoid calling mlog_write_initial_log_record_fast() and page_get_page_no(), and allows us to implement more debug checks, such as that on ROW_FORMAT=COMPRESSED index pages, only the page header may be modified by MLOG_MEMSET records. fseg_n_reserved_pages(): Add a buf_block_t parameter.	2019-12-03 11:05:19 +02:00
Marko Mäkelä	caea64df18	Cleanup: Remove some page_get_page_no() calls Refer to buf_page_t::id instead of parsing the tablespace identifier or page number from the buffer pool page.	2019-12-03 11:05:19 +02:00
Marko Mäkelä	56f6dab1d0	MDEV-21174: Replace mlog_write_ulint() with mtr_t::write() mtr_t::write(): Replaces mlog_write_ulint(), mlog_write_ull(). Optimize away writes if the page contents do not change, except when a dummy write has been explicitly requested. Because the member function template takes a block descriptor as a parameter, it is possible to introduce better consistency checks. Due to this, the code for handling file-based lists, undo logs and user transactions was refactored to pass around buf_block_t.	2019-12-03 11:05:18 +02:00
Marko Mäkelä	cd92c6c83d	MDEV-12353 preparation: Do not write MLOG_REC_MIN_MARK btr_set_min_rec_mark(): Write MLOG_1BYTE instead of MLOG_REC_MIN_MARK or MLOG_COMP_REC_MIN_MARK. On ROW_FORMAT=COMPRESSED pages, the minimum record flag is not stored at all. The flag is computed for the uncompressed page by page_zip_decompress(). Hence, nothing needs to be logged for ROW_FORMAT=COMPRESSED tables for this operation. To facilitate crash-upgrade and hot backup from older versions, we will retain the code to parse and apply the old log record types MLOG_REC_MIN_MARK and MLOG_COMP_REC_MIN_MARK.	2019-12-03 11:05:18 +02:00
Marko Mäkelä	ddbbf97670	Merge 10.4 into 10.5	2019-11-27 06:29:14 +02:00
Marko Mäkelä	3eda03d0fe	MDEV-21148: Assertion index->n_core_fields + n_add >= index->n_fields Revert part of commit `6cedb671e9` because it turns out to be theoretically impossible to parse a ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC metadata record where the variable-length fields in the PRIMARY KEY have been written as nonempty strings.	2019-11-26 20:46:25 +02:00
Marko Mäkelä	5b686af2ec	Merge 10.4 into 10.5	2019-11-20 15:47:16 +02:00
Marko Mäkelä	6cedb671e9	MDEV-21088 Table cannot be loaded after instant ADD/DROP COLUMN btr_cur_instant_init_low(): Accurately parse the metadata record header for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT. CHAR columns used to be unnecessarily written as nonempty strings of bytes.	2019-11-20 14:12:53 +08:00
Vladislav Vaintroub	5e62b6a5e0	MDEV-16264 Use threadpool for Innodb background work. Almost all threads are gone: - The "ticking" threads that sleep a while and then do some work (srv_monitor_thread, srv_error_monitor_thread, srv_master_thread) were replaced with timers. Some timers are periodic, e.g. the "master" timer. - The btr_defragment_thread is also replaced by a timer, which reschedules itself when the current defragment "item" needs throttling. - The buf_resize_thread and buf_dump_threads are replaced with tasks; ditto for the page cleaner workers. - Purge worker threads are now tasks as well, and the purge coordinator is a combination of a task and a timer. - All AIO is outsourced to tpool; InnoDB just calls thread_pool::submit_io() and provides the callback. - The srv_slot_t was removed, and innodb_debug_sync used in purge is currently not working and needs reimplementation.	2019-11-15 18:09:30 +01:00
Marko Mäkelä	786b004972	Cleanup: More use of mtr_memo_type_t	2019-11-15 14:55:38 +02:00
Marko Mäkelä	ae90f8431b	Merge 10.4 into 10.5	2019-11-14 14:49:20 +02:00
Marko Mäkelä	89ae01fd00	Merge 10.3 into 10.4	2019-11-14 13:23:36 +02:00
Marko Mäkelä	3d4a801533	MDEV-12353 preparation: Replace mtr_x_lock() and friends Apart from page latches (buf_block_t::lock), mini-transactions are keeping track of at most one dict_index_t::lock and fil_space_t::latch at a time, and in a rare case, purge_sys.latch. Let us introduce interfaces for acquiring an index latch or a tablespace latch. In a later version, we may want to introduce mtr_t members for holding a latched dict_index_t* and fil_space_t, and replace the remaining use of mtr_t::m_memo with std::set<buf_block_t> or with a map<buf_block_t,byte> pointing to log records.	2019-11-14 11:40:33 +02:00
Marko Mäkelä	0117d0e65a	Merge 10.4 into 10.5	2019-11-11 15:21:58 +02:00
Marko Mäkelä	3da895a736	Merge 10.3 into 10.4	2019-11-11 15:03:46 +02:00
Marko Mäkelä	4fcfdb60e7	Merge 10.2 into 10.3	2019-11-11 14:56:51 +02:00
Marko Mäkelä	98e1d603bf	MDEV-21024: Optimize writing BTR_EXTERN_LEN btr_store_big_rec_extern_fields(): Remove the redundant initialization of the most significant 32 bits of BTR_EXTERN_LEN. InnoDB never supported BLOBs that are longer than 4GiB. In fact, dtuple_convert_big_rec() would emit an error message if a clustered index record tuple would exceed 1,000,000,000 bytes in length. The BTR_EXTERN_LEN in the BLOB pointers in clustered index leaf page records is zero-initialized at least since commit `41bb3537ba`.	2019-11-11 14:14:26 +02:00
Marko Mäkelä	29d67d051a	Cleanup btr_page_get_prev(), btr_page_get_next() Remove the redundant parameter mtr_t*. Make use of page_has_prev(), page_has_next() whenever possible.	2019-11-11 13:36:21 +02:00
Marko Mäkelä	a6d614fb4a	MDEV-12353 preparation: Remove redundant writes fsp_alloc_seg_inode_page(): Ever since commit `3926673ce7`, all newly allocated pages are zero-initialized. Assert that this is the case for the FSEG_ID fields. (Side note: before that fix, other parts of the pages could contain nonzero garbage.) btr_store_big_rec_extern_fields(): Remove the redundant initialization of the most significant 32 bits of BTR_EXTERN_LEN. InnoDB never supported BLOBs that are longer than 4GiB. In fact, dtuple_convert_big_rec() would emit an error message if a clustered index record tuple would exceed 1,000,000,000 bytes in length.	2019-11-08 11:04:26 +02:00
Marko Mäkelä	52246dff2c	Merge 10.4 into 10.5	2019-11-08 09:43:41 +02:00
Marko Mäkelä	8a5eb4141b	MDEV-17138 follow-up: Use MLOG_MEMSET for writing FIL_NULL Always use the MLOG_MEMSET record for writing FIL_NULL, because it is more compact.	2019-11-08 09:00:10 +02:00
Oleksandr Byelkin	3ad37ed0eb	Merge 10.4 into 10.5	2019-11-07 08:52:30 +01:00
Marko Mäkelä	64a02e4fa2	MDEV-19586: Add const qualifiers Except for fil_name_process(), which invokes os_normalize_path(), the redo log record parser will not modify the redo log records. Add const qualifiers accordingly.	2019-11-04 09:25:26 +02:00
Marko Mäkelä	ec40980ddd	Merge 10.3 into 10.4	2019-11-01 15:23:18 +02:00
Marko Mäkelä	0b9cee2cbf	Merge 10.2 into 10.3	2019-10-18 09:05:27 +03:00
Marko Mäkelä	fa32d28f2f	MDEV-20852 BtrBulk is unnecessarily holding dict_index_t::lock The BtrBulk class, which was introduced in MySQL 5.7, is by design the exclusive writer to an index. It is therefore unnecessary to acquire the dict_index_t::lock in that code. Holding the dict_index_t::lock would unnecessarily block other threads (SQL connections and the InnoDB purge threads) from buffering concurrent modifications to being-created secondary indexes. This fix is motivated by a change in MySQL 5.7.28: Bug #29008298 MYSQLD CRASHES ITSELF WHEN CREATING INDEX mysql/mysql-server@f9fb96c20f PageBulk::init(), PageBulk::latch(): Never acquire m_index->lock. PageBulk::storeExt(): Remove some pointer indirection, and improve a debug assertion that seems to prove that some code is redundant. BtrBulk::pageCommit(): Assert that m_index->lock is not being held. btr_blob_log_check_t: Do not acquire m_index->lock if m_op == BTR_STORE_INSERT_BULK. Add UNIV_UNLIKELY hints around that condition. btr_store_big_rec_extern_fields(): Allow index->lock not to be held while op == BTR_STORE_INSERT_BULK. Add UNIV_UNLIKELY hints around that condition.	2019-10-17 14:04:07 +03:00
Marko Mäkelä	b42294bc64	MDEV-19514 Defer change buffer merge until pages are requested We will remove the InnoDB background operation of merging buffered changes to secondary index leaf pages. Changes will only be merged as a result of an operation that accesses a secondary index leaf page, such as a SQL statement that performs a lookup via that index, or is modifying the index. Also ROLLBACK and some background operations, such as purging the history of committed transactions, or computing index cardinality statistics, can cause change buffer merge. Encryption key rotation will not perform change buffer merge. The motivation of this change is to simplify the I/O logic and to allow crash recovery to happen in the background (MDEV-14481). We also hope that this will reduce the number of "mystery" crashes due to corrupted data. Because change buffer merge will typically take place as a result of executing SQL statements, there should be a clearer connection between the crash and the SQL statements that were executed when the server crashed. In many cases, a slight performance improvement was observed. This is joint work with Thirunarayanan Balathandayuthapani and was tested by Axel Schwenke and Matthias Leich. The InnoDB monitor counter innodb_ibuf_merge_usec will be removed. On slow shutdown (innodb_fast_shutdown=0), we will continue to merge all buffered changes (and purge all undo log history). Two InnoDB configuration parameters will be changed as follows: innodb_disable_background_merge: Removed. This parameter existed only in debug builds. All change buffer merges will use synchronous reads. innodb_force_recovery will be changed as follows: * innodb_force_recovery=4 will be the same as innodb_force_recovery=3 (the change buffer merge cannot be disabled; it can only happen as a result of an operation that accesses a secondary index leaf page). The option used to be capable of corrupting secondary index leaf pages. Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'. * innodb_force_recovery=5 (which essentially hard-wires SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED) becomes safe to use. Bogus data can be returned to SQL, but persistent InnoDB data files will not be corrupted further. * innodb_force_recovery=6 (ignore the redo log files) will be the only option that can potentially cause persistent corruption of InnoDB data files. Code changes: buf_page_t::ibuf_exist: New flag, to indicate whether buffered changes exist for a buffer pool page. Pages with pending changes can be returned by buf_page_get_gen(). Previously, the changes were always merged inside buf_page_get_gen() if needed. ibuf_page_exists(const buf_page_t&): Check whether buffered changes exist for an X-latched or read-fixed page. buf_page_get_gen(): Add the parameter allow_ibuf_merge=false. All callers that know that they may be accessing a secondary index leaf page must pass this parameter as allow_ibuf_merge=true, unless it does not matter for that caller whether all buffered changes have been applied. Assert that whenever allow_ibuf_merge holds, the page actually is a leaf page. Attempt change buffer merge only on secondary B-tree index leaf pages. btr_block_get(): Add parameter 'bool merge'. All callers of btr_block_get() should know whether the page could be a secondary index leaf page. If it is not, we should avoid consulting the change buffer bitmap to even consider a merge. This is the main interface for requesting index pages from the buffer pool. ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace buf_page_get_known_nowait() with much simpler logic, because it is now guaranteed that the block is x-latched or read-fixed. mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge(). On crash recovery, we will no longer merge any buffered changes for the pages that we read into the buffer pool during the last batch of applying log records. buf_page_get_gen_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove. btr_search_guess_on_hash(): Merge buf_page_get_gen_known_nowait() into its only remaining caller. buf_page_make_young_if_needed(): Define as an inline function. Add the parameter buf_pool. buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the parameter buf_pool. fil_space_validate_for_mtr_commit(): Remove a bogus comment about background merge of the change buffer. btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(), btr_cur_open_at_index_side_func(): Use narrower data types and scopes. ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages(). Merge the change buffer by invoking buf_page_get_gen().	2019-10-11 17:28:15 +03:00
Marko Mäkelä	d04f2de80a	Merge 10.4 into 10.5	2019-10-11 08:41:36 +03:00
Marko Mäkelä	09afd3da1a	Merge 10.3 into 10.4	2019-10-10 21:30:40 +03:00
Marko Mäkelä	7f84e3ad75	Merge 10.2 into 10.3	2019-10-10 20:38:44 +03:00
Marko Mäkelä	6d7a826953	MDEV-20788: Bogus assertion failure for PAGE_FREE list In MDEV-11369 (instant ADD COLUMN) in MariaDB Server 10.3, we introduced the hidden metadata record that must be the first record in the clustered index if and only if index->is_instant() holds. To catch MDEV-19783, in commit `ed0793e096` and commit `99dc40d6ac` we added some assertions to find cases where the metadata record is missing while it should not be, or a record exists when it should not. Those assertions were invalid when traversing the PAGE_FREE list. That list can contain anything; we must only be able to determine the successor and the size of each garbage record in it. page_validate(), page_simple_validate_old(), page_simple_validate_new(): Do not invoke page_rec_get_next_const() for traversing the PAGE_FREE list, but instead use a lower-level accessor that does not attempt to validate the REC_INFO_MIN_REC_FLAG. page_copy_rec_list_end_no_locks(), page_copy_rec_list_start(), page_delete_rec_list_start(): Add assertions. btr_page_get_split_rec_to_left(): Remove a redundant return value, and make the output parameter the return value. btr_page_get_split_rec_to_right(), btr_page_split_and_insert(): Clean up.	2019-10-10 20:29:30 +03:00
Marko Mäkelä	c11e5cdd12	Merge 10.3 into 10.4	2019-10-10 11:19:25 +03:00
Marko Mäkelä	892378fb9d	Merge 10.2 into 10.3	2019-10-09 13:25:11 +03:00
Eugene Kosov	ed0793e096	MDEV-19783: Add more REC_INFO_MIN_REC_FLAG checks btr_cur_pessimistic_delete(): the code was changed in a way that allows more REC_INFO_MIN_REC_FLAG assertions to be placed inside btr_set_min_rec_mark(). Without that change, the tests innodb.innodb-table-online, innodb.temp_table_savepoint and innodb_zip.prefix_index_liftedlimit fail. Removed essentially duplicated page_zip_validate() calls, which failed because of a temporary(!) invariant violation. That fixed innodb_zip.wl5522_debug_zip and innodb_zip.prefix_index_liftedlimit.	2019-10-09 08:29:26 +03:00
Marko Mäkelä	d480d28f4f	Add page_has_prev(), page_has_next(), page_has_siblings() Until now, InnoDB inefficiently compared the aligned fields FIL_PAGE_PREV, FIL_PAGE_NEXT to the byte-order-agnostic value FIL_NULL. This is a backport of `32170f8c6d` from MariaDB Server 10.3.	2019-10-09 08:29:26 +03:00
Marko Mäkelä	a340af9223	btr_block_get(): Remove redundant parameters	2019-09-25 16:08:48 +03:00
Marko Mäkelä	5d0bab47fc	btr_block_get(), btr_block_get_func(): Change the parameter to const dict_index_t& btr_level_list_remove(): Clean up the parameters. Renamed from btr_level_list_remove_func().	2019-09-25 13:34:49 +03:00
Marko Mäkelä	60c04be659	Merge 10.3 into 10.4	2019-09-12 12:16:40 +03:00
Marko Mäkelä	0fa5ad3acf	Merge 10.2 into 10.3	2019-09-11 16:42:01 +03:00
Marko Mäkelä	0f950e53f0	MDEV-20562 btr_cur_open_at_rnd_pos() fails to return error for corrupted page In mysql-server/commit@f46329044f the InnoDB function btr_cur_open_at_rnd_pos() was corrected so that it would return a status that indicates whether the cursor was successfully positioned. But this change was not correctly merged to MariaDB in `2e814d4702`. btr_cur_open_at_rnd_pos(): In the code path that was introduced in MDEV-8588, properly return failure status. No deterministic test case was found for this failure. It was caught after removing the function page_copy_rec_list_end_to_created_page() in a development branch. As a result, the fill factor of index trees would improve, and supposedly, so would the probability of btr_cur_open_at_rnd_pos() reaching the intentionally corrupted page in the test innodb.leaf_page_corrupted_during_recovery. The wrong return value would cause btr_estimate_number_of_different_key_vals() to wrongly invoke btr_rec_get_externally_stored_len() on a non-leaf page and trigger an assertion failure at the start of that function.	2019-09-11 15:30:19 +03:00
Eugene Kosov	4c7a743964	Merge 10.3 into 10.4	2019-07-26 15:22:31 +03:00
Eugene Kosov	29df1003d9	MDEV-20184 data race at global counter btr_cur_n_non_sea Make all accesses to btr_cur_n_non_sea atomic.	2019-07-26 13:52:52 +03:00
Marko Mäkelä	09e9f884f1	MDEV-20048 Assertion 'n < tuple->n_fields' on ROLLBACK after DROP COLUMN btr_push_update_extern_fields(): Add a parameter for the original number of fields in the record before btr_cur_trim(). Assume that this function will only be called for the clustered index, which is the only index that can contain off-page columns. trx_undo_prev_version_build(), btr_cur_pessimistic_update(): Only invoke btr_push_update_extern_fields() for the clustered index.	2019-07-19 18:13:36 +03:00
Marko Mäkelä	7a3d34d645	Merge 10.3 into 10.4	2019-07-02 21:44:58 +03:00


339 commits
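
Commit f0aa073f2b (MDEV-20950) describes offset_t as a 16-bit record offset whose two upper bits carry a field type. The commit manipulates that type through get_type(), set_type(), get_value() and combine(). The following C++ sketch shows the packing scheme. It is an illustration, not the InnoDB code: the enumerator names and numeric tag values, the mask constants and the function signatures are all assumptions. The static_assert lines check the commit's size arithmetic, assuming the old offsets were 8-byte ulint values as on a 64-bit build.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a 16-bit record offset with a 2-bit type tag in the upper bits.
using offset_t = std::uint16_t;

// Hypothetical tag values: the commit only says the two upper bits hold the type.
enum field_type_t : offset_t {
  STORED_IN_RECORD = 0 << 14,
  STORED_OFFPAGE   = 1 << 14,
  SQL_NULL         = 2 << 14,
  DEFAULT          = 3 << 14,
};

constexpr offset_t TYPE_MASK  = 0xC000;  // upper 2 bits: the type tag
constexpr offset_t VALUE_MASK = 0x3FFF;  // lower 14 bits: the offset value

constexpr field_type_t get_type(offset_t n) { return field_type_t(n & TYPE_MASK); }
constexpr offset_t get_value(offset_t n) { return offset_t(n & VALUE_MASK); }

// Pack a value and a type tag into one offset_t.
constexpr offset_t combine(offset_t value, field_type_t type) {
  return offset_t(get_value(value) | type);
}

// Replace the type tag while keeping the stored value.
inline void set_type(offset_t &n, field_type_t type) {
  n = offset_t(get_value(n) | type);
}

// Size arithmetic from the commit message (old element size assumed to be 8 bytes).
constexpr unsigned REC_OFFS_NORMAL_SIZE = 300;
static_assert(sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) == 600, "300 x 2 bytes");
static_assert(100 * sizeof(std::uint64_t) == 800, "old: 100 x 8-byte ulint");
```

In this sketch the value field is 14 bits wide, so get_value() would silently drop any bits above 16383. The type can be changed with set_type() without disturbing the stored value.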
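
Commits 56f6dab1d0 and d3b2625ba0 describe how mtr_t::write() handles redundant writes. It skips logging when the page bytes would not change, and by default it asserts that they do change. Callers can relax that check with mtr_t::OPT or explicitly request a dummy write. The sketch below models only that decision logic. MiniTrx is a hypothetical type, and NORMAL, OPT and FORCED are hypothetical mode names; a plain vector stands in for the redo log. The real mtr_t interface and log record format differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a mini-transaction; not the InnoDB mtr_t.
struct MiniTrx {
  // NORMAL: caller promises the bytes change (asserted).
  // OPT:    a no-op write is tolerated and silently skipped.
  // FORCED: log the write even if the bytes are unchanged (dummy write).
  enum write_mode { NORMAL, OPT, FORCED };

  struct Record { std::size_t offset; std::size_t len; };
  std::vector<Record> records;  // stands in for the redo log

  // Store an n-byte big-endian value at page + offset.
  // Returns true if the write was applied and logged.
  template <unsigned n>
  bool write(std::uint8_t *page, std::size_t offset, std::uint64_t value,
             write_mode mode = NORMAL) {
    static_assert(n == 1 || n == 2 || n == 4 || n == 8, "unsupported width");
    std::uint8_t buf[n];
    for (unsigned i = 0; i < n; i++)
      buf[i] = std::uint8_t(value >> (8 * (n - 1 - i)));
    const bool changed = std::memcmp(page + offset, buf, n) != 0;
    assert(changed || mode != NORMAL);  // debug check for NORMAL callers
    if (!changed && mode != FORCED)
      return false;  // optimize away the redundant write
    std::memcpy(page + offset, buf, n);
    records.push_back({offset, n});
    return true;
  }
};
```

In this model, checking for a no-op costs a memcmp() of at most 8 bytes, and that check is what allows an unchanged field to produce no log record at all.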
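
Commit d480d28f4f notes that the page links can be checked against FIL_NULL without byte-order conversion. FIL_NULL is 0xFFFFFFFF, so every byte is 0xFF. The raw 4-byte field therefore equals FIL_NULL whether it is read as big-endian or little-endian. The following sketch takes a raw page frame and uses the FIL header offsets 8 (FIL_PAGE_PREV) and 12 (FIL_PAGE_NEXT). The function bodies are an illustration, not the InnoDB implementation. Because the two fields are adjacent, the page_has_siblings() here checks both with one 8-byte load; that detail is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offsets of the sibling page links within the InnoDB FIL page header.
constexpr std::size_t FIL_PAGE_PREV = 8;
constexpr std::size_t FIL_PAGE_NEXT = 12;
constexpr std::uint32_t FIL_NULL = 0xFFFFFFFF;

// All bytes of FIL_NULL are 0xFF, so no big-endian decoding is needed.
inline bool field_is_fil_null(const std::uint8_t *field) {
  std::uint32_t raw;
  std::memcpy(&raw, field, sizeof raw);  // alignment- and aliasing-safe load
  return raw == FIL_NULL;
}

inline bool page_has_prev(const std::uint8_t *page) {
  return !field_is_fil_null(page + FIL_PAGE_PREV);
}

inline bool page_has_next(const std::uint8_t *page) {
  return !field_is_fil_null(page + FIL_PAGE_NEXT);
}

// FIL_PAGE_PREV and FIL_PAGE_NEXT are adjacent: test both with one load.
inline bool page_has_siblings(const std::uint8_t *page) {
  std::uint64_t raw;
  std::memcpy(&raw, page + FIL_PAGE_PREV, sizeof raw);
  return raw != ~std::uint64_t{0};
}
```

A page whose both links hold FIL_NULL is the only page on its level, as with a root page; page_has_siblings() returns false only in that case.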
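
Commit 29df1003d9 (MDEV-20184) fixes a data race by making every access to the global counter btr_cur_n_non_sea atomic. The C++ sketch below shows the idea with std::atomic and fetch_add(). The variable n_non_sea and the helper functions are hypothetical. The choice of std::memory_order_relaxed is also an assumption, suitable for a pure statistics counter; the commit does not state it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a global statistics counter like btr_cur_n_non_sea.
std::atomic<std::size_t> n_non_sea{0};

// Relaxed ordering: the counter orders no other memory accesses.
inline void count_non_sea_search() {
  n_non_sea.fetch_add(1, std::memory_order_relaxed);
}

// Run `threads` workers that each increment the counter `per_thread` times,
// then return the final total.
inline std::size_t hammer(unsigned threads, std::size_t per_thread) {
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; t++)
    pool.emplace_back([per_thread] {
      for (std::size_t i = 0; i < per_thread; i++)
        count_non_sea_search();
    });
  for (auto &th : pool)
    th.join();  // join() makes all increments visible to this thread
  return n_non_sea.load(std::memory_order_relaxed);
}
```

With a plain std::size_t and ++, concurrent increments could be lost and the program would have undefined behaviour. With the atomic counter, the total after joining the threads is exact.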