mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 11:01:52 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	574d8b2940	MDEV-21907: Fix most clang -Wconversion in InnoDB Declare innodb_purge_threads as 4-byte integer (UINT) instead of 4-or-8-byte (ULONG) and adjust the documentation string.	2020-03-11 08:29:48 +02:00
Marko Mäkelä	96901d9545	Cleanup: Remove dict_ind_redundant There is no reason for the dummy index object dict_ind_redundant to exist any more. It was only being passed to btr_create(). btr_create(): If !index, assume that a ROW_FORMAT=REDUNDANT table is being created. We could pass ibuf.index, dict_sys.sys_tables->indexes.start and so on, if those objects had been initialized before the function btr_create() is called.	2020-02-20 22:00:43 +02:00
Marko Mäkelä	23de5b8f07	MDEV-21725 Optimize btr_page_reorganize_low() redo logging btr_page_reorganize_low(): Log only the changed data in the page. TODO: Do not copy the entire changed payload to the redo log. Emit a combination of MEMMOVE and WRITE records to reduce the log volume.	2020-02-18 10:54:28 +02:00
Marko Mäkelä	fc87698048	MDEV-12353: Write less log for BLOB pages fsp_page_create(): Always initialize the page. The logic to avoid initialization was made redundant and should have been removed in mysql/mysql-server@ce0a1e85e2 (MySQL 5.7.5). btr_store_big_rec_extern_fields(): Remove the redundant initialization of FIL_PAGE_PREV and FIL_PAGE_NEXT. An INIT_PAGE record will have been written already. Only write the ROW_FORMAT=COMPRESSED page payload from FIL_PAGE_DATA onwards. We were unnecessarily writing from FIL_PAGE_TYPE onwards, which caused an assertion failure on recovery: recv_sys_t::alloc(size_t): Assertion 'len <= srv_page_size' failed when running the following tests: ./mtr --no-reorder innodb_zip.blob,4k innodb_zip.bug56680,4k	2020-02-17 10:13:32 +02:00
Marko Mäkelä	f8a9f90667	MDEV-12353: Remove support for crash-upgrade We tighten some assertions regarding dict_index_t::is_dummy and crash recovery, now that redo log processing will no longer create dummy objects.	2020-02-13 19:13:45 +02:00
Marko Mäkelä	7ae21b18a6	MDEV-12353: Change the redo log encoding log_t::FORMAT_10_5: physical redo log format tag log_phys_t: Buffered records in the physical format. The log record bytes will follow the last data field, making use of alignment padding that would otherwise be wasted. If there are multiple records for the same page, also those may be appended to an existing log_phys_t object if the memory is available. In the physical format, the first byte of a record identifies the record and its length (up to 15 bytes). For longer records, the immediately following bytes will encode the remaining length in a variable-length encoding. Usually, a variable-length-encoded page identifier will follow, followed by optional payload, whose length is included in the initially encoded total record length. When a mini-transaction is updating multiple fields in a page, it can avoid repeating the tablespace identifier and page number by setting the same_page flag (most significant bit) in the first byte of the log record. The byte offset of the record will be relative to where the previous record for that page ended. Until MDEV-14425 introduces a separate file-level log for redo log checkpoints and file operations, we will write the file-level records in the page-level redo log file. The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT) will be removed in MDEV-14425, and one sequential scan of the page recovery log will suffice. Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags. If the information is needed, it can be parsed from WRITE records that modify FSP_SPACE_FLAGS. MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily as part of this work, before being replaced with WRITE (along with MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES). mtr_buf_t::empty(): Check if the buffer is empty. mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty. 
mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record, for the same_page encoding. page_recv_t::last_offset: Reflects mtr_t::m_last_offset. Valid values for last_offset during recovery should be 0 or above 8. (The first 8 bytes of a page are the checksum and the page number, and neither is ever updated directly by log records.) Internally, the special value 1 indicates that the same_page form will not be allowed for the subsequent record. mtr_t::page_create(): Take the block descriptor as parameter, so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE record will always be followed by a subtype byte, because same_page records must be longer than 1 byte. trx_undo_page_init(): Combine the writes in a WRITE record. trx_undo_header_create(): Write 4 bytes using a special MEMSET record that includes 1 byte of length and 2 bytes of payload. flst_write_addr(): Define as a static function. Combine the writes. flst_zero_both(): Replaces two flst_zero_addr() calls. flst_init(): Do not inline the function. fsp_free_seg_inode(): Zerofill the whole inode. fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT to FIL_NULL when using the physical format. btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page() must have been invoked. fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE. fil_names_dirty_and_write(): Remove the parameter mtr. Write the records using a separate mini-transaction object, because any FILE_ records must be at the start of a mini-transaction log. recv_recover_page(): Add a fil_space_t* parameter. After applying log to a ROW_FORMAT=COMPRESSED page, invoke buf_zip_decompress() to restore the uncompressed page. buf_page_io_complete(): Remove the temporary hack to discard the uncompressed page of a ROW_FORMAT=COMPRESSED page. page_zip_write_header(): Remove. Use mtr_t::write() or mtr_t::memset() instead, and update the compressed page frame separately. 
trx_undo_header_add_space_for_xid(): Remove. trx_undo_seg_create(): Perform the changes that were previously made by trx_undo_header_add_space_for_xid(). btr_reset_instant(): New function: Reset the table to MariaDB 10.2 or 10.3 format when rolling back an instant ALTER TABLE operation. page_rec_find_owner_rec(): Merge with the only callers. page_cur_insert_rec_low(): Combine writes by using a local buffer. MEMMOVE data from the preceding record whenever feasible (copying at least 3 bytes). page_cur_insert_rec_zip(): Combine writes to page header fields. PageBulk::insertPage(): Issue MEMMOVE records to copy a matching part from the preceding record. PageBulk::finishPage(): Combine the writes to the page header and to the sparse page directory slots. mtr_t::write(): Only log the least significant (last) bytes of multi-byte fields that actually differ. For updating FSP_SIZE, we must always write all 4 bytes to the redo log, so that the fil_space_set_recv_size() logic in recv_sys_t::parse() will work. mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument instead of a numeric offset to the page frame. Only log the last bytes of multi-byte fields that actually differ. In fil_space_crypt_t::write_page0(), we must log also any unchanged bytes, so that recovery will recognize the record and invoke fil_crypt_parse(). Future work: MDEV-21724 Optimize page_cur_insert_rec_low() redo logging MDEV-21725 Optimize btr_page_reorganize_low() redo logging MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED	2020-02-13 19:12:17 +02:00
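The first-byte layout described in commit 7ae21b18a6 can be illustrated with a minimal sketch. The commit message only states that the first byte identifies the record and its length (up to 15 bytes) and that the most significant bit is the same_page flag. The exact bit positions used below for the type (bits 6..4) and the length (bits 3..0), as well as the names FirstByte, decode_first_byte() and encode_first_byte(), are assumptions made for illustration and are not taken from the InnoDB source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of the first byte of a physical redo log record.
struct FirstByte {
  bool same_page;   // most significant bit, as stated in the commit message
  uint8_t type;     // ASSUMPTION: bits 6..4 carry the record type
  uint8_t length;   // ASSUMPTION: bits 3..0 carry the length (up to 15)
};

// Split the first byte into its three fields.
inline FirstByte decode_first_byte(uint8_t b) {
  return FirstByte{(b & 0x80) != 0, uint8_t((b >> 4) & 7), uint8_t(b & 15)};
}

// Pack the three fields back into one byte (the inverse of the above).
inline uint8_t encode_first_byte(bool same_page, uint8_t type,
                                 uint8_t length) {
  assert(type < 8);     // must fit in 3 bits
  assert(length < 16);  // must fit in 4 bits
  return uint8_t((same_page ? 0x80 : 0) | (type << 4) | length);
}
```

Under these assumptions, a record that sets same_page can omit the tablespace identifier and page number, which is where the log volume saving for repeated writes to one page comes from.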
Marko Mäkelä	2e7a084283	MDEV-21174: Remove mlog_write_initial_log_record_fast() Pass buf_block_t* to all functions that write redo log. Specifically, replace the parameters page,page_zip with buf_block_t* block in page_zip_ functions.	2020-02-13 18:19:15 +02:00
Marko Mäkelä	2a77b2a510	MDEV-12353: Replace MLOG_LIST__DELETE and MLOG_*REC_DELETE No longer write the following redo log records: MLOG_COMP_LIST_END_DELETE, MLOG_LIST_END_DELETE, MLOG_COMP_LIST_START_DELETE, MLOG_LIST_START_DELETE, MLOG_REC_DELETE,MLOG_COMP_REC_DELETE. Each individual deleted record will be logged separately using physical log records. page_dir_slot_set_n_owned(), page_zip_rec_set_owned(), page_zip_dir_delete(), page_zip_clear_rec(): Add the parameter mtr, and write redo log. page_dir_slot_set_rec(): Remove. Replaced with lower-level operations that write redo log when necessary. page_rec_set_n_owned(): Replaces rec_set_n_owned_old(), rec_set_n_owned_new(). rec_set_heap_no(): Replaces rec_set_heap_no_old(), rec_set_heap_no_new(). page_mem_free(), page_dir_split_slot(), page_dir_balance_slot(): Add the parameter mtr. page_dir_set_n_slots(): Merge with the caller page_dir_split_slot(). page_dir_slot_set_rec(): Merge with the callers page_dir_split_slot() and page_dir_balance_slot(). page_cur_insert_rec_low(), page_cur_insert_rec_zip(): Suppress the logging of lower-level operations. page_cur_delete_rec_write_log(): Remove. page_cur_delete_rec(): Do not tolerate mtr=NULL. rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_comp(): Replace rec_set_heap_no_old() and rec_set_heap_no_new() with direct access that does not involve redo logging. mtr_t::memcpy(): Do allow non-redo-logged writes to uncompressed pages of ROW_FORMAT=COMPRESSED pages. buf_page_io_complete(): Evict the uncompressed page of a ROW_FORMAT=COMPRESSED page after recovery. Because we no longer write logical log records for deleting index records, but instead write physical records that may refer directly to the compressed page frame of a ROW_FORMAT=COMPRESSED page, and because on recovery we will only apply the changes to the ROW_FORMAT=COMPRESSED page, the uncompressed page frame can be stale until page_zip_decompress() is executed. 
recv_parse_or_apply_log_rec_body(): After applying MLOG_ZIP_WRITE_STRING, ensure that the FIL_PAGE_TYPE of the uncompressed page matches the compressed page, because buf_flush_init_for_writing() assumes that field to be valid. mlog_init_t::mark_ibuf_exist(): Invoke page_zip_decompress(), because the uncompressed page after buf_page_create() is not necessarily up to date. buf_LRU_block_remove_hashed(): Bypass a page_zip_validate() check during redo log apply. recv_apply_hashed_log_recs(): Invoke mlog_init.mark_ibuf_exist() also for the last batch, to ensure that page_zip_decompress() will be called for freshly initialized pages.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	d00185c40d	MDEV-12353: Replace MLOG_PAGE_CREATE_RTREE, MLOG_PAGE_COMP_CREATE_RTREE page_create(): Create normal B-tree pages. Callers that create R-tree pages will set FIL_PAGE_TYPE and reset the split sequence number afterwards. The creation of ROW_FORMAT=COMPRESSED pages is unaffected; they will be logged as compressed page images. page_create_low(): Take const buf_block_t* as a parameter. Let the callers invoke buf_block_modify_clock_inc().	2020-02-13 18:19:14 +02:00
Marko Mäkelä	db5cdc3195	MDEV-12353: Replace MLOG_PAGE_REORGANIZE, MLOG_COMP_PAGE_REORGANIZE Log page reorganize as a series of insert operations. This will make the redo log volume proportional to the page payload size. btr_page_reorganize_low(): Add template <bool recovery=false> btr_page_reorganize_block(): Remove the parameter 'bool recovery'	2020-02-13 18:19:14 +02:00
Marko Mäkelä	acd265b69b	MDEV-12353: Exclusively use page_zip_reorganize() for ROW_FORMAT=COMPRESSED page_zip_reorganize(): Restore the page on failure. In callers, omit now-redundant calls to page_zip_decompress(). btr_page_reorganize_low(): Define in static scope only, and remove the z_level parameter. Assert that ROW_FORMAT is not COMPRESSED. btr_page_reorganize_block(), btr_page_reorganize(): Invoke page_zip_reorganize() for ROW_FORMAT=COMPRESSED.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	5bea43f5e0	MDEV-12353: Deprecate and ignore innodb_log_compressed_pages page_zip_compress_write_log_no_data(): Remove. We no longer write the MLOG_ZIP_PAGE_COMPRESS_NO_DATA record. Instead, we will write MLOG_ZIP_PAGE_COMPRESS records.	2020-02-13 18:19:13 +02:00
Marko Mäkelä	1a6f708ec5	MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances Our benchmarking efforts indicate that the reasons for splitting the buf_pool in commit `c18084f71b` have mostly gone away, possibly as a result of mysql/mysql-server@ce6109ebfd or similar work. Only in one write-heavy benchmark where the working set size is ten times the buffer pool size, the buf_pool->mutex would be less contended with 4 buffer pool instances than with 1 instance, in buf_page_io_complete(). That contention could be alleviated further by making more use of std::atomic and by splitting buf_pool_t::mutex further (MDEV-15053). We will deprecate and ignore the following parameters: innodb_buffer_pool_instances innodb_page_cleaners There will be only one buffer pool and one page cleaner task. In a number of INFORMATION_SCHEMA views, columns that indicated the buffer pool instance will be removed: information_schema.innodb_buffer_page.pool_id information_schema.innodb_buffer_page_lru.pool_id information_schema.innodb_buffer_pool_stats.pool_id information_schema.innodb_cmpmem.buffer_pool_instance information_schema.innodb_cmpmem_reset.buffer_pool_instance	2020-02-12 14:45:21 +02:00
Marko Mäkelä	fc2f2fa853	MDEV-19747: Deprecate and ignore innodb_log_optimize_ddl During native table rebuild or index creation, InnoDB used to skip redo logging and write MLOG_INDEX_LOAD records to inform crash recovery and Mariabackup of the gaps in redo log. This is fragile and prohibits some optimizations, such as skipping the doublewrite buffer for newly (re)initialized pages (MDEV-19738). row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD records any more. Instead, we write full redo log. FlushObserver: Remove. fseg_free_page_func(): Remove the parameter log. Redo logging cannot be disabled. fil_space_t::redo_skipped_count: Remove. We cannot remove buf_block_t::skip_flush_check, because PageBulk will temporarily generate invalid B-tree pages in the buffer pool.	2020-02-11 18:44:26 +02:00
Eugene Kosov	700e010309	fix aligned memcpy()-like functions usage I found that memcpy_aligned was used incorrectly in the redo log and decided to add assertions to the aligned functions. These uncovered even more incorrect cases. Given the number of bugs discovered, I left the assertions in place to prevent future bugs. my_assume_aligned(): Replaces the MY_ASSUME_ALIGNED macro	2020-01-23 00:12:43 +08:00
Marko Mäkelä	28c89b7151	Merge 10.4 into 10.5	2019-12-16 07:47:17 +02:00
Marko Mäkelä	745fd4b39f	MDEV-21174: Remove some mlog_write_initial_log_record_fast() Pass buf_block_t* to more functions that write redo log. page_zip_write_node_ptr(), page_zip_write_blob_ptr(), page_zip_compress_write_log_no_data(): Take buf_block_t* as parameter, and do not tolerate mtr=NULL. page_zip_compress(): Do not tolerate mtr=NULL. page_zip_dir_insert(): Take page_cur_t* as parameter. mlog_write_initial_log_record(): Remove. This function was unused. RecIterator::remove(): Remove the redundant page_zip parameter. PageConverter::m_page_zip_ptr: Remove.	2019-12-13 18:15:51 +02:00
Marko Mäkelä	2b5a269cb4	MDEV-21174: Clean up record insertion page_cur_insert_rec_low(): Take page_cur_t* as a parameter, and do not tolerate mtr=NULL. page_cur_insert_rec_zip(): Do not tolerate mtr=NULL.	2019-12-13 18:15:51 +02:00
Marko Mäkelä	8fa759a576	Merge 10.3 into 10.4 We disable the MDEV-21189 test galera.galera_partition because it times out.	2019-12-13 17:30:37 +02:00
Marko Mäkelä	3466b47b0d	Merge 10.2 into 10.3	2019-12-13 10:08:57 +02:00
Eugene Kosov	f0aa073f2b	MDEV-20950 Reduce size of record offsets offset_t: A type that represents one record offset. It is an unsigned short int. Many functions: Replace ulint with offset_t. btr_pcur_restore_position_func(), page_validate(), row_ins_scan_sec_index_for_duplicate(), row_upd_clust_rec_by_insert_inherit_func(), row_vers_impl_x_locked_low(), trx_undo_prev_version_build(): Allocate record offsets on the stack instead of waiting for rec_get_offsets() to allocate them from mem_heap_t, which reduces memory allocations. RECORD_OFFSET, INDEX_OFFSET: It is now less convenient to store pointers in an offset_t* array, because one pointer occupies several offset_t elements. These constants are the start indexes in the array where the pointer values are stored. REC_OFFS_HEADER_SIZE: Adjusted for the new reality. REC_OFFS_NORMAL_SIZE: Increase the size from 100 to 300, which means fewer heap allocations. sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) is now 600 bytes, which is smaller than the previous 800 bytes. REC_OFFS_SEC_INDEX_SIZE: Adjusted for the new reality. rem0rec.h, rem0rec.ic, rem0rec.cc: The types of various arguments, return values and local variables were changed to fix numerous integer conversion issues. enum field_type_t: The concept of offset types was introduced, replacing the old offset flags. As in earlier versions, the 2 upper bits are used to store the offset type, and this enum represents those types. REC_OFFS_SQL_NULL, REC_OFFS_MASK: Removed. get_type(), set_type(), get_value(), combine(): Convenience functions for working with offsets and their types. rec_offs_base()[0]: Still uses the old scheme with the flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL. rec_offs_base()[i]: These have type offset_t now. The two upper bits contain the type.	2019-12-13 00:26:50 +07:00
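The offset-type scheme of commit f0aa073f2b (a 16-bit offset whose two upper bits carry a type) might look roughly like the following sketch. The function names get_type(), set_type(), get_value() and combine() come from the commit message, but their signatures here are guesses. The enumerator names and values, and the constants TYPE_MASK and VALUE_MASK, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

typedef unsigned short offset_t;  // per the commit message
static_assert(sizeof(offset_t) == 2, "sketch assumes a 16-bit offset_t");

// ASSUMPTION: illustrative type tags stored in the two upper bits.
enum field_type_t : offset_t {
  STORED_IN_RECORD = 0 << 14,
  STORED_OFFPAGE = 1 << 14,
  SQL_NULL = 2 << 14,
  DEFAULT = 3 << 14,
};

constexpr offset_t TYPE_MASK = 3U << 14;  // hypothetical name
constexpr offset_t VALUE_MASK = 0x3FFF;   // hypothetical name

// Extract the type tag from the two upper bits.
inline field_type_t get_type(offset_t n) {
  return field_type_t(n & TYPE_MASK);
}

// Extract the 14-bit offset value.
inline offset_t get_value(offset_t n) { return offset_t(n & VALUE_MASK); }

// Build an offset from a 14-bit value and a type tag.
inline offset_t combine(offset_t value, field_type_t type) {
  assert(get_value(value) == value);  // value must fit in 14 bits
  return offset_t(value | type);
}

// Replace the type tag while keeping the value.
inline void set_type(offset_t& n, field_type_t type) {
  n = combine(get_value(n), type);
}
```

With a 2-byte element, an array of 300 offsets occupies 600 bytes, which matches the figure quoted in the commit message for REC_OFFS_NORMAL_SIZE.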
Marko Mäkelä	bb45941685	MDEV-21205 Assertion failure in btr_sec_min_rec_mark In commit `af5947f433` the function btr_discard_page() is invoking btr_set_min_rec_mark() with the wrong buf_block_t* object. node_ptr is on merge_block, not block. btr_discard_page(): Remove the variables merge_page, page, and always refer to block->frame or merge_block->frame instead. Also, limit the scope of node_ptr and avoid duplicated conditions. btr_set_min_rec_mark(): Add a template parameter, so that the caller can specify whether the page is supposed to have a left sibling. Otherwise, the assertion (which was introduced in the same commit) would fail in btr_discard_page().	2019-12-04 10:51:38 +02:00
Marko Mäkelä	af5947f433	MDEV-21174: Replace mlog_write_string() with mtr_t::memcpy() mtr_t::memcpy(): Replaces mlog_write_string(), mlog_log_string(). The buf_block_t is passed as a parameter, so that mlog_write_initial_log_record_low() can be used instead of mlog_write_initial_log_record_fast(). fil_space_crypt_t::write_page0(): Remove the fil_space_t* parameter.	2019-12-03 11:05:19 +02:00
Marko Mäkelä	87839258f8	MDEV-21174: Replace mlog_memset() with mtr_t::memset() Passing buf_block_t helps us avoid calling mlog_write_initial_log_record_fast() and page_get_page_no(), and allows us to implement more debug checks, such as that on ROW_FORMAT=COMPRESSED index pages, only the page header may be modified by MLOG_MEMSET records. fseg_n_reserved_pages(): Add a buf_block_t parameter.	2019-12-03 11:05:19 +02:00
Marko Mäkelä	caea64df18	Cleanup: Remove some page_get_page_no() calls Refer to buf_page_t::id instead of parsing the tablespace identifier or page number from the buffer pool page.	2019-12-03 11:05:19 +02:00
Marko Mäkelä	56f6dab1d0	MDEV-21174: Replace mlog_write_ulint() with mtr_t::write() mtr_t::write(): Replaces mlog_write_ulint(), mlog_write_ull(). Optimize away writes if the page contents do not change, except when a dummy write has been explicitly requested. Because the member function template takes a block descriptor as a parameter, it is possible to introduce better consistency checks. Due to this, the code for handling file-based lists, undo logs and user transactions was refactored to pass around buf_block_t.	2019-12-03 11:05:18 +02:00
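The write elision described for mtr_t::write() in commit 56f6dab1d0 can be sketched as follows. This is not the InnoDB implementation. LogEntry, write_mode and write_logged() are hypothetical names, and a plain vector stands in for the mini-transaction log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a redo log: one entry per logged write.
struct LogEntry {
  std::size_t offset;
  std::vector<uint8_t> bytes;
};

// FORCED models an explicitly requested dummy write.
enum write_mode { NORMAL, FORCED };

// Store len bytes at page + offset and append a log entry.
// In NORMAL mode, do nothing if the page already holds the same bytes.
// Returns true if a log entry was written.
inline bool write_logged(uint8_t* page, std::size_t offset,
                         const uint8_t* data, std::size_t len,
                         std::vector<LogEntry>& log,
                         write_mode mode = NORMAL) {
  if (mode == NORMAL && !std::memcmp(page + offset, data, len))
    return false;  // contents unchanged: skip both the write and the log
  std::memcpy(page + offset, data, len);
  log.push_back(LogEntry{offset, std::vector<uint8_t>(data, data + len)});
  return true;
}
```

Comparing before writing costs a memcmp() per call, but it keeps no-op updates out of the log entirely, which is the saving the commit message refers to.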
Marko Mäkelä	cd92c6c83d	MDEV-12353 preparation: Do not write MLOG_REC_MIN_MARK btr_set_min_rec_mark(): Write MLOG_1BYTE instead of MLOG_REC_MIN_MARK or MLOG_COMP_REC_MIN_MARK. On ROW_FORMAT=COMPRESSED pages, the minimum record flag is not stored at all. The flag is computed for the uncompressed page by page_zip_decompress(). Hence, nothing needs to be logged for ROW_FORMAT=COMPRESSED tables for this operation. To facilitate crash-upgrade and hot backup from older versions, we will retain the code to parse and apply the old log record types MLOG_REC_MIN_MARK and MLOG_COMP_REC_MIN_MARK.	2019-12-03 11:05:18 +02:00
Marko Mäkelä	bf2cc46798	MDEV-21133: Remove buf_frame_copy()	2019-12-03 11:05:18 +02:00
Marko Mäkelä	a6e8a7df82	Cleanup: flst_read_addr(), fil_addr_t fil_addr_t: Use exactly sized data types. flst_read_addr(): Remove the unused parameter mtr. page_offset(): Return uint16_t.	2019-11-28 11:44:40 +02:00
Marko Mäkelä	25e2a556de	MDEV-21133 Optimize access to InnoDB page header fields Introduce memcpy_aligned<N>(), memcmp_aligned<N>(), memset_aligned<N>() and use them for accessing InnoDB page header fields that are known to be aligned. MY_ASSUME_ALIGNED(): Wrapper for the GCC/clang __builtin_assume_aligned(). Nothing similar seems to exist in Microsoft Visual Studio, and the C++20 std::assume_aligned is not available to us yet. Explicitly specified alignment guarantees allow compilers to generate faster code on platforms with strict alignment rules, instead of emitting calls to potentially unaligned memcpy(), memcmp(), or memset().	2019-11-26 10:15:03 +02:00
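A minimal sketch of the memcpy_aligned<N>() idea from commit 25e2a556de, assuming GCC or clang, since it relies on __builtin_assume_aligned() as the commit message describes. The signature and the debug assertions are assumptions for illustration. The assertions mirror the checks that the later commit 700e010309 reports adding, not the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes between buffers that the caller promises are both
// aligned to Alignment bytes. GCC/clang only.
template <std::size_t Alignment>
inline void* memcpy_aligned(void* dest, const void* src, std::size_t n) {
  static_assert(Alignment && !(Alignment & (Alignment - 1)),
                "Alignment must be a power of 2");
  // Debug checks: a false alignment promise would be undefined behavior.
  assert(reinterpret_cast<std::uintptr_t>(dest) % Alignment == 0);
  assert(reinterpret_cast<std::uintptr_t>(src) % Alignment == 0);
  return std::memcpy(__builtin_assume_aligned(dest, Alignment),
                     __builtin_assume_aligned(src, Alignment), n);
}
```

For a small constant n, the alignment promise lets the compiler replace the memcpy() call with aligned loads and stores, which matters most on platforms with strict alignment rules.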
Marko Mäkelä	ae90f8431b	Merge 10.4 into 10.5	2019-11-14 14:49:20 +02:00
Marko Mäkelä	89ae01fd00	Merge 10.3 into 10.4	2019-11-14 13:23:36 +02:00
Marko Mäkelä	3d4a801533	MDEV-12353 preparation: Replace mtr_x_lock() and friends Apart from page latches (buf_block_t::lock), mini-transactions are keeping track of at most one dict_index_t::lock and fil_space_t::latch at a time, and in a rare case, purge_sys.latch. Let us introduce interfaces for acquiring an index latch or a tablespace latch. In a later version, we may want to introduce mtr_t members for holding a latched dict_index_t* and fil_space_t, and replace the remaining use of mtr_t::m_memo with std::set<buf_block_t> or with a map<buf_block_t,byte> pointing to log records.	2019-11-14 11:40:33 +02:00
Marko Mäkelä	c99470b366	Merge 10.4 into 10.5	2019-11-13 20:38:14 +02:00
Marko Mäkelä	49019dde65	MDEV-17138 follow-up: Optimize index page creation btr_create(), btr_root_raise_and_insert(): Write a MLOG_MEMSET record to set FIL_PAGE_PREV,FIL_PAGE_NEXT to FIL_NULL, instead of writing two MLOG_4BYTES records. For ROW_FORMAT=COMPRESSED pages, we will not use MLOG_MEMSET because we want the crash-downgrade to earlier 10.4 releases to succeed. mlog_parse_nbytes(): Relax the too strict assertion. There is no problem with MLOG_MEMSET records that affect the uncompressed header of ROW_FORMAT=COMPRESSED index pages.	2019-11-13 18:35:04 +02:00
Marko Mäkelä	0117d0e65a	Merge 10.4 into 10.5	2019-11-11 15:21:58 +02:00
Marko Mäkelä	3da895a736	Merge 10.3 into 10.4	2019-11-11 15:03:46 +02:00
Marko Mäkelä	4fcfdb60e7	Merge 10.2 into 10.3	2019-11-11 14:56:51 +02:00
Marko Mäkelä	33f74e8fcf	MDEV-21024: Clean up IMPORT TABLESPACE page_rec_write_field(): Remove. dict_create_index_tree_step(): If the SYS_INDEXES.PAGE does not change, do not update it in the data dictionary. Typically, all index page numbers would be unchanged before and after IMPORT TABLESPACE, except if some secondary indexes were created after loading some data. btr_root_fseg_adjust_on_import(): Remove the redundant mtr_t* parameter. Redo logging is disabled during the page adjustments that IMPORT TABLESPACE is performing.	2019-11-11 14:14:26 +02:00
Marko Mäkelä	dfdd96214b	MDEV-21024: Clean up btr_root_raise_and_insert() The root page must never have any siblings, so it is unnecessary to clear those fields.	2019-11-11 14:14:26 +02:00
Marko Mäkelä	29d67d051a	Cleanup btr_page_get_prev(), btr_page_get_next() Remove the redundant parameter mtr_t*. Make use of page_has_prev(), page_has_next() whenever possible.	2019-11-11 13:36:21 +02:00
Marko Mäkelä	64a02e4fa2	MDEV-19586: Add const qualifiers Except for fil_name_process(), which invokes os_normalize_path(), the redo log record parser will not modify the redo log records. Add const qualifiers accordingly.	2019-11-04 09:25:26 +02:00
Marko Mäkelä	bb450b1fed	Merge 10.2 into 10.3	2019-10-12 15:38:58 +03:00
Marko Mäkelä	361e8284f3	MDEV-20813 Assertion failure in buf_flush_init_for_writing() for innodb_immediate_scrub_data_uncompressed=ON The assertion that was added in commit `c0c003beb4` to augment the fix of MDEV-20805 turns out to be invalid when innodb_immediate_scrub_data_uncompressed is enabled. In this mode, fsp_init_file_page() will be invoked on data pages that have been freed, causing writes of almost-all-zero pages. btr_page_free(): Adjust the comment. buf_flush_init_for_writing(): Disable the assertion with a note that it should be re-enabled in MDEV-15528.	2019-10-12 15:28:55 +03:00
Marko Mäkelä	b42294bc64	MDEV-19514 Defer change buffer merge until pages are requested We will remove the InnoDB background operation of merging buffered changes to secondary index leaf pages. Changes will only be merged as a result of an operation that accesses a secondary index leaf page, such as a SQL statement that performs a lookup via that index, or is modifying the index. Also ROLLBACK and some background operations, such as purging the history of committed transactions, or computing index cardinality statistics, can cause change buffer merge. Encryption key rotation will not perform change buffer merge. The motivation of this change is to simplify the I/O logic and to allow crash recovery to happen in the background (MDEV-14481). We also hope that this will reduce the number of "mystery" crashes due to corrupted data. Because change buffer merge will typically take place as a result of executing SQL statements, there should be a clearer connection between the crash and the SQL statements that were executed when the server crashed. In many cases, a slight performance improvement was observed. This is joint work with Thirunarayanan Balathandayuthapani and was tested by Axel Schwenke and Matthias Leich. The InnoDB monitor counter innodb_ibuf_merge_usec will be removed. On slow shutdown (innodb_fast_shutdown=0), we will continue to merge all buffered changes (and purge all undo log history). Two InnoDB configuration parameters will be changed as follows: innodb_disable_background_merge: Removed. This parameter existed only in debug builds. All change buffer merges will use synchronous reads. innodb_force_recovery will be changed as follows: * innodb_force_recovery=4 will be the same as innodb_force_recovery=3 (the change buffer merge cannot be disabled; it can only happen as a result of an operation that accesses a secondary index leaf page). The option used to be capable of corrupting secondary index leaf pages. 
Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'. * innodb_force_recovery=5 (which essentially hard-wires SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED) becomes safe to use. Bogus data can be returned to SQL, but persistent InnoDB data files will not be corrupted further. * innodb_force_recovery=6 (ignore the redo log files) will be the only option that can potentially cause persistent corruption of InnoDB data files. Code changes: buf_page_t::ibuf_exist: New flag, to indicate whether buffered changes exist for a buffer pool page. Pages with pending changes can be returned by buf_page_get_gen(). Previously, the changes were always merged inside buf_page_get_gen() if needed. ibuf_page_exists(const buf_page_t&): Check if buffered changes exist for an X-latched or read-fixed page. buf_page_get_gen(): Add the parameter allow_ibuf_merge=false. All callers that know that they may be accessing a secondary index leaf page must pass this parameter as allow_ibuf_merge=true, unless it does not matter for that caller whether all buffered changes have been applied. Assert that whenever allow_ibuf_merge holds, the page actually is a leaf page. Attempt change buffer merge only to secondary B-tree index leaf pages. btr_block_get(): Add parameter 'bool merge'. All callers of btr_block_get() should know whether the page could be a secondary index leaf page. If it is not, we should avoid consulting the change buffer bitmap to even consider a merge. This is the main interface to requesting index pages from the buffer pool. ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace buf_page_get_known_nowait() with much simpler logic, because it is now guaranteed that the block is x-latched or read-fixed. mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge(). On crash recovery, we will no longer merge any buffered changes for the pages that we read into the buffer pool during the last batch of applying log records. 
buf_page_get_gen_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove. btr_search_guess_on_hash(): Merge buf_page_get_gen_known_nowait() to its only remaining caller. buf_page_make_young_if_needed(): Define as an inline function. Add the parameter buf_pool. buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the parameter buf_pool. fil_space_validate_for_mtr_commit(): Remove a bogus comment about background merge of the change buffer. btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(), btr_cur_open_at_index_side_func(): Use narrower data types and scopes. ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages(). Merge the change buffer by invoking buf_page_get_gen().	2019-10-11 17:28:15 +03:00
Marko Mäkelä	d04f2de80a	Merge 10.4 into 10.5	2019-10-11 08:41:36 +03:00
Marko Mäkelä	09afd3da1a	Merge 10.3 into 10.4	2019-10-10 21:30:40 +03:00
Marko Mäkelä	4cdb72f237	MDEV-19783: Relax an assertion btr_page_get_split_rec_to_left(): Assert that in the leftmost leaf page, if the metadata record exists, index->is_instant() must hold. The assertion of commit `01f45becd1` could fail during innobase_instant_try().	2019-10-10 21:22:38 +03:00
Marko Mäkelä	01f45becd1	MDEV-19783: Add more assertions btr_page_get_split_rec_to_left(): Assert that in the leftmost leaf page, the metadata record exists if and only if index->is_instant(). page_validate(): Correct the wording of a message. rec_init_offsets(): Assert that whenever a record is in "instant ALTER" format, index->is_instant() must hold.	2019-10-10 20:40:26 +03:00
Marko Mäkelä	7f84e3ad75	Merge 10.2 into 10.3	2019-10-10 20:38:44 +03:00

187 commits