mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	30dc4287ec	Merge 10.5 into 10.6	2020-12-19 12:23:06 +02:00
Marko Mäkelä	0c23e32d27	MDEV-24445 Using innodb_undo_tablespaces corrupts system tablespace In the rewrite of MDEV-8139 (based on MDEV-15528), we introduced a wrong assumption that any persistent tablespace that is not an .ibd file is the system tablespace. This assumption is broken when innodb_undo_tablespaces (files undo001, undo002, ...) are being used. By default, we have innodb_undo_tablespaces=0 (the persistent undo log is being stored in the system tablespace). In MDEV-15528 and MDEV-8139 we rewrote the page scrubbing logic so that it will follow the tried-and-true write-ahead logging protocol, first writing FREE_PAGE records and then in the page flushing, zerofilling or hole-punching freed pages. Unfortunately, the implementation included a wrong assumption that that anything that is not in an .ibd file must be the system tablespace. This wrong assumption would cause overwrites of valid data pages in the system tablespace. mtr_t::m_freed_in_system_tablespace: Remove. mtr_t::m_freed_space: The tablespace associated with m_freed_pages. buf_page_free(): Take the tablespace and page number as a parameter, instead of taking a page identifier.	2020-12-18 19:23:34 +02:00
Marko Mäkelä	ff5d306e29	MDEV-21452: Replace ib_mutex_t with mysql_mutex_t SHOW ENGINE INNODB MUTEX functionality is completely removed, as are the InnoDB latching order checks. We will enforce innodb_fatal_semaphore_wait_threshold only for dict_sys.mutex and lock_sys.mutex. dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex. lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex. FIXME: srv_sys should be removed altogether; it is duplicating tpool functionality. fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must not hold fil_system.mutex. fil_close_all_files(): To prevent SAFE_MUTEX warnings for fil_space_destroy_crypt_data(), we must not hold fil_system.mutex while invoking fil_space_free_low() on a detached tablespace.	2020-12-15 17:56:18 +02:00
Marko Mäkelä	38fd7b7d91	MDEV-21452: Replace all direct use of os_event_t Let us replace os_event_t with mysql_cond_t, and replace the necessary ib_mutex_t with mysql_mutex_t so that they can be used with condition variables. Also, let us replace polling (os_thread_sleep() or timed waits) with plain mysql_cond_wait() wherever possible. Furthermore, we will use the lightweight srw_mutex for trx_t::mutex, to hopefully reduce contention on lock_sys.mutex. FIXME: Add test coverage of mariabackup --backup --kill-long-queries-timeout	2020-12-15 17:56:17 +02:00
Marko Mäkelä	9702be2c73	MDEV-24142: Remove __FILE__,__LINE__ related to buf_block_t::lock	2020-12-03 15:28:53 +02:00
Marko Mäkelä	ac028ec5d8	MDEV-24142: Remove the LatchDebug interface to rw-locks The latching order checks for rw-locks have not caught many bugs in the past few years and they are greatly complicating the code. Last time the debug checks were useful was in commit `59caf2c3c1` (MDEV-13485). The B-tree hang MDEV-14637 was not caught by LatchDebug, because the granularity of the checks is not sufficient to distinguish the levels of non-leaf B-tree pages. The interface was already made dead code by the grandparent commit `03ca6495df`.	2020-12-03 15:27:50 +02:00
Marko Mäkelä	03ca6495df	MDEV-24142: Replace InnoDB rw_lock_t with sux_lock InnoDB buffer pool block and index tree latches depend on a special kind of read-update-write lock that allows reentrant (recursive) acquisition of the 'update' and 'write' locks as well as an upgrade from 'update' lock to 'write' lock. The 'update' lock allows any number of reader locks from other threads, but no concurrent 'update' or 'write' lock. If there were no requirement to support an upgrade from 'update' to 'write', we could compose the lock out of two srw_lock (implemented as any type of native rw-lock, such as SRWLOCK on Microsoft Windows). Removing this requirement is very difficult, so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we implemented an 'update' mode to our srw_lock. Re-entrant or recursive locking is mostly needed when writing or freeing BLOB pages, but also in crash recovery or when merging buffered changes to an index page. The re-entrancy allows us to attach a previously acquired page to a sub-mini-transaction that will be committed before whatever else is holding the page latch. The SUX lock supports Shared ('read'), Update, and eXclusive ('write') locking modes. The S latches are not re-entrant, but a single S latch may be acquired even if the thread already holds an U latch. The idea of the U latch is to allow a write of something that concurrent readers do not care about (such as the contents of BTR_SEG_LEAF, BTR_SEG_TOP and other page allocation metadata structures, or the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field is only updated when a dict_table_t for the table exists, and only read when a dict_table_t for the table is being added to dict_sys.) block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page() to allow concurrent readers but no concurrent modifications while the page is being written to the data file. That latch will be released by buf_page_write_complete() in a different thread. Hence, we use the special lock owner value FOR_IO. The index_lock::u_lock() improves concurrency on operations that involve non-leaf index pages. The interface has been cleaned up a little. We will use x_lock_recursive() instead of x_lock() when we know that a lock is already held by the current thread. Similarly, a lock upgrade from U to X is only allowed via u_x_upgrade() or x_lock_upgraded() but not via x_lock(). We will disable the LatchDebug and sync_array interfaces to InnoDB rw-locks. The SEMAPHORES section of SHOW ENGINE INNODB STATUS output will no longer include any information about InnoDB rw-locks, only TTASEventMutex (cmake -DMUTEXTYPE=event) waits. This will make a part of the 'innotop' script dead code. The block_lock buf_block_t::lock will not be covered by any PERFORMANCE_SCHEMA instrumentation. SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES will no longer output source code file names or line numbers. The dict_index_t::lock will be identified by index and table names, which should be much more useful. PERFORMANCE_SCHEMA is lumping information about all dict_index_t::lock together as event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'. buf_page_free(): Remove the file,line parameters. The sux_lock will not store such diagnostic information. buf_block_dbg_add_level(): Define as empty macro, to be removed in a subsequent commit. Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO the index_lock dict_index_t::lock will be instrumented via PERFORMANCE_SCHEMA. Similar to commit `1669c8890c` we will distinguish lock waits by registering shared_lock,exclusive_lock events instead of try_shared_lock,try_exclusive_lock. Actual 'try' operations will not be instrumented at all. rw_lock_list: Remove. After MDEV-24167, this only covered buf_block_t::lock and dict_index_t::lock. We will output their information by traversing buf_pool or dict_sys.	2020-12-03 15:19:49 +02:00
Marko Mäkelä	edbde4a11f	MDEV-24167: Replace fil_space::latch We must avoid acquiring a latch while we are already holding one. The tablespace latch was being acquired recursively in some operations that allocate or free pages.	2020-11-24 15:43:12 +02:00
Marko Mäkelä	898521e2dd	Merge 10.4 into 10.5	2020-10-30 11:15:30 +02:00
Marko Mäkelä	7b2bb67113	Merge 10.3 into 10.4	2020-10-29 13:38:38 +02:00
Marko Mäkelä	2b6f804490	Merge 10.2 into 10.3	2020-10-28 10:44:40 +02:00
Eugene Kosov	afc9d00c66	MDEV-23991 dict_table_stats_lock() has unnecessarily long scope Patch removes dict_index_t::stats_latch. Table/index statistics now protected with dict_sys->mutex. That way statistics computation can happen in parallel in several threads and dict_sys->mutex will be locked only for a short period of time. This patch is a joint work with Marko Mäkelä dict_index_t:🔒 make mutable which allows to pass const pointer when only lock is touched in an object btr_height_get() btr_get_size(): make index argument const for better type safety btr_estimate_number_of_different_key_vals(): now returns computed values instead of setting fields in dict_index_t directly remove everything related to dict_index_t::stats_latch dict_stats_index_set_n_diff(): now returns computed values instead of setting fields in dict_index_t directly dict_stats_analyze_index(): now returns computed values instead of setting fields in dict_index_t directly Reviewed by: Marko Mäkelä	2020-10-27 19:09:20 +03:00
Marko Mäkelä	9028cc6b86	Cleanup: Make InnoDB page numbers uint32_t InnoDB stores a 32-bit page number in page headers and in some data structures, such as FIL_ADDR (consisting of a 32-bit page number and a 16-bit byte offset within a page). For better compile-time error detection and to reduce the memory footprint in some data structures, let us use a uint32_t for the page number, instead of ulint (size_t) which can be 64 bits.	2020-10-15 17:06:17 +03:00
Marko Mäkelä	ecb913c2a5	Cleanup: Compare page_id_t directly	2020-10-15 17:06:16 +03:00
Marko Mäkelä	3421223363	Merge 10.4 into 10.5	2020-09-09 16:57:30 +03:00
Marko Mäkelä	66ae50a564	Merge 10.3 into 10.4	2020-09-09 15:00:21 +03:00
Marko Mäkelä	7e07e38cf6	Merge 10.2 into 10.3	2020-09-09 13:06:46 +03:00
Thirunarayanan Balathandayuthapani	b1009ae5c1	MDEV-23456 fil_space_crypt_t::write_page0() is accessing an uninitialized page buf_page_create() is invoked when page is initialized. So that previous contents of the page ignored. In few cases, it calls buf_page_get_gen() is called to fetch the page from buffer pool. It should take x-latch on the page. If other thread uses the block or block io state is different from BUF_IO_NONE then release the mutex and check the state and buffer fix count again. For compressed page, use the existing free block from LRU list to create new page. Retry to fetch the compressed page if it is in flush list fseg_create(), fseg_create_general(): Introduce block as a parameter where segment header is placed. It is used to avoid repetitive x-latch on the same page Change the assert to check whether the page has SX latch and X latch in all callee function of buf_page_create() mtr_t::get_fix_count(): Get the buffer fix count of the given block added by the mtr FindBlock is added to find the buffer fix count of the given block acquired by the mini-transaction	2020-09-09 11:58:15 +05:30
Marko Mäkelä	d5d8756de3	Merge 10.4 into 10.5	2020-08-20 12:52:44 +03:00
Marko Mäkelä	2fa9f8c53a	Merge 10.3 into 10.4	2020-08-20 11:01:47 +03:00
Eugene Kosov	90c8d773ed	MDEV-21251 CHECK TABLE fails to check info_bits of records btr_validate_index(): do not stop checking after some level failed. That way it'll become possible to see errors in leaf pages even when uppers layers are corrupted too. page_validate(): check info_bits and status_bits more	2020-08-15 23:05:09 +03:00
Marko Mäkelä	50a11f396a	Merge 10.4 into 10.5	2020-08-01 14:42:51 +03:00
Marko Mäkelä	9216114ce7	Merge 10.3 into 10.4	2020-07-31 18:09:08 +03:00
Marko Mäkelä	66ec3a770f	Merge 10.2 into 10.3	2020-07-31 13:51:28 +03:00
Marko Mäkelä	05fa4558e0	MDEV-22110 InnoDB unnecessarily writes unmodified pages At least since commit `6a7be48b1b` InnoDB appears to be invoking buf_flush_note_modification() on pages that were exclusively latched but not modified in a mini-transaction. MTR_MEMO_MODIFY, mtr_t::modify(): Define not only in debug code, but also in release code. We will set the MTR_MEMO_MODIFY flag on the earliest mtr_t::m_memo entry that we find. MTR_LOG_NONE: Only use this mode in cases where the previous mode will be restored before anything is modified in the mini-transaction. MTR_MEMO_PAGE_X_MODIFY, MTR_MEMO_PAGE_SX_MODIFY: The allowed flag combinations that include MTR_MEMO_MODIFY. ReleaseBlocks: Only invoke buf_flush_note_modification() on those buffer pool blocks on which mtr_t::set_modified() and mtr_t::modify() were invoked.	2020-07-28 14:02:11 +03:00
Marko Mäkelä	3c3f172f17	MDEV-23308 CHECK TABLE attempts to access parent_right_page_no=FIL_NULL mysql/mysql-server@e00ad49edc which introduced WL#6326 to MySQL 5.7.2 added a buffer page acquisition to CHECK TABLE code (solely for the purpose of obeying the changed latching order), but failed to check that a parent page actually exists. It would not necessarily exist in a corrupted index where a parent page is missing pointer records to child pages.	2020-07-28 13:00:59 +03:00
Marko Mäkelä	d34eb4b1f6	MDEV-8139 follow-up fix: Remove an unused constexpr	2020-07-13 17:57:11 +03:00
Thirunarayanan Balathandayuthapani	572e53d8cc	MDEV-22931 mtr_t::mtr_t() allocates some memory mtr_t::m_freed_pages: Renamed from m_freed_ranges and made it as pointer indirection. mtr_t::add_freed_offset(): Allocates m_freed_pages. mtr_t:clear_freed_ranges(): Removed. mtr_t::init(): Added debug assertion to check whether m_freed_pages is not yet initialized. btr_page_alloc_low(): Remove #ifdef UNIV_DEBUG_SCRUBBING. mtr_t::commit(): Delete m_freed_pages, reset m_trim_pages and m_freed_in_system_tablespace. fil_space_t::clear_freed_ranges(): Added a comment to explain how undo log tablespaces uses it.	2020-06-19 21:12:13 +05:30
Marko Mäkelä	cfd3d70ccb	MDEV-22871: Remove pointer indirection for InnoDB hash_table_t hash_get_n_cells(): Remove. Access n_cells directly. hash_get_nth_cell(): Remove. Access array directly. hash_table_clear(): Replaced with hash_table_t::clear(). hash_table_create(), hash_table_free(): Remove. hash0hash.cc: Remove.	2020-06-18 14:16:01 +03:00
Marko Mäkelä	431200090e	MDEV-22867 Assertion instant.n_core_fields == n_core_fields failed This is a race condition where a table on which a 10.3-style instant ADD COLUMN is emptied during the execution of ALTER TABLE ... DROP COLUMN ..., DROP INDEX ..., ALGORITHM=NOCOPY. In commit `2c4844c9e7` the function instant_metadata_lock() would prevent this race condition. But, it would also hold a page latch on the leftmost leaf page of clustered index for the duration of a possible DROP INDEX operation. The race could be fixed by restoring the function instant_metadata_lock() that was removed in commit `ea37b14409` but it would be more future-proof to prevent the dict_index_t::clear_instant_add() call from being issued at all. We at some point support DROP COLUMN ..., ADD INDEX ..., ALGORITHM=NOCOPY and that would spend a non-trivial amount of execution time in ha_innobase::inplace_alter(), making a server hang possible. Currently this is not supported and our added test case will notice when the support is introduced. dict_index_t::must_avoid_clear_instant_add(): Determine if a call to clear_instant_add() must be avoided. btr_discard_only_page_on_level(): Preserve the metadata record if must_avoid_clear_instant_add() holds. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Do not remove the metadata record even if the table becomes empty but must_avoid_clear_instant_add() holds. btr_pcur_store_position(): Relax a debug assertion. This is joint work with Thirunarayanan Balathandayuthapani.	2020-06-12 20:33:39 +03:00
Thirunarayanan Balathandayuthapani	c92f7e287f	MDEV-8139 Fix Scrubbing fil_space_t::freed_ranges: Store ranges of freed page numbers. fil_space_t::last_freed_lsn: Store the most recent LSN of freeing a page. fil_space_t::freed_mutex: Protects freed_ranges, last_freed_lsn. fil_space_create(): Initialize the freed_range mutex. fil_space_free_low(): Frees the freed_range mutex. range_set: Ranges of page numbers. buf_page_create(): Removes the page from freed_ranges when page is being reused. btr_free_root(): Remove the PAGE_INDEX_ID invalidation. Because btr_free_root() and dict_drop_index_tree() are executed in the same atomic mini-transaction, there is no need to invalidate the root page. buf_release_freed_page(): Split from buf_flush_freed_page(). Skip any I/O buf_flush_freed_pages(): Get the freed ranges from tablespace and Write punch-hole or zeroes of the freed ranges. buf_flush_try_neighbors(): Handles the flushing of freed ranges. mtr_t::freed_pages: Variable to store the list of freed pages. mtr_t::add_freed_pages(): To add freed pages. mtr_t::clear_freed_pages(): To clear the freed pages. mtr_t::m_freed_in_system_tablespace: Variable to indicate whether page has been freed in system tablespace. mtr_t::m_trim_pages: Variable to indicate whether the space has been trimmed. mtr_t::commit(): Add the freed page and update the last freed lsn in the tablespace and clear the tablespace freed range if space is trimmed. file_name_t::freed_pages: Store the freed pages during recovery. file_name_t::add_freed_page(), file_name_t::remove_freed_page(): To add and remove freed page during recovery. store_freed_or_init_rec(): Store or remove the freed pages while encountering FREE_PAGE or INIT_PAGE redo log record. recv_init_crash_recovery_spaces(): Add the freed page encountered during recovery to respective tablespace.	2020-06-12 09:17:51 +05:30
Marko Mäkelä	17a7bafec0	MDEV-22110 preparation: Remove mtr_memo_contains macros Let us invoke the debug member functions of mtr_t directly. mtr_t::memo_contains(): Change the parameter type to const rw_lock_t&. This function cannot be invoked on buf_block_t::lock. The function mtr_t::memo_contains_flagged() is intended to be invoked on buf_block_t* or rw_lock_t*, and it along with mtr_t::memo_contains_page_flagged() are the way to check whether a buffer pool page has been latched within a mini-transaction.	2020-06-10 07:50:09 +03:00
Marko Mäkelä	b1ab211dee	MDEV-15053 Reduce buf_pool_t::mutex contention User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0 and will no longer report the PAGE_STATE value READY_FOR_USE. We will remove some fields from buf_page_t and move much code to member functions of buf_pool_t and buf_page_t, so that the access rules of data members can be enforced consistently. Evicting or adding pages in buf_pool.LRU will remain covered by buf_pool.mutex. Evicting or adding pages in buf_pool.page_hash will remain covered by both buf_pool.mutex and the buf_pool.page_hash X-latch. After this fix, buf_pool.page_hash lookups can entirely avoid acquiring buf_pool.mutex, only relying on buf_pool.hash_lock_get() S-latch. Similarly, buf_flush_check_neighbors() can will rely solely on buf_pool.mutex, no buf_pool.page_hash latch at all. The buf_pool.mutex is rather contended in I/O heavy benchmarks, especially when the workload does not fit in the buffer pool. The first attempt to alleviate the contention was the buf_pool_t::mutex split in commit `4ed7082eef` which introduced buf_block_t::mutex, which we are now removing. Later, multiple instances of buf_pool_t were introduced in commit `c18084f71b` and recently removed by us in commit `1a6f708ec5` (MDEV-15058). UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool related debugging in otherwise non-debug builds has not been used for years. Instead, we have been using UNIV_DEBUG, which is enabled in CMAKE_BUILD_TYPE=Debug. buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on std::atomic and the buf_pool.page_hash latches, and in some cases depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before. We must always release buf_block_t::lock before invoking unfix() or io_unfix(), to prevent a glitch where a block that was added to the buf_pool.free list would apper X-latched. See commit `c5883debd6` how this glitch was finally caught in a debug environment. We move some buf_pool_t::page_hash specific code from the ha and hash modules to buf_pool, for improved readability. buf_pool_t::close(): Assert that all blocks are clean, except on aborted startup or crash-like shutdown. buf_pool_t::validate(): No longer attempt to validate n_flush[] against the number of BUF_IO_WRITE fixed blocks, because buf_page_t::flush_type no longer exists. buf_pool_t::watch_set(): Replaces buf_pool_watch_set(). Reduce mutex contention by separating the buf_pool.watch[] allocation and the insert into buf_pool.page_hash. buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a buf_pool.page_hash latch. Replaces and extends buf_page_hash_lock_s_confirm() and buf_page_hash_lock_x_confirm(). buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES. buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads: Use Atomic_counter. buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out(). buf_pool_t::LRU_remove(): Remove a block from the LRU list and return its predecessor. Incorporates buf_LRU_adjust_hp(), which was removed. buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(), for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by BTR_DELETE_OP (purge), which is never invoked on temporary tables. buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments. buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition. buf_LRU_free_page(): Clarify the function comment. buf_flush_check_neighbor(), buf_flush_check_neighbors(): Rewrite the construction of the page hash range. We will hold the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64) consecutive lookups of buf_pool.page_hash. buf_flush_page_and_try_neighbors(): Remove. Merge to its only callers, and remove redundant operations in buf_flush_LRU_list_batch(). buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite. Do not acquire buf_pool.mutex, and iterate directly with page_id_t. ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined and avoids any loops. fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove. buf_flush_page(): Add a fil_space_t* parameter. Minimize the buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated atomically with the io_fix, and we will protect most buf_block_t fields with buf_block_t::lock. The function buf_flush_write_block_low() is removed and merged here. buf_page_init_for_read(): Use static linkage. Initialize the newly allocated block and acquire the exclusive buf_block_t::lock while not holding any mutex. IORequest::IORequest(): Remove the body. We only need to invoke set_punch_hole() in buf_flush_page() and nowhere else. buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type. This field is only used during a fil_io() call. That function already takes IORequest as a parameter, so we had better introduce for the rarely changing field. buf_block_t::init(): Replaces buf_page_init(). buf_page_t::init(): Replaces buf_page_init_low(). buf_block_t::initialise(): Initialise many fields, but keep the buf_page_t::state(). Both buf_pool_t::validate() and buf_page_optimistic_get() requires that buf_page_t::in_file() be protected atomically with buf_page_t::in_page_hash and buf_page_t::in_LRU_list. buf_page_optimistic_get(): Now that buf_block_t::mutex no longer exists, we must check buf_page_t::io_fix() after acquiring the buf_pool.page_hash lock, to detect whether buf_page_init_for_read() has been initiated. We will also check the io_fix() before acquiring hash_lock in order to avoid unnecessary computation. The field buf_block_t::modify_clock (protected by buf_block_t::lock) allows buf_page_optimistic_get() to validate the block. buf_page_t::real_size: Remove. It was only used while flushing pages of page_compressed tables. buf_page_encrypt(): Add an output parameter that allows us ot eliminate buf_page_t::real_size. Replace a condition with debug assertion. buf_page_should_punch_hole(): Remove. buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch(). Add the parameter size (to replace buf_page_t::real_size). buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page(). Add the parameter size (to replace buf_page_t::real_size). fil_system_t::detach(): Replaces fil_space_detach(). Ensure that fil_validate() will not be violated even if fil_system.mutex is released and reacquired. fil_node_t::complete_io(): Renamed from fil_node_complete_io(). fil_node_t::close_to_free(): Replaces fil_node_close_to_free(). Avoid invoking fil_node_t::close() because fil_system.n_open has already been decremented in fil_space_t::detach(). BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY. BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE, and distinguish dirty pages by buf_page_t::oldest_modification(). BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead. This state was only being used for buf_page_t that are in buf_pool.watch. buf_pool_t::watch[]: Remove pointer indirection. buf_page_t::in_flush_list: Remove. It was set if and only if buf_page_t::oldest_modification() is nonzero. buf_page_decrypt_after_read(), buf_corrupt_page_release(), buf_page_check_corrupt(): Change the const fil_space_t* parameter to const fil_node_t& so that we can report the correct file name. buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function. buf_page_io_complete(): Split to buf_page_read_complete() and buf_page_write_complete(). buf_dblwr_t::in_use: Remove. buf_dblwr_t::buf_block_array: Add IORequest::flush_t. buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of os_aio_wait_until_no_pending_writes(). buf_flush_write_complete(): Declare static, not global. Add the parameter IORequest::flush_t. buf_flush_freed_page(): Simplify the code. recv_sys_t::flush_lru: Renamed from flush_type and changed to bool. fil_read(), fil_write(): Replaced with direct use of fil_io(). fil_buffering_disabled(): Remove. Check srv_file_flush_method directly. fil_mutex_enter_and_prepare_for_io(): Return the resolved fil_space_t* to avoid a duplicated lookup in the caller. fil_report_invalid_page_access(): Clean up the parameters. fil_io(): Return fil_io_t, which comprises fil_node_t and error code. Always invoke fil_space_t::acquire_for_io() and let either the sync=true caller or fil_aio_callback() invoke fil_space_t::release_for_io(). fil_aio_callback(): Rewrite to replace buf_page_io_complete(). fil_check_pending_operations(): Remove a parameter, and remove some redundant lookups. fil_node_close_to_free(): Wait for n_pending==0. Because we no longer do an extra lookup of the tablespace between fil_io() and the completion of the operation, we must give fil_node_t::complete_io() a chance to decrement the counter. fil_close_tablespace(): Remove unused parameter trx, and document that this is only invoked during the error handling of IMPORT TABLESPACE. row_import_discard_changes(): Merged with the only caller, row_import_cleanup(). Do not lock up the data dictionary while invoking fil_close_tablespace(). logs_empty_and_mark_files_at_shutdown(): Do not invoke fil_close_all_files(), to avoid a !needs_flush assertion failure on fil_node_t::close(). innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files(). fil_close_all_files(): Invoke fil_flush_file_spaces() to ensure proper durability. thread_pool::unbind(): Fix a crash that would occur on Windows after srv_thread_pool->disable_aio() and os_file_close(). This fix was submitted by Vladislav Vaintroub. Thanks to Matthias Leich and Axel Schwenke for extensive testing, Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.	2020-06-05 12:35:46 +03:00
Marko Mäkelä	58d2d82022	MDEV-22710 Assertion ...status != buf_page_t::FREED in ibuf_remove_free_page() The buf_page_free() call that was introduced in MDEV-15528 was performed too early in fseg_free_page(), tripping a debug check in ibuf_remove_free_page(). In all other callers, we can (and will) invoke buf_page_free() right after fseg_free_page(), but in ibuf_remove_free_page() we will defer that call to the end of the mini-transaction. (That call was already present.)	2020-06-03 14:13:16 +03:00
Marko Mäkelä	23047d3ed4	Merge 10.4 into 10.5	2020-05-18 17:30:02 +03:00
Marko Mäkelä	9e6e43551f	Merge 10.3 into 10.4 We will expose some more std::atomic internals in Atomic_counter, so that dict_index_t::lock will support the default assignment operator.	2020-05-16 07:39:15 +03:00
Marko Mäkelä	3d0bb2b7f1	Merge 10.2 into 10.3	2020-05-15 19:11:57 +03:00
Marko Mäkelä	ad6171b91c	MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time. The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked. It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object. Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool. rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB. fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped. btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index. buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code. dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called. dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition. dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count(). dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table(). buf_page_mtr_lock(): Drop a stale adaptive hash index if needed. We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.	2020-05-15 17:23:08 +03:00
Marko Mäkelä	7bcaa541aa	Merge 10.4 into 10.5	2020-05-05 21:16:22 +03:00
Marko Mäkelä	2c3c851d2c	Merge 10.3 into 10.4	2020-05-05 20:33:10 +03:00
Oleksandr Byelkin	7fb73ed143	Merge branch '10.2' into 10.3	2020-05-04 16:47:11 +02:00
Daniel Black	ba2061da52	MDEV-21595: innodb offset_t rename to rec_offs thanks to: perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)	2020-04-29 12:02:47 +03:00
Marko Mäkelä	0870b75af7	MDEV-22126 Rename confusing constant mtr_t::OPT The template parameter mtr_t::OPT refers to optional, not optimized. Also the default parameter mtr_t::NORMAL refers to optimized writes. The name MAYBE_NOP would be more descriptive, conveying the idea that a write to a durable page might not actually have any effect.	2020-04-03 08:50:46 +03:00
Marko Mäkelä	f224525204	MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC The -Wconversion in GCC seems to be stricter than in clang. GCC at least since version 4.4.7 issues truncation warnings for assignments to bitfields, while clang 10 appears to only issue warnings when the sizes in bytes rounded to the nearest integer powers of 2 are different. Before GCC 10.0.0, -Wconversion required more casts and would not allow some operations, such as x<<=1 or x+=1 on a data type that is narrower than int. GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining about x\|=y even when x and y are compatible types that are narrower than int. Hence, we must rewrite some x\|=y as x=static_cast<byte>(x\|y) or similar, or we must disable -Wconversion. In GCC 6 and later, the warning for assigning wider to bitfields that are narrower than 8, 16, or 32 bits can be suppressed by applying a bitwise & with the exact bitmask of the bitfield. For older GCC, we must disable -Wconversion for GCC 4 or 5 in such cases. The bitwise negation operator appears to promote short integers to a wider type, and hence we must add explicit truncation casts around them. Microsoft Visual C does not allow a static_cast to truncate a constant, such as static_cast<byte>(1) truncating int. Hence, we will use the constructor-style cast byte(~1) for such cases. This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0, clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019) on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).	2020-03-12 19:46:41 +02:00
Marko Mäkelä	8be3794b42	MDEV-21924 Clean up InnoDB GIS record comparison The extension of the record comparison functions for SPATIAL INDEX in mysql/mysql-server@b66ad511b6 was suboptimal for multiple reasons: Some functions used unnecessary temporary variables of the int type, instead of the more appropriate size_t, causing type mismatch. Many functions unnecessarily required rec_get_offsets() to be computed, or a parameter for length, although the size of the minimum bounding rectangle (MBR) is hard-coded as SPDIMS * 2 * sizeof(double), or 32 bytes. In InnoDB SPATIAL INDEX records, there always is a 32-byte key followed by either a 4-byte child page number or the PRIMARY KEY value. The length parameters were not properly validated. The function cmp_geometry_field() was making an incorrect attempt at checking that the lengths are at least sizeof(double) (8 bytes), even though the function is accessing up to 32 bytes in both MBR. Functions that are called from only one compilation unit are defined in another compilation unit, making the code harder to follow and potentially slower to execute. cmp_dtuple_rec_with_gis(): FIXME: Correct the debug assertion and possibly the function TABLE_SHARE::init_from_binary_frm_image() or related code, which causes an unexpected length of DATA_MBR_LEN + 2 bytes to be passed to this function.	2020-03-12 18:13:53 +02:00
Marko Mäkelä	574d8b2940	MDEV-21907: Fix most clang -Wconversion in InnoDB Declare innodb_purge_threads as 4-byte integer (UINT) instead of 4-or-8-byte (ULONG) and adjust the documentation string.	2020-03-11 08:29:48 +02:00
Marko Mäkelä	96901d9545	Cleanup: Remove dict_ind_redundant There is no reason for the dummy index object dict_ind_redundant to exist any more. It was only being passed to btr_create(). btr_create(): If !index, assume that a ROW_FORMAT=REDUNDANT table is being created. We could pass ibuf.index, dict_sys.sys_tables->indexes.start and so on, if those objects had been initialized before the function btr_create() is called.	2020-02-20 22:00:43 +02:00
Marko Mäkelä	23de5b8f07	MDEV-21725 Optimize btr_page_reorganize_low() redo logging btr_page_reorganize_low(): Log only the changed data in the page. TODO: Do not copy the entire changed payload to the redo log. Emit a combination of MEMMOVE and WRITE records to reduce the log volume.	2020-02-18 10:54:28 +02:00
Marko Mäkelä	fc87698048	MDEV-12353: Write less log for BLOB pages fsp_page_create(): Always initialize the page. The logic to avoid initialization was made redundant and should have been removed in mysql/mysql-server@ce0a1e85e2 (MySQL 5.7.5). btr_store_big_rec_extern_fields(): Remove the redundant initialization of FIL_PAGE_PREV and FIL_PAGE_NEXT. An INIT_PAGE record will have been written already. Only write the ROW_FORMAT=COMPRESSED page payload from FIL_PAGE_DATA onwards. We were unnecessarily writing from FIL_PAGE_TYPE onwards, which caused an assertion failure on recovery: recv_sys_t::alloc(size_t): Assertion 'len <= srv_page_size' failed when running the following tests: ./mtr --no-reorder innodb_zip.blob,4k innodb_zip.bug56680,4k	2020-02-17 10:13:32 +02:00
Marko Mäkelä	f8a9f90667	MDEV-12353: Remove support for crash-upgrade We tighten some assertions regarding dict_index_t::is_dummy and crash recovery, now that redo log processing will no longer create dummy objects.	2020-02-13 19:13:45 +02:00

1 2 3 4 5

232 commits