mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 11:01:52 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	ecb913c2a5	Cleanup: Compare page_id_t directly	2020-10-15 17:06:16 +03:00
Marko Mäkelä	7cffb5f6e8	MDEV-23399: Performance regression with write workloads The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will not be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit `d1ab89037a` unless log_warnings>2. Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub	2020-10-15 17:04:56 +03:00
Marko Mäkelä	1c65ab26de	Merge 10.4 into 10.5	2020-10-01 14:30:17 +03:00
Marko Mäkelä	bd64c2e8cc	Cleanup: Remove unnecessary trx_i_s_cache_t::last_read_mutex We can simply use C++11 std::atomic for avoiding undefined behaviour related to concurrent stores to a shared variable. On most if not all ISAs, std::memory_order_relaxed loads and stores will not really differ from non-atomic loads or stores.	2020-10-01 13:51:31 +03:00
Marko Mäkelä	e57c1167ab	MDEV-23806 Undo page corruption on recovery When commit `0fd3def284` removed the MLOG_UNDO_ERASE_END record in MariaDB 10.3.3 in an attempt to reduce our redo log volume, it introduced technical debt for commit `56f6dab1d0` (MDEV-21174) and commit `7ae21b18a6` (MDEV-12353), which optimized mtr_t::write() (née mlog_write_ulint()) so that the initial bytes that are unchanged are not logged. trx_undo_report_row_operation(): Write a log record for the memset() operation if the page is not going to be freed.	2020-09-24 14:12:22 +03:00
Marko Mäkelä	00cd53d39a	MDEV-23719: Make lock_sys use page_id_t Since commit `8ccb3caafb` it should be more efficient to use page_id_t rather than two separate variables for tablespace identifier and page number. lock_rec_fold(): Replaced with page_id_t::fold(). lock_rec_hash(): Replaced with lock_sys.hash(page_id). lock_rec_expl_exist_on_page(), lock_rec_get_first_on_page_addr(), lock_rec_get_first_on_page(): Replaced with lock_sys.get_first().	2020-09-17 14:08:41 +03:00
Marko Mäkelä	3421223363	Merge 10.4 into 10.5	2020-09-09 16:57:30 +03:00
Marko Mäkelä	66ae50a564	Merge 10.3 into 10.4	2020-09-09 15:00:21 +03:00
Marko Mäkelä	7e07e38cf6	Merge 10.2 into 10.3	2020-09-09 13:06:46 +03:00
Thirunarayanan Balathandayuthapani	b1009ae5c1	MDEV-23456 fil_space_crypt_t::write_page0() is accessing an uninitialized page buf_page_create() is invoked when page is initialized. So that previous contents of the page ignored. In few cases, it calls buf_page_get_gen() is called to fetch the page from buffer pool. It should take x-latch on the page. If other thread uses the block or block io state is different from BUF_IO_NONE then release the mutex and check the state and buffer fix count again. For compressed page, use the existing free block from LRU list to create new page. Retry to fetch the compressed page if it is in flush list fseg_create(), fseg_create_general(): Introduce block as a parameter where segment header is placed. It is used to avoid repetitive x-latch on the same page Change the assert to check whether the page has SX latch and X latch in all callee function of buf_page_create() mtr_t::get_fix_count(): Get the buffer fix count of the given block added by the mtr FindBlock is added to find the buffer fix count of the given block acquired by the mini-transaction	2020-09-09 11:58:15 +05:30
Marko Mäkelä	5ff7e68c7e	Merge 10.4 into 10.5	2020-09-04 18:44:44 +03:00
Marko Mäkelä	1cda462f46	Merge 10.3 into 10.4	2020-09-03 16:55:14 +03:00
Marko Mäkelä	938db04898	Cleanup: Remove os0proc.*	2020-09-03 16:40:42 +03:00
Marko Mäkelä	a7dd7c8993	MDEV-23651 InnoDB: Failing assertion: !space->referenced() commit `de942c9f61` (MDEV-15983) introduced a race condition that we inadequately fixed in commit `93b69825ad` (MDEV-16169). Because fil_space_t::release() or fil_space_t::acquire() are not protected by fil_system.mutex like their predecessors, it is possible that stop_new_ops was set between the time a thread checked fil_space_t::is_stopping() and invoked fil_space_t::acquire(). In an execution trace, this happened in fil_system_t::keyrotate_next(), causing an assertion failure in fil_delete_tablespace() in the other thread that seeked to stop new operations. We fix this bug by merging the flag fil_space_t::stop_new_ops and the reference count fil_space_t::n_pending_ops into a single word that is only being accessed by atomic memory operations. fil_space_t::set_stopping(): Accessor for changing the state of the former stop_new_ops flag. fil_space_t::acquire(): Return whether the acquisition succeeded. It would fail between set_stopping(true) and set_stopping(false).	2020-09-03 14:49:11 +03:00
Marko Mäkelä	97a4a3872e	Merge 10.4 into 10.5	2020-08-26 12:02:07 +03:00
Marko Mäkelä	1e08e08ccb	Merge 10.3 into 10.4	2020-08-26 11:30:20 +03:00
Marko Mäkelä	6a042281bd	Merge 10.2 into 10.3	2020-08-26 10:45:47 +03:00
Marko Mäkelä	8cf8ad86d4	MDEV-23547 InnoDB: Failing assertion: *len in row_upd_ext_fetch This bug was originally repeated on 10.4 after defining a UNIQUE KEY on a TEXT column, which is implemented by MDEV-371 by creating the index on a hidden virtual column. While row_vers_vc_matches_cluster() is executing in a purge thread to find out if an index entry may be removed in a secondary index that comprises a virtual column, another purge thread may process the undo log record that this check is interested in, and write a null BLOB pointer in that record. This would trip the assertion. To prevent this from occurring, we must propagate the 'missing BLOB' error up the call stack. row_upd_ext_fetch(): Return NULL when the error occurs. row_upd_index_replace_new_col_val(): Return whether the previous version was built successfully. row_upd_index_replace_new_col_vals_index_pos(): Check the error result. Yes, we would intentionally crash on this error if it occurs outside the purge thread. row_upd_index_replace_new_col_vals(): Check for the error condition, and simplify the logic. trx_undo_prev_version_build(): Check for the error condition.	2020-08-25 15:32:15 +03:00
Marko Mäkelä	0b4ed0b7fb	Merge 10.4 into 10.5	2020-08-21 20:32:04 +03:00
Marko Mäkelä	aa6cb7ed03	Merge 10.3 into 10.4	2020-08-21 19:57:22 +03:00
Marko Mäkelä	c277bcd591	Merge 10.2 into 10.3	2020-08-21 19:18:34 +03:00
Marko Mäkelä	f3160ee44f	MDEV-22782 AddressSanitizer race condition in trx_free() In trx_free() we used to declare the entire trx_t unaccessible and then declare that some data members are accessible. This involves a race condition with other threads that may concurrently access the data members that must remain accessible. One type of error is "AddressSanitizer: unknown-crash", whose exact cause we have not determined. Another type of error (reported in MDEV-23472) is "use-after-poison", where the reported shadow bytes would in fact be 00, indicating that the memory was no longer poisoned. The poison-access-unpoison race condition was confirmed by "rr replay". We eliminate the race condition by invoking MEM_NOACCESS on each individual data member of trx_t before freeing the memory to the pool. The memory would not be unpoisoned until the pool is freed or the memory is being reused for another allocation. trx_t::free(): Replaces trx_free(). trx_t::active_commit_ordered: Changed to bool, so that MEM_NOACCESS can be invoked. Removed some accessor functions. Pool: Remove all MEM_ instrumentation. TrxFactory: Move the MEM_ instrumentation from Pool. TrxFactory::debug(): Removed. Moved to trx_t::free(). Because the memory was already marked unaccessible in trx_t::free(), the Factory::debug() call in Pool::putl() would be unable to access it. trx_allocate_for_background(): Replaces trx_create_low(). trx_t::free(): Perform all consistency checks while avoiding duplication, and declare most data members unaccessible.	2020-08-21 18:23:28 +03:00
Marko Mäkelä	d5d8756de3	Merge 10.4 into 10.5	2020-08-20 12:52:44 +03:00
Marko Mäkelä	2fa9f8c53a	Merge 10.3 into 10.4	2020-08-20 11:01:47 +03:00
Marko Mäkelä	de0e7cd72a	Merge 10.2 into 10.3	2020-08-20 09:12:16 +03:00
Marko Mäkelä	309302a3da	MDEV-23475 InnoDB performance regression for write-heavy workloads In commit `fe39d02f51` (MDEV-20638) we removed some wake-up signaling of the master thread that should have been there, to ensure a steady log checkpointing workload. Common sense suggests that the commit omitted some necessary calls to srv_inc_activity_count(). But, an attempt to add the call to trx_flush_log_if_needed_low() as well as to reinstate the function innobase_active_small() did not restore the performance for the case where sync_binlog=1 is set. Therefore, we will revert the entire commit in MariaDB Server 10.2. In MariaDB Server 10.5, adding a srv_inc_activity_count() call to trx_flush_log_if_needed_low() did restore the performance, so we will not revert MDEV-20638 across all versions.	2020-08-19 11:18:56 +03:00
Marko Mäkelä	bbd70fcc43	MDEV-23379 Deprecate&ignore InnoDB concurrency throttling parameters The parameters innodb_thread_concurrency and innodb_commit_concurrency were useful years ago when both computing resources and the implementation of some shared data structures were limited. MySQL 5.0 or 5.1 had trouble scaling beyond 8 concurrent connections. Most of the scalability bottlenecks have been removed since then, and the transactions per second delivered by MariaDB Server 10.5 should not dramatically drop upon exceeding the 'optimal' number of connections. Hence, enabling any concurrency throttling for InnoDB actually makes things worse. We have seen many customers mistakenly setting this to a small value like 16 or 64 and then complaining the server was slow. Ignoring the parameters allows us to remove some normally unused code and data structures, which could slightly improve performance. innodb_thread_concurrency, innodb_commit_concurrency, innodb_replication_delay, innodb_concurrency_tickets, innodb_thread_sleep_delay, innodb_adaptive_max_sleep_delay: Deprecate and ignore; hard-wire to 0. The column INFORMATION_SCHEMA.INNODB_TRX.trx_concurrency_tickets will always report 0.	2020-08-04 06:59:29 +03:00
Marko Mäkelä	50a11f396a	Merge 10.4 into 10.5	2020-08-01 14:42:51 +03:00
Marko Mäkelä	9216114ce7	Merge 10.3 into 10.4	2020-07-31 18:09:08 +03:00
Marko Mäkelä	66ec3a770f	Merge 10.2 into 10.3	2020-07-31 13:51:28 +03:00
Marko Mäkelä	c5d4dd2533	MDEV-23339 innodb_force_recovery=2 may still abort the rollback of recovered transactions trx_rollback_active(), trx_rollback_resurrected(): Replace an incorrect condition that we failed to replace in commit `b68f1d847f` (MDEV-21217).	2020-07-30 09:24:36 +03:00
Thirunarayanan Balathandayuthapani	fe39d02f51	MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low() - Due to commit `fe95cb2e40` (MDEV-16125), InnoDB master thread does not need to call srv_resume_thread() and therefore there is no need to wake up the thread. Due to the above patch, InnoDB should remove the following dead code. srv_check_activity(): Makes the parameter as in,out and returns the recent activity value innobase_active_small(): Removed srv_active_wake_master_thread(): Removed srv_wake_master_thread(): Removed srv_active_wake_master_thread_low(): Removed Simplify srv_master_thread() and remove switch cases, added the assert. Replace srv_wake_master_thread() with srv_inc_activity_count() INNOBASE_WAKE_INTERVAL: Removed	2020-07-23 16:23:20 +05:30
Marko Mäkelä	4ec032b492	Merge 10.4 into 10.5	2020-07-21 17:33:16 +03:00
Marko Mäkelä	b1538f4d60	Merge 10.3 into 10.4	2020-07-21 16:36:47 +03:00
Marko Mäkelä	b75563cdfd	MDEV-15880: ASAN heap-use-after-free with innodb_evict_tables_on_commit_debug trx_update_mod_tables_timestamp(): When implementing innodb_evict_tables_on_commit_debug, do not evict tables on which transactional locks exist. This debug variable was broken since its introduction in commit `947b0b5722`.	2020-07-21 16:03:08 +03:00
Marko Mäkelä	4d4865de6f	Merge 10.4 into 10.5	2020-07-20 15:55:59 +03:00
Marko Mäkelä	4b959bd8df	Merge 10.3 into 10.4	2020-07-20 15:34:59 +03:00
Marko Mäkelä	acc58fd835	Merge 10.2 into 10.3	2020-07-20 15:11:59 +03:00
Marko Mäkelä	ca9276e37e	Merge 10.1 into 10.2	2020-07-20 14:53:24 +03:00
Marko Mäkelä	57ec42bc32	MDEV-23190 InnoDB data file extension is not crash-safe When InnoDB is extending a data file, it is updating the FSP_SIZE field in the first page of the data file. In commit `8451e09073` (MDEV-11556) we removed a work-around for this bug and made recovery stricter, by making it track changes to FSP_SIZE via redo log records, and extend the data files before any changes are being applied to them. It turns out that the function fsp_fill_free_list() is not crash-safe with respect to this when it is initializing the change buffer bitmap page (page 1, or generally, N*innodb_page_size+1). It uses a separate mini-transaction that is committed (and will be written to the redo log file) before the mini-transaction that actually extended the data file. Hence, recovery can observe a reference to a page that is beyond the current end of the data file. fsp_fill_free_list(): Initialize the change buffer bitmap page in the same mini-transaction. The rest of the changes are fixing a bug that the use of the separate mini-transaction was attempting to work around. Namely, we must ensure that no other thread will access the change buffer bitmap page before our mini-transaction has been committed and all page latches have been released. That is, for read-ahead as well as neighbour flushing, we must avoid accessing pages that might not yet be durably part of the tablespace. fil_space_t::committed_size: The size of the tablespace as persisted by mtr_commit(). fil_space_t::max_page_number_for_io(): Limit the highest page number for I/O batches to committed_size. MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch. mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch. mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK, copy space->size to space->committed_size. In this way, read-ahead or flushing will never be invoked on pages that do not yet exist according to FSP_SIZE.	2020-07-20 14:48:56 +03:00
Marko Mäkelä	2d00e003b2	After-merge fixes for ASAN The merge commit `0fd89a1a89` of commit `b6ec1e8bbf` was slightly incomplete. ReadView::mem_valid(): Use the correct primitive MEM_MAKE_ADDRESSABLE(), because MEM_UNDEFINED() now has no effect on ASAN. recv_sys_t::alloc(), recv_sys_t::add(): Use MEM_MAKE_ADDRESSABLE() instead of MEM_UNDEFINED(), to get the correct behaviour for ASAN. For Valgrind and MSAN, there is no change in behaviour. recv_sys_t::free(), recv_sys_t::clear(): Before freeing memory to buf_pool.free_list, invoke MEM_MAKE_ADDRESSABLE() on the entire buf_block_t::frame, to cancel the effect of MEM_NOACCESS() in recv_sys_t::alloc().	2020-07-04 14:28:11 +03:00
Monty	0fd89a1a89	Merge remote-tracking branch 'origin/10.4' into 10.5	2020-07-03 23:31:12 +03:00
Monty	5211af1c16	Merge remote-tracking branch 'origin/10.3' into 10.4	2020-07-03 00:35:28 +03:00
Marko Mäkelä	b6ec1e8bbf	MDEV-20377 post-fix: Introduce MEM_MAKE_ADDRESSABLE In AddressSanitizer, we only want memory poisoning to happen in connection with custom memory allocation or freeing. The primary use of MEM_UNDEFINED is for declaring memory uninitialized in Valgrind or MemorySanitizer. We do not want MEM_UNDEFINED to have the unwanted side effect that AddressSanitizer would no longer be able to complain about accessing unallocated memory. MEM_UNDEFINED(): Define as no-op for AddressSanitizer. MEM_MAKE_ADDRESSABLE(): Define as MEM_UNDEFINED() or ASAN_UNPOISON_MEMORY_REGION(). MEM_CHECK_ADDRESSABLE(): Wrap also __asan_region_is_poisoned().	2020-07-02 17:59:28 +03:00
Monty	65f831d17c	Fixed bugs found by valgrind - Some of the bug fixes are backports from 10.5! - The fix in innobase/fil/fil0fil.cc is just a backport to get less error messages in mysqld.1.err when running with valgrind. - Renamed HAVE_valgrind_or_MSAN to HAVE_valgrind	2020-07-02 17:57:34 +03:00
Marko Mäkelä	1813d92d0c	Merge 10.4 into 10.5	2020-07-02 09:41:44 +03:00
Marko Mäkelä	f347b3e0e6	Merge 10.3 into 10.4	2020-07-02 07:39:33 +03:00
Marko Mäkelä	1df1a63924	Merge 10.2 into 10.3	2020-07-02 06:17:51 +03:00
Marko Mäkelä	c36834c832	MDEV-20377: Make WITH_MSAN more usable MemorySanitizer (clang -fsanitize=memory) requires that all code be compiled with instrumentation enabled. The only exception is the C runtime library. Failure to use instrumented libraries will cause bogus messages about memory being uninitialized. In WITH_MSAN builds, we must avoid calling getservbyname(), because even though it is a standard library function, it is not instrumented, not even in clang 10. Note: Before MariaDB Server 10.5, ./mtr will typically fail due to the old PCRE library, which was updated in MDEV-14024. The following cmake options were tested on 10.5 in commit `94d0bb4dbe`: cmake \ -DCMAKE_C_FLAGS='-march=native -O2' \ -DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \ -DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \ -DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \ -DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \ -DWITH_SAFEMALLOC=OFF \ -DWITH_{ZLIB,SSL,PCRE}=bundled \ -DHAVE_LIBAIO_H=0 \ -DWITH_MSAN=ON MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED() and __msan_unpoison(). MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow(). InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros. ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in functions instead of inline assembler when building WITH_MSAN. This will require at least -msse4.2 when building for IA-32 or AMD64. The inline assembler would not be instrumented, and would thus cause bogus failures.	2020-07-01 17:23:00 +03:00
Eugene Kosov	e9c389c334	MDEV-22701 InnoDB: encapsulate trx_sys.mutex and trx_sys.trx_list into a separate class thread_safe_trx_ilist_t: almost generic one UT_LIST was replaced with ilist<t> innobase_kill_query: wrong comment removed.	2020-06-23 19:11:57 +03:00

1 2 3 4 5 ...

1146 commits