mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	4f4a4cf9eb	MDEV-23399 fixup: Use plain pthread_cond The condition variables that were introduced in commit `7cffb5f6e8` (MDEV-23399) are never instrumented with PERFORMANCE_SCHEMA. Let us avoid the storage overhead and dead code.	2021-02-07 12:19:24 +02:00
Marko Mäkelä	83591a23d6	MDEV-24350 buf_dblwr unnecessarily uses memory-intensive srv_stats counters The counters in srv_stats use std::atomic and multiple cache lines per counter. This is an overkill in a case where a critical section already exists in the code. A regular variable will work just fine, with much smaller memory bus impact.	2020-12-04 17:52:23 +02:00
Marko Mäkelä	a5a2ef079c	MDEV-23855: Implement asynchronous doublewrite Synchronous writes and calls to fdatasync(), fsync() or FlushFileBuffers() would ruin performance. So, let us submit asynchronous writes for the doublewrite buffer. We submit a single request for the likely case that the two doublewrite buffers are contiquous in the system tablespace. buf_dblwr_t::flush_buffered_writes_completed(): The completion callback of buf_dblwr_t::flush_buffered_writes(). os_aio_wait_until_no_pending_writes(): Also wait for doublewrite batches. buf_dblwr_t::element::space: Remove. We can simply use element::request.node->space instead. Reviewed by: Vladislav Vaintroub	2020-10-26 17:53:55 +02:00
Marko Mäkelä	ef3f71fa74	MDEV-23399 fixup: Interleaved doublewrite batches Author: Vladislav Vaintroub	2020-10-26 17:53:54 +02:00
Marko Mäkelä	45ed9dd957	MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD When the maximum configured number of file is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation. fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed. IORequest: Fold more parameters to IORequest::type. fil_space_t::io(): Replaces fil_io(). fil_space_t::flush(): Replaces fil_flush(). OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low(). We will always ignore some errors for background reads. This should reduce fil_system.mutex contention a little. fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called. fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path. xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental. Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	7cffb5f6e8	MDEV-23399: Performance regression with write workloads The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will not be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit `d1ab89037a` unless log_warnings>2. Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub	2020-10-15 17:04:56 +03:00
Marko Mäkelä	c7399db645	MDEV-21174 fixup: Remove buf_dblwr_being_created	2020-09-30 13:50:00 +03:00
Marko Mäkelä	b1ab211dee	MDEV-15053 Reduce buf_pool_t::mutex contention User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0 and will no longer report the PAGE_STATE value READY_FOR_USE. We will remove some fields from buf_page_t and move much code to member functions of buf_pool_t and buf_page_t, so that the access rules of data members can be enforced consistently. Evicting or adding pages in buf_pool.LRU will remain covered by buf_pool.mutex. Evicting or adding pages in buf_pool.page_hash will remain covered by both buf_pool.mutex and the buf_pool.page_hash X-latch. After this fix, buf_pool.page_hash lookups can entirely avoid acquiring buf_pool.mutex, only relying on buf_pool.hash_lock_get() S-latch. Similarly, buf_flush_check_neighbors() can will rely solely on buf_pool.mutex, no buf_pool.page_hash latch at all. The buf_pool.mutex is rather contended in I/O heavy benchmarks, especially when the workload does not fit in the buffer pool. The first attempt to alleviate the contention was the buf_pool_t::mutex split in commit `4ed7082eef` which introduced buf_block_t::mutex, which we are now removing. Later, multiple instances of buf_pool_t were introduced in commit `c18084f71b` and recently removed by us in commit `1a6f708ec5` (MDEV-15058). UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool related debugging in otherwise non-debug builds has not been used for years. Instead, we have been using UNIV_DEBUG, which is enabled in CMAKE_BUILD_TYPE=Debug. buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on std::atomic and the buf_pool.page_hash latches, and in some cases depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before. We must always release buf_block_t::lock before invoking unfix() or io_unfix(), to prevent a glitch where a block that was added to the buf_pool.free list would apper X-latched. See commit `c5883debd6` how this glitch was finally caught in a debug environment. We move some buf_pool_t::page_hash specific code from the ha and hash modules to buf_pool, for improved readability. buf_pool_t::close(): Assert that all blocks are clean, except on aborted startup or crash-like shutdown. buf_pool_t::validate(): No longer attempt to validate n_flush[] against the number of BUF_IO_WRITE fixed blocks, because buf_page_t::flush_type no longer exists. buf_pool_t::watch_set(): Replaces buf_pool_watch_set(). Reduce mutex contention by separating the buf_pool.watch[] allocation and the insert into buf_pool.page_hash. buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a buf_pool.page_hash latch. Replaces and extends buf_page_hash_lock_s_confirm() and buf_page_hash_lock_x_confirm(). buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES. buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads: Use Atomic_counter. buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out(). buf_pool_t::LRU_remove(): Remove a block from the LRU list and return its predecessor. Incorporates buf_LRU_adjust_hp(), which was removed. buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(), for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by BTR_DELETE_OP (purge), which is never invoked on temporary tables. buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments. buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition. buf_LRU_free_page(): Clarify the function comment. buf_flush_check_neighbor(), buf_flush_check_neighbors(): Rewrite the construction of the page hash range. We will hold the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64) consecutive lookups of buf_pool.page_hash. buf_flush_page_and_try_neighbors(): Remove. Merge to its only callers, and remove redundant operations in buf_flush_LRU_list_batch(). buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite. Do not acquire buf_pool.mutex, and iterate directly with page_id_t. ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined and avoids any loops. fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove. buf_flush_page(): Add a fil_space_t* parameter. Minimize the buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated atomically with the io_fix, and we will protect most buf_block_t fields with buf_block_t::lock. The function buf_flush_write_block_low() is removed and merged here. buf_page_init_for_read(): Use static linkage. Initialize the newly allocated block and acquire the exclusive buf_block_t::lock while not holding any mutex. IORequest::IORequest(): Remove the body. We only need to invoke set_punch_hole() in buf_flush_page() and nowhere else. buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type. This field is only used during a fil_io() call. That function already takes IORequest as a parameter, so we had better introduce for the rarely changing field. buf_block_t::init(): Replaces buf_page_init(). buf_page_t::init(): Replaces buf_page_init_low(). buf_block_t::initialise(): Initialise many fields, but keep the buf_page_t::state(). Both buf_pool_t::validate() and buf_page_optimistic_get() requires that buf_page_t::in_file() be protected atomically with buf_page_t::in_page_hash and buf_page_t::in_LRU_list. buf_page_optimistic_get(): Now that buf_block_t::mutex no longer exists, we must check buf_page_t::io_fix() after acquiring the buf_pool.page_hash lock, to detect whether buf_page_init_for_read() has been initiated. We will also check the io_fix() before acquiring hash_lock in order to avoid unnecessary computation. The field buf_block_t::modify_clock (protected by buf_block_t::lock) allows buf_page_optimistic_get() to validate the block. buf_page_t::real_size: Remove. It was only used while flushing pages of page_compressed tables. buf_page_encrypt(): Add an output parameter that allows us ot eliminate buf_page_t::real_size. Replace a condition with debug assertion. buf_page_should_punch_hole(): Remove. buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch(). Add the parameter size (to replace buf_page_t::real_size). buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page(). Add the parameter size (to replace buf_page_t::real_size). fil_system_t::detach(): Replaces fil_space_detach(). Ensure that fil_validate() will not be violated even if fil_system.mutex is released and reacquired. fil_node_t::complete_io(): Renamed from fil_node_complete_io(). fil_node_t::close_to_free(): Replaces fil_node_close_to_free(). Avoid invoking fil_node_t::close() because fil_system.n_open has already been decremented in fil_space_t::detach(). BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY. BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE, and distinguish dirty pages by buf_page_t::oldest_modification(). BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead. This state was only being used for buf_page_t that are in buf_pool.watch. buf_pool_t::watch[]: Remove pointer indirection. buf_page_t::in_flush_list: Remove. It was set if and only if buf_page_t::oldest_modification() is nonzero. buf_page_decrypt_after_read(), buf_corrupt_page_release(), buf_page_check_corrupt(): Change the const fil_space_t* parameter to const fil_node_t& so that we can report the correct file name. buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function. buf_page_io_complete(): Split to buf_page_read_complete() and buf_page_write_complete(). buf_dblwr_t::in_use: Remove. buf_dblwr_t::buf_block_array: Add IORequest::flush_t. buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of os_aio_wait_until_no_pending_writes(). buf_flush_write_complete(): Declare static, not global. Add the parameter IORequest::flush_t. buf_flush_freed_page(): Simplify the code. recv_sys_t::flush_lru: Renamed from flush_type and changed to bool. fil_read(), fil_write(): Replaced with direct use of fil_io(). fil_buffering_disabled(): Remove. Check srv_file_flush_method directly. fil_mutex_enter_and_prepare_for_io(): Return the resolved fil_space_t* to avoid a duplicated lookup in the caller. fil_report_invalid_page_access(): Clean up the parameters. fil_io(): Return fil_io_t, which comprises fil_node_t and error code. Always invoke fil_space_t::acquire_for_io() and let either the sync=true caller or fil_aio_callback() invoke fil_space_t::release_for_io(). fil_aio_callback(): Rewrite to replace buf_page_io_complete(). fil_check_pending_operations(): Remove a parameter, and remove some redundant lookups. fil_node_close_to_free(): Wait for n_pending==0. Because we no longer do an extra lookup of the tablespace between fil_io() and the completion of the operation, we must give fil_node_t::complete_io() a chance to decrement the counter. fil_close_tablespace(): Remove unused parameter trx, and document that this is only invoked during the error handling of IMPORT TABLESPACE. row_import_discard_changes(): Merged with the only caller, row_import_cleanup(). Do not lock up the data dictionary while invoking fil_close_tablespace(). logs_empty_and_mark_files_at_shutdown(): Do not invoke fil_close_all_files(), to avoid a !needs_flush assertion failure on fil_node_t::close(). innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files(). fil_close_all_files(): Invoke fil_flush_file_spaces() to ensure proper durability. thread_pool::unbind(): Fix a crash that would occur on Windows after srv_thread_pool->disable_aio() and os_file_close(). This fix was submitted by Vladislav Vaintroub. Thanks to Matthias Leich and Axel Schwenke for extensive testing, Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.	2020-06-05 12:35:46 +03:00
Marko Mäkelä	42a4ae54c2	MDEV-21225 Remove ut_align() and use aligned_malloc() Before commit `90c52e5291` introduced aligned_malloc(), InnoDB always used a pattern of over-allocating memory and invoking ut_align() to guarantee the desired alignment. It is cleaner to invoke aligned_malloc() and aligned_free() directly. ut_align(): Remove. In assertions, ut_align_down() can be used instead.	2019-12-05 06:42:31 +02:00
Vladislav Vaintroub	5e62b6a5e0	MDEV-16264 Use threadpool for Innodb background work. Almost all threads have gone - the "ticking" threads, that sleep a while then do some work) (srv_monitor_thread, srv_error_monitor_thread, srv_master_thread) were replaced with timers. Some timers are periodic, e.g the "master" timer. - The btr_defragment_thread is also replaced by a timer , which reschedules it self when current defragment "item" needs throttling - the buf_resize_thread and buf_dump_threads are substitutes with tasks Ditto with page cleaner workers. - purge workers threads are not tasks as well, and purge cleaner coordinator is a combination of a task and timer. - All AIO is outsourced to tpool, Innodb just calls thread_pool::submit_io() and provides the callback. - The srv_slot_t was removed, and innodb_debug_sync used in purge is currently not working, and needs reimplementation.	2019-11-15 18:09:30 +01:00
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	c0ac0b8860	Update FSF address	2019-05-11 19:25:02 +03:00
Marko Mäkelä	0abd2766b1	Merge 10.2 into 10.3 Also, related to MDEV-15522, MDEV-17304, MDEV-17835, remove the Galera xtrabackup tests, because xtrabackup never worked with MariaDB Server 10.3 due to InnoDB redo log format changes.	2018-11-30 09:38:56 +02:00
Marko Mäkelä	447e493179	Remove some unnecessary InnoDB #include	2018-11-29 12:53:44 +02:00
Marko Mäkelä	a90100d756	Replace univ_page_size and UNIV_PAGE_SIZE Try to use one variable (srv_page_size) for innodb_page_size. Also, replace UNIV_PAGE_SIZE_SHIFT with srv_page_size_shift.	2018-04-28 20:45:45 +03:00
Marko Mäkelä	2d8fdfbde5	Merge 10.1 into 10.2 Replace have_innodb_zip.inc with innodb_page_size_small.inc.	2017-06-08 12:45:08 +03:00
Marko Mäkelä	fbeb9489cd	Cleanup of MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M The doublewrite buffer pages must fit in the first InnoDB system tablespace data file. The checks that were added in the initial patch (commit `112b21da37`) were at too high level and did not cover all cases. innodb.log_data_file_size: Test all innodb_page_size combinations. fsp_header_init(): Never return an error. Move the change buffer creation to the only caller that needs to do it. btr_create(): Clean up the logic. Remove the error log messages. buf_dblwr_create(): Try to return an error on non-fatal failure. Check that the first data file is big enough for creating the doublewrite buffers. buf_dblwr_process(): Check if the doublewrite buffer is available. Display the message only if it is available. recv_recovery_from_checkpoint_start_func(): Remove a redundant message about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already been initiated. fil_report_invalid_page_access(): Simplify the message. fseg_create_general(): Do not emit messages to the error log. innobase_init(): Revert the changes. trx_rseg_create(): Refactor (no functional change).	2017-06-08 11:55:47 +03:00
Marko Mäkelä	8f643e2063	Merge 10.1 into 10.2	2017-05-23 11:09:47 +03:00
Vicențiu Ciorbaru	b87873b221	Merge branch 'merge-innodb-5.6' into bb-10.0-vicentiu This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293 from current 5.6.36 innodb. Bug #23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE INDEX APPLIED BY UNCOMMITTED ROW Problem: ======== row_search_for_mysql() does whole table traversal for range query even though the end range is passed. Whole table traversal happens when the record is not with in transaction read view. Solution: ========= Convert the innodb last record of page to mysql format and compare with end range if the traversal of row_search_mvcc() exceeds 100, no ICP involved. If it is out of range then InnoDB can avoid the whole table traversal. Need to refactor the code little bit to make it compile. Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com> Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com> Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com> RB: 14660	2017-05-17 14:53:28 +03:00
Vicențiu Ciorbaru	0af9818240	5.6.36	2017-05-15 17:17:16 +03:00
Marko Mäkelä	da76c1bd3e	Minor cleanup	2017-04-26 23:03:28 +03:00
Marko Mäkelä	a13a636c74	MDEV-11802 innodb.innodb_bug14676111 fails The function trx_purge_stop() was calling os_event_reset(purge_sys->event) before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set() call in srv_purge_coordinator_suspend() is protected by that X-latch. It would seem a good idea to consistently protect both os_event_set() and os_event_reset() calls with a common mutex or rw-lock in those cases where os_event_set() and os_event_reset() are used like condition variables, tied to changes of shared state. For each os_event_t, we try to document the mutex or rw-lock that is being used. For some events, frequent calls to os_event_set() seem to try to avoid hangs. Some events are never waited for infinitely, only timed waits, and os_event_set() is used for early termination of these waits. os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro on other systems than Windows. TODO: remove this altogether and disable innodb_use_native_aio on Windows. os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0. log_write_flush_to_disk_low(): Invoke log_mutex_enter() at the end, to avoid race conditions when changing the system state. (No potential race condition existed before MySQL 5.7.)	2017-02-20 12:32:43 +02:00
Marko Mäkelä	13493078e9	MDEV-11802 innodb.innodb_bug14676111 fails The function trx_purge_stop() was calling os_event_reset(purge_sys->event) before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set() call in srv_purge_coordinator_suspend() is protected by that X-latch. It would seem a good idea to consistently protect both os_event_set() and os_event_reset() calls with a common mutex or rw-lock in those cases where os_event_set() and os_event_reset() are used like condition variables, tied to changes of shared state. For each os_event_t, we try to document the mutex or rw-lock that is being used. For some events, frequent calls to os_event_set() seem to try to avoid hangs. Some events are never waited for infinitely, only timed waits, and os_event_set() is used for early termination of these waits. os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro on other systems than Windows. TODO: remove this altogether and disable innodb_use_native_aio on Windows. os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.	2017-02-20 12:20:52 +02:00
Marko Mäkelä	63574f1275	MDEV-11690 Remove UNIV_HOTBACKUP The InnoDB source code contains quite a few references to a closed-source hot backup tool which was originally called InnoDB Hot Backup (ibbackup) and later incorporated in MySQL Enterprise Backup. The open source backup tool XtraBackup uses the full database for recovery. So, the references to UNIV_HOTBACKUP are only cluttering the source code.	2016-12-30 16:05:42 +02:00
Jan Lindström	fec844aca8	Merge InnoDB 5.7 from mysql-5.7.14. Contains also: MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE \|\| m_lock_type != 2' failed. (branch bb-10.2-jan) Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan) enable tests that were fixed in MDEV-10549 MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan) fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables	2016-09-08 15:49:03 +03:00
Jan Lindström	2e814d4702	Merge InnoDB 5.7 from mysql-5.7.9. Contains also MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7 The failure happened because 5.7 has changed the signature of the bool handler::primary_key_is_clustered() const virtual function ("const" was added). InnoDB was using the old signature which caused the function not to be used. MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7 Fixed mutexing problem on lock_trx_handle_wait. Note that rpl_parallel and rpl_optimistic_parallel tests still fail. MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan) Reason: incorrect merge MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan) Reason: incorrect merge	2016-09-02 13:22:28 +03:00
Sergei Golubchik	6d06fbbd1d	move to storage/innobase	2015-05-04 19:17:21 +02:00
Sergei Golubchik	8ee9d19607	innodb 5.6.17	2014-05-07 17:32:23 +02:00
Sergei Golubchik	e2e5d07b28	MDEV-6184 10.0.11 merge InnoDB 5.6.16	2014-05-06 09:57:39 +02:00
Sergei Golubchik	27d45e4696	MDEV-5574 Set AUTO_INCREMENT below max value of column. Update InnoDB to 5.6.14 Apply MySQL-5.6 hack for MySQL Bug#16434374 Move Aria-only HA_RTREE_INDEX from my_base.h to maria_def.h (breaks an assert in InnoDB) Fix InnoDB memory leak	2014-02-01 09:33:26 +01:00
Michael Widenius	068c61978e	Temporary commit of 10.0-merge	2013-03-26 00:03:13 +02:00
Michael Widenius	1d0f70c2f8	Temporary commit of merge of MariaDB 10.0-base and MySQL 5.6	2012-08-01 17:27:34 +03:00

33 commits