mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 11:01:52 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	2b6f804490	Merge 10.2 into 10.3	2020-10-28 10:44:40 +02:00
Marko Mäkelä	a8de8f261d	Merge 10.2 into 10.3	2020-10-28 10:01:50 +02:00
Eugene Kosov	afc9d00c66	MDEV-23991 dict_table_stats_lock() has unnecessarily long scope Patch removes dict_index_t::stats_latch. Table/index statistics now protected with dict_sys->mutex. That way statistics computation can happen in parallel in several threads and dict_sys->mutex will be locked only for a short period of time. This patch is a joint work with Marko Mäkelä dict_index_t:🔒 make mutable which allows to pass const pointer when only lock is touched in an object btr_height_get() btr_get_size(): make index argument const for better type safety btr_estimate_number_of_different_key_vals(): now returns computed values instead of setting fields in dict_index_t directly remove everything related to dict_index_t::stats_latch dict_stats_index_set_n_diff(): now returns computed values instead of setting fields in dict_index_t directly dict_stats_analyze_index(): now returns computed values instead of setting fields in dict_index_t directly Reviewed by: Marko Mäkelä	2020-10-27 19:09:20 +03:00
Marko Mäkelä	42e1815ad8	MDEV-16952 Introduce SET GLOBAL innodb_max_purge_lag_wait Let us introduce a dummy variable innodb_max_purge_lag_wait for waiting that the InnoDB history list length is below the user-specified limit. Specifically, SET GLOBAL innodb_max_purge_lag_wait=0; should wait for all history to be purged. This could be useful when upgrading from an older version to MariaDB 10.3 or later, to avoid hitting MDEV-15912. Note: the history cannot be purged if there exist transactions that may see old versions. Reviewed by: Vladislav Vaintroub	2020-10-27 15:47:18 +02:00
Marko Mäkelä	00ddea4f2f	MDEV-24024 innodb.ibuf_not_empty failed in buildbot Probably due to the changes to page flushing in MDEV-23399 (commit `7cffb5f6e8`) the command CHECK TABLE would occasionally report a different number of rows for the corrupted secondary index. (The reported number was 991 instead of 990 on one occasion.) Let us map all numbers to 990 in the output. We only care that the injected corruption will be detected.	2020-10-27 09:52:42 +02:00
Marko Mäkelä	45ed9dd957	MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD When the maximum configured number of file is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation. fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed. IORequest: Fold more parameters to IORequest::type. fil_space_t::io(): Replaces fil_io(). fil_space_t::flush(): Replaces fil_flush(). OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low(). We will always ignore some errors for background reads. This should reduce fil_system.mutex contention a little. fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called. fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path. xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental. Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	3a9a3be1c6	MDEV-23855: Improve InnoDB log checkpoint performance After MDEV-15053, MDEV-22871, MDEV-23399 shifted the scalability bottleneck, log checkpoints became a new bottleneck. If innodb_io_capacity is set low or innodb_max_dirty_pct_lwm is set high and the workload fits in the buffer pool, the page cleaner thread will perform very little flushing. When we reach the capacity of the circular redo log file ib_logfile0 and must initiate a checkpoint, some 'furious flushing' will be necessary. (If innodb_flush_sync=OFF, then flushing would continue at the innodb_io_capacity rate, and writers would be throttled.) We have the best chance of advancing the checkpoint LSN immediately after a page flush batch has been completed. Hence, it is best to perform checkpoints after every batch in the page cleaner thread, attempting to run once per second. By initiating high-priority flushing in the page cleaner as early as possible, we aim to make the throughput more stable. The function buf_flush_wait_flushed() used to sleep for 10ms, hoping that the page cleaner thread would do something during that time. The observed end result was that a large number of threads that call log_free_check() would end up sleeping while nothing useful is happening. We will revise the design so that in the default innodb_flush_sync=ON mode, buf_flush_wait_flushed() will wake up the page cleaner thread to perform the necessary flushing, and it will wait for a signal from the page cleaner thread. If innodb_io_capacity is set to a low value (causing the page cleaner to throttle its work), a write workload would initially perform well, until the capacity of the circular ib_logfile0 is reached and log_free_check() will trigger checkpoints. At that point, the extra waiting in buf_flush_wait_flushed() will start reducing throughput. The page cleaner thread will also initiate log checkpoints after each buf_flush_lists() call, because that is the best point of time for the checkpoint LSN to advance by the maximum amount. Even in 'furious flushing' mode we invoke buf_flush_lists() with innodb_io_capacity_max pages at a time, and at the start of each batch (in the log_flush() callback function that runs in a separate task) we will invoke os_aio_wait_until_no_pending_writes(). This tweak allows the checkpoint to advance in smaller steps and significantly reduces the maximum latency. On an Intel Optane 960 NVMe SSD on Linux, it reduced from 4.6 seconds to 74 milliseconds. On Microsoft Windows with a slower SSD, it reduced from more than 180 seconds to 0.6 seconds. We will make innodb_adaptive_flushing=OFF simply flush innodb_io_capacity per second whenever the dirty proportion of buffer pool pages exceeds innodb_max_dirty_pages_pct_lwm. For innodb_adaptive_flushing=ON we try to make page_cleaner_flush_pages_recommendation() more consistent and predictable: if we are below innodb_adaptive_flushing_lwm, let us flush pages according to the return value of af_get_pct_for_dirty(). innodb_max_dirty_pages_pct_lwm: Revert the change of the default value that was made in MDEV-23399. The value innodb_max_dirty_pages_pct_lwm=0 guarantees that a shutdown of an idle server will be fast. Users might be surprised if normal shutdown suddenly became slower when upgrading within a GA release series. innodb_checkpoint_usec: Remove. The master task will no longer perform periodic log checkpoints. It is the duty of the page cleaner thread. log_sys.max_modified_age: Remove. The current span of the buf_pool.flush_list expressed in LSN only matters for adaptive flushing (outside the 'furious flushing' condition). For the correctness of checkpoints, the only thing that matters is the checkpoint age (log_sys.lsn - log_sys.last_checkpoint_lsn). This run-time constant was also reported as log_max_modified_age_sync. log_sys.max_checkpoint_age_async: Remove. This does not serve any purpose, because the checkpoints will now be triggered by the page cleaner thread. We will retain the log_sys.max_checkpoint_age limit for engaging 'furious flushing'. page_cleaner.slot: Remove. It turns out that page_cleaner_slot.flush_list_time was duplicating page_cleaner.slot.flush_time and page_cleaner.slot.flush_list_pass was duplicating page_cleaner.flush_pass. Likewise, there were some redundant monitor counters, because the page cleaner thread no longer performs any buf_pool.LRU flushing, and because there only is one buf_flush_page_cleaner thread. buf_flush_sync_lsn: Protect writes by buf_pool.flush_list_mutex. buf_pool_t::get_oldest_modification(): Add a parameter to specify the return value when no persistent data pages are dirty. Require the caller to hold buf_pool.flush_list_mutex. log_buf_pool_get_oldest_modification(): Take the fall-back LSN as a parameter. All callers will also invoke log_sys.get_lsn(). log_preflush_pool_modified_pages(): Replaced with buf_flush_wait_flushed(). buf_flush_wait_flushed(): Implement two limits. If not enough buffer pool has been flushed, signal the page cleaner (unless innodb_flush_sync=OFF) and wait for the page cleaner to complete. If the page cleaner thread is not running (which can be the case durign shutdown), initiate the flush and wait for it directly. buf_flush_ahead(): If innodb_flush_sync=ON (the default), submit a new buf_flush_sync_lsn target for the page cleaner but do not wait for the flushing to finish. log_get_capacity(), log_get_max_modified_age_async(): Remove, to make it easier to see that af_get_pct_for_lsn() is not acquiring any mutexes. page_cleaner_flush_pages_recommendation(): Protect all access to buf_pool.flush_list with buf_pool.flush_list_mutex. Previously there were some race conditions in the calculation. buf_flush_sync_for_checkpoint(): New function to process buf_flush_sync_lsn in the page cleaner thread. At the end of each batch, we try to wake up any blocked buf_flush_wait_flushed(). If everything up to buf_flush_sync_lsn has been flushed, we will reset buf_flush_sync_lsn=0. The page cleaner thread will keep 'furious flushing' until the limit is reached. Any threads that are waiting in buf_flush_wait_flushed() will be able to resume as soon as their own limit has been satisfied. buf_flush_page_cleaner: Prioritize buf_flush_sync_lsn and do not sleep as long as it is set. Do not update any page_cleaner statistics for this special mode of operation. In the normal mode (buf_flush_sync_lsn is not set for innodb_flush_sync=ON), try to wake up once per second. No longer check whether srv_inc_activity_count() has been called. After each batch, try to perform a log checkpoint, because the best chances for the checkpoint LSN to advance by the maximum amount are upon completing a flushing batch. log_t: Move buf_free, max_buf_free possibly to the same cache line with log_sys.mutex. log_margin_checkpoint_age(): Simplify the logic, and replace a 0.1-second sleep with a call to buf_flush_wait_flushed() to initiate flushing. Moved to the same compilation unit with the only caller. log_close(): Clean up the calculations. (Should be no functional change.) Return whether flush-ahead is needed. Moved to the same compilation unit with the only caller. mtr_t::finish_write(): Return whether flush-ahead is needed. mtr_t::commit(): Invoke buf_flush_ahead() when needed. Let us avoid external calls in mtr_t::commit() and make the logic easier to follow by having related code in a single compilation unit. Also, we will invoke srv_stats.log_write_requests.inc() only once per mini-transaction commit, while not holding mutexes. log_checkpoint_margin(): Only care about log_sys.max_checkpoint_age. Upon reaching log_sys.max_checkpoint_age where we must wait to prevent the log from getting corrupted, let us wait for at most 1MiB of LSN at a time, before rechecking the condition. This should allow writers to proceed even if the redo log capacity has been reached and 'furious flushing' is in progress. We no longer care about log_sys.max_modified_age_sync or log_sys.max_modified_age_async. The log_sys.max_modified_age_sync could be a relic from the time when there was a srv_master_thread that wrote dirty pages to data files. Also, we no longer have any log_sys.max_checkpoint_age_async limit, because log checkpoints will now be triggered by the page cleaner thread upon completing buf_flush_lists(). log_set_capacity(): Simplify the calculations of the limit (no functional change). log_checkpoint_low(): Split from log_checkpoint(). Moved to the same compilation unit with the caller. log_make_checkpoint(): Only wait for everything to be flushed until the current LSN. create_log_file(): After checkpoint, invoke log_write_up_to() to ensure that the FILE_CHECKPOINT record has been written. This avoids ut_ad(!srv_log_file_created) in create_log_file_rename(). srv_start(): Do not call recv_recovery_from_checkpoint_start() if the log has just been created. Set fil_system.space_id_reuse_warned before dict_boot() has been executed, and clear it after recovery has finished. dict_boot(): Initialize fil_system.max_assigned_id. srv_check_activity(): Remove. The activity count is counting transaction commits and therefore mostly interesting for the purge of history. BtrBulk::insert(): Do not explicitly wake up the page cleaner, but do invoke srv_inc_activity_count(), because that counter is still being used in buf_load_throttle_if_needed() for some heuristics. (It might be cleaner to execute buf_load() in the page cleaner thread!) Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Eugene Kosov	31cde275c2	MDEV-23356 InnoDB: Failing assertion: field->col->mtype == type, crash or ASAN failures in row_sel_convert_mysql_key_to_innobase, InnoDB indexes are inconsistent after INDEX changes innobase_rename_indexes_cache(): fix corruption of index cache. Index ids help distinguish indexes when their names clash. innobase_rename_indexes_cache(): fix corruption of index statistics table. Use unique temporary names to avoid names clashing. Reviewed by: Marko Mäkelä	2020-10-26 17:39:52 +03:00
Sergei Golubchik	05a878c139	precedence bugfixing fix printing precedence for BETWEEN, LIKE/ESCAPE, REGEXP, IN don't use precedence for printing CASE/WHEN/THEN/ELSE/END fix parsing precedence of BETWEEN, LIKE/ESCAPE, REGEXP, IN support predicate arguments for IN, BETWEEN, SOUNDS LIKE, LIKE/ESCAPE, REGEXP use %nonassoc for unary operators fix parsing of IS TRUE/FALSE/UNKNOWN/NULL remove parser_precedence test as superseded by the precedence test	2020-10-23 15:53:41 +02:00
Aleksey Midenkov	54c521ca33	MDEV-23672: incorrect v_indexes access fix v_indexes was wrongly copied from the source table.	2020-10-22 22:02:22 +03:00
Marko Mäkelä	1657b7a583	Merge 10.4 to 10.5	2020-10-22 17:08:49 +03:00
Marko Mäkelä	46957a6a77	Merge 10.3 into 10.4	2020-10-22 13:27:18 +03:00
Marko Mäkelä	1619ae8990	MDEV-23672: Partly revert an incorrect fix In commit `7eda556196` we removed a valid debug assertion. AddressSanitizer would trip when writing undo log for an INSERT operation that follows the problematic ALTER TABLE because the v_indexes would refer to an index object that was freed during the execution of ALTER TABLE. The operation to remove NOT NULL attribute from a column should lead into any affected indexes to be dropped and re-created. But, the SQL layer set the ha_alter_info->handler_flags to HA_INPLACE_ADD_UNIQUE_INDEX_NO_WRITE\|ALTER_COLUMN_NULLABLE (that is, InnoDB was asked to add an index and change the column, but not to drop the index that depended on the column). Let us restore the debug assertion that catches this problem outside AddressSanitizer, and 'defuse' the offending ALTER TABLE statement in the test until something has been fixed in the SQL layer.	2020-10-22 13:19:18 +03:00
Marko Mäkelä	e3d692aa09	Merge 10.2 into 10.3	2020-10-22 08:26:28 +03:00
Marko Mäkelä	620ea816ad	Merge 10.1 into 10.2	2020-10-21 14:02:04 +03:00
Monty	f9c432c5a9	Disable from valgrind big innodb tests that doesn't run well in valgrind	2020-10-21 03:09:29 +03:00
Aleksey Midenkov	7eda556196	MDEV-23672 Assertion `v.v_indexes.empty()' failed in dict_table_t::instant_column dict_v_idx_t node was shared between two dict_v_col_t objects because of wrong object copy. Replace memory plain copy with copy constructor. Tha patch also removes n_v_indexes property and improves "page full" judgements for trx_undo_log_v_idx().	2020-10-20 10:57:57 +03:00
Aleksey Midenkov	d1667fb837	MDEV-23852 alter table rename column to uppercase doesn't work Case-sensitive compare to detect column name case change in inplace alter rename.	2020-10-20 11:16:40 +03:00
Marko Mäkelä	7cffb5f6e8	MDEV-23399: Performance regression with write workloads The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will not be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit `d1ab89037a` unless log_warnings>2. Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub	2020-10-15 17:04:56 +03:00
Marko Mäkelä	a891fe6aa5	MDEV-23927 Crash in ./mtr --skip-innodb-fast-shutdown innodb.temporary_tables innodb_preshutdown(): On innodb_fast_shutdown=0, only wait for transactions to exit if the transaction system had been initialized. Reviewed by: Vladislav Vaintroub	2020-10-09 12:47:58 +03:00
Thirunarayanan Balathandayuthapani	6504d3d229	MDEV-23722 InnoDB: Assertion: result != FTS_INVALID in fts_trx_row_get_new_state Marking of deletion of row in fts index happens twice in self-referential foreign key relation. So while performing referential checks of foreign key, InnoDB can avoid updating of fts index if the foreign key has self-referential relationship. Reviewed-by: Marko Mäkelä	2020-10-08 19:41:03 +05:30
Marko Mäkelä	1595189250	MDEV-23897 SIGSEGV on commit with innodb_lock_schedule_algorithm=VATS This regression for debug builds was introduced by MDEV-23101 (commit `224c950462`). Due to MDEV-16664, the parameter innodb_lock_schedule_algorithm=VATS is not enabled by default. The purpose of the added assertions was to enforce the invariant that Galera replication cannot be enabled together with VATS due to MDEV-12837. However, upon closer inspection, it is obvious that the variable 'lock' may be assigned to the null pointer if no match is found in the previous->hash list. lock_grant_and_move_on_page(), lock_grant_and_move_on_rec(): Assert !lock->trx->is_wsrep() only after ensuring that lock is not a null pointer.	2020-10-06 22:35:43 +03:00
Marko Mäkelä	b4fb15ccd4	MDEV-16664: Remove innodb_lock_schedule_algorithm The setting innodb_lock_schedule_algorithm=VATS that was introduced in MDEV-11039 (commit `021212b525`) causes conflicting exclusive locks to be incorrectly granted to two transactions. Specifically, in lock_rec_insert_by_trx_age() the predicate !lock_rec_has_to_wait_in_queue(in_lock) would hold even though an active transaction is already holding an exclusive lock. This was observed between two DELETE of the same clustered index record. The HASH_DELETE invocation in lock_rec_enqueue_waiting() may be related. Due to lack of progress in diagnosing the problem, we will remove the option. The unsafe option was enabled by default between commit `0c15d1a6ff` (MariaDB 10.2.3) and the parent of commit `1cc1d0429d` (MariaDB 10.2.17, 10.3.9), and it was deprecated in commit `295e2d500b` (MariaDB 10.2.34).	2020-10-05 10:30:26 +03:00
Marko Mäkelä	295e2d500b	MDEV-16664: Add deprecation warning for innodb_lock_schedule_algorithm=VATS The setting innodb_lock_schedule_algorithm=VATS that was introduced in MDEV-11039 (commit `021212b525`) causes conflicting exclusive locks to be incorrectly granted to two transactions. Specifically, in lock_rec_insert_by_trx_age() the predicate !lock_rec_has_to_wait_in_queue(in_lock) would hold even though an active transaction is already holding an exclusive lock. This was observed between two DELETE of the same clustered index record. The HASH_DELETE invocation in lock_rec_enqueue_waiting() may be related. Due to lack of progress in diagnosing the problem, we will deprecate the option and issue a warning that using it may corrupt data. The unsafe option was enabled between commit `0c15d1a6ff` (MariaDB 10.2.3) and the parent of commit `1cc1d0429d` (MariaDB 10.2.17, 10.3.9).	2020-10-05 09:34:27 +03:00
Marko Mäkelä	1c65ab26de	Merge 10.4 into 10.5	2020-10-01 14:30:17 +03:00
Marko Mäkelä	9ae608d24d	Merge 10.3 into 10.4	2020-10-01 13:41:36 +03:00
Daniel Black	904b811636	mtr: innodb_stats_dropped_locked cleanup As discovered in later test, this test doesn't remove the innodb_{index,table}_stats entries generated in the test upon completion.	2020-10-01 16:10:14 +10:00
Marko Mäkelä	323500bfa9	Merge 10.2 into 10.3	2020-09-30 16:25:06 +03:00
Sujatha	25ede13611	Merge branch '10.4' into 10.5	2020-09-29 16:59:36 +05:30
Marko Mäkelä	a441b06489	Merge 10.1 into 10.2	2020-09-29 10:04:37 +03:00
Thirunarayanan Balathandayuthapani	fdb3c64e42	MDEV-22277 LeakSanitizer: detected memory leaks in mem_heap_create_block_func after attempt to create foreign key - During online DDL, prepare phase error handler fails to remove the memory allocated for newly created foreign keys.	2020-09-28 17:04:02 +05:30
Thirunarayanan Balathandayuthapani	e8b05ce503	MDEV-23675 Assertion `pos < table->n_def' fails in dict_table_get_nth_col - During insertion of clustered inde, InnoDB does the check for foreign key constraints. It rebuild the tuple based on foreign column names when there is no foreign index. While rebuilding, InnoDB should ignore the dropped columns.	2020-09-25 21:42:24 +05:30
Marko Mäkelä	6ce0a6f9ad	Merge 10.5 into 10.6	2020-09-24 10:21:26 +03:00
Marko Mäkelä	882ce206db	Merge 10.4 into 10.5	2020-09-23 11:32:43 +03:00
Marko Mäkelä	55e48b7722	Merge 10.3 into 10.4	2020-09-22 15:12:59 +03:00
Marko Mäkelä	fde3d895d9	Merge 10.2 into 10.3	2020-09-22 14:33:03 +03:00
Marko Mäkelä	3eb81136e1	MDEV-22939 Server crashes in row_make_new_pathname() The statement ALTER TABLE...DISCARD TABLESPACE is problematic, because its designed purpose is to break the referential integrity of the data dictionary and make a table point to nowhere. ha_innobase::commit_inplace_alter_table(): Check whether the table has been discarded. (This is a bit late to check it, right before committing the change.) Previously, we performed this check only in a specific branch of the function commit_set_autoinc(). Note: We intentionally allow non-rebuilding ALTER TABLE even if the tablespace has been discarded, to remain compatible with MySQL. (See the various tests with "wl5522" in the name, such as innodb.innodb-wl5522.) The test case would crash starting with 10.3 only, but it does not hurt to minimize the code and test difference between 10.2 and 10.3.	2020-09-22 13:40:05 +03:00
Marko Mäkelä	732cd7fd53	MDEV-23705 Assertion 'table->data_dir_path \|\| !space' After DISCARD TABLESPACE, the tablespace of a table will no longer exist, and dict_get_and_save_data_dir_path() would invoke dict_get_first_path() to read an entry from SYS_DATAFILES. For some reason, DISCARD TABLESPACE would not to remove the entry from there. dict_get_and_save_data_dir_path(): If the tablespace has been discarded, do not bother trying to read the name. Side note: The tables SYS_TABLESPACES and SYS_DATAFILES are redundant and subject to removal in MDEV-22343.	2020-09-22 11:13:51 +03:00
Marko Mäkelä	3a423088ac	Merge 10.3 into 10.4	2020-09-21 12:29:00 +03:00
Marko Mäkelä	cbcb4ecabb	Merge 10.2 into 10.3	2020-09-21 11:04:04 +03:00
Marko Mäkelä	ab56cbcd81	Stabilize and clean up a test The counter buffer_flush_background_total_pages may be unreliable, because pages can be flushed in different means. So, let us only check INNODB_BUFFER_POOL_PAGES_FLUSHED.	2020-09-17 14:09:24 +03:00
Marko Mäkelä	d61c5696fa	Make a test more robust Change buffering will occasionally happen if other pages are not flushed quickly enough.	2020-09-17 14:08:41 +03:00
Thirunarayanan Balathandayuthapani	f19da4a05a	MDEV-23199 page_compression flag is missing for full_crc32 tablespace Problem: ====== Making the tablespace as page_compressed doesn't do table rebuild. It does change only the FSP_SPACE_FLAGS. During recovery: 1) InnoDB encounters FILE_CREATE redo log and opens the tablespace with old FSP_SPACE_FLAGS value. 2) Only parsing of redo log has been finished. Now InnoDB tries to load the table. If the existing tablespace flags doesn't match with table flags then InnoDB should read page0. But in fsp_flags_try_adjust(), skips the page read for full_crc32 format. 3) After that, InnoDB tries to open the clustered index and it leads to failure of page validation. Fix: === While parsing the redo log record, track FSP_SPACE_FLAGS in recv_spaces for the respective space id. Assign the flags for the tablespace when it is loaded. recv_parse_set_size_and_flags(): Parse the redo log to set the tablespace recovery size and flags. fil_space_set_recv_size_and_flags(): Changed from fil_space_set_recv_size(). To set the recovery size and flags of the tablespace. Introduce flags variable in file_name_t to maintain the tablespace flag which we encountered during parsing of redo log. is_flags_full_crc32_equal(), is_flags_non_full_crc32_equal(): Rename the variable page_ssize and space_page_ssize with fcrc32_psize and non_fcrc32_psize.	2020-09-10 15:26:43 +05:30
Thirunarayanan Balathandayuthapani	75e82f71f1	MDEV-18867 Long Time to Stop and Start fts_drop_orphaned_tables() takes long time to remove the orphaned FTS tables. In order to reduce the time, do the following: - Traverse fil_system.space_list and construct a set of table_id,index_id of all FTS_*.ibd tablespaces. - Traverse the sys_indexes table and ignore the entry from the above collection if it exist. - Existing elements in the collection can be considered as orphaned fts tables. construct the table name from (table_id,index_id) and invoke fts_drop_tables(). - Removed DICT_TF2_FTS_AUX_HEX_NAME flag usage from upgrade. - is_aux_table() in dict_table_t to check whether the given name is fts auxiliary table fts_space_set_t is a structure to store set of parent table id and index id - Remove unused FTS function in fts0fts.cc - Remove the fulltext index in row_format_redundant test case. Because it deals with the condition that SYS_TABLES does have corrupted entry and valid entry exist in SYS_INDEXES.	2020-09-10 14:10:26 +05:30
Marko Mäkelä	3421223363	Merge 10.4 into 10.5	2020-09-09 16:57:30 +03:00
Marko Mäkelä	66ae50a564	Merge 10.3 into 10.4	2020-09-09 15:00:21 +03:00
Marko Mäkelä	7e07e38cf6	Merge 10.2 into 10.3	2020-09-09 13:06:46 +03:00
Marko Mäkelä	64c8fa58a8	MDEV-23685 SIGSEGV on ADD FOREIGN KEY after failed ADD KEY dict_foreign_qualify_index(): Reject corrupted or garbage indexes. For index stubs that are created on virtual columns, no dict_field_t::col would be assign. Instead, the entire table definition would be reloaded on a successful operation.	2020-09-09 12:04:42 +03:00
Marko Mäkelä	5ff7e68c7e	Merge 10.4 into 10.5	2020-09-04 18:44:44 +03:00
Marko Mäkelä	c9cf6b13f6	Merge 10.3 into 10.4	2020-09-03 15:53:38 +03:00

1 2 3 4 5 ...

2912 commits