mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-06 15:15:34 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	c5fd9aa562	MDEV-25919: Lock tables before acquiring dict_sys.latch In commit `1bd681c8b3` (MDEV-25506 part 3) we introduced a "fake instant timeout" when a transaction would wait for a table or record lock while holding dict_sys.latch. This prevented a deadlock of the server but could cause bogus errors for operations on the InnoDB persistent statistics tables. A better fix is to ensure that whenever a transaction is being executed in the InnoDB internal SQL parser (which will for now require dict_sys.latch to be held), it will already have acquired all locks that could be required for the execution. So, we will acquire the following locks upfront, before acquiring dict_sys.latch: (1) MDL on the affected user table (acquired by the SQL layer) (2) If applicable (not for RENAME TABLE): InnoDB table lock (3) If persistent statistics are going to be modified: (3.a) MDL_SHARED on mysql.innodb_table_stats, mysql.innodb_index_stats (3.b) exclusive table locks on the statistics tables (4) Exclusive table locks on the InnoDB data dictionary tables (not needed in ANALYZE TABLE and the like) Note: Acquiring exclusive locks on the statistics tables may cause more locking conflicts between concurrent DDL operations. Notably, RENAME TABLE will lock the statistics tables even if no persistent statistics are enabled for the table. DROP DATABASE will only acquire locks on statistics tables if persistent statistics are enabled for the tables on which the SQL layer is invoking ha_innobase::delete_table(). For any "garbage collection" in innodb_drop_database(), a timeout while acquiring locks on the statistics tables will result in any statistics not being deleted for any tables that the SQL layer did not know about. If innodb_defragment=ON, information may be written to the statistics tables even for tables for which InnoDB persistent statistics are disabled. 
But, DROP TABLE will no longer attempt to delete that information if persistent statistics are not enabled for the table. This change should also fix the hangs related to InnoDB persistent statistics and STATS_AUTO_RECALC (MDEV-15020) as well as a bug that running ALTER TABLE on the statistics tables concurrently with running ALTER TABLE on InnoDB tables could cause trouble. lock_rec_enqueue_waiting(), lock_table_enqueue_waiting(): Do not issue a fake instant timeout error when the transaction is holding dict_sys.latch. Instead, assert that the dict_sys.latch is never being held here. lock_sys_tables(): A new function to acquire exclusive locks on all dictionary tables, in case DROP TABLE or similar operation is being executed. Locking non-hard-coded tables is optional to avoid a crash in row_merge_drop_temp_indexes(). The SYS_VIRTUAL table was introduced in MySQL 5.7 and MariaDB Server 10.2. Normally, we require all these dictionary tables to exist before executing any DDL, but the function row_merge_drop_temp_indexes() is an exception. When upgrading from MariaDB Server 10.1 or MySQL 5.6 or earlier, the table SYS_VIRTUAL would not exist at this point. ha_innobase::commit_inplace_alter_table(): Invoke log_write_up_to() while not holding dict_sys.latch. dict_sys_t::remove(), dict_table_close(): No longer try to drop index stubs that were left behind by aborted online ADD INDEX. Such indexes should be dropped from the InnoDB data dictionary by row_merge_drop_indexes() as part of the failed DDL operation. Stubs for aborted indexes may only be left behind in the data dictionary cache. dict_stats_fetch_from_ps(): Use a normal read-only transaction. ha_innobase::delete_table(), ha_innobase::truncate(), fts_lock_table(): While waiting for purge to stop using the table, do not hold dict_sys.latch. ha_innobase::delete_table(): Implement a work-around for the rollback of ALTER TABLE...ADD PARTITION. 
MDL_EXCLUSIVE would not be held if ALTER TABLE hits lock_wait_timeout while trying to upgrade the MDL due to a conflicting LOCK TABLES, such as in the first ALTER TABLE in the test case of Bug#53676 in parts.partition_special_innodb. Therefore, we must explicitly stop purge, because it would not be stopped by MDL. dict_stats_func(), btr_defragment_chunk(): Allocate a THD so that we can acquire MDL on the InnoDB persistent statistics tables. mysqltest_embedded: Invoke ha_pre_shutdown() before free_used_memory() in order to avoid ASAN heap-use-after-free related to acquire_thd(). trx_t::dict_operation_lock_mode: Changed the type to bool. row_mysql_lock_data_dictionary(), row_mysql_unlock_data_dictionary(): Implemented as macros. rollback_inplace_alter_table(): Apply an infinite timeout to lock waits. innodb_thd_increment_pending_ops(): Wrapper for thd_increment_pending_ops(). Never attempt async operation for InnoDB background threads, such as the trx_t::commit() in dict_stats_process_entry_from_recalc_pool(). lock_sys_t::cancel(trx_t*): Make dictionary transactions immune to KILL. lock_wait(): Make dictionary transactions immune to KILL, and to lock wait timeout when waiting for locks on dictionary tables. parts.partition_special_innodb: Use lock_wait_timeout=0 to instantly get ER_LOCK_WAIT_TIMEOUT. main.mdl: Filter out MDL on InnoDB persistent statistics tables Reviewed by: Thirunarayanan Balathandayuthapani	2021-08-31 13:54:44 +03:00
Marko Mäkelä	094de71742	MDEV-25919 preparation: Various cleanup que_eval_sql(): Remove the parameter lock_dict. The only caller with lock_dict=true was dict_stats_exec_sql(), which will now explicitly invoke dict_sys.lock() and dict_sys.unlock() by itself. row_import_cleanup(): Do not unnecessarily lock the dictionary. Concurrent access to the table during ALTER TABLE...IMPORT TABLESPACE is prevented by MDL and the fact that there cannot exist any undo log or change buffer records that would refer to the table or tablespace. row_import_for_mysql(): Do not unnecessarily lock the dictionary while accessing fil_system. Thanks to MDL_EXCLUSIVE that was acquired by the SQL layer, only one IMPORT may be in effect for the table name. row_quiesce_set_state(): Do not unnecessarily lock the dictionary. The dict_table_t::quiesce state is documented to be protected by all index latches, which we are acquiring. dict_table_close(): Introduce a simpler variant with fewer parameters. dict_table_close(): Reduce the amount of calls. We can simply invoke dict_table_t::release() on startup or in DDL operations, or when the table is inaccessible. In none of these cases is there a need to invalidate the InnoDB persistent statistics. pars_info_t::graph_owns_us: Remove (unused). pars_info_free(): Define inline. fts_delete(), trx_t::evict_table(), row_prebuilt_free(), row_rename_table_for_mysql(): Simplify. row_mysql_lock_data_dictionary(): Remove some references; use dict_sys.lock() and dict_sys.unlock() instead. row_mysql_lock_table(): Remove. Use lock_table_for_trx() instead. ha_innobase::check_if_supported_inplace_alter(), row_create_table_for_mysql(): Simply assert dict_sys.sys_tables_exist(). In commit `49e2c8f0a6` and commit `1bd681c8b3` srv_start() actually guarantees that the system tables will exist, or the server is in read-only mode, or startup will fail. Reviewed by: Thirunarayanan Balathandayuthapani	2021-08-31 13:54:20 +03:00
Marko Mäkelä	6a2cd6f4b4	MDEV-19505 Do not hold mutex while calling que_graph_free() sym_tab_free_private(): Do not call dict_table_close(), but simply invoke dict_table_t::release(), which we can do without locking the whole dictionary cache. (Note: On user tables it may still be necessary to invoke dict_table_close(), so that InnoDB persistent statistics will be deinitialized as expected.) fts_check_corrupt(), row_fts_merge_insert(): Invoke aux_table->release() to simplify the code. This is never a user table. fts_que_graph_free(), fts_que_graph_free_check_lock(): Replaced with que_graph_free(). Reviewed by: Thirunarayanan Balathandayuthapani	2021-08-31 13:54:06 +03:00
Marko Mäkelä	82b7c561b7	MDEV-24258 Merge dict_sys.mutex into dict_sys.latch In the parent commit, dict_sys.latch could theoretically have been replaced with a mutex. But, we can do better and merge dict_sys.mutex into dict_sys.latch. Generally, every occurrence of dict_sys.mutex_lock() will be replaced with dict_sys.lock(). The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex will be removed along with dict_sys.mutex. The dict_sys.latch will remain instrumented as dict_operation_lock. Some use of dict_sys.lock() will be replaced with dict_sys.freeze(), which we will reintroduce for the new shared mode. Most notably, concurrent table lookups are possible as long as the tables are present in the dict_sys cache. In particular, this will allow more concurrency among InnoDB purge workers. Because dict_sys.mutex will no longer 'throttle' the threads that purge InnoDB transaction history, a performance degradation may be observed unless innodb_purge_threads=1. The table cache eviction policy will become FIFO-like, similar to what happened to fil_system.LRU in commit `45ed9dd957`. The name of the list dict_sys.table_LRU will become somewhat misleading; that list contains tables that may be evicted, even though the eviction policy no longer is least-recently-used but first-in-first-out. (Note: Tables can never be evicted as long as locks exist on them or the tables are in use by some thread.) As demonstrated by the test perfschema.sxlock_func, there will be less contention on dict_sys.latch, because some previous use of exclusive latches will be replaced with shared latches. fts_parse_sql_no_dict_lock(): Replaced with pars_sql(). fts_get_table_name_prefix(): Merged to fts_optimize_create(). dict_stats_update_transient_for_index(): Deduplicated some code. 
ha_innobase::info_low(), dict_stats_stop_bg(): Use a combination of dict_sys.latch and table->stats_mutex_lock() to cover the changes of BG_STAT_SHOULD_QUIT, because the flag is being read in dict_stats_update_persistent() while not holding dict_sys.latch. row_discard_tablespace_for_mysql(): Protect stats_bg_flag by exclusive dict_sys.latch, like most other code does. row_quiesce_table_has_fts_index(): Remove unnecessary mutex acquisition. FLUSH TABLES...FOR EXPORT is protected by MDL. row_import::set_root_by_heuristic(): Remove unnecessary mutex acquisition. ALTER TABLE...IMPORT TABLESPACE is protected by MDL. row_ins_sec_index_entry_low(): Replace a call to dict_set_corrupted_index_cache_only(). Reads of index->type were not really protected by dict_sys.mutex, and writes (flagging an index corrupted) should be extremely rare. dict_stats_process_entry_from_defrag_pool(): Only freeze the dictionary, do not lock it exclusively. dict_stats_wait_bg_to_stop_using_table(), DICT_BG_YIELD: Remove trx. We can simply invoke dict_sys.unlock() and dict_sys.lock() directly. dict_acquire_mdl_shared()<trylock=false>: Assert that dict_sys.latch is only held in shared mode, not exclusive mode. Only acquire it in exclusive mode if the table needs to be loaded to the cache. dict_sys_t::acquire(): Remove. Relocating elements in dict_sys.table_LRU would require holding an exclusive latch, which we want to avoid for performance reasons. dict_sys_t::allow_eviction(): Add the table first to dict_sys.table_LRU, to compensate for the removal of dict_sys_t::acquire(). This function is only invoked by INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS. dict_table_open_on_id(), dict_table_open_on_name(): If dict_locked=false, try to acquire dict_sys.latch in shared mode. Only acquire the latch in exclusive mode if the table is not found in the cache. Reviewed by: Thirunarayanan Balathandayuthapani	2021-08-31 13:51:35 +03:00
Marko Mäkelä	2e08b6d78c	MDEV-24258 preparation: Remove dict_sys.freeze() and unfreeze() This will essentially make dict_sys.latch a mutex (it is only acquired in exclusive mode). The subsequent commit will merge dict_sys.mutex into dict_sys.latch and reintroduce dict_sys.freeze() for those cases where we currently acquire only dict_sys.latch but not dict_sys.mutex. The case where both are acquired will be mapped to dict_sys.lock(). i_s_sys_tables_fill_table_stats(): Invoke dict_sys.prevent_eviction() and the new function dict_sys.allow_eviction() to avoid table eviction while a row in INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS is being produced. Reviewed by: Thirunarayanan Balathandayuthapani	2021-08-31 13:48:10 +03:00
Marko Mäkelä	49f95c4065	Merge 10.5 into 10.6	2021-08-23 11:21:33 +03:00
Marko Mäkelä	2c9f2a4c8c	Merge 10.4 into 10.5	2021-08-23 11:10:59 +03:00
Marko Mäkelä	2b66cd2493	Merge 10.3 into 10.4	2021-08-23 10:44:06 +03:00
Marko Mäkelä	cfbdb5d210	Merge 10.2 into 10.3	2021-08-23 10:14:01 +03:00
Marko Mäkelä	ca89489716	MDEV-26383 fixup: Consistently protect freed_indexes with autoinc_mutex To avoid potential race conditions between concurrent access to dict_table_t::freed_indexes, let us consistently use dict_table_t::autoinc_mutex. dict_table_remove_from_cache_low(): To avoid extensive hold time of table->autoinc_mutex, unconditionally free the FTS data structures.	2021-08-23 10:06:21 +03:00
Thirunarayanan Balathandayuthapani	08e5a3d2e3	MDEV-26383 ASAN heap-use-after-free failure in btr_search_lazy_free Problem: ======= The last AHI page for two indexes of a dropped table is being freed at the same time by two threads. One thread frees the table heap and the other thread tries to access the table heap again. It leads to an ASAN failure in btr_search_lazy_free(). Solution: ======== InnoDB uses autoinc_mutex to avoid the race condition in btr_search_lazy_free()	2021-08-21 12:38:10 +05:30
Marko Mäkelä	f3fcf5f45c	Merge 10.5 to 10.6	2021-08-19 12:25:00 +03:00
Marko Mäkelä	4a25957274	Merge 10.4 into 10.5	2021-08-18 18:22:35 +03:00
Marko Mäkelä	f84e28c119	Merge 10.3 into 10.4	2021-08-18 16:51:52 +03:00
Marko Mäkelä	cd65845a0e	Merge 10.2 into 10.3 MDEV-18734 FIXME: vcol.partition triggers ASAN heap-use-after-free	2021-08-18 12:26:58 +03:00
Eugene Kosov	890f2ad769	MDEV-20931 ALTER...IMPORT can crash the server Main idea: don't log-and-crash but propagate the error to the upper layers of the stack to handle it and show it to the user.	2021-08-17 20:28:42 +06:00
Marko Mäkelä	4cd063b9e4	MDEV-26376 pars_info_bind_id() unnecessarily copies strings pars_info_bind_id(): Remove the parameter copy_name. It was always being passed as constant TRUE or true. It turns out that copying the string is completely unnecessary. In all calls except the one in fts_get_select_columns_str() and fts_doc_fetch_by_doc_id(), the parameter is being passed as a compile-time constant, and therefore the pointer cannot become stale. In that special call, the string that is being passed is allocated from the same memory heap that pars_info_bind_id() would have been using. pars_info_add_id(): Remove (unused declaration).	2021-08-16 12:10:20 +03:00
Oleksandr Byelkin	7ae6ef5236	Merge branch '10.5' into 10.6	2021-08-03 11:21:22 +02:00
Oleksandr Byelkin	850b2ba15d	Merge branch '10.4' into 10.5	2021-08-02 16:53:37 +02:00
Marko Mäkelä	89cc633853	MDEV-13564 fixup: Remove unused function fts_check_corrupt() The call to the function fts_check_corrupt() was removed in commit `09af00cbde` already.	2021-08-02 16:39:08 +03:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
Oleksandr Byelkin	7841a7eb09	Merge branch '10.3' into 10.4	2021-07-31 22:59:58 +02:00
Marko Mäkelä	e305493b1c	MDEV-21175 follow-up: Remove redundant locking; rely on MDL Before entering DML or DDL execution in the storage engine, the SQL layer will have acquired metadata lock (MDL) on the current table name as well as the names of FOREIGN KEY (grand)child tables (that is, tables whose REFERENCES clauses point to the current table). The MDL prevents any metadata changes to these tables, such as RENAME, TRUNCATE, DROP, ALTER. While the MDL on the current table prevents dict_table_t::foreign_set from being modified, it does not prevent the table metadata that the stored pointers are pointing to from being modified. The MDL on the child tables will prevent both dict_table_t::referenced_set as well as the pointed child table metadata from being modified. wsrep_row_upd_index_is_foreign(): Do not unnecessarily acquire the data dictionary latch if Galera replication is not enabled. ha_innobase::can_switch_engines(): Rely on MDL. We are not dereferencing any pointers stored in the sets. row_mysql_freeze_data_dictionary(), row_mysql_unfreeze_data_dictionary(): Remove. row_update_for_mysql(): Call init_fts_doc_id_for_ref() only once. In ALTER TABLE...IMPORT TABLESPACE and FLUSH TABLES...FOR EXPORT the SQL layer is protecting the current table with MDL. We do not need InnoDB latches.	2021-07-29 16:38:24 +03:00
Marko Mäkelä	15363a4f1b	Cleanup: Remove pars_stored_procedure_call() The InnoDB internal SQL parser never supported this syntax.	2021-07-29 15:37:35 +03:00
Marko Mäkelä	f50eb0d398	Merge 10.2 into 10.3	2021-07-27 10:47:17 +03:00
Marko Mäkelä	afe00bb7cc	MDEV-25998 fixup: Avoid a hang btr_scrub_start_space(): Avoid an unnecessary tablespace lookup and related acquisition of fil_system->mutex. In MariaDB Server 10.3 we would get deadlocks between that mutex and a crypt_data mutex. The fix was developed by Thirunarayanan Balathandayuthapani.	2021-07-27 10:44:01 +03:00
Marko Mäkelä	cf1fc59856	MDEV-25594: Improve debug checks trx_t::will_lock: Changed the type to bool. trx_t::is_autocommit_non_locking(): Replaces trx_is_autocommit_non_locking(). trx_is_ac_nl_ro(): Remove (replaced with equivalent assertion expressions). assert_trx_nonlocking_or_in_list(): Remove. Replaced with at least as strict checks in each place. check_trx_state(): Moved to a static function; partially replaced with individual debug assertions implementing equivalent or stricter checks. This is a backport of commit `7b51d11cca` from 10.5.	2021-07-27 08:52:01 +03:00
Marko Mäkelä	b50ea90063	Merge 10.2 into 10.3	2021-07-22 18:57:54 +03:00
Marko Mäkelä	742b3a0d39	MDEV-26205 Merge new release of InnoDB 5.7.35 to 10.2	2021-07-22 18:07:37 +03:00
Jakub Łopuszański	c4295b9be9	Bug #32460315 ONLINE RESIZING BUFFER POOL CAN CRASH CONCURRENT BP LOOKUP This patch changes it so that we do not free old BP `page_hash`, but rather modify its parameters, during resize. RB: 26084 Reviewed-by: Marcin Babij <marcin.babij@oracle.com> Reviewed-by: Yasufumi Kinoshita <yasufumi.kinoshita@oracle.com> mysql/mysql-server@ea3adc6a11	2021-07-22 18:05:23 +03:00
Marko Mäkelä	124dc0d85b	MDEV-25361 fixup: Fix integer type mismatch InnoDB tablespace identifiers and page numbers are 32-bit numbers. Let us use a 32-bit type for them in innochecksum. The changes in commit `1918bdf32c` broke the build on 32-bit Windows. Thanks to Vicențiu Ciorbaru for an initial version of this fixup.	2021-07-22 17:53:43 +03:00
Marko Mäkelä	641f09398f	Merge 10.5 into 10.6	2021-07-22 10:11:08 +03:00
Marko Mäkelä	82d5994520	MDEV-26110: Do not rely on alignment on static allocation It is implementation-defined whether alignment requirements that are larger than std::max_align_t (typically 8 or 16 bytes) will be honored by the compiler and linker. It turns out that on IBM AIX, both alignas() and MY_ALIGNED() only guarantee alignment up to 16 bytes. For some data structures, specifying alignment to the CPU cache line size (typically 64 or 128 bytes) is a mere performance optimization, and we do not really care whether the requested alignment is guaranteed. But, for the correct operation of direct I/O, we do require that the buffers be aligned at a block size boundary. field_ref_zero: Define as a pointer, not an array. For innochecksum, we can make this point to unaligned memory; for anything else, we will allocate an aligned buffer from the heap. This buffer will be used for overwriting freed data pages when innodb_immediate_scrub_data_uncompressed=ON. And exactly that code hit an assertion failure on AIX, in the test innodb.innodb_scrub. log_sys.checkpoint_buf: Define as a pointer to aligned memory that is allocated from heap. log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf instead of trying to allocate an aligned buffer from the stack.	2021-07-22 10:05:13 +03:00
Marko Mäkelä	ed0a7b1b3f	MDEV-24626 fixup: Remove useless code fil_ibd_create(): Remove code that should have been removed in commit `86dc7b4d4c` already. We no longer wrote an initialized page to the file, but we would still allocate a page image in memory and write it. xb_space_create_file(): Remove an unnecessary page write. (This is a functional change for Mariabackup.)	2021-07-20 17:35:03 +03:00
Vladislav Vaintroub	e7f4daf88c	merge 10.5 to 10.6	2021-07-16 22:12:09 +02:00
Vladislav Vaintroub	fc2ec25733	MDEV-26166 replace log_write_up_to(LSN_MAX,...) with log_buffer_flush_to_disk() Also, remove comparison lsn > flush/write lsn, prior to calling log_write_up_to. The checks and early returns are part of this function.	2021-07-16 18:44:58 +02:00
Marko Mäkelä	b797f217a3	Merge 10.5 into 10.6	2021-07-03 14:54:46 +03:00
Marko Mäkelä	bd5a6403ca	MDEV-26033: Race condition between buf_pool.page_hash and resize() The replacement of buf_pool.page_hash with a different type of hash table in commit `5155a300fa` (MDEV-22871) introduced a race condition with buffer pool resizing. We have an execution trace where buf_pool.page_hash.array is changed to point to something else while page_hash_latch::read_lock() is executing. The same should also affect page_hash_latch::write_lock(). We fix the race condition by never resizing (and reallocating) the buf_pool.page_hash. We assume that resizing the buffer pool is a rare operation. Yes, there might be a performance regression if a server is first started up with a tiny buffer pool, which is later enlarged. In that case, the tiny buf_pool.page_hash.array could cause increased use of the hash bucket lists. That problem can be worked around by initially starting up the server with a larger buffer pool and then shrinking that, until changing to a larger size again. buf_pool_t::resize_hash(): Remove. buf_pool_t::page_hash_table::lock(): Do not attempt to deal with hash table resizing. If we really wanted that in a safe manner, we would probably have to introduce a global rw-lock around the operation, or at the very least, poll buf_pool.resizing, both of which would be detrimental to performance.	2021-07-03 13:58:38 +03:00
Marko Mäkelä	ed6b230744	MDEV-25919 preparation: Remove trx_t::internal With commit `1bd681c8b3` (MDEV-25506) it no longer is necessary to run DDL and DML operations in separate transactions. Let us remove the flag trx_t::internal. Dictionary transactions will be distinguished by trx_t::dict_operation.	2021-07-01 17:51:55 +03:00
Marko Mäkelä	0a67b15a9d	Cleanup: Remove pointer indirection for trx_t::xid The trx_t::xid is always allocated, so we might as well allocate it directly in the trx_t object to improve the locality of reference.	2021-07-01 16:38:24 +03:00
Marko Mäkelä	8c5c3a4594	MDEV-26067 innodb_lock_wait_timeout values above 100,000,000 are useless The practical maximum value of the parameter innodb_lock_wait_timeout is 100,000,000. Any value larger than that specifies an infinite timeout. Therefore, we should make 100,000,000 the maximum value of the parameter.	2021-07-01 10:31:08 +03:00
Marko Mäkelä	30edd5549d	MDEV-26029: Sparse files are inefficient on thinly provisioned storage The MariaDB implementation of page_compressed tables for InnoDB used sparse files. In the worst case, in the data file, every data page will consist of some data followed by a hole. This may be extremely inefficient in some file systems. If the underlying storage device is thinly provisioned (can compress data on the fly), it would be good to write regular files (with sequences of NUL bytes at the end of each page_compressed block) and let the storage device take care of compressing the data. For reads, sparse file regions and regions containing NUL bytes will be indistinguishable. my_test_if_disable_punch_hole(): A new predicate for detecting thinly provisioned storage. (Not implemented yet.) innodb_atomic_writes: Correct the comment. buf_flush_page(): Support all values of fil_node_t::punch_hole. On a thinly provisioned storage device, we will always write NUL-padded innodb_page_size bytes also for page_compressed tables. buf_flush_freed_pages(): Remove a redundant condition. fil_space_t::atomic_write_supported: Remove. (This was duplicating fil_node_t::atomic_write.) fil_space_t::punch_hole: Remove. (Duplicated fil_node_t::punch_hole.) fil_node_t: Remove magic_n, and consolidate flags into bitfields. For punch_hole we introduce a third value that indicates a thinly provisioned storage device. fil_node_t::find_metadata(): Detect all attributes of the file.	2021-06-29 15:18:22 +03:00
Marko Mäkelä	891a927e80	Merge 10.5 into 10.6	2021-06-26 11:53:28 +03:00
Marko Mäkelä	aa95c42360	Cleanup: Remove unused mtr_block_dirtied	2021-06-26 11:17:05 +03:00
Marko Mäkelä	759deaa0a2	MDEV-26010 fixup: Use acquire/release memory order In commit `5f22511e35` we depend on Total Store Ordering. For correct operation on ISAs that implement weaker memory ordering, we must explicitly use release/acquire stores and loads on buf_page_t::oldest_modification_ to prevent a race condition when buf_page_t::list does not happen to be on the same cache line. buf_page_t::clear_oldest_modification(): Assert that the block is not in buf_pool.flush_list, and use std::memory_order_release. buf_page_t::oldest_modification_acquire(): Read oldest_modification_ with std::memory_order_acquire. In this way, if the return value is 0, the caller may safely assume that it will not observe the buf_page_t as being in buf_pool.flush_list, even if it is not holding buf_pool.flush_list_mutex. buf_flush_relocate_on_flush_list(), buf_LRU_free_page(): Invoke buf_page_t::oldest_modification_acquire().	2021-06-26 11:16:40 +03:00
Marko Mäkelä	a8350cfb5e	Merge 10.5 into 10.6	2021-06-24 21:56:44 +03:00
Marko Mäkelä	5f22511e35	MDEV-26010: Assertion lsn > 2 failed in buf_pool_t::get_oldest_modification In commit `22b62edaed` (MDEV-25113) we introduced a race condition. buf_LRU_free_page() would read buf_page_t::oldest_modification() as 0 and assume that buf_page_t::list can be used (for attaching the block to the buf_pool.free list). In the observed race condition, buf_pool_t::delete_from_flush_list() had cleared the field, and buf_pool_t::delete_from_flush_list_low() was executing concurrently with buf_LRU_block_free_non_file_page(), which resulted in buf_pool.flush_list.end becoming corrupted. buf_pool_t::delete_from_flush_list(), buf_flush_relocate_on_flush_list(): First remove the block from buf_pool.flush_list, and only then invoke buf_page_t::clear_oldest_modification(), to ensure that reading oldest_modification()==0 really implies that the block no longer is in buf_pool.flush_list.	2021-06-24 21:55:10 +03:00
Marko Mäkelä	b4c9cd201b	Merge 10.5 into 10.6	2021-06-24 12:39:34 +03:00
Marko Mäkelä	60ed479711	MDEV-26004 Excessive wait times in buf_LRU_get_free_block() buf_LRU_get_free_block(): Initially wait for a single block to be freed, signaled by buf_pool.done_free. Only if that fails and no LRU eviction flushing batch is already running, we initiate a flushing batch that should serve all threads that are currently waiting in buf_LRU_get_free_block(). Note: In an extreme case, this may introduce a performance regression at larger numbers of connections. We observed this in sysbench oltp_update_index with 512MiB buffer pool, 4GiB of data on fast NVMe, and 1000 concurrent connections, on a 20-thread CPU. The contention point appears to be buf_pool.mutex, and the improvement would turn into a regression somewhere beyond 32 concurrent connections. On slower storage, such regression was not observed; instead, the throughput was improving and maximum latency was reduced. The excessive waits were pointed out by Vladislav Vaintroub.	2021-06-24 11:01:18 +03:00
Marko Mäkelä	101da87228	Merge 10.5 into 10.6	2021-06-23 19:36:45 +03:00
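
The ordering rule described in commits `5f22511e35` and `759deaa0a2` (first detach the block from the flush list, then clear the modification field with a release store, and read it with an acquire load) can be illustrated with a minimal C++ sketch. This is not the InnoDB code: `page_sketch`, `in_flush_list`, `add_to_flush_list()`, `remove_from_flush_list()` and `can_reuse_list_node()` are hypothetical names invented for illustration, and the sketch assumes that adding a page to the list is serialized by some outer lock that is not shown.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a page descriptor; NOT the real buf_page_t.
struct page_sketch {
  // 0 means "not in the flush list"; any other value is an LSN.
  std::atomic<uint64_t> oldest_modification{0};
  // Plain (non-atomic) list state, published through the atomic above.
  bool in_flush_list = false;

  // Assumed to run under an outer lock that is not shown here.
  void add_to_flush_list(uint64_t lsn) {
    in_flush_list = true;
    oldest_modification.store(lsn, std::memory_order_release);
  }

  // Writer side: detach from the list FIRST, and only then publish 0.
  // The release store orders the list update before the cleared field.
  void remove_from_flush_list() {
    in_flush_list = false;
    oldest_modification.store(0, std::memory_order_release);
  }

  // Reader side: the acquire load pairs with the release stores above.
  uint64_t oldest_modification_acquire() const {
    return oldest_modification.load(std::memory_order_acquire);
  }
};

// A reader that observes 0 may assume the page is no longer linked,
// even without holding the lock that protects the list.
inline bool can_reuse_list_node(const page_sketch &p) {
  if (p.oldest_modification_acquire() != 0)
    return false;
  return !p.in_flush_list;
}
```

On a strongly ordered ISA the plain stores would usually appear in program order anyway; the explicit release/acquire pair is what makes the same reasoning hold on ISAs with weaker memory ordering.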


3,650 commits