mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-04-27 01:19:55 +02:00

Author	SHA1	Message	Date
ParadoxV5	2392bd02d8	Tag the `sql/log.h` family with `ATTRIBUTE_FORMAT` Let GCC `-Wformat` check the formats passed to these users of `my_vsnprintf_ex`.	2025-02-12 10:17:44 +01:00
Sergei Golubchik	9ee09a33bb	Merge branch '11.7' into 11.8	2025-02-11 20:29:43 +01:00
Sergei Golubchik	f1a7693bc0	Merge branch '10.11' into 11.4	2025-01-14 23:45:41 +01:00
Marko Mäkelä	793a2fc8ba	MDEV-35049: Always enable page_cur_search_with_match_bytes() For some reason, page_cur_search_with_match_bytes(), which can speed up append operations (PAGE_CUR_LE used by INSERT), was only enabled if innodb_adaptive_hash_index=ON even though it has nothing to do with the adaptive hash index. Furthermore, mysql/mysql-server@c9bbc83d11 a.k.a. commit `c9bbc83d11` reduced a limit from 3 to 2 but forgot to adjust the PAGE_N_DIRECTION limit accordingly. We are adjusting that as well. Reviewed by: Vladislav Lesin	2025-01-10 16:40:37 +02:00
Marko Mäkelä	4221ed1d7d	MDEV-35049: Avoid building AHI beyond unique field prefix During a workload, an adaptive hash index had been built on UNIQUE INDEX(ID) on SYS_TABLES, and during a DROP TABLE operation the adaptive hash index would be widened to cover also the PRIMARY KEY(NAME) field that the index includes: (ID,NAME). Such an adaptive hash index is unlikely to satisfy (m)any queries. Let us limit the AHI prefix to the unique fields. Reviewed by: Vladislav Lesin	2025-01-10 16:40:35 +02:00
Marko Mäkelä	5f7b2a3ced	MDEV-35049: Improve btr_search_drop_page_hash_index() btr_search_drop_page_hash_index(): Replace the Boolean parameter with const dict_index_t *not_garbage. If buf_block_t::index points to that, there is no need to acquire btr_sea::partition::latch. The old parameter bool garbage_collect=false is equivalent to the parameter not_garbage=nullptr. The parameter garbage_collect=true will be replaced either with the actual index that is associated with the buffer page, or with a bogus pointer not_garbage=-1 to indicate that any entries lazily left behind for a freed index need to be removed. buf_page_get_low(), buf_page_get_gen(), mtr_t::page_lock(), mtr_t::upgrade_buffer_fix(): Do not invoke btr_search_drop_page_hash_index(). Our caller will have to do it when appropriate. buf_page_create_low(): Keep invoking btr_search_drop_page_hash_index(). This is the normal way of lazily dropping the adaptive hash index after a DDL operation such as DROP INDEX. btr_block_get(), btr_root_block_get(), btr_root_adjust_on_import(), btr_read_autoinc_with_fallback(), btr_cur_instant_init_low(), btr_cur_t::search_leaf(), btr_cur_t::pessimistic_search_leaf(), btr_pcur_optimistic_latch_leaves(), dict_stats_analyze_index_below_cur(): Invoke btr_search_drop_page_hash_index(block, index) for pages that may be leaf pages. No adaptive hash index may have been created on anything other than a B-tree leaf page. btr_cur_search_to_nth_level(): Do not invoke btr_search_drop_page_hash_index(), because we are only accessing non-leaf pages and the adaptive hash index may only have been created on leaf pages. btr_page_alloc_for_ibuf() and many other callers of buf_page_get_gen() or similar functions do not invoke btr_search_drop_page_hash_index(), because the adaptive hash index is never created on such pages.
If a page in the tablespace was freed as part of a DDL operation and reused for something else, then buf_page_create_low() will take care of dropping the adaptive hash index before the freed page will be modified. It is notable that while the flst_ functions may access pages that are related to allocating B-tree index pages (the BTR_SEG_TOP and BTR_SEG_LEAF linked from the index root page), those pages themselves can never be stored in the adaptive hash index. Therefore, it is not necessary to invoke btr_search_drop_page_hash_index() on them. Reviewed by: Vladislav Lesin	2025-01-10 16:40:34 +02:00
Marko Mäkelä	c942b31340	MDEV-35049: Fix bogus rebuild on BTR_CUR_HASH_FAIL btr_search_info_update_hash(): Do nothing if the record is positioned on the page supremum or infimum pseudo-record. The adaptive hash index can only include user records. This deficiency would cause the adaptive hash index parameters to alternate between hashing a prefix of 1 field and a prefix of 1 byte. Reviewed by: Vladislav Lesin	2025-01-10 16:40:32 +02:00
Marko Mäkelä	6b58ee769f	MDEV-35049: Fix bogus BTR_CUR_HASH_FAIL on contention btr_search_guess_on_hash(): Only set BTR_CUR_HASH_FAIL on actual mismatch. If the page latch cannot be acquired, the hash search might very well have succeeded. Do not count that as a failure, that is, do not unnecessarily invoke btr_search_update_hash_ref() after a normal search. Set cursor->flag=BTR_CUR_HASH_ABORT if the current parameters of the adaptive hash index are not suitable for the search and a call to btr_cur_t::search_info_update() might help. btr_cur_t::search_leaf(): Do not invoke search_info_update() if btr_search_guess_on_hash() failed due to contention. btr_cur_t::pessimistic_search_leaf(): Do not invoke search_info_update() on the change buffer tree. Previously, this condition was being checked inside search_info_update().	2025-01-10 16:40:30 +02:00
Marko Mäkelä	68cac26108	MDEV-35049: Fix bogus BTR_CUR_HASH_FAIL on PAGE_CUR_LE btr_cur_t::search_leaf(): Do not attempt to use the adaptive hash index for PAGE_CUR_G or PAGE_CUR_L, because those modes expect an unequal result, and the adaptive hash index can only deliver equal results. btr_cur_t::check_mismatch(): Only handle PAGE_CUR_LE and PAGE_CUR_GE. For PAGE_CUR_LE (bool ge=false), qualify a full match for the last record of a page that is not at the end of the index. Previously, an adaptive hash index lookup would fail when the record is at the end of an index page but not at the end of the index. This would lead to unnecessary rebuilds of the adaptive hash index in read-only workloads. Reviewed by: Vladislav Lesin	2025-01-10 16:40:29 +02:00
Marko Mäkelä	4dcb1b575b	MDEV-35049: Use CRC-32C and avoid allocating heap For the adaptive hash index, dtuple_fold() and rec_fold() were employing a slow rolling hash algorithm, computing hash values ("fold") for one field and one byte at a time, while depending on calls to rec_get_offsets(). We already have optimized implementations of CRC-32C and have been successfully using that function in some other InnoDB tables, but not yet in the adaptive hash index. Any linear function such as any CRC will fail the avalanche test that any cryptographically secure hash function is expected to pass: any single-bit change in the input key should affect on average half the bits in the output. But we always were happy with less than cryptographically secure: in fact, ut_fold_ulint_pair() or ut_fold_binary() are just about as linear as any CRC, using a combination of multiplication and addition, partly carry-less. It is worth noting that exclusive-or corresponds to carry-less subtraction or addition in a binary Galois field, or GF(2). We only need some way of reducing key prefixes into hash values. The CRC-32C should be better than a Rabin–Karp rolling hash algorithm. Compared to the old hash algorithm, it has the drawback that there will be only 32 bits of entropy before we choose the hash table cell by a modulus operation. The size of each adaptive hash index array is (innodb_buffer_pool_size / 512) / innodb_adaptive_hash_index_parts. With the maximum number of partitions (512), we would not exceed 1<<32 elements per array until the buffer pool size exceeds 1<<50 bytes (1 PiB). We would hit other limits before that: the virtual address space on many contemporary 64-bit processor implementations is only 48 bits (256 TiB). So, we can simply go for the SIMD accelerated CRC-32C. rec_fold(): Take a combined parameter n_bytes_fields. 
Determine the length of each field on the fly, and compute CRC-32C over a single contiguous range of bytes, from the start of the record payload area to the end of the last full or partial field. For secondary index records in ROW_FORMAT=REDUNDANT, also the data area that is reserved for NULL values (to facilitate in-place updates between NULL and NOT NULL values) will be included in the count. Luckily, InnoDB has always zero-initialized such unused areas; refer to data_write_sql_null() in rec_convert_dtuple_to_rec_old(). For formats other than ROW_FORMAT=REDUNDANT, no space is allocated for NULL values, and therefore the CRC-32C will only cover the actual payload of the key prefix. dtuple_fold(): For ROW_FORMAT=REDUNDANT, include the dummy NULL values in the CRC-32C, so that the values will be comparable with rec_fold(). innodb_ahi-t: A unit test for rec_fold() and dtuple_fold(). btr_search_build_page_hash_index(), btr_search_drop_page_hash_index(): Use a fixed-size stack buffer for computing the fold values, to avoid dynamic memory allocation. btr_search_drop_page_hash_index(): Do not release part.latch if we need to invoke multiple batches of rec_fold(). dtuple_t: Allocate fewer bits for the fields. The maximum number of data fields is about 1023, so uint16_t will be fine for them. The info_bits fit in less than 1 byte. ut_pair_min(), ut_pair_cmp(): Remove. We can actually combine and compare int(n_fields << 16 | n_bytes). PAGE_CUR_LE_OR_EXTENDS, PAGE_CUR_DBG: Remove. These were never defined, because they would only work with latin1_swedish_ci if at all. btr_cur_t::check_mismatch(): Replaces !btr_search_check_guess(). cmp_dtuple_rec_bytes(): Replaces cmp_dtuple_rec_with_match_bytes(). Determine the offsets of fields on the fly. page_cur_try_search_shortcut_bytes(): This caller of cmp_dtuple_rec_bytes() will not be invoked on the change buffer tree. cmp_dtuple_rec_leaf(): Replaces cmp_dtuple_rec_with_match() for comparing leaf-page records.
buf_block_t::ahi_left_bytes_fields: Consolidated Atomic_relaxed<uint32_t> of curr_left_side << 31 | curr_n_bytes << 16 | curr_n_fields. The other set of parameters (n_fields, n_bytes, left_side) was removed as redundant. btr_search_update_hash_node_on_insert(): Merged into btr_search_update_hash_on_insert(). btr_search_build_page_hash_index(): Take combined left_bytes_fields instead of n_fields, n_bytes, left_side. btr_search_update_block_hash_info(), btr_search_update_hash_ref(): Merged into btr_search_info_update_hash(). btr_cur_t::n_bytes_fields: Replaces n_bytes << 16 | n_fields. We also remove many redundant checks of btr_search.enabled. If we are holding any btr_sea::partition::latch, then a nonnull pointer in buf_block_t::index must imply that the adaptive hash index is enabled. Reviewed by: Vladislav Lesin	2025-01-10 16:39:44 +02:00
Marko Mäkelä	9c8bdc6c15	MDEV-35049: btr_search_check_free_space_in_heap() is a bottleneck Let us implement a simple fixed-size allocator for the adaptive hash index, instead of complicating mem_heap_t or mem_block_info_t. MEM_HEAP_BTR_SEARCH: Remove. mem_block_info_t::free_block(), mem_heap_free_block_free(): Remove. mem_heap_free_top(), mem_heap_get_top(): Remove. btr_sea::partition::spare: Replaces mem_block_info_t::free_block. This keeps one spare block per adaptive hash index partition, to process an insert. We must not wait for buf_pool.mutex while holding any btr_sea::partition::latch. That is why we cache one block for future allocations. This is protected by a new btr_sea::partition::blocks_mutex in order to relieve pressure on btr_sea::partition::latch. btr_sea::partition::prepare_insert(): Replaces btr_search_check_free_space_in_heap(). btr_sea::partition::erase(): Replaces ha_search_and_delete_if_found(). btr_sea::partition::cleanup_after_erase(): Replaces most of ha_delete_hash_node(). Unlike the previous implementation, we will retain a spare block for prepare_insert(). This should reduce some contention on buf_pool.mutex. btr_search.n_parts: Replaces btr_ahi_parts. btr_search.enabled: Replaces btr_search_enabled. This must hold whenever buf_block_t::index is set while a thread is holding a btr_sea::partition::latch. dict_index_t::search_info: Remove pointer indirection, and use Atomic_relaxed or Atomic_counter for most fields. btr_search_guess_on_hash(): Let the caller ensure that latch_mode is BTR_MODIFY_LEAF or BTR_SEARCH_LEAF. Release btr_sea::partition::latch before buffer-fixing the block. The page latch that we already acquired is preventing buffer pool eviction. We must validate both block->index and block->page.state while holding part.latch in order to avoid race conditions with buffer page relocation or buf_pool_t::resize(). btr_search_check_guess(): Remove the constant parameter can_only_compare_to_cursor_rec=false.
ahi_node: Replaces ha_node_t. This has been tested by running the regression test suite with the adaptive hash index enabled: ./mtr --mysqld=--loose-innodb-adaptive-hash-index=ON Reviewed by: Vladislav Lesin	2025-01-10 16:30:42 +02:00
Sergei Golubchik	221aa5e08f	Merge branch '10.6' into 10.11	2025-01-10 13:14:42 +01:00
Marko Mäkelä	990b010b09	MDEV-35438 Annotate InnoDB I/O functions with noexcept Most InnoDB functions do not throw any exceptions, not even indirectly std::bad_alloc, which could be thrown by a C++ memory allocation function. Let us annotate many functions with noexcept in order to reduce the code footprint related to exception handling. Reviewed by: Thirunarayanan Balathandayuthapani	2025-01-09 07:43:24 +02:00
Kristian Nielsen	0f47db8525	Merge 10.11 -> 11.4 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-12-05 11:01:42 +01:00
Kristian Nielsen	e7c6cdd842	Merge 10.6 -> 10.11 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-12-05 10:11:58 +01:00
Marko Mäkelä	2719cc4925	Merge 10.11 into 11.4	2024-12-02 11:35:34 +02:00
Marko Mäkelä	1a9011d273	MDEV-35525: Index corruption in reverse scans btr_cur_t::search_leaf(): In the BTR_SEARCH_PREV and BTR_MODIFY_PREV modes, reset the previous search status before invoking page_cur_search_with_match(). Otherwise, the search could end up in a totally wrong subtree. This fixes a regression that was introduced in commit `de4030e4d4` (MDEV-30400).	2024-11-29 15:12:20 +02:00
Marko Mäkelä	507323abe6	Cleanup: Remove duplicated code buf_block_alloc(): Define as an alias in buf0lru.h, which defines the underlying buf_LRU_get_free_block(). buf_block_free(): Define as an alias of the non-inline function buf_pool.free_block(block). Reviewed by: Vladislav Lesin	2024-11-29 14:16:34 +02:00
Marko Mäkelä	3d23adb766	Merge 10.6 into 10.11	2024-11-29 13:43:17 +02:00
Marko Mäkelä	19acb0257e	MDEV-35508 Race condition between purge and secondary index INSERT or UPDATE row_purge_remove_sec_if_poss_leaf(): If there is an active transaction that is not newer than PAGE_MAX_TRX_ID, return the bogus value 1 so that row_purge_remove_sec_if_poss_tree() is guaranteed to recheck if the record needs to be purged. It could be the case that an active transaction would insert this record between the time this check completed and row_purge_remove_sec_if_poss_tree() acquired a latch on the secondary index leaf page again. row_purge_del_mark_error(), row_purge_check(): Some unlikely code refactored into separate non-inline functions. trx_sys_t::find_same_or_older_low(): Move the unlikely and bulky part of trx_sys_t::find_same_or_older() to a non-inline function. trx_sys_t::find_same_or_older_in_purge(): A variant of trx_sys_t::find_same_or_older() for use in the purge subsystem, with potential concurrent access of the same trx_t object from multiple threads. trx_t::max_inactive_id_atomic: An Atomic_relaxed alias of the regular data field trx_t::max_inactive_id, which we use on systems that have native 64-bit loads or stores. On any 64-bit system that seems to be supported by GCC, Clang or MSVC, relaxed atomic loads and stores use the regular load and store instructions. On -march=i686 the 64-bit atomic loads and stores would use an XMM register. This fixes a regression that had been introduced in commit `b7b9f3ce82` (MDEV-34515). There would be messages [ERROR] InnoDB: tried to purge non-delete-marked record in index in the server error log, and an assertion ut_ad(0) would cause a crash of debug instrumented builds. This could also cause incorrect results for MVCC reads and corrupted secondary indexes. The debug instrumented test case was written by Debarun Banerjee. Reviewed by: Debarun Banerjee	2024-11-29 10:44:38 +02:00
Marko Mäkelä	26597b91b3	MDEV-35413 InnoDB: Cannot load compressed BLOB A race condition was observed between two buf_page_get_zip() for a page. One of them had proceeded to buf_read_page(), allocating and x-latching a buf_block_t that initially comprises only an uncompressed page frame. While that thread was waiting inside buf_block_alloc(), another thread would try to access the same page. Without acquiring a page latch, it would wrongly conclude that there is corruption because no compressed page frame exists for the block. buf_page_get_zip(): Simplify the logic and correct the documentation. Always acquire a shared latch to prevent any race condition with a concurrent read operation. No longer increment a buffer-fix; the latch is sufficient for preventing page relocation or eviction. buf_read_page(): Add the parameter bool unzip=true. In buf_page_get_zip() there is no need to allocate an uncompressed page frame for reading a compressed BLOB page. We only need that for other ROW_FORMAT=COMPRESSED pages, or for writing compressed BLOB pages. btr_copy_zblob_prefix(): Remove the message "Cannot load compressed BLOB" because buf_page_get_zip() will already have reported a more specific error whenever it returns nullptr. row_merge_buf_add(): Do not crash on BLOB corruption, but return an error instead. (In debug builds, an assertion will fail if this corruption is noticed.) Reviewed by: Debarun Banerjee	2024-11-22 08:33:03 +02:00
Marko Mäkelä	895cd553a3	MDEV-32175: Reduce page_align(), page_offset() calls When srv_page_size and innodb_page_size were introduced, the functions page_align() and page_offset() got more expensive. Let us try to replace such calls with simpler pointer arithmetic with respect to the buffer page frame. page_rec_get_next_non_del_marked(): Add a page frame as a parameter, and template<bool comp>. page_rec_next_get(): A more efficient variant of page_rec_get_next(), with template<bool comp> and const page_t* parameters. lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks. fseg_free_step(), fseg_free_step_not_header(): Take the header block as a parameter. Reviewed by: Vladislav Lesin	2024-11-21 11:01:30 +02:00
Marko Mäkelä	3c312d247c	MDEV-35190 HASH_SEARCH duplicates effort before HASH_INSERT or HASH_DELETE The HASH_ macros are unnecessarily obfuscating the logic, so we had better replace them. hash_cell_t::search(): Implement most of the HASH_DELETE logic, for a subsequent insert or remove(). hash_cell_t::remove(): Remove an element. hash_cell_t::find(): Implement the HASH_SEARCH logic. xb_filter_hash_free(): Avoid any hash table lookup; just traverse the hash bucket chains and free each element. xb_register_filter_entry(): Search databases_hash only once. rm_if_not_found(): Make use of find_filter_in_hashtable(). dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table(): Define non-inline to avoid unnecessary code duplication. dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache(): Look for duplicate while finding the insert position. dict_table_change_id_in_cache(): Merged to the only caller row_discard_tablespace(). hash_insert(): Helper function of dict_sys_t::resize(). fil_space_t::create(): Look for a duplicate (and crash if found) when searching for the insert position. lock_rec_discard(): Take the hash array cell as a parameter to avoid a duplicated lookup. lock_rec_free_all_from_discard_page(): Remove a parameter. Reviewed by: Debarun Banerjee	2024-11-21 08:59:02 +02:00
ParadoxV5	d5f16d6305	Extract some of #3360 fixes to 10.6.x That PR uncovered countless issues with `my_snprintf` uses. This commit backports a squashed subset of their fixes (excludes #3485).	2024-11-18 13:29:04 +11:00
Oleksandr Byelkin	69d033d165	Merge branch '10.11' into 11.2	2024-10-29 16:42:46 +01:00
Oleksandr Byelkin	3d0fb15028	Merge branch '10.6' into 10.11	2024-10-29 15:24:38 +01:00
Vlad Lesin	8c7786e7d5	MDEV-34690 lock_rec_unlock_unmodified() causes deadlock lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock() or under a combination of lock_sys.rd_lock() + record locks hash table cell latch. It also requests a page latch to check whether the locked records were changed by the current transaction. Usually InnoDB requests a page latch to find a certain record on the page, and then requests lock_sys and/or a record lock hash cell latch to request a record lock. lock_rec_unlock_unmodified() requests the latches in the opposite order, which causes deadlocks. One possible scenario for the deadlock is the following: thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table cell latch, the latch is acquired; thread 2 - purge thread acquires page latch and tries to remove delete-marked record, it invokes lock_update_delete(), which requests locks hash table cell latch, held by thread 1; thread 1 - requests page latch, held by thread 2. To fix it we need to release lock_sys.latch and/or lock hash cell latch, acquire page latch and re-acquire lock_sys related latches. When lock_sys.latch and/or lock hash cell latch are released in lock_release_on_prepare() and lock_release_on_prepare_try(), the page on which the current lock is held can be merged. In this case the bitmap of the current lock must be cleared, and the new lock must be added to the end of trx->lock.trx_locks list, or bitmap of already existing lock must be changed. The new field trx_lock_t::set_nth_bit_calls indicates if new locks (bits in existing lock bitmaps or new lock objects) were created during the period when lock_sys was released in trx->lock.trx_locks list iteration loop in lock_release_on_prepare() or lock_release_on_prepare_try(). And, if so, we traverse the list again. The block can be freed during page merging, which causes an assertion failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page get mode to it.
That's why the page_get_mode parameter was added to btr_block_get() to pass BUF_GET_POSSIBLY_FREED from lock_release_on_prepare() and lock_release_on_prepare_try() to buf_page_get_gen(). As searching for the id of the transaction that modified a secondary index record is quite an expensive operation, restrict its usage for master. A system variable was added to remove the restriction, to simplify testing. The variable exists only in debug builds or in builds with the -DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option, to increase the probability of catching bugs in release builds with RQG. Note that the code that does a primary index lookup to find out which transaction modified a secondary index record is necessary only when there is no primary key and no unique secondary key on the replica with row-based replication, because only in this case can extra X locks on unmodified records be set during the scan phase. Reviewed by Marko Mäkelä.	2024-10-23 12:36:17 +03:00
Marko Mäkelä	740519e15a	MDEV-35125: Unnecessary buf_pool.page_hash lookups dict_index_t::clear(), btr_drop_temporary_table(): Make use of the root page guess if it is available. btr_read_autoinc(): Invoke btr_root_block_get() to access the root page. btr_blob_free(): Retain a buffer-fix on the page across mtr_t::commit() in order to avoid a buf_pool.page_hash lookup. dict_load_table_one(): Remove a redundant check for page id. It was already validated in buf_page_t::read_complete(). trx_t::apply_log(): Make use of buf_pool.page_fix() to avoid some mtr_t related overhead. Reviewed by: Thirunarayanan Balathandayuthapani	2024-10-17 09:10:45 +03:00
Marko Mäkelä	64b75865d5	MDEV-34823 after-merge fix btr_cur_t::search_leaf(): Remove a redundant condition. This fixes up the merge commit `cfa9784edb`	2024-09-18 07:06:35 +03:00
Yuchen Pei	cfa9784edb	Merge branch '10.11' into 11.2	2024-09-18 10:25:16 +10:00
Yuchen Pei	b168859d1e	Merge branch '10.6' into 10.11	2024-09-11 16:10:53 +10:00
Marko Mäkelä	f0de610d0c	Merge 10.11 into 11.2	2024-09-10 18:35:16 +03:00
Marko Mäkelä	b7b2d2bde4	Merge 10.5 into 10.6	2024-09-09 11:30:30 +03:00
Marko Mäkelä	f9f92b480e	Merge 10.6 into 10.11	2024-09-06 16:17:42 +03:00
Marko Mäkelä	024a18dbcb	MDEV-34823 Invalid arguments in ib_push_warning() In the bug report MDEV-32817 it occurred that the function row_mysql_get_table_status() is outputting a fil_space_t* as if it were a numeric tablespace identifier. ib_push_warning(): Remove. Let us invoke push_warning_printf() directly. innodb_decryption_failed(): Report a decryption failure and set the dict_table_t::file_unreadable flag. This code was being duplicated in very many places. We return the constant value DB_DECRYPTION_FAILED in order to avoid code duplication in the callers and to allow tail calls. innodb_fk_error(): Report a FOREIGN KEY error. dict_foreign_def_get(), dict_foreign_def_get_fields(): Remove. This code was being used in dict_create_add_foreign_to_dictionary() in an apparently uncovered code path. That ib_push_warning() call would pass the integer i+1 instead of a pointer to NUL terminated string ("%s"), and therefore the call should have resulted in a crash. dict_print_info_on_foreign_key_in_create_format(), innobase_quote_identifier(): Add const qualifiers. row_mysql_get_table_error(): Replaces row_mysql_get_table_status(). Display no message on DB_CORRUPTION; it should be properly reported at the SQL layer anyway.	2024-09-06 14:29:09 +03:00
Marko Mäkelä	9878238f74	MDEV-34791: Redundant page lookups hurt performance btr_cur_t::search_leaf(): When the index root page is also a leaf page, we may need to upgrade our existing shared root page latch into an exclusive latch. Even if we end up waiting, the root page won't be able to go away while we hold an index()->lock. The index page may be split; that is all. btr_latch_prev(): Acquire the page latch while holding a buffer-fix and an index tree latch. Merge the change buffer if needed. Use buf_pool_t::page_fix() for this special case instead of complicating buf_page_get_low() and buf_page_get_gen(). row_merge_read_clustered_index(): Remove some code that does not seem to be useful. No difference was observed with regard to removing this code when a CREATE INDEX or OPTIMIZE TABLE statement was run concurrently with sysbench oltp_update_index --tables=1 --table_size=1000 --threads=16. buf_pool_t::unzip(): Decompress a ROW_FORMAT=COMPRESSED page. buf_pool_t::page_fix(): Handle also ROW_FORMAT=COMPRESSED pages as well as change buffer merge. Optionally return an error. Add a flag for suppressing a page latch wait and a special return value -1 to indicate that the call would block. This is the preferred way of buffer-fixing blocks. The functions buf_page_get_gen() and buf_page_get_low() are only being invoked with rw_latch=RW_NO_LATCH in operations on SPATIAL INDEX. buf_page_t: Define some static functions for interpreting state(). buf_page_get_zip(), buf_read_page(), buf_read_ahead_random(), buf_read_ahead_linear(): Remove the redundant parameter zip_size. We must look up the tablespace and can invoke fil_space_t::zip_size() on it. buf_page_get_low(): Require mtr!=nullptr. buf_page_get_gen(): Implement some lock downgrading during recovery. ibuf_page_low(): Use buf_pool_t::page_fix() in a debug check. We do wait for a page read here, because otherwise a debug assertion in buf_page_get_low() in the test innodb.ibuf_delete could occasionally fail. 
PageConverter::operator(): Invoke buf_pool_t::page_fix() in order to possibly evict a block. This allows us to remove some special case code from buf_page_get_low().	2024-09-03 14:15:57 +03:00
Marko Mäkelä	cfcf27c6fe	Merge 10.6 into 10.11	2024-08-29 07:47:29 +03:00
Marko Mäkelä	b7b9f3ce82	MDEV-34515: Contention between purge and workload In a Sysbench oltp_update_index workload that involves 1 table, a serious contention between the workload and the purge of history was observed. This was the worst when the table contained only 1 record. This turned out to be fixed by setting innodb_purge_batch_size=128, which corresponds to the number of usable persistent rollback segments. When we go above that, there would be contention between row_purge_poss_sec() and the workload, typically on the clustered index page latch, sometimes also on a secondary index page latch. It might be that with smaller batches, trx_sys.history_size() will end up pausing all concurrent transaction start/commit frequently enough so that purge will be able to make some progress, so that there would be less contention on the index page latches between purge and SQL execution. In commit `aa719b5010` (part of MDEV-32050) the interpretation of the parameter innodb_purge_batch_size was slightly changed. It would correspond to the maximum desired size of the purge_sys.pages cache. Before that change, the parameter was referring to a number of undo log pages, but the accounting might have been inaccurate. To avoid a regression, we will reduce the default value to innodb_purge_batch_size=127, which will also be compatible with innodb_undo_tablespaces>1 (which will disable rollback segment 0). Additionally, some logic in the purge and MVCC checks is simplified. The purge tasks will make use of purge_sys.pages when accessing undo log pages to find out if a secondary index record can be removed. If an undo page needs to be looked up in buf_pool.page_hash, we will merely buffer-fix it. This is correct, because the undo pages are append-only in nature. Holding purge_sys.latch or purge_sys.end_latch or the fact that the current thread is executing as a part of an in-progress purge batch will prevent the contents of the undo page from being freed and subsequently reused. 
The buffer-fix will prevent the page from being evicted from the buffer pool. Thanks to this logic, we can refer to the undo log record directly in the buffer pool page and avoid copying the record. buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful for accessing undo log pages, which are append-only by nature. There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED in that case. purge_sys_t::view_guard::view_guard(): Allow the type of guard to be acquired: end_latch, latch, or no latch (in case we are a purge thread). purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages. purge_sys_t::get_page(): Invoke buf_pool_t::page_fix(). row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe() and row_undo_mod_sec_unsafe(). trx_undo_get_undo_rec(): Merged into trx_undo_prev_version_build(). row_purge_poss_sec(): Add the parameter mtr and remove redundant or unused parameters sec_pcur, sec_mtr, is_tree. We will use the caller's mtr object but release any acquired page latches before returning. btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align(). row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID to be checked against the page in row_purge_remove_sec_if_poss_tree(). If the secondary index page was not changed meanwhile, it will be unnecessary to invoke row_purge_poss_sec() again. trx_undo_prev_version_build(): Access any undo log pages using the caller's mini-transaction object. row_purge_vc_matches_cluster(): Moved to the only compilation unit that needs it. Reviewed by: Debarun Banerjee	2024-08-26 12:23:06 +03:00
Marko Mäkelä	7ead48a72b	MDEV-34458: Remove more traces of BTR_MODIFY_PREV In commit `2f6df93748` we fixed an observed case of the bug by removing some code related to the no longer needed BTR_MODIFY_PREV mode. In commit `73ad436e16` an alternative fix was applied that also fixes the BTR_SEARCH_PREV case. Let us clean up some implicit references to BTR_MODIFY_PREV that were missed in `2f6df93748`. btr_pcur_move_backward_from_page(): Assume that the latch mode was BTR_SEARCH_LEAF. btr_pcur_move_to_prev(): Assert that the latch mode is BTR_SEARCH_LEAF. This function is mostly invoked in row0sel.cc for read operations, as well as in row0merge.cc for reading from the clustered index. All callers indeed use a cursor in the BTR_SEARCH_LEAF mode.	2024-07-29 14:13:30 +03:00
Oleksandr Byelkin	2447dda2c0	Merge branch '10.11' into 11.1	2024-07-08 22:40:16 +02:00
Oleksandr Byelkin	034a175982	Merge branch '10.6' into 10.11	2024-07-04 11:52:07 +02:00
mariadb-DebarunBanerjee	73ad436e16	MDEV-34458 wait_for_read in buf_page_get_low hurts performance The performance regression seen while loading BP is caused by the deadlock fix given in MDEV-33543. The area of impact is wider but is more visible when BP is being loaded initially via DMLs. Specifically, the response time could be impacted in DML doing a pessimistic operation on an index (split/merge) when the leaf pages are not found in the buffer pool. It is more likely to occur with a small BP size. The origin of the issue dates back to MDEV-30400, which introduced btr_cur_t::search_leaf() replacing btr_cur_search_to_nth_level() for leaf page searches. In btr_latch_prev, we use RW_NO_LATCH to get the previous page fixed in BP without latching. When the page is not in BP, we try to acquire and wait for an S latch, violating the latching order. This deadlock was analyzed in MDEV-33543 and fixed by using the already present wait logic in buf_page_get_gen() instead of waiting for the latch. The wait logic is inferior to the usual S latch wait and is simply a repeated sleep of 100 microseconds (the actual sleep time could be longer depending on the platform). The bug was seen with the "change-buffering" code path, and the idea was that this path should be less exercised. The judgement was not correct: the path is actually quite frequent and does impact performance when pages are not in BP and are being loaded by DML expanding/shrinking large data. Fix: While trying to get a page with RW_NO_LATCH when we are attempting an "out of order" latch, return from buf_page_get_gen immediately instead of waiting, and follow the ordered latching path.	2024-07-03 18:08:43 +05:30
Marko Mäkelä	2f6df93748	MDEV-34458 wait_for_read in buf_page_get_low hurts performance BTR_MODIFY_PREV: Remove. This mode was only used by the change buffer, which commit `f27e9c8947` (MDEV-29694) removed. buf_page_get_gen(): Revert the change that was made in commit `90b95c6149` (MDEV-33543) because it is not applicable after MDEV-29694. This fixes the performance regression that Vladislav Vaintroub reported. This is a 11.x specific fix; this needs to be fixed differently in older major versions where the change buffer is present.	2024-06-26 13:51:38 +03:00
Marko Mäkelä	d34289a3e2	Merge 10.11 into 11.1	2024-06-17 09:21:50 +03:00
Marko Mäkelä	b81d717387	Merge 10.6 into 10.11	2024-06-11 12:50:10 +03:00
Marko Mäkelä	9fac857f26	MDEV-34283 A misplaced btr_cur_need_opposite_intention() check may fail to prevent hangs btr_cur_t::search_leaf(): Invoke btr_cur_need_opposite_intention() after positioning page_cur.rec so that the record will be in the intended page. This is something that was broken in commit `f2096478d5` or commit `de4030e4d4` or related changes. btr_cur_need_opposite_intention(): Add a debug assertion that would catch the misuse. The "next line of defence" that should have caught this bug in debug builds are assertions that mtr_t::m_memo contains MTR_MEMO_X_LOCK for the dict_index_t::lock. When btr_cur_need_opposite_intention() holds, we should escalate to acquiring an exclusive index->lock in btr_cur_t::pessimistic_search_leaf(). Reviewed by: Debarun Banerjee	2024-06-06 13:03:34 +03:00
Sergei Golubchik	f0a5412037	Merge branch '11.0' into 11.1	2024-05-13 09:52:30 +02:00
Sergei Golubchik	f9807aadef	Merge branch '10.11' into 11.0	2024-05-12 12:18:28 +02:00
Sergei Golubchik	a6b2f820e0	Merge branch '10.6' into 10.11	2024-05-10 20:02:18 +02:00
Sergei Golubchik	7b53672c63	Merge branch '10.5' into 10.6	2024-05-08 20:06:00 +02:00

1300 commits