Commit graph

1300 commits

Author SHA1 Message Date
ParadoxV5
2392bd02d8 Tag the sql/log.h family with ATTRIBUTE_FORMAT
Let GCC `-Wformat` check formats sent to
these users of `my_vsnprintf_ex` users (heh)
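
A minimal sketch of what such an annotation buys, assuming GCC or Clang. `DEMO_ATTRIBUTE_FORMAT` and `demo_log()` are hypothetical names for illustration, not the macros or functions in sql/log.h, and the standard `printf` style is used rather than the `my_vsnprintf_ex` format extensions:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical macro: expands to the GCC/Clang format attribute if available.
#if defined(__GNUC__) || defined(__clang__)
# define DEMO_ATTRIBUTE_FORMAT(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
# define DEMO_ATTRIBUTE_FORMAT(fmt_idx, arg_idx)
#endif

// Parameter 1 is the format string; the variadic arguments start at 2.
std::string demo_log(const char *fmt, ...) DEMO_ATTRIBUTE_FORMAT(1, 2);

std::string demo_log(const char *fmt, ...)
{
  char buf[128];
  va_list ap;
  va_start(ap, fmt);
  std::vsnprintf(buf, sizeof buf, fmt, ap);
  va_end(ap);
  return std::string(buf);
}
```

With the attribute in place, a call such as `demo_log("%s", 42)` would be diagnosed by `-Wformat` at compile time.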
2025-02-12 10:17:44 +01:00
Sergei Golubchik
9ee09a33bb Merge branch '11.7' into 11.8 2025-02-11 20:29:43 +01:00
Sergei Golubchik
f1a7693bc0 Merge branch '10.11' into 11.4 2025-01-14 23:45:41 +01:00
Marko Mäkelä
793a2fc8ba MDEV-35049: Always enable page_cur_search_with_match_bytes()
For some reason, page_cur_search_with_match_bytes(), which can speed
up append operations (PAGE_CUR_LE used by INSERT), was only enabled
if innodb_adaptive_hash_index=ON even though it has nothing to do with
the adaptive hash index.

Furthermore, mysql/mysql-server@c9bbc83d11
a.k.a. commit c9bbc83d11 reduced a limit
from 3 to 2 but forgot to adjust the PAGE_N_DIRECTION limit accordingly.
We are adjusting that as well.

Reviewed by: Vladislav Lesin
2025-01-10 16:40:37 +02:00
Marko Mäkelä
4221ed1d7d MDEV-35049: Avoid building AHI beyond unique field prefix
During a workload, an adaptive hash index had been built on
UNIQUE INDEX(ID) on SYS_TABLES, and during a DROP TABLE
operation the adaptive hash index would be widened to cover
also the PRIMARY KEY(NAME) field that the index includes: (ID,NAME).
Such an adaptive hash index is unlikely to satisfy (m)any queries.
Let us limit the AHI prefix to the unique fields.

Reviewed by: Vladislav Lesin
2025-01-10 16:40:35 +02:00
Marko Mäkelä
5f7b2a3ced MDEV-35049: Improve btr_search_drop_page_hash_index()
btr_search_drop_page_hash_index(): Replace the Boolean parameter
with const dict_index_t *not_garbage. If buf_block_t::index points
to that, there is no need to acquire btr_sea::partition::latch.

The old parameter bool garbage_collect=false is equivalent to the
parameter not_garbage=nullptr. The parameter garbage_collect=true
will be replaced either with the actual index that is associated
with the buffer page, or with a bogus pointer not_garbage=-1 to
indicate that any lazily retained entries for a freed index need to be removed.

buf_page_get_low(), buf_page_get_gen(), mtr_t::page_lock(),
mtr_t::upgrade_buffer_fix(): Do not invoke
btr_search_drop_page_hash_index(). Our caller will have to do it
when appropriate.

buf_page_create_low(): Keep invoking btr_search_drop_page_hash_index().
This is the normal way of lazily dropping the adaptive hash index
after a DDL operation such as DROP INDEX.

btr_block_get(), btr_root_block_get(), btr_root_adjust_on_import(),
btr_read_autoinc_with_fallback(), btr_cur_instant_init_low(),
btr_cur_t::search_leaf(), btr_cur_t::pessimistic_search_leaf(),
btr_pcur_optimistic_latch_leaves(), dict_stats_analyze_index_below_cur():
Invoke btr_search_drop_page_hash_index(block, index) for pages that
may be leaf pages. No adaptive hash index may have been created on
anything other than a B-tree leaf page.

btr_cur_search_to_nth_level(): Do not invoke
btr_search_drop_page_hash_index(), because we are only accessing
non-leaf pages and the adaptive hash index may only have been created
on leaf pages.

btr_page_alloc_for_ibuf() and many other callers of buf_page_get_gen()
or similar functions do not invoke btr_search_drop_page_hash_index(),
because the adaptive hash index is never created on such pages.
If a page in the tablespace was freed as part of a DDL operation and
reused for something else, then buf_page_create_low() will take care
of dropping the adaptive hash index before the freed page will be
modified.

It is notable that while the flst_ functions may access pages that are
related to allocating B-tree index pages (the BTR_SEG_TOP and BTR_SEG_LEAF
linked from the index root page), those pages themselves can never be
stored in the adaptive hash index. Therefore, it is not necessary to
invoke btr_search_drop_page_hash_index() on them.

Reviewed by: Vladislav Lesin
2025-01-10 16:40:34 +02:00
Marko Mäkelä
c942b31340 MDEV-35049: Fix bogus rebuild on BTR_CUR_HASH_FAIL
btr_search_info_update_hash(): Do nothing if the record is positioned
on the page supremum or infimum pseudo-record. The adaptive hash index
can only include user records. This deficiency would cause the
adaptive hash index parameters to change between hashing a prefix of
1 field or a prefix of 1 byte.

Reviewed by: Vladislav Lesin
2025-01-10 16:40:32 +02:00
Marko Mäkelä
6b58ee769f MDEV-35049: Fix bogus BTR_CUR_HASH_FAIL on contention
btr_search_guess_on_hash(): Only set BTR_CUR_HASH_FAIL on actual mismatch.
If the page latch cannot be acquired, the hash search might very well
have succeeded. Do not count that as a failure, that is, do not
unnecessarily invoke btr_search_update_hash_ref() after a normal search.
Set cursor->flag=BTR_CUR_HASH_ABORT if the current parameters of the
adaptive hash index are not suitable for the search and a call to
btr_cur_t::search_info_update() might help.

btr_cur_t::search_leaf(): Do not invoke search_info_update()
if btr_search_guess_on_hash() failed due to contention.

btr_cur_t::pessimistic_search_leaf(): Do not invoke search_info_update()
on the change buffer tree. Previously, this condition was being checked
inside search_info_update().
2025-01-10 16:40:30 +02:00
Marko Mäkelä
68cac26108 MDEV-35049: Fix bogus BTR_CUR_HASH_FAIL on PAGE_CUR_LE
btr_cur_t::search_leaf(): Do not attempt to use the adaptive
hash index for PAGE_CUR_G or PAGE_CUR_L, because those modes
expect a non-equal result, and the adaptive hash index can only
deliver equal results.

btr_cur_t::check_mismatch(): Only handle PAGE_CUR_LE and PAGE_CUR_GE.
For PAGE_CUR_LE (bool ge=false), qualify a full match for the last
record of a page that is not at the end of the index. Previously,
an adaptive hash index lookup would fail when the record is at the end
of an index page but not at the end of the index. This would lead to
unnecessary rebuild of the adaptive hash index in read-only workloads.

Reviewed by: Vladislav Lesin
2025-01-10 16:40:29 +02:00
Marko Mäkelä
4dcb1b575b MDEV-35049: Use CRC-32C and avoid allocating heap
For the adaptive hash index, dtuple_fold() and rec_fold() were employing
a slow rolling hash algorithm, computing hash values ("fold") for one
field and one byte at a time, while depending on calls to
rec_get_offsets().

We already have optimized implementations of CRC-32C and have been
successfully using that function in some other InnoDB tables, but not
yet in the adaptive hash index.

Any linear function such as any CRC will fail the avalanche test that
any cryptographically secure hash function is expected to pass:
any single-bit change in the input key should affect on average half
the bits in the output.

But we always were happy with less than cryptographically secure:
in fact, ut_fold_ulint_pair() or ut_fold_binary() are just about as
linear as any CRC, using a combination of multiplication and addition,
partly carry-less. It is worth noting that exclusive-or corresponds to
carry-less subtraction or addition in a binary Galois field, or GF(2).
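
The linearity can be checked directly. The following is a sketch with a slow bitwise CRC-32C (the function names are illustrative, not InnoDB code): for three inputs of equal length, the CRC of their exclusive-or equals the exclusive-or of their CRCs, which is exactly the kind of structure that an avalanche test rejects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
// Illustration only; it is far slower than an accelerated implementation.
uint32_t demo_crc32c(const unsigned char *data, size_t len)
{
  uint32_t crc = ~uint32_t{0};
  for (size_t i = 0; i < len; i++)
  {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1U)));
  }
  return ~crc;
}

// Check crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c) for equal-length inputs
// of at most 64 bytes.
bool demo_crc32c_is_affine(const unsigned char *a, const unsigned char *b,
                           const unsigned char *c, size_t len)
{
  unsigned char x[64];
  if (len > sizeof x)
    return false;
  for (size_t i = 0; i < len; i++)
    x[i] = static_cast<unsigned char>(a[i] ^ b[i] ^ c[i]);
  return demo_crc32c(x, len) ==
    (demo_crc32c(a, len) ^ demo_crc32c(b, len) ^ demo_crc32c(c, len));
}
```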

We only need some way of reducing key prefixes into hash values.
The CRC-32C should be better than a Rabin–Karp rolling hash algorithm.
Compared to the old hash algorithm, it has the drawback that there will
be only 32 bits of entropy before we choose the hash table cell by a
modulus operation. The size of each adaptive hash index array is
(innodb_buffer_pool_size / 512) / innodb_adaptive_hash_index_parts.
With the maximum number of partitions (512), we would not exceed 1<<32
elements per array until the buffer pool size exceeds 1<<50 bytes (1 PiB).
We would hit other limits before that: the virtual address space on many
contemporary 64-bit processor implementations is only 48 bits (256 TiB).
So, we can simply go for the SIMD accelerated CRC-32C.
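
The arithmetic above can be verified at compile time. This is only a sketch of the quoted formula; `demo_ahi_cells()` is a hypothetical helper, not an InnoDB function:

```cpp
#include <cassert>
#include <cstdint>

// Cells per adaptive hash index array, following the formula quoted above:
// (innodb_buffer_pool_size / 512) / innodb_adaptive_hash_index_parts.
constexpr uint64_t demo_ahi_cells(uint64_t buffer_pool_bytes, uint64_t parts)
{
  return buffer_pool_bytes / 512 / parts;
}

// With 512 partitions, a 1 PiB (1<<50 bytes) buffer pool reaches 1<<32 cells.
static_assert(demo_ahi_cells(uint64_t{1} << 50, 512) == (uint64_t{1} << 32),
              "1 PiB is where 32 bits of hash stop being enough");
// A 256 TiB (1<<48 bytes) buffer pool stays well below that, at 1<<30 cells.
static_assert(demo_ahi_cells(uint64_t{1} << 48, 512) == (uint64_t{1} << 30),
              "256 TiB fits comfortably");
```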

rec_fold(): Take a combined parameter n_bytes_fields. Determine the
length of each field on the fly, and compute CRC-32C over a single
contiguous range of bytes, from the start of the record payload area
to the end of the last full or partial field. For secondary index records
in ROW_FORMAT=REDUNDANT, also the data area that is reserved for NULL
values (to facilitate in-place updates between NULL and NOT NULL values)
will be included in the count. Luckily, InnoDB always zero-initialized
such unused area; refer to data_write_sql_null() in
rec_convert_dtuple_to_rec_old(). For other than ROW_FORMAT=REDUNDANT,
no space is allocated for NULL values, and therefore the CRC-32C will
only cover the actual payload of the key prefix.

dtuple_fold(): For ROW_FORMAT=REDUNDANT, include the dummy NULL values
in the CRC-32C, so that the values will be comparable with rec_fold().

innodb_ahi-t: A unit test for rec_fold() and dtuple_fold().

btr_search_build_page_hash_index(), btr_search_drop_page_hash_index():
Use a fixed-size stack buffer for computing the fold values, to avoid
dynamic memory allocation.

btr_search_drop_page_hash_index(): Do not release part.latch if we
need to invoke multiple batches of rec_fold().

dtuple_t: Allocate fewer bits for the fields. The maximum number of
data fields is about 1023, so uint16_t will be fine for them. The
info_bits is stored in less than 1 byte.

ut_pair_min(), ut_pair_cmp(): Remove. We can actually combine and compare
int(n_fields << 16 | n_bytes).
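
A sketch of why the pair helpers become unnecessary, assuming both components fit in 16 bits (`demo_combine()` is a hypothetical name): once the pair is packed with n_fields in the upper half, plain integer comparison gives the same order as comparing (n_fields, n_bytes) lexicographically.

```cpp
#include <cassert>
#include <cstdint>

// Pack (n_fields, n_bytes) so that integer order equals lexicographic order.
// Assumes n_fields < 32768 so that the result stays non-negative.
constexpr int demo_combine(uint16_t n_fields, uint16_t n_bytes)
{
  return int(uint32_t{n_fields} << 16 | n_bytes);
}

// A larger n_fields wins regardless of n_bytes.
static_assert(demo_combine(1, 0) > demo_combine(0, 65535), "fields dominate");
// With equal n_fields, n_bytes decides.
static_assert(demo_combine(2, 3) < demo_combine(2, 4), "bytes break ties");
```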

PAGE_CUR_LE_OR_EXTENDS, PAGE_CUR_DBG: Remove. These were never defined,
because they would only work with latin1_swedish_ci if at all.

btr_cur_t::check_mismatch(): Replaces !btr_search_check_guess().

cmp_dtuple_rec_bytes(): Replaces cmp_dtuple_rec_with_match_bytes().
Determine the offsets of fields on the fly.

page_cur_try_search_shortcut_bytes(): This caller of
cmp_dtuple_rec_bytes() will not be invoked on the change buffer tree.

cmp_dtuple_rec_leaf(): Replaces cmp_dtuple_rec_with_match()
for comparing leaf-page records.

buf_block_t::ahi_left_bytes_fields: Consolidated Atomic_relaxed<uint32_t>
of curr_left_side << 31 | curr_n_bytes << 16 | curr_n_fields.
The other set of parameters (n_fields, n_bytes, left_side) was removed
as redundant.
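
The packing described above can be sketched as follows. `std::atomic` with relaxed ordering stands in for Atomic_relaxed, all `demo_` names are hypothetical, and the field widths (15 bits for n_bytes, 16 bits for n_fields) are assumptions derived from the shift amounts:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// left_side << 31 | n_bytes << 16 | n_fields, as in the description above.
constexpr uint32_t demo_pack(bool left_side, uint32_t n_bytes,
                             uint32_t n_fields)
{
  return static_cast<uint32_t>(left_side) << 31 | n_bytes << 16 | n_fields;
}

constexpr bool demo_left_side(uint32_t v) { return (v >> 31) != 0; }
constexpr uint32_t demo_n_bytes(uint32_t v) { return (v >> 16) & 0x7fff; }
constexpr uint32_t demo_n_fields(uint32_t v) { return v & 0xffff; }

struct demo_block
{
  std::atomic<uint32_t> left_bytes_fields{0};

  void set(bool left_side, uint32_t n_bytes, uint32_t n_fields)
  {
    left_bytes_fields.store(demo_pack(left_side, n_bytes, n_fields),
                            std::memory_order_relaxed);
  }
  // A single load yields a mutually consistent snapshot of all three values.
  uint32_t get() const
  {
    return left_bytes_fields.load(std::memory_order_relaxed);
  }
};
```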

btr_search_update_hash_node_on_insert(): Merged to
btr_search_update_hash_on_insert().

btr_search_build_page_hash_index(): Take combined left_bytes_fields
instead of n_fields, n_bytes, left_side.

btr_search_update_block_hash_info(), btr_search_update_hash_ref():
Merged to btr_search_info_update_hash().

btr_cur_t::n_bytes_fields: Replaces n_bytes << 16 | n_fields.

We also remove many redundant checks of btr_search.enabled.
If we are holding any btr_sea::partition::latch, then a nonnull pointer
in buf_block_t::index must imply that the adaptive hash index is enabled.

Reviewed by: Vladislav Lesin
2025-01-10 16:39:44 +02:00
Marko Mäkelä
9c8bdc6c15 MDEV-35049: btr_search_check_free_space_in_heap() is a bottleneck
Let us implement a simple fixed-size allocator for the adaptive hash
index, instead of complicating mem_heap_t or mem_block_info_t.

MEM_HEAP_BTR_SEARCH: Remove.

mem_block_info_t::free_block(), mem_heap_free_block_free(): Remove.

mem_heap_free_top(), mem_heap_get_top(): Remove.

btr_sea::partition::spare: Replaces mem_block_info_t::free_block.
This keeps one spare block per adaptive hash index partition, to
process an insert.

We must not wait for buf_pool.mutex while holding
any btr_sea::partition::latch. That is why we cache one block for
future allocations. This is protected by a new
btr_sea::partition::blocks_mutex in order to relieve pressure on
btr_sea::partition::latch.
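
The idea of caching one block ahead of time can be sketched like this. It is a simplified model with hypothetical `demo_` names and `std::mutex` in place of the InnoDB latches, not the actual btr_sea::partition code:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

struct demo_block { unsigned char bytes[4096]; };

class demo_partition
{
  std::mutex blocks_mutex;            // protects only `spare`
  std::unique_ptr<demo_block> spare;  // at most one cached block
public:
  // Call BEFORE acquiring the partition latch: a potentially slow
  // allocation happens here, where blocking is acceptable.
  void prepare_insert()
  {
    std::lock_guard<std::mutex> g(blocks_mutex);
    if (!spare)
      spare.reset(new demo_block());
  }

  // Call while holding the partition latch: hands over the cached block
  // without allocating. Returns null if prepare_insert() was skipped.
  std::unique_ptr<demo_block> take_spare()
  {
    std::lock_guard<std::mutex> g(blocks_mutex);
    return std::move(spare);
  }

  bool has_spare()
  {
    std::lock_guard<std::mutex> g(blocks_mutex);
    return spare != nullptr;
  }
};
```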

btr_sea::partition::prepare_insert(): Replaces
btr_search_check_free_space_in_heap().

btr_sea::partition::erase(): Replaces ha_search_and_delete_if_found().

btr_sea::partition::cleanup_after_erase(): Replaces most of
ha_delete_hash_node(). Unlike the previous implementation, we will
retain a spare block for prepare_insert().
This should reduce some contention on buf_pool.mutex.

btr_search.n_parts: Replaces btr_ahi_parts.

btr_search.enabled: Replaces btr_search_enabled. This must hold
whenever buf_block_t::index is set while a thread is holding a
btr_sea::partition::latch.

dict_index_t::search_info: Remove pointer indirection, and use
Atomic_relaxed or Atomic_counter for most fields.

btr_search_guess_on_hash(): Let the caller ensure that latch_mode is
BTR_MODIFY_LEAF or BTR_SEARCH_LEAF. Release btr_sea::partition::latch
before buffer-fixing the block. The page latch that we already acquired
is preventing buffer pool eviction. We must validate both
block->index and block->page.state while holding part.latch
in order to avoid race conditions with buffer page relocation
or buf_pool_t::resize().

btr_search_check_guess(): Remove the constant parameter
can_only_compare_to_cursor_rec=false.

ahi_node: Replaces ha_node_t.

This has been tested by running the regression test suite
with the adaptive hash index enabled:
./mtr --mysqld=--loose-innodb-adaptive-hash-index=ON

Reviewed by: Vladislav Lesin
2025-01-10 16:30:42 +02:00
Sergei Golubchik
221aa5e08f Merge branch '10.6' into 10.11 2025-01-10 13:14:42 +01:00
Marko Mäkelä
990b010b09 MDEV-35438 Annotate InnoDB I/O functions with noexcept
Most InnoDB functions do not throw any exceptions, not even indirectly
std::bad_alloc, which could be thrown by a C++ memory allocation function.
Let us annotate many functions with noexcept in order to reduce the code
footprint related to exception handling.
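
A small sketch of the annotation, with hypothetical functions that are not part of InnoDB. The `noexcept` operator lets the promise be checked at compile time:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cannot throw: pure arithmetic on caller-provided memory.
unsigned demo_read_u16(const unsigned char *b) noexcept
{
  return unsigned(b[0]) << 8 | b[1];
}

// May throw std::bad_alloc, so it is deliberately not marked noexcept.
std::size_t demo_allocate(std::size_t n)
{
  std::vector<char> v(n);
  return v.size();
}

static_assert(noexcept(demo_read_u16(nullptr)), "annotated as non-throwing");
static_assert(!noexcept(demo_allocate(1)), "allocation may throw");
```

Callers of a `noexcept` function need no unwinding code around the call, which is where the footprint reduction comes from.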

Reviewed by: Thirunarayanan Balathandayuthapani
2025-01-09 07:43:24 +02:00
Kristian Nielsen
0f47db8525 Merge 10.11 -> 11.4
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 11:01:42 +01:00
Kristian Nielsen
e7c6cdd842 Merge 10.6 -> 10.11
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 10:11:58 +01:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
1a9011d273 MDEV-35525: Index corruption in reverse scans
btr_cur_t::search_leaf(): In the BTR_SEARCH_PREV and BTR_MODIFY_PREV
modes, reset the previous search status before invoking
page_cur_search_with_match(). Otherwise, the search could end up
in a totally wrong subtree.

This fixes a regression that was introduced in
commit de4030e4d4 (MDEV-30400).
2024-11-29 15:12:20 +02:00
Marko Mäkelä
507323abe6 Cleanup: Remove duplicated code
buf_block_alloc(): Define as an alias in buf0lru.h, which defines
the underlying buf_LRU_get_free_block().

buf_block_free(): Define as an alias of the non-inline function
buf_pool.free_block(block).

Reviewed by: Vladislav Lesin
2024-11-29 14:16:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
19acb0257e MDEV-35508 Race condition between purge and secondary index INSERT or UPDATE
row_purge_remove_sec_if_poss_leaf(): If there is an active transaction
that is not newer than PAGE_MAX_TRX_ID, return the bogus value 1
so that row_purge_remove_sec_if_poss_tree() is guaranteed to recheck if
the record needs to be purged. It could be the case that an active
transaction would insert this record between the time this check
completed and row_purge_remove_sec_if_poss_tree() acquired a latch
on the secondary index leaf page again.

row_purge_del_mark_error(), row_purge_check(): Some unlikely code
refactored into separate non-inline functions.

trx_sys_t::find_same_or_older_low(): Move the unlikely and bulky
part of trx_sys_t::find_same_or_older() to a non-inline function.

trx_sys_t::find_same_or_older_in_purge(): A variant of
trx_sys_t::find_same_or_older() for use in the purge subsystem,
with potential concurrent access of the same trx_t object from
multiple threads.

trx_t::max_inactive_id_atomic: An Atomic_relaxed alias of the
regular data field trx_t::max_inactive_id, which we
use on systems that have native 64-bit loads or stores.
On any 64-bit system that seems to be supported by GCC, Clang or MSVC,
relaxed atomic loads and stores use the regular load and store
instructions. On -march=i686 the 64-bit atomic loads and stores
would use an XMM register.

This fixes a regression that had been introduced in
commit b7b9f3ce82 (MDEV-34515).
There would be messages
[ERROR] InnoDB: tried to purge non-delete-marked record in index
in the server error log, and an assertion ut_ad(0) would cause a
crash of debug instrumented builds. This could also cause incorrect
results for MVCC reads and corrupted secondary indexes.

The debug instrumented test case was written by Debarun Banerjee.

Reviewed by: Debarun Banerjee
2024-11-29 10:44:38 +02:00
Marko Mäkelä
26597b91b3 MDEV-35413 InnoDB: Cannot load compressed BLOB
A race condition was observed between two buf_page_get_zip() for a page.
One of them had proceeded to buf_read_page(), allocating and x-latching
a buf_block_t that initially comprises only an uncompressed page frame.
While that thread was waiting inside buf_block_alloc(), another thread
would try to access the same page. Without acquiring a page latch, it
would wrongly conclude that there is corruption because no compressed
page frame exists for the block.

buf_page_get_zip(): Simplify the logic and correct the documentation.
Always acquire a shared latch to prevent any race condition with a
concurrent read operation. No longer increment a buffer-fix; the latch
is sufficient for preventing page relocation or eviction.

buf_read_page(): Add the parameter bool unzip=true. In buf_page_get_zip()
there is no need to allocate an uncompressed page frame for reading a
compressed BLOB page. We only need that for other ROW_FORMAT=COMPRESSED
pages, or for writing compressed BLOB pages.

btr_copy_zblob_prefix(): Remove the message "Cannot load compressed BLOB"
because buf_page_get_zip() will already have reported a more specific
error whenever it returns nullptr.

row_merge_buf_add(): Do not crash on BLOB corruption, but return an
error instead. (In debug builds, an assertion will fail if this
corruption is noticed.)

Reviewed by: Debarun Banerjee
2024-11-22 08:33:03 +02:00
Marko Mäkelä
895cd553a3 MDEV-32175: Reduce page_align(), page_offset() calls
When srv_page_size and innodb_page_size were introduced,
the functions page_align() and page_offset() got more expensive.
Let us try to replace such calls with simpler pointer arithmetic
with respect to the buffer page frame.
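
A sketch of the two approaches, under the assumption that the extra cost comes from the page size being a run-time variable. All `demo_` names are hypothetical and the real functions differ in detail:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// In this sketch the page size is only known at run time.
std::size_t demo_page_size = 16384;

// Mask-based offset: must load the page size and derive a mask from it.
inline std::size_t demo_offset_by_mask(const unsigned char *ptr)
{
  return reinterpret_cast<std::uintptr_t>(ptr) & (demo_page_size - 1);
}

// Frame-based offset: a single subtraction when the caller already
// holds a pointer to the start of the page frame.
inline std::size_t demo_offset_from_frame(const unsigned char *frame,
                                          const unsigned char *ptr)
{
  return std::size_t(ptr - frame);
}
```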

page_rec_get_next_non_del_marked(): Add a page frame as a parameter,
and template<bool comp>.

page_rec_next_get(): A more efficient variant of page_rec_get_next(),
with template<bool comp> and const page_t* parameters.

lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks.

fseg_free_step(), fseg_free_step_not_header(): Take the header block
as a parameter.

Reviewed by: Vladislav Lesin
2024-11-21 11:01:30 +02:00
Marko Mäkelä
3c312d247c MDEV-35190 HASH_SEARCH duplicates effort before HASH_INSERT or HASH_DELETE
The HASH_ macros are unnecessarily obfuscating the logic,
so we had better replace them.

hash_cell_t::search(): Implement most of the HASH_DELETE logic,
for a subsequent insert or remove().

hash_cell_t::remove(): Remove an element.

hash_cell_t::find(): Implement the HASH_SEARCH logic.
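
One way to avoid the duplicated traversal is for the search to return the address of the link where the element is, or would be. The following sketch uses hypothetical `demo_` types and is not the actual hash_cell_t interface:

```cpp
#include <cassert>

struct demo_node { int key; demo_node *next; };

struct demo_cell
{
  demo_node *head = nullptr;

  // Return the address of the link that points to the matching node,
  // or of the terminating null link if there is no match.
  demo_node **search(int key)
  {
    demo_node **link = &head;
    while (*link && (*link)->key != key)
      link = &(*link)->next;
    return link;
  }

  // One traversal both detects a duplicate and finds the insert position.
  bool insert(demo_node *node)
  {
    demo_node **link = search(node->key);
    if (*link)
      return false;  // duplicate key
    node->next = nullptr;
    *link = node;
    return true;
  }

  // One traversal both finds the node and the link to be rewritten.
  bool remove(int key)
  {
    demo_node **link = search(key);
    if (!*link)
      return false;
    *link = (*link)->next;
    return true;
  }
};
```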

xb_filter_hash_free(): Avoid any hash table lookup;
just traverse the hash bucket chains and free each element.

xb_register_filter_entry(): Search databases_hash only once.

rm_if_not_found(): Make use of find_filter_in_hashtable().

dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table():
Define non-inline to avoid unnecessary code duplication.

dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache():
Look for duplicate while finding the insert position.

dict_table_change_id_in_cache(): Merged to the only caller
row_discard_tablespace().

hash_insert(): Helper function of dict_sys_t::resize().

fil_space_t::create(): Look for a duplicate (and crash if found)
when searching for the insert position.

lock_rec_discard(): Take the hash array cell as a parameter
to avoid a duplicated lookup.

lock_rec_free_all_from_discard_page(): Remove a parameter.

Reviewed by: Debarun Banerjee
2024-11-21 08:59:02 +02:00
ParadoxV5
d5f16d6305 Extract some of fixes to 10.6.x
That PR uncovered countless issues on `my_snprintf` uses.
This commit backports a squashed subset of their fixes (excludes ).
2024-11-18 13:29:04 +11:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests a page latch to find a certain record on the
page, and then requests lock_sys and/or the record lock hash cell latch to
request a record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, which causes deadlocks. One possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.
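
The release and re-acquire order can be sketched with two plain mutexes. This is a simplified model with hypothetical names; the real code deals with lock_sys.latch, hash cell latches and page latches, and has to revalidate much more state:

```cpp
#include <cassert>
#include <mutex>

// Models the fixed order: drop the cell latch before asking for the page
// latch, then take the cell latch again, so that every thread acquires
// the latches in the order (page, cell).
bool demo_unlock_unmodified(std::mutex &page_latch, std::mutex &cell_latch)
{
  std::unique_lock<std::mutex> cell(cell_latch);  // state on entry: cell held
  cell.unlock();                                  // 1. release the cell latch
  std::lock_guard<std::mutex> page(page_latch);   // 2. acquire the page latch
  cell.lock();                                    // 3. re-acquire the cell latch
  // Anything protected by cell_latch may have changed while it was
  // released and would have to be rechecked at this point.
  return cell.owns_lock();
}
```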

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during page merging, which causes an assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as the page
get mode to it. That is why the page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

Because searching for the ID of the transaction that modified a secondary
index record is quite an expensive operation, we restrict its use on the
master. A system variable was added to lift the restriction in order to
simplify testing. The variable exists only in debug builds or in builds with
the -DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option, to increase the
probability of catching bugs in release builds with RQG.

Note that the code, which does primary index lookup to find out what
transaction modified secondary index record, is necessary only when
there is no primary key and no unique secondary key on replica with row
based replication, because only in this case extra X locks on unmodified
records can be set during scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Marko Mäkelä
740519e15a MDEV-35125: Unnecessary buf_pool.page_hash lookups
dict_index_t::clear(), btr_drop_temporary_table(): Make use of the
root page guess if it is available.

btr_read_autoinc(): Invoke btr_root_block_get() to access the root page.

btr_blob_free(): Retain a buffer-fix on the page across mtr_t::commit()
in order to avoid a buf_pool.page_hash lookup.

dict_load_table_one(): Remove a redundant check for page id. It was
already validated in buf_page_t::read_complete().

trx_t::apply_log(): Make use of buf_pool.page_fix() to avoid some
mtr_t related overhead.

Reviewed by: Thirunarayanan Balathandayuthapani
2024-10-17 09:10:45 +03:00
Marko Mäkelä
64b75865d5 MDEV-34823 after-merge fix
btr_cur_t::search_leaf(): Remove a redundant condition.
This fixes up the merge commit cfa9784edb
2024-09-18 07:06:35 +03:00
Yuchen Pei
cfa9784edb Merge branch '10.11' into 11.2 2024-09-18 10:25:16 +10:00
Yuchen Pei
b168859d1e Merge branch '10.6' into 10.11 2024-09-11 16:10:53 +10:00
Marko Mäkelä
f0de610d0c Merge 10.11 into 11.2 2024-09-10 18:35:16 +03:00
Marko Mäkelä
b7b2d2bde4 Merge 10.5 into 10.6 2024-09-09 11:30:30 +03:00
Marko Mäkelä
f9f92b480e Merge 10.6 into 10.11 2024-09-06 16:17:42 +03:00
Marko Mäkelä
024a18dbcb MDEV-34823 Invalid arguments in ib_push_warning()
In the bug report MDEV-32817 it turned out that the function
row_mysql_get_table_status() is outputting a fil_space_t*
as if it were a numeric tablespace identifier.

ib_push_warning(): Remove. Let us invoke push_warning_printf() directly.

innodb_decryption_failed(): Report a decryption failure and set the
dict_table_t::file_unreadable flag. This code was being duplicated in
very many places. We return the constant value DB_DECRYPTION_FAILED
in order to avoid code duplication in the callers and to allow tail calls.

innodb_fk_error(): Report a FOREIGN KEY error.

dict_foreign_def_get(), dict_foreign_def_get_fields(): Remove.
This code was being used in dict_create_add_foreign_to_dictionary()
in an apparently uncovered code path. That ib_push_warning() call
would pass the integer i+1 instead of a pointer to NUL terminated
string ("%s"), and therefore the call should have resulted in a crash.

dict_print_info_on_foreign_key_in_create_format(),
innobase_quote_identifier(): Add const qualifiers.

row_mysql_get_table_error(): Replaces row_mysql_get_table_status().
Display no message on DB_CORRUPTION; it should be properly reported at
the SQL layer anyway.
2024-09-06 14:29:09 +03:00
Marko Mäkelä
9878238f74 MDEV-34791: Redundant page lookups hurt performance
btr_cur_t::search_leaf(): When the index root page is also a leaf page,
we may need to upgrade our existing shared root page latch into an
exclusive latch. Even if we end up waiting, the root page won't be able
to go away while we hold an index()->lock. The index page may be split;
that is all.

btr_latch_prev(): Acquire the page latch while holding a buffer-fix
and an index tree latch. Merge the change buffer if needed. Use
buf_pool_t::page_fix() for this special case instead of complicating
buf_page_get_low() and buf_page_get_gen().

row_merge_read_clustered_index(): Remove some code that does not seem
to be useful. No difference was observed with regard to removing this
code when a CREATE INDEX or OPTIMIZE TABLE statement was run concurrently
with sysbench oltp_update_index --tables=1 --table_size=1000 --threads=16.

buf_pool_t::unzip(): Decompress a ROW_FORMAT=COMPRESSED page.

buf_pool_t::page_fix(): Handle also ROW_FORMAT=COMPRESSED pages
as well as change buffer merge. Optionally return an error.
Add a flag for suppressing a page latch wait and a special return
value -1 to indicate that the call would block.
This is the preferred way of buffer-fixing blocks.
The functions buf_page_get_gen() and buf_page_get_low() are only being
invoked with rw_latch=RW_NO_LATCH in operations on SPATIAL INDEX.

buf_page_t: Define some static functions for interpreting state().

buf_page_get_zip(), buf_read_page(),
buf_read_ahead_random(), buf_read_ahead_linear():
Remove the redundant parameter zip_size. We must look up the
tablespace and can invoke fil_space_t::zip_size() on it.

buf_page_get_low(): Require mtr!=nullptr.

buf_page_get_gen(): Implement some lock downgrading during recovery.

ibuf_page_low(): Use buf_pool_t::page_fix() in a debug check.
We do wait for a page read here, because otherwise a debug assertion in
buf_page_get_low() in the test innodb.ibuf_delete could occasionally fail.

PageConverter::operator(): Invoke buf_pool_t::page_fix() in order
to possibly evict a block. This allows us to remove some
special case code from buf_page_get_low().
2024-09-03 14:15:57 +03:00
Marko Mäkelä
cfcf27c6fe Merge 10.6 into 10.11 2024-08-29 07:47:29 +03:00
Marko Mäkelä
b7b9f3ce82 MDEV-34515: Contention between purge and workload
In a Sysbench oltp_update_index workload that involves 1 table,
a serious contention between the workload and the purge of history
was observed. This was the worst when the table contained only 1 record.

This turned out to be fixed by setting innodb_purge_batch_size=128,
which corresponds to the number of usable persistent rollback segments.
When we go above that, there would be contention between row_purge_poss_sec()
and the workload, typically on the clustered index page latch, sometimes
also on a secondary index page latch. It might be that with smaller
batches, trx_sys.history_size() will end up pausing all concurrent
transaction start/commit frequently enough so that purge will be able
to make some progress, so that there would be less contention on the
index page latches between purge and SQL execution.

In commit aa719b5010 (part of MDEV-32050)
the interpretation of the parameter innodb_purge_batch_size was slightly
changed. It would correspond to the maximum desired size of the
purge_sys.pages cache. Before that change, the parameter was referring to
a number of undo log pages, but the accounting might have been inaccurate.

To avoid a regression, we will reduce the default value to
innodb_purge_batch_size=127, which will also be compatible with
innodb_undo_tablespaces>1 (which will disable rollback segment 0).

Additionally, some logic in the purge and MVCC checks is simplified.
The purge tasks will make use of purge_sys.pages when accessing undo
log pages to find out if a secondary index record can be removed.
If an undo page needs to be looked up in buf_pool.page_hash, we will
merely buffer-fix it. This is correct, because the undo pages are
append-only in nature. Holding purge_sys.latch or purge_sys.end_latch
or the fact that the current thread is executing as a part of an
in-progress purge batch will prevent the contents of the undo page from
being freed and subsequently reused. The buffer-fix will prevent the
page from being evicted from the buffer pool. Thanks to this logic,
we can refer to the undo log record directly in the buffer pool page
and avoid copying the record.

buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful
for accessing undo log pages, which are append-only by nature.
There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED
in that case.

purge_sys_t::view_guard::view_guard(): Allow the type of guard to be
acquired: end_latch, latch, or no latch (in case we are a purge thread).

purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages.

purge_sys_t::get_page(): Invoke buf_pool_t::page_fix().

row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe()
and row_undo_mod_sec_unsafe().

trx_undo_get_undo_rec(): Merged to trx_undo_prev_version_build().

row_purge_poss_sec(): Add the parameter mtr and remove redundant
or unused parameters sec_pcur, sec_mtr, is_tree. We will use the
caller's mtr object but release any acquired page latches before
returning.

btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align().

row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID
to be checked against the page in row_purge_remove_sec_if_poss_tree().
If the secondary index page was not changed meanwhile, it will be
unnecessary to invoke row_purge_poss_sec() again.

trx_undo_prev_version_build(): Access any undo log pages using
the caller's mini-transaction object.

row_purge_vc_matches_cluster(): Moved to the only compilation unit that
needs it.

Reviewed by: Debarun Banerjee
2024-08-26 12:23:06 +03:00
Marko Mäkelä
7ead48a72b MDEV-34458: Remove more traces of BTR_MODIFY_PREV
In commit 2f6df93748
we fixed an observed case of the bug by removing
some code related to the no longer needed
BTR_MODIFY_PREV mode.

In commit 73ad436e16
an alternative fix was applied that also fixes the
BTR_SEARCH_PREV case.

Let us clean up some implicit references to BTR_MODIFY_PREV
that were missed in 2f6df93748.

btr_pcur_move_backward_from_page(): Assume that the latch mode was
BTR_SEARCH_LEAF.

btr_pcur_move_to_prev(): Assert that the latch mode is BTR_SEARCH_LEAF.
This function is mostly invoked in row0sel.cc for read operations,
as well as in row0merge.cc for reading from the clustered index.
All callers indeed use a cursor in the BTR_SEARCH_LEAF mode.
2024-07-29 14:13:30 +03:00
Oleksandr Byelkin
2447dda2c0 Merge branch '10.11' into 11.1 2024-07-08 22:40:16 +02:00
Oleksandr Byelkin
034a175982 Merge branch '10.6' into 10.11 2024-07-04 11:52:07 +02:00
mariadb-DebarunBanerjee
73ad436e16 MDEV-34458 wait_for_read in buf_page_get_low hurts performance
The performance regression seen while loading BP is caused by the
deadlock fix given in MDEV-33543. The area of impact is wider but is
more visible when BP is being loaded initially via DMLs. Specifically,
the response time can suffer when a DML performs a pessimistic operation
on an index (split/merge) and the leaf pages are not found in the buffer
pool. This is more likely to occur with a small BP size.

The origin of the issue dates back to MDEV-30400 that introduced
btr_cur_t::search_leaf() replacing btr_cur_search_to_nth_level() for
leaf page searches. In btr_latch_prev, we use RW_NO_LATCH to get the
previous page fixed in BP without latching. When the page is not in BP,
we try to acquire the S latch and wait for it, violating the latching order.

This deadlock was analyzed in MDEV-33543 and fixed by using the already
present wait logic in buf_page_get_gen() instead of waiting for latch.
The wait logic is inferior to the usual S latch wait and is simply a
repeated sleep of 100 microseconds (the actual sleep time could be longer,
depending on the platform). The bug was seen with the "change-buffering"
code path and the idea was that this path should be less exercised. That
judgement was not correct: the path is actually quite frequent and
does impact performance when pages are not in BP and are being loaded by
DML expanding/shrinking large data.

Fix: When trying to get a page with RW_NO_LATCH while attempting an
"out of order" latch, return from buf_page_get_gen() immediately instead
of waiting, and follow the ordered latching path.
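
A rough sketch of the control flow described in this fix is shown below.
It is a simplified model under stated assumptions, not the code in
buf_page_get_gen() or btr_latch_prev(): toy_block, toy_get_no_latch(),
toy_latch_prev() and the two enums are hypothetical, and the "wait" is
reduced to flipping a state flag.

```cpp
#include <cassert>

enum class toy_state { NOT_LOADED, READY };
enum class toy_path { FAST, ORDERED };

// Hypothetical stand-in for a buffer pool block.
struct toy_block
{
  toy_state state;
};

// Toy analogue of a page lookup without a latch. If the page is not
// ready and waiting for it would violate the latching order, give up
// immediately (return nullptr) instead of sleeping in a loop.
inline toy_block *toy_get_no_latch(toy_block &block, bool out_of_order)
{
  if (block.state == toy_state::READY)
    return &block;
  if (out_of_order)
    return nullptr;                   // do not wait; let the caller retry
  block.state = toy_state::READY;     // stand-in for waiting for the read
  return &block;
}

// Toy caller: try the cheap path for the previous page first, then fall
// back to acquiring the pages in the proper order.
inline toy_path toy_latch_prev(toy_block &prev)
{
  if (toy_get_no_latch(prev, true))
    return toy_path::FAST;
  // Ordered path: real code would release the current page latch here
  // and then latch the previous page before the current one.
  (void) toy_get_no_latch(prev, false);
  return toy_path::ORDERED;
}
```

With a page that is already in the pool, toy_latch_prev() takes the fast
path; with a missing page, the no-latch lookup returns at once and the
caller proceeds along the ordered path rather than polling.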
2024-07-03 18:08:43 +05:30
Marko Mäkelä
2f6df93748 MDEV-34458 wait_for_read in buf_page_get_low hurts performance
BTR_MODIFY_PREV: Remove. This mode was only used by the change buffer,
which commit f27e9c8947 (MDEV-29694)
removed.

buf_page_get_gen(): Revert the change that was made in
commit 90b95c6149 (MDEV-33543)
because it is not applicable after MDEV-29694. This fixes the
performance regression that Vladislav Vaintroub reported.

This is an 11.x-specific fix; this needs to be fixed differently
in older major versions where the change buffer is present.
2024-06-26 13:51:38 +03:00
Marko Mäkelä
d34289a3e2 Merge 10.11 into 11.1 2024-06-17 09:21:50 +03:00
Marko Mäkelä
b81d717387 Merge 10.6 into 10.11 2024-06-11 12:50:10 +03:00
Marko Mäkelä
9fac857f26 MDEV-34283 A misplaced btr_cur_need_opposite_intention() check may fail to prevent hangs
btr_cur_t::search_leaf(): Invoke btr_cur_need_opposite_intention() after
positioning page_cur.rec so that the record will be in the intended page.
This is something that was broken in
commit f2096478d5 or
commit de4030e4d4 or related changes.

btr_cur_need_opposite_intention(): Add a debug assertion that would
catch the misuse.

The "next line of defence" that should have caught this bug in debug builds
is the set of assertions that mtr_t::m_memo contains MTR_MEMO_X_LOCK for the
dict_index_t::lock. When btr_cur_need_opposite_intention() holds,
we should escalate to acquiring an exclusive index->lock in
btr_cur_t::pessimistic_search_leaf().

Reviewed by: Debarun Banerjee
2024-06-06 13:03:34 +03:00
Sergei Golubchik
f0a5412037 Merge branch '11.0' into 11.1 2024-05-13 09:52:30 +02:00
Sergei Golubchik
f9807aadef Merge branch '10.11' into 11.0 2024-05-12 12:18:28 +02:00
Sergei Golubchik
a6b2f820e0 Merge branch '10.6' into 10.11 2024-05-10 20:02:18 +02:00
Sergei Golubchik
7b53672c63 Merge branch '10.5' into 10.6 2024-05-08 20:06:00 +02:00