Commit graph

9643 commits

Author SHA1 Message Date
Marko Mäkelä
1efdf67e60 Merge 10.5 into 10.6 2023-03-22 15:54:45 +02:00
Marko Mäkelä
701399ad37 MDEV-30882 Crash on ROLLBACK in a ROW_FORMAT=COMPRESSED table
btr_cur_upd_rec_in_place(): Avoid calling page_zip_write_rec() if we
are not modifying any fields that are stored in compressed format.

btr_cur_update_in_place_zip_check(): New function to check if a
ROW_FORMAT=COMPRESSED record can actually be updated in place.

btr_cur_pessimistic_update(): If the BTR_KEEP_POS_FLAG is not set
(we are in a ROLLBACK and cannot write any BLOBs), ignore the potential
overflow and let page_zip_reorganize() or page_zip_compress() handle it.
This avoids a failure when an attempted UPDATE of an NULL column to 0 is
rolled back. During the ROLLBACK, we would try to move a non-updated
long column to off-page storage in order to avoid a compression failure
of the ROW_FORMAT=COMPRESSED page.

page_zip_write_trx_id_and_roll_ptr(): Remove an assertion that would fail
in row_upd_rec_in_place() because the uncompressed page would already
have been modified there.

This is a 10.5 version of commit ff3d4395d8
(different because of commit 08ba388713).
2023-03-22 15:19:52 +02:00
Marko Mäkelä
535a38c455 MDEV-30400 fixup: Fix the UNIV_ZIP_DEBUG build 2023-03-20 14:28:03 +02:00
Marko Mäkelä
32a53a66df MDEV-26827 fixup: Remove a bogus assertion
We can have dirty_blocks=0 when buf_flush_page_cleaner() is being woken up
to write out or evict pages from the buf_pool.LRU list.
2023-03-20 10:32:35 +02:00
Thirunarayanan Balathandayuthapani
e8e0559ed2 MDEV-30870 Undo tablespace name displays wrongly for I_S queries
- INNODB_SYS_TABLESPACES in information schema should display
innodb_undo001, innodb_undo002 etc as tablespace name for undo
tablespaces
2023-03-17 17:17:35 +05:30
Thirunarayanan Balathandayuthapani
18e4978edc MDEV-29975 InnoDB fails to release savepoint during bulk insert
- InnoDB does rollback the whole transaction and discards the
savepoint when there is a failure happens during bulk
insert operation. When server request to release the savepoint,
InnoDB should return DB_SUCCESS when it deals with bulk
insert operation
2023-03-17 16:41:27 +05:30
Marko Mäkelä
a55b951e60 MDEV-26827 Make page flushing even faster
For more convenient monitoring of something that could greatly affect
the volume of page writes, we add the status variable
Innodb_buffer_pool_pages_split that was previously only available
via information_schema.innodb_metrics as "innodb_page_splits".
This was suggested by Axel Schwenke.

buf_flush_page_count: Replaced with buf_pool.stat.n_pages_written.
We protect buf_pool.stat (except n_page_gets) with buf_pool.mutex
and remove unnecessary export_vars indirection.

buf_pool.flush_list_bytes: Moved from buf_pool.stat.flush_list_bytes.
Protected by buf_pool.flush_list_mutex.

buf_pool_t::page_cleaner_status: Replaces buf_pool_t::n_flush_LRU_,
buf_pool_t::n_flush_list_, and buf_pool_t::page_cleaner_is_idle.
Protected by buf_pool.flush_list_mutex. We will exclusively broadcast
buf_pool.done_flush_list by the buf_flush_page_cleaner thread,
and only wait for it when communicating with buf_flush_page_cleaner.
There is no need to keep a count of pending writes by the
buf_pool.flush_list processing. A single flag suffices for that.

Waits for page write completion can be performed by
simply waiting on block->page.lock, or by invoking
buf_dblwr.wait_for_page_writes().

buf_LRU_block_free_non_file_page(): Broadcast buf_pool.done_free and
set buf_pool.try_LRU_scan when freeing a page. This would be
executed also as part of buf_page_write_complete().

buf_page_write_complete(): Do not broadcast buf_pool.done_flush_list,
and do not acquire buf_pool.mutex unless buf_pool.LRU eviction is needed.
Let buf_dblwr count all writes to persistent pages and broadcast a
condition variable when no outstanding writes remain.

buf_flush_page_cleaner(): Prioritize LRU flushing and eviction right after
"furious flushing" (lsn_limit). Simplify the conditions and reduce the
hold time of buf_pool.flush_list_mutex. Refuse to shut down
or sleep if buf_pool.ran_out(), that is, LRU eviction is needed.

buf_pool_t::page_cleaner_wakeup(): Add the optional parameter for_LRU.

buf_LRU_get_free_block(): Protect buf_lru_free_blocks_error_printed
with buf_pool.mutex. Invoke buf_pool.page_cleaner_wakeup(true) to
to ensure that buf_flush_page_cleaner() will process the LRU flush
request.

buf_do_LRU_batch(), buf_flush_list(), buf_flush_list_space():
Update buf_pool.stat.n_pages_written when submitting writes
(while holding buf_pool.mutex), not when completing them.

buf_page_t::flush(), buf_flush_discard_page(): Require that
the page U-latch be acquired upfront, and remove
buf_page_t::ready_for_flush().

buf_pool_t::delete_from_flush_list(): Remove the parameter "bool clear".

buf_flush_page(): Count pending page writes via buf_dblwr.

buf_flush_try_neighbors(): Take the block of page_id as a parameter.
If the tablespace is dropped before our page has been written out,
release the page U-latch.

buf_pool_invalidate(): Let the caller ensure that there are no
outstanding writes.

buf_flush_wait_batch_end(false),
buf_flush_wait_batch_end_acquiring_mutex(false):
Replaced with buf_dblwr.wait_for_page_writes().

buf_flush_wait_LRU_batch_end(): Replaces buf_flush_wait_batch_end(true).

buf_flush_list(): Remove some broadcast of buf_pool.done_flush_list.

buf_flush_buffer_pool(): Invoke also buf_dblwr.wait_for_page_writes().

buf_pool_t::io_pending(), buf_pool_t::n_flush_list(): Remove.
Outstanding writes are reflected by buf_dblwr.pending_writes().

buf_dblwr_t::init(): New function, to initialize the mutex and
the condition variables, but not the backing store.

buf_dblwr_t::is_created(): Replaces buf_dblwr_t::is_initialised().

buf_dblwr_t::pending_writes(), buf_dblwr_t::writes_pending:
Keeps track of writes of persistent data pages.

buf_flush_LRU(): Allow calls while LRU flushing may be in progress
in another thread.

Tested by Matthias Leich (correctness) and Axel Schwenke (performance)
2023-03-16 17:19:58 +02:00
Marko Mäkelä
9593cccf28 MDEV-26055: Improve adaptive flushing
Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0
(not default) and innodb_adaptive_flushing=ON (default).
There is also the parameter innodb_adaptive_flushing_lwm
(default: 10 per cent of the log capacity). It should enable some
adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.
That is not being changed here.

This idea was first presented by Inaam Rana several years ago,
and I discussed it with Jean-François Gagné at FOSDEM 2023.

buf_flush_page_cleaner(): When we are not near the log capacity limit
(neither buf_flush_async_lsn nor buf_flush_sync_lsn are set),
also try to move clean blocks from the buf_pool.LRU list to buf_pool.free
or initiate writes (but not the eviction) of dirty blocks, until
the remaining I/O capacity has been consumed.

buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify
whether dirty least recently used pages (from buf_pool.LRU) should
be evicted immediately after they have been written out. Callers outside
buf_flush_page_cleaner() will pass evict=true, to retain the existing
behaviour.

buf_do_LRU_batch(): Add the parameter bool evict.
Return counts of evicted and flushed pages.

buf_flush_LRU(): Add the parameter bool evict.
Assume that the caller holds buf_pool.mutex and
will invoke buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list()
whose caller must hold buf_pool.mutex and invoke
buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have
buf_flush_wait_batch_end().

page_cleaner_flush_pages_recommendation(): Avoid some floating-point
arithmetics.

buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(),
buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict".

buf_free_from_unzip_LRU_list_batch(): Remove the parameter.
Only actual page writes will contribute towards the limit.

buf_LRU_free_page(): Evict freed pages of temporary tables.

buf_pool.done_free: Broadcast whenever a block is freed
(and buf_pool.try_LRU_scan is set).

buf_pool_t::io_buf_t::reserve(): Retry indefinitely.
During the test encryption.innochecksum we easily run out of
these buffers for PAGE_COMPRESSED or ENCRYPTED pages.

Tested by Matthias Leich and Axel Schwenke
2023-03-16 17:09:08 +02:00
Marko Mäkelä
4105017a58 MDEV-30357 Performance regression in locking reads from secondary indexes
lock_sec_rec_some_has_impl(): Remove a harmful condition that caused the
performance regression and should not have been added in
commit b6e41e3872 in the first place.
Locking transactions that have not modified any persistent tables
can carry the transaction identifier 0.

trx_t::max_inactive_id: A cache for trx_sys_t::find_same_or_older().
The value is not reset on transaction commit so that previous results
can be reused for subsequent transactions. The smallest active
transaction ID can only increase over time, not decrease.

trx_sys_t::find_same_or_older(): Remember the maximum previous id for which
rw_trx_hash.iterate() returned false, to avoid redundant iterations.

lock_sec_rec_read_check_and_lock(): Add an early return in case we are
already holding a covering table lock.

lock_rec_convert_impl_to_expl(): Add a template parameter to avoid
a redundant run-time check on whether the index is secondary.

lock_rec_convert_impl_to_expl_for_trx(): Move some code from
lock_rec_convert_impl_to_expl(), to reduce code duplication due
to the added template parameter.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2023-03-16 16:00:45 +02:00
Marko Mäkelä
f2096478d5 MDEV-29835 InnoDB hang on B-tree split or merge
This is a follow-up to
commit de4030e4d4 (MDEV-30400),
which fixed some hangs related to B-tree split or merge.

btr_root_block_get(): Use and update the root page guess. This is just
a minor performance optimization, not affecting correctness.

btr_validate_level(): Remove the parameter "lockout", and always
acquire an exclusive dict_index_t::lock in CHECK TABLE without QUICK.
This is needed in order to avoid latching order violation in
btr_page_get_father_node_ptr_for_validate().

btr_cur_need_opposite_intention(): Return true in case
btr_cur_compress_recommendation() would hold later during the
mini-transaction, or if a page underflow or overflow is possible.
If we return true, our caller will escalate to aqcuiring an exclusive
dict_index_t::lock, to prevent a latching order violation and deadlock
during btr_compress() or btr_page_split_and_insert().

btr_cur_t::search_leaf(), btr_cur_t::open_leaf():
Also invoke btr_cur_need_opposite_intention() on the leaf page.

btr_cur_t::open_leaf(): When escalating to exclusive index locking,
acquire exclusive latches on all pages as well.

innobase_instant_try(): Return an error code if the root page cannot
be retrieved.

In addition to the normal stress testing with Random Query Generator (RQG)
this has been tested with
./mtr --mysqld=--loose-innodb-limit-optimistic-insert-debug=2
but with the injection in btr_cur_optimistic_insert() for non-leaf pages
adjusted so that it would use the value 3. (Otherwise, infinite page
splits could occur in some mtr tests.)

Tested by: Matthias Leich
2023-03-16 15:52:42 +02:00
Marko Mäkelä
85cbfaefee Merge 10.5 into 10.6 2023-03-16 15:48:08 +02:00
Marko Mäkelä
1495f057c8 MDEV-30860 Race condition between buffer pool flush and log file deletion in mariadb-backup --prepare
srv_start(): If we are going to close the log file in
mariadb-backup --prepare, call buf_flush_sync() before
calling recv_sys.debug_free() to ensure that the log file
will not be accessed.

This fixes a rather rare failure in the test
mariabackup.innodb_force_recovery where buf_flush_page_cleaner()
would invoke log_checkpoint_low() because !recv_recovery_is_on()
would hold due to the fact that recv_sys.debug_free() had
already been called. Then, the log write for the checkpoint
would fail because srv_start() had invoked log_sys.log.close_file().
2023-03-16 13:39:23 +02:00
Vlad Lesin
7d6b3d4008 MDEV-30775 Performance regression in fil_space_t::try_to_close() introduced in MDEV-23855
fil_node_open_file_low() tries to close files from the top of
fil_system.space_list if the number of opened files is exceeded.

It invokes fil_space_t::try_to_close(), which iterates the list searching
for the first opened space. Then it just closes the space, leaving it in
the same position in fil_system.space_list.

On heavy files opening, like during 'SHOW TABLE STATUS ...' execution,
if the number of opened files limit is reached,
fil_space_t::try_to_close() iterates more and more closed spaces before
reaching any opened space for each fil_node_open_file_low() call. What
causes performance regression if the number of spaces is big enough.

The fix is to keep opened spaces at the top of fil_system.space_list,
and move closed files at the end of the list.

For this purpose fil_space_t::space_list_last_opened pointer is
introduced. It points to the last inserted opened space in
fil_space_t::space_list. When space is opened, it's inserted to the
position just after the pointer points to in fil_space_t::space_list to
preserve the logic, inroduced in MDEV-23855. Any closed space is added
to the end of fil_space_t::space_list.

As opened spaces are located at the top of fil_space_t::space_list,
fil_space_t::try_to_close() finds opened space faster.

There can be the case when opened and closed spaces are mixed in
fil_space_t::space_list if fil_system.freeze_space_list was set during
fil_node_open_file_low() execution. But this should not cause any error,
as fil_space_t::try_to_close() still iterates spaces in the list.

There is no need in any test case for the fix, as it does not change any
functionality, but just fixes performance regression.
2023-03-10 18:31:10 +03:00
Marko Mäkelä
f169dfb41a Merge 10.5 into 10.6 2023-03-10 09:35:50 +02:00
Marko Mäkelä
08267ba0c8 MDEV-30819 InnoDB fails to start up after downgrading from MariaDB 11.0
While downgrades are not supported and misguided attempts at it could
cause serious corruption especially after
commit b07920b634
it might be useful if InnoDB would start up even after an upgrade to
MariaDB Server 11.0 or later had removed the change buffer.

innodb_change_buffering_update(): Disallow anything else than
innodb_change_buffering=none when the change buffer is corrupted.

ibuf_init_at_db_start(): Mention a possible downgrade in the corruption
error message. If innodb_change_buffering=none, ignore the error but do
not initialize ibuf.index.

ibuf_free_excess_pages(), ibuf_contract(), ibuf_merge_space(),
ibuf_update_max_tablespace_id(), ibuf_delete_for_discarded_space(),
ibuf_print(): Check for !ibuf.index.

ibuf_check_bitmap_on_import(): Remove some unnecessary code.
This function is only accessing change buffer bitmap pages in a
data file that is not attached to the rest of the database.
It is not accessing the change buffer tree itself, hence it does
not need any additional mutex protection.

This has been tested both by starting up MariaDB Server 10.8 on
a 11.0 data directory, and by running ./mtr --big-test while
ibuf_init_at_db_start() was tweaked to always fail.
2023-03-09 16:16:58 +02:00
Marko Mäkelä
b1646d0433 MDEV-30567 rec_get_offsets() is not optimal
rec_init_offsets_comp_ordinary(), rec_init_offsets(),
rec_get_offsets_reverse(), rec_get_nth_field_offs_old():
Simplify some bitwise arithmetics to avoid conditional jumps,
and add branch prediction hints with the assumption that most
variable-length columns are short.

Tested by: Matthias Leich
2023-03-06 17:17:32 +02:00
Thirunarayanan Balathandayuthapani
49e2b50d59 MDEV-30341 Reset check_foreigns, check_unique_secondary variables
- InnoDB fails to reset the check_foreigns and check_unique_secondary
in trx_t::free(), trx_t::commit_cleanup(). This lead to bulk insert
in internal innodb fts table operation.
2023-03-02 15:49:21 +05:30
Marko Mäkelä
085d0ac238 Merge 10.5 into 10.6 2023-02-28 16:05:21 +02:00
Marko Mäkelä
c14a39431b MDEV-30753 Possible corruption due to trx_purge_free_segment()
Starting with commit 0de3be8cfd (MDEV-30671),
the field TRX_UNDO_NEEDS_PURGE lost its previous meaning.
The following scenario is possible:

(1) InnoDB is killed at a point of time corresponding to the durable
execution of some fseg_free_step_not_header() but not
trx_purge_remove_log_hdr().
(2) After restart, the affected pages are allocated for something else.
(3) Purge will attempt to access the newly reallocated pages when looking
for some old undo log records.

trx_purge_free_segment(): Invoke trx_purge_remove_log_hdr() as the first
thing, to be safe. If the server is killed, some pages will never be
freed. That is the lesser evil. Also, before each mtr.start(), invoke
log_free_check() to prevent ib_logfile0 overrun.
2023-02-28 15:39:23 +02:00
Marko Mäkelä
3e2ad0e918 Merge 10.5 into 10.6 2023-02-27 13:17:35 +02:00
Marko Mäkelä
0de3be8cfd MDEV-30671 InnoDB undo log truncation fails to wait for purge of history
It is not safe to invoke trx_purge_free_segment() or execute
innodb_undo_log_truncate=ON before all undo log records in
the rollback segment has been processed.

A prominent failure that would occur due to premature freeing of
undo log pages is that trx_undo_get_undo_rec() would crash when
trying to copy an undo log record to fetch the previous version
of a record.

If trx_undo_get_undo_rec() was not invoked in the unlucky time frame,
then the symptom would be that some committed transaction history is
never removed. This would be detected by CHECK TABLE...EXTENDED that
was impleented in commit ab0190101b.
Such a garbage collection leak should be possible even when using
innodb_undo_log_truncate=OFF, just involving trx_purge_free_segment().

trx_rseg_t::needs_purge: Change the type from Boolean to a transaction
identifier, noting the most recent non-purged transaction, or 0 if
everything has been purged. On transaction start, we initialize this
to 1 more than the transaction start ID. On recovery, the field may be
adjusted to the transaction end ID (TRX_UNDO_TRX_NO) if it is larger.

The field TRX_UNDO_NEEDS_PURGE becomes write-only; only some debug
assertions that would validate the value. The field reflects the old
inaccurate Boolean field trx_rseg_t::needs_purge.

trx_undo_mem_create_at_db_start(), trx_undo_lists_init(),
trx_rseg_mem_restore(): Remove the parameter max_trx_id.
Instead, store the maximum in trx_rseg_t::needs_purge,
where trx_rseg_array_init() will find it.

trx_purge_free_segment(): Contiguously hold a lock on
trx_rseg_t to prevent any concurrent allocation of undo log.

trx_purge_truncate_rseg_history(): Only invoke trx_purge_free_segment()
if the rollback segment is empty and there are no pending transactions
associated with it.

trx_purge_truncate_history(): Only proceed with innodb_undo_log_truncate=ON
if trx_rseg_t::needs_purge indicates that all history has been purged.

Tested by: Matthias Leich
2023-02-24 14:24:44 +02:00
Thirunarayanan Balathandayuthapani
db245e1140 MDEV-25984 Assertion `max_doc_id > 0' failed in fts_init_doc_id()
- rollback_inplace_alter_table() locks the fts internal tables.
At the time, insert tries to fetch the doc id from config table,
fails to lock the config table and returns doc id as 0.

fts_cmp_set_sync_doc_id(): Retry to fetch the doc id again if
it encounter DB_LOCK_WAIT_TIMEOUT error
2023-02-22 18:54:00 +05:30
Vlad Lesin
a474e3278c MDEV-27701 Race on trx->lock.wait_lock between lock_rec_move() and lock_sys_t::cancel()
The initial issue was in assertion failure, which checked the equality
of lock to cancel with trx->lock.wait_lock in lock_sys_t::cancel().

If we analyze lock_sys_t::cancel() code from the perspective of
trx->lock.wait_lock racing, we won't find the error there, except the
cases when we need to reload it after the corresponding latches
acquiring.

So the fix is just to remove the assertion and reload
trx->lock.wait_lock after acquiring necessary latches.

Reviewed by: Marko Mäkelä <marko.makela@mariadb.com>
2023-02-20 20:31:24 +03:00
Marko Mäkelä
201cfc33e6 MDEV-30638 Deadlock between INSERT and InnoDB non-persistent statistics update
This is a partial revert of
commit 8b6a308e46 (MDEV-29883)
and a follow-up to the
merge commit 394fc71f4f (MDEV-24569).

The latching order related to any operation that accesses the allocation
metadata of an InnoDB index tree is as follows:

1. Acquire dict_index_t::lock in non-shared mode.
2. Acquire the index root page latch in non-shared mode.
3. Possibly acquire further index page latches. Unless an exclusive
dict_index_t::lock is held, this must follow the root-to-leaf,
left-to-right order.
4. Acquire a *non-shared* fil_space_t::latch.
5. Acquire latches on the allocation metadata pages.
6. Possibly allocate and write some pages, or free some pages.

btr_get_size_and_reserved(), dict_stats_update_transient_for_index(),
dict_stats_analyze_index(): Acquire an exclusive fil_space_t::latch
in order to avoid a deadlock in fseg_n_reserved_pages() in case of
concurrent access to multiple indexes sharing the same "inode page".

fseg_page_is_allocated(): Acquire an exclusive fil_space_t::latch
in order to avoid deadlocks. All callers are holding latches
on a buffer pool page, or an index, or both.
Before commit edbde4a11f (MDEV-24167)
a third mode was available that would not conflict with the shared
fil_space_t::latch acquired by ha_innobase::info_low(),
i_s_sys_tablespaces_fill_table(),
or i_s_tablespaces_encryption_fill_table().
Because those calls should be rather rare, it makes sense to use
the simple rw_lock with only shared and exclusive modes.

fil_crypt_get_page_throttle(): Avoid invoking fseg_page_is_allocated()
on an allocation bitmap page (which can never be freed), to avoid
acquiring a shared latch on top of an exclusive one.

mtr_t::s_lock_space(), MTR_MEMO_SPACE_S_LOCK: Remove.
2023-02-16 08:30:20 +02:00
Marko Mäkelä
54c0ac72e3 MDEV-30134 Assertion failed in buf_page_t::unfix() in buf_pool_t::watch_unset()
buf_pool_t::watch_set(): Always buffer-fix a block if one was found,
no matter if it is a watch sentinel or a buffer page. The type of
the block descriptor will be rechecked in buf_page_t::watch_unset().
Do not expect the caller to acquire the page hash latch. Starting with
commit bd5a6403ca it is safe to release
buf_pool.mutex before acquiring a buf_pool.page_hash latch.

buf_page_get_low(): Adjust to the changed buf_pool_t::watch_set().

This simplifies the logic and fixes a bug that was reproduced when
using debug builds and the setting innodb_change_buffering_debug=1.
2023-02-16 08:29:44 +02:00
Marko Mäkelä
9c15799462 MDEV-30397: MariaDB crash due to DB_FAIL reported for a corrupted page
buf_read_page_low(): Map the buf_page_t::read_complete() return
value DB_FAIL to DB_PAGE_CORRUPTED. The purpose of the DB_FAIL
return value is to avoid error log noise when read-ahead brings
in an unused page that is typically filled with NUL bytes.

If a synchronous read is bringing in a corrupted page where the
page frame does not contain the expected tablespace identifier and
page number, that must be treated as an attempt to read a corrupted
page. The correct error code for this is DB_PAGE_CORRUPTED.
The error code DB_FAIL is not handled by row_mysql_handle_errors().

This was missed in commit 0b47c126e3
(MDEV-13542).
2023-02-16 08:28:14 +02:00
Marko Mäkelä
cc27e5fd0e Merge 10.5 into 10.6 2023-02-16 08:17:54 +02:00
Marko Mäkelä
5300c0fb76 MDEV-30657 InnoDB: Not applying UNDO_APPEND due to corruption
This almost completely reverts
commit acd23da4c2 and
retains a safe optimization:

recv_sys_t::parse(): Remove any old redo log records for the
truncated tablespace, to free up memory earlier.
If recovery consists of multiple batches, then recv_sys_t::apply()
will must invoke recv_sys_t::trim() again to avoid wrongly
applying old log records to an already truncated undo tablespace.
2023-02-15 18:16:41 +02:00
Marko Mäkelä
96a3b11d13 Merge 10.5 into 10.6 2023-02-14 15:23:23 +02:00
Thirunarayanan Balathandayuthapani
1a5c7552ea MDEV-30552 InnoDB recovery crashes when error handling scenario
- InnoDB fails to reset the after_apply variable before applying
the redo log in last batch during multi-batch recovery.
2023-02-14 14:36:17 +05:30
Thirunarayanan Balathandayuthapani
3eea2e8e10 MDEV-30551 InnoDB recovery hangs when buffer pool ran out of memory
- During non-last batch of multi-batch recovery, InnoDB holds
log_sys.mutex and preallocates the block which may intiate
page flush, which may initiate log flush, which requires
log_sys.mutex to acquire again. This leads to assert failure.
So InnoDB recovery should release log_sys.mutex before
preallocating the block.
2023-02-14 14:35:35 +05:30
Marko Mäkelä
6aec87544c Merge 10.5 into 10.6 2023-02-10 13:03:01 +02:00
Marko Mäkelä
c41c79650a Merge 10.4 into 10.5 2023-02-10 12:02:11 +02:00
Vicențiu Ciorbaru
08c852026d Apply clang-tidy to remove empty constructors / destructors
This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .

Code style changes have been done on top. The result of this change
leads to the following improvements:

1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
  ~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.

2. Compiler can better understand the intent of the code, thus it leads
   to more optimization possibilities. Additionally it enabled detecting
   unused variables that had an empty default constructor but not marked
   so explicitly.

   Particular change required following this patch in sql/opt_range.cc

   result_keys, an unused template class Bitmap now correctly issues
   unused variable warnings.

   Setting Bitmap template class constructor to default allows the compiler
   to identify that there are no side-effects when instantiating the class.
   Previously the compiler could not issue the warning as it assumed Bitmap
   class (being a template) would not be performing a NO-OP for its default
   constructor. This prevented the "unused variable warning".
2023-02-09 16:09:08 +02:00
Daniel Black
785386c807 innodb: cmake - sched_getcpu removed - not used 2023-02-08 13:11:37 +11:00
Marko Mäkelä
acd23da4c2 MDEV-30479 optimization: Invoke recv_sys_t::trim() earlier
recv_sys_t::parse(): Discard old page-level redo log when parsing
a TRIM_PAGES record.

recv_sys_t::apply(): trim() was invoked in parse() already.

recv_sys_t::truncated_undo_spaces[]: Only store the size, no LSN.
2023-02-06 20:29:42 +02:00
Marko Mäkelä
461402a564 MDEV-30479 OPT_PAGE_CHECKSUM mismatch after innodb_undo_log_truncate=ON
page_recv_t::trim(): Do remove log records for mini-transactions
that end right at the threshold LSN. This will avoid an inconsistency
where a dirty page had been evicted from the buffer pool during
undo tablespace truncation, and recovery would attempt to apply
log records for which the last available copy in the data file is
too new. These changes would be discarded anyway.
2023-02-06 20:29:29 +02:00
Marko Mäkelä
1c926b6263 MDEV-30527 Assertion !m_freed_pages in mtr_t::start() on DROP TEMPORARY TABLE
mtr_t::commit(): Add special handling of
innodb_immediate_scrub_data_uncompressed for TEMPORARY TABLE.

This fixes a regression that was caused by
commit de4030e4d4 (MDEV-30400).
2023-02-01 10:55:49 +02:00
Marko Mäkelä
ccafd2d49d MDEV-30524 btr_cur_t::open_leaf() opens non-leaf page in BTR_MODIFY_LEAF mode
btr_cur_t::open_leaf(): When we have to reopen the root page in
a different mode, ensure that we will actually acquire a latch upfront,
instead of using RW_NO_LATCH. This prevents a race condition where
the index tree would be split between the time we released the
root page S latch and finally acquired a latch in
mtr->upgrade_buffer_fix(), actually on a non-leaf root page.

This race condition was introduced in
commit 89ec4b53ac (MDEV-29603).
2023-01-31 16:38:11 +02:00
Oleksandr Byelkin
c3a5cf2b5b Merge branch '10.5' into 10.6 2023-01-31 09:31:42 +01:00
Marko Mäkelä
672cdcbb93 MDEV-30404: Inconsistent updates of PAGE_MAX_TRX_ID on ROW_FORMAT=COMPRESSED pages
page_copy_rec_list_start(): Do not update the PAGE_MAX_TRX_ID
on the compressed copy of the page. The modification is supposed
to be logged as part of page_zip_compress() or page_zip_reorganize().
If the page cannot be compressed (due to running out of space),
then page_zip_decompress() must be able to roll back the changes.

This fixes a regression that was introduced in
commit 56f6dab1d0 (MDEV-21174).
2023-01-26 14:47:52 +02:00
Thirunarayanan Balathandayuthapani
81196469bb MDEV-30429 InnoDB: Failing assertion: stat_value != UINT64_UNDEFINED in storage/innobase/dict/dict0stats.cc line 3647
In dict_stats_analyze_index(), InnoDB sets the maximum value for
index_stats_t to indicate the bulk under bulk insert operation.
But InnoDB fails to empty the statistics of the table in that case.
2023-01-26 18:09:34 +05:30
Marko Mäkelä
e02ed04d17 MDEV-23855 fixup: Remove SRV_MASTER_CHECKPOINT_INTERVAL 2023-01-25 13:53:10 +02:00
Marko Mäkelä
de4030e4d4 MDEV-30400 Assertion height == btr_page_get_level(...) on INSERT
This also fixes part of MDEV-29835 Partial server freeze
which is caused by violations of the latching order that was
defined in https://dev.mysql.com/worklog/task/?id=6326
(WL#6326: InnoDB: fix index->lock contention). Unless the
current thread is holding an exclusive dict_index_t::lock,
it must acquire page latches in a strict parent-to-child,
left-to-right order. Not all cases of MDEV-29835 are fixed yet.
Failure to follow the correct latching order will cause deadlocks
of threads due to lock order inversion.

As part of these changes, the BTR_MODIFY_TREE mode is modified
so that an Update latch (U a.k.a. SX) will be acquired on the
root page, and eXclusive latches (X) will be acquired on all pages
leading to the leaf page, as well as any left and right siblings
of the pages along the path. The DEBUG_SYNC test innodb.innodb_wl6326
will be removed, because at the time the DEBUG_SYNC point is hit,
the thread is actually holding several page latches that will be
blocking a concurrent SELECT statement.

We also remove double bookkeeping that was caused due to excessive
information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo
store information of latched pages, and ensure that
mtr_memo_slot_t::object is never a null pointer.
The tree_blocks[] and tree_savepoints[] were redundant.

buf_page_get_low(): If innodb_change_buffering_debug=1, to avoid
a hang, do not try to evict blocks if we are holding a latch on
a modified page. The test innodb.innodb-change-buffer-recovery
will be removed, because change buffering may no longer be forced
by debug injection when the change buffer comprises multiple pages.
Remove a debug assertion that could fail when
innodb_change_buffering_debug=1 fails to evict a page.
For other cases, the assertion is redundant, because we already
checked that right after the got_block: label. The test
innodb.innodb-change-buffering-recovery will be removed, because
due to this change, we will be unable to evict the desired page.

mtr_t::lock_register(): Register a change of a page latch
on an unmodified buffer-fixed block.

mtr_t::x_latch_at_savepoint(), mtr_t::sx_latch_at_savepoint():
Replaced by the use of mtr_t::upgrade_buffer_fix(), which now
also handles RW_S_LATCH.

mtr_t::set_modified(): For temporary tables, invoke
buf_page_t::set_modified() here and not in mtr_t::commit().
We will never set the MTR_MEMO_MODIFY flag on other than
persistent data pages, nor set mtr_t::m_modifications when
temporary data pages are modified.

mtr_t::commit(): Only invoke the buf_flush_note_modification() loop
if persistent data pages were modified.

mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo.
This avoids many redundant entries in mtr_t::m_memo, as well as
redundant calls to buf_page_get_gen() for blocks that had already
been looked up in a mini-transaction.

btr_get_latched_root(): Return a pointer to an already latched root page.
This replaces btr_root_block_get() in cases where the mini-transaction
has already latched the root page.

btr_page_get_parent(): Fetch a parent page that was already latched
in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched().
If needed, upgrade the root page U latch to X.
This avoids bloating mtr_t::m_memo as well as performing redundant
buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for
B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().

btr_cur_search_to_nth_level(): This will only be used for non-leaf
(level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE
or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be
removed altogether, or retained for the case of
CHECK TABLE without QUICK.

btr_cur_t::left_block: Remove. btr_pcur_move_backward_from_page()
can retrieve the left sibling from the end of mtr_t::m_memo.

btr_cur_t::open_leaf(): Some clean-up.

btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level()
for searches to level=0 (the leaf level). We will never release
parent page latches before acquiring leaf page latches. If we need to
temporarily release the level=1 page latch in the BTR_SEARCH_PREV or
BTR_MODIFY_PREV latch_mode, we will reposition the cursor on the
child node pointer so that we will land on the correct leaf page.

btr_cur_t::pessimistic_search_leaf(): Implement new BTR_MODIFY_TREE
latching logic in the case that page splits or merges will be needed.
The parent pages (and their siblings) should already be latched on
the first dive to the leaf and be present in mtr_t::m_memo; there
should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost
suffices; it must be revised in MDEV-29835 and work-arounds removed
for cases where mtr_t::get_already_latched() fails to find a block.

rtr_search_to_nth_level(): A SPATIAL INDEX version of
btr_search_to_nth_level() that can search to any level
(including the leaf level).

rtr_search_leaf(), rtr_insert_leaf(): Wrappers for
rtr_search_to_nth_level().

rtr_search(): Replaces rtr_pcur_open().

rtr_latch_leaves(): Replaces btr_cur_latch_leaves(). Note that unlike
in the B-tree code, there is no error handling in case the sibling
pages are corrupted.

rtr_cur_restore_position(): Remove an unused constant parameter.

btr_pcur_open_on_user_rec(): Remove the constant parameter
mode=PAGE_CUR_GE.

row_ins_clust_index_entry_low(): Use a new
mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page
when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC.

BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.

BTR_CONT_MODIFY_TREE: Note that this is only used by
rtr_search_to_nth_level().

btr_pcur_optimistic_latch_leaves(): Replaces
btr_cur_optimistic_latch_leaves().

ibuf_delete_rec(): Acquire exclusive ibuf.index->lock in order
to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).

btr_blob_log_check_t(): Acquire a U latch on the root page,
so that btr_page_alloc() in btr_store_big_rec_extern_fields()
will avoid a deadlock.

btr_store_big_rec_extern_fields(): Assert that the root page latch
is being held.

Tested by: Matthias Leich
Reviewed by: Vladislav Lesin
2023-01-24 14:09:21 +02:00
Denis Protivensky
39f4674599 MDEV-24623 Replicate bulk insert as table-level exclusive key
- introduce table key construction function in wsrep service interface
- don't add row keys when replicating bulk insert
- don't start bulk insert on applier or when transaction is not active
- don't start bulk insert on system versioned tables
- implement actual bulk insert table-level key replication

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2023-01-24 11:54:25 +02:00
Thirunarayanan Balathandayuthapani
ef6b3806bb MDEV-30393 InnoDB: Assertion failure in dict0dict.cc upon ADD FULLTEXT INDEX
Problem:
========
- InnoDB fails to remove the newly created table or index from
data dictionary and table cache if the alter fails in commit phase

Solution:
========
- InnoDB should restart the transaction to remove the newly
created table and index when it fails in commit phase of an alter
operation. innodb_fts.misc_debug tests the scenario with the
help of debug point "stats_lock_fail"
2023-01-24 13:13:52 +05:30
Marko Mäkelä
e41fb3697c Revert "MDEV-30400 Assertion height == btr_page_get_level(...) on INSERT"
This reverts commit f9cac8d2cb
which was accidentally pushed prematurely.
2023-01-23 14:52:49 +02:00
Marko Mäkelä
851c56771e Merge 10.5 into 10.6 2023-01-23 13:15:41 +02:00
Thirunarayanan Balathandayuthapani
647a7232ff MDEV-30438 innodb.undo_truncate,4k fails when innodb-immediate-scrub-data-uncompressed is enabled
- InnoDB fails to clear the freed ranges during truncation of innodb
undo log tablespace. During shutdown, InnoDB flushes the freed page
ranges and throws the out of bound error.

mtr_t::commit_shrink(): clear the freed ranges while doing undo
tablespace truncation
2023-01-23 09:55:49 +05:30
Marko Mäkelä
f9cac8d2cb MDEV-30400 Assertion height == btr_page_get_level(...) on INSERT
This also fixes part of MDEV-29835 Partial server freeze
which is caused by violations of the latching order that was
defined in https://dev.mysql.com/worklog/task/?id=6326
(WL#6326: InnoDB: fix index->lock contention). Unless the
current thread is holding an exclusive dict_index_t::lock,
it must acquire page latches in a strict parent-to-child,
left-to-right order. Not all cases are fixed yet. Failure to
follow the correct latching order will cause deadlocks of threads
due to lock order inversion.

As part of these changes, the BTR_MODIFY_TREE mode is modified
so that an Update latch (U a.k.a. SX) will be acquired on the
root page, and eXclusive latches (X) will be acquired on all pages
leading to the leaf page, as well as any left and right siblings
of the pages along the path. The test innodb.innodb_wl6326
will be removed, because at the time the DEBUG_SYNC point is hit,
the thread is actually holding several page latches that will be
blocking a concurrent SELECT statement.

We also remove double bookkeeping that was caused due to excessive
information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo
store information of latched pages, and ensure that
mtr_memo_slot_t::object is never a null pointer.
The tree_blocks[] and tree_savepoints[] were redundant.

mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo.
This avoids many redundant entries in mtr_t::m_memo, as well as
redundant calls to buf_page_get_gen() for blocks that had already
been looked up in a mini-transaction.

btr_get_latched_root(): Return a pointer to an already latched root page.
This replaces btr_root_block_get() in cases where the mini-transaction
has already latched the root page.

btr_page_get_parent(): Fetch a parent page that was already latched
in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched().
If needed, upgrade the root page U latch to X.
This avoids bloating mtr_t::m_memo as well as redundant
buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for
B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().

btr_cur_search_to_nth_level(): This will only be used for non-leaf
(level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE
or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be
removed altogether, or retained for the case of
CHECK TABLE without QUICK.

btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level()
for searches to level=0 (the leaf level).

btr_cur_t::pessimistic_search_leaf(): Implement the new
BTR_MODIFY_TREE latching logic in the case that page splits
or merges will be needed. The parent pages (and their siblings)
should already be latched on the first dive to the leaf and be
present in mtr_t::m_memo; there should be no need for
BTR_CONT_MODIFY_TREE. This pre-latching almost suffices;
MDEV-29835 will have to revise it and remove work-arounds where
mtr_t::get_already_latched() fails to find a block.

rtr_search_to_nth_level(): A SPATIAL INDEX version of
btr_search_to_nth_level() that can search to any level
(including the leaf level).

rtr_search_leaf(), rtr_insert_leaf(): Wrappers for
rtr_search_to_nth_level().

rtr_search(): Replaces rtr_pcur_open().

rtr_cur_restore_position(): Remove an unused constant parameter.

btr_pcur_open_on_user_rec(): Remove the constant parameter
mode=PAGE_CUR_GE.

btr_cur_latch_leaves(): Update a pre-existing mtr_t::m_memo entry
for the current leaf page.

row_ins_clust_index_entry_low(): Use a new
mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page
when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC.

btr_cur_t::open_leaf(): Some clean-up.

mtr_t::lock_register(): Register a page latch on a buffer-fixed block.

BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.

BTR_CONT_MODIFY_TREE: Note that this is only used by
rtr_search_to_nth_level().

btr_pcur_optimistic_latch_leaves(): Replaces
btr_cur_optimistic_latch_leaves().

ibuf_delete_rec(): Acquire ibuf.index->lock.u_lock() in order
to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).

Tested by: Matthias Leich
2023-01-19 17:19:18 +02:00