Commit graph

185 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
647a7232ff MDEV-30438 innodb.undo_truncate,4k fails when innodb-immediate-scrub-data-uncompressed is enabled
- InnoDB fails to clear the freed ranges during truncation of innodb
undo log tablespace. During shutdown, InnoDB flushes the freed page
ranges and throws the out of bound error.

mtr_t::commit_shrink(): clear the freed ranges while doing undo
tablespace truncation
2023-01-23 09:55:49 +05:30
Marko Mäkelä
e0e096faaa MDEV-29982 Improve the InnoDB log overwrite error message
The InnoDB write-ahead log ib_logfile0 is of fixed size,
specified by innodb_log_file_size. If the tail of the log
manages to overwrite the head (latest checkpoint) of the log,
crash recovery will be broken.

Let us clarify the messages about this, including adding
a message on the completion of a log checkpoint that notes
that the dangerous situation is over.

To reproduce the dangerous scenario, we will introduce the
debug injection label ib_log_checkpoint_avoid_hard, which will
avoid log checkpoints even harder than the previous
ib_log_checkpoint_avoid.

log_t::overwrite_warned: The first known dangerous log sequence number.
Set in log_close() and cleared in log_write_checkpoint_info(),
which will output a "Crash recovery was broken" message.
2022-11-14 12:18:03 +02:00
Marko Mäkelä
7ee612c912 MDEV-21174 fixup: Remove mtr_t::release_page()
mtr_t::release_page(): Remove. The function became unused in
commit 56f6dab1d0
when the call was replaced with a call to mtr_t::memo_release().
2022-11-10 12:50:44 +02:00
Marko Mäkelä
22f935d6da MDEV-28731 Race condition on log checkpoint
mtr_t::modify(): Set the m_made_dirty flag if needed,
so that buf_pool_t::insert_into_flush_list() will be invoked
while holding log_sys.flush_order_mutex.

This is something that was should have been part of
commit b212f1dac2 (MDEV-22107).
2022-06-02 17:33:03 +03:00
Marko Mäkelä
5294695ebd Clean up mtr_t
mtr_t::is_empty(): Replaces mtr_t::get_log() and mtr_t::get_memo().

mtr_t::get_log_size(): Replaces mtr_t::get_log().

mtr_t::print(): Remove, unused function.

ReleaseBlocks::ReleaseBlocks(): Remove an unused parameter.
2022-06-02 17:18:00 +03:00
Vlad Lesin
dcb2968f90 MDEV-27557 InnoDB unnecessarily commits mtr during secondary index search to preserve clustered index latching order
New function to release latches till savepoint was added in mtr_t. As
there is no longer need to limit MDEV-20605 fix usage for locking reads
only, the limitation is removed.
2022-03-25 09:22:36 +03:00
Marko Mäkelä
fd101daa84 MDEV-27716 mtr_t::commit() acquires log_sys.mutex when writing no log
mtr_t::is_block_dirtied(), mtr_t::memo_push(): Never set m_made_dirty
for pages of the temporary tablespace. Ever since
commit 5eb539555b
we never add those pages to buf_pool.flush_list.

mtr_t::commit(): Implement part of mtr_t::prepare_write() here,
and avoid acquiring log_sys.mutex if no log is written.
During IMPORT TABLESPACE fixup, we do not write log, but we must
add pages to buf_pool.flush_list and for that, be prepared
to acquire log_sys.flush_order_mutex.

mtr_t::do_write(): Replaces mtr_t::prepare_write().
2022-02-09 15:10:10 +02:00
Marko Mäkelä
017d1b867b MDEV-27476 heap-use-after-free in buf_pool_t::is_block_field()
mtr_t::modify(): Remove a debug assertion that had been added
in commit 05fa4558e0 (MDEV-22110).
The function buf_pool_t::is_uncompressed() is only safe to invoke
while holding a buf_pool.page_hash latch so that buf_pool_t::resize()
cannot concurrently invoke free() on any chunks.
2022-01-12 12:29:16 +02:00
Marko Mäkelä
f5794e1dc6 MDEV-26445 innodb_undo_log_truncate is unnecessarily slow
trx_purge_truncate_history(): Do not force a write of the undo tablespace
that is being truncated. Instead, prevent page writes by acquiring
an exclusive latch on all dirty pages of the tablespace.

fseg_create(): Relax an assertion that could fail if a dirty undo page
is being initialized during undo tablespace truncation (and
trx_purge_truncate_history() already acquired an exclusive latch on it).

fsp_page_create(): If we are truncating a tablespace, try to reuse
a page that we may have already latched exclusively (because it was
in buf_pool.flush_list). To some extent, this helps the test
innodb.undo_truncate,16k to avoid running out of buffer pool.

mtr_t::commit_shrink(): Mark as clean all pages that are outside the
new bounds of the tablespace, and only add the newly reinitialized pages
to the buf_pool.flush_list.

buf_page_create(): Do not unnecessarily invoke change buffer merge on
undo tablespaces.

buf_page_t::clear_oldest_modification(bool temporary): Move some
assertions to the caller buf_page_write_complete().

innodb.undo_truncate: Use a bigger innodb_buffer_pool_size=24M.
On my system, it would otherwise hang 1 out of 1547 attempts
(on the 40th repeat of innodb.undo_truncate,16k).
Other page sizes were not affected.
2021-09-24 08:24:03 +03:00
Marko Mäkelä
f5fddae3cb MDEV-26450: Corruption due to innodb_undo_log_truncate
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.

The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.

This is improved from commit 1cb218c37c
which was applied to earlier-version branches.

fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.

fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.

mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace:
(1) durably write the log
(2) release the page latches of the rebuilt tablespace
(3) release the mutexes
(4) truncate the file
(5) release the tablespace latch
This is refactored from trx_purge_truncate_history().

log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
2021-09-24 08:22:19 +03:00
Vladislav Vaintroub
fc2ec25733 MDEV-26166 replace log_write_up_to(LSN_MAX,...) with log_buffer_flush_to_disk()
Also, remove comparison lsn > flush/write lsn, prior to calling
log_write_up_to. The checks and early returns are part of this function.
2021-07-16 18:44:58 +02:00
Marko Mäkelä
6441bc614a MDEV-25113: Introduce a page cleaner mode before 'furious flush'
MDEV-23855 changed the way how the page cleaner is signaled by
user threads. If a threshold is exceeded, a mini-transaction commit
would invoke buf_flush_ahead() in order to initiate page flushing
before all writers would eventually grind to halt in
log_free_check(), waiting for the checkpoint age to reduce.

However, buf_flush_ahead() would always initiate 'furious flushing',
making the buf_flush_page_cleaner thread write innodb_io_capacity_max
pages per batch, and sleeping no time between batches, until the
limit LSN is reached. Because this could saturate the I/O subsystem,
system throughput could significantly reduce during these
'furious flushing' spikes.

With this change, we introduce a gentler version of flush-ahead,
which would write innodb_io_capacity_max pages per second until
the 'soft limit' is reached.

buf_flush_ahead(): Add a parameter to specify whether furious flushing
is requested.

buf_flush_async_lsn: Similar to buf_flush_sync_lsn, a limit for
the less intrusive flushing.

buf_flush_page_cleaner(): Keep working until buf_flush_async_lsn
has been reached.

log_close(): Suppress a warning message in the event that a new log
is being created during startup, when old logs did not exist.
Return what type of page cleaning will be needed.

mtr_t::finish_write(): Also when m_log.is_small(), invoke log_close().
Return what type of page cleaning will be needed.

mtr_t::commit(): Invoke buf_flush_ahead() based on the return value of
mtr_t::finish_write().
2021-06-23 19:06:52 +03:00
Marko Mäkelä
e87a8efd32 MDEV-24455 Assertion `!m_freed_space' failed in mtr_t::start
In commit 0c23e32d27 (MDEV-24445)
we forgot to keep m_freed_space in sync with m_freed_pages in one case.
2020-12-21 10:48:51 +02:00
Marko Mäkelä
0c23e32d27 MDEV-24445 Using innodb_undo_tablespaces corrupts system tablespace
In the rewrite of MDEV-8139 (based on MDEV-15528), we introduced a
wrong assumption that any persistent tablespace that is not an .ibd
file is the system tablespace. This assumption is broken when
innodb_undo_tablespaces (files undo001, undo002, ...) are being used.
By default, we have innodb_undo_tablespaces=0 (the persistent undo
log is being stored in the system tablespace).

In MDEV-15528 and MDEV-8139 we rewrote the page scrubbing logic
so that it will follow the tried-and-true write-ahead logging
protocol, first writing FREE_PAGE records and then in the page
flushing, zerofilling or hole-punching freed pages.

Unfortunately, the implementation included a wrong assumption that
that anything that is not in an .ibd file must be the system tablespace.
This wrong assumption would cause overwrites of valid data pages in
the system tablespace.

mtr_t::m_freed_in_system_tablespace: Remove.

mtr_t::m_freed_space: The tablespace associated with m_freed_pages.

buf_page_free(): Take the tablespace and page number as a parameter,
instead of taking a page identifier.
2020-12-18 19:23:34 +02:00
Marko Mäkelä
aa0e380568 MDEV-24348 InnoDB shutdown hang with innodb_flush_sync=0
This hang was caused by MDEV-23855, and we failed to fix it in
MDEV-24109 (commit 4cbfdeca84).

When buf_flush_ahead() is invoked soon before server shutdown
and the non-default setting innodb_flush_sync=OFF is in effect
and the buffer pool contains dirty pages of temporary tables,
the page cleaner thread may remain in an infinite loop
without completing its work, thus causing the shutdown to hang.

buf_flush_page_cleaner(): If the buffer pool contains no
unmodified persistent pages, ensure that buf_flush_sync_lsn= 0
will be assigned, so that shutdown will proceed.

The test case is not deterministic. On my system, it reproduced
the hang with 95% probability when running multiple instances
of the test in parallel, and 4% when running single-threaded.

Thanks to Eugene Kosov for debugging and testing this.
2020-12-04 14:11:48 +02:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Marko Mäkelä
81ab9ea63f Merge 10.2 into 10.3 2020-12-01 14:55:46 +02:00
Marko Mäkelä
ce0cb6a4f6 MDEV-24188 fixup: Correct the FindBlockX predicate
FindBlockX::operator(): Return false if an x-latched block is found.
Previously, we were incorrectly returning false if the block was in
the log, only if not x-latched.

It is unknown if this mistake had any visible impact. Often,
we would register both MTR_MEMO_BUF_FIX and MTR_MEMO_PAGE_X_FIX
for the same block.
2020-11-18 13:52:37 +02:00
Marko Mäkelä
e8f8992801 MDEV-24188: Merge 10.4 into 10.5 2020-11-13 22:06:50 +02:00
Marko Mäkelä
749ecedfec MDEV-24188: Merge 10.3 into 10.4 2020-11-13 20:45:28 +02:00
Marko Mäkelä
f9f2f37495 MDEV-24188: Merge 10.2 into 10.3 2020-11-13 20:41:48 +02:00
Marko Mäkelä
bb328a2a27 MDEV-24188 Hang in buf_page_create() after reusing a previously freed page
The fix of MDEV-23456 (commit b1009ae5c1)
introduced a livelock between page flushing and a thread that is
executing buf_page_create().

buf_page_create(): If the current mini-transaction is holding
an exclusive latch on the page, do not attempt to acquire another
one, and do not care about any I/O fix.

mtr_t::have_x_latch(): Replaces mtr_t::get_fix_count().

dyn_buf_t::for_each_block(const Functor&) const: A new variant.

rw_lock_own(): Add a const qualifier.

Reviewed by: Thirunarayanan Balathandayuthapani
2020-11-13 20:16:39 +02:00
Marko Mäkelä
7b2bb67113 Merge 10.3 into 10.4 2020-10-29 13:38:38 +02:00
Marko Mäkelä
a8de8f261d Merge 10.2 into 10.3 2020-10-28 10:01:50 +02:00
Thirunarayanan Balathandayuthapani
bc540b8706 MDEV-23693 Failing assertion: my_atomic_load32_explicit(&lock->lock_word, MY_MEMORY_ORDER_RELAXED) == X_LOCK_DECR
InnoDB frees the block lock during buffer pool shrinking when other
thread is yet to release the block lock.  While shrinking the
buffer pool, InnoDB allows the page to be freed unless it is buffer
fixed. In some cases, InnoDB releases the latch after unfixing the
block.

Fix:
====
- InnoDB should unfix the block after releases the latch.

- Add more assertion to check buffer fix while accessing the page.

- Introduced block_hint structure to store buf_block_t pointer
and allow accessing the buf_block_t pointer only by passing a
functor. It returns original buf_block_t* pointer if it is valid
or nullptr if the pointer become stale.

- Replace buf_block_is_uncompressed() with
buf_pool_t::is_block_pointer()

This change is motivated by a change in mysql-5.7.32:
mysql/mysql-server@46e60de444
Bug #31036301 ASSERTION FAILURE: SYNC0RW.IC:429:LOCK->LOCK_WORD
2020-10-27 18:30:00 +05:30
Marko Mäkelä
c27e53f459 MDEV-23855: Use normal mutex for log_sys.mutex, log_sys.flush_order_mutex
With an unreasonably small innodb_log_file_size, the page cleaner
thread would frequently acquire log_sys.flush_order_mutex and spend
a significant portion of CPU time spinning on that mutex when
determining the checkpoint LSN.
2020-10-26 17:53:55 +02:00
Marko Mäkelä
45ed9dd957 MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention
Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored
for system tablespace on SSD

When the maximum configured number of file is exceeded, InnoDB will
close data files. We used to maintain a fil_system.LRU list and
a counter fil_node_t::n_pending to achieve this, at the huge cost
of multiple fil_system.mutex operations per I/O operation.

fil_node_open_file_low(): Implement a FIFO replacement policy:
The last opened file will be moved to the end of fil_system.space_list,
and files will be closed from the start of the list. However, we will
not move tablespaces in fil_system.space_list while
i_s_tablespaces_encryption_fill_table() is executing
(producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION)
because it may cause information of some tablespaces to go missing.
We also avoid this in mariabackup --backup because datafiles_iter_next()
assumes that the ordering is not changed.

IORequest: Fold more parameters to IORequest::type.

fil_space_t::io(): Replaces fil_io().

fil_space_t::flush(): Replaces fil_flush().

OS_AIO_IBUF: Remove. We will always issue synchronous reads of the
change buffer pages in buf_read_page_low().

We will always ignore some errors for background reads.

This should reduce fil_system.mutex contention a little.

fil_node_t::complete_write(): Replaces fil_node_t::complete_io().
On both read and write completion, fil_space_t::release_for_io()
will have to be called.

fil_space_t::io(): Do not acquire fil_system.mutex in the normal
code path.

xb_delta_open_matching_space(): Do not try to open the system tablespace
which was already opened. This fixes a file sharing violation in
mariabackup --prepare --incremental.

Reviewed by: Vladislav Vaintroub
2020-10-26 17:09:01 +02:00
Marko Mäkelä
3a9a3be1c6 MDEV-23855: Improve InnoDB log checkpoint performance
After MDEV-15053, MDEV-22871, MDEV-23399 shifted the scalability
bottleneck, log checkpoints became a new bottleneck.

If innodb_io_capacity is set low or innodb_max_dirty_pct_lwm is
set high and the workload fits in the buffer pool, the page cleaner
thread will perform very little flushing. When we reach the capacity
of the circular redo log file ib_logfile0 and must initiate a checkpoint,
some 'furious flushing' will be necessary. (If innodb_flush_sync=OFF,
then flushing would continue at the innodb_io_capacity rate, and
writers would be throttled.)

We have the best chance of advancing the checkpoint LSN immediately
after a page flush batch has been completed. Hence, it is best to
perform checkpoints after every batch in the page cleaner thread,
attempting to run once per second.

By initiating high-priority flushing in the page cleaner as early
as possible, we aim to make the throughput more stable.

The function buf_flush_wait_flushed() used to sleep for 10ms, hoping
that the page cleaner thread would do something during that time.
The observed end result was that a large number of threads that call
log_free_check() would end up sleeping while nothing useful is happening.

We will revise the design so that in the default innodb_flush_sync=ON
mode, buf_flush_wait_flushed() will wake up the page cleaner thread
to perform the necessary flushing, and it will wait for a signal from
the page cleaner thread.

If innodb_io_capacity is set to a low value (causing the page cleaner to
throttle its work), a write workload would initially perform well, until
the capacity of the circular ib_logfile0 is reached and log_free_check()
will trigger checkpoints. At that point, the extra waiting in
buf_flush_wait_flushed() will start reducing throughput.

The page cleaner thread will also initiate log checkpoints after each
buf_flush_lists() call, because that is the best point of time for
the checkpoint LSN to advance by the maximum amount.

Even in 'furious flushing' mode we invoke buf_flush_lists() with
innodb_io_capacity_max pages at a time, and at the start of each
batch (in the log_flush() callback function that runs in a separate
task) we will invoke os_aio_wait_until_no_pending_writes(). This
tweak allows the checkpoint to advance in smaller steps and
significantly reduces the maximum latency. On an Intel Optane 960
NVMe SSD on Linux, it reduced from 4.6 seconds to 74 milliseconds.
On Microsoft Windows with a slower SSD, it reduced from more than
180 seconds to 0.6 seconds.

We will make innodb_adaptive_flushing=OFF simply flush innodb_io_capacity
per second whenever the dirty proportion of buffer pool pages exceeds
innodb_max_dirty_pages_pct_lwm. For innodb_adaptive_flushing=ON we try
to make page_cleaner_flush_pages_recommendation() more consistent and
predictable: if we are below innodb_adaptive_flushing_lwm, let us flush
pages according to the return value of af_get_pct_for_dirty().

innodb_max_dirty_pages_pct_lwm: Revert the change of the default value
that was made in MDEV-23399. The value innodb_max_dirty_pages_pct_lwm=0
guarantees that a shutdown of an idle server will be fast. Users might
be surprised if normal shutdown suddenly became slower when upgrading
within a GA release series.

innodb_checkpoint_usec: Remove. The master task will no longer perform
periodic log checkpoints. It is the duty of the page cleaner thread.

log_sys.max_modified_age: Remove. The current span of the
buf_pool.flush_list expressed in LSN only matters for adaptive
flushing (outside the 'furious flushing' condition).
For the correctness of checkpoints, the only thing that matters is
the checkpoint age (log_sys.lsn - log_sys.last_checkpoint_lsn).
This run-time constant was also reported as log_max_modified_age_sync.

log_sys.max_checkpoint_age_async: Remove. This does not serve any
purpose, because the checkpoints will now be triggered by the page
cleaner thread. We will retain the log_sys.max_checkpoint_age limit
for engaging 'furious flushing'.

page_cleaner.slot: Remove. It turns out that
page_cleaner_slot.flush_list_time was duplicating
page_cleaner.slot.flush_time and page_cleaner.slot.flush_list_pass
was duplicating page_cleaner.flush_pass.
Likewise, there were some redundant monitor counters, because the
page cleaner thread no longer performs any buf_pool.LRU flushing, and
because there only is one buf_flush_page_cleaner thread.

buf_flush_sync_lsn: Protect writes by buf_pool.flush_list_mutex.

buf_pool_t::get_oldest_modification(): Add a parameter to specify the
return value when no persistent data pages are dirty. Require the
caller to hold buf_pool.flush_list_mutex.

log_buf_pool_get_oldest_modification(): Take the fall-back LSN
as a parameter. All callers will also invoke log_sys.get_lsn().

log_preflush_pool_modified_pages(): Replaced with buf_flush_wait_flushed().

buf_flush_wait_flushed(): Implement two limits. If not enough buffer pool
has been flushed, signal the page cleaner (unless innodb_flush_sync=OFF)
and wait for the page cleaner to complete. If the page cleaner
thread is not running (which can be the case durign shutdown),
initiate the flush and wait for it directly.

buf_flush_ahead(): If innodb_flush_sync=ON (the default),
submit a new buf_flush_sync_lsn target for the page cleaner
but do not wait for the flushing to finish.

log_get_capacity(), log_get_max_modified_age_async(): Remove, to make
it easier to see that af_get_pct_for_lsn() is not acquiring any mutexes.

page_cleaner_flush_pages_recommendation(): Protect all access to
buf_pool.flush_list with buf_pool.flush_list_mutex. Previously there
were some race conditions in the calculation.

buf_flush_sync_for_checkpoint(): New function to process
buf_flush_sync_lsn in the page cleaner thread. At the end of
each batch, we try to wake up any blocked buf_flush_wait_flushed().
If everything up to buf_flush_sync_lsn has been flushed, we will
reset buf_flush_sync_lsn=0. The page cleaner thread will keep
'furious flushing' until the limit is reached. Any threads that
are waiting in buf_flush_wait_flushed() will be able to resume
as soon as their own limit has been satisfied.

buf_flush_page_cleaner: Prioritize buf_flush_sync_lsn and do not
sleep as long as it is set. Do not update any page_cleaner statistics
for this special mode of operation. In the normal mode
(buf_flush_sync_lsn is not set for innodb_flush_sync=ON),
try to wake up once per second. No longer check whether
srv_inc_activity_count() has been called. After each batch,
try to perform a log checkpoint, because the best chances for
the checkpoint LSN to advance by the maximum amount are upon
completing a flushing batch.

log_t: Move buf_free, max_buf_free possibly to the same cache line
with log_sys.mutex.

log_margin_checkpoint_age(): Simplify the logic, and replace
a 0.1-second sleep with a call to buf_flush_wait_flushed() to
initiate flushing. Moved to the same compilation unit
with the only caller.

log_close(): Clean up the calculations. (Should be no functional
change.) Return whether flush-ahead is needed. Moved to the same
compilation unit with the only caller.

mtr_t::finish_write(): Return whether flush-ahead is needed.

mtr_t::commit(): Invoke buf_flush_ahead() when needed. Let us avoid
external calls in mtr_t::commit() and make the logic easier to follow
by having related code in a single compilation unit. Also, we will
invoke srv_stats.log_write_requests.inc() only once per
mini-transaction commit, while not holding mutexes.

log_checkpoint_margin(): Only care about log_sys.max_checkpoint_age.
Upon reaching log_sys.max_checkpoint_age where we must wait to prevent
the log from getting corrupted, let us wait for at most 1MiB of LSN
at a time, before rechecking the condition. This should allow writers
to proceed even if the redo log capacity has been reached and
'furious flushing' is in progress. We no longer care about
log_sys.max_modified_age_sync or log_sys.max_modified_age_async.
The log_sys.max_modified_age_sync could be a relic from the time when
there was a srv_master_thread that wrote dirty pages to data files.
Also, we no longer have any log_sys.max_checkpoint_age_async limit,
because log checkpoints will now be triggered by the page cleaner
thread upon completing buf_flush_lists().

log_set_capacity(): Simplify the calculations of the limit
(no functional change).

log_checkpoint_low(): Split from log_checkpoint(). Moved to the
same compilation unit with the caller.

log_make_checkpoint(): Only wait for everything to be flushed until
the current LSN.

create_log_file(): After checkpoint, invoke log_write_up_to()
to ensure that the FILE_CHECKPOINT record has been written.
This avoids ut_ad(!srv_log_file_created) in create_log_file_rename().

srv_start(): Do not call recv_recovery_from_checkpoint_start()
if the log has just been created. Set fil_system.space_id_reuse_warned
before dict_boot() has been executed, and clear it after recovery
has finished.

dict_boot(): Initialize fil_system.max_assigned_id.

srv_check_activity(): Remove. The activity count is counting transaction
commits and therefore mostly interesting for the purge of history.

BtrBulk::insert(): Do not explicitly wake up the page cleaner,
but do invoke srv_inc_activity_count(), because that counter is
still being used in buf_load_throttle_if_needed() for some
heuristics. (It might be cleaner to execute buf_load() in the
page cleaner thread!)

Reviewed by: Vladislav Vaintroub
2020-10-26 17:09:01 +02:00
Marko Mäkelä
3421223363 Merge 10.4 into 10.5 2020-09-09 16:57:30 +03:00
Marko Mäkelä
66ae50a564 Merge 10.3 into 10.4 2020-09-09 15:00:21 +03:00
Marko Mäkelä
7e07e38cf6 Merge 10.2 into 10.3 2020-09-09 13:06:46 +03:00
Marko Mäkelä
c26eae0cc0 MDEV-23456 fixup: Fix mtr_t::get_fix_count()
Before commit 05fa4558e0 (MDEV-22110)
we have slot->type == MTR_MEMO_MODIFY that are unrelated to
incrementing the buffer-fix count.

FindBlock::operator(): In debug builds, skip MTR_MEMO_MODIFY entries.

Also, simplify the code a little.

This fixes an infinite loop in the tests
innodb.innodb_defragment and innodb.innodb_wl6326_big.
2020-09-09 12:01:03 +03:00
Thirunarayanan Balathandayuthapani
b1009ae5c1 MDEV-23456 fil_space_crypt_t::write_page0() is accessing an uninitialized page
buf_page_create() is invoked when page is initialized. So that
previous contents of the page ignored. In few cases, it calls
buf_page_get_gen() is called to fetch the page from buffer pool.
It should take x-latch on the page. If other thread uses the block
or block io state is different from BUF_IO_NONE then release the
mutex and check the state and buffer fix count again. For compressed
page, use the existing free block from LRU list to create new page.
Retry to fetch the compressed page if it is in flush list

fseg_create(), fseg_create_general(): Introduce block as a parameter
where segment header is placed. It is used to avoid repetitive
x-latch on the same page

Change the assert to check whether the page has SX latch and
X latch in all callee function of buf_page_create()

mtr_t::get_fix_count(): Get the buffer fix count of the given
block added by the mtr

FindBlock is added to find the buffer fix count of the given
block acquired by the mini-transaction
2020-09-09 11:58:15 +05:30
Marko Mäkelä
05fa4558e0 MDEV-22110 InnoDB unnecessarily writes unmodified pages
At least since commit 6a7be48b1b
InnoDB appears to be invoking buf_flush_note_modification() on pages
that were exclusively latched but not modified in a mini-transaction.

MTR_MEMO_MODIFY, mtr_t::modify(): Define not only in debug code,
but also in release code. We will set the MTR_MEMO_MODIFY flag
on the earliest mtr_t::m_memo entry that we find.

MTR_LOG_NONE: Only use this mode in cases where the previous
mode will be restored before anything is modified in the mini-transaction.

MTR_MEMO_PAGE_X_MODIFY, MTR_MEMO_PAGE_SX_MODIFY: The allowed flag
combinations that include MTR_MEMO_MODIFY.

ReleaseBlocks: Only invoke buf_flush_note_modification()
on those buffer pool blocks on which mtr_t::set_modified()
and mtr_t::modify() were invoked.
2020-07-28 14:02:11 +03:00
Marko Mäkelä
4d4865de6f Merge 10.4 into 10.5 2020-07-20 15:55:59 +03:00
Marko Mäkelä
4b959bd8df Merge 10.3 into 10.4 2020-07-20 15:34:59 +03:00
Marko Mäkelä
acc58fd835 Merge 10.2 into 10.3 2020-07-20 15:11:59 +03:00
Marko Mäkelä
ca9276e37e Merge 10.1 into 10.2 2020-07-20 14:53:24 +03:00
Marko Mäkelä
57ec42bc32 MDEV-23190 InnoDB data file extension is not crash-safe
When InnoDB is extending a data file, it is updating the FSP_SIZE
field in the first page of the data file.

In commit 8451e09073 (MDEV-11556)
we removed a work-around for this bug and made recovery stricter,
by making it track changes to FSP_SIZE via redo log records, and
extend the data files before any changes are being applied to them.

It turns out that the function fsp_fill_free_list() is not crash-safe
with respect to this when it is initializing the change buffer bitmap
page (page 1, or generally, N*innodb_page_size+1). It uses a separate
mini-transaction that is committed (and will be written to the redo
log file) before the mini-transaction that actually extended the data
file. Hence, recovery can observe a reference to a page that is
beyond the current end of the data file.

fsp_fill_free_list(): Initialize the change buffer bitmap page in
the same mini-transaction.

The rest of the changes are fixing a bug that the use of the separate
mini-transaction was attempting to work around. Namely, we must ensure
that no other thread will access the change buffer bitmap page before
our mini-transaction has been committed and all page latches have been
released.

That is, for read-ahead as well as neighbour flushing, we must avoid
accessing pages that might not yet be durably part of the tablespace.

fil_space_t::committed_size: The size of the tablespace
as persisted by mtr_commit().

fil_space_t::max_page_number_for_io(): Limit the highest page
number for I/O batches to committed_size.

MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch.

mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch.

mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK,
copy space->size to space->committed_size. In this way, read-ahead
or flushing will never be invoked on pages that do not yet exist
according to FSP_SIZE.
2020-07-20 14:48:56 +03:00
Monty
0fd89a1a89 Merge remote-tracking branch 'origin/10.4' into 10.5 2020-07-03 23:31:12 +03:00
Marko Mäkelä
1813d92d0c Merge 10.4 into 10.5 2020-07-02 09:41:44 +03:00
Marko Mäkelä
f347b3e0e6 Merge 10.3 into 10.4 2020-07-02 07:39:33 +03:00
Marko Mäkelä
1df1a63924 Merge 10.2 into 10.3 2020-07-02 06:17:51 +03:00
Marko Mäkelä
c36834c832 MDEV-20377: Make WITH_MSAN more usable
MemorySanitizer (clang -fsanitize=memory) requires that all code
be compiled with instrumentation enabled. The only exception is the
C runtime library. Failure to use instrumented libraries will cause
bogus messages about memory being uninitialized.

In WITH_MSAN builds, we must avoid calling getservbyname(),
because even though it is a standard library function, it is
not instrumented, not even in clang 10.

Note: Before MariaDB Server 10.5, ./mtr will typically fail
due to the old PCRE library, which was updated in MDEV-14024.

The following cmake options were tested on 10.5
in commit 94d0bb4dbe:

cmake \
-DCMAKE_C_FLAGS='-march=native -O2' \
-DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \
-DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \
-DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF \
-DWITH_{ZLIB,SSL,PCRE}=bundled \
-DHAVE_LIBAIO_H=0 \
-DWITH_MSAN=ON

MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED()
and __msan_unpoison().

MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for
VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow().

InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros.

ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in
functions instead of inline assembler when building WITH_MSAN.
This will require at least -msse4.2 when building for IA-32 or AMD64.
The inline assembler would not be instrumented, and would thus cause
bogus failures.
2020-07-01 17:23:00 +03:00
Thirunarayanan Balathandayuthapani
572e53d8cc MDEV-22931 mtr_t::mtr_t() allocates some memory
mtr_t::m_freed_pages: Renamed from m_freed_ranges and made it as
pointer indirection.

mtr_t::add_freed_offset(): Allocates m_freed_pages.

mtr_t:clear_freed_ranges(): Removed.

mtr_t::init(): Added debug assertion to check whether m_freed_pages
is not yet initialized.

btr_page_alloc_low(): Remove #ifdef UNIV_DEBUG_SCRUBBING.

mtr_t::commit(): Delete m_freed_pages, reset m_trim_pages and
m_freed_in_system_tablespace.

fil_space_t::clear_freed_ranges(): Added a comment to explain how
undo log tablespaces uses it.
2020-06-19 21:12:13 +05:30
Thirunarayanan Balathandayuthapani
d0c69ccab5 MDEV-22911: Fix the valgrind & MSAN instrumentation of MDEV-8139
MEM_GET_VBITS(): Save information about uninitialized data.

MEM_SET_VBITS(): Restore information about uninitialized data.
2020-06-16 20:03:35 +05:30
Thirunarayanan Balathandayuthapani
d34cc6b3fd MDEV-8139: Fix the MSAN instrumentation 2020-06-12 21:57:33 +05:30
Thirunarayanan Balathandayuthapani
c92f7e287f MDEV-8139 Fix Scrubbing
fil_space_t::freed_ranges: Store ranges of freed page numbers.

fil_space_t::last_freed_lsn: Store the most recent LSN of
freeing a page.

fil_space_t::freed_mutex: Protects freed_ranges, last_freed_lsn.

fil_space_create(): Initialize the freed_range mutex.

fil_space_free_low(): Frees the freed_range mutex.

range_set: Ranges of page numbers.

buf_page_create(): Removes the page from freed_ranges when page
is being reused.

btr_free_root(): Remove the PAGE_INDEX_ID invalidation. Because
btr_free_root() and dict_drop_index_tree() are executed in
the same atomic mini-transaction, there is no need to
invalidate the root page.

buf_release_freed_page(): Split from buf_flush_freed_page().
Skip any I/O

buf_flush_freed_pages(): Get the freed ranges from tablespace and
Write punch-hole or zeroes of the freed ranges.

buf_flush_try_neighbors(): Handles the flushing of freed ranges.

mtr_t::freed_pages: Variable to store the list of freed pages.

mtr_t::add_freed_pages(): To add freed pages.

mtr_t::clear_freed_pages(): To clear the freed pages.

mtr_t::m_freed_in_system_tablespace: Variable to indicate whether page has
been freed in system tablespace.

mtr_t::m_trim_pages: Variable to indicate whether the space has been trimmed.

mtr_t::commit(): Add the freed page and update the last freed lsn
in the tablespace and clear the tablespace freed range if space is
trimmed.

file_name_t::freed_pages: Store the freed pages during recovery.

file_name_t::add_freed_page(), file_name_t::remove_freed_page(): To
add and remove freed page during recovery.

store_freed_or_init_rec(): Store or remove the freed pages while
encountering FREE_PAGE or INIT_PAGE redo log record.

recv_init_crash_recovery_spaces(): Add the freed page encountered
during recovery to respective tablespace.
2020-06-12 09:17:51 +05:30
Marko Mäkelä
17a7bafec0 MDEV-22110 preparation: Remove mtr_memo_contains macros
Let us invoke the debug member functions of mtr_t directly.

mtr_t::memo_contains(): Change the parameter type to
const rw_lock_t&. This function cannot be invoked on
buf_block_t::lock.

The function mtr_t::memo_contains_flagged() is intended to be invoked
on buf_block_t* or rw_lock_t*, and it along with
mtr_t::memo_contains_page_flagged() are the way to check whether
a buffer pool page has been latched within a mini-transaction.
2020-06-10 07:50:09 +03:00