Commit graph

482 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
c89366866b MDEV-22970 Possible corruption of page_compressed tables, or
when scrubbing is enabled

buf_read_recv_pages(): Ignore the page to read if it is already
present in the freed ranges.

store_freed_or_init_rec(): Store the ranges only if scrubbing
is enabled or the tablespace is page_compressed.

recv_init_crash_recovery_spaces(): Add the freed range only if
scrubbing is enabled or the tablespace is page_compressed.

range_set::contains(): Check whether the value is present in the ranges.

range_set::remove_if_exists(): Remove the value if it exists in the ranges.
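
A minimal sketch of how contains() and remove_if_exists() could behave on
a set of disjoint page number ranges. The class name range_set_sketch, the
std::map representation and add_value() are assumptions made for this
illustration; they are not taken from the actual range_set implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for range_set: disjoint closed ranges [first, last]
// of page numbers, keyed by the first page number of each range.
class range_set_sketch
{
  std::map<uint32_t, uint32_t> ranges;

  // Return the range that covers value, or end() if there is none.
  std::map<uint32_t, uint32_t>::iterator find(uint32_t value)
  {
    auto it = ranges.upper_bound(value); // first range starting after value
    if (it == ranges.begin())
      return ranges.end();
    --it;
    return it->second >= value ? it : ranges.end();
  }

public:
  // Add one page number, merging with directly adjacent ranges.
  void add_value(uint32_t value)
  {
    if (find(value) != ranges.end())
      return;
    uint32_t first = value, last = value;
    if (value > 0)
    {
      auto prev = find(value - 1);
      if (prev != ranges.end())
      {
        first = prev->first;
        ranges.erase(prev);
      }
    }
    if (value < UINT32_MAX)
    {
      auto next = ranges.find(value + 1);
      if (next != ranges.end())
      {
        last = next->second;
        ranges.erase(next);
      }
    }
    ranges[first] = last;
  }

  bool contains(uint32_t value) { return find(value) != ranges.end(); }

  // Remove one page number, splitting the covering range if needed.
  void remove_if_exists(uint32_t value)
  {
    auto it = find(value);
    if (it == ranges.end())
      return;
    const uint32_t first = it->first, last = it->second;
    ranges.erase(it);
    if (first < value)
      ranges[first] = value - 1;
    if (value < last)
      ranges[value + 1] = last;
  }

  std::size_t size() const { return ranges.size(); }
};
```

In this sketch, removing a page number from the middle of a range splits
it in two, which is what a page being reused inside a freed range needs.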

mtr_t::init(): Handle the scenario where a mini-transaction may allocate
a page that had just been freed.

recv_sys_t::parse(): Note down the FREE and INIT redo log irrespective
of STORE value.

Removed innodb_tablespaces_scrubbing from test case
2020-07-20 18:52:10 +05:30
Marko Mäkelä
2d00e003b2 After-merge fixes for ASAN
The merge commit 0fd89a1a89
of commit b6ec1e8bbf
was slightly incomplete.

ReadView::mem_valid(): Use the correct primitive
MEM_MAKE_ADDRESSABLE(), because MEM_UNDEFINED() now has
no effect on ASAN.

recv_sys_t::alloc(), recv_sys_t::add(): Use MEM_MAKE_ADDRESSABLE()
instead of MEM_UNDEFINED(), to get the correct behaviour for ASAN.
For Valgrind and MSAN, there is no change in behaviour.

recv_sys_t::free(), recv_sys_t::clear(): Before freeing memory to
buf_pool.free_list, invoke MEM_MAKE_ADDRESSABLE() on the entire
buf_block_t::frame, to cancel the effect of MEM_NOACCESS() in
recv_sys_t::alloc().
2020-07-04 14:28:11 +03:00
Marko Mäkelä
1813d92d0c Merge 10.4 into 10.5 2020-07-02 09:41:44 +03:00
Marko Mäkelä
f347b3e0e6 Merge 10.3 into 10.4 2020-07-02 07:39:33 +03:00
Marko Mäkelä
1df1a63924 Merge 10.2 into 10.3 2020-07-02 06:17:51 +03:00
Marko Mäkelä
c36834c832 MDEV-20377: Make WITH_MSAN more usable
MemorySanitizer (clang -fsanitize=memory) requires that all code
be compiled with instrumentation enabled. The only exception is the
C runtime library. Failure to use instrumented libraries will cause
bogus messages about memory being uninitialized.

In WITH_MSAN builds, we must avoid calling getservbyname(),
because even though it is a standard library function, it is
not instrumented, not even in clang 10.

Note: Before MariaDB Server 10.5, ./mtr will typically fail
due to the old PCRE library, which was updated in MDEV-14024.

The following cmake options were tested on 10.5
in commit 94d0bb4dbe:

cmake \
-DCMAKE_C_FLAGS='-march=native -O2' \
-DCMAKE_CXX_FLAGS='-stdlib=libc++ -march=native -O2' \
-DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug \
-DWITH_INNODB_{BZIP2,LZ4,LZMA,LZO,SNAPPY}=OFF \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF \
-DWITH_{ZLIB,SSL,PCRE}=bundled \
-DHAVE_LIBAIO_H=0 \
-DWITH_MSAN=ON

MEM_MAKE_DEFINED(): An alias for VALGRIND_MAKE_MEM_DEFINED()
and __msan_unpoison().
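
A rough sketch of how such an alias could be selected at compile time.
The macro name MEM_MAKE_DEFINED_SKETCH, the build flag
HAVE_VALGRIND_SKETCH and the helper mark_initialized() are hypothetical
and only serve this example; the actual header may be organized
differently.

```cpp
#include <cstddef>

// Detect MemorySanitizer. __has_feature may be undefined on some compilers,
// so test for its existence first.
#if defined(__has_feature)
# if __has_feature(memory_sanitizer)
#  include <sanitizer/msan_interface.h>
#  define MEM_MAKE_DEFINED_SKETCH(addr, len) __msan_unpoison(addr, len)
# endif
#endif

// HAVE_VALGRIND_SKETCH is a hypothetical build flag for this example.
#if !defined(MEM_MAKE_DEFINED_SKETCH) && defined(HAVE_VALGRIND_SKETCH)
# include <valgrind/memcheck.h>
# define MEM_MAKE_DEFINED_SKETCH(addr, len) \
  VALGRIND_MAKE_MEM_DEFINED(addr, len)
#endif

// Without instrumentation the macro evaluates its arguments and does nothing.
#ifndef MEM_MAKE_DEFINED_SKETCH
# define MEM_MAKE_DEFINED_SKETCH(addr, len) ((void) (addr), (void) (len))
#endif

// Hypothetical helper: declare a buffer as initialized to the checker.
inline void mark_initialized(unsigned char *buf, std::size_t len)
{
  MEM_MAKE_DEFINED_SKETCH(buf, len);
}
```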

MEM_GET_VBITS(), MEM_SET_VBITS(): Aliases for
VALGRIND_GET_VBITS(), VALGRIND_SET_VBITS(), __msan_copy_shadow().

InnoDB: Replace the UNIV_MEM_ macros with corresponding MEM_ macros.

ut_crc32_8_hw(), ut_crc32_64_low_hw(): Use the compiler built-in
functions instead of inline assembler when building WITH_MSAN.
This will require at least -msse4.2 when building for IA-32 or AMD64.
The inline assembler would not be instrumented, and would thus cause
bogus failures.
2020-07-01 17:23:00 +03:00
Marko Mäkelä
63b3f78922 MDEV-22970: Disable MDEV-8139 due to corruption concerns
Contrary to our expectations, it seems that a mini-transaction can
allocate a page that it had freed earlier. The function mtr_t::init()
is not prepared to deal with this, and it could happen that a
newly initialized page will be scrubbed instead. This can affect
the operation on page_compressed tables, or any InnoDB data files when
innodb_background_scrub_data_uncompressed=ON.

Also, buf_read_recv_pages() can interfere with the MDEV-8139 logic
during crash recovery.

Let us temporarily disable MDEV-8139 due to such concerns.

Note: Scrubbing will partially work thanks to MDEV-15528. Only in
cases where the page does not exist in the buffer pool at the time
of the page flush, we would skip the scrubbing action.
2020-06-21 23:49:45 +02:00
Marko Mäkelä
d2c593c2a6 MDEV-22877 Avoid unnecessary buf_pool.page_hash S-latch acquisition
MDEV-15053 did not remove all unnecessary buf_pool.page_hash S-latch
acquisition. There are code paths where we are holding buf_pool.mutex
(which will sufficiently protect buf_pool.page_hash against changes)
and unnecessarily acquire the latch. Many invocations of
buf_page_hash_get_locked() can be replaced with the much simpler
buf_pool.page_hash_get_low().

In the worst case the thread that is holding buf_pool.mutex will become
a victim of MDEV-22871, suffering from a spurious reader-reader conflict
with another thread that genuinely needs to acquire a buf_pool.page_hash
S-latch.

In many places, we were also evaluating page_id_t::fold() while holding
buf_pool.mutex. Low-level functions such as buf_pool.page_hash_get_low()
must get the page_id_t::fold() as a parameter.

buf_buddy_relocate(): Defer the hash_lock acquisition to the critical
section that starts by calling buf_page_t::can_relocate().
2020-06-12 16:22:03 +03:00
Thirunarayanan Balathandayuthapani
c92f7e287f MDEV-8139 Fix Scrubbing
fil_space_t::freed_ranges: Store ranges of freed page numbers.

fil_space_t::last_freed_lsn: Store the most recent LSN of
freeing a page.

fil_space_t::freed_mutex: Protects freed_ranges, last_freed_lsn.

fil_space_create(): Initialize the freed_range mutex.

fil_space_free_low(): Frees the freed_range mutex.

range_set: Ranges of page numbers.

buf_page_create(): Removes the page from freed_ranges when page
is being reused.

btr_free_root(): Remove the PAGE_INDEX_ID invalidation. Because
btr_free_root() and dict_drop_index_tree() are executed in
the same atomic mini-transaction, there is no need to
invalidate the root page.

buf_release_freed_page(): Split from buf_flush_freed_page().
Skip any I/O.

buf_flush_freed_pages(): Get the freed ranges from the tablespace and
write punch-hole or zeroes for the freed ranges.

buf_flush_try_neighbors(): Handles the flushing of freed ranges.

mtr_t::freed_pages: Variable to store the list of freed pages.

mtr_t::add_freed_pages(): To add freed pages.

mtr_t::clear_freed_pages(): To clear the freed pages.

mtr_t::m_freed_in_system_tablespace: Variable to indicate whether page has
been freed in system tablespace.

mtr_t::m_trim_pages: Variable to indicate whether the space has been trimmed.

mtr_t::commit(): Add the freed page and update the last freed lsn
in the tablespace and clear the tablespace freed range if space is
trimmed.

file_name_t::freed_pages: Store the freed pages during recovery.

file_name_t::add_freed_page(), file_name_t::remove_freed_page(): To
add and remove freed page during recovery.

store_freed_or_init_rec(): Store or remove the freed pages while
encountering FREE_PAGE or INIT_PAGE redo log record.

recv_init_crash_recovery_spaces(): Add the freed page encountered
during recovery to respective tablespace.
2020-06-12 09:17:51 +05:30
Marko Mäkelä
6877ef9a7c Merge 10.4 into 10.5 2020-06-05 20:36:43 +03:00
Marko Mäkelä
68d9d512e9 Merge 10.3 into 10.4 2020-06-05 18:05:22 +03:00
Marko Mäkelä
680463a8d9 Merge 10.2 into 10.3 2020-06-05 16:51:26 +03:00
Marko Mäkelä
b1ab211dee MDEV-15053 Reduce buf_pool_t::mutex contention
User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE
and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0
and will no longer report the PAGE_STATE value READY_FOR_USE.

We will remove some fields from buf_page_t and move much code to
member functions of buf_pool_t and buf_page_t, so that the access
rules of data members can be enforced consistently.

Evicting or adding pages in buf_pool.LRU will remain covered by
buf_pool.mutex.

Evicting or adding pages in buf_pool.page_hash will remain
covered by both buf_pool.mutex and the buf_pool.page_hash X-latch.

After this fix, buf_pool.page_hash lookups can entirely
avoid acquiring buf_pool.mutex, only relying on
buf_pool.hash_lock_get() S-latch.

Similarly, buf_flush_check_neighbors() will rely solely on
buf_pool.mutex, no buf_pool.page_hash latch at all.

The buf_pool.mutex is rather contended in I/O heavy benchmarks,
especially when the workload does not fit in the buffer pool.

The first attempt to alleviate the contention was the
buf_pool_t::mutex split in
commit 4ed7082eef
which introduced buf_block_t::mutex, which we are now removing.

Later, multiple instances of buf_pool_t were introduced
in commit c18084f71b
and recently removed by us in
commit 1a6f708ec5 (MDEV-15058).

UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool
related debugging in otherwise non-debug builds has not been used
for years. Instead, we have been using UNIV_DEBUG, which is enabled
in CMAKE_BUILD_TYPE=Debug.

buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on
std::atomic and the buf_pool.page_hash latches, and in some cases
depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before.
We must always release buf_block_t::lock before invoking
unfix() or io_unfix(), to prevent a glitch where a block that was
added to the buf_pool.free list would appear X-latched. See
commit c5883debd6 for how this glitch
was finally caught in a debug environment.

We move some buf_pool_t::page_hash specific code from the
ha and hash modules to buf_pool, for improved readability.

buf_pool_t::close(): Assert that all blocks are clean, except
on aborted startup or crash-like shutdown.

buf_pool_t::validate(): No longer attempt to validate
n_flush[] against the number of BUF_IO_WRITE fixed blocks,
because buf_page_t::flush_type no longer exists.

buf_pool_t::watch_set(): Replaces buf_pool_watch_set().
Reduce mutex contention by separating the buf_pool.watch[]
allocation and the insert into buf_pool.page_hash.

buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a
buf_pool.page_hash latch.
Replaces and extends buf_page_hash_lock_s_confirm()
and buf_page_hash_lock_x_confirm().

buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES.

buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads:
Use Atomic_counter.

buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out().

buf_pool_t::LRU_remove(): Remove a block from the LRU list
and return its predecessor. Incorporates buf_LRU_adjust_hp(),
which was removed.

buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(),
for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by
BTR_DELETE_OP (purge), which is never invoked on temporary tables.

buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments.

buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition.

buf_LRU_free_page(): Clarify the function comment.

buf_flush_check_neighbor(), buf_flush_check_neighbors():
Rewrite the construction of the page hash range. We will hold
the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64)
consecutive lookups of buf_pool.page_hash.

buf_flush_page_and_try_neighbors(): Remove.
Merge to its only callers, and remove redundant operations in
buf_flush_LRU_list_batch().

buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite.
Do not acquire buf_pool.mutex, and iterate directly with page_id_t.

ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined
and avoids any loops.
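
For illustration, a loop-free rounding to the next power of 2 can be
written with bit smearing. The function name below is hypothetical, and
this sketch is not claimed to be identical to my_round_up_to_next_power().

```cpp
#include <cstdint>

// Round v up to the next power of 2 without any loop.
// Note: for v == 0 or v > 2^31 the 32-bit result wraps around to 0.
inline uint32_t round_up_to_next_power_sketch(uint32_t v)
{
  v--;          // so that exact powers of 2 map to themselves
  v |= v >> 1;  // copy the highest set bit into all lower bits
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1; // all-ones below the highest bit, plus one
}
```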

fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove.

buf_flush_page(): Add a fil_space_t* parameter. Minimize the
buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated
atomically with the io_fix, and we will protect most buf_block_t
fields with buf_block_t::lock. The function
buf_flush_write_block_low() is removed and merged here.

buf_page_init_for_read(): Use static linkage. Initialize the newly
allocated block and acquire the exclusive buf_block_t::lock while not
holding any mutex.

IORequest::IORequest(): Remove the body. We only need to invoke
set_punch_hole() in buf_flush_page() and nowhere else.

buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type.
This field is only used during a fil_io() call.
That function already takes IORequest as a parameter, so we had
better introduce IORequest::flush_type for the rarely changing field.

buf_block_t::init(): Replaces buf_page_init().

buf_page_t::init(): Replaces buf_page_init_low().

buf_block_t::initialise(): Initialise many fields, but
keep the buf_page_t::state(). Both buf_pool_t::validate() and
buf_page_optimistic_get() require that buf_page_t::in_file()
be protected atomically with buf_page_t::in_page_hash
and buf_page_t::in_LRU_list.

buf_page_optimistic_get(): Now that buf_block_t::mutex
no longer exists, we must check buf_page_t::io_fix()
after acquiring the buf_pool.page_hash lock, to detect
whether buf_page_init_for_read() has been initiated.
We will also check the io_fix() before acquiring hash_lock
in order to avoid unnecessary computation.
The field buf_block_t::modify_clock (protected by buf_block_t::lock)
allows buf_page_optimistic_get() to validate the block.

buf_page_t::real_size: Remove. It was only used while flushing
pages of page_compressed tables.

buf_page_encrypt(): Add an output parameter that allows us to eliminate
buf_page_t::real_size. Replace a condition with debug assertion.

buf_page_should_punch_hole(): Remove.

buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch().
Add the parameter size (to replace buf_page_t::real_size).

buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page().
Add the parameter size (to replace buf_page_t::real_size).

fil_system_t::detach(): Replaces fil_space_detach().
Ensure that fil_validate() will not be violated even if
fil_system.mutex is released and reacquired.

fil_node_t::complete_io(): Renamed from fil_node_complete_io().

fil_node_t::close_to_free(): Replaces fil_node_close_to_free().
Avoid invoking fil_node_t::close() because fil_system.n_open
has already been decremented in fil_system_t::detach().

BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY.

BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE,
and distinguish dirty pages by buf_page_t::oldest_modification().

BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead.
This state was only being used for buf_page_t that are in
buf_pool.watch.

buf_pool_t::watch[]: Remove pointer indirection.

buf_page_t::in_flush_list: Remove. It was set if and only if
buf_page_t::oldest_modification() is nonzero.

buf_page_decrypt_after_read(), buf_corrupt_page_release(),
buf_page_check_corrupt(): Change the const fil_space_t* parameter
to const fil_node_t& so that we can report the correct file name.

buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function.

buf_page_io_complete(): Split to buf_page_read_complete() and
buf_page_write_complete().

buf_dblwr_t::in_use: Remove.

buf_dblwr_t::buf_block_array: Add IORequest::flush_t.

buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of
os_aio_wait_until_no_pending_writes().

buf_flush_write_complete(): Declare static, not global.
Add the parameter IORequest::flush_t.

buf_flush_freed_page(): Simplify the code.

recv_sys_t::flush_lru: Renamed from flush_type and changed to bool.

fil_read(), fil_write(): Replaced with direct use of fil_io().

fil_buffering_disabled(): Remove. Check srv_file_flush_method directly.

fil_mutex_enter_and_prepare_for_io(): Return the resolved
fil_space_t* to avoid a duplicated lookup in the caller.

fil_report_invalid_page_access(): Clean up the parameters.

fil_io(): Return fil_io_t, which comprises fil_node_t and error code.
Always invoke fil_space_t::acquire_for_io() and let either the
sync=true caller or fil_aio_callback() invoke
fil_space_t::release_for_io().

fil_aio_callback(): Rewrite to replace buf_page_io_complete().

fil_check_pending_operations(): Remove a parameter, and remove some
redundant lookups.

fil_node_close_to_free(): Wait for n_pending==0. Because we no longer
do an extra lookup of the tablespace between fil_io() and the
completion of the operation, we must give fil_node_t::complete_io() a
chance to decrement the counter.

fil_close_tablespace(): Remove unused parameter trx, and document
that this is only invoked during the error handling of IMPORT TABLESPACE.

row_import_discard_changes(): Merged with the only caller,
row_import_cleanup(). Do not lock up the data dictionary while
invoking fil_close_tablespace().

logs_empty_and_mark_files_at_shutdown(): Do not invoke
fil_close_all_files(), to avoid a !needs_flush assertion failure
on fil_node_t::close().

innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files().

fil_close_all_files(): Invoke fil_flush_file_spaces()
to ensure proper durability.

thread_pool::unbind(): Fix a crash that would occur on Windows
after srv_thread_pool->disable_aio() and os_file_close().
This fix was submitted by Vladislav Vaintroub.

Thanks to Matthias Leich and Axel Schwenke for extensive testing,
Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.
2020-06-05 12:35:46 +03:00
Marko Mäkelä
eba2d10ac5 MDEV-22721 Remove bloat caused by InnoDB logger class
Introduce a new ATTRIBUTE_NOINLINE to
ib::logger member functions, and add UNIV_UNLIKELY hints to callers.

Also, remove some crash reporting output. If needed, the
information will be available using debugging tools.

Furthermore, remove some fts_enable_diag_print output that included
indexed words in raw form. The code seemed to assume that words are
NUL-terminated byte strings. It is not clear whether a NUL terminator
is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
typically contain many NUL bytes.
2020-06-04 10:24:10 +03:00
Marko Mäkelä
fbe2712705 Merge 10.4 into 10.5
The functional changes of commit 5836191c8f
(MDEV-21168) are omitted due to MDEV-742 having addressed the issue.
2020-04-25 21:57:52 +03:00
Marko Mäkelä
af91266498 Merge 10.3 into 10.4
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
2020-04-16 12:12:26 +03:00
Marko Mäkelä
84db10f27b Merge 10.2 into 10.3 2020-04-15 09:56:03 +03:00
Thirunarayanan Balathandayuthapani
6bbc0eedc6 MDEV-22193 Avoid un-necessary page initialization during recovery
- InnoDB performs unnecessary redo log page initialisation during
recovery and an unnecessary traversal of the redo log during the last phase.
This patch removes the unnecessary redo log page initialisation
and detects memory exhaustion earlier.
2020-04-09 21:25:31 +05:30
Marko Mäkelä
1738c0f1be MDEV-22169 Recovery fails after failing to insert into mlog_init
In a multi-batch recovery, we must ensure that INIT_PAGE and
especially the MDEV-15528 FREE_PAGE records will be taken
properly into account.

Writing a FREE_PAGE record gives the server permission to omit
a page write. If recovery insists on applying log to a page
whose page flush has been omitted, then the consistency checks
in the application of high-level redo log records (appending
an undo log record, inserting or deleting an index record)
will likely fail.

mlog_init_t::add(): Return whether the state was changed.

mlog_init_t::will_avoid_read(): Determine whether a page read
will be avoided and whether older log records can be safely
skipped.
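
A simplified sketch of the two member functions described above. It
assumes that the structure maps a page identifier to the latest LSN at
which the page was freed or initialized; the type names and the plain
integer page key are hypothetical, and the real mlog_init_t may store more
state per page.

```cpp
#include <cstdint>
#include <map>
#include <utility>

typedef uint64_t lsn_t;
typedef uint64_t page_key_t; // hypothetical stand-in for page_id_t

class mlog_init_sketch
{
  // Latest LSN of a FREE_PAGE or INIT_PAGE record for each page.
  std::map<page_key_t, lsn_t> inits;

public:
  // Register a record; return whether the state was changed.
  bool add(page_key_t page, lsn_t lsn)
  {
    auto p = inits.insert(std::make_pair(page, lsn));
    if (p.second)
      return true;  // first record for this page
    if (p.first->second >= lsn)
      return false; // an equal or newer record is already known
    p.first->second = lsn;
    return true;
  }

  // Whether the page will be freed or initialized after lsn, so that
  // reading it and applying log records up to lsn can be skipped.
  bool will_avoid_read(page_key_t page, lsn_t lsn) const
  {
    auto i = inits.find(page);
    return i != inits.end() && i->second > lsn;
  }
};
```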

recv_sys_t::parse(): Even if store==STORE_NO, process the records
INIT_PAGE and FREE_PAGE. While processing them, we can delete older
redo log records for the page. If store!=STORE_NO, we can directly
skip redo log records of other types if mlog_init indicates that the
page will be freed or initialized at a later LSN.

This fix was developed in cooperation with
Thirunarayanan Balathandayuthapani.
2020-04-07 17:18:05 +03:00
Vlad Lesin
5836191c8f MDEV-21168: Active XA transactions stop slave from working after backup
was restored.

Optionally rollback prepared XA's on "mariabackup --prepare".

The fix MUST NOT be ported to 10.5+, as the MDEV-742 fix solves the issue for
slaves.
2020-04-07 15:05:38 +03:00
Marko Mäkelä
37c14690fc Merge 10.4 into 10.5 2020-03-30 19:07:25 +03:00
Marko Mäkelä
aae3f921ad Cleanup recv_sys: Move things to members
recv_sys.recovery_on: Replaces recv_recovery_on.

recv_sys_t::apply(): Replaces recv_apply_hashed_log_recs().

recv_sys_var_init(): Remove.

recv_sys_t::recover_low(): Attempt to initialize a page based
on buffered redo log records.
2020-03-30 18:45:09 +03:00
Marko Mäkelä
a8b04c3ee0 MDEV-12353: Remove a trace of pre-MDEV-13564 crash-upgrade
In commit f8a9f90667
we removed support for crash-upgrade from older versions,
but forgot to remove a check for recovering TRUNCATE TABLE
if MariaDB 10.2.18 or 10.3.9 or earlier were killed and
we are attempting to upgrade to MariaDB 10.5.2 or later.
Already MariaDB 10.4 would refuse to recover such TRUNCATE
operations.
2020-03-30 18:08:38 +03:00
Marko Mäkelä
e2f1f88fa6 Merge 10.3 into 10.4 2020-03-30 14:50:23 +03:00
Marko Mäkelä
1a9b6c4c7f Merge 10.2 into 10.3 2020-03-30 11:12:56 +03:00
Thirunarayanan Balathandayuthapani
6697135c6d MDEV-21572 buf_page_get_gen() should apply buffered page initialized
redo log during recovery

- InnoDB unnecessarily reads the page even though it has fully initialized
buffered redo log records. Allow the page initialization redo log to
apply for the page in buf_page_get_gen() during recovery.
- Renamed buf_page_get_gen() to buf_page_get_low()
- Newly added buf_page_get_gen() will check for buffered redo log for
the particular page id during recovery
- Added new function buf_page_mtr_lock() which basically latches the page
for the given latch type.
- recv_recovery_create_page() is an inline function that creates a page
if it has page initialization redo log records.
2020-03-23 16:41:48 +05:30
Marko Mäkelä
5203bc10f1 Merge 10.4 into 10.5 2020-03-21 11:37:10 +02:00
Marko Mäkelä
bd3c8f47cd Merge 10.3 into 10.4 2020-03-20 22:06:55 +02:00
Marko Mäkelä
44298e4dea Merge 10.2 into 10.3
Also, clean up the test innodb_gis.geometry a little further.
2020-03-20 18:12:17 +02:00
Marko Mäkelä
a786f50de5 MDEV-21962 Allocate buf_pool statically
Thanks to MDEV-15058, there is only one InnoDB buffer pool.
Allocating buf_pool statically removes one level of pointer indirection
and makes code more readable, and removes the awkward initialization of
some buf_pool members.

While doing this, we will also declare some buf_pool_t data members
private and replace some functions with member functions. This is
mostly affecting buffer pool resizing.

This is not aiming to be a complete rewrite of buf_pool_t to
a proper class. Most of the buffer pool interface, such as
buf_page_get_gen(), will remain in the C programming style
for now.

buf_pool_t::withdrawing: Replaces buf_pool_withdrawing.
buf_pool_t::withdraw_clock_: Replaces buf_withdraw_clock.

buf_pool_t::create(): Replaces buf_pool_init().
buf_pool_t::close(): Replaces buf_pool_free().

buf_pool_t::will_be_withdrawn(): Replaces buf_block_will_be_withdrawn(),
buf_frame_will_be_withdrawn().

buf_pool_t::clear_hash_index(): Replaces buf_pool_clear_hash_index().
buf_pool_t::get_n_pages(): Replaces buf_pool_get_n_pages().
buf_pool_t::validate(): Replaces buf_validate().
buf_pool_t::print(): Replaces buf_print().
buf_pool_t::block_from_ahi(): Replaces buf_block_from_ahi().
buf_pool_t::is_block_field(): Replaces buf_pointer_is_block_field().
buf_pool_t::is_block_mutex(): Replaces buf_pool_is_block_mutex().
buf_pool_t::is_block_lock(): Replaces buf_pool_is_block_lock().
buf_pool_t::is_obsolete(): Replaces buf_pool_is_obsolete().
buf_pool_t::io_buf: Make default-constructible.
buf_pool_t::io_buf::create(): Delayed 'constructor'
buf_pool_t::io_buf::close(): Early 'destructor'

HazardPointer: Make default-constructible. Define all member functions
inline, also for derived classes.
2020-03-18 22:32:40 +02:00
Thirunarayanan Balathandayuthapani
09e8707d90 MDEV-21826 Recovery failure : loop of Read redo log up to LSN
- This issue is caused by MDEV-19176
(bba59abb03).
- The problem is a miscalculation of the available memory during
recovery if innodb_buffer_pool_instances > 1.
- Ignore the buffer pool instance while calculating available_memory
- Removed recv_n_pool_free_frames variable and use buf_pool_get_n_pages()
instead.
2020-03-18 15:25:28 +05:30
Sergei Golubchik
91d1588d30 Merge branch 'github/10.5' into 10.5 2020-03-14 09:52:35 +01:00
Marko Mäkelä
f224525204 MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.

Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.

GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
about x|=y even when x and y are compatible types that are narrower
than int.  Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.

In GCC 6 and later, the warning for assigning wider values to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
cases.

The bitwise negation operator appears to promote short integers
to a wider type, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.
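
The rewrites described above might look as follows. The struct and
function names are made up for this illustration; only the cast patterns
are taken from the description.

```cpp
typedef unsigned char byte;

// Hypothetical structure with bitfields narrower than 8 bits.
struct flags_sketch
{
  unsigned state:3;
  unsigned fix:5;
};

// x |= y on types narrower than int, rewritten with an explicit cast.
inline byte set_bits(byte x, byte y)
{
  x = static_cast<byte>(x | y);
  return x;
}

// Bitwise negation promotes to int; the constructor-style cast byte(~1)
// truncates the constant.
inline byte clear_low_bit(byte x)
{
  return static_cast<byte>(x & byte(~1));
}

// Mask with the exact bitmask of the 3-bit field before assigning.
inline void set_state(flags_sketch &f, unsigned v)
{
  f.state = v & 7;
}
```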

This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
2020-03-12 19:46:41 +02:00
Oleksandr Byelkin
fad47df995 Merge branch '10.4' into 10.5 2020-03-11 17:52:49 +01:00
Sergei Golubchik
7c58e97bf6 perfschema memory related instrumentation changes 2020-03-10 19:24:22 +01:00
Marko Mäkelä
276e042de3 MDEV-21893: Assertion failure on upgrade with innodb_encrypt_log
recv_log_recover_10_4(): Add a missing bit pattern negation that
was forgotten when commit f8a9f90667
(MDEV-12353) removed the support for crash-upgrading.
2020-03-09 11:38:43 +02:00
Marko Mäkelä
57c592f74d Cleanup: Remove recv_sys.remove_extra_log_files
create_log_file(): Delete all old redo log files where they used to be
deleted, after the crash injection point innodb_log_abort_6,
before commit 9ef2d29ff4
deprecated and ignored the setting innodb_log_files_in_group.
2020-03-07 14:47:15 +02:00
Marko Mäkelä
70f0dbe4d3 Cleanup: log upgrade and encryption
log_crypt_101_read_checkpoint(), log_crypt_101_read_block():
Declare as ATTRIBUTE_COLD. These are only used when
checking that a MariaDB 10.1 encrypted redo log is clean.

log_block_calc_checksum_format_0(): Define in the only
compilation unit where it is needed. This is only used
when reading the checkpoint information from redo logs
before MariaDB 10.2.2.

crypt_info_t: Declare the byte arrays directly with alignas().

log_crypt(): Use memcpy_aligned instead of reinterpret_cast
on integers.
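
As an illustration of the idea, an integer can be copied out of a byte
buffer with memcpy instead of dereferencing a reinterpret_cast pointer.
This sketch uses plain std::memcpy and hypothetical helper names; it does
not reproduce memcpy_aligned or the actual log_crypt() code.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit integer in host byte order from a byte buffer.
// Unlike *reinterpret_cast<const uint32_t*>(buf), this is well defined
// regardless of the alignment and the declared type of the buffer.
inline uint32_t read_u32_sketch(const unsigned char *buf)
{
  uint32_t v;
  std::memcpy(&v, buf, sizeof v);
  return v;
}

// Store a 32-bit integer in host byte order into a byte buffer.
inline void write_u32_sketch(unsigned char *buf, uint32_t v)
{
  std::memcpy(buf, &v, sizeof v);
}
```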
2020-03-07 14:31:36 +02:00
Marko Mäkelä
522fbfcb5c Cleanup: Remove recv_sys.buf_size
Also, correctly document what recv_sys.mutex is protecting.
2020-03-07 12:01:12 +02:00
Marko Mäkelä
23685378ba MDEV-14425 preparation: Simplify redo log upgrade
recv_log_recover_pre_10_2(): Merged from
recv_find_max_checkpoint_0(), recv_log_format_0_recover().
2020-03-06 11:06:59 +02:00
Marko Mäkelä
a4ab54d70f MDEV-14425 Cleanup: Use std::atomic for some log_sys members
Some fields were protected by log_sys.mutex, which adds quite some
overhead for readers. Some readers were submitting dirty reads.

log_t::lsn: Declare private and atomic. Add wrappers get_lsn()
and set_lsn() that will use relaxed memory access. Many accesses
to log_sys.lsn are still protected by log_sys.mutex; we avoid the
mutex for some readers.
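
A minimal sketch of the wrapper pattern, using a hypothetical class
log_sketch in place of log_t. Any ordering with respect to other log
fields would still have to come from log_sys.mutex or another mechanism.

```cpp
#include <atomic>
#include <cstdint>

typedef uint64_t lsn_t;

// Hypothetical stand-in for the relevant part of log_t.
class log_sketch
{
  // Private, so that all accesses go through the wrappers below.
  std::atomic<lsn_t> lsn{0};

public:
  // Relaxed load: readers get some valid value without taking a mutex.
  lsn_t get_lsn() const { return lsn.load(std::memory_order_relaxed); }

  // Relaxed store: the writer is assumed to hold the mutex.
  void set_lsn(lsn_t l) { lsn.store(l, std::memory_order_relaxed); }
};
```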

log_t::flushed_to_disk_lsn: Declare private and atomic, and move
to the same cache line with log_t::lsn.

log_t::buf_free: Declare as size_t, and move to the same cache line
with log_t::lsn.

log_t::check_flush_or_checkpoint_: Declare private and atomic,
and move to the same cache line with log_t::lsn.

log_get_lsn(): Define as an alias of log_sys.get_lsn().

log_get_lsn_nowait(), log_peek_lsn(): Remove.

log_get_flush_lsn(): Define as an alias of log_sys.get_flush_lsn().

log_t::initiate_write(): Replaces log_buffer_sync_in_background().
2020-03-05 16:21:31 +02:00
Marko Mäkelä
8a25eb666d MDEV-18214 cleanup: Remove redundant MONITOR_INC calls
MONITOR_PENDING_CHECKPOINT_WRITE and MONITOR_LOG_IO track
log_sys.n_pending_checkpoint_writes and log_sys.n_log_ios,
respectively. The MONITOR_INC calls are redundant, because
the values will be overwritten in srv_mon_process_existing_counter().
2020-03-04 13:05:22 +02:00
Marko Mäkelä
9e488653ae Cleanup: Make MONITOR_LSN_CHECKPOINT_AGE a value.
Compute MONITOR_LSN_CHECKPOINT_AGE on demand in
srv_mon_process_existing_counter().
This allows us to remove the overhead of MONITOR_SET
calls for the counter.
2020-03-04 12:59:20 +02:00
Marko Mäkelä
4383897a01 MDEV-14425 preparation: Remove log_header_read()
The function log_header_read() was only used during server startup,
and it will mostly be used only for reading checkpoint information
from pre-MDEV-14425 format redo log files.

Let us replace the function with more direct calls, so that
it is clearer what is going on. It is not strictly necessary to
hold any mutex during this operation, and because there will be
only a limited number of operations during early server startup,
it is not necessary to increment any I/O counters.
2020-03-04 10:08:33 +02:00
Marko Mäkelä
fae259f036 MDEV-12353: Introduce an EXTENDED record subtype TRIM_PAGES
For undo log truncation, commit 055a3334ad
repurposed the MLOG_FILE_CREATE2 record with a nonzero page size
to indicate that an undo tablespace will be shrunk in size.
In commit 7ae21b18a6 the
MLOG_FILE_CREATE2 record was replaced by a FILE_CREATE record.

Now that the redo log encoding was changed, there is no actual need
to write a file name in the log record; it suffices to write the
page identifier of the first page that is not part of the file.

This TRIM_PAGES record could allow us to shrink any data files in the
future. For now, it will be limited to undo tablespaces.

mtr_t::log_file_op(): Remove the parameter first_page_no, because
it would always be 0 for file operations.

mtr_t::trim_pages(): Replaces fil_truncate_log().

mtr_t::log_write(): Avoid same_page encoding if !bpage&&!m_last.

fil_op_replay_rename(): Remove the constant parameter first_page_no=0.
2020-03-03 13:25:45 +02:00
Marko Mäkelä
8511f04fdb Cleanup: Remove srv_start_lsn
Most of the time, we can refer to recv_sys.recovered_lsn.
2020-03-02 15:01:46 +02:00
Vladislav Vaintroub
47d8fcf4cd MDEV-21534 - fix debug build 2020-03-01 23:33:16 +01:00
Marko Mäkelä
138cbec5f2 MDEV-21724: Optimize page_cur_insert_low() redo logging
Inserting a record into an index page involves updating multiple
fields in the page header as well as updating the next-record links
and potentially updating fields related to the sparse page directory.

Let us cover the insert operations by higher-level log records, to avoid
'redundant' logging about the writes.

The code for applying the high-level log records will check the
consistency of the page thoroughly, to avoid crashes during recovery.
We will refuse to replay the inserts if any inconsistency is detected.
With innodb_force_recovery=1, recovery will continue, but the affected
pages may be more inconsistent if some changes were omitted.

mrec_ext_t: Introduce the EXTENDED record subtypes
INSERT_HEAP_REDUNDANT, INSERT_REUSE_REDUNDANT,
INSERT_HEAP_DYNAMIC, INSERT_REUSE_DYNAMIC.
The record will explicitly identify the page type and whether
the space will be allocated from PAGE_HEAP_TOP or reused from
the PAGE_FREE list. It will also tell how many bytes to copy
from the preceding record header and payload, and how to
initialize the rest of the record header and payload.

mtr_t::page_insert(): Write the high-level log records.

log_phys_t::apply(): Parse the high-level log records.

page_apply_insert_redundant(), page_apply_insert_dynamic():
Apply the high-level log records.

page_dir_split_slot(): Introduce a variant that does not write log
nor deal with ROW_FORMAT=COMPRESSED pages.

page_mem_alloc_heap(): Remove the mtr_t parameter.

page_cur_insert_rec_low(): Write log only via mtr_t::page_insert().
2020-02-27 17:19:44 +02:00
Marko Mäkelä
e15ae1cfe1 MDEV-12353: Improve page_cur_delete_rec() recovery
This is a follow-up to commit 572d20757b
where we introduced the EXTENDED log record subtypes
DELETE_ROW_FORMAT_REDUNDANT and DELETE_ROW_FORMAT_DYNAMIC.

log_phys_t::apply(): If corruption was noticed, stop applying the log
unless innodb_force_recovery is set.
2020-02-27 16:47:00 +02:00
Marko Mäkelä
4431144ae5 MDEV-12353: Make UNDO_APPEND more robust
This is a follow-up to commit 84e3f9ce84
that introduced the EXTENDED log record of UNDO_APPEND subtype.

mtr_t::undo_append(): Accurately enforce the mtr_buf_t::MAX_DATA_SIZE
limit. Also, replace mtr_buf_t::push() with simpler code, to append 1 byte
to the log.

log_phys_t::undo_append(): Return whether the page was found to
be in an inconsistent state.

log_phys_t::apply(): If corruption was noticed, stop applying log
unless innodb_force_recovery is set.
2020-02-27 16:47:00 +02:00