Commit graph

2,939 commits

Author SHA1 Message Date
Marko Mäkelä
08ba388713 MDEV-12353: Replace MLOG_REC_INSERT,MLOG_COMP_REC_INSERT
page_mem_alloc_free(), page_dir_set_n_heap(), page_ptr_set_direction():
Merge with the callers.

page_direction_reset(), page_direction_increment(),
page_zip_dir_insert(), page_zip_write_rec_ext(), page_zip_write_rec():
Add the parameter mtr, and write log.

PageBulk::insert(), PageBulk::finish(): Write log for all changes.

page_cur_rec_insert(), page_cur_insert_rec_write_log(),
page_cur_insert_rec_write_log(): Remove.

page_rec_set_next(), page_header_set_field(), page_header_set_ptr():
Remove. Use lower-level operations with or without logging.

page_zip_dir_add_slot(): Move to the same compilation unit with
its only caller, page_cur_insert_rec_zip().

page_cur_insert_rec_zip(): Mark pieces of code that must be skipped
once this task is completed.

btr_defragment_chunk(): Before starting a mini-transaction that
is writing (a lot), invoke log_free_check(). This should allow
the test innodb.innodb_defrag_concurrent to pass with the
mtr default_mysqld.cnf setting of innodb_log_file_size=10M.

MLOG_BUF_MARGIN: Remove.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
2a77b2a510 MDEV-12353: Replace MLOG_*LIST_*_DELETE and MLOG_*REC_DELETE
No longer write the following redo log records:
MLOG_COMP_LIST_END_DELETE, MLOG_LIST_END_DELETE,
MLOG_COMP_LIST_START_DELETE, MLOG_LIST_START_DELETE,
MLOG_REC_DELETE, MLOG_COMP_REC_DELETE.

Each individual deleted record will be logged separately
using physical log records.

page_dir_slot_set_n_owned(),
page_zip_rec_set_owned(), page_zip_dir_delete(), page_zip_clear_rec():
Add the parameter mtr, and write redo log.

page_dir_slot_set_rec(): Remove. Replaced with lower-level operations
that write redo log when necessary.

page_rec_set_n_owned(): Replaces rec_set_n_owned_old(),
rec_set_n_owned_new().

rec_set_heap_no(): Replaces rec_set_heap_no_old(), rec_set_heap_no_new().

page_mem_free(), page_dir_split_slot(), page_dir_balance_slot():
Add the parameter mtr.

page_dir_set_n_slots(): Merge with the caller page_dir_split_slot().

page_dir_slot_set_rec(): Merge with the callers page_dir_split_slot()
and page_dir_balance_slot().

page_cur_insert_rec_low(), page_cur_insert_rec_zip():
Suppress the logging of lower-level operations.

page_cur_delete_rec_write_log(): Remove.

page_cur_delete_rec(): Do not tolerate mtr=NULL.

rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_comp():
Replace rec_set_heap_no_old() and rec_set_heap_no_new() with direct
access that does not involve redo logging.

mtr_t::memcpy(): Do allow non-redo-logged writes to uncompressed pages
of ROW_FORMAT=COMPRESSED pages.

buf_page_io_complete(): Evict the uncompressed page of
a ROW_FORMAT=COMPRESSED page after recovery. Because we no longer
write logical log records for deleting index records, but instead
write physical records that may refer directly to the compressed
page frame of a ROW_FORMAT=COMPRESSED page, and because on recovery
we will only apply the changes to the ROW_FORMAT=COMPRESSED page,
the uncompressed page frame can be stale until page_zip_decompress()
is executed.

recv_parse_or_apply_log_rec_body(): After applying MLOG_ZIP_WRITE_STRING,
ensure that the FIL_PAGE_TYPE of the uncompressed page matches the
compressed page, because buf_flush_init_for_writing() assumes that
field to be valid.

mlog_init_t::mark_ibuf_exist(): Invoke page_zip_decompress(), because
the uncompressed page after buf_page_create() is not necessarily
up to date.

buf_LRU_block_remove_hashed(): Bypass a page_zip_validate() check
during redo log apply.

recv_apply_hashed_log_recs(): Invoke mlog_init.mark_ibuf_exist()
also for the last batch, to ensure that page_zip_decompress() will
be called for freshly initialized pages.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
d00185c40d MDEV-12353: Replace MLOG_PAGE_CREATE_RTREE, MLOG_PAGE_COMP_CREATE_RTREE
page_create(): Create normal B-tree pages. Callers that create
R-tree pages will set FIL_PAGE_TYPE and reset the split
sequence number afterwards.

The creation of ROW_FORMAT=COMPRESSED pages is unaffected;
they will be logged as compressed page images.

page_create_low(): Take const buf_block_t* as a parameter.
Let the callers invoke buf_block_modify_clock_inc().
2020-02-13 18:19:14 +02:00
Marko Mäkelä
b3d02a1fcf MDEV-12353: Replace DELETE_MARK redo log records with MLOG_WRITE_STRING
btr_cur_upd_rec_sys(): Replaces row_upd_rec_sys_fields() and
implements redo logging.

row_upd_rec_sys_fields_in_recovery(): Remove, and merge into the
only remaining caller btr_cur_parse_update_in_place().

btr_cur_del_mark_set_clust_rec_log(),
btr_cur_del_mark_set_sec_rec_log(),
btr_cur_set_deleted_flag_for_ibuf():
Remove, and replace with btr_rec_set_deleted<bool>().

page_zip_rec_set_deleted(): Add the parameter mtr, and write a
MLOG_ZIP_WRITE_STRING record to the log.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
f3230111fc MDEV-12353: Introduce MLOG_ZIP_WRITE_STRING
Log the low-level operations for ROW_FORMAT=COMPRESSED index pages
using a new record, MLOG_ZIP_WRITE_STRING. We will still use
MLOG_1BYTE,..., MLOG_8BYTES or MLOG_WRITE_STRING for operations
on other than index pages (such as the page allocation bitmap pages).

We will stop writing the record MLOG_ZIP_PAGE_COMPRESS later, after
replacing all MLOG_REC_ and MLOG_COMP_REC_ that update index pages.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
db5cdc3195 MDEV-12353: Replace MLOG_PAGE_REORGANIZE, MLOG_COMP_PAGE_REORGANIZE
Log page reorganize as a series of insert operations.
This will make the redo log volume proportional to the page payload size.

btr_page_reorganize_low(): Add template <bool recovery=false>

btr_page_reorganize_block(): Remove the parameter 'bool recovery'
2020-02-13 18:19:14 +02:00
Marko Mäkelä
276f996af9 MDEV-12353: Replace MLOG_*_END_COPY_CREATED
Instead of writing the high-level redo log records
MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED
write log for each individual insert of a record.

page_copy_rec_list_end_to_created_page(): Remove.

This will improve the fill factor of some pages.
Adjust some tests accordingly.

PageBulk::init(), PageBulk::finish(): Avoid setting bogus limits
to PAGE_HEAP_TOP and PAGE_N_DIR_SLOTS. Avoid accessor functions
that would enforce these limits before the correct ones are set
at the end of PageBulk::finish().
2020-02-13 18:19:14 +02:00
Marko Mäkelä
acd265b69b MDEV-12353: Exclusively use page_zip_reorganize() for ROW_FORMAT=COMPRESSED
page_zip_reorganize(): Restore the page on failure.
In callers, omit now-redundant calls to page_zip_decompress().

btr_page_reorganize_low(): Define in static scope only, and
remove the z_level parameter. Assert that ROW_FORMAT is not COMPRESSED.

btr_page_reorganize_block(), btr_page_reorganize(): Invoke
page_zip_reorganize() for ROW_FORMAT=COMPRESSED.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
f802c989ec MDEV-12353: Replace MLOG_UNDO_INSERT
trx_undof_page_add_undo_rec_log(): Remove.

trx_undo_page_set_next_prev_and_add(), trx_undo_page_report_modify(),
trx_undo_page_report_rename(): Write lower-level redo log records.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
e0bc29df18 MDEV-12353: Replace MLOG_UNDO_HDR_CREATE
trx_undo_header_create(): Emit lower-level records instead of
writing MLOG_UNDO_HDR_CREATE records.
2020-02-13 18:19:13 +02:00
Marko Mäkelä
737b701786 MDEV-12353: Remove trx_undo_erase_page_end()
MariaDB stopped writing the record MLOG_UNDO_ERASE_END
in commit 0fd3def284 (10.3.3).
Merge trx_undo_erase_page_end() with its callers.
2020-02-13 18:19:13 +02:00
Marko Mäkelä
07d39cde92 MDEV-12353: Replace MLOG_UNDO_INIT
trx_undo_page_init(): Write lower-level redo log records by
invoking mtr_t::write().
2020-02-13 18:19:13 +02:00
Marko Mäkelä
5bea43f5e0 MDEV-12353: Deprecate and ignore innodb_log_compressed_pages
page_zip_compress_write_log_no_data(): Remove.
We no longer write the MLOG_ZIP_PAGE_COMPRESS_NO_DATA record.
Instead, we will write MLOG_ZIP_PAGE_COMPRESS records.
2020-02-13 18:19:13 +02:00
Marko Mäkelä
600eae9179 MDEV-12353: Remove MTR_LOG_SHORT_INSERTS
No longer emit the redo log records
MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED.
2020-02-13 18:19:13 +02:00
Marko Mäkelä
1a6f708ec5 MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances
Our benchmarking efforts indicate that the reasons for splitting the
buf_pool in commit c18084f71b
have mostly gone away, possibly as a result of
mysql/mysql-server@ce6109ebfd
or similar work.

Only in one write-heavy benchmark where the working set size is
ten times the buffer pool size, the buf_pool->mutex would be
less contended with 4 buffer pool instances than with 1 instance,
in buf_page_io_complete(). That contention could be alleviated
further by making more use of std::atomic and by splitting
buf_pool_t::mutex further (MDEV-15053).

We will deprecate and ignore the following parameters:

	innodb_buffer_pool_instances
	innodb_page_cleaners

There will be only one buffer pool and one page cleaner task.

In a number of INFORMATION_SCHEMA views, columns that indicated
the buffer pool instance will be removed:

	information_schema.innodb_buffer_page.pool_id
	information_schema.innodb_buffer_page_lru.pool_id
	information_schema.innodb_buffer_pool_stats.pool_id
	information_schema.innodb_cmpmem.buffer_pool_instance
	information_schema.innodb_cmpmem_reset.buffer_pool_instance
2020-02-12 14:45:21 +02:00
Marko Mäkelä
2a6fa1c42b MDEV-21132: Use memcpy_aligned, memset_aligned 2020-02-12 11:32:09 +02:00
Oleksandr Byelkin
4b087e1754 Merge branch '10.4' into 10.5 2020-02-12 08:55:17 +01:00
Marko Mäkelä
fc2f2fa853 MDEV-19747: Deprecate and ignore innodb_log_optimize_ddl
During native table rebuild or index creation, InnoDB used to skip
redo logging and write MLOG_INDEX_LOAD records to inform crash recovery
and Mariabackup of the gaps in redo log. This is fragile and prohibits
some optimizations, such as skipping the doublewrite buffer for
newly (re)initialized pages (MDEV-19738).

row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD
records any more. Instead, we write full redo log.

FlushObserver: Remove.

fseg_free_page_func(): Remove the parameter log. Redo logging
cannot be disabled.

fil_space_t::redo_skipped_count: Remove.

We cannot remove buf_block_t::skip_flush_check, because PageBulk
will temporarily generate invalid B-tree pages in the buffer pool.
2020-02-11 18:44:26 +02:00
Marko Mäkelä
8ccb3caafb MDEV-17491 micro optimize page_id_t further
Let us define page_id_t as a thin wrapper of uint64_t so that
the comparison operators can be simplified. This is a follow-up
to the original commit 14be814380.

The comparison operator for recv_sys.pages.emplace() turned out to be
a busy spot in a recovery benchmark. That data structure was introduced
in MDEV-19586 in commit 177a571e01.
2020-02-11 18:03:19 +02:00
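A minimal C++ sketch of the page_id_t idea in 8ccb3caafb, assuming that the tablespace id occupies the upper 32 bits and the page number the lower 32 bits (the commit message does not state the layout). The class name page_id_sketch is hypothetical. With this packing, ordering by (space, page_no) reduces to one 64-bit comparison, which is what makes a comparator such as the one used by recv_sys.pages cheap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical page identifier packed into a single 64-bit integer.
// Assumed layout: tablespace id in bits 63..32, page number in bits 31..0.
class page_id_sketch
{
public:
  constexpr page_id_sketch(uint32_t space, uint32_t page_no)
    : m_id(uint64_t{space} << 32 | page_no) {}

  constexpr uint32_t space() const { return static_cast<uint32_t>(m_id >> 32); }
  constexpr uint32_t page_no() const { return static_cast<uint32_t>(m_id); }

  // Lexicographic order on (space, page_no) is plain integer order on m_id.
  constexpr bool operator<(const page_id_sketch &other) const
  { return m_id < other.m_id; }
  constexpr bool operator==(const page_id_sketch &other) const
  { return m_id == other.m_id; }
  constexpr bool operator!=(const page_id_sketch &other) const
  { return m_id != other.m_id; }

private:
  uint64_t m_id;
};
```

Because the space id sits in the more significant half, every page of tablespace 1 sorts before every page of tablespace 2 without any branching in the comparator.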
Oleksandr Byelkin
646d1ec83a Merge branch '10.3' into 10.4 2020-02-11 14:40:35 +01:00
Oleksandr Byelkin
58b70dc136 Merge branch '10.2' into 10.3 2020-02-10 20:34:16 +01:00
Marko Mäkelä
06b0623adb Cleanup: Aligned InnoDB index page header access
ut_align_down(): Preserve the const qualifier. Use C++ casts.

ha_delete_hash_node(): Correct an assertion expression.

fil_page_get_type(): Perform an assumed-aligned read.

page_align(): Preserve the const qualifier. Assume (some) alignment.

page_get_max_trx_id(): Check the index page type.

page_header_get_field(): Perform an assumed-aligned read.

page_get_autoinc(): Perform an assumed-aligned read.

page_dir_get_nth_slot(): Perform an assumed-aligned read.
Preserve the const qualifier.
2020-02-08 14:12:59 +02:00
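The ut_align_down() change in 06b0623adb ("Preserve the const qualifier. Use C++ casts.") can be illustrated with a function template. This is a sketch under assumptions: the name align_down_sketch and its signature are hypothetical, not the InnoDB declaration. Deducing T from the argument means a const pointer comes back as a const pointer, so no cast ever drops the qualifier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round a pointer down to a power-of-two boundary. T may be const-qualified;
// the result has the same pointee type as the argument.
template <typename T>
inline T *align_down_sketch(T *ptr, std::size_t alignment)
{
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
  return reinterpret_cast<T *>(p & ~std::uintptr_t(alignment - 1));
}
```

Passing a `const unsigned char*` yields a `const unsigned char*`, whereas a macro or a `void*`-based helper would typically require the caller to cast the constness back in.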
Marko Mäkelä
c5856b0a68 MDEV-21351: Allocate aligned memory
recv_sys_t::ALIGNMENT: The recv_sys_t::alloc() alignment
2020-02-08 11:47:42 +02:00
Marko Mäkelä
0d1ca19383 One more fixup for sizeof(mtr_t) reduction
Add explicit casts when assigning ulint to m_user_space_id.
2020-02-07 13:44:13 +02:00
Marko Mäkelä
91e7b44399 mtr_t::get_log_mode(): Remove a redundant assertion
mtr_log_t and mtr_t::m_log_mode have the same range 0 to 3.
2020-02-07 13:29:08 +02:00
Marko Mäkelä
2b260f2ddd Fixup the parent commit
mtr_t::get_log_mode(): Use equivalent static_assert().

mtr_t::m_n_log_recs: Do not exceed the number of bits in uint16_t.
2020-02-07 13:15:33 +02:00
Thirunarayanan Balathandayuthapani
3be751d5b9 MDEV-21608 Assertion `n_ext == dtuple_get_n_ext(dtuple)' failed during updation of PK
- n_ext may be less than dtuple_get_n_ext(dtuple) when the PK is being
updated and the new record inherits the externally stored fields from
the delete-marked old record.
2020-02-07 16:01:31 +05:30
Marko Mäkelä
9a999469f7 Cleanup: Reduce sizeof(mtr_t)
Use bit-fields for some mtr_t members to improve locality of reference.
Because mtr_t is never shared between threads, there are no considerations
regarding concurrent access.
2020-02-07 12:07:12 +02:00
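The mtr_t cleanups in 2b260f2ddd and 9a999469f7 pack members into bit-fields and bound the log mode to the range 0 to 3. The following is a minimal sketch, not the actual mtr_t: the struct name mtr_state_sketch, the enumerator names, and the flag members other than m_log_mode and m_n_log_recs are assumptions. A 2-bit field suffices because static_assert() proves every mode fits.

```cpp
#include <cassert>
#include <cstdint>

// Assumed enumerator names; the commit only states that the range is 0 to 3.
enum mtr_log_t : uint8_t
{
  MTR_LOG_ALL = 0,
  MTR_LOG_NONE,
  MTR_LOG_NO_REDO,
  MTR_LOG_SHORT_INSERTS
};

// Compile-time proof that a 2-bit field can hold every mtr_log_t value.
static_assert(MTR_LOG_SHORT_INSERTS < 4, "mtr_log_t must fit in 2 bits");

// Hypothetical subset of mini-transaction state. The flags share one byte,
// keeping frequently accessed members close together.
struct mtr_state_sketch
{
  mtr_state_sketch()
    : m_n_log_recs(0), m_log_mode(MTR_LOG_ALL),
      m_modifications(0), m_made_dirty(0), m_inside_ibuf(0) {}

  mtr_log_t get_log_mode() const { return static_cast<mtr_log_t>(m_log_mode); }

  // Set a new mode and return the previous one.
  mtr_log_t set_log_mode(mtr_log_t mode)
  {
    const mtr_log_t old = get_log_mode();
    m_log_mode = static_cast<uint8_t>(mode);
    return old;
  }

  // The record counter is a plain uint16_t and must not wrap around.
  void count_log_rec()
  {
    assert(m_n_log_recs < UINT16_MAX);
    ++m_n_log_recs;
  }

  uint16_t m_n_log_recs;
  uint8_t m_log_mode : 2;
  uint8_t m_modifications : 1;
  uint8_t m_made_dirty : 1;
  uint8_t m_inside_ibuf : 1;
};
```

Because an mtr_t is never shared between threads, adjacent bit-fields in one byte raise no data-race concern, which is the point made in the commit message.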
Marko Mäkelä
8b6cfda631 Merge 10.4 into 10.5 2020-02-07 08:51:20 +02:00
Marko Mäkelä
8b97eba31b MDEV-21674 purge_sys.stop() fails to wait for purge workers to complete
Since commit 5e62b6a5e0 (MDEV-16264),
purge_sys_t::stop() no longer waited for all purge activity to stop.

This caused problems on FLUSH TABLES...FOR EXPORT because of
purge running concurrently with the buffer pool flush.
The assertion at the end of buf_flush_dirty_pages() could fail.

The fix, implemented by Vladislav Vaintroub, aims to eliminate race
conditions when stopping or resuming purge:

waitable_task::disable(): Wait for the task to complete, then replace
the task callback function with noop.

waitable_task::enable(): Restore the original task callback function
after disable().

purge_sys_t::stop(): Invoke purge_coordinator_task.disable().

purge_sys_t::resume(): Invoke purge_coordinator_task.enable().

purge_sys_t::running(): Add const qualifier, and clarify the comment.
The purge coordinator task will remain active as long as any purge
worker task is active.

purge_worker_callback(): Assert purge_sys.running().

srv_purge_wakeup(): Merge with the only caller purge_sys_t::resume().

purge_coordinator_task: Use static linkage.
2020-02-07 08:12:58 +02:00
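The waitable_task::disable() and enable() semantics described in 8b97eba31b can be sketched with a mutex, a condition variable, and a running counter. This is a simplified stand-in, not the tpool implementation: the class name waitable_task_sketch and its execute() method are hypothetical. disable() first waits for any in-flight invocation to finish and then installs a no-op, in the order stated in the commit message.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical task wrapper that can be disabled and re-enabled.
class waitable_task_sketch
{
public:
  explicit waitable_task_sketch(std::function<void()> callback)
    : m_original(std::move(callback)), m_callback(m_original) {}

  // Run the current callback once (normally from a worker thread).
  void execute()
  {
    std::function<void()> cb;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      ++m_running;
      cb = m_callback;
    }
    cb();
    std::lock_guard<std::mutex> lock(m_mutex);
    if (--m_running == 0)
      m_cond.notify_all();
  }

  // Wait for running invocations to complete, then replace the callback
  // with a no-op so that later executions do nothing.
  void disable()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return m_running == 0; });
    m_callback = [] {};
  }

  // Restore the original callback after disable().
  void enable()
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_callback = m_original;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::function<void()> m_original;
  std::function<void()> m_callback;
  unsigned m_running = 0;
};
```

Since disable() holds the mutex from the end of the wait until the no-op is installed, no new invocation can pick up the real callback in between; after it returns, nothing like the purge coordinator can still be running.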
Marko Mäkelä
6d214415c9 MDEV-21351: Free processed recv_sys_t::blocks
Release memory as soon as redo log records are processed.

Because the memory allocation and deallocation of parsed redo log
records must be protected by recv_sys.mutex, it is better to avoid
using a std::atomic field for bookkeeping.

buf_page_t::access_time: Keep track of the recv_sys.pages record
allocations. The most significant 16 bits will count allocated
blocks (which were previously counted by buf_page_t::buf_fix_count
in the debug version), and the least significant 16 bits indicate
the number of allocated bytes in the block (which was previously
managed in buf_block_t::modify_clock), which must be a positive
number, up to innodb_page_size. The byte offset 65536 is represented
as the value 0.

recv_recover_page(): Let the caller erase the log.

recv_validate_tablespace(): Acquire recv_sys_t::mutex.
2020-02-06 09:00:19 +02:00
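The buf_page_t::access_time encoding in 6d214415c9 packs two counters into one 32-bit word. A minimal sketch follows; the helper names pack_alloc_state, alloc_count, and alloc_bytes are hypothetical. Because the byte count is always positive and at most 65536 (the largest innodb_page_size), the otherwise unused low-half value 0 can stand for 65536.

```cpp
#include <cassert>
#include <cstdint>

// bits 31..16: number of allocations in the block
// bits 15..0 : bytes allocated in the block, 1..65536, with 65536 stored as 0
inline uint32_t pack_alloc_state(uint32_t n_allocs, uint32_t n_bytes)
{
  assert(n_allocs <= 0xFFFF);
  assert(n_bytes >= 1 && n_bytes <= 65536);
  return n_allocs << 16 | (n_bytes & 0xFFFF);  // 65536 & 0xFFFF == 0
}

inline uint32_t alloc_count(uint32_t state) { return state >> 16; }

inline uint32_t alloc_bytes(uint32_t state)
{
  const uint32_t low = state & 0xFFFF;
  return low ? low : 65536;  // 0 can only mean a completely used 64 KiB block
}
```

For example, 3 allocations totalling 1200 bytes pack to 0x000304B0, and a full 64 KiB block with 7 allocations packs to 0x00070000.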
Marko Mäkelä
a9d1324867 Cleanup: Remove mem_block_t::magic_n and mem_block_validate()
Use of freed memory is better caught by AddressSanitizer,
especially with ASAN_POISON_MEMORY_REGION that is aliased
by MEM_NOACCESS and UNIV_MEM_FREE.
2020-02-03 12:34:08 +02:00
Eugene Kosov
691c691adc clean up redo log
main change: rename the first redo log file without closing it

second change: use os_offset_t to represent offsets within a file

third change: fix log texts
2020-02-01 23:58:24 +08:00
Marko Mäkelä
1b414c0313 MDEV-21256 after-merge fix: Use std::atomic
Starting with MariaDB Server 10.4, C++11 is being used.
Hence, std::atomic should be preferred to my_atomic.
2020-02-01 15:06:12 +02:00
Marko Mäkelä
4b291588bb MDEV-19845: Make my_cpu.h self-contained
Fix up commit f5c080c735
2020-02-01 14:56:05 +02:00
Eugene Kosov
bd36a4ca12 introduce HASH_REPLACE() for hash_table_t
HASH_REPLACE(): avoids traversing the linked list twice
when HASH_INSERT() immediately follows HASH_DELETE()
2020-01-31 22:14:18 +08:00
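The saving in bd36a4ca12 can be sketched with a chained hash table whose buckets are singly linked lists. The types hash_node and chained_hash_sketch are hypothetical, and the assumption that insertion appends at the end of the chain is made for this sketch. Deleting a node walks the chain once and re-inserting walks it again; replacing finds the link that points at the old node in one walk and splices the new node into the same position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct hash_node
{
  unsigned key;
  hash_node *next;
};

class chained_hash_sketch
{
public:
  explicit chained_hash_sketch(std::size_t n_cells) : m_cells(n_cells, nullptr) {}

  hash_node *&cell(unsigned key) { return m_cells[key % m_cells.size()]; }

  // Append at the end of the chain: one traversal.
  void insert(hash_node *node)
  {
    hash_node **link = &cell(node->key);
    while (*link)
      link = &(*link)->next;
    node->next = nullptr;
    *link = node;
  }

  // Replace old_node by new_node in a single traversal. Both must hash to
  // the same cell; new_node inherits old_node's position and successor.
  void replace(hash_node *old_node, hash_node *new_node)
  {
    assert(&cell(old_node->key) == &cell(new_node->key));
    hash_node **link = &cell(old_node->key);
    while (*link != old_node)
    {
      assert(*link);  // old_node must be present in the chain
      link = &(*link)->next;
    }
    new_node->next = old_node->next;
    *link = new_node;
  }

private:
  std::vector<hash_node *> m_cells;
};
```

A side effect of this design is that the replacement keeps the old node's position in the chain instead of moving to the end.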
Marko Mäkelä
5defdc382b Cleanup: Remove mtr_state_t and mtr_t::m_state
mtr_t::is_active(), mtr_t::is_committed(): Make debug-only.
2020-01-29 14:28:45 +02:00
Marko Mäkelä
50324ce624 MDEV-21351 Replace recv_sys.heap with list of buf_block_t
InnoDB crash recovery used a special type of mem_heap_t that
allocates backing store from the buffer pool. That incurred
a significant overhead, leading to underutilization of memory,
and limiting the maximum contiguous allocated size of a log record.

recv_sys_t::blocks: A linked list of buf_block_t that are allocated
by buf_block_alloc() for redo log records. Replaces recv_sys_t::heap.
We repurpose buf_block_t::unzip_LRU for linking the elements.

recv_sys_t::max_log_blocks: Renamed from recv_n_pool_free_frames.

recv_sys_t::max_blocks(): Accessor for max_log_blocks.

recv_sys_t::alloc(): Allocate memory from the current recv_sys_t::blocks
element, or allocate another block.  In debug builds, various free()
member functions must be invoked, because we repurpose
buf_page_t::buf_fix_count for tracking allocations.

recv_sys_t::free_corrupted_page(): Renamed from recv_recover_corrupt_page()

recv_sys_t::is_memory_exhausted(): Renamed from recv_sys_heap_check()

recv_sys_t::pages and its elements are allocated directly by the
system memory allocator.

recv_parse_log_recs(): Remove the parameter available_memory.

We rename some variables 'store_to_hash' to 'store', because
recv_sys.pages is not actually a hash table.

This is joint work with Thirunarayanan Balathandayuthapani.
2020-01-29 12:53:39 +02:00
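The recv_sys_t::alloc() behaviour in 50324ce624 ("allocate memory from the current recv_sys_t::blocks element, or allocate another block") is a bump allocator over a list of fixed-size blocks. The sketch below rests on assumptions: block_arena_sketch is hypothetical, blocks come from the heap rather than from buf_block_alloc(), the block size equals the default 16 KiB innodb_page_size, the alignment is 8 bytes, and freeing is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

class block_arena_sketch
{
public:
  static constexpr std::size_t BLOCK_SIZE = 16384;  // assumed block size
  static constexpr std::size_t ALIGNMENT = 8;       // assumed alignment

  // Carve len bytes out of the current block, or start a new block when
  // the request does not fit in what is left.
  void *alloc(std::size_t len)
  {
    assert(len > 0 && len <= BLOCK_SIZE);
    len = (len + ALIGNMENT - 1) & ~(ALIGNMENT - 1);  // round up to ALIGNMENT
    if (m_blocks.empty() || m_used + len > BLOCK_SIZE)
    {
      m_blocks.emplace_back(new unsigned char[BLOCK_SIZE]);
      m_used = 0;
    }
    void *p = m_blocks.back().get() + m_used;
    m_used += len;
    return p;
  }

  std::size_t n_blocks() const { return m_blocks.size(); }

private:
  std::vector<std::unique_ptr<unsigned char[]>> m_blocks;
  std::size_t m_used = 0;  // bytes used in the last block
};
```

With this scheme, the largest single allocation is bounded by the block size rather than by a heap's internal chunking, which is the limitation the commit message attributes to the old mem_heap_t.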
Alexander Barkov
f1e13fdc8d MDEV-21581 Helper functions and methods for CHARSET_INFO 2020-01-28 12:29:23 +04:00
Eugene Kosov
b534a6675c cleanup redo log
class log_file_t: a reasonably sane RAII wrapper around the redo log file
descriptor and its path.

This change is motivated by the need to use log_file_t elsewhere.
2020-01-24 23:27:38 +08:00
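An RAII wrapper of the kind described in b534a6675c owns a file descriptor together with its path and closes the descriptor when the object goes out of scope. A minimal POSIX sketch follows; the class log_file_sketch and its member functions are hypothetical and do not reproduce the log_file_t interface.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <utility>

// Hypothetical owner of one file descriptor and its path.
class log_file_sketch
{
public:
  explicit log_file_sketch(std::string path) : m_path(std::move(path)) {}
  ~log_file_sketch() { close(); }  // never leak the descriptor

  // Copying would lead to a double close(), so forbid it.
  log_file_sketch(const log_file_sketch &) = delete;
  log_file_sketch &operator=(const log_file_sketch &) = delete;

  bool open(int flags = O_RDWR | O_CREAT)
  {
    assert(!is_opened());
    m_fd = ::open(m_path.c_str(), flags, 0644);
    return is_opened();
  }

  bool close()
  {
    if (!is_opened())
      return true;
    const int ret = ::close(m_fd);
    m_fd = -1;
    return ret == 0;
  }

  bool is_opened() const { return m_fd >= 0; }
  const std::string &path() const { return m_path; }
  int fd() const { return m_fd; }

private:
  std::string m_path;
  int m_fd = -1;
};
```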
Eugene Kosov
34dafb7e3a redo log misc fixes
os_file_flush_data_func(): fix builds on POSIX OSs where fdatasync()
is not available

log_t::files::flush_data_only(): rename from fdatasync()

log_t::files::fsync(): removed and replaced with flush_data_only().
It will flush everything we need for using redo log files.
2020-01-23 22:46:43 +08:00
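The portability fix in 34dafb7e3a amounts to using fdatasync() where it exists and falling back to fsync() elsewhere. A sketch under assumptions: flush_data_only_sketch is a hypothetical free function, and the HAVE_FDATASYNC_SKETCH macro stands in for whatever build-time check MariaDB actually uses. The loop retries when the call is interrupted by a signal.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Assumption: treat Linux as the platform that provides fdatasync().
#if defined(__linux__)
# define HAVE_FDATASYNC_SKETCH 1
#endif

// Flush file data to stable storage; returns false on a non-EINTR error.
inline bool flush_data_only_sketch(int fd)
{
  for (;;)
  {
#ifdef HAVE_FDATASYNC_SKETCH
    const int ret = fdatasync(fd);  // data plus metadata needed to read it
#else
    const int ret = fsync(fd);      // fallback: also flushes all metadata
#endif
    if (ret == 0)
      return true;
    if (errno != EINTR)
      return false;
  }
}
```

fsync() is a strict superset of what fdatasync() guarantees, so the fallback is always safe, only potentially slower.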
Eugene Kosov
700e010309 fix aligned memcpy()-like functions usage
I found that memcpy_aligned was used incorrectly in the redo log code and
decided to add assertions to the aligned functions. These assertions
uncovered even more incorrect uses.

Given the number of bugs discovered, I kept the assertions to prevent future bugs.

my_assume_aligned(): instead of MY_ASSUME_ALIGNED macro
2020-01-23 00:12:43 +08:00
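The approach in 700e010309, asserting the alignment that an "aligned" helper relies on and expressing the compiler hint as a function rather than a macro, can be sketched as follows. The names my_assume_aligned_sketch and memcpy_aligned_sketch are hypothetical; the GCC/Clang builtin __builtin_assume_aligned is real, and other compilers simply skip the hint in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Check, then promise to the compiler, that ptr is Alignment-aligned.
template <std::size_t Alignment, typename T>
inline T my_assume_aligned_sketch(T ptr)
{
  static_assert(Alignment && !(Alignment & (Alignment - 1)), "power of two");
  assert(reinterpret_cast<std::uintptr_t>(ptr) % Alignment == 0);
#if defined(__GNUC__)
  return static_cast<T>(__builtin_assume_aligned(ptr, Alignment));
#else
  return ptr;
#endif
}

// memcpy() for buffers that the caller guarantees to be aligned.
// A misaligned caller trips the assertion in a debug build instead of
// silently invoking undefined behaviour through the alignment hint.
template <std::size_t Alignment>
inline void *memcpy_aligned_sketch(void *dest, const void *src, std::size_t n)
{
  return std::memcpy(my_assume_aligned_sketch<Alignment>(dest),
                     my_assume_aligned_sketch<Alignment>(src), n);
}
```

A function template also type-checks its argument and keeps the pointer type, which a macro expanding to a cast does not.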
Marko Mäkelä
ded128aa9b Merge 10.4 into 10.5 2020-01-20 16:48:56 +02:00
Marko Mäkelä
87a61355e8 Merge 10.3 into 10.4
The MDEV-17062 fix in commit c4195305b2
was omitted.
2020-01-20 15:49:48 +02:00
Eugene Kosov
e9de6386ad MDEV-18115 remove now unneeded constraint
log_group_max_size: no longer needed, because the redo log does not use fil_io() anymore
2020-01-18 23:42:55 +08:00
Marko Mäkelä
6373ec3ec7 Merge 10.2 into 10.3 2020-01-18 16:56:16 +02:00
Marko Mäkelä
7b70cbd838 MDEV-21499 Merge new release of InnoDB 5.7.29 to 10.2 2020-01-17 16:24:40 +02:00
Marko Mäkelä
457ce97ef2 MDEV-21512 InnoDB may hang due to SPATIAL INDEX
MySQL 5.7.29 includes the following fix:
Bug #30287668 INNODB: A LONG SEMAPHORE WAIT
mysql/mysql-server@5cdbb22b51

There is no test case. It seems that the problem could occur when
a spatial index is large and peculiar enough that multiple R-tree
leaf pages have exactly the same minimum bounding rectangle (MBR).

The commit message suggests that the hang can occur when R-tree
non-leaf pages are being merged, which should only be possible
during transaction rollback or the purge of transaction history,
when the R-tree index is at least 2 levels high and very many records
are being deleted. The message says that a comparison result that two
spatial index node pointer records are equal will cause an infinite loop
in rtr_page_copy_rec_list_end_no_locks(). Hence, we must include the
child page number in the comparison to be consistent with
mysql/mysql-server@2e11fe0e15.

We fix this bug in a simpler way, involving fewer code changes.

cmp_rec_rec(): Renamed from cmp_rec_rec_with_match().
Assert that rec2 always resides in an index page.
Treat non-leaf spatial index pages specially.
2020-01-17 14:27:29 +02:00
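The core of the fix in 457ce97ef2 is that two distinct non-leaf spatial index node pointers must never compare equal, even when their MBRs coincide. A minimal sketch of that tie-break follows. The types and functions are hypothetical, and the lexicographic ordering of MBR coordinates is an arbitrary choice for illustration, not the InnoDB comparison; what matters is that the child page number is compared last.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

struct mbr_sketch { double xmin, xmax, ymin, ymax; };

// Hypothetical R-tree node pointer: an MBR plus the child page it points to.
struct node_ptr_sketch
{
  mbr_sketch mbr;
  uint32_t child_page_no;
};

// MBR-only comparison: distinct node pointers can compare equal.
inline int cmp_mbr_only(const node_ptr_sketch &a, const node_ptr_sketch &b)
{
  const auto ka = std::tie(a.mbr.xmin, a.mbr.xmax, a.mbr.ymin, a.mbr.ymax);
  const auto kb = std::tie(b.mbr.xmin, b.mbr.xmax, b.mbr.ymin, b.mbr.ymax);
  return ka < kb ? -1 : kb < ka ? 1 : 0;
}

// Same comparison with the child page number as the final tie-breaker.
inline int cmp_node_ptr(const node_ptr_sketch &a, const node_ptr_sketch &b)
{
  if (const int cmp = cmp_mbr_only(a, b))
    return cmp;
  return a.child_page_no < b.child_page_no ? -1
       : b.child_page_no < a.child_page_no ? 1 : 0;
}
```

A loop that copies records until the comparison reports "equal" cannot terminate early on a different record when the page number participates, which is the failure mode that the commit message describes in rtr_page_copy_rec_list_end_no_locks().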
Marko Mäkelä
c3695b4058 MDEV-21511: Remove unnecessary code
Now that we will be invoking dtuple_get_n_ext() instead of
letting btr_push_update_extern_fields() update an already
calculated value, it is unnecessary to calculate the n_ext
upfront.

row_rec_to_index_entry(), row_rec_to_index_entry_low():
Remove the output parameter n_ext.
2020-01-17 14:27:29 +02:00
Marko Mäkelä
5838b52743 MDEV-21511 Wrong estimate of affected BLOB columns in update
During update, rollback, or MVCC read, we may miscalculate
the number of off-page columns, and thus the size of the
clustered index record. The function btr_push_update_extern_fields()
is mostly redundant, because the off-page columns would also be
moved by row_upd_index_replace_new_col_val(), which is invoked
via row_upd_index_replace_new_col_vals().

btr_push_update_extern_fields(): Remove.

This is based on
mysql/mysql-server@1fa475b85d
which refines a recovery bug fix
mysql/mysql-server@ce0a1e85e2
in MySQL 5.7.5.

No test case was provided by Oracle.
Some of the changed code is being covered by the existing test
innodb.blob-crash.
2020-01-17 14:27:28 +02:00