Commit graph

482 commits

Author SHA1 Message Date
Marko Mäkelä
572d20757b MDEV-12353: Reduce log volume of page_cur_delete_rec()
mrec_ext_t: Introduce DELETE_ROW_FORMAT_REDUNDANT,
DELETE_ROW_FORMAT_DYNAMIC.

mtr_t::page_delete(): Write DELETE_ROW_FORMAT_REDUNDANT or
DELETE_ROW_FORMAT_DYNAMIC log records. We log the byte offset
of the preceding record, so that on recovery we can easily
find everything to update. For DELETE_ROW_FORMAT_DYNAMIC,
we must also write the header and data size of the record.

We will retain the physical logging for ROW_FORMAT=COMPRESSED pages.

page_zip_dir_balance_slot(): Renamed from page_dir_balance_slot(),
and specialized for ROW_FORMAT=COMPRESSED only.

page_rec_set_n_owned(), page_dir_slot_set_n_owned(),
page_dir_balance_slot(): New variants that do not write any log.

page_mem_free(): Take data_size, extra_size as parameters.
Always zerofill the record payload.

page_cur_delete_rec(): For other than ROW_FORMAT=COMPRESSED,
only write log by mtr_t::page_delete().
2020-02-22 21:19:47 +02:00
Eugene Kosov
6618fc2974 MDEV-21774 Innodb, Windows : restore file sharing logic in Innodb
recv_sys_t opened redo log files along with log_sys_t. That's why I
removed file sharing logic from InnoDB
in 9ef2d29ff4
But it was actually used to ensure that only one MariaDB instance
will touch the same InnoDB files.

os0file.cc: revert some changes done previously

mapped_file_t::map(): now has arguments read_only, nvme

file_io::open(): now has argument read_only

class file_os_io: make final

log_file_t::open(): now has argument read_only
2020-02-20 18:24:21 +03:00
Marko Mäkelä
84e3f9ce84 MDEV-12353: Reduce log volume by an UNDO_APPEND record
We introduce an EXTENDED log record for appending an undo log record
to an undo log page. This is equivalent to the MLOG_UNDO_INSERT record
that was removed in commit f802c989ec,
only using more compact encoding.

mtr_t::log_write(): Fix a bug that affects longer log
record writes in the !same_page && !have_offset case.
Similar code is already implemented for the have_offset code path.
The bug was unobservable before we started to write longer
EXTENDED records. All !have_offset records (FREE_PAGE, INIT_PAGE,
EXTENDED) that were written so far are short, and we never write
RESERVED or OPTION records.

mtr_t::undo_append(): Write an UNDO_APPEND record.

log_phys_t::undo_append(): Apply an UNDO_APPEND record.

trx_undo_page_set_next_prev_and_add(),
trx_undo_page_report_modify(),
trx_undo_page_report_rename():
Invoke mtr_t::undo_append() instead of emitting WRITE records.
2020-02-19 16:42:38 +02:00
Marko Mäkelä
86f262f1c7 MDEV-12353: Reduce log volume by an UNDO_INIT record
We introduce an EXTENDED log record for initializing an undo log page.
The size of the record will be 2 bytes plus the optional page identifier.
The entire undo page will be initialized, except the space that is
already reserved for TRX_UNDO_SEG_HDR in trx_undo_seg_create().

mtr_t::undo_create(): Write the UNDO_INIT record.

trx_undo_page_init(): Initialize the undo page corresponding to the
UNDO_INIT record. Unlike the former MLOG_UNDO_INIT record, we will
initialize almost the entire page, including initializing the
TRX_UNDO_PAGE_NODE to an empty list node, so that the subsequent call
to flst_init() will avoid writing log for the undo page.
2020-02-19 15:52:16 +02:00
Eugene Kosov
e62e285fc4 remove unused function 2020-02-19 12:51:08 +03:00
Eugene Kosov
9ef2d29ff4 MDEV-14425 deprecate and ignore innodb_log_files_in_group
Now there can be only one log file instead of several which
logically work as a single file.

Possible names of redo log files: ib_logfile0,
ib_logfile101 (for just created one)

innodb_log_fiels_in_group: value of this variable is not used
by InnoDB. Possible values are still 1..100, to not break upgrade

LOG_FILE_NAME: add constant of value "ib_logfile0"
LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile"

get_log_file_path(): convenience function that returns full
path of a redo log file

SRV_N_LOG_FILES_MAX: removed

srv_n_log_files: we can't remove this for compatibility reasons,
but now server doesn't use this variable

log_sys_t::file::fd: now just one, not std::vector

log_sys_t::log_capacity: removed word 'group'

find_and_check_log_file(): part of logic from huge srv_start()
moved here

recv_sys_t::files: file descriptors of redo log files.
There can be several of those in case we're upgrading
from older MariaDB version.

recv_sys_t::remove_extra_log_files: whether to remove
ib_logfile{1,2,3...} after successfull upgrade.

recv_sys_t::read(): open if needed and read from one
of several log files

recv_sys_t::files_size(): open if needed and return files count

redo_file_sizes_are_correct(): check that redo log files
sizes are equal. Just to log an error for a user.
Corresponding check was moved from srv0start.cc

namespace deprecated: put all deprecated variables here to
prevent usage of it by us, developers
2020-02-19 12:21:59 +03:00
Marko Mäkelä
9fd309498c MDEV-12353 Cleanup: Rename INIT_INDEX_PAGE to EXTENDED
We plan use the redo log record main type code 0x20 for
InnoDB specific index page operations.

mrec_type_t: Rename INIT_INDEX_PAGE to EXTENDED.

mrec_ext_t: The EXTENDED subtypes.

This is a non-functional change: the redo log record encoding
that was introduced in commit 7ae21b18a6
is not affected.
2020-02-18 12:08:33 +02:00
Marko Mäkelä
37dc087f58 MDEV-12353: Remove bogus comments and clean up code
This is a fixup for commit 7ae21b18a6.

It turns out that even if we in the future made LSN
count mini-transactions instead of bytes, we will need
both start LSN and end LSN, which must exactly match
between mtr_t::commit() and log_phys_t::apply().

log_rec_t::lsn: Restore the const qualifier.

log_phys_t::append(): Remove the lsn parameter. Both the start
and end LSN must remain unchanged. We can only append log from
the same mini-transaction to a single log record snippet.
If we combined the log from mini-transactions A and B, it could
happen that the FIL_PAGE_LSN of the page is somewhere between
A.start_lsn and B.start_lsn. In that case, also the log of B
would be wrongly skipped.

recv_sys_t::add(): Assert that if the start LSN matches, also
the end LSN will match.
2020-02-14 10:57:52 +02:00
Marko Mäkelä
f8a9f90667 MDEV-12353: Remove support for crash-upgrade
We tighten some assertions regarding dict_index_t::is_dummy
and crash recovery, now that redo log processing will
no longer create dummy objects.
2020-02-13 19:13:45 +02:00
Marko Mäkelä
7ae21b18a6 MDEV-12353: Change the redo log encoding
log_t::FORMAT_10_5: physical redo log format tag

log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
If there are multiple records for the same page, also those
may be appended to an existing log_phys_t object if the memory
is available.

In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.

When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.

Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.

Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.

MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE (along with
MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).

mtr_buf_t::empty(): Check if the buffer is empty.

mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.

mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.

page_recv_t::last_offset: Reflects mtr_t::m_last_offset.

Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither are ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.

mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always followed by a subtype byte, because same_page
records must be longer than 1 byte.

trx_undo_page_init(): Combine the writes in WRITE record.

trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 bytes of length and 2 bytes of payload.

flst_write_addr(): Define as a static function. Combine the writes.

flst_zero_both(): Replaces two flst_zero_addr() calls.

flst_init(): Do not inline the function.

fsp_free_seg_inode(): Zerofill the whole inode.

fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
to FIL_NULL when using the physical format.

btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
must have been invoked.

fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.

fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.

recv_recover_page(): Add a fil_space_t* parameter.
After applying log to the a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.

buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.

page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.

trx_undo_header_add_space_for_xid(): Remove.

trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().

btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.

page_rec_find_owner_rec(): Merge with the only callers.

page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).

page_cur_insert_rec_zip(): Combine writes to page header fields.

PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.

PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.

mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.

For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.

mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.

In fil_space_crypt_t::write_page0(), we must log also any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().

Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
2020-02-13 19:12:17 +02:00
Marko Mäkelä
9869005201 Cleanup ibuf_page_exists(): Take simpler parameters 2020-02-13 18:19:15 +02:00
Marko Mäkelä
67c76704a8 MDEV-12353: Remove MLOG_INDEX_LOAD (innodb_log_optimize_ddl)
NOTE: This may break crash-upgrade from a dataset that was
created with innodb_log_optimize_ddl=ON. Also due to
ROW_FORMAT=COMPRESSED pages, it will be easiest to disallow
crash-upgrade.

It would be more robust to disable the MDEV-12699 logic when
crash-upgrading from old redo log format.

log_optimized_ddl_op: Remove.

fil_space_t::enable_lsn, file_name_t::enable_lsn: Remove.

ddl_tracker_t::optimized_ddl: Remove.

TODO: Remove ddl_tracker
2020-02-13 18:19:15 +02:00
Marko Mäkelä
f37a29dd66 MDEV-12353: Write log by mtr_t member functions only
mtr_t::log_write_low(): Replaces mlog_write_initial_log_record_low().

mtr_t::log_file_op(): Replaces fil_op_write_log().

mtr_t::free(): Write MLOG_INIT_FREE_PAGE.

mtr_t::init(): Write MLOG_INIT_FILE_PAGE2.

mtr_t::page_create(): Write record about the partial initialization
of an index page.

mlog_catenate_ulint(), mlog_catenate_string(),
mlog_open(), mlog_close(): Remove.
2020-02-13 18:19:15 +02:00
Marko Mäkelä
2e7a084283 MDEV-21174: Remove mlog_write_initial_log_record_fast()
Pass buf_block_t* to all functions that write redo log.
Specifically, replace the parameters page,page_zip
with buf_block_t* block in page_zip_ functions.
2020-02-13 18:19:15 +02:00
Marko Mäkelä
08ba388713 MDEV-12353: Replace MLOG_REC_INSERT,MLOG_COMP_REC_INSERT
page_mem_alloc_free(), page_dir_set_n_heap(), page_ptr_set_direction():
Merge with the callers.

page_direction_reset(), page_direction_increment(),
page_zip_dir_insert(), page_zip_write_rec_ext(), page_zip_write_rec():
Add the parameter mtr, and write log.

PageBulk::insert(), PageBulk::finish(): Write log for all changes.

page_cur_rec_insert(), page_cur_insert_rec_write_log(),
page_cur_insert_rec_write_log(): Remove.

page_rec_set_next(), page_header_set_field(), page_header_set_ptr():
Remove. Use lower-level operations with or without logging.

page_zip_dir_add_slot(): Move to the same compilation unit with
its only caller, page_cur_insert_rec_zip().

page_cur_insert_rec_zip(): Mark pieces of code that must be skipped
once this task is completed.

btr_defragment_chunk(): Before starting a mini-transaction that
is writing (a lot), invoke log_free_check(). This should allow
the test innodb.innodb_defrag_concurrent to pass with the
mtr default_mysqld.cnf setting of innodb_log_file_size=10M.

MLOG_BUF_MARGIN: Remove.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
2a77b2a510 MDEV-12353: Replace MLOG_*LIST_*_DELETE and MLOG_*REC_DELETE
No longer write the following redo log records:
MLOG_COMP_LIST_END_DELETE, MLOG_LIST_END_DELETE,
MLOG_COMP_LIST_START_DELETE, MLOG_LIST_START_DELETE,
MLOG_REC_DELETE,MLOG_COMP_REC_DELETE.

Each individual deleted record will be logged separately
using physical log records.

page_dir_slot_set_n_owned(),
page_zip_rec_set_owned(), page_zip_dir_delete(), page_zip_clear_rec():
Add the parameter mtr, and write redo log.

page_dir_slot_set_rec(): Remove. Replaced with lower-level operations
that write redo log when necessary.

page_rec_set_n_owned(): Replaces rec_set_n_owned_old(),
rec_set_n_owned_new().

rec_set_heap_no(): Replaces rec_set_heap_no_old(), rec_set_heap_no_new().

page_mem_free(), page_dir_split_slot(), page_dir_balance_slot():
Add the parameter mtr.

page_dir_set_n_slots(): Merge with the caller page_dir_split_slot().

page_dir_slot_set_rec(): Merge with the callers page_dir_split_slot()
and page_dir_balance_slot().

page_cur_insert_rec_low(), page_cur_insert_rec_zip():
Suppress the logging of lower-level operations.

page_cur_delete_rec_write_log(): Remove.

page_cur_delete_rec(): Do not tolerate mtr=NULL.

rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_comp():
Replace rec_set_heap_no_old() and rec_set_heap_no_new() with direct
access that does not involve redo logging.

mtr_t::memcpy(): Do allow non-redo-logged writes to uncompressed pages
of ROW_FORMAT=COMPRESSED pages.

buf_page_io_complete(): Evict the uncompressed page of
a ROW_FORMAT=COMPRESSED page after recovery. Because we no longer
write logical log records for deleting index records, but instead
write physical records that may refer directly to the compressed
page frame of a ROW_FORMAT=COMPRESSED page, and because on recovery
we will only apply the changes to the ROW_FORMAT=COMPRESSED page,
the uncompressed page frame can be stale until page_zip_decompress()
is executed.

recv_parse_or_apply_log_rec_body(): After applying MLOG_ZIP_WRITE_STRING,
ensure that the FIL_PAGE_TYPE of the uncompressed page matches the
compressed page, because buf_flush_init_for_writing() assumes that
field to be valid.

mlog_init_t::mark_ibuf_exist(): Invoke page_zip_decompress(), because
the uncompressed page after buf_page_create() is not necessarily
up to date.

buf_LRU_block_remove_hashed(): Bypass a page_zip_validate() check
during redo log apply.

recv_apply_hashed_log_recs(): Invoke mlog_init.mark_ibuf_exist()
also for the last batch, to ensure that page_zip_decompress() will
be called for freshly initialized pages.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
d00185c40d MDEV-12353: Replace MLOG_PAGE_CREATE_RTREE, MLOG_PAGE_COMP_CREATE_RTREE
page_create(): Create normal B-tree pages. Callers that create
R-tree pages will set FIL_PAGE_TYPE and reset the split
sequence number afterwards.

The creation of ROW_FORMAT=COMPRESSED pages is unaffected;
they will be logged as compressed page images.

page_create_low(): Take const buf_block_t* as a parameter.
Let the callers invoke buf_block_modify_clock_inc().
2020-02-13 18:19:14 +02:00
Marko Mäkelä
b3d02a1fcf MDEV-12353: Replace DELETE_MARK redo log records with MLOG_WRITE_STRING
btr_cur_upd_rec_sys(): Replaces row_upd_rec_sys_fields() and
implements redo logging.

row_upd_rec_sys_fields_in_recovery(): Remove, and merge to the
only remaining caller btr_cur_parse_update_in_place().

btr_cur_del_mark_set_clust_rec_log(),
btr_cur_del_mark_set_sec_rec_log(),
btr_cur_set_deleted_flag_for_ibuf():
Remove, and replace with btr_rec_set_deleted<bool>().

page_zip_rec_set_deleted(): Add the parameter mtr, and write a
MLOG_ZIP_WRITE_STRING record to the log.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
f3230111fc MDEV-12353: Introduce MLOG_ZIP_WRITE_STRING
Log the low-level operations for ROW_FORMAT=COMPRESSED index pages
using a new record, MLOG_ZIP_WRITE_STRING. We will still use
MLOG_1BYTE,..., MLOG_8BYTES or MLOG_WRITE_STRING for operations
on other than index pages (such as the page allocation bitmap pages).

We will stop writing the record MLOG_ZIP_PAGE_COMPRESS later, after
replacing all MLOG_REC_ and MLOG_COMP_REC_ that update index pages.
2020-02-13 18:19:14 +02:00
Marko Mäkelä
737b701786 MDEV-12353: Remove trx_undo_erase_page_end()
MariaDB stopped writing the record MLOG_UNDO_ERASE_END
in commit 0fd3def284 (10.3.3).
Merge trx_undo_erase_page_end() with its callers.
2020-02-13 18:19:13 +02:00
Marko Mäkelä
07d39cde92 MDEV-12353: Replace MLOG_UNDO_INIT
trx_undo_page_init(): Write lower-level redo log records by
invoking mtr_t::write().
2020-02-13 18:19:13 +02:00
Eugene Kosov
c400a73d7a micro optimization: avoid std::string copy 2020-02-13 16:26:47 +03:00
Marko Mäkelä
1a6f708ec5 MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances
Our benchmarking efforts indicate that the reasons for splitting the
buf_pool in commit c18084f71b
have mostly gone away, possibly as a result of
mysql/mysql-server@ce6109ebfd
or similar work.

Only in one write-heavy benchmark where the working set size is
ten times the buffer pool size, the buf_pool->mutex would be
less contended with 4 buffer pool instances than with 1 instance,
in buf_page_io_complete(). That contention could be alleviated
further by making more use of std::atomic and by splitting
buf_pool_t::mutex further (MDEV-15053).

We will deprecate and ignore the following parameters:

	innodb_buffer_pool_instances
	innodb_page_cleaners

There will be only one buffer pool and one page cleaner task.

In a number of INFORMATION_SCHEMA views, columns that indicated
the buffer pool instance will be removed:

	information_schema.innodb_buffer_page.pool_id
	information_schema.innodb_buffer_page_lru.pool_id
	information_schema.innodb_buffer_pool_stats.pool_id
	information_schema.innodb_cmpmem.buffer_pool_instance
	information_schema.innodb_cmpmem_reset.buffer_pool_instance
2020-02-12 14:45:21 +02:00
Marko Mäkelä
fc2f2fa853 MDEV-19747: Deprecate and ignore innodb_log_optimize_ddl
During native table rebuild or index creation, InnoDB used to skip
redo logging and write MLOG_INDEX_LOAD records to inform crash recovery
and Mariabackup of the gaps in redo log. This is fragile and prohibits
some optimizations, such as skipping the doublewrite buffer for
newly (re)initialized pages (MDEV-19738).

row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD
records any more. Instead, we write full redo log.

FlushObserver: Remove.

fseg_free_page_func(): Remove the parameter log. Redo logging
cannot be disabled.

fil_space_t::redo_skipped_count: Remove.

We cannot remove buf_block_t::skip_flush_check, because PageBulk
will temporarily generate invalid B-tree pages in the buffer pool.
2020-02-11 18:44:26 +02:00
Marko Mäkelä
f3dac59174 MDEV-21351: Fix a performance regression
The linear scan of recv_sys_t::blocks() in recv_sys_t::free()
turns out to dominate the execution time in crash recovery.
Let us scan the much shorter buf_pool->chunks lists instead.
2020-02-11 17:18:26 +02:00
Marko Mäkelä
c5856b0a68 MDEV-21351: Allocate aligned memory
recv_sys_t::ALIGNMENT: The recv_sys_t::alloc() alignment
2020-02-08 11:47:42 +02:00
Marko Mäkelä
6d214415c9 MDEV-21351: Free processed recv_sys_t::blocks
Release memory as soon as redo log records are processed.

Because the memory allocation and deallocation of parsed redo log
records must be protected by recv_sys.mutex, it is better to avoid
using a std::atomic field for bookkeeping.

buf_page_t::access_time: Keep track of the recv_sys.pages record
allocations. The most significant 16 bits will count allocated
blocks (which were previously counted by buf_page_t::buf_fix_count
in the debug version), and the least significant 16 bits indicate
the number of allocated bytes in the block (which was previously
managed in buf_block_t::modify_clock), which must be a positive
number, up to innodb_page_size. The byte offset 65536 is represented
as the value 0.

recv_recover_page(): Let the caller erase the log.

recv_validate_tablespace(): Acquire recv_sys_t::mutex.
2020-02-06 09:00:19 +02:00
Marko Mäkelä
50324ce624 MDEV-21351 Replace recv_sys.heap with list of buf_block_t
InnoDB crash recovery used a special type of mem_heap_t that
allocates backing store from the buffer pool. That incurred
a significant overhead, leading to underutilization of memory,
and limiting the maximum contiguous allocated size of a log record.

recv_sys_t::blocks: A linked list of buf_block_t that are allocated
by buf_block_alloc() for redo log records. Replaces recv_sys_t::heap.
We repurpose buf_block_t::unzip_LRU for linking the elements.

recv_sys_t::max_log_blocks: Renamed from recv_n_pool_free_frames.

recv_sys_t::max_blocks(): Accessor for max_log_blocks.

recv_sys_t::alloc(): Allocate memory from the current recv_sys_t::blocks
element, or allocate another block.  In debug builds, various free()
member functions must be invoked, because we repurpose
buf_page_t::buf_fix_count for tracking allocations.

recv_sys_t::free_corrupted_page(): Renamed from recv_recover_corrupt_page()

recv_sys_t::is_memory_exhausted(): Renamed from recv_sys_heap_check()

recv_sys_t::pages and its elements are allocated directly by the
system memory allocator.

recv_parse_log_recs(): Remove the parameter available_memory.

We rename some variables 'store_to_hash' to 'store', because
recv_sys.pages is not actually a hash table.

This is joint work with Thirunarayanan Balathandayuthapani.
2020-01-29 12:53:39 +02:00
Eugene Kosov
d9789718b9 bling windows build fix 2020-01-04 00:23:26 +07:00
Eugene Kosov
562c037b48 MDEV-18115 Remove dummy tablespace for the redo log
Redo log subsystem was decoupled from tablespace subsystem. It now manages file
descriptors for redo log files by itself.

FIL_TYPE_LOG: removed, code in various places was simplified

SRV_LOG_SPACE_FIRST_ID: renamed to SRV_SPACE_ID_UPPER_BOUND
  to better match its purpose. Code in various places was simplified

fil_n_log_flushes: replaced with log_sys::flushes
fil_n_pending_log_flushes: replaced with log_sys::pending_flushes

log_t::files::files: redo log file descriptors
log_t::files::file_names: redo log file names

log_t::files::set_file_names(): set file names without opening them
log_t::files::open_files(): opens redo log files
log_t::files::read(): treats several files as one big
log_t::files::write(): treats several files as one big
log_t::files::fsync(): flushes page cache to disk
log_t::files::close_files(): closes redo log files

fil_open_log_and_system_tablespace_files(): renamed to
  fil_open_system_tablespace_files()
  and obviously it now doesn't open redo log files

global files[1000]: removed. Why it was needed at all?
2020-01-01 22:09:51 +08:00
Marko Mäkelä
b36154a109 Cleanup log_rec_t 2019-12-27 21:20:03 +02:00
Marko Mäkelä
8cc15c036d Merge 10.4 into 10.5 2019-12-27 21:17:16 +02:00
Marko Mäkelä
b86e0f25f8 Cleanup: Use more page_id_t in crash recovery 2019-12-27 20:44:34 +02:00
Marko Mäkelä
4c25e75ce7 Merge 10.3 into 10.4 2019-12-27 18:20:28 +02:00
Marko Mäkelä
5ab70e7f68 Merge 10.2 into 10.3 2019-12-27 15:14:48 +02:00
Thirunarayanan Balathandayuthapani
bba59abb03 MDEV-19176 Reduce the memory usage during recovery
- Moved the recv_sys->heap memory condition inside recv_parse_log_recs().
So that, InnoDB can mark the status as STORE_NO earlier.

- InnoDB uses one third of buffer pool chunk size for reading the redo
log records. In that case, we can avoid the scenario where buffer ran
out of memory issue during recovery.
2019-12-23 15:51:02 +05:30
Marko Mäkelä
44be8652c5 Cleanup: Remove fil_space_get_flags()
Replace fil_space_get_flags(space) == ULINT_UNDEFINED
with the functionally equivalent fil_space_get_size(space) == 0.
2019-12-18 16:27:26 +02:00
Marko Mäkelä
fb4a897fd9 MDEV-12353 preparation: Remove UNIV_LOG_LSN_DEBUG
The debug instrumentation with the MLOG_LSN pseudo-record has not been
used for debugging for years. Let us remove this code now.
It would have to be removed as part of MDEV-12353 or MDEV-14425 anyway,
when implementing a new redo log file format.
2019-12-17 15:39:21 +02:00
Marko Mäkelä
745fd4b39f MDEV-21174: Remove some mlog_write_initial_log_record_fast()
Pass buf_block_t* to more functions that write redo log.

page_zip_write_node_ptr(), page_zip_write_blob_ptr(),
page_zip_compress_write_log_no_data():
Take buf_block_t* as parameter, and do not tolerate mtr=NULL.

page_zip_compress(): Do not tolerate mtr=NULL.

page_zip_dir_insert(): Take page_cur_t* as parameter.

mlog_write_initial_log_record(): Remove. This function was unused.

RecIterator::remove(): Remove the redundant page_zip parameter.

PageConverter::m_page_zip_ptr: Remove.
2019-12-13 18:15:51 +02:00
Marko Mäkelä
6b5cdd4ff7 MDEV-19514: Update stale comments 2019-12-04 15:35:58 +02:00
Marko Mäkelä
95e903261e MDEV-21216 InnoDB does dirty read of TRX_SYS page before recovery
InnoDB startup was discovering undo tablespaces in a dirty way.
It was reading a possibly stale copy of the TRX_SYS page before
processing any redo log records.

srv_start(): Do not call buf_pool_invalidate(). Invoke
trx_rseg_get_n_undo_tablespaces() after the recovery has been initiated.

recv_recovery_from_checkpoint_start(): Assert that the buffer pool is
empty. This used to be guaranteed by the buf_pool_invalidate() call.

trx_rseg_get_n_undo_tablespaces(): Move to the calling compilation unit,
and reimplement in a simpler way.

srv_undo_tablespace_create(): Remove the constant parameter
size=SRV_UNDO_TABLESPACE_SIZE_IN_PAGES.

srv_undo_tablespace_open(): Reimplement in a cleaner way, with
more robust error handling.

srv_all_undo_tablespaces_open(): Split from srv_undo_tablespaces_init().

srv_undo_tablespaces_init(): Read all "undo001","undo002" tablespace
files directly, without consulting the TRX_SYS page via calling
trx_rseg_get_n_undo_tablespaces().

This is joint work with Thirunarayanan Balathandayuthapani.
2019-12-04 15:34:28 +02:00
Marko Mäkelä
56f6dab1d0 MDEV-21174: Replace mlog_write_ulint() with mtr_t::write()
mtr_t::write(): Replaces mlog_write_ulint(), mlog_write_ull().
Optimize away writes if the page contents does not change,
except when a dummy write has been explicitly requested.

Because the member function template takes a block descriptor as a
parameter, it is possible to introduce better consistency checks.
Due to this, the code for handling file-based lists, undo logs
and user transactions was refactored to pass around buf_block_t.
2019-12-03 11:05:18 +02:00
Marko Mäkelä
cd92c6c83d MDEV-12353 preparation: Do not write MLOG_REC_MIN_MARK
btr_set_min_rec_mark(): Write MLOG_1BYTE instead of
MLOG_REC_MIN_MARK or MLOG_COMP_REC_MIN_MARK.

On ROW_FORMAT=COMPRESSED pages, the minimum record flag is not stored
at all. The flag is computed for the uncompressed page by
page_zip_decompress(). Hence, nothing needs to be logged for
ROW_FORMAT=COMPRESSED tables for this operation.

To facilitate crash-upgrade and hot backup from older versions,
we will retain the code to parse and apply the old log record types
MLOG_REC_MIN_MARK and MLOG_COMP_REC_MIN_MARK.
2019-12-03 11:05:18 +02:00
Marko Mäkelä
8ebd91c1a9 MDEV-12353 preparation: Do not write MLOG_FILE_WRITE_CRYPT_DATA
The MLOG_FILE_WRITE_CRYPT_DATA record was completely redundant.
It can be replaced with a single MLOG_WRITE_STRING record.

To facilitate upgrade from older versions, we will retain
fil_parse_write_crypt_data().

fil_crypt_parse(): Recover fil_space_crypt_t::write_page0().

fil_space_crypt_t::write_page0(): Write everything in a single
MLOG_WRITE_STRING for easy parsing.

fil_space_crypt_t::page0_offset: Remove.
2019-12-03 11:05:18 +02:00
Marko Mäkelä
312569e2fd MDEV-21132 Remove buf_page_t::newest_modification
At each mini-transaction commit, the log sequence number of the
mini-transaction must be written to each modified page, so that
it will be available in the FIL_PAGE_LSN field when the page is
being read in crash recovery.

InnoDB was unnecessarily allocating redundant storage for the
field, in buf_page_t::newest_modification. Let us access
FIL_PAGE_LSN directly.

Furthermore, on ALTER TABLE...IMPORT TABLESPACE, let us write
0 to FIL_PAGE_LSN instead of using log_sys.lsn.

buf_flush_init_for_writing(), buf_flush_update_zip_checksum(),
fil_encrypt_buf_for_full_crc32(), fil_encrypt_buf(),
fil_space_encrypt(): Remove the parameter lsn.

buf_page_get_newest_modification(): Merge with the only caller.

buf_tmp_reserve_compression_buf(), buf_tmp_page_encrypt(),
buf_page_encrypt(): Define static in the same compilation unit
with the only caller.

PageConverter::m_current_lsn: Remove. Write 0 to FIL_PAGE_LSN
on ALTER TABLE...IMPORT TABLESPACE.
2019-11-25 09:39:51 +02:00
Marko Mäkelä
1fe6e5a1a5 Merge 10.4 into 10.5 2019-11-12 17:21:37 +02:00
Marko Mäkelä
33cb10d4e9 Merge 10.3 into 10.4 2019-11-12 16:55:44 +02:00
Marko Mäkelä
5098d708a0 Merge 10.2 into 10.3 2019-11-12 16:42:58 +02:00
Marko Mäkelä
dc8380b65d MDEV-14602: Cleanup recv_dblwr_t::find_page()
Avoid creating std::vector, and use single instead of double traversal.
2019-11-12 14:41:24 +02:00
Marko Mäkelä
b5ef7ffa59 Use uint16_t for FIL_PAGE_TYPE
Since commit 5d596064d6
fil_page_type_is_index() expects uint16_t, not ulint.
2019-11-08 13:43:42 +02:00