Commit graph

5565 commits

Author SHA1 Message Date
Marko Mäkelä
169c00994b MDEV-12699 Improve crash recovery of corrupted data pages
InnoDB crash recovery used to read every data page for which
redo log exists. This is unnecessary for those pages that are
initialized by the redo log. If a newly created page is corrupted,
recovery could unnecessarily fail. It would suffice to reinitialize
the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it
encountered a corrupted page. We will fix also that problem.
InnoDB would normally refuse to start up if it encounters a
corrupted page on recovery, but that can be overridden by
setting innodb_force_recovery=1.

Data pages are completely initialized by the records
MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
which notifies that a page has been freed and its contents
can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has
been re-enabled after being disabled. We can avoid loading
the page if all buffered redo log records predate the
MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
records were written before commit aa3f7a107c.
Hence, we will skip these optimizations for tables whose
name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during
redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to
indicate that according to mlog_init, the page will be
initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
as page initialization. This works around bugs in the crash
recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record
by resetting the state to RECV_NOT_PROCESSED and by updating
the fil_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn
to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore
any log records that precede the page initialization.
Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that
recv_recover_page() will not wrongly skip applying
the page-initialization record due to the field containing
some newer LSN as a leftover from a different page.
Do not invoke ibuf_merge_or_delete_for_page() during
crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
Note if a corrupted page was found during recovery.
After invoking buf_page_create(), do invoke
ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if
a corrupted page was found during recovery. Corrupted pages
will not be flagged if innodb_force_recovery is set.
However, the recv_sys->found_corrupt_fs flag can be set
regardless of innodb_force_recovery if file names are found
to be incorrect (for example, multiple files with the same
tablespace ID).
2019-04-17 13:58:41 +03:00
Marko Mäkelä
376bf4ede5 MDEV-19241 InnoDB fails to write MLOG_INDEX_LOAD upon completing ALTER TABLE
Similar to what was done in commit aa3f7a107c
for FULLTEXT INDEX, we must ensure that MLOG_INDEX_LOAD records will always
be written if redo logging was disabled.

row_merge_build_indexes(): Invoke row_merge_write_redo() also when
online operation is not being executed or an error occurs.
In case of an error, invoke flush_observer->interrupted() so that
the pages will not be flushed but merely evicted from the buffer pool.
Before resuming redo logging, it is crucial for the correctness of
mariabackup and InnoDB crash recovery to flush or evict all affected pages
and to write MLOG_INDEX_LOAD records.
2019-04-17 13:58:22 +03:00
Marko Mäkelä
bc8d173b9f MDEV-14239 Missing space: "innodb_open_files ... greaterthan"
innobase_init(): Add a missing space to a warning message.
Apparently, this message was corrupted in MariaDB 10.2.2 in
commit fec844aca8 related to a
conflict resolution when applying a change from MySQL 5.7.12.
2019-04-15 19:17:24 +03:00
Marko Mäkelä
4ac8fa008d FSP_FLAGS_MEM_MASK: Remove traces of ATOMIC_WRITES 2019-04-10 16:21:57 +03:00
Marko Mäkelä
e7f426d2c9 MDEV-19212: Replace macros with type-safe inline functions
The regression that was reported in MDEV-19212 occurred due to use
of macros that did not ensure that the arguments have compatible
types.

ut_2pow_remainder(), ut_2pow_round(), ut_calc_align(): Define as
inline function templates.

UT_CALC_ALIGN(): Define as a macro, because this is used in
compile_time_assert(). Only starting with C++11 (MariaDB 10.4)
we could define the inline functions as constexpr.
2019-04-08 21:33:49 +03:00
Marko Mäkelä
f120a15b93 MDEV-19212 4GB Limit on large_pages - integer overflow
os_mem_alloc_large(): Invoke the macro ut_2pow_round() with the
correct argument type.

innobase_large_page_size, innobase_use_large_pages,
os_use_large_pages, os_large_page_size: Remove.
Simply refer to opt_large_page_size, my_use_large_pages.
2019-04-08 21:33:49 +03:00
Marko Mäkelä
4b822111ef MDEV-8139: Clean up the freeing of B-tree pages
btr_page_free(): Renamed from btr_page_free_low().
If scrubbing is enabled, zero out the page with proper redo logging.
Only pass ahi=true to fseg_free_page() if the page is actually indexed.

fil_space_t::modify_check(): Renamed from fsp_space_modify_check().

fsp_init_file_page(): Define inline.
2019-04-08 21:33:49 +03:00
Marko Mäkelä
7f5849a809 MDEV-18309: Remove unused code 2019-04-07 12:05:12 +03:00
Marko Mäkelä
867617a976 MDEV-18309: InnoDB reports bogus errors about missing #sql-*.ibd on startup
This is a follow-up to MDEV-18733. As part of that fix, we made
dict_check_sys_tables() skip tables that would be dropped by
row_mysql_drop_garbage_tables().

DICT_ERR_IGNORE_DROP: A new mode where the file should not be attempted
to be opened.

dict_load_tablespace(): Do not try to load the tablespace if
DICT_ERR_IGNORE_DROP has been specified.

row_mysql_drop_garbage_tables(): Pass the DICT_ERR_IGNORE_DROP mode.

fil_space_for_table_exists_in_mem(): Remove a parameter.
The only caller that passed print_error_if_does_not_exist=true
was row_drop_single_table_tablespace().
2019-04-07 10:57:38 +03:00
Marko Mäkelä
1d30b7b1d2 MDEV-12699 preparation: Clean up recv_sys
The recv_sys data structures are accessed not only from the thread
that executes InnoDB plugin initialization, but also from the
InnoDB I/O threads, which can invoke recv_recover_page().

Assert that sufficient concurrency control is in place.
Some code was accessing recv_sys data structures without
holding recv_sys->mutex.

recv_recover_page(bpage): Refactor the call from buf_page_io_complete()
into a separate function that performs necessary steps. The
main thread was unnecessarily releasing and reacquiring recv_sys->mutex.

recv_recover_page(block,mtr,recv_addr): Pass more parameters from
the caller. Avoid redundant lookups and computations. Eliminate some
redundant variables.

recv_get_fil_addr_struct(): Assert that recv_sys->mutex is being held.
That was not always the case!

recv_scan_log_recs(): Acquire recv_sys->mutex for the whole duration
of the function. (While we are scanning and buffering redo log records,
no pages can be read in.)

recv_read_in_area(): Properly protect access with recv_sys->mutex.

recv_apply_hashed_log_recs(): Check recv_addr->state only once,
and continuously hold recv_sys->mutex. The mutex will be released
and reacquired inside recv_recover_page() and recv_read_in_area(),
allowing concurrent processing by buf_page_io_complete() in I/O threads.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
aa3f7a107c MDEV-12699 preparation: Write MLOG_INDEX_LOAD for FTS_ tables
The record MLOG_INDEX_LOAD is supposed to be written to indicate that
some page modifications bypassed redo logging, and that redo logging
is now re-enabled. It was not written for fulltext indexes during
ALTER TABLE.

row_merge_write_redo(): Declare globally. Assert that the index
is neither a spatial nor fulltext index.

recv_mlog_index_load(): Observe a MLOG_INDEX_LOAD operation.

recv_parse_log_recs(): Handle MLOG_INDEX_LOAD also in multi-record
mini-transactions. Because of this omission, we should keep writing
MLOG_INDEX_LOAD in single-record mini-transactions, because older
versions of Mariabackup would fail.

row_fts_merge_insert(): Write MLOG_INDEX_LOAD for the auxiliary
tables of fulltext indexes.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
45d338dca8 MDEV-12699 preparation: Initialize the entire page on MLOG_ZIP_PAGE_COMPRESS
The record MLOG_ZIP_PAGE_COMPRESS is similar to MLOG_INIT_FILE_PAGE2
that it contains all the information needed to initialize the page.
Like for the other record, do initialize the entire page on recovery.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
1b95118c5f buf_page_get_gen(): Allow BUF_GET_IF_IN_POOL with a dummy page_size
The page_size argument to buf_page_get_gen() only matters when the
page is going to be loaded into the buffer pool. Allow callers to
pass a dummy parameter when using BUF_GET_IF_IN_POOL (which would
return NULL if the block is not in the buffer pool).
2019-04-06 21:25:43 +03:00
Marko Mäkelä
80f29211eb Fix a crash in CHECK TABLE for corrupted encrypted root page
btr_root_get(): Ignore the root->page.encrypted flag.
The purpose of this flag is questionable since
commit 8c43f96388.

btr_validate_index(): Avoid crash if btr_root_get() returns NULL.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
1d0380e029 MDEV-15528 preparation: Do not modify a freed page
btr_free_root(): Add the parameter bool invalidate.

btr_free_root_invalidate(): Remove.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
56df18be65 Clean up the parsing of MLOG_INIT_FILE_PAGE2
fsp_apply_init_file_page(): Renamed from fsp_init_file_page_low().

fsp_parse_init_file_page(): Remove. The redo log record has no
parameters.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
71f9552fd8 recv_recovery_is_on(): Add UNIV_UNLIKELY
Normally, InnoDB is not in the process of executing crash recovery.
Provide a hint to the compiler that the recovery-related code paths
are rarely executed.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
f602385776 Do not pass table_name_t to printf-like functions 2019-04-04 08:57:53 +03:00
Marko Mäkelä
c676de1692 Fix the non-debug build 2019-04-03 21:00:13 +03:00
Marko Mäkelä
28636a92ed Merge 10.1 into 10.2 2019-04-03 19:58:47 +03:00
Marko Mäkelä
cad56fbaba MDEV-18733 MariaDB slow start after crash recovery
If InnoDB crash recovery was needed, the InnoDB function srv_start()
would invoke extra validation, reading something from every InnoDB
data file. This should be unnecessary now that MDEV-14717 made
RENAME operations crash-safe inside InnoDB (which can be
disabled in MariaDB 10.2 by setting innodb_safe_truncate=OFF).

dict_check_sys_tables(): Skip tables that would be dropped by
row_mysql_drop_garbage_tables(). Perform extra validation only
if innodb_safe_truncate=OFF, innodb_force_recovery=0 and
crash recovery was needed.

dict_load_table_one(): Validate the root page of the table.
In this way, we can deny access to corrupted or mismatching tables
not only after crash recovery, but also after a clean shutdown.
2019-04-03 19:56:03 +03:00
Marko Mäkelä
7984ea80de Remove a useless CHECK TABLE printout for debug builds 2019-04-03 19:56:03 +03:00
Marko Mäkelä
fa14784301 Fix more -Wnonnull-compare
For some reason, GCC 8 did not issue warnings for all such comparisons.
2019-04-03 19:21:54 +03:00
Marko Mäkelä
a1ec7ac4f4 Clean up table_name_t
row_is_mysql_tmp_table_name(): Replaced with
dict_table_t::is_temporary_name() and table_name_t::is_temporary().

table_name_t: Add constructors.
2019-04-03 16:09:16 +03:00
Marko Mäkelä
03672a0573 MDEV-11487: Remove dict_table_get_n_sys_cols()
In MariaDB, InnoDB tables will always contain DATA_N_SYS_COLS = 3
columns, 2 or 3 of which are present in the clustered index.
We remove the predicate that was added in MySQL 5.7 as part of WL#7682.
2019-04-03 11:02:55 +03:00
Marko Mäkelä
dbc716675b Merge 10.1 into 10.2 2019-04-03 10:32:21 +03:00
Marko Mäkelä
c0fca2863b Fix -Wnonnull-compare
InnoDB and XtraDB had redundant assertions for checking that
function parameters that were declared as nonnull were not NULL.
2019-04-03 09:46:49 +03:00
Marko Mäkelä
e3f44d8d0e MDEV-19085: Remove a bogus debug assertion
MariaDB does support InnoDB tables with no stored columns.
(They are necessarily empty.)
2019-04-02 16:40:27 +03:00
Marko Mäkelä
5633f83ca4 Fix integer type mismatch 2019-04-02 13:46:36 +03:00
Marko Mäkelä
8650848ec3 MDEV-19128 fil_name_parse() for MLOG_FILE_ is not portable
On Microsoft Windows, InnoDB writes the path separator \ to the
redo log file, while on all other platforms, / is being used.

fil_name_parse(): Normalize the parsed path separators to the
native format. This allows backups or data sets to be portable
between Windows and other systems.
2019-04-02 13:43:22 +03:00
Marko Mäkelä
bce380f2a5 Merge 10.1 into 10.2 2019-04-02 09:14:15 +03:00
Marko Mäkelä
b88f378648 Omit the definition of unused function yyset_extra()
This is follow-up for commit 619d22dde5
to fix the cmake -DWITH_EMBEDDED_SERVER build.
2019-04-02 08:50:53 +03:00
Marko Mäkelä
d59ad6972b MDEV-19085: Fix a typo that was caught by GCC 5.4 2019-04-01 18:13:11 +03:00
Marko Mäkelä
f055da9b84 MDEV-19085 Assertion failures due to virtual columns after upgrading from 10.1
MariaDB before MDEV-5800 in version 10.2.2 did not support
indexed virtual columns. Non-persistent virtual columns were
hidden from storage engines. Only starting with MDEV-5800, InnoDB
would create internal metadata on virtual columns.

Similar to what was done in MDEV-18084, MDEV-18090, MDEV-18960, we adjust
one more code path for the old tables.

innobase_build_col_map(): Allocate space for virtual columns in col_map[]
but leave the entries at ULINT_UNDEFINED, noting that the virtual columns
were missing before the table was being rebuilt.
2019-04-01 14:24:15 +03:00
Marko Mäkelä
619d22dde5 Rebuild the InnoDB lexical analyzers with flex 2.6.4
InnoDB includes 3 parsers, which use 3 lexical analyzers that
are generated with flex. Flex versions before 2.6 emitted
the keyword "register", which is deprecated in C++17.

The lexical analyzers were regenerated as follows:

for s in storage/innobase storage/xtradb
do
	(cd "$s"/pars; ./make_flex.sh)
	touch "$s"/fts/*.l
	make -C "$s"/fts -f Makefile.query
done
2019-04-01 13:03:18 +03:00
Marko Mäkelä
23eeecd68f MDEV-19111 Unused field INFORMATION_SCHEMA.INNODB_TABLESPACES_SCRUBBING.ROTATING_OR_FLUSHING
The MDEV-11738/MDEV-11581 fix was supposed to add the column
ROTATING_OR_FLUSHING to the INFORMATION_SCHEMA table
INNODB_TABLESPACES_ENCRYPTION, but it also added that column to
INNODB_TABLESPACES_SCRUBBING in InnoDB (not XtraDB).

The extra column was never initialized. We will remove it,
because key rotation has nothing to do with the scrubbing of
tablespace data.
2019-04-01 10:32:03 +03:00
Eugene Kosov
76934212eb remove unneeded code
rec_get_offsets() was previously called in a btr_cur_optimistic_update()
2019-03-29 23:42:26 +04:00
Eugene Kosov
b66164ab56 remove dead code 2019-03-29 23:40:28 +04:00
Sergei Golubchik
cc71e7501c post-merge: -Werror fixes in 10.2 2019-03-29 10:58:25 +01:00
Marko Mäkelä
d0116e10a5 Revert MDEV-18464 and MDEV-12009
This reverts commit 21b2fada7a
and commit 81d71ee6b2.

The MDEV-18464 change introduces a few data race issues. Contrary to
the documentation, the field trx_t::victim is not always being protected
by lock_sys_t::mutex and trx_t::mutex. Most importantly, it seems
that KILL QUERY could wrongly avoid acquiring both mutexes when
invoking lock_trx_handle_wait_low(), in case another thread had
already set trx->victim=true.

We also revert MDEV-12009, because it should depend on the MDEV-18464
fix being present.
2019-03-28 12:39:50 +02:00
Thirunarayanan Balathandayuthapani
0623cc7c16 MDEV-19051 Avoid unnecessary writing MLOG_INDEX_LOAD
1) Avoid writing of MLOG_INDEX_LOAD redo log record during inplace
alter table when the table is empty and also for spatial index.

2) Avoid creation of temporary merge file for spatial index during
index creation process.
2019-03-28 10:55:18 +02:00
Jan Lindström
21b2fada7a MDEV-18464: Port kill_one_trx fixes from 10.4 to 10.1
Pushed the decision for innodb transaction and system
locking down to lock0lock.cc level. With this,
we can avoid releasing these mutexes for executions
where these mutexes were acquired upfront.

This patch will also fix BF aborting of native threads, e.g.
threads which have declared wsrep_on=OFF. Earlier, we have
used, for innodb trx locks, was_chosen_as_deadlock_victim
flag, for marking inodb transactions, which are victims for
wsrep BF abort. With native threads (wsrep_on==OFF), re-using
was_chosen_as_deadlock_victim flag may lead to inteference
with real deadlock, and to deal with this, the patch has added new
flag for marking wsrep BF aborts only: victim=true

Similar way if replication decides to abort one of the threads
we mark victim by: victim=true

innobase_kill_query
	Remove lock sys and trx mutex handling.

wsrep_innobase_kill_one_trx
	Mark victim trx with victim=true

trx0trx.h
	Remove trx_abort_t type and abort type variable from
	trx struct. Add victim variable to trx.

wsrep_kill_victim
	Remove abort_type

lock_report_waiters_to_mysql
	Take also trx mutex and mark trx as a victim for
	replication abort.

lock_trx_handle_wait_low
	New low level function to check whether the transaction
	has already been rolled back because it was selected as
	a deadlock victim, or if it has to wait then cancel
	the wait lock.

lock_trx_handle_wait
	If transaction is not marked as victim take lock sys
	and trx mutex before calling lock_trx_handle_wait_low
	and release them after that.

row_search_for_mysql
	Remove lock sys and trx mutex taking and releasing.

trx_rollback_to_savepoint_for_mysql_low
trx_commit_in_memory
	Clean up victim variable.
2019-03-28 07:40:03 +02:00
Sergei Golubchik
f97d879bf8 cmake: re-enable -Werror in the maintainer mode
now we can afford it. Fix -Werror errors. Note:
* old gcc is bad at detecting uninit variables, disable it.
* time_t is int or long, cast it for printf's
2019-03-27 22:51:37 +01:00
Marko Mäkelä
1e9c2b2305 Merge 10.1 into 10.2 2019-03-27 12:26:11 +02:00
Marko Mäkelä
a6585d5ce9 Merge 10.0 into 10.1 2019-03-27 11:56:08 +02:00
Marko Mäkelä
828cc2ba7c MDEV-18417/MDEV-18656/MDEV-18417: Work around compiler ASAN bug
In a Ubuntu Xenial build environment, the compiler identified as
g++-5.real (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
seems to be emitting incorrect code for the compilation unit
trx0rec.cc, triggering a bogus-looking AddressSanitizer report
of an invalid read of something in the function trx_undo_rec_get_pars().
This is potentially affecting any larger tests where the InnoDB
purge subsystem is being exercised.

When the optimization level of trx0rec.cc is limited to -O1, no
bogus failure is being reported. With -O2 or -O3, a lot of things
seemed to be inlined in the function, and the disassembly of the
generated code did not make sense to me.
2019-03-27 11:54:34 +02:00
Marko Mäkelä
226ca250ed Merge 10.1 into 10.2 2019-03-26 14:17:19 +02:00
Marko Mäkelä
065ba53ccb MDEV-12711 mariabackup --backup is refused for multi-file system tablespace
Before MDEV-12113 (MariaDB Server 10.1.25), on shutdown InnoDB would write
the current LSN to the first page of each file of the system tablespace.
This is incompatible with MariaDB's InnoDB table encryption, because
encryption repurposed the field for an encryption key ID and checksum.

buf_page_is_corrupted(): For the InnoDB system tablespace, skip
FIL_PAGE_FILE_FLUSH_LSN when checking if a page is all zero,
because the first page of each file in the system tablespace can
contain nonzero bytes in the field.
2019-03-26 13:51:15 +02:00
Marko Mäkelä
07096ada9b Fix the Windows build
btr_cur_compress_recommendation(): Backport a change from 10.3.

This is a follow-up to commit 1bd9815479.
2019-03-25 17:15:34 +02:00
Marko Mäkelä
525e79b057 MDEV-19022: InnoDB fails to cleanup useless B-tree pages
The test case for reproducing MDEV-14126 demonstrates that InnoDB can
end up with an index tree where a non-leaf page has only one child page.

The test case innodb.innodb_bug14676111 demonstrates that such pages
are sometimes unavoidable, because InnoDB does not implement any sort
of B-tree rotation.

But, there is no reason to allow a root page with only one child page.

btr_cur_node_ptr_delete(): Replaces btr_node_ptr_delete().

btr_page_get_father(): Declare globally.

btr_discard_only_page_on_level(): Declare with ATTRIBUTE_COLD.
It turns out that this function is not covered by the
innodb.innodb_bug14676111 test case after all.

btr_discard_page(): If the root page ends up having only one child
page, shrink the tree by invoking btr_lift_page_up().
2019-03-25 16:03:24 +02:00