Commit graph

20471 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
ada1074bb1 MDEV-14398 innodb_encryption_rotate_key_age=0 causes innodb_encrypt_tables to be ignored
The statement

SET GLOBAL innodb_encryption_rotate_key_age=0;

would have the unwanted side effect that ENCRYPTION=DEFAULT tablespaces
would no longer be encrypted or decrypted according to the setting of
innodb_encrypt_tables.

We implement a trigger, so that whenever one of the following is executed:

SET GLOBAL innodb_encrypt_tables=OFF;
SET GLOBAL innodb_encrypt_tables=ON;
SET GLOBAL innodb_encrypt_tables=FORCE;

all wrong-state ENCRYPTION=DEFAULT tablespaces will be added to
fil_system_t::rotation_list, so that the encryption will be added
or removed.

Note: This will *NOT* happen automatically after a server restart.
Before reading the first page of a data file, InnoDB cannot know
the encryption status of the data file. The statement
SET GLOBAL innodb_encrypt_tables will have the side effect that
all not-yet-read InnoDB data files will be accessed in order to
determine the encryption status.

innodb_encrypt_tables_validate(): Stop disallowing
SET GLOBAL innodb_encrypt_tables when innodb_encryption_rotate_key_age=0.
This reverts part of commit 50eb40a2a8
that addressed MDEV-11738 and MDEV-11581.

fil_system_t::read_page0(): Trigger a call to fil_node_t::read_page0().
Refactored from fil_space_get_space().

fil_crypt_rotation_list_fill(): If innodb_encryption_rotate_key_age=0,
initialize fil_system->rotation_list. This is invoked both on
SET GLOBAL innodb_encrypt_tables and
on SET GLOBAL innodb_encryption_rotate_key_age=0.

fil_space_set_crypt_data(): Remove.

fil_parse_write_crypt_data(): Simplify the logic.

This is joint work with Marko Mäkelä.
2019-05-02 13:31:59 +03:00
Marko Mäkelä
810f014ca7 MDEV-18429: Simpler implementation
This reverts commit 61f370a3c9
and implements a simpler fix that is straightforward to merge to 10.3.

lock_print_info: Renamed from PrintNotStarted. Dump the
entire contents of trx_sys->mysql_trx_list.

lock_print_info_rw_recovered: Like lock_print_info, but dump
only recovered transactions in trx_sys->rw_trx_list.

lock_print_info_all_transactions(): Dump both trx_sys->mysql_trx_list
and trx_sys->rw_trx_list.

TrxLockIterator, TrxListIterator, lock_rec_fetch_page(): Remove.
This is a partial backport of the 10.3
commit a447980ff3
which removed the race-condition-prone ability of the InnoDB monitor
to read relevant pages into the buffer pool for some record locks.
2019-04-29 17:49:55 +03:00
Marko Mäkelä
092602ac9b MDEV-14130 InnoDB messages should not refer to the MySQL 5.7 manual
In InnoDB error messages, replace the hyperlink URLs to point
to the MariaDB knowledge base.
2019-04-29 15:11:06 +03:00
Marko Mäkelä
5fb4c0a8a8 Clean up ut_list
ut_list_validate(), ut_list_map(): Add variants with const Functor&
so that these functions can be called with an rvalue.

Remove wrapper macros, and add #ifdef UNIV_DEBUG around debug-only code.
2019-04-29 15:11:06 +03:00
Eugene Kosov
61f370a3c9 MDEV-18429 Consistent non-locking reads do not appear in TRANSACTIONS section of SHOW ENGINE INNODB STATUS
lock_print_info_all_transactions(): print transactions which are started
but not in RW or RO lists.

print_not_started(): Replaces PrintNotStarted. Collect the skipped
transactions.
2019-04-29 15:11:06 +03:00
Marko Mäkelä
1b577e4d8b MDEV-19356 Assertion 'space->free_limit == 0 || space->free_limit == free_limit'
fil_node_t::read_page0(): Do not replace up-to-date metadata with a
possibly old version of page 0 that is being reread from the file.
A more up-to-date page 0 could still exists in the buffer pool,
waiting to be written back to the file.
2019-04-29 11:43:22 +03:00
Marko Mäkelä
e10b3fa97a MDEV-19231: Correct an assertion
BtrBulk::finish(): On error, the B-tree can be corrupted. Only upon
successful completion, it makes sense to validate the index.
2019-04-29 10:04:54 +03:00
Marko Mäkelä
1a5ba2a4be MDEV-19342 Merge new release of InnoDB 5.7.26 to 10.2 2019-04-26 18:19:50 +03:00
Marko Mäkelä
4e9f8c9cc4 Remove roll_node_t::partial
The field roll_node_t::partial holds if and only if
savept has been set. Make savept a pointer.

trx_rollback_start(): Use the semantic type undo_no_t for roll_limit.
2019-04-26 17:40:20 +03:00
Aditya A
f3a9f12bc3 Bug #29021730 CRASHING INNOBASE_COL_CHECK_FK WITH FOREIGN KEYS
PROBLEM
-------
Function innodb_base_col_setup_for_stored() was skipping to store
the base column information for a generated column if the base column
was a "STORED" generated column. This later causes a crash in function
innoabse_col_check_fk() where it says that a generated columns depends
upon two base columns ,but there is information on only one of them.
There was a explicit check barring the stored columns being stored,
which is wrong because the documentation says that a generated stored
column can be a part of a generated column.

FIX
----
Store the information of base column if it is a stored generated column.

#RB21247
Reviewed by: Debarun Banerjee <debarun.banerjee@oracle.com>
2019-04-26 17:40:20 +03:00
Marko Mäkelä
793bd3ee13 lock_rec_convert_impl_to_expl_for_trx(): Remove unused parameter 2019-04-26 17:40:20 +03:00
Sachin Agarwal
06ec56f579 Bug #27850600 INNODB ASYNC IO ERROR HANDLING IN IO_EVENT
Problem:
io_getevents() - read asynchronous I/O events from the completion
queue. For each IO event, the res field in io_event tells whether IO
event is succeeded or not. To see if the IO actually succeeded we
always need to check event.res (negative=error,
positive=bytesread/written).
LinuxAIOHandler::collect() doesn't check event.res value for each event.
which leads to incorrect value in n_bytes for IO context (or IO Slot).

Fix:
Added a check for event.res negative value.

RB: 20871
Reviewed by : annamalai.gurusami@oracle.com
2019-04-26 17:40:20 +03:00
Marko Mäkelä
1c4d1f3d05 innobase_col_check_fk(): Remove copying 2019-04-26 17:40:19 +03:00
Marko Mäkelä
b2dbc781c7 Implement --debug=d,ib_log_checkpoint_avoid
Normally, the InnoDB master thread executes InnoDB log checkpoints
so frequently that bugs in crash recovery or redo logging can be
hard to reproduce. This is because crash recovery would start replaying
the log only from the latest checkpoint. Because the InnoDB redo log
format only allows saving information for at most 2 latest checkpoints,
and because the log files are written in a circular fashion, it would
be challenging to implement a debug option that would start the redo
log apply from the very start of the redo log file.
2019-04-25 17:26:23 +03:00
Eugene Kosov
6c5c1f0b2f MDEV-19231 make DB_SUCCESS equal to 0
It's a micro optimization. On most platforms CPUs has instructions to
compare with 0 fast. DB_SUCCESS is the most popular outcome of functions
and this patch optimized code like (err == DB_SUCCESS)

BtrBulk::finish(): bogus assertion fixed

fil_node_t::read_page0(): corrected usage of os_file_read()

que_eval_sql(): bugus assertion removed. Apparently it checked that
the field was assigned after having been zero-initialized at
object creation.

It turns out that the return type of os_file_read_func() was changed
in mysql/mysql-server@98909cefbc (MySQL 5.7)
from ibool to dberr_t. The reviewer (if there was any) failed to
point out that because of future merges, it could be a bad idea to
change the return type of a function without changing the function name.

This change was applied to MariaDB 10.2.2 in
commit 2e814d4702 but the
MariaDB-specific code was not fully adjusted accordingly,
e.g. in fil_node_open_file(). Essentially, code like
!os_file_read(...) became dead code in MariaDB and later
in Mariabackup 10.2, and we could be dealing with an uninitialized
buffer after a failed page read.
2019-04-25 16:29:55 +03:00
Marko Mäkelä
bc145193c1 Merge 10.1 into 10.2 2019-04-25 09:04:09 +03:00
Marko Mäkelä
bfb0726fc2 Merge 5.5 into 10.1 2019-04-24 12:03:11 +03:00
Thirunarayanan Balathandayuthapani
d5da8ae04d MDEV-15772 Potential list overrun during XA recovery
InnoDB could return the same list again and again if the buffer
passed to trx_recover_for_mysql() is smaller than the number of
transactions that InnoDB recovered in XA PREPARE state.

We introduce the transaction state TRX_PREPARED_RECOVERED, which
is like TRX_PREPARED, but will be set during trx_recover_for_mysql()
so that each transaction will only be returned once.

Because init_server_components() is invoking ha_recover() twice,
we must reset the state of the transactions back to TRX_PREPARED
after returning the complete list, so that repeated traversals
will see the complete list again, instead of seeing an empty list.
Without this tweak, the test main.tc_heuristic_recover would hang
in MariaDB 10.1.
2019-04-24 11:46:14 +03:00
Marko Mäkelä
e5aa8ea525 MDEV-18139 ALTER IGNORE ... ADD FOREIGN KEY causes bogus error
dict_create_foreign_constraints_low(): Tolerate the keywords
IGNORE and ONLINE between the keywords ALTER and TABLE.

We should really remove the hacky FOREIGN KEY constraint parser
from InnoDB.
2019-04-23 17:56:43 +03:00
Alexander Barkov
6c5e4c9bc0 Fixing -Werror=format-overflow errors (found by gcc-8.3.1) 2019-04-22 15:05:11 +04:00
Marko Mäkelä
d315b4ff39 Remove IBUF_COUNT_DEBUG
The compile-time option IBUF_COUNT_DEBUG has not been used for years.
It would only work with up to 3 created .ibd files, with no buffered
changes existing while InnoDB is started up.
2019-04-19 12:44:46 +03:00
Sergei Petrunia
056b6fe1d5 MDEV-17297: stats.records=0 for a table of Archive engine when it has rows, when we run ANALYZE command
Archive storage engine assumed that any query that attempts to read from
the table will call ha_archive::info() beforehand. ha_archive would flush
un-written data in that call (this would make it visible for the reads).

Break this assumption. Flush the data when the table is opened for reading.

This way, one can do multiple write statements without causing a flush, but
as soon as we might need the data, we flush it.
2019-04-18 23:12:43 +03:00
Marko Mäkelä
169c00994b MDEV-12699 Improve crash recovery of corrupted data pages
InnoDB crash recovery used to read every data page for which
redo log exists. This is unnecessary for those pages that are
initialized by the redo log. If a newly created page is corrupted,
recovery could unnecessarily fail. It would suffice to reinitialize
the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it
encountered a corrupted page. We will fix also that problem.
InnoDB would normally refuse to start up if it encounters a
corrupted page on recovery, but that can be overridden by
setting innodb_force_recovery=1.

Data pages are completely initialized by the records
MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
which notifies that a page has been freed and its contents
can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has
been re-enabled after being disabled. We can avoid loading
the page if all buffered redo log records predate the
MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
records were written before commit aa3f7a107c.
Hence, we will skip these optimizations for tables whose
name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during
redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to
indicate that according to mlog_init, the page will be
initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
as page initialization. This works around bugs in the crash
recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record
by resetting the state to RECV_NOT_PROCESSED and by updating
the fil_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn
to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore
any log records that precede the page initialization.
Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that
recv_recover_page() will not wrongly skip applying
the page-initialization record due to the field containing
some newer LSN as a leftover from a different page.
Do not invoke ibuf_merge_or_delete_for_page() during
crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
Note if a corrupted page was found during recovery.
After invoking buf_page_create(), do invoke
ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if
a corrupted page was found during recovery. Corrupted pages
will not be flagged if innodb_force_recovery is set.
However, the recv_sys->found_corrupt_fs flag can be set
regardless of innodb_force_recovery if file names are found
to be incorrect (for example, multiple files with the same
tablespace ID).
2019-04-17 13:58:41 +03:00
Marko Mäkelä
376bf4ede5 MDEV-19241 InnoDB fails to write MLOG_INDEX_LOAD upon completing ALTER TABLE
Similar to what was done in commit aa3f7a107c
for FULLTEXT INDEX, we must ensure that MLOG_INDEX_LOAD records will always
be written if redo logging was disabled.

row_merge_build_indexes(): Invoke row_merge_write_redo() also when
online operation is not being executed or an error occurs.
In case of an error, invoke flush_observer->interrupted() so that
the pages will not be flushed but merely evicted from the buffer pool.
Before resuming redo logging, it is crucial for the correctness of
mariabackup and InnoDB crash recovery to flush or evict all affected pages
and to write MLOG_INDEX_LOAD records.
2019-04-17 13:58:22 +03:00
Marko Mäkelä
bc8d173b9f MDEV-14239 Missing space: "innodb_open_files ... greaterthan"
innobase_init(): Add a missing space to a warning message.
Apparently, this message was corrupted in MariaDB 10.2.2 in
commit fec844aca8 related to a
conflict resolution when applying a change from MySQL 5.7.12.
2019-04-15 19:17:24 +03:00
Marko Mäkelä
4ac8fa008d FSP_FLAGS_MEM_MASK: Remove traces of ATOMIC_WRITES 2019-04-10 16:21:57 +03:00
Marko Mäkelä
e7f426d2c9 MDEV-19212: Replace macros with type-safe inline functions
The regression that was reported in MDEV-19212 occurred due to use
of macros that did not ensure that the arguments have compatible
types.

ut_2pow_remainder(), ut_2pow_round(), ut_calc_align(): Define as
inline function templates.

UT_CALC_ALIGN(): Define as a macro, because this is used in
compile_time_assert(). Only starting with C++11 (MariaDB 10.4)
we could define the inline functions as constexpr.
2019-04-08 21:33:49 +03:00
Marko Mäkelä
f120a15b93 MDEV-19212 4GB Limit on large_pages - integer overflow
os_mem_alloc_large(): Invoke the macro ut_2pow_round() with the
correct argument type.

innobase_large_page_size, innobase_use_large_pages,
os_use_large_pages, os_large_page_size: Remove.
Simply refer to opt_large_page_size, my_use_large_pages.
2019-04-08 21:33:49 +03:00
Marko Mäkelä
4b822111ef MDEV-8139: Clean up the freeing of B-tree pages
btr_page_free(): Renamed from btr_page_free_low().
If scrubbing is enabled, zero out the page with proper redo logging.
Only pass ahi=true to fseg_free_page() if the page is actually indexed.

fil_space_t::modify_check(): Renamed from fsp_space_modify_check().

fsp_init_file_page(): Define inline.
2019-04-08 21:33:49 +03:00
Sergei Golubchik
7d720ca8de cmake: don't use generated files to detect a submodule
Even if Makefile for some reason was checked in in a submodule,
it is still a generated file, will be cleaned, won't be in a source
package. One cannot jump to conclusions if it doesn't exist.
2019-04-07 13:49:04 +02:00
Marko Mäkelä
7f5849a809 MDEV-18309: Remove unused code 2019-04-07 12:05:12 +03:00
Marko Mäkelä
867617a976 MDEV-18309: InnoDB reports bogus errors about missing #sql-*.ibd on startup
This is a follow-up to MDEV-18733. As part of that fix, we made
dict_check_sys_tables() skip tables that would be dropped by
row_mysql_drop_garbage_tables().

DICT_ERR_IGNORE_DROP: A new mode where the file should not be attempted
to be opened.

dict_load_tablespace(): Do not try to load the tablespace if
DICT_ERR_IGNORE_DROP has been specified.

row_mysql_drop_garbage_tables(): Pass the DICT_ERR_IGNORE_DROP mode.

fil_space_for_table_exists_in_mem(): Remove a parameter.
The only caller that passed print_error_if_does_not_exist=true
was row_drop_single_table_tablespace().
2019-04-07 10:57:38 +03:00
Marko Mäkelä
1d30b7b1d2 MDEV-12699 preparation: Clean up recv_sys
The recv_sys data structures are accessed not only from the thread
that executes InnoDB plugin initialization, but also from the
InnoDB I/O threads, which can invoke recv_recover_page().

Assert that sufficient concurrency control is in place.
Some code was accessing recv_sys data structures without
holding recv_sys->mutex.

recv_recover_page(bpage): Refactor the call from buf_page_io_complete()
into a separate function that performs necessary steps. The
main thread was unnecessarily releasing and reacquiring recv_sys->mutex.

recv_recover_page(block,mtr,recv_addr): Pass more parameters from
the caller. Avoid redundant lookups and computations. Eliminate some
redundant variables.

recv_get_fil_addr_struct(): Assert that recv_sys->mutex is being held.
That was not always the case!

recv_scan_log_recs(): Acquire recv_sys->mutex for the whole duration
of the function. (While we are scanning and buffering redo log records,
no pages can be read in.)

recv_read_in_area(): Properly protect access with recv_sys->mutex.

recv_apply_hashed_log_recs(): Check recv_addr->state only once,
and continuously hold recv_sys->mutex. The mutex will be released
and reacquired inside recv_recover_page() and recv_read_in_area(),
allowing concurrent processing by buf_page_io_complete() in I/O threads.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
aa3f7a107c MDEV-12699 preparation: Write MLOG_INDEX_LOAD for FTS_ tables
The record MLOG_INDEX_LOAD is supposed to be written to indicate that
some page modifications bypassed redo logging, and that redo logging
is now re-enabled. It was not written for fulltext indexes during
ALTER TABLE.

row_merge_write_redo(): Declare globally. Assert that the index
is neither a spatial nor fulltext index.

recv_mlog_index_load(): Observe a MLOG_INDEX_LOAD operation.

recv_parse_log_recs(): Handle MLOG_INDEX_LOAD also in multi-record
mini-transactions. Because of this omission, we should keep writing
MLOG_INDEX_LOAD in single-record mini-transactions, because older
versions of Mariabackup would fail.

row_fts_merge_insert(): Write MLOG_INDEX_LOAD for the auxiliary
tables of fulltext indexes.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
45d338dca8 MDEV-12699 preparation: Initialize the entire page on MLOG_ZIP_PAGE_COMPRESS
The record MLOG_ZIP_PAGE_COMPRESS is similar to MLOG_INIT_FILE_PAGE2
that it contains all the information needed to initialize the page.
Like for the other record, do initialize the entire page on recovery.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
1b95118c5f buf_page_get_gen(): Allow BUF_GET_IF_IN_POOL with a dummy page_size
The page_size argument to buf_page_get_gen() only matters when the
page is going to be loaded into the buffer pool. Allow callers to
pass a dummy parameter when using BUF_GET_IF_IN_POOL (which would
return NULL if the block is not in the buffer pool).
2019-04-06 21:25:43 +03:00
Marko Mäkelä
80f29211eb Fix a crash in CHECK TABLE for corrupted encrypted root page
btr_root_get(): Ignore the root->page.encrypted flag.
The purpose of this flag is questionable since
commit 8c43f96388.

btr_validate_index(): Avoid crash if btr_root_get() returns NULL.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
1d0380e029 MDEV-15528 preparation: Do not modify a freed page
btr_free_root(): Add the parameter bool invalidate.

btr_free_root_invalidate(): Remove.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
56df18be65 Clean up the parsing of MLOG_INIT_FILE_PAGE2
fsp_apply_init_file_page(): Renamed from fsp_init_file_page_low().

fsp_parse_init_file_page(): Remove. The redo log record has no
parameters.
2019-04-06 21:25:43 +03:00
Marko Mäkelä
71f9552fd8 recv_recovery_is_on(): Add UNIV_UNLIKELY
Normally, InnoDB is not in the process of executing crash recovery.
Provide a hint to the compiler that the recovery-related code paths
are rarely executed.
2019-04-06 21:25:43 +03:00
Eugene Kosov
812ac2bb85 MDEV-19172 Reorder fields in PFS_events and PFS_events_waits to speed up memcpy()
before:
(gdb) p sizeof(PFS_events_waits)
$1 = 184

after:
(gdb) p sizeof(PFS_events_waits)
$1 = 160

no functional changes
2019-04-05 20:08:45 +04:00
Marko Mäkelä
f602385776 Do not pass table_name_t to printf-like functions 2019-04-04 08:57:53 +03:00
Marko Mäkelä
c676de1692 Fix the non-debug build 2019-04-03 21:00:13 +03:00
Marko Mäkelä
28636a92ed Merge 10.1 into 10.2 2019-04-03 19:58:47 +03:00
Marko Mäkelä
cad56fbaba MDEV-18733 MariaDB slow start after crash recovery
If InnoDB crash recovery was needed, the InnoDB function srv_start()
would invoke extra validation, reading something from every InnoDB
data file. This should be unnecessary now that MDEV-14717 made
RENAME operations crash-safe inside InnoDB (which can be
disabled in MariaDB 10.2 by setting innodb_safe_truncate=OFF).

dict_check_sys_tables(): Skip tables that would be dropped by
row_mysql_drop_garbage_tables(). Perform extra validation only
if innodb_safe_truncate=OFF, innodb_force_recovery=0 and
crash recovery was needed.

dict_load_table_one(): Validate the root page of the table.
In this way, we can deny access to corrupted or mismatching tables
not only after crash recovery, but also after a clean shutdown.
2019-04-03 19:56:03 +03:00
Marko Mäkelä
7984ea80de Remove a useless CHECK TABLE printout for debug builds 2019-04-03 19:56:03 +03:00
Marko Mäkelä
1f3bcff1f2 Remove unused declarations 2019-04-03 19:46:34 +03:00
Marko Mäkelä
fa14784301 Fix more -Wnonnull-compare
For some reason, GCC 8 did not issue warnings for all such comparisons.
2019-04-03 19:21:54 +03:00
Marko Mäkelä
a1ec7ac4f4 Clean up table_name_t
row_is_mysql_tmp_table_name(): Replaced with
dict_table_t::is_temporary_name() and table_name_t::is_temporary().

table_name_t: Add constructors.
2019-04-03 16:09:16 +03:00
Marko Mäkelä
03672a0573 MDEV-11487: Remove dict_table_get_n_sys_cols()
In MariaDB, InnoDB tables will always contain DATA_N_SYS_COLS = 3
columns, 2 or 3 of which are present in the clustered index.
We remove the predicate that was added in MySQL 5.7 as part of WL#7682.
2019-04-03 11:02:55 +03:00