When using innodb_page_size=16k, InnoDB tables
that were created in MariaDB 10.1.0 to 10.1.20 with
PAGE_COMPRESSED=1 and
PAGE_COMPRESSION_LEVEL=2 or PAGE_COMPRESSION_LEVEL=3
would fail to load.
fsp_flags_is_valid(): When using innodb_page_size=16k, use a
more strict check for .ibd files, with the assumption that
nobody would try to use different-page-size files.
When using innodb_page_size=16k, InnoDB tables
that were created in MariaDB 10.1.0 to 10.1.20 with
PAGE_COMPRESSED=1 and
PAGE_COMPRESSION_LEVEL=2 or PAGE_COMPRESSION_LEVEL=3
would fail to load.
fsp_flags_is_valid(): When using innodb_page_size=16k, use a
more strict check for .ibd files, with the assumption that
nobody would try to use different-page-size files.
The POINT data type is being treated just like any other
geometry data type in InnoDB. The fixed-length data type
DATA_POINT had been introduced in WL#6942 based on a
misunderstanding and without appropriate review.
Because of fundamental design problems (such as a
DEFAULT POINT(0 0) value secretly introduced by InnoDB),
the code was disabled in Oracle Bug#20415831 fix.
This patch removes the dead code and definitions that were
left behind by the Oracle Bug#20415831 patch.
This is preparation for MDEV-12288, which would set DB_TRX_ID=0
when purging history. Also with that change in place, delete-marked
records must always refer to an undo log record via a nonzero
DB_TRX_ID column. (The DB_TRX_ID is only present in clustered index
leaf page records.)
btr_cur_parse_del_mark_set_clust_rec(), rec_get_trx_id():
Statically allocate the offsets
(should never use the heap). Add some debug assertions.
Replace some use of rec_get_trx_id() with row_get_rec_trx_id().
trx_undo_report_row_operation(): Add some sanity checks that are
common for all operations that produce undo log.
The field fts_token->position is not initialized in
row_merge_fts_doc_tokenize(). We cannot have that field
without changing the fulltext parser plugin ABI
(adding st_mysql_ftparser_boolean_info::position,
as it was done in MySQL 5.7 in WL#6943).
The InnoDB fulltext parser plugins "ngram" and "Mecab" that were
introduced in MySQL 5.7 do depend on that field. But the simple_parser
does not. Apparently, simple_parser is leaving the field as 0.
So, in our fix we will assume that the missing position field is 0.
While the primary purpose of innodb_force_recovery is to allow
data to be rescued from an InnoDB instance that would crash due
to some data corruption, the settings 1, 2, or 3 are relatively
safe to use and there is no need to prevent write transactions
in these modes.
The setting innodb_force_recovery=4 and above can cause database
corruption. For those modes, we already set the flag
high_level_read_only to disable modifications, except DROP TABLE.
MODIFICATIONS_NOT_ALLOWED_MSG_FORCE_RECOVERY: Remove. There is no
need to spam the error log for each refused DML operation. It suffices
to return an error to the client. There will be messages at startup
if innodb_read_only or innodb_force_recovery are preventing writes.
Comment from Codership:-
To fix the problem, we changed the certification logic in galera to treat insert
on child table row as exclusive to prevent any operation on referenced
parent table row. At the same time, update and delete on
child table row were demoted to "shared", which makes it possible to
update/delete referenced parent table row, but only in a later transaction.
This change allows somewhat more concurrency for foreign key constrained
transactions, but is still safe for correct certification end result.
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed, and
the field trx->has_search_latch would only be set during a short
time frame in the execution of row_search_mvcc(), which was
formerly called row_search_for_mysql().
This means that the column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_LATCHED will always
report 0. That column cannot be removed in MariaDB 10.2, but it
can be removed in future releases.
trx_t::has_search_latch: Remove.
trx_assert_no_search_latch(): Remove.
row_sel_try_search_shortcut_for_mysql(): Remove a redundant condition
on trx->has_search_latch (it was always true).
sync_check_iterate(): Make the parameter const.
sync_check_functor_t: Make the operator() const, and remove result()
and the virtual destructor. There is no need to have mutable state
in the functors.
sync_checker<bool>: Replaces dict_sync_check and btrsea_sync_check.
sync_check: Replaces btrsea_sync_check.
dict_sync_check: Instantiated from sync_checker.
sync_allowed_latches: Use std::find() directly on the array.
Remove the std::vector.
TrxInInnoDB::enter(), TrxInInnoDB::exit(): Remove obviously redundant
debug assertions on trx->in_depth, and use equality comparison against 0
because it could be more efficient on some architectures.
innodb.table_flags: Adjust the test case. Due to the MDEV-12873 fix
in 10.2, the corrupted flags for table test.td would be converted,
and a tablespace flag mismatch will occur when trying to open the file.
Remove the SHARED_SPACE flag that was erroneously introduced in
MariaDB 10.2.2, and shift the SYS_TABLES.TYPE flags back to where
they were before MariaDB 10.2.2. While doing this, ensure that
tables created with affected MariaDB versions can be loaded,
and also ensure that tables created with MySQL 5.7 using the
TABLESPACE attribute cannot be loaded.
MariaDB 10.2.2 picked the SHARED_SPACE flag from MySQL 5.7,
shifting the MariaDB 10.1 flags PAGE_COMPRESSION, PAGE_COMPRESSION_LEVEL,
ATOMIC_WRITES by one bit. The SHARED_SPACE flag would always
be written as 0 by MariaDB, because MariaDB does not support
CREATE TABLESPACE or CREATE TABLE...TABLESPACE for InnoDB.
So, instead of the bits AALLLLCxxxxxxx we would have
AALLLLC0xxxxxxx if the table was created with MariaDB 10.2.2
to 10.2.6. (AA=ATOMIC_WRITES, LLLL=PAGE_COMPRESSION_LEVEL,
C=PAGE_COMPRESSED, xxxxxxx=7 bits that were not moved.)
PAGE_COMPRESSED=NO implies LLLLC=00000. That is not a problem.
If someone created a table in MariaDB 10.2.2 or 10.2.3 with
the attribute ATOMIC_WRITES=OFF (value 2; AA=10) and without
PAGE_COMPRESSED=YES or PAGE_COMPRESSION_LEVEL, the table should be
rejected. We ignore this problem, because it should be unlikely
for anyone to specify ATOMIC_WRITES=OFF, and because 10.2.2 and
10.2.2 were not mature releases. The value ATOMIC_WRITES=ON (1)
would be interpreted as ATOMIC_WRITES=OFF, but starting with
MariaDB 10.2.4 the ATOMIC_WRITES attribute is ignored.
PAGE_COMPRESSED=YES implies that PAGE_COMPRESSION_LEVEL be between
1 and 9 and that ROW_FORMAT be COMPACT or DYNAMIC. Thus, the affected
wrong bit pattern in SYS_TABLES.TYPE is of the form AALLLL10DB00001
where D signals the presence of a DATA DIRECTORY attribute and B is 1
for ROW_FORMAT=DYNAMIC and 0 for ROW_FORMAT=COMPACT. We must interpret
this bit pattern as AALLLL1DB00001 (discarding the extraneous 0 bit).
dict_sys_tables_rec_read(): Adjust the affected bit pattern when
reading the SYS_TABLES.TYPE column. In case of invalid flags,
report both SYS_TABLES.TYPE (after possible adjustment) and
SYS_TABLES.MIX_LEN.
dict_load_table_one(): Replace an unreachable condition on
!dict_tf2_is_valid() with a debug assertion. The flags will already
have been validated by dict_sys_tables_rec_read(); if that validation
fails, dict_load_table_low() will have failed.
fil_ibd_create(): Shorten an error message about a file pre-existing.
Datafile::validate_to_dd(): Clarify an error message about tablespace
flags mismatch.
ha_innobase::open(): Remove an unnecessary warning message.
dict_tf_is_valid(): Simplify and stricten the logic. Validate the
values of PAGE_COMPRESSION. Remove error log output; let the callers
handle that.
DICT_TF_BITS: Remove ATOMIC_WRITES, PAGE_ENCRYPTION, PAGE_ENCRYPTION_KEY.
The ATOMIC_WRITES is ignored once the SYS_TABLES.TYPE has been validated;
there is no need to store it in dict_table_t::flags. The PAGE_ENCRYPTION
and PAGE_ENCRYPTION_KEY are unused since MariaDB 10.1.4 (the GA release
was 10.1.8).
DICT_TF_BIT_MASK: Remove (unused).
FSP_FLAGS_MEM_ATOMIC_WRITES: Remove (the flags are never read).
row_import_read_v1(): Display an error if dict_tf_is_valid() fails.
dict_table_t::thd: Remove. This was only used by btr_root_block_get()
for reporting decryption failures, and it was only assigned by
ha_innobase::open(), and never cleared. This could mean that if a
connection is closed, the pointer would become stale, and the server
could crash while trying to report the error. It could also mean
that an error is being reported to the wrong client. It is better
to use current_thd in this case, even though it could mean that if
the code is invoked from an InnoDB background operation, there would
be no connection to which to send the error message.
Remove dict_table_t::crypt_data and dict_table_t::page_0_read.
These fields were never read.
fil_open_single_table_tablespace(): Remove the parameter "table".
The doublewrite buffer pages must fit in the first InnoDB system
tablespace data file. The checks that were added in the initial patch
(commit 112b21da37)
were at too high level and did not cover all cases.
innodb.log_data_file_size: Test all innodb_page_size combinations.
fsp_header_init(): Never return an error. Move the change buffer creation
to the only caller that needs to do it.
btr_create(): Clean up the logic. Remove the error log messages.
buf_dblwr_create(): Try to return an error on non-fatal failure.
Check that the first data file is big enough for creating the
doublewrite buffers.
buf_dblwr_process(): Check if the doublewrite buffer is available.
Display the message only if it is available.
recv_recovery_from_checkpoint_start_func(): Remove a redundant message
about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already
been initiated.
fil_report_invalid_page_access(): Simplify the message.
fseg_create_general(): Do not emit messages to the error log.
innobase_init(): Revert the changes.
trx_rseg_create(): Refactor (no functional change).
Problem was that all doublewrite buffer pages must fit to first
system datafile.
Ported commit 27a34df7882b1f8ed283f22bf83e8bfc523cbfde
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date: Wed Aug 12 15:55:19 2015 +0800
BUG#21551464 - SEGFAULT WHILE INITIALIZING DATABASE WHEN
INNODB_DATA_FILE SIZE IS SMALL
To 10.1 (with extended error printout).
btr_create(): If ibuf header page allocation fails report error and
return FIL_NULL. Similarly if root page allocation fails return a error.
dict_build_table_def_step: If fsp_header_init fails return
error code.
fsp_header_init: returns true if header initialization succeeds
and false if not.
fseg_create_general: report error if segment or page allocation fails.
innobase_init: If first datafile is smaller than 3M and could not
contain all doublewrite buffer pages report error and fail to
initialize InnoDB plugin.
row_truncate_table_for_mysql: report error if fsp header init
fails.
srv_init_abort: New function to report database initialization errors.
srv_undo_tablespaces_init, innobase_start_or_create_for_mysql: If
database initialization fails report error and abort.
trx_rseg_create: If segment header creation fails return.
When MySQL 5.7 introduced fulltext parser plugins to InnoDB,
it hard-coded the plugin name "ngram" to mean something special.
Because -fsanitize=undefined was issuing warnings for the
assignment in row_merge_create_index() that the value is out of
range for Boolean, we remove this code that was not intended to
be used in MariaDB 10.2.
fts_check_token(): Remove the special logic for N-gram tokens.
row_merge_write(): Pass the correct (possibly encrypted) buffer
to os_file_write_int_fd().
This bug was introduced in commit 65e1399e64
which included a commit to merge changes from MySQL 5.6.36 to
MariaDB Server 10.0.
Before MDEV-6812, it did not matter that merge_files[].offset was
uninitialized when no files were created.
This problem was introduced in MDEV-6812. There could be a user-visible
impact that the progress reports spit into the error log are bogus.
row_merge_build_indexes(): Initialize merge_files[i].offset.
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
In my merge of the MySQL fix for Oracle Bug#23333990 / WL#9513
I overlooked some subsequent revisions to the test, and I also
failed to notice that the test is actually always failing.
Oracle introduced the parameter innodb_stats_include_delete_marked
but failed to consistently take it into account in FOREIGN KEY
constraints that involve CASCADE or SET NULL.
When innodb_stats_include_delete_marked=ON, obviously the purge of
delete-marked records should update the statistics as well.
One more omission was that statistics were never updated on ROLLBACK.
We are fixing that as well, properly taking into account the
parameter innodb_stats_include_delete_marked.
dict_stats_analyze_index_level(): Simplify an expression.
(Using the ternary operator with a constant operand is unnecessary
obfuscation.)
page_scan_method_t: Revert the change done by Oracle. Instead,
examine srv_stats_include_delete_marked directly where it is needed.
dict_stats_update_if_needed(): Renamed from
row_update_statistics_if_needed().
row_update_for_mysql_using_upd_graph(): Assert that the table statistics
are initialized, as guaranteed by ha_innobase::open(). Update the
statistics in a consistent way, both for FOREIGN KEY triggers and
for the main table. If FOREIGN KEY constraints exist, do not dereference
a freed pointer, but cache the proper value of node->is_delete so that
it matches prebuilt->table.
row_purge_record_func(): Update statistics if
innodb_stats_include_delete_marked=ON.
row_undo_ins(): Update statistics (on ROLLBACK of a fresh INSERT).
This is independent of the parameter; the record is not delete-marked.
row_undo_mod(): Update statistics on the ROLLBACK of updating key columns,
or (if innodb_stats_include_delete_marked=OFF) updating delete-marks.
innodb.innodb_stats_persistent: Renamed and extended from
innodb.innodb_stats_del_mark. Reduced the unnecessarily large dataset
from 262,144 to 32 rows. Test both values of the configuration
parameter innodb_stats_include_delete_marked.
Test that purge is updating the statistics.
innodb_fts.innodb_fts_multiple_index: Adjust the result. The test
is performing a ROLLBACK of an INSERT, which now affects the statistics.
include/wait_all_purged.inc: Moved from innodb.innodb_truncate_debug
to its own file.
The parameter thr of the function btr_cur_optimistic_insert()
is not declared as nonnull, but GCC 7.1.0 with -O3 is wrongly
optimizing away the first part of the condition
UNIV_UNLIKELY(thr && thr_get_trx(thr)->fake_changes)
when the function is being called by row_merge_insert_index_tuples()
with thr==NULL.
The fake_changes is an XtraDB addition. This GCC bug only appears
to have an impact on XtraDB, not InnoDB.
We work around the problem by not attempting to dereference thr
when both BTR_NO_LOCKING_FLAG and BTR_NO_UNDO_LOG_FLAG are set
in the flags. Probably BTR_NO_LOCKING_FLAG alone should suffice.
btr_cur_optimistic_insert(), btr_cur_pessimistic_insert(),
btr_cur_pessimistic_update(): Correct comments that disagree with
usage and with nonnull attributes. No other parameter than thr can
actually be NULL.
row_ins_duplicate_error_in_clust(): Remove an unused parameter.
innobase_is_fake_change(): Unused function; remove.
ibuf_insert_low(), row_log_table_apply(), row_log_apply(),
row_undo_mod_clust_low():
Because we will be passing BTR_NO_LOCKING_FLAG | BTR_NO_UNDO_LOG_FLAG
in the flags, the trx->fake_changes flag will be treated as false,
which is the right thing to do at these low-level operations
(change buffer merge, ALTER TABLE…LOCK=NONE, or ROLLBACK).
This might be fixing actual XtraDB bugs.
Other callers that pass these two flags are also passing thr=NULL,
implying fake_changes=false. (Some callers in ROLLBACK are passing
BTR_NO_LOCKING_FLAG and a nonnull thr. In these callers, fake_changes
better be false, to avoid corruption.)
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.
Bug #23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does whole table traversal for range query
even though the end range is passed. Whole table traversal happens
when the record is not with in transaction read view.
Solution:
=========
Convert the innodb last record of page to mysql format and compare
with end range if the traversal of row_search_mvcc() exceeds 100,
no ICP involved. If it is out of range then InnoDB can avoid the
whole table traversal. Need to refactor the code little bit to
make it compile.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
The macro UT_LIST_INIT() zero-initializes the UT_LIST_NODE.
There is no need to call this macro on a buffer that has
already been zero-initialized by mem_zalloc() or mem_heap_zalloc()
or similar.
For some reason, the statement UT_LIST_INIT(srv_sys->tasks) in
srv_init() caused a SIGSEGV on server startup when compiling with
GCC 7.1.0 for AMD64 using -O3. The zero-initialization was attempted
by the instruction movaps %xmm0,0x50(%rax), while the proper offset
of srv_sys->tasks would seem to have been 0x48.
Do not silence uncertain cases, or fix any bugs.
The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
Do not silence uncertain cases, or fix any bugs.
The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
There is a race condition related to the variable
srv_stats.n_lock_wait_current_count, which is only
incremented and decremented by the function lock_wait_suspend_thread(),
The incrementing is protected by lock_sys->wait_mutex, but the
decrementing does not appear to be protected by anything.
This mismatch could allow the counter to be corrupted when a
transactional InnoDB table or record lock wait is terminating
roughly at the same time with the start of a wait on a
(possibly different) lock.
ib_counter_t: Remove some unused methods. Prevent instantiation for N=1.
Add an inc() method that takes a slot index as a parameter.
single_indexer_t: Remove.
simple_counter<typename Type, bool atomic=false>: A new counter wrapper.
Optionally use atomic memory operations for modifying the counter.
Aligned to the cache line size.
lsn_ctr_1_t, ulint_ctr_1_t, int64_ctr_1_t: Define as simple_counter<Type>.
These counters are either only incremented (and we do not care about
losing some increment operations), or the increment/decrement operations
are protected by some mutex.
srv_stats_t::os_log_pending_writes: Document that the number is protected
by log_sys->mutex.
srv_stats_t::n_lock_wait_current_count: Use simple_counter<ulint, true>,
that is, atomic inc() and dec() operations.
lock_wait_suspend_thread(): Release the mutexes before incrementing
the counters. Avoid acquiring the lock mutex if the lock wait has
already been resolved. Atomically increment and decrement
srv_stats.n_lock_wait_current_count.
row_insert_for_mysql(), row_update_for_mysql(),
row_update_cascade_for_mysql(): Use the inc() method with the trx->id
as the slot index. This is a non-functional change, just using
inc() instead of add(1).
buf_LRU_get_free_block(): Replace the method add(index, n) with inc().
There is no slot index in the simple_counter.
Use uint32_t for the encryption key_id.
When filling unsigned integer values into INFORMATION_SCHEMA tables,
use the method Field::store(longlong, bool unsigned)
instead of using Field::store(double).
Fix also some miscellanous type mismatch related to ulint (size_t).
row_undo_mod_parse_undo_rec(): Relax the too strict assertion and
correct the comment.
innodb.innodb-blob: Force a flush of the redo log right before
killing the server, to ensure that the code path gets exercised.
(The bogus debug assertion failed on the rollback of the statement
UPDATE t3 SET c=REPEAT('j',3000) WHERE a=2 which did not modify
any indexes before the server was killed.)
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.
TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):
2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
i=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
#0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
#1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
#2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
#3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
#4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
#5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
#6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
ANALYSIS
This is regression caused due to worklog 6742 which
implemented ha_innobase::records() which always
uses clustered index to get the row count. Previously
optimizer chose secondary index which was smaller in
size of clustered index to scan for rows and resulted in
a quicker scan.
FIX
After discussion it was decided to remove this feature in 5.7.
[#rb14040 Approved by Kevin and Oystein ]
Split the test case so that a server restart is not needed.
Reduce the test cases and use a simpler mechanism for triggering
and waiting for purge.
fil_table_accessible(): Check if a table can be accessed without
enjoying MDL protection.
PROBLEM
When truncating single tablespace tables, we need to scan the entire
buffer pool to remove the pages of the table from the buffer pool.
During this scan and removal dict_sys->mutex is being held ,causing
stalls in other DDL operations.
FIX
Release the dict_sys->mutex during the scan and reacquire it after the
scan. Make sure that purge thread doesn't purge the records of the table
being truncated and background stats collection thread skips the updation
of stats for the table being truncated.
[#rb 14564 Approved by Jimmy and satya ]
Problem:
=======
Concurrent update dml statement doesn't reflect in virtual index during
inplace table rebuild. It results mismatch value in virutal index and
clustered index. Deleting the table content tries to search the mismatch
value in virtual index but it can't find the value. During log update
apply phase, virtual information is being ignored while constructing
the new entry.
Solution:
=========
In row_log_update_apply phase, build the entry with virtual column
information. So that it can reflect in newly constructed virtual index.
Reviewed-by: Jimmy Yang<jimmy.yang@oracle.com>
RB: 14974
Issue:
======
The issue is that if a fts index is present in a table the space size is
incorrectly calculated in the case of truncate which results in a invalid
read.
Fix:
====
Have a different space size calculation in truncate if fts indexes are
present.
RB:14755
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Problem :
---------
This bug is filed from the base replication bug#25040331 where the
slave thread times out while INSERT operation waits on GAP lock taken
during Foreign Key validation.
The primary reason for the lock wait is because the statements are
getting replayed in different order. However, we also observed
two things ...
1. The slave thread could always use "Read Committed" isolation for
row level replication.
2. It is not necessary to have GAP locks in "READ Committed" isolation
level in innodb.
This bug is filed to address point(2) to avoid taking GAP locks during
Foreign Key validation.
Solution :
----------
Innodb is primarily designed for "Repeatable Read" and the GAP lock
behaviour is default. For "Read Committed" isolation, we have special
handling in row_search_mvcc to avoid taking the GAP lock while
scanning records.
While looking for Foreign Key, the code is following the default
behaviour taking GAP locks. The suggested fix is to avoid GAP
locking during FK validation similar to normal search operation
(row_search_mvcc) for "Read Committed" isolation level.
Reviewed-by: Sunny Bains <sunny.bains@oracle.com>
RB: 14526
Avoid detaching and exiting from threads that may finish before the
caller has returned from pthread_create(). Only exit from such threads,
without detach and join with them later.
Patch submitted by: Laurynas Biveinis <laurynas.biveinis@gmail.com>
RB: 13983
Reviewed by: Sunny Bains <sunny.bains@oracle.com>
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.
This patch should also address following test failures and
bugs:
MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.
Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().
Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.
btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.
buf_page_get_zip(): Issue error if page read fails.
buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.
buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.
buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
Issue error if page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()
dict_stats_update_transient_for_index(),
dict_stats_update_transient()
Do not continue if table decryption failed or table
is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.
fil_parse_write_crypt_data():
Check that key read from redo log entry is found from
encryption plugin and if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.
Fixed error code on innodb.innodb test.
Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2. Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.
Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.
fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.
Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.
Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.
fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.
recv_apply_hashed_log_recs() does not return anything.