This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.
TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):
2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
i=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
#0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
#1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
#2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
#3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
#4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
#5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
#6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.
This patch should also address following test failures and
bugs:
MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.
Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().
Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.
btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.
buf_page_get_zip(): Issue error if page read fails.
buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.
buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.
buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
Issue error if page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()
dict_stats_update_transient_for_index(),
dict_stats_update_transient()
Do not continue if table decryption failed or table
is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.
fil_parse_write_crypt_data():
Check that key read from redo log entry is found from
encryption plugin and if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.
Fixed error code on innodb.innodb test.
Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2. Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.
Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.
fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.
Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.
Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.
fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.
recv_apply_hashed_log_recs() does not return anything.
InnoDB defines some functions that are not called at all.
Other functions are called, but only from the same compilation unit.
Remove some function declarations and definitions, and add 'static'
keywords. Some symbols must be kept for separately compiled tools,
such as innochecksum.
Also, remove empty .ic files that were not removed by my MySQL commit.
Problem:
InnoDB used to support a compilation mode that allowed to choose
whether the function definitions in .ic files are to be inlined or not.
This stopped making sense when InnoDB moved to C++ in MySQL 5.6
(and ha_innodb.cc started to #include .ic files), and more so in
MySQL 5.7 when inline methods and functions were introduced
in .h files.
Solution:
Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from
all files, assuming that the symbols are never defined.
Remove the files fut0fut.cc and ut0byte.cc which only mattered when
UNIV_NONINL was defined.
The InnoDB adaptive hash index is sometimes degrading the performance of
InnoDB, and it is sometimes disabled to get more consistent performance.
We should have a compile-time option to disable the adaptive hash index.
Let us introduce two options:
OPTION(WITH_INNODB_AHI "Include innodb_adaptive_hash_index" ON)
OPTION(WITH_INNODB_ROOT_GUESS "Cache index root block descriptors" ON)
where WITH_INNODB_AHI always implies WITH_INNODB_ROOT_GUESS.
As part of this change, the misleadingly named function
trx_search_latch_release_if_reserved(trx) will be replaced with the macro
trx_assert_no_search_latch(trx) that will be empty unless
BTR_CUR_HASH_ADAPT is defined (cmake -DWITH_INNODB_AHI=ON).
We will also remove the unused column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_TIMEOUT.
In MariaDB Server 10.1, it used to reflect the value of
trx_t::search_latch_timeout which could be adjusted during
row_search_for_mysql(). In 10.2, there is no such field.
Other than the removal of the unused column TRX_ADAPTIVE_HASH_TIMEOUT,
this is an almost non-functional change to the server when using the
default build options.
Some tests are adjusted so that they will work with both
-DWITH_INNODB_AHI=ON and -DWITH_INNODB_AHI=OFF. The test
innodb.innodb_monitor has been renamed to innodb.monitor
in order to track MySQL 5.7, and the duplicate tests
sys_vars.innodb_monitor_* are removed.
Oracle introduced a Memcached plugin interface to the InnoDB
storage engine in MySQL 5.6. That interface is essentially a
fork of Memcached development snapshot 1.6.0-beta1 of an old
development branch 'engine-pu'.
To my knowledge, there have not been any updates to the Memcached code
between MySQL 5.6 and 5.7; only bug fixes and extensions related to
the Oracle modifications.
The Memcached plugin is not part of the MariaDB Server. Therefore it
does not make sense to include the InnoDB interfaces for the Memcached
plugin, or to have any related configuration parameters:
innodb_api_bk_commit_interval
innodb_api_disable_rowlock
innodb_api_enable_binlog
innodb_api_enable_mdl
innodb_api_trx_level
Removing this code in one commit makes it possible to easily restore
it, in case it turns out to be needed later.
This should be functionally equivalent to WL#6204 in MySQL 8.0.0, with
the notable difference that the file format changes are limited to
repurposing a previously unused data field in B-tree pages.
For persistent InnoDB tables, write the last used AUTO_INCREMENT
value to the root page of the clustered index, in the previously
unused (0) PAGE_MAX_TRX_ID field, now aliased as PAGE_ROOT_AUTO_INC.
Unlike some other previously unused InnoDB data fields, this one was
actually always zero-initialized, at least since MySQL 3.23.49.
The writes to PAGE_ROOT_AUTO_INC are protected by SX or X latch on the
root page. The SX latch will allow concurrent read access to the root
page. (The field PAGE_ROOT_AUTO_INC will only be read on the
first-time call to ha_innobase::open() from the SQL layer. The
PAGE_ROOT_AUTO_INC can only be updated when executing SQL, so
read/write races are not possible.)
During INSERT, the PAGE_ROOT_AUTO_INC is updated by the low-level
function btr_cur_search_to_nth_level(), adding no extra page
access. [Adaptive hash index lookup will be disabled during INSERT.]
If some rare UPDATE modifies an AUTO_INCREMENT column, the
PAGE_ROOT_AUTO_INC will be adjusted in a separate mini-transaction in
ha_innobase::update_row().
When a page is reorganized, we have to preserve the PAGE_ROOT_AUTO_INC
field.
During ALTER TABLE, the initial AUTO_INCREMENT value will be copied
from the table. ALGORITHM=COPY and online log apply in LOCK=NONE will
update PAGE_ROOT_AUTO_INC in real time.
innodb_col_no(): Determine the dict_table_t::cols[] element index
corresponding to a Field of a non-virtual column.
(The MySQL 5.7 implementation of virtual columns breaks the 1:1
relationship between Field::field_index and dict_table_t::cols[].
Virtual columns are omitted from dict_table_t::cols[]. Therefore,
we must translate the field_index of AUTO_INCREMENT columns into
an index of dict_table_t::cols[].)
Upgrade from old data files:
By default, the AUTO_INCREMENT sequence in old data files would appear
to be reset, because PAGE_MAX_TRX_ID or PAGE_ROOT_AUTO_INC would contain
the value 0 in each clustered index page. In new data files,
PAGE_ROOT_AUTO_INC can only be 0 if the table is empty or does not contain
any AUTO_INCREMENT column.
For backward compatibility, we use the old method of
SELECT MAX(auto_increment_column) for initializing the sequence.
btr_read_autoinc(): Read the AUTO_INCREMENT sequence from a new-format
data file.
btr_read_autoinc_with_fallback(): A variant of btr_read_autoinc()
that will resort to reading MAX(auto_increment_column) for data files
that did not use AUTO_INCREMENT yet. It was manually tested that during
the execution of innodb.autoinc_persist the compatibility logic is
not activated (for new files, PAGE_ROOT_AUTO_INC is never 0 in nonempty
clustered index root pages).
initialize_auto_increment(): Replaces
ha_innobase::innobase_initialize_autoinc(). This initializes
the AUTO_INCREMENT metadata. Only called from ha_innobase::open().
ha_innobase::info_low(): Do not try to lazily initialize
dict_table_t::autoinc. It must already have been initialized by
ha_innobase::open() or ha_innobase::create().
Note: The adjustments to class ha_innopart were not tested, because
the source code (native InnoDB partitioning) is not being compiled.
* remove old 5.2+ InnoDB support for virtual columns
* enable corresponding parts of the innodb-5.7 sources
* copy corresponding test cases from 5.7
* copy detailed Alter_inplace_info::HA_ALTER_FLAGS flags from 5.7
- and more detailed detection of changes in fill_alter_inplace_info()
* more "innodb compatibility hooks" in sql_class.cc to
- create/destroy/reset a THD (used by background purge threads)
- find a prelocked table by name
- open a table (from a background purge thread)
* different from 5.7:
- new service thread "thd_destructor_proxy" to make sure all THDs are
destroyed at the correct point in time during the server shutdown
- proper opening/closing of tables for vcol evaluations in
+ FK checks (use already opened prelocked tables)
+ purge threads (open the table, MDLock it, add it to tdc, close
when not needed)
- cache open tables in vc_templ
- avoid unnecessary allocations, reuse table->record[0] and table->s->default_values
- not needed in 5.7, because it overcalculates:
+ tell the server to calculate vcols for an on-going inline ADD INDEX
+ calculate vcols for correct error messages
* update other engines (mroonga/tokudb) accordingly
WL#7682 in MySQL 5.7 introduced the possibility to create light-weight
temporary tables in InnoDB. These are called 'intrinsic temporary tables'
in InnoDB, and in MySQL 5.7, they can be created by the optimizer for
sorting or buffering data in query processing.
In MariaDB 10.2, the optimizer temporary tables cannot be created in
InnoDB, so we should remove the dead code and related data structures.
Revert the XtraDB changes, because 10.2 does not currently build with XtraDB.
Also omit some changes that need further investigation.
Ensure that all callers of partition_info::get_clone() are passing this!=NULL.
Contains also:
MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
enable tests that were fixed in MDEV-10549
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
Analysis: Problem was that in fil_read_first_page we do find that
table has encryption information and that encryption service
or used key_id is not available. But, then we just printed
fatal error message that causes above assertion.
Fix: When we open single table tablespace if it has encryption
information (crypt_data) store this crypt data to the table
structure. When we open a table and we find out that tablespace
is not available, check has table a encryption information
and from there is encryption service or used key_id is not available.
If it is, add additional warning for SQL-layer.
Merge Facebook commit 154c579b828a60722a7d9477fc61868c07453d08
and e8f0052f9b112dc786bf9b957ed5b16a5749f7fd authored
by Steaphan Greene from https://github.com/facebook/mysql-5.6
Optimize prefix index queries to skip cluster index lookup when possible.
Currently InnoDB will always fetch the clustered index (primary key
index) for all prefix columns in an index, even when the value of a
particular record is smaller than the prefix length. This change
optimizes that case to use the record from the secondary index and avoid
the extra lookup.
Also adds two status vars that track how effective this is:
innodb_secondary_index_triggered_cluster_reads:
Times secondary index lookup triggered cluster lookup.
innodb_secondary_index_triggered_cluster_reads_avoided:
Times prefix optimization avoided triggering cluster lookup.
Merge Facebook commit 25295d003cb0c17aa8fb756523923c77250b3294
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This adds a pointer to the trx to each mtr.
This allows the trx to be accessed in parts of the code
where it was otherwise not available. This is needed later.
Update InnoDB to 5.6.14
Apply MySQL-5.6 hack for MySQL Bug#16434374
Move Aria-only HA_RTREE_INDEX from my_base.h to maria_def.h (breaks an assert in InnoDB)
Fix InnoDB memory leak