Commit graph

346 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
9d57468dde Bug #25357789 INNODB: LATCH ORDER VIOLATION DURING TRUNCATE TABLE IF INNODB_SYNC_DEBUG ENABLED
Analysis:
========

(1) During TRUNCATE of a file_per_table tablespace, dict_operation_lock is
released before the dirty pages of the tablespace are evicted from the buffer
pool. After eviction, we try to re-acquire
dict_operation_lock (higher level latch) while we still hold a lower
level latch (index->lock). This causes a latch order violation.

(2) A deadlock is possible if a child table is being truncated and it
holds the index lock. At the same time, a cascading DML operation can take
dict_operation_lock and wait for the index lock.

Fix:
====
1) Release the index locks before releasing the dict operation lock.

2) Ignore the cascading dml operation on the parent table, for the
cascading foreign key, if the child table is truncated or if it is
in the process of being truncated.
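
The ordering rule behind the analysis can be sketched as follows. This is a minimal illustration and not InnoDB's sync debug code: the LatchLevel values and the LatchOrderChecker class are hypothetical, and the only assumption is that a thread must not request a higher-level latch while it holds a lower-level one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical latch levels: a larger value means a higher-level latch.
enum LatchLevel { INDEX_LOCK = 1, DICT_OPERATION_LOCK = 2 };

// Tracks the latches held by one thread and rejects out-of-order requests.
class LatchOrderChecker {
public:
    // Returns false if acquiring `level` would violate the latch order,
    // that is, if a lower-level latch is already held.
    bool acquire(LatchLevel level) {
        for (LatchLevel held : held_) {
            if (held < level) return false;
        }
        held_.push_back(level);
        return true;
    }

    // Releases the most recently acquired latch of the given level.
    void release(LatchLevel level) {
        for (std::size_t i = held_.size(); i-- > 0;) {
            if (held_[i] == level) {
                held_.erase(held_.begin() + static_cast<std::ptrdiff_t>(i));
                return;
            }
        }
    }

private:
    std::vector<LatchLevel> held_;
};

// Order described in the analysis: dict_operation_lock is released while
// index->lock is still held, and is then requested again.
inline bool truncate_old_order() {
    LatchOrderChecker c;
    c.acquire(DICT_OPERATION_LOCK);
    c.acquire(INDEX_LOCK);
    c.release(DICT_OPERATION_LOCK);         // released before page eviction
    return c.acquire(DICT_OPERATION_LOCK);  // rejected: index lock still held
}

// Order described in the fix: the index lock is released first.
inline bool truncate_fixed_order() {
    LatchOrderChecker c;
    c.acquire(DICT_OPERATION_LOCK);
    c.acquire(INDEX_LOCK);
    c.release(INDEX_LOCK);
    c.release(DICT_OPERATION_LOCK);
    return c.acquire(DICT_OPERATION_LOCK);  // accepted: nothing lower is held
}
```

In this sketch truncate_old_order() reports a violation, while truncate_fixed_order() does not.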

Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Kevin Lewis <kevin.lewis@oracle.com>
RB: 16122
2017-08-09 22:28:30 +03:00
Jan Lindström
34eef269eb MDEV-11939: innochecksum mistakes a file for an encrypted one (page 0 invalid)
Always read the full page 0 to determine whether the tablespace contains
encryption metadata. For tablespaces that are page compressed, or
page compressed and encrypted, the checksum is not compared because
it does not exist. For encrypted tables, use the checksum
verification written for encrypted tables; for normal tables,
use the normal method.

buf_page_is_checksum_valid_crc32
buf_page_is_checksum_valid_innodb
buf_page_is_checksum_valid_none
        Modify Innochecksum logging to file to avoid compilation
        warnings.

fil0crypt.cc fil0crypt.h
        Modify to be able to use in innochecksum compilation and
        move fil_space_verify_crypt_checksum to end of the file.
        Add innochecksum logging to file.

univ.i
        Add innochecksum strict_verify, log_file and cur_page_num
        variables as extern.

page_zip_verify_checksum
        Add innochecksum logging to file and remove unnecessary code.

innochecksum.cc
        Many changes; most notably, innochecksum is now able to read
        encryption metadata from page 0 of the tablespace.

Added a test case where we intentionally corrupt:
FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION (encryption key version)
FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION+4 (post encryption checksum)
FIL_DATA+10 (data)
2017-08-08 09:41:09 +03:00
Marko Mäkelä
e3d3147792 MDEV-13105 InnoDB fails to load a table with PAGE_COMPRESSION_LEVEL after upgrade from 10.1.20
When using innodb_page_size=16k, InnoDB tables
that were created in MariaDB 10.1.0 to 10.1.20 with
PAGE_COMPRESSED=1 and
PAGE_COMPRESSION_LEVEL=2 or PAGE_COMPRESSION_LEVEL=3
would fail to load.

fsp_flags_is_valid(): When using innodb_page_size=16k, use a
more strict check for .ibd files, with the assumption that
nobody would try to use different-page-size files.
2017-07-05 14:35:55 +03:00
Marko Mäkelä
8c71c6aa8b MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.

The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.

Some help was provided by Vladislav Vaintroub.

Parameters have been cleaned up and aligned with those of MariaDB 10.2.

The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.

By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash-safety for --prepare is needed, double buffering
can be enabled.

The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.

Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.

We disable any undo log processing in --prepare.

Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in prepare.

This means that the --export option must be disabled for now.

This also means that the following options are redundant
and have been removed:
	xtrabackup --apply-log-only
	innobackupex --redo-only

In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)

In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.

FIXME: Enable the --export option.

FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.

FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.
2017-07-05 11:43:28 +03:00
Marko Mäkelä
615b1f4189 Merge 10.1 into 10.2
innodb.table_flags: Adjust the test case. Due to the MDEV-12873 fix
in 10.2, the corrupted flags for table test.td would be converted,
and a tablespace flag mismatch will occur when trying to open the file.
2017-06-15 14:35:51 +03:00
Marko Mäkelä
72378a2583 MDEV-12873 InnoDB SYS_TABLES.TYPE incompatibility for PAGE_COMPRESSED=YES in MariaDB 10.2.2 to 10.2.6
Remove the SHARED_SPACE flag that was erroneously introduced in
MariaDB 10.2.2, and shift the SYS_TABLES.TYPE flags back to where
they were before MariaDB 10.2.2. While doing this, ensure that
tables created with affected MariaDB versions can be loaded,
and also ensure that tables created with MySQL 5.7 using the
TABLESPACE attribute cannot be loaded.

MariaDB 10.2.2 picked the SHARED_SPACE flag from MySQL 5.7,
shifting the MariaDB 10.1 flags PAGE_COMPRESSION, PAGE_COMPRESSION_LEVEL,
ATOMIC_WRITES by one bit. The SHARED_SPACE flag would always
be written as 0 by MariaDB, because MariaDB does not support
CREATE TABLESPACE or CREATE TABLE...TABLESPACE for InnoDB.

So, instead of the bits AALLLLCxxxxxxx we would have
AALLLLC0xxxxxxx if the table was created with MariaDB 10.2.2
to 10.2.6. (AA=ATOMIC_WRITES, LLLL=PAGE_COMPRESSION_LEVEL,
C=PAGE_COMPRESSED, xxxxxxx=7 bits that were not moved.)

PAGE_COMPRESSED=NO implies LLLLC=00000. That is not a problem.

If someone created a table in MariaDB 10.2.2 or 10.2.3 with
the attribute ATOMIC_WRITES=OFF (value 2; AA=10) and without
PAGE_COMPRESSED=YES or PAGE_COMPRESSION_LEVEL, the table should be
rejected. We ignore this problem, because it should be unlikely
for anyone to specify ATOMIC_WRITES=OFF, and because 10.2.2 and
10.2.3 were not mature releases. The value ATOMIC_WRITES=ON (1)
would be interpreted as ATOMIC_WRITES=OFF, but starting with
MariaDB 10.2.4 the ATOMIC_WRITES attribute is ignored.

PAGE_COMPRESSED=YES implies that PAGE_COMPRESSION_LEVEL be between
1 and 9 and that ROW_FORMAT be COMPACT or DYNAMIC. Thus, the affected
wrong bit pattern in SYS_TABLES.TYPE is of the form AALLLL10DB00001
where D signals the presence of a DATA DIRECTORY attribute and B is 1
for ROW_FORMAT=DYNAMIC and 0 for ROW_FORMAT=COMPACT. We must interpret
this bit pattern as AALLLL1DB00001 (discarding the extraneous 0 bit).
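
The adjustment described above can be sketched as a small bit manipulation. This is an illustration under stated assumptions, not the code of dict_sys_tables_rec_read(): the function name adjust_sys_tables_type is hypothetical, and the bit positions are derived only from the patterns AALLLL10DB00001 and AALLLL1DB00001 given in this message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: repair a SYS_TABLES.TYPE value of the affected form.
// Affected layout (15 bits): AALLLL10DB00001
// Intended layout (14 bits): AALLLL1DB00001
inline uint32_t adjust_sys_tables_type(uint32_t type) {
    // Bits 0..4 must be 00001, bit 7 (the extraneous bit) must be 0,
    // and bit 8 (the shifted PAGE_COMPRESSED flag) must be 1.
    const uint32_t mask = 0x19fU;
    const uint32_t affected = 0x101U;
    if ((type & mask) != affected) {
        return type;  // not of the affected form; leave unchanged
    }

    // In the shifted layout, PAGE_COMPRESSION_LEVEL occupies bits 9..12.
    const uint32_t level = (type >> 9) & 0xfU;
    if (level < 1 || level > 9) {
        return type;  // PAGE_COMPRESSED=YES requires a level between 1 and 9
    }

    // Keep the low 7 bits and move everything above bit 7 down by one,
    // which discards the extraneous 0 bit.
    return (type & 0x7fU) | ((type >> 1) & ~0x7fU);
}
```

For example, with AA=00, LLLL=0110, D=0 and B=1 the affected value 0xD21 becomes 0x6A1, while a value without PAGE_COMPRESSED such as 0x21 is returned unchanged.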

dict_sys_tables_rec_read(): Adjust the affected bit pattern when
reading the SYS_TABLES.TYPE column. In case of invalid flags,
report both SYS_TABLES.TYPE (after possible adjustment) and
SYS_TABLES.MIX_LEN.

dict_load_table_one(): Replace an unreachable condition on
!dict_tf2_is_valid() with a debug assertion. The flags will already
have been validated by dict_sys_tables_rec_read(); if that validation
fails, dict_load_table_low() will have failed.

fil_ibd_create(): Shorten an error message about a file pre-existing.

Datafile::validate_to_dd(): Clarify an error message about tablespace
flags mismatch.

ha_innobase::open(): Remove an unnecessary warning message.

dict_tf_is_valid(): Simplify and tighten the logic. Validate the
values of PAGE_COMPRESSION. Remove error log output; let the callers
handle that.

DICT_TF_BITS: Remove ATOMIC_WRITES, PAGE_ENCRYPTION, PAGE_ENCRYPTION_KEY.
The ATOMIC_WRITES is ignored once the SYS_TABLES.TYPE has been validated;
there is no need to store it in dict_table_t::flags. The PAGE_ENCRYPTION
and PAGE_ENCRYPTION_KEY are unused since MariaDB 10.1.4 (the GA release
was 10.1.8).

DICT_TF_BIT_MASK: Remove (unused).

FSP_FLAGS_MEM_ATOMIC_WRITES: Remove (the flags are never read).

row_import_read_v1(): Display an error if dict_tf_is_valid() fails.
2017-06-15 14:26:06 +03:00
Marko Mäkelä
58f87a41bd Remove some fields from dict_table_t
dict_table_t::thd: Remove. This was only used by btr_root_block_get()
for reporting decryption failures, and it was only assigned by
ha_innobase::open(), and never cleared. This could mean that if a
connection is closed, the pointer would become stale, and the server
could crash while trying to report the error. It could also mean
that an error is being reported to the wrong client. It is better
to use current_thd in this case, even though it could mean that if
the code is invoked from an InnoDB background operation, there would
be no connection to which to send the error message.

Remove dict_table_t::crypt_data and dict_table_t::page_0_read.
These fields were never read.

fil_open_single_table_tablespace(): Remove the parameter "table".
2017-06-15 12:41:02 +03:00
Marko Mäkelä
a78476d342 Merge 10.1 into 10.2 2017-06-12 17:43:07 +03:00
Marko Mäkelä
3005cebc96 Post-push fix for MDEV-12610 MariaDB start is slow
fil_crypt_read_crypt_data(): Remove an unnecessary
acquisition of fil_system->mutex. Remove a duplicated condition
from the callers.
2017-06-12 17:10:56 +03:00
Jan Lindström
58c56dd7f8 MDEV-12610: MariaDB start is slow
The problem appears to be that the function fsp_flags_try_adjust()
is being unconditionally invoked on every .ibd file on startup.
Based on performance investigation, the top function
fsp_header_get_crypt_offset() also needs to be addressed.

Ported implementation of fsp_header_get_encryption_offset()
function from 10.2 to fsp_header_get_crypt_offset().

Introduced a new function fil_crypt_read_crypt_data()
to read page 0 if it is not yet read.

fil_crypt_find_space_to_rotate(): Now that page 0 of every .ibd
file is no longer read on startup, we need to check whether page 0
has been read for the space that we investigate for key rotation;
if it has not been read, we read it.

fil_space_crypt_get_status(): Now that page 0 of every .ibd
file is no longer read on startup, here too we need to read page 0
if it has not been read yet. This is needed
because tests use an INFORMATION_SCHEMA query to wait until background
encryption or decryption has finished, and this function is used to
produce the results.

fil_crypt_thread(): Add is_stopping condition for tablespace
so that we do not rotate pages if usage of tablespace should
be stopped. This was needed for failure seen on regression
testing.

fil_space_create: Remove page_0_crypt_read and extra
unnecessary info output.

fil_open_single_table_tablespace(): We call fsp_flags_try_adjust()
only when no errors have happened, the server was not started
in read-only mode, and either tablespace validation was requested or
the flags contain table options other than the low-order bits up to
the FSP_FLAGS_POS_PAGE_SSIZE position.

fil_space_t::page_0_crypt_read removed.

Added test case innodb-first-page-read to test startup when
encryption is on and when encryption is off, to check that page 0
is not read for all tables on startup.
2017-06-09 13:15:39 +03:00
Marko Mäkelä
2d8fdfbde5 Merge 10.1 into 10.2
Replace have_innodb_zip.inc with innodb_page_size_small.inc.
2017-06-08 12:45:08 +03:00
Marko Mäkelä
fbeb9489cd Cleanup of MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M
The doublewrite buffer pages must fit in the first InnoDB system
tablespace data file. The checks that were added in the initial patch
(commit 112b21da37)
were at too high a level and did not cover all cases.

innodb.log_data_file_size: Test all innodb_page_size combinations.

fsp_header_init(): Never return an error. Move the change buffer creation
to the only caller that needs to do it.

btr_create(): Clean up the logic. Remove the error log messages.

buf_dblwr_create(): Try to return an error on non-fatal failure.
Check that the first data file is big enough for creating the
doublewrite buffers.

buf_dblwr_process(): Check if the doublewrite buffer is available.
Display the message only if it is available.

recv_recovery_from_checkpoint_start_func(): Remove a redundant message
about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already
been initiated.

fil_report_invalid_page_access(): Simplify the message.

fseg_create_general(): Do not emit messages to the error log.

innobase_init(): Revert the changes.

trx_rseg_create(): Refactor (no functional change).
2017-06-08 11:55:47 +03:00
Jan Lindström
112b21da37 MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M;
The problem was that all doublewrite buffer pages must fit in the first
system data file.

Ported commit 27a34df7882b1f8ed283f22bf83e8bfc523cbfde
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date:   Wed Aug 12 15:55:19 2015 +0800

    BUG#21551464 - SEGFAULT WHILE INITIALIZING DATABASE WHEN
    INNODB_DATA_FILE SIZE IS SMALL

To 10.1 (with extended error printout).

btr_create(): If ibuf header page allocation fails, report an error and
return FIL_NULL. Similarly, if root page allocation fails, return an error.

dict_build_table_def_step: If fsp_header_init fails return
error code.

fsp_header_init: returns true if header initialization succeeds
and false if not.

fseg_create_general: report error if segment or page allocation fails.

innobase_init: If first datafile is smaller than 3M and could not
contain all doublewrite buffer pages report error and fail to
initialize InnoDB plugin.

row_truncate_table_for_mysql: report error if fsp header init
fails.

srv_init_abort: New function to report database initialization errors.

srv_undo_tablespaces_init, innobase_start_or_create_for_mysql: If
database initialization fails report error and abort.

trx_rseg_create: If segment header creation fails return.
2017-06-01 14:07:48 +03:00
Jan Lindström
6b6987154a MDEV-12114: install_db shows corruption for rest encryption and innodb_checksum_algorithm=strict_none
The problem was that the checksum check produced false positives,
reporting a page as both not encrypted and encrypted, when
checksum_algorithm was strict_none.

The encryption checksum will use only crc32 regardless of the setting.

buf_zip_decompress: If decompression fails, report an error message
containing the space name if available (not available during import),
and note whether the space could be encrypted.

buf_page_get_gen: Do not assert if decompression fails,
instead unfix the page and return NULL to upper layer.

fil_crypt_calculate_checksum: Use only crc32 method.

fil_space_verify_crypt_checksum: Here we need to check
crc32, innodb and none method for old datafiles.

fil_space_release_for_io: Allow null space.

encryption.innodb-compressed-blob is now run with crc32 and none
combinations.

Note that with the none and strict_none methods there is no real
way to detect page corruption, including corruption that results from
decrypting the page with an incorrect key.

New test innodb-checksum-algorithm to test different checksum
algorithms with encrypted, row compressed and page compressed
tables.
2017-06-01 14:07:48 +03:00
Jan Lindström
1af8bf39ca MDEV-12113: install_db shows corruption for rest encryption with innodb_data_file_path=ibdata1:3M;
The problem was that the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION field, which
for encrypted pages should contain key_version even in system data files
(except on the very first page, 0:0), was overwritten with the
flush LSN after encryption.

Ported WL#7990 Repurpose FIL_PAGE_FLUSH_LSN to 10.1
The field FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION is consulted during
InnoDB startup.

At startup, InnoDB reads the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
from the first page of each file in the InnoDB system tablespace.
If there are multiple files, the minimum and maximum LSN can differ.
These numbers are passed to InnoDB startup.

Having the number in other files than the first file of the InnoDB
system tablespace is not providing much additional value. It is
conflicting with other use of the field, such as on InnoDB R-tree
index pages and encryption key_version.

This worklog will stop writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to
other files than the first file of the InnoDB system tablespace
(page number 0:0) when system tablespace is encrypted. If tablespace
is not encrypted we continue writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
to all first pages of system tablespace to avoid unnecessary
warnings on downgrade.

open_or_create_data_files(): pass only one flushed_lsn parameter

xb_load_tablespaces(): pass only one flushed_lsn parameter.

buf_page_create(): Improve comment about where
FIL_PAGE_FIL_FLUSH_LSN_OR_KEY_VERSION is set.

fil_write_flushed_lsn(): A new function, merged from
fil_write_lsn_and_arch_no_to_file() and
fil_write_flushed_lsn_to_data_files().
Only write to the first page of the system tablespace (page 0:0)
if tablespace is encrypted, or write all first pages of system
tablespace and invoke fil_flush_file_spaces(FIL_TYPE_TABLESPACE)
afterwards.

fil_read_first_page(): read flush_lsn and crypt_data only from
first datafile.

fil_open_single_table_tablespace(): Remove output of LSN, because it
was only valid for the system tablespace and the undo tablespaces, not
user tablespaces.

fil_validate_single_table_tablespace(): Remove output of LSN.

checkpoint_now_set(): Use fil_write_flushed_lsn() and output
an error if the operation fails.

Remove lsn variable from fsp_open_info.

recv_recovery_from_checkpoint_start(): Remove unnecessary second
flush_lsn parameter.

log_empty_and_mark_files_at_shutdown(): Use fil_write_flushed_lsn()
and output an error if it fails.

open_or_create_data_files(): Pass only one flushed_lsn variable.
2017-06-01 14:07:48 +03:00
Marko Mäkelä
22e5e64c0d MDEV-11623 merge fix: Use the correct flags in an error message 2017-05-29 14:37:24 +03:00
Marko Mäkelä
8f643e2063 Merge 10.1 into 10.2 2017-05-23 11:09:47 +03:00
Marko Mäkelä
70505dd45b Merge 10.1 into 10.2 2017-05-22 09:46:51 +03:00
Jan Lindström
90c52e5291 MDEV-12615: InnoDB page compression method snappy mostly does not compress pages
The snappy compression method requires that the output buffer
used for compression be bigger than the input buffer.
Similarly, lzo requires an additional work memory buffer.
Increase the allocated buffer accordingly.

buf_tmp_buffer_t: removed unnecessary lzo_mem, crypt_buf_free and
comp_buf_free.

buf_pool_reserve_tmp_slot: use alligned_alloc and if snappy
available allocate size based on snappy_max_compressed_length and
if lzo is available increase buffer by LZO1X_1_15_MEM_COMPRESS.

fil_compress_page: Remove unneeded lzo mem (we use the same buffer),
and if the output buffer is not yet allocated, allocate it in the
same way as above.

Decompression does not require additional work area.
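
The buffer sizing can be sketched as below. This is a hedged illustration, not the code of buf_pool_reserve_tmp_slot(): the names snappy_bound and tmp_buffer_size are hypothetical, the bound 32 + n + n/6 is assumed to match what snappy documents for its maximum compressed length, and the lzo work memory is passed in as a plain parameter rather than taken from LZO1X_1_15_MEM_COMPRESS.

```cpp
#include <cassert>
#include <cstddef>

// Assumed worst-case compressed size for n input bytes (snappy-style bound).
inline std::size_t snappy_bound(std::size_t n) {
    return 32 + n + n / 6;
}

// Hypothetical sizing rule: start from the page size, enlarge it to the
// snappy bound when snappy is available, then add the lzo work memory
// (pass 0 when lzo is not available).
inline std::size_t tmp_buffer_size(std::size_t page_size,
                                   bool use_snappy,
                                   std::size_t lzo_work_mem) {
    const std::size_t size = use_snappy ? snappy_bound(page_size) : page_size;
    return size + lzo_work_mem;
}
```

With a 16384-byte page, the snappy-style bound is 19146 bytes, which is why an output buffer of exactly one page is too small in the worst case.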

Modify the test to use the same test as the other compression method tests.
2017-05-20 21:51:34 +03:00
Marko Mäkelä
a4d4a5fe82 After-merge fix for MDEV-11638
In commit 360a4a0372
some debug assertions were introduced to the page flushing code
in XtraDB. Add these assertions to InnoDB as well, and adjust
the InnoDB shutdown so that these assertions will not fail.

logs_empty_and_mark_files_at_shutdown(): Advance
srv_shutdown_state from the first phase SRV_SHUTDOWN_CLEANUP
only after no page-dirtying activity is possible
(well, except by srv_master_do_shutdown_tasks(), which will be
fixed separately in MDEV-12052).

rotate_thread_t::should_shutdown(): Already exit the key rotation
threads at the first phase of shutdown (SRV_SHUTDOWN_CLEANUP).

page_cleaner_sleep_if_needed(): Do not sleep during shutdown.
This change is originally from XtraDB.
2017-05-20 08:41:34 +03:00
Marko Mäkelä
65e1399e64 Merge 10.0 into 10.1
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
2017-05-20 08:41:20 +03:00
Marko Mäkelä
13a350ac29 Merge 10.0 into 10.1 2017-05-19 12:29:37 +03:00
Vicențiu Ciorbaru
45898c2092 Merge remote-tracking branch 'origin/10.0' into 10.0 2017-05-18 15:45:55 +03:00
Jan Lindström
f302a3cf9d MDEV-12593: InnoDB page compression should use lz4_compress_default if available

lz4.cmake: Check if shared or static lz4 library has LZ4_compress_default
function and if it has define HAVE_LZ4_COMPRESS_DEFAULT.

fil_compress_page: If HAVE_LZ4_COMPRESS_DEFAULT is defined use
LZ4_compress_default function for compression if not use
LZ4_compress_limitedOutput function.

Introduced a innodb-page-compression.inc file for page compression
tests that will also search .ibd file to verify that pages
are compressed (i.e. used search string is not found). Modified
page compression tests to use this file.

Note that the snappy method is not included because of MDEV-12615
(InnoDB page compression method snappy mostly does not compress pages),
which will be fixed in a different commit.
2017-05-18 09:29:44 +03:00
Vicențiu Ciorbaru
b87873b221 Merge branch 'merge-innodb-5.6' into bb-10.0-vicentiu
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.

Bug #23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
              INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does a whole table traversal for a range query
even though the end range is passed. The whole table traversal happens
when the record is not within the transaction read view.

Solution:
=========

Convert the last InnoDB record of the page to MySQL format and compare it
with the end range if the traversal of row_search_mvcc() exceeds 100 and
no ICP is involved. If it is out of range, InnoDB can avoid the
whole table traversal. The code needed a little refactoring to
make it compile.

Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
2017-05-17 14:53:28 +03:00
Marko Mäkelä
e22d86a3eb fil_create_new_single_table_tablespace(): Correct a bogus nonnull attribute
The parameter path can be passed as NULL.
This error was reported by GCC 7.1.0 when compiling
CMAKE_BUILD_TYPE=Debug with -O3.
2017-05-17 13:49:51 +03:00
Marko Mäkelä
956d2540c4 Remove redundant UT_LIST_INIT() calls
The macro UT_LIST_INIT() zero-initializes the UT_LIST_NODE.
There is no need to call this macro on a buffer that has
already been zero-initialized by mem_zalloc() or mem_heap_zalloc()
or similar.

For some reason, the statement UT_LIST_INIT(srv_sys->tasks) in
srv_init() caused a SIGSEGV on server startup when compiling with
GCC 7.1.0 for AMD64 using -O3. The zero-initialization was attempted
by the instruction movaps %xmm0,0x50(%rax), while the proper offset
of srv_sys->tasks would seem to have been 0x48.
2017-05-17 10:33:49 +03:00
Marko Mäkelä
febe88198e Make some variables const in fil_iterate()
This is a non-functional change to make it slightly easier
to read the code. We seem to have some bugs in this
IMPORT TABLESPACE code; see MDEV-12396.
2017-05-17 08:54:16 +03:00
Vicențiu Ciorbaru
0af9818240 5.6.36 2017-05-15 17:17:16 +03:00
Marko Mäkelä
021d636551 Fix some integer type mismatch.
Use uint32_t for the encryption key_id.

When filling unsigned integer values into INFORMATION_SCHEMA tables,
use the method Field::store(longlong, bool unsigned)
instead of using Field::store(double).
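
A short sketch of why the double-based path is risky for large unsigned values. It does not use the server's Field class; via_double is a hypothetical helper that only shows the precision loss of passing a 64-bit integer through a double.

```cpp
#include <cassert>
#include <cstdint>

// Round-trips a 64-bit unsigned value through double, as happens when an
// integer is handed to an interface that takes a floating-point argument.
inline uint64_t via_double(uint64_t v) {
    return static_cast<uint64_t>(static_cast<double>(v));
}

// 2^53 + 1 is the smallest positive integer that an IEEE 754 double
// cannot represent exactly.
const uint64_t kFirstInexact = (uint64_t{1} << 53) + 1;
```

Values up to 2^53 survive the round trip, but via_double(kFirstInexact) no longer equals kFirstInexact, so an integer-typed store method avoids silently changing such values.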

Also fix some miscellaneous type mismatches related to ulint (size_t).
2017-05-10 12:45:46 +03:00
Marko Mäkelä
588a6a186a MDEV-12750 Fix crash recovery of key rotation
When MySQL 5.7.9 was merged to MariaDB 10.2.2, an important
debug assertion was omitted from mlog_write_initial_log_record_low().

mlog_write_initial_log_record_low(): Put back the assertion
mtr_t::is_named_space().

fil_crypt_start_encrypting_space(), fil_crypt_rotate_page():
Call mtr_t::set_named_space() before modifying any pages.

fsp_flags_try_adjust(): Call mtr_t::set_named_space(). This additional
breakage was introduced in the merge of MDEV-11623 from 10.1. It was
not caught because of the missing debug assertion in
mlog_write_initial_log_record_low().

Remove some suppressions from the encryption.innodb-redo-badkey test.
2017-05-09 21:03:27 +03:00
Marko Mäkelä
d7cfe2c4f3 MDEV-12253 post-fix: Do not leak memory in crash recovery
This is a backport from 10.2 where it fixes the
cmake -DWITH_ASAN test failure that was mentioned
in commit f9cc391863
(merging MDEV-12253 from 10.1 to 10.2).

fil_parse_write_crypt_data(): If the tablespace is not found,
invoke fil_space_destroy_crypt_data(&crypt_data) to properly
free the created object.
2017-05-09 14:36:17 +03:00
Sergei Golubchik
c91ecf9e9b Merge branch '10.1' into 10.2
Revert commit db0917f68f, because the fix for MDEV-12696
is coming from 5.5 and 10.1 in this merge.
2017-05-09 13:24:52 +02:00
Marko Mäkelä
2645bda5f2 MDEV-12253 post-fix: Do not leak memory in crash recovery
This fixes the cmake -DWITH_ASAN test failure that was mentioned
in commit f9cc391863 (merging
MDEV-12253 from 10.1 to 10.2).

fil_parse_write_crypt_data(): If the tablespace is not found,
invoke fil_space_destroy_crypt_data(&crypt_data) to properly
free the created object.

With this, the test encryption.innodb-redo-badkey still reports
"Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT"
but does not fail. The misleading message should be corrected,
maybe as part of MDEV-12699.
2017-05-09 13:40:42 +03:00
Marko Mäkelä
6c91be54b1 MDEV-11520 Properly retry posix_fallocate() on EINTR
We only want to retry posix_fallocate() on EINTR as long as the system
is not being shut down. We do not want to retry on any other (hard) error.
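
The retry rule can be sketched as follows. This is a minimal sketch and not the InnoDB code: retry_on_eintr and its two callbacks are hypothetical names, and the allocation call is injected so that the loop can be shown without a real file. The callback is assumed to follow the posix_fallocate() convention of returning 0 or an error number.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Calls `allocate` until it returns something other than EINTR, or until
// `shutting_down` reports true. Any other error is returned immediately
// without a retry.
inline int retry_on_eintr(const std::function<int()>& allocate,
                          const std::function<bool()>& shutting_down) {
    int err;
    do {
        err = allocate();
    } while (err == EINTR && !shutting_down());
    return err;
}
```

A caller could pass a lambda that wraps posix_fallocate(fd, offset, len) as the first argument and a lambda that checks the shutdown state as the second.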

Thanks to Jocelyn Fournier for quickly noticing the mistake in my
previous commit.
2017-05-06 15:50:09 +03:00
Marko Mäkelä
baad0f3484 MDEV-11520 post-fix: Retry posix_fallocate() on EINTR in fil_ibd_create()
Earlier versions of MariaDB only use posix_fallocate() when extending
data files, not when initially creating the files.
Marko Mäkelä
f9cc391863 Merge 10.1 into 10.2
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.

TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):

2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
    #0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
    #1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
    #2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
    #3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
    #4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
    #5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
    #6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
2017-05-05 10:38:53 +03:00
Jan Lindström
acce1f37c2 MDEV-12624: encryption.innodb_encryption_tables fails in buildbot with timeout
This regression was caused by MDEV-12467 encryption.create_or_replace
hangs during DROP TABLE, where if table->is_stopping() (i.e. when
tablespace is dropped) background key rotation thread calls
fil_crypt_complete_rotate_space to release space and stop rotation.
However, that function does not decrease number of rotating
threads if table->is_stopping() is true.
2017-05-02 08:09:16 +03:00
Sergei Golubchik
0072d2e9a1 InnoDB cleanup: remove a bunch of #ifdef UNIV_INNOCHECKSUM
innochecksum uses global variables. great, let's use them all the
way down, instead of passing them as arguments to innodb internals,
conditionally modifying function prototypes with #ifdefs
2017-04-30 14:58:11 +02:00
Jan Lindström
935a1c676e MDEV-12623: InnoDB: Failing assertion: kv == 0
|| kv >= crypt_data->min_key_version,
encryption.innodb_encryption_tables failed in buildbot.

Now that key_version is not stored when page is read to
buf_page_t::key_version but always read from actual page
this assertion is not always valid.
2017-04-29 10:05:39 +03:00
Marko Mäkelä
b82c602db5 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects
(first, fil_space_detach(), then release fil_system->mutex, and finally
free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free().
Detach the tablespace from the fil_system cache, release the
fil_system->mutex, and then wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread.
During the wait, future calls to fil_space_acquire_for_io() will
not find this tablespace, and the count can only be decremented to 0,
at which point it is safe to free the objects.
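
The two-phase freeing described above can be sketched like this. The Space and SpaceCache types are hypothetical stand-ins for fil_space_t and the fil_system cache, and the wait loop is reduced to a safe_to_free() check so that the example stays single-threaded.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical stand-in for a tablespace object.
struct Space {
    std::atomic<int> n_pending_ios{0};
};

// Hypothetical stand-in for the tablespace cache and its mutex.
class SpaceCache {
public:
    void add(int id, Space* space) {
        std::lock_guard<std::mutex> guard(mutex_);
        spaces_[id] = space;
    }

    // Looks up the tablespace and registers a pending I/O. Returns nullptr
    // if the tablespace has already been detached from the cache.
    Space* acquire_for_io(int id) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = spaces_.find(id);
        if (it == spaces_.end()) return nullptr;
        ++it->second->n_pending_ios;
        return it->second;
    }

    static void release_for_io(Space* space) {
        --space->n_pending_ios;
    }

    // Phase 1: remove the tablespace from the cache so that no new I/O can
    // find it. Returns the detached object, or nullptr if it was not cached.
    Space* detach(int id) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = spaces_.find(id);
        if (it == spaces_.end()) return nullptr;
        Space* space = it->second;
        spaces_.erase(it);
        return space;
    }

    // Phase 2: after detaching, the count can only go down; the object may
    // be freed once this returns true.
    static bool safe_to_free(const Space* space) {
        return space->n_pending_ios.load() == 0;
    }

private:
    std::mutex mutex_;
    std::map<int, Space*> spaces_;
};
```

A freeing thread would call detach(), release the mutex (done here by the lock guard), and then poll or sleep until safe_to_free() holds before deleting the object.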

fil_node_free_part1(), fil_node_free_part2(): Refactored from
fil_node_free().
2017-04-28 14:12:52 +03:00
Marko Mäkelä
4b24467ff3 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

fil_space_free_low(): Wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread. Future
calls to fil_space_acquire_for_io() will not find this tablespace,
because it will already have been detached from fil_system.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

FIXME: buf_page_check_corrupt() should take a tablespace from
fil_space_acquire_for_io() as a parameter. This will be done
in the 10.1 version of this patch and merged from there.
That depends on MDEV-12253, which has not been merged from 10.1 yet.
2017-04-28 12:23:35 +03:00
Marko Mäkelä
f740d23ce6 Merge 10.1 into 10.2 2017-04-28 12:22:32 +03:00
Thirunarayanan Balathandayuthapani
2ef1baa75f Bug #24793413 LOG PARSING BUFFER OVERFLOW
Problem:
========
During checkpoint, we write all MLOG_FILE_NAME records in one mtr,
and the parse buffer can't be processed until MLOG_MULTI_REC_END. The parse
buffer eventually exceeds RECV_PARSING_BUF_SIZE and overflows.

Fix:
===
1) Break the large mtr if it exceeds LOG_CHECKPOINT_FREE_PER_THREAD into multiple mtr during checkpoint.
2) Move the parsing buffer if we are encountering only MLOG_FILE_NAME
records, so that it never exceeds RECV_PARSING_BUF_SIZE.
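
The idea in point 1 (splitting one oversized unit into several bounded
ones) can be sketched as below. This is an illustration under stated
assumptions, not the patch itself: split_into_batches() and the
byte-size limit are hypothetical, and record size is approximated by the
length of the file name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: group file-name records into batches whose total
// size stays within max_batch_bytes. A single record larger than the
// limit still gets a batch of its own.
std::vector<std::vector<std::string>> split_into_batches(
    const std::vector<std::string>& names, std::size_t max_batch_bytes)
{
    assert(max_batch_bytes > 0);
    std::vector<std::vector<std::string>> batches;
    std::vector<std::string> current;
    std::size_t current_bytes = 0;

    for (const std::string& name : names) {
        // Close the current batch if this record would push it past the limit.
        if (!current.empty() && current_bytes + name.size() > max_batch_bytes) {
            batches.push_back(std::move(current));
            current.clear();
            current_bytes = 0;
        }
        current.push_back(name);
        current_bytes += name.size();
    }
    if (!current.empty()) {
        batches.push_back(std::move(current));
    }
    return batches;
}
```

Each returned batch would correspond to one smaller mtr, so the parser
reaches an end-of-batch marker before its buffer fills up.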

Reviewed-by: Debarun Bannerjee <debarun.bannerjee@oracle.com>
Reviewed-by: Rahul M Malik <rahul.m.malik@oracle.com>
RB: 14743
2017-04-26 23:03:32 +03:00
Marko Mäkelä
849af74a48 MariaDB adjustments for Oracle Bug#23070734 fix
Split the test case so that a server restart is not needed.
Reduce the test cases and use a simpler mechanism for triggering
and waiting for purge.

fil_table_accessible(): Check if a table can be accessed without
enjoying MDL protection.
2017-04-26 23:03:32 +03:00
Aditya A
62dca454e7 Bug #23070734 CONCURRENT TRUNCATE TABLES CAUSE STALLS
PROBLEM

When truncating single tablespace tables, we need to scan the entire
buffer pool to remove the pages of the table from the buffer pool.
During this scan and removal, dict_sys->mutex is being held, causing
stalls in other DDL operations.

FIX

Release the dict_sys->mutex during the scan and reacquire it after the
scan. Make sure that the purge thread doesn't purge the records of the table
being truncated and that the background stats collection thread skips updating
the stats for the table being truncated.
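
The locking pattern in the fix can be sketched as below. This is a
simplified illustration: Dict, should_skip() and truncate_table() are
hypothetical names, and std::mutex stands in for dict_sys->mutex.

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Hypothetical stand-in for the dictionary state.
struct Dict {
    std::mutex mutex;               // stands in for dict_sys->mutex
    std::set<int> being_truncated;  // protected by mutex
};

// Background work (purge, stats collection) asks this before touching a table.
bool should_skip(Dict& d, int table_id) {
    std::lock_guard<std::mutex> g(d.mutex);
    return d.being_truncated.count(table_id) != 0;
}

// Mark the table, drop the mutex for the long scan, then reacquire it.
template <typename Scan>
void truncate_table(Dict& d, int table_id, Scan scan) {
    std::unique_lock<std::mutex> lock(d.mutex);
    d.being_truncated.insert(table_id);

    lock.unlock();  // other DDL can take the mutex during the scan
    scan();         // e.g. the buffer pool scan
    lock.lock();

    d.being_truncated.erase(table_id);
}
```

Because the mutex is released during scan(), other threads are not
stalled, and the being_truncated marker is what keeps background threads
away from the table in the meantime.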

[#rb 14564 Approved by Jimmy and satya ]
2017-04-26 23:03:31 +03:00
Jan Lindström
765a43605a MDEV-12253: Buffer pool blocks are accessed after they have been freed
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.

This patch should also address following test failures and
bugs:

MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.

MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot

MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot

MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.

Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases when encrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.

btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we have freed it.

buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.
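
The return value rules above can be sketched as a small decision
function. This is an illustration only: PageStatus and check_page() are
hypothetical, and the three booleans stand in for the actual checksum
computations.

```cpp
#include <cassert>

// Hypothetical mirror of the three outcomes named in the text.
enum class PageStatus { Success, PageCorrupted, DecryptionFailed };

// encrypted:               the page is stored encrypted
// post_encryption_sum_ok:  checksum computed over the encrypted page matches
// plain_sum_ok:            normal page checksum matches (after decryption,
//                          if the page was encrypted)
PageStatus check_page(bool encrypted,
                      bool post_encryption_sum_ok,
                      bool plain_sum_ok)
{
    if (!encrypted) {
        return plain_sum_ok ? PageStatus::Success : PageStatus::PageCorrupted;
    }
    if (!post_encryption_sum_ok) {
        // The stored bytes themselves are damaged.
        return PageStatus::PageCorrupted;
    }
    if (!plain_sum_ok) {
        // Stored bytes are intact, but decryption produced a bad page,
        // which points at a wrong or missing key.
        return PageStatus::DecryptionFailed;
    }
    return PageStatus::Success;
}
```

The distinction matters to callers: a corrupted page and a page that
merely cannot be decrypted call for different error messages and
recovery options.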

buf_page_io_complete(): use dberr_t for error handling.

buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
        Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if the tablespace exists and pages read from the
tablespace are not corrupted and page decryption has not failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()

dict_stats_update_transient_for_index(),
dict_stats_update_transient()
        Do not continue if table decryption failed or table
        is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.

fil_parse_write_crypt_data():
        Check that key read from redo log entry is found from
        encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2.  Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cannot be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.

Adjusted the test case innodb_bug14147491 so that it no longer
expects a crash. Instead, the table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.

recv_apply_hashed_log_recs() does not return anything.
2017-04-26 15:19:16 +03:00
Marko Mäkelä
14d124880f Fix a crash when page_compression fails during IMPORT TABLESPACE
fil_compress_page(): Check for space==NULL.
2017-04-21 18:44:37 +03:00
Marko Mäkelä
200ef51344 Fix a compilation error 2017-04-21 18:29:50 +03:00
Marko Mäkelä
0871a00a62 MDEV-12545 Reduce the amount of fil_space_t lookups
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed.
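
The "acquire once, pass it down" idea can be sketched as below. This is
an illustration under stated assumptions: Registry, SpaceRef,
encrypt_step() and write_step() are hypothetical names, and the lookups
counter exists only to make the saving visible.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for illustration; not the InnoDB definitions.
struct Space {
    unsigned id = 0;
    unsigned n_pending_ops = 0;
};

struct Registry {
    std::map<unsigned, Space> spaces;
    unsigned lookups = 0;  // how many times we searched by id

    Space* acquire(unsigned id) {
        ++lookups;
        auto it = spaces.find(id);
        if (it == spaces.end()) return nullptr;
        ++it->second.n_pending_ops;
        return &it->second;
    }

    void release(Space* s) {
        assert(s->n_pending_ops > 0);
        --s->n_pending_ops;
    }
};

// RAII holder: acquires in the constructor, releases in the destructor.
class SpaceRef {
public:
    SpaceRef(Registry& r, unsigned id) : registry_(r), space_(r.acquire(id)) {}
    ~SpaceRef() { if (space_) registry_.release(space_); }
    SpaceRef(const SpaceRef&) = delete;
    SpaceRef& operator=(const SpaceRef&) = delete;
    Space* get() const { return space_; }
private:
    Registry& registry_;
    Space* space_;
};

// Lower-level steps take the already acquired tablespace, not an id.
void encrypt_step(const Space& space) { assert(space.n_pending_ops > 0); }
void write_step(const Space& space) { assert(space.n_pending_ops > 0); }

bool write_block(Registry& r, unsigned id) {
    SpaceRef ref(r, id);  // the only lookup for this write
    if (ref.get() == nullptr) return false;
    encrypt_step(*ref.get());
    write_step(*ref.get());
    return true;
}
```

Passing the reference instead of the id means the lower-level steps
cannot observe the tablespace disappearing between two lookups, and the
search is done once per write rather than once per step.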

fil_decompress_page(): Remove unsafe use of fil_space_get_by_id().
2017-04-21 18:12:10 +03:00