Commit graph

497 commits

Author SHA1 Message Date
Marko Mäkelä
cd623508df buf_read_ibuf_merge_pages(): Discard all entries for a missing tablespace
A similar change was contributed to Percona XtraBackup, but for some
reason, it is not present in Percona XtraDB. Since MDEV-9566
(MariaDB 10.1.23), that change is present in the MariaDB XtraDB.
2017-06-29 22:39:06 +03:00
Marko Mäkelä
3e1d0ff574 Fix a merge error in commit 8f643e2063
A merge error caused InnoDB bootstrap to fail when
innodb_undo_tablespaces was set to more than 2.
This was because of a bug that was introduced to
srv_undo_tablespaces_init() by the merge.

Furthermore, some adjustments for Oracle Bug#25551311 aka
Bug#23517560 changes were forgotten. We must minimize direct
references to srv_undo_tablespaces_open and use predicates
instead.

srv_undo_tablespaces_init(): Increment srv_undo_tablespaces_open
once, not twice, per loop iteration.

is_system_or_undo_tablespace(): Remove (unused function).

is_predefined_tablespace(): Invoke srv_is_undo_tablespace().
2017-06-27 21:23:12 +03:00
Marko Mäkelä
0d69d313a1 Prevent interleaved error log output on InnoDB startup
buf_flush_page_cleaner_coordinator(): Signal the thread creator
that the error log output regarding setpriority() has been issued.

innobase_start_or_create_for_mysql(): Wait for
buf_flush_page_cleaner_coordinator() to completely start up.

This prevents sporadic failures of tests that search the server
error log for InnoDB redo log recovery messages.
2017-06-23 22:54:42 +03:00
Marko Mäkelä
a71c870ebf InnoDB: Reset unknown TRX_SYS page type to FIL_PAGE_TYPE_TRX_SYS
buf_flush_init_for_writing(): Reset the FIL_PAGE_TYPE
of the TRX_SYS page to the canonical value
FIL_PAGE_TYPE_TRX_SYS instead of FIL_PAGE_TYPE_UNKNOWN.
2017-06-22 09:39:04 +03:00
Marko Mäkelä
50faeda4d6 Remove trx_t::has_search_latch and simplify debug code
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed, and
the field trx->has_search_latch would only be set during a short
time frame in the execution of row_search_mvcc(), which was
formerly called row_search_for_mysql().

This means that the column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_LATCHED will always
report 0. That column cannot be removed in MariaDB 10.2, but it
can be removed in future releases.

trx_t::has_search_latch: Remove.

trx_assert_no_search_latch(): Remove.

row_sel_try_search_shortcut_for_mysql(): Remove a redundant condition
on trx->has_search_latch (it was always true).

sync_check_iterate(): Make the parameter const.

sync_check_functor_t: Make the operator() const, and remove result()
and the virtual destructor. There is no need to have mutable state
in the functors.

sync_checker<bool>: Replaces dict_sync_check and btrsea_sync_check.

sync_check: Replaces btrsea_sync_check.

dict_sync_check: Instantiated from sync_checker.

sync_allowed_latches: Use std::find() directly on the array.
Remove the std::vector.

TrxInInnoDB::enter(), TrxInInnoDB::exit(): Remove obviously redundant
debug assertions on trx->in_depth, and use equality comparison against 0
because it could be more efficient on some architectures.
2017-06-16 13:17:05 +03:00
Marko Mäkelä
a78476d342 Merge 10.1 into 10.2 2017-06-12 17:43:07 +03:00
Marko Mäkelä
fa57479fcd Merge 10.0 into 10.1 2017-06-12 14:26:32 +03:00
Marko Mäkelä
417434f12d MDEV-13039 innodb_fast_shutdown=0 may fail to purge all undo log
When a slow shutdown is performed soon after spawning some work for
background threads that can create or commit transactions, it is possible
that new transactions are started or committed after the purge has finished.
This is violating the specification of innodb_fast_shutdown=0, namely that
the purge must be completed. (None of the history of the recent transactions
would be purged.)

Also, it is possible that the purge threads would exit in slow shutdown
while there exist active transactions, such as recovered incomplete
transactions that are being rolled back. Thus, the slow shutdown could
fail to purge some undo log that becomes purgeable after the transaction
commit or rollback.

srv_undo_sources: A flag that indicates if undo log can be generated
or the persistent, whether by background threads or by user SQL.
Even when this flag is clear, active transactions that already exist
in the system may be committed or rolled back.

innodb_shutdown(): Renamed from innobase_shutdown_for_mysql().
Do not return an error code; the operation never fails.
Clear the srv_undo_sources flag, and also ensure that the background
DROP TABLE queue is empty.

srv_purge_should_exit(): Do not allow the purge to exit if
srv_undo_sources are active or the background DROP TABLE queue is not
empty, or in slow shutdown, if any active transactions exist
(and are being rolled back).

srv_purge_coordinator_thread(): Remove some previous workarounds
for this bug.

innobase_start_or_create_for_mysql(): Set buf_page_cleaner_is_active
and srv_dict_stats_thread_active directly. Set srv_undo_sources before
starting the purge subsystem, to prevent immediate shutdown of the purge.
Create dict_stats_thread and fts_optimize_thread immediately
after setting srv_undo_sources, so that shutdown can use this flag to
determine if these subsystems were started.

dict_stats_shutdown(): Shut down dict_stats_thread. Backported from 10.2.

srv_shutdown_table_bg_threads(): Remove (unused).
2017-06-09 16:20:42 +03:00
Jan Lindström
58c56dd7f8 MDEV-12610: MariaDB start is slow
Problem appears to be that the function fsp_flags_try_adjust()
is being unconditionally invoked on every .ibd file on startup.
Based on performance investigation also the top function
fsp_header_get_crypt_offset() needs to addressed.

Ported implementation of fsp_header_get_encryption_offset()
function from 10.2 to fsp_header_get_crypt_offset().

Introduced a new function fil_crypt_read_crypt_data()
to read page 0 if it is not yet read.

fil_crypt_find_space_to_rotate(): Now that page 0 for every .ibd
file is not read on startup we need to check has page 0 read
from space that we investigate for key rotation, if it is not read
we read it.

fil_space_crypt_get_status(): Now that page 0 for every .ibd
file is not read on startup here also we need to read page 0
if it is not yet read it. This is needed
as tests use IS query to wait until background encryption
or decryption has finished and this function is used to
produce results.

fil_crypt_thread(): Add is_stopping condition for tablespace
so that we do not rotate pages if usage of tablespace should
be stopped. This was needed for failure seen on regression
testing.

fil_space_create: Remove page_0_crypt_read and extra
unnecessary info output.

fil_open_single_table_tablespace(): We call fsp_flags_try_adjust
only when when no errors has happened and server was not started
on read only mode and tablespace validation was requested or
flags contain other table options except low order bits to
FSP_FLAGS_POS_PAGE_SSIZE position.

fil_space_t::page_0_crypt_read removed.

Added test case innodb-first-page-read to test startup when
encryption is on and when encryption is off to check that not
for all tables page 0 is read on startup.
2017-06-09 13:15:39 +03:00
Marko Mäkelä
2d8fdfbde5 Merge 10.1 into 10.2
Replace have_innodb_zip.inc with innodb_page_size_small.inc.
2017-06-08 12:45:08 +03:00
Marko Mäkelä
fbeb9489cd Cleanup of MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M
The doublewrite buffer pages must fit in the first InnoDB system
tablespace data file. The checks that were added in the initial patch
(commit 112b21da37)
were at too high level and did not cover all cases.

innodb.log_data_file_size: Test all innodb_page_size combinations.

fsp_header_init(): Never return an error. Move the change buffer creation
to the only caller that needs to do it.

btr_create(): Clean up the logic. Remove the error log messages.

buf_dblwr_create(): Try to return an error on non-fatal failure.
Check that the first data file is big enough for creating the
doublewrite buffers.

buf_dblwr_process(): Check if the doublewrite buffer is available.
Display the message only if it is available.

recv_recovery_from_checkpoint_start_func(): Remove a redundant message
about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already
been initiated.

fil_report_invalid_page_access(): Simplify the message.

fseg_create_general(): Do not emit messages to the error log.

innobase_init(): Revert the changes.

trx_rseg_create(): Refactor (no functional change).
2017-06-08 11:55:47 +03:00
Marko Mäkelä
6e0e5eefbc Fix some printf format type mismatch 2017-06-06 12:04:29 +03:00
Marko Mäkelä
30df297c2f Merge 10.0 into 10.1
Rewrite the test encryption.innodb-checksum-algorithm not to
require any restarts or re-bootstrapping, and to cover all
innodb_page_size combinations.

Test innodb.101_compatibility with all innodb_page_size combinations.
2017-06-06 10:59:54 +03:00
Jan Lindström
6b6987154a MDEV-12114: install_db shows corruption for rest encryption and innodb_checksum_algorithm=strict_none
Problem was that checksum check resulted false positives that page is
both not encrypted and encryted when checksum_algorithm was
strict_none.

Encrypton checksum will use only crc32 regardless of setting.

buf_zip_decompress: If compression fails report a error message
containing the space name if available (not available during import).
And note if space could be encrypted.

buf_page_get_gen: Do not assert if decompression fails,
instead unfix the page and return NULL to upper layer.

fil_crypt_calculate_checksum: Use only crc32 method.

fil_space_verify_crypt_checksum: Here we need to check
crc32, innodb and none method for old datafiles.

fil_space_release_for_io: Allow null space.

encryption.innodb-compressed-blob is now run with crc32 and none
combinations.

Note that with none and strict_none method there is not really
a way to detect page corruptions and page corruptions after
decrypting the page with incorrect key.

New test innodb-checksum-algorithm to test different checksum
algorithms with encrypted, row compressed and page compressed
tables.
2017-06-01 14:07:48 +03:00
Jan Lindström
1af8bf39ca MDEV-12113: install_db shows corruption for rest encryption with innodb_data_file_path=ibdata1:3M;
Problem was that FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION field that for
encrypted pages even in system datafiles should contain key_version
except very first page (0:0) is after encryption overwritten with
flush lsn.

Ported WL#7990 Repurpose FIL_PAGE_FLUSH_LSN to 10.1
The field FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION is consulted during
InnoDB startup.

At startup, InnoDB reads the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
from the first page of each file in the InnoDB system tablespace.
If there are multiple files, the minimum and maximum LSN can differ.
These numbers are passed to InnoDB startup.

Having the number in other files than the first file of the InnoDB
system tablespace is not providing much additional value. It is
conflicting with other use of the field, such as on InnoDB R-tree
index pages and encryption key_version.

This worklog will stop writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to
other files than the first file of the InnoDB system tablespace
(page number 0:0) when system tablespace is encrypted. If tablespace
is not encrypted we continue writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
to all first pages of system tablespace to avoid unnecessary
warnings on downgrade.

open_or_create_data_files(): pass only one flushed_lsn parameter

xb_load_tablespaces(): pass only one flushed_lsn parameter.

buf_page_create(): Improve comment about where
FIL_PAGE_FIL_FLUSH_LSN_OR_KEY_VERSION is set.

fil_write_flushed_lsn(): A new function, merged from
fil_write_lsn_and_arch_no_to_file() and
fil_write_flushed_lsn_to_data_files().
Only write to the first page of the system tablespace (page 0:0)
if tablespace is encrypted, or write all first pages of system
tablespace and invoke fil_flush_file_spaces(FIL_TYPE_TABLESPACE)
afterwards.

fil_read_first_page(): read flush_lsn and crypt_data only from
first datafile.

fil_open_single_table_tablespace(): Remove output of LSN, because it
was only valid for the system tablespace and the undo tablespaces, not
user tablespaces.

fil_validate_single_table_tablespace(): Remove output of LSN.

checkpoint_now_set(): Use fil_write_flushed_lsn and output
a error if operation fails.

Remove lsn variable from fsp_open_info.

recv_recovery_from_checkpoint_start(): Remove unnecessary second
flush_lsn parameter.

log_empty_and_mark_files_at_shutdown(): Use fil_writte_flushed_lsn
and output error if it fails.

open_or_create_data_files(): Pass only one flushed_lsn variable.
2017-06-01 14:07:48 +03:00
Marko Mäkelä
c2ef0bb6ce Merge 5.5 into 10.0 2017-05-29 13:15:36 +03:00
Marko Mäkelä
2cb94aa1b7 MDEV-11626 innodb.innodb-change-buffer-recovery fails for xtradb
buf_page_get_gen(): Remove the error log messages about
page flushing and eviction when
innodb_change_buffering_debug=1 is in effect.
2017-05-29 13:07:23 +03:00
Marko Mäkelä
8f643e2063 Merge 10.1 into 10.2 2017-05-23 11:09:47 +03:00
Jan Lindström
90c52e5291 MDEV-12615: InnoDB page compression method snappy mostly does not compress pages
Snappy compression method require that output buffer
used for compression is bigger than input buffer.
Similarly lzo require additional work memory buffer.
Increase the allocated buffer accordingly.

buf_tmp_buffer_t: removed unnecessary lzo_mem, crypt_buf_free and
comp_buf_free.

buf_pool_reserve_tmp_slot: use alligned_alloc and if snappy
available allocate size based on snappy_max_compressed_length and
if lzo is available increase buffer by LZO1X_1_15_MEM_COMPRESS.

fil_compress_page: Remove unneeded lzo mem (we use same buffer)
and if output buffer is not yet allocated allocate based similarly
as above.

Decompression does not require additional work area.

    Modify test to use same test as other compression method tests.
2017-05-20 21:51:34 +03:00
Marko Mäkelä
a4d4a5fe82 After-merge fix for MDEV-11638
In commit 360a4a0372
some debug assertions were introduced to the page flushing code
in XtraDB. Add these assertions to InnoDB as well, and adjust
the InnoDB shutdown so that these assertions will not fail.

logs_empty_and_mark_files_at_shutdown(): Advance
srv_shutdown_state from the first phase SRV_SHUTDOWN_CLEANUP
only after no page-dirtying activity is possible
(well, except by srv_master_do_shutdown_tasks(), which will be
fixed separately in MDEV-12052).

rotate_thread_t::should_shutdown(): Already exit the key rotation
threads at the first phase of shutdown (SRV_SHUTDOWN_CLEANUP).

page_cleaner_sleep_if_needed(): Do not sleep during shutdown.
This change is originally from XtraDB.
2017-05-20 08:41:34 +03:00
Marko Mäkelä
65e1399e64 Merge 10.0 into 10.1
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
2017-05-20 08:41:20 +03:00
Marko Mäkelä
907cbadb6f Fix some -Wimplicit-fallthrough warnings in InnoDB
buf_read_ahead_linear(): Do not display a message if the tablespace
is being deleted.

dtype_print(): Add a missing break statement.
2017-05-19 22:58:59 +03:00
Marko Mäkelä
13a350ac29 Merge 10.0 into 10.1 2017-05-19 12:29:37 +03:00
Vicențiu Ciorbaru
45898c2092 Merge remote-tracking branch 'origin/10.0' into 10.0 2017-05-18 15:45:55 +03:00
Vicențiu Ciorbaru
b87873b221 Merge branch 'merge-innodb-5.6' into bb-10.0-vicentiu
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.

Bug #23481444	OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
                       INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does whole table traversal for range query
even though the end range is passed. Whole table traversal happens
when the record is not with in transaction read view.

Solution:
=========

Convert the innodb last record of page to mysql format and compare
with end range if the traversal of row_search_mvcc() exceeds 100,
no ICP involved. If it is out of range then InnoDB can avoid the
whole table traversal. Need to refactor the code little bit to
make it compile.

Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
2017-05-17 14:53:28 +03:00
Marko Mäkelä
956d2540c4 Remove redundant UT_LIST_INIT() calls
The macro UT_LIST_INIT() zero-initializes the UT_LIST_NODE.
There is no need to call this macro on a buffer that has
already been zero-initialized by mem_zalloc() or mem_heap_zalloc()
or similar.

For some reason, the statement UT_LIST_INIT(srv_sys->tasks) in
srv_init() caused a SIGSEGV on server startup when compiling with
GCC 7.1.0 for AMD64 using -O3. The zero-initialization was attempted
by the instruction movaps %xmm0,0x50(%rax), while the proper offset
of srv_sys->tasks would seem to have been 0x48.
2017-05-17 10:33:49 +03:00
Vicențiu Ciorbaru
0af9818240 5.6.36 2017-05-15 17:17:16 +03:00
Marko Mäkelä
b9474b5d51 Merge 10.1 into 10.2 2017-05-12 13:29:24 +03:00
Marko Mäkelä
03dca7a333 Merge 10.0 into 10.1 2017-05-12 13:12:45 +03:00
Marko Mäkelä
ff16609374 MDEV-12674 Innodb_row_lock_current_waits has overflow
There is a race condition related to the variable
srv_stats.n_lock_wait_current_count, which is only
incremented and decremented by the function lock_wait_suspend_thread(),

The incrementing is protected by lock_sys->wait_mutex, but the
decrementing does not appear to be protected by anything.
This mismatch could allow the counter to be corrupted when a
transactional InnoDB table or record lock wait is terminating
roughly at the same time with the start of a wait on a
(possibly different) lock.

ib_counter_t: Remove some unused methods. Prevent instantiation for N=1.
Add an inc() method that takes a slot index as a parameter.

single_indexer_t: Remove.

simple_counter<typename Type, bool atomic=false>: A new counter wrapper.
Optionally use atomic memory operations for modifying the counter.
Aligned to the cache line size.

lsn_ctr_1_t, ulint_ctr_1_t, int64_ctr_1_t: Define as simple_counter<Type>.
These counters are either only incremented (and we do not care about
losing some increment operations), or the increment/decrement operations
are protected by some mutex.

srv_stats_t::os_log_pending_writes: Document that the number is protected
by log_sys->mutex.

srv_stats_t::n_lock_wait_current_count: Use simple_counter<ulint, true>,
that is, atomic inc() and dec() operations.

lock_wait_suspend_thread(): Release the mutexes before incrementing
the counters. Avoid acquiring the lock mutex if the lock wait has
already been resolved. Atomically increment and decrement
srv_stats.n_lock_wait_current_count.

row_insert_for_mysql(), row_update_for_mysql(),
row_update_cascade_for_mysql(): Use the inc() method with the trx->id
as the slot index. This is a non-functional change, just using
inc() instead of add(1).

buf_LRU_get_free_block(): Replace the method add(index, n) with inc().
There is no slot index in the simple_counter.
2017-05-12 12:24:53 +03:00
Marko Mäkelä
9d2c1d09aa MDEV-12253 post-push fix: buf_read_page_low() can return DB_ERROR
The function buf_read_page_low() invokes fil_io(), which can return
DB_ERROR when the requested page is out of bounds (such as when
restoring a buffer pool dump). The callers should be handling that.
2017-05-09 14:36:15 +03:00
Marko Mäkelä
3f5a8cbe12 MDEV-12253 post-push fix: buf_read_page_low() can return DB_ERROR
The function buf_read_page_low() invokes fil_io(), which can return
DB_ERROR when the requested page is out of bounds (such as when
restoring a buffer pool dump). The callers should be handling that.
2017-05-06 23:10:52 +03:00
Marko Mäkelä
14c6f00a9f Merge 10.1 into 10.2
Also, include fixes by Vladislav Vaintroub to the
aws_key_management plugin. The AWS C++ SDK specifically depends on
OPENSSL_LIBRARIES, not generic SSL_LIBRARIES (such as YaSSL).
2017-05-06 14:36:46 +03:00
Marko Mäkelä
f9cc391863 Merge 10.1 into 10.2
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.

TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):

2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
i=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
    #0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
    #1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
    #2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
    #3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
    #4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
    #5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
    #6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
2017-05-05 10:38:53 +03:00
Sergei Golubchik
0072d2e9a1 InnoDB cleanup: remove a bunch of #ifdef UNIV_INNOCHECKSUM
innochecksum uses global variables. great, let's use them all the
way down, instead of passing them as arguments to innodb internals,
conditionally modifying function prototypes with #ifdefs
2017-04-30 14:58:11 +02:00
Marko Mäkelä
b82c602db5 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects
(first, fil_space_detach(), then release fil_system->mutex, and finally
free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free().
Detach the tablespace from the fil_system cache, release the
fil_system->mutex, and then wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread.
During the wait, future calls to fil_space_acquire_for_io() will
not find this tablespace, and the count can only be decremented to 0,
at which point it is safe to free the objects.

fil_node_free_part1(), fil_node_free_part2(): Refactored from
fil_node_free().
2017-04-28 14:12:52 +03:00
Marko Mäkelä
4b24467ff3 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

fil_space_free_low(): Wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread. Future
calls to fil_space_acquire_for_io() will not find this tablespace,
because it will already have been detached from fil_system.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

FIXME: buf_page_check_corrupt() should take a tablespace from
fil_space_acquire_for_io() as a parameter. This will be done
in the 10.1 version of this patch and merged from there.
That depends on MDEV-12253, which has not been merged from 10.1 yet.
2017-04-28 12:23:35 +03:00
Marko Mäkelä
f740d23ce6 Merge 10.1 into 10.2 2017-04-28 12:22:32 +03:00
Marko Mäkelä
67e9c4cf6c Adapt the test case for Oracle Bug#25385590
buf_chunk_not_freed(), logs_empty_and_mark_files_at_shutdown():
Relax debug assertions when innodb_force_recovery=6
implies innodb_read_only.
2017-04-26 23:03:33 +03:00
Thirunarayanan Balathandayuthapani
88a84f49b3 Bug #25167032 CRASH WHEN ASSIGNING MY_ERRNO - MISSING MY_THREAD_INIT IN BACKGROUND THREAD
Description:
===========
Add my_thread_init() and my_thread_exit() for background threads which
initializes and frees the st_my_thread_var structure.

Reviewed-by: Jimmy Yang<jimmy.yang@oracle.com>
RB: 15003
2017-04-26 23:03:32 +03:00
rahul malik
dbe4c4e354 BUG#25330449 ASSERT SIZE==SPACE->SIZE DURING BUF_READ_AHEAD_RANDOM
Problem:

During read head, wrong page size is used to calcuate the tablespace size.

Fix:

Use physical page size to calculate tablespace size

Reveiwed-By: Satya Bodapati
RB: 14993
2017-04-26 23:03:32 +03:00
Darshan M N
16ed1f9c31 BUG#25053705 INVALID I/O ON TABLE AFTER TRUNCATE
Issue:
======
The issue is that if a fts index is present in a table the space size is
incorrectly calculated in the case of truncate which results in a invalid
read.

Fix:
====
Have a different space size calculation in truncate if fts indexes are
present.

RB:14755
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
2017-04-26 23:03:30 +03:00
Knut Anders Hatlen
9df0426103 Bug#25048573: STD::MAP INSTANTIATIONS CAUSE STATIC ASSERT FAILURES ON FREEBSD 11
Problem: Some instantiations of std::map have discrepancies between
the value_type of the map and the value_type of the map's allocator.
On FreeBSD 11 this is detected by Clang, and an error is raised at
compilation time.

Fix: Specify the correct value_type for the allocators.

Also fix an unused variable warning in storage/innobase/os/os0file.cc.
2017-04-26 23:03:29 +03:00
Marko Mäkelä
da76c1bd3e Minor cleanup 2017-04-26 23:03:28 +03:00
Marko Mäkelä
472b5f0d1f Follow-up to Bug#24346574 PAGE CLEANER THREAD, ASSERT BLOCK->N_POINTERS == 0
Silence the Valgrind warnings on instrumented builds (-DWITH_VALGRIND).

assert_block_ahi_empty_on_init(): A variant of
assert_block_ahi_empty() that declares n_pointers initialized and then
asserts that n_pointers==0.

In Valgrind-instrumented builds, InnoDB declares allocated memory
uninitialized.
2017-04-26 23:03:27 +03:00
Marko Mäkelä
e63ead68bf Bug#24346574 PAGE CLEANER THREAD, ASSERT BLOCK->N_POINTERS == 0
btr_search_drop_page_hash_index(): Do not return before ensuring
that block->index=NULL, even if !btr_search_enabled. We would
typically still skip acquiring the AHI latch when the AHI is
disabled, because block->index would already be NULL. Only if the AHI
is in the process of being disabled, we would wait for the AHI latch
and then notice that block->index=NULL and return.

The above bug was a regression caused in MySQL 5.7.9 by the fix of
Bug#21407023: DISABLING AHI SHOULD AVOID TAKING AHI LATCH

The rest of this patch improves diagnostics by adding assertions.

assert_block_ahi_valid(): A debug predicate for checking that
block->n_pointers!=0 implies block->index!=NULL.

assert_block_ahi_empty(): A debug predicate for checking that
block->n_pointers==0.

buf_block_init(): Instead of assigning block->n_pointers=0,
assert_block_ahi_empty(block).

buf_pool_clear_hash_index(): Clarify comments, and assign
block->n_pointers=0 before assigning block->index=NULL.
The wrong ordering could make block->n_pointers appear incorrect in
debug assertions. This bug was introduced in MySQL 5.1.52 by
Bug#13006367 62487: INNODB TAKES 3 MINUTES TO CLEAN UP THE
ADAPTIVE HASH INDEX AT SHUTDOWN

i_s_innodb_buffer_page_get_info(): Add a comment that
the IS_HASHED column in the INFORMATION_SCHEMA views
INNODB_BUFFER_POOL_PAGE and INNODB_BUFFER_PAGE_LRU may
show false positives (there may be no pointers after all.)

ha_insert_for_fold_func(), ha_delete_hash_node(),
ha_search_and_update_if_found_func(): Use atomics for
updating buf_block_t::n_pointers. While buf_block_t::index is
always protected by btr_search_x_lock(index), in
ha_insert_for_fold_func() the n_pointers-- may belong to
another dict_index_t whose btr_search_latches[] we are not holding.

RB: 13879
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
2017-04-26 23:03:27 +03:00
Jan Lindström
765a43605a MDEV-12253: Buffer pool blocks are accessed after they have been freed
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.

This patch should also address following test failures and
bugs:

MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.

MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot

MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot

MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.

Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.

btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.

buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.

buf_page_io_complete(): use dberr_t for error handling.

buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
        Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()

dict_stats_update_transient_for_index(),
dict_stats_update_transient()
        Do not continue if table decryption failed or table
        is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.

fil_parse_write_crypt_data():
        Check that key read from redo log entry is found from
        encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2.  Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.

Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.

recv_apply_hashed_log_recs() does not return anything.
2017-04-26 15:19:16 +03:00
Marko Mäkelä
0871a00a62 MDEV-12545 Reduce the amount of fil_space_t lookups
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed.

fil_decompress_page(): Remove unsafe use of fil_space_get_by_id().
2017-04-21 18:12:10 +03:00
Marko Mäkelä
47141c9d9b Fix some InnoDB type mismatch
On 64-bit Windows, sizeof(ulint)!=sizeof(ulong).
2017-04-21 18:04:02 +03:00
Marko Mäkelä
5684aa220c MDEV-12488 Remove type mismatch in InnoDB printf-like calls
Alias the InnoDB ulint and lint data types to size_t and ssize_t,
which are the standard names for the machine-word-width data types.

Correspondingly, define ULINTPF as "%zu" and introduce ULINTPFx as "%zx".
In this way, better compiler warnings for type mismatch are possible.

Furthermore, use PRIu64 for that 64-bit format, and define
the feature macro __STDC_FORMAT_MACROS to enable it on Red Hat systems.

Fix some errors in error messages, and replace some error messages
with assertions.
Most notably, an IMPORT TABLESPACE error message in InnoDB was
displaying the number of columns instead of the mismatching flags.
2017-04-21 18:03:15 +03:00