Commit graph

237 commits

Author SHA1 Message Date
Marko Mäkelä
13a350ac29 Merge 10.0 into 10.1 2017-05-19 12:29:37 +03:00
Marko Mäkelä
956d2540c4 Remove redundant UT_LIST_INIT() calls
The macro UT_LIST_INIT() zero-initializes the UT_LIST_NODE.
There is no need to call this macro on a buffer that has
already been zero-initialized by mem_zalloc() or mem_heap_zalloc()
or similar.

For some reason, the statement UT_LIST_INIT(srv_sys->tasks) in
srv_init() caused a SIGSEGV on server startup when compiling with
GCC 7.1.0 for AMD64 using -O3. The zero-initialization was attempted
by the instruction movaps %xmm0,0x50(%rax), while the proper offset
of srv_sys->tasks would seem to have been 0x48.
2017-05-17 10:33:49 +03:00
Marko Mäkelä
03dca7a333 Merge 10.0 into 10.1 2017-05-12 13:12:45 +03:00
Marko Mäkelä
ff16609374 MDEV-12674 Innodb_row_lock_current_waits has overflow
There is a race condition related to the variable
srv_stats.n_lock_wait_current_count, which is only
incremented and decremented by the function lock_wait_suspend_thread(),

The incrementing is protected by lock_sys->wait_mutex, but the
decrementing does not appear to be protected by anything.
This mismatch could allow the counter to be corrupted when a
transactional InnoDB table or record lock wait is terminating
roughly at the same time with the start of a wait on a
(possibly different) lock.

ib_counter_t: Remove some unused methods. Prevent instantiation for N=1.
Add an inc() method that takes a slot index as a parameter.

single_indexer_t: Remove.

simple_counter<typename Type, bool atomic=false>: A new counter wrapper.
Optionally use atomic memory operations for modifying the counter.
Aligned to the cache line size.

lsn_ctr_1_t, ulint_ctr_1_t, int64_ctr_1_t: Define as simple_counter<Type>.
These counters are either only incremented (and we do not care about
losing some increment operations), or the increment/decrement operations
are protected by some mutex.

srv_stats_t::os_log_pending_writes: Document that the number is protected
by log_sys->mutex.

srv_stats_t::n_lock_wait_current_count: Use simple_counter<ulint, true>,
that is, atomic inc() and dec() operations.

lock_wait_suspend_thread(): Release the mutexes before incrementing
the counters. Avoid acquiring the lock mutex if the lock wait has
already been resolved. Atomically increment and decrement
srv_stats.n_lock_wait_current_count.

row_insert_for_mysql(), row_update_for_mysql(),
row_update_cascade_for_mysql(): Use the inc() method with the trx->id
as the slot index. This is a non-functional change, just using
inc() instead of add(1).

buf_LRU_get_free_block(): Replace the method add(index, n) with inc().
There is no slot index in the simple_counter.
2017-05-12 12:24:53 +03:00
Marko Mäkelä
9d2c1d09aa MDEV-12253 post-push fix: buf_read_page_low() can return DB_ERROR
The function buf_read_page_low() invokes fil_io(), which can return
DB_ERROR when the requested page is out of bounds (such as when
restoring a buffer pool dump). The callers should be handling that.
2017-05-09 14:36:15 +03:00
Marko Mäkelä
b82c602db5 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects
(first, fil_space_detach(), then release fil_system->mutex, and finally
free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free().
Detach the tablespace from the fil_system cache, release the
fil_system->mutex, and then wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread.
During the wait, future calls to fil_space_acquire_for_io() will
not find this tablespace, and the count can only be decremented to 0,
at which point it is safe to free the objects.

fil_node_free_part1(), fil_node_free_part2(): Refactored from
fil_node_free().
2017-04-28 14:12:52 +03:00
Vladislav Vaintroub
9c4b7cad27 MDEV-9566 Prepare xtradb for xtrabackup
These changes are comparable to Percona's modifications in innodb in the
Percona Xtrabackup repository.

- If functions are used in backup as well as in innodb, make them non-static.

- Define IS_XTRABACKUP() macro for special handling of innodb running
  inside backup.

- Extend some functions for backup.
  fil_space_for_table_exists_in_mem() gets additional parameter
  'remove_from_data_dict_if_does_not_exist', for partial backups

  fil_load_single_table_tablespaces() gets an optional parameter predicate
  which tells whether to load tablespace based on database or table name,
  also for partial backups.

  srv_undo_tablespaces_init() gets an optional parameter 'backup_mode'

- Allow single redo log file (for backup "prepare")

- Do not read doublewrite buffer pages in backup, they are outdated

- Add function fil_remove_invalid_table_from_data_dict(), to remove non-existing
   tables from data dictionary in case of partial backups.

- On Windows, fix file share modes when opening tablespaces,
to allow mariabackup to read tablespaces while server is online.

- Avoid access to THDVARs in backup, because innodb plugin is not loaded,
and THDVAR would crash in this case.
2017-04-27 19:12:39 +02:00
Jan Lindström
765a43605a MDEV-12253: Buffer pool blocks are accessed after they have been freed
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.

This patch should also address following test failures and
bugs:

MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.

MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot

MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot

MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.

Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.

btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.

buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.

buf_page_io_complete(): use dberr_t for error handling.

buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
        Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()

dict_stats_update_transient_for_index(),
dict_stats_update_transient()
        Do not continue if table decryption failed or table
        is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.

fil_parse_write_crypt_data():
        Check that key read from redo log entry is found from
        encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2.  Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.

Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.

recv_apply_hashed_log_recs() does not return anything.
2017-04-26 15:19:16 +03:00
Marko Mäkelä
996c7d5cb5 MDEV-12545 Reduce the amount of fil_space_t lookups
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed later.
2017-04-21 17:47:23 +03:00
Marko Mäkelä
aafaf05a47 Fix some InnoDB type mismatch introduced in 10.1
On 64-bit Windows, sizeof(ulint)!=sizeof(ulong).
2017-04-21 17:43:35 +03:00
Marko Mäkelä
7445ff84f4 Merge 10.0 into 10.1 2017-04-21 17:41:40 +03:00
Marko Mäkelä
e056d1f1ca Fix some InnoDB type mismatch
On 64-bit Windows, sizeof(ulint)!=sizeof(ulong).
2017-04-21 17:39:12 +03:00
Marko Mäkelä
8c38147cdd Merge 10.0 into 10.1 2017-04-21 12:46:12 +03:00
Marko Mäkelä
d34a67b067 MDEV-12534 Use atomic operations whenever available
Allow 64-bit atomic operations on 32-bit systems,
only relying on HAVE_ATOMIC_BUILTINS_64, disregarding
the width of the register file.

Define UNIV_WORD_SIZE correctly on all systems, including Windows.
In MariaDB 10.0 and 10.1, it was incorrectly defined as 4 on
64-bit Windows.

Define HAVE_ATOMIC_BUILTINS_64 on Windows
(64-bit atomics are available on both 32-bit and 64-bit Windows
platforms; the operations were unnecessarily disabled even on
64-bit Windows).

MONITOR_OS_PENDING_READS, MONITOR_OS_PENDING_WRITES: Enable by default.

os_file_n_pending_preads, os_file_n_pending_pwrites,
os_n_pending_reads, os_n_pending_writes: Remove.
Use the monitor counters instead.

os_file_count_mutex: Remove. On a system that does not support
64-bit atomics, monitor_mutex will be used instead.
2017-04-20 16:29:12 +03:00
Marko Mäkelä
9505c96839 MDEV-12428 SIGSEGV in buf_page_decrypt_after_read() during DDL
Also, some MDEV-11738/MDEV-11581 post-push fixes.

In MariaDB 10.1, there is no fil_space_t::is_being_truncated field,
and the predicates fil_space_t::stop_new_ops and fil_space_t::is_stopping()
are interchangeable. I requested the fil_space_t::is_stopping() to be added
in the review, but some added checks for fil_space_t::stop_new_ops were
not replaced with calls to fil_space_t::is_stopping().

buf_page_decrypt_after_read(): In this low-level I/O operation, we must
look up the tablespace if it exists, even though future I/O operations
have been blocked on it due to a pending DDL operation, such as DROP TABLE
or TRUNCATE TABLE or other table-rebuilding operations (ALTER, OPTIMIZE).
Pass a parameter to fil_space_acquire_low() telling that we are performing
a low-level I/O operation and the fil_space_t::is_stopping() status should
be ignored.
2017-04-03 22:09:28 +03:00
Jan Lindström
50eb40a2a8 MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing
MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.

This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).

fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()

fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.

btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
    Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read():
    Use fil_space_get_crypt_data() because at this point
    we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.

check_table_options()
	Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*

i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.
2017-03-14 16:23:10 +02:00
Marko Mäkelä
0094b6581d Merge 10.0 into 10.1 2017-03-10 15:16:13 +02:00
Marko Mäkelä
1b2b209519 Use correct integer format with printf-like functions. 2017-03-09 11:28:07 +02:00
Marko Mäkelä
498f4a825b Fix InnoDB/XtraDB compilation warnings on 32-bit builds. 2017-03-09 08:54:07 +02:00
Marko Mäkelä
ad0c218a44 Merge 10.0 into 10.1
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:

recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).

Report progress also via systemd using sd_notifyf().
2017-03-09 08:53:08 +02:00
Vicențiu Ciorbaru
83da1a1e57 Merge branch 'merge-xtradb-5.6' into 10.0 2017-03-05 00:59:57 +02:00
Vicențiu Ciorbaru
8d69ce7b82 5.6.35-80.0 2017-03-04 20:50:02 +02:00
Vicențiu Ciorbaru
1acfa942ed Merge branch '5.5' into 10.0 2017-03-03 01:37:54 +02:00
Marko Mäkelä
2bfe83adec Remove a bogus Valgrind "suppression".
fsp_init_file_page_low() does initialize all pages nowadays,
even those in the InnoDB system tablespace.
2017-02-21 16:45:03 +02:00
Marko Mäkelä
3c47ed4849 Merge 10.0 into 10.1 2017-02-20 14:02:40 +02:00
Marko Mäkelä
13493078e9 MDEV-11802 innodb.innodb_bug14676111 fails
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.

It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.

For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.

os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.

os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
2017-02-20 12:20:52 +02:00
Jan Lindström
108b211ee2 Fix gcc 6.3.x compiler warnings.
These are caused by fact that functions are declared with
__attribute__((nonnull)) or left shit like ~0 << macro
when ~0U << macro should be used.
2017-02-16 12:02:31 +02:00
Jan Lindström
de9963b786 After reivew fixes. 2017-02-10 17:41:35 +02:00
Jan Lindström
41cd80fe06 After review fixes. 2017-02-10 16:05:37 +02:00
Jan Lindström
0340067608 After review fixes for MDEV-11759.
buf_page_is_checksum_valid_crc32()
buf_page_is_checksum_valid_innodb()
buf_page_is_checksum_valid_none():
	Use ULINTPF instead of %lu and %u for ib_uint32_t

fil_space_verify_crypt_checksum():
	Check that page is really empty if checksum and
	LSN are zero.

fil_space_verify_crypt_checksum():
	Correct the comment to be more agurate.

buf0buf.h:
	Remove unnecessary is_corrupt variable from
	buf_page_t structure.
2017-02-09 08:49:13 +02:00
Jan Lindström
e53dfb24be MDEV-11707: Fix incorrect memset() for structures containing
dynamic class GenericPolicy<TTASEventMutex<GenericPolicy> >'; vtable

Instead using mem_heap_alloc and memset, use mem_heap_zalloc
directly.
2017-02-06 15:40:17 +02:00
Jan Lindström
ddf2fac733 MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes
compatibility problems

Pages that are encrypted contain post encryption checksum on
different location that normal checksum fields. Therefore,
we should before decryption check this checksum to avoid
unencrypting corrupted pages. After decryption we can use
traditional checksum check to detect if page is corrupted
or unencryption was done using incorrect key.

Pages that are page compressed do not contain any checksum,
here we need to fist unencrypt, decompress and finally
use tradional checksum check to detect page corruption
or that we used incorrect key in unencryption.

buf0buf.cc: buf_page_is_corrupted() mofified so that
compressed pages are skipped.

buf0buf.h, buf_block_init(), buf_page_init_low():
removed unnecessary page_encrypted, page_compressed,
stored_checksum, valculated_checksum fields from
buf_page_t

buf_page_get_gen(): use new buf_page_check_corrupt() function
to detect corrupted pages.

buf_page_check_corrupt(): If page was not yet decrypted
check if post encryption checksum still matches.
If page is not anymore encrypted, use buf_page_is_corrupted()
traditional checksum method.

If page is detected as corrupted and it is not encrypted
we print corruption message to error log.
If page is still encrypted or it was encrypted and now
corrupted, we will print message that page is
encrypted to error log.

buf_page_io_complete(): use new buf_page_check_corrupt()
function to detect corrupted pages.

buf_page_decrypt_after_read(): Verify post encryption
checksum before tring to decrypt.

fil0crypt.cc: fil_encrypt_buf() verify post encryption
checksum and ind fil_space_decrypt() return true
if we really decrypted the page.

fil_space_verify_crypt_checksum(): rewrite to use
the method used when calculating post encryption
checksum. We also check if post encryption checksum
matches that traditional checksum check does not
match.

fil0fil.ic: Add missed page type encrypted and page
compressed to fil_get_page_type_name()

Note that this change does not yet fix innochecksum tool,
that will be done in separate MDEV.

Fix test failures caused by buf page corruption injection.
2017-02-06 15:40:16 +02:00
Jan Lindström
b7b4c332c0 MDEV-11614: Syslog messages: "InnoDB: Log sequence number
at the start 759654123 and the end 0 do not match."

For page compressed and encrypted tables log sequence
number at end is not stored, thus disable this message
for them.
2017-01-22 08:46:15 +02:00
Jan Lindström
dc557ca817 MDEV-11835: InnoDB: Failing assertion: free_slot != NULL on
restarting server with encryption and read-only

buf0buf.cc: Temporary slots used in encryption was calculated
by read_threads * write_threads. However, in read-only mode
write_threads is zero. Correct way is to calculate
(read_threads + write_threads) * max pending IO requests.
2017-01-19 08:19:08 +02:00
Marko Mäkelä
ab1e6fefd8 MDEV-11623 MariaDB 10.1 fails to start datadir created with
MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K

The storage format of FSP_SPACE_FLAGS was accidentally broken
already in MariaDB 10.1.0. This fix is bringing the format in
line with other MySQL and MariaDB release series.

Please refer to the comments that were added to fsp0fsp.h
for details.

This is an INCOMPATIBLE CHANGE that affects users of
page_compression and non-default innodb_page_size. Upgrading
to this release will correct the flags in the data files.
If you want to downgrade to earlier MariaDB 10.1.x, please refer
to the test innodb.101_compatibility how to reset the
FSP_SPACE_FLAGS in the files.

NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret
uncompressed data files with innodb_page_size=4k or 64k as
compressed innodb_page_size=16k files, and then probably fail
when trying to access the pages. See the comments in the
function fsp_flags_convert_from_101() for detailed analysis.

Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16.
In this way, compressed innodb_page_size=16k tablespaces will not
be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20.

Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the
dict_table_t::flags when the table is available, in
fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace().
During crash recovery, fil_load_single_table_tablespace() will use
innodb_compression_level for the PAGE_COMPRESSION_LEVEL.

FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags
that are not to be written to FSP_SPACE_FLAGS. Currently, these will
include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR.

Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support
one innodb_page_size for the whole instance.

When creating a dummy tablespace for the redo log, use
fil_space_t::flags=0. The flags are never written to the redo log files.

Remove many FSP_FLAGS_SET_ macros.

dict_tf_verify_flags(): Remove. This is basically only duplicating
the logic of dict_tf_to_fsp_flags(), used in a debug assertion.

fil_space_t::mark: Remove. This flag was not used for anything.

fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter
mark_space, and add a parameter for table flags. Check that
fil_space_t::flags match the table flags, and adjust the (memory-only)
flags based on the table flags.

fil_node_open_file(): Remove some redundant or unreachable conditions,
do not use stderr for output, and avoid unnecessary server aborts.

fil_user_tablespace_restore_page(): Convert the flags, so that the
correct page_size will be used when restoring a page from the
doublewrite buffer.

fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove.
It suffices to have fil_space_is_page_compressed().

FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL,
FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not
exist in the FSP_SPACE_FLAGS but only in memory.

fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS
in page 0. Called by fil_open_single_table_tablespace(),
fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql()
except if --innodb-read-only is active.

fsp_flags_is_valid(ulint): Reimplement from the scratch, with
accurate comments. Do not display any details of detected
inconsistencies, because the output could be confusing when
dealing with MariaDB 10.1.x data files.

fsp_flags_convert_from_101(ulint): Convert flags from buggy
MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags
cannot be in MariaDB 10.1.x format.

fsp_flags_match(): Check the flags when probing files.
Implemented based on fsp_flags_is_valid()
and fsp_flags_convert_from_101().

dict_check_tablespaces_and_store_max_id(): Do not access the
page after committing the mini-transaction.

IMPORT TABLESPACE fixes:

AbstractCallback::init(): Convert the flags.

FetchIndexRootPages::operator(): Check that the tablespace flags match the
table flags. Do not attempt to convert tablespace flags to table flags,
because the conversion would necessarily be lossy.

PageConverter::update_header(): Write back the correct flags.
This takes care of the flags in IMPORT TABLESPACE.
2017-01-15 19:05:50 +02:00
Marko Mäkelä
a9d00db155 MDEV-11799 InnoDB can abort if the doublewrite buffer
contains a bad and a good copy

Clean up the InnoDB doublewrite buffer code.

buf_dblwr_init_or_load_pages(): Do not add empty pages to the buffer.

buf_dblwr_process(): Do consider changes to pages that are all zero.
Do not abort when finding a corrupted copy of a page in the doublewrite
buffer, because there could be multiple copies in the doublewrite buffer,
and only one of them needs to be good.
2017-01-15 18:56:56 +02:00
Marko Mäkelä
fb5ee7d6d0 Plug a memory leak in buf_dblwr_process(). 2017-01-05 19:01:14 +02:00
Marko Mäkelä
719321e78e MDEV-11638 Encryption causes race conditions in InnoDB shutdown
InnoDB shutdown failed to properly take fil_crypt_thread() into account.
The encryption threads were signalled to shut down together with other
non-critical tasks. This could be much too early in case of slow shutdown,
which could need minutes to complete the purge. Furthermore, InnoDB
failed to wait for the fil_crypt_thread() to actually exit before
proceeding to the final steps of shutdown, causing the race conditions.

Furthermore, the log_scrub_thread() was shut down way too early.
Also it should remain until the SRV_SHUTDOWN_FLUSH_PHASE.

fil_crypt_threads_end(): Remove. This would cause the threads to
be terminated way too early.

srv_buf_dump_thread_active, srv_dict_stats_thread_active,
lock_sys->timeout_thread_active, log_scrub_thread_active,
srv_monitor_active, srv_error_monitor_active: Remove a race condition
between startup and shutdown, by setting these in the startup thread
that creates threads, not in each created thread. In this way, once the
flag is cleared, it will remain cleared during shutdown.

srv_n_fil_crypt_threads_started, fil_crypt_threads_event: Declare in
global rather than static scope.

log_scrub_event, srv_log_scrub_thread_active, log_scrub_thread():
Declare in static rather than global scope. Let these be created by
log_init() and freed by log_shutdown().

rotate_thread_t::should_shutdown(): Do not shut down before the
SRV_SHUTDOWN_FLUSH_PHASE.

srv_any_background_threads_are_active(): Remove. These checks now
exist in logs_empty_and_mark_files_at_shutdown().

logs_empty_and_mark_files_at_shutdown(): Shut down the threads in
the proper order. Keep fil_crypt_thread() and log_scrub_thread() alive
until SRV_SHUTDOWN_FLUSH_PHASE, and check that they actually terminate.
2017-01-05 00:20:06 +02:00
Marko Mäkelä
d50cf42bc0 MDEV-9282 Debian: the Lintian complains about "shlib-calls-exit" in ha_innodb.so
Replace all exit() calls in InnoDB with abort() [possibly via ut_a()].
Calling exit() in a multi-threaded program is problematic also for
the reason that other threads could see corrupted data structures
while some data structures are being cleaned up by atexit() handlers
or similar.

In the long term, all these calls should be replaced with something
that returns an error all the way up the call stack.
2016-12-28 15:54:24 +02:00
Jan Lindström
72cc73cea2 MDEV-10368: get_latest_version() called too often
Reduce the number of calls to encryption_get_key_get_latest_version
when doing key rotation with two different methods:

(1) We need to fetch key information when tablespace not yet
have a encryption information, invalid keys are handled now
differently (see below). There was extra call to detect
if key_id is not found on key rotation.

(2) If key_id is not found from encryption plugin, do not
try fetching new key_version for it as it will fail anyway.
We store return value from encryption_get_key_get_latest_version
call and if it returns ENCRYPTION_KEY_VERSION_INVALID there
is no need to call it again.
2016-12-13 11:51:33 +02:00
Marko Mäkelä
a68d1352b6 MDEV-11349 (2/2) Fix some bogus-looking Valgrind warnings
buf_block_init(): Initialize buf_page_t::flush_type.
For some reason, Valgrind 3.12.0 would seem to flag some
bits in adjacent bitfields as uninitialized, even though only
the two bits of flush_type were left uninitialized. Initialize
the field to get rid of many warnings.

buf_page_init_low(): Initialize buf_page_t::old.
For some reason, Valgrind 3.12.0 would seem to flag all 32
bits uninitialized when buf_page_init_for_read() invokes
buf_LRU_add_block(bpage, TRUE). This would trigger bogus warnings
for buf_page_t::freed_page_clock being uninitialized.
(The V-bits would later claim that only "old" is initialized
in the 32-bit word.) Perhaps recent compilers
(GCC 6.2.1 and clang 4.0.0) generate more optimized x86_64 code
for bitfield operations, confusing Valgrind?

mach_write_to_1(), mach_write_to_2(), mach_write_to_3():
Rewrite the assertions that ensure that the most significant
bits are zero. Apparently, clang 4.0.0 would optimize expressions
of the form ((n | 0xFF) <= 0x100) to (n <= 0x100). The redundant
0xFF was added in the first place in order to suppress a
Valgrind warning. (Valgrind would warn about comparing uninitialized
values even in the case when the uninitialized bits do not affect
the result of the comparison.)
2016-11-25 12:34:02 +02:00
Marko Mäkelä
8da33e3a86 MDEV-11349 (1/2) Fix some clang 4.0 warnings
In InnoDB and XtraDB functions that declare pointer parameters as nonnull,
remove nullness checks, because GCC would optimize them away anyway.

Use #ifdef instead of #if when checking for a configuration flag.

Clang says that left shifts of negative values are undefined.
So, use ~0U instead of ~0 in a number of macros.

Some functions that were defined as UNIV_INLINE were declared as
UNIV_INTERN. Consistently use the same type of linkage.

ibuf_merge_or_delete_for_page() could pass bitmap_page=NULL to
buf_page_print(), conflicting with the __attribute__((nonnull)).
2016-11-25 09:09:51 +02:00
Sergei Golubchik
a98c85bb50 Merge branch '10.0-galera' into 10.1 2016-11-02 13:44:07 +01:00
Jan Lindström
554c60ab0d MDEV-11182: InnoDB: Assertion failure in file buf0buf.cc line 4730 (encryption.create_or_replace fails in buildbot and outside)
Analysis: Problem is that page is encrypted but encryption information
on page 0 has already being changed.

Fix: If page header contains key_version != 0 and even if based on
current encryption information tablespace is not encrypted we
need to check is page corrupted. If it is not, then we know that
page is not encrypted. If page is corrupted, we need to try to
decrypt it and then compare the stored and calculated checksums
to see is page corrupted or not.
2016-10-31 12:44:06 +02:00
Jan Lindström
885577fb10 MDEV-11004: Unable to start (Segfault or os error 2) when encryption key missing
Two problems:

(1) When pushing warning to sql-layer we need to check that thd != NULL
to avoid NULL-pointer reference.

(2) At tablespace key rotation if used key_id is not found from
encryption plugin tablespace should not be rotated.
2016-10-29 10:09:06 +03:00
Jan Lindström
bc323727de MDEV-10977: [ERROR] InnoDB: Block in space_id 0 in file ibdata1 encrypted.
MDEV-10394: Innodb system table space corrupted

Analysis: After we have read the page in buf_page_io_complete try to
find if the page is encrypted or corrupted. Encryption was determined
by reading FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION field from FIL-header
as a key_version. However, this field is not always zero even when
encryption is not used. Thus, incorrect key_version could lead situation where
decryption is tried to page that is not encrypted.

Fix: We still read key_version information from FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
field but also check if tablespace has encryption information before trying
encrypt the page.
2016-10-29 10:09:06 +03:00
Sergei Golubchik
675f27b382 Merge branch 'merge/merge-xtradb-5.6' into 10.0
commented out the "compressed columns" feature
2016-10-25 18:28:31 +02:00
Sergei Golubchik
d7dc03a267 5.6.33-79.0 2016-10-25 17:01:37 +02:00
Hyeonseok Oh
9401e6befd Remove unnecessary semicolons 2016-10-24 14:58:41 +09:00
Sergei Golubchik
66d9696596 Merge branch '10.0' into 10.1 2016-09-28 17:55:28 +02:00