Commit graph

162 commits

Author SHA1 Message Date
Jan Lindström
50eb40a2a8 MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing
MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.

This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).

fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()

fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.

btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
    Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read():
    Use fil_space_get_crypt_data() because at this point
    we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.

check_table_options()
	Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*

i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.
2017-03-14 16:23:10 +02:00
Marko Mäkelä
9dc10d5851 Merge 10.0 into 10.1 2017-03-13 19:17:34 +02:00
Sergei Golubchik
b6a1d6538b compiler warnings 2017-03-10 18:21:22 +01:00
Marko Mäkelä
032678ad18 MDEV-12091 Shutdown fails to wait for rollback of recovered transactions to finish
In the 10.1 InnoDB Plugin, a call os_event_free(buf_flush_event) was
misplaced. The event could be signalled by rollback of resurrected
transactions while shutdown was in progress. This bug was caught
by cmake -DWITH_ASAN testing. This call was only present in the
10.1 InnoDB Plugin, not in other versions, or in XtraDB.

That said, the bug affects all InnoDB versions. Shutdown assumes the
cessation of any page-dirtying activity, including the activity of
the background rollback thread. InnoDB only waited for the background
rollback to finish as part of a slow shutdown (innodb_fast_shutdown=0).
The default is a clean shutdown (innodb_fast_shutdown=1). In a scenario
where InnoDB is killed, restarted, and shut down soon enough, the data
files could become corrupted.

logs_empty_and_mark_files_at_shutdown(): Wait for the
rollback to finish, except if innodb_fast_shutdown=2
(crash-like shutdown) was requested.

trx_rollback_or_clean_recovered(): Before choosing the next
recovered transaction to roll back, terminate early if non-slow
shutdown was initiated. Roll back everything on slow shutdown
(innodb_fast_shutdown=0).

srv_innodb_monitor_mutex: Declare as static, because the mutex
is only used within one module.

After each call to os_event_free(), ensure that the freed event
is not reachable via global variables, by setting the relevant
variables to NULL.
2017-03-10 18:54:29 +02:00
Marko Mäkelä
498f4a825b Fix InnoDB/XtraDB compilation warnings on 32-bit builds. 2017-03-09 08:54:07 +02:00
Marko Mäkelä
ad0c218a44 Merge 10.0 into 10.1
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:

recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).

Report progress also via systemd using sd_notifyf().
2017-03-09 08:53:08 +02:00
Marko Mäkelä
47396ddea9 Merge 5.5 into 10.0
Also, implement MDEV-11027 a little differently from 5.5:

recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.

recv_apply_hashed_log_recs(): Rename the parameter to last_batch.
2017-03-08 11:40:43 +02:00
Marko Mäkelä
9c47beb8bd MDEV-11027 InnoDB log recovery is too noisy
Provide more useful progress reporting of crash recovery.

recv_sys_t::progress_time: The time of the last report.

recv_scan_print_counter: Remove.

log_group_read_log_seg(): After after each I/O request,
report progress if needed.

recv_apply_hashed_log_recs(): At the start of each batch,
if there are pages to be recovered, issue a message.
2017-03-08 10:07:50 +02:00
Vicențiu Ciorbaru
83da1a1e57 Merge branch 'merge-xtradb-5.6' into 10.0 2017-03-05 00:59:57 +02:00
Vicențiu Ciorbaru
8d69ce7b82 5.6.35-80.0 2017-03-04 20:50:02 +02:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Marko Mäkelä
fc673a2c12 MDEV-12127 InnoDB: Assertion failure loop_count < 5 in file log0log.cc
As suggested in MySQL Bug#58536, increase the limit in this
debug assertion in order to avoid false positives on heavily
loaded systems.
2017-02-28 09:54:12 +02:00
Marko Mäkelä
13493078e9 MDEV-11802 innodb.innodb_bug14676111 fails
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.

It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.

For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.

os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.

os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
2017-02-20 12:20:52 +02:00
Marko Mäkelä
9017a05d87 Merge 10.0 into 10.1 2017-02-08 17:30:25 +02:00
Marko Mäkelä
d831e4c22a MDEV-12024 InnoDB startup fails to wait for recv_writer_thread to finish
recv_writer_thread(): Do not assign recv_writer_thread_active=true
in order to avoid a race condition with
recv_recovery_from_checkpoint_finish().

recv_init_crash_recovery(): Assign recv_writer_thread_active=true
before creating recv_writer_thread.
2017-02-08 17:23:13 +02:00
Marko Mäkelä
2e67e66c3a Merge 10.0 into 10.1 2017-02-08 08:53:34 +02:00
Marko Mäkelä
20e8347447 MDEV-11985 Make innodb_read_only shutdown more robust
If InnoDB is started in innodb_read_only mode such that
recovered incomplete transactions exist at startup
(but the redo logs are clean), an assertion will fail at shutdown,
because there would exist some non-prepared transactions.

logs_empty_and_mark_files_at_shutdown(): Do not wait for incomplete
transactions to finish if innodb_read_only or innodb_force_recovery>=3.
Wait for purge to finish in only one place.

trx_sys_close(): Relax the assertion that would fail first.

trx_free_prepared(): Also free recovered TRX_STATE_ACTIVE transactions
if innodb_read_only or innodb_force_recovery>=3.
2017-02-04 17:33:19 +02:00
Marko Mäkelä
f1f8ebc325 Merge 10.0 into 10.1 2017-01-26 23:40:11 +02:00
Marko Mäkelä
afb461587c MDEV-11915 Detect InnoDB system tablespace size mismatch early
InnoDB would refuse to start up if there is a mismatch on
the size of the system tablespace files. However, before this
check is conducted, the system tablespace may already have been
heavily modified.

InnoDB should perform the size check as early as possible.

recv_recovery_from_checkpoint_finish():
Move the recv_apply_hashed_log_recs() call to
innobase_start_or_create_for_mysql().

innobase_start_or_create_for_mysql(): Test the mutex functionality
before doing anything else. Use a compile_time_assert() for a
sizeof() constraint. Check the size of the system tablespace as
early as possible.
2017-01-26 23:10:36 +02:00
Marko Mäkelä
49fe9bad01 MDEV-11814 Refuse innodb_read_only startup if crash recovery is needed
recv_scan_log_recs(): Remember if redo log apply is needed,
even if starting up in innodb_read_only mode.

recv_recovery_from_checkpoint_start_func(): Refuse
innodb_read_only startup if redo log apply is needed.
2017-01-26 13:58:58 +02:00
Marko Mäkelä
719321e78e MDEV-11638 Encryption causes race conditions in InnoDB shutdown
InnoDB shutdown failed to properly take fil_crypt_thread() into account.
The encryption threads were signalled to shut down together with other
non-critical tasks. This could be much too early in case of slow shutdown,
which could need minutes to complete the purge. Furthermore, InnoDB
failed to wait for the fil_crypt_thread() to actually exit before
proceeding to the final steps of shutdown, causing the race conditions.

Furthermore, the log_scrub_thread() was shut down way too early.
Also it should remain until the SRV_SHUTDOWN_FLUSH_PHASE.

fil_crypt_threads_end(): Remove. This would cause the threads to
be terminated way too early.

srv_buf_dump_thread_active, srv_dict_stats_thread_active,
lock_sys->timeout_thread_active, log_scrub_thread_active,
srv_monitor_active, srv_error_monitor_active: Remove a race condition
between startup and shutdown, by setting these in the startup thread
that creates threads, not in each created thread. In this way, once the
flag is cleared, it will remain cleared during shutdown.

srv_n_fil_crypt_threads_started, fil_crypt_threads_event: Declare in
global rather than static scope.

log_scrub_event, srv_log_scrub_thread_active, log_scrub_thread():
Declare in static rather than global scope. Let these be created by
log_init() and freed by log_shutdown().

rotate_thread_t::should_shutdown(): Do not shut down before the
SRV_SHUTDOWN_FLUSH_PHASE.

srv_any_background_threads_are_active(): Remove. These checks now
exist in logs_empty_and_mark_files_at_shutdown().

logs_empty_and_mark_files_at_shutdown(): Shut down the threads in
the proper order. Keep fil_crypt_thread() and log_scrub_thread() alive
until SRV_SHUTDOWN_FLUSH_PHASE, and check that they actually terminate.
2017-01-05 00:20:06 +02:00
Marko Mäkelä
8451e09073 MDEV-11556 InnoDB redo log apply fails to adjust data file sizes
fil_space_t::recv_size: New member: recovered tablespace size in pages;
0 if no size change was read from the redo log,
or if the size change was implemented.

fil_space_set_recv_size(): New function for setting space->recv_size.

innodb_data_file_size_debug: A debug parameter for setting the system
tablespace size in recovery even when the redo log does not contain
any size changes. It is hard to write a small test case that would
cause the system tablespace to be extended at the critical moment.

recv_parse_log_rec(): Note those tablespaces whose size is being changed
by the redo log, by invoking fil_space_set_recv_size().

innobase_init(): Correct an error message, and do not require a larger
innodb_buffer_pool_size when starting up with a smaller innodb_page_size.

innobase_start_or_create_for_mysql(): Allow startup with any initial
size of the ibdata1 file if the autoextend attribute is set. Require
the minimum size of fixed-size system tablespaces to be 640 pages,
not 10 megabytes. Implement innodb_data_file_size_debug.

open_or_create_data_files(): Round the system tablespace size down
to pages, not to full megabytes, (Our test truncates the system
tablespace to more than 800 pages with innodb_page_size=4k.
InnoDB should not imagine that it was truncated to 768 pages
and then overwrite good pages in the tablespace.)

fil_flush_low(): Refactored from fil_flush().

fil_space_extend_must_retry(): Refactored from
fil_extend_space_to_desired_size().

fil_mutex_enter_and_prepare_for_io(): Extend the tablespace if
fil_space_set_recv_size() was called.

The test case has been successfully run with all the
innodb_page_size values 4k, 8k, 16k, 32k, 64k.
2016-12-30 09:52:24 +02:00
Marko Mäkelä
d50cf42bc0 MDEV-9282 Debian: the Lintian complains about "shlib-calls-exit" in ha_innodb.so
Replace all exit() calls in InnoDB with abort() [possibly via ut_a()].
Calling exit() in a multi-threaded program is problematic also for
the reason that other threads could see corrupted data structures
while some data structures are being cleaned up by atexit() handlers
or similar.

In the long term, all these calls should be replaced with something
that returns an error all the way up the call stack.
2016-12-28 15:54:24 +02:00
Sergei Golubchik
a98c85bb50 Merge branch '10.0-galera' into 10.1 2016-11-02 13:44:07 +01:00
Sergei Golubchik
675f27b382 Merge branch 'merge/merge-xtradb-5.6' into 10.0
commented out the "compressed columns" feature
2016-10-25 18:28:31 +02:00
Sergei Golubchik
d7dc03a267 5.6.33-79.0 2016-10-25 17:01:37 +02:00
Vladislav Vaintroub
ee1d08c115 Revert "Prepare XtraDB to be used with xtrabackup."
This reverts commit de5646f1a9.
2016-10-23 00:10:37 +00:00
Vladislav Vaintroub
de5646f1a9 Prepare XtraDB to be used with xtrabackup.
The changes are deliberately kept minimal

- some functions are made global instead of static (they will be used in
xtrabackup later on)

- functions got additional parameter, deliberately unused for now :
  fil_load_single_tablespaces
  srv_undo_tablespaces_init

- Global variables added, also unused for now :
   srv_archive_recovery
   srv_archive_recovery_limit_lsn
   srv_apply_log_only
   srv_backup_mode
   srv_close_files

- To make xtrabackup link with sql.lib on Windows, added some missing
  source files to sql.lib

- Fixed os_thread_ret_t to be DWORD on Windows
2016-10-22 14:10:12 +00:00
Sergei Golubchik
e4957de4fd Merge branch 'merge-xtradb-5.5' into 5.5 2016-10-13 12:40:24 +02:00
Sergei Golubchik
6010a27c87 5.5.52-38.3 2016-10-13 12:23:16 +02:00
Sergei Golubchik
66d9696596 Merge branch '10.0' into 10.1 2016-09-28 17:55:28 +02:00
Sergei Golubchik
bb8b658954 Merge branch 'merge/merge-xtradb-5.6' into 10.0 2016-09-27 18:58:57 +02:00
Sergei Golubchik
93ab3093cb 5.6.32-78.1 2016-09-27 18:00:59 +02:00
Sergei Golubchik
6b1863b830 Merge branch '10.0' into 10.1 2016-08-25 12:40:09 +02:00
Sergei Golubchik
3863e72380 Merge branch 'merge/merge-xtradb-5.6' into 10.0
5.6.31-77.0
2016-08-10 19:55:45 +02:00
Sergei Golubchik
64752acf72 5.6.31-77.0 2016-08-10 19:24:58 +02:00
Sergei Golubchik
309c08c17c Merge branch '5.5' into 10.0 2016-08-10 19:19:05 +02:00
Sergei Golubchik
5265243cc4 Merge branch 'merge/merge-xtradb-5.5' into 5.5 2016-08-03 20:44:08 +02:00
Sergei Golubchik
e316c46f43 5.5.50-38.0 2016-08-03 20:43:29 +02:00
Sergei Golubchik
3361aee591 Merge branch '10.0' into 10.1 2016-06-28 22:01:55 +02:00
Sergei Golubchik
b3f4cf7c13 Merge branch 'merge-xtradb-5.6' into 0.0 2016-06-21 15:27:09 +02:00
Sergei Golubchik
b42664e85e 5.6.30-76.3 2016-06-21 14:20:09 +02:00
Sergei Golubchik
260699e91b Merge branch 'merge-xtradb-5.5' into 5.5 2016-06-14 13:59:41 +02:00
Sergei Golubchik
f54dcf1e87 5.5.49-37.9 2016-06-14 12:38:47 +02:00
Jan Lindström
c395aad668 MDEV-9840: Test encryption.innodb-log-encrypt-crash fails on buildbot
Problem: We created more than 5 encryption keys for redo-logs.
Idea was that we do not anymore create more than one encryption
key for redo-logs but if existing checkpoint from earlier
MariaDB contains more keys, we should read all of them.

Fix: Add new encryption key to memory structure only if there
currently has none or if we are reading checkpoint from the log.
Checkpoint from older MariaDB version could contain more than
one key.
2016-03-31 13:12:48 +03:00
Jan Lindström
37a65e3335 MDEV-9793: getting mysqld crypto key from key version failed
Make sure that we read all possible encryption keys from checkpoint
and if log block checksum does not match, print all found
checkpoint encryption keys.
2016-03-30 16:09:47 +03:00
Jan Lindström
f448a800e1 MDEV-9422: Checksum errors on restart when killing busy instance that uses encrypted XtraDB tables
Analysis:

-- InnoDB has n (>0) redo-log files.
-- In the first page of redo-log there is 2 checkpoint records on fixed location (checkpoint is not encrypted)
-- On every checkpoint record there is up to 5 crypt_keys containing the keys used for encryption/decryption
-- On crash recovery we read all checkpoints on every file
-- Recovery starts by reading from the latest checkpoint forward
-- Problem is that latest checkpoint might not always contain the key we need to decrypt all the
   redo-log blocks (see MDEV-9422 for one example)
-- Furthermore, there is no way to identify is the log block corrupted or encrypted

For example checkpoint can contain following keys :

write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ]

so over time we could have a checkpoint

write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ]

killall -9 mysqld causes crash recovery and on crash recovery we read as
many checkpoints as there is log files, e.g.

read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]

This is problematic, as we could still scan log blocks e.g. from checkpoint 4 and we do
not know anymore the correct key.

CRYPT INFO: for checkpoint 14 search 4
CRYPT INFO: for checkpoint 13 search 4
CRYPT INFO: for checkpoint 12 search 4
CRYPT INFO: for checkpoint 11 search 4
CRYPT INFO: for checkpoint 10 search 4
CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND)

For every checkpoint, code generated a new encrypted key based on key
from encryption plugin and random numbers. Only random numbers are
stored on checkpoint.

Fix: Generate only one key for every log file. If checkpoint contains only
one key, use that key to encrypt/decrypt all log blocks. If checkpoint
contains more than one key (this is case for databases created
using MariaDB server version 10.1.0 - 10.1.12 if log encryption was
used). If looked checkpoint_no is found from keys on checkpoint we use
that key to decrypt the log block. For encryption we use always the
first key. If the looked checkpoint_no is not found from keys on checkpoint
we use the first key.

Modified code also so that if log is not encrypted, we do not generate
any empty keys. If we have a log block and no keys is found from
checkpoint we assume that log block is unencrypted. Log corruption or
missing keys is found by comparing log block checksums. If we have
a keys but current log block checksum is correct we again assume
log block to be unencrypted. This is because current implementation
stores checksum only before encryption and new checksum after
encryption but before disk write is not stored anywhere.
2016-03-18 07:58:04 +02:00
Sergei Golubchik
a5679af1b1 Merge branch '10.0' into 10.1 2016-02-23 21:35:05 +01:00
Sergei Golubchik
17a792a441 Merge branch 'merge-xtradb-5.6' into 10.0 2016-02-16 18:55:00 +01:00
Sergei Golubchik
d76eba6a6b 5.6.28-76.1 2016-02-16 12:06:16 +01:00