Commit graph

189 commits

Author SHA1 Message Date
Marko Mäkelä
e0ebe3d083 Merge 10.0 into 10.1 2017-09-22 10:31:49 +03:00
Marko Mäkelä
f6cb4f0a19 MDEV-13814 Extra logging when innodb_log_archive=ON
log_group_read_log_seg(): Only display the message during recovery,
not during normal operation. When the XtraDB configuration parameter
innodb_log_archive is set, this function will be called during
normal operation.
2017-09-22 10:28:14 +03:00
Marko Mäkelä
8c4df595b8 MDEV-13807 mariabackup --apply-log-only does generate redo log by performing rollback and possibly other tasks
Skip rollback and other redo-log-generating tasks if
srv_apply_log_only is set.

Instead of assigning the debug variable recv_no_log_write = FALSE,
assign it to srv_apply_log_only, so that any unwanted writes are caught.
2017-09-16 09:55:49 +03:00
Jan Lindström
fa2701c6f7 MDEV-12634: Uninitialised ROW_MERGE_RESERVE_SIZE bytes written to tem…
…porary file

Fixed by removing writing key version to start of every block that
was encrypted. Instead we will use single key version from log_sys
crypt info.

After this MDEV also blocks writen to row log are encrypted and blocks
read from row log aren decrypted if encryption is configured for the
table.

innodb_status_variables[], struct srv_stats_t
	Added status variables for merge block and row log block
	encryption and decryption amounts.

Removed ROW_MERGE_RESERVE_SIZE define.

row_merge_fts_doc_tokenize
	Remove ROW_MERGE_RESERVE_SIZE

row_log_t
	Add index, crypt_tail, crypt_head to be used in case of
	encryption.

row_log_online_op, row_log_table_close_func
	Before writing a block encrypt it if encryption is enabled

row_log_table_apply_ops, row_log_apply_ops
	After reading a block decrypt it if encryption is enabled

row_log_allocate
	Allocate temporary buffers crypt_head and crypt_tail
	if needed.

row_log_free
	Free temporary buffers crypt_head and crypt_tail if they
	exist.

row_merge_encrypt_buf, row_merge_decrypt_buf
	Removed.

row_merge_buf_create, row_merge_buf_write
	Remove ROW_MERGE_RESERVE_SIZE

row_merge_build_indexes
	Allocate temporary buffer used in decryption and encryption
	if needed.

log_tmp_blocks_crypt, log_tmp_block_encrypt, log_temp_block_decrypt
	New functions used in block encryption and decryption

log_tmp_is_encrypted
	New function to check is encryption enabled.

Added test case innodb-rowlog to force creating a row log and
verify that operations are done using introduced status
variables.
2017-09-14 09:23:20 +03:00
Marko Mäkelä
112d721a74 Merge 10.0 into 10.1 2017-09-07 12:08:34 +03:00
Marko Mäkelä
d861822c4f MDEV-13253 After rebuilding redo logs, InnoDB can leak data from redo log buffer
recv_reset_logs(): Initialize the redo log buffer, so that no data
from the old redo log can be written to the new redo log.

This bug has very little impact before MariaDB 10.2. The
innodb_log_encrypt option that was introduced in MariaDB 10.1
increases the impact. If the redo log used to be encrypted, and
it is being resized and encryption disabled, then previously
encrypted data could end up being written to the new redo log
in clear text. This resulted in encryption.innodb_encrypt_log
test failures in MariaDB 10.2.
2017-09-07 12:01:07 +03:00
Marko Mäkelä
829752973b Merge branch '10.0' into 10.1 2017-08-30 13:06:13 +03:00
Marko Mäkelä
e634fdcd5b WL#8845: Clarify the message about redo log format incompatibility
recv_find_max_checkpoint(): Refer to MariaDB 10.2.2 instead of
MySQL 5.7.9. Do not hint that a binary downgrade might be possible,
because there are many changes in InnoDB 5.7 that could make
downgrade impossible: a column appended to SYS_INDEXES, added
SYS_* tables, undo log format changes, and so on.
2017-08-29 21:58:02 +03:00
Sergei Golubchik
8e8d42ddf0 Merge branch '10.0' into 10.1 2017-08-08 10:18:43 +02:00
Vicențiu Ciorbaru
b278c02e18 Merge branch 'merge-xtradb-5.6' into 10.0 2017-08-02 12:15:58 +03:00
Vicențiu Ciorbaru
04ae1207ed 5.6.36-82.1 2017-08-02 12:11:06 +03:00
Marko Mäkelä
d1e182d603 MDEV-12975 InnoDB redo log minimum size check uses detected file size instead of requested innodb_log_file_size
log_calc_max_ages(): Use the requested size in the check, instead of
the detected redo log size. The redo log will be resized at startup
if it differs from what has been requested.
2017-06-19 16:17:03 +03:00
Marko Mäkelä
5e4f4ec821 MDEV-12975 InnoDB redo log minimum size check uses detected file size instead of requested innodb_log_file_size
log_calc_max_ages(): Use the requested size in the check, instead of
the detected redo log size. The redo log will be resized at startup
if it differs from what has been requested.
2017-06-19 15:59:19 +03:00
Marko Mäkelä
fbeb9489cd Cleanup of MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M
The doublewrite buffer pages must fit in the first InnoDB system
tablespace data file. The checks that were added in the initial patch
(commit 112b21da37)
were at too high level and did not cover all cases.

innodb.log_data_file_size: Test all innodb_page_size combinations.

fsp_header_init(): Never return an error. Move the change buffer creation
to the only caller that needs to do it.

btr_create(): Clean up the logic. Remove the error log messages.

buf_dblwr_create(): Try to return an error on non-fatal failure.
Check that the first data file is big enough for creating the
doublewrite buffers.

buf_dblwr_process(): Check if the doublewrite buffer is available.
Display the message only if it is available.

recv_recovery_from_checkpoint_start_func(): Remove a redundant message
about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already
been initiated.

fil_report_invalid_page_access(): Simplify the message.

fseg_create_general(): Do not emit messages to the error log.

innobase_init(): Revert the changes.

trx_rseg_create(): Refactor (no functional change).
2017-06-08 11:55:47 +03:00
Jan Lindström
1af8bf39ca MDEV-12113: install_db shows corruption for rest encryption with innodb_data_file_path=ibdata1:3M;
Problem was that FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION field that for
encrypted pages even in system datafiles should contain key_version
except very first page (0:0) is after encryption overwritten with
flush lsn.

Ported WL#7990 Repurpose FIL_PAGE_FLUSH_LSN to 10.1
The field FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION is consulted during
InnoDB startup.

At startup, InnoDB reads the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
from the first page of each file in the InnoDB system tablespace.
If there are multiple files, the minimum and maximum LSN can differ.
These numbers are passed to InnoDB startup.

Having the number in other files than the first file of the InnoDB
system tablespace is not providing much additional value. It is
conflicting with other use of the field, such as on InnoDB R-tree
index pages and encryption key_version.

This worklog will stop writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to
other files than the first file of the InnoDB system tablespace
(page number 0:0) when system tablespace is encrypted. If tablespace
is not encrypted we continue writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
to all first pages of system tablespace to avoid unnecessary
warnings on downgrade.

open_or_create_data_files(): pass only one flushed_lsn parameter

xb_load_tablespaces(): pass only one flushed_lsn parameter.

buf_page_create(): Improve comment about where
FIL_PAGE_FIL_FLUSH_LSN_OR_KEY_VERSION is set.

fil_write_flushed_lsn(): A new function, merged from
fil_write_lsn_and_arch_no_to_file() and
fil_write_flushed_lsn_to_data_files().
Only write to the first page of the system tablespace (page 0:0)
if tablespace is encrypted, or write all first pages of system
tablespace and invoke fil_flush_file_spaces(FIL_TYPE_TABLESPACE)
afterwards.

fil_read_first_page(): read flush_lsn and crypt_data only from
first datafile.

fil_open_single_table_tablespace(): Remove output of LSN, because it
was only valid for the system tablespace and the undo tablespaces, not
user tablespaces.

fil_validate_single_table_tablespace(): Remove output of LSN.

checkpoint_now_set(): Use fil_write_flushed_lsn and output
a error if operation fails.

Remove lsn variable from fsp_open_info.

recv_recovery_from_checkpoint_start(): Remove unnecessary second
flush_lsn parameter.

log_empty_and_mark_files_at_shutdown(): Use fil_writte_flushed_lsn
and output error if it fails.

open_or_create_data_files(): Pass only one flushed_lsn variable.
2017-06-01 14:07:48 +03:00
Marko Mäkelä
2f29fc3c1c 10.1 additions for MDEV-12052 Shutdown crash presumably due to master thread activity
btr_defragment_thread(): Create the thread in the same place as other
threads. Do not invoke btr_defragment_shutdown(), because
row_drop_tables_for_mysql_in_background() in the master thread can still
keep invoking btr_defragment_remove_table().

logs_empty_and_mark_files_at_shutdown(): Wait for btr_defragment_thread()
to exit.

innobase_start_or_create_for_mysql(), innobase_shutdown_for_mysql():
Skip encryption and scrubbing in innodb_read_only_mode.

srv_export_innodb_status(): Do not export encryption or scrubbing
statistics in innodb_read_only mode, because the threads will not
be running.
2017-05-26 15:19:40 +03:00
Marko Mäkelä
b61700c221 Merge 10.0 into 10.1 2017-05-23 08:59:03 +03:00
Marko Mäkelä
a4d4a5fe82 After-merge fix for MDEV-11638
In commit 360a4a0372
some debug assertions were introduced to the page flushing code
in XtraDB. Add these assertions to InnoDB as well, and adjust
the InnoDB shutdown so that these assertions will not fail.

logs_empty_and_mark_files_at_shutdown(): Advance
srv_shutdown_state from the first phase SRV_SHUTDOWN_CLEANUP
only after no page-dirtying activity is possible
(well, except by srv_master_do_shutdown_tasks(), which will be
fixed separately in MDEV-12052).

rotate_thread_t::should_shutdown(): Already exit the key rotation
threads at the first phase of shutdown (SRV_SHUTDOWN_CLEANUP).

page_cleaner_sleep_if_needed(): Do not sleep during shutdown.
This change is originally from XtraDB.
2017-05-20 08:41:34 +03:00
Marko Mäkelä
65e1399e64 Merge 10.0 into 10.1
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
2017-05-20 08:41:20 +03:00
Sergei Golubchik
7c03edf2fe MDEV-6262 analyze the coverity report on mariadb
uploaded 10.0, analyzed everything with the Impact=High
(and a couple of Medium)
2017-05-19 20:26:56 +02:00
Vicențiu Ciorbaru
d8b45b0c00 Merge branch 'merge-xtradb-5.6' into 10.0 2017-05-17 12:11:12 +03:00
Vicențiu Ciorbaru
360a4a0372 5.6.36-82.0 2017-05-16 14:16:11 +03:00
Vladislav Vaintroub
9c4b7cad27 MDEV-9566 Prepare xtradb for xtrabackup
These changes are comparable to Percona's modifications in innodb in the
Percona Xtrabackup repository.

- If functions are used in backup as well as in innodb, make them non-static.

- Define IS_XTRABACKUP() macro for special handling of innodb running
  inside backup.

- Extend some functions for backup.
  fil_space_for_table_exists_in_mem() gets additional parameter
  'remove_from_data_dict_if_does_not_exist', for partial backups

  fil_load_single_table_tablespaces() gets an optional parameter predicate
  which tells whether to load tablespace based on database or table name,
  also for partial backups.

  srv_undo_tablespaces_init() gets an optional parameter 'backup_mode'

- Allow single redo log file (for backup "prepare")

- Do not read doublewrite buffer pages in backup, they are outdated

- Add function fil_remove_invalid_table_from_data_dict(), to remove non-existing
   tables from data dictionary in case of partial backups.

- On Windows, fix file share modes when opening tablespaces,
to allow mariabackup to read tablespaces while server is online.

- Avoid access to THDVARs in backup, because innodb plugin is not loaded,
and THDVAR would crash in this case.
2017-04-27 19:12:39 +02:00
Vladislav Vaintroub
db39107413 MDEV-11663 Create services for functionality used by plugins
Added service for
- encryption (AES)
- error reporting, e.g my_printf_error()
2017-04-27 19:12:38 +02:00
Jan Lindström
765a43605a MDEV-12253: Buffer pool blocks are accessed after they have been freed
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.

This patch should also address following test failures and
bugs:

MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.

MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot

MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot

MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.

Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.

btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.

buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.

buf_page_io_complete(): use dberr_t for error handling.

buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
        Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()

dict_stats_update_transient_for_index(),
dict_stats_update_transient()
        Do not continue if table decryption failed or table
        is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.

fil_parse_write_crypt_data():
        Check that key read from redo log entry is found from
        encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2.  Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.

Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.

recv_apply_hashed_log_recs() does not return anything.
2017-04-26 15:19:16 +03:00
Marko Mäkelä
8c38147cdd Merge 10.0 into 10.1 2017-04-21 12:46:12 +03:00
Marko Mäkelä
87b6df31c4 MDEV-12488 Remove type mismatch in InnoDB printf-like calls
This is a reduced version of an originally much larger patch.
We will keep the definition of the ulint, lint data types unchanged,
and we will not be replacing fprintf() calls with ib_logf().

On Windows, use the standard format strings instead of nonstandard
extensions.

This patch fixes some errors in format strings.
Most notably, an IMPORT TABLESPACE error message in InnoDB was
displaying the number of columns instead of the mismatching flags.
2017-04-21 12:06:29 +03:00
Jan Lindström
50eb40a2a8 MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing
MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.

This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).

fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()

fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.

btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
    Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read():
    Use fil_space_get_crypt_data() because at this point
    we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.

check_table_options()
	Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*

i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.
2017-03-14 16:23:10 +02:00
Marko Mäkelä
9dc10d5851 Merge 10.0 into 10.1 2017-03-13 19:17:34 +02:00
Sergei Golubchik
b6a1d6538b compiler warnings 2017-03-10 18:21:22 +01:00
Marko Mäkelä
032678ad18 MDEV-12091 Shutdown fails to wait for rollback of recovered transactions to finish
In the 10.1 InnoDB Plugin, a call os_event_free(buf_flush_event) was
misplaced. The event could be signalled by rollback of resurrected
transactions while shutdown was in progress. This bug was caught
by cmake -DWITH_ASAN testing. This call was only present in the
10.1 InnoDB Plugin, not in other versions, or in XtraDB.

That said, the bug affects all InnoDB versions. Shutdown assumes the
cessation of any page-dirtying activity, including the activity of
the background rollback thread. InnoDB only waited for the background
rollback to finish as part of a slow shutdown (innodb_fast_shutdown=0).
The default is a clean shutdown (innodb_fast_shutdown=1). In a scenario
where InnoDB is killed, restarted, and shut down soon enough, the data
files could become corrupted.

logs_empty_and_mark_files_at_shutdown(): Wait for the
rollback to finish, except if innodb_fast_shutdown=2
(crash-like shutdown) was requested.

trx_rollback_or_clean_recovered(): Before choosing the next
recovered transaction to roll back, terminate early if non-slow
shutdown was initiated. Roll back everything on slow shutdown
(innodb_fast_shutdown=0).

srv_innodb_monitor_mutex: Declare as static, because the mutex
is only used within one module.

After each call to os_event_free(), ensure that the freed event
is not reachable via global variables, by setting the relevant
variables to NULL.
2017-03-10 18:54:29 +02:00
Marko Mäkelä
498f4a825b Fix InnoDB/XtraDB compilation warnings on 32-bit builds. 2017-03-09 08:54:07 +02:00
Marko Mäkelä
ad0c218a44 Merge 10.0 into 10.1
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:

recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).

Report progress also via systemd using sd_notifyf().
2017-03-09 08:53:08 +02:00
Marko Mäkelä
47396ddea9 Merge 5.5 into 10.0
Also, implement MDEV-11027 a little differently from 5.5:

recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.

recv_apply_hashed_log_recs(): Rename the parameter to last_batch.
2017-03-08 11:40:43 +02:00
Marko Mäkelä
9c47beb8bd MDEV-11027 InnoDB log recovery is too noisy
Provide more useful progress reporting of crash recovery.

recv_sys_t::progress_time: The time of the last report.

recv_scan_print_counter: Remove.

log_group_read_log_seg(): After after each I/O request,
report progress if needed.

recv_apply_hashed_log_recs(): At the start of each batch,
if there are pages to be recovered, issue a message.
2017-03-08 10:07:50 +02:00
Vicențiu Ciorbaru
83da1a1e57 Merge branch 'merge-xtradb-5.6' into 10.0 2017-03-05 00:59:57 +02:00
Vicențiu Ciorbaru
8d69ce7b82 5.6.35-80.0 2017-03-04 20:50:02 +02:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Marko Mäkelä
fc673a2c12 MDEV-12127 InnoDB: Assertion failure loop_count < 5 in file log0log.cc
As suggested in MySQL Bug#58536, increase the limit in this
debug assertion in order to avoid false positives on heavily
loaded systems.
2017-02-28 09:54:12 +02:00
Marko Mäkelä
13493078e9 MDEV-11802 innodb.innodb_bug14676111 fails
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.

It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.

For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.

os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.

os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
2017-02-20 12:20:52 +02:00
Marko Mäkelä
9017a05d87 Merge 10.0 into 10.1 2017-02-08 17:30:25 +02:00
Marko Mäkelä
d831e4c22a MDEV-12024 InnoDB startup fails to wait for recv_writer_thread to finish
recv_writer_thread(): Do not assign recv_writer_thread_active=true
in order to avoid a race condition with
recv_recovery_from_checkpoint_finish().

recv_init_crash_recovery(): Assign recv_writer_thread_active=true
before creating recv_writer_thread.
2017-02-08 17:23:13 +02:00
Marko Mäkelä
2e67e66c3a Merge 10.0 into 10.1 2017-02-08 08:53:34 +02:00
Marko Mäkelä
20e8347447 MDEV-11985 Make innodb_read_only shutdown more robust
If InnoDB is started in innodb_read_only mode such that
recovered incomplete transactions exist at startup
(but the redo logs are clean), an assertion will fail at shutdown,
because there would exist some non-prepared transactions.

logs_empty_and_mark_files_at_shutdown(): Do not wait for incomplete
transactions to finish if innodb_read_only or innodb_force_recovery>=3.
Wait for purge to finish in only one place.

trx_sys_close(): Relax the assertion that would fail first.

trx_free_prepared(): Also free recovered TRX_STATE_ACTIVE transactions
if innodb_read_only or innodb_force_recovery>=3.
2017-02-04 17:33:19 +02:00
Marko Mäkelä
f1f8ebc325 Merge 10.0 into 10.1 2017-01-26 23:40:11 +02:00
Marko Mäkelä
afb461587c MDEV-11915 Detect InnoDB system tablespace size mismatch early
InnoDB would refuse to start up if there is a mismatch on
the size of the system tablespace files. However, before this
check is conducted, the system tablespace may already have been
heavily modified.

InnoDB should perform the size check as early as possible.

recv_recovery_from_checkpoint_finish():
Move the recv_apply_hashed_log_recs() call to
innobase_start_or_create_for_mysql().

innobase_start_or_create_for_mysql(): Test the mutex functionality
before doing anything else. Use a compile_time_assert() for a
sizeof() constraint. Check the size of the system tablespace as
early as possible.
2017-01-26 23:10:36 +02:00
Marko Mäkelä
49fe9bad01 MDEV-11814 Refuse innodb_read_only startup if crash recovery is needed
recv_scan_log_recs(): Remember if redo log apply is needed,
even if starting up in innodb_read_only mode.

recv_recovery_from_checkpoint_start_func(): Refuse
innodb_read_only startup if redo log apply is needed.
2017-01-26 13:58:58 +02:00
Marko Mäkelä
719321e78e MDEV-11638 Encryption causes race conditions in InnoDB shutdown
InnoDB shutdown failed to properly take fil_crypt_thread() into account.
The encryption threads were signalled to shut down together with other
non-critical tasks. This could be much too early in case of slow shutdown,
which could need minutes to complete the purge. Furthermore, InnoDB
failed to wait for the fil_crypt_thread() to actually exit before
proceeding to the final steps of shutdown, causing the race conditions.

Furthermore, the log_scrub_thread() was shut down way too early.
Also it should remain until the SRV_SHUTDOWN_FLUSH_PHASE.

fil_crypt_threads_end(): Remove. This would cause the threads to
be terminated way too early.

srv_buf_dump_thread_active, srv_dict_stats_thread_active,
lock_sys->timeout_thread_active, log_scrub_thread_active,
srv_monitor_active, srv_error_monitor_active: Remove a race condition
between startup and shutdown, by setting these in the startup thread
that creates threads, not in each created thread. In this way, once the
flag is cleared, it will remain cleared during shutdown.

srv_n_fil_crypt_threads_started, fil_crypt_threads_event: Declare in
global rather than static scope.

log_scrub_event, srv_log_scrub_thread_active, log_scrub_thread():
Declare in static rather than global scope. Let these be created by
log_init() and freed by log_shutdown().

rotate_thread_t::should_shutdown(): Do not shut down before the
SRV_SHUTDOWN_FLUSH_PHASE.

srv_any_background_threads_are_active(): Remove. These checks now
exist in logs_empty_and_mark_files_at_shutdown().

logs_empty_and_mark_files_at_shutdown(): Shut down the threads in
the proper order. Keep fil_crypt_thread() and log_scrub_thread() alive
until SRV_SHUTDOWN_FLUSH_PHASE, and check that they actually terminate.
2017-01-05 00:20:06 +02:00
Marko Mäkelä
8451e09073 MDEV-11556 InnoDB redo log apply fails to adjust data file sizes
fil_space_t::recv_size: New member: recovered tablespace size in pages;
0 if no size change was read from the redo log,
or if the size change was implemented.

fil_space_set_recv_size(): New function for setting space->recv_size.

innodb_data_file_size_debug: A debug parameter for setting the system
tablespace size in recovery even when the redo log does not contain
any size changes. It is hard to write a small test case that would
cause the system tablespace to be extended at the critical moment.

recv_parse_log_rec(): Note those tablespaces whose size is being changed
by the redo log, by invoking fil_space_set_recv_size().

innobase_init(): Correct an error message, and do not require a larger
innodb_buffer_pool_size when starting up with a smaller innodb_page_size.

innobase_start_or_create_for_mysql(): Allow startup with any initial
size of the ibdata1 file if the autoextend attribute is set. Require
the minimum size of fixed-size system tablespaces to be 640 pages,
not 10 megabytes. Implement innodb_data_file_size_debug.

open_or_create_data_files(): Round the system tablespace size down
to pages, not to full megabytes, (Our test truncates the system
tablespace to more than 800 pages with innodb_page_size=4k.
InnoDB should not imagine that it was truncated to 768 pages
and then overwrite good pages in the tablespace.)

fil_flush_low(): Refactored from fil_flush().

fil_space_extend_must_retry(): Refactored from
fil_extend_space_to_desired_size().

fil_mutex_enter_and_prepare_for_io(): Extend the tablespace if
fil_space_set_recv_size() was called.

The test case has been successfully run with all the
innodb_page_size values 4k, 8k, 16k, 32k, 64k.
2016-12-30 09:52:24 +02:00
Marko Mäkelä
d50cf42bc0 MDEV-9282 Debian: the Lintian complains about "shlib-calls-exit" in ha_innodb.so
Replace all exit() calls in InnoDB with abort() [possibly via ut_a()].
Calling exit() in a multi-threaded program is problematic also for
the reason that other threads could see corrupted data structures
while some data structures are being cleaned up by atexit() handlers
or similar.

In the long term, all these calls should be replaced with something
that returns an error all the way up the call stack.
2016-12-28 15:54:24 +02:00