Commit graph

4018 commits

Author SHA1 Message Date
Marko Mäkelä
b82c602db5 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects
(first, fil_space_detach(), then release fil_system->mutex, and finally
free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free().
Detach the tablespace from the fil_system cache, release the
fil_system->mutex, and then wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread.
During the wait, future calls to fil_space_acquire_for_io() will
not find this tablespace, and the count can only be decremented to 0,
at which point it is safe to free the objects.

fil_node_free_part1(), fil_node_free_part2(): Refactored from
fil_node_free().
2017-04-28 14:12:52 +03:00
Sergei Golubchik
6935d66053 MDEV-12591 MariaDB embedded server refuses start with innodb_plugin
innodb calls sd_notify(), it must be linked with libsystemd
2017-04-27 19:12:45 +02:00
Vladislav Vaintroub
db39107413 MDEV-11663 Create services for functionality used by plugins
Added service for
- encryption (AES)
- error reporting, e.g my_printf_error()
2017-04-27 19:12:38 +02:00
Jan Lindström
765a43605a MDEV-12253: Buffer pool blocks are accessed after they have been freed
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.

This patch should also address following test failures and
bugs:

MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.

MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot

MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot

MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.

Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.

btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.

buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.

buf_page_io_complete(): use dberr_t for error handling.

buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
        Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()

dict_stats_update_transient_for_index(),
dict_stats_update_transient()
        Do not continue if table decryption failed or table
        is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.

fil_parse_write_crypt_data():
        Check that key read from redo log entry is found from
        encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2.  Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.

Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.

recv_apply_hashed_log_recs() does not return anything.
2017-04-26 15:19:16 +03:00
Marko Mäkelä
200ef51344 Fix a compilation error 2017-04-21 18:29:50 +03:00
Marko Mäkelä
996c7d5cb5 MDEV-12545 Reduce the amount of fil_space_t lookups
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed later.
2017-04-21 17:47:23 +03:00
Marko Mäkelä
555e52f3bc MDEV-12467 encryption.create_or_replace hangs during DROP TABLE
fil_crypt_thread(): Do invoke fil_crypt_complete_rotate_space()
when the tablespace is about to be dropped. Also, remove a redundant
check whether rotate_thread_t::space is NULL. It can only become
NULL when fil_crypt_find_space_to_rotate() returns false, and in
that case we would already have terminated the loop.

fil_crypt_find_page_to_rotate(): Remove a redundant check for
space->crypt_data == NULL. Once encryption metadata has been
created for a tablespace, it cannot be removed without dropping
the entire tablespace.
2017-04-21 17:47:15 +03:00
Marko Mäkelä
d23eb8e6d2 Follow-up to MDEV-12488: Fix some type mismatch in header files
This reduces the number of compilation warnings on Windows.
2017-04-21 17:47:05 +03:00
Marko Mäkelä
aafaf05a47 Fix some InnoDB type mismatch introduced in 10.1
On 64-bit Windows, sizeof(ulint)!=sizeof(ulong).
2017-04-21 17:43:35 +03:00
Marko Mäkelä
7445ff84f4 Merge 10.0 into 10.1 2017-04-21 17:41:40 +03:00
Marko Mäkelä
e056d1f1ca Fix some InnoDB type mismatch
On 64-bit Windows, sizeof(ulint)!=sizeof(ulong).
2017-04-21 17:39:12 +03:00
Marko Mäkelä
e48ae21b0e Follow-up to MDEV-12534: Fix warnings on 32-bit systems 2017-04-21 16:22:46 +03:00
Marko Mäkelä
8c38147cdd Merge 10.0 into 10.1 2017-04-21 12:46:12 +03:00
Marko Mäkelä
87b6df31c4 MDEV-12488 Remove type mismatch in InnoDB printf-like calls
This is a reduced version of an originally much larger patch.
We will keep the definition of the ulint, lint data types unchanged,
and we will not be replacing fprintf() calls with ib_logf().

On Windows, use the standard format strings instead of nonstandard
extensions.

This patch fixes some errors in format strings.
Most notably, an IMPORT TABLESPACE error message in InnoDB was
displaying the number of columns instead of the mismatching flags.
2017-04-21 12:06:29 +03:00
Marko Mäkelä
d34a67b067 MDEV-12534 Use atomic operations whenever available
Allow 64-bit atomic operations on 32-bit systems,
only relying on HAVE_ATOMIC_BUILTINS_64, disregarding
the width of the register file.

Define UNIV_WORD_SIZE correctly on all systems, including Windows.
In MariaDB 10.0 and 10.1, it was incorrectly defined as 4 on
64-bit Windows.

Define HAVE_ATOMIC_BUILTINS_64 on Windows
(64-bit atomics are available on both 32-bit and 64-bit Windows
platforms; the operations were unnecessarily disabled even on
64-bit Windows).

MONITOR_OS_PENDING_READS, MONITOR_OS_PENDING_WRITES: Enable by default.

os_file_n_pending_preads, os_file_n_pending_pwrites,
os_n_pending_reads, os_n_pending_writes: Remove.
Use the monitor counters instead.

os_file_count_mutex: Remove. On a system that does not support
64-bit atomics, monitor_mutex will be used instead.
2017-04-20 16:29:12 +03:00
Marko Mäkelä
3bb32e8682 MDEV-10509: Remove excessive server error logging on InnoDB ALTER TABLE
Disable the output that was added in MDEV-6812 if log_warnings=2 or less.
Also, remove some redundant messages.

TODO: Implement MDEV-12512 to supercede MDEV-6812 and properly report
the progress of ALTER TABLE…ALGORITHM=INPLACE.
2017-04-17 03:50:38 +03:00
Marko Mäkelä
16b2b1eae5 Use page_is_leaf() where applicable 2017-04-17 03:50:10 +03:00
Sachin Setiya
cdd1dc829b MW-28, codership/mysql-wsrep#28 Fix sync_thread_levels debug assert
Introduced a new wsrep_trx_print_locking() which may be called
under lock_sys->mutex if the trx has locks.

Signed-off-by: Sachin Setiya <sachin.setiya@mariadb.com>
2017-04-06 15:41:54 +05:30
Marko Mäkelä
25d69ea012 MDEV-12198 innodb_defragment=1 crashes server on OPTIMIZE TABLE when FULLTEXT index exists
ha_innobase::defragment_table(): Skip corrupted indexes and
FULLTEXT INDEX. In InnoDB, FULLTEXT INDEX is implemented with
auxiliary tables. We will not defragment them on OPTIMIZE TABLE.
2017-04-06 08:56:25 +03:00
Marko Mäkelä
8d4871a953 Merge 10.0 into 10.1 2017-04-06 08:53:59 +03:00
Marko Mäkelä
8e36216a06 Import two ALTER TABLE…ALGORITHM=INPLACE tests from MySQL 5.6.
Also, revert part of MDEV-7685 that added an InnoDB abort when
ALTER TABLE…ALGORITHM=INPLACE is reporting that it ran out of
file space.
2017-04-05 14:46:35 +03:00
Jan Lindström
85239bdfeb Merge pull request #350 from grooverdan/10.1-TRX_SYS_PAGE_NO
fil_crypt_rotate_page - space_id should be compared to TRX_SYS_SPACE not space
2017-04-05 08:40:47 +03:00
Daniel Black
9a218f4fb8 fil_crypt_rotate_page - space_id should be compared to TRX_SYS_SPACE not space
Fixes compile error that highlights problem:

/source/storage/innobase/fil/fil0crypt.cc: In function 'void fil_crypt_rotate_page(const key_state_t*, rotate_thread_t*)':
/source/storage/innobase/fil/fil0crypt.cc:1770:15: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
  if (space == TRX_SYS_SPACE && offset == TRX_SYS_PAGE_NO) {

Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
2017-04-04 15:47:21 +10:00
Marko Mäkelä
9505c96839 MDEV-12428 SIGSEGV in buf_page_decrypt_after_read() during DDL
Also, some MDEV-11738/MDEV-11581 post-push fixes.

In MariaDB 10.1, there is no fil_space_t::is_being_truncated field,
and the predicates fil_space_t::stop_new_ops and fil_space_t::is_stopping()
are interchangeable. I requested the fil_space_t::is_stopping() to be added
in the review, but some added checks for fil_space_t::stop_new_ops were
not replaced with calls to fil_space_t::is_stopping().

buf_page_decrypt_after_read(): In this low-level I/O operation, we must
look up the tablespace if it exists, even though future I/O operations
have been blocked on it due to a pending DDL operation, such as DROP TABLE
or TRUNCATE TABLE or other table-rebuilding operations (ALTER, OPTIMIZE).
Pass a parameter to fil_space_acquire_low() telling that we are performing
a low-level I/O operation and the fil_space_t::is_stopping() status should
be ignored.
2017-04-03 22:09:28 +03:00
Vladislav Vaintroub
d7c35a9992 Fix compiler error 2017-03-23 19:28:36 +00:00
Vladislav Vaintroub
e5b67a46bc MDEV-12345 Performance : replace calls to clock() inside trx_start_low() by THD::start_utime 2017-03-23 11:50:22 +00:00
Jan Lindström
b1ec35b903 Add assertions when key rotation list is used. 2017-03-16 17:30:13 +02:00
Jan Lindström
50eb40a2a8 MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing
MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.

This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).

fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()

fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.

btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
    Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read():
    Use fil_space_get_crypt_data() because at this point
    we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.

check_table_options()
	Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*

i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.
2017-03-14 16:23:10 +02:00
Marko Mäkelä
9dc10d5851 Merge 10.0 into 10.1 2017-03-13 19:17:34 +02:00
Sergei Golubchik
b6a1d6538b compiler warnings 2017-03-10 18:21:22 +01:00
Marko Mäkelä
032678ad18 MDEV-12091 Shutdown fails to wait for rollback of recovered transactions to finish
In the 10.1 InnoDB Plugin, a call os_event_free(buf_flush_event) was
misplaced. The event could be signalled by rollback of resurrected
transactions while shutdown was in progress. This bug was caught
by cmake -DWITH_ASAN testing. This call was only present in the
10.1 InnoDB Plugin, not in other versions, or in XtraDB.

That said, the bug affects all InnoDB versions. Shutdown assumes the
cessation of any page-dirtying activity, including the activity of
the background rollback thread. InnoDB only waited for the background
rollback to finish as part of a slow shutdown (innodb_fast_shutdown=0).
The default is a clean shutdown (innodb_fast_shutdown=1). In a scenario
where InnoDB is killed, restarted, and shut down soon enough, the data
files could become corrupted.

logs_empty_and_mark_files_at_shutdown(): Wait for the
rollback to finish, except if innodb_fast_shutdown=2
(crash-like shutdown) was requested.

trx_rollback_or_clean_recovered(): Before choosing the next
recovered transaction to roll back, terminate early if non-slow
shutdown was initiated. Roll back everything on slow shutdown
(innodb_fast_shutdown=0).

srv_innodb_monitor_mutex: Declare as static, because the mutex
is only used within one module.

After each call to os_event_free(), ensure that the freed event
is not reachable via global variables, by setting the relevant
variables to NULL.
2017-03-10 18:54:29 +02:00
Marko Mäkelä
0094b6581d Merge 10.0 into 10.1 2017-03-10 15:16:13 +02:00
Marko Mäkelä
5aacb861f2 WSREP: Use TRX_ID_FMT for trx_id_t in messages. 2017-03-09 15:11:24 +02:00
Marko Mäkelä
b28adb6a62 Fix an error introduced in the previous commit.
fil_parse_write_crypt_data(): Correct the comparison operator.
This was broken in commit 498f4a825b
which removed a signed/unsigned mismatch in these comparisons.
2017-03-09 15:09:44 +02:00
Marko Mäkelä
1b2b209519 Use correct integer format with printf-like functions. 2017-03-09 11:28:07 +02:00
Marko Mäkelä
498f4a825b Fix InnoDB/XtraDB compilation warnings on 32-bit builds. 2017-03-09 08:54:07 +02:00
Marko Mäkelä
ad0c218a44 Merge 10.0 into 10.1
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:

recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).

Report progress also via systemd using sd_notifyf().
2017-03-09 08:53:08 +02:00
Marko Mäkelä
74fe0e03d5 Remove unused declarations. 2017-03-08 11:46:34 +02:00
Marko Mäkelä
47396ddea9 Merge 5.5 into 10.0
Also, implement MDEV-11027 a little differently from 5.5:

recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.

recv_apply_hashed_log_recs(): Rename the parameter to last_batch.
2017-03-08 11:40:43 +02:00
Marko Mäkelä
9c47beb8bd MDEV-11027 InnoDB log recovery is too noisy
Provide more useful progress reporting of crash recovery.

recv_sys_t::progress_time: The time of the last report.

recv_scan_print_counter: Remove.

log_group_read_log_seg(): After after each I/O request,
report progress if needed.

recv_apply_hashed_log_recs(): At the start of each batch,
if there are pages to be recovered, issue a message.
2017-03-08 10:07:50 +02:00
Marko Mäkelä
1fd3cc8c1f Fix a compiler warning. 2017-03-08 10:06:34 +02:00
Vicențiu Ciorbaru
c4f3e64c23 Merge branch 'bb-10.0-vicentiu' into 10.0 2017-03-06 21:50:42 +02:00
Marko Mäkelä
68d632bc5a Replace some functions with macros.
This is a non-functional change.

On a related note, the calls fil_system_enter() and fil_system_exit()
are often used in an unsafe manner. The fix of MDEV-11738 should
introduce fil_space_acquire() and remove potential race conditions.
2017-03-06 10:02:01 +02:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Marko Mäkelä
29c776cfd1 MDEV-11520: Retry posix_fallocate() after EINTR.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
2017-03-03 12:03:33 +02:00
Marko Mäkelä
6b8173b6e9 MDEV-11520: Retry posix_fallocate() after EINTR.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
2017-03-03 11:47:31 +02:00
Daniel Black
bc28b305e5 Remove warning: unused variable 'volatile_var' [-Wunused-variable]
This occured in gcc-6.2.1.

The variable wasn't used so was no need to be volatile either.

Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
2017-03-03 13:00:41 +04:00
Vicențiu Ciorbaru
1acfa942ed Merge branch '5.5' into 10.0 2017-03-03 01:37:54 +02:00
Marko Mäkelä
fc673a2c12 MDEV-12127 InnoDB: Assertion failure loop_count < 5 in file log0log.cc
As suggested in MySQL Bug#58536, increase the limit in this
debug assertion in order to avoid false positives on heavily
loaded systems.
2017-02-28 09:54:12 +02:00
Marko Mäkelä
ec4cf111c0 MDEV-11520 after-merge fix for 10.1: Use sparse files.
If page_compression (introduced in MariaDB Server 10.1) is enabled,
the logical action is to not preallocate space to the data files,
but to only logically extend the files with zeroes.

fil_create_new_single_table_tablespace(): Create smaller files for
ROW_FORMAT=COMPRESSED tables, but adhere to the minimum file size of
4*innodb_page_size.

fil_space_extend_must_retry(), os_file_set_size(): On Windows,
use SetFileInformationByHandle() and FILE_END_OF_FILE_INFO,
which depends on bumping _WIN32_WINNT to 0x0600.
FIXME: The files are not yet set up as sparse, so
this will currently end up physically extending (preallocating)
the files, wasting storage for unused pages.

os_file_set_size(): Add the parameter "bool sparse=false" to declare
that the file is to be extended logically, instead of being preallocated.
The only caller with sparse=true is
fil_create_new_single_table_tablespace().
(The system tablespace cannot be created with page_compression.)

fil_space_extend_must_retry(), os_file_set_size(): Outside Windows,
use ftruncate() to extend files that are supposed to be sparse.
On systems where ftruncate() is limited to files less than 4GiB
(if there are any), fil_space_extend_must_retry() retains the
old logic of physically extending the file.
2017-02-22 22:29:56 +02:00