Commit graph

363 commits

Author SHA1 Message Date
Marko Mäkelä
2645bda5f2 MDEV-12253 post-fix: Do not leak memory in crash recovery
This fixes the cmake -DWITH_ASAN test failure that was mentioned
in commit f9cc391863 (merging
MDEV-12253 from 10.1 to 10.2).

fil_parse_write_crypt_data(): If the tablespace is not found,
invoke fil_space_destroy_crypt_data(&crypt_data) to properly
free the created object.

With this, the test encryption.innodb-redo-badkey still reports
"Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT"
but does not fail. The misleading message should be corrected,
maybe as part of MDEV-12699.
2017-05-09 13:40:42 +03:00
Marko Mäkelä
6c91be54b1 MDEV-11520 Properly retry posix_fallocate() on EINTR
We only want to retry posix_fallocate() on EINTR as long as the system
is not being shut down. We do not want to retry on any other (hard) error.

Thanks to Jocelyn Fournier for quickly noticing the mistake in my
previous commit.
2017-05-06 15:50:09 +03:00
Marko Mäkelä
baad0f3484 MDEV-11520 post-fix: Retry posix_fallocate() on EINTR in fil_ibd_create()
Earlier versions of MariaDB only use posix_fallocate() when extending
data files, not when initially creating the files,
2017-05-06 14:33:14 +03:00
Marko Mäkelä
f9cc391863 Merge 10.1 into 10.2
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.

TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):

2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
i=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
    #0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
    #1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
    #2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
    #3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
    #4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
    #5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
    #6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
2017-05-05 10:38:53 +03:00
Jan Lindström
acce1f37c2 MDEV-12624: encryption.innodb_encryption_tables fails in buildbot with timeout
This regression was caused by MDEV-12467 encryption.create_or_replace
hangs during DROP TABLE, where if table->is_stopping() (i.e. when
tablespace is dropped) background key rotation thread calls
fil_crypt_complete_rotate_space to release space and stop rotation.
However, that function does not decrease number of rotating
threads if table->is_stopping() is true.
2017-05-02 08:09:16 +03:00
Sergei Golubchik
0072d2e9a1 InnoDB cleanup: remove a bunch of #ifdef UNIV_INNOCHECKSUM
innochecksum uses global variables. great, let's use them all the
way down, instead of passing them as arguments to innodb internals,
conditionally modifying function prototypes with #ifdefs
2017-04-30 14:58:11 +02:00
Jan Lindström
935a1c676e MDEV-12623: InnoDB: Failing assertion: kv == 0
|| kv >= crypt_data->min_key_version,
encryption.innodb_encryption_tables failed in buildbot.

Now that key_version is not stored when page is read to
buf_page_t::key_version but always read from actual page
this assertion is not always valid.
2017-04-29 10:05:39 +03:00
Marko Mäkelä
b82c602db5 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects
(first, fil_space_detach(), then release fil_system->mutex, and finally
free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free().
Detach the tablespace from the fil_system cache, release the
fil_system->mutex, and then wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread.
During the wait, future calls to fil_space_acquire_for_io() will
not find this tablespace, and the count can only be decremented to 0,
at which point it is safe to free the objects.

fil_node_free_part1(), fil_node_free_part2(): Refactored from
fil_node_free().
2017-04-28 14:12:52 +03:00
Marko Mäkelä
4b24467ff3 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

fil_space_free_low(): Wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread. Future
calls to fil_space_acquire_for_io() will not find this tablespace,
because it will already have been detached from fil_system.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

FIXME: buf_page_check_corrupt() should take a tablespace from
fil_space_acquire_for_io() as a parameter. This will be done
in the 10.1 version of this patch and merged from there.
That depends on MDEV-12253, which has not been merged from 10.1 yet.
2017-04-28 12:23:35 +03:00
Marko Mäkelä
f740d23ce6 Merge 10.1 into 10.2 2017-04-28 12:22:32 +03:00
Thirunarayanan Balathandayuthapani
2ef1baa75f Bug #24793413 LOG PARSING BUFFER OVERFLOW
Problem:
========
During checkpoint, we are writing all MLOG_FILE_NAME records in one mtr
and parse buffer can't be processed till MLOG_MULTI_REC_END. Eventually parse
buffer exceeds the RECV_PARSING_BUF_SIZE and eventually it overflows.

Fix:
===
1) Break the large mtr if it exceeds LOG_CHECKPOINT_FREE_PER_THREAD into multiple mtr during checkpoint.
2) Move the parsing buffer if we are encountering only MLOG_FILE_NAME
records. So that it will never exceed the RECV_PARSING_BUF_SIZE.

Reviewed-by: Debarun Bannerjee <debarun.bannerjee@oracle.com>
Reviewed-by: Rahul M Malik <rahul.m.malik@oracle.com>
RB: 14743
2017-04-26 23:03:32 +03:00
Marko Mäkelä
849af74a48 MariaDB adjustments for Oracle Bug#23070734 fix
Split the test case so that a server restart is not needed.
Reduce the test cases and use a simpler mechanism for triggering
and waiting for purge.

fil_table_accessible(): Check if a table can be accessed without
enjoying MDL protection.
2017-04-26 23:03:32 +03:00
Aditya A
62dca454e7 Bug #23070734 CONCURRENT TRUNCATE TABLES CAUSE STALLS
PROBLEM

When truncating single tablespace tables, we need to scan the entire
buffer pool to remove the pages of the table from the buffer pool.
During this scan and removal dict_sys->mutex is being held ,causing
stalls in other DDL operations.

FIX

Release the dict_sys->mutex during the scan and reacquire it after the
scan. Make sure that purge thread doesn't purge the records of the table
being truncated and background stats collection thread skips the updation
of stats for the table being truncated.

[#rb 14564 Approved by Jimmy and satya ]
2017-04-26 23:03:31 +03:00
Jan Lindström
765a43605a MDEV-12253: Buffer pool blocks are accessed after they have been freed
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.

This patch should also address following test failures and
bugs:

MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.

MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot

MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot

MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.

Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.

btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.

buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.

buf_page_io_complete(): use dberr_t for error handling.

buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
        Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()

dict_stats_update_transient_for_index(),
dict_stats_update_transient()
        Do not continue if table decryption failed or table
        is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.

fil_parse_write_crypt_data():
        Check that key read from redo log entry is found from
        encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2.  Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.

Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.

recv_apply_hashed_log_recs() does not return anything.
2017-04-26 15:19:16 +03:00
Marko Mäkelä
14d124880f Fix a crash when page_compression fails during IMPORT TABLESPACE
fil_compress_page(): Check for space==NULL.
2017-04-21 18:44:37 +03:00
Marko Mäkelä
200ef51344 Fix a compilation error 2017-04-21 18:29:50 +03:00
Marko Mäkelä
0871a00a62 MDEV-12545 Reduce the amount of fil_space_t lookups
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed.

fil_decompress_page(): Remove unsafe use of fil_space_get_by_id().
2017-04-21 18:12:10 +03:00
Marko Mäkelä
b66e15ecbc MDEV-12467 encryption.create_or_replace hangs during DROP TABLE
fil_crypt_thread(): Do invoke fil_crypt_complete_rotate_space()
when the tablespace is about to be dropped. Also, remove a redundant
check whether rotate_thread_t::space is NULL. It can only become
NULL when fil_crypt_find_space_to_rotate() returns false, and in
that case we would already have terminated the loop.

fil_crypt_find_page_to_rotate(): Remove a redundant check for
space->crypt_data == NULL. Once encryption metadata has been
created for a tablespace, it cannot be removed without dropping
the entire tablespace.
2017-04-21 18:04:20 +03:00
Marko Mäkelä
47141c9d9b Fix some InnoDB type mismatch
On 64-bit Windows, sizeof(ulint)!=sizeof(ulong).
2017-04-21 18:04:02 +03:00
Marko Mäkelä
5684aa220c MDEV-12488 Remove type mismatch in InnoDB printf-like calls
Alias the InnoDB ulint and lint data types to size_t and ssize_t,
which are the standard names for the machine-word-width data types.

Correspondingly, define ULINTPF as "%zu" and introduce ULINTPFx as "%zx".
In this way, better compiler warnings for type mismatch are possible.

Furthermore, use PRIu64 for that 64-bit format, and define
the feature macro __STDC_FORMAT_MACROS to enable it on Red Hat systems.

Fix some errors in error messages, and replace some error messages
with assertions.
Most notably, an IMPORT TABLESPACE error message in InnoDB was
displaying the number of columns instead of the mismatching flags.
2017-04-21 18:03:15 +03:00
Marko Mäkelä
996c7d5cb5 MDEV-12545 Reduce the amount of fil_space_t lookups
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed later.
2017-04-21 17:47:23 +03:00
Marko Mäkelä
555e52f3bc MDEV-12467 encryption.create_or_replace hangs during DROP TABLE
fil_crypt_thread(): Do invoke fil_crypt_complete_rotate_space()
when the tablespace is about to be dropped. Also, remove a redundant
check whether rotate_thread_t::space is NULL. It can only become
NULL when fil_crypt_find_space_to_rotate() returns false, and in
that case we would already have terminated the loop.

fil_crypt_find_page_to_rotate(): Remove a redundant check for
space->crypt_data == NULL. Once encryption metadata has been
created for a tablespace, it cannot be removed without dropping
the entire tablespace.
2017-04-21 17:47:15 +03:00
Marko Mäkelä
aafaf05a47 Fix some InnoDB type mismatch introduced in 10.1
On 64-bit Windows, sizeof(ulint)!=sizeof(ulong).
2017-04-21 17:43:35 +03:00
Marko Mäkelä
1494147cf6 Merge 10.1 into 10.2 2017-04-06 09:52:25 +03:00
Jan Lindström
85239bdfeb Merge pull request #350 from grooverdan/10.1-TRX_SYS_PAGE_NO
fil_crypt_rotate_page - space_id should be compared to TRX_SYS_SPACE not space
2017-04-05 08:40:47 +03:00
Daniel Black
9a218f4fb8 fil_crypt_rotate_page - space_id should be compared to TRX_SYS_SPACE not space
Fixes compile error that highlights problem:

/source/storage/innobase/fil/fil0crypt.cc: In function 'void fil_crypt_rotate_page(const key_state_t*, rotate_thread_t*)':
/source/storage/innobase/fil/fil0crypt.cc:1770:15: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
  if (space == TRX_SYS_SPACE && offset == TRX_SYS_PAGE_NO) {

Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
2017-04-04 15:47:21 +10:00
Marko Mäkelä
6e105d7535 Merge 10.1 into 10.2 2017-04-04 07:59:25 +03:00
Marko Mäkelä
9505c96839 MDEV-12428 SIGSEGV in buf_page_decrypt_after_read() during DDL
Also, some MDEV-11738/MDEV-11581 post-push fixes.

In MariaDB 10.1, there is no fil_space_t::is_being_truncated field,
and the predicates fil_space_t::stop_new_ops and fil_space_t::is_stopping()
are interchangeable. I requested the fil_space_t::is_stopping() to be added
in the review, but some added checks for fil_space_t::stop_new_ops were
not replaced with calls to fil_space_t::is_stopping().

buf_page_decrypt_after_read(): In this low-level I/O operation, we must
look up the tablespace if it exists, even though future I/O operations
have been blocked on it due to a pending DDL operation, such as DROP TABLE
or TRUNCATE TABLE or other table-rebuilding operations (ALTER, OPTIMIZE).
Pass a parameter to fil_space_acquire_low() telling that we are performing
a low-level I/O operation and the fil_space_t::is_stopping() status should
be ignored.
2017-04-03 22:09:28 +03:00
Sergei Golubchik
da4d71d10d Merge branch '10.1' into 10.2 2017-03-30 12:48:42 +02:00
Jan Lindström
b1ec35b903 Add assertions when key rotation list is used. 2017-03-16 17:30:13 +02:00
Jan Lindström
50eb40a2a8 MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing
MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.

This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).

fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()

fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.

btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
    Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read():
    Use fil_space_get_crypt_data() because at this point
    we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.

check_table_options()
	Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*

i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.
2017-03-14 16:23:10 +02:00
Marko Mäkelä
9dc10d5851 Merge 10.0 into 10.1 2017-03-13 19:17:34 +02:00
Vladislav Vaintroub
a98009ab02 MDEV-12201 innodb_flush_method are not available on Windows
Remove srv_win_file_flush_method

- Rename srv_unix_file_flush_method to srv_file_flush_method, and
  rename constants to remove UNIX from them, i.e SRV_UNIX_FSYNC=>SRV_FSYNC

- Add SRV_ALL_O_DIRECT_FSYNC corresponding to current Windows default
(no buffering for either log or data, flush on both log and data)

- change os_file_open on Windows to behave identically to Unix wrt
O_DIRECT and O_DSYNC settings. map O_DIRECT to FILE_FLAG_NO_BUFFERING and
O_DSYNC to FILE_FLAG_WRITE_THROUGH

- remove various #ifdef _WIN32
2017-03-09 19:19:38 +00:00
Marko Mäkelä
b28adb6a62 Fix an error introduced in the previous commit.
fil_parse_write_crypt_data(): Correct the comparison operator.
This was broken in commit 498f4a825b
which removed a signed/unsigned mismatch in these comparisons.
2017-03-09 15:09:44 +02:00
Marko Mäkelä
498f4a825b Fix InnoDB/XtraDB compilation warnings on 32-bit builds. 2017-03-09 08:54:07 +02:00
Marko Mäkelä
89d80c1b0b Fix many -Wconversion warnings.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong.  Change some parameters to this type.

Use size_t in a few more places.

Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.

When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.

In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
2017-03-07 19:07:27 +02:00
Marko Mäkelä
68d632bc5a Replace some functions with macros.
This is a non-functional change.

On a related note, the calls fil_system_enter() and fil_system_exit()
are often used in an unsafe manner. The fix of MDEV-11738 should
introduce fil_space_acquire() and remove potential race conditions.
2017-03-06 10:02:01 +02:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Marko Mäkelä
29c776cfd1 MDEV-11520: Retry posix_fallocate() after EINTR.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
2017-03-03 12:03:33 +02:00
Marko Mäkelä
78153cf641 MDEV-11927 InnoDB change buffer is not being merged
to tables in the system tablespace

This is a regression caused by MDEV-11585, which accidentally
changed Tablespace::is_undo_tablespace() in an incorrect way,
causing the InnoDB system tablespace to be reported as a dedicated
undo tablespace, for which the change buffer is not applicable.

Tablespace::is_undo_tablespace(): Remove. There were only 2
calls from the function buf_page_io_complete(). Replace those
calls as appropriate.

Also, merge changes to tablespace import/export tests from
MySQL 5.7, and clean up the tests a little further, allowing
them to be run with any innodb_page_size.

Remove duplicated error injection instrumentation for the
import/export tests.  In MySQL 5.7, the error injection label
buf_page_is_corrupt_failure was renamed to
buf_page_import_corrupt_failure.

fil_space_extend_must_retry(): Correct a debug assertion
(tablespaces can be extended during IMPORT), and remove a
TODO comment about compressed temporary tables that was
already addressed in MDEV-11816.

dict_build_tablespace_for_table(): Correct a comment that
no longer holds after MDEV-11816, and assert that
ROW_FORMAT=COMPRESSED can only be used in .ibd files.
2017-02-24 22:16:33 +02:00
Marko Mäkelä
ec4cf111c0 MDEV-11520 after-merge fix for 10.1: Use sparse files.
If page_compression (introduced in MariaDB Server 10.1) is enabled,
the logical action is to not preallocate space to the data files,
but to only logically extend the files with zeroes.

fil_create_new_single_table_tablespace(): Create smaller files for
ROW_FORMAT=COMPRESSED tables, but adhere to the minimum file size of
4*innodb_page_size.

fil_space_extend_must_retry(), os_file_set_size(): On Windows,
use SetFileInformationByHandle() and FILE_END_OF_FILE_INFO,
which depends on bumping _WIN32_WINNT to 0x0600.
FIXME: The files are not yet set up as sparse, so
this will currently end up physically extending (preallocating)
the files, wasting storage for unused pages.

os_file_set_size(): Add the parameter "bool sparse=false" to declare
that the file is to be extended logically, instead of being preallocated.
The only caller with sparse=true is
fil_create_new_single_table_tablespace().
(The system tablespace cannot be created with page_compression.)

fil_space_extend_must_retry(), os_file_set_size(): Outside Windows,
use ftruncate() to extend files that are supposed to be sparse.
On systems where ftruncate() is limited to files less than 4GiB
(if there are any), fil_space_extend_must_retry() retains the
old logic of physically extending the file.
2017-02-22 22:29:56 +02:00
Marko Mäkelä
e1e920bf63 Merge 10.0 into 10.1 2017-02-22 15:53:05 +02:00
Marko Mäkelä
a0ce92ddc7 MDEV-11520 post-fix
fil_extend_space_to_desired_size(): Use a proper type cast when
computing start_offset for the posix_fallocate() call on 32-bit systems
(where sizeof(ulint) < sizeof(os_offset_t)). This could affect 32-bit
systems when extending files that are at least 4 MiB long.

This bug existed in MariaDB 10.0 before MDEV-11520. In MariaDB 10.1
it had been fixed in MDEV-11556.
2017-02-22 12:32:17 +02:00
Marko Mäkelä
81695ab8b5 MDEV-11520 Extending an InnoDB data file unnecessarily allocates
a large memory buffer on Windows

fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate(), and pass the correct arguments to
posix_fallocate().

On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)

I tested that the file extension works properly with a multi-file
system tablespace, both with --innodb-use-fallocate and
--skip-innodb-use-fallocate (the default):

./mtr \
--mysqld=--innodb-use-fallocate \
--mysqld=--innodb-autoextend-increment=1 \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &

ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
(several samples while running the test)
2017-02-22 12:21:44 +02:00
Marko Mäkelä
32170cafad MDEV-12075 innodb_use_fallocate does not work in MariaDB Server 10.1.21
fil_space_extend_must_retry(): When innodb_use_fallocate=ON,
initialize pages_added = size - space->size so that posix_fallocate()
will actually attempt to extend the file, instead of keeping the same size.

This is a regression from MDEV-11556 which refactored
the InnoDB data file extension.
2017-02-16 11:12:24 +02:00
Sergei Golubchik
6c4144a468 InnoDB: suppress posix_fallocate() failure errors when EINVAL
EINVAL means that the filesystem doesn't support posix_fallocate().

There were two places where this error was issued, one checked for
EINVAL, the other did not. This commit fixed the other place
to also check for EINVAL.

Also, remove the space after the REFMAN to get the valid url
with no space in the middle.

Also don't say "Make sure the file system supports this function."
when posix_fallocate() fails, because this message is only shown
when the filesystem does support this function.
2017-02-13 18:12:04 +01:00
Jan Lindström
41cd80fe06 After review fixes. 2017-02-10 16:05:37 +02:00
Marko Mäkelä
99b2de92c6 Post-push fix for MDEV-11623: Remove an unused variable. 2017-02-09 09:36:10 +02:00
Jan Lindström
0340067608 After review fixes for MDEV-11759.
buf_page_is_checksum_valid_crc32()
buf_page_is_checksum_valid_innodb()
buf_page_is_checksum_valid_none():
	Use ULINTPF instead of %lu and %u for ib_uint32_t

fil_space_verify_crypt_checksum():
	Check that page is really empty if checksum and
	LSN are zero.

fil_space_verify_crypt_checksum():
	Correct the comment to be more agurate.

buf0buf.h:
	Remove unnecessary is_corrupt variable from
	buf_page_t structure.
2017-02-09 08:49:13 +02:00
Marko Mäkelä
cbdc389ec9 MDEV-12022 InnoDB wrongly ignores the end of an .ibd file
InnoDB can wrongly ignore the end of data files when using
innodb_page_size=32k or innodb_page_size=64k. These page sizes
use an allocation extent size of 2 or 4 megabytes, not 1 megabyte.

This issue does not affect MariaDB Server 10.2, which is using
the correct WL#5757 code from MySQL 5.7.

That said, it does not make sense to ignore the tail of data files.
The next time the data file needs to be extended, it would be extended
to a multiple of the extent size, once the size exceeds one extent.
2017-02-08 11:35:35 +02:00