Commit graph

3984 commits

Author SHA1 Message Date
Marko Mäkelä
b28adb6a62 Fix an error introduced in the previous commit.
fil_parse_write_crypt_data(): Correct the comparison operator.
This was broken in commit 498f4a825b
which removed a signed/unsigned mismatch in these comparisons.
2017-03-09 15:09:44 +02:00
Marko Mäkelä
498f4a825b Fix InnoDB/XtraDB compilation warnings on 32-bit builds. 2017-03-09 08:54:07 +02:00
Marko Mäkelä
ad0c218a44 Merge 10.0 into 10.1
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:

recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).

Report progress also via systemd using sd_notifyf().
2017-03-09 08:53:08 +02:00
Marko Mäkelä
74fe0e03d5 Remove unused declarations. 2017-03-08 11:46:34 +02:00
Marko Mäkelä
47396ddea9 Merge 5.5 into 10.0
Also, implement MDEV-11027 a little differently from 5.5:

recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.

recv_apply_hashed_log_recs(): Rename the parameter to last_batch.
2017-03-08 11:40:43 +02:00
Marko Mäkelä
9c47beb8bd MDEV-11027 InnoDB log recovery is too noisy
Provide more useful progress reporting of crash recovery.

recv_sys_t::progress_time: The time of the last report.

recv_scan_print_counter: Remove.

log_group_read_log_seg(): After after each I/O request,
report progress if needed.

recv_apply_hashed_log_recs(): At the start of each batch,
if there are pages to be recovered, issue a message.
2017-03-08 10:07:50 +02:00
Marko Mäkelä
1fd3cc8c1f Fix a compiler warning. 2017-03-08 10:06:34 +02:00
Vicențiu Ciorbaru
c4f3e64c23 Merge branch 'bb-10.0-vicentiu' into 10.0 2017-03-06 21:50:42 +02:00
Marko Mäkelä
68d632bc5a Replace some functions with macros.
This is a non-functional change.

On a related note, the calls fil_system_enter() and fil_system_exit()
are often used in an unsafe manner. The fix of MDEV-11738 should
introduce fil_space_acquire() and remove potential race conditions.
2017-03-06 10:02:01 +02:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Marko Mäkelä
29c776cfd1 MDEV-11520: Retry posix_fallocate() after EINTR.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
2017-03-03 12:03:33 +02:00
Marko Mäkelä
6b8173b6e9 MDEV-11520: Retry posix_fallocate() after EINTR.
The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
2017-03-03 11:47:31 +02:00
Daniel Black
bc28b305e5 Remove warning: unused variable 'volatile_var' [-Wunused-variable]
This occured in gcc-6.2.1.

The variable wasn't used so was no need to be volatile either.

Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
2017-03-03 13:00:41 +04:00
Vicențiu Ciorbaru
1acfa942ed Merge branch '5.5' into 10.0 2017-03-03 01:37:54 +02:00
Marko Mäkelä
fc673a2c12 MDEV-12127 InnoDB: Assertion failure loop_count < 5 in file log0log.cc
As suggested in MySQL Bug#58536, increase the limit in this
debug assertion in order to avoid false positives on heavily
loaded systems.
2017-02-28 09:54:12 +02:00
Marko Mäkelä
ec4cf111c0 MDEV-11520 after-merge fix for 10.1: Use sparse files.
If page_compression (introduced in MariaDB Server 10.1) is enabled,
the logical action is to not preallocate space to the data files,
but to only logically extend the files with zeroes.

fil_create_new_single_table_tablespace(): Create smaller files for
ROW_FORMAT=COMPRESSED tables, but adhere to the minimum file size of
4*innodb_page_size.

fil_space_extend_must_retry(), os_file_set_size(): On Windows,
use SetFileInformationByHandle() and FILE_END_OF_FILE_INFO,
which depends on bumping _WIN32_WINNT to 0x0600.
FIXME: The files are not yet set up as sparse, so
this will currently end up physically extending (preallocating)
the files, wasting storage for unused pages.

os_file_set_size(): Add the parameter "bool sparse=false" to declare
that the file is to be extended logically, instead of being preallocated.
The only caller with sparse=true is
fil_create_new_single_table_tablespace().
(The system tablespace cannot be created with page_compression.)

fil_space_extend_must_retry(), os_file_set_size(): Outside Windows,
use ftruncate() to extend files that are supposed to be sparse.
On systems where ftruncate() is limited to files less than 4GiB
(if there are any), fil_space_extend_must_retry() retains the
old logic of physically extending the file.
2017-02-22 22:29:56 +02:00
Marko Mäkelä
e1e920bf63 Merge 10.0 into 10.1 2017-02-22 15:53:05 +02:00
Marko Mäkelä
a0ce92ddc7 MDEV-11520 post-fix
fil_extend_space_to_desired_size(): Use a proper type cast when
computing start_offset for the posix_fallocate() call on 32-bit systems
(where sizeof(ulint) < sizeof(os_offset_t)). This could affect 32-bit
systems when extending files that are at least 4 MiB long.

This bug existed in MariaDB 10.0 before MDEV-11520. In MariaDB 10.1
it had been fixed in MDEV-11556.
2017-02-22 12:32:17 +02:00
Marko Mäkelä
81695ab8b5 MDEV-11520 Extending an InnoDB data file unnecessarily allocates
a large memory buffer on Windows

fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate(), and pass the correct arguments to
posix_fallocate().

On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)

I tested that the file extension works properly with a multi-file
system tablespace, both with --innodb-use-fallocate and
--skip-innodb-use-fallocate (the default):

./mtr \
--mysqld=--innodb-use-fallocate \
--mysqld=--innodb-autoextend-increment=1 \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &

ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
(several samples while running the test)
2017-02-22 12:21:44 +02:00
Marko Mäkelä
365c4e971a MDEV-11520/MDEV-5746 post-fix: Do not posix_fallocate() too much.
Before the MDEV-11520 fixes, fil_extend_space_to_desired_size()
in MariaDB Server 5.5 incorrectly passed the desired file size as the
third argument to posix_fallocate(), even though the length of the
extension should have been passed. This looks like a regression
that was introduced in the 5.5 version of MDEV-5746.
2017-02-22 10:03:33 +02:00
Marko Mäkelä
6de50b2c7f MDEV-11520 post-fixes
Remove the unused variable desired_size.

Also, correct the expression for the posix_fallocate() start_offset,
and actually test that it works with a multi-file system tablespace.
Before MDEV-11520, the expression was wrong in both innodb_plugin and
xtradb, in different ways.

The start_offset formula was tested with the following:

./mtr --big-test --mysqld=--innodb-use-fallocate \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &

ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
2017-02-22 09:44:21 +02:00
Marko Mäkelä
978179a9d4 MDEV-11520 Extending an InnoDB data file unnecessarily allocates
a large memory buffer on Windows

fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate().

On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)

For innodb_plugin, port the XtraDB fix for MySQL Bug#56433
(introducing fil_system->file_extend_mutex). The bug was
fixed differently in MySQL 5.6 (and MariaDB Server 10.0).
2017-02-21 16:45:03 +02:00
Marko Mäkelä
2bfe83adec Remove a bogus Valgrind "suppression".
fsp_init_file_page_low() does initialize all pages nowadays,
even those in the InnoDB system tablespace.
2017-02-21 16:45:03 +02:00
Marko Mäkelä
3c47ed4849 Merge 10.0 into 10.1 2017-02-20 14:02:40 +02:00
Marko Mäkelä
13493078e9 MDEV-11802 innodb.innodb_bug14676111 fails
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.

It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.

For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.

os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.

os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
2017-02-20 12:20:52 +02:00
Jan Lindström
108b211ee2 Fix gcc 6.3.x compiler warnings.
These are caused by fact that functions are declared with
__attribute__((nonnull)) or left shit like ~0 << macro
when ~0U << macro should be used.
2017-02-16 12:02:31 +02:00
Marko Mäkelä
01d5d6db4c Fix GCC 6.3.0 warnings. 2017-02-16 11:16:27 +02:00
Marko Mäkelä
32170cafad MDEV-12075 innodb_use_fallocate does not work in MariaDB Server 10.1.21
fil_space_extend_must_retry(): When innodb_use_fallocate=ON,
initialize pages_added = size - space->size so that posix_fallocate()
will actually attempt to extend the file, instead of keeping the same size.

This is a regression from MDEV-11556 which refactored
the InnoDB data file extension.
2017-02-16 11:12:24 +02:00
Jan Lindström
de9963b786 After reivew fixes. 2017-02-10 17:41:35 +02:00
Jan Lindström
41cd80fe06 After review fixes. 2017-02-10 16:05:37 +02:00
Marko Mäkelä
99b2de92c6 Post-push fix for MDEV-11623: Remove an unused variable. 2017-02-09 09:36:10 +02:00
Marko Mäkelä
ef065dbbc2 Merge 10.0 into 10.1 2017-02-09 08:51:52 +02:00
Jan Lindström
0340067608 After review fixes for MDEV-11759.
buf_page_is_checksum_valid_crc32()
buf_page_is_checksum_valid_innodb()
buf_page_is_checksum_valid_none():
	Use ULINTPF instead of %lu and %u for ib_uint32_t

fil_space_verify_crypt_checksum():
	Check that page is really empty if checksum and
	LSN are zero.

fil_space_verify_crypt_checksum():
	Correct the comment to be more agurate.

buf0buf.h:
	Remove unnecessary is_corrupt variable from
	buf_page_t structure.
2017-02-09 08:49:13 +02:00
Marko Mäkelä
6011fb6daa Post-push fix for MDEV-11947 InnoDB purge workers fail to shut down
Use the ib_int64_t type alias instead of the standard type int64_t,
so that the code will compile on Microsoft Visual Studio 2013.
2017-02-09 08:47:38 +02:00
Marko Mäkelä
9017a05d87 Merge 10.0 into 10.1 2017-02-08 17:30:25 +02:00
Marko Mäkelä
d831e4c22a MDEV-12024 InnoDB startup fails to wait for recv_writer_thread to finish
recv_writer_thread(): Do not assign recv_writer_thread_active=true
in order to avoid a race condition with
recv_recovery_from_checkpoint_finish().

recv_init_crash_recovery(): Assign recv_writer_thread_active=true
before creating recv_writer_thread.
2017-02-08 17:23:13 +02:00
Marko Mäkelä
cbdc389ec9 MDEV-12022 InnoDB wrongly ignores the end of an .ibd file
InnoDB can wrongly ignore the end of data files when using
innodb_page_size=32k or innodb_page_size=64k. These page sizes
use an allocation extent size of 2 or 4 megabytes, not 1 megabyte.

This issue does not affect MariaDB Server 10.2, which is using
the correct WL#5757 code from MySQL 5.7.

That said, it does not make sense to ignore the tail of data files.
The next time the data file needs to be extended, it would be extended
to a multiple of the extent size, once the size exceeds one extent.
2017-02-08 11:35:35 +02:00
Marko Mäkelä
2e67e66c3a Merge 10.0 into 10.1 2017-02-08 08:53:34 +02:00
Jan Lindström
e53dfb24be MDEV-11707: Fix incorrect memset() for structures containing
dynamic class GenericPolicy<TTASEventMutex<GenericPolicy> >'; vtable

Instead using mem_heap_alloc and memset, use mem_heap_zalloc
directly.
2017-02-06 15:40:17 +02:00
Jan Lindström
ddf2fac733 MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes
compatibility problems

Pages that are encrypted contain post encryption checksum on
different location that normal checksum fields. Therefore,
we should before decryption check this checksum to avoid
unencrypting corrupted pages. After decryption we can use
traditional checksum check to detect if page is corrupted
or unencryption was done using incorrect key.

Pages that are page compressed do not contain any checksum,
here we need to fist unencrypt, decompress and finally
use tradional checksum check to detect page corruption
or that we used incorrect key in unencryption.

buf0buf.cc: buf_page_is_corrupted() mofified so that
compressed pages are skipped.

buf0buf.h, buf_block_init(), buf_page_init_low():
removed unnecessary page_encrypted, page_compressed,
stored_checksum, valculated_checksum fields from
buf_page_t

buf_page_get_gen(): use new buf_page_check_corrupt() function
to detect corrupted pages.

buf_page_check_corrupt(): If page was not yet decrypted
check if post encryption checksum still matches.
If page is not anymore encrypted, use buf_page_is_corrupted()
traditional checksum method.

If page is detected as corrupted and it is not encrypted
we print corruption message to error log.
If page is still encrypted or it was encrypted and now
corrupted, we will print message that page is
encrypted to error log.

buf_page_io_complete(): use new buf_page_check_corrupt()
function to detect corrupted pages.

buf_page_decrypt_after_read(): Verify post encryption
checksum before tring to decrypt.

fil0crypt.cc: fil_encrypt_buf() verify post encryption
checksum and ind fil_space_decrypt() return true
if we really decrypted the page.

fil_space_verify_crypt_checksum(): rewrite to use
the method used when calculating post encryption
checksum. We also check if post encryption checksum
matches that traditional checksum check does not
match.

fil0fil.ic: Add missed page type encrypted and page
compressed to fil_get_page_type_name()

Note that this change does not yet fix innochecksum tool,
that will be done in separate MDEV.

Fix test failures caused by buf page corruption injection.
2017-02-06 15:40:16 +02:00
Marko Mäkelä
f162704570 Rewrite the innodb.log_file_size test with DBUG_EXECUTE_IF.
Remove the debug parameter innodb_force_recovery_crash that was
introduced into MySQL 5.6 by me in WL#6494 which allowed InnoDB
to resize the redo log on startup.

Let innodb.log_file_size actually start up the server, but ensure
that the InnoDB storage engine refuses to start up in each of the
scenarios.
2017-02-05 17:07:16 +02:00
Marko Mäkelä
20e8347447 MDEV-11985 Make innodb_read_only shutdown more robust
If InnoDB is started in innodb_read_only mode such that
recovered incomplete transactions exist at startup
(but the redo logs are clean), an assertion will fail at shutdown,
because there would exist some non-prepared transactions.

logs_empty_and_mark_files_at_shutdown(): Do not wait for incomplete
transactions to finish if innodb_read_only or innodb_force_recovery>=3.
Wait for purge to finish in only one place.

trx_sys_close(): Relax the assertion that would fail first.

trx_free_prepared(): Also free recovered TRX_STATE_ACTIVE transactions
if innodb_read_only or innodb_force_recovery>=3.
2017-02-04 17:33:19 +02:00
Marko Mäkelä
9f0dbb3120 MDEV-11947 InnoDB purge workers fail to shut down
srv_release_threads(): Actually wait for the threads to resume
from suspension. On CentOS 5 and possibly other platforms,
os_event_set() may be lost.

srv_resume_thread(): A counterpart of srv_suspend_thread().
Optionally wait for the event to be set, optionally with a timeout,
and then release the thread from suspension.

srv_free_slot(): Unconditionally suspend the thread. It is always
in resumed state when this function is entered.

srv_active_wake_master_thread_low(): Only call os_event_set().

srv_purge_coordinator_suspend(): Use srv_resume_thread() instead
of the complicated logic.
2017-02-04 17:33:01 +02:00
Jan Lindström
17cc619847 MDEV-11671: Duplicated [NOTE] output for changed innodb_page_size
Remove duplicated output and change output level to info.
2017-01-31 15:42:52 +02:00
Marko Mäkelä
732672c304 MDEV-11233 CREATE FULLTEXT INDEX with a token longer than 127 bytes
crashes server

This bug is the result of merging the Oracle MySQL follow-up fix
BUG#22963169 MYSQL CRASHES ON CREATE FULLTEXT INDEX
without merging the base bug fix:
Bug#79475 Insert a token of 84 4-bytes chars into fts index causes
server crash.

Unlike the above mentioned fixes in MySQL, our fix will not change
the storage format of fulltext indexes in InnoDB or XtraDB
when a character encoding with mbmaxlen=2 or mbmaxlen=3
and the length of a word is between 128 and 84*mbmaxlen bytes.
The Oracle fix would allocate 2 length bytes for these cases.

Compatibility with other MySQL and MariaDB releases is ensured by
persisting the used maximum length in the SYS_COLUMNS table in the
InnoDB data dictionary.

This fix also removes some unnecessary strcmp() calls when checking
for the legacy default collation my_charset_latin1
(my_charset_latin1.name=="latin1_swedish_ci").

fts_create_one_index_table(): Store the actual length in bytes.
This metadata will be written to the SYS_COLUMNS table.

fts_zip_initialize(): Initialize only the first byte of the buffer.
Actually the code should not even care about this first byte, because
the length is set as 0.

FTX_MAX_WORD_LEN: Define as HA_FT_MAXCHARLEN * 4 aka 336 bytes,
not as 254 bytes.

row_merge_create_fts_sort_index(): Set the actual maximum length of the
column in bytes, similar to fts_create_one_index_table().

row_merge_fts_doc_tokenize(): Remove the redundant parameter word_dtype.
Use the actual maximum length of the column. Calculate the extra_size
in the same way as row_merge_buf_encode() does.
2017-01-27 10:19:39 +02:00
Marko Mäkelä
f1f8ebc325 Merge 10.0 into 10.1 2017-01-26 23:40:11 +02:00
Marko Mäkelä
afb461587c MDEV-11915 Detect InnoDB system tablespace size mismatch early
InnoDB would refuse to start up if there is a mismatch on
the size of the system tablespace files. However, before this
check is conducted, the system tablespace may already have been
heavily modified.

InnoDB should perform the size check as early as possible.

recv_recovery_from_checkpoint_finish():
Move the recv_apply_hashed_log_recs() call to
innobase_start_or_create_for_mysql().

innobase_start_or_create_for_mysql(): Test the mutex functionality
before doing anything else. Use a compile_time_assert() for a
sizeof() constraint. Check the size of the system tablespace as
early as possible.
2017-01-26 23:10:36 +02:00
Marko Mäkelä
49fe9bad01 MDEV-11814 Refuse innodb_read_only startup if crash recovery is needed
recv_scan_log_recs(): Remember if redo log apply is needed,
even if starting up in innodb_read_only mode.

recv_recovery_from_checkpoint_start_func(): Refuse
innodb_read_only startup if redo log apply is needed.
2017-01-26 13:58:58 +02:00
Jan Lindström
b7b4c332c0 MDEV-11614: Syslog messages: "InnoDB: Log sequence number
at the start 759654123 and the end 0 do not match."

For page compressed and encrypted tables log sequence
number at end is not stored, thus disable this message
for them.
2017-01-22 08:46:15 +02:00
Jan Lindström
8a4d605500 MDEV-11838: Innodb-encryption-algorithm default should be != none
Change default to zlib, this has effect only if user has
explicitly requested page compression and then user
naturally expects that pages are really compressed
if they can be compressed.
2017-01-19 12:20:54 +02:00