Commit graph

217 commits

Author SHA1 Message Date
Vicențiu Ciorbaru
65eefcdc60 Merge remote-tracking branch '10.2' into 10.3 2018-04-12 12:41:19 +03:00
Vladislav Vaintroub
7b16291c36 MDEV-15707 : deadlock in Innodb IO code, caused by change buffering.
In async IO completion code, after reading a page,Innodb can wait for
completion of other bufferpool reads.
This is for example what happens if change-buffering is active.

Innodb on Windows could deadlock, as it did not have dedicated threads
for processing change buffer asynchronous reads.

The fix for that is to have windows now has the same background threads,
including dedicated thread for ibuf, and log AIOs.

The ibuf/read completions are now dispatched to their threads with
PostQueuedCompletionStatus(), the write and log completions are processed
in thread where they arrive.
2018-04-08 21:32:02 +00:00
Sergei Golubchik
b1818dccf7 Merge branch '10.2' into 10.3 2018-03-28 17:31:57 +02:00
Marko Mäkelä
3d7915f000 Merge 10.1 into 10.2 2018-03-21 22:58:52 +02:00
Vicențiu Ciorbaru
82aeb6b596 Merge branch '10.1' into 10.2 2018-03-21 10:36:49 +02:00
Marko Mäkelä
e0a0fe7d81 MDEV-12396 IMPORT TABLESPACE: Do not retry partial reads
fil_iterate(), fil_tablespace_iterate(): Replace os_file_read()
with os_file_read_no_error_handling().

os_file_read_func(), os_file_read_no_error_handling_func():
Do not retry partial reads. There used to be an infinite amount
of retries. Because InnoDB extends both data and log files upfront,
partial reads should be impossible during normal operation.
2018-03-20 15:31:39 +02:00
Vicențiu Ciorbaru
24b353162f Merge branch '10.0-galera' into 10.1 2018-03-19 15:21:01 +02:00
Daniel Black
7bb661cd40 innodb: os_file_create_tmpfile always called with NULL -> simplify 2018-03-15 13:49:00 +11:00
Daniel Black
26e4a48bda MDEV-8743: ib_logfile0 Use O_CLOEXEC so galera SST scripts don't get fd 2018-03-02 11:09:51 +11:00
Vladislav Vaintroub
56e7b7eaed Make possible to use clang on Windows (clang-cl)
-DWITH_ASAN can be used as well now, on x64

Fix many clang-cl warnings.
2018-02-20 21:17:36 +00:00
Marko Mäkelä
b006d2ead4 Merge bb-10.2-ext into 10.3 2018-02-15 10:22:03 +02:00
Marko Mäkelä
00f0c039d2 MDEV-15270 Mariabackup should not try to use doublewrite buffer
When Mariabackup gets a bad read of the first page of the system
tablespace file, it would inappropriately try to apply the doublewrite
buffer and write changes back to the data file (to the source file)!
This is very wrong and must be prevented.

The correct action would be to retry reading the system tablespace
as well as any other files whose first page was read incorrectly.
Fixing this was not attempted.

xb_load_tablespaces(): Shorten a bogus message to be more relevant.
The message can be displayed by --backup or --prepare.

xtrabackup_backup_func(), os_file_write_func(): Add a missing space
to a message.

Datafile::restore_from_doublewrite(): Do not even attempt the
operation in Mariabackup.

recv_init_crash_recovery_spaces(): Do not attempt to restore the
doublewrite buffer in Mariabackup (--prepare or --export), because
all pages should have been copied correctly in --backup already,
and because --backup should ignore the doublewrite buffer.

SysTablespace::read_lsn_and_check_flags(): Do not attempt to initialize
the doublewrite buffer in Mariabackup.

innodb_make_page_dirty(): Correct the bounds check.

Datafile::read_first_page(): Correct the name of the parameter.
2018-02-12 16:56:01 +02:00
Vladislav Vaintroub
53476abce8 Windows, compiling : use /permissive- switch to improve conformance
fix a couple "initialization skipped by goto" and other new errors.
2018-02-07 20:22:30 +00:00
Marko Mäkelä
7cb3520c06 Merge bb-10.2-ext into 10.3 2017-11-30 08:16:37 +02:00
Marko Mäkelä
c19ef508b8 InnoDB: Remove ut_snprintf() and the use of my_snprintf(); use snprintf() 2017-11-13 02:11:48 +02:00
Marko Mäkelä
a48aa0cd56 Merge bb-10.2-ext into 10.3 2017-11-10 16:12:45 +02:00
Marko Mäkelä
51679e5c38 MDEV-14132 InnoDB page corruption
On some old GNU/Linux systems, invoking posix_fallocate() with
offset=0 would sometimes cause already allocated bytes in the
data file to be overwritten.

Fix a correctness regression that was introduced in
commit 420798a81a
by invoking posix_fallocate() in a safer way.
A similar change was made in MDEV-5746 earlier.

os_file_get_size(): Avoid changing the state of the file handle,
by invoking fstat() instead of lseek().

os_file_set_size(): Determine the current size of the file
by os_file_get_size(), and then extend the file from that point
onwards.
2017-11-06 08:53:51 +02:00
Marko Mäkelä
30a8764b92 MDEV-14244 MariaDB fails to run with O_DIRECT
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.

MariaDB used to ignore any possible return value from posix_fallocate()
ever since innodb_use_fallocate was introduced in MDEV-4338. If EINVAL
was returned, the file would not be extended.

Starting with MDEV-11520, MariaDB would treat EINVAL as a hard error.

Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
2017-11-06 08:53:50 +02:00
Marko Mäkelä
19733efa7b MDEV-14244 MariaDB 10.2.10 fails to run on Debian Stretch with ext3 and O_DIRECT
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.

MariaDB 10.2 used to handle the EINVAL return value from posix_fallocate()
before commit b731a5bcf2
which refactored os_file_set_size() to try posix_fallocate().

Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
2017-11-02 16:18:41 +02:00
Alexander Barkov
835cbbcc7b Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3
TODO: enable MDEV-13049 optimization for 10.3
2017-10-30 20:47:39 +04:00
Marko Mäkelä
58e0dcb93d Add a missing space to an error message 2017-10-30 10:06:47 +02:00
Vladislav Vaintroub
97df230aed MDEV-14115 : Do not use lpNumberOfBytesRead/Written params in
ReadFile/WriteFile operations.

Innodb opens files with FILE_FLAG_OVERLAPPED. lpNumberOfBytesRead/Written
are documented to be potentially inaccurate in this case,
(possibly even if async operations complete synchronously?)

The fix is to always pass NULL for the correspondng parameters,
as recommended by  MSDN. Read the actual counts with
GetQueuedCompletionStatus() or GetOverlappedResult().
2017-10-27 23:42:02 +00:00
Marko Mäkelä
067f83969c MDEV-14132 follow-up fix: Make os_file_get_size() thread-safe
os_file_get_size(): Use fstat() instead of calling lseek() 3 times.
In this way, concurrent calls to this function should not interfere
with each other.

Suggested by Vladislav Vaintroub.
2017-10-27 19:33:38 +03:00
Marko Mäkelä
5f5ffdc76b MDEV-14132 follow-up fix: Validate the posix_fallocate() argument
os_file_set_size(): Sometimes the file already is large enough.
Avoid calling posix_fallocate() with a non-positive argument.
Also, add a missing space to an error message.
2017-10-27 18:59:22 +03:00
Vladislav Vaintroub
057a6cf768 MDEV-14132 : fix posix_fallocate() calls to workaround some (ancient) Linux bugs
With this patch, parameters passed to posix_fallocate() will be
the same as they were prior to refactoring in  commit b731a5bcf2

In particular, 'offset' parameter for posix_fallocate is again current_file_size
and 'length' is new_file_size - current_file_size.

This seems to fix crashes on ancient Linux (kernel 2.6).
2017-10-27 11:56:10 +00:00
Vladislav Vaintroub
fa7a1a57d9 Windows : small optimization in os_is_sparse_file_supported()
Use GetFileInformationByHandleEx with FileAttributeTagInfo to query whether
the file is sparse. This saves 1 syscall, as GetFileInformationByHandle()
would additionally query volume info.
2017-10-10 06:19:50 +00:00
Vladislav Vaintroub
ff2d9e125f MDEV-13941 followup.
Try to fix fragmentation (unsparse files), for pre-existing
installations.

Unsparse the innodb file, when it needs to be extended, unless compression
is used. For Win7/2008R2 unsparse  does not work (as documented in MSDN),
therefore for sparse files in older Windows, file extension will be done
via writing zeroes at the end of file.
2017-10-10 06:19:50 +00:00
Vladislav Vaintroub
b731a5bcf2 Innodb : Refactor os_file_set_size() to be compatible 10.1
The last parameter to this function is now,"bool is_sparse", like in 10.1
rather than the  unused/useless "bool is_readonly", merged from MySQL 5.7

Like in 10.1, this function now supports sparse files, and efficient
platform specific mechanisms for file extension

os_file_set_size() is now consistenly used in all places where
innodb files are extended.
2017-10-10 06:19:50 +00:00
Vladislav Vaintroub
420798a81a Refactor os_file_set_size to extend already existing files.
Change fil_space_extend_must_retry() to use this function.
2017-10-07 08:30:20 +00:00
Marko Mäkelä
2c1067166d Merge bb-10.2-ext into 10.3 2017-10-04 08:24:06 +03:00
Vladislav Vaintroub
96b9c61787 MDEV-13941 Fix high NTFS fragmentation on 10.2
Prior to this patch, creating or even opening any innodb file in 10.2
would set a sparse flag on file. The file extension was done by setting
end of file, without writing zeros. This technique is fine, however
due to sparsedness, it created a hole at the end of the file, which
lead to much higher fragmentation subsequently.

The fix is only to use sparse flag for compressed tables, where holes
are actually wanted, but not for normal tables.
2017-09-29 17:29:21 +00:00
Vladislav Vaintroub
7354dc6773 MDEV-13384 - misc Windows warnings fixed 2017-09-28 17:20:46 +00:00
Vladislav Vaintroub
eba44874ca MDEV-13844 : Fix Windows warnings. Fix DBUG_PRINT.
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)

- Also fix printf-format warnings

Make the above mentioned warnings fatal.

- fix pthread_join on Windows to set return value.
2017-09-28 17:20:46 +00:00
Vladislav Vaintroub
1d7bc3b582 Innodb : do not call fflush() in os_get_last_error_low(), if no error
message was written.
2017-09-16 09:45:38 +00:00
Sergei Golubchik
bb8e99fdc3 Merge branch 'bb-10.2-ext' into 10.3 2017-08-26 00:34:43 +02:00
Vladislav Vaintroub
edf77043ba MDEV-12948 : do not spam error log, if DeviceIoControl(IOCTL_STORAGE_QUERY_PROPERTY)
fails with ERROR_INVALID_FUNCTION

This DeviceIoControl seems to happen on different boxes from time to time,
and there is not much user can do about it.
Instead of error, log a single INFO message, so it does not disturb users
much.
2017-08-17 17:36:39 +00:00
Marko Mäkelä
57fea53615 Merge bb-10.2-ext into 10.3 2017-07-07 12:39:43 +03:00
Alexander Barkov
3b9273d203 Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3 2017-07-05 17:43:32 +04:00
Marko Mäkelä
8c71c6aa8b MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.

The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.

Some help was provided by Vladislav Vaintroub.

Parameters have been cleaned up and aligned with those of MariaDB 10.2.

The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.

By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash-safety for --prepare is needed, double buffering
can be enabled.

The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.

Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.

We disable any undo log processing in --prepare.

Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in prepare.

This means that the --export option must be disabled for now.

This also means that the following options are redundant
and have been removed:
	xtrabackup --apply-log-only
	innobackupex --redo-only

In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)

In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.

FIXME: Enable the --export option.

FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.

FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.
2017-07-05 11:43:28 +03:00
Marko Mäkelä
41a6475b49 InnoDB: Use access() instead of open() 2017-07-05 08:02:55 +03:00
Marko Mäkelä
68b5aeae4e Minor cleanup of InnoDB I/O routines
Change many function parameters from IORequest& to const IORequest&.

Remove an unused definition of ECANCELED.
2017-06-29 22:30:47 +03:00
Marko Mäkelä
1e3886ae80 Merge bb-10.2-ext into 10.3 2017-06-19 17:28:08 +03:00
Marko Mäkelä
6e0e5eefbc Fix some printf format type mismatch 2017-06-06 12:04:29 +03:00
Marko Mäkelä
3d615e4b1a Merge branch 'bb-10.2-ext' into 10.3
This excludes MDEV-12472 (InnoDB should accept XtraDB parameters,
warning that they are ignored). In other words, MariaDB 10.3 will not
recognize any XtraDB-specific parameters.
2017-06-02 09:36:04 +03:00
Sachin Setiya
92209ac6f6 Merge tag 'mariadb-10.0.31' into 10.0-galera
Signed-off-by: Sachin Setiya <sachin.setiya@mariadb.com>
2017-05-30 15:28:52 +05:30
Marko Mäkelä
8f643e2063 Merge 10.1 into 10.2 2017-05-23 11:09:47 +03:00
Marko Mäkelä
65e1399e64 Merge 10.0 into 10.1
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
2017-05-20 08:41:20 +03:00
Vicențiu Ciorbaru
b87873b221 Merge branch 'merge-innodb-5.6' into bb-10.0-vicentiu
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.

Bug #23481444	OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
                       INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does whole table traversal for range query
even though the end range is passed. Whole table traversal happens
when the record is not with in transaction read view.

Solution:
=========

Convert the innodb last record of page to mysql format and compare
with end range if the traversal of row_search_mvcc() exceeds 100,
no ICP involved. If it is out of range then InnoDB can avoid the
whole table traversal. Need to refactor the code little bit to
make it compile.

Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
2017-05-17 14:53:28 +03:00
Vicențiu Ciorbaru
0af9818240 5.6.36 2017-05-15 17:17:16 +03:00
Marko Mäkelä
021d636551 Fix some integer type mismatch.
Use uint32_t for the encryption key_id.

When filling unsigned integer values into INFORMATION_SCHEMA tables,
use the method Field::store(longlong, bool unsigned)
instead of using Field::store(double).

Fix also some miscellanous type mismatch related to ulint (size_t).
2017-05-10 12:45:46 +03:00