Commit graph

694 commits

Author SHA1 Message Date
Vladislav Vaintroub
321771f89f MDEV-15895 : make Innodb merge temp tables use pfs_os_file_t for
file IO, rather than int.

On Windows, it is suboptimal to depend on the C runtime, as it has a limited
number of file descriptors. This change eliminates
os_file_read_no_error_handling_int_fd(), os_file_write_int_fd(), and the
OS_FILE_FROM_FD() macro.
2018-04-17 09:07:38 +01:00
Vladislav Vaintroub
47ea2227e5 fix typo, amend last commit 2018-04-14 23:59:59 +01:00
Vladislav Vaintroub
043a9b4e1b Windows, innodb : reduce noise from os_file_get_block_size()
If the volume cannot be opened due to permissions, or
IOCTL_STORAGE_QUERY_PROPERTY fails as not implemented, do not report it.
These errors do happen, and there is nothing the user can do about them.

This patch amends the fix for MDEV-12948.
2018-04-14 23:53:11 +01:00
Vicențiu Ciorbaru
65eefcdc60 Merge remote-tracking branch '10.2' into 10.3 2018-04-12 12:41:19 +03:00
Vladislav Vaintroub
7b16291c36 MDEV-15707 : deadlock in Innodb IO code, caused by change buffering.
In async IO completion code, after reading a page, InnoDB can wait for
completion of other buffer pool reads.
This is, for example, what happens if change buffering is active.

InnoDB on Windows could deadlock, as it did not have dedicated threads
for processing change buffer asynchronous reads.

The fix is to give Windows the same background threads as other
platforms, including dedicated threads for ibuf and log AIOs.

The ibuf/read completions are now dispatched to their threads with
PostQueuedCompletionStatus(); the write and log completions are processed
in the thread where they arrive.
2018-04-08 21:32:02 +00:00
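To make the deadlock and its fix concrete, here is a toy Python model in which threads and queues stand in for Windows I/O completion ports. The class, the completion kinds and the handler callbacks are hypothetical, not InnoDB's code. A page-read completion blocks until a change-buffer (ibuf) read finishes; that wait can only succeed because ibuf completions are handled on a different thread.

```python
import queue
import threading

class CompletionDispatcher:
    """Toy model of routing I/O completions by kind (not InnoDB code).

    "read" and "ibuf" completions go to dedicated worker threads, because
    their handlers may block waiting for other reads; any other kind
    (e.g. "write", "log") is handled in the thread that received it.
    """

    DEDICATED = ("read", "ibuf")

    def __init__(self, handlers):
        self.handlers = handlers  # kind -> callable(request)
        self.queues = {kind: queue.Queue() for kind in self.DEDICATED}
        self.workers = [
            threading.Thread(target=self._run, args=(kind,), name=kind + "-worker")
            for kind in self.DEDICATED
        ]
        for worker in self.workers:
            worker.start()

    def _run(self, kind):
        q = self.queues[kind]
        while True:
            req = q.get()
            if req is None:  # shutdown sentinel
                return
            self.handlers[kind](req)

    def on_completion(self, kind, req):
        if kind in self.queues:
            # Stand-in for forwarding with PostQueuedCompletionStatus().
            self.queues[kind].put(req)
        else:
            # Write/log completions: process in place.
            self.handlers[kind](req)

    def shutdown(self):
        for q in self.queues.values():
            q.put(None)
        for worker in self.workers:
            worker.join()


# Demo: a page-read completion that waits for a change-buffer read.
# If one thread handled both kinds in order, this wait could never succeed.
ibuf_done = threading.Event()
events = []

def on_read(req):
    events.append(("read", req, ibuf_done.wait(timeout=5)))

def on_ibuf(req):
    events.append(("ibuf", req))
    ibuf_done.set()

def on_write(req):
    events.append(("write", req, threading.current_thread().name))

d = CompletionDispatcher({"read": on_read, "ibuf": on_ibuf, "write": on_write})
d.on_completion("read", 1)
d.on_completion("ibuf", 2)
d.on_completion("write", 3)
d.shutdown()
```

The read handler records True only because the ibuf worker set the event while the read worker was blocked.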
Sergei Golubchik
b1818dccf7 Merge branch '10.2' into 10.3 2018-03-28 17:31:57 +02:00
Marko Mäkelä
3d7915f000 Merge 10.1 into 10.2 2018-03-21 22:58:52 +02:00
Vicențiu Ciorbaru
82aeb6b596 Merge branch '10.1' into 10.2 2018-03-21 10:36:49 +02:00
Marko Mäkelä
e0a0fe7d81 MDEV-12396 IMPORT TABLESPACE: Do not retry partial reads
fil_iterate(), fil_tablespace_iterate(): Replace os_file_read()
with os_file_read_no_error_handling().

os_file_read_func(), os_file_read_no_error_handling_func():
Do not retry partial reads. Previously, partial reads were retried
indefinitely. Because InnoDB extends both data and log files up front,
partial reads should be impossible during normal operation.
2018-03-20 15:31:39 +02:00
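The "do not retry partial reads" policy can be sketched in Python with os.pread(): one read, and a short result is reported as an error instead of being looped on. The function name read_no_retry is hypothetical; this is an illustration of the policy, not InnoDB's os_file_read_func().

```python
import os
import tempfile

def read_no_retry(fd, n, offset):
    """Read exactly n bytes at offset, treating a short read as an error.

    No retry loop: because the files are extended up front, a partial read
    is not expected during normal operation, so retrying would only hide
    the problem (or, with unlimited retries, never terminate).
    """
    data = os.pread(fd, n, offset)
    if len(data) != n:
        raise OSError(f"partial read at offset {offset}: "
                      f"wanted {n} bytes, got {len(data)}")
    return data

with tempfile.TemporaryFile() as f:
    f.write(b"abcdefgh")
    f.flush()
    first_half = read_no_retry(f.fileno(), 4, 0)  # b"abcd"
    try:
        read_no_retry(f.fileno(), 8, 4)  # only 4 bytes remain past offset 4
        short_read_error = None
    except OSError as e:
        short_read_error = str(e)
```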
Vicențiu Ciorbaru
24b353162f Merge branch '10.0-galera' into 10.1 2018-03-19 15:21:01 +02:00
Daniel Black
7bb661cd40 innodb: os_file_create_tmpfile always called with NULL -> simplify 2018-03-15 13:49:00 +11:00
Daniel Black
26e4a48bda MDEV-8743: ib_logfile0: use O_CLOEXEC so Galera SST scripts don't inherit the fd 2018-03-02 11:09:51 +11:00
Daniel Black
9629bca1f0 MDEV-8743: use O_CLOEXEC (innodb/xtradb) 2018-03-02 10:54:00 +11:00
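A small Python sketch of what O_CLOEXEC buys: the close-on-exec flag is set atomically inside open(), so there is no window in which another thread can fork and exec a script (such as a Galera SST script) that inherits the descriptor. Note that Python's os.open() already makes descriptors non-inheritable by default (PEP 446), so the explicit flag is redundant here; it is spelled out only to mirror open(2) in C. The helper name open_cloexec is hypothetical.

```python
import fcntl
import os
import tempfile

def open_cloexec(path, flags=os.O_RDWR | os.O_CREAT, mode=0o660):
    # O_CLOEXEC marks the descriptor close-on-exec inside open() itself.
    # Setting FD_CLOEXEC afterwards with fcntl() would leave a short window
    # where a concurrently forked and exec'ed child inherits the fd.
    return os.open(path, flags | os.O_CLOEXEC, mode)

with tempfile.TemporaryDirectory() as tmpdir:
    fd = open_cloexec(os.path.join(tmpdir, "ib_logfile0"))
    try:
        cloexec_set = bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
    finally:
        os.close(fd)
```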
Vladislav Vaintroub
56e7b7eaed Make it possible to use clang on Windows (clang-cl)
-DWITH_ASAN can be used as well now, on x64

Fix many clang-cl warnings.
2018-02-20 21:17:36 +00:00
Marko Mäkelä
b006d2ead4 Merge bb-10.2-ext into 10.3 2018-02-15 10:22:03 +02:00
Marko Mäkelä
00f0c039d2 MDEV-15270 Mariabackup should not try to use doublewrite buffer
When Mariabackup gets a bad read of the first page of the system
tablespace file, it would inappropriately try to apply the doublewrite
buffer and write changes back to the data file (to the source file)!
This is very wrong and must be prevented.

The correct action would be to retry reading the system tablespace
as well as any other files whose first page was read incorrectly.
Fixing this was not attempted.

xb_load_tablespaces(): Shorten a bogus message to be more relevant.
The message can be displayed by --backup or --prepare.

xtrabackup_backup_func(), os_file_write_func(): Add a missing space
to a message.

Datafile::restore_from_doublewrite(): Do not even attempt the
operation in Mariabackup.

recv_init_crash_recovery_spaces(): Do not attempt to restore the
doublewrite buffer in Mariabackup (--prepare or --export), because
all pages should have been copied correctly in --backup already,
and because --backup should ignore the doublewrite buffer.

SysTablespace::read_lsn_and_check_flags(): Do not attempt to initialize
the doublewrite buffer in Mariabackup.

innodb_make_page_dirty(): Correct the bounds check.

Datafile::read_first_page(): Correct the name of the parameter.
2018-02-12 16:56:01 +02:00
Vladislav Vaintroub
53476abce8 Windows, compiling : use /permissive- switch to improve conformance
Fix a couple of "initialization skipped by goto" and other new errors.
2018-02-07 20:22:30 +00:00
Marko Mäkelä
7cb3520c06 Merge bb-10.2-ext into 10.3 2017-11-30 08:16:37 +02:00
Marko Mäkelä
c19ef508b8 InnoDB: Remove ut_snprintf() and the use of my_snprintf(); use snprintf() 2017-11-13 02:11:48 +02:00
Marko Mäkelä
a48aa0cd56 Merge bb-10.2-ext into 10.3 2017-11-10 16:12:45 +02:00
Marko Mäkelä
51679e5c38 MDEV-14132 InnoDB page corruption
On some old GNU/Linux systems, invoking posix_fallocate() with
offset=0 would sometimes cause already allocated bytes in the
data file to be overwritten.

Invoke posix_fallocate() in a safer way, fixing a correctness regression
that was introduced in commit 420798a81a.
A similar change was made in MDEV-5746 earlier.

os_file_get_size(): Avoid changing the state of the file handle,
by invoking fstat() instead of lseek().

os_file_set_size(): Determine the current size of the file
by os_file_get_size(), and then extend the file from that point
onwards.
2017-11-06 08:53:51 +02:00
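The safer extension described above can be sketched with Python's os wrappers: read the current size with fstat(), then allocate only the range past it, calling posix_fallocate(fd, current, new - current) instead of starting at offset 0. The helper name set_size_from_current is hypothetical; the "skip if already large enough" guard follows the later MDEV-14132 follow-up about non-positive arguments.

```python
import os
import tempfile

def set_size_from_current(fd, new_size):
    """Extend fd to new_size, allocating only beyond the current end of file.

    The current size comes from fstat(), which leaves the file offset alone.
    Bytes already in the file are never part of the fallocate range, and if
    the file is already large enough, no call is made at all (avoiding a
    non-positive length).
    """
    current = os.fstat(fd).st_size
    if new_size > current:
        os.posix_fallocate(fd, current, new_size - current)
    return os.fstat(fd).st_size

with tempfile.TemporaryFile() as f:
    f.write(b"page0")
    f.flush()
    grown = set_size_from_current(f.fileno(), 16384)
    unchanged = set_size_from_current(f.fileno(), 100)  # already larger
    head = os.pread(f.fileno(), 5, 0)  # existing bytes are preserved
```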
Marko Mäkelä
30a8764b92 MDEV-14244 MariaDB fails to run with O_DIRECT
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.

MariaDB used to ignore any possible return value from posix_fallocate()
ever since innodb_use_fallocate was introduced in MDEV-4338. If EINVAL
was returned, the file would not be extended.

Starting with MDEV-11520, MariaDB would treat EINVAL as a hard error.

Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
2017-11-06 08:53:50 +02:00
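A Python sketch of the EINVAL fall-back: try posix_fallocate(), and if it fails with EINVAL, extend the file by writing zero-filled chunks, checking a stop callback so that a shutdown can interrupt the loop. The parameters allocate, chunk and should_stop are hypothetical, added for illustration and testing. Under O_DIRECT the zero writes would also have to be block-aligned (offset, length and, in C, the buffer address); this sketch assumes the sizes are multiples of the block size and the demo does not actually open the file with O_DIRECT.

```python
import errno
import os
import tempfile

def extend_or_write_zeros(fd, new_size, allocate=os.posix_fallocate,
                          chunk=4096, should_stop=lambda: False):
    """Extend fd to new_size; on EINVAL from allocate, write zeros instead.

    Returns False if should_stop() interrupted the fall-back, else True.
    """
    current = os.fstat(fd).st_size
    if new_size <= current:
        return True
    try:
        allocate(fd, current, new_size - current)
        return True
    except OSError as e:
        if e.errno != errno.EINVAL:
            raise  # any other error is still a hard error
    zeros = bytes(chunk)
    offset = current
    while offset < new_size:
        if should_stop():
            return False
        written = os.pwrite(fd, zeros[:min(chunk, new_size - offset)], offset)
        if written <= 0:
            raise OSError(errno.EIO, "no progress while writing zeros")
        offset += written
    return True

def einval_fallocate(fd, offset, length):
    # Simulates glibc's emulation failing on an O_DIRECT file.
    raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))

with tempfile.TemporaryFile() as f:
    f.write(bytes([1]) * 4096)
    f.flush()
    fd = f.fileno()
    completed = extend_or_write_zeros(fd, 3 * 4096, allocate=einval_fallocate)
    size_after = os.fstat(fd).st_size
    tail = os.pread(fd, 4096, 2 * 4096)
    stopped = extend_or_write_zeros(fd, 5 * 4096, allocate=einval_fallocate,
                                    should_stop=lambda: True)
```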
Marko Mäkelä
19733efa7b MDEV-14244 MariaDB 10.2.10 fails to run on Debian Stretch with ext3 and O_DIRECT
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.

MariaDB 10.2 used to handle the EINVAL return value from posix_fallocate()
before commit b731a5bcf2,
which refactored os_file_set_size() to try posix_fallocate().

Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
2017-11-02 16:18:41 +02:00
Alexander Barkov
835cbbcc7b Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3
TODO: enable MDEV-13049 optimization for 10.3
2017-10-30 20:47:39 +04:00
Marko Mäkelä
58e0dcb93d Add a missing space to an error message 2017-10-30 10:06:47 +02:00
Vladislav Vaintroub
97df230aed MDEV-14115 : Do not use lpNumberOfBytesRead/Written params in
ReadFile/WriteFile operations.

InnoDB opens files with FILE_FLAG_OVERLAPPED. lpNumberOfBytesRead/Written
are documented to be potentially inaccurate in this case
(possibly even when asynchronous operations complete synchronously).

The fix is to always pass NULL for the corresponding parameters,
as recommended by MSDN, and to read the actual counts with
GetQueuedCompletionStatus() or GetOverlappedResult().
2017-10-27 23:42:02 +00:00
Marko Mäkelä
067f83969c MDEV-14132 follow-up fix: Make os_file_get_size() thread-safe
os_file_get_size(): Use fstat() instead of calling lseek() 3 times.
In this way, concurrent calls to this function should not interfere
with each other.

Suggested by Vladislav Vaintroub.
2017-10-27 19:33:38 +03:00
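The difference between the old and new size queries can be shown in Python. The three-lseek() pattern temporarily moves the file offset that all threads share; fstat() reads the size without touching it. Both function names are hypothetical.

```python
import os
import tempfile

def get_size_lseek(fd):
    # Old pattern: three lseek() calls. The file offset is shared by every
    # thread using fd, so a concurrent caller can run between these calls
    # and observe, or restore, the wrong position.
    saved = os.lseek(fd, 0, os.SEEK_CUR)
    size = os.lseek(fd, 0, os.SEEK_END)
    os.lseek(fd, saved, os.SEEK_SET)
    return size

def get_size_fstat(fd):
    # New pattern: a single fstat() reads the size without touching the
    # file offset, so concurrent calls cannot interfere with each other.
    return os.fstat(fd).st_size

with tempfile.TemporaryFile() as f:
    f.write(bytes(1234))
    f.flush()
    fd = f.fileno()
    os.lseek(fd, 10, os.SEEK_SET)
    sizes = (get_size_lseek(fd), get_size_fstat(fd))
    offset_after = os.lseek(fd, 0, os.SEEK_CUR)
```

Single-threaded, both return the same size; the point of the change is that only the fstat() version is free of the shared-offset race.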
Marko Mäkelä
5f5ffdc76b MDEV-14132 follow-up fix: Validate the posix_fallocate() argument
os_file_set_size(): Sometimes the file already is large enough.
Avoid calling posix_fallocate() with a non-positive argument.
Also, add a missing space to an error message.
2017-10-27 18:59:22 +03:00
Vladislav Vaintroub
057a6cf768 MDEV-14132 : fix posix_fallocate() calls to workaround some (ancient) Linux bugs
With this patch, the parameters passed to posix_fallocate() will be
the same as they were prior to the refactoring in commit b731a5bcf2.

In particular, the 'offset' parameter for posix_fallocate() is again current_file_size
and 'length' is new_file_size - current_file_size.

This seems to fix crashes on ancient Linux (kernel 2.6).
2017-10-27 11:56:10 +00:00
Vladislav Vaintroub
fa7a1a57d9 Windows : small optimization in os_is_sparse_file_supported()
Use GetFileInformationByHandleEx with FileAttributeTagInfo to query whether
the file is sparse. This saves 1 syscall, as GetFileInformationByHandle()
would additionally query volume info.
2017-10-10 06:19:50 +00:00
Vladislav Vaintroub
ff2d9e125f MDEV-13941 followup.
Try to fix fragmentation (by unsparsing files) for pre-existing
installations.

Unsparse the InnoDB file when it needs to be extended, unless compression
is used. On Win7/2008R2, unsparsing does not work (as documented in MSDN);
therefore, for sparse files on older Windows versions, the file is extended
by writing zeroes at the end of the file.
2017-10-10 06:19:50 +00:00
Vladislav Vaintroub
b731a5bcf2 Innodb : Refactor os_file_set_size() to be compatible with 10.1
The last parameter to this function is now "bool is_sparse", like in 10.1,
rather than the unused/useless "bool is_readonly" merged from MySQL 5.7.

Like in 10.1, this function now supports sparse files and efficient
platform-specific mechanisms for file extension.

os_file_set_size() is now consistently used in all places where
InnoDB files are extended.
2017-10-10 06:19:50 +00:00
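One plausible POSIX mapping of an is_sparse flag, sketched in Python: a sparse extension only moves end-of-file (leaving a hole), while a non-sparse one reserves real blocks. This is an assumption for illustration; the function name set_size is hypothetical, and the commit does not say which platform calls os_file_set_size() uses.

```python
import os
import tempfile

def set_size(fd, new_size, is_sparse):
    """Hypothetical POSIX sketch of extending a file with an is_sparse flag.

    is_sparse=True: only move end-of-file with ftruncate(); the new range
    is a hole with no blocks allocated (what page-compressed tables want).
    is_sparse=False: reserve real blocks with posix_fallocate(), so no hole
    is left at the end of the file.
    """
    current = os.fstat(fd).st_size
    if new_size <= current:
        return
    if is_sparse:
        os.ftruncate(fd, new_size)
    else:
        os.posix_fallocate(fd, current, new_size - current)

MIB = 1024 * 1024
with tempfile.TemporaryFile() as sparse, tempfile.TemporaryFile() as dense:
    set_size(sparse.fileno(), MIB, is_sparse=True)
    set_size(dense.fileno(), MIB, is_sparse=False)
    sparse_st = os.fstat(sparse.fileno())
    dense_st = os.fstat(dense.fileno())
```

Both files report the same logical size; on most file systems the sparse one has far fewer allocated blocks (st_blocks).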
Vladislav Vaintroub
420798a81a Refactor os_file_set_size to extend already existing files.
Change fil_space_extend_must_retry() to use this function.
2017-10-07 08:30:20 +00:00
Marko Mäkelä
2c1067166d Merge bb-10.2-ext into 10.3 2017-10-04 08:24:06 +03:00
Vladislav Vaintroub
96b9c61787 MDEV-13941 Fix high NTFS fragmentation on 10.2
Prior to this patch, creating or even opening any InnoDB file in 10.2
would set the sparse flag on the file. The file extension was done by setting
the end of file, without writing zeros. This technique is fine; however,
due to sparseness, it created a hole at the end of the file, which
led to much higher fragmentation subsequently.

The fix is to use the sparse flag only for compressed tables, where holes
are actually wanted, and not for normal tables.
2017-09-29 17:29:21 +00:00
Vladislav Vaintroub
7354dc6773 MDEV-13384 - misc Windows warnings fixed 2017-09-28 17:20:46 +00:00
Vladislav Vaintroub
eba44874ca MDEV-13844 : Fix Windows warnings. Fix DBUG_PRINT.
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)

- Also fix printf-format warnings

Make the above-mentioned warnings fatal.

- fix pthread_join on Windows to set return value.
2017-09-28 17:20:46 +00:00
Vladislav Vaintroub
1d7bc3b582 Innodb : do not call fflush() in os_get_last_error_low(), if no error
message was written.
2017-09-16 09:45:38 +00:00
Marko Mäkelä
4e1fa7f63d Merge bb-10.2-ext into 10.3 2017-09-01 11:33:45 +03:00
Marko Mäkelä
4386ee8ccc Add ATTRIBUTE_NORETURN and ATTRIBUTE_COLD
ATTRIBUTE_NORETURN is supported on all platforms (MSVS and GCC-like).
It declares that a function will not return; instead, the thread or
the whole process will terminate.

ATTRIBUTE_COLD is supported starting with GCC 4.3. It declares that
a function is supposed to be executed rarely. Rarely used error-handling
functions and functions that emit messages to the error log should be
tagged as such.
2017-08-31 09:30:55 +03:00
Sergei Golubchik
bb8e99fdc3 Merge branch 'bb-10.2-ext' into 10.3 2017-08-26 00:34:43 +02:00
Vladislav Vaintroub
edf77043ba MDEV-12948 : do not spam error log, if DeviceIoControl(IOCTL_STORAGE_QUERY_PROPERTY)
fails with ERROR_INVALID_FUNCTION

This DeviceIoControl failure seems to happen on different machines from time to time,
and there is not much the user can do about it.
Instead of an error, log a single INFO message, so that it does not disturb users
much.
2017-08-17 17:36:39 +00:00
Marko Mäkelä
57fea53615 Merge bb-10.2-ext into 10.3 2017-07-07 12:39:43 +03:00
Alexander Barkov
3b9273d203 Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3 2017-07-05 17:43:32 +04:00
Marko Mäkelä
8c71c6aa8b MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.

The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.

Some help was provided by Vladislav Vaintroub.

Parameters have been cleaned up and aligned with those of MariaDB 10.2.

The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.

By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash safety for --prepare is needed, the doublewrite buffer
can be enabled.

The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.

Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.

We disable any undo log processing in --prepare.

Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in --prepare.

This means that the --export option must be disabled for now.

This also means that the following options are redundant
and have been removed:
	xtrabackup --apply-log-only
	innobackupex --redo-only

In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)

In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.

FIXME: Enable the --export option.

FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.

FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.
2017-07-05 11:43:28 +03:00
Marko Mäkelä
41a6475b49 InnoDB: Use access() instead of open() 2017-07-05 08:02:55 +03:00
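The commit message does not say which call site changed; assuming it is an existence check, the difference can be sketched in Python. Opening a file just to test for it creates a descriptor that must be closed (and raises PermissionError for an unreadable file), while access(F_OK) only asks whether the path exists. Both helper names are hypothetical.

```python
import os
import tempfile

def exists_via_open(path):
    # Opens (and must then close) a descriptor just to learn whether the
    # file exists; raises PermissionError if it exists but is unreadable.
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return False
    os.close(fd)
    return True

def exists_via_access(path):
    # access(F_OK) checks only for existence; no descriptor is created.
    return os.access(path, os.F_OK)

with tempfile.TemporaryDirectory() as tmpdir:
    present = os.path.join(tmpdir, "ibdata1")
    open(present, "wb").close()
    missing = os.path.join(tmpdir, "missing.ibd")
    results = {
        "open": (exists_via_open(present), exists_via_open(missing)),
        "access": (exists_via_access(present), exists_via_access(missing)),
    }
```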
Marko Mäkelä
68b5aeae4e Minor cleanup of InnoDB I/O routines
Change many function parameters from IORequest& to const IORequest&.

Remove an unused definition of ECANCELED.
2017-06-29 22:30:47 +03:00
Marko Mäkelä
bb60a832ed Minor cleanup of InnoDB shutdown
os_thread_active(): Remove.

srv_shutdown_all_bg_threads(): Assert that high-level threads
have already exited. Do not sleep if os_thread_count=0.
2017-06-29 22:20:34 +03:00
Marko Mäkelä
1e3886ae80 Merge bb-10.2-ext into 10.3 2017-06-19 17:28:08 +03:00
Marko Mäkelä
35248fed10 10.2 follow-up to MDEV-13039 innodb_fast_shutdown=0 crash due premature purge shutdown before fts_optimize_shutdown()
srv_start_state_t: Document the flags. Replace SRV_START_STATE_STAT
with SRV_START_STATE_REDO. The srv_bg_undo_sources replaces the
original use of SRV_START_STATE_STAT.

dict_stats_thread_started, buf_dump_thread_started,
buf_flush_page_cleaner_thread_started: Remove (unused).

srv_shutdown_all_bg_threads(): Always wait for the I/O threads
to exit, also in read-only mode.

os_thread_free(): Remove.
2017-06-12 19:07:34 +03:00