When attempting to rename a table to a non-existing database,
InnoDB would misleadingly report "OS error 71" when in fact the
error code is InnoDB's own (OS_FILE_NOT_FOUND), and not report
both pathnames. Errors on rename could occur due to reasons
connected to either pathname.
os_file_handle_rename_error(): New function, to report errors in
renaming files.
InnoDB insisted on closing the file handle before renaming a file.
Renaming a file should never be a problem on POSIX systems. Also on
Windows it should work if the file was opened in FILE_SHARE_DELETE
mode.
fil_space_t::stop_ios: Remove. We no longer need to stop file access
during rename operations.
fil_mutex_enter_and_prepare_for_io(): Remove the wait for stop_ios.
fil_rename_tablespace(): Remove the retry logic; do not close the
file handle. Remove the unused fault injection that was added along
with the DATA DIRECTORY functionality (MySQL WL#5980).
os_file_create_simple_func(), os_file_create_func(),
os_file_create_simple_no_error_handling_func(): Include FILE_SHARE_DELETE
in the share_mode. (We will still prevent multiple InnoDB instances
from using the same files by not setting FILE_SHARE_WRITE.)
if volume can't be opened due to permissions, or
IOCTL_STORAGE_QUERY_PROPERTY fails with not implemented, do not report it.
Those errors happen, there is nothing user can do.
This patch amends fix for MDEV-12948.
In async IO completion code, after reading a page,Innodb can wait for
completion of other bufferpool reads.
This is for example what happens if change-buffering is active.
Innodb on Windows could deadlock, as it did not have dedicated threads
for processing change buffer asynchronous reads.
The fix for that is to have windows now has the same background threads,
including dedicated thread for ibuf, and log AIOs.
The ibuf/read completions are now dispatched to their threads with
PostQueuedCompletionStatus(), the write and log completions are processed
in thread where they arrive.
fil_iterate(), fil_tablespace_iterate(): Replace os_file_read()
with os_file_read_no_error_handling().
os_file_read_func(), os_file_read_no_error_handling_func():
Do not retry partial reads. There used to be an infinite amount
of retries. Because InnoDB extends both data and log files upfront,
partial reads should be impossible during normal operation.
When Mariabackup gets a bad read of the first page of the system
tablespace file, it would inappropriately try to apply the doublewrite
buffer and write changes back to the data file (to the source file)!
This is very wrong and must be prevented.
The correct action would be to retry reading the system tablespace
as well as any other files whose first page was read incorrectly.
Fixing this was not attempted.
xb_load_tablespaces(): Shorten a bogus message to be more relevant.
The message can be displayed by --backup or --prepare.
xtrabackup_backup_func(), os_file_write_func(): Add a missing space
to a message.
Datafile::restore_from_doublewrite(): Do not even attempt the
operation in Mariabackup.
recv_init_crash_recovery_spaces(): Do not attempt to restore the
doublewrite buffer in Mariabackup (--prepare or --export), because
all pages should have been copied correctly in --backup already,
and because --backup should ignore the doublewrite buffer.
SysTablespace::read_lsn_and_check_flags(): Do not attempt to initialize
the doublewrite buffer in Mariabackup.
innodb_make_page_dirty(): Correct the bounds check.
Datafile::read_first_page(): Correct the name of the parameter.
On some old GNU/Linux systems, invoking posix_fallocate() with
offset=0 would sometimes cause already allocated bytes in the
data file to be overwritten.
Fix a correctness regression that was introduced in
commit 420798a81a
by invoking posix_fallocate() in a safer way.
A similar change was made in MDEV-5746 earlier.
os_file_get_size(): Avoid changing the state of the file handle,
by invoking fstat() instead of lseek().
os_file_set_size(): Determine the current size of the file
by os_file_get_size(), and then extend the file from that point
onwards.
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.
MariaDB used to ignore any possible return value from posix_fallocate()
ever since innodb_use_fallocate was introduced in MDEV-4338. If EINVAL
was returned, the file would not be extended.
Starting with MDEV-11520, MariaDB would treat EINVAL as a hard error.
Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.
MariaDB 10.2 used to handle the EINVAL return value from posix_fallocate()
before commit b731a5bcf2
which refactored os_file_set_size() to try posix_fallocate().
Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
ReadFile/WriteFile operations.
Innodb opens files with FILE_FLAG_OVERLAPPED. lpNumberOfBytesRead/Written
are documented to be potentially inaccurate in this case,
(possibly even if async operations complete synchronously?)
The fix is to always pass NULL for the correspondng parameters,
as recommended by MSDN. Read the actual counts with
GetQueuedCompletionStatus() or GetOverlappedResult().
os_file_get_size(): Use fstat() instead of calling lseek() 3 times.
In this way, concurrent calls to this function should not interfere
with each other.
Suggested by Vladislav Vaintroub.
os_file_set_size(): Sometimes the file already is large enough.
Avoid calling posix_fallocate() with a non-positive argument.
Also, add a missing space to an error message.
With this patch, parameters passed to posix_fallocate() will be
the same as they were prior to refactoring in commit b731a5bcf2
In particular, 'offset' parameter for posix_fallocate is again current_file_size
and 'length' is new_file_size - current_file_size.
This seems to fix crashes on ancient Linux (kernel 2.6).
Use GetFileInformationByHandleEx with FileAttributeTagInfo to query whether
the file is sparse. This saves 1 syscall, as GetFileInformationByHandle()
would additionally query volume info.
Try to fix fragmentation (unsparse files), for pre-existing
installations.
Unsparse the innodb file, when it needs to be extended, unless compression
is used. For Win7/2008R2 unsparse does not work (as documented in MSDN),
therefore for sparse files in older Windows, file extension will be done
via writing zeroes at the end of file.
The last parameter to this function is now,"bool is_sparse", like in 10.1
rather than the unused/useless "bool is_readonly", merged from MySQL 5.7
Like in 10.1, this function now supports sparse files, and efficient
platform specific mechanisms for file extension
os_file_set_size() is now consistenly used in all places where
innodb files are extended.
Prior to this patch, creating or even opening any innodb file in 10.2
would set a sparse flag on file. The file extension was done by setting
end of file, without writing zeros. This technique is fine, however
due to sparsedness, it created a hole at the end of the file, which
lead to much higher fragmentation subsequently.
The fix is only to use sparse flag for compressed tables, where holes
are actually wanted, but not for normal tables.
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)
- Also fix printf-format warnings
Make the above mentioned warnings fatal.
- fix pthread_join on Windows to set return value.
ATTRIBUTE_NORETURN is supported on all platforms (MSVS and GCC-like).
It declares that a function will not return; instead, the thread or
the whole process will terminate.
ATTRIBUTE_COLD is supported starting with GCC 4.3. It declares that
a function is supposed to be executed rarely. Rarely used error-handling
functions and functions that emit messages to the error log should be
tagged such.
fails with ERROR_INVALID_FUNCTION
This DeviceIoControl seems to happen on different boxes from time to time,
and there is not much user can do about it.
Instead of error, log a single INFO message, so it does not disturb users
much.
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.
The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.
Some help was provided by Vladislav Vaintroub.
Parameters have been cleaned up and aligned with those of MariaDB 10.2.
The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.
By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash-safety for --prepare is needed, double buffering
can be enabled.
The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.
Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.
We disable any undo log processing in --prepare.
Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in prepare.
This means that the --export option must be disabled for now.
This also means that the following options are redundant
and have been removed:
xtrabackup --apply-log-only
innobackupex --redo-only
In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)
In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.
FIXME: Enable the --export option.
FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.
FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.
srv_start_state_t: Document the flags. Replace SRV_START_STATE_STAT
with SRV_START_STATE_REDO. The srv_bg_undo_sources replaces the
original use of SRV_START_STATE_STAT.
dict_stats_thread_started, buf_dump_thread_started,
buf_flush_page_cleaner_thread_started: Remove (unused).
srv_shutdown_all_bg_threads(): Always wait for the I/O threads
to exit, also in read-only mode.
os_thread_free(): Remove.
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.
Bug #23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does whole table traversal for range query
even though the end range is passed. Whole table traversal happens
when the record is not with in transaction read view.
Solution:
=========
Convert the innodb last record of page to mysql format and compare
with end range if the traversal of row_search_mvcc() exceeds 100,
no ICP involved. If it is out of range then InnoDB can avoid the
whole table traversal. Need to refactor the code little bit to
make it compile.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
Use uint32_t for the encryption key_id.
When filling unsigned integer values into INFORMATION_SCHEMA tables,
use the method Field::store(longlong, bool unsigned)
instead of using Field::store(double).
Fix also some miscellanous type mismatch related to ulint (size_t).
Description
===========
Under heavy load, the aysnchronous Windows file IO API can return a
failure code that is handled in MySQL Server by retrying the file IO operation.
A cast necessary for the correct operation of the retry path in a 64 bit
build is missing, leading to the file IO retry result being misinterpreted
and ultimately the report of the OS error number 995
(ERROR_OPERATION_ABORTED) in the MySQL error log.
Fix
===
Supply the missing cast.
Reviewed-by: Sunny Bains <sunny.bains@oracle.com>
RB: 14109
Avoid detaching and exiting from threads that may finish before the
caller has returned from pthread_create(). Only exit from such threads,
without detach and join with them later.
Patch submitted by: Laurynas Biveinis <laurynas.biveinis@gmail.com>
RB: 13983
Reviewed by: Sunny Bains <sunny.bains@oracle.com>
Alias the InnoDB ulint and lint data types to size_t and ssize_t,
which are the standard names for the machine-word-width data types.
Correspondingly, define ULINTPF as "%zu" and introduce ULINTPFx as "%zx".
In this way, better compiler warnings for type mismatch are possible.
Furthermore, use PRIu64 for that 64-bit format, and define
the feature macro __STDC_FORMAT_MACROS to enable it on Red Hat systems.
Fix some errors in error messages, and replace some error messages
with assertions.
Most notably, an IMPORT TABLESPACE error message in InnoDB was
displaying the number of columns instead of the mismatching flags.
Define UNIV_WORD_SIZE as a simple alias to SIZEOF_SIZE_T.
In MariaDB 10.0 and 10.1, it was incorrectly defined as 4 on
64-bit Windows.
MONITOR_OS_PENDING_READS, MONITOR_OS_PENDING_WRITES: Enable by default.
os_n_pending_reads, os_n_pending_writes: Remove.
Use the monitor counters instead.