Commit graph

457 commits

Author SHA1 Message Date
Marko Mäkelä
7de38492fc After-merge fix: cmake -DPLUGIN_PERFSCHEMA=NO
An #include was forgotten in b6ac67389d
2019-07-25 13:34:31 +03:00
Marko Mäkelä
b6ac67389d Merge 10.1 into 10.2 2019-07-25 12:14:27 +03:00
Marko Mäkelä
0c7c61019d Remove the wrappers ut_time(), ut_difftime(), ib_time_t 2019-07-24 21:59:26 +03:00
Marko Mäkelä
10ee1b95b8 Remove ut_usectime(), ut_gettimeofday()
Replace ut_usectime() with my_interval_timer(),
which is equivalent, but monotonically counting nanoseconds
instead of counting the microseconds of real time.

os_event_wait_time_low(): Use my_hrtime() instead of ut_usectime().

FIXME: Set a clock attribute on the condition variable that allows
a monotonic clock to be chosen as the time base, so that the wait
is immune to adjustments of the system clock.
2019-07-24 21:59:26 +03:00
Marko Mäkelä
cbac8f9351 MDEV-19725 Incorrect error handling in ALTER TABLE
Some I/O functions and macros that are declared in os0file.h used to
return a Boolean status code (nonzero on success). In MySQL 5.7, they
were changed to return dberr_t instead. Alas, in MariaDB Server 10.2,
some uses of functions were not adjusted to the changed return value.

Until MDEV-19231, the valid values of dberr_t were always nonzero.
This means that some code that was incorrectly checking for a zero
return value from the functions would never detect a failure.

After MDEV-19231, some tests for ALTER ONLINE TABLE would fail with
cmake -DPLUGIN_PERFSCHEMA=NO. It turned out that the wrappers
pfs_os_file_read_no_error_handling_int_fd_func() and
pfs_os_file_write_int_fd_func() were wrongly returning
bool instead of dberr_t. Also the callers of these functions were
wrongly expecting bool (nonzero on success) instead of dberr_t.

This mistake had been made when the addition of these functions was
merged from MySQL 5.6.36 and 5.7.18 into MariaDB Server 10.2.7.

This fix also reverts commit 40becbc3c7
which attempted to work around the problem.
2019-06-10 18:15:25 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
c0ac0b8860 Update FSF address 2019-05-11 19:25:02 +03:00
Sachin Agarwal
06ec56f579 Bug #27850600 INNODB ASYNC IO ERROR HANDLING IN IO_EVENT
Problem:
io_getevents() - read asynchronous I/O events from the completion
queue. For each IO event, the res field in io_event tells whether IO
event is succeeded or not. To see if the IO actually succeeded we
always need to check event.res (negative=error,
positive=bytesread/written).
LinuxAIOHandler::collect() doesn't check event.res value for each event.
which leads to incorrect value in n_bytes for IO context (or IO Slot).

Fix:
Added a check for event.res negative value.

RB: 20871
Reviewed by : annamalai.gurusami@oracle.com
2019-04-26 17:40:20 +03:00
Marko Mäkelä
e7f426d2c9 MDEV-19212: Replace macros with type-safe inline functions
The regression that was reported in MDEV-19212 occurred due to use
of macros that did not ensure that the arguments have compatible
types.

ut_2pow_remainder(), ut_2pow_round(), ut_calc_align(): Define as
inline function templates.

UT_CALC_ALIGN(): Define as a macro, because this is used in
compile_time_assert(). Only starting with C++11 (MariaDB 10.4)
we could define the inline functions as constexpr.
2019-04-08 21:33:49 +03:00
Marko Mäkelä
f120a15b93 MDEV-19212 4GB Limit on large_pages - integer overflow
os_mem_alloc_large(): Invoke the macro ut_2pow_round() with the
correct argument type.

innobase_large_page_size, innobase_use_large_pages,
os_use_large_pages, os_large_page_size: Remove.
Simply refer to opt_large_page_size, my_use_large_pages.
2019-04-08 21:33:49 +03:00
Sergei Golubchik
f97d879bf8 cmake: re-enable -Werror in the maintainer mode
now we can afford it. Fix -Werror errors. Note:
* old gcc is bad at detecting uninit variables, disable it.
* time_t is int or long, cast it for printf's
2019-03-27 22:51:37 +01:00
Marko Mäkelä
00572a0b0c MDEV-17482 InnoDB fails to say which fatal error fsync() returned
os_file_fsync_posix(): If fsync() returns a fatal error,
do include errno in the error message.

In the future, we might handle fsync() or write or allocation failures
on InnoDB data files a little more gracefully: flag the affected index
or table as corrupted, and deny any subsequent writes to the table.

If a write to the undo log or redo log fails, an alternative to
killing the server could be to deny any writes to InnoDB tables
until the server has been restarted.
2019-03-18 12:32:10 +02:00
Sergei Golubchik
f1134d5676 post-merge: gcc 8 warnings
note: Inherit String from Sql_alloc,
to get operators new and new[] in sync

in rocksdb gcc was complaining that non-lvalue was cast to const.
2019-03-15 21:00:50 +01:00
Marko Mäkelä
2565c02ca5 Remove unnecessary type casts 2019-01-23 14:42:21 +02:00
Marko Mäkelä
8e80fd6bfd Merge 10.1 into 10.2 2019-01-17 11:24:38 +02:00
Marko Mäkelä
71eb762611 Merge 10.0 into 10.1 2019-01-17 06:40:24 +02:00
Eugene Kosov
e0633f25e8 MDEV-18243 incorrect ASAN instrumentation
Poisoning memory after munmap() and friends is totally incorrect as this memory
could be anything.

os_mem_free_large(): remove memory poisoning
2019-01-15 15:32:18 +03:00
Eugene Kosov
0dafcf529c cleanup os_event 2018-12-21 10:16:03 +02:00
Eugene Kosov
ed166f53fa MDEV-18043 data race in os_event
os_event::is_set(): protect os_event::m_set with os_event::mutex
2018-12-21 10:16:03 +02:00
Marko Mäkelä
447e493179 Remove some unnecessary InnoDB #include 2018-11-29 12:53:44 +02:00
Marko Mäkelä
ff88e4bb8a Remove many redundant #include from InnoDB 2018-11-19 11:42:14 +02:00
Marko Mäkelä
055a3334ad MDEV-13564 Mariabackup does not work with TRUNCATE
Implement undo tablespace truncation via normal redo logging.

Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name,
CREATE, and DROP.

Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2
is killed before the DROP operation is committed. If MariaDB Server 10.2
is killed during TRUNCATE, it is also possible that the old table
was renamed to #sql-ib*.ibd but the data dictionary will refer to the
table using the original name.

In MariaDB Server 10.3, RENAME inside InnoDB is transactional,
and #sql-* tables will be dropped on startup. So, this new TRUNCATE
will be fully crash-safe in 10.3.

ha_mroonga::wrapper_truncate(): Pass table options to the underlying
storage engine, now that ha_innobase::truncate() will need them.

rpl_slave_state::truncate_state_table(): Before truncating
mysql.gtid_slave_pos, evict any cached table handles from
the table definition cache, so that there will be no stale
references to the old table after truncating.

== TRUNCATE TABLE ==

WL#6501 in MySQL 5.7 introduced separate log files for implementing
atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB
undo and redo log. Some convoluted logic was added to the InnoDB
crash recovery, and some extra synchronization (including a redo log
checkpoint) was introduced to make this work. This synchronization
has caused performance problems and race conditions, and the extra
log files cannot be copied or applied by external backup programs.

In order to support crash-upgrade from MariaDB 10.2, we will keep
the logic for parsing and applying the extra log files, but we will
no longer generate those files in TRUNCATE TABLE.

A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE
(with full redo and undo logging and proper rollback). This will
be implemented in MDEV-14717.

ha_innobase::truncate(): Invoke RENAME, create(), delete_table().
Because RENAME cannot be fully rolled back before MariaDB 10.3
due to missing undo logging, add some explicit rename-back in
case the operation fails.

ha_innobase::delete(): Introduce a variant that takes sqlcom as
a parameter. In TRUNCATE TABLE, we do not want to touch any
FOREIGN KEY constraints.

ha_innobase::create(): Add the parameters file_per_table, trx.
In TRUNCATE, the new table must be created in the same transaction
that renames the old table.

create_table_info_t::create_table_info_t(): Add the parameters
file_per_table, trx.

row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.

row_drop_table_after_create_fail(): New function, wrapping
row_drop_table_for_mysql().

dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(),
fil_prepare_for_truncate(), fil_reinit_space_header_for_table(),
row_truncate_table_for_mysql(), TruncateLogger,
row_truncate_prepare(), row_truncate_rollback(),
row_truncate_complete(), row_truncate_fts(),
row_truncate_update_system_tables(),
row_truncate_foreign_key_checks(), row_truncate_sanity_checks():
Remove.

row_upd_check_references_constraints(): Remove a check for
TRUNCATE, now that the table is no longer truncated in place.

The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some
race-condition like scenarios. The test innodb-innodb.truncate does
not use any synchronization.

We add a redo log subformat to indicate backup-friendly format.
MariaDB 10.4 will remove support for the old TRUNCATE logging,
so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve
limitations.

== Undo tablespace truncation ==

MySQL 5.7 implements undo tablespace truncation. It is only
possible when innodb_undo_tablespaces is set to at least 2.
The logging is implemented similar to the WL#6501 TRUNCATE,
that is, using separate log files and a redo log checkpoint.

We can simply implement undo tablespace truncation within
a single mini-transaction that reinitializes the undo log
tablespace file. Unfortunately, due to the redo log format
of some operations, currently, the total redo log written by
undo tablespace truncation will be more than the combined size
of the truncated undo tablespace. It should be acceptable
to have a little more than 1 megabyte of log in a single
mini-transaction. This will be fixed in MDEV-17138 in
MariaDB Server 10.4.

recv_sys_t: Add truncated_undo_spaces[] to remember for which undo
tablespaces a MLOG_FILE_CREATE2 record was seen.

namespace undo: Remove some unnecessary declarations.

fil_space_t::is_being_truncated: Document that this flag now
only applies to undo tablespaces. Remove some references.

fil_space_t::is_stopping(): Do not refer to is_being_truncated.
This check is for tablespaces of tables. Potentially used
tablespaces are never truncated any more.

buf_dblwr_process(): Suppress the out-of-bounds warning
for undo tablespaces.

fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero
page number (new size of the tablespace in pages) to inform
crash recovery that the undo tablespace size has been reduced.

fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2
can be written for undo tablespaces (without .ibd file suffix)
for a nonzero page number.

os_file_truncate(): Add the parameter allow_shrink=false
so that undo tablespaces can actually be shrunk using this function.

fil_name_parse(): For undo tablespace truncation,
buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].

recv_read_in_area(): Avoid reading pages for which no redo log
records remain buffered, after recv_addr_trim() removed them.

trx_rseg_header_create(): Add a FIXME comment that we could write
much less redo log.

trx_undo_truncate_tablespace(): Reinitialize the undo tablespace
in a single mini-transaction, which will be flushed to the redo log
before the file size is trimmed.

recv_addr_trim(): Discard any redo logs for pages that were
logged after the new end of a file, before the truncation LSN.
If the rec_list becomes empty, reduce n_addrs. After removing
any affected records, actually truncate the file.

recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before
applying any log records. The undo tablespace files must be open
at this point.

buf_flush_or_remove_pages(), buf_flush_dirty_pages(),
buf_LRU_flush_or_remove_pages(): Add a parameter for specifying
the number of the first page to flush or remove (default 0).

trx_purge_initiate_truncate(): Remove the log checkpoints, the
extra logging, and some unnecessary crash points. Merge the code
from trx_undo_truncate_tablespace(). First, flush all to-be-discarded
pages (beyond the new end of the file), then trim the space->size
to make the page allocation deterministic. At the only remaining
crash injection point, flush the redo log, so that the recovery
can be tested.
2018-09-07 22:10:02 +03:00
Marko Mäkelä
de469a2f29 MDEV-14637: Fix hang due to persistent statistics
Similar to the tables SYS_FOREIGN and SYS_FOREIGN_COLS,
the tables mysql.innodb_table_stats and mysql.innodb_index_stats
are updated by the InnoDB internal SQL parser, which fails to
enforce the size limits of the data. Due to this, it is possible
for InnoDB to hang when there are persistent statistics defined on
partitioned tables where the total length of table name,
partition name and subpartition name exceeds the incorrectly
defined limit VARCHAR(64). That column should have been defined
as VARCHAR(199).

btr_node_ptr_max_size(): Interpret the VARCHAR(64) as VARCHAR(199),
to prevent a hang in the case that the upgrade script has not been
run.

dict_table_schema_check(): Ignore difference in the length of the
table_name column.

ha_innobase::max_supported_key_length(): For innodb_page_size=4k,
return a larger value so that the table mysql.innodb_index_stats
can be created. This could allow "impossible" tables to be created,
such that it is not possible to insert anything into a secondary
index when both the secondary key and the primary key are long,
but this is the easiest and most consistent way. The Oracle fix
would only ignore the maximum length violation for the two
statistics tables.

os_file_get_status_posix(), os_file_get_status_win32(): Handle
ENAMETOOLONG as well.

This patch is based on the following change in MySQL 5.7.23.
Not all changes were applied, and our variant allows persistent
statistics to work without hangs even if the table definitions
were not upgraded.

From fdbdce701ab8145ae234c9d401109dff4e4106cb Mon Sep 17 00:00:00 2001
From: Aditya A <aditya.a@oracle.com>
Date: Thu, 17 May 2018 16:11:43 +0530
Subject: [PATCH] Bug #26390736 THE FIELD TABLE_NAME (VARCHAR(64)) FROM
           MYSQL.INNODB_TABLE_STATS CAN OVERFLOW.

    In mysql.innodb_index_stats and mysql.innodb_table_stats
    tables the table name column didn't take into consideration
    partition names which can be more than varchar(64).
2018-08-03 08:33:38 +03:00
Allen Lai
f70a318576 Bug#27805553 HARD ERROR SHOULD BE REPORTED WHEN FSYNC() RETURN EIO.
fsync() will just return EIO only once when the IO error happens, so, it's
wrong to keep trying to call it till it return success.
When fsync() returns EIO it should be treated as a hard error and InnoDB must
abort immediately.
2018-08-03 08:32:17 +03:00
Vladislav Vaintroub
b71c9ae030 amend fix for MDEV-16596 - do not use CREATE_NEW flag when reopening redo log file.
use OPEN_ALWAYS instead, since we know file already exist.
2018-07-01 14:00:29 +01:00
Vladislav Vaintroub
c612a1e77c MDEV-16596 : Windows - redo log does not work on native 4K sector disks.
Disks with native 4K sectors need 4K alignment and size for  unbuffered IO
(i.e for files opened with FILE_FLAG_NO_BUFFERING)

Innodb opens redo log with FILE_FLAG_NO_BUFFERING, however it always does
512byte IOs. Thus, the IO on 4K native sectors will fail, rendering
Innodb non-functional.

The fix is to check whether OS_FILE_LOG_BLOCK_SIZE is multiple of logical
sector size, and if it is not, reopen the redo log without
FILE_FLAG_NO_BUFFERING flag.
2018-06-30 11:04:51 +01:00
Vladislav Vaintroub
04677f44c7 Innodb : do not use errno on Windows to print os_file_pwrite() error.
Use GetLastError() instead.
2018-06-28 17:23:05 +01:00
Sergei Golubchik
b942aa34c1 Merge branch '10.1' into 10.2 2018-06-21 23:47:39 +02:00
Vicențiu Ciorbaru
6e55236c0a Merge branch '10.0-galera' into 10.1 2018-06-12 19:39:37 +03:00
Vicențiu Ciorbaru
aa59ecec89 Merge branch '10.0' into 10.1 2018-06-12 18:55:27 +03:00
Marko Mäkelä
8f5f0575ab MDEV-16456 InnoDB error "returned OS error 71" complains about wrong path
When attempting to rename a table to a non-existing database,
InnoDB would misleadingly report "OS error 71" when in fact the
error code is InnoDB's own (OS_FILE_NOT_FOUND), and not report
both pathnames. Errors on rename could occur due to reasons
connected to either pathname.

os_file_handle_rename_error(): New function, to report errors in
renaming files.
2018-06-12 10:25:23 +03:00
Marko Mäkelä
0ad9c3a016 MDEV-16456 InnoDB error "returned OS error 71" complains about wrong path
When attempting to rename a table to a non-existing database,
InnoDB would misleadingly report "OS error 71" when in fact the
error code is InnoDB's own (OS_FILE_NOT_FOUND), and not report
both pathnames. Errors on rename could occur due to reasons
connected to either pathname.

os_file_handle_rename_error(): New function, to report errors in
renaming files.
2018-06-12 09:54:31 +03:00
Marko Mäkelä
df42830b28 Merge 10.1 into 10.2 2018-06-06 11:25:33 +03:00
Marko Mäkelä
1d4e1d3263 Merge 10.0 to 10.1 2018-06-06 11:04:17 +03:00
Marko Mäkelä
55abcfa7b7 MDEV-16124 fil_rename_tablespace() times out and crashes server during table-rebuilding ALTER TABLE
InnoDB insisted on closing the file handle before renaming a file.
Renaming a file should never be a problem on POSIX systems. Also on
Windows it should work if the file was opened in FILE_SHARE_DELETE
mode.

fil_space_t::stop_ios: Remove. We no longer need to stop file access
during rename operations.

fil_mutex_enter_and_prepare_for_io(): Remove the wait for stop_ios.

fil_rename_tablespace(): Remove the retry logic; do not close the
file handle. Remove the unused fault injection that was added along
with the DATA DIRECTORY functionality (MySQL WL#5980).

os_file_create_simple_func(), os_file_create_func(),
os_file_create_simple_no_error_handling_func(): Include FILE_SHARE_DELETE
in the share_mode. (We will still prevent multiple InnoDB instances
from using the same files by not setting FILE_SHARE_WRITE.)
2018-06-05 18:16:12 +03:00
Jan Lindström
648cf7176c Merge remote-tracking branch 'origin/5.5-galera' into 10.0-galera 2018-05-07 13:49:14 +03:00
Vladislav Vaintroub
47ea2227e5 fix typo, amend last commit 2018-04-14 23:59:59 +01:00
Vladislav Vaintroub
043a9b4e1b Windows, innodb : reduce noise from os_file_get_block_size()
if volume can't be opened due to permissions, or
IOCTL_STORAGE_QUERY_PROPERTY fails with not implemented, do not report it.
Those errors happen, there is nothing user can do.

This patch amends fix for MDEV-12948.
2018-04-14 23:53:11 +01:00
Vladislav Vaintroub
7b16291c36 MDEV-15707 : deadlock in Innodb IO code, caused by change buffering.
In async IO completion code, after reading a page,Innodb can wait for
completion of other bufferpool reads.
This is for example what happens if change-buffering is active.

Innodb on Windows could deadlock, as it did not have dedicated threads
for processing change buffer asynchronous reads.

The fix for that is to have windows now has the same background threads,
including dedicated thread for ibuf, and log AIOs.

The ibuf/read completions are now dispatched to their threads with
PostQueuedCompletionStatus(), the write and log completions are processed
in thread where they arrive.
2018-04-08 21:32:02 +00:00
Marko Mäkelä
3d7915f000 Merge 10.1 into 10.2 2018-03-21 22:58:52 +02:00
Vicențiu Ciorbaru
82aeb6b596 Merge branch '10.1' into 10.2 2018-03-21 10:36:49 +02:00
Marko Mäkelä
e0a0fe7d81 MDEV-12396 IMPORT TABLESPACE: Do not retry partial reads
fil_iterate(), fil_tablespace_iterate(): Replace os_file_read()
with os_file_read_no_error_handling().

os_file_read_func(), os_file_read_no_error_handling_func():
Do not retry partial reads. There used to be an infinite amount
of retries. Because InnoDB extends both data and log files upfront,
partial reads should be impossible during normal operation.
2018-03-20 15:31:39 +02:00
Vicențiu Ciorbaru
24b353162f Merge branch '10.0-galera' into 10.1 2018-03-19 15:21:01 +02:00
Daniel Black
26e4a48bda MDEV-8743: ib_logfile0 Use O_CLOEXEC so galera SST scripts don't get fd 2018-03-02 11:09:51 +11:00
Daniel Black
9629bca1f0 MDEV-8743: use O_CLOEXEC (innodb/xtradb) 2018-03-02 10:54:00 +11:00
Marko Mäkelä
00f0c039d2 MDEV-15270 Mariabackup should not try to use doublewrite buffer
When Mariabackup gets a bad read of the first page of the system
tablespace file, it would inappropriately try to apply the doublewrite
buffer and write changes back to the data file (to the source file)!
This is very wrong and must be prevented.

The correct action would be to retry reading the system tablespace
as well as any other files whose first page was read incorrectly.
Fixing this was not attempted.

xb_load_tablespaces(): Shorten a bogus message to be more relevant.
The message can be displayed by --backup or --prepare.

xtrabackup_backup_func(), os_file_write_func(): Add a missing space
to a message.

Datafile::restore_from_doublewrite(): Do not even attempt the
operation in Mariabackup.

recv_init_crash_recovery_spaces(): Do not attempt to restore the
doublewrite buffer in Mariabackup (--prepare or --export), because
all pages should have been copied correctly in --backup already,
and because --backup should ignore the doublewrite buffer.

SysTablespace::read_lsn_and_check_flags(): Do not attempt to initialize
the doublewrite buffer in Mariabackup.

innodb_make_page_dirty(): Correct the bounds check.

Datafile::read_first_page(): Correct the name of the parameter.
2018-02-12 16:56:01 +02:00
Marko Mäkelä
c19ef508b8 InnoDB: Remove ut_snprintf() and the use of my_snprintf(); use snprintf() 2017-11-13 02:11:48 +02:00
Marko Mäkelä
51679e5c38 MDEV-14132 InnoDB page corruption
On some old GNU/Linux systems, invoking posix_fallocate() with
offset=0 would sometimes cause already allocated bytes in the
data file to be overwritten.

Fix a correctness regression that was introduced in
commit 420798a81a
by invoking posix_fallocate() in a safer way.
A similar change was made in MDEV-5746 earlier.

os_file_get_size(): Avoid changing the state of the file handle,
by invoking fstat() instead of lseek().

os_file_set_size(): Determine the current size of the file
by os_file_get_size(), and then extend the file from that point
onwards.
2017-11-06 08:53:51 +02:00
Marko Mäkelä
30a8764b92 MDEV-14244 MariaDB fails to run with O_DIRECT
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.

MariaDB used to ignore any possible return value from posix_fallocate()
ever since innodb_use_fallocate was introduced in MDEV-4338. If EINVAL
was returned, the file would not be extended.

Starting with MDEV-11520, MariaDB would treat EINVAL as a hard error.

Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
2017-11-06 08:53:50 +02:00
Marko Mäkelä
19733efa7b MDEV-14244 MariaDB 10.2.10 fails to run on Debian Stretch with ext3 and O_DIRECT
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.

MariaDB 10.2 used to handle the EINVAL return value from posix_fallocate()
before commit b731a5bcf2
which refactored os_file_set_size() to try posix_fallocate().

Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
2017-11-02 16:18:41 +02:00