Replace all io_context* occurrences with io_context_t
Even in release mode die immediately when some io_* functions return
EINVAL. This always means some programming bug and it's better to fail fast.
LinuxAIOHandler::resubmit(): fix condition. Stop ignoring -1 return code which
corresponds to EPERM and io_submit() really can return this one.
Use io_destroy() to stop leaking io_context_t.
Make m_aio_ctx std::vector instead of C array. I think that internal check
for index overflow might be useful.
Add debug assertions for EFAULT because for me receiving it
looks like a programming bug.
Almost all threads have gone
- the "ticking" threads, that sleep a while then do some work)
(srv_monitor_thread, srv_error_monitor_thread, srv_master_thread)
were replaced with timers. Some timers are periodic,
e.g the "master" timer.
- The btr_defragment_thread is also replaced by a timer , which
reschedules it self when current defragment "item" needs throttling
- the buf_resize_thread and buf_dump_threads are substitutes with tasks
Ditto with page cleaner workers.
- purge workers threads are not tasks as well, and purge cleaner
coordinator is a combination of a task and timer.
- All AIO is outsourced to tpool, Innodb just calls thread_pool::submit_io()
and provides the callback.
- The srv_slot_t was removed, and innodb_debug_sync used in purge
is currently not working, and needs reimplementation.
Ignore GetDiskFreeSpace() errors in os_file_get_status_win32
The call is only used to calculate filesystem block size, and this in
turn is only shown in information_schema.sys_tablespaces.FS_BLOCK_SIZE.
There is no other use of this field, it does not affect any Innodb
functionality
Replace ut_usectime() with my_interval_timer(),
which is equivalent, but monotonically counting nanoseconds
instead of counting the microseconds of real time.
os_event_wait_time_low(): Use my_hrtime() instead of ut_usectime().
FIXME: Set a clock attribute on the condition variable that allows
a monotonic clock to be chosen as the time base, so that the wait
is immune to adjustments of the system clock.
According to the code, it was Windows specific "simulated AIO"
workaround. The simulated s not supported on Windows anymore.
Thus, remove the dead code
Many InnoDB internal variables and counters were only exposed
in an unstructured fashion via SHOW ENGINE INNODB STATUS.
Expose more variables via SHOW STATUS. Many of these were
exported in XtraDB.
Also, introduce SHOW_SIZE_T and use the proper size for
exporting the InnoDB variables.
Remove some unnecessary indirection via export_vars, and
bind some variables directly.
dict_sys_t::rough_size(): Replaces dict_sys_get_size()
and includes the hash table sizes.
This is based on a contribution by Tony Liu from ServiceNow.
Some I/O functions and macros that are declared in os0file.h used to
return a Boolean status code (nonzero on success). In MySQL 5.7, they
were changed to return dberr_t instead. Alas, in MariaDB Server 10.2,
some uses of functions were not adjusted to the changed return value.
Until MDEV-19231, the valid values of dberr_t were always nonzero.
This means that some code that was incorrectly checking for a zero
return value from the functions would never detect a failure.
After MDEV-19231, some tests for ALTER ONLINE TABLE would fail with
cmake -DPLUGIN_PERFSCHEMA=NO. It turned out that the wrappers
pfs_os_file_read_no_error_handling_int_fd_func() and
pfs_os_file_write_int_fd_func() were wrongly returning
bool instead of dberr_t. Also the callers of these functions were
wrongly expecting bool (nonzero on success) instead of dberr_t.
This mistake had been made when the addition of these functions was
merged from MySQL 5.6.36 and 5.7.18 into MariaDB Server 10.2.7.
This fix also reverts commit 40becbc3c7
which attempted to work around the problem.
InnoDB duplicates file descriptor returned by create_temp_file() to
workaround further inconsistent use of this descriptor.
Use mysys file descriptors consistently for innobase_mysql_tmpfile(NULL).
Mostly close it by appropriate mysys wrappers.
Windows does atomic writes, as long as they are aligned and multiple
of sector size. this is documented in MSDN.
Fix innodb.doublewrite test to always use doublewrite buffer,
(even if atomic writes are autodetected)
Problem:
io_getevents() - read asynchronous I/O events from the completion
queue. For each IO event, the res field in io_event tells whether IO
event is succeeded or not. To see if the IO actually succeeded we
always need to check event.res (negative=error,
positive=bytesread/written).
LinuxAIOHandler::collect() doesn't check event.res value for each event.
which leads to incorrect value in n_bytes for IO context (or IO Slot).
Fix:
Added a check for event.res negative value.
RB: 20871
Reviewed by : annamalai.gurusami@oracle.com
The regression that was reported in MDEV-19212 occurred due to use
of macros that did not ensure that the arguments have compatible
types.
ut_2pow_remainder(), ut_2pow_round(), ut_calc_align(): Define as
inline function templates.
UT_CALC_ALIGN(): Define as a macro, because this is used in
compile_time_assert(). Only starting with C++11 (MariaDB 10.4)
we could define the inline functions as constexpr.
os_mem_alloc_large(): Invoke the macro ut_2pow_round() with the
correct argument type.
innobase_large_page_size, innobase_use_large_pages,
os_use_large_pages, os_large_page_size: Remove.
Simply refer to opt_large_page_size, my_use_large_pages.
For tablespaces that do not reside on spinning storage, it does
not make sense to attempt to write nearby pages when writing out
dirty pages from the InnoDB buffer pool. It is actually detrimental
to performance and to the life span of flash ROM storage.
With this change, MariaDB will detect whether an InnoDB file resides
on solid-state storage. The detection has been implemented for Linux
and Microsoft Windows. For other systems, we will err on the safe side
and assume that files reside on SSD.
As part of this change, we will reduce the number of fstat() calls
when opening data files on POSIX systems and slightly clean up some
file I/O code.
FIXME: os_is_sparse_file_supported() on POSIX works in a destructive
manner. Thus, we can only invoke it when creating files, not when
opening them.
For diagnostics, we introduce the column ON_SSD to the table
INFORMATION_SCHEMA.INNODB_TABLESPACES_SCRUBBING. The table
INNODB_SYS_TABLESPACES might seem more appropriate, but its purpose
is to reflect the contents of the InnoDB system table SYS_TABLESPACES,
which we would like to remove at some point.
On Microsoft Windows, querying StorageDeviceSeekPenaltyProperty
sometimes returns ERROR_GEN_FAILURE instead of ERROR_INVALID_FUNCTION
or ERROR_NOT_SUPPORTED. We will silently ignore also this error,
and assume that the file does not reside on SSD.
On Linux, the detection will be based on the files
/sys/block/*/queue/rotational and /sys/block/*/dev.
Especially for USB storage, it is possible that
/sys/block/*/queue/rotational will wrongly report 1 instead of 0.
fil_node_t::on_ssd: Whether the InnoDB data file resides on
solid-state storage.
fil_system_t::ssd: Collection of Linux block devices that reside on
non-rotational storage.
fil_system_t::create(): Detect ssd on Linux based on the contents
of /sys/block/*/queue/rotational and /sys/block/*/dev.
fil_system_t::is_ssd(dev_t): Determine if a Linux block device is
non-rotational. Partitions will be identified with the containing
block device by assuming that the least significant 4 bits of the
minor number identify a partition, and that the "partition number"
of the entire device is 0.
now we can afford it. Fix -Werror errors. Note:
* old gcc is bad at detecting uninit variables, disable it.
* time_t is int or long, cast it for printf's
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.
In the tests archive.rnd_pos main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.
os_file_fsync_posix(): If fsync() returns a fatal error,
do include errno in the error message.
In the future, we might handle fsync() or write or allocation failures
on InnoDB data files a little more gracefully: flag the affected index
or table as corrupted, and deny any subsequent writes to the table.
If a write to the undo log or redo log fails, an alternative to
killing the server could be to deny any writes to InnoDB tables
until the server has been restarted.