The function posix_fallocate() as well as the Linux system call
fallocate() can return EINTR when the operation was interrupted
by a signal. In that case, keep retrying the operation, except
if InnoDB shutdown has been initiated.
Rotating binary/relay logs can cause interuption to the processing
on the server. Binary and relay logs have their own mechanism already
for not getting out of control (expire_logs_days).
By no longer rotating binary and relay logs log rotation is limited to
the following logs:
* error log
* general log
* slow query log
Writing these to the binary log would cause any logrotation on the
slave to occur twice, once due to this and another due to the log-
rotate script on the slave. Now --local is passed to mysqladmin to
prevent this duplication.
Due to the way Advance Toolchain before 10.0-3 and 8.0-8 is
packaged, shared libraries installed after the library cache
advance-toolchain-atX.Y-runtime package aren't updated in
/opt/atX.Y/etc/ld.so.cache. This results in mysqld,
configured with RUNPATH set to /opt/atX.Y/lib64, resulting
in the Advance Toolchain loader being used and if libraries
such as jemalloc, libssl or any other library that mysqld uses
is installed after Advance Toolchain, these libraries aren't in
the cache and therefore won't be found in the RPM postinstall
when mysqld is executed by mysql_install_db.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
This is an addition to original fix. Buildbot revealed another sporadic failure
in perfschema.threads_mysql test. Tests relies on data stored in
performance_schema.threads, while performing waits on
information_schema.processlist. These tables are not updated synchronously.
Fixed by performing waits on performance_schema.threads instead.
Applied lost in a merge revision 7f38a07:
MDEV-10043 - main.events_restart fails sporadically in buildbot (crashes upon
shutdown)
There was race condition between shutdown thread and event worker threads.
Shutdown thread waits for thread_count to become 0 in close_connections(). It
may happen so that event worker thread was started but didn't increment
thread_count by this time. In this case shutdown thread may miss wait for this
working thread and continue deinitialization. Worker thread in turn may continue
execution and crash on deinitialized data.
Fixed by incrementing thread_count before thread is actually created like it is
done for connection threads.
Also let event scheduler not to inc/dec running threads counter for symmetry
with other "service" threads.
On FreeBSD liblz4 is installed in /usr/local/lib.
Groonga uses pkg_check_modules to check for liblz4 (that is, pkg-config),
and then it used to set for libgroonga.a
link_directories({$LIBLZ4_LIBRARY_DIRS})
target_link_libraries(... ${LIBLZ4_LIBRARIES})
Now groonga is a static library, linked into ha_mroonga.so. CMake won't
link dynamic liblz4.so into libgroonga.a, instead it'll pass it as a
dependency and will link it into ha_mroonga.so. Fine so far. But it will
not pass link_directories from the static library as a dependency,
so ha_mroonga.so won't find liblz4.so
As suggested on cmake mailing list (e.g.
here: http://public.kitware.com/pipermail/cmake/2011-November/047468.html)
we switch to use the full path to liblz4.so, instead of the -l/-L pair.
The problem was that waiting for pause_for_ftwrl was done before
event_group was completed. This caused rpl_pause_for_ftwrl() to wait
forever during FLUSH TABLES WITH READ LOCK.
Now we only wait for FLUSH TABLES WITH READ LOCK when we are changing
to a new event group.
Protection added to reopen_file() and new_file_impl().
Without this we could get an assert in fn_format() as name == 0,
because the file was closed and name reset, atthe same time
new_file_impl() was called.
is starting.
This is needed as if we kill the START SLAVE thread too early during
shutdown then the IO_THREAD or SQL_THREAD will not have time to properly
initlize it's replication or THD structures and clean_up() will try to
delete master_info structures that are still in use.
The reason for this is that stop slave takes LOCK_active_mi over the
whole operation while some slave operations will also need LOCK_active_mi
which causes deadlocks.
Fixed by introducing object counting for Master_info and not taking
LOCK_active_mi over stop slave or even stop_all_slaves()
Another benefit of this approach is that it allows:
- Multiple threads can run SHOW SLAVE STATUS at the same time
- START/STOP/RESET/SLAVE STATUS on a slave will not block other slaves
- Simpler interface for handling get_master_info()
- Added some missing unlock of 'log_lock' in error condtions
- Moved rpl_parallel_inactivate_pool(&global_rpl_thread_pool) to end
of stop_slave() to not have to use LOCK_active_mi inside
terminate_slave_threads()
- Changed argument for remove_master_info() to Master_info, as we always
have this available
- Fixed core dump when doing FLUSH TABLES WITH READ LOCK and parallel
replication. Problem was that waiting for pause_for_ftwrl was not done
when deleting rpt->current_owner after a force_abort.
- Removed not used variables
- Added __attribute__()
- Added static to some local functions
(gcc 5.4 gives a warning for external functions without an external definition)
This removes functionality of where ./mtr --mem /tmp/dir could be a directory.
Now MTR_MEM=/tmp/dir ./mtr is needed.
The case where MTR_MEM=/tmp/dir ./mtr --mem has the equivalent effect.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
--mem works better as a pure flag, because it can be followed by command-line arguments (test names). If the option is allowed to have a value, the test name which directly follows it will be treated as the option value instead. It is possible to implement workarounds to avoid this, but they would not be completely reliable, and there is no practical purpose of such extension of functionality to justify them.
(with blocks are still reachable).
There was a similar suppression already, but it had an extra line
comparing to failures which we are getting, so it wasn't applied.
Added another variant of the suppression.
If page_compression (introduced in MariaDB Server 10.1) is enabled,
the logical action is to not preallocate space to the data files,
but to only logically extend the files with zeroes.
fil_create_new_single_table_tablespace(): Create smaller files for
ROW_FORMAT=COMPRESSED tables, but adhere to the minimum file size of
4*innodb_page_size.
fil_space_extend_must_retry(), os_file_set_size(): On Windows,
use SetFileInformationByHandle() and FILE_END_OF_FILE_INFO,
which depends on bumping _WIN32_WINNT to 0x0600.
FIXME: The files are not yet set up as sparse, so
this will currently end up physically extending (preallocating)
the files, wasting storage for unused pages.
os_file_set_size(): Add the parameter "bool sparse=false" to declare
that the file is to be extended logically, instead of being preallocated.
The only caller with sparse=true is
fil_create_new_single_table_tablespace().
(The system tablespace cannot be created with page_compression.)
fil_space_extend_must_retry(), os_file_set_size(): Outside Windows,
use ftruncate() to extend files that are supposed to be sparse.
On systems where ftruncate() is limited to files less than 4GiB
(if there are any), fil_space_extend_must_retry() retains the
old logic of physically extending the file.
fil_extend_space_to_desired_size(): Use a proper type cast when
computing start_offset for the posix_fallocate() call on 32-bit systems
(where sizeof(ulint) < sizeof(os_offset_t)). This could affect 32-bit
systems when extending files that are at least 4 MiB long.
This bug existed in MariaDB 10.0 before MDEV-11520. In MariaDB 10.1
it had been fixed in MDEV-11556.
a large memory buffer on Windows
fil_extend_space_to_desired_size(), os_file_set_size(): Use calloc()
for memory allocation, and handle failures. Properly check the return
status of posix_fallocate(), and pass the correct arguments to
posix_fallocate().
On Windows, instead of extending the file by at most 1 megabyte at a time,
write a zero-filled page at the end of the file.
According to the Microsoft blog post
https://blogs.msdn.microsoft.com/oldnewthing/20110922-00/?p=9573
this will physically extend the file by writing zero bytes.
(InnoDB never uses DeviceIoControl() to set the file sparse.)
I tested that the file extension works properly with a multi-file
system tablespace, both with --innodb-use-fallocate and
--skip-innodb-use-fallocate (the default):
./mtr \
--mysqld=--innodb-use-fallocate \
--mysqld=--innodb-autoextend-increment=1 \
--mysqld=--innodb-data-file-path='ibdata1:5M;ibdata2:5M:autoextend' \
--parallel=auto --force --retry=0 --suite=innodb &
ls -lsh mysql-test/var/*/mysqld.1/data/ibdata2
(several samples while running the test)
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
The failure happens due to a race condition between processing
a row event (INSERT) and an automatically generated event
DROP TEMPORARY TABLE. Even though DROP has a higher GTID, it can
become visible in @@gtid_slave_pos before the row event with
a lower GTID has been committed. Since the test makes the slave
to synchronize with the master using GTID, the waiting stops
as soon as GTID of the DROP TEMPORARY TABLE becomes visible,
and if changes from the previous event haven't been applied yet,
the error occurs.
According to Kristian (see the comment to MDEV-10631), the real
problem is that DROP TEMPORARY TABLE is logged in the row mode
at all. For this particular test, since DROP does not do anything,
nothing prevents it from competing with the prior transaction.
The workaround for the test is to add a meaningful event
after DROP TEMPORARY TABLE, so that the slave would wait on its
GTID instead of the one from DROP.
Additionally (unrelated to this problem) removed FLUSH TABLES,
which, as the comment stated, should have been removed after
MDEV-6403 was fixed.
The standalone warning is not a sign of a problem, just of slowness,
so it should be added to global suppressions. If a real problem
happens, there will be other errors
* Revert "Make --mem a pure flag. If there is need to specifically set the location"
This reverts commit 716621db3f.
* MDEV-11619: mtr: when --mem is pure flag, conflicts with $MTR_MEM
Conflicts occurs when MTR_MEM=/xxx/yy ./mtr --mem is invoked. Here
the --mem option overrides opt_mem leaving the default path to be chosen.
This change makes when MTR_MEM set, opt_mem, the flag, is also
set. Both the environment and flag can no be set without conflicting.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
* MDEV-11619: if opt_mem is a path include it first
* MDEV-11619: MTR_MEM locations - don't follow symlinks
From Bjorn Munch it seems symlinks can confuse some
tests. Lets just avoid those.
(ref: https://github.com/mysql/mysql-server/pull/116#issuecomment-268479774)
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
fil_space_extend_must_retry(): When innodb_use_fallocate=ON,
initialize pages_added = size - space->size so that posix_fallocate()
will actually attempt to extend the file, instead of keeping the same size.
This is a regression from MDEV-11556 which refactored
the InnoDB data file extension.
Test results are distorted by a small rounding error
during an intermediate stage of calculating the result.
By using the SQL ROUND function we stablise tests.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>