track page-access counter
As part of MDEV-21212, n_page_gets that is meant to track page access,
is ported to use distributed counter that default uses atomic sub-counters.
n_page_gets originally was a non-atomic counter that represented an approximate
value of pages tracked. Using the said analogy it doesn't need to be
an atomic distributed counter.
This patch introduces an interface that allows distributed counter to be
used with atomic and non-atomic sub-counter (through template) and also
port n_page_gets to use non-atomic distributed counter using the said
updated interface.
Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed,
we are observing sporadic hangs during the execution of the
RESET MASTER statement. We are hoping to fix the hangs with these
changes, but due to the rather infrequent occurrence of the hangs
and our inability to reliably reproduce the hangs, we cannot be
sure of this.
What we do know is that innodb_force_recovery=2 (or a larger setting)
will prevent srv_master_callback (the former srv_master_thread) from
running. In that mode, periodic log flushes would never occur and
RESET MASTER could hang indefinitely. That is demonstrated by the new
test case that was developed by Andrei Elkin. We fix this case by
implementing a special case for it.
This also includes some code cleanup and renames of misleadingly
named code. The interface has nothing to do with log checkpoints in
the storage engine; it is only about requesting log writes to be
persistent.
handlerton::commit_checkpoint_request,
commit_checkpoint_notify_ha(): Remove the unused parameter hton.
log_requests.start: Replaces pending_checkpoint_list.
log_requests.end: Replaces pending_checkpoint_list_end.
log_requests.mutex: Replaces pending_checkpoint_mutex.
log_flush_notify_and_unlock(), log_flush_notify(): Replaces
innobase_mysql_log_notify(). The new implementation should be
functionally equivalent to the old one.
innodb_log_flush_request(): Replaces innobase_checkpoint_request().
Implement a fast path for common cases, and reduce the mutex hold time.
POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha()
for the current request if it is already satisfied, as well as invoke
log_flush_notify_and_unlock() for any satisfied requests.
log_write(): Invoke log_flush_notify() when the write is already durable.
This was missing WITH_PMEM when the log is in persistent memory.
Reviewed by: Vladislav Vaintroub
we need to stop server instance on upgrade, but it may be started either by SysV init script or by SystemD.
this commit adds `mysql` target to `systemctl stop` call.
`mysql` may be the name of initscript or an alias while `mariadb` is
a systemd unit file.
Systemd has a socket activation feature where a mariadb.socket
definition defines the sockets to listen to, and passes those
file descriptors directly to mariadbd to use when a connection
occurs.
The new functionality is utilized when starting as follows:
systemctl start mariadb.socket
The mariadb.socket definition only needs to contain the network
information, ListenStream= directives, the mariadb.service
definition is still used for service instigation.
When mariadbd is started in this way, the socket, port, bind-address
backlog are all assumed to be self contained in the mariadb.socket
definition and as such the mariadb settings and command line
arguments of these network settings are ignored.
See man systemd.socket for how to limit this to specific ports.
Extra ports, those specified with extra_port in socket activation
mode, are those with a FileDescriptorName=extra. These need
to be in a separate service name like mariadb-extra.socket and
these require a Service={mariadb.service} directive to map to the
original service. Extra ports need systemd v227 or greater
(not RHEL/Centos7 - v219) when FileDescriptorName= was added,
otherwise the extra ports are treated like ordinary ports.
The number of sockets isn't limited when using systemd socket activation
(except by operating system limits on file descriptors and a minimal
amount of memory used per file descriptor). The systemd sockets passed
can include any ownership or permissions, including those the
mariadbd process wouldn't normally have the permission to create.
This implementation is compatible with mariadb.service definitions.
Those services started with:
systemctl start mariadb.service
does actually start the mariadb.service and used all the my.cnf
settings of sockets and ports like it previously did.
cleanups from PR 900:
- Use mariadb names instead of mysql and add secure-installation and
additionally organize man pages.
- Remove obsolete script `/make_binary_distribution`
- Don't build binary `mariadb-install-db` in case of without-server
based on the man-page
```
The replace program is used by msql2mysql. See msql2mysql(1).
```
msql2mysql is labeled as Client component, so should the dependency
Closes PR #900
Under WITHOUT_WSREP:
Exclude support files that are server only like
* wsrep.cnf
* wsrep_notify
* log rotate config files
* mysqld_multi
Exclude man pages of server components
`mallinfo` is deprecated since glibc 2.33 and has been replaced by mallinfo2.
The deprecation causes building the server to fail if glibc version is > 2.33.
Check if mallinfo2 exist on the system and use it instead.
The function row_upd_clust_step() is invoking several static functions,
some of which used to commit the mini-transaction in some cases.
If innobase_get_computed_value() would fail due to some reason,
we would fail to invoke mtr_t::commit() and release buffer pool
page latches. This would likely lead to a hanging server later.
This regression was introduced in
commit 97db6c15ea (MDEV-20618).
row_upd_index_is_referenced(), row_upd_sec_index_entry(),
row_upd_sec_index_entry(): Cleanup: Replace some ibool with bool.
row_upd_clust_rec_by_insert(), row_upd_clust_rec(): Guarantee that
the mini-transaction will always remain in active state.
row_upd_del_mark_clust_rec(): Guarantee that
the mini-transaction will always remain in active state.
This fixes one "leak" of mini-transaction on DB_COMPUTE_VALUE_FAILED.
row_upd_clust_step(): Use only one return path, which will always
invoke mtr.commit(). After a failed row_upd_store_row() call, we
will no longer "leak" the mini-transaction.
This fix was verified by RQG on 10.6 (depending on MDEV-371 that
was introduced in 10.4). Unfortunately, it is challenging to
create a regression test for this, and a test case could soon become
invalid as more bugs in virtual column evaluation are fixed.
A side effect of the MDEV-24589 bug fix is that if
FLUSH TABLE...FOR EXPORT is initiated before the history of an
earlier DROP INDEX operation has been purged, then the data file
will contain allocated pages that belonged to the dropped indexes.
These pages would never be freed after a subsequent IMPORT TABLESPACE.
We will work around this regression by making IMPORT TABLESPACE
tolerate pages that refer to an unknown index.
HAVE_valgrind_or_MSAN to HAVE_valgrind was incorrect in
af784385b4.
In my_valgrind.h when clang exists (hence no __has_feature(memory_sanitizer),
and -DWITH_VALGRIND=1, but without memcheck.h, we end up with a MEM_CHECK_DEFINED
being empty.
If we are also doing a CMAKE_BUILD_TYPE=Debug this results a number of
[-Werror,-Wunused-variable] errors because MEM_CHECK_DEFINED is empty.
With MEM_CHECK_DEFINED empty, there becomes no uses of this of the
fixed field and innodb variables in this patch.
So we stop using HAVE_valgrind as catchall and use the name
HAVE_CHECK_MEM to indicate that a CHECK_MEM_DEFINED function exists.
Reviewer: Monty
Corrects: af784385b4
Binlog group commit could lead to a situation where group commit leader
accesses participant thd's wsrep client state concurrently with the
thread executing the participant thd.
This is because of race condition in
MYSQL_BIN_LOG::write_transaction_to_binlog_events(),
and was fixed by moving wsrep_ordered_commit() to happen in
MYSQL_BIN_LOG::queue_for_group_commit() under protection of
LOCK_prepare_ordered mutex.
splittable derived
If one of joined tables of the processed query is a materialized derived
table (or view or CTE) with GROUP BY clause then under some conditions it
can be subject to split optimization. With this optimization new equalities
are injected into the WHERE condition of the SELECT that specifies this
derived table. The injected equalities are generated for all join orders
with which the split optimization can employed. After the best join order
has been chosen only certain of this equalities are really needed. The
others can be safely removed. If it's not done and some of injected
equalities involve expressions over semi-joins with look-up access then
the query may return a wrong result set.
This patch effectively removes equalities injected for split optimization
that are needed only at the optimization stage and not needed for execution.
Approved by serg@mariadb.com
- use FIND_PACKAGE(LIBAIO) to find libaio
- Use standard CMake conventions in Find{PMEM,URING}.cmake
- Drop the LIB from LIB{PMEM,URING}_{INCLUDE_DIR,LIBRARIES}
It is cleaner, and consistent with how other packages are handled in CMake.
e.g successful FIND_PACKAGE(PMEM) now sets PMEM_FOUND, PMEM_LIBRARIES,
PMEM_INCLUDE_DIR, not LIBPMEM_{FOUND,LIBRARIES,INCLUDE_DIR}.
- Decrease the output. use FIND_PACKAGE with QUIET argument.
- for Linux packages, either liburing, or libaio is required
If liburing is installed, libaio does not need to be present .
Use FIND_PACKAGE([LIBAIO|URING] REQUIRED) if either library is required.
row_upd_clust_step(): Remove the "trigger" on DELETE SYS_INDEXES
that would invoke dict_drop_index_tree(). Let us do it on purge.
row_purge_remove_clust_if_poss_low(): Invoke
dict_drop_index_tree() when purging a delete-marked SYS_INDEXES record.