* replace the message away in the test result
* remove "feedback plugin:" prefix, it's a server message, not plugin's
* downgrade to the warning, because
1) it's not a failure, no operation was aborted, server still works
2) it's something actionable, so not a [Note] either
This shows the maximum memory allocations used by the current connection.
The value for @@global.max_memory_used is 0 as we are not collecting this
value as it would cause a notable performance issue registering this for
all threads for every memory allocation
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Added Query_time (total time spent running queries) to status_variables.
Other things:
- Added SHOW_MICROSECOND_STATUS type that shows an ulonglong variable
in microseconds converted to a double (in seconds).
- Changed Busy_time and Cpu_time to use SHOW_MICROSECOND_STATUS, which
simplified the code and avoids some double divisions for each query.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Threads can normally exit without a explicit pthread_exit call.
There seem to date to old glibc bugs, many around 2.2.5.
The semi related bug was https://bugs.mysql.com/bug.php?id=82886.
To improve safety in the signal handlers DBUG_* code was removed.
These where also needed to avoid some MSAN unresolved stack issues.
This is effectively a backport of 2719cc4925.
Backport of 2f5174e556:
MDEV-33075 Resolve server shutdown issues on macOS, Solaris, and FreeBSD.
This commit addresses multiple server shutdown problems observed on macOS,
Solaris, and FreeBSD:
1. Corrected a non-portable assumption where socket shutdown was expected
to wake up poll() with listening sockets in the main thread.
Use more robust self-pipe to wake up poll() by writing to the pipe's write
end.
Signed-off-by: Ivan Prisyazhnyy <john.koepi@gmail.com>
Backport of 2f5174e556:
MDEV-33075 Resolve server shutdown issues on macOS, Solaris, and FreeBSD.
This commit addresses multiple server shutdown problems observed on macOS,
Solaris, and FreeBSD:
2. Fixed a random crash on macOS from pthread_kill(signal_handler)
when the signal_handler was detached and the thread had already exited.
Use more robust `kill(getpid(), SIGTERM)` to wake up the signal handler
thread.
Additionally, the shutdown code underwent light refactoring
for better readability and maintainability:
- Modified `break_connect_loop()` to no longer wait for the main thread,
aligning behavior with Windows (since 10.4).
Backport of 2f5174e556:
MDEV-33075 Resolve server shutdown issues on macOS, Solaris, and FreeBSD.
This commit addresses multiple server shutdown problems observed on macOS,
Solaris, and FreeBSD:
3. Made sure, that signal handler thread always exits once `abort_loop` is
set, and also calls `my_thread_end()` and clears `signal_thread_in_use`
when exiting.
This fixes warning "1 thread did not exit" by `my_global_thread_end()`
seen on FreeBSD/macOS when the process is terminated via signal.
Additionally, the shutdown code underwent light refactoring
for better readability and maintainability:
- Removed dead code related to the unused `USE_ONE_SIGNAL_HAND`
preprocessor constant.
Signed-off-by: Ivan Prisyazhnyy <john.koepi@gmail.com>
Backport of 2f5174e556:
MDEV-33075 Resolve server shutdown issues on macOS, Solaris, and FreeBSD.
Eliminated support for `#ifndef HAVE_POLL` in `handle_connection_sockets`
This code is also dead, since 10.4
Signed-off-by: Ivan Prisyazhnyy <john.koepi@gmail.com>
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangable
and unsafely cast to one another.
This patch consolidates the functions all into the
qsort_cmp2 interface.
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
When these members switched from plain `char*` to `LEX_CSTRING`,
not all usages were converted. Specifically, in this commit are args of
`my_snprintf` derivatives. Because until MDEV-21978,
automated type checks were unavailable for those functions due to
their incompatibility, so these tools didn’t catch them.
create templates
thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)
and the same for thd->calloc(). By the default the type is char,
so old usage of thd->alloc(size) works too.
This task is inspired by the Percona implementation of
slow_query_log_always_write_time.
This task implements the variable log_slow_always_query_time (name
matching other MariaDB variables using the slow query log). The
default value for the variable is 31536000, which makes MariaDB
compatible with older installations.
For queries with execution time longer than log_slow_always_query_time
the variables log_slow_rate_limit and log_slow_min_examined_row_limit
will be ignored and the query will be written to the slow query log
if there is no other limitations (like log_slow_filter etc).
Other things:
- long_query_time internal variable renamed to log_slow_query_time.
- More descriptive information for "log_slow_query_time".
1. Binlog commit by rotate (MDEV-32014) should not be
used with Galera, yet while WSREP binlog emulation
is active, the code path could lead into
binlog_cache_data::write_prepare() in an invalid
state, leading to errors in MTR. To fix, an extra
check is added to ensure the binlog is actually
active before calling write_prepare().
2. If the #binlog_cache_files directory exists on a
mariadbd run without opt_log_bin, the directory
was treated as a table/database, leading to errors.
To fix, on startup, if opt_log_bin is disabled and
#binlog_cache_files exists (in the default log
directory), the directory is deleted (and an
informational message is provided in the error
log)
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
for large transaction
Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(eg. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.
The solution in this patch is to rename the binlog cache file to
a binlog file instead of copy, if the commiting transaction has
large binlog cache. Rename is a very fast operation, it doesn't
block other transactions a long time.
Design
======
* binlog_large_commit_threshold
type: ulonglong
scope: global
dynamic: yes
default: 128MB
Only the binlog cache temporary files large than 128MB are
renamed to binlog file.
* #binlog_cache_files directory
To support rename, all binlog cache temporary files are managed
as normal files now. `#binlog_cache_files` directory is in the same
directory with binlog files. It is created at server startup if it doesn't
exist. Otherwise, all files in the directory is deleted at startup.
The temporary files are named with ML_ prefix and the memorary address
of the binlog_cache_data object which guarantees it is unique.
* Reserve space
To supprot rename feature, It must reserve enough space at the
begin of the binlog cache file. The space is required for
Format description, Gtid list, checkpoint and Gtid events when
renaming it to a binlog file.
Since binlog_cache_data's cache_log is directly accessed by binlog log,
online alter and wsrep. It is not easy to update all the code. Thus
binlog cache will not reserve space if it is not session binlog cache or
wsrep session is enabled.
- m_file_reserved_bytes
Stores the bytes reserved at the begin of the cache file.
It is initialized in write_prepare() and cleared by reset().
The reserved file header is hide to callers. Thus there is no
change for callers. E.g.
- get_byte_position() still get the length of binlog data
written to the cache, but not the file length.
- truncate(0) will truncate the file to m_file_reserved_bytes but not 0.
- write_prepare()
write_prepare() is called everytime when anything is being written
into the cache. It will call init_file_reserved_bytes() to create
the cache file (if it doesn't exist) and reserve suitable space if
the data written exceeds buffer's size.
* Binlog_commit_by_rotate
It is used to encapsulate the code for remaing a binlog cache
tempoary file to binlog file.
- should_commit_by_rotate()
it is called by write_transaction_to_binlog_events() to check if
a binlog cache should be rename to a binlog file.
- commit()
That is the entry to rename a binlog cache and commit the
transaction. Both rename and commit are protected by LOCK_log,
Thus not other transactions can write anything into the renamed
binlog before it.
Rename happens in a rotation. After the new binlog file is generated,
replace_binlog_file() is called to:
- copy data from the new binlog file to its binlog cache file.
- write gtid event.
- rename the binlog cache file to binlog file.
After that the rotation will continue to succeed. Then the transaction
is committed in a seperated group itself. Its cache file will be
detached and cache log will be reset before calling
trx_group_commit_with_engines(). Thus only Xid event be written.
Set select_thread_in_use only when we're about to enter into the polling
loop, not sooner, allowing early proces aborts to exist cleanly: the
process won't be waiting for a polling loop that isn't yet polling.
One change is that if the port is not supplied or out of bound, the
old behaviour is to print 3306. The new behaviour is to not print
it (if not supplied) or the out of bound value.
Each time a listener socket becomes ready, MariaDB calls accept() ten
times (MAX_ACCEPT_RETRY), even if all but the first one return EAGAIN
because there are no more connections. This causes unnecessary CPU
usage - on our server, the CPU load of that thread, which does nothing
but accept(), saturates one CPU core by ~45%. The loop should stop
after the first EAGAIN.
Perf report:
11.01% mariadbd libc.so.6 [.] accept4
6.42% mariadbd [kernel.kallsyms] [k] finish_task_switch.isra.0
5.50% mariadbd [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
5.50% mariadbd [kernel.kallsyms] [k] syscall_enter_from_user_mode
4.59% mariadbd [kernel.kallsyms] [k] __fget_light
3.67% mariadbd [kernel.kallsyms] [k] kmem_cache_alloc
2.75% mariadbd [kernel.kallsyms] [k] fput
2.75% mariadbd [kernel.kallsyms] [k] mod_objcg_state
1.83% mariadbd [kernel.kallsyms] [k] __inode_wait_for_writeback
1.83% mariadbd [kernel.kallsyms] [k] __sys_accept4
1.83% mariadbd [kernel.kallsyms] [k] _raw_spin_unlock_irq
1.83% mariadbd [kernel.kallsyms] [k] alloc_inode
1.83% mariadbd [kernel.kallsyms] [k] call_rcu
crash recovery
Summary
=======
When doing server recovery, the active transactions will be rolled
back by InnoDB background rollback thread automatically. The
prepared transactions will be committed or rolled back accordingly
by binlog recovery. Binlog recovery is done in main thread before
the server can provide service to users. If there is a big
transaction to rollback, the server will not available for a long
time.
This patch provides a way to rollback the prepared transactions
asynchronously. Thus the rollback will not block server startup.
Design
======
- Handler::recover_rollback_by_xid()
This patch provides a new handler interface to rollback transactions
in recover phase. InnoDB just set the transaction's state to active.
Then the transaction will be rolled back by the background rollback
thread.
- Handler::signal_tc_log_recover_done()
This function is called after tc log is opened(typically binlog opened)
has done. When this function is called, all transactions will be rolled
back have been reverted to ACTIVE state. Thus it starts rollback thread
to rollback the transactions.
- Background rollback thread
With this patch, background rollback thread is defered to run until binlog
recovery is finished. It is started by innobase_tc_log_recovery_done().