Most resource limit information is excessive, particularly
limits that aren't limited.
We restructure the output by considering the Linux format
of /proc/limits which had its soft limits beginning at offset
26. "u"limited lines are skipped.
Example output:
Resource Limits (excludes unlimited resources):
Limit Soft Limit Hard Limit Units
Max stack size 8388608 unlimited bytes
Max processes 127235 127235 processes
Max open files 32198 32198 files
Max locked memory 8388608 8388608 bytes
Max pending signals 127235 127235 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
This is 8 lines less that what was before.
The FreeBSD limits file was /proc/curproc/rlimit and a different format
so a loss of non-Linux proc enabled OSes isn't expected.
Provide bug url in addition to how to report the bug.
Remove obsolete information like key_buffers and used connections
as they haven't meaningfully added value to a bug report for quite
a while. Remove information that comes from long fixed interfaces
in glibc/kernel.
Encourage the use of a full backtrace from the core with debug
symbols.
Lets be realistic about the error messages, its the users we are addressing
not developers so wording around getting the information communicated
is the key aspect.
All the user readable text and instructions are in on place, as
non-understandable is the end of the reading process for the user.
Remove the duplicate printing of the query.
Use my_progname rather than "mysqld" to reflex the program name.
So the signal handler output is now in the form:
1. User instructions
2. Server Information
3. Stacktrace
4. connection/query/optimizer_switch
5. Core information and resource limits
6. Kernel information
When handling fatal signal, shut down Galera networking
before printing out stack trace and writing core file.
This is to achieve fail-silent semantics on crashes which may
keep the process running for a long time, but not fully responding
e.g. due to core dumping or symbol resolving.
Also suppress all Galera/wsrep logging to avoid logging from
background threads to garble crash information from signal handler.
Notice that for fully fail-silent crash, Galera 26.4.19 is needed.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
A user reported that MariaDB server got a signal 6 but still accepted new
connections and did not crash. I have not been able to find a way to
repeat this or find the cause of issue. However to make it easier to
notice that the server is unstable, I added code to disable new
connections when the handle_fatal_signal() handler has been called.
Blaming hardware and poor libraries seems on the rare end of the
scale of things that go wrong. Accept the blame, apologize, and
kindly request a bug report.
Also url change on stack traces is changed to include mariadbd.
Thanks Vlad for also raising that blaming was wrong.
and my_getwd(). The cause is my_errno define which
depends on my_thread_var being a not null pointer
otherwise it will be de-referenced and cause
a SEGV already in the signal handler.
Replace uses of these functions in the output_core_info
using posix read/getcwd functions instead.
The getwd fallback in my_getcwd isn't needed as
its been obsolute for a very long time.
Thanks Vladislav Vaintroub for diagnosis and posix
recommendation.
MariaDB MDEV-12583 added `SOURCE_REVISION` variable that exposes the
SHA1 of source code commit that the current running engine was built
from. This info is useful for troubleshooting and debugging.
This commit does the following:
- addes the `SOURCE_REVISION` value into engine error log.
- when a crash triggers handle_fatal_signal, the `SOURCE_REVISION` will
be included in crash report.
- resolves MDEV-20344: startup messages belong in stderr/error-log not
stdout
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
Recent adventures in liburing and btrfs have shown up some kernel
version dependent bugs. Having a bug report of accurace kernel version
can start to correlate these errors sooner.
On Linux, /proc/version contains the kernel version.
FreeBSD has kern.version (per man 8 sysctl), so include that too.
Example output:
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
Kernel version: Linux version 5.19.0-0.rc2.21.fc37.x86_64 (mockbuild@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1), GNU ld version 2.38-14.fc37) #1 SMP PREEMPT_DYNAMIC Mon Jun 13 15:27:24 UTC 2022
Segmentation fault (core dumped)
This is a side-effect of my_large_malloc() introduction,MDEV-18851
It removed a cast to size_t to variable 'blocks' in
multiplication blocks * keycache->key_cache_block_size , creating ulong value
instead of correct size_t.
Replaced a couple of ulongs with appropriate data type, which is size_t.
Also, fixed casts to ulongs in crash handler messages, so that people would
not be confused by that, too.
Interestingly, aria did not expose the same problem even if it contains
copied and pasted code in ma_pagecache, because Aria had some ulongs removed
when fixing a similar problem in MDEV-9256.
because the name was misleading, it counts not threads, but THDs,
and as THD_count is the only way to increment/decrement it, it
could as well be declared inside THD_count.
This fixed the MySQL bug# 20338 about misuse of double underscore
prefix __WIN__, which was old MySQL's idea of identifying Windows
Replace it by _WIN32 standard symbol for targeting Windows OS
(both 32 and 64 bit)
Not that connect storage engine is not fixed in this patch (must be
fixed in "upstream" branch)
If two threads would call sf_free() / free_memory() at the same time,
bad_ptr() would not detect this. Fixed by adding extra detection
when working with the memory region under sf_mutex.
Other things:
- If safe_malloc crashes while mutex is hold, stack trace printing will
hang because we malloc is called by my_open(), which is used by stack
trace printing code. Fixed by adding MY_NO_REGISTER flag to my_open,
which will disable the malloc() call to remmeber the file name.
When my_vsnprintf() is patched, the code protected disabled with
'WAITING_FOR_BUGFIX_TO_VSPRINTF' should be enabled again. Also all %b
formats in this patch should be revert to %s again