An unfortunate change to the default handling of core dumps was
implemented in commit e9be5428a2: it made MTR_PRINT_CORE=short the
default value, so that crash reports only display the stack trace
of one thread.
Many if not most failures that occur in regression tests are
sporadic and involve race conditions or deadlocks. To be able to
analyze such failures, having the stack traces of all active threads
is a must, because CI environments typically do not save any core dumps.
While the environment variable MTR_PRINT_CORE could be set in CI
environments to compensate for the unfortunate change, it is better
to revert to the old default (dumping all threads) so that
no explicit action will be required from maintainers of independent
CI systems. In that case, if something fails once in a blue moon,
we can have some hope of diagnosing it based on the output.
We fix this regression by defaulting the unset environment variable
MTR_PRINT_CORE to "medium".
Commit 66832e3a introduced a change that prints core dumps in a very
detailed format. That is not user-friendly at all, but it serves as a
measure for debugging hard-to-reproduce bugs.
The proper way to implement this:
1. it must be controlled by a command-line option and an environment
variable;
2. detailed traces must be the default for buildbots only; for user
invocations, normal stack traces should be printed.
Options for control are: MTR_PRINT_CORE and --print-core that accept
the following values:
  no             Don't print core
  short          Print stack trace of failed thread
  medium         Print stack traces of all threads
  detailed       Print all stack traces with debug context
  custom:<code>  Use debugger commands <code> to print stack trace
Default setting is: short (see env_or_default() call in pre_setup())
For the environment variable, wrong values are silently ignored (the
default setting is used instead, see env_or_default()).
The command-line option --print-core (or -C) overrides the environment
variable. Its default value is 'short' if not specified explicitly
(same env_or_default() call in pre_setup()). Explicit values are
checked for validity.
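
The precedence rules above can be sketched as follows. This is an
illustration in Python, not the Perl code in mysql-test-run.pl; the
function names resolve_print_core() and is_valid() are hypothetical,
and cli_value=None is assumed to mean that --print-core was not given.

```python
import os

VALID_MODES = ("no", "short", "medium", "detailed")
DEFAULT_MODE = "short"  # default described in this commit message


def is_valid(value):
    """Accept a fixed mode name, or custom:<code> with a non-empty <code>."""
    return value in VALID_MODES or (
        value.startswith("custom:") and len(value) > len("custom:")
    )


def resolve_print_core(cli_value=None, env=None):
    """Return the effective mode (hypothetical helper, not mtr's API)."""
    env = os.environ if env is None else env
    if cli_value is not None:
        # Explicit command-line values are validated and win over the environment.
        if not is_valid(cli_value):
            raise ValueError(f"invalid --print-core value: {cli_value!r}")
        return cli_value
    env_value = env.get("MTR_PRINT_CORE", "")
    # Unset or invalid environment values silently fall back to the default.
    return env_value if is_valid(env_value) else DEFAULT_MODE
```

With this sketch, a mistyped MTR_PRINT_CORE never aborts a run, while a
mistyped --print-core is reported immediately.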
The --print-method option specifies which debugger is used to print
cores. On Windows there is only one choice: cdb. On Unix the values
are: gdb, dbx, lldb, auto. The default value is auto.
In 'auto' mode we try all possible debuggers until one succeeds.
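
A minimal sketch of the 'auto' behaviour on Unix, written in Python for
illustration only. The order gdb, dbx, lldb and the helper names
pick_debuggers() and print_core() are assumptions; run is a
hypothetical callback that invokes one debugger and returns True on
success.

```python
import shutil

UNIX_DEBUGGERS = ("gdb", "dbx", "lldb")  # assumed order of attempts


def pick_debuggers(method="auto", which=shutil.which):
    """Return the candidate debuggers, in order, that are found on PATH."""
    candidates = UNIX_DEBUGGERS if method == "auto" else (method,)
    return [name for name in candidates if which(name) is not None]


def print_core(core_path, run, method="auto", which=shutil.which):
    """Try each candidate until one succeeds; return its name, or None."""
    for name in pick_debuggers(method, which):
        if run(name, core_path):
            return name
    return None
```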
bt full - to include args and locals.
set print sevenbit on
- it is more useful to be able to see the exact bytes
(in case something is dumped as a string and not hexadecimal digits)
set print static-members off
- there are many interesting (non-const) static members
set print frame-arguments all
- even non-printables are useful to see.
Let's make our bb logs give a little bit more detail on those
hard to reproduce bugs.
Tested on rhel7's gdb-7.6.1-120.el7
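
For illustration, the settings above could be passed to gdb in batch
mode roughly like this. It is a Python sketch, not the code in
CoreDump.pm; gdb_argv() is a hypothetical helper, and sevenbit-strings
is the full name of the gdb setting abbreviated as "sevenbit" above.

```python
GDB_COMMANDS = (
    "set print sevenbit-strings on",
    "set print static-members off",
    "set print frame-arguments all",
    "bt full",
)


def gdb_argv(exe, core):
    """Build a gdb command line that runs GDB_COMMANDS and then exits."""
    argv = ["gdb", "-batch"]
    for cmd in GDB_COMMANDS:
        argv += ["-ex", cmd]  # each -ex runs one command after the core is loaded
    return argv + [exe, core]
```

The resulting list can be handed to a process runner such as
subprocess.run() on a host where gdb is installed.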
Added the --result-file option, which produces var/mtr-results.txt.
Output has a simple format:
  <tag> : <value>    for general info on test run
  {
  <tag> : <value>
  ....
  }                  for each test
Output from failed tests is included but may be truncated.
See WL for more details.
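
A reader for this format might look like the following Python sketch.
It assumes one "<tag> : <value>" pair per line and braces on lines of
their own; multi-line values such as captured test output are not
handled. parse_results() is a hypothetical name.

```python
def parse_results(text):
    """Split a results file into general info and a list of per-test dicts."""
    general, tests, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "{":
            current = {}  # start of a per-test block
        elif line == "}":
            if current is not None:
                tests.append(current)
            current = None  # back to general info
        elif " : " in line:
            tag, value = line.split(" : ", 1)
            target = general if current is None else current
            target[tag.strip()] = value.strip()
    return general, tests
```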
- Output call stacks from a crash using the cdb debugger, which is part
of "Debugging Tools for Windows". Also output other interesting
information: function parameters, possibly a source code fragment,
and other goodies of the "!analyze" cdb extension.
SIGABRT is sent to relevant processes after a timeout
client/mysqltest.cc:
Fixed signal handlers so that mysqltest actually dumps core
mysql-test/lib/My/CoreDump.pm:
Added support for dbx
mysql-test/lib/My/SafeProcess.pm:
Added dump_core to force process to dump core
mysql-test/lib/My/SafeProcess/safe_process.cc:
Traps SIGABRT and sends this on to child
mysql-test/mysql-test-run.pl:
When test times out, force core dumps on mysqltest and servers
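
The timeout handling can be illustrated with a small Unix-only Python
sketch. It is not the Perl code in SafeProcess.pm; abort_on_timeout()
is a hypothetical helper. Whether a core file is actually written
depends on the core size limit and on the process not handling SIGABRT
itself.

```python
import signal
import subprocess
import sys


def abort_on_timeout(proc, timeout):
    """Wait up to `timeout` seconds; on expiry send SIGABRT. Return the exit code."""
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Default action of SIGABRT: terminate and dump core if cores are enabled.
        proc.send_signal(signal.SIGABRT)
        return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(abort_on_timeout(child, 0.5))
```

On Unix, a negative return code from Popen.wait() means the child was
terminated by the signal with that number.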
Post-push fixes making it work on pushbuild's valgrind host, and clarifying the output.
mysql-test/lib/My/CoreDump.pm:
- Improved parsing of mtr output so that it works on pushbuild's "valgrind" host.
- Added stack trace for the thread that coredumped, to make output more readable when there are many threads.
- Added explanation of what the output consists of.
- Added early removal of temp file.