- Core dump
- Contents
- Disabling automatic core dumps
- Using sysctl
- Using systemd
- Using PAM limits
- Using ulimit
- Making a core dump
- Where do they go?
- Examining a core dump
- Julia Evans
- what’s a segfault?
- step 1: run valgrind
- How to get a core dump
- ulimit: set the max size of a core dump
- kernel.core_pattern: where core dumps are written
- kernel.core_pattern & Ubuntu
- So you have a core dump. Now what?
- Getting a backtrace from gdb
- look at the stack for every thread
- gdb + core dumps = amazing
- getting a stack trace from a core dump is pretty approachable!
- 17.4 Collect Core Dumps
- 17.4.1 Collect Core Dumps on Oracle Solaris
- 17.4.2 Collect Core Dumps on Linux
- 17.4.3 Reasons for Not Getting a Core File
- 17.4.4 Collect Crash Dumps on Windows
Core dump
A core dump is a file containing a process’s address space (memory) when the process terminates unexpectedly. Core dumps may be produced on-demand (such as by a debugger), or automatically upon termination. Core dumps are triggered by the kernel in response to program crashes, and may be passed to a helper program (such as systemd-coredump) for further processing. A core dump is not typically used by an average user, but may be passed on to developers upon request where it can be invaluable as a post-mortem snapshot of the program’s state at the time of the crash, especially if the fault is hard to reliably reproduce.
Contents
Disabling automatic core dumps
Users may wish to disable automatic core dumps for a number of reasons:
- Performance: generating core dumps for memory-heavy processes can waste system resources and delay the cleanup of memory.
- Disk space: core dumps of memory-heavy processes may consume disk space equal to, if not greater than, the process’s memory footprint if not compressed.
- Security: core dumps, although typically readable only by root, may contain sensitive data (such as passwords or cryptographic keys), which is then written to disk following a crash.
Using sysctl
sysctl can be used to set kernel.core_pattern so that core dumps are simply discarded, disabling core dump handling. Create this file[1]:
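A minimal sketch of such a file, following the sysctl.d drop-in convention (piping the pattern to /bin/false makes the kernel discard the dump):

/etc/sysctl.d/50-coredump.conf

    kernel.core_pattern=|/bin/false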
To apply the setting immediately, use sysctl:
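For example, assuming the same pattern as in the file above:

    # sysctl kernel.core_pattern="|/bin/false"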
Using systemd
systemd’s default behavior is defined in /usr/lib/sysctl.d/50-coredump.conf, which sets kernel.core_pattern to call systemd-coredump. It generates core dumps for all processes in /var/lib/systemd/coredump. systemd-coredump behavior can be overridden by creating a configuration snippet in the /etc/systemd/coredump.conf.d/ directory with the following content[2][3]:
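A sketch of such a snippet (the file name disable.conf is arbitrary; Storage=none tells systemd-coredump not to store dumps, and ProcessSizeMax=0 disables core dump processing):

/etc/systemd/coredump.conf.d/disable.conf

    [Coredump]
    Storage=none
    ProcessSizeMax=0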
Then reload systemd’s configuration.
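For example:

    # systemctl daemon-reload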
This method alone is usually sufficient to disable userspace core dumps, as long as no other program enables automatic core dumps on the system, but the core dump is still generated in memory and systemd-coredump is still run.
Using PAM limits
The maximum core dump size for users logged in via PAM is enforced by limits.conf. Setting it to zero disables core dumps entirely.[4]
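A sketch of the relevant limits.conf line (the asterisk applies the limit to all users):

/etc/security/limits.conf

    * hard core 0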
Using ulimit
Command-line shells such as bash or zsh provide a builtin ulimit command which can be used to report or set resource limits of the shell and the processes started by the shell. See bash(1) § SHELL BUILTIN COMMANDS or zshbuiltins(1) for details.
To disable core dumps in the current shell:
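For example, using the builtin:

    $ ulimit -c 0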
Making a core dump
To generate a core dump of an arbitrary process, first install the gdb package. Then find the PID of the running process, for example with pgrep:
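For example, targeting a hypothetical Firefox process (the PID 2071 shown here matches the file name used below):

    $ pgrep firefox
    2071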
Attach to the process:
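For example:

    $ gdb -p 2071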
Then at the (gdb) prompt:
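The generate-core-file command writes the dump, after which you can detach and quit:

    (gdb) generate-core-file
    (gdb) detach
    (gdb) quit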
Now you have a core dump file called core.2071.
Where do they go?
The kernel.core_pattern sysctl decides where automatic core dumps go. By default, core dumps are sent to systemd-coredump, which can be configured in /etc/systemd/coredump.conf. By default, all core dumps are stored in /var/lib/systemd/coredump (due to Storage=external) and they are compressed with zstd (due to Compress=yes). Additionally, various size limits for the storage can be configured.
To retrieve a core dump from the journal, see coredumpctl(1).
Examining a core dump
Use coredumpctl to find the corresponding dump:
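For example:

    $ coredumpctl list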
You need to uniquely identify the relevant dump. This is possible by specifying a PID, the name of the executable, the path to the executable, or a journalctl predicate (see coredumpctl(1) and journalctl(1) for details). To see details of the core dumps:
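For example, matching on a hypothetical PID:

    $ coredumpctl info 2071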
Pay attention to the "Signal" row, which helps to identify the cause of the crash. For deeper analysis, you can examine the backtrace using gdb:
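coredumpctl can launch gdb on the matching dump directly, for example:

    $ coredumpctl debug 2071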
When gdb is started, use the bt command to print the backtrace:
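For example:

    (gdb) bt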
See Debugging/Getting traces if debugging symbols are requested, but not found.
Julia Evans
This week at work I spent all week trying to debug a segfault. I’d never done this before, and some of the basic things involved (get a core dump! find the line number that segfaulted!) took me a long time to figure out. So here’s a blog post explaining how to do those things!
At the end of this blog post, you should know how to go from “oh no my program is segfaulting and I have no idea what is happening” to “well I know what its stack / line number was when it segfaulted, at least!”.
what’s a segfault?
A “segmentation fault” is when your program tries to access memory that it’s not allowed to access, or tries to access it in a way that isn’t allowed (for example, writing to read-only memory). This can be caused by:
- trying to dereference a null pointer (you’re not allowed to access the memory address 0)
- trying to dereference some other pointer that isn’t in your memory
- a C++ vtable pointer that got corrupted and is pointing to the wrong place, which causes the program to try to execute some memory that isn’t executable
- some other things that I don’t understand, like I think misaligned memory accesses can also segfault
This “C++ vtable pointer” thing is what was happening to my segfaulting program. I might explain that in a future blog post because I didn’t know any C++ at the beginning of this week and this vtable lookup thing was a new way for a program to segfault that I didn’t know about.
But! This blog post isn’t about C++ bugs. Let’s talk about the basics, like, how do we even get a core dump?
step 1: run valgrind
I found the easiest way to figure out why my program was segfaulting was to use valgrind: I ran
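something like this, where ./my-program stands in for whatever binary is crashing (a hypothetical name):

    $ valgrind -v ./my-program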
and this gave me a stack trace of what happened. Neat!
But I also wanted to do a more in-depth investigation and find out more than just what valgrind was telling me! So I wanted to get a core dump and explore it.
How to get a core dump
A core dump is a copy of your program’s memory, and it’s useful when you’re trying to debug what went wrong with your problematic program.
When your program segfaults, the Linux kernel will sometimes write a core dump to disk. When I originally tried to get a core dump, I was pretty frustrated for a long time because – Linux wasn’t writing a core dump!! Where was my core dump?
Here’s what I ended up doing:
- Run ulimit -c unlimited before starting my program
- Run sudo sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t
ulimit: set the max size of a core dump
ulimit -c sets the maximum size of a core dump. It’s often set to 0, which means that the kernel won’t write core dumps at all. It’s in kilobytes. ulimits are per process – you can see a process’s limits by running cat /proc/PID/limits
For example these are the limits for a random Firefox process on my system:
The kernel uses the soft limit (in this case, “max core file size = 0”) when deciding how big of a core file to write. You can increase the soft limit up to the hard limit using the ulimit shell builtin (ulimit -c unlimited!)
kernel.core_pattern: where core dumps are written
kernel.core_pattern is a kernel parameter or a “sysctl setting” that controls where the Linux kernel writes core dumps to disk.
Kernel parameters are a way to set global settings on your system. You can get a list of every kernel parameter by running sysctl -a, or use sysctl kernel.core_pattern to look at the kernel.core_pattern setting specifically.
So sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t will write core dumps to /tmp, with names like core-<executable>.<pid>.<hostname>.<timestamp>.
If you want to know more about what these %e, %p specifiers mean, read man core.
It’s important to know that kernel.core_pattern is a global setting – it’s good to be a little careful about changing it because it’s possible that other systems depend on it being set a certain way.
kernel.core_pattern & Ubuntu
By default on Ubuntu systems, this is what kernel.core_pattern is set to:
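It’s the pipe-to-apport value described in the list below (you can query it with sysctl; the exact option string can vary between Ubuntu releases):

    $ sysctl kernel.core_pattern
    kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P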
This caused me a lot of confusion (what is this apport thing and what is it doing with my core dumps??) so here’s what I learned about this:
- Ubuntu uses a system called “apport” to report crashes in apt packages
- Setting kernel.core_pattern=|/usr/share/apport/apport %p %s %c %d %P means that core dumps will be piped to apport
- apport has logs in /var/log/apport.log
- apport by default will ignore crashes from binaries that aren’t part of an Ubuntu package
I ended up just overriding this Apport business and running sudo sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t because I was on a dev machine, I didn’t care whether Apport was working or not, and I didn’t feel like trying to convince Apport to give me my core dumps.
So you have a core dump. Now what?
Okay, now we know about ulimits and kernel.core_pattern and you actually have a core dump file on disk in /tmp. Amazing! Now what? We still don’t know why the program segfaulted!
The next step is to open the core file with gdb and get a backtrace.
Getting a backtrace from gdb
You can open a core file with gdb like this:
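For example, with hypothetical file names (gdb takes the binary plus the core file):

    $ gdb ./my-program /tmp/core-my-program.12345.myhost.1614000000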
Next, we want to know what the stack was when the program crashed. Running bt at the gdb prompt will give you a backtrace. In my case, gdb hadn’t loaded symbols for the binary, so the backtrace was just a list of ?? entries. Luckily, loading symbols fixed it.
Here’s how to load debugging symbols:
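These are the gdb commands I mean (the binary path is hypothetical; symbol-file loads symbols from the main binary, and sharedlibrary loads symbols for the shared libraries):

    (gdb) symbol-file /path/to/my-program
    (gdb) sharedlibrary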
This loads symbols from the binary and from any shared libraries the binary uses. Once I did that, gdb gave me a beautiful stack trace with line numbers when I ran bt .
If you want this to work, the binary should be compiled with debugging symbols (for gcc or clang, that means building with -g). Having line numbers in your stack traces is extremely helpful when trying to figure out why a program crashed 🙂
look at the stack for every thread
Here’s how to get the stack for every thread in gdb!
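It’s one command:

    (gdb) thread apply all bt full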
gdb + core dumps = amazing
If you have a core dump & debugging symbols and gdb, you are in an amazing situation!! You can go up and down the call stack, print out variables, and poke around in memory to see what happened. It’s the best.
If you are still working on being a gdb wizard, you can also just print out the stack trace with bt and that’s okay 🙂
Another path to figuring out your segfault is to compile the program with AddressSanitizer (“ASAN”) ($CC -fsanitize=address) and run it. I’m not going to discuss that in this post because it’s already pretty long, and anyway in my case the segfault disappeared with ASAN turned on for some reason, possibly because the ASAN build used a different memory allocator (system malloc instead of tcmalloc).
I might write about ASAN more in the future if I ever get it to work 🙂
getting a stack trace from a core dump is pretty approachable!
This blog post sounds like a lot and I was pretty confused when I was doing it but really there aren’t all that many steps to getting a stack trace out of a segfaulting program:
- run the program under valgrind and read the stack trace it gives you
if that doesn’t work, or if you want to have a core dump to investigate:
- make sure the binary is compiled with debugging symbols
- set ulimit and kernel.core_pattern correctly
- run the program
- open your core dump with gdb , load the symbols, and run bt
- try to figure out what happened!!
Using gdb, I was able to figure out that a C++ vtable entry was pointing to some corrupt memory, which was somewhat helpful and helped me feel like I understood C++ a bit better. Maybe we’ll talk more about how to use gdb to figure things out another day!
17.4 Collect Core Dumps
This section explains how to generate and collect core dumps (also known as crash dumps). A core dump or a crash dump is a memory snapshot of a running process. A core dump can be automatically created by the operating system when a fatal or unhandled error (for example, signal or system exception) occurs. Alternatively, a core dump can be forced by means of system-provided command-line utilities. Sometimes a core dump is useful when diagnosing a process that appears to be hung; the core dump may reveal information about the cause of the hang.
When collecting a core dump, be sure to gather other information about the environment so that the core file can be analyzed (for example, OS version, patch information, and the fatal error log).
Core dumps do not usually contain all the memory pages of the crashed or hung process. With each of the operating systems discussed here, the text (or code) pages of the process are not included in core dumps. But to be useful, a core dump must contain at least the heap and stack pages. Collecting good, non-truncated core dump files is essential for postmortem analysis of the crash.
The following sections describe scenarios for collecting core dumps.
17.4.1 Collect Core Dumps on Oracle Solaris
With the Oracle Solaris operating system, unhandled signals such as a segmentation violation, illegal instruction, and so forth, result in a core dump. By default, the core dump is created in the current working directory of the process and the name of the core dump file is core. The user can configure the location and name of the core dump using the core file administration utility, coreadm . This procedure is fully described in the man page for the coreadm utility.
The ulimit utility is used to get or set the limitations on the system resources available to the current shell and its descendants. Use the ulimit -c command to check or set the core file size limit. Make sure that the limit is set to unlimited; otherwise the core file could be truncated.
ulimit is a Bash shell built-in command; on a C shell, use the limit command.
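For example (the Bash builtin first, then the C shell equivalent):

    $ ulimit -c unlimited
    % limit coredumpsize unlimited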
Ensure that any scripts that are used to launch the VM or your application do not disable core dump creation.
The gcore utility can be used to get a core image of running processes. This utility accepts the process id (pid) of the process for which you want to force a core dump.
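For example, assuming a hypothetical Java process with pid 1234:

    $ gcore 1234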
To get the list of Java processes running on the machine, you can use any of the following commands:
ps -ef | grep java
jps
The jps command-line utility does not perform name matching (that is, looking for "java" in the process command name) and so it can list Java VM embedded processes as well as the Java processes.
The following are two methods to collect core dumps on Oracle Solaris.
ShowMessageBoxOnError option on Oracle Solaris:
A Java process can be started with the -XX:+ShowMessageBoxOnError command-line option. When a fatal error is encountered, the process prints a message to standard error and waits for a yes or no response from standard input. Example 17-1 shows the output when an unexpected signal occurs.
Example 17-1 Unexpected Signal Error on Solaris
Before answering yes or pressing RETURN (Enter), use the gcore utility to force a core dump. Then you can type yes to launch the dbx debugger.
Suspend a process with the truss utility:
In situations where it is not possible to specify the -XX:+ShowMessageBoxOnError option, you might be able to use the truss utility. This Oracle Solaris operating system utility is used to trace system calls and signals. You can use this utility to suspend the process when it reaches a specific function or system call.
The command in Example 17-2 shows how to use the truss utility to suspend a process when the exit system call is executed (in other words, the process is about to exit).
Example 17-2 Use truss Utility to Suspend a Process
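A minimal sketch of such a command, using truss flags from its man page (-f follows forked child processes, -T exit stops the process when it makes the exit system call; the Java command line here is hypothetical):

    $ truss -f -T exit java MainClass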
When the process calls exit, it will be suspended. At this point, you can attach the debugger to the process or call gcore to force a core dump.
17.4.2 Collect Core Dumps on Linux
On the Linux operating system, unhandled signals such as segmentation violation, illegal instruction, and so forth, result in a core dump. By default, the core dump is created in the current working directory of the process and the name of the core dump file is core.<pid>, where <pid> is the process id of the crashed Java process.
The ulimit utility is used to get or set the limitations on the system resources available to the current shell and its descendants. Use the ulimit -c command to check or set the core file size limit. Make sure that the limit is set to unlimited; otherwise the core file could be truncated.
ulimit is a Bash shell built-in command; on a C shell, use the limit command.
Ensure that any scripts that are used to launch the VM or your application do not disable core dump creation.
You can use the gcore command in the gdb (GNU Debugger) interface to get a core image of a running process. This utility accepts the pid of the process for which you want to force the core dump.
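For example, attaching to a hypothetical pid 1234 and then forcing the dump from inside gdb:

    $ gdb -p 1234
    (gdb) gcore
    (gdb) detach
    (gdb) quit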
To get the list of Java processes running on the machine, you can use any of the following commands:
ps -ef | grep java
jps
The jps command-line utility does not perform name matching (that is, looking for "java" in the process command name) and so it can list Java VM embedded processes as well as the Java processes.
The following is one option to collect core dumps on Linux.
ShowMessageBoxOnError option in Linux:
A Java process can be started with the -XX:+ShowMessageBoxOnError command-line option. When a fatal error is encountered, the process prints a message to standard error and waits for a yes or no response from standard input. Example 17-3 shows the output when an unexpected signal occurs.
Example 17-3 Unexpected Signal Error in Linux
Type yes to launch the gdb (GNU Debugger) interface, as suggested by the error report shown above. At the gdb prompt, you can give the gcore command. This command creates a core dump of the debugged process with the name core.<pid>, where <pid> is the process ID of the crashed process. Make sure that the gcore command is supported in your version of gdb: look for help gcore at the gdb command prompt.
17.4.3 Reasons for Not Getting a Core File
The following list explains the major reasons that a core file might not be generated. This list pertains to both Oracle Solaris and Linux operating systems, unless specified otherwise.
- The current user does not have permission to write in the current working directory of the process.
- The current user has write permission on the current working directory, but there is already a file named core that has read-only permission.
- The current directory does not have enough space or there is no space left.
- The current directory has a subdirectory named core.
- The current working directory is remote. It might be mapped by NFS (Network File System), and NFS failed just at the time the core dump was about to be created.
- Oracle Solaris operating system only: The coreadm tool has been used to configure the directory and name of the core file, but any of the above reasons apply for the configured directory or filename.
- The core file size limit is too low. Check your core file limit using the ulimit -c command (Bash shell) or the limit coredumpsize command (C shell). If the output from this command is not unlimited, the core dump file size might not be large enough. If this is the case, you will get truncated core dumps or no core dump at all. In addition, ensure that any scripts that are used to launch the VM or your application do not disable core dump creation.
- The process is running a setuid program, and therefore the operating system will not dump core unless it is configured explicitly.
- Java specific: If the process received SIGSEGV or SIGILL but there is no core dump, it is possible that the process handled the signal. For example, the HotSpot VM uses the SIGSEGV signal for legitimate purposes, such as throwing NullPointerException, deoptimization, and so forth. The signal is unhandled by the Java VM only if the current instruction (PC) falls outside Java VM generated code. These are the only cases in which HotSpot dumps core.
- Java specific: The JNI Invocation API was used to create the VM. The standard Java launcher was not used. The custom Java launcher program handled the signal by just consuming it and produced the log entry silently. This situation has occurred with certain Application Servers and Web Servers. These Java VM embedding programs transparently attempt to restart (fail over) the system after an abnormal termination. In this case, the fact that a core dump is not produced is a feature and not a bug.
17.4.4 Collect Crash Dumps on Windows
On the Windows operating system there are three types of crash dumps:
- Dr. Watson logfile, which is a text error log file that includes the faulting stack trace and a few other details.
- User minidump, which can be considered a "partial" core dump. It is not a complete core dump, because it does not contain all the useful memory pages of the process.
- Dr. Watson full-dump, which is equivalent to a Unix core dump. This dump contains most memory pages of the process (except for code pages).
When an unexpected exception occurs on Windows, the action taken depends on two values in the following registry key:
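This is the standard Windows postmortem-debugger (AeDebug) key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug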
The two values are named Debugger and Auto . The Auto value indicates if the debugger specified in the value of the Debugger entry starts automatically when an application error occurs.
- A value of 0 for Auto means that the system displays a message box notifying the user when an application error occurs.
- A value of 1 for Auto means that the debugger starts automatically.
The value of Debugger is the debugger command that is to be used to debug program errors.
When a program error occurs, Windows examines the Auto value. If the value is 0 and the value for Debugger is a valid command, a message box is created with two buttons: OK and Cancel. If the user clicks OK, the program is terminated. If the user clicks Cancel, the specified debugger is started. If the value for the Auto entry is set to 1 and the value for the Debugger entry specifies the command for a valid debugger, the system automatically starts the debugger and does not generate a message box.
The following are two ways to collect a crash dump on Windows.
Configure Dr. Watson:
The Dr. Watson debugger is used to create crash dump files. By default, the Dr. Watson debugger (drwtsn32.exe) is installed into the Windows system folder (%SystemRoot%\System32).
To install Dr. Watson as the postmortem debugger, run the following command:
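The command (the -i flag installs Dr. Watson as the default postmortem debugger):

    drwtsn32 -i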
To configure name and location of crash dump files, run drwtsn32 without any options.
In the Dr. Watson GUI window, make sure that the Create Crash Dump File check box is selected and that the crash dump file path and log file path are configured in their respective text fields.
Dr. Watson may be configured to create a full dump using the registry. The registry key is shown in Example 17-4.
Example 17-4 Registry Key to Create a Full Dump
Note: If the application handles the exception, then the registry-configured debugger is not invoked. In that case it might be appropriate to use the -XX:+ShowMessageBoxOnError command-line option to force the process to wait for user intervention on fatal error conditions.
Force a crash dump:
On the Windows operating system, the userdump command-line utility can be used to force a Dr. Watson dump of a running process. The userdump utility does not ship with Windows but instead is released as a component of the OEM Support Tools package.
An alternative way to force a crash dump is to use the windbg debugger. The main advantage of using windbg is that it can attach to a process in a non-invasive manner (that is, read-only). Normally Windows terminates a process after a crash dump is obtained but with the non-invasive attach it is possible to obtain a crash dump and let the process continue. To attach the debugger non-invasively requires selecting the Attach to Process option and the Noninvasive checkbox.
When the debugger is attached, a crash dump can be obtained using the command shown in Example 17-5.
Example 17-5 Get a Crash Dump
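A sketch of the windbg command (.dump with the /f option writes a full dump; the file name here is hypothetical):

    .dump /f crash.dmp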
The windbg debugger is included in the "Debugging Tools for Windows" download.
An additional utility in this download is the dumpchk.exe utility, which can verify that a memory dump file has been created correctly.
Both userdump.exe and windbg require the pid of the process. The userdump -p command lists the process and program for all processes. This is useful if you know that the application is started with the java.exe launcher. However, if a custom launcher is used (embedded VM), it might be difficult to recognize the process. In that case you can use the jps command-line utility as it lists the pids of the Java processes only.
As with Oracle Solaris and Linux operating systems, you can also use the -XX:+ShowMessageBoxOnError command-line option on Windows. When a fatal error is encountered, the process shows a message box and waits for a yes or no response from the user.