Debugging: Core dump / Memory Dump - 2020
A core dump is a snapshot of a program's memory at the moment it crashed.
It contains the process address space in use (described in the kernel by the mm_struct structure, which holds all the virtual memory areas), plus supporting information about the state of the process at the time of the crash.
A process dumps core when it is terminated by the operating system due to a fault in the program. The most typical cause is that the program accessed an invalid pointer value. For example, when we try to dereference a NULL pointer, the process receives a SIGSEGV signal and is terminated. As part of that termination, the operating system tries to write the process image to a file for later post-mortem analysis.
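To make the signal delivery visible, here is a minimal sketch that crashes a child process on purpose and lets the parent inspect how it died. The function name crash_child is ours, chosen for illustration, and WCOREDUMP is a common extension (available in glibc) rather than part of POSIX, so it is guarded with #ifdef.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes through a NULL pointer, then report how it died.
 * Returns the terminating signal number, 0 if the child exited normally,
 * or -1 on error. *dumped is set to 1 if the kernel reports a core dump. */
int crash_child(int *dumped)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* volatile keeps the compiler from optimizing the store away */
        volatile int *ptr = 0;
        *ptr = 7;
        _exit(0); /* only reached if no fault was delivered */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    *dumped = 0;
#ifdef WCOREDUMP
    if (WIFSIGNALED(status) && WCOREDUMP(status))
        *dumped = 1;
#endif
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux we would expect the return value to equal SIGSEGV. Whether dumped ends up as 1 depends on the core size limit and the core_pattern setting of the system.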
We can use the core dump to diagnose and debug our programs by loading the core file into a debugger along with the executable file (for symbols and other debugging information). Since core dumps can take non-trivial amounts of disk space, there is a configurable limit on how large they can be. We can check the current limit with:
ulimit -c
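The same limit can be read from inside a program with getrlimit() and RLIMIT_CORE. The following is a small sketch, with core_limit_string being a helper name made up for this example. Note that getrlimit() reports the limit in bytes, while the shell's ulimit -c reports it in blocks.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Write the soft RLIMIT_CORE value into buf as text:
 * "unlimited" or a byte count. Returns 0 on success,
 * -1 if getrlimit() fails. */
int core_limit_string(char *buf, size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    if (rl.rlim_cur == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu bytes", (unsigned long long)rl.rlim_cur);
    return 0;
}
```

A soft limit of 0 means no core file will be written at all, which is a common default.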
Here is a sample program:
/* t.c */
#include <stdio.h>
void foo()
{
  int *ptr = 0;
  *ptr = 7;
}

int main()
{
  foo();
  return 0;
}
If we run the code, we get the following run-time error:
Segmentation fault (core dumped)
But no core file shows up in the current directory. Where did it go?
Let's look in the /proc/sys/kernel directory.
$ ls -la core*
-rw-r--r--. 1 root root 0 Aug 28 23:53 core_pattern
-rw-r--r--. 1 root root 0 Aug 28 16:12 core_pipe_limit
-rw-r--r--. 1 root root 0 Aug 28 23:53 core_uses_pid
What's in the core_pattern?
$ cat core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e
Note that the first character of the pattern is a "|". In that case the kernel treats the rest of the pattern as a command to run, and the core dump is written to the standard input of that program instead of to a file.
So it turns out that my Fedora system is configured to send core dumps to the Automatic Bug Reporting Tool (ABRT). To get a core file in the current directory instead, we change core_pattern to a plain file name pattern:
$ sudo bash -c 'echo core.%e.%p > /proc/sys/kernel/core_pattern'
$ cat /proc/sys/kernel/core_pattern
core.%e.%p
In this pattern, %e expands to the executable file name and %p to the PID of the dumped process.
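To illustrate how such a template turns into a file name, here is a simplified sketch of the expansion. It is not the kernel's implementation: expand_core_pattern is a hypothetical helper that only understands %e, %p and %%, and it drops any other specifier as well as a lone trailing %. It also reports whether the pattern starts with "|".

```c
#include <stdio.h>

/* Expand a core_pattern-style template into out.
 * Handles only %e (executable name), %p (PID) and %% (literal percent).
 * Returns 1 if the pattern starts with '|' (pipe to a program),
 * 0 for a plain file name, -1 if out is too small. */
int expand_core_pattern(const char *pattern, const char *exe, long pid,
                        char *out, size_t len)
{
    int is_pipe = (pattern[0] == '|');
    size_t used = 0;

    if (len == 0)
        return -1;
    out[0] = '\0';

    if (is_pipe)
        pattern++; /* the rest is a command line, not a file name */

    for (const char *p = pattern; *p != '\0'; p++) {
        int n = 0;

        if (*p != '%') {
            n = snprintf(out + used, len - used, "%c", *p);
        } else {
            p++;
            if (*p == '\0')
                break; /* lone '%' at the end: dropped */
            if (*p == 'e')
                n = snprintf(out + used, len - used, "%s", exe);
            else if (*p == 'p')
                n = snprintf(out + used, len - used, "%ld", pid);
            else if (*p == '%')
                n = snprintf(out + used, len - used, "%%");
            /* any other specifier is ignored in this sketch */
        }
        if (n < 0 || (size_t)n >= len - used)
            return -1; /* output would not fit */
        used += (size_t)n;
    }
    return is_pipe;
}
```

Calling expand_core_pattern("core.%e.%p", "t", 3209, out, sizeof out) with a sufficiently large buffer fills out with "core.t.3209" and returns 0.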
After raising the core file size limit (ulimit -c counts blocks; bash uses 1024-byte blocks unless it runs in POSIX mode, where they are 512 bytes), we run the program again:
$ ulimit -c unlimited
$ ./t
Segmentation fault (core dumped)
$ ls
core.t.3209 t t.c
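A program can also raise its own limit instead of relying on the shell. The sketch below, with the illustrative name enable_core_dumps, sets the soft limit to the hard limit using setrlimit(). This matches ulimit -c unlimited only when the hard limit itself is unlimited; an unprivileged process cannot go beyond the hard limit.

```c
#include <sys/resource.h>

/* Raise the soft core-size limit to the hard limit for this process
 * (children started afterwards inherit it).
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max; /* soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this early in main() is one way to make sure a later crash can produce a core file, subject to the core_pattern setting discussed above.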
The core file used to be simply a raw binary image of memory. In a modern OS, however, the address space of a process may not be contiguous, and a process may share pages with other processes, so the core file has to record more information in addition to the state of the program at the time of the dump. On Linux systems, core files use the ELF (Executable and Linkable Format) file format.
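Because a core file is an ELF file, its header can be recognized the same way as any other ELF object. The following sketch checks the magic bytes and the e_type field using the constants from <elf.h> on Linux. The function name looks_like_elf_core is ours, and the check assumes the file was written on a machine with the same byte order as the one reading it.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the first bytes of a file look like an ELF core header,
 * 0 otherwise. Assumes the same byte order as the running machine. */
int looks_like_elf_core(const unsigned char *hdr, size_t len)
{
    uint16_t type;

    if (len < EI_NIDENT + sizeof type)
        return 0;
    if (memcmp(hdr, ELFMAG, SELFMAG) != 0) /* "\177ELF" magic */
        return 0;

    /* e_type follows the 16-byte e_ident array in both ELF32 and ELF64 */
    memcpy(&type, hdr + EI_NIDENT, sizeof type);
    return type == ET_CORE;
}
```

To try it on a real file, we could read the first 18 bytes of the core file with fread() and pass the buffer to this function.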
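We can load the core file into gdb together with the executable (the executable should be built with -g so that gdb can show file names and line numbers):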
gdb executable core-file
In our case:
$ gdb ./t core.t.3529
GNU gdb (GDB) Fedora (7.5.0.20120926-25.fc18)
...
Reading symbols from /home/khong/TEST/DMP/t...done.
[New LWP 3529]
Core was generated by `./t'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004004fc in foo () at t.c:6
6	  *ptr = 7;
gdb reports that the crash happened at line 6 of t.c, which is exactly where we dereference the NULL pointer.
We can use backtrace to list the call stack at the moment the program crashed:
(gdb) backtrace
#0  0x00000000004004fc in foo () at t.c:6
#1  0x0000000000400512 in main () at t.c:11
To move up and down the call stack:
(gdb) up
#1  0x0000000000400512 in main () at t.c:11
11	  foo();
(gdb) down
#0  0x00000000004004fc in foo () at t.c:6
6	  *ptr = 7;