C++ Tutorial
Multi-Threaded Programming
Debugging - 2020
- TotalView
- Platforms: Linux, AIX, Solaris, Tru64, Cray (Linux-CNL, Catamount), Mac OS
- From wiki - It allows process control down to the single thread, the ability to look at data for a single thread or all threads at the same time, and the ability to synchronize threads through breakpoints. TotalView integrates memory leak detection and other heap memory debugging features. Data analysis features help find anomalies and problems in the target program's data, and the combination of visualization and evaluation points lets the user watch data change as the program executes. TotalView includes the ability to test fixes while debugging. It supports parallel programming including Message Passing Interface (MPI), Unified Parallel C (UPC) and OpenMP. It can be extended to support debugging CUDA. It also has an optional add-on called ReplayEngine that can be used to perform reverse debugging (stepping backwards to look at older values of variables.)
- Intel® Parallel Studio 2011
It plugs into the Microsoft Visual Studio Integrated Development Environment by adopting a common runtime called the Microsoft Concurrency Runtime, which is part of Visual Studio 2010.
- Parallel Composer consists of the Intel C++ compiler, a number of performance libraries (Integrated Performance Primitives), Intel Threading Building Blocks and a parallel debugger extension.
- Parallel Inspector improves reliability by identifying memory errors and threading errors.
- Parallel Amplifier is a performance profiler that analyzes hotspots, concurrency and locks-and-waits.
- Oracle Solaris Studio 12.2
Platforms: Solaris and Linux
- Visual Studio 2012
- Valgrind
It is a programming tool for memory debugging, memory leak detection, and profiling. Released under the terms of the GNU General Public License, Valgrind is free software.
Here is a simple way to check for a memory leak. The code (test.c) looks like this:
    #include <stdlib.h>
    #include <stdio.h>

    int main() {
        printf("mem leak testing....\n");
        int *ptr = (int *)malloc(1000 * sizeof(int));  /* never freed: this is the leak */
        return 0;
    }
Compile and run with Valgrind:

    $ gcc -g -o test test.c
    $ valgrind --tool=memcheck --leak-check=full ./test
    ==2948== Memcheck, a memory error detector
    ==2948== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
    ==2948== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==2948== Command: ./test
    ==2948==
    mem leak testing....
    ==2948==
    ==2948== HEAP SUMMARY:
    ==2948==     in use at exit: 4,000 bytes in 1 blocks
    ==2948==   total heap usage: 1 allocs, 0 frees, 4,000 bytes allocated
    ==2948==
    ==2948== 4,000 bytes in 1 blocks are definitely lost in loss record 1 of 1
    ==2948==    at 0x402BB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
    ==2948==    by 0x804845C: main (test.c:6)
    ==2948==
    ==2948== LEAK SUMMARY:
    ==2948==    definitely lost: 4,000 bytes in 1 blocks
    ==2948==    indirectly lost: 0 bytes in 0 blocks
    ==2948==      possibly lost: 0 bytes in 0 blocks
    ==2948==    still reachable: 0 bytes in 0 blocks
    ==2948==         suppressed: 0 bytes in 0 blocks
    ==2948==
    ==2948== For counts of detected and suppressed errors, rerun with: -v
    ==2948== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Helgrind is a tool capable of detecting race conditions in multithreaded code.
The following sample shows a race condition detected by Valgrind's Helgrind tool.
The code with the race condition (test2.c) looks like this:
    #include <stdio.h>
    #include <pthread.h>

    static volatile int balance = 0;

    void *deposit(void *param)
    {
        char *who = param;
        int i;
        printf("%s: begin\n", who);
        for (i = 0; i < 1000000; i++) {
            balance = balance + 1;  /* unsynchronized read-modify-write: the race */
        }
        printf("%s: done\n", who);
        return NULL;
    }

    int main()
    {
        pthread_t p1, p2;
        printf("main() starts depositing, balance = %d\n", balance);
        pthread_create(&p1, NULL, deposit, "A");
        pthread_create(&p2, NULL, deposit, "B");

        // join waits for the threads to finish
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        printf("main() A and B finished, balance = %d\n", balance);
        return 0;
    }
Here is the Makefile (each recipe line must begin with a tab character):

    test2: test2.o
    	gcc -g -o test2 test2.o -Wall -lpthread

    test2.o: test2.c
    	gcc -c test2.c

    clean:
    	rm -f *.o test2
Run with valgrind:

    $ make
    $ valgrind --tool=helgrind ./test2
    ==3041== Helgrind, a thread error detector
    ==3041== Copyright (C) 2007-2011, and GNU GPL'd, by OpenWorks LLP et al.
    ==3041== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==3041== Command: ./test2
    ==3041==
    main() starts depositing, balance = 0
    A: begin
    B: begin
    ==3041== ---Thread-Announcement------------------------------------------
    ==3041==
    ==3041== Thread #3 was created
    ==3041==    at 0x4153D28: clone (clone.S:111)
    ==3041==
    ==3041== ---Thread-Announcement------------------------------------------
    ==3041==
    ==3041== Thread #2 was created
    ==3041==    at 0x4153D28: clone (clone.S:111)
    ==3041==
    ==3041== ----------------------------------------------------------------
    ==3041==
    ==3041== Possible data race during read of size 4 at 0x804A02C by thread #3
    ==3041== Locks held: none
    ==3041==    at 0x8048514: deposit (in /home/khong/TEST/Valgrind/test2)
    ==3041==    by 0x402D95F: ?? (in /usr/lib/valgrind/vgpreload_helgrind-x86-linux.so)
    ==3041==    by 0x4050D4B: start_thread (pthread_create.c:308)
    ==3041==    by 0x4153D3D: clone (clone.S:130)
    ==3041==
    ==3041== This conflicts with a previous write of size 4 by thread #2
    ==3041== Locks held: none
    ==3041==    at 0x804851C: deposit (in /home/khong/TEST/Valgrind/test2)
    ==3041==    by 0x402D95F: ?? (in /usr/lib/valgrind/vgpreload_helgrind-x86-linux.so)
    ==3041==    by 0x4050D4B: start_thread (pthread_create.c:308)
    ==3041==    by 0x4153D3D: clone (clone.S:130)
    ==3041==
    ==3041== ----------------------------------------------------------------
    ==3041==
    ==3041== Possible data race during write of size 4 at 0x804A02C by thread #3
    ==3041== Locks held: none
    ==3041==    at 0x804851C: deposit (in /home/khong/TEST/Valgrind/test2)
    ==3041==    by 0x402D95F: ?? (in /usr/lib/valgrind/vgpreload_helgrind-x86-linux.so)
    ==3041==    by 0x4050D4B: start_thread (pthread_create.c:308)
    ==3041==    by 0x4153D3D: clone (clone.S:130)
    ==3041==
    ==3041== This conflicts with a previous write of size 4 by thread #2
    ==3041== Locks held: none
    ==3041==    at 0x804851C: deposit (in /home/khong/TEST/Valgrind/test2)
    ==3041==    by 0x402D95F: ?? (in /usr/lib/valgrind/vgpreload_helgrind-x86-linux.so)
    ==3041==    by 0x4050D4B: start_thread (pthread_create.c:308)
    ==3041==    by 0x4153D3D: clone (clone.S:130)
    ==3041==
    A: done
    B: done
    main() A and B finished, balance = 2000000
    ==3041==
    ==3041== For counts of detected and suppressed errors, rerun with: -v
    ==3041== Use --history-level=approx or =none to gain increased speed, at
    ==3041== the cost of reduced accuracy of conflicting-access information
    ==3041== ERROR SUMMARY: 22 errors from 2 contexts (suppressed: 68 from 19)
There is a rule of thumb known as the Pareto principle, also referred to as the 80-20 rule: roughly 80% of the effects come from only 20% of the possible causes. So, if we optimize the right 20% of our code, we realize roughly 80% of all the gains in speed.
How can we know which 20% of our code to optimize? We need a profiler.
CodeAnalyst is a free performance analyzer from Advanced Micro Devices for programs on AMD hardware. It also does basic timer-based profiling on Intel processors.
DTrace is a dynamic tracing tool for Solaris, FreeBSD, Mac OS X and other operating systems.
Insure++ is Parasoft's runtime memory analysis and error detection tool. Its Inuse component provides a graphical view of memory allocations over time, with specific visibility into overall heap usage, block allocations, possible outstanding leaks, etc.
Parallel Studio from Intel contains Parallel Amplifier, which tunes both serial and parallel programs. It also includes Parallel Inspector, which detects races, deadlocks and memory errors. Parallel Composer includes codecov, a command line coverage tool.
Visual Studio Team System Profiler is Microsoft's commercial profiler offering.
Developer Edition by Software Diagnostics is a commercial integrated recorder, profiler and debugger for dynamic analysis, integrating dynamic tracing functionalities enabling reverse debugging and full comprehension of system behavior as well as Performance Analysis functionalities over the full software life cycle.
VTune from Intel is a profiler for optimizing performance across Intel architectures.