Tool Time: memP - Parallel Heap Profiling

Monday, November 16, 2020

memP is a parallel heap profiling library for MPI applications. For each task in a parallel job, memP identifies the heap allocations that cause the task to reach its memory-in-use high water mark (HWM). Currently, memP requires that all tasks call MPI_Init and MPI_Finalize, because it uses both routines to manage its internal data structures. No code changes are required to use the tool, but the application must be linked against the memP library.

There are two types of memP reports:

  • Summary Report: Generated from within MPI_Finalize, this report describes the memory HWM of each task over the run of the application. This can be used to determine which task allocates the most memory and how this compares to the memory of other tasks;
  • Task Report: Based on specific criteria, a report can be generated that provides a snapshot of the heap memory currently in use, including the amount allocated at specific call sites.

The memP tool can be downloaded from SourceForge and built from source. Building it requires the following packages:

  • libunwind and libunwind-devel;
  • binutils and binutils-devel.

The user code is built as normal, but the final link line must include the flags that link it against the memP library (-lmemP and the libraries that follow it):

mpicc -g -c mpi_code.c
mpicc -g mpi_code.o [other object files]           \
      -L<path-to-memp>/memP-1.0.3/lib -lmemP -lbfd \
      -lunwind -lm -o mpi_code.exe

The memP link flags should appear after the other object files and libraries. Then run your code as you normally would:

mpirun -n 4 ./mpi_heat2D.exe
[ … program output … ]
memP: Storing memP output in [./mpi_heat2D.exe.4.903415.1.memP]

The last line is printed by memP and names the text file in which the memory trace is stored. The output filename has the format:

<executable>.<task count>.<process id>.<index>.memP
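
Because the executable name can itself contain dots, the fields are easiest to recover by working from the right-hand side. The following is a minimal sketch in POSIX shell, assuming the naming scheme above holds exactly; the function name parse_memp_name is illustrative and not part of memP.

```shell
#!/bin/sh
# Split a memP report name of the form
#   <executable>.<task count>.<process id>.<index>.memP
# into its fields. The executable name may contain dots, so the
# fields are peeled off from the right. (Hypothetical helper, not
# shipped with memP.)
parse_memp_name() {
    name=${1##*/}        # drop any leading directory
    name=${name%.memP}   # drop the .memP suffix
    index=${name##*.}; name=${name%.*}
    pid=${name##*.};   name=${name%.*}
    tasks=${name##*.}; exe=${name%.*}
    printf 'exe=%s tasks=%s pid=%s index=%s\n' "$exe" "$tasks" "$pid" "$index"
}

parse_memp_name ./mpi_heat2D.exe.4.903415.1.memP
# prints: exe=mpi_heat2D.exe tasks=4 pid=903415 index=1
```

For the run shown earlier this gives a task count of 4, a process id of 903415 and an index of 1.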

memP is controlled through the MEMP environment variable. The available options are listed in the table below.

Option   Default   Description
-a                 Abort when threshold reached (see -h option).
-d                 Summary Report: Print detailed call site list.
-e exe             Specify the full path to the executable.
-f dir   .         Record output file in directory <dir>.
-g[#]    disabled  Enable memP debug mode with number indicating debugging level.
-h n               Task Report: HWM threshold.
-i n               Task Report: Maximum number of reports to generate.
-j n               Summary & Task Report: Only report on the specified MPI_COMM_WORLD rank.
-k n     8         Set callsite stack traceback depth to <n>.
-n                 Do not truncate full pathname of filename in callsites.
-o                 Disable profiling at initialization. Application must enable profiling with MPI_Pcontrol().
-p n     1         Set the number of HWM task entries to list in the report. If -1, list all.
-s n     256       Set hash table size to <n>.
-t       disabled  Generate stack trace data.
-x                 Generate XML report.

For example, to generate stack trace data with a detailed call site list in XML format, use:

export MEMP="-t -d -x"

The output file will contain memory statistics of each MPI process in text format.
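
Other options from the table can be combined in the same way. The sketch below is an illustration under stated assumptions rather than a documented recipe: it assumes that options and their arguments are separated by spaces, as the table suggests, and the directory name memp_out is hypothetical. It sends reports to that directory (-f), restricts reporting to MPI_COMM_WORLD rank 0 (-j), and raises the stack traceback depth from the default of 8 to 16 (-k).

```shell
#!/bin/sh
# Hypothetical output directory; created up front in case memP
# does not create it itself (an assumption, not documented behavior).
outdir=./memp_out
mkdir -p "$outdir"

# Compose the option string from entries in the option table.
MEMP="-f $outdir -j 0 -k 16"
export MEMP
echo "MEMP=$MEMP"

# Then launch the application as usual, for example:
#   mpirun -n 4 ./mpi_heat2D.exe
```

Unsetting the variable (unset MEMP) returns memP to its default settings for subsequent runs.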