Tool Time: Using the Python Extrae API to Profile a Region of a Code

Thursday, May 28, 2020

The Extrae profiling tool, developed by BSC, can quickly produce very large trace files, which can take several minutes to load into Paraver, the tool used to view the traces. These trace files can be kept to a more manageable size by using Extrae’s API to turn tracing on and off as needed. For example, the user might only want to record data for two or three time steps. This API was previously only available for codes written in C, C++ and Fortran, but it now also supports Python codes that use MPI. This article describes how to use the Python API to trace only a selected region of a code.

The directory $EXTRAE_HOME/libexec should contain Python modules which can be used in user codes. If these modules are not there, then this might mean that they were not built because the system does not have Python installed. Assuming they are there, set the Python module path appropriately:

export PYTHONPATH=$EXTRAE_HOME/libexec:$PYTHONPATH

Then import the Extrae module into your user application:

import pyextrae.mpi as pyextrae
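
If the import fails, the most likely cause is that the module path is not set or that the Python modules were not built. As a minimal sketch, the check below uses only the Python standard library to see whether a top-level module can be found; the helper name module_available is hypothetical and is not part of Extrae.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found on the module path."""
    return importlib.util.find_spec(name) is not None


# Hypothetical usage: report whether the Extrae Python package is visible.
if module_available("pyextrae"):
    print("pyextrae found; the tracing API can be imported")
else:
    print("pyextrae not found; check PYTHONPATH and the Extrae build")
```

Running this before launching a long job gives an early hint that PYTHONPATH needs attention, rather than an ImportError at start-up.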

If you are not interested in profiling the initialisation section, then switch off profiling:

pyextrae.shutdown()

To profile a particular time step, for example, switch tracing back on just before that step and off again straight after it:

for i in range(nsteps + 1):
    if i == time_step_save:
        pyextrae.restart()

    next(irun) # time step calculation

    if i == time_step_save:
        pyextrae.shutdown()
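
The same on/off pattern can be wrapped in a context manager so that tracing is always switched off again, even if the time step raises an exception. The following is only a sketch under stated assumptions: traced_region, RecordingTracer and run_steps are hypothetical names introduced for illustration, and RecordingTracer is a stand-in that lets the example run without Extrae installed. In a real code, the imported pyextrae module would be passed in place of the stand-in, since the only calls used are the restart() and shutdown() functions shown above.

```python
from contextlib import contextmanager


@contextmanager
def traced_region(tracer, enabled=True):
    """Switch tracing on for the body of a with-block, then off again."""
    if enabled:
        tracer.restart()
    try:
        yield
    finally:
        # Runs even if the body raises, so tracing is never left on by accident.
        if enabled:
            tracer.shutdown()


class RecordingTracer:
    """Hypothetical stand-in for pyextrae that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def restart(self):
        self.calls.append("restart")

    def shutdown(self):
        self.calls.append("shutdown")


def run_steps(tracer, nsteps, time_step_save):
    """Run nsteps + 1 iterations, tracing only iteration time_step_save."""
    steps_done = 0
    for i in range(nsteps + 1):
        with traced_region(tracer, enabled=(i == time_step_save)):
            steps_done += 1  # placeholder for the time step calculation
    return steps_done


tracer = RecordingTracer()
steps = run_steps(tracer, nsteps=4, time_step_save=2)
```

To trace two or three steps instead of one, the enabled condition could test membership in a set of step numbers.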

Note that restart() and shutdown() must be called between MPI_INIT() and MPI_FINALIZE(). The resulting trace file, viewed in Paraver, will look similar to the one below.

The green colour indicates that tracing is disabled. The computation that follows is a single time step, which keeps the associated trace files small and ensures that Paraver remains responsive.

Note that any code executed before MPI_INIT() will be shown as computation because Extrae uses MPI_INIT() to initialise its data structures. In addition, even if the Extrae shutdown function is the first function to be called, Python import statements may also appear in the timeline. Such events should not be confused with the user code.