Tool Time: Using PAPI Counters to compute IPC (Instructions Per Cycle) for an MPI Fortran application

Tuesday, October 19, 2021

PAPI (Performance Application Programming Interface) is an open-source API developed by the University of Tennessee to access hardware performance counters on most modern microprocessors [1].

PAPI is written in C, and it is used by many profiling tools, such as TAU, Score-P, and EXTRAE. In some cases, it is more convenient to call PAPI directly from the application, to reduce the overhead and collect only the quantities of interest. In this article, we describe how to call this API from an MPI Fortran application. The PAPI include directory, where the Fortran header file f90papi.h is located, must be accessible to the compiler. Taking Marenostrum4 as an example, the first step is to load a PAPI module:

    module load PAPI/5.7.0

Then, during the compilation of the Fortran application, we need to set the PAPI paths

    PAPIPATH = /apps/PAPI/5.7.0
    INCPATH = $(PAPIPATH)/include
    LIBS += -L$(PAPIPATH)/lib -lpapi

Finally, include the PAPI header file in your application

    include "f90papi.h"

In the Fortran application, we first initialize the PAPI library, create an empty event set, and add the PAPI events we want to count to it

    check = PAPI_VER_CURRENT
    call PAPIF_LIBRARY_INIT(check)
    EventSet = PAPI_NULL
    call PAPIF_CREATE_EVENTSET(EventSet, check)
    ! add PAPI events using the named parameters defined in f90papi.h
    call PAPIF_ADD_EVENT(EventSet, PAPI_TOT_CYC, check)
    call PAPIF_ADD_EVENT(EventSet, PAPI_TOT_INS, check)

In this example, we are interested in two PAPI events, namely the total number of cycles and the total number of instructions (PAPI_TOT_CYC, PAPI_TOT_INS). All PAPI events are referenced by name and collected in an event set. The number of events currently in the set can be queried with PAPIF_NUM_EVENTS.

    call PAPIF_NUM_EVENTS(EventSet, EventCount, check)
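Each of the calls above reports its status through the last argument, so it can be worth checking it before moving on. The following is a minimal sketch of the same setup with status checks, not a definitive implementation. The program name and the helper `stop_on_error` are hypothetical names introduced for illustration. It assumes that `PAPIF_LIBRARY_INIT` leaves `PAPI_VER_CURRENT` in `check` on success and that the other calls return `PAPI_OK`.

```fortran
program papi_setup_checked
  implicit none
  include "f90papi.h"

  integer :: check, EventSet, EventCount

  ! On input, check holds the header version; on success it is returned unchanged.
  check = PAPI_VER_CURRENT
  call PAPIF_LIBRARY_INIT(check)
  if (check /= PAPI_VER_CURRENT) then
    print *, "PAPIF_LIBRARY_INIT failed, returned ", check
    stop 1
  end if

  ! Create an empty event set.
  EventSet = PAPI_NULL
  call PAPIF_CREATE_EVENTSET(EventSet, check)
  call stop_on_error(check, "PAPIF_CREATE_EVENTSET")

  ! Add the two events; the order fixes their position in the values array.
  call PAPIF_ADD_EVENT(EventSet, PAPI_TOT_CYC, check)
  call stop_on_error(check, "PAPIF_ADD_EVENT(PAPI_TOT_CYC)")
  call PAPIF_ADD_EVENT(EventSet, PAPI_TOT_INS, check)
  call stop_on_error(check, "PAPIF_ADD_EVENT(PAPI_TOT_INS)")

  ! Query how many events the set now holds (2 if both were added).
  call PAPIF_NUM_EVENTS(EventSet, EventCount, check)
  call stop_on_error(check, "PAPIF_NUM_EVENTS")
  print *, "events in set: ", EventCount

contains

  ! Hypothetical helper: abort with a message if a PAPI call did not return PAPI_OK.
  subroutine stop_on_error(code, label)
    integer, intent(in)          :: code
    character(len=*), intent(in) :: label
    if (code /= PAPI_OK) then
      print *, label, " failed with PAPI error code ", code
      stop 1
    end if
  end subroutine stop_on_error

end program papi_setup_checked
```

A failing `PAPIF_ADD_EVENT` usually means the event is not supported on the current processor, which is easier to diagnose at this point than after the measurement.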

We use PAPIF_START and PAPIF_STOP calls to identify the section of the application from which we want to collect data.

    call PAPIF_START(EventSet, check)
    !! perform MPI computation
    call PAPIF_STOP(EventSet, EventValues, check)

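For this application, we use the stored event values to compute the IPC. The values are returned in the order in which the events were added to the event set, so EventValues(1) holds the cycle count and EventValues(2) the instruction count.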

    IPC = real(EventValues(2))/real(EventValues(1))

The result can be seen below.

    PAPI_TOT_CYC  : 11240960793
    PAPI_TOT_INS  : 21290930932
    IPC [INS/CYC] : 1.89
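The counter values are 64-bit integers of the order of 10^10, which exceeds the roughly seven significant digits of a default `real`. The sketch below redoes the division in double precision and guards against a zero cycle count. It is a standalone illustration that hard-codes the two counter values printed above rather than reading them from PAPI, and the program name is hypothetical.

```fortran
program ipc_from_counters
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  integer(int64) :: EventValues(2)
  real(real64)   :: IPC

  ! Counter values copied from the sample output:
  ! index 1 = PAPI_TOT_CYC, index 2 = PAPI_TOT_INS (order of insertion).
  EventValues(1) = 11240960793_int64
  EventValues(2) = 21290930932_int64

  if (EventValues(1) > 0) then
    ! Convert each counter to double precision before dividing.
    IPC = real(EventValues(2), real64) / real(EventValues(1), real64)
  else
    IPC = 0.0_real64   ! no cycles were counted
  end if

  print '(a, f5.2)', "IPC [INS/CYC] : ", IPC
end program ipc_from_counters
```

For these two values the ratio is about 1.894, which rounds to the 1.89 reported above.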

Note that all PAPI calls must be made before MPI_Finalize(). Moreover, the values listed above are for a single MPI process; every process collects its own counters.
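Since each process measures only itself, one way to summarize a run is to reduce the per-process IPC across ranks. The following sketch shows one possible arrangement under stated assumptions: the loop is a placeholder for the real computation, the program and variable names (`ipc_across_ranks`, `work`, `ipc_min`, and so on) are hypothetical, and the PAPI status checks are left out for brevity. The PAPI cleanup calls are placed before `MPI_Finalize`, in line with the note above.

```fortran
program ipc_across_ranks
  use mpi
  implicit none
  include "f90papi.h"

  integer, parameter :: dp = kind(1.0d0)
  integer            :: ierr, rank, nranks, check, EventSet, i
  integer(kind=8)    :: EventValues(2)
  real(dp)           :: ipc, ipc_min, ipc_max, ipc_sum, work

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! PAPI setup as in the article (status checks omitted for brevity).
  check = PAPI_VER_CURRENT
  call PAPIF_LIBRARY_INIT(check)
  EventSet = PAPI_NULL
  call PAPIF_CREATE_EVENTSET(EventSet, check)
  call PAPIF_ADD_EVENT(EventSet, PAPI_TOT_CYC, check)
  call PAPIF_ADD_EVENT(EventSet, PAPI_TOT_INS, check)

  call PAPIF_START(EventSet, check)
  ! Placeholder workload standing in for the real MPI computation.
  work = 0.0_dp
  do i = 1, 10000000
    work = work + sqrt(real(i, dp))
  end do
  call PAPIF_STOP(EventSet, EventValues, check)

  ! Per-process IPC: instructions (index 2) over cycles (index 1).
  ipc = 0.0_dp
  if (EventValues(1) > 0) then
    ipc = real(EventValues(2), dp) / real(EventValues(1), dp)
  end if

  ! Combine the per-process values on rank 0.
  call MPI_Reduce(ipc, ipc_min, 1, MPI_DOUBLE_PRECISION, MPI_MIN, 0, &
                  MPI_COMM_WORLD, ierr)
  call MPI_Reduce(ipc, ipc_max, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, &
                  MPI_COMM_WORLD, ierr)
  call MPI_Reduce(ipc, ipc_sum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)

  if (rank == 0) then
    print '(a, 3f8.3)', "IPC min/max/mean: ", ipc_min, ipc_max, &
                        ipc_sum / real(nranks, dp)
    ! Printing the result keeps the compiler from removing the loop.
    print '(a, es12.4)', "workload checksum: ", work
  end if

  ! Release PAPI resources before MPI is finalized.
  call PAPIF_CLEANUP_EVENTSET(EventSet, check)
  call PAPIF_DESTROY_EVENTSET(EventSet, check)
  call PAPIF_SHUTDOWN()

  call MPI_Finalize(ierr)
end program ipc_across_ranks
```

A large gap between the minimum and maximum IPC can hint at load imbalance or at ranks spending cycles waiting in communication, so reporting the spread is often more informative than the mean alone.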

For the complete list of Standardized Event Definitions, check the papiStdEventDefs.h header file. To list the PAPI counters that are actually available on the current machine, run papi_avail in a terminal.
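Availability can also be checked from inside the application. The sketch below assumes the Fortran wrapper `PAPIF_QUERY_EVENT(EventCode, check)`, which is expected to return `PAPI_OK` when the event can be counted on the current hardware. The program name and the helper `report` are hypothetical names used only for illustration.

```fortran
program papi_check_events
  implicit none
  include "f90papi.h"

  integer :: check

  ! The library must be initialized before events can be queried.
  check = PAPI_VER_CURRENT
  call PAPIF_LIBRARY_INIT(check)
  if (check /= PAPI_VER_CURRENT) then
    print *, "PAPIF_LIBRARY_INIT failed, returned ", check
    stop 1
  end if

  call report(PAPI_TOT_CYC, "PAPI_TOT_CYC")
  call report(PAPI_TOT_INS, "PAPI_TOT_INS")

contains

  ! Hypothetical helper: print whether one preset event is available.
  subroutine report(code, name)
    integer, intent(in)          :: code
    character(len=*), intent(in) :: name
    integer :: rc
    call PAPIF_QUERY_EVENT(code, rc)
    if (rc == PAPI_OK) then
      print *, name, " is available"
    else
      print *, name, " is not available, PAPI code ", rc
    end if
  end subroutine report

end program papi_check_events
```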

-- Federico Panichi (NAG)

[1] P. J. Mucci, et al. “PAPI: A Portable Interface to Hardware Performance Counters”, Proceedings of the Department of Defense HPCMP Users Group Conference, 7 – 10, (1999)