10-fold scalability improvement from POP services

Wednesday, May 24, 2017

EPW (Electron-Phonon using Wannier interpolation) is a materials science DFT code distributed in the Quantum ESPRESSO suite.  It is Fortran code parallelised with MPI.  Developers from the University of Oxford requested a POP performance audit of an unreleased version of the code that was still in development, to be tested with a GaN polar wurtzite crystal dataset on the ARCHER Cray XC30 computer at EPCC.


[Sample Simulation Output]

The initial audit, run with 48 MPI processes, identified a variety of load imbalance issues and excessive time spent in the ephwann simulation phase.  This became the focus of a subsequent POP performance plan, where the developers specialized routines to avoid unnecessary calculation and optimized vector summations.  Using a finer uniform grid reduced load imbalance, and this revised version was 60% faster and could be used for larger execution configurations with 240 MPI processes.
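For readers unfamiliar with how load imbalance is usually quantified, the sketch below computes the load balance efficiency used in POP-style analyses: the average per-process computation time divided by the maximum. The function name and the timings are hypothetical illustrations, not measurements from the EPW audit.

```python
def load_balance(times):
    """Average computation time divided by the maximum.

    A value of 1.0 means every process did the same amount of work;
    lower values mean some processes wait for the slowest one.
    """
    if not times or max(times) <= 0:
        raise ValueError("need at least one positive time")
    return sum(times) / len(times) / max(times)


# Hypothetical per-process computation times in seconds (not EPW data)
balanced = load_balance([10.0, 10.0, 10.0, 10.0])
skewed = load_balance([10.0, 10.0, 10.0, 20.0])
```

In the second case one process takes twice as long as the others, so the other three sit idle for half of the run and the efficiency drops well below 1.0.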

Unfortunately, overall performance was disappointing, because writing the final simulation results had grown to dominate execution time. The figure shows a histogram of the writing time for each MPI process on nine compute nodes. Although the amount of data is not large (around 50 MB of formatted text), writing it was a bottleneck that inhibited scaling and larger simulations.  A POP proof-of-concept investigation was pursued which replaced concurrent file writing by all processes with serial writing by rank zero only. This reduced writing time from over seven hours to under one minute, so that it is now a negligible component of EPW execution.
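The change in output pattern can be illustrated with a minimal sketch. It is written in Python rather than the Fortran used by EPW, and it is not the developers' implementation: the function names and text layout are hypothetical, and a plain list stands in for the data that an MPI gather would deliver to rank zero.

```python
import os
import tempfile


def format_block(rank, values):
    """Format one rank's results as text lines (layout is hypothetical)."""
    return "".join(f"{rank} {i} {v:.6e}\n" for i, v in enumerate(values))


def write_from_root(my_rank, gathered, path):
    """Write every rank's block in rank order, but only on rank zero.

    `gathered` stands in for the result of an MPI gather to rank zero:
    a list indexed by rank. All other ranks return without touching
    the file, so the file system sees a single sequential writer.
    """
    if my_rank != 0:
        return 0
    with open(path, "w") as out:
        for rank, values in enumerate(gathered):
            out.write(format_block(rank, values))
    return len(gathered)


# Hypothetical results for three ranks (not EPW data)
gathered = [[1.0, 2.0], [3.0], [4.0, 5.0]]
path = os.path.join(tempfile.mkdtemp(), "results.txt")
written = write_from_root(0, gathered, path)
with open(path) as f:
    lines = f.read().splitlines()
```

With only around 50 MB of output, collecting the data on one process is cheap, while a single writer avoids many processes contending for the same file.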

The final code scales well, with 85% parallel efficiency for 960 MPI processes, supporting larger simulations.  These POP reports helped support EPW's readiness to make productive use of additional, larger allocations of computational resources.