The Institute of Physical Metallurgy and Metal Physics (IMM) of RWTH Aachen University develops GraGLeS2D, a code for simulating microstructure evolution in polycrystalline materials. The OpenMP-parallel code is designed to run on large SMP machines in the RWTH compute cluster with 16 sockets and up to 2 TB of memory. A performance audit by POP experts detected several performance issues in the code, and a performance plan was set up describing how these issues could be resolved.
To verify the proposed optimization steps, POP experts and the code developer at IMM implemented them in close collaboration, as the first proof-of-concept study done in POP. The optimizations include:
- The use of a memory allocation library optimized for multi-threading.
- Reordering the work distribution to threads to optimize for data locality between neighboring cells (see figure below).
- Algorithmic optimizations in the convolution algorithm.
- Code restructuring to enable vectorization in parts of the computation.
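The effect of the work-distribution item in the list above can be illustrated with a small, self-contained sketch. It is not taken from GraGLeS2D: the regular 2D grid, the function names, and the two distribution schemes are assumptions chosen only for illustration. The code counts how many pairs of neighboring cells end up on different threads when cells are assigned round-robin versus in contiguous blocks. Fewer cross-thread pairs generally mean fewer accesses to data owned by another thread, and hence by another socket on a large SMP machine.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical distribution schemes, not the actual GraGLeS2D scheduling.
enum class Distribution { RoundRobin, Block };

// Round-robin: consecutive cells go to consecutive threads.
std::size_t owner_round_robin(std::size_t cell, std::size_t n_threads) {
    assert(n_threads > 0);
    return cell % n_threads;
}

// Block: each thread gets one contiguous range of cells (row-major order).
std::size_t owner_block(std::size_t cell, std::size_t n_cells,
                        std::size_t n_threads) {
    assert(n_cells > 0 && n_threads > 0 && cell < n_cells);
    return cell * n_threads / n_cells;
}

// Count 4-connected neighbor pairs on an nx-by-ny grid whose two cells
// are owned by different threads.
std::size_t cross_thread_neighbor_pairs(std::size_t nx, std::size_t ny,
                                        std::size_t n_threads,
                                        Distribution d) {
    const std::size_t n_cells = nx * ny;
    auto owner = [&](std::size_t cell) {
        return d == Distribution::Block
                   ? owner_block(cell, n_cells, n_threads)
                   : owner_round_robin(cell, n_threads);
    };

    std::size_t crossings = 0;
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            const std::size_t cell = y * nx + x;
            // Right-hand neighbor in the same row.
            if (x + 1 < nx && owner(cell) != owner(cell + 1)) ++crossings;
            // Neighbor directly below in the next row.
            if (y + 1 < ny && owner(cell) != owner(cell + nx)) ++crossings;
        }
    }
    return crossings;
}
```

Calling `cross_thread_neighbor_pairs(8, 8, 4, Distribution::Block)` and the same call with `Distribution::RoundRobin` shows the block scheme producing fewer cross-thread neighbor pairs on this toy grid. In a real NUMA setting, the thread that works on a block should also be the one that first initializes its memory, so that the data is placed on the local socket.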
After these optimization steps were implemented, a significant performance improvement was achieved. For the hotspot of the application, the convolution region, the speedup going from 1 to 16 sockets is now about 15, compared with 6 before the optimization. Overall, the runtime of this region was improved by a factor of more than 10. The proof-of-concept thus verified that the planned optimizations indeed resulted in significantly better code performance.
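The scaling figures can also be read as parallel efficiency, that is, speedup divided by the number of sockets. The short sketch below (the function name is an assumption for illustration) applies this standard definition to the approximate speedups quoted above: about 15 on 16 sockets corresponds to roughly 94% efficiency, while 6 on 16 sockets corresponds to roughly 38%.

```cpp
#include <cassert>

// Parallel efficiency = speedup / number of processing units.
// Hypothetical helper for illustration; inputs are the approximate
// speedups quoted in the text, not new measurements.
double parallel_efficiency(double speedup, int sockets) {
    assert(sockets > 0);
    return speedup / static_cast<double>(sockets);
}
```

For example, `parallel_efficiency(15.0, 16)` evaluates to 0.9375 and `parallel_efficiency(6.0, 16)` to 0.375.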