Run time halved for OpenMP code

Tuesday, May 25, 2021

Having already identified the three causes of low efficiency in GE GAS Power’s jCFD_Genesis code, POP returned to fix the outstanding issue in a ‘proof of concept’ study.

Fig. 1: jCFD_Genesis flow field solution

In the first phase of work, the POP metrics-based methodology showed that the poor parallel performance had three causes: imbalanced OpenMP, excessive serial computation (outside OpenMP), and a significant reduction in IPC (instructions per cycle) as the number of threads increased. These insights allowed Nadir Ince, the code developer, to fix two of the issues himself. However, POP’s help was needed again to improve the poor IPC scaling.

To protect GE’s IP, POP was given a sanitised version of the source code that could be compiled and executed for analysis. Various tools (including MAQAO and Intel’s VTune) were used to investigate the causes of the low IPC and to identify which regions of code needed refactoring. A study of the relevant source code identified where operations could be reordered to improve memory access and where redundant computation could be removed.
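The jCFD_Genesis source is not public, so the following is only a minimal sketch of the two kinds of change described above, written in Python for readability. The function names, the flat row-major array layout and the scaling factor are all hypothetical and are not taken from the real code.

```python
import math


def scale_original(field, nrows, ncols, scale):
    """Hypothetical 'before' loop: strided access plus redundant work."""
    out = [0.0] * (nrows * ncols)
    for j in range(ncols):          # outer loop over columns
        for i in range(nrows):      # inner loop jumps ncols elements each step
            # Loop-invariant value recomputed on every iteration (redundant).
            factor = math.sqrt(scale) / ncols
            out[i * ncols + j] = factor * field[i * ncols + j]
    return out


def scale_refactored(field, nrows, ncols, scale):
    """Hypothetical 'after' loop: contiguous access, invariant hoisted."""
    factor = math.sqrt(scale) / ncols   # computed once, outside the loops
    out = [0.0] * (nrows * ncols)
    for i in range(nrows):              # outer loop over rows
        base = i * ncols
        for j in range(ncols):          # inner loop visits consecutive elements
            out[base + j] = factor * field[base + j]
    return out


if __name__ == "__main__":
    data = [float(k) for k in range(12)]
    # Both versions must produce identical results; only the order of work differs.
    print(scale_original(data, 3, 4, 2.0) == scale_refactored(data, 3, 4, 2.0))
```

In a compiled language, the second form would typically use the cache better because the inner loop touches neighbouring memory locations, and it executes fewer instructions because the square root is no longer repeated. Whether the real refactoring took this form is an assumption.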

After the code modifications, the IPC on 45 threads increased from 0.5 to 0.8 for the test case used, and the code executed only two-thirds as many useful instructions. The result was a 2.1 times speedup.
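As a rough cross-check of these figures, useful computation time can be modelled as instructions / (IPC × clock frequency). The sketch below applies that simplified model to the quoted numbers. The function name and the default assumption of an unchanged clock frequency are illustrative and are not part of the POP study.

```python
def estimated_speedup(ipc_before, ipc_after, instruction_ratio, frequency_ratio=1.0):
    """Speedup predicted by time = instructions / (IPC * frequency).

    instruction_ratio: instructions after / instructions before.
    frequency_ratio:   clock frequency after / before (assumed 1.0 here).
    """
    return (ipc_after / ipc_before) * frequency_ratio / instruction_ratio


if __name__ == "__main__":
    # Figures quoted in the article: IPC 0.5 -> 0.8, two-thirds of the instructions.
    estimate = estimated_speedup(0.5, 0.8, 2.0 / 3.0)
    print(f"Estimated speedup: {estimate:.1f}x")
```

Under these assumptions the model gives roughly 2.4 times, somewhat above the measured 2.1 times. The gap could come from rounding in the quoted values or from effects the model ignores, such as clock frequency changes or time spent outside useful computation. The article does not say which.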

-- Jonathan Boyle (NAG)