A recent POP assessment of the BayPass code gave its developer, Mathieu Gautier, valuable insight into the cause of its low IPC (instructions per cycle). This led to a new version of BayPass with significantly improved parallel scaling, for example a speedup of more than 10 times relative to the original version.
BayPass performs genome-wide scans for adaptive differentiation and association analysis with population-specific covariables, and is parallelised with OpenMP. The POP assessment identified that the poor parallel scaling was due to a reduction in IPC as the thread count increased, and recommended a fix for the problem.
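To make the metric concrete, the following minimal sketch computes IPC and its scaling relative to a reference run, in the spirit of the IPC scaling metric used in POP assessments. The function names are hypothetical, and the counter values are made up for illustration; they are not measurements from BayPass.

```python
def ipc(instructions, cycles):
    """Instructions per cycle for one run."""
    return instructions / cycles


def ipc_scaling(counters, reference_threads):
    """IPC of each run divided by the IPC of the reference run.

    `counters` maps thread count -> (instructions, cycles).
    A value well below 1.0 means each cycle does less useful work
    than in the reference run.
    """
    ref = ipc(*counters[reference_threads])
    return {t: ipc(i, c) / ref for t, (i, c) in sorted(counters.items())}


# Hypothetical totals (instructions, cycles) per thread count,
# chosen only to illustrate a falling IPC; NOT real BayPass data.
counters = {
    1: (1_000_000_000_000, 500_000_000_000),
    4: (1_000_000_000_000, 800_000_000_000),
    16: (1_000_000_000_000, 2_000_000_000_000),
}

scaling = ipc_scaling(counters, reference_threads=1)
for threads, value in scaling.items():
    print(f"{threads:>2} threads: IPC scaling = {value:.2f}")
```

With these illustrative numbers the instruction count stays constant while the cycle count grows, so IPC scaling falls as threads are added. That is the kind of pattern that points to a per-thread efficiency loss and not to extra work being done.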
The plots show the performance before (blue) and after (red) implementing the POP recommendations; the run time is significantly reduced.
-- Jonathan Boyle (NAG)