The importance of accurate and comprehensive software performance analysis

Friday, June 2, 2017

What could you achieve if your parallel software ran faster?

Improving performance of your code could mean running at the resolution you need, coupling that extra package to make your simulations more realistic, or significantly reducing your time to solution to get what you want now rather than later.

But this is not an easy task. Many interconnected factors affect the performance of your application, and unpicking them takes a serious investment of time and effort that is often not available, which leads to performance optimisation with a narrow focus. Deal naïvely with only the obvious issues, and sooner or later the next problem will pop up, just like playing whack-a-mole. If you don’t step back to look at the issues holistically, you may achieve little more than running a little faster or on a few more cores.

Fighting the symptoms rather than the underlying causes only puts off the inevitable. The most reliable way to make significant improvements to your code is to understand the reasons behind the behaviour of your application, and then focus on the specific underlying issues that throttle its performance. Hence at POP we use a clear methodology within our free code audits to give you a detailed, reproducible and comparable measure of the performance of your application, so that you can focus on the real problems.

At the top level, the global efficiency metric tells you how well your application scales, giving an overall picture of the potential for improvement. Global efficiency is the product of computation efficiency and parallelisation efficiency, which tell you, respectively, whether your application is spending longer on the computation itself as you run on more cores, or spending more time on the parallelisation.
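As a minimal sketch of the top-level relationship described above, the product can be written in a few lines of Python. The function name and the example values are purely illustrative assumptions, not part of any POP tool.

```python
def global_efficiency(computation_efficiency: float,
                      parallelisation_efficiency: float) -> float:
    """Combine the two top-level efficiencies into one figure.

    Both inputs are fractions (1.0 means no loss relative to the
    reference run). The name and signature are hypothetical.
    """
    if computation_efficiency <= 0 or parallelisation_efficiency <= 0:
        raise ValueError("efficiencies must be positive")
    # Global efficiency is the product of the two components.
    return computation_efficiency * parallelisation_efficiency


# Illustrative (made-up) values: the computation has lost 10% and
# the parallelisation has lost 50%, so the latter dominates.
print(f"{global_efficiency(0.9, 0.5):.0%}")
```

Because the metric is a product, a single poor component drags the whole figure down, which is why it helps to see which of the two factors is the smaller one before deciding where to spend effort.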

Both of these efficiencies are broken down further. For example, low computation efficiency could be due to requiring more instructions as you run on more cores, or to slower throughput caused by reduced IPC (instructions per cycle) or reduced CPU frequency. Parallelisation efficiency is likewise broken down into components such as load balance, serialisation efficiency and communication efficiency. Each of these can in turn be analysed further.
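The computation side of this breakdown can be illustrated with a short Python sketch. It relies only on the arithmetic identity time = instructions / (IPC × frequency), so the ratio of computation times between a reference run and a run on more cores factors into an instruction term, an IPC term and a frequency term. The class, the function, the key names and the counter values are hypothetical, and treating each factor as a ratio against a reference run is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RunCounters:
    """Totals over all processes, counting computation only (hypothetical)."""
    instructions: float  # instructions executed
    cycles: float        # CPU cycles consumed
    seconds: float       # time spent computing

    @property
    def ipc(self) -> float:
        return self.instructions / self.cycles

    @property
    def frequency(self) -> float:
        return self.cycles / self.seconds


def computation_breakdown(reference: RunCounters, scaled: RunCounters) -> dict:
    """Split the change in computation time into three ratios."""
    # Below 1.0 when the scaled run needs more instructions for the same work.
    instruction_scaling = reference.instructions / scaled.instructions
    # Below 1.0 when fewer instructions complete per cycle.
    ipc_scaling = scaled.ipc / reference.ipc
    # Below 1.0 when the cores run at a lower clock frequency.
    frequency_scaling = scaled.frequency / reference.frequency
    return {
        "instruction_scaling": instruction_scaling,
        "ipc_scaling": ipc_scaling,
        "frequency_scaling": frequency_scaling,
        # The product equals reference.seconds / scaled.seconds.
        "computation_efficiency": (
            instruction_scaling * ipc_scaling * frequency_scaling
        ),
    }


# Made-up counters for a reference run and a run on more cores.
reference = RunCounters(instructions=1.0e12, cycles=5.0e11, seconds=200.0)
scaled = RunCounters(instructions=1.1e12, cycles=6.0e11, seconds=250.0)
for name, value in computation_breakdown(reference, scaled).items():
    print(f"{name}: {value:.3f}")
```

Looking at the three factors separately shows which one accounts for the loss: extra instructions, lower IPC, or a lower clock frequency.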

Clearly such analysis is a significant undertaking, and can be hard to justify without knowing the resulting benefits. That is where the European Commission-funded POP project can help. As a free service, our experts use profiling tools with this methodology to locate and characterise the potential for improvement in your application. Our work will give you the knowledge necessary to decide the best course of action to get the performance you need, and allow you to determine how much effort is needed with a clear view of the potential reward.