14th POP Webinar - Energy Efficient Computing using Dynamic Tuning

Thursday, April 2, 2020

We now live in a world of power-constrained architectures and systems, and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, researchers, supercomputing centers and major vendors have in recent years developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Because energy consumption, power consumption and the execution time of an application are linked, these tools and methodologies must consider all three aspects, giving the end user and the system administrator the ability to find the best configuration for different high-level objectives.

The presentation slides are also available here.

This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
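The grouping of runtime situations into scenarios can be illustrated with a minimal sketch. It assumes the design-time analysis has already produced one best configuration per RTS; the region names, frequency values and the dictionary layout are hypothetical and are not the actual READEX tuning model format.

```python
from collections import defaultdict

# Hypothetical design-time result: best configuration found for each runtime
# situation (RTS). Names and frequency values are illustrative only.
best_config_per_rts = {
    "assemble_matrix": {"core_ghz": 2.4, "uncore_ghz": 1.6},
    "solve_system": {"core_ghz": 1.6, "uncore_ghz": 2.8},
    "write_output": {"core_ghz": 1.6, "uncore_ghz": 2.8},
}


def build_tuning_model(best_config_per_rts):
    """Group RTSs that share the same configuration into scenarios."""
    scenarios = defaultdict(list)
    for rts, config in best_config_per_rts.items():
        key = tuple(sorted(config.items()))  # hashable form of the config
        scenarios[key].append(rts)
    return [
        {"scenario_id": i, "config": dict(key), "rts": sorted(members)}
        for i, (key, members) in enumerate(scenarios.items())
    ]


tuning_model = build_tuning_model(best_config_per_rts)
for scenario in tuning_model:
    print(scenario)
```

Here the two RTSs that prefer the same frequencies end up in one scenario, so the runtime only needs to distinguish two configurations instead of three.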

The webinar presented the MERIC tool, which implements the READEX methodology. To simplify the analysis, MERIC supports manual or binary instrumentation of the analysed application. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime. Manual instrumentation, which can be combined with automatic instrumentation, allows the code developer to annotate regions of particular interest.

MERIC first performs a parameter space search over the hardware parameters for all significant regions. The tuned parameters are the CPU core frequency (via DVFS, dynamic voltage and frequency scaling), the CPU uncore frequency (the non-core parts of the CPU, e.g. cache and memory controller) and the number of active CPU cores. For every tested combination, MERIC records the runtime and energy consumption for further analysis.
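The exhaustive search over parameter combinations can be sketched as follows. This is not MERIC's interface: the search space, the region descriptions and the `run_region` function are hypothetical, and `run_region` is a toy analytical model standing in for a real measurement.

```python
import itertools

# Hypothetical search space; real values depend on the CPU.
CORE_GHZ = [1.2, 1.8, 2.4]
UNCORE_GHZ = [1.2, 2.0, 2.8]
THREADS = [12, 24]

# Hypothetical regions, described as (compute_work, memory_work) in toy units.
REGIONS = {"assemble": (400.0, 5.0), "solve": (50.0, 40.0)}


def run_region(work, core_ghz, uncore_ghz, threads):
    """Stand-in for a real measurement: returns (runtime_s, energy_j).

    Toy model, NOT real data: compute time scales with core frequency and
    thread count, memory time with uncore frequency, and power grows with
    the square of each frequency.
    """
    compute_work, memory_work = work
    runtime = compute_work / (core_ghz * threads) + memory_work / uncore_ghz
    power = 40.0 + 1.5 * threads * core_ghz**2 + 10.0 * uncore_ghz**2
    return runtime, runtime * power


def search(regions):
    """Try every parameter combination for every region and record results."""
    records = []
    for name, work in regions.items():
        for core, uncore, threads in itertools.product(
            CORE_GHZ, UNCORE_GHZ, THREADS
        ):
            runtime, energy = run_region(work, core, uncore, threads)
            records.append(
                {
                    "region": name,
                    "core_ghz": core,
                    "uncore_ghz": uncore,
                    "threads": threads,
                    "runtime_s": runtime,
                    "energy_j": energy,
                }
            )
    return records


records = search(REGIONS)
print(len(records), "measurements recorded")
```

The number of measurements grows as the product of the parameter ranges times the number of regions, which is why restricting the search to significant regions matters.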

In the second step we use the RADAR tool, which analyses the data for all significant regions and identifies the optimal configurations. These are then used to create the tuning model for production runs of the HPC application. If the developer is interested in inspecting the behaviour of an HPC application, RADAR also presents the recorded data in its graphical user interface (GUI). The GUI shows how the runtime and energy consumption of each region change as the hardware parameters are tuned.
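The selection of an optimal configuration per region can be sketched as a simple minimum search. The records below are made up for illustration, and the sketch assumes that "optimal" means lowest energy; the actual tool may apply other objectives or constraints.

```python
# Made-up measurement records: one entry per (region, configuration).
records = [
    {"region": "solve", "core_ghz": 2.4, "uncore_ghz": 2.8,
     "runtime_s": 10.0, "energy_j": 2500.0},
    {"region": "solve", "core_ghz": 1.8, "uncore_ghz": 2.8,
     "runtime_s": 10.4, "energy_j": 2100.0},
    {"region": "solve", "core_ghz": 1.2, "uncore_ghz": 2.8,
     "runtime_s": 13.0, "energy_j": 2300.0},
    {"region": "assemble", "core_ghz": 2.4, "uncore_ghz": 2.8,
     "runtime_s": 5.0, "energy_j": 1300.0},
    {"region": "assemble", "core_ghz": 2.4, "uncore_ghz": 1.2,
     "runtime_s": 5.1, "energy_j": 1100.0},
]


def optimal_configs(records):
    """Return the lowest-energy configuration for each region."""
    best = {}
    for rec in records:
        current = best.get(rec["region"])
        if current is None or rec["energy_j"] < current["energy_j"]:
            best[rec["region"]] = rec
    return {
        region: {"core_ghz": r["core_ghz"], "uncore_ghz": r["uncore_ghz"]}
        for region, r in best.items()
    }


model = optimal_configs(records)
for region, config in model.items():
    print(region, config)
```

In this made-up data the memory-bound `solve` region tolerates a lower core frequency, while the compute-bound `assemble` region tolerates a lower uncore frequency, which is the kind of per-region difference dynamic tuning exploits.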

Finally, once the tuning model is available, the analysed application is ready for production runs. In this scenario MERIC reads the tuning model and performs the dynamic tuning, applying the configuration from the tuning model as the HPC application progresses from one significant region to another.
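Switching configurations at region boundaries can be sketched with a context manager. The tuning model, the default configuration and `apply_config` are hypothetical: `apply_config` only logs the requested settings, whereas a real tool would change the hardware parameters at that point. The sketch also does not handle nested regions.

```python
from contextlib import contextmanager

# Hypothetical tuning model: one configuration per significant region.
tuning_model = {
    "assemble": {"core_ghz": 2.4, "uncore_ghz": 1.2},
    "solve": {"core_ghz": 1.8, "uncore_ghz": 2.8},
}
DEFAULT_CONFIG = {"core_ghz": 2.4, "uncore_ghz": 2.8}

applied = []  # log of configurations, in the order they were applied


def apply_config(config):
    """Placeholder: a real tool would set the hardware parameters here."""
    applied.append(dict(config))


@contextmanager
def region(name):
    """Apply the region's configuration on entry, the default on exit."""
    apply_config(tuning_model.get(name, DEFAULT_CONFIG))
    try:
        yield
    finally:
        apply_config(DEFAULT_CONFIG)


# The application moves from one significant region to the next.
with region("assemble"):
    pass  # matrix assembly would run here
with region("solve"):
    pass  # the solver would run here

for config in applied:
    print(config)
```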

This methodology has been thoroughly evaluated within the READEX project using both benchmark applications and HPC applications, and the results are shown in Table 1. All applications were built with the Intel compiler. One can see that energy savings of approximately 20% are achieved across the various applications, and we believe that similar savings can be achieved by new applications without any code modification.

| Application | HW parameters | Static tuning savings: node energy (%) / time (%) | Dynamic tuning savings: node energy (%) / time (%) |
|---|---|---|---|
| AMG2013 | CF, UCF, threads | 12.5 / -0.9 | 7.8 / -14 |
| Blasbench | CF, UCF, threads | 7.4 / -0.9 | 15.3 / -18.1 |
| Kripke | CF, UCF | 11.5 / -28.3 | 18.8 / -18.7 |
| Lulesh | CF, UCF, threads | 17.6 / -8.9 | 18.7 / -11.7 |
| NPB3.3-BT-MZ | CF, UCF, threads | 11 / -11.3 | 10.8 / -12 |
| BEM4I | CF, UCF, threads | 15.7 / -6.2 | 34.1 / 10.9 |
| INDEED | CF, UCF, threads | 17.6 / -12.8 | 19.5 / -14.2 |
| ESPRESO | CF, UCF, threads | 4.3 / -8.9 | 8.2 / -10.1 |
| OpenFOAM | CF, UCF | 15.9 / -10.5 | 20.1 / 11.5 |

Table 1. Evaluation of the READEX Tool Suite on the TUD Taurus Haswell system with HDEEM energy measurements. CF is the CPU core frequency and UCF the uncore frequency. Negative time values indicate a longer runtime that is accepted in exchange for the energy saving.
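The link between the energy and time columns can be made concrete with a short calculation. It assumes that each saving is defined as (baseline - tuned) / baseline, so that a time value of -28.3% means the tuned run takes 1.283 times as long as the baseline; the table does not state this definition explicitly. With energy equal to average power times runtime, the two columns then give the change in average node power.

```python
def saving_percent(baseline, tuned):
    """Saving relative to the baseline, in percent (negative means worse)."""
    return 100.0 * (baseline - tuned) / baseline


def relative_power(energy_saving_pct, time_saving_pct):
    """Ratio of tuned to baseline average power, using E = P * t."""
    energy_ratio = 1.0 - energy_saving_pct / 100.0
    time_ratio = 1.0 - time_saving_pct / 100.0
    return energy_ratio / time_ratio


# Kripke, static tuning, from Table 1: 11.5 % energy saving, -28.3 % time.
ratio = relative_power(11.5, -28.3)
print(f"average power is {ratio:.2f} x the baseline")
```

Under this assumption, the Kripke static-tuning run draws roughly 0.69 times the baseline average power: the node runs longer but at a much lower power level, which is where the net energy saving comes from.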

About the Presenter

Lubomir Riha, Ph.D. is the Head of the Infrastructure Research Lab at IT4Innovations National Supercomputing Center. Previously he was a senior researcher in the Parallel Algorithms Research Lab at IT4Innovations and a research scientist in the High Performance Computing Lab at George Washington University, ECE Department. Currently he is a local principal investigator of the H2020 Center of Excellence, POP.