Materials modelling software has evolved from academic codes into sophisticated, mature software over the last four decades or so. The science built around this software is flourishing, and funding is available to add functionality and new science. Appropriate maintenance, however, requires time and dedicated staff if the software is to have the lifetime of many decades that it deserves. This long-term nature requires a sound legal and business foundation (business models, licensing, etc.), which software developers, from community-driven free and open-source projects to small, medium and large commercial enterprises[1][2], have to consider. Part of maintaining the code is ensuring that it runs efficiently on massively parallel HPC systems, as the science is becoming more complex and system sizes are increasing.
Major rewrites of materials software are required as the availability of massively parallel computing permits systems of tens of thousands of atoms, rather than hundreds, to be simulated. Many academic software owners could manage this with one full-time equivalent (FTE). However, once new parallel hardware architectures emerge, more FTEs are required to port and scale the code properly. Also, the more mature the software and the more HPC architecture changes it encounters, the costlier it becomes to maintain, especially when taking into account the constant striving for greater performance. To improve large codes, one must identify where effort should be focused by locating performance bottlenecks.
This is where a service provider like POP can play a vital role. POP’s aim is to complement domain specialists by helping them to write efficient code that performs well on massively parallel systems. Materials modelling software developers are well versed in writing code and often perform benchmark and scaling tests. POP’s strengths are listed below:
- Profiling parallel codes using a unique methodology. This quantifies performance efficiencies and identifies areas for improvement;
- Promoting best practice in writing parallel code;
- Training in profiling and optimising parallel codes.
The profiling goes deep into how a parallel code behaves, e.g. its computational characteristics, communication characteristics, or I/O. POP has honed its skills in detecting exactly where a performance issue lies, which has led to the refactoring of parallel codes and, subsequently, to optimisation for higher parallel scalability or shorter time to solution.
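As an illustration of what quantifying performance efficiencies can look like, the sketch below computes three metrics used in the POP methodology (load balance, communication efficiency and their product, parallel efficiency) from per-process timings. The function names and the timing values are hypothetical and chosen only for illustration; a real analysis would derive these numbers from trace data collected with profiling tools.

```python
from statistics import mean


def load_balance(useful_times):
    """Average useful computation time divided by the maximum over all processes."""
    return mean(useful_times) / max(useful_times)


def communication_efficiency(useful_times, runtime):
    """Maximum useful computation time divided by the total runtime."""
    return max(useful_times) / runtime


def parallel_efficiency(useful_times, runtime):
    """Product of load balance and communication efficiency."""
    return load_balance(useful_times) * communication_efficiency(useful_times, runtime)


# Hypothetical measurements: seconds of useful computation on each of four
# processes, and the wall-clock runtime of the whole run.
useful = [8.0, 6.0, 4.0, 6.0]
runtime = 10.0

print(f"load balance:             {load_balance(useful):.2f}")
print(f"communication efficiency: {communication_efficiency(useful, runtime):.2f}")
print(f"parallel efficiency:      {parallel_efficiency(useful, runtime):.2f}")
```

A load balance well below 1 indicates that some processes wait while others are still computing, whereas a low communication efficiency points to time lost in data transfer or synchronisation.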
POP has experience with both discrete and continuum modelling software. For the ADF code, an example of discrete modelling, POP pinpointed a load imbalance caused by an unequal distribution of work and recommended how to improve the load balancing algorithm, which led to a performance improvement of a factor of two. POP also assessed a continuum modelling code, helping to improve the performance of the Urban Heat Island solver by Rheologic. In this case, even though the test case already showed super-linear speedup, the POP methodology identified room for improving the load balance of the application to boost performance even further. POP thus demonstrated that it can add value to established materials modelling software, which could help both academic and industrial end users to exploit that software with even greater impact.
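To make the effect of an unequal distribution of work concrete, the following sketch compares a naive round-robin assignment of tasks with a greedy "largest task first" assignment. This is a generic textbook heuristic, not the algorithm used in ADF and not the specific recommendation POP made; the function names and task costs are hypothetical.

```python
import heapq


def round_robin(task_costs, n_workers):
    """Assign tasks to workers in turn, ignoring how expensive each task is."""
    loads = [0.0] * n_workers
    for i, cost in enumerate(task_costs):
        loads[i % n_workers] += cost
    return loads


def greedy_largest_first(task_costs, n_workers):
    """Give each task, largest first, to the currently least-loaded worker."""
    heap = [(0.0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    for cost in sorted(task_costs, reverse=True):
        load, worker = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, worker))
    # Return the loads ordered by worker index.
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]


# Hypothetical task costs in seconds.
tasks = [8.0, 1.0, 7.0, 2.0, 6.0, 3.0, 5.0, 4.0]

naive = round_robin(tasks, 2)
balanced = greedy_largest_first(tasks, 2)

# The slowest worker determines the time to solution.
print("round robin loads:", naive, "-> time", max(naive))
print("greedy loads:     ", balanced, "-> time", max(balanced))
```

Both assignments perform the same total work, but the time to solution is set by the most heavily loaded worker, so a more even distribution shortens the run without changing the computation itself.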
-- Alexandra Simperler (Simperler Consulting)
[1] Gerhard Goldbeck, & Alexandra Simperler. (2019). Business Models and Sustainability for Materials Modelling Software. Zenodo. doi:10.5281/zenodo.2541723.
[2] Gerhard Goldbeck, Alexandra Simperler, Natalia Konchakova, & Daniel Höche. (2019, August 28). A Guide to Find The Right Business Model For Materials Modelling Software (Version 1). Zenodo. doi:10.5281/zenodo.3380362.