Readiness of HPC Extreme-scale Applications (2nd Edition)

ISC HPC 2025 Workshop

Friday, June 13, 2025, 2:00pm - 6:00pm

The Top500 list of June 2022 included the first supercomputer with exaFLOPS HPC performance, after many years of pursuit of this goal by the international community, and several more systems have followed or are expected within the next year. One is the European supercomputer JUPITER, hosted by Jülich Supercomputing Centre and half funded by EuroHPC JU, which guarantees access primarily for projects led by institutions in Europe. Now is therefore the time for HPC application software to demonstrate its readiness for extreme-scale computer systems composed of large assemblies of a heterogeneous variety of CPU processors and GPU accelerators.

Europe has been preparing HPC applications for this challenge for the last nine years through its Centres of Excellence (CoEs). Since 2015 more than 32 CoE projects have been funded, aiming to greatly extend the scalability of a large variety of HPC codes and to improve their execution efficiency and performance.

The Performance Optimisation and Productivity (POP) CoE, where both workshop organisers are task leaders, is dedicated to providing free performance assessments to HPC application developers and in particular supports the domain-specific CoEs (as well as the wider HPC community in academia and industry). Insights gathered by POP showed that, although these applications serve different science fields, many of the challenges they face are common, as are the solutions adopted. For this reason, we organised mini-symposia at two PASC conferences addressing the question: "Are HPC codes ready for exa-scale? An EU HPC Centre of Excellence Point of View". Representatives from different CoEs shared their experience via presentations and panels, and very fruitful discussions resulted.

To broaden this activity to a larger community, last year we organised the first edition of this ISC workshop, providing a forum to discuss common challenges, ideas, solutions, and opportunities from the point of view of HPC application developers preparing for exascale. ISC is the leading HPC conference in Europe, gathering not only the main HPC vendors and providers but also developers and standardisation committees for programming models, compilers, and other system software. However, one of the key players in this ecosystem is still missing: the HPC applications! We seek to fill this gap and open the ISC conference to HPC code developers.

The ISC schedule can be found here

Organizers:

  • Marta García-Gasulla
    Researcher and Team Leader, Barcelona Supercomputing Center
  • Brian J. N. Wylie
    Research Scientist, Forschungszentrum Jülich GmbH, Jülich Supercomputing Centre

Workshop agenda:

14:00 Welcome & Introduction to workshop (Garcia & Wylie) (Slides)
14:10 “European HPC application ecosystem” (Guntram Berti, scapos/D) (Slides)
14:30 EuroHPC Centre of Excellence presentations
16:00 Break
16:30 Presentations
  • “POP CoE services & engagements with other CoEs” (Marta Garcia, BSC/E) (Slides)
  • “EuroHPC concept for applications” (Linda Gesenhues, EuroHPC JU/Lux)
17:00 Panel discussion (Moderator: Guy Lonsdale, scapos/D)
17:50 Conclusion (Garcia & Wylie)
18:00 Adjourn

Presentations:

  • European HPC application ecosystem
    Guntram Berti (scapos/D)

    Abstract: The talk will present an overview of the roughly 60 HPC codes of the current set of 14 HPC Centres of Excellence (CoEs), which are currently being ported to the EuroHPC JU systems. In addition, some highlights from the Innovation Studies funded by the Inno4scale project will be discussed.

  • MaXimizing portability and performance of material modelling on EuroHPC clusters
    Laura Bellantani (CINECA/I)

    Abstract: The MaX (Materials design at Exascale) Centre of Excellence brings together computational scientists and core scientific communities with the goal of preparing key materials science codes for future exascale computing clusters. The MaX portfolio includes several flagship codes – QUANTUM ESPRESSO, YAMBO, SIESTA, FLEUR and BIGDFT – selected for their global user base, open-source nature, and complementary approaches to quantum materials modeling. During earlier phases of the project, these codes underwent substantial refactoring following the principle of separation of concerns: this strategy separates high-level components of the codes, such as property calculators and quantum engines primarily developed by domain scientists, from low-level libraries, typically maintained by technologists and optimized for performance on diverse computing architectures.

    In this talk we discuss how this modular design has proven effective in enhancing code portability and performance across a wide range of HPC platforms, while also ensuring long-term maintainability, an essential requirement for community codes. We present the main challenges and milestones in enabling MaX applications on NVIDIA, AMD and Intel GPUs with “OpenX” directive-based programming models, discuss runtime optimizations with the Multi-Process Service, and describe our investigation of communication backends beyond OpenMPI (specifically HPCX-MPI and NCCL) to improve the efficiency of memory-demanding workloads in multi-GPU scaling. We will also address how our benchmarking and profiling campaign on EuroHPC clusters has been streamlined by integrating applications, tools and platforms into an ecosystem of interoperable JUBE scripts, ensuring consistent and reproducible data acquisition, and how results are finally distributed to application users via an HTML-based visualization tool that helps them identify the most suitable configuration for production runs.

  • ESiWACE services and benchmark suite for weather and climate
    Erwan Raffin (Eviden/F)

    Abstract: ESiWACE is the European Centre of Excellence in Simulation of Weather and Climate in Europe. Among its activities, ESiWACE delivers a range of services to support the weather and climate modelling community on the path to exascale computing on current and future European systems, including tailored help with model performance and scaling optimization. To illustrate this effort, two success stories will be presented in this talk: one on the GPU porting of the OGSTM ocean model and another on the optimization of the GLOBO atmospheric model to improve its scalability. ESiWACE is also developing the High Performance Climate and Weather benchmark suite, composed of flagship models and kernels. An overview of this domain-specific benchmark suite will be presented, along with how it contributes to fostering co-design, especially for future European technology.

  • dealii-X: Preparing generic PDE solvers for exascale supercomputers
    Martin Kronbichler (RUB/D)

    Abstract: My talk will present activities in the EuroHPC Centre of Excellence dealii-X, a project aiming to develop efficient simulation software for biomedical applications in the human body. One of the core activities is the development of highly efficient building blocks for solving partial differential equations with the deal.II finite element library. We work on a comprehensive set of linear and nonlinear solvers for the mathematical models of fluid dynamics and (poro-)elasticity, as well as new models for processes in cells. We will present algorithmic advances for using large GPU-based supercomputers and report on our experiences with performance portability when running our codes on hardware of different vendors.

  • STREAmS: portability, performance, maintainability. Can they coexist?
    Francesco Salvadore (CINECA/I)

    Abstract: We present the latest version of STREAmS, the compressible-flow solver that stems from the 20-year research history of the Sapienza University of Rome group and has in recent years been completely redesigned for HPC thanks to the contribution of CINECA. The current version supports several programming paradigms oriented to the most popular HPC systems: starting from the pure-MPI version for CPU architectures, there is an OpenMP version adding a thread layer to CPU runs, a CUDA Fortran version for NVIDIA GPUs, a HIP version for AMD GPUs and APUs, and an OpenMP-offload version, potentially portable and today used mainly to exploit Intel GPUs.

    Efficient algorithms and implementation enable STREAmS to address open problems in basic fluid dynamics research, such as boundary layers or shock/boundary-layer interaction, achieving resolutions that approach conditions normally attainable only by experiments. STREAmS has been tested on a considerable variety of clusters and architectures, showing very good single-node performance and scalability. We specifically discuss the airfoil case, which, thanks to curvilinear grids, can be studied at Reynolds and Mach numbers close to those of real flight, allowing Direct Numerical Simulation to approach the area of industrial use from which it is historically far.

    It is worth examining the usability, and especially the maintainability, of a cross-platform code such as STREAmS. The goal of the solver's latest developments has been to broaden its usability without, however, distorting its origins or losing the support of the community of scientific experts guiding its evolution. Code development takes place solely in CUDA Fortran (Fortran being the historical language of CFD), following a rather simple set of programming policies. From the developed source code, our performance portability library ‘sutils’ is able to translate the code and generate all the backend-dependent parts, even writing the C layer if necessary. This means that a typical PhD student can work on the code without deep parallel programming or HPC skills, which improves the prospects for the solver's long-term maintainability.

  • Supporting cutting edge development of LAMMPS with EESSI
    Helena Vela (DoItNow!/E)

    Abstract: The MultiXscale CoE focuses on exascale-oriented application co-design and delivery for multiscale simulations. It is a collaborative project between members of the CECAM network and the EESSI community that will allow domain scientists to take advantage of the computational resources offered by EuroHPC JU. In this presentation we focus on one of the lighthouse codes within MultiXscale, LAMMPS, which is used by a large number of computational scientists. We will discuss how the developers of new plugins for LAMMPS are testing them on a wide range of systems with the help of the software.eessi.io and dev.eessi.io repositories.

    The dev.eessi.io repository allows developers to share pre-releases of their software so they can test them on systems where EESSI is available; this includes the EuroHPC systems Vega, Karolina and Deucalion. For example, on Vega, development versions of LAMMPS are already available for use and testing through dev.eessi.io. In this talk we will show the CI infrastructure for providing pre-release versions of software, using the plugins under development in MultiXscale, with LAMMPS as an example.

  • Improving Energy Efficiency of SPACE CoE Codes
    João Barbosa (IT4Innovations/CZ)

    Abstract: This talk will present the activities carried out to enhance the energy efficiency of SPACE CoE applications.

    We examined and measured energy consumption and efficiency by modifying specific hardware power settings. Our method involved static frequency tuning, where a single hardware configuration (CPU or GPU frequency) was set at the beginning of each application run and remained unchanged. We monitored and assessed execution time, energy consumption, and FLOPs per Watt.
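
    As a rough illustration of these metrics (not taken from the talk; the frequencies, runtimes, energies and FLOP counts below are hypothetical), the quantities recorded for each statically tuned run can be turned into average power, FLOPs per Watt and relative energy savings, e.g. in Python:

    # Illustrative sketch only: all numbers are made up; in practice runtime,
    # energy and FLOP counts come from measurements (e.g. power monitoring and
    # hardware performance counters).
    runs = {
        # CPU frequency [GHz]: (runtime [s], energy [J], floating-point operations)
        3.0: (100.0, 4.0e4, 1.0e15),
        2.4: (112.0, 3.6e4, 1.0e15),
        2.0: (130.0, 3.4e4, 1.0e15),
    }

    baseline_energy = runs[3.0][1]          # highest (nominal) frequency as baseline
    for freq, (runtime, energy, flops) in sorted(runs.items(), reverse=True):
        avg_power = energy / runtime        # average power draw [W]
        flops_per_watt = flops / energy     # FLOP/J, i.e. sustained FLOP/s per Watt
        saving = 1.0 - energy / baseline_energy
        print(f"{freq:.1f} GHz: {runtime:6.1f} s, {avg_power:6.1f} W, "
              f"{flops_per_watt:.2e} FLOP/s per W, {saving:+.1%} energy vs. baseline")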

    We will present the energy efficiency of three hardware platforms: (i) the NVIDIA A100 GPU installed in the IT4Innovations Karolina supercomputer, (ii) the Intel Sapphire Rapids processor with DDR and HBM memory, and (iii) NVIDIA's Arm-based Grace CPU. These platforms were chosen based on their relevance to the SPACE CoE's work, with a focus on GPU accelerators and co-design platforms.

    The methodology is applicable not only to the Astrophysics & Cosmology community but, in general, to other parallel codes running on supercomputers around the world. If one considers a machine that consumes 20 MW, energy savings of 5-10% correspond to 1-2 MW, which significantly reduces operational costs.

    The energy-efficiency analysis of the parallel codes was done with the open-source MERIC tool. We want to highlight that, for the static tuning approach, there is no need to change the code to apply the optimal settings: they can be applied job-wide through the job scheduler.