Tuesday, July 16, 2019

A common problem encountered when attempting to set up workflows on HPC machines is handling and installing all the necessary dependencies such as vendor-specific MPI libraries, mathematical and parallel toolkits, and other necessary software libraries. Although most HPC administrators will provide a set of commonly used applications and libraries, it is still very often necessary to manually compile at least some dependencies yourself. This manual installation can be both time-consuming and error-prone and can make migration to a new HPC system an unappetising prospect.

Installing the POP profiling tools requires configuration and installation of various low-level libraries to support their tracing and instrumentation functionality. To help ease this installation and dependency management, I recommend the use of an HPC package manager such as Spack.

Spack is a package manager for HPC systems, designed to automate the installation and management of HPC software and libraries (the Spack documentation provides an up-to-date list of the available packages). It can be used both for central provision of packages and by individual users to install the packages they need. Although it has been designed with HPC usage in mind, it can be used just as effectively to install scientific software on a local machine.

Its main features include:

Easy installation and porting to different machines
Simple installation of over 3000 scientific packages
Allowing multiple package/library versions to coexist
Managing builds with multiple compilers
Easy addition of new packages

In particular, the following POP tools are available in Spack:

Dimemas
Extrae
Scalasca
Score-P
SimGrid
TAU
VampirTrace

This blog post will give a brief overview of how Spack provides a simple method to install not only the POP tools but a wide range of HPC and scientific software quickly and easily.

Installing Spack

Spack itself is written in Python, and requires only that the system provides Python version 2.6 or newer and a working C/C++ compiler.
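Before fetching Spack, it can be useful to confirm that these two prerequisites are present. The following is a minimal sketch rather than anything Spack itself provides: the helper names `have_cmd` and `first_available` are hypothetical, and the candidate interpreter and compiler names are assumptions that may need adjusting for your system.

```shell
#!/bin/sh
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print the first command in the argument list that exists; fail if none do.
first_available() {
    for candidate in "$@"; do
        if have_cmd "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate names below are assumptions; edit them to match your system.
python_cmd=$(first_available python3 python python2) || python_cmd=""
cc_cmd=$(first_available cc gcc clang icc) || cc_cmd=""

echo "Python interpreter: ${python_cmd:-not found}"
echo "C compiler:         ${cc_cmd:-not found}"
```

If either line reports "not found", the corresponding tool needs to be installed or loaded (for example via a system module) before continuing.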

It is hosted on GitHub, and can be installed either by cloning the git repository:

$ git clone https://github.com/spack/spack.git

or downloading and extracting the latest release tarball, for example:

$ wget https://github.com/spack/spack/releases/download/v0.12.1/spack-0.12.1.tar.gz

$ tar xvf spack-0.12.1.tar.gz

Due to the rolling-release nature of Spack, it is recommended to use the latest version of the package repository via the git clone method. However, if git is not available on the target system, the release tarball can be used to install git, after which Spack can be updated to the latest version using the newly installed git package.

Finally, enable usage of the Spack command by sourcing the Spack activation script:

$ export SPACK_ROOT=/path/to/spack
$ . $SPACK_ROOT/share/spack/setup-env.sh
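If these two lines are placed in a shell startup file, it may be worth guarding them so that a missing or moved Spack directory does not produce errors at login. A minimal sketch, assuming a POSIX shell; the function name `activate_spack` is hypothetical, and `/path/to/spack` is the same placeholder used above:

```shell
# Source Spack's setup script from the given root directory, if it exists.
activate_spack() {
    setup_script="$1/share/spack/setup-env.sh"
    if [ -f "$setup_script" ]; then
        SPACK_ROOT="$1"
        export SPACK_ROOT
        . "$setup_script"
    else
        echo "No Spack setup script at $setup_script" >&2
        return 1
    fi
}

# Placeholder path; replace it with the location of your Spack directory.
activate_spack "${SPACK_ROOT:-/path/to/spack}" || echo "Spack not activated"
```

The function only sets `SPACK_ROOT` when the setup script is actually found, so a failed activation leaves the environment unchanged.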

Installing Packages with Spack

To determine whether the package you want to install is provided by Spack, you can either check the package list in the Spack documentation, or use the spack list command on an existing installation. For example, to list available packages which match the name "lapack":

$ spack list lapack
==> 3 packages.
clapack  netlib-lapack  netlib-scalapack

Packages are installed with the install command. For example, to install the Scalasca profiling tool:

$ spack install scalasca

Spack will then install all the dependencies that Scalasca itself requires, and then install Scalasca.

Using Spack Packages

Spack packages are used in the same way as "traditional" modulefiles on HPC systems. They can be queried and loaded using an installed module system, or ideally using the spack load command.

For example, to load the Scalasca module for use:

$ spack load scalasca

Spack modules can be used alongside "normal" system modules, but be aware that Spack does not, by default, know anything about these system modules.

Controlling What Spack Installs

To find out beforehand exactly what Spack will install, use the spec command. This asks Spack to build a concrete specification of exactly what packages it will install to satisfy the requirements you have given it. For example, revisiting the Scalasca example:

$ spack spec scalasca
Input spec
--------------------------------
scalasca

Concretized
--------------------------------
scalasca@2.4%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
    ^cubew@4.4.2%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
    ^pkgconf@1.6.1%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
    ^zlib@1.2.11%gcc@8.1.0+optimize+pic+shared arch=linux-ubuntu16.04-x86_64
    ^openmpi@3.1.4%gcc@8.1.0~cuda+cxx_exceptions fabrics=none arch=linux-ubuntu16.04-x86_64
    ^hwloc@1.11.11%gcc@8.1.0~cairo~cuda~gl+libxml2~nvml+pci+shared arch=linux-ubuntu16.04-x86_64
        ^libpciaccess@0.13.5%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
        ^libtool@2.4.6%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
            ^m4@1.4.18%gcc@8.1.0 +sigsegv arch=linux-ubuntu16.04-x86_64
            ^libsigsegv@2.11%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
        ^util-macros@1.19.1%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
        ^libxml2@2.9.9%gcc@8.1.0~python arch=linux-ubuntu16.04-x86_64
        ^libiconv@1.15%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
        ^xz@5.2.4%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
        ^numactl@2.0.12%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
        ^autoconf@2.69%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
            ^perl@5.26.2%gcc@8.1.0+cpanm +shared+threads arch=linux-ubuntu16.04-x86_64
            ^gdbm@1.18.1%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
                ^readline@7.0%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
                ^ncurses@6.1%gcc@8.1.0~symlinks~termlib arch=linux-ubuntu16.04-x86_64
        ^automake@1.16.1%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64
    ^otf2@2.1.1%gcc@8.1.0 arch=linux-ubuntu16.04-x86_64

This shows the complete tree of dependencies required to install and use Scalasca. There are several important things to see here. First, notice that Spack is installing a copy of OpenMPI. MPI is a key dependency for Scalasca, but depending on the system you are using, OpenMPI may not be the correct choice. In this case, Spack allows you either to specify another MPI implementation to be installed in place of the default OpenMPI, or to make use of an already installed MPI, for example an optimised vendor-supplied version on a large HPC machine. This is covered in detail in the Spack documentation.
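The `^` prefix that marks dependencies in the tree is also accepted on the command line, which is one way to request a different MPI implementation. The sketch below only prints the command rather than running it, so it is safe to try without Spack installed. The function name `mpi_install_cmd` is hypothetical, and `mpich` is just one example of an alternative MPI package; whether it is the right choice depends on your system.

```shell
# Print the install command for a package built against a chosen MPI.
# Usage: mpi_install_cmd PACKAGE MPI_PACKAGE
mpi_install_cmd() {
    echo "spack install $1 ^$2"
}

# Example: Scalasca built against MPICH instead of the default OpenMPI.
mpi_install_cmd scalasca mpich
```

Running the same spec through the spec command first is a cheap way to confirm that the dependency tree now contains the MPI you asked for.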

Second, notice that each package is followed by a series of specifiers. The @ specifier gives the version of the package that will be installed, the % specifier gives the compiler and compiler version that will be used, and + and ~ indicate that an optional feature will be enabled or disabled respectively. This same syntax can be used on the command line to request a specific package version, compiler version and features. Again, this is fully detailed in the Spack documentation.
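To make the syntax concrete, the following sketch assembles a spec string from its parts and prints the command that would use it. The helper `make_spec` is hypothetical and not part of Spack; the versions and features are taken from the example output above and may not match what is available on your system.

```shell
# Compose a Spack spec string from its parts.
# Usage: make_spec NAME [VERSION] [COMPILER] [VARIANTS]
make_spec() {
    spec="$1"
    if [ -n "${2:-}" ]; then spec="$spec@$2"; fi    # @ selects the package version
    if [ -n "${3:-}" ]; then spec="$spec %$3"; fi   # % selects the compiler
    if [ -n "${4:-}" ]; then spec="$spec $4"; fi    # +feature / ~feature switches
    echo "$spec"
}

# Scalasca 2.4 built with GCC 8.1.0.
echo "spack install $(make_spec scalasca 2.4 gcc@8.1.0)"

# hwloc with libxml2 support enabled and CUDA support disabled.
echo "spack install $(make_spec hwloc 1.11.11 gcc@8.1.0 '+libxml2~cuda')"
```

Any part can be left out, in which case Spack chooses a default for it when concretizing the spec.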

Summary

Spack provides a way to quickly and easily build everything you need to install and run the POP analysis tools and profile your workflow. Its only dependencies are Python 2.6 or later and a working C/C++ compiler. It provides a simple interface to managing installed packages and their dependencies, with over 3000 scientific software packages available to install.

-- Phil Tooley (NAG)


Performance Optimisation and Productivity

Tool Time: Installing and Managing HPC Software with Spack
