POP tool descriptions: INESC-ID tools and methods

Thursday, June 27, 2024

The Cache-aware Roofline Model (CARM) [1], developed at the CHAMP Hub at INESC-ID, is an insightful performance model for computer architectures. It offers a high-level picture of the fundamental memory and compute performance limits of a system, provides an intuitive analysis of application execution bottlenecks, and effectively guides optimization efforts. The CARM Tool [2] is a one-stop shop for CARM-related analysis across a variety of architectures, supporting nearly all major CPU vendors and ISAs. It comprises a set of independent and complementary modules that form a complete CARM-based profiling ecosystem, ranging from automatic system benchmarking to application analysis and result visualization.
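To make the idea of memory and compute limits concrete, the following is a minimal sketch of how a roofline-style bound can be evaluated for each memory level. It is not code from the CARM Tool: the function names and all machine numbers are hypothetical, chosen only for illustration, and the formula is the usual roofline bound min(peak, intensity × bandwidth) applied once per level.

```python
def carm_bounds(ai, peak_gflops, bandwidths_gbs):
    """Attainable GFLOP/s per memory level at arithmetic intensity `ai` (FLOP/byte).

    Each level contributes a sloped roof (ai * bandwidth) capped by the
    flat compute roof (peak_gflops).
    """
    return {level: min(peak_gflops, ai * bw) for level, bw in bandwidths_gbs.items()}


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a memory roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters (illustrative only, not measured values).
PEAK_GFLOPS = 100.0
BANDWIDTHS_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}

bounds = carm_bounds(0.5, PEAK_GFLOPS, BANDWIDTHS_GBS)
for level, gflops in bounds.items():
    print(f"{level}: {gflops} GFLOP/s")
print("DRAM ridge point:", ridge_point(PEAK_GFLOPS, BANDWIDTHS_GBS["DRAM"]), "FLOP/byte")
```

With these assumed numbers, a kernel at 0.5 FLOP/byte could reach the compute roof when its data is served from L1 or L2, but would be limited by memory bandwidth when served from L3 or DRAM.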

The CARM Tool provides both a command-line mode and a graphical user interface (GUI) for accessing its cross-platform architecture and application profiling facilities when executing user-specified tasks. Two engines for in-depth CARM-based application analysis are provided, one based on performance counters and the other on dynamic binary instrumentation. With these, the performance and arithmetic intensity of the analyzed applications and their hotspots (accessible via the custom built-in ROI code annotations) are automatically calculated and plotted in the tool-generated CARM plot of the architecture on which the code runs. The CARM Tool also automatically saves the results collected from different platforms and application runs, which can then be displayed in the GUI or exported as SVG graphs.
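As a rough illustration of the two quantities mentioned above, the sketch below derives the arithmetic intensity and performance of one application run from raw counts. It does not reproduce the CARM Tool's own code or interface: the function name and the input values are hypothetical, and it assumes that the FLOP count, the bytes moved by the core's loads and stores, and the elapsed time have already been obtained from one of the analysis engines.

```python
def carm_point(flops, bytes_moved, seconds):
    """Return (arithmetic intensity in FLOP/byte, performance in GFLOP/s).

    `bytes_moved` is assumed to count all data requested by the core's
    load and store instructions, regardless of which memory level
    serves them, as in the cache-aware formulation of the model.
    """
    ai = flops / bytes_moved
    gflops = flops / seconds / 1e9
    return ai, gflops


# Hypothetical counts for one annotated region (illustrative only).
ai, gflops = carm_point(flops=2e9, bytes_moved=8e9, seconds=0.5)
print(f"AI = {ai} FLOP/byte, performance = {gflops} GFLOP/s")
```

The resulting pair is the coordinate at which the region would appear in a CARM plot, where it can be compared against the roofs of the architecture.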

The CARM Tool is equipped with a robust assembly-level micro-benchmarking module needed to construct the CARM on all supported processors (e.g., Intel/AMD x86-64, ARM AArch64, and RISCV64), for any number of threads and a large set of instruction set extensions (such as the SIMD extensions AVX512, Neon, and RVV), data precisions, and instruction types. The tool allows highly accurate and user-customizable micro-benchmarking of complete memory subsystems and compute units, for various problem sizes, load/store ratios, and compute-to-memory operation ratios. It fully assesses the upper-bound capabilities of the FP units and of each memory hierarchy level (caches and DRAM).

The CARM Tool is open source; more information is available at https://github.com/champ-hub/carm-roofline.

References

[1] A. Ilic, F. Pratas, and L. Sousa, "Cache-aware Roofline model: Upgrading the loft", IEEE Computer Architecture Letters, vol. 13, no. 1, pp. 21–24, Jan. 2014. DOI: 10.1109/L-CA.2013.6

[2] https://github.com/champ-hub/carm-roofline