Retrospection on the Performance Analysis Tools for Large-Scale HPC Programs

Published: 01 Jan 2024, Last Modified: 15 May 2025, Venue: HiPC 2024, License: CC BY-SA 4.0
Abstract: As the performance gap between hardware and software widens, performance analysis tools are essential for understanding the behavior of large-scale High-Performance Computing (HPC) programs. These tools expose performance bottlenecks and help users optimize their programs. In this paper, we present a comprehensive study of performance analysis tools for large-scale HPC systems, covering both the sampling-based and the instrumentation-based tools commonly adopted in the HPC community. We investigate the abundance and overhead of data collection, as well as the analysis capabilities, of HPCToolkit, TAU, and Scalasca using representative programs at scale. Our study shows that different performance analysis tools have distinct strengths and weaknesses, and that the choice of tool depends on the specific requirements of the user. We also discuss the challenges and future directions for performance analysis tools on large-scale HPC systems.