Architectural Requirements for Deep Learning Workloads in HPC Environments

Published: 01 Jan 2021 (PMBS 2021)
Abstract: Scientific machine learning (SciML) promises to have a transformational impact on scientific exploration by combining state-of-the-art AI methods with the latest generation of supercomputers. However, to efficiently leverage ML techniques on high-performance computing (HPC) systems, it is critical to understand the performance characteristics of the underlying algorithms on modern computational systems. In this work, we present a new methodology for developing a detailed performance understanding of ML benchmarks. To demonstrate our approach, we investigate two emerging SciML benchmark applications from cosmology and climate, CosmoFlow and DeepCAM, as well as ResNet-50, a well-known image classification model. We develop and validate performance models that explore key architectural characteristics, including memory requirements, data reuse, and performance efficiency, across both single- and multi-GPU computations. Our methodology also addresses the complexity of data movement across storage and memory hierarchies, and leverages our performance models to capture key components of runtime execution while highlighting design tradeoffs. Although our work focuses on image-processing methods on GPU-based HPC systems, our approach is applicable to a variety of ML algorithmic domains and emerging AI accelerators. Overall, our insights will help computer architects and data scientists understand performance bottlenecks and optimization opportunities to improve SciML design and system efficiency.
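To give a concrete flavor of the kind of performance model the abstract describes, the sketch below applies a roofline-style bound to a single convolutional layer. This is a minimal illustrative example, not the authors' actual model: the hardware constants (V100-class numbers), the layer shape, and all function names are assumptions introduced here.

# Hypothetical roofline-style performance-model sketch (illustrative only).
# Hardware constants below are assumed V100-class values, not from the paper.

PEAK_FLOPS = 125e12   # assumed peak tensor-core throughput, FLOP/s
PEAK_BW    = 900e9    # assumed HBM2 memory bandwidth, bytes/s

def conv2d_flops(n, c_in, c_out, h, w, k):
    """Multiply-accumulate FLOPs for a dense 2D convolution (stride 1, same padding)."""
    return 2 * n * c_out * h * w * c_in * k * k

def conv2d_bytes(n, c_in, c_out, h, w, k, dtype_bytes=2):
    """Bytes moved, assuming each tensor (input, weights, output) touches HBM once."""
    inputs  = n * c_in  * h * w
    weights = c_out * c_in * k * k
    outputs = n * c_out * h * w
    return dtype_bytes * (inputs + weights + outputs)

def roofline_time(flops, bytes_moved):
    """Runtime lower bound: the layer is limited either by compute or by memory traffic."""
    ai = flops / bytes_moved                    # arithmetic intensity, FLOP/byte
    attainable = min(PEAK_FLOPS, ai * PEAK_BW)  # roofline-attainable FLOP/s
    return flops / attainable, ai

# Example: one ResNet-50-like 3x3 convolution at batch size 64 (shape is an assumption).
f = conv2d_flops(64, 256, 256, 14, 14, 3)
b = conv2d_bytes(64, 256, 256, 14, 14, 3)
t, ai = roofline_time(f, b)
print(f"arithmetic intensity: {ai:.1f} FLOP/B, est. time: {t*1e3:.2f} ms")

A model like this predicts whether a layer sits in the memory-bound or compute-bound regime, which is the kind of single-GPU efficiency and data-reuse question the paper's methodology quantifies across full networks and multi-GPU runs.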