Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Nitin A. Gawande, Jeff A. Daily, Charles Siegel, Nathan R. Tallent, Abhinav Vishnu

Published: 2020, Last Modified: 05 Mar 2025. Future Gener. Comput. Syst. 2020. CC BY-SA 4.0

Abstract: Highlights
• We provide a detailed performance and power scaling analysis of important CNN workloads on two architectures: (a) NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and (b) a cluster of Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path.
• For the ML workloads considered here, GPUs provide the highest overall raw performance. We also find that a single KNL can be competitive with a single Pascal GPU in certain cases. Focusing DL architectural innovation on FLOPs can be misguided.
• The importance of the interconnect is highly dependent on the neural network architecture.