Abstract: Highlights
• We have provided a detailed performance and power scaling analysis of important CNN workloads on two architectures: (a) NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and (b) a cluster with Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path.
• For ML workloads considered here, GPUs provide the highest overall raw performance. We also find that a single KNL can be competitive with a single Pascal in certain cases. Focusing DL architectural innovation on FLOPs can be misguided.
• The importance of the interconnect is highly dependent on neural network architecture.