A Uniform Latency Model for DNN Accelerators with Diverse Architectures and Dataflows

Published: 01 Jan 2022, Last Modified: 15 May 2025 | DATE 2022 | CC BY-SA 4.0
Abstract: In the early design phase of a Deep Neural Network (DNN) acceleration system, fast energy and latency estimation is essential for evaluating the optimality of different design candidates across the algorithm, the hardware, and the algorithm-to-hardware mapping, given the gigantic design space. This work proposes a uniform intra-layer analytical latency model for DNN accelerators that can evaluate diverse architectures and dataflows. It employs a 3-step approach to systematically estimate the latency breakdown of the different system components, capture the operation state of each memory component, and identify stall-induced performance bottlenecks. To achieve high accuracy, the model accounts for different memory attributes, operands' memory-sharing scenarios, and dataflow implications. Validation against an in-house taped-out accelerator across various DNN layers shows an average latency model accuracy of 94.3%. To showcase the capability of the proposed model, we carry out three case studies that respectively assess the impact of mapping, workload, and hardware architecture on latency, driving design insights for algorithm-hardware-mapping co-optimization.
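To convey the flavor of such a 3-step intra-layer estimation, the sketch below shows a minimal roofline-style latency estimator in Python. It is not the paper's actual model: the class `MemLevel`, the function `estimate_latency`, and all bandwidth and traffic numbers are hypothetical, and it assumes perfect overlap of compute and memory transfers (e.g., double buffering), so the slowest component alone sets the total latency.

```python
from dataclasses import dataclass

@dataclass
class MemLevel:
    """One level of a hypothetical memory hierarchy."""
    name: str
    bandwidth: float  # peak words transferred per cycle
    traffic: int      # total operand words moved at this level

def estimate_latency(mac_ops, macs_per_cycle, levels):
    """Illustrative 3-step estimate (assumed simplification, not the paper's model).

    Step 1: per-component latency breakdown (compute + each memory level).
    Step 2: required transfer cycles per memory component.
    Step 3: the slowest component is the stall-inducing bottleneck.
    Assumes compute and all transfers overlap perfectly.
    """
    breakdown = {"compute": mac_ops / macs_per_cycle}
    for lvl in levels:
        breakdown[lvl.name] = lvl.traffic / lvl.bandwidth
    bottleneck = max(breakdown, key=breakdown.get)
    total = breakdown[bottleneck]
    # Cycles the MAC array sits idle waiting on the bottleneck component.
    stall = max(0.0, total - breakdown["compute"])
    return total, bottleneck, stall, breakdown

if __name__ == "__main__":
    # All numbers below are made up for illustration.
    levels = [MemLevel("DRAM", bandwidth=4.0, traffic=2_000_000),
              MemLevel("global_buffer", bandwidth=16.0, traffic=6_000_000)]
    cycles, limiter, stall, parts = estimate_latency(
        mac_ops=100_000_000, macs_per_cycle=256, levels=levels)
    print(f"~{cycles:.0f} cycles, limited by {limiter}, {stall:.0f} stall cycles")
```

With these made-up numbers, DRAM transfers (500,000 cycles) exceed both compute (390,625 cycles) and global-buffer traffic (375,000 cycles), so the layer is DRAM-bound with about 109,375 stall cycles; the paper's model refines this kind of breakdown with memory attributes, operand sharing, and dataflow effects.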