A Fully Digital and Row-Pipelined Compute-in-Memory Neural Network Accelerator With System-on-Chip-Level Benchmarking for Augmented/Virtual Reality Applications
Abstract: Compute-in-memory (CIM) has emerged as an effective technique to address memory access bottlenecks for deep neural networks (DNNs). Augmented/virtual reality (AR/VR) devices require running high-performance DNN inference within tight power budgets, making CIMs ideal candidates for low-power on-device acceleration. While high energy efficiencies have been reported at the CIM macro level, the energy efficiencies of CIM-based accelerators at the system-on-chip (SoC) level have been underexplored under realistic system-integration considerations. In this work, we present a CIM accelerator architecture comprising 16 row-pipelined, fully digital CIM macros and provide a comprehensive analysis of CIM energy-efficiency benefits at the SoC level, targeting representative AR/VR workloads. Two key results are as follows. 1) Realistic SoC-level CIM accelerator energy efficiency may be ∼50% lower than the CIM macro-level peak energy efficiency when additional logic, memory hierarchies, and NN-dependent suboptimal compute utilization are considered. 2) The CIM accelerator still demonstrates up to ∼2.1× energy savings at the SoC level compared to a systolic-array-based DNN accelerator at iso-peak throughput.
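To make the macro-to-SoC efficiency gap concrete, the sketch below models how macro-level peak energy efficiency degrades once compute utilization and non-macro energy overheads (additional logic, memory hierarchy) are accounted for. This is an illustrative first-order model, not the paper's methodology; every numeric value in it is a hypothetical placeholder chosen only so that the result lands near the ∼50% degradation the abstract reports.

```python
# Illustrative first-order model (not from the paper): SoC-level energy
# efficiency derived from macro-level peak efficiency. All parameter
# values below are hypothetical placeholders.

def soc_efficiency(macro_peak_tops_w: float,
                   utilization: float,
                   overhead_energy_frac: float) -> float:
    """SoC-level TOPS/W given NN-dependent compute utilization and the
    fraction of total SoC energy spent outside the CIM macros."""
    # Useful throughput scales with utilization; the non-macro energy
    # fraction further dilutes the delivered efficiency.
    return macro_peak_tops_w * utilization * (1.0 - overhead_energy_frac)

macro_peak = 20.0   # hypothetical macro-level peak TOPS/W
util = 0.7          # hypothetical NN-dependent compute utilization
overhead = 0.3      # hypothetical non-macro (logic + memory) energy share

soc = soc_efficiency(macro_peak, util, overhead)
print(f"SoC-level: {soc:.1f} TOPS/W ({soc / macro_peak:.0%} of macro peak)")
```

With these placeholder values the model yields 49% of macro-level peak, i.e. roughly the ∼50% degradation discussed above; the real factors in the paper come from measured logic, memory-hierarchy, and utilization data rather than assumed constants.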