Abstract: Compute-in-memory (CIM) has recently emerged as a promising design paradigm to accelerate deep neural network (DNN) processing. Continuously improving energy and area efficiency at the macro level has been reported through many test chips over the last few years. However, these macro-design-oriented studies have not investigated accelerator-level considerations, such as memory accesses and the processing of entire DNN workloads, in depth. In this article, we aim to fill this gap, starting with the characteristics of our latest CIM macro fabricated in cutting-edge 4-nm FinFET CMOS technology. We then study, through an accelerator simulator developed in-house, three key items that determine the efficiency of our CIM macro in the accelerator context while running the MLPerf Mobile suite: 1) dataflow optimization; 2) optimal selection of CIM macro dimensions to further improve macro utilization; and 3) optimal combination of multiple CIM macros. Although there is typically a stark contrast between macro-level peak and accelerator-level average throughput and energy efficiency, the aforementioned optimizations are shown to improve macro utilization by $3.04\times$ and reduce the energy-delay product (EDP) to $0.34\times$ compared to the original macro on MLPerf Mobile inference workloads. While we exploit a digital CIM macro in this study, the findings and proposed methods remain valid for other types of CIM (such as analog CIM and analog–digital–hybrid CIM) as well.