PipeCIM: A High-Throughput Computing-In-Memory Microprocessor With Nested Pipeline and RISC-V Extended Instructions

Tingran Chen; Wenjia Wang; Jiaqi Chen; Haotian Fu; Wente Yi; Bojun Cheng; He Zhang; Biao Pan

Published: 01 Jan 2024, Last Modified: 28 Sept 2024. IEEE Trans. Circuits Syst. I Regul. Pap. 2024. License: CC BY-SA 4.0

Abstract: The large number of multiply-accumulate (MAC) operations in Convolutional Neural Networks (CNNs) leads to substantial data migration and computation. Although computing-in-memory (CIM) has proven to be a promising paradigm for MAC operations, high-throughput CNN accelerators still face two bottlenecks: low MAC utilization and unnecessary off-chip memory access. In this paper, we propose PipeCIM, a high-throughput CIM-based CNN accelerator with three hierarchies of pipelines: Intra-Macro, Near-Memory, and Tile-Level. The Intra-Macro Pipeline executes data transfer and in-memory-computing (IMC) operations in parallel. The Near-Memory Pipeline reduces memory access for pooling and data reshaping. The Tile-Level Pipeline establishes a layer-wise pipeline to further improve throughput while reducing control complexity. PipeCIM introduces a nested pipeline scheme and a Unidirectional Divergent Connection Protocol (UDTCP) to simplify data-flow control with the help of customized RISC-V instructions. To validate our design, PipeCIM was prototyped in a 55 nm process node, achieving an energy efficiency of 133.8 TOPS/W and a peak throughput of 819 GOPS with a 16 KB CIM array. It accelerates VGG-16 by $128.56\times$ and Inception by $19.754\times$ compared to the baseline.
