High-Throughput, Area-Efficient, and Variation-Tolerant 3-D In-Memory Compute System for Deep Convolutional Neural Networks

Hasita Veluri, Yida Li, Jessie Xuhua Niu, Evgeny Zamburg, Aaron Voon-Yew Thean

Published: 01 Jan 2021, Last Modified: 28 Nov 2023, IEEE Internet Things J. 2021, Readers: Everyone

Abstract: Untethered computing using deep convolutional neural networks (DCNNs) at the edge of IoT with limited resources requires systems that are exceedingly power- and area-efficient. Analog in-memory matrix-matrix multiplications (M2M) enabled by emerging memories can significantly reduce the energy budget of such systems and result in compact accelerators. In this article, we report a high-throughput RRAM-based DCNN processor that boasts 7.12× area-efficiency (AE) and 6.52× power-efficiency (PE) enhancements over state-of-the-art accelerators. We achieve this by coupling a novel in-memory computing methodology with a staggered-3D memristor array. Our variation-tolerant in-memory compute method, which performs operations on signed floating-point numbers within a single array, leverages charge domain operations and conductance discretization to reduce peripheral overheads. Voltage pulses applied at the staggered bottom electrodes of the 3D-array generate a concurrent input shift and parallelize convolution operations to boost throughput. The high density and low footprint of the 3D-array, along with the modified in-memory M2M execution, improve peak AE to 9.1 TOPs mm⁻², while the elimination of input regeneration improves PE to 10.6 TOPs W⁻¹. This work provides a path towards infallible RRAM-based hardware accelerators that are fast, low power, and low area.
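
To make the idea of conductance discretization concrete, the following is a minimal NumPy sketch of a generic analog crossbar multiply in which signed weights are stored in a single array. It is not the authors' method: the abstract does not specify the weight mapping, so the offset encoding, the conductance range (`g_min`, `g_max`), the number of levels, and the function names `discretize_conductance` and `crossbar_matvec` are all assumptions chosen for illustration. The sketch also works with currents, whereas the paper describes charge domain operations.

```python
import numpy as np

def discretize_conductance(weights, levels=16, g_min=1e-6, g_max=1e-4):
    """Map signed weights onto `levels` evenly spaced conductances (assumed range)."""
    w_max = float(np.max(np.abs(weights)))
    if w_max == 0.0:
        w_max = 1.0  # avoid division by zero for an all-zero weight matrix
    # Offset encoding (an assumption): -w_max -> g_min, +w_max -> g_max,
    # so negative and positive weights share one array.
    normalized = (weights + w_max) / (2.0 * w_max)       # values in [0, 1]
    codes = np.round(normalized * (levels - 1))          # discrete level index
    g = g_min + codes / (levels - 1) * (g_max - g_min)   # discrete conductances
    return g, w_max

def crossbar_matvec(g, voltages, w_max, g_min=1e-6, g_max=1e-4):
    """Column currents I_j = sum_i V_i * G_ij, rescaled back to weight units."""
    currents = voltages @ g                              # Ohm's + Kirchhoff's laws
    g_mid = 0.5 * (g_min + g_max)                        # conductance encoding w = 0
    scale = 2.0 * w_max / (g_max - g_min)
    # Subtract the offset contributed by g_mid on every row, then rescale.
    return (currents - voltages.sum() * g_mid) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4))      # hypothetical 8-input, 4-output layer
voltages = rng.uniform(size=8)         # hypothetical input voltages
g, w_max = discretize_conductance(weights, levels=16)
approx = crossbar_matvec(g, voltages, w_max)
exact = voltages @ weights
```

With this mapping, each stored weight is off by at most half a quantization step, `w_max / (levels - 1)`, so the difference between `approx` and `exact` is bounded by that step times the sum of the absolute input voltages. Fewer levels simplify the peripheral circuitry at the cost of a looser bound.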
