Abstract: Recent algorithmic progression has brought competitive classification accuracy despite constraining neural networks to binary weights (+1/-1). These findings show remarkable optimization opportunities to eliminate the need for computationally-intensive multiplications, reducing memory access and storage. In this paper, we present ParaPIM architecture, which transforms current Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) sub-arrays to massively parallel computational units capable of running inferences for Binary-Weight Deep Neural Networks (BWNNs). ParaPIM's in-situ computing architecture can be leveraged to greatly reduce energy consumption dealing with convolutional layers, accelerate BWNNs inference, eliminate unnecessary off-chip accesses and provide ultra-high internal bandwidth. The device-to-architecture co-simulation results indicate ~4x higher energy efficiency and 7.3x speedup over recent processing-in-DRAM acceleration, or roughly 5x higher energy-efficiency and 20.5x speedup over recent ASIC approaches, while maintaining inference accuracy comparable to baseline designs.
0 Replies
Loading