Approximation-Aware Training for Efficient Neural Network Inference on MRAM-Based CiM Architecture
Abstract: Convolutional neural networks (CNNs), despite their broad applications, are constrained by
high computational and memory requirements. Existing compression techniques often neglect approximation
errors incurred during training. This work proposes approximation-aware training, in which groups of weights
are approximated using a differentiable approximation function, resulting in a new weight matrix composed of
the approximation function's coefficients (AFC). The network is trained using backpropagation to minimize
the loss function with respect to the AFC matrix, with linear and quadratic approximation functions preserving
accuracy at high compression rates. This work further implements a compute-in-memory architecture for
inference with approximated neural networks. The architecture includes a mapping algorithm that
modulates the inputs and maps the AFC directly onto crossbar arrays, eliminating the need to compute the approximated
weights when evaluating the output. This reduces the number of crossbars, lowering area and energy consumption.
Integrating magnetic random-access memory (MRAM) based devices further enhances performance by reducing latency and energy consumption. Simulation results on approximated LeNet-5, VGG8, AlexNet, and ResNet18
models trained on the CIFAR-100 dataset showed reductions of 54%, 30%, 67%, and 20% in the total number
of crossbars, respectively, resulting in improved area efficiency. In the ResNet18 architecture, latency and
energy consumption decreased by 95% and 93.3%, respectively, with spin-orbit torque (SOT) based crossbars compared
to RRAM-based architectures.
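As a rough illustration of the training scheme summarized above, the sketch below shows one way a group of weights can be replaced by the coefficients of a low-degree polynomial (the AFC) and trained directly with backpropagation. This is a minimal, hypothetical example, not code from the paper: the layer name `AFCLinear`, the `group_size` and `degree` parameters, and the choice of a fixed polynomial basis over normalized in-group positions are all illustrative assumptions.

```python
# Minimal sketch of approximation-aware training (illustrative, not the paper's code).
# Each group of `group_size` weights is represented by `degree + 1` trainable
# coefficients (AFC); the full weight matrix is reconstructed on the fly, so
# backpropagation updates the coefficients directly.
import torch
import torch.nn as nn

class AFCLinear(nn.Module):
    def __init__(self, in_features, out_features, group_size=16, degree=2):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        n_weights = in_features * out_features
        assert n_weights % group_size == 0
        self.n_groups = n_weights // group_size
        # Fixed polynomial basis over normalized positions within a group:
        # shape (group_size, degree + 1); degree=1 is linear, degree=2 quadratic.
        x = torch.linspace(-1.0, 1.0, group_size)
        self.register_buffer("basis", torch.stack([x**k for k in range(degree + 1)], dim=1))
        # Trainable approximation-function coefficients: one vector per weight group.
        self.afc = nn.Parameter(0.01 * torch.randn(self.n_groups, degree + 1))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Reconstruct the approximated weights: (n_groups, group_size) -> (out, in).
        w = (self.afc @ self.basis.T).reshape(self.out_features, self.in_features)
        return x @ w.T + self.bias

# Usage: the layer stores (degree + 1) / group_size of the original weight count,
# e.g. 3/16 of the weights for a quadratic fit over groups of 16.
layer = AFCLinear(64, 32, group_size=16, degree=2)
out = layer(torch.randn(8, 64))
loss = out.pow(2).mean()
loss.backward()              # gradients flow to layer.afc, i.e. the AFC are trained
print(layer.afc.grad.shape)  # torch.Size([128, 3])
```

In such a scheme, only the AFC would need to be stored and mapped to the crossbar arrays at inference time, which is consistent with the abstract's claim that the mapping algorithm avoids reconstructing the approximated weights explicitly.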