Improved Deep Neural Network Hardware-Accelerators Based on Non-Volatile-Memory: The Local Gains Technique

Irem Boybat, Carmelo di Nolfo, Stefano Ambrogio, Martina Bodini, Nathan C. P. Farinha, Robert M. Shelby, Pritish Narayanan, Severin Sidler, Hsinyu Tsai, Yusuf Leblebici, Geoffrey W. Burr

Published: 2017, Last Modified: 12 May 2025. Venue: ICRC 2017. License: CC BY-SA 4.0

Abstract: Cognitive computing, which learns to perform useful computational tasks from data rather than by being programmed explicitly, represents a fundamentally new form of computing. Unfortunately, Deep Neural Networks (DNNs) learn from repeated exposure to huge datasets, which currently requires extensive computational resources (such as many GPUs) working together for days or weeks. To accelerate this process, our group is investigating hardware accelerators for backpropagation training based on analog Non-Volatile Memory (NVM). This paper describes a novel Local Gains (LG) method which can increase network accuracy, extend the range of acceptable learning rates, and reduce overall weight-update activity and thus the corresponding power consumption. We analyze the impact of different activation functions and the corresponding dynamic range of input and output neurons. We then show that the use of non-negative neuron activations offers advantages within a crossbar implementation (without degrading accuracy), by causing the sign of the weight update to depend only on the sign of the backpropagated error. Next, we introduce LG: a neuron-centric (not synapse-centric) modulation of the learning rate based on the sign of successive weight updates. We also introduce the concept of Safety Margin (SM), the margin by which the correct output neuron exceeds (or fails to exceed) the strongest incorrect neuron, which provides a novel way to gauge the robustness of DNN classification performance. We use device-aware DNN simulations to demonstrate higher accuracy, reduced sensitivity to network hyperparameters, and an overall improved training process, as well as lower network activity and reduced power consumption.
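
Two statements in the abstract can be illustrated with a short sketch. The first is the Safety Margin, taken here directly from its definition as the correct output minus the strongest incorrect output. The second is the observation that non-negative activations make the sign of the weight update depend only on the sign of the backpropagated error. The function names `safety_margin` and `update_signs`, the plain outer-product update with a positive learning rate, and the sample numbers are all assumptions made for illustration; they are not taken from the paper, and the sketch does not model NVM devices or the LG rule itself.

```python
import numpy as np


def safety_margin(outputs, labels):
    """Per-example margin: correct-class output minus the strongest incorrect output.

    outputs: (n_examples, n_classes) array of output-neuron activations.
    labels:  (n_examples,) array of correct class indices.
    A positive margin means the example is classified correctly.
    """
    outputs = np.asarray(outputs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    rows = np.arange(outputs.shape[0])
    correct = outputs[rows, labels]
    masked = outputs.copy()
    masked[rows, labels] = -np.inf  # exclude the correct neuron from the max
    strongest_incorrect = masked.max(axis=1)
    return correct - strongest_incorrect


def update_signs(x, delta):
    """Sign of an outer-product update x_i * delta_j (positive learning rate assumed)."""
    return np.sign(np.outer(x, delta))


# Hypothetical example values, chosen only to exercise the functions.
outputs = [[0.1, 0.7, 0.2],   # correct class 1 wins
           [0.6, 0.3, 0.1]]   # correct class 1 loses to class 0
labels = [1, 1]
print(safety_margin(outputs, labels))  # one positive and one negative margin

x = [0.0, 0.5, 1.0]    # non-negative upstream activations
delta = [-0.2, 0.3]    # backpropagated errors of two downstream neurons
print(update_signs(x, delta))  # rows with x_i > 0 share the sign pattern of delta
```

Under these assumptions, every synapse feeding a given downstream neuron receives an update of the same sign (or zero, where the upstream activation is zero), which is consistent with the abstract's description of a learning-rate modulation kept per neuron rather than per synapse.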