Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training

Xi Chen; Chang Gao; Zuowen Wang; Longbiao Cheng; Sheng Zhou; Shih-Chii Liu; Tobi Delbruck

Published: 01 Nov 2023, Last Modified: 22 Dec 2023, MLNCP Poster

Keywords: Recurrent Neural Networks, temporal sparsity, training optimization, edge computing, hardware acceleration

TL;DR: This paper proposes an RNN training algorithm that exploits temporal sparsity in both the forward and backward propagation phases, which can significantly reduce the computation cost and resource requirements of training on edge devices.

Abstract: Recurrent Neural Networks (RNNs) are useful in temporal sequence tasks. However, training RNNs involves dense matrix multiplications which require hardware that can support a large number of arithmetic operations and memory accesses. Implementing online training of RNNs on the edge calls for optimized algorithms for an efficient deployment on hardware. Inspired by the spiking neuron model, the Delta RNN exploits temporal sparsity during inference by skipping over the update of hidden states from those inactivated neurons whose change of activation across two timesteps is below a defined threshold. This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge. Due to the symmetric computation graphs of forward and backward propagation during training, the gradient computation of inactivated neurons can be skipped. Results show a reduction of ∼80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show 2-10X speedup in matrix computations for an activation sparsity range of 50%-90%. Additionally, we show that our training algorithm will be useful for online incremental learning on edge devices with limited computing resources.
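The delta thresholding and the symmetric skipping described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (delta_encode, sparse_matvec, sparse_weight_grad, ideal_speedup), the threshold value, and the single-layer setting are all assumptions made for illustration. It shows that an input whose change stays below the threshold contributes nothing to the forward matrix-vector product, and that the matching column of the weight gradient is zero and can be skipped as well. The ideal_speedup helper assumes cost scales with the fraction of active inputs, an idealization that is consistent with the reported 2-10X range for 50%-90% sparsity.

```python
import numpy as np

def delta_encode(x, x_ref, theta):
    """Threshold the change of x against the last transmitted state x_ref.

    Hypothetical helper: returns the sparse delta, the updated reference
    state, and a boolean mask of the active (transmitted) elements.
    """
    diff = x - x_ref
    active = np.abs(diff) > theta
    delta = np.where(active, diff, 0.0)
    # The reference is updated only where a delta was transmitted.
    new_ref = np.where(active, x, x_ref)
    return delta, new_ref, active

def sparse_matvec(W, delta, active):
    """Forward pass: W @ delta, touching only the columns of active inputs."""
    return W[:, active] @ delta[active]

def sparse_weight_grad(grad_pre, delta, active):
    """Backward pass: dL/dW = outer(grad_pre, delta).

    Columns belonging to inactive inputs are exactly zero, so only the
    active columns are computed (same mask as in the forward pass).
    """
    grad_W = np.zeros((grad_pre.shape[0], delta.shape[0]))
    grad_W[:, active] = np.outer(grad_pre, delta[active])
    return grad_W

def ideal_speedup(sparsity):
    """Idealized speedup if cost is proportional to the active fraction."""
    return 1.0 / (1.0 - sparsity)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 6))
    x_ref = np.zeros(6)
    x = np.array([0.5, 0.01, -0.3, 0.0, 0.02, 1.0])
    delta, x_ref, active = delta_encode(x, x_ref, theta=0.1)
    print("active inputs:", active)
    print("forward result matches dense:",
          np.allclose(sparse_matvec(W, delta, active), W @ delta))
```

In a full Delta LSTM the same mask would apply at every timestep of BPTT, so the savings accumulate over the sequence; the sketch above covers a single step only.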

Submission Number: 5
