Intrinsic Sparse LSTM using Structured Targeted Dropout for Efficient Hardware Inference

Published: 01 Jan 2022, Last Modified: 12 May 2023, AICAS 2022
Abstract: Recurrent Neural Networks (RNNs) are useful for speech recognition, but their fully-connected structure leads to a large memory footprint, making them difficult to deploy on resource-constrained embedded systems. Previous structured RNN pruning methods can effectively reduce RNN size; however, they either struggle to balance high sparsity with high task accuracy, or the pruned models achieve only moderate speedup on custom hardware accelerators. This work proposes a novel structured pruning method called Structured Targeted Dropout (STD)-Intrinsic Sparse Structures (ISS) that stochastically drops grouped rows and columns of the weight matrices during training. Each compressed network is equivalent to a smaller dense network, which can be efficiently processed by Graphics Processing Units (GPUs). STD-ISS is evaluated on the TIMIT phone recognition task using Long Short-Term Memory (LSTM) RNNs. It outperforms previous state-of-the-art hardware-friendly methods on both accuracy and compression ratio. STD-ISS achieves a size compression ratio of up to 50× with <1% accuracy loss, leading to a 19.1× speedup on the embedded Jetson Xavier NX GPU platform.
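
To illustrate the group-level dropout idea described in the abstract, the snippet below is a minimal sketch, not the authors' implementation. It assumes a simplified grouping in which each ISS group is one hidden unit's column slice of the LSTM recurrent weight matrix (across the four stacked gates), targets the lowest-magnitude fraction of groups, and zeroes each targeted group with some probability during training; the function name, fractions, and the 39-dimensional input size are illustrative assumptions.

```python
# Sketch of structured targeted dropout over ISS-like groups (assumptions noted above).
import torch

def std_iss_mask(weight_hh: torch.Tensor,
                 targeted_fraction: float = 0.5,
                 drop_prob: float = 0.5,
                 training: bool = True) -> torch.Tensor:
    """Return a {0,1} mask shaped like weight_hh ([4*H, H] for a PyTorch LSTM)."""
    hidden = weight_hh.shape[1]
    # Group importance: L2 norm of each hidden unit's column across all four gates.
    group_norms = weight_hh.reshape(4, hidden, hidden).norm(dim=(0, 1))  # [H]
    # Mark the lowest-magnitude fraction of groups as dropout candidates.
    n_targeted = int(targeted_fraction * hidden)
    targeted = torch.zeros(hidden, dtype=torch.bool)
    targeted[torch.argsort(group_norms)[:n_targeted]] = True
    if training:
        # Stochastically drop targeted groups during training.
        dropped = targeted & (torch.rand(hidden) < drop_prob)
    else:
        # At inference the targeted groups are pruned away, leaving a smaller dense matrix.
        dropped = targeted
    # Zero entire columns: broadcast the per-group keep flags over all rows.
    return (~dropped).float().expand(weight_hh.shape[0], hidden)

# Usage: mask the recurrent weights of an LSTM layer before the forward pass.
# (39-dim input is only an example feature size for a speech task.)
lstm = torch.nn.LSTM(input_size=39, hidden_size=512)
with torch.no_grad():
    lstm.weight_hh_l0 *= std_iss_mask(lstm.weight_hh_l0)
```

Because whole rows and columns are removed rather than individual weights, the surviving parameters form a smaller dense matrix, which is what allows the compressed model to run efficiently on standard GPU kernels without sparse-format overhead.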