Image Captioning with Sparse LSTM

19 Apr 2024 (modified: 17 Feb 2017) · ICLR 2017 workshop submission · Readers: Everyone
Abstract: Long Short-Term Memory (LSTM) is widely used to solve sequence modeling problems such as image captioning. We find that LSTM cells are heavily redundant. We adopt network pruning to reduce this redundancy and introduce sparsity as a new form of regularization to reduce overfitting. We achieve better performance than the dense baseline while reducing the total number of parameters in the LSTM by more than 80%, from 2.1 million to only 0.4 million. Sparse LSTM improves the BLEU-4 score by 1.3 points on the Flickr8k dataset and the CIDEr score by 1.7 points on the MSCOCO dataset. We explore four types of pruning policies on LSTM, visualize the sparsity pattern and weight distribution of the sparse LSTM, and analyze the pros and cons of each policy.
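The abstract does not specify the four pruning policies, but a minimal sketch of one common approach, magnitude-based pruning, illustrates how an LSTM weight matrix can be sparsified to a target level (the matrix shape and the 80% sparsity target here are illustrative assumptions):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly a
    `sparsity` fraction of the weights are removed."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Hypothetical LSTM gate weight matrix, pruned to ~80% sparsity
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))
W_sparse = magnitude_prune(W, 0.8)
print(1.0 - np.count_nonzero(W_sparse) / W.size)  # close to 0.8
```

In practice, pruning is typically interleaved with retraining so the remaining weights can compensate for the removed ones; the sketch above shows only the one-shot masking step.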
TL;DR: We achieve better performance with 80% fewer parameters by introducing sparsity to LSTM
Keywords: Deep learning
Conflicts: nvidia.com, stanford.edu, tsinghua.edu.cn