- Abstract: Long Short-Term Memory (LSTM) is widely used to solve sequence modeling problems such as image captioning. We find that LSTM cells are heavily redundant. We adopt network pruning to reduce this redundancy and introduce sparsity as a new form of regularization that reduces overfitting. We achieve better performance than the dense baseline while reducing the total number of parameters in the LSTM by more than 80%, from 2.1 million to only 0.4 million. Sparse LSTM improves the BLEU-4 score by 1.3 points on the Flickr8k dataset and the CIDEr score by 1.7 points on the MSCOCO dataset. We explore four types of pruning policies for LSTM, visualize the sparsity pattern and weight distribution of the sparse LSTM, and analyze the pros and cons of each policy.
- TL;DR: We achieve better performance with 80% fewer parameters by introducing sparsity to LSTM
- Keywords: Deep learning
- Conflicts: nvidia.com, stanford.edu, tsinghua.edu.cn