Image Captioning with Sparse LSTM

03 May 2025 (modified: 17 Feb 2017), ICLR 2017, Readers: Everyone
Abstract: Long Short-Term Memory (LSTM) is widely used to solve sequence modeling problems such as image captioning. We find that LSTM cells are heavily redundant. We adopt network pruning to reduce this redundancy and introduce sparsity as a new form of regularization to reduce overfitting. We achieve better performance than the dense baseline while reducing the total number of parameters in the LSTM by more than 80%, from 2.1 million to only 0.4 million. Sparse LSTM improves the BLEU-4 score by 1.3 points on the Flickr8k dataset and the CIDEr score by 1.7 points on the MSCOCO dataset. We explore four types of pruning policies on LSTM, visualize the sparsity pattern and weight distribution of the sparse LSTM, and analyze the pros and cons of each policy.
TL;DR: We achieve better performance with 80% fewer parameters by introducing sparsity to LSTM
Keywords: Deep learning
Conflicts: nvidia.com, stanford.edu, tsinghua.edu.cn
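
For readers unfamiliar with network pruning, the idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code, and the abstract does not say which four pruning policies were studied; the sketch assumes plain magnitude-based pruning, where the smallest-magnitude weights of a matrix are set to zero. The function names `magnitude_prune` and `density` are hypothetical. For scale, going from 2.1 million to 0.4 million parameters corresponds to removing roughly 81% of the weights.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of a 2-D weight matrix.

    Assumption: simple global magnitude pruning, used only for illustration.
    weights:  list of rows, each a list of floats
    sparsity: fraction of entries to remove, in [0, 1]
    Returns (pruned_weights, mask), where mask holds 1 for kept entries.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        # Nothing to prune: keep every weight.
        return [list(row) for row in weights], [[1] * len(row) for row in weights]

    # Every weight at or below this magnitude is removed. Ties at the
    # threshold are all pruned, so slightly more than n_prune may go.
    threshold = magnitudes[n_prune - 1]
    mask = [[0 if abs(w) <= threshold else 1 for w in row] for row in weights]
    pruned = [[w if m else 0.0 for w, m in zip(row, mrow)]
              for row, mrow in zip(weights, mask)]
    return pruned, mask


def density(mask):
    """Fraction of weights that survive pruning."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


# Toy 2x4 matrix standing in for one LSTM gate's weights (made-up values).
W = [[0.9, -0.05, 0.3, 0.01],
     [-0.7, 0.02, -0.04, 0.5]]
pruned, mask = magnitude_prune(W, 0.5)
print(pruned)
print(density(mask))
```

In a training setup, a mask like this would typically be reapplied after each weight update so that pruned connections stay at zero while the remaining weights are retrained; whether the paper follows that schedule is not stated in the abstract.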
