A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs

Shayne Longpre; Sabeek Pradhan; Caiming Xiong; Richard Socher

19 Jun 2025 (modified: 21 Jul 2022); Submitted to ICLR 2017; Readers: Everyone

Abstract: LSTMs have become a basic building block for many deep NLP models. In recent years, many improvements and variations have been proposed for deep sequence models in general, and for LSTMs in particular. We propose and analyze a series of architectural modifications for LSTM networks that improve performance on text classification datasets. We observe compounding improvements over traditional LSTMs from Monte Carlo test-time model averaging, deep vector averaging (DVA), and residual connections, along with four other suggested modifications. Our analysis provides a simple, reliable, and high-quality baseline model.
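Monte Carlo test-time model averaging is commonly understood as keeping dropout active at inference, running several stochastic forward passes, and averaging the predicted class distributions. The NumPy sketch below illustrates that general idea under simplifying assumptions: the LSTM encoder is abstracted away as a fixed feature matrix, dropout is applied only to those features before a hypothetical linear classifier (`W`, `b`), and the names `mc_average_predict`, `p`, and `num_samples` are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dropout(x, p, rng):
    # Inverted dropout: zero each unit with probability p, rescale the survivors.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def mc_average_predict(features, W, b, p=0.5, num_samples=50, seed=0):
    """Average class probabilities over num_samples stochastic forward passes.

    `features` is a stand-in for an encoder output (assumed, e.g. a final
    LSTM state per example); W and b form a hypothetical linear classifier.
    """
    rng = np.random.default_rng(seed)
    probs = np.zeros((features.shape[0], W.shape[1]))
    for _ in range(num_samples):
        # Dropout stays ON at test time, so each pass samples a different sub-network.
        probs += softmax(dropout(features, p, rng) @ W + b)
    return probs / num_samples

# Usage: 4 examples, 8 features, 3 classes.
rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 3))
b = np.zeros(3)
avg = mc_average_predict(feats, W, b)
print(avg.shape)  # (4, 3)
```

Averaging probabilities rather than logits is one common choice; in a full model, dropout would typically also be active inside the encoder during these passes.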

TL;DR: Relatively simple augmentations to the LSTM, such as Monte Carlo test-time averaging, deep vector averaging, and residual connections, can yield massive accuracy improvements on text classification datasets.
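Two of the named augmentations can be illustrated structurally in a small NumPy sketch, under loudly stated assumptions. Residual connections are shown in their generic form, adding each stacked LSTM layer's input to its output (which requires matching dimensions). For deep vector averaging, this sketch assumes a reading in the spirit of deep averaging networks: mean-pool the per-timestep LSTM outputs, then pass the average through a small fully connected network; the paper's exact formulation may differ. All function names, the gate ordering, and the single ReLU hidden layer are hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(xs, Wx, Wh, b):
    """Run one LSTM layer over xs of shape (T, d_in); return outputs of shape (T, d_h).

    Wx: (d_in, 4*d_h), Wh: (d_h, 4*d_h), b: (4*d_h,). Gate order i, f, o, g is assumed.
    """
    d_h = Wh.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    outs = []
    for x_t in xs:
        z = x_t @ Wx + h @ Wh + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state / output at time t
        outs.append(h)
    return np.stack(outs)

def residual_lstm_stack(xs, layers):
    # Residual connection around each layer: output = input + LSTM(input).
    h = xs
    for Wx, Wh, b in layers:
        h = h + lstm_layer(h, Wx, Wh, b)
    return h

def deep_vector_average(hs, W1, b1, W2, b2):
    # Assumed DVA: average outputs over time, then a one-hidden-layer ReLU MLP.
    v = hs.mean(axis=0)
    v = np.maximum(0.0, v @ W1 + b1)
    return v @ W2 + b2

# Usage: 5 timesteps, width 6, two residual layers, 3 output classes.
rng = np.random.default_rng(0)
T, d, n_classes = 5, 6, 3
xs = rng.normal(size=(T, d))
layers = [(rng.normal(scale=0.1, size=(d, 4 * d)),
           rng.normal(scale=0.1, size=(d, 4 * d)),
           np.zeros(4 * d)) for _ in range(2)]
hs = residual_lstm_stack(xs, layers)
logits = deep_vector_average(hs, rng.normal(size=(d, d)), np.zeros(d),
                             rng.normal(size=(d, n_classes)), np.zeros(n_classes))
print(hs.shape, logits.shape)  # (5, 6) (3,)
```

Keeping every layer at the same width is what lets the residual addition work without a projection; a stack with changing widths would need an extra linear map on the skip path.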

Conflicts: salesforce.com, zoox.com, stanford.edu, cs.stanford.edu

Keywords: Natural language processing, Deep learning, Supervised Learning
