Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

Omer Levy; Kenton Lee; Nicholas FitzGerald; Luke Zettlemoyer

Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer

15 Feb 2018 (modified: 22 Jun 2025)ICLR 2018 Conference Blind SubmissionReaders: Everyone

Abstract: Long short-term memory networks (LSTMs) were introduced to combat vanishing gradients in simple recurrent neural networks (S-RNNs) by augmenting them with additive recurrent connections controlled by gates. We present an alternate view to explain the success of LSTMs: the gates themselves are powerful recurrent models that provide more representational power than previously appreciated. We do this by showing that the LSTM's gates can be decoupled from the embedded S-RNN, producing a restricted class of RNNs where the main recurrence computes an element-wise weighted sum of context-independent functions of the inputs. Experiments on a range of challenging NLP problems demonstrate that the simplified gate-based models work substantially better than S-RNNs, and often just as well as the original LSTMs, strongly suggesting that the gates are doing much more in practice than just alleviating vanishing gradients.

TL;DR: Gates do all the heavy lifting in LSTMs by computing element-wise weighted sums, and removing the internal simple RNN does not degrade model performance.

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/long-short-term-memory-as-a-dynamically/code)

17 Replies

Loading