Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: We present an alternate view to explain the success of LSTMs: the gates themselves are powerful recurrent models that provide more representational power than previously appreciated. We do this by showing that much of the LSTM's architecture can be removed, producing a restricted class of RNNs where the main recurrence computes an element-wise weighted sum of context-independent functions of the inputs. Experiments on a range of challenging NLP problems demonstrate that the simplified models work as well as the original LSTMs, strongly suggesting that the gates are doing much more in practice than just alleviating vanishing gradients.
  • TL;DR: Gates do all the heavy lifting in LSTMs by computing element-wise weighted sums, and removing the internal simple RNN does not degrade model performance.