Mogrifier LSTM

Gábor Melis; Tomáš Kočiský; Phil Blunsom

Published: 20 Dec 2019, Last Modified: 22 Jun 2025, ICLR 2020 Conference Blind Submission, Readers: Everyone

Keywords: lstm, language modelling

TL;DR: An LSTM extension with state-of-the-art language modelling results.

Abstract: Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3–4 perplexity points on Penn Treebank and Wikitext-2, and 0.01–0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
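The mutual gating described in the abstract can be illustrated with a small sketch. This is a minimal NumPy illustration, not the deepmind/lamb implementation. It assumes the formulation given in the full paper: before each ordinary LSTM step, the input x and the previous output h are rescaled alternately over r rounds, x by 2·σ of a linear map of h on the first, third, ... rounds and h by 2·σ of a linear map of x on the second, fourth, ... rounds, while the cell state is left ungated. Because the gates on x depend on h, the input the LSTM actually sees depends on the context, which is one way to read the "context-dependent transition function" view. The function names `mogrify`, `lstm_step` and `mogrifier_lstm_step`, the matrix names `Q` and `R`, the LSTM gate ordering, and the use of full rather than low-rank matrices are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mogrify(x, h, Q, R):
    """Mutually gate input x and previous output h for r = len(Q) + len(R) rounds.

    Round k (0-based) rescales x using h when k is even (matrix Q[k // 2])
    and rescales h using x when k is odd (matrix R[k // 2]).
    The factor 2 keeps each gate in (0, 2), so an all-zero matrix gives an identity gate.
    """
    r = len(Q) + len(R)
    assert len(Q) == (r + 1) // 2 and len(R) == r // 2, "rounds must alternate, starting with x"
    for k in range(r):
        if k % 2 == 0:
            x = 2.0 * sigmoid(Q[k // 2] @ h) * x  # Q[j] has shape (d_x, H)
        else:
            h = 2.0 * sigmoid(R[k // 2] @ x) * h  # R[j] has shape (H, d_x)
    return x, h


def lstm_step(x, h, c, W, U, b):
    """Plain LSTM cell; W: (4H, d_x), U: (4H, H), b: (4H,). Gate order i, f, g, o is a convention."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
    g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new


def mogrifier_lstm_step(x, h, c, W, U, b, Q, R):
    """Mogrify (x, h) first, then run the unchanged LSTM cell; c is not gated."""
    x_m, h_m = mogrify(x, h, Q, R)
    return lstm_step(x_m, h_m, c, W, U, b)


# Usage: one step with r = 5 rounds (3 updates of x, 2 updates of h).
rng = np.random.default_rng(0)
d_x, H, r = 3, 4, 5
Q = [rng.normal(size=(d_x, H)) for _ in range((r + 1) // 2)]
R = [rng.normal(size=(H, d_x)) for _ in range(r // 2)]
W, U, b = rng.normal(size=(4 * H, d_x)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
x, h, c = rng.normal(size=d_x), np.zeros(H), np.zeros(H)
h, c = mogrifier_lstm_step(x, h, c, W, U, b, Q, R)
print(h.shape, c.shape)  # (4,) (4,)
```

A side effect of the 2·σ scaling worth noting: with all-zero `Q` and `R` (or with r = 0) every gate equals 1, so the sketch reduces exactly to the plain LSTM step.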

Code: [deepmind/lamb](https://github.com/deepmind/lamb) + [2 community implementations](https://paperswithcode.com/paper/?openreview=SJe5P6EYvS)

Data: [Hutter Prize](https://paperswithcode.com/dataset/hutter-prize), [Penn Treebank](https://paperswithcode.com/dataset/penn-treebank), [WikiText-2](https://paperswithcode.com/dataset/wikitext-2)

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/mogrifier-lstm/code)
