Compressive Transformers for Long-Range Sequence Modelling

Jack W. Rae; Anna Potapenko; Siddhant M. Jayakumar; Chloe Hillier; Timothy P. Lillicrap

Published: 20 Dec 2019, Last Modified: 12 Oct 2025 | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: A long-range transformer with a compressive memory achieves state-of-the-art results on the WikiText-103 and Enwik8 language modelling benchmarks; we also release PG-19, a new book-level language modelling benchmark.

Abstract: We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
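
As a rough illustration of what "compresses past memories" could mean in practice, the sketch below keeps a fixed-size FIFO memory of recent states and, instead of discarding the oldest ones, pools them into a smaller compressed memory. The abstract does not specify the compression function or any sizes, so everything here is an assumption: mean pooling stands in for the compression step, and the names `mean_pool`, `update_memories`, `mem_size`, `comp_size`, and `rate` are hypothetical. It only shows the bookkeeping, not the attention over the two memories or any learned component.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(block: List[Vector]) -> Vector:
    """Average a block of vectors element-wise into a single vector."""
    n = len(block)
    return [sum(col) / n for col in zip(*block)]


def update_memories(
    memory: List[Vector],
    compressed: List[Vector],
    new_states: List[Vector],
    mem_size: int,
    comp_size: int,
    rate: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Append new states to a FIFO memory and compress whatever overflows.

    Assumption: the oldest states that no longer fit in `memory` are
    mean-pooled in groups of `rate` and appended to `compressed`, which is
    itself a FIFO holding at most `comp_size` entries.
    """
    memory = memory + new_states
    overflow = max(0, len(memory) - mem_size)
    evicted, memory = memory[:overflow], memory[overflow:]
    # Pool each group of `rate` evicted states into one compressed slot.
    pooled = [mean_pool(evicted[i:i + rate]) for i in range(0, len(evicted), rate)]
    compressed = compressed + pooled
    # Keep only the most recent `comp_size` compressed slots.
    compressed = compressed[max(0, len(compressed) - comp_size):]
    return memory, compressed


# Usage: feed three segments of four one-dimensional states each.
memory: List[Vector] = []
compressed: List[Vector] = []
for step in range(3):
    segment = [[float(step * 4 + i)] for i in range(4)]
    memory, compressed = update_memories(
        memory, compressed, segment, mem_size=4, comp_size=2, rate=2
    )
```

With a compression rate of 2, each compressed slot summarises two older states, so the same number of slots covers twice the time span of the uncompressed memory, at the cost of detail.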

Keywords: memory, language modeling, transformer, compression

Code: [6 community implementations](https://paperswithcode.com/paper/?openreview=SylKikSYDH)

Data: [PG-19](https://paperswithcode.com/dataset/pg-19), [Billion Word Benchmark](https://paperswithcode.com/dataset/billion-word-benchmark), [BookCorpus](https://paperswithcode.com/dataset/bookcorpus), [CBT](https://paperswithcode.com/dataset/cbt), [Hutter Prize](https://paperswithcode.com/dataset/hutter-prize), [LAMBADA](https://paperswithcode.com/dataset/lambada), [NarrativeQA](https://paperswithcode.com/dataset/narrativeqa), [WikiText-103](https://paperswithcode.com/dataset/wikitext-103), [WikiText-2](https://paperswithcode.com/dataset/wikitext-2)

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/compressive-transformers-for-long-range/code)
