Keywords: Attention, Associative Memory, Long-term Memory, Hybrid Models
Abstract: For more than a decade, there has been extensive research on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called the hidden state), attention allows attending to the entire context window, capturing the direct dependencies between all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a neural long-term memory module that learns to memorize historical context and helps attention attend to the current context while utilizing long-past information. We show that this neural memory has the advantage of fast, parallelizable training. From a memory perspective, we argue that attention, due to its limited context but accurate dependency modeling, acts as a short-term memory, while the neural memory, due to its ability to memorize the data, acts as a longer-term, more persistent memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants that address how memory can be effectively incorporated into the architecture. Our experimental results on language modeling, common-sense reasoning, and time-series tasks show that Titans are effective compared to baselines, and they scale effectively to larger context windows in needle-in-a-haystack tasks.
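To make the abstract's high-level idea concrete, here is a minimal, hypothetical sketch of combining windowed attention (short-term memory) with a learned recurrent summary of past chunks standing in for a long-term memory module. This is not the paper's actual Titans design; the module names, the GRU-based memory, and all shapes are assumptions for illustration only.

```python
# Illustrative sketch only: windowed attention over the current chunk, plus a
# compressed summary of long-past chunks injected as extra context.
import torch
import torch.nn as nn


class HybridMemoryBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, window: int = 128):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Hypothetical stand-in for a neural long-term memory: a GRU that
        # compresses each processed chunk into a fixed-size state.
        self.memory_rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        outputs, mem_state = [], None
        for start in range(0, T, self.window):
            chunk = x[:, start:start + self.window]
            if mem_state is not None:
                # Prepend the compressed long-past summary as an extra
                # key/value token for the current window's attention.
                context = torch.cat([mem_state.transpose(0, 1), chunk], dim=1)
            else:
                context = chunk
            attn_out, _ = self.attn(chunk, context, context)
            outputs.append(self.norm(chunk + attn_out))
            # Update the long-term summary with the current chunk.
            _, mem_state = self.memory_rnn(chunk, mem_state)
        return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    block = HybridMemoryBlock()
    y = block(torch.randn(2, 512, 256))
    print(y.shape)  # torch.Size([2, 512, 256])
```

The design choice illustrated here is the division of labor described in the abstract: attention handles precise dependencies within a limited window, while a fixed-size learned state carries information from the long past.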
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 5479