It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

ICLR 2026 Conference Submission 14820 Authors

19 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Test Time Memorization, Online Optimization, Recurrent Neural Networks
Abstract: Designing efficient and effective architectural backbones has been at the core of research efforts to enhance the capability of foundation models. Inspired by the human cognitive phenomenon of attentional bias, the natural tendency to prioritize certain events or stimuli, we reconceptualize neural architectures, including Transformers, Titans, and modern linear recurrent neural networks, as associative memory modules with attentional bias. We define and formalize attentional bias as the internal memory objective of deep learning architectures, and we show that existing architectures rely on the same attentional bias based on the $L_2$ loss. Going beyond the $L_2$ loss, we present a set of alternative attentional bias configurations along with their effective approximations. We then reinterpret forgetting mechanisms in modern deep learning architectures as a form of retention regularization. Building on these insights, we present Miras, a general framework for designing deep learning architectures based on the choice of attentional bias objective, retention gate, associative memory architecture, and memory learning algorithm. Our experiments show that different design choices yield models with varying strengths. Furthermore, special instances of Miras achieve exceptional performance on language modeling, commonsense reasoning, recall-intensive, and time-series tasks, outperforming Transformers and other modern linear recurrent models.
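To make the framing concrete, the sketch below shows one possible reading of the abstract's recipe: a linear associative memory trained online at test time by gradient descent on an $L_2$ attentional-bias objective, with the forgetting mechanism cast as a retention gate (weight decay). This is a minimal illustrative sketch, not the authors' implementation; the function and parameter names (`update_memory`, `eta`, `alpha`) are assumptions.

```python
import numpy as np

def update_memory(M, k, v, eta=0.1, alpha=0.01):
    """One online step on the L2 attentional-bias objective ||M k - v||^2,
    with a retention gate alpha acting as weight decay on the memory M."""
    err = M @ k - v                        # prediction error for key k
    grad = np.outer(err, k)                # gradient of 0.5 * ||M k - v||^2 w.r.t. M
    return (1.0 - alpha) * M - eta * grad  # retention regularization + gradient update

# Toy usage: memorize a stream of (key, value) pairs at test time.
d_k, d_v = 8, 4
rng = np.random.default_rng(0)
M = np.zeros((d_v, d_k))
for _ in range(100):
    k = rng.normal(size=d_k)
    v = rng.normal(size=d_v)
    M = update_memory(M, k, v)
```

Under this reading, swapping the $L_2$ objective for an alternative attentional bias, changing the retention gate, or replacing the linear memory and the gradient-descent learner corresponds to the four design axes of the Miras framework described above.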
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14820