It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

ICLR 2026 Conference Submission 14820 Authors

19 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Test Time Memorization, Online Optimization, Recurrent Neural Networks
Abstract: Designing efficient and effective architectural backbones has been at the core of research efforts to enhance the capability of foundation models. Inspired by the human cognitive phenomenon of attentional bias, the natural tendency to prioritize certain events or stimuli, we reconceptualize neural architectures, including Transformers, Titans, and modern linear recurrent neural networks, as associative memory modules with attentional bias. We define and formalize attentional bias as the internal memory objective of deep learning architectures, and we show that existing architectures rely on the same attentional bias based on the $L_2$ loss. Going beyond the $L_2$ loss, we present a set of alternative attentional bias configurations along with their effective approximations. We then reinterpret forgetting mechanisms in modern deep learning architectures as a form of retention regularization. Building on these insights, we present Miras, a general framework for designing deep learning architectures based on the choice of attentional bias objective, retention gate, associative memory architecture, and memory learning algorithm. Our experiments show that different design choices yield models with varying strengths. Furthermore, special instances of Miras achieve exceptional performance on language modeling, commonsense reasoning, recall-intensive, and time-series tasks, outperforming Transformers and other modern linear recurrent models.
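To make the framing concrete, the sketch below shows one possible reading of the abstract's recipe: a linear associative memory trained online at test time by gradient descent on an $L_2$ attentional-bias objective, with the forgetting mechanism cast as a retention gate (weight decay). This is a minimal illustrative sketch, not the authors' implementation; the function and parameter names (`update_memory`, `eta`, `alpha`) are assumptions.

```python
import numpy as np

def update_memory(M, k, v, eta=0.1, alpha=0.01):
    """One online step on the L2 attentional-bias objective ||M k - v||^2,
    with a retention gate alpha acting as weight decay on the memory M."""
    err = M @ k - v                        # prediction error for key k
    grad = np.outer(err, k)                # gradient of 0.5 * ||M k - v||^2 w.r.t. M
    return (1.0 - alpha) * M - eta * grad  # retention regularization + gradient update

# Toy usage: memorize a stream of (key, value) pairs at test time.
d_k, d_v = 8, 4
rng = np.random.default_rng(0)
M = np.zeros((d_v, d_k))
for _ in range(100):
    k = rng.normal(size=d_k)
    v = rng.normal(size=d_v)
    M = update_memory(M, k, v)
```

Under this reading, swapping the $L_2$ objective for an alternative attentional bias, changing the retention gate, or replacing the linear memory and the gradient-descent learner corresponds to the four design axes of the Miras framework described above.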
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14820