Extending LLM Context via Associative Recurrent Memory

Published: 03 Mar 2026, Last Modified: 06 Mar 2026
NFAM 2026 Poster
License: CC BY 4.0
Keywords: associative memory, recurrent transformer, LLM
Abstract: Closed-source LLMs support context windows of up to 1M tokens and beyond, while smaller open-source models are still limited to 32K-128K tokens. However, some tasks require a small local LLM to avoid any data leaks, and many of these tasks also demand domain-specific knowledge and long-context understanding. To address this gap, we introduce two domain-specific long-context datasets, ManyTypes-long and GovReport-long, and present a practical recipe for extending short-context LLMs using the Associative Recurrent Memory Transformer (ARMT) architecture. Finally, we analyze the associative memory in trained ARMT models and show that it primarily benefits from representations in the middle and upper layers of the transformer, which allows us to shrink the models by removing redundant associative memory from the other layers. Our results demonstrate an effective approach to enabling long-context capabilities in small, privacy-preserving LLMs for domain-specific tasks.
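The associative memory the abstract refers to can be pictured as key-value binding via outer products, in the style of linear-attention fast-weight memories. The sketch below is illustrative only: the function names and the simplified erase-then-write update rule are assumptions for exposition, not the paper's exact ARMT implementation.

```python
import numpy as np

def write(mem, key, value):
    # Bind `value` to `key` with an outer product. Subtracting the value
    # currently associated with the key lets the same key be rewritten
    # without accumulating stale content.
    old = mem @ key                      # value previously bound to this key
    return mem + np.outer(value - old, key)

def read(mem, key):
    # Retrieve the value associated with a (unit-norm) key.
    return mem @ key

d = 4
mem = np.zeros((d, d))
k = np.eye(d)[0]                         # unit key vector
v = np.array([1.0, 2.0, 3.0, 4.0])

mem = write(mem, k, v)
assert np.allclose(read(mem, k), v)      # exact recall for orthonormal keys

v2 = np.array([5.0, 0.0, 0.0, 0.0])
mem = write(mem, k, v2)                  # rebind the same key
assert np.allclose(read(mem, k), v2)     # old value is overwritten, not mixed
```

The fixed-size matrix `mem` is what makes the recurrence constant-memory over arbitrarily long inputs; the paper's layer analysis asks at which transformer depths such a memory is actually worth keeping.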
Submission Number: 22