TL;DR: We introduce Artificial Hippocampus Networks (AHNs), which transform lossless memory into compressed form for efficient long-context modeling.
Abstract: Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework for artificial neural networks. Our method maintains a sliding window of the Transformer's KV cache as lossless short-term memory, while a learnable module termed Artificial Hippocampus Network (AHN) recurrently compresses out-of-window information into a fixed-size compact long-term memory. To validate this framework, we instantiate AHNs using modern RNN-like architectures, including Mamba2, DeltaNet, and GatedDeltaNet, to augment open-weight base LLMs. We also propose an efficient self-distillation method in which all of the base model's parameters are frozen and only the AHN parameters are optimized. For inference, our method sets a large default sliding window size of 32k for attention, and AHNs activate only when the sequence length exceeds the 32k window, addressing the quadratic-complexity issue of attention that emerges at that scale. Extensive experiments on the long-context benchmarks LV-Eval and InfiniteBench demonstrate that AHN-augmented models consistently outperform sliding window baselines and achieve performance comparable to, or even better than, that of full-attention models, while substantially reducing computational and memory requirements. For instance, augmenting Qwen2.5-3B-Instruct with AHNs reduces inference FLOPs by 40.5% and memory cache by 74.0%, while improving its average score on LV-Eval (128k sequence length) from 4.41 to 5.88.
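As a rough illustration of the memory scheme described in the abstract, the sketch below keeps the most recent key/value pairs exactly in a sliding window and folds each evicted pair into a fixed-size state. It is a toy under stated assumptions, not the authors' implementation: the class name `WindowedMemory`, the write strength `beta`, and the plain delta-rule update (loosely in the spirit of DeltaNet) are all hypothetical choices made for illustration, and the real AHN modules are learned networks attached to an LLM.

```python
from collections import deque


class WindowedMemory:
    """Toy sketch: lossless sliding window plus a fixed-size compressed state."""

    def __init__(self, window_size, dim, beta=0.5):
        self.window_size = window_size
        self.dim = dim
        self.beta = beta  # hypothetical write strength, not taken from the paper
        self.window = deque()  # recent (key, value) pairs, stored exactly
        # Fixed-size state S (dim x dim); its size does not grow with sequence length.
        self.state = [[0.0] * dim for _ in range(dim)]
        self.num_compressed = 0

    def _compress(self, key, value):
        # Delta-rule style write (an assumed stand-in for an AHN update):
        # S <- S + beta * (v - S k) k^T
        pred = self.read_compressed(key)
        for i in range(self.dim):
            err = value[i] - pred[i]
            for j in range(self.dim):
                self.state[i][j] += self.beta * err * key[j]
        self.num_compressed += 1

    def append(self, key, value):
        # New pairs always enter the lossless window first.
        self.window.append((key, value))
        # Once the window overflows, the oldest pair is compressed, not dropped.
        if len(self.window) > self.window_size:
            old_key, old_value = self.window.popleft()
            self._compress(old_key, old_value)

    def read_compressed(self, query):
        # Read from long-term memory as the matrix-vector product S q.
        return [
            sum(self.state[i][j] * query[j] for j in range(self.dim))
            for i in range(self.dim)
        ]


mem = WindowedMemory(window_size=2, dim=2, beta=1.0)
mem.append([1.0, 0.0], [3.0, 4.0])
mem.append([0.0, 1.0], [5.0, 6.0])
mem.append([1.0, 0.0], [7.0, 8.0])  # window overflows; the first pair is compressed
print(len(mem.window), mem.num_compressed, mem.read_compressed([1.0, 0.0]))
```

In this toy, the compression path runs only after the window is full, which mirrors the abstract's statement that AHNs activate only when the sequence exceeds the window. The compressed state stays at `dim x dim` entries regardless of how many pairs have been evicted.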
Lay Summary: Large language models are becoming increasingly useful for reading and reasoning over long documents, conversations, books, and other extended inputs. However, remembering everything in a long input exactly can be very expensive: as the input gets longer, the model needs more computation and memory, which makes long-context understanding difficult and costly.
This paper introduces Artificial Hippocampus Networks, a new way to help language models handle long inputs more efficiently. The idea is inspired by human memory. Instead of keeping every past detail in full, the model keeps recent information exactly, while older information is gradually compressed into a compact long-term memory. This allows the model to still use useful information from earlier parts of the input without storing the entire history in an expensive form.
We show that this approach can be added to existing open language models with only a small number of extra trainable parameters. In experiments on long-context benchmarks, models equipped with Artificial Hippocampus Networks use substantially less computation and memory than standard long-context attention, while achieving comparable or better performance. These results suggest a practical path toward more efficient AI systems that can process very long texts.
Link To Code: https://github.com/ByteDance-Seed/AHN
Primary Area: Deep Learning->Large Language Models
Keywords: large language models, long-context modeling
Originally Submitted PDF: pdf
Submission Number: 10560