PUM-Net: Plastic Unified Memory Network with Associative Interaction for Long-Context State Space Models

ICLR 2026 Conference Submission 16691 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: State Space Model, Long Context, Internal Memory, External Memory
Abstract: Recent Mamba-family State Space Models (SSMs) leverage linear recurrent dynamics to achieve efficiency in long-sequence modeling. However, their exponential decay kernels inevitably cause rapid forgetting, making it difficult to recall information once sequence lengths far exceed the training horizon. Inspired by advances in Transformer-based architectures such as Native Sparse Attention (NSA), which employ internal chunk memory to alleviate long-term forgetting, we extend Mamba with a chunk-wise internal long-term memory that improves the retrieval of distant context. While this enhances in-context recall, a more fundamental challenge for long-context models is the ability to access and integrate vast external world knowledge without compromising efficiency. Existing retrieval-augmented generation (RAG) approaches attempt to address this by appending retrieved documents to queries, which substantially increases training cost and fails to fully integrate internal and external memory representations. To overcome these limitations, we propose the Plastic Unified Memory Network (PUM-Net), a unified dual-memory architecture that, for the first time, enables joint pre-training over both dynamic internal memory and static, pre-encoded external knowledge. This plastic unification allows the external memory to refine internal states during training, enabling bidirectional interaction without inflating sequence length, and thereby supports more effective long-context modeling, yielding substantial improvements on challenging long-range benchmarks.
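The abstract gives no implementation details, so the following is only a minimal sketch of the kind of dual-memory block it describes: a decaying linear recurrence, a chunk-wise internal memory bank, and a static, pre-encoded external memory that is read (not concatenated) so the sequence length never grows. All names (`DualMemoryBlock`, `n_internal_slots`, `chunk_size`, etc.) are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualMemoryBlock(nn.Module):
    """Hypothetical sketch: SSM-style recurrence + internal/external memory reads."""

    def __init__(self, d_model: int, n_internal_slots: int = 16):
        super().__init__()
        # Diagonal linear-recurrent transition with magnitude < 1 (exponential decay).
        self.log_decay = nn.Parameter(torch.zeros(d_model))
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Internal long-term memory: one summary slot written per chunk (ring buffer).
        self.register_buffer("internal_mem", torch.zeros(n_internal_slots, d_model))
        self.mem_write = nn.Linear(d_model, d_model)
        # Separate read queries for internal and (static) external memory.
        self.q_int = nn.Linear(d_model, d_model)
        self.q_ext = nn.Linear(d_model, d_model)

    def _read(self, h: torch.Tensor, mem: torch.Tensor, q_proj: nn.Linear) -> torch.Tensor:
        # Attention-style read over memory slots; leaves sequence length untouched.
        q = q_proj(h)                                   # (B, D)
        scores = q @ mem.T / mem.shape[-1] ** 0.5       # (B, n_slots)
        return F.softmax(scores, dim=-1) @ mem          # (B, D)

    def forward(self, x: torch.Tensor, external_mem: torch.Tensor, chunk_size: int = 64):
        # x: (B, L, D); external_mem: (n_ext_slots, D), pre-encoded and frozen.
        B, L, D = x.shape
        decay = torch.exp(-F.softplus(self.log_decay))  # elementwise decay in (0, 1)
        h = x.new_zeros(B, D)
        outs, write_ptr = [], 0
        for start in range(0, L, chunk_size):
            chunk = x[:, start:start + chunk_size]
            for t in range(chunk.shape[1]):
                # Linear recurrence with exponential decay, then memory reads.
                h = decay * h + self.in_proj(chunk[:, t])
                h = h + self._read(h, self.internal_mem, self.q_int)  # distant-context recall
                h = h + self._read(h, external_mem, self.q_ext)       # external knowledge
                outs.append(self.out_proj(h))
            # Write a chunk summary into the internal memory bank.
            with torch.no_grad():
                slot = write_ptr % self.internal_mem.shape[0]
                self.internal_mem[slot] = self.mem_write(h.mean(0)).detach()
                write_ptr += 1
        return torch.stack(outs, dim=1)                 # (B, L, D)
```

The key design point this sketch illustrates is the one the abstract emphasizes: external knowledge enters by refining the hidden state through a read operation rather than by appending retrieved documents to the input, so training cost does not scale with the size of the retrieved context.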
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16691