Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers

ICLR 2026 Conference Submission 22528 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Hippocampus, grid cell, spatial reasoning, relational memory, transformer
TL;DR: We propose Hippoformer, a hybrid architecture that integrates hippocampal-inspired structured spatial memory with transformers, enabling scalable spatial reasoning and outperforming existing models in 2D and 3D grid tasks.
Abstract: Transformers form the foundation of modern generative AI, yet their key–value memory lacks inherent spatial priors, constraining their capacity for spatial reasoning. In contrast, neuroscience points to the hippocampal–entorhinal system, in which the medial entorhinal cortex provides structural codes and the hippocampus binds them with sensory codes to enable flexible spatial inference. However, existing hippocampus models such as the Tolman-Eichenbaum Machine (TEM) suffer from inefficiencies due to outer-product operations or context-length bottlenecks in self-attention, limiting their scalability and integration into modern deep learning frameworks. To bridge this gap, we propose mm-TEM, an efficient and scalable structural spatial memory model that leverages meta-MLP relational memory to improve training efficiency, form grid-like representations, and reveal a novel link between prediction horizon and grid scales. Extensive evaluation shows its strong generalization on long sequences, large-scale environments, and multi-step prediction, with analyses confirming that its advantages stem from an explicit understanding of spatial structure. Building on this, we introduce Hippoformer, which integrates mm-TEM with a Transformer to combine structural spatial memory with precise working memory and abstraction, achieving superior generalization in both 2D and 3D prediction tasks and highlighting the potential of hippocampus-inspired architectures for complex domains. Overall, Hippoformer represents an initial step toward seamlessly embedding structured spatial memory into foundation architectures, offering a potentially scalable path to endow deep learning models with spatial intelligence.
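To make the integration pattern described in the abstract concrete, the sketch below shows one plausible way a structural spatial-memory module (standing in for mm-TEM) could be fused with a standard Transformer encoder: structural codes computed from the action/movement sequence are concatenated with sensory token embeddings before self-attention. All module names (StructuralMemoryStub, HippoformerSketch), dimensions, the GRU stand-in for the meta-MLP relational memory, and the fusion-by-concatenation choice are assumptions for illustration only, not the authors' implementation.

```python
import torch
import torch.nn as nn


class StructuralMemoryStub(nn.Module):
    """Placeholder for an mm-TEM-like module: maps an action sequence to
    structural codes (e.g., grid-like spatial embeddings). A GRU is used here
    purely as a stand-in for the paper's meta-MLP relational memory."""

    def __init__(self, action_dim: int, struct_dim: int):
        super().__init__()
        self.rnn = nn.GRU(action_dim, struct_dim, batch_first=True)

    def forward(self, actions: torch.Tensor) -> torch.Tensor:
        # actions: (batch, seq_len, action_dim) -> (batch, seq_len, struct_dim)
        codes, _ = self.rnn(actions)
        return codes


class HippoformerSketch(nn.Module):
    """Fuses structural codes with sensory token embeddings, then applies a
    standard Transformer encoder for working memory and abstraction."""

    def __init__(self, vocab_size=128, d_model=64, action_dim=4,
                 struct_dim=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.structural = StructuralMemoryStub(action_dim, struct_dim)
        self.fuse = nn.Linear(d_model + struct_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        sensory = self.embed(tokens)           # (B, T, d_model) sensory codes
        structure = self.structural(actions)   # (B, T, struct_dim) structural codes
        x = self.fuse(torch.cat([sensory, structure], dim=-1))
        return self.head(self.encoder(x))      # next-observation logits


# Toy usage: predict observations along a random walk on a grid environment.
model = HippoformerSketch()
tokens = torch.randint(0, 128, (2, 16))    # observed sensory tokens
actions = torch.randn(2, 16, 4)            # movement actions at each step
logits = model(tokens, actions)            # (2, 16, 128)
```

The design choice illustrated here is only the interface: the structural memory is conditioned on actions rather than observations, and the Transformer sees both streams, so spatial structure and precise content memory can be learned by separate components.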
Primary Area: applications to neuroscience & cognitive science
Submission Number: 22528