Episodic Knowledge Binding: a New Challenge for LLM Continual Learning

ICLR 2026 Conference Submission12820 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Continual Learning, Episodic Memory, Knowledge Binding, Catastrophic Forgetting, Large Language Models, Sequential Learning, Multi-Event Retrieval, Benchmark
TL;DR: We characterize a new challenge in LLM continual learning: models trained on separate episodic events fail to semantically bind them. We provide a benchmark and a human-inspired baseline, which we term Generative Cued Replay.
Abstract: Large language models (LLMs) excel at learning individual facts but fail at a fundamental aspect of human cognition: binding related episodes through shared elements. Unlike humans, who effortlessly retrieve all encounters with a person or visits to a location after learning about each separately, we demonstrate through controlled experiments that LLMs trained on single-event question-answering pairs cannot generalize to exhaustive multi-event retrieval. We formalize Episodic Knowledge Binding as the challenge of retrieving multiple related episodes when training lacks explicit multi-event supervision. Unlike catastrophic forgetting, where models lose previously learned information, this binding failure persists even when training on aggregated data without temporal confounds, showing that models do not spontaneously develop multi-event retrieval from separate training points. Using synthetic episodic narratives, we reveal a consistent binding gap across model scales (3B--13B and GPT-4.1) and narrative lengths (10--100 events): models attain high accuracy when entities appear in single events, but performance collapses when multiple related episodes must be retrieved. We find that (unsurprisingly) binding becomes harder with more events and that (more surprisingly) model scaling offers only minimal relief within our tested range. To address this problem, we propose Generative Cued Replay (GCR), which (i) inherently operates in a continual learning manner and, inspired by hippocampal memory consolidation, (ii) queries the model's parametric memory for related episodes when processing new events, and (iii) synthesizes multi-event training data without storing past episodes at each new training step. This approach significantly improves binding without architectural changes, offering a practical alternative to exhaustive multi-event supervision, which is both computationally infeasible and inherently more rigid.
We release our Episodic Knowledge Binding benchmark to enable future research on this fundamental capability that LLMs currently lack.
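The GCR loop described in steps (i)--(iii) of the abstract can be illustrated with a minimal toy sketch. Everything below is an assumption for illustration only: the `Event` class, `toy_recall`, and the dict standing in for the model's parametric memory are hypothetical stand-ins, not the paper's implementation. The point is the control flow: each new event cues recall of prior related episodes, which are combined into a synthetic multi-event training example, so no explicit episode buffer is kept by the replay step itself.

```python
# Hypothetical sketch of Generative Cued Replay (GCR): on each new event,
# query (a stand-in for) the model's parametric memory with the shared
# entity as a cue, then synthesize a multi-event training example.
from dataclasses import dataclass


@dataclass
class Event:
    entity: str       # shared element (e.g. a person or place) used as the recall cue
    description: str  # the episode itself


def generative_cued_replay(event, recall_fn):
    """Cue recall of prior episodes about the same entity and emit one
    synthetic multi-event QA training example (illustrative format)."""
    recalled = recall_fn(event.entity)  # stands in for querying parametric memory
    episodes = recalled + [event.description]
    return {
        "question": f"List all episodes involving {event.entity}.",
        "answer": "; ".join(episodes),
    }


# Toy stand-in for parametric memory: a plain dict the "model" can recall from.
memory = {}


def toy_recall(entity):
    return memory.get(entity, [])


events = [
    Event("Alice", "Alice visited the museum"),
    Event("Bob", "Bob fixed the fence"),
    Event("Alice", "Alice adopted a cat"),
]

examples = []
for ev in events:
    examples.append(generative_cued_replay(ev, toy_recall))
    memory.setdefault(ev.entity, []).append(ev.description)  # "consolidate" the event
```

After the third event, the synthesized example binds both Alice episodes into one answer, which is exactly the multi-event supervision signal the paper argues single-event training lacks.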
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12820