Bounded Working Memory for LLMs: Reproducing Human Recall Dynamics

ICLR 2026 Conference Submission 20058 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: memory, chunking, LLMs
TL;DR: We develop a chunking-based memory module for LLMs modeled on working memory and Miller's number.
Abstract: We introduce a cognitively inspired working memory module for large language models (LLMs) that enables efficient narrative recall under capacity constraints. Our approach decomposes input text into structured memory chunks using four methods—semantic, phrase, sentence, and schematic chunking—and integrates prioritization strategies based on salience, connectivity, and temporal decay. These mechanisms enforce a bounded memory capacity, inspired by Miller’s number, while preserving information critical for downstream recall. We evaluate the framework on the Naturalistic Free Recall dataset, where models must reconstruct long-form narratives from compressed memory representations. Memory-augmented LLMs achieve higher semantic similarity to human recall transcripts than random baselines, while exhibiting structured retrieval effects such as primacy and recency. These results demonstrate that chunk-based working memory improves the plausibility and efficiency of LLM recall, offering a scalable approach for constrained-context reasoning and memory alignment.
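The pipeline described in the abstract (chunk the narrative, score chunks by salience, connectivity, and temporal decay, then enforce a Miller-style capacity bound) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the sentence-level chunker, the IDF-style salience proxy, the token-overlap connectivity measure, the decay rate, and the 0.5/0.3/0.2 weights are hypothetical choices, not the authors' implementation.

```python
import math
import re
from dataclasses import dataclass, field

CAPACITY = 7  # bounded capacity inspired by Miller's number (7 +/- 2)


@dataclass
class Chunk:
    text: str
    position: int                      # order of arrival, used for temporal decay
    tokens: set = field(default_factory=set)


def sentence_chunks(narrative: str) -> list[Chunk]:
    """Sentence-level chunking (one of several possible chunking schemes)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", narrative) if s.strip()]
    return [Chunk(text=s, position=i, tokens=set(re.findall(r"\w+", s.lower())))
            for i, s in enumerate(sentences)]


def salience(chunk: Chunk, doc_freq: dict, n_chunks: int) -> float:
    """Rarer tokens are treated as more salient (simple IDF-style proxy)."""
    return sum(math.log(n_chunks / doc_freq[t]) for t in chunk.tokens) / max(len(chunk.tokens), 1)


def connectivity(chunk: Chunk, others: list[Chunk]) -> float:
    """Average token overlap (Jaccard) with the other chunks."""
    overlaps = [len(chunk.tokens & o.tokens) / max(len(chunk.tokens | o.tokens), 1)
                for o in others if o is not chunk]
    return sum(overlaps) / max(len(overlaps), 1)


def decay(chunk: Chunk, current_position: int, rate: float = 0.1) -> float:
    """Exponential temporal decay: older chunks are penalized."""
    return math.exp(-rate * (current_position - chunk.position))


def prioritize(chunks: list[Chunk], capacity: int = CAPACITY) -> list[Chunk]:
    """Keep only the top-`capacity` chunks by a weighted priority score."""
    n = len(chunks)
    doc_freq: dict = {}
    for c in chunks:
        for t in c.tokens:
            doc_freq[t] = doc_freq.get(t, 0) + 1

    def score(c: Chunk) -> float:
        # Hypothetical weighting of the three prioritization signals.
        return 0.5 * salience(c, doc_freq, n) + 0.3 * connectivity(c, chunks) + 0.2 * decay(c, n)

    kept = sorted(chunks, key=score, reverse=True)[:capacity]
    return sorted(kept, key=lambda c: c.position)  # restore narrative order for the prompt


if __name__ == "__main__":
    story = "Ann found a map in the attic. The map showed a lake. She packed a bag and left at dawn."
    memory = prioritize(sentence_chunks(story))
    prompt = "Retell the story from these memory chunks:\n" + "\n".join(c.text for c in memory)
    print(prompt)
```

In this sketch, the surviving chunks are re-serialized in narrative order and handed to the LLM as a compressed context, which is one plausible way to realize recall from a bounded memory.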
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20058