Keywords: model evaluation, long-context language models, working memory limitations, contextual interference, in-context learning, proactive interference, robustness & reliability, top-down control, cognitive-science-inspired evaluation
TL;DR: LLMs fail to retrieve recent updates when earlier input gets in the way, revealing working-memory-like limits beyond a model's context length.
Abstract: Information retrieval in Large Language Models (LLMs) is increasingly recognized as intertwined with generation capabilities rather than mere lookup. While longer contexts are often assumed to improve retrieval, the mechanics of intra-context interference, as instantiated in the MRCR test, remain understudied. To address this, we adapt the proactive interference (PI) paradigm from cognitive science, in which earlier information disrupts recall of newer updates. In humans, susceptibility to such interference is inversely linked to working memory capacity. We introduce PI-LLM, an evaluation that measures LLM working memory by sequentially streaming co-referenced key–value updates, rebinding the same key to multiple values, and then querying only the final values. Although these final values are clearly positioned just before the query, LLM retrieval accuracy declines log-linearly toward zero as co-referenced interference accumulates; errors arise from retrieving previously overwritten values. Attempts to mitigate interference via prompt engineering (e.g., instructing models to ignore earlier input) yield limited success. These findings reveal a fundamental constraint on LLMs' ability to disentangle interference and flexibly manipulate bound information, suggesting a working memory bottleneck beyond mere context access.
PI-LLM bridges (i) LLM performance on MRCR tests and (ii) studies of entity binding in mechanistic interpretations of LLMs, and provides a cognitive-science-inspired measurement of LLMs' working-memory-like capacity.
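The key–value update paradigm described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the key names, value ranges, number of update rounds, and query wording are all assumptions for demonstration.

```python
import random

def make_pi_stream(n_keys=3, n_updates=4, seed=0):
    """Build a proactive-interference stream: each key is sequentially
    rebound to several values; only the most recent binding per key is
    the correct answer at query time."""
    rng = random.Random(seed)
    keys = [f"key{i}" for i in range(n_keys)]
    stream, final = [], {}
    for _ in range(n_updates):          # rounds of co-referenced updates
        for k in keys:
            v = rng.randint(0, 999)
            stream.append(f"{k} = {v}")
            final[k] = v                # later bindings overwrite earlier ones
    prompt = "\n".join(stream) + "\nReport the CURRENT value of each key."
    return prompt, final

def score(answers, final):
    """Fraction of keys whose final (most recent) value was retrieved."""
    return sum(answers.get(k) == v for k, v in final.items()) / len(final)

prompt, final = make_pi_stream()
# An ideal retriever that always returns the latest binding scores 1.0;
# the paper's finding is that LLM accuracy instead declines log-linearly
# as the number of overwritten (interfering) bindings grows.
assert score(final, final) == 1.0
```

Scaling `n_updates` while holding the query fixed is what accumulates proactive interference: every earlier binding of a key becomes a distractor for its final value.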
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 18089