Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

Boqin Yuan; Yue Su; Kun Yao

Published: 03 Mar 2026, Last Modified: 25 Apr 2026. Venue: ICLR 2026 Workshop MemAgents (Poster). License: CC BY 4.0

Keywords: agent memory, information retrieval, memory systems

TL;DR: In memory-augmented LLM agents, retrieval quality matters much more than write strategy: improving retrieval yields substantially larger performance gains than more complex memory construction.

Abstract: Memory-augmented LLM agents store and retrieve information from prior interactions, yet the relative importance of how memories are written versus how they are retrieved remains unclear. We introduce a diagnostic framework that analyzes how performance differences manifest across write strategies, retrieval methods, and memory utilization behavior, and apply it to a $3 \times 3$ study crossing three write strategies (raw chunks, Mem0-style fact extraction, MemGPT-style summarization) with three retrieval methods (cosine, BM25, hybrid reranking). On LoCoMo, retrieval method is the dominant factor: average accuracy spans $20$ points across retrieval methods ($57.1\%$ to $77.2\%$) but only $3$ to $8$ points across write strategies. Raw chunked storage, which requires zero LLM calls, matches or outperforms expensive lossy alternatives, suggesting that current memory pipelines may discard useful context and that downstream retrieval mechanisms fail to compensate for the loss. Failure analysis shows that performance breakdowns most often manifest at the retrieval stage rather than at utilization. We argue that, under current retrieval practices, improving retrieval quality yields larger gains than increasing write-time sophistication.
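The $3 \times 3$ design described in the abstract can be sketched as a small grid plus a per-factor spread measure. This is a minimal illustration, not the authors' code: the identifier strings, `grid`, and `factor_spread` are hypothetical names, and taking the range of marginal mean accuracies is only one plausible reading of "spans N points across" a factor.

```python
from itertools import product
from statistics import mean

# Hypothetical labels for the factors named in the abstract.
WRITE_STRATEGIES = ["raw_chunks", "fact_extraction", "summarization"]
RETRIEVAL_METHODS = ["cosine", "bm25", "hybrid_rerank"]


def grid():
    """Return all 9 (write, retrieval) configurations of the 3 x 3 study."""
    return list(product(WRITE_STRATEGIES, RETRIEVAL_METHODS))


def factor_spread(acc, axis):
    """Range (max - min) of mean accuracy across the levels of one factor.

    acc maps (write, retrieval) -> accuracy in percent.
    axis=0 groups by write strategy (averaging over retrieval methods);
    axis=1 groups by retrieval method (averaging over write strategies).
    This is an assumed aggregation, not necessarily the paper's.
    """
    groups = {}
    for key, value in acc.items():
        groups.setdefault(key[axis], []).append(value)
    means = [mean(values) for values in groups.values()]
    return max(means) - min(means)
```

Given a measured accuracy table keyed by the cells of `grid()`, comparing `factor_spread(acc, 1)` with `factor_spread(acc, 0)` indicates which factor accounts for the larger share of the variation.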

Submission Number: 98
