Keywords: LLM Agents, Graph-structured Memory, Reinforcement Learning
Abstract: Large Language Model (LLM) agents face a fundamental bottleneck called \emph{context flooding} in multi-turn information retrieval: as evidence accumulates across interaction turns, their context windows become saturated with redundant, conflicting, and outdated facts.
Existing research either compresses interaction history into rolling summaries, sacrificing relational precision, or relies on retrieval-augmented pipelines that lack mechanisms for resolving knowledge conflicts and maintaining a globally consistent belief state. In this work, we introduce GRAM, a reinforcement learning framework that trains an agent to actively manage and use a dynamic graph-structured memory. Rather than passively accumulating retrieved text, the agent learns to incrementally construct and revise a coherent knowledge graph that serves as its persistent belief state, resolving contradictions and preserving logical relationships in memory. We employ Group Relative Policy Optimization to train small language models (under 4B parameters) to master these memory-governance behaviors. Experimental results across multiple mainstream question-answering benchmarks show that GRAM outperforms existing agentic memory baselines and conventional RAG systems.
Paper Type: Long
Research Area: LLM agents
Research Area Keywords: AI / LLM Agents, Language Modeling, Question Answering
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2006