Abstract: Retrieval-augmented generation (RAG) enables LLMs to ground responses in external knowledge, but long-term,
multi-session conversations still suffer from implicit recall failures: when current user queries lack lexical overlap
with earlier facts (e.g., preferences), standard dense retrieval and long-context prompting often miss the most
relevant memories. We present a dialogue-aware RAG system that jointly addresses what to store and how to
retrieve under a serving-cost budget. Our design extracts durable user facts into a lightweight memory graph, enriches
queries with conversational cues, performs hybrid retrieval, and uses a budget-aware router to balance quality
and serving cost. On our Implicit Preference Recall benchmark, the system lifts Recall@10 to 0.70 (vs. 0.58 for
dense-only) and improves nDCG@10 from 0.41 to 0.51. The system also reduces cross-modality disagreement by
47% and achieves an 81% cost reduction compared to long-context methods.
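
For readers unfamiliar with the two ranking metrics quoted above, the following is a minimal sketch of how Recall@k and nDCG@k are commonly computed. It assumes binary relevance labels and unique memory IDs in the ranked list; the function names and the example IDs are hypothetical and are not taken from the paper, whose benchmark may use graded relevance or a different protocol.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A relevant item at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical memory IDs returned by a retriever, best first.
    ranked = ["m3", "m1", "m7", "m2"]
    relevant = {"m1", "m2"}
    print(recall_at_k(ranked, relevant, k=10))
    print(ndcg_at_k(ranked, relevant, k=10))
```

Recall@k only asks whether the relevant memories were retrieved at all, whereas nDCG@k also rewards placing them near the top, which is why the two numbers can move by different amounts.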
Topics: Agentic Systems: Data and knowledge management for agentic AI, Agentic Systems: Systems optimizations for agentic AI applications
Submission Number: 17