Keywords: memory management, conversational agent, RAG, text segmentation, prompt compression
TL;DR: A system that enables long-term conversational agents by constructing the memory bank at the segment level and applying compression-based denoising to enhance memory retrieval.
Abstract: To deliver coherent and personalized experiences in long-term conversations, existing approaches typically perform retrieval-augmented response generation by constructing memory banks from conversation history at either the turn level or the session level, or through summarization techniques. In this paper, we explore the impact of different memory granularities and present two key findings: (1) Turn-level, session-level, and summarization-based methods all exhibit limitations in terms of retrieval accuracy and the semantics of the retrieved content, ultimately leading to sub-optimal responses. (2) The redundancy in natural language introduces noise, hindering precise retrieval. We demonstrate that *LLMLingua-2*, originally designed for prompt compression to accelerate LLM inference, can serve as an effective denoising method to enhance memory retrieval accuracy.
Building on these insights, we propose **SeCom**, a method that constructs the memory bank at the segment level by introducing a **Se**gmentation model that partitions long-term conversations into topically coherent segments, while applying **Com**pression-based denoising on memory units to enhance memory retrieval. Experimental results show that **SeCom** exhibits superior performance over baselines on the long-term conversation benchmarks *LOCOMO* and *Long-MT-Bench+*.
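To make the pipeline concrete, below is a minimal sketch (not the authors' implementation) of a SeCom-style flow: partition a conversation into segments, denoise each segment with LLMLingua-2 compression, and retrieve the most relevant segments for a new query. The fixed-size segmentation and lexical-overlap retrieval are illustrative placeholders standing in for the learned segmentation model and the retriever described in the paper; the compression call follows the public `llmlingua` package API, but the model name and rate are assumptions.

```python
# Sketch of a segment-level memory bank with compression-based denoising.
# Placeholders: segment_conversation (fixed-size chunks instead of a learned
# segmentation model) and retrieve (toy lexical overlap instead of a real retriever).
from llmlingua import PromptCompressor


def segment_conversation(turns, turns_per_segment=4):
    """Placeholder segmentation: fixed-size chunks stand in for a model that
    partitions the dialogue into topically coherent segments."""
    return [turns[i:i + turns_per_segment] for i in range(0, len(turns), turns_per_segment)]


def compress_segments(segments, rate=0.5):
    """Denoise each segment with LLMLingua-2 token-level prompt compression."""
    compressor = PromptCompressor(
        model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
        use_llmlingua2=True,
    )
    memory_bank = []
    for seg in segments:
        text = "\n".join(seg)
        result = compressor.compress_prompt(text, rate=rate)
        memory_bank.append(result["compressed_prompt"])
    return memory_bank


def retrieve(query, memory_bank, top_k=2):
    """Toy lexical-overlap retrieval over compressed segments; a real system
    would use BM25 or dense embedding similarity instead."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(m.lower().split())), m) for m in memory_bank]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [m for _, m in scored[:top_k]]


# Usage: build the memory bank once per conversation, then retrieve per query.
# turns = ["User: ...", "Agent: ...", ...]
# memory = compress_segments(segment_conversation(turns))
# context = retrieve("What did the user say about their trip?", memory)
```

The retrieved (compressed) segments would then be prepended to the prompt of the response-generation LLM, as in standard retrieval-augmented generation.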
Submission Number: 78