ROXY: Generative Indexing and Conflict-Aware Reranking for Long-Horizon Conversational Memory

ACL ARR 2026 January Submission7966 Authors

06 Jan 2026 (modified: 07 Jun 2026) | ACL ARR 2026 January Submission | CC BY 4.0

Keywords: Generative indexing, conversational long-term memory, retrieval-augmented generation (RAG), conflict-aware reranking

Abstract: Long-horizon conversational assistants must answer questions grounded in multi-session dialogue, yet external memory faces two key challenges: (1) retrieval mismatch: queries often lack the cues needed to surface relevant memories, and (2) conflict resolution: retrieved evidence contains temporal updates, negations, or overrides that similarity- and rule-based methods cannot reliably handle. We propose ROXY, a retrieval-oriented memory framework that integrates Generative Indexing and Conflict-Aware Reranking to address both challenges. For retrieval, ROXY performs Generative Indexing (GI): an LLM generates anticipatory cue questions for each memory chunk and indexes them alongside the original content. For conflicts, a conflict-aware LLM judge reasons over high-recall candidates conditioned on the query and selects logically coherent evidence without hand-crafted rules, whereas similarity-based methods fail to distinguish semantically similar but contradictory memories. On the LoCoMo benchmark, under the same configuration as MemInsight, ROXY achieves 89.3 Recall@5 and 41.2 F1, outperforming MemInsight by +28.8 and +11.1 points respectively, with strong gains on single-hop, temporal, multi-hop, and adversarial questions. These results show that anticipatory indexing combined with adaptive conflict reasoning offers a scalable solution for memory-grounded conversational agents. Code and prompts will be made publicly available.
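
The two-stage pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the bag-of-words cosine scorer (a stand-in for a dense retriever), the hard-coded cue questions (a stand-in for LLM-generated cues), and the "keep the latest chunk" judge (a stand-in for the conflict-aware LLM judge) are all assumptions chosen so the example runs without external dependencies.

```python
import re
from collections import Counter
from math import sqrt

def tokens(text):
    """Lowercased alphanumeric tokens (toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def build_index(chunks, generate_cues):
    """Index each chunk's own text plus its cue questions, all mapped to the chunk id."""
    index = []
    for chunk_id, chunk in enumerate(chunks):
        for text in [chunk] + list(generate_cues(chunk)):
            index.append((chunk_id, tokens(text)))
    return index

def retrieve(query, index, k=5):
    """Score every index entry, keep the best score per chunk, return top-k chunk ids."""
    q = tokens(query)
    best = {}
    for chunk_id, entry in index:
        best[chunk_id] = max(best.get(chunk_id, 0.0), cosine(q, entry))
    return sorted(best, key=lambda c: (-best[c], c))[:k]

def rerank(query, candidate_ids, chunks, judge):
    """Hand the high-recall candidates to a judge that returns the ids to keep."""
    return judge(query, [(i, chunks[i]) for i in candidate_ids])

# Hypothetical data: the second memory overrides the first.
chunks = ["Session 1: I live in Berlin.", "Session 7: I moved to Lisbon last month."]

# Hypothetical cue questions; in the paper's setting an LLM would generate these.
TOY_CUES = {
    chunks[0]: ["Where does the user live?"],
    chunks[1]: ["Where does the user live now?", "Which city did the user move to?"],
}

def toy_judge(query, candidates):
    """Placeholder for an LLM judge: keeps only the most recent candidate."""
    return [max(candidates, key=lambda c: c[0])[0]]

query = "Where does the user live now?"
with_cues = retrieve(query, build_index(chunks, TOY_CUES.get), k=2)
without_cues = retrieve(query, build_index(chunks, lambda chunk: []), k=2)
selected = rerank(query, with_cues, chunks, toy_judge)
```

In this toy setup, the Lisbon chunk shares no words with the query, so plain content matching ranks the outdated Berlin chunk first, while the cue-augmented index surfaces the Lisbon chunk at the top. The judge step then drops the superseded Berlin memory. A real judge would reason over the query and candidate texts rather than apply a fixed recency rule.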

Paper Type: Long

Research Area: Retrieval-Augmented Language Models

Research Area Keywords: retrieval-augmented generation, passage retrieval, dense retrieval, re-ranking, document representation, agent memory, LLM agents, tool use, conversational modeling, conversational QA, question generation, logical reasoning

Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models

Languages Studied: English

Submission Number: 7966
