Abstract: Conversational search must accurately infer the user's actual search intent across multi-turn interactions to retrieve relevant passages. Traditional conversational query rewriting methods rely primarily on manually rewritten queries. In contrast, conversational dense retrieval approaches feed the entire conversation context directly into the retriever, which introduces redundant noise and is further constrained by the scarcity of human-annotated supervision signals in the dataset. To address these limitations, we propose the **G**enerative **H**istory Augmentation for Context-**A**ware **D**ense **R**etrieval (**GHADR**) system. First, we introduce an iterative prompt refinement mechanism that leverages large language models (LLMs) to augment the conversation history and generate high-quality rewritten queries. Next, we apply a semantically guided clustering algorithm to mine additional supervision signals for model training. Finally, we train a context-aware passage retriever using both the rewritten queries and the signals mined from historical turns. Experiments on four public conversational search datasets demonstrate that GHADR improves retrieval performance while reducing reliance on human-annotated signals.
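The abstract's idea of mining supervision signals from historical turns can be illustrated with a minimal, hypothetical sketch: group history-turn embeddings with the current query by cosine similarity and treat the closest turns as pseudo-positive supervision. All function names, the threshold, and the toy vectors below are assumptions for illustration, not GHADR's actual algorithm.

```python
# Hypothetical sketch of semantically guided signal mining: select
# history turns whose embeddings are close to the current query embedding.
# The threshold value and toy vectors are illustrative assumptions.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mine_pseudo_positives(query_vec, history_vecs, threshold=0.8):
    """Return indices of history turns semantically close to the query,
    usable as extra positive supervision for contrastive training."""
    return [i for i, h in enumerate(history_vecs)
            if cosine(query_vec, h) >= threshold]

# Toy embeddings; a real system would use a dense encoder's outputs.
query = [1.0, 0.0, 0.1]
history = [[0.9, 0.1, 0.0],   # on-topic turn
           [0.0, 1.0, 0.0],   # off-topic turn
           [1.0, 0.05, 0.2]]  # on-topic turn
print(mine_pseudo_positives(query, history))  # → [0, 2]
```

The off-topic turn is filtered out, so only semantically consistent history contributes training signal, which mirrors the noise-reduction motivation stated above.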
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, dense retrieval, contrastive learning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4829