Abstract: Conversational search must accurately infer the user's actual search intent across multi-turn interactions to retrieve relevant passages. Traditional conversational query rewriting methods rely primarily on manually rewritten queries. In contrast, conversational dense retrieval approaches feed the entire conversation context directly into the retriever, which introduces redundant noise and is further constrained by the scarcity of human-annotated supervision signals in the dataset. To address these limitations, we propose the **G**enerative **H**istory Augmentation for Context-**A**ware **D**ense **R**etrieval (**GHADR**) system. First, we introduce an iterative prompt refinement mechanism that leverages large language models (LLMs) to augment the conversation history and generate high-quality rewritten queries. Next, we apply a semantically guided clustering algorithm to mine additional supervision signals for model training. Finally, we train a context-aware passage retriever using both the rewritten queries and the signals mined from historical turns. Experiments on four public conversational search datasets demonstrate that GHADR improves retrieval performance while reducing reliance on human-annotated signals.
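The abstract's idea of mining supervision signals from historical turns can be illustrated with a minimal, hypothetical sketch: group history-turn embeddings with the current query by cosine similarity and treat the closest turns as pseudo-positive supervision. All function names, the threshold, and the toy vectors below are assumptions for illustration, not GHADR's actual algorithm.

```python
# Hypothetical sketch of semantically guided signal mining: select
# history turns whose embeddings are close to the current query embedding.
# The threshold value and toy vectors are illustrative assumptions.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mine_pseudo_positives(query_vec, history_vecs, threshold=0.8):
    """Return indices of history turns semantically close to the query,
    usable as extra positive supervision for contrastive training."""
    return [i for i, h in enumerate(history_vecs)
            if cosine(query_vec, h) >= threshold]

# Toy embeddings; a real system would use a dense encoder's outputs.
query = [1.0, 0.0, 0.1]
history = [[0.9, 0.1, 0.0],   # on-topic turn
           [0.0, 1.0, 0.0],   # off-topic turn
           [1.0, 0.05, 0.2]]  # on-topic turn
print(mine_pseudo_positives(query, history))  # → [0, 2]
```

The off-topic turn is filtered out, so only semantically consistent history contributes training signal, which mirrors the noise-reduction motivation stated above.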
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, dense retrieval, contrastive learning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4829