ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning
Keywords: Retrieval, Large Language Models, Reasoning, Vector Database
Abstract: Conversational search systems require effective handling of context-dependent queries that often contain ambiguity, omission, and coreference. Conversational Query Reformulation (CQR) addresses this challenge by transforming such queries into self-contained forms suitable for standard retrieval pipelines, including those based on vector databases. Existing CQR approaches face two major limitations: a reliance on costly external supervision from human annotations or large language models, and poor alignment between the rewriting model and the downstream retriever. To address these issues, we propose ConvSearch-R1, the first self-driven framework that eliminates the need for external rewrite supervision by leveraging reinforcement learning to optimize query reformulation directly through retrieval signals obtained from vector databases. Our method introduces a novel two-stage pipeline: (1) Self-Driven Policy Warm-Up, which mitigates the cold-start problem via retrieval-guided self-distillation, and (2) Retrieval-Guided Reinforcement Learning, which employs a rank-incentive reward shaping mechanism to overcome the sparsity of traditional retrieval metrics. Extensive evaluations on the TopiOCQA and QReCC datasets show that ConvSearch-R1 significantly outperforms previous state-of-the-art methods, achieving over a 10% improvement on the challenging TopiOCQA dataset while using only a 3B parameter model without any external supervision. Our results highlight the practical utility of vector databases in enabling effective, self-supervised reformulation strategies for conversational search applications.
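The abstract describes the rank-incentive reward shaping mechanism only at a high level. The sketch below illustrates one plausible form such a reward could take, using a reciprocal-rank decay so that a rewrite placing the gold passage anywhere near the top earns a graded signal rather than an all-or-nothing hit metric. The function name, the retrieval depth `k`, and the exact shaping are illustrative assumptions, not the paper's formula.

```python
from typing import Optional


def rank_incentive_reward(gold_rank: Optional[int], k: int = 100) -> float:
    """Map the retrieval rank of the gold passage to a dense reward in [0, 1].

    gold_rank: 1-based rank of the gold passage in the retrieved list,
               or None if it was not retrieved within the top-k.
    k:         retrieval depth; ranks beyond k receive zero reward.

    NOTE: This is a hypothetical reciprocal-rank shaping for illustration;
    ConvSearch-R1's actual reward definition is not given in the abstract.
    """
    if gold_rank is None or gold_rank > k:
        return 0.0
    # Reciprocal-rank style decay: rank 1 -> 1.0, rank 3 -> 0.33, rank 10 -> 0.1.
    return 1.0 / gold_rank


# Example: a reformulated query that surfaces the gold passage at rank 3
# still receives a partial reward, giving the RL policy a denser learning
# signal than a sparse hit@1-style metric would.
print(rank_incentive_reward(3))     # 0.333...
print(rank_incentive_reward(None))  # 0.0
```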
Submission Number: 13