References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
Keywords: Finetuning LLMs, Preference optimization, Self-supervised learning, Conversational query reformulation
TL;DR: We present DualReform, a novel reference-free preference optimization framework for conversational query reformulation that generates pseudo reference passages from commonly encountered conversational datasets.
Abstract: Conversational query reformulation (CQR) has become indispensable for improving retrieval in dialogue-based applications. However, existing approaches typically rely on reference passages for optimization, which are **impractical** to acquire in real-world scenarios. To address this limitation, we introduce ***DualReform***, a novel **reference-free** preference optimization framework that generates **pseudo reference passages** from **commonly encountered** conversational datasets containing only queries and responses. DualReform achieves this through two key innovations: (1) **response-based inference**, where responses serve as proxies to infer pseudo reference passages, and (2) **response refinement via the dual role of CQR**, where a CQR model refines responses, exploiting the objectives that response refinement and CQR share. Despite not relying on reference passages, ***DualReform*** achieves 96.9–99.1% of the retrieval accuracy attainable only with reference passages and surpasses the state-of-the-art method by up to 31.6%.
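The idea of response-based inference can be illustrated with a minimal sketch. This is not the DualReform implementation: the token-overlap scorer is a toy stand-in for a real retriever, the function names (`infer_pseudo_reference`, `preference_pair`) and the example data are hypothetical, and the way the preference pair is built is only one plausible use of a pseudo reference passage. The response refinement step is not modeled here.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def overlap_score(query, passage):
    """Toy lexical scorer (assumption): number of tokens shared by query and passage."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum((q & p).values())


def infer_pseudo_reference(response, corpus):
    """Use the response as a proxy query and return the index of the best-scoring passage."""
    return max(range(len(corpus)), key=lambda i: overlap_score(response, corpus[i]))


def preference_pair(candidates, corpus, pseudo_idx):
    """Rank candidate rewrites by their score against the pseudo reference passage."""
    ranked = sorted(
        candidates,
        key=lambda c: overlap_score(c, corpus[pseudo_idx]),
        reverse=True,
    )
    return ranked[0], ranked[-1]  # (chosen, rejected)


# Hypothetical data: a dataset turn with only a query and a response, no reference passage.
corpus = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "The Louvre is the largest art museum in the world.",
    "Mount Everest is the highest mountain on Earth.",
]
response = "It was completed in 1889."
candidates = [
    "When was it completed?",
    "When was the Eiffel Tower completed?",
]

pseudo_idx = infer_pseudo_reference(response, corpus)
chosen, rejected = preference_pair(candidates, corpus, pseudo_idx)
```

In this toy setting the response points to the Eiffel Tower passage, and the self-contained rewrite is preferred over the ambiguous one. A pair like this could then feed a preference optimization objective without any annotated reference passage.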
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8489