References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation

Doyoung Kim; Youngjun Lee; Joeun Kim; Jihwan Bang; Hwanjun Song; Susik Yoon; Jae-Gil Lee

17 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: Finetuning LLMs, Preference optimization, Self-supervised learning, Conversational query reformulation

TL;DR: We present DualReform, a novel reference-free preference optimization framework for conversational query reformulation that generates pseudo reference passages from commonly encountered conversational datasets.

Abstract: Conversational query reformulation (CQR) has become indispensable for improving retrieval in dialogue-based applications. However, existing approaches typically rely on reference passages for optimization, which are **impractical** to acquire in real-world scenarios. To address this limitation, we introduce a novel **reference-free** preference optimization framework, ***DualReform***, that generates **pseudo reference passages** from **commonly encountered** conversational datasets containing only queries and responses. DualReform attains this goal through two key innovations: (1) **response-based inference**, where responses serve as proxies to infer pseudo reference passages, and (2) **response refinement via the dual role of CQR**, where a CQR model refines responses based on the shared objectives between response refinement and CQR. Despite not relying on reference passages, ***DualReform*** achieves 96.9–99.1% of the retrieval accuracy attainable only with reference passages and surpasses the state-of-the-art method by up to 31.6%.
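To make the idea of response-based inference concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a simple term-count cosine scorer in place of a real retriever, and it assumes that the inferred pseudo reference passage is then used to order candidate reformulations into a (chosen, rejected) pair for preference optimization. All function names, the corpus, and the candidates are hypothetical, and the response refinement step is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization; a deliberate simplification.
    return text.lower().split()


def overlap_score(query, passage):
    # Cosine similarity over raw term counts (assumed stand-in for a retriever).
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def infer_pseudo_reference(response, corpus):
    # Response-based inference: the response acts as a proxy query, and the
    # best-matching passage index is taken as the pseudo reference.
    return max(range(len(corpus)), key=lambda i: overlap_score(response, corpus[i]))


def build_preference_pair(candidates, response, corpus):
    # Assumption: candidates that score the pseudo reference higher are preferred.
    target = corpus[infer_pseudo_reference(response, corpus)]
    ranked = sorted(candidates, key=lambda c: overlap_score(c, target), reverse=True)
    return ranked[0], ranked[-1]  # (chosen, rejected)


# Hypothetical toy data: no reference passage label is provided anywhere.
corpus = [
    "the eiffel tower is in paris and was completed in 1889",
    "the statue of liberty was dedicated in 1886 in new york",
    "mount everest is the highest mountain on earth",
]
response = "it was completed in 1889"
candidates = [
    "when was it completed",
    "when was the eiffel tower in paris completed",
]

pseudo_idx = infer_pseudo_reference(response, corpus)
chosen, rejected = build_preference_pair(candidates, response, corpus)
print(pseudo_idx, "|", chosen, "|", rejected)
```

In this toy setting the self-contained reformulation is preferred over the ambiguous one even though no annotated reference passage was supplied, which is the property the abstract attributes to reference-free optimization.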

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 8489
