Generation Space Preference Optimization for Few Shot Dialogue State Tracking

ACL ARR 2025 May Submission6262 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0

Abstract: Dialogue State Tracking (DST) is an essential component of task-oriented dialogue systems. Few-shot DST reduces the reliance on large-scale annotated data but suffers from insufficient training. In this work, we propose a novel training method, Generation Space Preference Optimization (GSPO), to mitigate insufficient training in few-shot DST. GSPO extends preference optimization to DST and constructs preference data from the model's generation space by reusing supervised fine-tuning (SFT) data, requiring neither an extra reward model nor additional preference data. Experimental results show that our method achieves performance competitive with methods that use 100B-scale LLMs, and performs better when trained on more than 5% of the full training data (400 training samples).
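The abstract does not spell out how preference pairs are built or which preference objective is optimized, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (1) a dialogue state is a set of slot-value pairs, (2) the gold state from the SFT data serves as the "chosen" response, (3) distinct incorrect states the model itself generates (for example via sampling or beam search) serve as "rejected" responses, and (4) the objective is a DPO-style (Direct Preference Optimization) loss over sequence log-probabilities. The names `serialize`, `build_preference_pairs`, and `dpo_loss` are hypothetical, and computing the log-probabilities with an actual model is left out.

```python
import math
from typing import Dict, List, Tuple

# Assumption: a dialogue state is a mapping from slot name to value.
State = Dict[str, str]


def serialize(state: State) -> str:
    """Hypothetical canonical text form of a state, e.g. 'hotel-area=north; hotel-stars=4'."""
    return "; ".join(f"{slot}={value}" for slot, value in sorted(state.items()))


def build_preference_pairs(gold: State, candidates: List[State]) -> List[Tuple[str, str]]:
    """Pair the gold SFT state (chosen) with each distinct wrong generation (rejected).

    Assumption: candidates come from the model's own generation space,
    so no extra reward model or external preference data is needed.
    """
    chosen = serialize(gold)
    # Drop duplicates and any candidate that already matches the gold state.
    rejected = {serialize(c) for c in candidates} - {chosen}
    return [(chosen, r) for r in sorted(rejected)]


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO-style loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

For example, with the gold state `{"hotel-area": "north", "hotel-stars": "4"}` and three sampled candidates (one correct, one with a wrong area, one missing a slot), `build_preference_pairs` yields two pairs, each pairing the gold serialization with one erroneous state. Reusing the SFT label as the preferred side is what lets this kind of construction avoid a separate reward model; the loss falls below `log 2` once the policy raises the chosen state's likelihood relative to the reference model more than it raises the rejected one's.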

Paper Type: Long

Research Area: Dialogue and Interactive Systems

Research Area Keywords: dialogue state tracking, low resource, task-oriented

Contribution Types: Approaches to low-resource settings

Languages Studied: English

Submission Number: 6262