Utility-Based Preference Training for Effective Synthetic Text Classification

ACL ARR 2025 May Submission 4448 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · License: CC BY 4.0
Abstract: We propose a novel approach for generating high-quality synthetic text data for multiclass text classification by leveraging large language models (LLMs) with preference-based fine-tuning. Our method modifies the Direct Preference Optimization (DPO) framework by incorporating a margin-based utility signal that encourages class-discriminative text generation. This margin-based variant, which we call Utility DPO (U-DPO), promotes the generation of synthetic samples with clearer label-specific features. We evaluate our method on two academic document classification benchmarks, Arxiv and WOS-11967, which cover 11 and 33 classes, respectively. Synthetic data generated by a language model trained with U-DPO leads to better classification performance than data generated by a baseline LLM or a model trained with standard DPO. Notably, U-DPO yields consistent improvements in classification accuracy, both when models are trained exclusively on synthetic data and when synthetic data is used to augment limited real data, highlighting the practical value of preference-optimized synthetic datasets. Overall, our work demonstrates that incorporating task-specific utility signals into LLM training is a promising direction for generating effective synthetic data for text classification, enabling improved downstream performance without additional human annotation.
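
The paper's exact objective is not given on this page; as a minimal sketch, one common way to fold a utility margin into the DPO objective is shown below, where pi_theta is the policy, pi_ref the frozen reference model, beta the DPO temperature, gamma a margin weight, and m(x, y_w, y_l) a task-specific utility margin (for example, a downstream-classifier confidence gap between the preferred and dispreferred generations). All of these symbols are assumptions for illustration, not the paper's notation.

% Sketch only: an assumed margin-augmented DPO objective, not the paper's exact formulation.
% m(x, y_w, y_l) is a hypothetical task-specific utility signal; beta and gamma are assumed hyperparameters.
\mathcal{L}_{\text{U-DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
      \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
      \;-\; \gamma\, m(x, y_w, y_l)
    \right)
  \right]

Under this reading, a larger utility margin between the two generations demands a correspondingly larger implicit reward gap before the loss saturates, pushing the model toward samples with clearer label-specific features.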
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: generative models,data augmentation,reinforcement learning
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 4448