Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Supervised fine-tuning (SFT) has become a crucial step for aligning pretrained large language models (LLMs) using supervised datasets of input-output pairs. However, despite being supervised, SFT is inherently limited by its generative training objective. To address these limitations, the common strategy is to follow SFT with a separate phase of preference optimization (PO), which relies on either human-labeled preference data or a strong reward model to guide the learning process. In this paper, we address the limitations of SFT by exploring one of the most successful techniques in conventional supervised learning: discriminative learning. We introduce **Discriminative Fine-Tuning (DFT)**, an improved variant of SFT that mitigates the burden of collecting human-labeled preference data or training strong reward models. Unlike SFT, which employs a generative approach and overlooks negative data, DFT adopts a **discriminative paradigm** that increases the probability of positive answers while suppressing potentially negative ones, aiming for **data prediction** instead of token prediction. Our contributions include: (i) a discriminative probabilistic framework for fine-tuning LLMs by explicitly modeling the discriminative likelihood of an answer among all possible outputs given an input; (ii) efficient algorithms to optimize this discriminative likelihood; and (iii) extensive experiments demonstrating DFT's effectiveness, achieving performance better than SFT and comparable to, if not better than, SFT→PO. The code can be found at https://github.com/Optimization-AI/DFT.
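To make contribution (i) concrete, below is a minimal, hypothetical PyTorch sketch of what "discriminative likelihood of an answer among candidate outputs" could look like: the sequence log-probability under the model is used as a score, and a softmax over one positive answer plus sampled negatives is maximized. The function names (`sequence_logprob`, `dft_style_loss`), the temperature `tau`, and the HuggingFace-style `model(...).logits` interface are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of a DFT-style discriminative objective (not the authors' exact code).
import torch
import torch.nn.functional as F

def sequence_logprob(model, prompt_ids, answer_ids):
    """Sum of log-probabilities the causal LM assigns to `answer_ids` given `prompt_ids`."""
    input_ids = torch.cat([prompt_ids, answer_ids], dim=-1).unsqueeze(0)
    logits = model(input_ids).logits[0, :-1]          # logits at position i predict token i+1
    logps = F.log_softmax(logits, dim=-1)
    start = prompt_ids.size(-1) - 1                   # first answer token is predicted here
    answer_logps = logps[start:start + answer_ids.size(-1)]
    return answer_logps.gather(-1, answer_ids.unsqueeze(-1)).sum()

def dft_style_loss(model, prompt_ids, good_ids, neg_ids_list, tau=1.0):
    """Negative log of the softmax (discriminative) likelihood of the positive answer
    among the positive and the sampled negative candidates."""
    scores = [sequence_logprob(model, prompt_ids, good_ids)]
    scores += [sequence_logprob(model, prompt_ids, neg) for neg in neg_ids_list]
    scores = torch.stack(scores) / tau
    target = torch.zeros(1, dtype=torch.long, device=scores.device)  # positive is index 0
    return F.cross_entropy(scores.unsqueeze(0), target)
```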
Lay Summary: Training large language models typically involves supervised fine-tuning (SFT) to teach them to generate good responses from input-output examples. However, SFT has a critical limitation — it only teaches what to say, not what to avoid saying, because it focuses on generating correct tokens rather than distinguishing good answers from bad ones. To address this, researchers commonly add a second training phase called preference optimization, which requires expensive human-labeled preference data or reward models. We developed Discriminative Fine-Tuning (DFT), which solves SFT's limitations in a single training stage without needing human preference data or reward models. Instead of just learning to produce good answers like SFT, our method adopts a discriminative approach that increases the probability of correct responses while suppressing potentially incorrect ones. We achieve this by having the model generate its own negative examples during training, then using a framework that compares good answers against bad ones — shifting from token prediction to data prediction. Our experiments show that DFT outperforms standard SFT and matches two-stage training methods. On mathematical reasoning, DFT achieved state-of-the-art results among 7-billion parameter models, reaching 79.15% accuracy on GSM8K. This approach makes high-quality training more accessible by eliminating expensive human annotation requirements.
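The lay summary mentions that the model generates its own negative examples during training. As an illustration only, and assuming a HuggingFace-style causal LM with a `generate` API, the sketch below samples diverse continuations from the model itself to serve as the "potentially incorrect" candidates fed to a discriminative loss like the one above; the sampling parameters are placeholder choices, not the paper's settings.

```python
# Hypothetical sketch of self-generated negatives for discriminative fine-tuning.
def sample_negatives(model, tokenizer, prompt, num_negatives=4, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=True,                  # stochastic decoding so negatives are diverse
        temperature=1.0,
        num_return_sequences=num_negatives,
        max_new_tokens=max_new_tokens,
    )
    prompt_len = inputs["input_ids"].size(-1)
    return [out[prompt_len:] for out in outputs]     # keep only the generated answer tokens
```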
Link To Code: https://github.com/Optimization-AI/DFT
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Models, Discriminative Likelihood, Supervised Finetuning
Submission Number: 7541