Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks

Published: 22 Jun 2025; Last Modified: 22 Jun 2025; ACL-SRW 2025 Poster; License: CC BY 4.0
Keywords: Preference Optimization, Reinforcement Learning, Post-Training
Abstract: This study evaluates Direct Preference Optimization (DPO) and its variants for aligning Large Language Models (LLMs) with human preferences, testing three configurations: (1) with Supervised Fine-Tuning (SFT), (2) without SFT, and (3) without SFT but using an instruction-tuned model. We further investigate how training set size influences model performance. Our evaluation spans 13 benchmarks covering dialogue, reasoning, mathematical problem-solving, question answering, and truthfulness, including MT-Bench, Big Bench, and the Open LLM Leaderboard. We find that: (1) alignment methods often achieve near-optimal performance even with smaller subsets of training data; (2) although they offer limited improvements on complex reasoning tasks, they enhance mathematical problem-solving; and (3) using an instruction-tuned model improves truthfulness. These insights highlight the conditions under which alignment methods excel, as well as their limitations.
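For context, the baseline method studied here is DPO. The sketch below shows the standard DPO objective from Rafailov et al. (2023) as a minimal PyTorch function; it is illustrative only and not drawn from this submission's code, so the function name, argument names, and the default beta value are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective (Rafailov et al., 2023).

    Each argument is a 1-D tensor of summed token log-probabilities of the
    chosen / rejected responses under the trainable policy or the frozen
    reference model, one entry per preference pair.
    """
    # Implicit reward: scaled log-ratio of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
    return loss.mean()
```

DPO variants (e.g., IPO or KTO) modify this pairwise objective but keep the same inputs: per-pair log-probabilities under the policy and a reference model.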
Archival Status: Archival, Non‑archival
Submission Number: 87