Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization

ICLR 2026 Conference Submission 21050 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: optimization, direct preference optimization, sharpness-aware minimization, learning dynamics
Abstract: Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified squeezing effect (also known as likelihood displacement), in which the probability of preferred responses unintentionally decreases during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise learning dynamics in logit space. Our analysis reveals that gradient descent with a negative learning rate causes residuals to expand rapidly along high-curvature directions, which underlies the squeezing effect, whereas Sharpness-Aware Minimization (SAM) can suppress this behavior through its curvature-regularization effect. Building on this insight, we investigate logits-SAM, a computationally efficient variant that perturbs only the output layer and thus incurs negligible overhead. Extensive experiments on Pythia-2.8B and Mistral-7B across multiple datasets demonstrate that logits-SAM consistently improves the effectiveness of DPO.
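
To make the described update concrete, below is a minimal PyTorch-style sketch of one way a logit-space SAM perturbation could be combined with the DPO loss. It is not the authors' implementation: the helper names (`sequence_logps`, `dpo_loss_from_logits`, `logits_sam_dpo_step`), the `batch` fields, the radius `rho`, and the Hugging Face-style `.logits` output are all illustrative assumptions. The sketch applies SAM's ascent step to the logits themselves, so the model's hidden states from a single forward pass are reused and only the loss head is re-evaluated.

```python
import torch
import torch.nn.functional as F

def sequence_logps(logits, labels, mask):
    """Sum of per-token log-probabilities of the target labels over response tokens.
    Labels are assumed to be already aligned (shifted) with the logits."""
    logps = torch.gather(logits.log_softmax(-1), 2, labels.unsqueeze(-1)).squeeze(-1)
    return (logps * mask).sum(-1)

def dpo_loss_from_logits(logits, batch, beta=0.1):
    """Standard DPO loss computed from policy logits; reference log-probs are
    assumed precomputed. The batch concatenates chosen then rejected sequences,
    with batch["n"] chosen examples first."""
    chosen = sequence_logps(logits[: batch["n"]], batch["chosen_labels"], batch["chosen_mask"])
    rejected = sequence_logps(logits[batch["n"]:], batch["rejected_labels"], batch["rejected_mask"])
    margin = (chosen - rejected) - (batch["ref_chosen_logps"] - batch["ref_rejected_logps"])
    return -F.logsigmoid(beta * margin).mean()

def logits_sam_dpo_step(model, batch, optimizer, rho=0.05, beta=0.1):
    """One hypothetical logits-SAM step: the SAM ascent perturbation is applied
    directly in logit space, so the extra cost is one gradient w.r.t. the logits
    and one re-evaluation of the loss head."""
    logits = model(batch["input_ids"], attention_mask=batch["attention_mask"]).logits

    # Ascent direction: gradient of the DPO loss with respect to the logits.
    loss = dpo_loss_from_logits(logits, batch, beta)
    (grad_z,) = torch.autograd.grad(loss, logits, retain_graph=True)
    eps = rho * grad_z / (grad_z.norm() + 1e-12)

    # Perturbed loss: eps is detached, so backpropagation still flows through the
    # original logits (and hence all model parameters) exactly once.
    perturbed_loss = dpo_loss_from_logits(logits + eps.detach(), batch, beta)

    optimizer.zero_grad()
    perturbed_loss.backward()
    optimizer.step()
    return perturbed_loss.item()
```

Because the perturbation is added to the logits rather than to the weights, the forward pass through the transformer is not repeated; this is one plausible reading of the "negligible overhead" claim, under the assumptions stated above.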
Primary Area: optimization
Submission Number: 21050