Anchored Supervised Fine-Tuning

ICLR 2026 Conference Submission 17489 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Readers: Everyone · CC BY 4.0
Keywords: SFT
Abstract: Post-training of large language models involves a fundamental trade-off between supervised fine-tuning (SFT), which efficiently mimics demonstrations but tends to memorize, and reinforcement learning (RL), which achieves better generalization at higher computational cost. Dynamic Fine-Tuning (DFT) recently emerged as a promising middle ground, reweighting SFT objectives with token probabilities and achieving improvements in certain reasoning domains, though it exhibits instability in other tasks. We provide an analysis of DFT through the reward-weighted regression (RWR) framework, revealing that it corresponds to a specific auxiliary distribution choice that yields provably tighter RL bounds than standard SFT. However, our analysis also uncovers a critical limitation: this construction lacks distributional anchoring, leading to progressive drift that undermines training stability. To address this, we propose Anchored Supervised Fine-Tuning (ASFT), which augments DFT's reweighting with lightweight KL regularization to preserve tightness while ensuring stability. Empirically, ASFT consistently outperforms both SFT and DFT across mathematical reasoning, medical knowledge grounding, and code generation, achieving substantial improvements with minimal computational overhead. Our RWR framework provides a systematic lens for understanding post-training methods and demonstrates that principled theoretical analysis leads to both stronger guarantees and practical gains.
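To make the abstract's description concrete, the following is a minimal PyTorch sketch of what an ASFT-style objective could look like, assuming the DFT-style reweighting scales each token's negative log-likelihood by its detached model probability and the anchor is a frozen reference model. The function name `asft_loss`, the coefficient `beta_kl`, and the exact form of the KL term are illustrative assumptions, not the paper's stated formulation.

```python
import torch
import torch.nn.functional as F

def asft_loss(policy_logits, ref_logits, target_ids, mask, beta_kl=0.1):
    """Sketch of an anchored SFT objective (assumed form, not the paper's exact loss).

    policy_logits, ref_logits: (batch, seq_len, vocab) logits from the trained
        model and a frozen reference (anchor) model.
    target_ids: (batch, seq_len) demonstration token ids.
    mask: (batch, seq_len) 1 for supervised response tokens, 0 elsewhere.
    """
    log_probs = F.log_softmax(policy_logits, dim=-1)
    token_logp = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)

    # DFT-style reweighting (assumed): scale each token's NLL by its detached probability.
    weights = token_logp.detach().exp()
    dft_term = -(weights * token_logp)

    # Anchoring (assumed): token-level KL from the policy to the frozen reference,
    # discouraging distributional drift during training.
    ref_log_probs = F.log_softmax(ref_logits, dim=-1)
    kl_term = F.kl_div(
        ref_log_probs, log_probs, log_target=True, reduction="none"
    ).sum(-1)

    per_token = dft_term + beta_kl * kl_term
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```

In this reading, setting `beta_kl = 0` recovers a DFT-like reweighted objective, while the KL penalty supplies the distributional anchoring the abstract argues DFT lacks.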
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17489