Keywords: SFT
Abstract: Post-training of large language models involves a fundamental trade-off between supervised fine-tuning (SFT), which efficiently mimics demonstrations but tends to memorize, and reinforcement learning (RL), which achieves better generalization at higher computational cost. Dynamic Fine-Tuning (DFT) recently emerged as a promising middle ground, reweighting the SFT objective with token probabilities and achieving improvements in certain reasoning domains, though it exhibits instability on other tasks. We provide an analysis of DFT through the reward-weighted regression (RWR) framework, revealing that it corresponds to a specific choice of auxiliary distribution that yields provably tighter RL bounds than standard SFT. However, our analysis also uncovers a critical limitation: this construction lacks distributional anchoring, leading to progressive drift that undermines training stability. To address this, we propose Anchored Supervised Fine-Tuning (ASFT), which augments DFT’s reweighting with lightweight KL regularization to preserve tightness while ensuring stability. Empirically, ASFT consistently outperforms both SFT and DFT across mathematical reasoning, medical knowledge grounding, and code generation, achieving substantial improvements with minimal computational overhead. Our RWR framework provides a systematic lens for understanding post-training methods and demonstrates that principled theoretical analysis leads to both stronger guarantees and practical gains.
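As a rough illustration of the objective the abstract describes, the sketch below combines a DFT-style term (each demonstration token's negative log-likelihood scaled by its detached probability) with a KL penalty toward a frozen reference model serving as the anchor. The particular reweighting, the stop-gradient on the weight, the direction of the KL term, and names such as `asft_loss`, `ref_logits`, and `kl_coef` are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def asft_loss(logits, ref_logits, labels, kl_coef=0.1, ignore_index=-100):
    """Minimal sketch of an ASFT-style objective (assumed form):
    probability-reweighted cross-entropy (DFT-style) plus a KL penalty
    anchoring the policy to a frozen reference model.

    logits:     (B, T, V) current policy logits
    ref_logits: (B, T, V) frozen reference-model logits (no gradient)
    labels:     (B, T)    target token ids, ignore_index marks padding
    """
    log_probs = F.log_softmax(logits, dim=-1)          # (B, T, V)
    ref_log_probs = F.log_softmax(ref_logits, dim=-1)  # (B, T, V)

    mask = (labels != ignore_index)
    safe_labels = labels.clamp_min(0)

    # Per-token log-likelihood of the demonstration tokens.
    tok_logp = log_probs.gather(-1, safe_labels.unsqueeze(-1)).squeeze(-1)  # (B, T)

    # DFT-style reweighting: scale each token's NLL by its detached probability.
    weight = tok_logp.detach().exp()
    dft_term = -(weight * tok_logp)

    # Lightweight KL anchoring: per-token KL(policy || reference).
    kl_term = F.kl_div(
        ref_log_probs, log_probs, log_target=True, reduction="none"
    ).sum(-1)

    loss = (dft_term + kl_coef * kl_term) * mask
    return loss.sum() / mask.sum().clamp_min(1)
```

In this sketch, setting `kl_coef=0` recovers the probability-reweighted (DFT-like) loss, while the KL term supplies the distributional anchor that the abstract argues is missing from DFT.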
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17489