Keywords: Reasoning, Large Language Models, Supervised Fine-Tuning, Post-Training
Abstract: Large reasoning models have recently demonstrated remarkable capabilities in solving complex tasks, with supervised fine-tuning (SFT) on long chain-of-thought data serving as a crucial foundation for eliciting and enhancing their reasoning abilities.
Despite rapid progress in both improving and analyzing reasoning-oriented SFT, the field still lacks a systematic survey that consolidates its fast-growing literature.
To fill this gap, we present a comprehensive review of recent advancements in reasoning SFT, examining the literature through the dual lenses of methodological design and analytical investigation.
First, we review methodological improvements across the SFT pipeline and categorize them into data-centric approaches and algorithm-centric innovations.
Second, we reorganize analytical studies along three dimensions: data characteristics, optimization dynamics, and mechanistic insights.
Finally, we synthesize current research foci and remaining bottlenecks to outline promising future directions for reasoning SFT.
We hope this survey deepens the understanding of reasoning SFT and paves the way for advanced reasoning models.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Reasoning
Contribution Types: Surveys
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: no
Submission Number: 16460