Supervised Fine-Tuning for Reasoning Large Language Models: A Survey

ACL ARR 2026 May Submission 16460 Authors

26 May 2026 (modified: 12 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0

Keywords: Reasoning, Large Language Models, Supervised Fine-Tuning, Post-Training

Abstract: Large reasoning models have recently demonstrated remarkable capabilities in solving complex tasks, and supervised fine-tuning (SFT) on long chain-of-thought data serves as a crucial foundation for eliciting and enhancing their reasoning abilities. Despite rapid progress in both improving and analyzing reasoning-oriented SFT, the field still lacks a systematic survey that consolidates its fast-growing literature. To fill this gap, we present a comprehensive review of recent advances in reasoning SFT, examining the literature through the dual lenses of methodological design and analytical investigation. First, we review methodological improvements across the SFT pipeline and categorize them into data-centric approaches and algorithm-centric innovations. Second, we organize analytical studies along three dimensions: data characteristics, optimization dynamics, and mechanistic insights. Finally, by synthesizing current research priorities and remaining bottlenecks, we outline promising future directions for reasoning SFT. We hope this survey deepens the understanding of reasoning SFT and paves the way for more advanced reasoning models.

Paper Type: Long

Research Area: Question Answering

Research Area Keywords: Reasoning

Contribution Types: Surveys

Languages Studied: English

EMNLP 2026 AI Reviewing Experiment: no

Submission Number: 16460
