Supervised Fine-tuning versus Reinforcement Learning: A Comprehensive Survey on Large Language Model Post-training
Keywords: Large Language Models, Model Alignment, Supervised Fine-Tuning, Reinforcement Learning, Post-Training Methods
Abstract: Pre-trained Large Language Models (LLMs) exhibit broad capabilities; yet, for specific tasks or domains, attaining higher accuracy and more reliable reasoning generally depends on post-training through Supervised Fine-tuning (SFT) or Reinforcement Learning (RL).
Although often treated as distinct methodologies, recent theoretical and empirical developments demonstrate that SFT and RL are closely connected. This survey presents a comprehensive and unified perspective on LLM post-training with SFT and RL. We first provide an in-depth overview of both techniques, examining their objectives, algorithmic structures, and data requirements. We then systematically analyze their interplay, highlighting frameworks that integrate SFT and RL, hybrid training pipelines, and methods that leverage their complementary strengths. Drawing on a representative set of recent application studies from 2023 to 2025, we identify emerging trends, characterize the rapid shift toward hybrid post-training paradigms, and distill key takeaways that clarify when and why each method is most effective. By synthesizing theoretical insights, practical methodologies, and empirical evidence, this survey establishes a coherent understanding of SFT and RL within a unified framework and outlines promising directions for future research in scalable, efficient, and generalizable LLM post-training.
Paper Type: Long
Research Area: Language Models
Research Area Keywords: Language Modeling, Machine Learning for NLP, Interpretability and Analysis of Models for NLP
Contribution Types: Data analysis, Surveys
Languages Studied: English
Submission Number: 7447