The First Few Tokens Are All You Need: Unsupervised Rejection-Free Sampling Fine-Tuning for Reasoning Models

ACL ARR 2025 February Submission 4983 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large language models (LLMs) have demonstrated impressive capabilities across a wide range of natural language processing tasks thanks to large-scale pre-training and extensive instruction fine-tuning. However, enhancing their reasoning abilities remains a significant challenge: it typically requires supervised fine-tuning on extensive labeled datasets, which is resource-intensive. In this paper, we introduce a simple yet effective unsupervised fine-tuning method that significantly improves the reasoning performance of LLMs using only prefix substrings, i.e., the first few tokens of self-generated reasoning trajectories, as minimal guidance. Our approach leverages the reasoning structures already present in pretrained models to elicit reasoning without any annotated data. We find that different reasoning trajectories for the same question tend to share common prefixes, a phenomenon we term Prefix Self-Consistency. By training the model on these prefixes alone, we enhance its reasoning capabilities efficiently. Experiments across various training corpora show that our method outperforms vanilla full-token fine-tuning and matches the performance of supervised approaches such as Rejection Sampling Fine-Tuning (RFT), while requiring significantly less training and inference time. This demonstrates that minimal unsupervised fine-tuning can substantially enhance the reasoning capabilities of LLMs, opening new avenues for efficient and accessible model improvement.
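The abstract describes the pipeline only at a high level: sample several reasoning trajectories per question, keep just their leading tokens, and fine-tune on those prefixes. Below is a minimal sketch of how such prefix targets could be collected with a HuggingFace-style causal LM. The model name, sample count num_samples, prefix length prefix_len, and helper build_prefix_targets are illustrative assumptions, not details taken from the paper.

```python
# Sketch of unsupervised prefix-target collection (illustrative, not the
# paper's exact procedure). Assumes a HuggingFace causal LM; the model
# name and hyperparameters below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

def build_prefix_targets(question: str, num_samples: int = 8, prefix_len: int = 32):
    """Sample several reasoning trajectories for one question and keep only
    the first `prefix_len` generated tokens of each as fine-tuning targets."""
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,
        max_new_tokens=prefix_len,       # generate only the prefix
        num_return_sequences=num_samples,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    # Keep only the generated continuation; the paper's observation is that
    # these prefixes tend to agree across samples (Prefix Self-Consistency).
    prefixes = [seq[prompt_len:] for seq in outputs]
    return [tokenizer.decode(p, skip_special_tokens=True) for p in prefixes]
```

Fine-tuning would then apply the standard next-token loss to each (question + prefix) pair, masking the question tokens. Because no answers are checked and no samples are rejected, the procedure needs neither labels nor a verifier, which is what makes it unsupervised and rejection-free in contrast to RFT.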
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Self-Improvement; LLMs; Unsupervised Learning
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 4983