Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models

Published: 08 Mar 2025, Last Modified: 12 Apr 2025
Venue: SSI-FM Poster
License: CC BY 4.0
Keywords: fine-tuning, code generation, synthetic data, self-improvement, reasoning
Abstract: While large language models (LLMs) have demonstrated strong capabilities in programming and mathematical reasoning tasks, their continued advancement is constrained by the limited availability of high-quality, testable, open-source training data. To overcome this bottleneck, approaches such as distillation and curated reasoning training have shown promise, but they still rely on larger models and extensive data sources. We propose a simple approach that effectively leverages a model's self-improvement capabilities, summarized in four steps: **Think, Prune, Train, Improve**. Specifically, we iteratively fine-tune models on their own *correct* step-by-step reasoning solutions. Our experiments demonstrate the effectiveness of this method on coding and mathematical reasoning benchmarks. On GSM8K, the Gemma-2-2B model improves its pass@1 from 41.0% to 57.6%, while the Gemma-2-9B model jumps from 66.4% to 82%, surpassing LLaMA-70B-Instruct's 78%. The benefits extend to leading models as well: LLaMA-70B-Instruct achieves 91% pass@1 through this technique, outperforming GPT-4o's 82%. Our approach introduces a simple yet scalable paradigm in which self-generated reasoning traces and systematic data selection enhance a model's reasoning ability and performance.
Submission Number: 64
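For readers who want the shape of the four-step loop, here is a minimal sketch. The `sample_solutions`, `is_correct`, and `finetune` callables are hypothetical stand-ins (an LLM sampler, a unit-test or answer checker, and a fine-tuning step); the abstract does not specify the authors' actual pipeline, so this illustrates the idea rather than their implementation.

```python
from typing import Callable, List, Tuple

def think_prune_train(
    model,
    problems: List[str],
    sample_solutions: Callable,  # (model, problem, n) -> list of reasoning traces
    is_correct: Callable,        # (problem, trace) -> bool (unit tests / answer match)
    finetune: Callable,          # (model, dataset) -> updated model
    rounds: int = 3,
    n_samples: int = 8,
):
    """One possible reading of Think-Prune-Train-Improve: sample
    step-by-step solutions, keep only verifiably correct ones, and
    fine-tune the model on its own filtered outputs, iteratively."""
    for _ in range(rounds):
        dataset: List[Tuple[str, str]] = []
        for problem in problems:
            # Think: sample candidate step-by-step solutions.
            candidates = sample_solutions(model, problem, n_samples)
            # Prune: keep only traces that pass ground-truth checks
            # (unit tests for code, answer matching for math).
            dataset += [(problem, c) for c in candidates if is_correct(problem, c)]
        # Train: fine-tune on the model's own correct reasoning traces.
        model = finetune(model, dataset)
        # Improve: the updated model seeds the next round of generation.
    return model
```

Filtering on correctness before each fine-tuning round is what distinguishes this loop from naive self-training, where a model trained on unfiltered self-generated data can reinforce its own mistakes.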