Keywords: Large language models, reasoning, test-time scaling
TL;DR: In this paper, we propose ID-sampling (iterative deepening sampling), a novel test-time sampling algorithm that samples correct responses on reasoning tasks more efficiently.
Abstract: Recent reasoning models, such as OpenAI's o1 series, have demonstrated exceptional performance on complex reasoning tasks and revealed new test-time scaling laws. Inspired by this, much subsequent work has studied how to train models for effective self-evaluation and self-correction to further this scaling paradigm. However, how to efficiently scale test-time compute with a fixed model has received less attention and remains a challenge. In this paper, we focus on whether LLMs can benefit from matching the pattern of correct responses. Specifically, we explore how systematically triggering a model's self-correction mechanisms can improve performance on challenging reasoning tasks. To this end, we propose a novel iterative deepening sampling framework designed to enhance self-correction and generate higher-quality samples. Through extensive experiments on the MATH-500, AIME, and GPQA-diamond benchmarks, we demonstrate that our method achieves a higher success rate on difficult tasks, and we provide detailed ablation studies analyzing its effectiveness across diverse settings.
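To make the framework concrete, below is a minimal sketch of one plausible instantiation of iterative deepening sampling: a generation budget grows across rounds, and a self-reflection trigger is appended between rounds to invoke self-correction. The `llm_generate` helper, the `REFLECT_PROMPT` string, the `Answer:` extraction pattern, and the doubling budget schedule are all illustrative assumptions, not the paper's implementation.

```python
import re

# Assumed self-reflection trigger; the paper's exact prompt is not given here.
REFLECT_PROMPT = "\nWait, let me re-examine my reasoning for mistakes.\n"


def llm_generate(prompt: str, max_tokens: int) -> str:
    """Placeholder for a call to a fixed LLM (API or local model)."""
    raise NotImplementedError


def extract_answer(response: str) -> str | None:
    """Return the final answer if present; a simple 'Answer:' marker is assumed."""
    m = re.search(r"Answer:\s*(.+)", response)
    return m.group(1).strip() if m else None


def id_sample(question: str, init_budget: int = 512, max_rounds: int = 4) -> str:
    """Iteratively deepen the generation budget, appending a self-reflection
    trigger between rounds so the model revisits its earlier reasoning."""
    context = question
    budget = init_budget
    response = ""
    for _ in range(max_rounds):
        response = llm_generate(context, max_tokens=budget)
        if extract_answer(response) is not None:
            break  # a complete answer was produced within this budget
        # No final answer yet: deepen. Carry the partial trace forward and
        # nudge the model to self-correct before continuing.
        context = context + response + REFLECT_PROMPT
        budget *= 2  # assumed doubling schedule, mirroring iterative deepening search
    return response
```

In this sketch, deeper rounds reuse the partial reasoning trace rather than resampling from scratch, which is one way the systematic self-correction triggering described in the abstract could be realized.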
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14473