Keywords: Large language models, reasoning, test-time scaling
TL;DR: In this paper, we propose ID-sampling (iterative deepening sampling), a novel test-time sampling algorithm that samples correct responses on reasoning tasks more efficiently.
Abstract: Recent reasoning models, such as OpenAI's o1 series, have demonstrated exceptional performance on complex reasoning tasks and revealed new test-time scaling laws. Inspired by this, much subsequent work has studied how to train models for effective self-evaluation and self-correction to further this scaling paradigm. However, how to efficiently scale test-time compute with a fixed model has received less attention and remains a challenge. In this paper, we focus on whether LLMs can benefit from matching the pattern of correct responses. Specifically, we explore how systematically triggering a model's self-correction mechanisms can improve performance on challenging reasoning tasks. To this end, we propose a novel iterative deepening sampling framework designed to enhance self-correction and generate higher-quality samples. Through extensive experiments on the MATH-500, AIME, and GPQA-diamond benchmarks, we demonstrate that our method achieves a higher success rate on difficult tasks, and we provide detailed ablation studies analyzing its effectiveness across diverse settings.
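To make the framework concrete, below is a minimal sketch of one plausible instantiation of iterative deepening sampling: a generation budget grows across rounds, and a self-reflection trigger is appended between rounds to invoke self-correction. The `llm_generate` helper, the `REFLECT_PROMPT` string, the `Answer:` extraction pattern, and the doubling budget schedule are all illustrative assumptions, not the paper's implementation.

```python
import re

# Assumed self-reflection trigger; the paper's exact prompt is not given here.
REFLECT_PROMPT = "\nWait, let me re-examine my reasoning for mistakes.\n"


def llm_generate(prompt: str, max_tokens: int) -> str:
    """Placeholder for a call to a fixed LLM (API or local model)."""
    raise NotImplementedError


def extract_answer(response: str) -> str | None:
    """Return the final answer if present; a simple 'Answer:' marker is assumed."""
    m = re.search(r"Answer:\s*(.+)", response)
    return m.group(1).strip() if m else None


def id_sample(question: str, init_budget: int = 512, max_rounds: int = 4) -> str:
    """Iteratively deepen the generation budget, appending a self-reflection
    trigger between rounds so the model revisits its earlier reasoning."""
    context = question
    budget = init_budget
    response = ""
    for _ in range(max_rounds):
        response = llm_generate(context, max_tokens=budget)
        if extract_answer(response) is not None:
            break  # a complete answer was produced within this budget
        # No final answer yet: deepen. Carry the partial trace forward and
        # nudge the model to self-correct before continuing.
        context = context + response + REFLECT_PROMPT
        budget *= 2  # assumed doubling schedule, mirroring iterative deepening search
    return response
```

In this sketch, deeper rounds reuse the partial reasoning trace rather than resampling from scratch, which is one way the systematic self-correction triggering described in the abstract could be realized.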
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14473