Diversity-aware Training for Test-time Scaling

ICLR 2026 Conference Submission4872 Authors

13 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language models, test-time scaling
TL;DR: This paper introduces a novel training framework that uses diverse training data and noisy embeddings to guide LLMs, significantly improving test-time scaling performance on math, code, and agent tasks.
Abstract: Test-time scaling for large language models (LLMs) is a well-established approach to improving performance. However, as test-time computation increases, the performance gains grow progressively smaller, largely because independent reasoning attempts tend to collapse into similar incorrect solutions. Existing approaches to enhancing reasoning diversity mainly focus on token-level diversity, which fails to capture reasoning-level diversity and introduces hallucinations. To this end, we introduce RePrism, a novel framework designed to act like a Reasoning Prism, guiding models to explore a spectrum of distinct and valid reasoning paths from a single input. First, we construct training data in which each prompt is associated with multiple diverse yet correct answers. Second, we inject noise embeddings into special tokens as implicit diversity signals, teaching the model to recognize these embeddings as indicators of diverse reasoning paths. We validate RePrism on nine challenging benchmarks across Math, Code, and Agent tasks, where it increases the models' pass@N accuracy by up to 6.4\%, 1.1\%, and 0.5\%, respectively. Moreover, we demonstrate that the reasoning diversity instilled by RePrism provides a superior foundation for reinforcement learning (RL): it not only furnishes a richer exploration space that yields larger performance gains from RL, but also prevents the collapse of reasoning diversity during RL training.
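The abstract's noise-embedding idea can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, embedding dimension, and noise scale `sigma` are illustrative assumptions. The core idea is that each independent sampling attempt conditions on a differently perturbed copy of a special token's embedding, giving the model a distinct implicit signal per attempt.

```python
import numpy as np

def inject_noise(special_token_embedding, sigma=0.1, rng=None):
    """Add Gaussian noise to a special-token embedding as an implicit
    diversity signal (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(rng)
    noise = rng.normal(0.0, sigma, size=special_token_embedding.shape)
    return special_token_embedding + noise

# Each of N parallel reasoning attempts gets its own noised copy of the
# special token, so attempts diverge even though the prompt is identical.
base = np.zeros(8)  # hypothetical 8-dim special-token embedding
variants = [inject_noise(base, sigma=0.1, rng=seed) for seed in range(4)]
```

In an actual model, each noised vector would replace the special token's row in the embedding lookup for one sampled generation, while training teaches the model to interpret distinct perturbations as cues for distinct reasoning paths.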
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4872