Mode-conditioning unlocks superior test-time compute scaling

Chen Henry Wu; Sachin Goyal; Aditi Raghunathan

Published: 16 Oct 2025, Last Modified: 10 Nov 2025
Venue: NeurIPS 2025 ER Workshop Spotlight
License: CC BY 4.0

Keywords: test-time scaling, reasoning, diversity, creativity

Abstract: Parallel sampling promises substantial gains in test-time scaling, but its effectiveness is sharply limited by diversity collapse: models concentrate on a few modes, so repeated samples produce the same mistakes. We propose the *mode-conditioning (ModC) framework*, which explicitly allocates test-time compute across reasoning modes using either specialist models or mode-specific prefixes. ModC consistently improves scaling on controlled graph-search tasks and large-scale reasoning benchmarks, across multiple model families and sizes from 0.5B to 7B parameters. On OpenThoughts, fine-tuning Qwen2.5-7B with ModC achieves an **8× efficiency gain** over standard training while also improving the maximum attainable Pass@k. We further show that gradient clustering enables ModC without explicit mode labels, yielding up to 10% gains on datasets such as NuminaMath. These results demonstrate that standard training underutilizes the diversity in data, and that ModC provides a simple, effective remedy for unlocking the full benefits of diversity in test-time scaling.
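Why explicit allocation across modes can help under diversity collapse can be seen with a toy calculation. The sketch below is an illustrative simplification, not the paper's method or code: it assumes each reasoning mode solves a given problem with a fixed, independent per-sample success rate, and it compares k i.i.d. samples from a collapsed mode distribution against splitting the same k samples evenly across modes. The function names and all numbers are hypothetical.

```python
def standard_pass_at_k(mode_probs, mode_success, k):
    """Toy model of standard sampling (hypothetical): each of k i.i.d. samples
    falls into a mode according to mode_probs, then succeeds with that mode's rate."""
    p_correct = sum(w * s for w, s in zip(mode_probs, mode_success))
    return 1.0 - (1.0 - p_correct) ** k


def mode_conditioned_pass_at_k(mode_success, k):
    """Toy model of mode-conditioning (hypothetical): the k samples are split
    evenly across modes, e.g. one mode-specific prefix or specialist per mode."""
    m = len(mode_success)
    if k % m != 0:
        raise ValueError("this sketch assumes k is divisible by the number of modes")
    per_mode = k // m
    fail = 1.0
    for s in mode_success:
        # Probability that every sample assigned to this mode fails.
        fail *= (1.0 - s) ** per_mode
    return 1.0 - fail


if __name__ == "__main__":
    # Collapsed model: 95% of samples land in a mode that never solves the problem.
    probs, success, k = [0.95, 0.05], [0.0, 0.5], 8
    print(f"standard Pass@{k}: {standard_pass_at_k(probs, success, k):.4f}")
    print(f"mode-conditioned Pass@{k}: {mode_conditioned_pass_at_k(success, k):.4f}")
```

Under these assumptions, standard sampling reaches Pass@8 of roughly 0.18, while evenly allocating the same eight samples across both modes reaches 0.9375, even though half of the allocated samples go to the failing mode. When the model's mode distribution already matches the allocation, the two quantities coincide, which is consistent with diversity collapse being the source of the gap in this toy setting.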

Submission Number: 256
