Keywords: reasoning, uncertainty, generation, branching, entropy, efficiency
TL;DR: The paper introduces EAGer, a decoding method for reasoning language models that uses token-level uncertainty to allocate computation more efficiently, reducing redundancy and improving the balance between performance and efficiency.
Abstract: With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution, but it allocates the same compute budget to every prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through the token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance.
EAGer allows branching to multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed.
We find that, across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer generates up to 65% fewer tokens (thereby saving compute) while achieving up to a 27% improvement in Pass@1 compared to Full Parallel sampling.
Our results show that EAGer consistently achieves the best efficiency-performance trade-off by enabling dynamic control over computation expenditure.
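To make the entropy-gated branching described in the abstract concrete, below is a minimal, self-contained sketch of the core idea: branch into multiple continuations only at decoding steps where the next-token entropy is high, and continue greedily otherwise. This is not the authors' implementation; the toy vocabulary, the `next_token_logits` stub, and the threshold `tau` and branching factor `k` are illustrative assumptions, and the cross-prompt budget reallocation described in the abstract is omitted.

```python
# Toy sketch of entropy-gated branching (illustrative only, not the EAGer release).
import math
import random

VOCAB = list("abcde")  # toy vocabulary standing in for a real tokenizer

def next_token_logits(prefix):
    # Hypothetical model call: returns one logit per vocabulary token.
    random.seed(hash(prefix) % (2**32))
    return [random.gauss(0.0, 1.0) for _ in VOCAB]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    # Shannon entropy (in nats) of the next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_gated_decode(prompt, max_len=8, tau=1.5, k=2):
    """Branch only at high-entropy steps; otherwise keep one greedy path.

    tau: assumed entropy threshold that gates branching.
    k:   number of alternative continuations spawned at a high-entropy token.
    """
    paths = [prompt]
    for _ in range(max_len):
        new_paths = []
        for path in paths:
            probs = softmax(next_token_logits(path))
            if entropy(probs) > tau:
                # High uncertainty: explore the top-k candidate tokens.
                top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
                new_paths.extend(path + VOCAB[i] for i in top)
            else:
                # Low uncertainty: keep a single greedy continuation.
                best = max(range(len(probs)), key=lambda i: probs[i])
                new_paths.append(path + VOCAB[best])
        paths = new_paths
    return paths

print(entropy_gated_decode("x"))
```

In this sketch the number of live paths grows only where the model is uncertain, which is the mechanism that lets the saved budget be redirected to harder prompts in the full method.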
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13343