Optimizing Reasoning Efficiency through Prompt Difficulty Prediction

Bo Zhao; Berkcan Kapusuzoglu; Kartik Balasubramaniam; Sambit Sahu; Supriyo Chakraborty; Genta Indra Winata

Published: 16 Oct 2025, Last Modified: 10 Nov 2025

Venue: NeurIPS 2025 ER Workshop

License: CC BY 4.0

Keywords: Reasoning models, model routing, intermediate representations

TL;DR: We use LLM internal representations to predict problem difficulty and route reasoning tasks to the smallest capable model, boosting efficiency.

Abstract: Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B, we train lightweight predictors of problem difficulty or model correctness to guide routing across a pool of reasoning models. On diverse math benchmarks, routing improves efficiency over random assignment and matches s1.1-32B’s performance while using significantly less compute. Our results demonstrate that difficulty-aware routing is effective for cost-efficient deployment of reasoning models.
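The routing idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes each model in the pool has a lightweight correctness predictor that maps a feature vector (standing in for an intermediate representation) to a probability of solving the problem. The model names, relative costs, probe weights, and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


def linear_probe(weights: Sequence[float], bias: float) -> Callable[[Sequence[float]], float]:
    """Build a toy logistic probe: features -> estimated P(model answers correctly)."""
    def predict(features: Sequence[float]) -> float:
        z = sum(w * x for w, x in zip(weights, features)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict


@dataclass(frozen=True)
class Candidate:
    name: str                                       # hypothetical model identifier
    cost: float                                     # hypothetical relative compute cost
    p_correct: Callable[[Sequence[float]], float]   # per-model correctness predictor


def route(features: Sequence[float], pool: List[Candidate], threshold: float = 0.5) -> Candidate:
    """Pick the cheapest candidate predicted to succeed.

    Falls back to the most expensive candidate when no predictor
    reaches the threshold.
    """
    ordered = sorted(pool, key=lambda c: c.cost)
    for candidate in ordered:
        if candidate.p_correct(features) >= threshold:
            return candidate
    return ordered[-1]


# Hypothetical pool; the weights are made up, not trained on any data.
POOL = [
    Candidate("small-model", 1.0, linear_probe([2.0, -3.0], 0.0)),
    Candidate("medium-model", 4.0, linear_probe([2.0, -1.0], 0.5)),
    Candidate("large-model", 16.0, linear_probe([0.0, 0.0], 3.0)),
]

if __name__ == "__main__":
    for feats in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        print(feats, "->", route(feats, POOL).name)
```

In this sketch the threshold trades accuracy against compute: raising it sends more problems to larger models, while lowering it favors the cheaper ones.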

Submission Number: 58
