Adaptive Test-Time Compute Allocation via Query Complexity Estimation in Large Language Models

18 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Adaptive Compute Allocation, Large Language Models, Complexity Estimation, Inference Efficiency, Resource Optimization
Abstract: Recent advances in test-time compute scaling have demonstrated substantial performance improvements for large language models through increased inference-time computation. However, existing approaches allocate computational resources uniformly regardless of query complexity, leading to significant inefficiencies. We propose AdaptiveComp, a principled framework that dynamically allocates test-time compute based on query complexity estimation. Our approach introduces: (1) a theoretically grounded complexity estimator using information-theoretic measures, (2) a continuous resource allocation strategy with provable optimality guarantees, and (3) an uncertainty-aware early stopping mechanism. Through comprehensive evaluation on 8 benchmarks spanning mathematical reasoning, code synthesis, and multi-step planning, we demonstrate that AdaptiveComp achieves performance comparable to uniform high-compute baselines while reducing computational costs by 47.3±3.2% (p<0.001). Moreover, we establish theoretical connections between query complexity and optimal compute allocation, providing the first formal treatment of this problem. Our analysis reveals that complexity-aware allocation becomes increasingly beneficial as task diversity increases, with efficiency gains of up to 73% on heterogeneous datasets.
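To make the three components above concrete, here is a minimal Python sketch of one possible pipeline, assuming answer-entropy over a few cheap draft samples as the information-theoretic complexity proxy, a sample-count budget as the allocated resource, and majority-vote agreement as the early-stopping signal. The paper does not specify its implementation; all function names (`estimate_complexity`, `allocate_budget`, `adaptive_answer`, `generate`) and thresholds here are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter

def estimate_complexity(draft_answers):
    """Entropy of answers from a few cheap draft samples,
    used as a rough information-theoretic complexity proxy (assumption)."""
    counts = Counter(draft_answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def allocate_budget(complexity, min_samples=1, max_samples=32, scale=8.0):
    """Map the complexity estimate to an extra-sample budget, clipped to [min, max]."""
    return max(min_samples, min(max_samples, round(scale * complexity) + min_samples))

def adaptive_answer(query, generate, n_draft=4, agreement_threshold=0.9):
    """generate(query) -> answer string; stands in for an LLM sampling call."""
    drafts = [generate(query) for _ in range(n_draft)]
    budget = allocate_budget(estimate_complexity(drafts))
    answers = list(drafts)
    for _ in range(budget):
        answers.append(generate(query))
        top, top_count = Counter(answers).most_common(1)[0]
        # Uncertainty-aware early stopping: halt once one answer dominates.
        if top_count / len(answers) >= agreement_threshold:
            return top
    return Counter(answers).most_common(1)[0][0]
```

Easy queries, where the drafts already agree, receive a near-zero entropy estimate and stop almost immediately, while high-entropy (ambiguous) queries receive a larger sampling budget; this is the intuition behind complexity-aware allocation, though the actual estimator and guarantees in the paper may differ.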
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12170