Adaptive Test-Time Compute Allocation via Query Complexity Estimation in Large Language Models

18 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Adaptive Compute Allocation, Large Language Models, Complexity Estimation, Inference Efficiency, Resource Optimization
Abstract: Recent advances in test-time compute scaling have demonstrated substantial performance improvements for large language models through increased inference-time computation. However, existing approaches allocate computational resources uniformly regardless of query complexity, leading to significant inefficiencies. We propose AdaptiveComp, a principled framework that dynamically allocates test-time compute based on query complexity estimation. Our approach introduces: (1) a theoretically grounded complexity estimator using information-theoretic measures, (2) a continuous resource allocation strategy with provable optimality guarantees, and (3) an uncertainty-aware early stopping mechanism. Through comprehensive evaluation on 8 benchmarks spanning mathematical reasoning, code synthesis, and multi-step planning, we demonstrate that AdaptiveComp achieves performance comparable to uniform high-compute baselines while reducing computational costs by 47.3±3.2% (p<0.001). Moreover, we establish theoretical connections between query complexity and optimal compute allocation, providing the first formal treatment of this problem. Our analysis reveals that complexity-aware allocation becomes increasingly beneficial as task diversity increases, with efficiency gains of up to 73% on heterogeneous datasets.
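To make the three components above concrete, here is a minimal Python sketch of one possible pipeline, assuming answer-entropy over a few cheap draft samples as the information-theoretic complexity proxy, a sample-count budget as the allocated resource, and majority-vote agreement as the early-stopping signal. The paper does not specify its implementation; all function names (`estimate_complexity`, `allocate_budget`, `adaptive_answer`, `generate`) and thresholds here are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter

def estimate_complexity(draft_answers):
    """Entropy of answers from a few cheap draft samples,
    used as a rough information-theoretic complexity proxy (assumption)."""
    counts = Counter(draft_answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def allocate_budget(complexity, min_samples=1, max_samples=32, scale=8.0):
    """Map the complexity estimate to an extra-sample budget, clipped to [min, max]."""
    return max(min_samples, min(max_samples, round(scale * complexity) + min_samples))

def adaptive_answer(query, generate, n_draft=4, agreement_threshold=0.9):
    """generate(query) -> answer string; stands in for an LLM sampling call."""
    drafts = [generate(query) for _ in range(n_draft)]
    budget = allocate_budget(estimate_complexity(drafts))
    answers = list(drafts)
    for _ in range(budget):
        answers.append(generate(query))
        top, top_count = Counter(answers).most_common(1)[0]
        # Uncertainty-aware early stopping: halt once one answer dominates.
        if top_count / len(answers) >= agreement_threshold:
            return top
    return Counter(answers).most_common(1)[0][0]
```

Easy queries, where the drafts already agree, receive a near-zero entropy estimate and stop almost immediately, while high-entropy (ambiguous) queries receive a larger sampling budget; this is the intuition behind complexity-aware allocation, though the actual estimator and guarantees in the paper may differ.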
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12170