Anytime Verified Agents: Adaptive Compute Allocation for Reliable LLM Reasoning under Budget Constraints
Abstract: Large language model (LLM) agents show promising results in reasoning, planning, and tool use, but their performance scales with computational budget. Existing methods allocate compute using static strategies such as fixed search depths, constant self-consistency sampling, or uniform verification, so simple problems consume as much compute as complex ones. We present Anytime Verified Agents (AVA), a framework that dynamically allocates compute across search, tool use, and verification within a user-specified budget. AVA integrates calibrated uncertainty estimation, value-of-information-guided search expansion, and selective verification cascades with early exits. Its controller allocates compute based on predicted failure risk and marginal reliability gains, allowing the agent to achieve higher accuracy at a fixed budget or lower cost at a target reliability level. We evaluate AVA on mathematical reasoning (GSM8K), multi-hop question answering (HotpotQA), and code generation (HumanEval) benchmarks, comparing against fixed-depth search, self-consistency, and always-verify baselines. Adaptive allocation reduces cost by 20-40% at equivalent reliability while maintaining accuracy, yielding clear Pareto improvements in the compute-reliability trade-off.
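To make the controller's budget-aware loop concrete, here is a minimal Python sketch of the kind of value-of-information-driven allocation the abstract describes: at each step the cheapest action with the highest marginal risk reduction per unit cost is taken, and the loop exits early once the predicted failure risk falls below a reliability target or the budget is exhausted. All names (Action, select_action, anytime_controller, the additive risk model) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str    # e.g. "expand_search", "call_tool", "verify" (hypothetical labels)
    cost: float  # compute cost in abstract budget units
    gain: float  # predicted reduction in failure risk if taken

def select_action(actions, remaining_budget):
    """Pick the affordable action with the best marginal gain per unit cost
    (a value-of-information-style criterion)."""
    affordable = [a for a in actions if a.cost <= remaining_budget]
    if not affordable:
        return None
    return max(affordable, key=lambda a: a.gain / a.cost)

def anytime_controller(initial_risk, actions, budget, target_risk=0.05):
    """Greedily spend budget on the highest-VOI action; exit early once the
    calibrated failure-risk estimate drops below the reliability target."""
    risk, spent = initial_risk, 0.0
    while risk > target_risk:
        action = select_action(actions, budget - spent)
        if action is None:  # budget exhausted: return the best answer so far
            break
        spent += action.cost
        risk = max(0.0, risk - action.gain)  # assumed additive risk model
    return risk, spent

# Usage example: verification is cheap and informative here, so it is chosen
# repeatedly and the loop exits well under the budget of 10 units.
acts = [Action("expand_search", 4.0, 0.10),
        Action("call_tool",     2.0, 0.08),
        Action("verify",        1.0, 0.06)]
print(anytime_controller(initial_risk=0.30, actions=acts, budget=10.0))
```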
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Weiyang_Liu1
Submission Number: 6525