Towards Budget-Aware Agents: Do LLM Agents Know What They Will Spend?
Keywords: budget-aware agents, llm, agents, interpretability, evaluation, protocol
TL;DR: We formalize budget awareness, i.e., whether LLM agents can estimate their remaining resource needs mid-execution, and show that this capability decouples from task performance, fails in structured ways, and is trainable as a control signal.
Abstract: Foundation-model agents are increasingly deployed under resource constraints such as token, monetary, and time budgets, yet it remains unclear whether they know how much budget they will spend. We call this capability budget awareness and formalize it as progressive interval estimation: mid-execution, the agent must provide an interval on how much more budget it needs and state whether the task is still finishable. We score this with a rollout-replay protocol that re-queries the same agent on every prefix of an unconstrained rollout, and we decompose estimation into three sub-capabilities: feasibility prediction, early failure detection, and interval calibration. We evaluate five frontier models on four environments spanning internal budgets (token consumption on Sokoban, Search-R1, and SWE-bench) and external budgets (cost, time, and warehouse occupancy in a supply-chain environment curated from real enterprise data); we further train Qwen-7B budget estimators with SFT and RL on Sokoban and deploy their predictions through a simple early-stop policy. Across these axes, we find that budget awareness (1) decouples from task performance, (2) fails in structured ways, and (3) is already actionable and trainable as a control signal that resource-limited agents currently lack.
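To make the rollout-replay idea concrete, here is a minimal sketch of how one finished rollout might be scored along the three sub-capabilities named in the abstract. It is an illustration under stated assumptions, not the paper's implementation: the names `Estimate`, `Estimator`, `ReplayScore`, and `replay_score` are hypothetical, the budget is reduced to a single per-step cost, ground-truth feasibility is taken to be whether the unconstrained rollout succeeded, and the three metrics are simple stand-ins for whatever the paper actually computes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Estimate:
    feasible: bool  # agent's belief that the task can still be finished
    low: float      # lower bound on the budget still needed
    high: float     # upper bound on the budget still needed


# Hypothetical interface: maps the per-step costs seen so far to an estimate.
Estimator = Callable[[Sequence[float]], Estimate]


@dataclass
class ReplayScore:
    feasibility_accuracy: float         # share of prefixes with a correct feasibility call
    interval_coverage: Optional[float]  # share of prefixes whose interval contains the truth
    first_failure_flag: Optional[int]   # first prefix flagged infeasible (failed rollouts only)


def replay_score(step_costs: Sequence[float], succeeded: bool,
                 estimator: Estimator) -> ReplayScore:
    """Re-query the estimator on every prefix of one finished rollout."""
    n = len(step_costs)
    if n == 0:
        raise ValueError("rollout must contain at least one step")
    total = sum(step_costs)
    correct = 0
    covered = 0
    first_flag: Optional[int] = None
    for t in range(n):
        prefix = step_costs[:t]          # what the agent has seen before step t
        remaining = total - sum(prefix)  # budget the rollout actually still used
        est = estimator(prefix)
        # Feasibility prediction: assumed ground truth is the rollout outcome.
        correct += est.feasible == succeeded
        # Interval calibration: only meaningful when the rollout finished the task.
        if succeeded and est.low <= remaining <= est.high:
            covered += 1
        # Early failure detection: earliest prefix at which a failed rollout is flagged.
        if not succeeded and not est.feasible and first_flag is None:
            first_flag = t
    return ReplayScore(
        feasibility_accuracy=correct / n,
        interval_coverage=covered / n if succeeded else None,
        first_failure_flag=first_flag,
    )
```

In this sketch, a simple early-stop policy would halt the agent at the first prefix where the estimator reports `feasible=False`; the stopping rule used in the paper may differ.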
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 165