Keywords: budget-aware agents, deployment, social aspect, LLM agents, multi-turn RL, agentic AI
Abstract: Foundation-model agents operate under growing resource constraints yet rarely know how much budget they will spend. We call this capability budget awareness and formalize it as progressive interval estimation: mid-execution, can the agent provide a calibrated interval over its remaining budget and declare when the task is infeasible? We score this with a rollout-replay protocol that re-queries the agent on every trajectory prefix, decomposing estimation into feasibility prediction, early failure detection, and interval calibration. We evaluate five frontier models across four environments, spanning internal token budgets (Sokoban, Search-R1, SWE-bench) and external multi-dimensional budgets (Warehouse), and train Qwen-7B estimators with SFT and RL. We find that budget awareness (1) decouples from task performance, (2) fails in structured ways (universal optimistic bias, late failure recognition, and calibration-bound feasibility vs. reasoning-bound intervals), and (3) is actionable via early stopping and trainable via SFT-then-RL, offering a control signal that resource-limited agents currently lack.
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: calibration/uncertainty, probing, robustness, agent evaluation, environment interaction, LLM efficiency, tool use, planning in agents
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: yes
Submission Number: 15737