Safe Under Budget? Verification Budgets and Abstention Failures in Web Agents
Keywords: agent safety, web agents, computer-use agents, policy compliance, verification budget, abstention, agent systems
TL;DR: Partial verification budgets can induce unsafe side-effect completions in web agents; an evidence-preserving gate mitigates this by forcing verification or abstention.
Abstract: Web agents are often deployed under operational limits on steps, tool calls, context, and verification opportunities. Existing safety evaluations show that agents can violate policies or pursue underspecified goals, while work on budget-aware inference mainly optimizes cost-performance trade-offs. We study the missing interaction: whether an insufficient verification budget changes agent safety behavior. We introduce a controlled diagnostic protocol over deterministic local web environments with policy-constrained, ambiguous, and verification-required tasks. In a study with a local 7B model, insufficient verification budgets produce unsafe side-effect completions: agents sometimes reach task end states while missing required evidence. The effect is clearest under partial verification budgets, where agents can still take side-effect actions but cannot inspect all safety-critical evidence. We evaluate Budgeted Safety Gate as a diagnostic mitigation that preserves evidence requirements before side effects. The gate reduces unsafe completion in our suite, at the cost of conservative abstention under tight budgets.
Track: Short Paper (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 33
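
Illustrative sketch (not the authors' implementation): the abstract describes a gate that checks evidence requirements before any side-effect action and otherwise forces verification or abstention. One way such a decision rule could look is shown below. All names (`GateState`, `Decision`, `gate`) are hypothetical, and the sketch assumes that each missing evidence item costs exactly one verification step, which the abstract does not state.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACT = "act"          # all required evidence observed; side effect allowed
    VERIFY = "verify"    # evidence missing, but budget can still cover it
    ABSTAIN = "abstain"  # evidence missing and budget cannot cover it


@dataclass(frozen=True)
class GateState:
    # Hypothetical state: evidence identifiers are plain strings.
    required_evidence: frozenset
    observed_evidence: frozenset
    verification_budget: int  # remaining verification steps (assumed unit cost)


def gate(state: GateState) -> Decision:
    """Decide whether a side-effect action may proceed.

    Assumption: inspecting one evidence item consumes one verification step.
    """
    missing = state.required_evidence - state.observed_evidence
    if not missing:
        return Decision.ACT
    if len(missing) <= state.verification_budget:
        return Decision.VERIFY
    # Partial budget: acting now would complete the task without evidence.
    return Decision.ABSTAIN


if __name__ == "__main__":
    required = frozenset({"refund_policy", "order_status"})
    partial = GateState(required, frozenset({"order_status"}), 0)
    print(gate(partial).value)
```

Under these assumptions, the partial-budget case (some evidence seen, no budget left for the rest) resolves to abstention rather than a side-effect completion, which mirrors the conservative behavior the abstract reports under tight budgets.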