Verify What Matters: Budgeted Verification for Tool-Using Agents under Counterfactual Downstream Harm
Abstract: Tool-using agents make intermediate decisions that alter persistent state, shape later observations,
and can trigger failures that are not equally easy to recover from. When verification
is costly, the central question is therefore not whether checking helps in general, but which
decisions are worth checking. Policies driven only by local uncertainty capture whether a step
may be wrong, but not how much that error would matter if left uncorrected. We formulate
budgeted verification for tool-using agents as an intervention-allocation problem and argue
that the step-level value of verification decomposes into verifier efficacy, local error probability,
counterfactual downstream harm, and intervention cost. This yields a consequence-aware
decision target that recovers uncertainty-only routing as the special case of constant harm.
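The decision target can be illustrated with a minimal sketch. The abstract names the four factors but not their exact functional form, so the code assumes the value of verifying a step is expected harm averted minus cost, V = efficacy × p_err × harm − cost, and assumes a simple greedy allocation under a fixed budget. The `Step` fields, the functions `verification_value` and `allocate`, the step names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One intermediate agent decision (hypothetical fields)."""
    name: str
    p_err: float     # local error probability: the only varying signal under uncertainty-only routing
    harm: float      # counterfactual downstream harm if an error here goes uncorrected
    efficacy: float  # probability that the verifier catches the error when invoked
    cost: float      # cost of invoking the verifier on this step


def verification_value(s: Step, harm_override: Optional[float] = None) -> float:
    # Assumed form: expected harm averted by verifying, minus the intervention cost.
    harm = s.harm if harm_override is None else harm_override
    return s.efficacy * s.p_err * harm - s.cost


def allocate(steps: List[Step], budget: float,
             harm_override: Optional[float] = None) -> List[str]:
    # Greedy allocation: verify steps in order of decreasing net value while the
    # value is positive and the budget allows. A constant harm_override turns
    # this into uncertainty-only routing.
    ranked = sorted(steps, key=lambda s: verification_value(s, harm_override),
                    reverse=True)
    chosen, spent = [], 0.0
    for s in ranked:
        if verification_value(s, harm_override) <= 0:
            break  # remaining steps are not worth their cost
        if spent + s.cost <= budget:
            chosen.append(s.name)
            spent += s.cost
    return chosen


# Hypothetical steps: a likely but easily recovered error vs. a rarer irreversible one.
steps = [
    Step("read_config", p_err=0.30, harm=1.0, efficacy=0.9, cost=0.1),
    Step("delete_branch", p_err=0.10, harm=20.0, efficacy=0.9, cost=0.1),
]
print(allocate(steps, budget=0.15))                     # ['delete_branch']
print(allocate(steps, budget=0.15, harm_override=1.0))  # ['read_config']
```

With efficacy and cost equal across steps, a constant `harm_override` makes the ranking depend only on `p_err`, so the budget goes to the more uncertain but low-harm step. Greedy ranking by net value is a simplification chosen for brevity; an exact allocation under a hard budget is a 0/1 knapsack problem.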
We emphasize that the paper’s primary contribution is conceptual: a decision structure
in which local error likelihood and downstream harm enter as separate inputs rather than
being collapsed into a single scalar score. Empirically, we report a four-episode mechanism
pilot on a dependency-sensitive slice of an OpenClaw-based sandbox, where uncertainty-only
routing misses two irreversible failures that a harm-aware rule catches. The pilot is intentionally
small-scale; it isolates the mechanism rather than estimating an effect size, and a
Fisher exact test on the pilot’s 2×2 contingency table yields p ≈ 0.43. We specify three follow-up
comparisons (matched-budget, cross-slice, and process-reward-threshold) whose results
would be required for an effect-size claim; these are outlined but not reported in the present
submission. Readers should therefore interpret the empirical content as supporting the direction of
effect the framework predicts in a restricted setting, not as establishing quantitative superiority.
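The reported p ≈ 0.43 can be checked with a short exact computation. The abstract does not give the 2×2 table itself, so the table below is an assumption: one arrangement consistent with the description (each of the four episodes run under both policies, counting episodes with an uncaught irreversible failure, 2 of 4 versus 0 of 4) that reproduces the reported value. It is not the authors' table. `fisher_exact_two_sided` is a hypothetical standard-library helper; `scipy.stats.fisher_exact` could serve as a cross-check.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # P(top-left cell = x) given the fixed row and column margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tied probabilities are not lost to rounding.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# HYPOTHETICAL table (not reported in the abstract): rows are policies, columns are
# (episodes with an uncaught irreversible failure, episodes without one).
table = [[2, 2],   # uncertainty-only routing
         [0, 4]]   # harm-aware rule
print(round(fisher_exact_two_sided(table), 2))  # 0.43
```

Under these assumed margins no table is more extreme than the observed one, so 0.43 is the smallest two-sided p-value attainable, which illustrates why a pilot of this size cannot by itself support an effect-size claim.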
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Hanrui_Zhang1
Submission Number: 8582