Verify What Matters: Budgeted Verification for Tool-Using Agents under Counterfactual Downstream Harm

23 Apr 2026 (modified: 26 Apr 2026) · Under review for TMLR · CC BY 4.0
Abstract: Tool-using agents make intermediate decisions that alter persistent state, shape later observations, and can trigger failures that are not equally easy to recover from. When verification is costly, the central question is therefore not whether checking helps in general, but which decisions are worth checking. Policies driven only by local uncertainty capture whether a step may be wrong, but not how much that error would matter if left uncorrected. We formulate budgeted verification for tool-using agents as an intervention-allocation problem and argue that the step-level value of verification decomposes into four inputs: verifier efficacy, local error probability, counterfactual downstream harm, and intervention cost. This yields a consequence-aware decision target that contains uncertainty-only routing as the special case of constant harm. We emphasize that the paper’s primary contribution is conceptual: a decision structure in which local error likelihood and downstream harm enter as separate inputs rather than being collapsed into a single scalar score. Empirically, we report a four-episode mechanism pilot on a dependency-sensitive slice of an OpenClaw-based sandbox, in which uncertainty-only routing misses two irreversible failures that a harm-aware rule catches. The pilot is intentionally small: it isolates the mechanism rather than estimating an effect size, and a Fisher exact test on the pilot’s 2×2 contingency table yields p ≈ 0.43. We specify three follow-up comparisons (matched-budget, cross-slice, and process-reward-threshold) whose results would be required for an effect-size claim; these are outlined but not reported in the present submission. Readers should therefore interpret the empirical content as supporting the framework’s direction of effect in a restricted setting, not as establishing quantitative superiority.
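The abstract names the four inputs to the step-level value of verification but not how they combine. The sketch below assumes the simplest form consistent with that description, value = efficacy × p_error × harm − cost, together with a greedy selection of steps under a verification budget. The `Step` fields, the `value_of_verification` and `allocate` helpers, the greedy heuristic, and all numbers are hypothetical illustrations, not the paper's method or data. Passing a fixed `constant_harm` recovers uncertainty-only routing, so the same toy episode can be routed both ways.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One intermediate agent decision (hypothetical fields for illustration)."""
    name: str
    p_error: float  # local error probability, e.g. from model uncertainty
    harm: float     # counterfactual downstream harm if the error goes uncorrected
    cost: float     # cost of running the verifier on this step (assumed > 0)


def value_of_verification(step: Step, efficacy: float,
                          constant_harm: Optional[float] = None) -> float:
    """Assumed form: efficacy * p_error * harm - cost.

    If constant_harm is given it replaces the step's own harm estimate,
    which reduces the rule to uncertainty-only routing.
    """
    harm = step.harm if constant_harm is None else constant_harm
    return efficacy * step.p_error * harm - step.cost


def allocate(steps: List[Step], budget: float, efficacy: float,
             constant_harm: Optional[float] = None) -> List[str]:
    """Greedily pick steps to verify, best net value per unit cost first.

    Only steps with positive net value are considered, and the total
    verifier cost never exceeds the budget. Greedy ordering is a heuristic
    for the underlying knapsack problem, not an exact solver.
    """
    scored = [(value_of_verification(s, efficacy, constant_harm), s) for s in steps]
    positive = [(v, s) for v, s in scored if v > 0]
    positive.sort(key=lambda vs: vs[0] / vs[1].cost, reverse=True)
    chosen, spent = [], 0.0
    for _, s in positive:
        if spent + s.cost <= budget:
            chosen.append(s.name)
            spent += s.cost
    return chosen


# Toy episode with made-up numbers: a likely-but-reversible edit and an
# unlikely-but-irreversible deletion compete for a single verification.
STEPS = [
    Step("edit_draft", p_error=0.40, harm=1.0, cost=0.1),
    Step("delete_branch", p_error=0.15, harm=20.0, cost=0.1),
    Step("read_file", p_error=0.05, harm=0.5, cost=0.1),
]

if __name__ == "__main__":
    print("harm-aware:      ", allocate(STEPS, budget=0.1, efficacy=0.9))
    print("uncertainty-only:", allocate(STEPS, budget=0.1, efficacy=0.9, constant_harm=1.0))
```

With a budget for one check, the harm-aware rule spends it on the low-probability irreversible `delete_branch`, while the constant-harm rule spends it on the high-probability but cheaply recoverable `edit_draft`. An exact budgeted allocation would need dynamic programming or an integer program; the greedy ordering is kept here only for readability.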
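The reported p ≈ 0.43 can be checked by hand. The abstract does not state the cell counts; one table consistent with both the description and the reported value (an assumption, not data from the paper) crosses policy with outcome over four episodes each: uncertainty-only routing with 2 irreversible failures and 2 clean episodes, harm-aware routing with 0 and 4. The sketch computes a two-sided Fisher exact test from the hypergeometric distribution using only the standard library; `fisher_exact_two_sided` is a hypothetical helper name.

```python
from math import comb
from typing import List


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """P(top-left cell == a) for a 2x2 table with fixed row and column margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: List[List[int]]) -> float:
    """Two-sided Fisher exact p-value for [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one (small relative tolerance for ties).
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return total


if __name__ == "__main__":
    # Assumed counts: rows = (uncertainty-only, harm-aware),
    # columns = (irreversible failure, no failure).
    print(round(fisher_exact_two_sided([[2, 2], [0, 4]]), 4))
```

Under these assumed counts the p-value is 30/70 = 3/7 ≈ 0.43, matching the abstract; `scipy.stats.fisher_exact` uses the same "no more likely than observed" criterion and should agree. With these margins (two failures across eight policy-episode pairs), 3/7 is also the smallest two-sided p-value any arrangement of the failures can reach, which is consistent with framing the pilot as a mechanism check rather than an effect-size estimate.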
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Hanrui_Zhang1
Submission Number: 8582