# WS5 claim check (numeric, cross-domain)

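This table is a **sanity check**: do the currently available paper-level runs and tables show improvements?
Numbers are taken from the per-run JSON tables under `artifacts/tables/` (see manifest).
Each metric cell reads `method value (Δ)`; judging by the signs, Δ is the method value minus the baseline value (pp = percentage points), and `-` marks a metric with no value for that run.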

| Domain/run | Baseline → Method | Success (Δ) | SLO violations (Δ) | Tokens after (Δ) | Latency P95, ms (Δ) | Wait, ms/episode (Δ) | Deadlock (Δ) | Notes |
|---|---|---:|---:|---:|---:|---:|---:|---|
| Domain A (Habitat, spnoise, 30ep) | `nobrace_noprune` → `brace_prune_r0.7` | 100.0% (0.0pp) | 4.7% (-80.8pp) | 20.02 (-215.05) | 2500 (-177.07) | - (-) | - (-) | SPL=0.994 (Δ=0.00) |
| Domain A (Habitat, oracle, 30ep) | `nobrace_noprune` → `brace_prune_r0.7` | 100.0% (0.0pp) | 1.0% (-83.7pp) | 20.00 (-210.00) | 2487 (-175.86) | - (-) | - (-) | SPL=0.997 (Δ=0.00) |
| Domain B (RoboFactory, real LLM, 10ep) | `nobrace_none` → `brace_erecap_r0.7` | 100.0% (0.0pp) | 50.0% (-50.0pp) | 318.73 (-1247.63) | 1213 (-391.16) | 3546 (-5517.68) | 0.0% (0.0pp) | - |
| Domain B (RoboFactory, proxy tokenizer, 10ep) | `nobrace_none` → `brace_erecap_r0.7` | 100.0% (30.0pp) | 5.7% (-25.9pp) | 153.05 (-53.80) | 254 (-76.72) | 6930 (-1852.97) | 0.0% (-30.0pp) | - |
| Domain C (AirSimNH, paper runs, 10ep) | `baseline` → `brace_full` | 100.0% (0.0pp) | 4.7% (-95.3pp) | 1113.59 (-1820.55) | 1640 (-6880.00) | 0 (0.00) | 0.0% (0.0pp) | near_miss=29.2 (Δ=4.50); min_dist=1.895 (Δ=-0.07) |
| Domain C (AirSimNH, highest-frequency x1 check, 3ep) | `baseline` → `brace_full` | 100.0% (0.0pp) | 0.0% (-100.0pp) | 800.00 (-2383.09) | 1640 (-8624.00) | 0 (0.00) | 0.0% (0.0pp) | near_miss=21.7 (Δ=-21.67); min_dist=1.962 (Δ=1.54) |
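
The `value (Δ)` cells above can be reproduced, or inverted to recover the implied baseline, with a few lines of Python. This is a minimal sketch, not the script that generated the table: it assumes Δ = method − baseline (which the signs in the table suggest), and the helper names `format_cell` and `implied_baseline` are hypothetical.

```python
def format_cell(baseline: float, method: float, unit: str = "", digits: int = 1) -> str:
    """Render a 'method (delta)' cell, assuming delta = method - baseline."""
    delta = method - baseline
    # Percent metrics report their delta in percentage points (pp).
    delta_unit = "pp" if unit == "%" else ""
    return f"{method:.{digits}f}{unit} ({delta:.{digits}f}{delta_unit})"


def implied_baseline(method: float, delta: float) -> float:
    """Invert a cell: recover the baseline value from the method value and delta."""
    return method - delta


# Domain B (proxy tokenizer) success cell: a 30.0pp gain that ends at 100.0%
# implies a 70.0% baseline under the assumption above.
success_cell = format_cell(baseline=70.0, method=100.0, unit="%")

# Domain A (spnoise) SLO-violation cell: 4.7% with a -80.8pp delta.
slo_baseline = implied_baseline(method=4.7, delta=-80.8)
```

Inverting a cell this way is a quick check that a reported delta is plausible, for example that an implied baseline percentage stays within 0 to 100.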

## Interpretation notes (WS5)

- A domain can look “not better” in *Success* when success is already saturated (e.g., oracle executors): five of the six rows above sit at 100.0% with a 0.0pp delta.
- For BRACE vs non-BRACE attribution, prefer the multi-agent domains (Domains B and C) and the stability fields (deadlock/wait/churn).
- Budget-matched baselines (e.g., recency under a fixed token budget) can be strong; the paper claim should emphasize tail latency, SLO violations, and stability under context growth, not only point success in easy regimes.
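
The saturation caveat in the first note can be applied mechanically when reading the table. The sketch below is illustrative only: it assumes Δ = method − baseline, treats a baseline at the ceiling as uninformative for *Success*, and uses a hypothetical helper name and tolerance.

```python
def success_is_informative(method_pct: float, delta_pp: float,
                           ceiling: float = 100.0, tol: float = 1e-9) -> bool:
    """Return True when the baseline left headroom for Success to improve.

    Assumes delta_pp = method_pct - baseline_pct, so the baseline is
    recovered by subtraction. A baseline already at the ceiling cannot
    show a gain, so a 0.0pp delta there says nothing about the method.
    """
    baseline_pct = method_pct - delta_pp
    return baseline_pct < ceiling - tol


# (Success %, delta in pp) pairs copied from the table above.
rows = {
    "Domain A (spnoise)": (100.0, 0.0),
    "Domain B (proxy tokenizer)": (100.0, 30.0),
    "Domain C (paper runs)": (100.0, 0.0),
}
informative = {name: success_is_informative(*cell) for name, cell in rows.items()}
```

Rows flagged as uninformative here are the ones where the SLO, latency, and stability columns carry the comparison.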

