# ReconVLA on CALVIN Task D-D (5-step chains): n=10 vs n=50 vs n=500

This is a **diagnostic / limitations** table, **not BRACE evidence**. The goal is to avoid citing only small-N runs that may overestimate performance.

**Key metrics**

- Avg successful sequence length: mean number of instructions completed in a row from the start of a chain, before the first failure (0 to 5).
- Success@k: fraction of evaluated chains in which the first **k instructions in a row** are completed (k=1..5).
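
A minimal sketch of how both metrics could be computed, assuming each evaluated chain is summarized by the number of instructions completed before the first failure (0 to 5). The function name `chain_metrics` and the input format are illustrative assumptions, not the evaluation script behind this table.

```python
from typing import Dict, List, Tuple

CHAIN_LEN = 5  # instructions per chain, as in the 5-step setting


def chain_metrics(completed: List[int]) -> Tuple[float, Dict[int, float]]:
    """Return (average successful length, {k: Success@k}).

    `completed` is a hypothetical per-chain summary: the number of
    instructions finished in a row before the first failure (0..5).
    """
    if not completed:
        raise ValueError("need at least one evaluated chain")
    n = len(completed)
    avg_len = sum(completed) / n
    # Success@k counts chains that completed at least k instructions in a row.
    success_at = {
        k: sum(1 for c in completed if c >= k) / n
        for k in range(1, CHAIN_LEN + 1)
    }
    return avg_len, success_at


# Toy input for illustration only; these are not results from the table.
avg_len, success_at = chain_metrics([5, 3, 0, 1])
```

When both metrics are computed over the same set of chains, the average length equals the sum of the five Success@k values, which gives a quick consistency check on any reported row.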

| N | Avg success len (0..5) | Success@1 | Success@2 | Success@3 | Success@4 | Success@5 |
|---:|---:|---:|---:|---:|---:|---:|
| 10 | 3.40 | 81.8% | 81.8% | 63.6% | 45.5% | 36.4% |
| 50 | 3.98 | 94.1% | 86.3% | 76.5% | 72.5% | 60.8% |
| 500 | 0.79 | 22.0% | 16.6% | 14.6% | 13.8% | 12.0% |
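
If both metrics are computed over the same chains, the average length should equal the sum of the five Success@k rates. The sketch below applies that check to the rows as printed; the values are copied from the table, and the 0.01 tolerance for rounding is an assumption. The n=500 row passes, while the n=10 and n=50 rows do not. One possible reading is that the averages and the rates in those rows use different denominators (81.8%, for instance, is not a multiple of 10%), but the source does not say.

```python
# Rows copied from the table: N -> (avg length, [Success@1..5] in percent).
ROWS = {
    10: (3.40, [81.8, 81.8, 63.6, 45.5, 36.4]),
    50: (3.98, [94.1, 86.3, 76.5, 72.5, 60.8]),
    500: (0.79, [22.0, 16.6, 14.6, 13.8, 12.0]),
}

TOLERANCE = 0.01  # assumed slack for rounding in the printed values


def implied_avg_len(success_pct):
    """Sum of Success@k rates, which equals the mean chain length."""
    return sum(success_pct) / 100.0


consistent = {}
for n, (avg_len, success_pct) in ROWS.items():
    implied = implied_avg_len(success_pct)
    consistent[n] = abs(implied - avg_len) <= TOLERANCE
    print(f"N={n}: reported {avg_len:.2f}, implied {implied:.3f}, "
          f"consistent={consistent[n]}")
```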

**Notes**

- The n=500 summary is the finalized output (result timestamp: `2026-01-28 04:20`).
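- The gap between n=50 and n=500 is large (Success@5 falls from 60.8% to 12.0%, average length from 3.98 to 0.79) and suggests a major difficulty or distribution shift between the runs; treat small-N numbers only as early signals.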
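
To make "early signal" concrete, one option is to attach a confidence interval to each rate. The sketch below uses a 95% Wilson score interval, which is a choice made here for illustration and not something the source reports. It also assumes that chains are independent and that N in the table is the number of evaluated chains.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Success@5 values copied from the table above, as fractions.
SUCCESS_AT_5 = {10: 0.364, 50: 0.608, 500: 0.120}
intervals = {n: wilson_interval(p, n) for n, p in SUCCESS_AT_5.items()}
for n, (lo, hi) in intervals.items():
    print(f"N={n}: Success@5 in [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the n=10 interval is several times wider than the n=500 one, and the n=50 and n=500 intervals do not overlap, which is in line with reading the gap as something other than sampling noise.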
