## LLM Evaluation Summary for `what_if_a_supervolcano_erupted_next`

| Model | Rigor/Trace (0-25) | Integration/Causality (0-25) | Feasibility/Minimality (0-20) | Uncertainty/Adaptation (0-15) | Decisionability (0-15) | Overall (0-100) |
|---|---:|---:|---:|---:|---:|---:|
| agents4sci_v2 | 24.0 | 23.0 | 18.0 | 13.0 | 14.0 | 92.0 |
| baseline_single | 20.0 | 22.0 | 16.0 | 12.0 | 10.0 | 80.0 |
| baseline_tree | 22.0 | 23.0 | 18.0 | 13.0 | 12.0 | 88.0 |
| baseline_debate | 20.0 | 22.0 | 16.0 | 11.0 | 7.0 | 76.0 |

**Best overall**: `agents4sci_v2`, with an overall score of 92.0 out of 100.
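
The Overall column appears to be the plain sum of the five rubric sub-scores; every row in the table is consistent with that reading, though the report does not state the aggregation rule explicitly. The sketch below recomputes the totals and the best model under that assumption. The names `scores`, `overall`, `totals`, and `best_model` are illustrative and do not come from the evaluation tooling.

```python
# Rubric sub-scores copied from the table, in column order:
# Rigor/Trace, Integration/Causality, Feasibility/Minimality,
# Uncertainty/Adaptation, Decisionability.
scores = {
    "agents4sci_v2": [24.0, 23.0, 18.0, 13.0, 14.0],
    "baseline_single": [20.0, 22.0, 16.0, 12.0, 10.0],
    "baseline_tree": [22.0, 23.0, 18.0, 13.0, 12.0],
    "baseline_debate": [20.0, 22.0, 16.0, 11.0, 7.0],
}


def overall(sub_scores):
    """Assumed aggregation: an unweighted sum of the five sub-scores."""
    return sum(sub_scores)


# Recompute the Overall column and pick the highest-scoring model.
totals = {model: overall(subs) for model, subs in scores.items()}
best_model = max(totals, key=totals.get)

print(f"Best overall: {best_model} with score {totals[best_model]:.1f}")
```

Because the column maxima (25, 25, 20, 15, 15) add up to 100, an unweighted sum already lands on the 0-100 scale; if the evaluator actually applies weights or rounding, `overall` would need to change accordingly.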
