# Reproducibility Materials

Analysis and figure-generation scripts accompanying the paper.

## Folder structure

```
repro_anonymous/
├── README.md
├── analysis/
│   ├── stratum_analysis.py        # per-method cells, bootstrap CI, transition matrix
│   ├── predictive_framework.py    # 37.7% TO→TS rate, ΔTS prediction, cross-cell ΔTC fit
│   ├── recovery_ci.py             # Wilson 95% CI + Fisher exact pairwise tests
│   ├── disagreement_taxonomy.py   # N/I/S/Z classifier for Opus-vs-GTED disagreements
│   └── check_refs.py              # \Cref / \cite resolution sanity check
├── figures/
│   └── make_figures.py            # generates fig1–fig5 PDFs from numbers in the paper
└── data/                          # raw model outputs + judge verdicts (see below)
```

## Data layout (expected under `data/`)

Each script reads its input locations from the environment variables below and falls back to the listed default when a variable is unset:

| Env var        | Default path                       | Contents                                                                   |
| -------------- | ---------------------------------- | -------------------------------------------------------------------------- |
| `RUNS_DIR`     | `../data/runs/proofnet_186/v4_pro` | `B1.jsonl`, `B3.jsonl`, `B4.jsonl`, `B5.jsonl` (per-method runs)           |
| `JUDGE_DIR`    | `../data/judge`                    | `v4pro_{b1,b3,b4,b5}_opus.jsonl`, `..._gted.jsonl`                         |
| `MASTER_TABLE` | `../data/judge/master_table.json`  | aggregated TC% / GTED-SF% across 19 (model, dataset, method) combinations  |
| `GOLD_PATH`    | `../data/proofnetsharp_test.json`  | gold ProofNet$^{\#}$ formalizations                                        |
| `PAPER_TEX`    | `../main.tex`                      | active paper source (for `check_refs.py`)                                  |
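
To check which paths a run would pick up before launching an analysis, something like the following sketch may help. It is not taken from the repository: the `DEFAULTS` dictionary simply copies the table above, and the helper name `resolve_path` is hypothetical. It assumes the scripts use a plain "environment variable, else default" lookup.

```python
import os
from pathlib import Path

# Defaults copied from the table above. The lookup logic below is an
# assumption about how the scripts behave, not code from the repository.
DEFAULTS = {
    "RUNS_DIR": "../data/runs/proofnet_186/v4_pro",
    "JUDGE_DIR": "../data/judge",
    "MASTER_TABLE": "../data/judge/master_table.json",
    "GOLD_PATH": "../data/proofnetsharp_test.json",
    "PAPER_TEX": "../main.tex",
}


def resolve_path(name: str) -> Path:
    """Return the path for `name`, preferring the environment over the default."""
    return Path(os.environ.get(name, DEFAULTS[name]))


if __name__ == "__main__":
    # Report each variable, the path it resolves to, and whether it exists
    # relative to the current working directory.
    for name in DEFAULTS:
        path = resolve_path(name)
        status = "found" if path.exists() else "missing"
        print(f"{name:<13} {path} ({status})")
```

Relative defaults are resolved against the current working directory in this sketch, so the "found"/"missing" report depends on where it is run from.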

## Quick start

```bash
python analysis/stratum_analysis.py       # reproduces §5.1 / §5.2 cells + bootstrap
python analysis/predictive_framework.py   # reproduces §5.4 recovery rates + cross-cell fit
python analysis/recovery_ci.py            # reproduces Appendix A Wilson CI table
python analysis/disagreement_taxonomy.py  # reproduces §5.5 N/I/S/Z taxonomy table
python figures/make_figures.py            # regenerates fig1–fig5 PDFs
```
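
For readers who want to spot-check a single entry of the Wilson CI table by hand, the standard Wilson score interval can be computed as in the sketch below. This is the textbook formula, not the code in `recovery_ci.py`, whose implementation may differ; the function name `wilson_interval` and the counts in the usage lines are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.959964 is the two-sided 95% normal quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")

    p_hat = successes / n
    denom = 1.0 + z * z / n
    # The interval is centred on a shrunken estimate, not on p_hat itself.
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)
    )
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    low, high = wilson_interval(successes=5, n=10)
    print(f"95% Wilson CI: [{low:.3f}, {high:.3f}]")
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and remains usable for small samples or proportions near 0 or 1.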
