# FT-Predict Experiments — Reproducible Figure Generation

This repository contains the code used to reproduce the **Section 7** figures and **Appendix F** supplementary figures.
It maps the paper's theoretical quantities to concrete estimators:

- empirical risk curve: $\widehat{\mathcal{L}}(c)$
- intrinsic floor: $\widehat{\mathcal{L}}_{\mathrm{int}} := \widehat{\mathcal{L}}(c_{\max})$
- reducible optimization variance: $\widehat{\mathcal{V}}_{\mathrm{opt}}(c) := \widehat{\mathcal{L}}(c) - \widehat{\mathcal{L}}_{\mathrm{int}}$
- uncertainty decay exponent: $\widehat{\alpha}$ via log–log power-law fitting
- marginal gain: $\Delta(c) := \widehat{\mathcal{V}}_{\mathrm{opt}}(c) - \widehat{\mathcal{V}}_{\mathrm{opt}}(c_{\text{next}})$, where $c_{\text{next}}$ is the next larger value of $c$ on the probe grid
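The estimators above can be illustrated on a single curve with a minimal sketch. It assumes `probe_c` is sorted ascending, that $\widehat{\mathcal{L}}(c)$ has already been made monotone, and that $\widehat{\alpha}$ is an ordinary least-squares slope on the log–log points. The helper name `curve_quantities` is hypothetical, and `powerlaw.py` may weight or select points differently. Because $\widehat{\mathcal{V}}_{\mathrm{opt}}(c_{\max}) = 0$ by construction, that point cannot enter the log fit.

```python
import math


def curve_quantities(probe_c, L_hat):
    """Return (L_int, V_opt, Delta, alpha_hat) for one monotone risk curve.

    Assumptions (not taken from the repository code):
    - probe_c is sorted ascending and L_hat[i] is the risk at probe_c[i];
    - L_hat has already been made non-increasing (isotonic step);
    - c_next is the next larger probe value on the grid.
    """
    L_int = L_hat[-1]  # intrinsic floor: risk at c_max
    V_opt = [loss - L_int for loss in L_hat]  # reducible optimization variance
    # Delta(c) is undefined at c_max, which has no c_next.
    delta = [V_opt[i] - V_opt[i + 1] for i in range(len(V_opt) - 1)]

    # Ordinary least squares on log V_opt = log A - alpha * log c.
    # V_opt(c_max) = 0 by construction, so only strictly positive points are used.
    pts = [(math.log(c), math.log(v)) for c, v in zip(probe_c, V_opt) if v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two probes with V_opt > 0 to fit alpha")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    return L_int, V_opt, delta, -slope


# Example: V_opt(c) = 8 / c on every probe below c_max = 16.
L_int, V_opt, delta, alpha_hat = curve_quantities(
    [1, 2, 4, 8, 16], [8.5, 4.5, 2.5, 1.5, 0.5]
)
```

Here `alpha_hat` comes out as 1 (up to floating-point error), matching the $1/c$ decay built into the example.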

Monotonicity of $\widehat{\mathcal{L}}(c)$ (non-increasing in $c$) is enforced via **isotonic regression**, following the Appendix F protocol.
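For the non-increasing case, isotonic regression can be sketched with the pool-adjacent-violators algorithm. This is a generic sketch, not necessarily what `estimation.py` does. The optional weights (for example, a number of seeds per probe) are a hypothetical choice. scikit-learn's `sklearn.isotonic.IsotonicRegression(increasing=False)` fits the same kind of model.

```python
def isotonic_nonincreasing(y, w=None):
    """Weighted least-squares fit of a non-increasing sequence (pool adjacent violators)."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # A violation is a later block whose mean exceeds the block before it.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Example: the bump at the third probe is pooled with its neighbour.
print(isotonic_nonincreasing([3.0, 1.0, 2.0, 0.0]))  # [3.0, 1.5, 1.5, 0.0]
```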

---

## Data files (use these exact filenames)

Place the following files under `data/` **with the same names**:

### Real / benchmark
- `metadataset_Risk.csv` — run-level logs (includes `dataset_name`, `seed`, `probe_c`, `R_true`, `R_pred`, `squared_error`, plus feature columns)
- `risk_curve_by_dataset.csv` — aggregated curves (includes `dataset_name`, `probe_c`, `L_hat`, ...)

Schema: see `docs/schema_real.md`.
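The aggregated file can be read into per-dataset curves with a sketch like the following. It uses only the standard-library `csv` module and the three columns named above. It assumes one row per (`dataset_name`, `probe_c`) pair with numeric `probe_c`; `docs/schema_real.md` is authoritative. `load_risk_curves` is a hypothetical helper name.

```python
import csv
from collections import defaultdict


def load_risk_curves(lines):
    """Group rows of risk_curve_by_dataset.csv into per-dataset curves.

    `lines` is any iterable of CSV lines (an open file works). Only the
    dataset_name, probe_c and L_hat columns are read; the rest are ignored.
    Returns {dataset_name: (probe_c list, L_hat list)} sorted by probe_c.
    """
    points = defaultdict(list)
    for row in csv.DictReader(lines):
        points[row["dataset_name"]].append((float(row["probe_c"]), float(row["L_hat"])))
    curves = {}
    for name, pts in points.items():
        pts.sort()  # ascending probe_c, which the estimators assume
        curves[name] = ([c for c, _ in pts], [loss for _, loss in pts])
    return curves


if __name__ == "__main__":
    with open("data/risk_curve_by_dataset.csv", newline="") as f:
        for name, (cs, losses) in load_risk_curves(f).items():
            print(name, len(cs), "probes")
```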

### Synthetic validation
- `synthetic_task_specs.csv`
- `synthetic_run_logs.csv`
- `synthetic_risk_curves.csv`
- `synthetic_schema.md` (author-provided schema for synthetic files)
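Because the scripts expect these exact names, a quick preflight check can catch a misnamed file before a long run. Names are case-sensitive on most Linux filesystems, so `metadataset_Risk.csv` must keep its capital `R`. The sketch below only checks that each listed file exists. The helper `missing_data_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Filenames as listed in this README; the checker itself is a hypothetical helper.
REQUIRED_FILES = (
    "metadataset_Risk.csv",
    "risk_curve_by_dataset.csv",
    "synthetic_task_specs.csv",
    "synthetic_run_logs.csv",
    "synthetic_risk_curves.csv",
    "synthetic_schema.md",
)


def missing_data_files(data_dir="data"):
    """Return the expected filenames that are not present as regular files in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        raise SystemExit("missing under data/: " + ", ".join(missing))
    print("all expected data files found")
```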

---

## Install

```bash
pip install -r requirements.txt
```

---

## One-command reproduction (PDF figures)

```bash
python -m scripts.make_all_figures --data_dir data --out_dir figures
```

All figures are saved as **PDF** to `figures/`, and two manifests are written:
- `figures/manifest_real.json`
- `figures/manifest_synth.json`

These record thresholds, key parameters, and library versions.
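The manifest keys are not documented in this README, so the following is not the manifest format. It only illustrates one standard way to capture library versions, using `importlib.metadata`. The default package list is a guess rather than a reading of `requirements.txt`, and `library_versions` is a hypothetical name.

```python
import json
import platform
from importlib import metadata


def library_versions(packages=("numpy", "matplotlib", "scikit-learn")):
    """Collect the Python version and installed versions of the given distributions.

    The default package list is a guess, not read from requirements.txt.
    Packages that are not installed are recorded as None.
    """
    versions = {"python": platform.python_version()}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    print(json.dumps(library_versions(), indent=2))
```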

---

## Main entrypoints

- `scripts/make_real_figures.py`  
  Generates: `Fig_7_1_population_decay_full.pdf`, `Fig_7_3_phase_diagram.pdf`, `Fig_7_4_efficiency_frontier.pdf`, and Appendix figures.

- `scripts/make_synth_figures.py`  
  Generates: `Fig_S1_synth_decay_by_regime.pdf`, `Fig_S4_synth_phase_diagram.pdf`.

- `scripts/make_all_figures.py`  
  Runs both.

---

## Regime assignment (post hoc)

Regimes are assigned *post hoc* from $(\widehat{\mathcal{L}}_{\mathrm{int}},\widehat{\alpha})$ using thresholds in:

- `configs/regime_thresholds.yaml`

This assignment is used **only for analysis/visualization** and is never used for tuning.
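A post hoc, threshold-based assignment can be sketched as a two-way split on $(\widehat{\mathcal{L}}_{\mathrm{int}}, \widehat{\alpha})$. The contents of `configs/regime_thresholds.yaml` are not shown here. The dictionary keys `"L_int"` and `"alpha"` and the label strings are hypothetical, and the real config may define more cut points or regimes. Such a file could be loaded with PyYAML's `yaml.safe_load`, but that is also an assumption.

```python
def assign_regime(L_int, alpha_hat, thresholds):
    """Label a curve by which side of each threshold its (L_int, alpha_hat) falls on.

    `thresholds` stands in for the contents of configs/regime_thresholds.yaml;
    the keys "L_int" and "alpha" and the label strings are hypothetical.
    """
    floor = "high_floor" if L_int >= thresholds["L_int"] else "low_floor"
    decay = "fast_decay" if alpha_hat >= thresholds["alpha"] else "slow_decay"
    return f"{floor}/{decay}"


# Applied only after estimation; the label never feeds back into fitting.
print(assign_regime(0.1, 1.2, {"L_int": 0.2, "alpha": 0.5}))  # low_floor/fast_decay
```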

---

## Repository layout

```
src/ftpredict/
  estimation.py      # L_hat(c), L_int, V_opt(c), isotonic regression
  powerlaw.py        # alpha_hat fitting
  marginal_gain.py   # Delta(c) and normalization
  regimes.py         # regime assignment + thresholds
  bootstrap.py       # bootstrap CIs (optional utilities)
  plots.py           # all Matplotlib PDF outputs
docs/
  schema_real.md
configs/
  regime_thresholds.yaml
scripts/
  make_real_figures.py
  make_synth_figures.py
  make_all_figures.py
```
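`bootstrap.py` is described only as providing bootstrap CIs. A generic percentile bootstrap for a mean is sketched below. It assumes $\widehat{\mathcal{L}}(c)$ for one dataset is the mean of per-seed `squared_error` values at a given `probe_c`. That assumption, and the choice of seeds as the resampling unit, may not match the repository, which could resample datasets instead. `bootstrap_mean_ci` is a hypothetical name.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(values)
    # Resample with replacement, take each resample's mean, and sort the means.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[int((1 - level) / 2 * n_boot)]
    hi = means[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Hypothetical per-seed squared errors at a single probe_c.
print(bootstrap_mean_ci([0.42, 0.37, 0.51, 0.40, 0.45]))
```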
