# Reproducibility

This folder contains all the code needed to reproduce the paper's figures. The scripts are kept in a flat layout in this directory.

## Layout

- **Scripts** (all in this directory):
  - `run_all_figures.py` — main entry: run this to generate all figures
  - `bike_analysis.py` — bike sharing (MPF) figures
  - `bike_ebm_analysis.py` — bike sharing (EBM) 2D PD
  - `cali_analysis.py` — California housing figures
  - `data_generation.py` — synthetic data for PD cancellation
  - `synthetic_analysis.py` — synthetic PD figures (MPF, EBM, XGBoost)

- **Cluster scripts** (`mpf_cluster_scripts/`): configs and job runners for training MPF models on a cluster (e.g. configs in `cluster_scripts/configs/`, `submit_cluster_jobs.sh`, `run_cluster_experiment.py`). See `mpf_cluster_scripts/README.md` and `mpf_cluster_scripts/cluster_scripts/CLUSTER_EXECUTION.md` for usage.

- **TSL source** (`TSL_source/`): implementation of the TSL/MPF method. *Note for reviewers:* the codebase is named **MPF** (the package lives under `TSL_source/MPF/`, including `mpf-py`). Figure-generation scripts depend on it via `mpf_py` (see `requirements.txt`).

- **Data** (populate from `simulations/`):
  - `data/bike_sharing/` — e.g. `42712_Bike_Sharing_Demand.csv`
  - `data/california/` — e.g. `44977_california_housing.csv`
  - (The synthetic experiments use no external data; the scripts generate the data themselves.)

- **Models** (populate from `simulations/`):
  - `models/bike_sharing/` — `mpf_model.bin`, `ebm_model.pkl`
  - `models/california/` — `mpf_interpretable.bin`, `mpf_blackbox.bin`
  - `models/synthetic_pd/mpf/model.bin`, `models/synthetic_pd/ebm/model.pkl`, `models/synthetic_pd/xgboost/model.json`

- **Output**: `figures/` — PDFs (and any other outputs) written here.
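
Before running the scripts, it can help to confirm that the data and model files listed above are in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the two CSV names are the examples given in the layout, so adjust the list if your files are named differently.

```python
from pathlib import Path

# Relative paths taken from the layout above. The CSV names are the
# examples given there and may differ in your copy of simulations/.
EXPECTED_FILES = [
    "data/bike_sharing/42712_Bike_Sharing_Demand.csv",
    "data/california/44977_california_housing.csv",
    "models/bike_sharing/mpf_model.bin",
    "models/bike_sharing/ebm_model.pkl",
    "models/california/mpf_interpretable.bin",
    "models/california/mpf_blackbox.bin",
    "models/synthetic_pd/mpf/model.bin",
    "models/synthetic_pd/ebm/model.pkl",
    "models/synthetic_pd/xgboost/model.json",
]


def find_missing(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from inside the reproducibility folder.
    missing = find_missing(".")
    if missing:
        print("Missing inputs (populate from simulations/):")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected data and model files are present.")
```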

## Generate all figures

```bash
cd reproducibility
python run_all_figures.py
```

Figures are written to `reproducibility/figures/`, grouped as `bike_sharing`, `california`, and `synthetic_pd`.
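
To confirm that a run produced output, one option is to count the PDFs per group. This is a rough sketch that assumes each group is a subfolder of `figures/`; the helper name `count_pdfs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def count_pdfs(figures_dir):
    """Map each immediate subfolder of figures_dir to its number of PDFs.

    PDFs in nested folders are included. Returns an empty dict if
    figures_dir does not exist.
    """
    figures_dir = Path(figures_dir)
    if not figures_dir.is_dir():
        return {}
    counts = {}
    for sub in sorted(p for p in figures_dir.iterdir() if p.is_dir()):
        counts[sub.name] = sum(1 for _ in sub.rglob("*.pdf"))
    return counts
```

Calling `count_pdfs("figures")` from inside the reproducibility folder returns a dictionary keyed by subfolder name; a group with a count of zero suggests that the corresponding script did not run to completion.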

## Anonymity (for review submission)

- **Build artifacts**: Do not include `TSL_source/MPF/target/` in the submission; it contains host-specific paths. The repo’s `.gitignore` there excludes it.
- **Cluster result JSONs**: Any file under `models/`, or any cluster result file, that contains `fitted_model_path` should use a relative or generic path there (one such path in `models/california/` has already been sanitized).
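
The `fitted_model_path` check can be automated. The sketch below is an assumption-laden illustration, not a script shipped with the repository: it assumes the result files are JSON with a `.json` extension and that `fitted_model_path` appears as a key holding a string, possibly nested. The function names are hypothetical.

```python
import json
from pathlib import Path, PurePosixPath, PureWindowsPath


def iter_values(obj, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from iter_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_values(item, key)


def is_absolute(path_str):
    """True if the string is an absolute POSIX or Windows path."""
    return (PurePosixPath(path_str).is_absolute()
            or PureWindowsPath(path_str).is_absolute())


def find_absolute_model_paths(root, key="fitted_model_path"):
    """Return (json_file, value) pairs where `key` holds an absolute path."""
    hits = []
    for json_file in sorted(Path(root).rglob("*.json")):
        try:
            data = json.loads(json_file.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid UTF-8 JSON
        for value in iter_values(data, key):
            if isinstance(value, str) and is_absolute(value):
                hits.append((str(json_file), value))
    return hits
```

An empty result from `find_absolute_model_paths("models")` means no absolute path was found under that key. It does not catch host names or user names embedded in relative paths or in other fields, so a manual review is still worthwhile.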

## Dependencies

The dependencies are the same as for the main simulations, including `mpf_py`, `interpret`, `xgboost`, `matplotlib`, `numpy`, `pandas`, and `cartopy` (for the California maps). Use the project’s environment, or install from the repo’s `requirements.txt` if present.
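
A quick way to see which of the packages named above are missing from the active environment is to look them up without importing them. This is a minimal sketch: the helper name `missing_modules` is hypothetical, and the list covers only the packages named in this section, so the scripts may need others.

```python
import importlib.util

# Import names of the packages listed in this section.
REQUIRED_MODULES = [
    "mpf_py", "interpret", "xgboost",
    "matplotlib", "numpy", "pandas", "cartopy",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```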
