TRACE Benchmark (Synthetic)

This repository implements a reproducible benchmark that evaluates the TRACE diagnostic on two synthetic worlds: Blobs-Shift and Moons-Warp. The runner writes only machine-readable results (JSON and CSV) intended for supplementary material; it produces no plots.

Setup:

```bash
python -m pip install -r trace_bench/requirements.txt
```

Run (local CPU/GPU):

```bash
# Creates trace_bench/reports/<timestamp>/{blobs,moons}/, each with results.json, results.csv, summary.json
# Capture the timestamp once so both runs land under the same <timestamp> directory.
TS=$(date +%Y%m%d_%H%M%S)
python -m trace_bench.run --world blobs_shift --grid trace_bench/configs/blobs.yaml --use_sinkhorn --use_mmd --output trace_bench/reports/$TS/blobs
python -m trace_bench.run --world moons_warp --grid trace_bench/configs/moons.yaml --use_sinkhorn --output trace_bench/reports/$TS/moons
```
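
The two local runs can also be driven from Python, which makes it easy to guarantee a single shared timestamp on any platform. The sketch below is only an illustration: `build_commands` and `run_all` are hypothetical helper names that are not part of the repository, and the flags are copied verbatim from the commands above.

```python
import subprocess
import sys
from datetime import datetime


def build_commands(timestamp):
    """Build both runner invocations so they write under one report directory.

    Hypothetical helper: only the flags shown in this README are used.
    """
    root = f"trace_bench/reports/{timestamp}"
    blobs = [
        sys.executable, "-m", "trace_bench.run",
        "--world", "blobs_shift",
        "--grid", "trace_bench/configs/blobs.yaml",
        "--use_sinkhorn", "--use_mmd",
        "--output", f"{root}/blobs",
    ]
    moons = [
        sys.executable, "-m", "trace_bench.run",
        "--world", "moons_warp",
        "--grid", "trace_bench/configs/moons.yaml",
        "--use_sinkhorn",
        "--output", f"{root}/moons",
    ]
    return [blobs, moons]


def run_all():
    """Run both worlds sequentially, stopping at the first failure."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    for cmd in build_commands(stamp):
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root should be equivalent to the two shell commands, assuming the dependencies from Setup are installed.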

Run (SLURM):

```bash
sbatch trace_bench/scripts/sbatch_run_all.sbatch
# Logs: slurm_logs/trace_bench_<JOBID>.log
```

Outputs:
- results.csv: per-run rows with delta_R, Bhat_ot, Bhat_mmd, and term_* columns
- results.json: same content as CSV in JSON form
- summary.json: Spearman correlations only, in the form {"spearman_ot": ..., "spearman_mmd": ...}
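
As a sanity check, the correlations in summary.json can be recomputed from results.csv. The sketch below uses only the standard library and assumes, without confirmation from the runner's source, that `spearman_ot` and `spearman_mmd` are Spearman rank correlations between `delta_R` and `Bhat_ot` or `Bhat_mmd` across all rows. The function names are hypothetical, and rows with an empty value (for example `Bhat_mmd` when `--use_mmd` was not passed) are skipped.

```python
import csv
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) < 2 or len(x) != len(y):
        return float("nan")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined rank correlation
    return cov / math.sqrt(var_x * var_y)


def spearman_from_csv(path, bound_column, risk_column="delta_R"):
    """Correlate one bound column with delta_R over the rows of results.csv."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Keep only rows where both values are present.
    kept = [r for r in rows if r.get(bound_column) and r.get(risk_column)]
    bound = [float(r[bound_column]) for r in kept]
    risk = [float(r[risk_column]) for r in kept]
    return spearman(bound, risk)
```

For example, `spearman_from_csv("trace_bench/reports/<timestamp>/blobs/results.csv", "Bhat_ot")` could be compared against `spearman_ot` in the matching summary.json; a mismatch would suggest the runner aggregates differently than assumed here.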

Reproducibility:
- Global seeding via `trace_bench/utils/seed.py` (Python, NumPy, PyTorch)
- Configs in `trace_bench/configs/{blobs.yaml, moons.yaml}`
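
For orientation, global seeding across the three libraries listed above typically looks like the following. This is a minimal sketch and not the contents of the repository's seed module: `seed_everything` is a hypothetical name, and NumPy and PyTorch are imported lazily so the snippet also runs where they are not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # seeds NumPy's legacy global RNG
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU generator
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)  # seeds every visible GPU
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
```

Seeding alone does not make GPU runs bitwise reproducible; some CUDA kernels are nondeterministic unless deterministic algorithms are also requested in PyTorch.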

Structure:

```
trace_bench/
  data_gen/        # synthetic worlds
  models/          # sklearn & torch models
  metrics/         # sinkhorn, mmd, grads, risk, spearman
  eval/            # diagnostic
  utils/           # seeding, io (no plotting required)
  configs/         # yaml configs
  scripts/         # run_all.sh, sbatch_run_all.sbatch
  reports/         # outputs
```
