# Figure guide (what each plot shows)

This file summarizes the most important plots under `evidence/` and where to find the
corresponding raw tables (`.csv` / `.json`).

> **Note on naming:** This document uses paper names. Code/CSV files use internal names.
> See `docs/ALGORITHMS.md` for the mapping (e.g., "Residual Bootstrapping" = `BERW-Hetero`).

## Paper-facing figures/tables

- **Figure 1 (money plot, fixed budget, representative high-misranking functions)**  
  - Plot: `evidence/paper_figures/figure1_money_plot.pdf`  
  - Data (curves): `evidence/hansen_test_fixed_budget/moneyplot_with_resample/csv/`

- **Figure 2 (depth–fidelity bubble plot)**  
  - Plot: `evidence/paper_figures/figure2_depth_fidelity_bubble.pdf`  
  - Inputs:
    - Performance: `evidence/hansen_test_fixed_budget/noisefree/bbob_summary.csv`
    - Residual Bootstrapping depth traces: `evidence/hansen_test_fixed_budget/diagnostics/traces/`
    - UH-CMA-ES cost measurement: `evidence/uh_cmaes_cost_measurement/uh_cmaes_cost_summary.csv`

- **Figure 3 (composite: external tasks + single-crossing + transfer)**
  - Plots:
    - `evidence/paper_figures/figure3a_ranking.pdf`
    - `evidence/paper_figures/figure3b_single_crossing.pdf`
    - `evidence/paper_figures/figure3c_transfer.pdf`

- **Table 1 (main aggregate stats)**
  - `evidence/paper_tables/table_probeswitch_comparison.tex`

## COCO / BBOB-noisy (fixed-budget)

- **Money plot (fixed budget, D=40, B=100D, representative functions)**  
  - Plot (paper representative set): `evidence/hansen_test_fixed_budget/money_plot_noisefree_d40_B100_f10-13-16-25_with_resample.png`  
  - Legacy plot: `evidence/hansen_test_fixed_budget/money_plot_noisefree_d40_B100_f8-10-14-20_with_resample.png`  
  - What it shows: median noise-free best-so-far objective value vs. total evaluations (fixed budget), comparing
    Residual Bootstrapping / Probe-and-Switch against UH-CMA-ES and fixed-k resampling.
  - Stats table: `evidence/hansen_test_fixed_budget/noisefree/pairwise_sign_test_with_resample.csv`

- **Budget grid win-rate (D=40)**  
  - Plot: `evidence/hansen_test_fixed_budget_grid/winrate_vs_budget.png`  
  - Table: `evidence/hansen_test_fixed_budget_grid/budget_grid_summary.csv`  
  - What it shows: win-rate vs budget multiplier for the fixed-budget regime.

- **Budget grid win-rate (D=20)**  
  - Plot: `evidence/hansen_test_fixed_budget_grid_d20/winrate_vs_budget.png`  
  - Table: `evidence/hansen_test_fixed_budget_grid_d20/budget_grid_summary.csv`  
  - What it shows: the same win-rate vs. budget multiplier view, at D=20.
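The quantities behind the money plot, its sign-test table, and the budget-grid win-rates can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's code: it assumes minimization, that each run records the noise-free objective of its incumbent at shared evaluation checkpoints, that the sign test is a two-sided exact binomial test on paired final values with ties dropped, and that a tie counts as half a win in the win-rate. All function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def median_best_so_far(runs):
    """Pointwise median of best-so-far traces (minimization).

    Each run is a list of noise-free objective values recorded at the same
    evaluation checkpoints (assumption); zip() truncates to the shortest run.
    """
    traces = []
    for run in runs:
        best, trace = math.inf, []
        for v in run:
            best = min(best, v)
            trace.append(best)
        traces.append(trace)
    return [statistics.median(col) for col in zip(*traces)]


def sign_test(a, b):
    """Two-sided exact sign test on paired final values; ties are dropped (assumption).

    Returns (wins of a, losses of a, p-value); lower values are better.
    """
    wins = sum(x < y for x, y in zip(a, b))
    losses = sum(x > y for x, y in zip(a, b))
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


def winrate_by_budget(rows):
    """rows: (budget_multiplier, score_a, score_b) triples, lower is better.

    Returns {budget_multiplier: win rate of A over B}; ties count as 0.5 (assumption).
    """
    wins, totals = defaultdict(float), defaultdict(int)
    for mult, a, b in rows:
        totals[mult] += 1
        wins[mult] += 1.0 if a < b else 0.5 if a == b else 0.0
    return {m: wins[m] / totals[m] for m in sorted(totals)}


print(median_best_so_far([[5, 3, 4, 1], [6, 6, 2, 2], [4, 5, 3, 3]]))  # [5, 4, 3, 2]
print(sign_test([1] * 8, [2] * 8))  # (8, 0, 0.0078125)
print(winrate_by_budget([(10, 1, 2), (10, 3, 2), (100, 1, 1), (100, 0, 5)]))  # {10: 0.5, 100: 0.75}
```

Dropping ties in the sign test is the textbook convention; the actual CSV may handle ties differently.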

## Probe-and-Switch (calibration, transfer, overhead)

- **Calibration curves (COCO, test split)**  
  - Plots:
    - `evidence/probe_calibration_bbob_noisy/bbob_B200_d40_calibration.png`
    - `evidence/probe_calibration_bbob_noisy/bbob_B500_d40_calibration.png`
  - Inputs:
    - `evidence/bbob_noisy_probe_decision_accuracy_noisefree_i1-15_B200/decision_points.csv`
    - `evidence/bbob_noisy_probe_decision_accuracy_noisefree_i1-15_B500/decision_points.csv`

- **Transfer + overhead summary (compact aggregation)**  
  - Plot: `evidence/probeswitch_transfer_overhead_summary/transfer_overhead_main.png`  
  - Compact tables:
    - `evidence/probeswitch_transfer_overhead_summary/transfer_summary_compact.csv`
    - `evidence/probeswitch_transfer_overhead_summary/overhead_curve_compact.csv`
  - What it shows: one view that combines (i) threshold transfer across tasks/budgets and (ii) the probe overhead-vs-gain curve.

- **End-to-end external transfer win-rate**  
  - Plot: `evidence/probeswitch_external_transfer/winrate_switch_vs_cma.png`  
  - Table: `evidence/probeswitch_external_transfer/summary.csv`
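A calibration curve of the kind listed above can be sketched with equal-width reliability binning. The sketch assumes each row of `decision_points.csv` can be reduced to a predicted probability that switching pays off and a 0/1 realized outcome; the actual columns and binning behind the plots may differ, and `calibration_curve` is a hypothetical name.

```python
def calibration_curve(probs, outcomes, n_bins=10):
    """Equal-width reliability binning.

    probs: predicted probabilities in [0, 1] (assumed per-decision-point quantity).
    outcomes: realized 0/1 labels, e.g. 1 if switching actually beat staying (assumption).
    Returns (mean_predicted, empirical_rate, count) for each non-empty bin, in bin order.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            curve.append((mean_p, rate, len(members)))
    return curve


print(calibration_curve([0.1, 0.1, 0.9, 0.9], [0, 1, 1, 1], n_bins=2))
# [(0.1, 0.5, 2), (0.9, 1.0, 2)]
```

A well-calibrated probe puts each point near the diagonal (mean predicted ≈ empirical rate); reporting the per-bin count makes sparsely populated bins easy to spot.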

## External tasks (fixed-budget)

- **RL (CartPole) heavy-tail objective**
  - Plot: `evidence/application_rl_cartpole_heavytail_quadratic_cost/final_boxplot.png`
  - Raw runs: `evidence/application_rl_cartpole_heavytail_quadratic_cost/runs.csv`

- **RL (Pendulum) heavy-tail objective**
  - Plot: `evidence/application_rl_pendulum_heavytail/final_boxplot.png`
  - Raw runs: `evidence/application_rl_pendulum_heavytail/runs.csv`

- **RL (Pendulum) Gaussian noise**
  - Plot: `evidence/application_rl_pendulum_gaussian/final_boxplot.png`
  - Raw runs: `evidence/application_rl_pendulum_gaussian/runs.csv`

- **Noisy HPO (digits0)**  
  - Plot: `evidence/application_hpo_noisy_logreg_digits0_sigma1p0/final_boxplot.png`  
  - Raw runs: `evidence/application_hpo_noisy_logreg_digits0_sigma1p0/runs.csv`

- **State-dependent heavy-tail control (LQR)**  
  - Plot: `evidence/application_lqr_heavytail_control_fixed_budget_resample/final_boxplot.png`  
  - Raw runs: `evidence/application_lqr_heavytail_control_fixed_budget_resample/runs.csv`

- **Nonconvex mini-batch MLP (digits0)**  
  - Plots:
    - `evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0/batch_16_final_boxplot.png`
    - `evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0/batch_256_final_boxplot.png`
  - Raw sweep: `evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0/sweep_summary.csv`
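The boxplots above summarize final values per method. A minimal way to recover each box (the quartiles) from a `runs.csv` is sketched below, assuming one row per run with a method label and a final objective value; the column names `method` and `final_value` are placeholders, not the files' confirmed schema. Quartiles use `method="inclusive"`, which matches NumPy's default linear percentile interpolation; the plotting code may use a different convention.

```python
import csv
import io
import statistics
from collections import defaultdict


def final_value_summary(csv_text, method_col="method", value_col="final_value"):
    """Per-method (Q1, median, Q3) of final values, i.e. the box of each boxplot.

    The default column names are placeholders; adjust them to the real runs.csv header.
    """
    by_method = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_method[row[method_col]].append(float(row[value_col]))
    summary = {}
    for method, values in by_method.items():
        if len(values) < 2:
            raise ValueError(f"need at least two runs for method {method!r}")
        # "inclusive" interpolation matches NumPy's default linear percentiles
        q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[method] = (q1, med, q3)
    return summary


example = "method,final_value\nA,1\nA,2\nA,3\nA,4\nA,5\nB,10\nB,20\n"
print(final_value_summary(example))
# {'A': (2.0, 3.0, 4.0), 'B': (12.5, 15.0, 17.5)}
```

For a real file, pass its text, e.g. `pathlib.Path(".../runs.csv").read_text()`.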

## Mechanistic / diagnostic plots

- **Misranking metric sanity check (RD, the rank-disagreement probe, vs. Kendall τ / top-μ overlap)**  
  - Plot: `evidence/misranking_metric_sandwich/misranking_metric_sandwich.png`  
  - Raw metrics: `evidence/misranking_metric_sandwich/misranking_metrics_bbob_noisy_d40_es.csv`

- **Variance proxy fails under state-dependent noise (radial)**  
  - Plot: `evidence/probe_decoupling_radial/probe_decoupling.png`  
  - Raw probe values: `evidence/probe_decoupling_radial/probe_values.csv`

- **Quadratic mechanism check (misranking → update dispersion)**
  - Plot: `evidence/theory_update_dispersion_quadratic/update_dispersion_quadratic.png`
  - Raw data: `evidence/theory_update_dispersion_quadratic/update_dispersion_quadratic.csv`
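For reference, the two conventional misranking measures that the rank-disagreement probe is compared against can be computed as below: Kendall's τ between noisy and noise-free scores of the same population, and the overlap of the top-μ sets. These are textbook definitions (τ-a without tie correction, minimization) chosen for illustration; the exact variants stored in `misranking_metrics_bbob_noisy_d40_es.csv` are not confirmed here, and the RD probe itself is not reproduced.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two score vectors over the same candidates.

    Pairs tied in either vector count as neither concordant nor discordant (no tie correction).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least two entries")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def top_mu_overlap(noisy, true, mu):
    """Fraction of the mu best candidates under noisy values that are also among the
    mu best under noise-free values (minimization; ties broken by index)."""
    def top(values):
        return set(sorted(range(len(values)), key=lambda i: values[i])[:mu])
    return len(top(noisy) & top(true)) / mu


print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
print(top_mu_overlap([0.1, 0.5, 0.2, 0.9], [1, 2, 3, 4], mu=2))  # 0.5
```

τ measures agreement over all pairs, while top-μ overlap only looks at the candidates that a (μ, λ)-style selection would keep, which is why the two can diverge.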

## Appendix figures and tables

Appendix figures are located in `evidence/paper_figures/Appendix/`; the appendix table (A11) is in `evidence/paper_tables/`, with its full path given in the table below.

| Paper Section | Title | File |
|---------------|-------|------|
| A1 | Mechanism validation on a controlled quadratic | `fig_a1_mechanism_quadratic.pdf` |
| A2 | RB-PEM estimator ablations | `fig_a2_ablations.pdf` |
| A3 | Residual-pool diagnostic snapshots | `fig_a3_diagnostics.pdf` |
| A4 | Interpreting the rank-disagreement probe | `fig_a4_misranking_sandwich.pdf` |
| A5 | Variance does not equal misranking | `fig_a5_probe_decoupling.pdf` |
| A6 | Probe calibration curves | `fig_a6_probe_calibration.pdf` |
| A7 | Probe reliability versus probe budget | `fig_a7_probe_budget_roc.pdf` |
| A8 | Threshold sensitivity analysis | `fig_a8_threshold_sensitivity.pdf` |
| A9 | Depth–fidelity robustness and UH-CMA-ES sensitivity | `fig_a10_depth_fidelity_tradeoff.pdf` |
| A10 | External validity on nonconvex real-data task | `fig_a12_mlp_digits0.pdf` |
| A11 | Complete results on high-misranking COCO functions | `evidence/paper_tables/table_a11_high_misranking.tex` |

**Note on file naming**: Some appendix figure files have legacy numbering (e.g., `fig_a10_*` for A9, `fig_a12_*` for A10) due to historical reorganization of the appendix structure. The mapping above shows the correct correspondence between paper sections and files.
