# Probe decision accuracy (bbob-noisy, COCO noise-free labels; i=1–15, B=200×D, D=10)

Goal: check whether a tiny probe can predict which base optimizer should win on **low-dimensional** bbob-noisy instances, and how threshold selection behaves.

Setup:
- Suite: `bbob-noisy`, `D=10`, functions `1–30` (COCO ids `101–130`), instances `1–15` → `n=450`.
- Outcome labels: compare noise-free `best_f` between `CMA-ES-sep` and `BERW-Hetero`, read from `bbob_summary.csv` in `Results/bbob_noisy_d10_i1-15_probe_labels_B200/noisefree` under budget `B=200×D`.
- Probes:
  - misranking probe: `rank_disagreement` on a small CMA-style candidate set (2 draws)
  - variance probe: `rel_std(f(x0))` from repeated evaluations at `x0` (10 reps)
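
The two probes above can be sketched as follows. This is an illustrative sketch only, not the repository's implementation: it assumes `rel_std` means the sample standard deviation divided by the absolute mean, and that `rank_disagreement` is the fraction of candidate pairs whose order flips between two noisy evaluation draws. The function names and the zero-mean guard are hypothetical.

```python
import statistics
from itertools import combinations


def rel_std(values):
    """Relative standard deviation: sample std divided by |mean| (assumed definition)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample std (n-1 denominator); an assumption
    return std / max(abs(mean), 1e-12)  # hypothetical guard against a zero mean


def variance_probe(f, x0, reps=10):
    """Evaluate the noisy objective f at x0 `reps` times and return rel_std."""
    return rel_std([f(x0) for _ in range(reps)])


def rank_disagreement(draw_a, draw_b):
    """Fraction of candidate pairs ordered differently by two noisy draws.

    draw_a[i] and draw_b[i] are two independent noisy evaluations of the
    same candidate i (assumed meaning of "2 draws").
    """
    pairs = list(combinations(range(len(draw_a)), 2))
    flipped = sum(
        (draw_a[i] - draw_a[j]) * (draw_b[i] - draw_b[j]) < 0 for i, j in pairs
    )
    return flipped / len(pairs)
```

Under these assumptions both probes are near zero when the noise is small relative to the differences between candidates, which is the regime where a threshold rule would keep choosing CMA.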

Key result (at the default thresholds used in this repository):
- misranking-probe (`t=0.12`): accuracy **0.522**
- variance-probe (`t=0.05`): accuracy **0.511**
- always choose CMA baseline: **0.658**
- always choose BERW baseline: **0.342**
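
To make the comparison above concrete, here is a minimal sketch of how a thresholded probe and the two constant baselines could be scored. It assumes the rule "choose BERW when the probe value is at or above the threshold, otherwise CMA", which is consistent with a high threshold yielding `pred_berw_rate=0`; the function names and the toy data are hypothetical, not taken from `decision_points.csv`.

```python
def decision_accuracy(probe_values, berw_wins, threshold):
    """Accuracy of the rule 'pick BERW when probe >= threshold, else CMA'."""
    hits = sum((p >= threshold) == w for p, w in zip(probe_values, berw_wins))
    return hits / len(berw_wins)


def baseline_accuracies(berw_wins):
    """Accuracy of the two constant policies; they always sum to 1."""
    berw_rate = sum(berw_wins) / len(berw_wins)
    return {"always_cma": 1.0 - berw_rate, "always_berw": berw_rate}


# Toy data (hypothetical): probe value and whether BERW had the better best_f.
probe = [0.02, 0.30, 0.15, 0.05]
berw_wins = [False, True, False, False]

print(decision_accuracy(probe, berw_wins, threshold=0.12))  # 0.75
print(baseline_accuracies(berw_wins))
```

The two baselines are complements, which matches the reported 0.658 and 0.342 summing to 1. A probe is only useful if its accuracy exceeds the larger of the two.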

Interpretation:
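- In `D=10`, the correct decision is “run CMA” on about two thirds of instances (always-CMA accuracy 0.658), and both probes score below that baseline at their default thresholds (0.522 and 0.511), so they do **not** add useful signal here.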
- This supports the boundary statement: ProbeSwitch thresholds are **regime-dependent** (not universal across dimensions).

Train/test threshold selection (train instances `1–5`, test instances `6–15`):
- With an extended threshold range (`tmax=0.5`), misranking threshold selection chooses `t≈0.485` and effectively predicts **CMA always** (`pred_berw_rate=0`), matching the low-misranking regime.
- k-fold CV (by instances, `k=5`) outputs `threshold_kfold_k5_misranking_rd_log10_regret_mean_then_threshold.json`; CV does not beat the always-CMA baseline here.
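
The threshold selection described above can be sketched as a grid sweep. This is a hedged illustration, not the logic of `tools/probe_threshold_train_test.py`: it assumes the `log10` loss is the gap in `log10(best_f)` between the chosen optimizer and the better one, that `best_f` values are positive, and that `regret_mean_then_threshold` means "minimize mean regret, then prefer the smaller threshold". The row layout and function names are hypothetical.

```python
import math


def mean_log10_regret(rows, threshold):
    """Mean log10 regret of the rule 'BERW when probe >= threshold, else CMA'.

    rows: iterable of (probe, f_cma, f_berw) with positive noise-free best_f.
    """
    total = 0.0
    for probe, f_cma, f_berw in rows:
        chosen = f_berw if probe >= threshold else f_cma
        total += math.log10(chosen) - math.log10(min(f_cma, f_berw))
    return total / len(rows)


def select_threshold(train_rows, tmax=0.5, tstep=0.005):
    """Sweep thresholds on [0, tmax] and return the one with the lowest mean regret."""
    steps = int(round(tmax / tstep))
    grid = [i * tstep for i in range(steps + 1)]
    # min() keeps the first minimum, so ties go to the smallest threshold
    # (assumed tie-break order).
    return min(grid, key=lambda t: mean_log10_regret(train_rows, t))


# Toy training rows (hypothetical): CMA wins at low probe values, BERW at high.
train = [(0.101, 1e-3, 1.0), (0.203, 1e-2, 1.0), (0.402, 1.0, 1e-2)]
t_star = select_threshold(train)
print(t_star, mean_log10_regret(train, t_star))
```

In this sketch, when CMA wins on every training row, the selected threshold is the first grid point above the largest training probe value and the rule predicts CMA everywhere, which is the behaviour reported for `t≈0.485`. The selected threshold would then be scored on the held-out test instances with the same regret function.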

Files:
- `summary.json`, `decision_points.csv`
- `threshold_sweep.csv`
- `train_test_threshold_*.json`, `train_test_threshold_sweep_*.csv`
- `threshold_kfold_k5_*.json`: instance-level k-fold robustness checks (generated by `tools/probe_threshold_kfold.py`)

Reproduce:
```bash
python3 tools/probe_decision_accuracy.py \
  --results-dir Results/bbob_noisy_d10_i1-15_probe_labels_B200/noisefree \
  --dimension 10 --functions 1-30 --instances 1-15 --budget 200 \
  --algo-cma CMA-ES-sep --algo-berw BERW-Hetero \
  --misranking-threshold 0.12 --variance-threshold 0.05 \
  --output-dir evidence/bbob_noisy_probe_decision_accuracy_noisefree_i1-15_B200_d10

python3 tools/probe_threshold_train_test.py \
  --decision-points evidence/bbob_noisy_probe_decision_accuracy_noisefree_i1-15_B200_d10/decision_points.csv \
  --probe-key misranking_rd --train-instances 1-5 --test-instances 6-15 \
  --tmax 0.5 --tstep 0.005 \
  --loss log10 --selection regret_mean_then_threshold

python3 tools/probe_threshold_kfold.py \
  --decision-points evidence/bbob_noisy_probe_decision_accuracy_noisefree_i1-15_B200_d10/decision_points.csv \
  --probe-key misranking_rd --group-by instance --k 5 \
  --loss log10 --selection regret_mean_then_threshold --fixed-threshold 0.12
```
