# Probe decision accuracy (bbob-noisy, COCO noise-free labels; i=1–15, B=200×D, D=20)

Goal: check probe → decision predictability in a **lower-dimensional** setting (`D=20`), where CMA is often strong and switching can be harmful.

Setup:
- Suite: `bbob-noisy`, `D=20`, functions `1–30` (COCO ids `101–130`), instances `1–15` → `n=450`.
- Outcome labels (noise-free): compare `best_f` of `CMA-ES-sep` and `BERW-Hetero`,
  both taken from `Results/bbob_noisy_d20_i1-15_probe_labels_B200/noisefree` under budget `B=200×D`.
- Probes: `misranking_rd` and `variance_rel_sd` (same definitions as other evidence packs).
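
As a rough illustration of the labeling step, the sketch below derives one boolean label per `(function, instance)` case from the two `best_f` values. The in-memory layout, the helper name `build_labels`, and the tie rule (ties go to CMA) are assumptions made for this example, not the format or logic of `tools/probe_decision_accuracy.py`.

```python
from typing import Dict, Mapping, Tuple

Case = Tuple[int, int]  # (COCO function id, instance); hypothetical key layout


def build_labels(best_f: Mapping[Case, Mapping[str, float]]) -> Dict[Case, bool]:
    """Return True where BERW-Hetero reached a strictly lower best_f than CMA-ES-sep.

    COCO problems are minimized, so lower best_f is better.
    Assumption: ties are counted as a CMA win.
    """
    return {
        case: values["BERW-Hetero"] < values["CMA-ES-sep"]
        for case, values in best_f.items()
    }


# Size of the full grid described above: 30 functions x 15 instances.
expected_cases = len(range(101, 131)) * len(range(1, 16))

# Toy values for illustration only; they are not results from this repository.
toy = {
    (101, 1): {"CMA-ES-sep": 1e-8, "BERW-Hetero": 3e-4},
    (101, 2): {"CMA-ES-sep": 2e-2, "BERW-Hetero": 5e-3},
}
labels = build_labels(toy)
```

Here `expected_cases` only checks the case count stated in the setup.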

Key result (at the default thresholds used in this repository):
- misranking-probe (`t=0.12`): accuracy **0.589**
- variance-probe (`t=0.05`): accuracy **0.569**
- always choose CMA baseline: **0.631**
- always choose BERW baseline: **0.369**
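
The numbers above can be read as plain classification accuracy over the `n=450` cases. The following is a minimal sketch under two assumptions that this README does not confirm: the probe rule is "choose BERW when the probe value is at or above the threshold, otherwise CMA", and each case carries a boolean label saying whether BERW had the lower noise-free `best_f`. The function names are hypothetical.

```python
from typing import Dict, Sequence


def decision_accuracy(
    probe_values: Sequence[float],
    berw_wins: Sequence[bool],
    threshold: float,
) -> float:
    """Fraction of cases where the thresholded probe picks the winning algorithm.

    Assumed rule: predict BERW iff probe_value >= threshold.
    """
    if len(probe_values) != len(berw_wins) or not probe_values:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(
        (value >= threshold) == bool(win)
        for value, win in zip(probe_values, berw_wins)
    )
    return correct / len(probe_values)


def baseline_accuracies(berw_wins: Sequence[bool]) -> Dict[str, float]:
    """Accuracy of the two constant policies; they sum to 1 by construction."""
    if not berw_wins:
        raise ValueError("berw_wins must be non-empty")
    berw_rate = sum(bool(win) for win in berw_wins) / len(berw_wins)
    return {"always_cma": 1.0 - berw_rate, "always_berw": berw_rate}


# Toy data for illustration only; these are not values from decision_points.csv.
toy_probe = [0.05, 0.20, 0.10, 0.30]
toy_berw_wins = [False, True, False, False]
toy_accuracy = decision_accuracy(toy_probe, toy_berw_wins, threshold=0.12)
toy_baselines = baseline_accuracies(toy_berw_wins)
```

Because the two constant policies are complementary, their accuracies always sum to 1, which is consistent with the reported 0.631 and 0.369.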

Interpretation:
- At `D=20`, always choosing CMA is already strong: both probes score below the always-CMA baseline (0.589 and 0.569 vs 0.631), so probe-driven switching is **not** reliably beneficial.
- This matches the regime story used throughout this repository: robust selection helps primarily in higher-dimensional / stronger-misranking regimes.

Robustness checks:
- Train/test threshold selection and k-fold CV (by instances) are included; both show that misranking-only thresholding is unstable here and does not beat the always-CMA baseline.
  - See `threshold_kfold_k5_misranking_rd_log10_regret_mean_then_threshold.json` and the baseline block inside it.
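
To make "k-fold CV by instances" concrete, here is a simplified sketch of grouped cross-validation: every row of a held-out instance is excluded from threshold selection. It swaps in train-set accuracy as the selection criterion; the repository's `regret_mean_then_threshold` rule is not described in this README and is not reproduced here. The row layout, the fold assignment, and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, float, bool]  # (instance, probe value, BERW wins); hypothetical layout


def accuracy(rows: Sequence[Row], threshold: float) -> float:
    """Accuracy of the assumed rule 'choose BERW iff probe >= threshold'."""
    return sum((probe >= threshold) == win for _, probe, win in rows) / len(rows)


def kfold_by_instance(
    rows: Sequence[Row], candidates: Sequence[float], k: int = 5
) -> List[Tuple[float, float]]:
    """Return (selected threshold, held-out accuracy) for each fold."""
    instances = sorted({instance for instance, _, _ in rows})
    if k < 2 or k > len(instances):
        raise ValueError("k must be between 2 and the number of distinct instances")
    # Assumed fold assignment: round-robin over sorted instance ids.
    folds = [set(instances[i::k]) for i in range(k)]
    results = []
    for held_out in folds:
        train = [row for row in rows if row[0] not in held_out]
        test = [row for row in rows if row[0] in held_out]
        # Select on training instances only, then score on held-out instances.
        best = max(candidates, key=lambda t: accuracy(train, t))
        results.append((best, accuracy(test, best)))
    return results


# Toy data for illustration only: 10 instances, two rows each.
toy_rows = [
    row
    for instance in range(1, 11)
    for row in ((instance, 0.05, False), (instance, 0.20, True))
]
toy_results = kfold_by_instance(toy_rows, candidates=[0.01, 0.12, 0.50], k=5)
```

Grouping by instance keeps all functions of one instance on the same side of the split, so the held-out score is not inflated by rows that share an instance with the training data.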

Files:
- `summary.json`, `decision_points.csv`
- `threshold_sweep.csv`
- `train_test_threshold_*.json`, `train_test_threshold_sweep_*.csv`
- `threshold_kfold_k5_*.json`: instance-level k-fold robustness checks (generated by `tools/probe_threshold_kfold.py`)

Reproduce:
```bash
python3 tools/probe_decision_accuracy.py \
  --results-dir Results/bbob_noisy_d20_i1-15_probe_labels_B200/noisefree \
  --dimension 20 --functions 1-30 --instances 1-15 --budget 200 \
  --algo-cma CMA-ES-sep --algo-berw BERW-Hetero \
  --misranking-threshold 0.12 --variance-threshold 0.05 \
  --output-dir evidence/bbob_noisy_probe_decision_accuracy_noisefree_i1-15_B200_d20

python3 tools/probe_threshold_kfold.py \
  --decision-points evidence/bbob_noisy_probe_decision_accuracy_noisefree_i1-15_B200_d20/decision_points.csv \
  --probe-key misranking_rd --group-by instance --k 5 \
  --loss log10 --selection regret_mean_then_threshold --fixed-threshold 0.12
```
