# Decision Evidence (Nonconvex, Real Data): misranking-probe → “turn on robust pipeline?”

This evidence package reuses the same **decision-evidence protocol** on a *nonconvex*, *real-data* task:
heavy-tailed mini-batch MLP on `digits0` (see `evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0/`).

Goal: show that a **single probe** (misranking severity at `x0`) predicts whether a *robust/noise-aware* strategy
should be used, on a task outside COCO and without hand-tuning.

## Decision problem (what is being predicted)

For each instance `(seed, batch_size)` we compare:

- `CMA-ES` (baseline)
- `ProbeSwitch-Noise` (our noise-aware pipeline; 3-way choice among CMA / Hetero / Robust internally)

An instance is labeled `berw` when `ProbeSwitch-Noise` achieves a lower `post_true` than `CMA-ES` on that instance (lower is better).

## Key artifacts

- `decision_points.csv`: merged table of `(seed, batch_size)` → `{probe values, best_f_cma, best_f_berw, label}`.
- `summary.json`: counts + a fixed-threshold confusion snapshot.
- `train_test_threshold_misranking_rd_log10_regret_mean.json`: threshold selected on train seeds 1–25 and evaluated on test seeds 26–50.
- `train_test_threshold_sweep_misranking_rd_log10_regret_mean.csv`: sweep curve (accuracy/regret vs threshold).
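For orientation, a fixed-threshold confusion snapshot like the one stored in `summary.json` could be computed roughly as follows. This is a minimal sketch, not the repository's code: the function name `confusion_at_threshold`, the `cma` label for the negative class, the rule direction (probe at or above the threshold means "switch on the noise-aware pipeline"), and the output keys are all assumptions, and the actual `summary.json` schema may differ.

```python
from collections import Counter


def confusion_at_threshold(points, threshold, probe_key="misranking_rd"):
    """Count confusion cells for the rule 'probe >= threshold -> predict berw'.

    `points` is a list of dicts with a numeric probe value and a `label`
    that is either "berw" or "cma" (the "cma" name is an assumption).
    """
    counts = Counter()
    for p in points:
        predicted = "berw" if p[probe_key] >= threshold else "cma"
        counts[(p["label"], predicted)] += 1
    return {
        "tp": counts[("berw", "berw")],  # switched, and switching was right
        "fn": counts[("berw", "cma")],   # stayed on CMA-ES, should have switched
        "fp": counts[("cma", "berw")],   # switched, should have stayed
        "tn": counts[("cma", "cma")],    # stayed, and staying was right
    }
```

Treating `berw` as the positive class keeps the snapshot aligned with the question in the title: a false negative is an instance where the robust pipeline should have been turned on but was not.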

## How this is constructed

- Runs source: `Results/exp_mlp_digits0_heavytail_sigma1p0_h4_N256_B40_seeds1_50/`
- Probe source: `evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0/probe_values.csv`

Decision points are created by:

```bash
python3 tools/make_decision_points_from_runs_and_probes.py \
  --runs-csv "Results/.../batch_4/runs.csv,Results/.../batch_16/runs.csv,Results/.../batch_256/runs.csv" \
  --probe-values-csv evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0/probe_values.csv \
  --key-cols seed,batch_size --instance-col seed \
  --algo-cma CMA-ES --algo-berw ProbeSwitch-Noise \
  --metric post_true --lower-is-better \
  --output-dir evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0_decision_accuracy_vs_noise_switch
```
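Conceptually, the command above joins per-run results with probe values on `(seed, batch_size)` and labels each instance. The sketch below illustrates that join using only the standard library. It is not the script's actual implementation: the `algo` column name, the presence of `batch_size` in `probe_values.csv`, the `cma` label, and the tie rule (ties go to the baseline) are assumptions.

```python
from collections import defaultdict


def build_decision_points(runs, probes, metric="post_true",
                          algo_cma="CMA-ES", algo_berw="ProbeSwitch-Noise",
                          probe_key="misranking_rd"):
    """Join run rows and probe rows on (seed, batch_size) and label each instance.

    `runs` and `probes` are lists of dicts, e.g. as produced by csv.DictReader.
    The "algo" column name is hypothetical.
    """
    # Index the metric by instance key, then by algorithm name.
    by_key = defaultdict(dict)
    for r in runs:
        key = (int(r["seed"]), int(r["batch_size"]))
        by_key[key][r["algo"]] = float(r[metric])

    probe_by_key = {(int(p["seed"]), int(p["batch_size"])): p for p in probes}

    rows = []
    for key in sorted(by_key):
        algos = by_key[key]
        # Keep only instances that have both algorithms and a probe value.
        if algo_cma not in algos or algo_berw not in algos or key not in probe_by_key:
            continue
        f_cma, f_berw = algos[algo_cma], algos[algo_berw]
        rows.append({
            "seed": key[0],
            "batch_size": key[1],
            probe_key: float(probe_by_key[key][probe_key]),
            "best_f_cma": f_cma,
            "best_f_berw": f_berw,
            # Lower is better; a tie is assigned to the baseline (assumption).
            "label": "berw" if f_berw < f_cma else "cma",
        })
    return rows
```

Instances missing either algorithm or a probe value are dropped rather than imputed, so every remaining row carries a well-defined label.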

Then the train/test threshold file is generated by:

```bash
python3 tools/probe_threshold_train_test.py \
  --decision-points evidence/application_mlp_minibatch_digits0_heavytail_sigma1p0_decision_accuracy_vs_noise_switch/decision_points.csv \
  --probe-key misranking_rd \
  --train-instances 1-25 --test-instances 26-50 \
  --loss log10 --selection regret_mean_then_threshold \
  --tmin 0.0 --tmax 0.5 --tstep 0.01
```
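The train/test selection can be pictured with the following sketch. It is an illustration under stated assumptions, not the tool's implementation: the rule is taken to be "use `ProbeSwitch-Noise` when the probe is at or above the threshold", `--loss log10` is read as the log10 gap between the chosen and the better objective value (which requires positive values), and `regret_mean_then_threshold` is read as "lowest mean train regret, ties broken by the smaller threshold". All function names are hypothetical.

```python
import math


def make_thresholds(tmin=0.0, tmax=0.5, tstep=0.01):
    """Threshold grid built from integer steps to avoid accumulated float error."""
    n = int(round((tmax - tmin) / tstep)) + 1
    return [round(tmin + i * tstep, 10) for i in range(n)]


def log10_regret(chosen, best):
    """log10 gap between the chosen and the best value; assumes both are > 0."""
    return math.log10(chosen) - math.log10(best)


def evaluate(points, threshold, probe_key="misranking_rd"):
    """Return (accuracy, mean log10 regret) of the rule 'probe >= threshold -> berw'."""
    correct, regret = 0, 0.0
    for p in points:
        use_berw = p[probe_key] >= threshold
        chosen = p["best_f_berw"] if use_berw else p["best_f_cma"]
        best = min(p["best_f_cma"], p["best_f_berw"])
        regret += log10_regret(chosen, best)
        # The decision is correct when it matches the instance label.
        correct += use_berw == (p["best_f_berw"] < p["best_f_cma"])
    n = len(points)
    return correct / n, regret / n


def select_threshold(train_points, thresholds):
    """Lowest mean train regret wins; ties go to the smaller threshold (assumption)."""
    return min(thresholds, key=lambda t: (evaluate(train_points, t)[1], t))
```

A typical use would select the threshold on rows with seeds 1–25 via `select_threshold(train_rows, make_thresholds())` and then report `evaluate(test_rows, t)` on seeds 26–50, so the reported accuracy and regret never come from the data used to pick the threshold.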
