# ESR Experiment Progress Tracker

Last updated: 2026-01-28 15:30

We've changed the sampling setup and switched the judge model to Claude Haiku 4.5. Results are in `experiment_results/claude_haiku_4_5_20251001_judge/`.

**Key change**: Switched to old off-topic detectors (26 latents from `data/off_topic_detectors_old.json`) because they show better ablation contrast than the new separability-based detectors.

## Target State

### Core Experiments (Haiku Judge)

| Experiment | Model(s) | Target | Notes |
|------------|----------|--------|-------|
| Exp 1 (ESR) | All 5 models | 200 features, 25 prompts/feature | Main ESR measurement |
| Exp 2 (Boost sweep) | Llama 70B | ~50 features, multiple boost levels | Exact feature count not critical |
| Exp 3 (Ablation) | Llama 70B | All Exp 1 features, with OTD latents ablated | Off-topic detector (OTD) latent ablation (old detectors) |
| Exp 4 (Finetuning) | Llama 8B | 200 features, 10-90% masking ratios | Fine-tuning ratio sweep |
| Exp 5 (Prompt variants) | All models | Uses exp1 data | Resistance prompts |

### Additional Experiments

| Experiment | Description | Location |
|------------|-------------|----------|
| Exp 6 (Sequential) | Sequential activation analysis | `experiment_6_sequential_activations.py` |
| Exp 7 (Cross-judge) | Re-grade with different judge models | `experiment_7_cross_judge/` |
| Exp 8 (No-steering baseline) | Baseline without feature steering | `experiment_8_no_steering_baseline.py` |
| Exp 9 (Activation stats) | Self-correction activation stats | `experiment_9_activation_stats/` |
| Exp 10 (Random latent control) | Ablate random latents as control | `experiment_10_random_latent_control/` |
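For Exp 10, the control set should presumably match the 26 old OTD latents in size while sharing none of their indices. The sketch below shows one way to draw such a set reproducibly. It assumes latents are integer indices in `[0, d_sae)`; the function name, the SAE width of 65536, and the stand-in OTD indices are hypothetical placeholders, not values taken from `data/off_topic_detectors_old.json`.

```python
import random


def sample_random_control_latents(otd_latents, d_sae, seed=0):
    """Draw a size-matched random control set that excludes the OTD latents.

    Assumption (hypothetical): latents are integer indices in [0, d_sae) and the
    control set must be the same size as the OTD set with no overlap.
    """
    excluded = set(otd_latents)
    candidates = [i for i in range(d_sae) if i not in excluded]
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    return sorted(rng.sample(candidates, k=len(excluded)))


# Placeholder inputs: 26 stand-in indices and an assumed SAE width.
otd = list(range(100, 126))
control = sample_random_control_latents(otd, d_sae=65536, seed=0)
print(len(control))  # 26
```

Re-running with the same seed gives the same control set, which keeps the OTD and random phases comparable across reruns.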

---

## Current Status

### Experiment 1: ESR Model Comparison

| Model | Features | Trials/Feature | Status |
|-------|----------|----------------|--------|
| Gemma 2B | 200 | 25 | DONE |
| Gemma 9B | 200 | 25 | DONE |
| Gemma 27B | 200 | 25 | DONE |
| Llama 8B | 200 | ~25 | DONE |
| Llama 70B | 200 | ~25 | DONE |

### Experiment 2: Boost Level Sweep

| Model | Features | Status |
|-------|----------|--------|
| Llama 70B | 50 | DONE |

### Experiment 3: Ablation Study (Old OTD - 26 latents)

| Model | Files | Status |
|-------|-------|--------|
| Llama 70B | 7 ablation files so far | IN PROGRESS (remaining Exp 1 files still to run) |
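One common way to ablate a set of SAE latents is to subtract only their decoder contributions from the activations, which leaves the SAE reconstruction error in place. The sketch below illustrates that idea with NumPy. It is not necessarily how the Exp 3 scripts implement ablation; it assumes a plain ReLU encoder (a JumpReLU threshold or pre-encoder bias would change the encoder line), and `ablate_latents` plus all shapes and weights are hypothetical.

```python
import numpy as np


def ablate_latents(x, W_enc, b_enc, W_dec, latents):
    """Remove the contribution of selected SAE latents from activations.

    Shapes (assumed): x (n_tokens, d_model), W_enc (d_model, d_sae),
    b_enc (d_sae,), W_dec (d_sae, d_model). Assumes a plain ReLU SAE.
    """
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # latent activations
    idx = np.asarray(latents, dtype=int)  # int dtype so an empty list still indexes
    # Subtract only the ablated latents' decoder terms; the error term is untouched.
    return x - f[:, idx] @ W_dec[idx]


# Toy example with random weights (not a real SAE).
rng = np.random.default_rng(0)
d_model, d_sae = 16, 64
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model))
x = rng.normal(size=(4, d_model))
x_abl = ablate_latents(x, W_enc, b_enc, W_dec, [3, 7, 11])
```

Ablating an empty latent list returns the activations unchanged, which is a cheap sanity check before running the full ablation.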

### Experiment 4: Fine-tuning Ratio Sweep

| Masking % | Status |
|-----------|--------|
| 10% | DONE |
| 20% | DONE |
| 30% | DONE |
| 40% | DONE |
| 50% | DONE |
| 60% | DONE |
| 70% | DONE |
| 80% | DONE |
| 90% | DONE |

**Results**: Higher masking ratio → higher % multi-attempt (from 0% for the base model to 40%+ at 60-90% masking)
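The % multi-attempt figure can be sketched as the share of trials, per masking ratio, that contain more than one attempt. This is an assumed definition. The record fields `masking_ratio` and `n_attempts` are hypothetical names rather than the real result schema, and the toy records below are illustrative only, not experiment data.

```python
from collections import defaultdict


def pct_multi_attempt(records):
    """Percentage of trials with more than one attempt, keyed by masking ratio.

    Assumes each record has 'masking_ratio' and 'n_attempts' fields
    (hypothetical names).
    """
    totals = defaultdict(int)
    multi = defaultdict(int)
    for r in records:
        ratio = r["masking_ratio"]
        totals[ratio] += 1
        if r["n_attempts"] > 1:  # assumed definition of "multi-attempt"
            multi[ratio] += 1
    return {ratio: 100.0 * multi[ratio] / totals[ratio] for ratio in sorted(totals)}


# Illustrative toy records, not real results.
records = [
    {"masking_ratio": 0.1, "n_attempts": 1},
    {"masking_ratio": 0.1, "n_attempts": 1},
    {"masking_ratio": 0.9, "n_attempts": 2},
    {"masking_ratio": 0.9, "n_attempts": 1},
]
summary = pct_multi_attempt(records)
print(summary)  # {0.1: 0.0, 0.9: 50.0}
```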

### Experiment 5: Prompt Variants

| Model | Status |
|-------|--------|
| Gemma 2B | DONE |
| Gemma 9B | RUNNING |
| Gemma 27B | PENDING |
| Llama 8B | PENDING |
| Llama 70B | PENDING |

### Experiments 6-10: Additional Experiments

| Experiment | Status |
|------------|--------|
| Exp 6 (Sequential) | PENDING |
| Exp 7 (Cross-judge) | PENDING |
| Exp 8 (No-steering baseline) | PENDING |
| Exp 9 (Activation stats) | PENDING |
| Exp 10 (Random latent) | RUNNING (OTD phase; random-latent phase pending) |

---

## Running Jobs

- Experiment 5: Gemma 9B, all prompt variants
- Experiment 10: OTD ablation phase (random-latent phase follows once it completes)

---

## TODO

1. [x] Run Experiment 4 for all masking ratios (DONE)
2. [ ] Run Experiment 3 ablation (old OTD) for all Llama 70B exp1 files
3. [ ] Run Experiment 5 for remaining models (Gemma 27B, Llama 8B, Llama 70B)

4. [ ] Run Experiment 10 random ablation (after OTD phase completes)
5. [ ] Run Experiment 6 (sequential activations)
6. [ ] Run Experiment 8 (no-steering baseline) for all models
7. [ ] Run Experiment 7 (cross-judge analysis)
8. [ ] Run Experiment 9 (activation stats pipeline)

9. [ ] Generate all plots with `python plotting/plot_all.py`

---

## Plotting Updates

- `plot_exp4.py`: Now globs for result files instead of using hardcoded filenames, and shows % Multi-Attempt and MSI panels (Success Rate panel removed)
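Globbing instead of hardcoding filenames means new masking ratios are picked up automatically. A minimal sketch of the pattern, using `pathlib` and a regex to recover the masking percentage from each filename, follows. The `*exp4*.json` glob, the `mask<N>` naming scheme, and `find_exp4_files` are hypothetical; the real files live under `experiment_results/claude_haiku_4_5_20251001_judge/` and may be named differently.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical filename pattern, e.g. "exp4_mask30.json"; adjust to real names.
RATIO_RE = re.compile(r"mask(?:ing)?_?(\d+)")


def find_exp4_files(results_dir):
    """Return (masking %, path) pairs for Exp 4 result files, sorted by ratio."""
    found = []
    for path in Path(results_dir).glob("*exp4*.json"):
        m = RATIO_RE.search(path.stem)
        if m:
            found.append((int(m.group(1)), path))
    return sorted(found, key=lambda item: (item[0], item[1].name))


# Usage against a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as d:
    for name in ["exp4_mask90.json", "exp4_mask10.json", "notes.txt"]:
        Path(d, name).touch()
    ratios = [ratio for ratio, _ in find_exp4_files(d)]
print(ratios)  # [10, 90]
```

Sorting on the parsed integer rather than the filename avoids lexical ordering putting, for example, `mask100` before `mask20`.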

---

## File Locations

- Results: `experiment_results/claude_haiku_4_5_20251001_judge/`
- Old OTD detectors: `data/off_topic_detectors_old.json` (26 latents)
- Plots: `plots/`
- Scripts: `experiment_*.py`, `plotting/plot_exp*.py`
- Overnight runner: `run_overnight.sh`
