---
name: research-experiment
description: "[Read when prompt contains /research-experiment]"
metadata:
  {
    "agent-runtime":
      {
        "emoji": "🧪",
        "requires": { "bins": ["python3", "uv"] },
      },
  }
---

# Research Experiment

**Don't ask permission. Just do it.**

**Workspace:** `$W` = working directory provided in the task parameter.

## Prerequisites

| File | Source |
|------|--------|
| `$W/project/` | /research-implement |
| `$W/plan_res.md` | /research-plan |
| `$W/iterations/judge_v*.md` | /research-review (the latest verdict must be PASS) |

**Verify PASS:** read the latest `judge_v*.md` and confirm `verdict: PASS`. If it is not, STOP.
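
The PASS check can be sketched in Python as below. This is a minimal sketch under two assumptions that the instructions do not spell out: "latest" means the highest numeric suffix in `judge_v<N>.md`, and the verdict appears on its own line as `verdict: <WORD>`. The helper name `latest_verdict` is hypothetical.

```python
import re
from pathlib import Path


def latest_verdict(iterations_dir):
    """Return (path, verdict) for the highest-numbered judge_v*.md, or (None, None)."""

    def version(path):
        # Assumption: files are named judge_v<N>.md with an integer N.
        match = re.search(r"judge_v(\d+)\.md$", path.name)
        return int(match.group(1)) if match else -1

    files = sorted(Path(iterations_dir).glob("judge_v*.md"), key=version)
    if not files:
        return None, None
    latest = files[-1]
    # Assumption: the verdict is written on a line of the form "verdict: PASS".
    match = re.search(
        r"^\s*verdict:\s*(\w+)",
        latest.read_text(encoding="utf-8"),
        re.MULTILINE | re.IGNORECASE,
    )
    return latest, (match.group(1).upper() if match else None)
```

Sorting numerically rather than alphabetically keeps `judge_v10.md` ahead of `judge_v2.md`. Any returned verdict other than `PASS`, including `None` when no verdict line is found, should be treated as a reason to STOP.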

## Output

| File | Content |
|------|---------|
| `$W/experiment_res.md` | Full experiment report (full training + ablations + supplementary experiments). |
| `$W/experiment_analysis/analysis_{N}.md` | Per-round analysis report (produced during iteration). |

---

## Workflow

### Step 1: Full training

Set the epoch count to the full-training value specified in `plan_res.md`. **Do not change code logic — only the epoch count.**

```bash
cd "$W/project" && source .venv/bin/activate
python3 run.py  # full epochs
```

Record the full training's `[RESULT]` output.
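
One way to capture those lines without retyping them is to parse the log. The sketch below assumes each result is printed as `[RESULT] key=value` on its own line, matching the shape used in the report template later in this document; the function name `parse_results` is hypothetical.

```python
import re

# Assumption: result lines look like "[RESULT] train_loss=0.1234".
RESULT_RE = re.compile(r"^\[RESULT\]\s*(\w+)=(.+?)\s*$")


def parse_results(log_text):
    """Collect [RESULT] key=value pairs from a training log.

    Values are kept as the exact strings that were printed, so the report
    quotes real execution output instead of re-rounded numbers.
    """
    results = {}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2)
    return results
```

If a key is printed more than once, the last occurrence wins, which is usually the final-epoch value.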

### Step 2: Analyze the results

Read the training output and assess:

- Final loss and metrics.
- Training-curve trend (does the loss keep decreasing?).
- Overfitting (gap between train and val).
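
These checks can be made concrete with a small helper. This is a sketch only: it assumes per-epoch train and validation losses are available as two equal-length lists (lower is better), and the name `summarize_curves` and the `window` parameter are hypothetical. It reports raw indicators and leaves the judgment to the analysis.

```python
def summarize_curves(train_losses, val_losses, window=3):
    """Summarize convergence and overfitting signals from per-epoch losses."""
    if len(train_losses) != len(val_losses):
        raise ValueError("train and val curves must have the same length")
    if len(train_losses) < 2 * window:
        raise ValueError("need at least 2 * window epochs")

    def mean(values):
        return sum(values) / len(values)

    # Compare the last `window` epochs with the `window` epochs before them.
    earlier = mean(train_losses[-2 * window:-window])
    recent = mean(train_losses[-window:])
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    return {
        "still_decreasing": recent < earlier,
        "final_gap": val_losses[-1] - train_losses[-1],
        "best_val_epoch": best + 1,  # 1-based epoch index
        "val_rose_after_best": val_losses[-1] > val_losses[best],
    }
```

A training loss that is still decreasing while the validation loss has risen since its best epoch is the usual sign of overfitting; a large `final_gap` alone is weaker evidence.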

### Step 3: Ablation studies

Following the ablation plan in `plan_res.md`, run 2–3 ablations.

For each ablation:

1. Modify the code (comment out / replace the corresponding component).
2. Run a 2-epoch quick check.
3. Record the result.

```bash
# Example: drop the attention module
python3 run.py --epochs 2 --ablation no_attention
```
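
The source does not show how `run.py` handles these flags. As a hedged illustration, an ablation can be gated behind a flag rather than commented out, which keeps the full model intact for later runs. Everything below apart from the flag names `--epochs` and `--ablation` is hypothetical, including the `use_attention` config key.

```python
import argparse


def build_parser():
    """Command-line flags mirroring the example invocation above."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--epochs", type=int, default=None,
                        help="override the epoch count from the plan")
    parser.add_argument("--ablation", default=None,
                        help="name of the component to disable")
    return parser


def model_config(ablation):
    """Map an ablation name to model switches (hypothetical config keys)."""
    config = {"use_attention": True}
    if ablation == "no_attention":
        config["use_attention"] = False
    elif ablation is not None:
        # Fail loudly so a typo never silently runs the full model.
        raise ValueError(f"unknown ablation: {ablation}")
    return config
```

Rejecting unknown ablation names matters here: an unrecognized name that fell through to the full model would produce a misleading "ablation" result.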

### Step 4: Analysis → supplementary experiment iterations (2 rounds)

**⚠️ This is the Novix Exp Analyzer mechanism — analyze the existing results, propose supplementary experiments, run them, and analyze again.**

Loop **twice**:

#### 4.1 Analyze the current results

Read all current experiment results (full training + ablations) and write the analysis report to `$W/experiment_analysis/analysis_{N}.md`:

```markdown
# Experiment Analysis Round {N}

## Summary of current results
- Full training: {metrics}
- Ablation studies: {key findings}

## Issues or opportunities found
1. {observation} → suggestion: {experiment}
2. ...

## Supplementary experiment plan
| Experiment | Purpose | Modification | Expected result |
|------------|---------|--------------|-----------------|
| {exp_name} | {why} | {what to change} | {expected} |
```

Typical supplementary experiments (output of the **Novix Exp Analyzer**):

- **Sensitivity analysis**: effect of key hyperparameters (lr, hidden_dim, dropout).
- **Visualisations**: attention maps, embedding visualisations, training-curve comparisons.
- **Comparison experiments**: performance against baseline methods.
- **Robustness tests**: behaviour under different data scales / noise levels.
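
For the sensitivity analysis, one common design is to vary a single hyperparameter at a time around the baseline, so each result can be attributed to one change. The sketch below assumes configurations are plain dictionaries; the helper name `one_at_a_time` is hypothetical, and the candidate values should come from the analysis report, not from this example.

```python
def one_at_a_time(baseline, candidates):
    """Build run configs that each differ from the baseline in exactly one key.

    baseline:   dict of hyperparameter name -> baseline value
    candidates: dict of hyperparameter name -> list of values to try
    Returns a list of (run_name, config) pairs.
    """
    runs = []
    for name, values in candidates.items():
        for value in values:
            if value == baseline[name]:
                continue  # the baseline itself is already covered by full training
            config = dict(baseline)
            config[name] = value
            runs.append((f"{name}={value}", config))
    return runs
```

This grows linearly with the number of candidate values, unlike a full grid, which fits a two-round budget better but cannot reveal interactions between hyperparameters.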

#### 4.2 Run the supplementary experiments

Following the plan in the analysis report, modify the code and run the supplementary experiments. **Only change experiment-related parameters / configuration; do not change core algorithm logic.**

```bash
cd "$W/project" && source .venv/bin/activate
python3 run.py --experiment {exp_name}
```

After recording the results, return to 4.1 for the next round (2 rounds total).

---

### Step 5: Write the final experiment report

Aggregate all results (full training + ablations + 2 rounds of supplementary experiments) and write `$W/experiment_res.md`:

```markdown
# Experiment Report

## Full Training Results (from execution log)
- Epochs: {N}
- [RESULT] train_loss={value}
- [RESULT] val_metric={value}
- [RESULT] elapsed={value}
- [RESULT] device={device}

> The numbers above come from real execution output.

## Training Analysis
- Convergence: {converged / still improving / diverged}
- Overfitting: {yes/no, evidence}

## Ablation Studies

| Experiment | Modification | val_metric | vs Full |
|------------|--------------|-----------|---------|
| Full model | — | {value} | baseline |
| No {component} | drop {X} | {value} | {-/+}% |
| ... | ... | ... | ... |

## Supplementary Experiments

### Sensitivity Analysis
| Hyperparameter | Value | val_metric | Notes |
|----------------|-------|-----------|-------|
| ... | ... | ... | ... |

### Comparison with Baselines
| Method | val_metric | Notes |
|--------|-----------|-------|
| Ours | {value} | — |
| {Baseline} | {value} | ... |

### Visualisations
- Training curve: `$W/project/figures/training_curve.png`
- {other visualisation}: `$W/project/figures/{name}.png`

## Conclusions
- {key findings from all experiments}

## Limitations
- {limitations and future work}
```
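
The `vs Full` column in the ablation table can be filled in with a relative change against the full model. This is a sketch assuming `val_metric` is a single float per run and that the full-model value is nonzero; the helper names are hypothetical.

```python
def relative_change(full, ablated):
    """Percentage change of the ablated metric relative to the full model."""
    if full == 0:
        raise ValueError("relative change is undefined when the full-model metric is 0")
    return (ablated - full) / abs(full) * 100.0


def format_row(name, modification, full, ablated):
    """Render one ablation-table row in the report's column order."""
    change = relative_change(full, ablated)
    return f"| {name} | {modification} | {ablated:.4f} | {change:+.1f}% |"
```

Whether a negative change is a degradation depends on the metric's direction (accuracy versus loss), so the report should state that direction next to the table.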

---

## Rules

1. Full training only changes the epoch count, not the code logic.
2. All numbers must come from real execution output.
3. At least two ablation experiments are required.
4. If full training fails (OOM, etc.), reduce the batch size and retry; do not skip full training.
5. **The supplementary experiment iteration must be done twice (Novix Exp Analyzer mechanism)** — round 1 targets the initial results, round 2 targets the round-1 supplementary results.
6. Supplementary experiments do not modify the core algorithm; they only change experiment configuration / parameters / visualisation code.
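
Rule 4 can be sketched as a retry loop that halves the batch size on out-of-memory failures. The assumptions are labeled in the code: the failure is taken to surface as a `RuntimeError` whose message mentions "out of memory" (common for CUDA OOM in PyTorch, but not guaranteed for every framework), and `train_fn` is a hypothetical callable that runs full training for a given batch size.

```python
def run_with_batch_backoff(train_fn, batch_size, min_batch_size=1):
    """Call train_fn(batch_size), halving the batch size after each OOM failure.

    Returns (batch_size_used, result). Re-raises non-OOM errors immediately,
    and re-raises the OOM error once halving would go below min_batch_size.
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except RuntimeError as exc:
            # Assumption: OOM is reported as a RuntimeError mentioning "out of memory".
            is_oom = "out of memory" in str(exc).lower()
            if not is_oom or batch_size // 2 < min_batch_size:
                raise
            batch_size //= 2
```

The batch size that finally succeeded should be recorded in the report, since it differs from the planned configuration.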
