# Graph-GRPO (Flow-GRPO / Graph-GRPO)

This README keeps only the "how to run" content: training, test-only, mol_opt evaluation, batch evaluation, and screen mode.
All absolute paths below are placeholders and should be replaced with local paths on your machine.

## 0. Directories and Default Paths

- Repo path (example): `/path/to/RL_Graph_Generation/`
- `hydra.run.dir` defaults to `../outputs/...`, so running from the repo root writes to: `/path/to/outputs/...`
- Checkpoint root (example): `/path/to/checkpoints/`

## 1. Training (Flow-GRPO)

Entry: `src/train_flow_grpo.py`

### 1.1 Single-task training (ZINC / PMO / TDC style tasks)

From the repo root:
```bash
CUDA_VISIBLE_DEVICES=3 python src/train_flow_grpo.py +experiment=zinc dataset=zinc +grpo=deco_hop
```

Notes:
- `+grpo=deco_hop` loads `configs/grpo/deco_hop.yaml`
- For other tasks, replace `deco_hop` with the corresponding filename in `configs/grpo/*.yaml` (without `.yaml`).

### 1.2 The 23 tasks (task name list)

Batch evaluation scripts (and screen CSV column names) use this task list:
```
albuterol_similarity, amlodipine_mpo, celecoxib_rediscovery, deco_hop, drd2,
fexofenadine_mpo, gsk3b, isomers_c7h8n2o2, isomers_c9h10n2o2pf2cl, jnk3,
median1, median2, mestranol_similarity, osimertinib_mpo, perindopril_mpo,
qed, ranolazine_mpo, scaffold_hop, sitagliptin_mpo, thiothixene_rediscovery,
troglitazone_rediscovery, valsartan_smarts, zaleplon_mpo
```
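To train every task in turn, the single-task command from section 1.1 can be generated per task name. The sketch below only builds the command strings. It assumes each task has a matching `configs/grpo/<task>.yaml` and uses the same `zinc` experiment and dataset, and `train_command` is a hypothetical helper.

```python
import shlex

TASKS = [
    "albuterol_similarity", "amlodipine_mpo", "celecoxib_rediscovery", "deco_hop", "drd2",
    "fexofenadine_mpo", "gsk3b", "isomers_c7h8n2o2", "isomers_c9h10n2o2pf2cl", "jnk3",
    "median1", "median2", "mestranol_similarity", "osimertinib_mpo", "perindopril_mpo",
    "qed", "ranolazine_mpo", "scaffold_hop", "sitagliptin_mpo", "thiothixene_rediscovery",
    "troglitazone_rediscovery", "valsartan_smarts", "zaleplon_mpo",
]


def train_command(task: str, gpu: int = 0) -> str:
    """Build the section 1.1 training command for one task (assumes a zinc setup)."""
    args = [
        "python", "src/train_flow_grpo.py",
        "+experiment=zinc", "dataset=zinc", f"+grpo={task}",
    ]
    return f"CUDA_VISIBLE_DEVICES={gpu} " + " ".join(shlex.quote(a) for a in args)


for task in TASKS:
    print(train_command(task))
```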

## 2. Test-only (quick sampling / quick validation)

Entry: `src/train_flow_grpo.py` (it switches to test-only when `general.test_only` is set)

Example (sample + evaluate from a checkpoint; outputs go to the Hydra output dir):
```bash
CUDA_VISIBLE_DEVICES=0 python src/train_flow_grpo.py +experiment=tree dataset=tree +grpo=tree \
  general.test_only=/path/to/checkpoints/tree.ckpt
```

Common output files (under the Hydra output dir):
- `test_epoch*_res_*.txt`: sampling + evaluation summary
- For goal-directed rewards: `best_molecules_<task>.txt` (top 100 molecules with scores)

## 3. mol_opt evaluation (Graph-GRPO)

Goal: reuse the official `mol_opt` runner + oracle wrapper. Budget, oracle call counting, Top-10 curves, and AUC@10 are handled by `mol_opt`. This repo only provides the proposer (candidate generation + refine) and does not call the oracle internally.

Prerequisites:
- The `mol_opt` repo sits next to this repo at `../mol_opt/` (or set `MOLOPT_REPO` to its location)
- Two conda envs (default names; can be overridden by env vars):
  - `MOLOPT_CONDA_ENV=molopt`
  - `RL_GRAPH_CONDA_ENV=defog`

### 3.1 Single-task evaluation

```bash
./scripts/run_mol_opt_graph_grpo.sh /path/to/checkpoints/valsartan_smarts.ckpt valsartan_smarts 10000 0
```

Positional arguments, in order: checkpoint path, task name, oracle budget (`max_oracle_calls`), and seed.

### 3.2 Batch evaluation for the 23 tasks (auto-find ckpt by task name)

If your ckpts are stored under `/path/to/checkpoints/` and named by task, pass the checkpoint root directory:
```bash
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints 10000 0
```

Batch output folder (per-task dirs, Top-10 plots, mol_opt progress CSVs, and batch summary CSV):
- Default: `mol_opt/main/graph_grpo/results/batch_<timestamp>_seed<seed>_budget<max_oracle_calls>/`
- Or set manually: `GRAPH_GRPO_BATCH_OUTPUT_DIR=/path/to/graph_grpo_eval_runs/run1`

Batch summary CSV:
- A summary file is written under the batch folder: `batch_summary_seed<seed>_budget<max_oracle_calls>.csv`
  - Columns: `task,seed,top1,top10,auc10`
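With the columns listed above, per-task averages over seeds can be computed with the standard library alone. This is a minimal sketch. `mean_per_task` is a hypothetical helper, and the rows in `example` are made-up numbers for illustration, not results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

METRICS = ("top1", "top10", "auc10")


def mean_per_task(csv_text: str) -> dict:
    """Average top1/top10/auc10 over all seeds of each task."""
    rows_by_task = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows_by_task[row["task"]].append(row)
    return {
        task: {m: mean(float(r[m]) for r in rows) for m in METRICS}
        for task, rows in rows_by_task.items()
    }


# Dummy rows in the documented column layout (values are placeholders).
example = """task,seed,top1,top10,auc10
qed,0,0.75,0.5,0.25
qed,1,0.25,0.5,0.75
"""
print(mean_per_task(example))
```

For a real run, read the summary file with `open(path).read()` and pass the text to `mean_per_task`.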

Checkpoint lookup rules for each task `<task>`:
- First: `<CKPT_ROOT>/<task>.ckpt`
- Second: `<CKPT_ROOT>/<task>` (if it is already a file)
- Third: latest `*.ckpt` under `<CKPT_ROOT>/<task>/`

mol_opt output directory:
- Default: `mol_opt/main/graph_grpo/results/<oracle>_<seed>/`
- The proposer also writes `top01_*.png`...`top10_*.png` + `top10_smiles.txt` on shutdown.

### 3.3 AUC@10 acceleration (early stop when Top-10 stalls)

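If you only care about `auc_top10`, you can stop early once **`avg_top10` has not improved for N oracle calls**. The last `avg_top10` value is then held constant up to the full budget when computing AUC@10, which saves time.

Enable it by setting `grpo.early_stop_patience` in the task config (e.g., `1000`; the value must be smaller than the oracle budget to have any effect), then run the evaluation as usual: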
```bash
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints/zinc.ckpt 10000 0
```

Optional env vars:
- `GRAPH_GRPO_EARLY_STOP_EPS`: threshold to consider "changed" (default `0`)
- `GRAPH_GRPO_EARLY_STOP_MIN_CALLS`: minimum oracle calls before early-stop is allowed (default `0`)
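The early-stop rule and the plateau extension can be illustrated with a simplified sketch. It is not `mol_opt`'s implementation: the AUC here is a plain per-call mean of the `avg_top10` curve, and the function names are hypothetical. The `patience`, `eps`, and `min_calls` parameters mirror `grpo.early_stop_patience`, `GRAPH_GRPO_EARLY_STOP_EPS`, and `GRAPH_GRPO_EARLY_STOP_MIN_CALLS`.

```python
from typing import List, Sequence


def top10_curve(scores: Sequence[float]) -> List[float]:
    """avg_top10 after each oracle call (mean of up to 10 best scores so far)."""
    best: List[float] = []
    curve: List[float] = []
    for score in scores:
        best = sorted(best + [score], reverse=True)[:10]
        curve.append(sum(best) / len(best))
    return curve


def early_stop_index(curve: Sequence[float], patience: int,
                     eps: float = 0.0, min_calls: int = 0) -> int:
    """Number of oracle calls after which to stop (len(curve) if never)."""
    best_value = float("-inf")
    last_improved = 0
    for calls, value in enumerate(curve, start=1):
        if value > best_value + eps:
            best_value = value
            last_improved = calls
        elif calls >= min_calls and calls - last_improved >= patience:
            return calls
    return len(curve)


def auc_top10(curve: Sequence[float], budget: int) -> float:
    """Mean of the curve over the full budget, holding the last value flat."""
    if not curve:
        return 0.0
    used = list(curve[:budget])
    padded = used + [used[-1]] * (budget - len(used))
    return sum(padded) / budget
```

If the curve would have stayed flat after the stop point, `auc_top10(curve[:stop], budget)` equals the value computed from the full run, which is why the shortcut is safe only once `avg_top10` has truly plateaued.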

## 4. Screen mode (use a precomputed CSV for Round0 initialization)

Use case: you already computed scores for all ZINC250k molecules and want to seed refine with "best in dataset" without replacing Round0 samples.

CSV requirements:
- Must include a `smiles` column
- Must include task columns (e.g., `median2`, `valsartan_smarts`, etc.)

Usage (with mol_opt batch eval):
```bash
GRAPH_GRPO_SCREEN_MODE=1 \
GRAPH_GRPO_SCREEN_CSV=/path/to/zinc250k.csv \
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints/zinc.ckpt 10000 0
```

Optional env vars:
- `GRAPH_GRPO_SCREEN_COLUMN`: which CSV column to use as the score (default: the current task, `cfg.grpo.target_task`)
- `GRAPH_GRPO_SCREEN_TOPK`: top-k size (default: `grpo.round0_samples`)
- `GRAPH_GRPO_SCREEN_CACHE_DIR`: cache dir (default: `.graph_grpo_screen_cache/` alongside the CSV)

Notes:
- Screen mode still samples `round0_samples`; it only adds "best in dataset" if it beats the best Round0 sample (ties favor the sampled graph).
- Screen mode does not replace Round0 samples; oracle calls are still made by mol_opt and still consume budget.
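The selection logic described in these notes can be sketched as follows. This is an assumption-laden illustration rather than the repo's code: `screen_topk` and `seeds_to_inject` are hypothetical helpers, each screened candidate is compared against the best Round0 score, and the CSV rows in the usage are made-up values.

```python
import csv
import io
from typing import List, Sequence, Tuple


def screen_topk(csv_text: str, column: str, k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (smiles, score) pairs for one task column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    scored = [(row["smiles"], float(row[column])) for row in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def seeds_to_inject(screened: Sequence[Tuple[str, float]],
                    round0_scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Keep screened molecules that strictly beat the best Round0 sample.

    The strict comparison means a tie favors the sampled graph.
    """
    best_round0 = max(round0_scores, default=float("-inf"))
    return [(smi, score) for smi, score in screened if score > best_round0]


# Placeholder CSV in the required layout: a smiles column plus task columns.
example = "smiles,median2\nCCO,0.2\nc1ccccc1,0.5\nCC,0.1\n"
top = screen_topk(example, column="median2", k=2)
print(seeds_to_inject(top, round0_scores=[0.3, 0.25]))
```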

## 5. Independent evaluation (GDPO eval, runs inside this repo)

Use case: sample molecules and evaluate with GDPO docking without mol_opt.

Basic usage (recommended: include the same experiment/dataset configs used in training):
```bash
PYTHONPATH=/path/to/RL_Graph_Generation python src/eval_grpo_sampler.py gdpo_eval \
  --ckpt /abs/path/to/your.ckpt \
  --grpo-config configs/grpo/lead_opt_parp1.yaml \
  --extra-config configs/experiment/zinc.yaml \
  --extra-config configs/dataset/zinc.yaml
```

Common args:
- `--num-samples 2048`: number of samples (default: `grpo.gdpo_eval_samples` if set, otherwise 512)
- `--device cuda:0`: evaluation device

Parallel docking (evaluation is parallelized as well):
- `grpo.num_reward_workers`: number of dock workers
- `grpo.gdpo_dock_cpu_per_worker`: CPU cores per worker

Eval-only (no screen seed injection):
```bash
GRAPH_GRPO_SCREEN_MODE=0 PYTHONPATH=/path/to/RL_Graph_Generation \
python src/eval_grpo_sampler.py gdpo_eval --ckpt ... --grpo-config ...
```

Three runs (seeds 0/1/2, all written to one `--out-dir` under `outputs`):
```bash
OUT=/path/to/outputs/gdpo_eval_results/lead_opt_braf_$(date +%Y%m%d_%H%M%S)
for seed in 0 1 2; do
  GRAPH_GRPO_SCREEN_MODE=1 PYTHONPATH=/path/to/RL_Graph_Generation CUDA_VISIBLE_DEVICES=0 \
  python src/eval_grpo_sampler.py gdpo_eval \
    --ckpt /path/to/checkpoints/lead_opt_braf.ckpt \
    --grpo-config configs/grpo/lead_opt_braf.yaml \
    --extra-config configs/experiment/zinc.yaml \
    --extra-config configs/dataset/zinc.yaml \
    --num-samples 2048 \
    --seed "$seed" \
    --out-dir "$OUT"
done
```

Aggregate logs (example for a `lead_opt_jak2` run directory):
```bash
python scripts/agg_gdpo_eval.py \
  --log /path/to/outputs/gdpo_eval_results/lead_opt_jak2_20260120_170638/evaluation_dictzinc.log \
  --target jak2 \
  --last 3
```

## 6. Ablations (6 groups)

All experiments run 3 seeds (0/1/2) and aggregate `top1/top10/auc10` into a single CSV.

Common settings (example):
- `GRAPH_GRPO_SEEDS=0,1,2`
- Output summary CSV: `batch_<timestamp>_seeds0_1_2_budget<budget>/batch_summary_seeds0_1_2_budget<budget>.csv`

Key switches:
- naive defog (use CLI ckpt; overrides `configs/grpo/*.yaml` `pretrained_checkpoint`)  
  `GRAPH_GRPO_USE_DEFAULT_CKPT=1`
- Graph-RL (read task ckpts from `/path/to/checkpoints/`)  
  **Do not set** `GRAPH_GRPO_USE_DEFAULT_CKPT`
- w/o refine (sample until budget is exhausted)  
  `GRAPH_GRPO_DISABLE_REFINE=1`
- with refine  
  unset or `GRAPH_GRPO_DISABLE_REFINE=0`
- w/o screen  
  `GRAPH_GRPO_SCREEN_MODE=0`
- with screen  
  `GRAPH_GRPO_SCREEN_MODE=1` + `GRAPH_GRPO_SCREEN_CSV=...` (optional `GRAPH_GRPO_SCREEN_COLUMN`)

> Note: naive defog uses the single ckpt path passed by `--batch`, which overrides `configs/grpo/*.yaml` `pretrained_checkpoint`.
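The six groups are the product of two checkpoint sources and three refine/screen combinations. The sketch below enumerates the environment variables for each group from the key switches above. `ablation_envs` is a hypothetical helper, and the CSV path is a placeholder.

```python
from typing import Dict, List

# (refine enabled, screen enabled) for the three settings used per checkpoint source
SETTINGS = [(False, False), (True, False), (True, True)]


def ablation_envs(screen_csv: str, seeds: str = "0,1,2") -> List[Dict[str, str]]:
    """Environment variables for the 6 ablation groups, naive defog first."""
    envs = []
    for naive_defog in (True, False):
        for refine, screen in SETTINGS:
            env = {
                "GRAPH_GRPO_SEEDS": seeds,
                "GRAPH_GRPO_DISABLE_REFINE": "0" if refine else "1",
                "GRAPH_GRPO_SCREEN_MODE": "1" if screen else "0",
            }
            if naive_defog:
                # Only set for naive defog; Graph-RL must leave it unset.
                env["GRAPH_GRPO_USE_DEFAULT_CKPT"] = "1"
            if screen:
                env["GRAPH_GRPO_SCREEN_CSV"] = screen_csv
            envs.append(env)
    return envs


for index, env in enumerate(ablation_envs("/path/to/zinc250k.csv"), start=1):
    print(index, " ".join(f"{key}={value}" for key, value in env.items()))
```

Each printed line can be used as the variable prefix for `./scripts/run_mol_opt_graph_grpo.sh --batch ...` with the checkpoint argument that matches the group.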

### 6.1 naive defog

1) w/o refine + w/o screen
```bash
CUDA_VISIBLE_DEVICES=2 \
GRAPH_GRPO_EVAL_BATCH_SIZE=7000 \
GRAPH_GRPO_SEEDS=0,1,2 \
GRAPH_GRPO_USE_DEFAULT_CKPT=1 \
GRAPH_GRPO_DISABLE_REFINE=1 \
GRAPH_GRPO_SCREEN_MODE=0 \
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints/zinc.ckpt 10000
```

2) with refine + w/o screen
```bash
GRAPH_GRPO_SEEDS=0,1,2 \
GRAPH_GRPO_USE_DEFAULT_CKPT=1 \
GRAPH_GRPO_DISABLE_REFINE=0 \
GRAPH_GRPO_SCREEN_MODE=0 \
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints/zinc.ckpt 10000 0
```

3) with refine + with screen
```bash
GRAPH_GRPO_SEEDS=0,1,2 \
GRAPH_GRPO_USE_DEFAULT_CKPT=1 \
GRAPH_GRPO_DISABLE_REFINE=0 \
GRAPH_GRPO_SCREEN_MODE=1 \
GRAPH_GRPO_SCREEN_CSV=/path/to/zinc250k.csv \
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints/zinc.ckpt 10000 0
```

### 6.2 Graph-RL (ckpt from `/path/to/checkpoints/`)

4) w/o refine + w/o screen
```bash
GRAPH_GRPO_SEEDS=0,1,2 \
GRAPH_GRPO_DISABLE_REFINE=1 \
GRAPH_GRPO_SCREEN_MODE=0 \
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints 10000
```

5) with refine + w/o screen
```bash
GRAPH_GRPO_SEEDS=0,1,2 \
GRAPH_GRPO_DISABLE_REFINE=0 \
GRAPH_GRPO_SCREEN_MODE=0 \
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints 10000 0
```

6) with refine + with screen
```bash
GRAPH_GRPO_SEEDS=0,1,2 \
GRAPH_GRPO_DISABLE_REFINE=0 \
GRAPH_GRPO_SCREEN_TOPK=10 \
GRAPH_GRPO_SCREEN_MODE=1 \
GRAPH_GRPO_SCREEN_CSV=/path/to/zinc250k.csv \
./scripts/run_mol_opt_graph_grpo.sh --batch /path/to/checkpoints 10000
```
