```bash
WANDB_NAME="ArcChallenge Gemma-4B-pt beamsearch" \
HYDRA_CONFIG=../examples/configs/polygraph_eval_arcchallenge.yaml \
python polygraph_eval.py \
    batch_size=1 subsample_eval_dataset=2000 \
    model=gemma_3 model.path=google/gemma-3-4b-pt \
    estimators=default_estimators +report_to_wandb=True
```

Models:
* `model=gemma_3 model.path=google/gemma-3-4b-pt`
* `model=gemma_3 model.path=google/gemma-3-4b-it`
* `model=bloomz-560m model.path=meta-llama/Llama-3.1-8B`
* `model=bloomz-560m model.path=meta-llama/Llama-3.1-8B-Instruct`
* `model=bloomz-560m model.path=Qwen/Qwen3-8B-Base`
* `model=bloomz-560m model.path=Qwen/Qwen3-8B`

Datasets (values for `HYDRA_CONFIG`):
* `../examples/configs/polygraph_eval_triviaqa.yaml`
* `../examples/configs/polygraph_eval_triviaqa_instruct.yaml`
* `../examples/configs/polygraph_eval_webq.yaml`
* `../examples/configs/polygraph_eval_webq_instruct.yaml`
* `../examples/configs/polygraph_eval_coqa.yaml`
* `../examples/configs/polygraph_eval_coqa_instruct.yaml`
* `../examples/configs/polygraph_eval_hotpotqa.yaml`
* `../examples/configs/polygraph_eval_hotpotqa_instruct.yaml`
* `../examples/configs/polygraph_eval_csqa.yaml`
* `../examples/configs/polygraph_eval_csqa_instruct.yaml`
* `../examples/configs/polygraph_eval_arcchallenge.yaml`
* `../examples/configs/polygraph_eval_arcchallenge_instruct.yaml`
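
Running every model against every dataset by hand is repetitive, so the following is a minimal sketch of a sweep, not part of the repository. It only prints the commands; nothing is executed. The helper name `print_sweep`, the `WANDB_NAME` format (dataset name followed by model name), and the pairing of base models with the non-instruct configs are all assumptions made for illustration. The arrays hold only a subset of the lists above; extend them as needed.

```bash
#!/usr/bin/env bash
# Sketch: print one evaluation command per model/dataset pair (nothing is executed).

# Illustrative subset of the model overrides listed above (base models only).
models=(
  "model=gemma_3 model.path=google/gemma-3-4b-pt"
  "model=bloomz-560m model.path=meta-llama/Llama-3.1-8B"
)

# Illustrative subset of the dataset configs listed above (non-instruct variants).
configs=(
  "../examples/configs/polygraph_eval_triviaqa.yaml"
  "../examples/configs/polygraph_eval_arcchallenge.yaml"
)

# print_sweep is a hypothetical helper, not part of the repository.
print_sweep() {
  local model_args config dataset model_name
  for model_args in "${models[@]}"; do
    model_name="${model_args##*/}"            # e.g. gemma-3-4b-pt
    for config in "${configs[@]}"; do
      dataset="$(basename "$config" .yaml)"   # e.g. polygraph_eval_triviaqa
      dataset="${dataset#polygraph_eval_}"    # e.g. triviaqa
      # The WANDB_NAME format below is an assumption, not the naming used in the paper runs.
      printf 'WANDB_NAME="%s %s" HYDRA_CONFIG=%s python polygraph_eval.py batch_size=1 subsample_eval_dataset=2000 %s estimators=default_estimators +report_to_wandb=True\n' \
        "$dataset" "$model_name" "$config" "$model_args"
    done
  done
}

print_sweep
```

If the script is saved as, say, `sweep.sh` (a hypothetical name), the printed commands can be reviewed first and then run from the directory containing `polygraph_eval.py` with `bash sweep.sh | bash`.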

Plot tables: `src/paper_plot_tables.ipynb`

