# MolecularIQ Submission Bundle

This directory packages the pieces needed to evaluate MolecularIQ with the streamlined reward system. Copy the contents of `submission_lm_eval/` into the root of a fresh `lm-evaluation-harness` checkout (or merge them with an existing tree).

## Included Components

```
submission_lm_eval/
├── evaluate_model.py          # CLI wrapper to run evaluations from YAML configs
├── configs/                   # Model configs (backend args + chemistry prompts)
├── data/moleculariq/          # Local copy of the evaluation dataset
└── lm_eval/tasks/chemsets/
    ├── common/                # Loader utilities (config resolver, prompts)
    ├── moleculariq/
    │   ├── task_processor.py
    │   ├── molecular_iq_pass_at_k.yaml
    │   └── rewards/           # Reward dispatcher and helpers
    └── utils.py               # Shared chemistry preprocessing/extraction helpers
```

## Integration Steps

1. **Copy files** into the harness root:
   ```bash
   cp -r submission_lm_eval/* /path/to/lm-evaluation-harness/
   ```
2. **Dataset**: a local copy (`data/moleculariq/`) is included and the YAML already points to it. Replace the folder or update the path if you prefer a different dataset source.
3. **Install dependencies**: `vllm`, `transformers`, and `rdkit`, plus the OpenAI and/or Anthropic SDKs if your model config needs them. vLLM models require available GPUs.
4. **Run an evaluation** with `evaluate_model.py`:
   ```bash
   python evaluate_model.py \
     --model_config configs/qwen3-06b.yaml \
     --task moleculariq_pass_at_k \
     --output_dir results/
   ```
   Add overrides as needed, e.g. `--set model_args.tensor_parallel_size=2`.
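
For readers unfamiliar with dotted overrides such as `--set model_args.tensor_parallel_size=2`, the following is a minimal sketch of how a `key.path=value` string can be applied to a nested config dictionary. It is illustrative only: `apply_override` is a hypothetical helper, and the parsing that `evaluate_model.py` actually performs may differ.

```python
import ast
from typing import Any


def apply_override(config: dict[str, Any], assignment: str) -> None:
    """Apply one 'dotted.key=value' override to a nested dict, in place.

    Hypothetical helper for illustration; not taken from evaluate_model.py.
    """
    dotted_key, sep, raw_value = assignment.partition("=")
    if not sep or not dotted_key:
        raise ValueError(f"expected key=value, got {assignment!r}")

    # Interpret numbers, lists, etc. as Python literals; otherwise keep the string.
    try:
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value

    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create intermediate mappings on demand.
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise TypeError(f"{part!r} in {dotted_key!r} is not a mapping")
    node[leaf] = value


config = {"model_args": {"tensor_parallel_size": 1}}
apply_override(config, "model_args.tensor_parallel_size=2")
print(config)
```

Under this scheme the override replaces only the named leaf and leaves the rest of the YAML-derived config untouched.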

## Notes

- Reward logic lives in `lm_eval/tasks/chemsets/moleculariq/rewards/`.
- All configs specify full `gen_kwargs`, so task defaults will not override sampling behaviour.
- `chem_model_config` in each YAML references the same file, so prompts and extraction hooks come from a single source.
- The first time an inline prompt is applied, its use is logged; set `LM_EVAL_DISABLE_INLINE_PROMPT=true` to disable inline prompts while debugging.
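
As a rough illustration of the `LM_EVAL_DISABLE_INLINE_PROMPT` switch, an environment flag of this kind is commonly read as shown below. This is a sketch under assumptions: `inline_prompt_disabled` is a hypothetical function, and the set of accepted values other than `true` is a guess, not something this bundle documents.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumed set of truthy spellings; only "true" is confirmed by this README.
_TRUTHY = {"1", "true", "yes", "on"}


def inline_prompt_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when LM_EVAL_DISABLE_INLINE_PROMPT is set to a truthy value."""
    env = os.environ if environ is None else environ
    raw = env.get("LM_EVAL_DISABLE_INLINE_PROMPT", "")
    return raw.strip().lower() in _TRUTHY


if __name__ == "__main__":
    print("inline prompts disabled:", inline_prompt_disabled())
```

Passing a mapping instead of relying on `os.environ` keeps the check easy to exercise in isolation.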

Refer to the main harness docs for additional datasets/backends; this bundle only contains what is required for MolecularIQ.
