# MolLangBench Code Repository

This code repository is designed to work with Hugging Face datasets, but can be easily modified to load our supplementary dataset provided in this submission.

## Directory Structure

```
MolLangBench/
├── scripts/
│   ├── create_prompts.py
│   ├── create_openai_jobs.py
│   ├── submit_openai_jobs.py
│   └── evaluate_results.py
├── prompts/
│   ├── editing/
│   ├── generation/
│   └── recognition/
│       ├── aldehyde/
│       ├── amide/
│       ├── benzene/
│       ├── bond_connections/
│       ├── bond_stereo/
│       ├── carboxyl/
│       ├── chiral_stereo/
│       ├── ester/
│       ├── furan/
│       ├── halogen_atoms/
│       ├── ketone/
│       ├── one_hop_neighbors/
│       ├── pyridine/
│       ├── quaternary_carbons/
│       ├── ring_junctions/
│       ├── thiophene/
│       ├── three_hop_neighbors/
│       └── two_hop_neighbors/
├── Miscellaneous/
│   ├── mathpix/
│   ├── rdkit_recognition_tasks/
│   │   └── ... (18 RDKit utility scripts)
│   └── send_gpt_image_1/
├── requirements.txt
└── readme.md
```

## Getting Started

Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

We provide end-to-end scripts for:

1. Generating prompt files (`.jsonl`)
2. Creating batch job inputs for the OpenAI API
3. Submitting jobs & retrieving outputs
4. Computing evaluation metrics

Below is a step-by-step example workflow for the **"one hop neighbors"** recognition subtask.

---

### 1. Prepare Prompts

Prompt templates for all tasks and modalities (SMILES and image) are located in the [`prompts`](./prompts/) folder. To generate a `.jsonl` prompt file, run:

```bash
python scripts/create_prompts.py \
    --task_type <recognition|editing|generation> \
    --recognition_subtask <recognition_subtask_name_if_applicable> \
    --modality <smiles|image> \
    --split <train|test> \
    --output_file <output_jsonl_path>
````

**Example:**
For the "one hop neighbors" recognition subtask (SMILES modality, test split):

```bash
python scripts/create_prompts.py \
    --task_type recognition \
    --recognition_subtask one_hop_neighbors \
    --modality smiles \
    --split test \
    --output_file exps/one_hop_neighbors/prompts.jsonl
```

> For image modality, the image is included as a Base64-encoded string in the `.jsonl` file.

---

### 2. Create OpenAI Batch Job File

Generate a batch job file for your prompts and desired model:

```bash
python scripts/create_openai_jobs.py \
    --prompt_file <prompts_jsonl_path> \
    --output_file <batch_job_jsonl_path> \
    --model <model_id> \
    --custom_id_prefix <optional_prefix>
```

**Example:**
Using the `o4-mini` model for "one hop neighbors":

```bash
python scripts/create_openai_jobs.py \
    --prompt_file exps/one_hop_neighbors/prompts.jsonl \
    --output_file exps/one_hop_neighbors/o4-mini/batch_input.jsonl \
    --model o4-mini \
    --custom_id_prefix o4_mini
```

---

### 3. Submit Jobs & Retrieve Outputs

Submit the batch job to the OpenAI API:

```bash
python scripts/submit_openai_jobs.py submit \
    --jobs_file <batch_job_jsonl_path> \
    [--api_key YOUR_API_KEY] \
    [--organization YOUR_ORG_ID]
```

> You can also set your API key as an environment variable. The organization ID is optional.

**Example:**

```bash
python scripts/submit_openai_jobs.py submit \
    --jobs_file exps/one_hop_neighbors/o4-mini/batch_input.jsonl
```

This command will print a `batch_id` for your job.

To retrieve the results (periodically checks until the job is complete):

```bash
python scripts/submit_openai_jobs.py retrieve \
    --batch_id <BATCH_ID> \
    --output_file <results_jsonl_path> \
    [--api_key YOUR_API_KEY] \
    [--organization YOUR_ORG_ID] \
    [--check_interval 60]
```

**Example:**

```bash
python scripts/submit_openai_jobs.py retrieve \
    --batch_id BATCH_ID \
    --output_file exps/one_hop_neighbors/o4-mini/results.jsonl
```

---

### 4. Evaluate the Results

Evaluate the model outputs with:

```bash
python scripts/evaluate_results.py \
    --results_file <results_jsonl_path> \
    --task_type <recognition|editing|generation> \
    --subtask <subtask_name> \
    --modality <smiles|image>
```

**Example:**

```bash
python scripts/evaluate_results.py \
    --results_file exps/one_hop_neighbors/o4-mini/results.jsonl \
    --task_type recognition \
    --subtask one_hop_neighbors \
    --modality smiles
```
This will print out the evaluation metrics for your selected task and model.
> The default result tags are `<count>` and `<atom_indices>`. For certain tasks, you may need to specify custom result tags using the `--result_1_tag <result_1_tag>` argument.

---

## Miscellaneous

The [`Miscellaneous`](./Miscellaneous) folder contains helpful scripts and utilities, including:

1. **Ground Truth Collection**  
   Scripts for collecting ground truth information for each recognition task using RDKit.

2. **Image-to-SMILES Conversion**  
   Scripts to call the Mathpix API for converting molecule images to SMILES strings for automated evaluation.

3. **Per-Image OpenAI API Submission**  
   Scripts to submit image generation and editing requests (for the image modality) to the OpenAI API one-by-one, as batch jobs are not currently supported for image tasks.
