# ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling

## About the Project

This repository provides **Bench4Opt**, an optimization modeling dataset, together with **ORGEval**, a graph-theoretic evaluation tool for verifying whether optimization models generated by LLMs are correct.

---

## Benchmark Dataset

The benchmark dataset is stored in `/data/bench4opt_mix`, which includes:

- **`test.jsonl`** – Contains all Bench4Opt samples. Each entry includes:
  - **`id`**: Formatted as `BENCH4OPT_{problem_id}`  
  - **`data_path`**: Path to a randomly sampled data file for the problem  
  - **`problem`**: Word problem description  
  - **`reference_code`**: Reference Gurobi code (ground truth) for solving the problem  
  - **`reference_lp_path`**: Path to the reference ground-truth model in `.lp` format (obtained by inserting data from `data_path` into `reference_code`)  
  - **`wp_type`**: Problem type indicator  
    - `""` → structured problem  
    - `"_concise"` → concise problem  

- **`lp_code/`** – Reference models in `.lp` format, named by problem ID  
- **`lp_data/`** – Sample test problem data, named by problem ID. A test instance can be generated by inserting this data into `reference_code`.  
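
The fields listed above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository: the helper names (`parse_samples`, `problem_id`, `is_concise`) are hypothetical, and the relative path `data/bench4opt_mix/test.jsonl` assumes the script is run from the repository root.

```python
import json
from pathlib import Path

# Assumption: the dataset directory is resolved relative to the repository root.
TEST_FILE = Path("data/bench4opt_mix/test.jsonl")
ID_PREFIX = "BENCH4OPT_"


def parse_samples(lines):
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    samples = []
    for line in lines:
        line = line.strip()
        if line:
            samples.append(json.loads(line))
    return samples


def problem_id(sample):
    """Return the part of `id` after the BENCH4OPT_ prefix."""
    sample_id = sample["id"]
    if sample_id.startswith(ID_PREFIX):
        return sample_id[len(ID_PREFIX):]
    return sample_id


def is_concise(sample):
    """True for concise problems (wp_type == "_concise"), False for structured ones."""
    return sample["wp_type"] == "_concise"


if __name__ == "__main__":
    if TEST_FILE.exists():
        with open(TEST_FILE, encoding="utf-8") as f:
            samples = parse_samples(f)
        concise = sum(is_concise(s) for s in samples)
        print(f"{len(samples)} samples, {concise} concise")
    else:
        print(f"{TEST_FILE} not found; adjust TEST_FILE to your checkout")
```

Each parsed sample is a plain dictionary, so the remaining fields (`problem`, `reference_code`, `data_path`, `reference_lp_path`) are available by key.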

---

## Evaluation Tool

We provide an evaluation algorithm for testing LLMs on Bench4Opt.  
To run a full evaluation with GPT-4o, execute:

```bash
scripts/test/test_pipeline_gpt_4o_resume.sh
```

Outputs will be stored in `/outputs`.  
- Generated code → `/temp_code`  
- Generated `.lp` models → `/temp_lp`  
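
After a run, a quick way to check progress is to count the files in the output folders. The snippet below is a hedged sketch rather than part of the evaluation tool: `list_generated` is a hypothetical helper, and the folder paths passed to it are assumptions that should be adjusted to wherever `temp_code` and `temp_lp` appear in your checkout.

```python
from pathlib import Path


def list_generated(directory, suffix=""):
    """Return sorted names of files in `directory` whose names end with `suffix`.

    A missing directory yields an empty list instead of raising.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.name.endswith(suffix)
    )


if __name__ == "__main__":
    # Assumption: folder locations are relative to the current working directory.
    for folder, suffix in (("temp_code", ""), ("temp_lp", ".lp")):
        names = list_generated(folder, suffix)
        print(f"{folder}: {len(names)} file(s)")
```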
