# GIE‑Bench 

*GIE‑Bench* ( **G**rounded Evaluation for Text-Guided **I**mage **E**diting) is a curated dataset for assessing text‑guided image‑editing models along two complementary axes:

| Axis                          | Metric(s)                                | What it measures                                                |
| ----------------------------- | ---------------------------------------- | -------------------------------------------------------------- |
| **Functional Correctness**    | Multiple‑choice QA via GPT‑4o            | Did the edit satisfy the instruction?                          |
| **Content Preservation**      | CLIP‑Sim, SSIM, MSE, PSNR (masked)       | How well are unedited regions preserved?                       |


---

## 📂 Repository Layout
```
GIE‑Bench/
├── images2000urls/             # URL list + download helper
│   └── download_images_from_urls.py
├── evaluation_script/         # Automated evaluation
│   ├── GPT‑4o_VQA_evaluation.py
│   ├── masked_clip_ssim_evaluation.py
│   ├── masked_mse_evaluation.py
│   └── masked_psnr_evaluation.py
├── evaluation_script/         # Automated score report
│   ├── MSE_stats.py
│   ├── PSNR_stats.py
│   ├── SSIM_stats.py
│   └── VQA_stats.py
├── gie_bench.json.zip          # Zipped benchmark file (≈2 MB JSON)
└── README.md                   # You’re here
```

---


## 📥 Downloading the Benchmark

1. **Raw images**

   ```bash
   python images2000urls/download_images_from_urls.py 
   ```

2. **Benchmark JSON**

   ```bash
   unzip gie_bench.json.zip      # produces gie_bench.json
   ```

---

## 🚀 Running Your Model on GIE‑Bench

1. **Inference**

   - Load `gie_bench.json`.
   - For each entry, generate an edited image for input image `image`, following edit instruction `edit_instruction`.
   - Save the edited image **locally** and write the file path back to the same entry under the key `edited_image_path`.

   ```python
   entry["edited_image_path"] = f"outputs/{entry_id}.png"
   ```

2. **Save the modified benchmark**

   ```python
   with open("results/my_model_output.json", "w") as f:
       json.dump(data, f, indent=2)
   ```

---

## 🧪 Evaluation

### 1. Functional Correctness (GPT‑4o)

```bash
python evaluation_script/GPT-4o_VQA_evaluation.py #for all evaluation code, you will need to modify path path to yours
```

### 2. Content Preservation

```bash
# CLIP + SSIM (masked)
python evaluation_script/masked_clip_ssim_evaluation.py

# MSE (masked)
python evaluation_script/masked_mse_evaluation.py

#PSNR (masked)
python evaluation_script/masked_psnr_evaluation.py
```

Each script appends score fields to a new JSON, preserving your original file.

---

## 📊 Reporting Results

Scripts to summarize evaluation scores are in evaluation_script/report_result

---


## 📄 License

This project is distributed under the **MIT License**. 

---

*Happy editing & benchmarking!* 🎨
