Keywords: editing dataset, editing benchmark, editing model
TL;DR: This paper proposes a large-scale, high-quality image-editing dataset, accompanied by a comprehensive benchmark, an advanced editing model, and an effective edit evaluator.
Abstract: Recent advancements in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily due to limited high-quality data and insufficient benchmarks.
To overcome these limitations, we introduce **ImgEdit**, a large-scale, high-quality image-editing dataset comprising one million carefully curated edit pairs, which span both novel and complex single-turn edits and challenging multi-turn tasks.
To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model, alongside task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality.
Using ImgEdit, we train **ImgEdit-E1**, an editing model that uses a vision-language model to process the reference image and the editing prompt; it outperforms existing open-source models on multiple tasks, highlighting the value of ImgEdit and of our model design.
For comprehensive evaluation, we introduce **ImgEdit-Bench**, a benchmark designed to evaluate image editing performance in terms of instruction adherence, editing quality, and detail preservation.
It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite.
We evaluate both open-source and proprietary models, as well as ImgEdit-E1, providing deep analysis and actionable insights into the current behavior of image-editing models.
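As a rough illustration of the multi-stage curation pipeline sketched in the abstract (VLM scoring plus detection/segmentation checks plus strict post-filtering), here is a minimal, hypothetical Python sketch; `EditPair`, `vlm_quality_score`, `region_checks_pass`, and `curate` are placeholder names for illustration only, not the authors' released code:

```python
# Hypothetical sketch of a multi-stage edit-pair curation pipeline.
# All model wrappers below are stubs; real implementations would call
# an actual VLM judge and detection/segmentation verifiers.
from dataclasses import dataclass

@dataclass
class EditPair:
    source_image: str   # path to the original image
    edited_image: str   # path to the edited image
    instruction: str    # natural-language editing prompt

def vlm_quality_score(pair: EditPair) -> float:
    # Placeholder: a vision-language model would rate instruction
    # adherence and edit quality here; we return a dummy score.
    return 1.0

def region_checks_pass(pair: EditPair) -> bool:
    # Placeholder: detection/segmentation models would verify that the
    # edited region actually matches the instruction; always passes here.
    return True

def curate(candidates: list[EditPair], threshold: float = 0.8) -> list[EditPair]:
    """Keep only edit pairs that clear every filtering stage."""
    kept = []
    for pair in candidates:
        if not region_checks_pass(pair):
            continue  # drop pairs whose edited region cannot be verified
        if vlm_quality_score(pair) < threshold:
            continue  # drop pairs the VLM judges to be low quality
        kept.append(pair)
    return kept

if __name__ == "__main__":
    demo = [EditPair("cat.jpg", "cat_hat.jpg", "add a red hat to the cat")]
    print(len(curate(demo)), "pair(s) kept")
```

The staged early-exit structure shown here is one plausible design: cheaper region-level checks run first so that the more expensive VLM scoring is only spent on pairs that have not already been rejected.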
Croissant File: json
Dataset URL: https://huggingface.co/collections/sysuyy
Code URL: https://github.com/PKU-YuanGroup/
Primary Area: Datasets & Benchmarks for applications in computer vision
Flagged For Ethics Review: true
Submission Number: 534