UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Haozhe Zhao; Xiaojian Ma; Liang Chen; Shuzheng Si; Rujie Wu; Kaikai An; Peiyu Yu; Minjia Zhang; Qing Li; Baobao Chang

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

Published: 26 Sept 2024, Last Modified: 13 Nov 2024NeurIPS 2024 Track Datasets and Benchmarks PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: image editing, text-to-image, instruction-based image editing, region-based image editing

TL;DR: We propose a dataset and a corresponding model that scales up fine-grained instruction-based image editing.

Abstract: This paper presents UltraEdit, a large-scale (~ 4M editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a *systematic* approach to producing massive and high-quality image editing samples: 1) UltraEdit includes more diverse editing instructions by combining LLM creativity and in-context editing examples by human raters; 2) UltraEdit is anchored on real images (photographs or artworks), which offers more diversity and less biases than those purely synthesized by text-to-image models; 3) UltraEdit supports region-based editing with high-quality, automatically produced region annotations. Our experiments show that canonical diffusion-based editing baselines trained on UltraEdit set new records on challenging MagicBrush and Emu-Edit benchmarks, respectively. Our analysis further confirms the crucial role of real image anchors and region-based editing data. The dataset, code, and models will be made public.

Supplementary Material: pdf

Submission Number: 991

Loading