Grounded-Instruct-Pix2Pix: Improving Instruction Based Image Editing with Automatic Target Grounding

Artur Shagidanov, Hayk Poghosyan, Xinyu Gong, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

Published: 2024, Last Modified: 02 Mar 2026. Venue: ICASSP 2024. License: CC BY-SA 4.0

Abstract: Text-guided image editing has recently attracted significant attention owing to advances in denoising diffusion models. Current methods can execute complex image editing operations from simple text prompts. Despite impressive results, however, they often fail to restrict the edit to the object of interest specified in the text prompt. To this end, we propose a novel framework, Grounded-Instruct-Pix2Pix, which is capable of localized instruction-guided image editing in various scenarios, including multi-object cases and complex backgrounds. Our experiments on a diverse set of images showcase its advantage over recent state-of-the-art approaches, especially at restricting the editing effect to the area of interest. The Grounded-Instruct-Pix2Pix implementation will be available at https://github.com/arthur-71/Grounded-Instruct-Pix2Pix.

External IDs: dblp:conf/icassp/ShagidanovPGWNS24