Sketch-Guided Scene-Level Image Editing with Diffusion Models

Published: 01 Jan 2025, Last Modified: 25 Jul 2025 · CVM (2) 2025 · CC BY-SA 4.0
Abstract: Sketch-based image editing allows intuitive and flexible modification of image details, effectively improving editing efficiency and diversity. In scene-level image editing, where sketches are employed to control multiple objects within the editing region, existing GAN- and diffusion-based approaches struggle with complex editing intentions, such as editing scene content involving diverse object attributes including spatial layout, semantics, structure, and the number of objects. The challenge lies in effectively exploiting the attributes of the multiple objects in the sketch and mapping these attributes onto the image editing region. In this work, we propose a Sketch-guided Diffusion Model (SDM) that integrates a global-to-local conditioning strategy to make full use of each object instance's attributes in the sketch. Specifically, this strategy incorporates a multi-instance guided cross-attention module and modifies the attention maps with sketch masks, helping the model capture object semantics, structure, and quantity jointly. Additionally, we optimize the generation of the shared boundary region of overlapping objects to resolve ambiguous contours and semantics around the boundary. We further introduce a multi-instance semantic loss to compensate for the diffusion model's limited comprehension of the semantics conveyed by sketches. Extensive experiments with high-quality editing results show that the proposed method outperforms state-of-the-art methods on the sketch-guided scene-level image editing task.
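The abstract does not give implementation details, so the following is only a rough PyTorch sketch of what "modifying attention maps with sketch masks" in a multi-instance guided cross-attention module could look like: each object instance's condition tokens contribute only within that instance's sketch-mask region. The function name, tensor shapes, and masking scheme are assumptions for illustration, not the paper's actual module.

```python
# Hypothetical illustration (not the authors' code): per-instance masked
# cross-attention, where each instance's sketch mask restricts where its
# condition tokens can influence the image features.
import torch


def instance_masked_cross_attention(q, k, v, instance_masks):
    """
    q:              (B, N_pixels, d)            image-feature queries
    k, v:           (B, M, N_tokens, d)          per-instance condition keys/values (M instances)
    instance_masks: (B, M, N_pixels)             binary sketch masks, 1 where instance m lies
    returns:        (B, N_pixels, d)             aggregated conditioned features
    """
    B, M, N_tokens, d = k.shape
    out = torch.zeros_like(q)
    for m in range(M):
        # Standard scaled dot-product attention against instance m's tokens.
        attn = torch.softmax(q @ k[:, m].transpose(-1, -2) / d ** 0.5, dim=-1)  # (B, N_pixels, N_tokens)
        # Restrict this instance's contribution to its own sketch-mask region.
        out = out + instance_masks[:, m].unsqueeze(-1) * (attn @ v[:, m])
    return out
```

In this toy form, pixels outside every instance mask receive no conditioning signal; the paper's global-to-local strategy presumably combines such local terms with a global condition, but that combination is not specified in the abstract.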