Region-Aware Instruction-Guided Image Editing with Attention-Weighted Feature Fusion

Published: 02 Dec 2024 · Last Modified: 29 Sept 2024 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: Recently, diffusion model-based techniques have been leveraged to enable high-quality, stable, and controllable image editing. Existing methods, such as P2P, often require closely matched prompts for the original and target images, which limits their flexibility and applicability. The task of instruction-based image editing has emerged to address these limitations, yet existing instruction-based editing models often suffer from over-editing. To address this issue, we propose a novel region-aware instruction-based image editing approach that leverages the inversion properties of diffusion models formulated as Ordinary Differential Equations (ODEs) together with attention-weighted feature fusion. Our approach uses attention maps obtained from the UNet of Stable Diffusion as a soft mask to weight the features of the original image against the denoised results during the reverse diffusion process, ensuring precise and relevant edits while keeping unrelated content unchanged. Compared with other diffusion model-based methods, our approach significantly reduces over-editing and preserves the integrity of image content and structure. This plug-and-play, training-free method effectively performs a variety of editing tasks, providing a flexible and practical solution for instruction-based image editing.