ALIGNEDEDIT: PROMPT-ALIGNED WEAK GUIDANCE FOR TEXT-GUIDED IMAGE EDITING

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: diffusion models, image editing, semantic editing, CFG guidance, sampling
TL;DR: A substitute for CFG guidance in text-based image editing that uses a semantically weak model.
Abstract: Text-guided image editing has advanced rapidly, yet most approaches still rely on classifier-free guidance (CFG). We introduce ALIGNEDEDIT for semantic image editing, which employs semantically weak guidance to produce natural edits that align with the instruction prompt. CFG is the de facto standard guidance technique: it uses an unconditional model to steer sampling toward the positive condition and amplify its signal. However, because the conditional and unconditional models are misaligned in semantic space, this mechanism induces over-editing, artifacts, and unintended changes. ALIGNEDEDIT instead employs aligned yet semantically weak guidance, preventing error accumulation and producing faithful edits without unintended modifications, resulting in a more natural appearance. To obtain an aligned yet semantically weak model, ALIGNEDEDIT identifies semantically strong tokens in each attention block and attenuates their embeddings to reduce their semantic strength. Because the semantically weak model is derived directly from the model itself, no explicit negative prompt is required, making the method substantially less sensitive to prompt choice. We apply our guidance to two diffusion-based editing models, CosXL and Kontext. Across diverse benchmarks, Emu-Edit for real-image editing, HQ-Edit for synthetic editing, and ImgEdit-Bench for multi-turn editing, our method yields edits that are more natural and more faithfully aligned with the prompt.
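The abstract contrasts standard CFG with guidance against an aligned, semantically weakened branch. The following is a minimal illustrative sketch of that idea, not the paper's implementation: the function names, the scoring of "semantically strong" tokens, and the attenuation factor are all hypothetical placeholders.

```python
import numpy as np

def cfg_guidance(eps_cond, eps_uncond, w):
    """Standard classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one by scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def weak_guidance(eps_cond, eps_weak, w):
    """Hypothetical aligned-weak guidance: the weak branch comes from
    the same conditional model (not an unconditional one), so the
    difference term isolates the semantic edit direction."""
    return eps_weak + w * (eps_cond - eps_weak)

def attenuate_strong_tokens(token_emb, scores, top_k=2, alpha=0.3):
    """Hypothetical token attenuation: scale down the embeddings of
    the top_k tokens with the highest semantic-strength scores to
    build the weak branch's text conditioning."""
    weak = token_emb.copy()
    idx = np.argsort(scores)[-top_k:]  # indices of strongest tokens
    weak[idx] *= alpha                 # reduce their semantic strength
    return weak
```

In this sketch, `weak_guidance` has the same algebraic form as CFG; the difference lies only in replacing the unconditional branch with a prompt-aligned weak branch derived from the attenuated token embeddings.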
Supplementary Material: pdf
Primary Area: generative models
Submission Number: 9678