Keywords: text-guided image inpainting, diffusion models, high-resolution image inpainting
TL;DR: We present a text-guided image inpainting method that improves prompt alignment, prevents image quality loss, and enables high-resolution inpainting.
Abstract: Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results.
However, current text-to-image inpainting models still leave significant room for improvement, particularly in aligning the inpainted area with user prompts.
Therefore, we introduce $\textit{ProFI-Painter}$, a $\textbf{training-free}$ approach that $\textbf{accurately follows prompts}$.
To this end, we design the $\textit{Prompt-Aware Introverted Attention (PAIntA)}$ layer, which enhances self-attention scores with prompt information, resulting in better text-aligned generations.
To further improve prompt coherence, we introduce the $\textit{Reweighting Attention Score Guidance (RASG)}$ mechanism, which seamlessly integrates a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts.
Our experiments demonstrate that ProFI-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively across multiple metrics and a user study.
Code will be made public.
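For readers unfamiliar with the "general form of DDIM" mentioned in the abstract, the following is a minimal sketch of one step of the standard DDIM update, written in plain Python over lists of floats. It is not the authors' implementation. The function name `ddim_step` and its `noise` argument are hypothetical and chosen for illustration; the abstract does not say how RASG fills the stochastic term, so the sketch only exposes that term as a hook where a post-hoc guidance signal could in principle be supplied.

```python
import math
import random


def ddim_step(x_t, eps_pred, alpha_t, alpha_prev, sigma_t, noise=None):
    """One step of the general DDIM update, applied elementwise.

    x_t        : current latent, as a list of floats
    eps_pred   : the model's noise prediction for x_t (same length)
    alpha_t    : cumulative alpha product at the current timestep
    alpha_prev : cumulative alpha product at the previous timestep
    sigma_t    : stochasticity level (0 gives deterministic DDIM)
    noise      : optional replacement for the Gaussian noise term;
                 a hypothetical hook, not something the abstract specifies
    """
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x_t]

    # Coefficient of the "direction pointing to x_t" term; clamped at zero
    # to guard against tiny negative values from floating-point rounding.
    dir_coeff = math.sqrt(max(1.0 - alpha_prev - sigma_t ** 2, 0.0))

    x_prev = []
    for x, e, n in zip(x_t, eps_pred, noise):
        # Predicted clean sample x_0 recovered from x_t and the noise estimate.
        x0_pred = (x - math.sqrt(1.0 - alpha_t) * e) / math.sqrt(alpha_t)
        x_prev.append(math.sqrt(alpha_prev) * x0_pred + dir_coeff * e + sigma_t * n)
    return x_prev
```

With `sigma_t = 0` the noise term vanishes and the step is deterministic; with `sigma_t > 0` the last term is the only place where something other than the model's prediction enters the update.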
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6902