Fill with Anything: High-Resolution and Prompt-Faithful Image Completion

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: text-guided inpainting, diffusion inpainting, reweighting attention score guidance, prompt-aware introverted attention, RASG, PaIntA, conditional super-resolution, classifier guidance, classifier-free guidance, introvert attention, diffusion models
Abstract: Building on the achievements of text-to-image diffusion models, recent advancements in text-guided image inpainting have yielded remarkably realistic and visually compelling outcomes. Nevertheless, current text-to-image inpainting models leave substantial room for improvement, particularly in addressing the often inadequate alignment of user prompts with the inpainted region, and in extending applicability to high-resolution images. To this end, this paper introduces an entirely $\textbf{training-free}$ approach that $\textbf{faithfully adheres to prompts}$ and seamlessly $\textbf{scales to high-resolution}$ image inpainting. To achieve this, we first present the Prompt-Aware Introverted Attention (PAIntA) layer, which enriches self-attention modules by incorporating prompt information derived from cross-attention scores, alleviating the visual context dominance in inpainting caused by all-to-all attention. Furthermore, we introduce the Reweighting Attention Score Guidance (RASG) mechanism, which directs cross-attention scores towards improved textual alignment while preserving the generation domain. In addition, to address inpainting at larger scales, we introduce a specialized super-resolution technique tailored for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Experimental results demonstrate that our proposed method surpasses existing state-of-the-art approaches in both qualitative and quantitative measures, achieving a generation accuracy of $\textbf{61.4\%}$, compared to $\textbf{51.9\%}$ for the best existing method. Our code will be open-sourced.
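The core idea behind PAIntA, as the abstract describes it, is to re-weight self-attention toward known image regions according to how relevant those regions are to the user prompt, where relevance is read off the cross-attention scores. The following NumPy sketch illustrates one plausible form of this mechanism; the clipping constants `c0`/`c1`, the per-key relevance aggregation, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def painta_self_attention(q, k, v, cross_scores, known_mask, c0=0.2, c1=0.8):
    """Hypothetical sketch of Prompt-Aware Introverted Attention (PAIntA).

    q, k, v:       (n, d) query/key/value matrices for n spatial tokens.
    cross_scores:  (n, t) cross-attention scores of each spatial token
                   against t prompt tokens.
    known_mask:    (n,) with 1 for tokens inside the inpainting region
                   (unknown) and 0 for known image tokens.

    Self-attention toward known tokens is scaled down when those tokens
    are irrelevant to the prompt, countering the visual-context dominance
    of plain all-to-all attention.
    """
    # Per-key prompt relevance: aggregate cross-attention over prompt
    # tokens, then clip to [c0, c1] and rescale to [0, 1] (the clipping
    # range is an assumption for this sketch).
    relevance = cross_scores.sum(axis=-1)
    relevance = (np.clip(relevance, c0, c1) - c0) / (c1 - c0)

    # Known tokens (mask == 0) are scaled by their prompt relevance;
    # tokens inside the hole (mask == 1) keep full attention weight.
    scale = known_mask + (1.0 - known_mask) * relevance

    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    attn = attn * scale[None, :]
    attn = attn / attn.sum(axis=-1, keepdims=True)  # renormalize rows
    return attn @ v
```

Because the reweighting happens at inference time inside existing attention layers, a scheme like this needs no retraining, which is consistent with the training-free claim above.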
Supplementary Material: pdf
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7482