Multi-Scale Window based Transformer Network for High Quality Image Inpainting

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Image inpainting, Image completion, Transformer, Multi-scale window, Polarized self-attention, Mask updating
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Effective image inpainting requires the model to understand contextual information. Previous CNN-based approaches are limited by the absence of long-range dependencies, which prevents them from capturing such context. In this paper, we propose a Multi-Scale Window-based Transformer model for high-quality image inpainting. We introduce a transformer network with multi-scale windows to capture the influence of different window sizes and gather significant contextual information. To effectively integrate the features processed through self-attention, we modify the polarized self-attention network to match the dimensions of the multi-scale windows. We also propose a Selective Mask Update method that retains vital information from the self-attention features, enabling higher-quality results. Experiments show that our model effectively fills in missing regions and outperforms other models on the benchmark dataset.
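To make the core idea of multi-scale window attention concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the feature map is partitioned into windows of several sizes, standard self-attention runs within each window, and the branch outputs are fused. The module name `MultiScaleWindowAttention`, the chosen window sizes, and the fusion-by-averaging step are illustrative assumptions (the paper instead fuses features with a modified polarized self-attention network and a Selective Mask Update).

```python
import torch
import torch.nn as nn


def window_partition(x, ws):
    """Split a (B, H, W, C) map into (B*num_windows, ws*ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


def window_reverse(windows, ws, H, W):
    """Inverse of window_partition, back to (B, H, W, C)."""
    B = windows.shape[0] // ((H // ws) * (W // ws))
    x = windows.view(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


class MultiScaleWindowAttention(nn.Module):
    """Illustrative multi-scale window self-attention (assumed structure)."""

    def __init__(self, dim, num_heads=4, window_sizes=(4, 8, 16)):
        super().__init__()
        self.window_sizes = window_sizes
        # One self-attention branch per window scale.
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in window_sizes
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, H, W, C); H and W are assumed divisible by every window size.
        B, H, W, C = x.shape
        outs = []
        for ws, attn in zip(self.window_sizes, self.branches):
            win = window_partition(x, ws)           # (B*nW, ws*ws, C)
            out, _ = attn(win, win, win)            # attention inside each window
            outs.append(window_reverse(out, ws, H, W))
        # Fuse the multi-scale branches by averaging (assumption; the paper
        # integrates them with a modified polarized self-attention module).
        return self.proj(torch.stack(outs, dim=0).mean(dim=0))


# Usage: a 64x64 feature map with 96 channels.
feats = torch.randn(2, 64, 64, 96)
print(MultiScaleWindowAttention(96)(feats).shape)  # torch.Size([2, 64, 64, 96])
```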
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7908