Abstract: Inpainting, the synthesis of missing regions in an image, can restore occluded or degraded areas and can also serve as a pretext task for self-supervised training of computer-vision networks. Current state-of-the-art inpainting models are computationally heavy, as they are based on transformer or CNN backbones trained in adversarial or diffusion settings. This paper diverges from vision transformers by using a computationally efficient, WaveMix-based, fully convolutional architecture---WavePaint. It uses a 2D discrete wavelet transform (DWT) for spatial and multi-resolution token mixing along with convolutional layers. The proposed model outperforms current state-of-the-art models for image inpainting in reconstruction quality while using far fewer parameters, less GPU RAM, and considerably shorter training and evaluation times. Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trained discriminator. This work suggests that neural architectures modeled after natural image priors require fewer parameters and computations to achieve better generalization.
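The DWT-based token mixing mentioned above can be illustrated with a minimal sketch. This is an assumption-laden toy version using NumPy and PyWavelets, not the authors' implementation: it applies one level of a Haar 2D DWT per channel and stacks the four subbands channel-wise, so that a subsequent pointwise convolution can mix information across spatially distant pixels at reduced resolution.

```python
# Hypothetical sketch of WaveMix-style DWT token mixing (single Haar level).
# The real WavePaint block also includes learned convolutions, normalization,
# and upsampling back to the input resolution.
import numpy as np
import pywt

def dwt_token_mix(x: np.ndarray) -> np.ndarray:
    """x: (H, W, C) feature map -> (H/2, W/2, 4C) multi-resolution tokens."""
    bands = []
    for c in range(x.shape[-1]):
        # pywt.dwt2 returns the approximation and (horizontal, vertical,
        # diagonal) detail subbands, each of size (H/2, W/2).
        cA, (cH, cV, cD) = pywt.dwt2(x[..., c], "haar")
        bands.extend([cA, cH, cV, cD])
    # Channels quadruple while spatial resolution halves; a 1x1 convolution
    # over this stack mixes tokens across scales and locations cheaply.
    return np.stack(bands, axis=-1)

x = np.random.rand(64, 64, 3).astype(np.float32)
y = dwt_token_mix(x)
print(y.shape)  # (32, 32, 12)
```

Because the transform halves the spatial resolution before any learned mixing, the downstream convolutions operate on a quarter of the pixels, which is consistent with the efficiency claims in the abstract.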
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ran_He1
Submission Number: 3591