WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting

TMLR Paper3591 Authors

29 Oct 2024 (modified: 18 Feb 2025) · Rejected by TMLR · CC BY 4.0

Abstract: Inpainting, the synthesis of missing regions, can help restore occluded or degraded areas of an image and can also serve as a precursor task for self-supervision of neural networks for computer vision. The current state-of-the-art models for inpainting are computationally heavy because they are based on transformer or CNN backbones trained in adversarial or diffusion settings. This paper diverges from vision transformers by using a computationally efficient, WaveMix-based, fully convolutional architecture, WavePaint. It uses a 2D discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers. The proposed model outperforms the current state-of-the-art models for image inpainting on reconstruction quality while using far fewer parameters, less GPU RAM, and considerably lower training and evaluation times. Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trainable discriminator. This work suggests that neural architectures modeled after natural image priors require fewer parameters and computations to achieve better generalization.
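To make the DWT-based token-mixing mentioned in the abstract more concrete, here is a minimal sketch of a single-level 2D DWT in pure Python. The choice of the Haar wavelet, the function name `haar_dwt2`, and the sub-band names are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation. The sketch shows that each output coefficient combines a 2x2 neighborhood of the input using fixed filters, so spatial information is mixed and resolution is halved without any learnable parameters.

```python
# Illustrative sketch only: single-level 2D Haar DWT in pure Python.
# The Haar wavelet and all names here are assumptions, not taken from the paper.

def haar_dwt2(image):
    """Split an even-sized 2D grid into four half-resolution sub-bands."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")

    approx, detail_w, detail_h, detail_d = [], [], [], []
    for i in range(0, rows, 2):
        row_a, row_w, row_h, row_d = [], [], [], []
        for j in range(0, cols, 2):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            row_a.append((a + b + c + d) / 2)  # low-pass along both axes
            row_w.append((a - b + c - d) / 2)  # differences along the width
            row_h.append((a + b - c - d) / 2)  # differences along the height
            row_d.append((a - b - c + d) / 2)  # diagonal differences
        approx.append(row_a)
        detail_w.append(row_w)
        detail_h.append(row_h)
        detail_d.append(row_d)
    return approx, detail_w, detail_h, detail_d


# A 4x4 input yields four 2x2 sub-bands.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
approx, detail_w, detail_h, detail_d = haar_dwt2(image)
print(approx)  # [[7.0, 11.0], [23.0, 27.0]]
```

In a token-mixer of this general kind, the four sub-bands would typically be stacked along the channel axis and passed to convolutional layers; that wiring is likewise an assumption here rather than a description of WavePaint's exact layer design.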

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Ran_He1

Submission Number: 3591
