Keywords: Vector-Quantization, Image Restoration
Abstract: Vector-Quantization (VQ) generative models are widely used to learn a high-quality (HQ) codebook and a decoder as powerful generative priors for blind image restoration (BIR). In this paper, we revisit the key VQ process in VQ-based BIR methods and make three close observations on its side effects: 1) it confines the representational capability of the HQ codebook, 2) it is error-prone in code index prediction, and 3) it under-values the low-quality (LQ) features for BIR. These observations motivate us to replace the discrete VQ selection with a continuous feature transformation from the input LQ image to the output HQ image, guided by the HQ codebook. To this end, we propose a new Self-in-Cross-Attention (SinCA) module that augments the HQ codebook with the LQ features of the input image and performs cross-attention between the LQ features and the input-augmented codebook. In this way, SinCA extends the representational capability of the HQ codebook and effectively exploits the self-expressiveness property of the input LQ image. Experiments on four representative VQ-based BIR methods demonstrate that, by replacing the VQ process with transformers equipped with our SinCA, these methods achieve better quantitative and qualitative performance on blind image super-resolution and blind face restoration. The code will be publicly released.
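Below is a minimal PyTorch sketch of the SinCA idea as described in the abstract: the HQ codebook is augmented with the LQ tokens of the input image, and the LQ tokens then cross-attend over this augmented codebook, yielding a continuous feature transformation instead of a hard VQ lookup. All tensor shapes, layer choices (e.g., nn.MultiheadAttention, LayerNorm placement), and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SinCA(nn.Module):
    """Illustrative Self-in-Cross-Attention sketch (assumed design, not the paper's code)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, lq_feat: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
        # lq_feat:  (B, N, C) tokens from the LQ encoder
        # codebook: (K, C) learned HQ codebook entries, shared across the batch
        B = lq_feat.size(0)
        code = codebook.unsqueeze(0).expand(B, -1, -1)  # (B, K, C)
        # Augment the codebook with the LQ tokens themselves, so attention can
        # also exploit the self-expressiveness of the input ("self in cross").
        kv = torch.cat([code, lq_feat], dim=1)          # (B, K+N, C)
        q = self.norm_q(lq_feat)
        kv = self.norm_kv(kv)
        # Continuous transformation: a soft attention over codebook + LQ tokens
        # replaces the discrete nearest-code (index-prediction) step of VQ.
        out, _ = self.attn(q, kv, kv)
        return lq_feat + out                            # residual connection


# Usage example with arbitrary shapes:
if __name__ == "__main__":
    module = SinCA(dim=256)
    lq = torch.randn(2, 1024, 256)   # 2 images, 32x32 token grid
    book = torch.randn(512, 256)     # 512 HQ codebook entries
    hq = module(lq, book)
    print(hq.shape)                  # torch.Size([2, 1024, 256])
```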
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4290