ARCHITECTURE MATTERS: METAFORMER AND GLOBAL-AWARE CONVOLUTION STREAMING FOR IMAGE RESTORATION

21 Sept 2023 (modified: 25 Mar 2024) — ICLR 2024 Conference Withdrawn Submission
Keywords: Image Restoration, Deblurring, Denoising, Transformers
TL;DR: We argue that the general architecture significantly impacts the performance of existing Transformer-based image restoration methods, and we therefore propose a MetaFormer architecture with global-aware convolution streaming.
Abstract: Transformer-based methods have sparked significant interest in image restoration, primarily due to the capacity of their self-attention mechanism to capture long-range dependencies. However, existing Transformer-based image restoration methods restrict self-attention to windows or across channels to avoid an explosion in computational complexity, limiting their ability to capture long-range dependencies. This leads us to explore the following question: Does the general architecture abstracted from Transformers significantly impact the performance of existing Transformer-based image restoration methods? To this end, we first analyze the existing attention modules and replace them with purely convolutional modules, which we call convolution streaming. We demonstrate that these convolution modules deliver performance comparable to existing attention modules at a similar computational cost. Our findings underscore the importance of the overall Transformer architecture in image restoration, motivating the principle of MetaFormer, a general architecture abstracted from Transformer-based methods without specifying the feature-mixing manner. To further enhance the capture of long-range dependencies within the powerful MetaFormer architecture, we construct an efficient global-aware convolution streaming module based on the Fourier transform. Integrating the MetaFormer architecture and the global-aware convolution streaming module, we achieve consistent performance gains on multiple image restoration tasks, including image deblurring, image denoising, and image deraining, with even less computational burden.
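The submission provides no code, but the core idea behind a Fourier-based global mixing operation can be illustrated in a few lines: pointwise multiplication in the frequency domain equals circular convolution in the spatial domain, so a single learned filter gives every output pixel a global receptive field at O(HW log HW) cost. The sketch below is our own minimal NumPy illustration of this general principle (the function name `global_conv_fft` is hypothetical, not from the paper):

```python
import numpy as np

def global_conv_fft(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Global (circular) convolution of a 2-D feature map via the FFT.

    x      : (H, W) feature map
    kernel : (H, W) filter, same spatial size as x (learnable in practice)

    Multiplying the spectra elementwise and inverting is equivalent to
    circular convolution, so each output pixel depends on *all* input
    pixels -- a global receptive field without quadratic attention cost.
    """
    x_f = np.fft.rfft2(x)
    k_f = np.fft.rfft2(kernel)
    return np.fft.irfft2(x_f * k_f, s=x.shape)
```

As a sanity check, convolving with a delta kernel at the origin returns the input unchanged, and shifting the delta circularly shifts the feature map, which is the expected behavior of circular convolution.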
Supplementary Material: pdf
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3532