Revisiting the Receptive Field of Swin Transformer with Fourier Transformer
Keywords: Swin Transformer
Abstract: Transformer-based methods have shown exceptional performance in image recognition and restoration tasks, primarily due to their superior capability to model long-range dependencies compared to Convolutional Neural Network (CNN)-based approaches. Recent developments, exemplified by the Swin Transformer, have introduced a window-based, local attention strategy to balance performance and computational efficiency. However, these approaches restrict the model's ability to efficiently capture global information and establish long-range dependencies in its early stages. In this study, we revisit the limited receptive field of window-based Transformers for both high-level and low-level tasks. We further introduce the Spatial Frequency Block (SFB), based on the Fast Fourier Transform (FFT), to capture global information efficiently with minimal additional parameters and computation. Applying our method to existing state-of-the-art models raises their performance ceilings on multiple popular large-scale benchmarks. For example, in high-level tasks, our method achieves 82.1% top-1 accuracy on ImageNet-1K (image classification), 51.1 box AP and 44.8 mask AP on COCO (object detection), and 45.68 mIoU on ADE20K (semantic segmentation), surpassing Swin-T by 0.8% top-1 accuracy, 0.6 box AP, 0.7 mask AP, and 1.17 mIoU, respectively. In a low-level task, our method achieves a PSNR of 32.24 dB on the Manga109 dataset (×2), outperforming SwinIR by 0.21 dB on image super-resolution, a substantial margin for this task. Furthermore, we demonstrate the effectiveness and scalability of our method across varied low-level tasks such as image denoising, JPEG artifact reduction, stereo image super-resolution, and nighttime flare removal.
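To make the idea behind the Spatial Frequency Block concrete, below is a minimal, hypothetical PyTorch sketch of an FFT-based block that pairs a local convolution branch with a frequency-domain branch whose per-frequency mixing gives every output position a global receptive field. The class name `SpatialFrequencyBlock`, the two-branch layout, the 1x1 convolutions on the real/imaginary parts, and the channel counts are all assumptions for illustration; the abstract does not specify the paper's actual design, so this should not be read as the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatialFrequencyBlock(nn.Module):
    """Hypothetical sketch of an SFB-style block: a spatial conv branch plus a
    frequency branch that mixes features in the Fourier domain, giving a global
    receptive field at the cost of one 2D FFT/inverse-FFT pair per call."""

    def __init__(self, channels: int):
        super().__init__()
        # Local branch: ordinary 3x3 convolution (limited receptive field).
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Frequency branch: 1x1 convolutions applied to the concatenated
        # real/imaginary parts of the spectrum (assumed design choice).
        self.freq = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # Real 2D FFT over spatial dims; last dim becomes w // 2 + 1.
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = torch.cat([spec.real, spec.imag], dim=1)
        spec = self.freq(spec)
        real, imag = spec.chunk(2, dim=1)
        # Back to the spatial domain at the original resolution.
        global_feat = torch.fft.irfft2(
            torch.complex(real, imag), s=(h, w), norm="ortho"
        )
        # Residual fusion of local and global features.
        return x + self.fuse(self.spatial(x) + global_feat)


if __name__ == "__main__":
    block = SpatialFrequencyBlock(channels=64)
    out = block(torch.randn(1, 64, 56, 56))
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```

Because a pointwise operation in the frequency domain corresponds to a global operation in the spatial domain, such a branch complements window-based attention, which only attends within local windows in early stages.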
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: true
Submission Guidelines: true
Anonymous Url: true
No Acknowledgement Section: true
Submission Number: 5604