Keywords: N-Gram, Single Image Super-Resolution, Swin Transformer, Efficiency, Deep Learning
TL;DR: Our efficient NGswin introduces N-Gram context to deep learning for single image super-resolution for the first time.
Abstract: In single image super-resolution (SISR), many deep learning-based methods suffer from intensive computational operations. In addition, while Swin Transformer-based methods such as SwinIR have established state-of-the-art results, they still ignore broad regions when computing window self-attention (WSA) to reconstruct high-frequency information. In this paper, we propose the efficient NGswin network, which is the first attempt to introduce N-Gram to deep learning on images. In text analysis, an N-Gram is a sequence of consecutive characters or words; in an image, we define an N-Gram as neighboring local windows (in the WSA of Swin Transformer) that interact with each other through sliding-WSA. We propose N-Gram interaction, an SCDP bottleneck, and a pooling-cascading mechanism, which enable the network to consider broad regions that help recover degraded neighboring pixels. Moreover, we equip NGswin with a hierarchical encoder with patch-merging, uni-Gram embedding, and a compact decoder to enhance network efficiency. Experimental results show that the proposed model achieves competitive PSNR and SSIM scores with fewer operations (Mult-Adds) than other methods.
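To illustrate the N-Gram idea described above, the following is a minimal NumPy sketch, not the paper's actual implementation: it partitions a feature map into non-overlapping local windows (as in Swin's WSA), summarizes each window, and then slides an n×n "N-Gram" of neighboring windows over the window grid to build a broader context descriptor per window. All function names (`window_partition`, `ngram_context`) are hypothetical.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.
    Returns an array of shape (H//ws, W//ws, ws, ws, C)."""
    H, W, C = x.shape
    return x.reshape(H // ws, ws, W // ws, ws, C).transpose(0, 2, 1, 3, 4)

def ngram_context(x, ws, n=2):
    """For each local window, average-pool an n x n neighborhood of windows
    (a sliding 'N-Gram' of neighboring windows) into one context vector,
    so every window sees regions beyond its own boundary."""
    win = window_partition(x, ws)        # (nH, nW, ws, ws, C)
    pooled = win.mean(axis=(2, 3))       # one descriptor per window: (nH, nW, C)
    nH, nW, _ = pooled.shape
    # Edge-pad the window grid so border windows also have n x n neighbors.
    pad = np.pad(pooled, ((0, n - 1), (0, n - 1), (0, 0)), mode="edge")
    ctx = np.zeros_like(pooled)
    for i in range(nH):
        for j in range(nW):
            # Slide over the window grid: aggregate the n x n neighbor windows.
            ctx[i, j] = pad[i:i + n, j:j + n].mean(axis=(0, 1))
    return ctx

# Example: an 8x8 map with 4x4 windows gives a 2x2 window grid.
x = np.random.rand(8, 8, 16).astype(np.float32)
ctx = ngram_context(x, ws=4, n=2)       # shape (2, 2, 16)
```

In NGswin itself, this interaction feeds into sliding-WSA and the pooling-cascading mechanism rather than a plain average, but the sketch shows how neighboring windows exchange information before window self-attention.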
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)