Transformer-Style Convolutional Network for Efficient Natural and Industrial Image Superresolution

Xiao Liu, Zhengyong Wang, Hong Yang, Xiaohai He, Haosong Gou, Chao Ren

Published: 01 Jan 2025, Last Modified: 06 Nov 2025, IEEE Trans. Ind. Informatics 2025, CC BY-SA 4.0

Abstract: Single image superresolution (SISR) is a critical task in computer vision with significant applications in both natural and industrial contexts. Although transformer-based approaches for SISR have achieved notable progress owing to their exceptional representational capabilities, their quadratic computational complexity poses challenges for deployment on devices with limited resources. Conversely, convolutional networks (ConvNets) are inherently efficient but have difficulty capturing long-range pixel relationships because of their focus on spatial locality. The representational power of transformers and the efficiency of ConvNets are therefore complementary, and both are essential for practical applications. Motivated by this, in this article we introduce TSCN, a novel transformer-style ConvNet. Our analysis highlights the strengths of transformers, including large-range dependency modeling, second-order feature interaction, input self-adaptation, and the incorporation of advanced components. Guided by these insights, we design ConvNets to fully exploit these characteristics. Specifically, we rethink spatial convolution to enhance the modeling of spatial features, and we modify the macrostructure of the transformer by replacing the self-attention and feed-forward network with the large-range multiorder convolution modulation (LMCM) layer and the spatial awareness dynamic feature flow (SADFF) layer. The LMCM integrates reweighting into the large-range convolutional modulation technique, which allows self-adaptive recalibration of input representations, with convolutional features serving as weight matrices, as well as multiorder feature interaction. In addition, the SADFF introduces spatial awareness, locality, and dynamic modulation of information flow between layers. Experimental results demonstrate that TSCN outperforms the state-of-the-art method SRFormer on multiple benchmarks by 0.03–0.17 dB while using fewer parameters and computations.
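
The idea of using convolutional features as weight matrices can be illustrated with a generic convolutional modulation step. The sketch below is not the authors' LMCM layer, whose details the abstract does not give. It assumes a single-channel feature map, a zero-padded square kernel of odd size, and scalar stand-ins `w_a` and `w_v` for the pointwise projections; the function names `depthwise_conv2d` and `conv_modulation` are hypothetical. It computes z = Conv(w_a · x) ⊙ (w_v · x), so the output is second order in the input.

```python
def depthwise_conv2d(x, kernel):
    """Zero-padded 'same' cross-correlation of one channel with a k x k kernel (k odd)."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - r, j + dj - r
                    # Positions outside the map contribute zero (zero padding).
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


def conv_modulation(x, kernel, w_a=1.0, w_v=1.0):
    """Generic convolutional modulation (an assumed form, not the paper's LMCM).

    The convolution output acts as a per-pixel weight map that rescales a
    projected copy of the same input, so each output depends on products of
    input values (second-order interaction).
    """
    h, w = len(x), len(x[0])
    weights = depthwise_conv2d([[w_a * v for v in row] for row in x], kernel)
    return [[weights[i][j] * (w_v * x[i][j]) for j in range(w)] for i in range(h)]


# Usage: a 3x3 map of ones with a 3x3 kernel of ones.
feature_map = [[1.0] * 3 for _ in range(3)]
ones_kernel = [[1.0] * 3 for _ in range(3)]
modulated = conv_modulation(feature_map, ones_kernel)
doubled = conv_modulation([[2.0] * 3 for _ in range(3)], ones_kernel)
```

In this toy setting, doubling every input value quadruples every output value, which is the second-order behavior a purely linear convolution lacks. A larger kernel widens the neighborhood that determines each weight at a cost that grows with the kernel area rather than with the square of the number of pixels.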

External IDs: dblp:journals/tii/LiuWYHGR25