DisCrossFormer: A Deep Light-Weight Image Super Resolution Network Using Disentangled Visual Signal Processing and Correlated Cross-Attention Transformer Operation

Alireza Esmaeilzehi, Mohammad Javad Rajabi, Hossein Zaredar, M. Omair Ahmad

Published: 2024, Last Modified: 07 Mar 2025, MLSP 2024, CC BY-SA 4.0

Abstract: Deep neural networks that employ transformer operations have provided state-of-the-art performance for the task of image super resolution (SR). However, processing the non-local information in visual signals with transformers often increases network complexity. To develop a light-weight SR network that can process non-local information and still provide superior performance, we propose the correlated cross-attention operation. Further, we design a novel overall architecture for our SR network, which processes the information of the low-resolution images after disentangling it based on the presence of various objects in the visual signals. Disentangling the information of the input low-resolution images facilitates learning, since the network attends to a certain number of objects (and not all of them) in the visual signals at a time. The results of various experiments show the effectiveness of both ideas in generating super-resolved images of higher quality.
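
The abstract does not define the correlated cross-attention operation. For orientation only, the minimal sketch below shows generic scaled dot-product cross-attention, in which queries taken from one group of features attend to keys and values taken from another group. It is not the paper's correlated variant, and the names `softmax`, `cross_attention`, `queries`, `keys`, and `values` are illustrative assumptions rather than anything taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of n_q vectors of dimension d, from one feature group.
    keys:    list of n_k vectors of dimension d, from another feature group.
    values:  list of n_k vectors, paired one-to-one with the keys.
    Returns a list of n_q vectors, each a weighted average of the values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return outputs


# Hypothetical toy features: two query tokens attending to two key/value tokens.
group_a = [[1.0, 0.0], [0.0, 1.0]]
group_b_keys = [[1.0, 1.0], [1.0, 1.0]]
group_b_values = [[0.0, 2.0], [4.0, 6.0]]
attended = cross_attention(group_a, group_b_keys, group_b_values)
```

Because the cost of this operation grows with the number of queries times the number of keys, applying it over all spatial positions is one reason non-local processing tends to raise complexity, which is the trade-off the abstract sets out to address.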