Mutually guided learning of global semantics and local representations for image restoration

Published: 01 Jan 2024, Last Modified: 02 Mar 2025. Multim. Tools Appl. 2024. License: CC BY-SA 4.0
Abstract: Global semantics and local scene representations are both crucial for image restoration. Although existing methods have proposed various hybrid frameworks of convolutional neural networks (CNNs) and Transformers to account for both, they focus only on the complementarity of the two capabilities. On the one hand, these works neglect the mutual guiding role of the two types of information; on the other hand, they ignore that the semantic gap between the two different modeling systems, convolution and Self-Attention, seriously impedes feature fusion. In this work, we propose to establish entanglement between the global and the local to bridge the semantic gap and achieve mutually guided modeling of the two features. In the proposed hybrid framework, the modeling of convolution and Self-Attention is no longer independent: the proposed Mutual Transposed Cross Attention (MTCA) makes the two mutually dependent, thereby strengthening the joint modeling of local and global features. Further, we propose a Bidirectional Injection Module (BIM), which lets the global and local features adapt to each other in parallel before fusion and greatly reduces the interference in the fusion process caused by the semantic gap. The proposed method is qualitatively and quantitatively evaluated on multiple benchmark datasets, and extensive experiments show that our method reaches the state of the art with low computational cost.
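The abstract gives no implementation details of MTCA, so the following is only a minimal NumPy sketch of one plausible reading: a channel-wise ("transposed") cross attention in which each branch's queries attend to the other branch's keys and values, making the two streams mutually dependent. All function names, shapes, and the temperature parameter are assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transposed_cross_attention(f_q, f_kv, temperature=1.0):
    """Channel-wise ("transposed") cross attention (assumed form).
    Attention is computed over channels (a C x C map) rather than over
    spatial positions, which keeps cost linear in the number of pixels.
    f_q, f_kv: (C, N) feature maps, flattened over spatial dimensions."""
    q = f_q                    # queries from one branch, (C, N)
    k, v = f_kv, f_kv          # keys/values from the other branch, (C, N)
    scale = temperature * np.sqrt(q.shape[1])
    attn = softmax(q @ k.T / scale, axis=-1)   # (C, C) channel-attention map
    return attn @ v                            # (C, N) re-weighted features

def mutual_tca(f_local, f_global):
    """Mutual form: each branch attends to the other, so local (CNN) and
    global (Self-Attention) modeling are no longer independent."""
    local_out = transposed_cross_attention(f_local, f_global)
    global_out = transposed_cross_attention(f_global, f_local)
    return local_out, global_out
```

The channel-wise formulation is what makes this sketch cheap enough for restoration-scale inputs: the attention map is C x C, not N x N, so the cost does not grow quadratically with image size.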