Abstract: Although transformer-based methods excel at stereo image super-resolution, the distinctive, complementary information inherent in stereo image pairs has not been fully exploited. We propose a Disparity-Guided Cross-View Transformer (DCT) that extracts features across dimensions and views, yielding a more comprehensive feature representation. The proposed method introduces mutual attention into the transformer architecture, modeling the disparity between left and right views through cross-view interaction. This allows the network to effectively harness the complementary information in stereo image pairs, enhancing restoration performance. Furthermore, we propose a disparity-guided cross-modal residual fusion module that leverages disparity information as prior knowledge to substantially improve image reconstruction. This module compensates for the information missing from each view, enabling the network to understand the scene more effectively and reconstruct the image content more accurately. Extensive experimental results and ablation studies demonstrate the effectiveness of our method.
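To make the cross-view mechanism concrete, the following is a minimal PyTorch sketch of mutual attention between rectified left/right view features, with attention computed along the width (epipolar) axis where stereo correspondences lie. The class name `MutualCrossViewAttention`, the single linear projection per view, and the shared (transposed) score matrix are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class MutualCrossViewAttention(nn.Module):
    """Bidirectional (mutual) attention between left- and right-view features.

    Assumes rectified stereo pairs, so cross-view correspondences are
    searched only along the width axis of each epipolar line.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.norm_l = nn.LayerNorm(channels)
        self.norm_r = nn.LayerNorm(channels)
        self.proj_l = nn.Linear(channels, channels, bias=False)
        self.proj_r = nn.Linear(channels, channels, bias=False)
        self.scale = channels ** -0.5

    def forward(self, feat_l: torch.Tensor, feat_r: torch.Tensor):
        # feat_l, feat_r: (B, H, W, C) feature maps of the two views
        q = self.proj_l(self.norm_l(feat_l))  # left-view embedding
        k = self.proj_r(self.norm_r(feat_r))  # right-view embedding

        # One correspondence score matrix per epipolar line: (B, H, W, W)
        scores = q @ k.transpose(-2, -1) * self.scale

        # Mutual attention: each view aggregates features from the other,
        # reusing the transposed scores for the reverse direction so both
        # views share a single cross-view correspondence estimate.
        out_l = feat_l + torch.softmax(scores, dim=-1) @ feat_r
        out_r = feat_r + torch.softmax(scores.transpose(-2, -1), dim=-1) @ feat_l
        return out_l, out_r


# Usage sketch: fuse complementary information across a stereo feature pair.
if __name__ == "__main__":
    attn = MutualCrossViewAttention(channels=64)
    left = torch.randn(2, 32, 96, 64)   # (B, H, W, C)
    right = torch.randn(2, 32, 96, 64)
    fused_l, fused_r = attn(left, right)
    print(fused_l.shape, fused_r.shape)  # torch.Size([2, 32, 96, 64]) each
```

A disparity-guided variant, as the abstract describes, could additionally bias or mask `scores` using a disparity prior so attention concentrates near the true correspondence; that guidance step is omitted here since the abstract does not specify its form.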