Keywords: Stereo Image Super-resolution, Differential Transformer
Abstract: In computer vision, stereo image super-resolution (StereoSR) has attracted significant attention due to its applications in augmented reality, virtual reality, and autonomous driving. Transformer-based models, while powerful, often suffer from attention noise, which degrades the quality of super-resolved images. This paper introduces DIFFSSR, a novel neural network architecture designed to address this challenge. We propose the Diff Cross Attention Block (DCAB) and the Sliding Stereo Cross-Attention Module (SSCAM) to enhance feature integration and mitigate the impact of attention noise. The DCAB distinguishes relevant from irrelevant context, amplifying attention to important features while canceling out noise. The SSCAM, with its sliding-window mechanism and disparity-based attention, adapts to local variations in stereo images, preserving detail and countering the performance degradation caused by misalignment of horizontal epipolar lines. Extensive experiments on benchmark datasets demonstrate that DIFFSSR outperforms state-of-the-art methods, including NAFSSR and SwinFIRSSR, in both quantitative metrics and visual quality.
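The core idea behind the DCAB, as the abstract describes it, is differential attention: two attention maps are computed from split halves of the query/key channels and subtracted so that noise common to both maps cancels. A minimal NumPy sketch of this mechanism applied as cross-attention between the two stereo views is given below; all function names, shapes, and the scalar weight `lam` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_cross_attention(q_left, k_right, v_right, lam=0.5):
    """Differential cross-attention sketch (hypothetical names/shapes).

    q_left:  (n, 2d) queries from the left-view features
    k_right: (m, 2d) keys from the right-view features
    v_right: (m, 2d) values from the right-view features
    lam:     scalar weight for the subtracted attention map
             (learnable in the real model; fixed here for illustration)
    """
    d = q_left.shape[-1] // 2
    # Split channels into two halves, one per attention map.
    q1, q2 = q_left[..., :d], q_left[..., d:]
    k1, k2 = k_right[..., :d], k_right[..., d:]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))
    a2 = softmax(q2 @ k2.T / np.sqrt(d))
    # Subtracting the second map cancels attention noise shared by both.
    attn = a1 - lam * a2
    return attn @ v_right

# Toy usage: 5 left-view tokens attending over 6 right-view tokens.
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
out = diff_cross_attention(q, k, v)
print(out.shape)  # (5, 8)
```

With `lam=0` this reduces to ordinary single-map cross-attention on the first channel half, which makes the differential term's role easy to isolate in ablations.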
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 7904