Abstract: Because self-attention (SA) is computationally expensive, prevalent image deblurring methods adopt either localized SA or coarse-grained global SA. The former compromises global modeling, while the latter lacks fine-grained correlation. To model long-range dependencies without sacrificing fine-grained details, we introduce the Local Frequency Transformer (LoFormer). Each LoFormer unit incorporates a Local Channel-wise SA in the frequency domain (Freq-LC) that simultaneously captures cross-covariance within low- and high-frequency local windows. This design (1) ensures equitable learning opportunities for both coarse-grained structures and fine-grained details, and (2) explores a broader range of representational properties than coarse-grained global SA methods. We also introduce an MLP Gating mechanism, complementary to Freq-LC, that filters out irrelevant features while enhancing global learning capabilities. Our experiments demonstrate that LoFormer significantly improves image deblurring performance, achieving a PSNR of 34.09 dB on the GoPro dataset with 126G FLOPs. Code will be released.
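The idea behind Freq-LC can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: a 2D DCT as the frequency transform, non-overlapping square windows, identity query/key/value projections (no learned weights), and L2-normalized features before the channel-by-channel softmax. The names `dct_matrix`, `softmax`, and `freq_local_channel_attention` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (assumed frequency transform)."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # spatial sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def freq_local_channel_attention(x, window=8):
    """Channel-wise attention inside local windows of the DCT spectrum.

    x: feature map of shape (C, H, W); H and W must be multiples of `window`.
    Returns an array of the same shape in the spatial domain.
    """
    c, h, w = x.shape
    assert h % window == 0 and w % window == 0
    nh, nw = h // window, w // window
    dh, dw = dct_matrix(h), dct_matrix(w)

    # 2D DCT per channel: low frequencies end up in the top-left corner,
    # so each window covers one local frequency band.
    f = dh @ x @ dw.T

    # Split the spectrum into (nh * nw) windows of shape (C, window * window).
    tokens = (f.reshape(c, nh, window, nw, window)
                .transpose(1, 3, 0, 2, 4)
                .reshape(nh, nw, c, window * window))

    # Identity projections (assumption): q = k = v = tokens.
    q = tokens / (np.linalg.norm(tokens, axis=-1, keepdims=True) + 1e-6)
    attn = softmax(q @ q.transpose(0, 1, 3, 2), axis=-1)   # (nh, nw, C, C)
    out = attn @ tokens                                    # (nh, nw, C, window^2)

    # Merge the windows back into a spectrum and invert the DCT.
    f_out = (out.reshape(nh, nw, c, window, window)
                .transpose(2, 0, 3, 1, 4)
                .reshape(c, h, w))
    return dh.T @ f_out @ dw

x = np.random.default_rng(0).standard_normal((4, 16, 16))
y = freq_local_channel_attention(x, window=8)
print(y.shape)
```

Because the attention map in each window is C x C rather than (HW) x (HW), its cost grows with the number of channels instead of the number of pixels, while every window, low- or high-frequency, gets its own attention map.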
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Image deblurring significantly contributes to multimedia and multimodal processing by improving the quality and interpretability of visual information. Clearer images enhance the performance of computer vision algorithms, aiding tasks such as object recognition and segmentation. In multimedia applications, deblurring ensures that images are sharp and detailed, enhancing user experience and facilitating effective communication. Additionally, in multimodal processing, deblurring supports image fusion techniques by providing clear visual input for combining information from different sources. Furthermore, in augmented reality and virtual reality applications, deblurring plays a crucial role in creating realistic and immersive visual experiences by ensuring that virtual scenes or overlaid digital content are sharp and well-defined. Overall, image deblurring enhances the accuracy, reliability, and usability of visual data across various multimedia and multimodal processing tasks.
Supplementary Material: zip
Submission Number: 791