Efficiently Perceiving Local Details via Adaptive Spatial-Frequency Information Integration for Multi-focus Image Fusion
Abstract: Multi-focus image fusion (MFIF) aims to combine multiple images with different focused regions into a single all-in-focus image. Existing unsupervised deep learning-based methods fuse only the structural information of images in the spatial domain, neglecting the potential of frequency-domain exploration. In this paper, we make the first attempt to integrate spatial-frequency information to achieve high-quality MFIF. We propose a novel unsupervised spatial-frequency interaction MFIF network named SFIMFN, which consists of three key components: an Adaptive Frequency Domain Information Interaction Module (AFIM), a Ret-Attention-Based Spatial Information Extraction Module (RASEM), and an Invertible Dual-domain Feature Fusion Module (IDFM). Specifically, in AFIM, we interactively explore global contextual information by separately combining the amplitude and phase information of multiple images. In RASEM, we design a customized transformer that encourages the network to capture important local high-frequency information by redesigning the self-attention mechanism with a bidirectional, two-dimensional form of explicit decay. Finally, we employ IDFM to fuse spatial-frequency information without information loss and generate the desired all-in-focus image. Extensive experiments on different datasets demonstrate that our method significantly outperforms state-of-the-art unsupervised methods in terms of both qualitative and quantitative metrics as well as generalization ability.
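To make the frequency-domain interaction idea concrete, the sketch below illustrates a generic way of mixing the amplitude and phase spectra of two feature maps with the FFT. It is a minimal, hypothetical example under assumed tensor shapes and 1x1-convolution mixing; the class name `FrequencyInteraction` and all layer choices are illustrative assumptions, not the paper's actual AFIM implementation.

```python
# Illustrative sketch (NOT the paper's AFIM): frequency-domain interaction of two
# feature maps by separately mixing their amplitude and phase spectra.
import torch
import torch.nn as nn

class FrequencyInteraction(nn.Module):
    """Hypothetical amplitude/phase interaction between two source features."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions that fuse the concatenated amplitude / phase spectra
        self.amp_fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.pha_fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Transform both feature maps to the frequency domain
        fft_a = torch.fft.rfft2(feat_a, norm="ortho")
        fft_b = torch.fft.rfft2(feat_b, norm="ortho")

        # Decompose into amplitude (global intensity statistics) and phase (structure)
        amp = self.amp_fuse(torch.cat([fft_a.abs(), fft_b.abs()], dim=1))
        pha = self.pha_fuse(torch.cat([fft_a.angle(), fft_b.angle()], dim=1))

        # Recombine the fused amplitude and phase, then return to the spatial domain
        fused = torch.polar(amp, pha)
        return torch.fft.irfft2(fused, s=feat_a.shape[-2:], norm="ortho")

# Example: interact two 64-channel feature maps of size 128x128
if __name__ == "__main__":
    module = FrequencyInteraction(channels=64)
    a, b = torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128)
    print(module(a, b).shape)  # torch.Size([1, 64, 128, 128])
```

Mixing amplitude and phase through separate branches is one common design choice for frequency-domain fusion, since amplitude largely carries global intensity statistics while phase encodes structure; the paper's AFIM may differ in its exact interaction scheme.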
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: Constrained by the focusing capability of optical imaging devices, objects outside the depth of field may appear blurred in local regions during imaging. Multi-focus image fusion (MFIF) aims to extract complementary information from images with different focused regions to generate an all-in-focus image. This work on multi-focus image fusion contributes to multimedia/multimodal processing in several ways:
1. Enhancing image quality and clarity: MFIF synthesizes images captured at different focal points into a single, higher-quality, and clearer image. This is useful for image enhancement and optimization in multimedia processing, improving visual quality and user experience.
2. Improving image information fusion: MFIF effectively fuses image information from different focal points, producing an image with richer and more comprehensive content. This is significant for improving models' ability to understand and process image information.
3. Wide applicability to downstream tasks: MFIF can be applied to many tasks such as microscopic imaging, image segmentation, image classification, and image recognition.
In this work, we propose a novel unsupervised MFIF framework that adaptively integrates high- and low-frequency information from the spatial and frequency domains of multiple source images. We also design a customized transformer for MFIF that perceives locally focused regions more effectively.
Supplementary Material: zip
Submission Number: 1703