Efficient Perceiving Local Details via Adaptive Spatial-Frequency Information Integration for Multi-focus Image Fusion

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · ACM Multimedia 2024 · CC BY-SA 4.0
Abstract: Multi-focus image fusion (MFIF) aims to combine multiple images with different focused regions into a single all-in-focus image. Existing unsupervised deep learning-based methods fuse only the structural information of images in the spatial domain, neglecting potential solutions offered by exploring the frequency domain. In this paper, we make the first attempt to integrate spatial-frequency information to achieve high-quality MFIF. We propose a novel unsupervised spatial-frequency interaction MFIF network named SFIMFN, which consists of three key components: an Adaptive Frequency Domain Information Interaction Module (AFIM), a Ret-Attention-Based Spatial Information Extraction Module (RASEM), and an Invertible Dual-domain Feature Fusion Module (IDFM). Specifically, in AFIM, we interactively explore global contextual information by separately combining the amplitude and phase information of multiple images. In RASEM, we design a customized transformer that encourages the network to capture important local high-frequency information by redesigning the self-attention mechanism with a bidirectional, two-dimensional form of explicit decay. Finally, we employ IDFM to fuse spatial-frequency information without information loss and generate the desired all-in-focus image. Extensive experiments on different datasets demonstrate that our method significantly outperforms state-of-the-art unsupervised methods in both qualitative and quantitative evaluations as well as in generalization ability.
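The abstract does not include an implementation, but the "bidirectional, two-dimensional form of explicit decay" in RASEM can be illustrated with a minimal NumPy sketch: attention scores over an H×W grid of tokens are modulated by a decay mask gamma^d, where d is the Manhattan distance between two spatial positions, so that nearby (local, high-frequency) tokens receive higher weight in both directions. All function names, the gamma value, and the exact masking scheme below are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def decay_mask(h, w, gamma=0.9):
    """Bidirectional 2-D decay: D[i, j] = gamma ** ManhattanDist(pos_i, pos_j).

    Hypothetical helper, not from the paper; gamma < 1 biases attention
    toward spatially close tokens in every direction.
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)          # (h*w, 2)
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(-1)
    return gamma ** dist                                          # (h*w, h*w)

def decayed_attention(q, k, v, h, w, gamma=0.9):
    """Single-head attention whose weights are reweighted by the decay mask.

    q, k, v: (h*w, d) token features flattened from an H x W grid.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores) * decay_mask(h, w, gamma)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the mask depends only on spatial distance, it is symmetric (bidirectional), unlike the causal one-directional decay used in retention-style language models.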