Abstract: Existing deep learning-based Multi-focus Image Fusion (MFIF) methods often rely on loss functions built from linear combinations of image quality metrics, which complicate training while yielding only marginal improvements in image quality. Recognizing this, our study identifies the input space and scale information as pivotal to enhancing MFIF performance. By augmenting raw spatial images with additional feature spaces, namely gradient and dense Scale-Invariant Feature Transform (DSIFT), we enhance the model's ability to detect edges, textures, and local structures, enabling more accurate differentiation between focused and defocused regions. Additionally, smaller scale variations improve focus detection, while multi-scale learning within neural networks effectively suppresses artifacts without affecting focus detection accuracy. To realize these enhancements, we introduce the Multi-Feature Aggregation Network (MFANet), which employs a three-branch architecture to perform focus detection in the spatial, gradient, and DSIFT feature spaces. Each branch is equipped with a Pyramid Attention Fusion (PAF) module that uses attention mechanisms together with a novel Light Spatial Aggregation Pyramid Module (LSAPM) to capture global feature relationships and aggregate multi-scale information. Experimental results demonstrate that MFANet surpasses state-of-the-art fusion methods in both qualitative and quantitative evaluations.
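To make the three-branch design concrete, the following is a minimal PyTorch sketch of the architecture outlined above, under stated assumptions: the internals of the PAF block (here, channel attention over a small average-pooling pyramid, standing in for PAF + LSAPM), the channel widths, the DSIFT descriptor dimensionality, and the fusion head are all illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAF(nn.Module):
    """Placeholder Pyramid Attention Fusion block: attend to features pooled
    at several scales, then merge them (a stand-in for the paper's PAF/LSAPM;
    the real modules are described in the paper, not reproduced here)."""
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        pyramid = []
        for s in self.scales:
            # Pool to a coarser scale, apply attention, and upsample back.
            f = F.adaptive_avg_pool2d(x, (max(h // s, 1), max(w // s, 1)))
            f = f * self.attn(f)
            pyramid.append(F.interpolate(f, size=(h, w), mode="bilinear",
                                         align_corners=False))
        return self.merge(torch.cat(pyramid, dim=1))

class MFANet(nn.Module):
    """Three branches over spatial, gradient, and DSIFT inputs; each branch
    has its own encoder and PAF block, and a shared head predicts a per-pixel
    focus decision map."""
    def __init__(self, channels=32, dsift_dim=128):  # dsift_dim is assumed
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(inplace=True),
                PAF(channels),
            )
        # Each branch takes the two source images stacked on the channel axis:
        # grayscale pixels, gradient magnitudes, and dense SIFT descriptors.
        self.spatial = branch(2)
        self.gradient = branch(2)
        self.dsift = branch(2 * dsift_dim)
        self.head = nn.Sequential(
            nn.Conv2d(3 * channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid(),  # focus map in [0, 1]
        )

    def forward(self, spatial, gradient, dsift):
        feats = torch.cat([self.spatial(spatial),
                           self.gradient(gradient),
                           self.dsift(dsift)], dim=1)
        return self.head(feats)
```

As a usage sketch, the predicted map `m = model(spatial, gradient, dsift)` would fuse two sources as `fused = m * img_a + (1 - m) * img_b`, the standard decision-map formulation used by focus-detection MFIF methods.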