MIAFusion: Infrared and Visible Image Fusion via Multi-scale Spatial and Channel-Aware Interaction Attention
Abstract: The purpose of Infrared and Visible Image Fusion (IVIF) is to obtain fused images with highlighted objects and rich details. Existing Transformer-based IVIF algorithms depend mainly on self-attention mechanisms to capture long-range dependencies. However, the single-scale self-attention in the original Transformer extracts only spatial global features, ignoring channel and cross-dimensional self-attention. To address this problem, we propose a novel Transformer-based IVIF approach built on Multi-scale spatial and channel-aware Interaction Attention (MIAFusion). Specifically, a Dual Convolution Attention Module (DCAM) is applied to obtain channel and spatial features by combining spatial and channel attention. Moreover, a Multi-Scale Transformer Module (MSTM) is utilized to capture both cross-dimensional and long-range dependencies. Unlike the single-scale self-attention in the original Transformer, Multi-scale Interaction self-Attention (MIA) is applied in the MSTM in place of multi-head self-attention. MIA enables attention interaction among different channels, contributing to effective fusion of features from the two modalities. Qualitative and quantitative experiments on the publicly available TNO, RoadScene, and M3FD datasets demonstrate that our method outperforms several state-of-the-art methods.
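As a rough illustration of the sequential channel-and-spatial gating the abstract describes, the sketch below applies a channel attention gate followed by a spatial attention gate to a feature map. This is a simplified NumPy sketch, not the authors' DCAM: the pooling choices, the sigmoid gates, and the tensor shapes are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    # x: (C, H, W) feature map; squeeze the spatial dims by average
    # pooling, then gate each channel with a sigmoid weight
    pooled = x.mean(axis=(1, 2))          # (C,)
    return x * sigmoid(pooled)[:, None, None]

def spatial_attention(x):
    # collapse the channel dim to one spatial map, then gate each pixel
    pooled = x.mean(axis=0)               # (H, W)
    return x * sigmoid(pooled)[None, :, :]

def dual_attention(x):
    # channel gating followed by spatial gating (CBAM-style ordering;
    # the ordering here is an assumption, not taken from the paper)
    return spatial_attention(channel_attention(x))

feat = np.random.rand(8, 4, 4).astype(np.float32)  # hypothetical (C, H, W) features
out = dual_attention(feat)
print(out.shape)  # (8, 4, 4)
```

Because both gates lie in (0, 1), the output is an elementwise re-weighting of the input: shape is preserved while less informative channels and pixels are attenuated.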