MIAFusion: Infrared and Visible Image Fusion via Multi-scale Spatial and Channel-Aware Interaction Attention

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · PRCV (8) 2024 · License: CC BY-SA 4.0
Abstract: The goal of Infrared and Visible Image Fusion (IVIF) is to obtain fused images with highlighted objects and rich details. Existing Transformer-based IVIF algorithms mainly depend on self-attention mechanisms to capture long-range dependencies. However, the single-scale self-attention in the original Transformer extracts only spatial global features, ignoring channel and cross-dimensional self-attention. To address this problem, we propose a novel Transformer-based IVIF approach built on Multi-scale spatial and channel-aware Interaction Attention (MIAFusion). Specifically, a Dual Convolution Attention Module (DCAM) is applied to obtain channel and spatial features by combining spatial and channel attention. Moreover, a Multi-Scale Transformer Module (MSTM) is utilized to capture both cross-dimensional and long-range dependencies. Unlike the single-scale self-attention in the original Transformer, Multi-scale Interaction self-Attention (MIA) is applied in the MSTM in place of multi-head self-attention. MIA enables attention interaction among different channels, contributing to effective feature fusion of the two modalities. Qualitative and quantitative experimental results on the publicly available TNO, RoadScene and M3FD datasets show that our method outperforms several state-of-the-art methods.
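The abstract does not specify the internals of the DCAM. As a rough illustration only, the idea of combining channel and spatial attention can be sketched as below, assuming a simple gating design (global average pooling for the channel gate, a per-pixel channel-mean map for the spatial gate). All function names and the specific gating choices are hypothetical, not the paper's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Global average pooling gives one statistic per
    # channel; the sigmoid turns it into a gate in (0, 1) per channel.
    gate = sigmoid(feat.mean(axis=(1, 2)))          # shape (C,)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # Averaging over channels gives an (H, W) saliency map; the sigmoid
    # turns it into a per-pixel gate applied to every channel.
    gate = sigmoid(feat.mean(axis=0))               # shape (H, W)
    return feat * gate[None, :, :]

def dual_attention(feat):
    # One common ordering: channel attention first, then spatial.
    return spatial_attention(channel_attention(feat))

# Toy feature map standing in for one modality's encoder output.
feat = np.random.randn(8, 16, 16)
out = dual_attention(feat)
assert out.shape == feat.shape
```

Because both gates lie in (0, 1), the output is an element-wise re-weighting of the input features, never an amplification; learnable variants would typically replace the raw pooling statistics with small convolutional or MLP layers.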
