Abstract: Existing multimodal medical image fusion methods often rely on convolutional neural networks (CNNs) for local feature extraction but fail to model global relationships effectively. Transformer-based approaches address this limitation but are computationally expensive. In this work, we propose the salience-guided cross-domain aggregation network (SCAN) for efficient and high-performance multimodal medical image fusion. SCAN combines the strengths of CNNs and Transformers by introducing a novel nested pyramid residual attention (NPRA) module in the encoder, which improves local feature extraction and adaptively attends to salient regions. We also design a salience-guided dual attention (SGDA) module in the decoder to enhance the fused features and preserve fine details. Extensive experiments on three multimodal brain datasets show that SCAN outperforms state-of-the-art methods in both qualitative comparison and quantitative assessment. Future work will explore the scalability of SCAN to other medical imaging tasks and its potential for real-time applications.
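For readers who want a concrete picture of the kind of pipeline the abstract describes, the sketch below shows a minimal PyTorch encoder-fusion-decoder layout with attention-refined features. It is only an illustrative assumption: the internal designs of the actual NPRA and SGDA modules, channel widths, and layer counts are not specified in the abstract, so the `ResidualAttentionBlock` and `DualAttentionFusion` stand-ins here are generic placeholders rather than the paper's modules.

```python
# Minimal sketch of a two-branch CNN+attention fusion pipeline in the spirit of SCAN.
# All block internals and hyperparameters are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn


class ResidualAttentionBlock(nn.Module):
    """Placeholder for an NPRA-style encoder block: local conv features + channel attention."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Simple squeeze-and-excitation-style channel attention (assumption).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.conv(x)
        return x + f * self.attn(f)  # residual connection weighted by channel attention


class DualAttentionFusion(nn.Module):
    """Placeholder for an SGDA-style decoder block: spatial + channel attention on fused features."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.spatial(x) * self.channel(x)


class FusionNet(nn.Module):
    """Two-branch encoder, feature concatenation, attention-refined decoder."""
    def __init__(self, channels=32):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), ResidualAttentionBlock(channels))
        self.enc_b = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), ResidualAttentionBlock(channels))
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.refine = DualAttentionFusion(channels)
        self.dec = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, img_a, img_b):
        fa, fb = self.enc_a(img_a), self.enc_b(img_b)
        fused = self.refine(self.fuse(torch.cat([fa, fb], dim=1)))
        return self.dec(fused)


if __name__ == "__main__":
    net = FusionNet()
    mod_a = torch.rand(1, 1, 256, 256)  # e.g., an MRI slice
    mod_b = torch.rand(1, 1, 256, 256)  # e.g., a PET or SPECT slice
    print(net(mod_a, mod_b).shape)      # torch.Size([1, 1, 256, 256])
```

The structure mirrors only the high-level description in the abstract: attention-augmented CNN encoders per modality, a fusion step, and a dual-attention refinement before decoding the fused image.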