Abstract: Highlights
• MCA fuses multiple modalities using global and long-range contextual interaction.
• SCA (SCASNR+SCASCR) fuses the global information between adjacent scales.
• SCASNR constructs scale-related, weighted long-distance dependencies.
• SCASCR generates a channel-based, scale-related matrix that enhances the features.
• MM-UNet improves self-attention learning and performance on well-known benchmarks.