Abstract: Highlights
• We integrate explicit and implicit fusion techniques within the Swin Transformer architecture.
• We propose a two-stage training strategy to implement targeted training for different modules.
• We design a DWT-based fusion decoder to implicitly fuse information from different modalities.
External IDs: doi:10.1016/j.infrared.2025.106156