Abstract: Highlights
• A two-stream encoder is proposed to extract multimodal hierarchical features.
• Inner- and cross-modal long-range dependencies are utilized for feature fusion.
• Thermal image reconstruction and salient object detection (SOD) are achieved jointly via a multi-task decoder.
• Experiments on SOD datasets demonstrate the superiority of the proposed network.
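The highlights do not specify how the inner- and cross-modal long-range dependencies are computed. One common way to model such dependencies is scaled dot-product attention over flattened feature maps. The following is a minimal sketch under that assumption, not the authors' method. `fuse_rgb_thermal`, the shared random projections standing in for learned weights, and the additive fusion rule are all hypothetical choices for illustration. Inner-modal attention lets each stream attend to itself. Cross-modal attention lets RGB queries attend to thermal keys and values, and the reverse.

```python
import numpy as np

def attention_weights(q, k):
    # Scaled dot-product scores, normalized row-wise with a numerically stable softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Every query token attends to every key token, so dependencies are global
    # (long-range) rather than limited to a convolution kernel's neighbourhood.
    return attention_weights(q, k) @ v

def fuse_rgb_thermal(f_rgb, f_t, rng):
    """Hypothetical fusion of one encoder level.

    f_rgb, f_t: feature maps of shape (C, H, W) from the RGB and thermal streams.
    Returns a fused feature map of shape (C, H, W).
    """
    c, h, w = f_rgb.shape
    # Flatten spatial positions into tokens of shape (H*W, C).
    x_rgb = f_rgb.reshape(c, h * w).T
    x_t = f_t.reshape(c, h * w).T
    # Random projections stand in for learned query/key/value weights (assumption).
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))

    def qkv(x):
        return x @ wq, x @ wk, x @ wv

    q_r, k_r, v_r = qkv(x_rgb)
    q_t, k_t, v_t = qkv(x_t)
    # Inner-modal: each modality attends to its own tokens.
    inner = attention(q_r, k_r, v_r) + attention(q_t, k_t, v_t)
    # Cross-modal: each modality queries the other modality's tokens.
    cross = attention(q_r, k_t, v_t) + attention(q_t, k_r, v_r)
    # Simple additive fusion with residual connections (hypothetical choice).
    fused = x_rgb + x_t + inner + cross
    return fused.T.reshape(c, h, w)
```

Additive fusion keeps the sketch short. A real network would more likely use learned projections per modality, multiple heads, and a learned fusion layer, for example concatenation followed by a convolution.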