TMNet: Triple-modal interaction encoder and multi-scale fusion decoder network for V-D-T salient object detection
Abstract: Highlights
• This paper proposes a triple-modal interaction encoder and multi-scale fusion decoder network (TMNet) to highlight the salient regions.
• The triple-modal interaction encoder comprises the separation context-aware feature module, channel-wise fusion module, and triple-modal refinement and fusion module to explore and utilize the complementarity between Visible, Depth, and Thermal information.
• The multi-scale fusion decoder involves the semantic-aware localizing module and contour-aware refinement module to extract and fuse the location and boundary information, yielding a high-quality saliency map.
• Extensive experiments on the public VDT-2048 dataset demonstrate that our TMNet outperforms existing state-of-the-art methods in terms of all evaluation metrics.