TMNet: Triple-modal interaction encoder and multi-scale fusion decoder network for V-D-T salient object detection

Bin Wan, Chengtao Lv, Xiaofei Zhou, Yaoqi Sun, Zunjie Zhu, Hongkui Wang, Chenggang Yan

Published: 2024, Last Modified: 12 Nov 2025. Pattern Recognit. 2024. CC BY-SA 4.0

Abstract: Highlights
• This paper proposes a triple-modal interaction encoder and multi-scale fusion decoder network (TMNet) to highlight the salient regions.
• The triple-modal interaction encoder comprises the separation context-aware feature module, channel-wise fusion module, and triple-modal refinement and fusion module to explore and utilize the complementarity between Visible, Depth, and Thermal information.
• The multi-scale fusion decoder involves the semantic-aware localizing module and contour-aware refinement module to extract and fuse the location and boundary information, yielding a high-quality saliency map.
• Extensive experiments on the public VDT-2048 dataset demonstrate that our TMNet outperforms existing state-of-the-art methods in terms of all evaluation metrics.

External IDs: dblp:journals/pr/WanLZSZWY24