TSVT: Token Sparsification Vision Transformer for robust RGB-D salient object detection

Lina Gao, Bing Liu, Ping Fu, Mingzhu Xu

Published: 2024, Last Modified: 10 Nov 2024, Pattern Recognit. 2024, CC BY-SA 4.0

Abstract:
Highlights
• We propose an asymmetric encoder-decoder vision transformer network, called TSVT.
• TSVT adaptively sparsifies tokens to explore global context effectively.
• An IDFM is designed to fuse both the differences and the consistency between multi-modality (RGB and depth) tokens.
• TSVT achieves more robust and effective saliency detection.
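The highlights do not say how TSVT decides which tokens to keep. Purely as an illustration of score-based token sparsification in general, the sketch below ranks tokens by their softmax attention to a query vector, keeps the top fraction in their original order, and merges the pruned tokens into one score-weighted token (an EViT-style choice). The function name `sparsify_tokens`, the scoring `query`, the `keep_ratio` parameter and the merge step are all hypothetical assumptions, not the paper's method.

```python
import numpy as np


def sparsify_tokens(tokens, query, keep_ratio=0.5):
    """Hypothetical score-based token sparsification (not TSVT's exact method).

    tokens: (N, D) array of patch tokens.
    query:  (D,) scoring vector, assumed here to play the role of a class token.
    Returns an array of shape (k + 1, D), or (N, D) if nothing is pruned,
    plus the indices of the kept tokens.
    """
    n, d = tokens.shape
    # Attention-style relevance of each token to the query (stable softmax).
    logits = tokens @ query / np.sqrt(d)
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()

    # Number of tokens to keep; always at least one.
    k = max(1, int(np.ceil(keep_ratio * n)))
    # Top-k indices by score, re-sorted to preserve the original token order.
    keep = np.sort(np.argsort(-scores)[:k])
    drop = np.setdiff1d(np.arange(n), keep)
    kept = tokens[keep]
    if drop.size == 0:
        return kept, keep

    # Merge the pruned tokens into a single token weighted by their scores,
    # so their information is summarized rather than discarded outright.
    w = scores[drop] / max(scores[drop].sum(), 1e-12)
    fused = (w[:, None] * tokens[drop]).sum(axis=0, keepdims=True)
    return np.concatenate([kept, fused], axis=0), keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8))   # 16 tokens of width 8
    q = rng.standard_normal(8)
    out, keep = sparsify_tokens(x, q, keep_ratio=0.25)
    print(out.shape)  # (5, 8): 4 kept tokens plus 1 merged token
```

Keeping the survivors in their original order and summarizing the pruned tokens, rather than dropping them, is one common way to cut attention cost while retaining some global context; an actual model would typically learn the scoring instead of using a fixed query.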