TSVT: Token Sparsification Vision Transformer for robust RGB-D salient object detection

Published: 01 Jan 2024, Last Modified: 10 Nov 2024, Pattern Recognit. 2024, CC BY-SA 4.0
Abstract: Highlights
• Proposes an asymmetric encoder-decoder vision transformer network called TSVT.
• TSVT adaptively sparsifies tokens to explore global context effectively.
• An IDFM is designed to fuse both the differences and the consistency between multi-modality (RGB and depth) tokens.
• TSVT achieves more robust and effective saliency detection performance.
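The abstract does not say how TSVT scores or selects tokens. As a generic illustration only, the Python sketch below shows score-based top-k token pruning, one common way to sparsify vision-transformer tokens so that attention runs over fewer, more informative tokens. The norm-based score, the `keep_ratio` parameter, and the function names are hypothetical assumptions, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Token = List[float]


def token_scores(tokens: Sequence[Token]) -> List[float]:
    """Hypothetical importance score: the L2 norm of each token.

    Assumption for illustration only; a real model would typically learn
    this score (for example with a small prediction head)."""
    return [math.sqrt(sum(x * x for x in t)) for t in tokens]


def sparsify_tokens(tokens: Sequence[Token], keep_ratio: float) -> Tuple[List[Token], List[int]]:
    """Keep the top ceil(keep_ratio * N) tokens by score.

    Returns the kept tokens and their original indices, in their original
    (spatial) order, so a decoder could scatter them back into place."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    scores = token_scores(tokens)
    # Rank token indices by descending score and take the top n_keep.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Re-sort the surviving indices to preserve spatial order.
    kept_idx = sorted(ranked[:n_keep])
    return [list(tokens[i]) for i in kept_idx], kept_idx


if __name__ == "__main__":
    toy_tokens = [[0.1, 0.0], [3.0, 4.0], [0.5, 0.5], [1.0, 0.0]]
    kept, idx = sparsify_tokens(toy_tokens, keep_ratio=0.5)
    print(idx, kept)
```

With `keep_ratio=0.5`, the two highest-norm tokens (indices 1 and 3) survive. Keeping the indices alongside the tokens is a design choice that lets an asymmetric decoder restore a dense token map later; whether TSVT does this is not stated in the abstract.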