Collaborative spatial-temporal video salient object detection with cross attention transformer

Published: 2024, Last Modified: 09 Nov 2025. Signal Process. 2024. License: CC BY-SA 4.0
Abstract: Highlights
• A Siamese feature extractor is proposed to jointly extract static and motion features.
• A deep level set method is utilized to bridge the semantic gap.
• A cross-attention transformer is proposed to refine and fuse static and motion features.
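To make the third highlight concrete, the following is a minimal sketch of generic scaled dot-product cross attention between two feature streams. It is not the paper's architecture: the function names, the token count, the feature dimension, the shared projection matrices, and the additive fusion at the end are all assumptions chosen for illustration, and random arrays stand in for real static (appearance) and motion features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic scaled dot-product cross attention (illustrative only).

    Queries come from one stream; keys and values come from the other.
    """
    q = query_feats @ w_q      # (N, d)
    k = context_feats @ w_k    # (M, d)
    v = context_feats @ w_v    # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes: 16 spatial tokens, 8 channels per token.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
static = rng.standard_normal((n_tokens, dim))   # stand-in for static features
motion = rng.standard_normal((n_tokens, dim))   # stand-in for motion features

# Assumption: one set of projections shared by both directions.
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

# Each stream attends to the other, then residuals are summed (assumed fusion).
static_refined, attn = cross_attention(static, motion, w_q, w_k, w_v)
motion_refined, _ = cross_attention(motion, static, w_q, w_k, w_v)
fused = static + static_refined + motion + motion_refined
```

In this sketch each stream queries the other, so static tokens are reweighted by motion evidence and vice versa. A real model would typically use multiple heads, learned projections per direction, and normalization layers, none of which are shown here.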