MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Published: 01 Jan 2025, Last Modified: 11 Apr 2025, Pattern Recognit. 2025, CC BY-SA 4.0
Abstract: Highlights
• This study explores the potential of vision foundation models under diverse prompt strategies and proposes a mask-free approach to weakly supervised video object segmentation.
• To make prompt learning more effective in diverse and complex video scenes, we introduce a spatial–temporal decoupled deformable attention mechanism that establishes strong correlations between intra-frame and inter-frame features.
• Extensive experiments on benchmark datasets show that the proposed approach, trained without mask supervision, outperforms existing mask-supervised methods and generalizes to weakly annotated video datasets.
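The highlights name a spatial–temporal decoupled deformable attention mechanism but do not describe how it is built. Purely as an illustration of what "decoupling" intra-frame and inter-frame correlation can mean, the sketch below assumes a single-head, single-scale deformable sampling step in the style of Deformable DETR (each pixel predicts K offsets and K weights within its own frame), followed by ordinary dot-product attention across frames at each spatial location. Every function name, weight shape, the value of K, and the residual ordering are hypothetical choices for this example; it is not the MWVOS implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bilinear_sample(feat, y, x):
    """Sample feat (H, W, C) at fractional coords (y, x); out-of-bounds corners contribute zero."""
    H, W, C = feat.shape
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    out = np.zeros(y.shape + (C,))
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            w = (1 - np.abs(y - yy)) * (1 - np.abs(x - xx))  # bilinear corner weight
            valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
            yc, xc = np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)
            out += (w * valid)[..., None] * feat[yc, xc]
    return out

def deformable_spatial_attn(feat, W_off, W_att, W_v, K):
    """Intra-frame step (hypothetical): each pixel attends to K learned offsets in its own frame."""
    H, W, C = feat.shape
    offsets = (feat @ W_off).reshape(H, W, K, 2)               # (dy, dx) per sampling point
    weights = softmax((feat @ W_att).reshape(H, W, K), axis=-1)
    values = feat @ W_v
    ref_y, ref_x = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sy = ref_y[..., None] + offsets[..., 0]                      # (H, W, K)
    sx = ref_x[..., None] + offsets[..., 1]
    sampled = bilinear_sample(values, sy, sx)                    # (H, W, K, C)
    return (weights[..., None] * sampled).sum(axis=2)            # (H, W, C)

def temporal_attn(x, W_q, W_k, W_v):
    """Inter-frame step (hypothetical): each (h, w) attends across the T frames."""
    C = x.shape[-1]
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = np.einsum("thwc,shwc->hwts", q, k) / np.sqrt(C)
    attn = softmax(scores, axis=-1)                              # normalize over source frames s
    return np.einsum("hwts,shwc->thwc", attn, v)

def decoupled_st_block(video, params, K):
    """Spatial deformable attention per frame, then temporal attention, each with a residual."""
    spatial = np.stack([
        deformable_spatial_attn(f, params["W_off"], params["W_att"], params["W_sv"], K)
        for f in video
    ])
    x = video + spatial
    return x + temporal_attn(x, params["W_q"], params["W_k"], params["W_tv"])

rng = np.random.default_rng(0)
T, H, W, C, K = 3, 8, 8, 16, 4
params = {
    "W_off": rng.normal(scale=0.1, size=(C, 2 * K)),  # small offsets around each reference pixel
    "W_att": rng.normal(scale=0.1, size=(C, K)),
    "W_sv": rng.normal(scale=0.1, size=(C, C)),
    "W_q": rng.normal(scale=0.1, size=(C, C)),
    "W_k": rng.normal(scale=0.1, size=(C, C)),
    "W_tv": rng.normal(scale=0.1, size=(C, C)),
}
video = rng.normal(size=(T, H, W, C))  # random stand-in for per-frame features
out = decoupled_st_block(video, params, K)
print(out.shape)  # (3, 8, 8, 16)
```

Splitting the two steps keeps the cost of the spatial step linear in the number of pixels (K samples per query) and the temporal step quadratic only in T, rather than attending jointly over all T·H·W positions; whether the paper's design follows this exact ordering is not stated in the highlights.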