Weakly-Supervised Video Scene Co-parsing

Guangyu Zhong, Yi-Hsuan Tsai, Ming-Hsuan Yang

Published: 2016, Last Modified: 09 Nov 2025. Venue: ACCV (1) 2016. License: CC BY-SA 4.0

Abstract: We propose a scene co-parsing framework that assigns pixel-wise semantic labels to weakly-labeled videos, i.e., videos for which only video-level category labels are given. To exploit rich semantic information, we first collect all videos that share the same video-level labels and segment them into supervoxels. We then select representative supervoxels for each category via a supervoxel ranking process. This ranking problem is formulated with a submodular objective function, and a scene-object classifier is incorporated to distinguish scenes from objects. To assign each supervoxel a semantic label, we match it to the selected representatives in the feature domain. Each supervoxel is thereby associated with a set of category potentials and assigned the semantic label with the maximum potential. The proposed co-parsing framework extends scene parsing from single images to videos and exploits mutual information among a video collection. Experimental results on the Wild-8 and SUNY-24 datasets show that the proposed algorithm performs favorably against state-of-the-art approaches.
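
The two core steps described in the abstract (selecting representatives with a submodular objective, then labeling each supervoxel by its maximum category potential) can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not state the objective, so a standard facility-location function with greedy maximization is used as a stand-in, cosine similarity is assumed as the feature-domain match, the scene-object classifier is omitted, and the names `greedy_facility_location` and `assign_labels` are hypothetical.

```python
import numpy as np


def greedy_facility_location(similarity, k):
    """Greedily pick k representatives.

    Assumed stand-in objective (not from the paper):
        F(S) = sum_i max_{j in S} similarity[i, j]
    which is monotone submodular when similarity is non-negative.
    """
    n = similarity.shape[0]
    chosen = np.zeros(n, dtype=bool)
    selected = []
    best_cover = np.zeros(n)  # max similarity of each item to the selected set
    for _ in range(min(k, n)):
        # Marginal gain of adding each candidate column j to the selected set.
        covered = np.maximum(similarity, best_cover[:, None])
        gains = covered.sum(axis=0) - best_cover.sum()
        gains[chosen] = -np.inf  # never re-select an item
        j = int(np.argmax(gains))
        chosen[j] = True
        selected.append(j)
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected


def assign_labels(features, rep_features, rep_labels, num_categories):
    """Assign each supervoxel the category with the maximum potential.

    The potential for category c is assumed here to be the highest cosine
    similarity between the supervoxel and any representative of c.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    r = rep_features / np.linalg.norm(rep_features, axis=1, keepdims=True)
    sim = f @ r.T
    potentials = np.full((features.shape[0], num_categories), -np.inf)
    for c in range(num_categories):
        mask = rep_labels == c
        if mask.any():
            potentials[:, c] = sim[:, mask].max(axis=1)
    return potentials.argmax(axis=1), potentials
```

In this sketch, `similarity` would be a non-negative pairwise similarity matrix over the supervoxels of one category, and `rep_labels` would carry the video-level category of each selected representative.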

External IDs: dblp:conf/accv/ZhongT016