Abstract: In this paper we present a new computer vision task,
named video instance segmentation. The goal of this new
task is simultaneous detection, segmentation and tracking
of instances in videos. In other words, it is the first time that
the image instance segmentation problem is extended to
the video domain. To facilitate research on this new task,
we propose a large-scale benchmark called YouTube-VIS,
which consists of 2,883 high-resolution YouTube videos, a
40-category label set and 131k high-quality instance masks.
In addition, we propose a novel algorithm called MaskTrack
R-CNN for this task. Our method adds a tracking
branch to Mask R-CNN to jointly perform detection,
segmentation and tracking.
Finally, we evaluate the proposed method and several
strong baselines on our new dataset. Experimental results
clearly demonstrate the advantages of the proposed algorithm
and reveal insights for future improvement. We believe
the video instance segmentation task will motivate the community
to pursue further research on video understanding.