Toward Interactive Self-Annotation For Video Object Bounding Box: Recurrent Self-Learning And Hierarchical Annotation Based Framework
Abstract: The amount and variety of training data drastically affect the
performance of CNNs, so efficient annotation methods are becoming increasingly critical for data collection.
In this paper, we propose a simple yet efficient Interactive
Self-Annotation framework that reduces both the time and the human labor cost of video object bounding box annotation.
Our method is based on recurrent self-supervised learning
and consists of two processes, an automatic process and an interactive process, where the automatic process builds a
supporting detector that speeds up the interactive process. In
the Automatic Recurrent Annotation, we let an off-the-shelf
detector watch unlabeled videos repeatedly to reinforce itself automatically. At each iteration, we use the model trained
in the previous iteration to generate better pseudo
ground-truth bounding boxes than those of the previous iteration, recurrently improving the self-supervised training of the
detector. In the Interactive Recurrent Annotation, we tackle
the human-in-the-loop annotation scenario where the detector receives feedback from the human annotator. To this
end, we propose a novel Hierarchical Correction module,
in which the distance between annotated frames is halved
at each time step, to exploit the strength of CNNs on neighboring frames. Experimental results on various video datasets
demonstrate the advantages of the proposed framework in
generating high-quality annotations while reducing annotation time and human labor costs.
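
As an illustration of the Hierarchical Correction idea, the following minimal Python sketch lists which frames an annotator might review at each time step if the frame-distance is halved from one step to the next. This is one reading of the abstract, not the authors' implementation: the function name `hierarchical_schedule`, its parameters, and the choice of integer halving are assumptions made for this example, and the detector update that would happen between steps is not modeled.

```python
def hierarchical_schedule(num_frames, initial_distance):
    """Return a list of (distance, frames) pairs, one per time step.

    Hypothetical helper: at each step, frames spaced `distance` apart are
    selected for human correction, skipping frames already reviewed in an
    earlier step. The distance is then halved (integer division).
    """
    if num_frames <= 0 or initial_distance <= 0:
        raise ValueError("num_frames and initial_distance must be positive")

    reviewed = set()
    schedule = []
    distance = initial_distance
    while distance >= 1:
        # Frames on the current grid that have not been reviewed yet.
        frames = [i for i in range(0, num_frames, distance) if i not in reviewed]
        if frames:
            schedule.append((distance, frames))
        reviewed.update(frames)
        distance //= 2  # assumed halving rule; stops once distance reaches 0
    return schedule


if __name__ == "__main__":
    for distance, frames in hierarchical_schedule(num_frames=9, initial_distance=4):
        print(f"distance={distance}: review frames {frames}")
```

In a full pipeline, the detector would presumably be fine-tuned on the corrected frames after each step, so that its predictions on the in-between frames need fewer corrections at the next, denser step.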