Generating reliable video annotations by exploiting the crowd

Roberto Di Salvo, Concetto Spampinato, Daniela Giordano

Published: 2016, Last Modified: 20 Jul 2025, WACV 2016, CC BY-SA 4.0

Abstract: In computer vision and machine learning, the availability of annotated datasets is of crucial importance for both learning and performance evaluation. However, annotating visual datasets is a tedious and error-prone task, and computer vision researchers usually dedicate a large amount of their time to collecting and generating annotations, which most of the time cannot be reused in other scenarios. In this paper, we propose a simple but effective interactive video object segmentation method that exploits large amounts of noisy data gathered from a crowd of users playing a web game. Experimental results on two challenging video benchmarks show that it is possible to generate reliable object segmentations in videos with little human effort, achieving accuracy comparable to that obtained with manually labeled annotations and outperforming state-of-the-art video object segmentation approaches.