AI-Boosted Video Annotation: Exploring Pre-Labeling with Cross-Modalities

Juan Gutiérrez-Navarro, Ángel Mora-Sánchez, Silvia Rodríguez-Jiménez, J.L. Blanco-Murillo

Published: 01 Jan 2025, Last Modified: 12 Nov 2025, Crossref, CC BY-SA 4.0

Abstract: Annotating large-scale video datasets requires significant resources. To make this process more efficient, we propose using pre-trained cross-modal models within the Human-in-the-Loop (HITL) paradigm. We use a synthetic video dataset to generate precise semantic annotations and to assess how effectively different label representations capture visual information across diverse vision tasks, both fine- and coarse-grained. We also introduce a framework for automating pre-annotation extraction from semantically similar frames. Our approach offers a promising route to efficient video annotation, which is crucial for developing robust Machine Learning (ML) systems.

External IDs: doi:10.1007/978-3-031-80946-0_1