Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: We propose fill-in-the-blank as a video understanding evaluation framework. The task tests a model's understanding of a video by requiring it to predict a masked noun phrase in the video's caption, given the video and the surrounding text. To this end, we introduce a novel dataset consisting of 28,000 videos and fill-in-the-blank tests with multiple correct answers. Both the task and the dataset are challenging for current state-of-the-art systems. The task also avoids the weaknesses of current state-of-the-art language-informed video understanding tasks, namely: (1) video question answering using multiple-choice questions, where models perform relatively well because they exploit linguistic biases in the task formulation; and (2) video captioning, which relies on an open-ended evaluation framework that is often inaccurate because system answers may be judged incorrect if they differ in form from the ground truth.