STAR: A Benchmark for Situated Reasoning in Real-World Videos

Bo Wu; Shoubin Yu; Zhenfang Chen; Joshua B. Tenenbaum; Chuang Gan

Published: 11 Oct 2021, Last Modified: 23 May 2023

Venue: NeurIPS 2021 Datasets and Benchmarks Track (Round 2)

Readers: Everyone

Keywords: Situated Reasoning, Visual Reasoning, Action, Benchmark, Question Answering

Abstract: Reasoning in the real world is not divorced from situations. Capturing the knowledge present in surrounding situations and reasoning accordingly is crucial and challenging for machine intelligence. This paper introduces a new benchmark, Situated Reasoning in Real-World Videos (STAR), that evaluates situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos. The benchmark is built upon real-world videos associated with human actions or interactions, which are naturally dynamic, compositional, and logical. The dataset includes four types of questions: interaction, sequence, prediction, and feasibility. We represent the situations in real-world videos by hyper-graphs connecting extracted atomic entities and relations (e.g., actions, persons, objects, and relationships). Besides visual perception, situated reasoning also requires structured situation comprehension and logical reasoning. Questions and answers are procedurally generated. The answering logic of each question is represented by a functional program based on a situation hyper-graph. We compare various existing video reasoning models and find that they all struggle on this challenging situated reasoning task. We further propose a diagnostic neuro-symbolic model that disentangles visual perception, situation abstraction, language understanding, and functional reasoning to understand the challenges of this benchmark.
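
Illustrative sketch (not from the paper or its code repository): the abstract describes situations as hyper-graphs over entities and relations, with each question answered by a functional program executed on that graph. The toy Python below shows one way such a pipeline could look. The `Situation` schema, the helper names `filter_by_action` and `query_objects`, and the action labels are all hypothetical and chosen only for illustration; they do not reflect the actual STAR annotation format or program vocabulary.

```python
from dataclasses import dataclass


# Hypothetical toy schema, NOT the STAR annotation format.
@dataclass(frozen=True)
class Situation:
    """One frame-level situation in a video."""
    frame: int
    relations: frozenset  # (subject, relation, object) triplets
    actions: frozenset    # action labels, acting as hyperedges over the triplets


def filter_by_action(situations, action):
    """Keep only the situations that the given action covers."""
    return [s for s in situations if action in s.actions]


def query_objects(situations, subject, relation):
    """Collect every object linked to `subject` by `relation`."""
    objects = set()
    for s in situations:
        for subj, rel, obj in s.relations:
            if subj == subject and rel == relation:
                objects.add(obj)
    return objects


# A made-up three-frame video.
video = [
    Situation(0, frozenset({("person", "hold", "cup")}),
              frozenset({"drink_from_cup"})),
    Situation(1, frozenset({("person", "hold", "cup"),
                            ("person", "sit_on", "sofa")}),
              frozenset({"drink_from_cup", "sit_on_sofa"})),
    Situation(2, frozenset({("person", "hold", "book")}),
              frozenset({"open_book"})),
]

# "Which object did the person hold while drinking?" as a two-step program:
# first restrict the graph to the action, then query the relation.
answer = query_objects(filter_by_action(video, "drink_from_cup"), "person", "hold")
print(answer)
```

Composing small, typed operations in this way keeps the answering logic inspectable step by step, which is the property a diagnostic model would rely on; the real benchmark's programs may be structured differently.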

Supplementary Material: pdf

URL: http://star.csail.mit.edu or https://bobbywu.com/STAR

TL;DR: STAR is a novel benchmark for Situated Reasoning in real-world videos, which provides challenging question-answering tasks, structured situation abstraction and logic-grounded programs.

Contribution Process Agreement: Yes

Dataset Url: STAR Benchmark: http://star.csail.mit.edu, Code Repository: https://github.com/csbobby/STAR_Benchmark

License: Dataset and Code License: https://github.com/csbobby/STAR_Benchmark/blob/main/LICENSE

Author Statement: Yes
