Keywords: 360° Video Understanding, Episodic Memory, Transformation-Invariant Retrieval, Equirectangular Projection, Disaster-Resilient AI, Safety-Critical Vision, Compositional Reasoning, Temporal Grounding, Few-Shot Adaptation, Smoke Occlusion, Multimodal Evaluation, Real-World Video Benchmarks, Vision-Language Evaluation
TL;DR: Fire360 is a benchmark of 360° firefighting videos for evaluating vision-language models under real-world degradation; its five tasks include Transformed Object Retrieval (TOR), which matches pristine objects to their fire-damaged counterparts.
Abstract: Modern AI systems struggle most in environments where reliability is critical: scenes with smoke, poor visibility, and structural deformation. Each year, tens of thousands of firefighters are injured on duty, often due to breakdowns in situational perception. We introduce Fire360, a benchmark for evaluating perception and reasoning in safety-critical firefighting scenarios. The dataset includes 228 360° videos from professional training sessions under diverse conditions (e.g., low light, thermal distortion), annotated with action segments, object locations, and degradation metadata. Fire360 supports five tasks: Visual Question Answering, Temporal Action Captioning, Object Localization, Safety-Critical Reasoning, and Transformed Object Retrieval (TOR). TOR tests whether models can match pristine exemplars to their fire-damaged counterparts in unpaired scenes, probing episodic memory under irreversible visual transformations. While human experts achieve 83.5% accuracy on TOR, models such as GPT-4o lag significantly, exposing failures in reasoning under degradation. By releasing Fire360 and its evaluation suite, we aim to advance models that not only see, but also remember, reason, and act under uncertainty. The dataset is available at https://uofi.box.com/v/fire360dataset.
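As a concrete illustration of how TOR might be scored, the sketch below implements a zero-shot retrieval baseline: embed the pristine exemplar and each candidate crop from the degraded scene, then pick the candidate with the highest cosine similarity. The file layout, field names (`exemplar`, `candidates`, `answer`), and the choice of CLIP embeddings are assumptions for illustration only, not Fire360's official evaluation protocol.

```python
# Hypothetical TOR baseline: zero-shot CLIP retrieval over candidate crops.
# Assumes each instance pairs one pristine exemplar with N candidate crops
# from a fire-damaged scene, exactly one of which is correct; this data
# schema is an assumption, not Fire360's published format.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed(images):
    """Return L2-normalized CLIP image embeddings for a list of PIL images."""
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def tor_accuracy(instances):
    """instances: dicts with 'exemplar' (image path), 'candidates'
    (list of image paths), and 'answer' (index of the true counterpart)."""
    correct = 0
    for inst in instances:
        exemplar = embed([Image.open(inst["exemplar"]).convert("RGB")])
        cands = embed([Image.open(p).convert("RGB") for p in inst["candidates"]])
        pred = (cands @ exemplar.T).argmax().item()  # top-1 by cosine similarity
        correct += int(pred == inst["answer"])
    return correct / len(instances)
```

Because fire damage is an irreversible appearance change, an appearance-only matcher like this is expected to struggle; the reported gap between human experts (83.5%) and current models suggests that temporal context and reasoning over the degraded 360° scene matter.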
Dataset URL: https://uofi.box.com/v/fire360dataset
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 617