Keywords: 360° Video Understanding, Episodic Memory, Transformation-Invariant Retrieval, Equirectangular Projection, Disaster-Resilient AI, Safety-Critical Vision, Compositional Reasoning, Temporal Grounding, Few-Shot Adaptation, Smoke Occlusion, Multimodal Evaluation, Real-World Video Benchmarks, Vision-Language Evaluation
TL;DR: Fire360 is a benchmark of 360° firefighting videos for evaluating vision-language models under real-world degradation; its five tasks include Transformed Object Retrieval (TOR), which matches pristine objects to their fire-damaged counterparts.
Abstract: Modern AI systems struggle most in environments where reliability is critical: scenes with smoke, poor visibility, and structural deformation. Each year, tens of thousands of firefighters are injured on duty, often due to breakdowns in situational perception. We introduce Fire360, a benchmark for evaluating perception and reasoning in safety-critical firefighting scenarios. The dataset includes 228 360° videos from professional training sessions under diverse conditions (e.g., low light, thermal distortion), annotated with action segments, object locations, and degradation metadata. Fire360 supports five tasks: Visual Question Answering, Temporal Action Captioning, Object Localization, Safety-Critical Reasoning, and Transformed Object Retrieval (TOR). TOR tests whether models can match pristine exemplars to their fire-damaged counterparts in unpaired scenes, probing episodic memory under irreversible visual transformations. While human experts achieve 83.5% accuracy on TOR, models such as GPT-4o lag significantly, exposing failures in reasoning under degradation. By releasing Fire360 and its evaluation suite, we aim to advance models that not only see, but also remember, reason, and act under uncertainty. The dataset is available at https://uofi.box.com/v/fire360dataset.
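As a concrete illustration of how TOR might be scored, the sketch below implements a zero-shot retrieval baseline: embed the pristine exemplar and each candidate crop from the degraded scene, then pick the candidate with the highest cosine similarity. The file layout, field names (`exemplar`, `candidates`, `answer`), and the choice of CLIP embeddings are assumptions for illustration only, not Fire360's official evaluation protocol.

```python
# Hypothetical TOR baseline: zero-shot CLIP retrieval over candidate crops.
# Assumes each instance pairs one pristine exemplar with N candidate crops
# from a fire-damaged scene, exactly one of which is correct; this data
# schema is an assumption, not Fire360's published format.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed(images):
    """Return L2-normalized CLIP image embeddings for a list of PIL images."""
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def tor_accuracy(instances):
    """instances: dicts with 'exemplar' (image path), 'candidates'
    (list of image paths), and 'answer' (index of the true counterpart)."""
    correct = 0
    for inst in instances:
        exemplar = embed([Image.open(inst["exemplar"]).convert("RGB")])
        cands = embed([Image.open(p).convert("RGB") for p in inst["candidates"]])
        pred = (cands @ exemplar.T).argmax().item()  # top-1 by cosine similarity
        correct += int(pred == inst["answer"])
    return correct / len(instances)
```

Because fire damage is an irreversible appearance change, an appearance-only matcher like this is expected to struggle; the reported gap between human experts (83.5%) and current models suggests that temporal context and reasoning over the degraded 360° scene matter.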
Dataset URL: https://uofi.box.com/v/fire360dataset
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 617