ACT360: An Efficient 360-Degree Action Detection and Summarization Framework for Mission-Critical Training and Debriefing
Abstract: Effective training and debriefing are critical in high-stakes, mission-critical environments such as firefighting, where precision and error minimization are paramount. Traditional post-training analysis relies on manual review of 2D video, a process that is time-consuming and lacks comprehensive situational awareness. To address these limitations, we introduce ACT360, a novel system that leverages 360-degree video and machine learning for automated action detection and efficient debriefing. ACT360 incorporates 360YOWO, a customized You Only Watch Once (YOWO) model enhanced with a spatial attention mechanism and equirectangular-aware convolution (EAC) to handle the unique distortions of panoramic video data. To enable deployment in resource-constrained environments, we apply quantization and model pruning, reducing the model size by 74% while maintaining robust accuracy (an mAP drop of only 1.5%, from 0.865 to 0.850) and improving inference speed. We validate our approach on a new, publicly available dataset of 55 labeled 360-degree videos covering seven key firefighting actions, recorded across various real-world practice sessions and environmental conditions. Furthermore, we integrate the pipeline with 360AIE (Action Insight Explorer), a web-based interface that provides automatic action detection, retrieval, and textual summarization of key events using large language models (LLMs), significantly improving post-incident analysis efficiency. ACT360 serves as a generalized framework for mission-critical debriefing, incorporating techniques such as EAC, spatial attention, summarization, and model optimization. These innovations apply to any training environment that requires lightweight action detection and structured post-exercise analysis.