A Multimodal Decision-Fusion Network Approach for Activity Recognition in Firefighter Self-Contained Breathing Apparatus Endurance Training
Abstract: Insufficient training in firefighting techniques increases the risk of injuries and fatalities among firefighters. Human activity recognition methods show promising potential for performance monitoring and evaluation. However, existing studies focus mainly on individual modalities, which limits their ability to distinguish intricate tasks such as those encountered in firefighting operations. This study introduces a multimodal decision-fusion network designed to overcome this limitation by integrating vision data from three distinct cameras with sensor data from four wearable devices. The proposed network combines a vision-focused Video Swin network with a sensor-driven Sensor Transformer network; the results show that vision-based methods alone are insufficient to accurately classify firefighting training activities. The proposed decision-fusion network improves classification with a mean F1-score of 95.73%, outperforming the existing hybrid machine learning network.
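The decision-fusion idea described in the abstract can be sketched as combining the class-probability outputs of the two branches. The weighting scheme, class count, and function names below are illustrative assumptions; the letter does not specify how the Video Swin and Sensor Transformer outputs are combined.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decision_fusion(vision_logits, sensor_logits, w_vision=0.5):
    # Weighted average of per-branch class probabilities (decision-level
    # fusion). w_vision is a hypothetical parameter; the actual fusion
    # rule used in the paper is not given in the abstract.
    p_vision = softmax(vision_logits)
    p_sensor = softmax(sensor_logits)
    fused = w_vision * p_vision + (1.0 - w_vision) * p_sensor
    return fused.argmax(axis=-1), fused

# Hypothetical logits for 2 clips over 4 activity classes.
vision = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.2, 0.1, 1.5,  0.0]])
sensor = np.array([[1.8, 0.2, 0.3, -0.5],
                   [0.0, 0.3, 2.2,  0.1]])
labels, probs = decision_fusion(vision, sensor)
```

Because fusion happens at the decision level, each branch can be trained and tuned independently on its own modality before the probabilities are combined.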
DOI: 10.1109/lsens.2025.3614427