An Integrated YOLO and VLM System for Fire Detection in Enclosed Environments

Published: 05 Mar 2025, Last Modified: 21 Mar 2025, Venue: ICLR 2025 Workshop ICBINB, License: CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: car park fire detection, YOLO model, Vision-Language-Model, End-to-End Framework
TL;DR: We fine-tune YOLO for fire detection in parking environments but, because of data limitations and real-world constraints, propose an end-to-end framework that combines a VLM with the YOLO model.
Abstract: While YOLO models show promise in car fire detection, they remain insufficient for real-world deployment in confined parking environments due to dataset limitations, evaluation gaps, and deployment constraints. We first fine-tune YOLO on a fire/smoke-augmented dataset, but analysis reveals that it struggles with ambiguous fire-smoke boundaries, leading to false predictions. To address this, we propose a real-time end-to-end framework that integrates YOLOv8s with the Florence2 VLM, combining object detection with contextual reasoning. Although pairing YOLOv8s with the VLM improves detection reliability, challenges remain. Our findings highlight YOLO’s limitations in fire detection and the need for a more adaptive, environment-aware approach.
Supplementary Material: pdf
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 16
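
The abstract does not say how the detector and the VLM are coupled, so the following is only a minimal sketch of one plausible arrangement: a detector proposes fire/smoke boxes, confident ones are accepted directly, and ambiguous ones are passed to a VLM-style verifier. The names `Detection`, `cascade`, `detect`, `verify`, and the thresholds `accept_at` and `review_at` are all hypothetical and are not taken from the paper; `detect` and `verify` stand in for YOLOv8s and Florence2 wrappers that a reader would have to supply.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Detection:
    label: str                      # e.g. "fire" or "smoke"
    confidence: float               # detector score in [0, 1]
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def cascade(
    frame: Any,
    detect: Callable[[Any], List[Detection]],
    verify: Callable[[Any, Detection], bool],
    accept_at: float = 0.8,   # assumed threshold: accept without the VLM
    review_at: float = 0.3,   # assumed threshold: below this, discard
) -> List[Detection]:
    """Return detections that are either confident or VLM-confirmed."""
    confirmed = []
    for det in detect(frame):
        if det.confidence >= accept_at:
            # Confident detection: skip the slower verifier.
            confirmed.append(det)
        elif det.confidence >= review_at and verify(frame, det):
            # Ambiguous detection: keep only if the verifier agrees.
            confirmed.append(det)
    return confirmed
```

Routing only the ambiguous band to the verifier is a design choice made here to limit how often the slower model runs; the paper may combine the two models differently.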