Keywords: Vision-language models, Small Language Model, multi-modal intelligence, safety-critical systems, prompt-based inference, edge AI, wildfire smoke detection
TL;DR: This paper investigates inference-time prompt design as a lightweight approach to control detection behavior in a small vision-language model in a zero-shot setting without any fine-tuning.
Abstract: Early wildfire smoke detection is a crucial safety task that allows for timely intervention before small-scale ignitions become major catastrophes. Current wildfire detection systems tend to depend on computationally demanding models, fine-tuning processes, or fixed inference behavior, which limits their adaptability and deployment on limited-capacity edge platforms.
This paper investigates inference-time prompt design as a lightweight approach to control detection behavior in a small vision-language model in a zero-shot setting without any fine-tuning. Using a compact 4B-parameter model, three prompt strategies (Baseline, Recall-Boost, and Balanced) are evaluated on the FIgLib-Test wildfire smoke dataset. Because only the textual prompt changes while all other inference conditions stay identical, the approach gives explicit control over the safety-oriented trade-off between missed smoke detections and false alarms.
Quantitative results and qualitative visual analysis show that the recall-oriented and balanced prompt strategies significantly reduce missed smoke detections while maintaining practical operational behavior. The findings highlight prompt-controlled inference as an efficient and flexible solution for safety-critical multi-modal perception in real-world wildfire monitoring.
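To make the trade-off between missed detections and false alarms concrete, the following is a minimal sketch of how per-prompt outputs could be scored. It is illustrative only: the prompt wordings in `PROMPTS`, the yes/no answer format, and the helper names `parse_answer`, `tally`, `miss_rate`, and `false_alarm_rate` are assumptions made for this example and are not taken from the paper. The toy inputs at the end are not experimental results.

```python
from dataclasses import dataclass

# Hypothetical prompt wordings for the three strategies (assumed, not the
# paper's actual prompts). Only the text differs between strategies.
PROMPTS = {
    "Baseline": "Is wildfire smoke visible in this image? Answer yes or no.",
    "Recall-Boost": (
        "Missing smoke is far more costly than a false alarm. "
        "If there is any sign of smoke, answer yes. Answer yes or no."
    ),
    "Balanced": (
        "Answer yes only if smoke is reasonably likely, but do not ignore "
        "faint plumes. Answer yes or no."
    ),
}


@dataclass
class Counts:
    """Confusion-matrix counts for a binary smoke / no-smoke decision."""
    tp: int = 0  # smoke present, model said yes
    fp: int = 0  # no smoke, model said yes (false alarm)
    fn: int = 0  # smoke present, model said no (missed detection)
    tn: int = 0  # no smoke, model said no


def parse_answer(text: str) -> bool:
    """Map a free-text model reply to a boolean (assumes a yes/no format)."""
    return text.strip().lower().startswith("yes")


def tally(predictions, labels) -> Counts:
    """Accumulate confusion counts from paired predictions and ground truth."""
    counts = Counts()
    for pred, label in zip(predictions, labels):
        if pred and label:
            counts.tp += 1
        elif pred and not label:
            counts.fp += 1
        elif not pred and label:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def miss_rate(c: Counts) -> float:
    """Fraction of smoke images the model missed: FN / (TP + FN)."""
    positives = c.tp + c.fn
    return c.fn / positives if positives else 0.0


def false_alarm_rate(c: Counts) -> float:
    """Fraction of smoke-free images flagged as smoke: FP / (FP + TN)."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


# Toy example with made-up replies and labels, purely to show the mechanics.
replies = ["Yes, a plume is visible.", "No.", "yes", "no smoke"]
labels = [True, True, False, False]
counts = tally([parse_answer(r) for r in replies], labels)
print(miss_rate(counts), false_alarm_rate(counts))
```

Under these assumptions, running the same images through each entry of `PROMPTS` and comparing `miss_rate` against `false_alarm_rate` would expose the safety-oriented trade-off that each prompt strategy selects.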
Submission Number: 2