Unleashing the Power of Generic Segmentation Model: A Simple Baseline for Infrared Small Target Detection

Mingjin Zhang; Chi Zhang; Qiming Zhang; Yunsong Li; Xinbo Gao; Jing Zhang

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | Venue: MM2024 Poster | License: CC BY 4.0

Abstract: Recent advances in deep learning have substantially improved infrared small target detection (IRSTD). Despite this success, a notable gap persists between IRSTD methods and generic segmentation approaches developed for natural images. This gap arises primarily from the significant modality differences and the limited availability of infrared data. In this study, we aim to bridge the gap by investigating the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to IRSTD tasks. Our investigation reveals that many generic segmentation models can achieve performance comparable to state-of-the-art IRSTD methods, yet their full potential in IRSTD remains untapped. To address this, we propose a simple, lightweight, yet effective baseline model for segmenting small infrared targets. Through appropriate distillation strategies, we enable smaller student models to outperform state-of-the-art methods, even surpassing fine-tuned teacher results. Furthermore, we enhance the model's performance by introducing a novel query design comprising dense and sparse queries to effectively encode multi-scale features. In extensive experiments on four popular IRSTD datasets, our model achieves significantly better accuracy and throughput than existing approaches, surpassing SAM and Semantic-SAM by over 14 IoU on NUDT and 4 IoU on IRSTD1k. The source code and models will be released.

Primary Subject Area: [Experience] Multimedia Applications

Relevance To Conference: Infrared (IR) imaging can serve as a valuable input for multimodal applications because it captures the thermal radiation emitted by objects, complementing the visual images captured by cameras. Combining visual and IR images can provide a more comprehensive understanding of a scene or object, especially in scenarios where visual cues alone may be insufficient, such as detecting hidden objects or identifying targets obscured by smoke or fog. In our work, we aim to build a pioneering infrared object detection model that incorporates cross-modality priors from visible images, an area that has garnered attention within this community [1][2][3].
[1] Zhang, Mingjin, et al. "Exploring feature compensation and cross-level correlation for infrared small target detection." Proceedings of the 30th ACM International Conference on Multimedia. 2022.
[2] Wang, Zeyu, et al. "TIRDet: Mono-Modality Thermal InfraRed Object Detection Based on Prior Thermal-To-Visible Translation." Proceedings of the 31st ACM International Conference on Multimedia. 2023.

Supplementary Material: zip

Submission Number: 1235
