GM-DETR: Generalized Multispectral DEtection TRansformer with Efficient Fusion Encoder for Visible-Infrared Detection

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, CVPR Workshops 2024, CC BY-SA 4.0
Abstract: Multispectral object detection based on RGB and IR achieves more accurate and robust performance by integrating complementary information from the two modalities. However, existing methods focus predominantly on fusing information from both modalities to enhance detection performance; they rarely study the diversified utilization of RGB and IR data or explore the adaptability of the model to practical application scenarios. We first analyze the utilization of datasets for multispectral object detection and compare their testing performance. To better leverage datasets and address more generalized model application scenarios, we propose a Generalized Multispectral DEtection TRansformer (GM-DETR) with a two-stage training strategy. Specifically, we design the Modality-Specific Feature Interaction (MSFI) module to extract high-level information from RGB and IR, and propose the Cross-Modality-Scale feature Fusion (CMSF) module, which performs multi-scale cross-modality fusion of the RGB and IR features. Our GM-DETR achieves state-of-the-art performance on the FLIR and LLVIP benchmark datasets.