Abstract: RGB-T object detection for autonomous driving has attracted increasing research attention in recent years. Nevertheless, several problems limit the performance of RGB-T fusion perception. First, although illumination awareness is a mature technique for guiding the fusion process, the outputs of previous methods lack sufficient semantic detail. Moreover, the RGB and thermal modalities suffer from a severe imbalance problem, which makes it difficult to generate a well-performing fused modality. To address these problems, a novel RGB-T detection network is proposed. First, a semantic illumination extraction module is proposed to produce informative illumination fusion weights, comprising global-level and semantic-level weights that guide the fusion process. Second, a multi-modal illumination-guided deformable transformer module is designed to aggregate the RGB and thermal modalities. This module performs fusion-modality query initialization to obtain well-performing fusion queries, and then applies illumination-guided multi-modal aggregation to refine the fusion queries with single-modality features. The proposed network has been evaluated on the RGB-T object detection datasets KAIST and $\mathbf{M^{3}FD}$, achieving 7.44% $\mathbf{MR^{-2}}$ on KAIST and 80.0% mAP on $\mathbf{M^{3}FD}$, respectively. In addition, a real-time test is conducted to evaluate the network's practical feasibility. The experimental results demonstrate the superior performance of the proposed method.
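To make the illumination-guided fusion idea concrete, below is a minimal PyTorch sketch of the general scheme the abstract describes: a small network predicts a global-level scalar weight and a semantic-level (per-pixel) weight map from the RGB image, and these weights gate the blending of RGB and thermal feature maps. All module names, shapes, and layer choices here are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative sketch only: the paper's semantic illumination extraction
# module is not specified here; this shows the generic pattern of predicting
# global- and semantic-level fusion weights and using them to blend modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IlluminationWeightNet(nn.Module):
    """Predicts a global scalar weight and a per-pixel (semantic-level) weight map."""

    def __init__(self, in_channels: int = 3, feat_channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Global-level weight: one scalar in (0, 1) per image.
        self.global_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_channels, 1), nn.Sigmoid(),
        )
        # Semantic-level weight: a spatial map in (0, 1).
        self.semantic_head = nn.Sequential(
            nn.Conv2d(feat_channels, 1, 1), nn.Sigmoid(),
        )

    def forward(self, rgb_image: torch.Tensor):
        feat = self.encoder(rgb_image)
        w_global = self.global_head(feat)       # (B, 1)
        w_semantic = self.semantic_head(feat)   # (B, 1, H/4, W/4)
        return w_global, w_semantic


def illumination_guided_fusion(rgb_feat, thermal_feat, w_global, w_semantic):
    """Blend RGB and thermal features using global and per-pixel illumination weights."""
    w_semantic = F.interpolate(
        w_semantic, size=rgb_feat.shape[-2:], mode="bilinear", align_corners=False
    )
    w = w_global.view(-1, 1, 1, 1) * w_semantic  # combined fusion weight
    return w * rgb_feat + (1.0 - w) * thermal_feat


if __name__ == "__main__":
    net = IlluminationWeightNet()
    rgb = torch.randn(2, 3, 128, 160)            # toy RGB input
    rgb_feat = torch.randn(2, 64, 32, 40)        # toy backbone features
    thermal_feat = torch.randn(2, 64, 32, 40)
    w_g, w_s = net(rgb)
    fused = illumination_guided_fusion(rgb_feat, thermal_feat, w_g, w_s)
    print(fused.shape)                           # torch.Size([2, 64, 32, 40])
```

In this sketch, the fused features could then initialize the fusion-modality queries of a deformable-transformer decoder; the query initialization and aggregation steps themselves are beyond what the abstract specifies.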
DOI: 10.1109/LRA.2025.3615529