DFT-Net: A Bimodal Object Detection Algorithm for Complex Traffic Environments

Jing Lian, Yibin Zhang, Haoyu Li, Jun Hu, Linhui Li

Published: 2024, Last Modified: 08 Nov 2025, INDIN 2024, CC BY-SA 4.0

Abstract: Single-modal sensors struggle to detect objects rapidly and accurately in complex traffic environments. To address this, the paper proposes DFT-Net, a detection algorithm that fuses visible-light and infrared bimodal features. First, the algorithm constructs a dual-modal feature extraction network and designs an efficient aggregation module with stronger feature extraction capability. Second, it uses a framework of self-attention and cross-attention mechanisms to fuse the visible-light and infrared features; this framework enables information interaction between the two modalities through query-guided mechanisms. A spatial feature compression module is introduced to reduce the computational load of the dual-modal feature transformer module. Finally, to mitigate noise in the visible modality and the low resolution of the infrared modality, the algorithm combines a Transformer with the CBAM attention mechanism. DFT-Net is compared with various detection algorithms on multiple datasets, and the results demonstrate its effectiveness. On the FLIR-gan dataset, DFT-Net improves accuracy by 26.8% and 26.7% over the YOLOv7 detection algorithm on visible-light and infrared inputs, respectively, and by 2.8% over the bimodal detection algorithm CFT. This indicates that the algorithm performs well in complex traffic environments.
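
The fusion idea described in the abstract (compress the spatial tokens, then let each modality query the other through cross-attention) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the abstract gives no layer details, so the example assumes identity query/key/value projections, uses 1D average pooling as a stand-in for the spatial feature compression module, and adds a plain residual connection. All function names (`avg_pool_tokens`, `cross_attention`, `fuse`) are hypothetical.

```python
import math


def avg_pool_tokens(tokens, factor):
    """Compress a token sequence by averaging each run of `factor` tokens.

    Assumption: a stand-in for the paper's spatial feature compression.
    """
    pooled = []
    for start in range(0, len(tokens), factor):
        group = tokens[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Assumption: identity projections, i.e. no learned weight matrices.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def fuse(visible, infrared, factor=2):
    """Bidirectional cross-attention on compressed tokens, with a residual add."""
    vis_c = avg_pool_tokens(visible, factor)
    ir_c = avg_pool_tokens(infrared, factor)
    # Visible tokens query the infrared tokens, and vice versa.
    vis_from_ir = cross_attention(vis_c, ir_c, ir_c)
    ir_from_vis = cross_attention(ir_c, vis_c, vis_c)
    vis_out = [[a + b for a, b in zip(x, y)] for x, y in zip(vis_c, vis_from_ir)]
    ir_out = [[a + b for a, b in zip(x, y)] for x, y in zip(ir_c, ir_from_vis)]
    return vis_out, ir_out


# Toy inputs: 4 tokens of dimension 2 per modality (made-up values).
visible = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
infrared = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
vis_fused, ir_fused = fuse(visible, infrared, factor=2)
```

Pooling before attention is what reduces cost in this sketch: attention scales with the product of the query and key counts, so halving both token sequences cuts the number of score computations to roughly a quarter.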

External IDs: dblp:conf/indin/LianZLHL24