Weakly Aligned Multi-spectral Pedestrian Detection via Cross-Modality Differential Enhancement and Multi-scale Spatial Alignment

Published: 2024, Last Modified: 22 Jan 2026 · ICPR (30) 2024 · CC BY-SA 4.0
Abstract: Multi-spectral pedestrian detection has attracted extensive attention in recent years. In particular, the combination of RGB and thermal infrared images enables around-the-clock applications, even under poor illumination conditions. However, RGB and thermal infrared (RGB-T) image pairs are often not well aligned, which degrades the accuracy of pedestrian detection. To this end, this paper proposes a Multi-scale Alignment and Differential Enhancement Network (MADENet) for multi-spectral pedestrian detection, consisting of a Cross-Modality Differential Enhancement Module (CDEM) and a Multi-scale Spatial Alignment Module (MSAM). The CDEM is embedded in the backbone to suppress redundant features and extract complementary information between modalities, and the MSAM aligns the RGB-T features by transforming the thermal features using the RGB features as the reference. The proposed network is evaluated on the public KAIST dataset across different scenarios. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods, reaching a miss rate of 8.01% on the full test set.
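The cross-modality differential enhancement idea can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the RGB-thermal feature difference is used as a gate that injects into each modality the complementary information the other one carries. A real CDEM would operate on convolutional feature maps with learned weights; here plain per-element Python stands in for that.

```python
import math

def sigmoid(x):
    # Standard logistic function used as a soft gate.
    return 1.0 / (1.0 + math.exp(-x))

def differential_enhance(f_rgb, f_t):
    """Hypothetical sketch of cross-modality differential enhancement.

    The RGB-thermal difference gates how much complementary information
    flows from one modality into the other. Names and the gating form
    are illustrative assumptions, not the paper's exact CDEM design.
    """
    out_rgb, out_t = [], []
    for r, t in zip(f_rgb, f_t):
        d = r - t                       # differential feature
        g = sigmoid(d)                  # gate derived from the difference
        out_rgb.append(r + g * t)       # enrich RGB with thermal cues
        out_t.append(t + (1.0 - g) * r) # enrich thermal with RGB cues
    return out_rgb, out_t

# Toy 3-element "feature vectors" for the two modalities.
rgb = [0.9, 0.1, 0.5]
thermal = [0.2, 0.8, 0.5]
enh_rgb, enh_t = differential_enhance(rgb, thermal)
```

With non-negative inputs, each enhanced feature is at least as large as its original, since the gated complementary term only adds information; where the two modalities agree (equal values), the gate settles at 0.5 and both sides receive a symmetric boost.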