Abstract: Inspired by how humans perceive and interpret the world using multiple senses, multi-modal learning integrates information from multiple modalities to improve understanding and performance across various tasks. Aligned with that notion, our key intuition is to leverage multi-modal learning to address the domain shift problem in night-time pedestrian detection. In this paper, we show that pairing RGB and infrared (IR) image features increases the robustness of pedestrian detection at night. This solution is unbiased towards any specific time of day, as the IR domain reduces reliance on lighting and provides information complementary to the RGB domain. Our work exploits attention mechanisms to guide a multi-modal framework in fusing features from the RGB and IR modalities. Our novel fusion approach, named dual attentive feature fusion (DaFF), leverages the duality of transformer and channel-wise global attention. To demonstrate the effectiveness of DaFF, we conducted experiments on two real-world multispectral pedestrian datasets. Extensive experimental results demonstrate the superiority of DaFF. We believe that combining the complementary properties of the RGB and IR modalities is an effective remedy for the domain shift problem in pedestrian detection.
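To make the fusion idea concrete, the following is a minimal PyTorch sketch of a dual-attention RGB/IR fusion block. The abstract does not specify DaFF's internal design, so the module names, dimensions, and the exact way the channel-wise and transformer attentions are combined here are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch: channel-wise global attention per modality, followed by
# cross-modal transformer attention between RGB and IR feature tokens.
import torch
import torch.nn as nn


class ChannelGlobalAttention(nn.Module):
    """Squeeze-and-excitation style channel-wise global attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(self.pool(x).flatten(1))    # (B, C) channel weights
        return x * w.view(x.size(0), -1, 1, 1)


class DualAttentiveFusion(nn.Module):
    """Fuse RGB and IR feature maps with channel attention + cross-modal attention."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.ca_rgb = ChannelGlobalAttention(channels)
        self.ca_ir = ChannelGlobalAttention(channels)
        self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_ir):             # both: (B, C, H, W)
        b, c, h, w = f_rgb.shape
        f_rgb = self.ca_rgb(f_rgb)
        f_ir = self.ca_ir(f_ir)
        # Flatten the spatial grid into tokens for transformer attention.
        t_rgb = f_rgb.flatten(2).transpose(1, 2)   # (B, H*W, C)
        t_ir = f_ir.flatten(2).transpose(1, 2)
        # RGB tokens attend to IR tokens (cross-modal attention).
        attended, _ = self.cross_attn(t_rgb, t_ir, t_ir)
        attended = attended.transpose(1, 2).view(b, c, h, w)
        # Combine the cross-attended RGB stream with the IR stream.
        return self.out(torch.cat([attended, f_ir], dim=1))


# Usage: fuse two 256-channel backbone feature maps from paired RGB and IR inputs.
fusion = DualAttentiveFusion(channels=256)
fused = fusion(torch.randn(2, 256, 32, 40), torch.randn(2, 256, 32, 40))
print(fused.shape)  # torch.Size([2, 256, 32, 40])
```

In this sketch the channel attention re-weights each modality independently before the cross-modal attention aligns RGB tokens with IR tokens; the fused map keeps the backbone's channel count so it can drop into a standard detection head.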