Boosting Visible-Infrared Image Fusion and Target Detection Performance with Sleep-Wake Joint Learning
Abstract: In this article, we combine the low-level vision task of visible-infrared image fusion (VIIF) with the high-level vision task of target detection (TD) in a unified framework for the surveillance of complex airport scenes. The proposed model integrates complementary detail from multiple source images to improve TD performance at the framework's backend, while semantic cues from the TD task guide the fusion process at the frontend. Central to our approach is a feature extraction module that exploits dilated convolutions and dense connections to strengthen feature representation. Inspired by the top-down attention mechanisms of biological vision systems, we further devise an attention-based multiscale fusion strategy. In addition, drawing on the "sleep-wake" mechanism observed in animal brains, we enable end-to-end training of both tasks despite the inherent discrepancy between their objectives: the TD loss is strategically employed to guide the VIIF process, improving fusion quality while simultaneously raising TD performance. Comprehensive evaluations covering human psychophysical data, object detection performance, image quality metrics, and training and testing efficiency, conducted on the public M3FD and FLIR datasets as well as a complex Airport scene dataset captured by our research group under five imaging conditions, show that the proposed method outperforms prevailing state-of-the-art (SOTA) approaches in both the VIIF and TD tasks. The Airport dataset and code will be made available at https://github.com/rwerwer2024/Sleep-Wake-Joint-Fusion-main.
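The abstract's feature extraction module relies on dilated convolutions, which enlarge the receptive field without adding parameters. As a generic 1-D illustration of that operation (a didactic sketch, not the authors' implementation; the function name and signature are our own), the dilation factor inserts gaps between kernel taps:

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Valid 1-D convolution (correlation form) with a dilation factor.

    With dilation d, a kernel of length k covers an effective receptive
    field of (k - 1) * d + 1 input samples, so larger d sees a wider
    context at the same parameter count.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]

# A 3-tap kernel with dilation 2 spans 5 input samples:
# dilated_conv1d(list(range(10)), [1, 1, 1], dilation=2)
# -> [6, 9, 12, 15, 18, 21]   (e.g. x[0] + x[2] + x[4] = 0 + 2 + 4 = 6)
```

In the paper's module, such dilated layers are combined with dense connections (each layer's output concatenated with its input along the channel axis), which is what preserves fine detail alongside the wider context.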