Abstract: In Internet of Things (IoT)-driven intelligent perception systems, multimodal crowd counting with visible-thermal (RGB-T) sensor arrays deployed on edge nodes plays a vital role in urban management. Existing methods focus primarily on inter-modal feature fusion but suffer from noise amplification when thermal features are integrated directly with degraded visible images under low-light conditions, leading to inaccurate density estimation and poor nighttime performance. To address this, we propose the first domain-adaptive network for RGB-T crowd counting. First, we introduce a dark-light domain adaptation module based on Retinex decomposition. Unlike traditional approaches that extract only bright- and thermal-domain features, our module employs a dual decomposition-recomposition process to learn illumination-invariant features from RGB images, enhancing discriminability in dark regions while preserving bright-domain semantics. Second, the module is supervised by feature alignment, reflectance consistency, and decomposition invariance losses, strengthening its robustness under extreme illumination degradation. Finally, a multi-domain progressive decision fusion module leverages shared and domain-specific features through public feature extraction, dual-domain fusion, and weighted decision processes, improving generalization across varying lighting conditions. Trained solely on well-lit data, our method generalizes effectively to dark scenarios. Experiments on the DroneRGBT and RGBTCC datasets demonstrate superior performance over state-of-the-art fusion methods, particularly in low-light crowd counting.
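The abstract does not give the exact architecture or loss formulations, but the Retinex model it builds on factors an image into reflectance and illumination, I = R * L, with R treated as illumination-invariant. The following PyTorch sketch shows one plausible wiring of the dual decomposition-recomposition idea with the three named losses; the module names (RetinexDecomposer, domain_adaptation_losses), the gamma-based synthetic darkening, and the specific L1 loss forms are all assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetinexDecomposer(nn.Module):
    """Hypothetical decomposition net: splits an RGB image into
    reflectance R (3 ch, illumination-invariant) and illumination L
    (1 ch), so that the Retinex model I ~= R * L holds."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.to_R = nn.Conv2d(ch, 3, 3, padding=1)
        self.to_L = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, img):
        feat = self.encoder(img)
        return torch.sigmoid(self.to_R(feat)), torch.sigmoid(self.to_L(feat)), feat

def domain_adaptation_losses(net, bright):
    """Dual decomposition-recomposition on a well-lit image and a
    synthetically darkened copy. A gamma curve stands in for real
    low-light degradation; the paper's darkening strategy is not
    specified in the abstract."""
    dark = bright.clamp(min=1e-3) ** 2.2          # assumed degradation model
    R_b, L_b, f_b = net(bright)
    R_d, L_d, f_d = net(dark)
    # Recomposition: each (R, L) pair must reproduce its own input.
    recon = F.l1_loss(R_b * L_b, bright) + F.l1_loss(R_d * L_d, dark)
    # Feature alignment: dark-domain encoder features should match
    # the (detached) bright-domain features.
    align = F.l1_loss(f_d, f_b.detach())
    # Reflectance consistency: reflectance must not depend on lighting.
    consist = F.l1_loss(R_d, R_b)
    # Decomposition invariance: swapping illuminations across the two
    # decompositions must still reconstruct the counterpart images.
    invar = F.l1_loss(R_d * L_b, bright) + F.l1_loss(R_b * L_d, dark)
    return recon + align + consist + invar

net = RetinexDecomposer()
x = torch.rand(2, 3, 64, 64)   # toy batch of well-lit crops
domain_adaptation_losses(net, x).backward()
```

Under this reading, the reflectance branch supplies the illumination-invariant RGB features that the downstream fusion module would combine with thermal features, which is why the consistency and invariance terms constrain R rather than L.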
External IDs: dblp:journals/iotj/NiuPXYTL25