A three-stream fusion and self-differential attention network for multi-modal crowd counting

Published: 01 Jan 2024, Last Modified: 27 Sept 2024. Pattern Recognit. Lett. 2024. License: CC BY-SA 4.0
Abstract: Highlights
• We propose a novel multi-modal crowd counting model to address the problems of information fusion and scale variation.
• The model uses a three-stream fusion encoder with IIM to fuse modality-paired and modality-specific features.
• The model adaptively integrates multi-scale features through SDAM to emphasize discriminative scale information.
• Our method outperforms its counterparts and performs consistently well in both daytime and nighttime conditions.