Abstract: Highlights
• A light cross-modal attention network is built to complement heterogeneous features.
• A RAMV-Softmax is proposed to promote the effect of Π-Net pre-training.
• A lightweight cross-modal pooling attention is designed to align multimodal features.
• We first establish a streamer re-ID pipeline based on multimodal deep learning.
• We collect a real-world dataset StreamerReID for re-ID evaluation.