Video-based Visible-Infrared Person Re-Identification via Style Disturbance Defense and Dual Interaction

Abstract: Video-based visible-infrared person re-identification (VVI-ReID) aims to retrieve video sequences of the same pedestrian across different modalities. The key to VVI-ReID is learning discriminative sequence-level representations that are invariant to both intra- and inter-modal discrepancies. However, most existing works focus only on eliminating the modality gap while ignoring distractors within each modality. Moreover, existing sequence-level representation learning approaches are limited to a single video and fail to mine the correlations among multiple videos of the same pedestrian. In this paper, we propose a Style Augmentation, Attack and Defense network with Graph-based dual interaction (SAADG) to guarantee semantic consistency against both intra-modal discrepancies and the inter-modal gap. Specifically, we first generate diverse styles for video frames via random style variation in image space. Through the subsequent style attack and defense, the intra- and inter-modal discrepancies are modeled as different types of style disturbance (attacks), and our model learns to keep the identity-related content invariant under such attacks. In addition, a graph-based dual interaction module is introduced to fully explore the cross-view and cross-modal correlations among the various videos of the same identity, which are then transferred to the sequence-level representations. Extensive experiments on the public SYSU-MM01 and HITSZ-VCM datasets show that our approach achieves remarkable performance compared with state-of-the-art methods. The code is available at https://github.com/ChuhaoZhou99/SAADG_VVIReID.
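The "random style variation" step can be illustrated by perturbing the per-channel style statistics (mean and standard deviation) of each frame, in the spirit of instance-normalization-based style augmentation. This is a minimal sketch of the general idea, not the authors' exact implementation; the jitter scale `sigma` and the function name are assumptions for illustration.

```python
import numpy as np

def random_style_variation(frames, sigma=0.1, rng=None):
    """Randomly jitter per-channel style statistics of video frames.

    frames: float array of shape (T, H, W, C) -- T frames of a sequence.
    Style is taken as the per-frame, per-channel mean and std over (H, W);
    content is the normalized residual. Only the style is perturbed, so
    identity-related content is preserved under the disturbance.
    NOTE: illustrative sketch; `sigma` is an assumed hyperparameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = frames.mean(axis=(1, 2), keepdims=True)            # style mean, (T, 1, 1, C)
    std = frames.std(axis=(1, 2), keepdims=True) + 1e-6     # style std,  (T, 1, 1, C)
    content = (frames - mu) / std                           # style-free content
    # Multiplicative Gaussian jitter on the style statistics.
    alpha = 1.0 + rng.normal(0.0, sigma, size=mu.shape)
    beta = 1.0 + rng.normal(0.0, sigma, size=std.shape)
    return content * (std * beta) + mu * alpha              # re-stylized frames
```

With `sigma=0` the function reduces to the identity, which makes it easy to check that only the style statistics, not the content, are being altered.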