Abstract: Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem, whose two main difficulties are intra-class differences and cross-modality differences between visible and infrared images. To address these issues, many state-of-the-art methods attempt to learn coarse image alignment or part-level person features; however, these approaches are often limited by intra-identity variation, and the image alignment they produce is not always reliable. In this paper, to overcome these two shortcomings, we construct a relational alignment and distance optimization network (RADONet). Firstly, we design a cross-modal relational alignment (CM-RA) module that exploits the correspondence between cross-modal images to handle cross-modal differences at the pixel level. Secondly, we propose a cross-modal Wasserstein distance (CM-WD) to mitigate the effects of intra-identity variation during modality alignment. In this way, our network overcomes the effects of identity variation by focusing on reducing inter-modal differences and performing more effective feature alignment. Extensive experiments show that our method outperforms state-of-the-art methods on two challenging datasets, with improvements of 3.39% and 2.06% in Rank-1 accuracy and mAP, respectively, on the SYSU-MM01 dataset.
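Since the abstract only describes CM-WD at a high level, the following minimal PyTorch sketch illustrates one common way to realize a Wasserstein-style distance between modality feature distributions, via the sliced-Wasserstein approximation. This is not the paper's actual CM-WD formulation; the function name `sliced_wasserstein`, the feature dimensions, and the projection count are assumptions for illustration only.

```python
import torch

def sliced_wasserstein(feat_vis: torch.Tensor,
                       feat_ir: torch.Tensor,
                       n_projections: int = 128) -> torch.Tensor:
    """Approximate the 1-Wasserstein distance between two feature batches
    by averaging 1-D Wasserstein distances over random projections.
    (Illustrative stand-in; not the paper's CM-WD.)"""
    d = feat_vis.size(1)
    # Random unit directions onto which both batches are projected.
    theta = torch.randn(d, n_projections, device=feat_vis.device)
    theta = theta / theta.norm(dim=0, keepdim=True)
    proj_vis = feat_vis @ theta   # shape: (batch, n_projections)
    proj_ir = feat_ir @ theta
    # In 1-D, the Wasserstein distance reduces to comparing sorted samples.
    sorted_vis, _ = torch.sort(proj_vis, dim=0)
    sorted_ir, _ = torch.sort(proj_ir, dim=0)
    return (sorted_vis - sorted_ir).abs().mean()

# Usage sketch: penalize the gap between visible and infrared embeddings
# (batch sizes must match for the sorted-sample comparison).
vis = torch.randn(32, 512)  # hypothetical visible-image embeddings
ir = torch.randn(32, 512)   # hypothetical infrared-image embeddings
loss_cm_wd = sliced_wasserstein(vis, ir)
```

The sliced formulation is used here because it is cheap and differentiable, which makes it a convenient training loss for shrinking the distribution gap between two modalities; other approximations (e.g., Sinkhorn iterations) would serve the same illustrative purpose.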