An Effective Visible-Infrared Person Re-identification Network Based on Second-Order Attention and Mixed Intermediate Modality
Abstract: Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. The large discrepancy between the visible and infrared modalities makes it difficult to learn discriminative features. Attention-based methods have been widely used to extract discriminative features for VI-ReID. However, existing methods are confined to first-order structures that exploit only simple, coarse information, and they therefore lack sufficient capability to learn both modality-irrelevant and modality-relevant features. In this paper, we extract second-order information from mid-level features to complement the first-order cues. Specifically, we design a flexible second-order module that considers the correlations between the common features and learns refined feature representations for pedestrian images. In addition, to address the significant gap between the visible and infrared modalities, we propose a plug-and-play mixed intermediate modality module that generates intermediate modality representations to reduce the discrepancy between visible and infrared features. Extensive experiments on two challenging datasets, SYSU-MM01 and RegDB, demonstrate that our method achieves competitive performance compared with state-of-the-art methods.
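To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that "second-order information" can be represented by a channel covariance matrix over spatial positions, and that an "intermediate modality" can be approximated by a convex blend of visible and infrared feature vectors. The abstract specifies neither choice; the function names `channel_covariance` and `mix_modalities` and the weight `lam` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def channel_covariance(features: Matrix) -> Matrix:
    """Second-order statistics of a feature map (assumed form, not from the paper).

    `features` holds C channels, each a list of N spatial responses.
    Returns the C x C covariance matrix between channels.
    """
    n = len(features[0])
    centered = []
    for row in features:
        mean = sum(row) / n
        centered.append([value - mean for value in row])
    # Entry (i, j) is the mean product of the centered responses of channels i and j.
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in centered]
        for row_i in centered
    ]


def mix_modalities(visible: List[float], infrared: List[float], lam: float) -> List[float]:
    """Convex blend of two feature vectors (assumed stand-in for an intermediate modality)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * v + (1.0 - lam) * r for v, r in zip(visible, infrared)]


# Toy feature map with 2 channels and 3 spatial positions.
cov = channel_covariance([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])

# Blend that sits closer to the infrared vector than to the visible one.
mixed = mix_modalities([1.0, 0.0], [0.0, 1.0], lam=0.25)
```

In this sketch the off-diagonal entries of `cov` capture how pairs of channels vary together, which is the kind of correlation a first-order (mean or max) pooling discards. The blended vector lies between the two modality features, which is one simple way to picture a representation that narrows the modality gap.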