Multi-Stage Auxiliary Learning for Visible-Infrared Person Re-Identification

Published: 01 Jan 2024, Last Modified: 08 Apr 2025, IEEE Trans. Circuits Syst. Video Technol. 2024, CC BY-SA 4.0
Abstract: Visible-infrared person re-identification (VI-ReID) is a challenging task that aims to retrieve the same person across cameras, time, and modalities. Existing methods usually employ dual-stream networks with integrated constraints, or compensate for missing modality information, to reduce the significant modality discrepancies among heterogeneous images. However, the effectiveness of designed constraints is often limited by substantial cross-modality differences, and methods that compensate for modality information may introduce noise and additional computational cost. In this paper, we propose MSALNet, a novel Multi-Stage Auxiliary Learning network. Specifically, the training process is divided into two stages: 1) training with auxiliary modality pairs obtained by grayscale histogram equalization, and 2) training with visible and infrared image pairs, so that the network gradually extracts more discriminative modality-shared features. We propose the Heterogeneous Feature Compensation Learning (HFCL) module to compensate for and fuse information between visible and infrared features, generating auxiliary branches that learn more cross-modality-related information. Additionally, we propose the Modality Similarity Reinforcement (MSR) module, which improves the consistency of cross-modality feature representations by suppressing interfering information and using the pixel similarity probability distribution as supervisory information. Lastly, we design the Distance Center Alignment (DCA) loss to reduce intra-class variation within and between modalities, enhancing the distinguishability of different identities. Experimental results demonstrate that MSALNet outperforms most existing methods on two mainstream VI-ReID datasets while effectively reducing computational cost.
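
To make the first-stage auxiliary modality concrete: the abstract states that auxiliary pairs are obtained from grayscale histogram equalization. Below is a minimal sketch of one plausible preprocessing step, assuming OpenCV and a hypothetical helper name make_auxiliary_pair; the paper's exact pipeline may differ.

```python
import cv2

def make_auxiliary_pair(visible_bgr, infrared_gray):
    """Build an auxiliary modality pair in a shared grayscale space.
    visible_bgr: uint8 BGR visible image; infrared_gray: uint8 single-channel IR image.
    (Illustrative sketch; not the paper's exact preprocessing.)"""
    # Map the visible image to single-channel grayscale, then equalize its histogram.
    vis_gray = cv2.cvtColor(visible_bgr, cv2.COLOR_BGR2GRAY)
    aux_vis = cv2.equalizeHist(vis_gray)
    # Infrared images are already single-channel; equalize them directly.
    aux_ir = cv2.equalizeHist(infrared_gray)
    return aux_vis, aux_ir
```

Training first on such pairs reduces the modality gap the network must bridge, before the second stage switches to raw visible-infrared pairs.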
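For the MSR module, the abstract says only that the pixel similarity probability distribution serves as supervisory information. One plausible reading, sketched below in PyTorch with hypothetical helpers similarity_distribution and msr_consistency_loss, is to take a row-wise softmax over pairwise cosine similarities within each modality's feature map and align the two distributions with a KL divergence; the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def similarity_distribution(feat):
    """Pixel-wise cosine-similarity distribution over spatial locations.
    feat: (B, C, H, W) feature map."""
    x = F.normalize(feat.flatten(2), dim=1)   # (B, C, HW), unit feature vectors
    sim = torch.bmm(x.transpose(1, 2), x)     # (B, HW, HW) pairwise cosine similarities
    return F.softmax(sim, dim=-1)             # row-wise probability distribution

def msr_consistency_loss(feat_vis, feat_ir):
    """Align the infrared similarity distribution to the visible one.
    (A hypothetical reconstruction from the abstract, not the paper's exact loss.)"""
    p_vis = similarity_distribution(feat_vis)
    p_ir = similarity_distribution(feat_ir)
    return F.kl_div(p_ir.log(), p_vis, reduction="batchmean")
```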
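The DCA loss, as described, pulls same-identity features together both within and across modalities. A minimal sketch under that reading, with the hypothetical function name dca_loss and the assumption that every identity in the batch appears in both modalities:

```python
import torch

def dca_loss(feat_vis, feat_ir, labels):
    """Distance Center Alignment sketch: compact each identity around its
    per-modality center, and align the two modalities' centers per identity.
    Assumes each identity in `labels` has samples in both modality batches.
    (Hypothetical reconstruction from the abstract, not the exact loss.)"""
    loss_intra, loss_cross = 0.0, 0.0
    ids = labels.unique()
    for pid in ids:
        v = feat_vis[labels == pid]          # this identity's visible features
        r = feat_ir[labels == pid]           # this identity's infrared features
        c_v, c_r = v.mean(0), r.mean(0)      # per-modality identity centers
        # Intra-class compactness within each modality.
        loss_intra += (v - c_v).pow(2).sum(1).mean() + (r - c_r).pow(2).sum(1).mean()
        # Cross-modality center alignment for the same identity.
        loss_cross += (c_v - c_r).pow(2).sum()
    return (loss_intra + loss_cross) / len(ids)
```

Aligning centers rather than individual samples keeps the cross-modality term robust to outlier images while still shrinking the gap between modalities for each identity.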