Abstract: Visible-infrared person re-identification (Re-ID) plays a crucial role in matching people across camera views under both dark and normal lighting conditions. To reduce annotation cost, it is advantageous to learn a Re-ID model from unlabeled visible-infrared image pairs. However, the large modality gap makes it difficult to discover the underlying cross-modality sample relations. Compared with cross-modality sample pairs in the target domain, single-modality visible images are much easier to obtain from other domains. In this work, we study unsupervised transfer learning to extract modality-shared knowledge from auxiliary unlabeled visible images in a source domain and leverage this knowledge to learn cross-modality matching in the unlabeled target domain. Our framework consists of two stages: RGB-gray asymmetric mutual learning and unsupervised cross-modality self-training. In the first stage, to extract visible-infrared shared information from the auxiliary unlabeled visible images, we regard RGB images and the grayscale fake-infrared images transformed from them as two views, learning view-shared information while preserving RGB-specific information. Based on an information-theoretic analysis, we learn an RGB-gray feature extractor and further introduce an auxiliary gray feature extractor to quantify RGB-gray shared knowledge. This knowledge is then transferred to the RGB-gray feature extractor without eliminating RGB-specific information. We call this process Cross-Modality Asymmetric Mutual Learning (CMAM). In the second stage, for unsupervised cross-modality self-training in the target domain, we fuse the complementary knowledge of the two models by mutual learning and employ bipartite cross-modality pseudo labeling to alleviate the modality gap. For a more extensive evaluation, we collect a new public multi-modality dataset, SYSU-MM02, constructed from untrimmed videos. Our method achieves state-of-the-art performance on three benchmark datasets. Project page: https://www.isee-ai.cn/project/sysumm02.html.
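To make the two-view construction in the first stage concrete, the following is a minimal sketch, not the authors' released code, of how an RGB image and its grayscale "fake infrared" counterpart can be produced as paired views. The use of torchvision's Grayscale transform, the three-channel grayscale output, and the file path are illustrative assumptions; the paper's exact conversion may differ.

```python
# Minimal sketch (assumed implementation, not the authors' code):
# build the two "views" used in stage one -- an RGB image and a
# grayscale fake-infrared image transformed from the same RGB input.
import torch
from torchvision import transforms
from PIL import Image

to_tensor = transforms.ToTensor()
# Three output channels keep the gray view shape-compatible with a
# backbone that expects 3xHxW input (an assumption for illustration).
to_gray = transforms.Grayscale(num_output_channels=3)

def make_views(img: Image.Image) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (rgb_view, gray_view) tensors for one training image."""
    rgb_view = to_tensor(img)            # 3xHxW, original colors
    gray_view = to_tensor(to_gray(img))  # 3xHxW, fake-infrared view
    return rgb_view, gray_view

# Usage: both views would be fed through the RGB-gray feature extractor,
# encouraging view-shared features while retaining RGB-specific cues.
img = Image.open("person_0001.jpg").convert("RGB")  # hypothetical path
v_rgb, v_gray = make_views(img)
```

Keeping the grayscale view three-channel is a common design choice so that a single shared backbone can process both views without architectural changes.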