Semi-supervised Visible-Infrared Person Re-identification via Modality Unification and Confidence Guidance

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Semi-supervised visible-infrared person re-identification (SSVI-ReID) aims to match pedestrian images of the same identity across the visible and infrared modalities while annotating only the visible images, a task closely tied to multimedia and multi-modal processing. Existing works focus primarily on assigning accurate pseudo-labels to infrared images but overlook two key challenges: erroneous pseudo-labels and the large modality discrepancy. To alleviate these issues, this paper proposes a novel Modality-Unified and Confidence-Guided (MUCG) semi-supervised learning framework. Specifically, we first propose a Dynamic Intermediate Modality Generation (DIMG) module, which transfers knowledge from labeled visible images to unlabeled infrared images, improving pseudo-label quality and bridging the modality discrepancy. Meanwhile, we propose a Weighted Identification Loss (WIL) that reduces the model's dependence on erroneous labels through confidence weighting. Moreover, an effective Modality Consistency Loss (MCL) is proposed to align the distributions of visible and infrared features, further reducing the modality discrepancy and enabling the learning of modality-unified features. Extensive experiments show that MUCG brings clear gains on the SSVI-ReID task, surpassing current state-of-the-art methods by a significant margin. The code will be made available.
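To make the two loss components concrete, below is a minimal PyTorch sketch of how a confidence-weighted identification loss and a modality consistency loss of this general kind are commonly implemented. The function names, the softmax-confidence weighting in `weighted_identification_loss`, and the moment-matching distance used in `modality_consistency_loss` are illustrative assumptions on our part, not the paper's exact formulations; DIMG is a generative module and is not sketched here.

```python
import torch
import torch.nn.functional as F

def weighted_identification_loss(logits, pseudo_labels):
    """Hypothetical sketch of WIL: per-sample cross-entropy weighted by the
    model's softmax confidence in the assigned pseudo-label, so samples with
    likely-erroneous labels contribute less to the gradient.
    (The paper's actual weighting scheme may differ.)"""
    probs = F.softmax(logits, dim=1)
    # Confidence of each sample in its own pseudo-label, detached so the
    # weight itself receives no gradient.
    conf = probs.gather(1, pseudo_labels.unsqueeze(1)).squeeze(1).detach()
    ce = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (conf * ce).mean()

def modality_consistency_loss(feat_vis, feat_ir):
    """Hypothetical sketch of MCL: align the visible and infrared feature
    distributions by matching their batch statistics (first and second
    moments); the paper's actual distance measure may differ."""
    mean_gap = (feat_vis.mean(dim=0) - feat_ir.mean(dim=0)).pow(2).sum()
    var_gap = (feat_vis.var(dim=0) - feat_ir.var(dim=0)).pow(2).sum()
    return mean_gap + var_gap
```

In a training loop, terms like these would typically be added to the standard supervised loss on labeled visible images, e.g. `loss = ce_visible + wil + lambda_mcl * mcl`, with `lambda_mcl` a balancing hyperparameter (an assumed name here).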
Primary Subject Area: [Engagement] Multimedia Search and Recommendation
Secondary Subject Area: [Experience] Multimedia Applications, [Content] Multimodal Fusion
Relevance To Conference: This paper addresses cross-modality person re-identification, a problem central to multimedia and multi-modal processing. Facing the challenges of annotation scarcity and modality discrepancy in semi-supervised visible-infrared person re-identification (SSVI-ReID), we propose the Modality-Unified and Confidence-Guided (MUCG) framework, which integrates three key modules: Dynamic Intermediate Modality Generation (DIMG), Weighted Identification Loss (WIL), and Modality Consistency Loss (MCL). DIMG offers a novel approach to cross-modal knowledge transfer, WIL mitigates the impact of erroneous pseudo-labels that arise when only one modality is annotated, and MCL bridges the gap between feature distributions across modalities, providing a fresh strategy for multi-modal feature learning and fusion. The proposed methods can benefit related areas such as cross-modal recognition and multimedia retrieval, contributing to the advancement of multimedia and multi-modal processing.
Submission Number: 1700