Keywords: visible-infrared person re-identification, large vision-language model, unsupervised learning, contrastive learning
TL;DR: We enhance visible-infrared person re-identification by leveraging attribute arrays extracted using a large vision-language model.
Abstract: Visible-infrared person re-identification (VI-ReID) aims to match visible and infrared images of the same individual. Supervised VI-ReID methods have achieved promising performance under the guidance of manually annotated identity labels, but the substantial annotation cost severely limits their scalability in real-world applications. As a result, unsupervised VI-ReID (UVI-ReID) methods have attracted increasing attention. These methods typically rely on pseudo-labels generated by clustering and matching algorithms in place of manual annotations. However, the quality of such pseudo-labels is difficult to guarantee, and low-quality pseudo-labels can significantly hinder model performance. To address these challenges, we explore the use of attribute arrays extracted by a large vision-language model (LVLM) to enhance VI-ReID, and propose a novel LVLM-driven attribute-aware modeling (LVLM-AAM) approach. Specifically, we first design an attribute-aware reliable labeling strategy, which refines intra-modality clustering results using image-level attributes and improves inter-modality matching by grouping clusters according to cluster-level attributes. Next, we develop an explicit-implicit attribute fusion module, which integrates explicit and implicit attributes to obtain more fine-grained identity-related text features. Finally, we introduce an attribute-aware contrastive learning module, which jointly leverages static and dynamic text features to promote modality-invariant feature learning. Extensive experiments on VI-ReID datasets validate the effectiveness of LVLM-AAM and its individual components: it not only significantly outperforms existing unsupervised methods but also surpasses several supervised ones.
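Illustrative sketch (not from the paper): the abstract's attribute-aware contrastive learning pulls each image embedding toward the attribute-derived text embedding of its pseudo-identity. The following minimal PyTorch snippet shows one plausible InfoNCE-style form of such an objective; all names, tensor shapes, and the temperature value are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def attribute_contrastive_loss(image_feats, text_feats, pseudo_ids, temperature=0.07):
        # image_feats: (B, D) visible/infrared image embeddings
        # text_feats:  (C, D) one attribute-derived text embedding per pseudo-identity cluster
        # pseudo_ids:  (B,)   pseudo-identity assignment of each image
        image_feats = F.normalize(image_feats, dim=-1)
        text_feats = F.normalize(text_feats, dim=-1)
        # similarity of every image to every cluster's text embedding
        logits = image_feats @ text_feats.t() / temperature  # (B, C)
        # cross-entropy treats the matching cluster as the positive,
        # all other clusters' text embeddings as negatives
        return F.cross_entropy(logits, pseudo_ids)

    # toy usage with random tensors standing in for real features
    B, C, D = 8, 4, 256
    img = torch.randn(B, D)
    txt = torch.randn(C, D)
    pseudo = torch.randint(0, C, (B,))
    loss = attribute_contrastive_loss(img, txt, pseudo)

In this reading, "static" versus "dynamic" text features would correspond to fixed attribute descriptions versus text embeddings updated during training; how the paper instantiates that distinction is not specified in the abstract.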
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 4985