Face Helps Person Re-Identification: Multi-modality Person Re-Identification Based on Vision-Language Models
Abstract: Person re-identification (ReID), which aims to identify individuals across camera views, often faces challenges such as occlusion and appearance variations caused by clothing changes. Moreover, due to long-distance capture and the varying positions of pedestrians, the human face is not always visible and is therefore usually neglected in ReID. This paper proposes a novel approach to enhance ReID performance by integrating face and body cues into a multi-modality ReID framework, particularly improving performance in scenarios with occlusion and clothing changes. Leveraging the visual-linguistic capabilities of the CLIP model, our framework comprises two CLIP-like branches: one dedicated to extracting body appearance features and the other focused on face features. Furthermore, a feature adapter is proposed to address the issue of invisible faces. Experiments show that state-of-the-art (SOTA) performance is achieved on six popular benchmark datasets, including Market1501 and LTCC, confirming the superiority of the proposed method. Additionally, we propose a multi-modality ReID dataset to further verify and analyze the effectiveness of the proposed multi-modality ReID framework.
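The fusion scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: the embedding dimension, the linear adapter, and the L2-normalized concatenation are all assumptions standing in for the two CLIP-like branches and the learned feature adapter that substitutes a face feature when no face is visible.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # assumed embedding size of each CLIP-like branch

# Hypothetical linear adapter: predicts a face-like feature from the
# body feature when no face is visible (stand-in for the learned adapter).
W_adapter = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)

def l2norm(x):
    return x / np.linalg.norm(x)

def fuse(body_feat, face_feat=None):
    """Concatenate body and face embeddings into one ReID descriptor.
    If the face is invisible, substitute an adapter-predicted feature."""
    if face_feat is None:
        face_feat = W_adapter @ body_feat  # pseudo-face feature
    return l2norm(np.concatenate([l2norm(body_feat), l2norm(face_feat)]))

body = rng.standard_normal(DIM)
face = rng.standard_normal(DIM)
with_face = fuse(body, face)
without_face = fuse(body)  # occluded / long-distance case: face missing
print(with_face.shape, without_face.shape)
```

Both calls yield a descriptor of the same dimensionality, so gallery matching works identically whether or not the face branch received a valid crop.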