Object-Centric Discriminative Learning for Text-Based Person Retrieval

Published: 2025, Last Modified: 22 Jan 2026, ICASSP 2025, CC BY-SA 4.0
Abstract: Text-based person retrieval (TBPR) is a vision-language task that aims to find specific pedestrians in a large image gallery using a textual description. However, the task remains challenging due to the heterogeneity between modalities and the redundancy in visual representations. Existing methods do not explicitly reduce the influence of background regions in images, which inevitably weakens the learned representations and degrades image-text matching performance. In this paper, we propose a novel framework for text-based person retrieval, termed Object-Centric Discriminative Learning (OCDL), which incorporates person masks to indicate attentive regions, thereby enhancing the model’s focus on the pedestrians in images while suppressing background noise. Additionally, a novel cross-modal matching loss, namely Soft Angular Distribution Matching (SADM), is introduced to learn discriminative visual and textual representations. Extensive experiments on three widely used TBPR datasets demonstrate the effectiveness of our approach. The code is available at https://github.com/JThuge/OCDL.
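To make the idea of using a person mask to suppress background regions more concrete, the following is a minimal illustrative sketch, not the OCDL implementation. It assumes an image is represented as a list of patch feature vectors and that a per-patch mask weight in [0, 1] is available; the function name `masked_mean_pool` and the toy values are hypothetical and chosen only for illustration.

```python
from typing import List


def masked_mean_pool(
    patches: List[List[float]], mask: List[float], eps: float = 1e-8
) -> List[float]:
    """Average patch features, weighting each patch by its mask value.

    Hypothetical helper: patches with mask weight 0 (background) do not
    contribute to the pooled representation.
    """
    if len(patches) != len(mask):
        raise ValueError("one mask weight per patch is required")
    dim = len(patches[0])
    total = sum(mask)
    pooled = [0.0] * dim
    for feat, weight in zip(patches, mask):
        for i in range(dim):
            pooled[i] += weight * feat[i]
    # eps guards against division by zero when the mask is all zeros
    return [value / (total + eps) for value in pooled]


# Toy example: two "person" patches and one "background" patch.
patches = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]
mask = [1.0, 1.0, 0.0]
pooled = masked_mean_pool(patches, mask)
```

In this toy case the background patch has mask weight 0, so the pooled vector is approximately the mean of the two person patches. How the paper actually injects masks into the model may differ from this simple pooling.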