Enhancing Text-Image Person Retrieval Through Nuances Varied Sample

Published: 01 Jan 2023, Last Modified: 22 Feb 2024, PRCV (1) 2023, Readers: Everyone
Abstract: Text-image person retrieval is the task of retrieving images of a specific individual given a textual description. A key challenge is achieving cross-modal alignment while performing fine-grained retrieval. Current methods use classification and metric losses to improve discrimination and alignment. However, the large dissimilarities between samples often limit the network’s capacity to learn discriminative fine-grained information. To address this and make the network focus on intricate details, we introduce the Nuanced Variation Module (NVM), which generates artificially difficult negative samples that direct the network’s attention toward subtle differences. Incorporating NVM-constructed hard-negative samples strengthens the alignment loss and makes the network more attentive to details. We also use the image-text matching task to explicitly strengthen the network’s fine-grained discrimination ability. With NVM, the network extracts rich fine-grained features, which mitigates the interference caused by challenging negative samples. Extensive experiments on publicly available datasets show that the proposed method achieves performance competitive with state-of-the-art approaches.
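
The effect of a hard negative on an alignment loss can be illustrated with a minimal sketch. This is not the authors' NVM: the abstract does not specify how NVM builds its samples or which alignment loss is used, so the sketch assumes a generic InfoNCE-style contrastive loss over cosine similarities and uses hand-written toy vectors. The names `cosine`, `alignment_loss`, and all embedding values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(image, positive_text, negative_texts, temperature=0.1):
    """InfoNCE-style loss (assumed, not the paper's): negative log-softmax
    score of the matching text among all candidate texts."""
    candidates = [positive_text] + list(negative_texts)
    logits = [cosine(image, t) / temperature for t in candidates]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    return log_denominator - logits[0]


# Toy embeddings (hypothetical values chosen for illustration only).
image = [1.0, 0.0, 0.2]
positive = [0.9, 0.1, 0.2]        # text that matches the image
easy_negative = [-1.0, 0.5, 0.0]  # text about a clearly different person
hard_negative = [0.9, 0.1, -0.2]  # near-copy of the positive with one detail changed

loss_easy = alignment_loss(image, positive, [easy_negative])
loss_hard = alignment_loss(image, positive, [hard_negative])

print(f"loss with easy negative: {loss_easy:.4f}")
print(f"loss with hard negative: {loss_hard:.4f}")
```

Under these assumptions the easy negative contributes almost nothing to the loss, while the near-duplicate negative yields a noticeably larger loss, and therefore a stronger training signal tied to the one detail that differs.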