Data Augmentation for Text-based Person Retrieval Using Large Language Models

ACL ARR 2024 June Submission810 Authors

13 Jun 2024 (modified: 07 Aug 2024), ACL ARR 2024 June Submission, License: CC BY 4.0
Abstract: Text-based Person Retrieval (TPR) aims to retrieve person images that match a given text query. The performance of TPR models relies on high-quality data. However, constructing a large-scale, high-quality TPR dataset is challenging because annotation is expensive and privacy must be protected. Recently, Large Language Models (LLMs) have approached human performance on many NLP tasks, making it possible to expand high-quality TPR datasets. This paper proposes the first LLM-based Data Augmentation (LLM-DA) method for TPR. LLM-DA uses LLMs to rewrite the texts in a TPR dataset, expanding it with high-quality data concisely and efficiently. The rewritten texts increase text diversity while retaining the key semantic concepts of the originals. To mitigate LLM hallucinations, LLM-DA introduces a Text Faithfulness Filter that removes unfaithful rewritten texts. To balance the contributions of original and augmented texts, a Balanced Sampling Strategy controls the proportion of each used for training. LLM-DA is a plug-and-play method that can be integrated into various TPR models. Comprehensive experiments show that LLM-DA can improve the retrieval performance of current TPR models.
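The abstract names the Text Faithfulness Filter and the Balanced Sampling Strategy but does not define them. The Python sketch below only illustrates the general shape of "filter rewrites, then mix original and augmented captions at a controlled rate." Everything concrete in it is a hypothetical stand-in, not the paper's method: the token-overlap score `token_overlap`, the 0.5 `threshold`, and the per-caption probability `p_original` are illustrative assumptions.

```python
import random
from typing import List


def token_overlap(original: str, rewritten: str) -> float:
    """Fraction of the original's lowercase word types that also appear in the rewrite.

    Hypothetical stand-in for a faithfulness score; the paper's actual filter may differ.
    """
    orig_tokens = set(original.lower().split())
    new_tokens = set(rewritten.lower().split())
    if not orig_tokens:
        return 0.0
    return len(orig_tokens & new_tokens) / len(orig_tokens)


def faithfulness_filter(original: str, rewrites: List[str], threshold: float = 0.5) -> List[str]:
    """Keep only rewrites whose overlap score reaches the (assumed) threshold."""
    return [r for r in rewrites if token_overlap(original, r) >= threshold]


def balanced_sample(original: str, rewrites: List[str], p_original: float,
                    rng: random.Random) -> str:
    """Return the original caption with probability p_original, else a random kept rewrite.

    Falls back to the original caption when no rewrite survived filtering.
    """
    if not rewrites or rng.random() < p_original:
        return original
    return rng.choice(rewrites)


if __name__ == "__main__":
    rng = random.Random(0)
    original = "a man in a red jacket carrying a black backpack"
    rewrites = [
        "a man wearing a red jacket with a black backpack",  # overlap 6/8, kept
        "someone walking in the park",                        # overlap 1/8, dropped
    ]
    kept = faithfulness_filter(original, rewrites, threshold=0.5)
    print(kept)
    # Draw training captions for one image; roughly half should be the original.
    draws = [balanced_sample(original, kept, p_original=0.5, rng=rng) for _ in range(1000)]
    print(sum(d == original for d in draws) / len(draws))
```

Raising `p_original` shifts training toward the human-written captions; lowering it leans more on the augmented ones. This is the trade-off the abstract says the Balanced Sampling Strategy is meant to control.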
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: image text matching; cross-modal application; multimodality
Languages Studied: English
Submission Number: 810