Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-Directional Text-Image Matching

Xiaojing Yu, Tianlong Chen, Yang Yang, Michael Mugo, Zhangyang Wang

2019 (modified: 02 Feb 2022)ICCV Workshops 2019Readers: Everyone

Abstract: Searching person images from a gallery based on natural language descriptions remains to be a challenging and under-explored cross-modal retrieval problem. To improve the accuracy off an image-based retrieval task, e.g., person re-identification (Person Re-Id), re-ranking is known to be an effective post-processing tool. In this paper, we extend re-ranking from uni-modal retrieval to cross-modal retrieval for the first time, and develop a bi-directional coarse-to-fine framework (BCF) for cross-modal person search. Built on a recent state-of-the-art Person Re-Id model, BCF exploits first text-to-image and then image-to-text relevance, in a two-stage refinement fashion. BCF ranks competitively against a strong baseline on the newly-introduced WIDER Person Search dataset, boosting validation set performance by 9.01%(top-1)/3.87%(mAP) for val1 and 6.60%(top-1)/3.49%(mAP) for val2, respectively. With a high score, our solution ranks competitively in the ICCV 2019 WIDER Person Search by Language Challenge.

0 Replies