Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by Language

Kai Niu, Yan Huang, Liang Wang

Published: 2019, Last Modified: 08 Oct 2024, ICCV Workshops 2019, CC BY-SA 4.0

Abstract: Person search by language is an important application in video surveillance. Two difficulties make the problem non-trivial: the large visual-semantic discrepancy between images and sentences, and the cross-domain setting of real-life applications, where pedestrian images with new identities keep emerging but no language descriptions are available for training. In this paper, we first propose a concise and effective framework for image-sentence alignment to deal with the visual-semantic discrepancy. Second, we fuse the two opposite directions of cross-domain adaption, i.e., source to target and target to source. Extensive experiments validate the significant superiority of the proposed method on both the source domain and the target domain; it achieves state-of-the-art performance and won first place in the competition.