Text-based person search via cross-modal alignment learning

Xiao Ke, Hao Liu, Peirong Xu, Xinru Lin, Wenzhong Guo

Published: 2024, Last Modified: 08 Apr 2025, Pattern Recognit. 2024, CC BY-SA 4.0

Abstract: Highlights
• A novel text-based person search network is proposed that reduces the differences between the image and text modalities while still learning sufficiently rich features for each modality.
• A multi-granularity feature self-optimization module is designed to optimize multi-scale image features and multi-level semantic text features, so that more discriminative features are learned while useless and redundant information is suppressed.
• A cross-instance feature alignment is proposed that constructs image–text feature pairs so that category-level information participates in training.
• Extensive experiments on both the CUHK-PEDES and ICFG-PEDES datasets show that our MAPS achieves state-of-the-art performance, significantly outperforming other existing methods.
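The highlights describe the cross-instance feature alignment only at a high level. One plausible reading is that every image–text pair in a batch that shares an identity label (the "category-level information") is treated as a positive, not just the annotated instance pair. The following is a minimal sketch under that assumption only: the function names, the temperature value, and the soft-target cross-entropy formulation are illustrative choices, not the MAPS implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between rows of a (images) and rows of b (texts)."""
    a_n = [l2_normalize(u) for u in a]
    b_n = [l2_normalize(w) for w in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def softmax(row: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable softmax over one row of similarities."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_instance_alignment_loss(
    img_feats: Sequence[Sequence[float]],
    txt_feats: Sequence[Sequence[float]],
    ids: Sequence[int],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
    eps: float = 1e-12,
) -> float:
    """Hypothetical identity-aware alignment loss.

    Assumes img_feats[k] and txt_feats[k] both belong to identity ids[k].
    For each query, the target distribution spreads mass uniformly over all
    items of the other modality that share its identity, and the loss is the
    soft-target cross-entropy against the softmax of cosine similarities.
    """
    n = len(ids)
    sim = cosine_matrix(img_feats, txt_feats)
    # Text-to-image similarities are the transpose of image-to-text ones.
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]

    def directional(s: List[List[float]]) -> float:
        loss = 0.0
        for i in range(n):
            pred = softmax(s[i], temperature)
            # Every same-identity item is a positive (always includes item i itself).
            positives = [1.0 if ids[i] == ids[j] else 0.0 for j in range(n)]
            total = sum(positives)
            target = [p / total for p in positives]
            loss -= sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))
        return loss / n

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (directional(sim) + directional(sim_t))


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_instance_alignment_loss(imgs, [[1.0, 0.0], [0.0, 1.0]], [0, 1]))  # small
    print(cross_instance_alignment_loss(imgs, [[0.0, 1.0], [1.0, 0.0]], [0, 1]))  # large
```

Treating all same-identity pairs as positives means that two different captions of the same person are not pushed apart as negatives, which is one way category-level labels could enter training alongside instance-level pairing.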