MTA-PS: Towards Practical Person Search in Videos

Published: 01 Jan 2024, Last Modified: 13 May 2025, Venue: ICIP 2024, License: CC BY-SA 4.0
Abstract: Person search (PS) aims to simultaneously localize and identify a target person in natural, uncropped images. Existing PS datasets and methods are mostly based on individual images, which limits their practicality in real-world surveillance scenarios. We contend that videos, compared to static images, offer additional temporal information, making the search for a target person's trajectory in videos more realistic and accurate. In this paper, we propose a new practical and realistic task, person search in videos, together with a new evaluation metric tailored to it. To support this task, we introduce a new PS dataset, MTA-PS, built on an existing large-scale simulated video dataset. MTA-PS is the first cross-camera PS dataset in virtual videos, comprising 6 cameras, 60 videos, 1.8K identities, 295.2K frames, 7.3M bounding boxes, and more than 20 minutes of video per camera. It is challenging and comprehensive while avoiding privacy issues. To validate the effectiveness of PS in videos and make full use of the temporal information in our dataset, we also propose a novel framework that integrates the three sub-tasks of person detection, tracking, and re-identification. Extensive experiments demonstrate that our method performs favorably against existing counterparts on the newly introduced MTA-PS dataset. Code and datasets are available at https://github.com/mtmyyy/MTA-PS.