Transductive Multi-Object Tracking in Complex Events by Interactive Self-Training

Published: 01 Jan 2020 · Last Modified: 16 May 2023 · ACM Multimedia 2020
Abstract: Recently, multi-object tracking (MOT) for estimating pedestrian trajectories has developed rapidly and plays an important role in human-centric video analysis. However, video analysis in complex events (e.g., scenes in the HiEve dataset) remains under-explored. In complex real-world scenarios, the domain gap in unseen testing scenes and the severe occlusions that disconnect tracks are challenging for existing online MOT methods without domain adaptation. To alleviate the domain gap, we study the problem in a transductive learning setting, which assumes that unlabeled testing data is available for learning an offline tracker. We propose a transductive interactive self-training method that adapts the tracking model to unseen crowded scenes using unlabeled testing data via teacher-student interactive learning. To reduce prediction variance in an unseen domain, we train two different models and interactively teach each model with pseudo labels of the unlabeled data predicted by the other. To improve robustness against occlusions during self-training, we exploit disconnected track interpolation (DTI) to refine the predicted pseudo labels. Our method achieved a MOTA of 60.23 on the HiEve dataset and won first place in Multi-person Motion Tracking in Complex Events (with Private Detection) at the ACM MM Grand Challenge on Large-scale Human-centric Video Analysis in Complex Events.
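The abstract does not give the exact formulation of disconnected track interpolation (DTI); a plausible minimal sketch is linear interpolation of bounding boxes across the missing frames between two fragments of the same track. The function name `interpolate_track` and the frame-indexed box representation below are illustrative assumptions, not the paper's implementation.

```python
def interpolate_track(track):
    """Bridge gaps in a track by linearly interpolating bounding boxes.

    track: dict mapping frame index -> (x, y, w, h); a gap (e.g. from an
    occlusion) appears as missing frame indices between observed ones.
    Returns a new dict with every intermediate frame filled in.
    This is an illustrative sketch of DTI, not the paper's exact method.
    """
    frames = sorted(track)
    filled = dict(track)
    for f0, f1 in zip(frames, frames[1:]):
        if f1 - f0 > 1:  # disconnected segment: frames f0+1 .. f1-1 missing
            b0, b1 = track[f0], track[f1]
            for f in range(f0 + 1, f1):
                t = (f - f0) / (f1 - f0)  # interpolation weight in (0, 1)
                filled[f] = tuple((1 - t) * a + t * b
                                  for a, b in zip(b0, b1))
    return filled
```

In the self-training loop described above, such interpolation would be applied to each model's predicted tracks before they are used as pseudo labels for the other model, so that occlusion-induced gaps do not propagate into the training signal.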