Aggregate Tracklet Appearance Features for Multi-Object Tracking

Long Chen, Haizhou Ai, Rui Chen, Zijie Zhuang

Published: 2019, Last Modified: 23 Jan 2026. IEEE Signal Process. Lett. 2019. License: CC BY-SA 4.0

Abstract: Multi-object tracking (MOT) has wide applications in video analysis and signal processing. A major challenge in MOT is how to associate noisy detections into long, continuous trajectories. In this letter, we address the association problem at the tracklet level and focus mainly on the appearance representation designed for tracklets. A multitask convolutional neural network is proposed to jointly learn discriminative features and spatial-temporal attentions. In particular, we decompose an object in a static image with spatial attentions, and then aggregate multiple features in a tracklet based on the temporal attentions. Appearance misalignment caused by occlusion and inaccurate bounding boxes is then mitigated by multi-feature aggregation. Experimental results on two challenging MOT benchmarks demonstrate the effectiveness of the proposed method and show significant improvement in the quality of tracking identities.
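
The temporal aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how attention scores are normalized or how tracklets are compared, so the softmax weighting, the cosine similarity, and the names `softmax`, `aggregate_tracklet`, and `cosine_similarity` are all assumptions made for illustration. Per-frame features and scores are taken as given inputs, standing in for the outputs of the network.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw attention scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_tracklet(features: Sequence[Sequence[float]],
                       scores: Sequence[float]) -> List[float]:
    """Attention-weighted average of per-frame appearance features.

    features: one feature vector per detection in the tracklet.
    scores:   one raw temporal attention score per detection (hypothetical
              network output); low scores down-weight occluded or
              misaligned frames.
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one attention score per non-empty feature list")
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Affinity between two aggregated tracklet features (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: the second frame gets a low score (e.g. it is occluded),
# so it contributes little to the tracklet representation.
tracklet_a = aggregate_tracklet([[1.0, 0.0], [0.0, 1.0]], [4.0, -4.0])
tracklet_b = aggregate_tracklet([[0.9, 0.1], [1.0, 0.0]], [1.0, 1.0])
affinity = cosine_similarity(tracklet_a, tracklet_b)
```

Under these assumptions, a frame with a low attention score barely affects the aggregated feature, which is one way multi-feature aggregation could reduce the effect of occluded or poorly bounded detections on tracklet association.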