Abstract: In recent years, multi-object tracking has usually been treated as a data association problem over detection results, an approach known as tracking-by-detection. Such methods are often difficult to adapt to time-critical video analysis applications, which must consider detection and tracking together. In this paper, we propose to perform object detection and appearance embedding with a two-stage network. On the one hand, we accelerate network inference by sharing a set of low-level features and introducing a Position-Sensitive RoI pooling layer to better estimate the classification probability. On the other hand, to handle unreliable detection results produced by the two-stage network, we select candidates from the outputs of both detection and tracking using a novel scoring function that considers classification probability and tracking confidence together. In this way, we achieve an effective trade-off between multi-object tracking accuracy and speed. Moreover, we perform cascade data association on the selected candidates to form object trajectories. Extensive experiments show that each component of the tracking framework is effective and that our real-time tracker achieves state-of-the-art performance.
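The candidate selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the scoring function, so everything below is an assumption made for illustration: the `Candidate` type, the product rule in `unified_score`, the thresholds, and the use of greedy non-maximum suppression are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Candidate:
    """Hypothetical container for one candidate box."""
    box: Box
    cls_prob: float           # classification probability from the network
    track_conf: float = 1.0   # tracking confidence (unused for raw detections)
    from_track: bool = False  # True if the box was predicted by a tracker


def unified_score(c: Candidate) -> float:
    # Assumed combination rule: detections keep their classification
    # probability, tracker predictions are down-weighted by their confidence.
    return c.cls_prob * (c.track_conf if c.from_track else 1.0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_candidates(cands: List[Candidate],
                      min_score: float = 0.4,
                      iou_thresh: float = 0.5) -> List[Candidate]:
    """Drop low-scoring candidates, then apply greedy NMS on the unified score."""
    ranked = sorted((c for c in cands if unified_score(c) >= min_score),
                    key=unified_score, reverse=True)
    kept: List[Candidate] = []
    for c in ranked:
        # Keep a candidate only if it does not overlap a higher-scoring one.
        if all(iou(c.box, k.box) <= iou_thresh for k in kept):
            kept.append(c)
    return kept
```

Under these assumptions, a tracker prediction with low tracking confidence loses to an overlapping detection, while a confident tracker prediction survives where the detector produced no box. The selected candidates would then be passed on to data association.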