Cascaded Tracking via Pyramid Dense Capsules

Ding Ma, Xiangqian Wu

Published: 01 Jan 2020, Last Modified: 12 May 2023, Venue: ECCV Workshops (5) 2020, Readers: Everyone

Abstract: Tracking-by-detection is a two-stage framework: collecting candidates around the target object, and classifying each candidate as the target object or as background. Although methods based on Convolutional Neural Networks (CNNs) have been successful in the tracking-by-detection framework, the inherent flaws of CNNs still affect performance. The underlying mechanism of CNNs is based on positional invariance (i.e., it loses the spatial relationships between features) and therefore cannot capture small affine transformations, which ultimately results in drift. To solve this problem, we exploit the spatial relationships encoded by Capsule Networks (CapsNets) for the tracking-by-detection framework. To strengthen the encoding power of convolutional capsules, we generate them through a pyramid dense capsules (PDCaps) architecture. Our pyramid dense capsule representation produces comprehensive spatial relationships within the input. In addition, a critical challenge in the tracking-by-detection framework is how to avoid overfitting and mismatch during training and inference, where a reasonable intersection over union (IoU) threshold that defines the true/false positives is hard to set. To address the IoU threshold setting, we propose a cascaded PDCaps model, which consists of a sequence of PDCaps models trained with increasing IoU thresholds to improve the quality of candidates sequentially. Extensive experiments demonstrate that our tracker performs favorably against state-of-the-art approaches.
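
To make the IoU threshold issue concrete, the following is a minimal sketch of how candidate boxes could be labeled as true/false positives at increasing IoU thresholds. It is not the authors' implementation: the box format `(x1, y1, x2, y2)`, the function names, and the thresholds `(0.5, 0.6, 0.7)` are assumptions chosen for illustration, and the sketch does not model the capsule network or the per-stage candidate refinement.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates: Sequence[Box], target: Box,
                     threshold: float) -> List[int]:
    """Label each candidate 1 (positive) if its IoU with the target
    reaches the threshold, else 0 (negative)."""
    return [1 if iou(c, target) >= threshold else 0 for c in candidates]


def labels_per_stage(candidates: Sequence[Box], target: Box,
                     thresholds: Sequence[float] = (0.5, 0.6, 0.7)
                     ) -> List[List[int]]:
    """Labels for each cascade stage; the thresholds are hypothetical
    values, not taken from the paper."""
    return [label_candidates(candidates, target, t) for t in thresholds]


if __name__ == "__main__":
    target = (0.0, 0.0, 10.0, 10.0)
    candidates = [
        (0.0, 0.0, 10.0, 10.0),   # perfect overlap
        (2.0, 0.0, 12.0, 10.0),   # shifted by 2 along x
        (5.0, 0.0, 15.0, 10.0),   # shifted by 5 along x
    ]
    for t, labels in zip((0.5, 0.6, 0.7),
                         labels_per_stage(candidates, target)):
        print(f"threshold={t}: {labels}")
```

A stricter threshold leaves fewer positives, which illustrates why a single fixed threshold is hard to choose: a low value admits loosely aligned candidates, while a high value leaves few positive samples to train on.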
