Keywords: Latency-aware perception, aerial tracking, visual tracking benchmark
TL;DR: This work proposes a simple framework for end-to-end latency aware visual tracking.
Abstract: Visual object tracking is an essential capability of intelligent robots. Most existing approaches have ignored the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicle, where robust tracking is more challenging and onboard computation is limited, latency issue could be fatal. In this work, we presents a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). PVT++ is capable of turning most leading-edge trackers into predictive trackers by appending an online predictor. Unlike existing solutions that use model-based approaches, our framework is learnable, such that it can take not only motion information as input but it can also take advantage of visual cues or a combination of both. Moreover, since PVT++ is end-to-end optimizable, it can further boost the latency-aware tracking performance. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing an \textit{any-speed} tracker in the online setting. Empirical results on robotic platform from aerial perspective show that the motion-based PVT++ can obtain on par or better performance than existing approaches. Further incorporating visual information and joint training techniques, PVT++ can achieve significant performance gain on various trackers and exhibit better robustness than prior model-based solution, essentially removing the degradation brought by their latency onboard.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Supplementary Material: zip
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/pvt-a-simple-end-to-end-latency-aware-visual/code)
5 Replies
Loading