Closed-loop Scaling Up for Visual Object Tracking

ICLR 2025 Conference Submission 1572 Authors

18 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Scaling law, Downstream vision tasks, Visual object tracking
TL;DR: We explore the scaling law in downstream vision tasks, taking visual object tracking as a case study.
Abstract: Thanks to the scaling law, current neural networks have achieved remarkable performance improvements. While much of the existing research has concentrated on upstream pretraining, the application of the scaling law to downstream vision tasks remains underexplored. Understanding the scaling law in downstream tasks can aid the design of more effective models and training strategies. In this work, we therefore investigate how the scaling law applies to downstream vision tasks. Firstly, we explore the impact of three key scaling factors: training data volume, model size, and input resolution, and empirically verify that increasing each of them improves performance. Secondly, to address the optimization challenges of naive training and its lack of iterative refinement, we introduce DT-Training, which leverages small-teacher transfer and dual-branch alignment to further exploit model potential. Thirdly, building on DT-Training, we propose a closed-loop scaling strategy that scales the model up incrementally, step by step. Finally, our scaled model outperforms existing counterparts across diverse test benchmarks, and extensive experiments also reveal its robust transfer ability. Moreover, we validate the generalizability of the scaling law and of our proposed DT-Training on other downstream vision tasks, reinforcing the broader applicability of our approach. We hope that our findings deepen the understanding of the scaling law in downstream tasks and foster future progress on them.
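To make the closed-loop scaling idea concrete, the sketch below is a minimal, hypothetical Python/PyTorch illustration, not the authors' implementation: each trained model serves as a "small teacher" for the next, larger one, and the DT-Training step combines a supervised branch with a teacher-alignment branch. All names (build_tracker, dt_training_step, closed_loop_scaling), the toy MLP "tracker", and the specific loss choices are assumptions made purely for illustration.

```python
# Hypothetical sketch of closed-loop scaling with a DT-Training-style objective.
# The real method's architecture, losses, and schedule are not specified here.
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_tracker(width: int) -> nn.Module:
    """Placeholder 'tracker': a tiny MLP standing in for a real tracking model."""
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(3 * 64 * 64, width),
                         nn.ReLU(),
                         nn.Linear(width, 4))  # predicts a box (x, y, w, h)

def dt_training_step(student, teacher, frames, boxes, optimizer, alpha=0.5):
    """One training step with two branches: ground-truth supervision and
    alignment with a smaller, already-trained teacher (if available)."""
    pred = student(frames)
    loss = F.l1_loss(pred, boxes)                      # branch 1: supervised loss
    if teacher is not None:
        with torch.no_grad():
            teacher_pred = teacher(frames)
        loss = loss + alpha * F.mse_loss(pred, teacher_pred)  # branch 2: teacher alignment
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def closed_loop_scaling(widths, loader, epochs=1):
    """Scale the model step by step; each trained model becomes the teacher
    for the next, larger one, closing the loop."""
    teacher = None
    for width in widths:                               # e.g. [128, 256, 512]
        student = build_tracker(width)
        optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
        for _ in range(epochs):
            for frames, boxes in loader:
                dt_training_step(student, teacher, frames, boxes, optimizer)
        teacher = student.eval()                       # current model teaches the next
    return teacher

if __name__ == "__main__":
    # Toy data: random 64x64 frames with random target boxes, just to exercise the loop.
    frames, boxes = torch.rand(32, 3, 64, 64), torch.rand(32, 4)
    closed_loop_scaling([128, 256, 512], loader=[(frames, boxes)])
```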
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1572