Keywords: tracking, tracking-any-point, optical flow
TL;DR: We present ITTO, a benchmark suite for evaluating and diagnosing point tracking methods on complex, long-range motions.
Abstract: We introduce ITTO, a challenging new benchmark suite for evaluating and diagnosing the capabilities and limitations of point tracking methods. Our videos are sourced from existing datasets and egocentric real-world recordings, with high-quality human annotations collected through a multi-stage pipeline. ITTO captures the motion complexity, occlusion patterns, and object diversity characteristic of real-world scenes -- factors that are largely absent from current benchmarks. We conduct a rigorous analysis of state-of-the-art tracking methods on ITTO, breaking down performance along key axes of motion complexity. Our findings reveal that existing trackers struggle with these challenges, particularly with re-identifying points after occlusion, which exposes critical failure modes. These results point to the need for new modeling approaches tailored to real-world dynamics. We envision ITTO as a foundational testbed for advancing point tracking and guiding the development of more robust tracking algorithms.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/demalenk/itto-dataset
Code URL: https://github.com/ilonadem/itto/tree/main
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 575