TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video

Published: 18 Sept 2025 · Last Modified: 30 Oct 2025 · NeurIPS 2025 Datasets and Benchmarks Track (poster) · License: CC BY 4.0
Keywords: track any point, 360 video, object permanence, video
TL;DR: Given query points, track their direction relative to the camera through a video, even when they leave the field of view. A dataset, benchmark, and baseline method.
Abstract: Humans excel at constructing panoramic mental models of their surroundings, maintaining object permanence and inferring scene structure beyond visible regions. In contrast, current artificial vision systems struggle with persistent, panoramic understanding, often processing scenes egocentrically on a frame-by-frame basis. This limitation is pronounced in the Track Any Point (TAP) task, where existing methods fail to track 2D points once they leave the field of view. To address this, we introduce TAPVid-360, a novel task that requires predicting the 3D direction to queried scene points across a video sequence, even when they lie far outside the narrow field of view of the observed video. This task fosters learning allocentric scene representations without requiring dynamic 4D ground-truth scene models for training. Instead, we exploit 360 videos as a source of supervision, resampling them into narrow field-of-view perspectives while computing ground-truth directions by tracking points across the full panorama with a 2D pipeline. We introduce a new dataset and benchmark, TAPVid360-10k, comprising 10k perspective videos with ground-truth directional point tracks. Our baseline adapts CoTracker v3 to predict per-point rotations for direction updates, outperforming existing TAP and TAPVid-3D methods.
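The geometric core of the task can be illustrated with a minimal sketch: mapping a pixel in an equirectangular panorama to a unit direction vector (how ground-truth directions can be derived from 360 footage), and updating a tracked direction with a per-point rotation (the form of the baseline's prediction). The function names, the equirectangular convention, and the rotation parameterisation below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np


def equirect_pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction vector.

    Assumes longitude spans [-pi, pi] across the width and latitude
    spans [-pi/2, pi/2] across the height; the dataset's exact
    convention may differ.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])


def update_direction(direction, rotation_matrix):
    """Apply a predicted per-point rotation to the previous direction.

    `rotation_matrix` stands in for the baseline's per-point rotation
    output; the actual parameterisation used in the paper may differ.
    """
    d = rotation_matrix @ direction
    return d / np.linalg.norm(d)
```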
Croissant File: json
Dataset URL: https://huggingface.co/datasets/fhudson96/TAPVid360-10k
Code URL: https://github.com/finlay-hudson/TAPVid360/
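To experiment with the dataset locally, the Hugging Face repository can be fetched with `huggingface_hub`. This is a generic download sketch; the file layout inside the repo and any split names are not described here, so inspect the downloaded directory before writing loading code.

```python
from huggingface_hub import snapshot_download

# Download the TAPVid360-10k dataset repository to the local HF cache.
local_dir = snapshot_download(
    repo_id="fhudson96/TAPVid360-10k",
    repo_type="dataset",
)
print(local_dir)
```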
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 947