Forecasting Motion in the Wild

Neerja Thakkar; Shiry Ginosar; Jacob C Walker; Jitendra Malik; Joao Carreira; Carl Doersch

Published: 10 Jun 2026, Last Modified: 10 Jun 2026

Venue: CVPR 2026 Workshop VideoWorldModel Poster

License: CC BY 4.0

Keywords: predictive visual intelligence; non-rigid motion; behavior forecasting; dense point trajectories

TL;DR: By representing motion as dense point trajectory tokens, our diffusion transformer disentangles motion from appearance to enable category-agnostic and physically coherent forecasting of complex, non-rigid animal behavior in the wild.

Abstract: Visual intelligence requires anticipating the future behavior of agents, yet vision systems lack a general representation for motion and behavior. We propose dense point trajectories as visual tokens for behavior, a structured mid-level representation that disentangles motion from appearance and generalizes across diverse non-rigid agents, such as animals in the wild. Building on this abstraction, we design a diffusion transformer that models unordered sets of trajectories and explicitly reasons about occlusion, enabling coherent forecasts of complex motion patterns. To evaluate at scale, we curate MammalMotion, 300 hours of unconstrained animal video with robust shot detection and camera-motion compensation. Experiments show that forecasting trajectory tokens achieves category-agnostic, data-efficient prediction, outperforms state-of-the-art baselines, and generalizes to rare species and morphologies, providing a foundation for predictive visual intelligence in the wild.
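
Illustrative sketch (not the authors' implementation): the abstract describes dense point trajectories with occlusion as an unordered set of tokens. One plausible way to picture such a representation is shown below. The function name trajectories_to_tokens, the array shapes, and the choice to pack per-frame x, y, and a visibility flag into one vector per track are all assumptions made for illustration; the paper's actual tokenization may differ.

```python
import numpy as np

def trajectories_to_tokens(xy, visible):
    """Hypothetical tokenizer: flatten each point track into one token vector.

    xy:      (N, T, 2) positions of N tracked points over T frames.
    visible: (N, T) booleans, False where the point is occluded.
    Returns an (N, 3*T) array holding x, y and a visibility flag per frame.
    """
    xy = np.asarray(xy, dtype=np.float32)
    vis = np.asarray(visible, dtype=np.float32)[..., None]  # (N, T, 1)
    # Zero the coordinates of occluded points so no stale position leaks in.
    feats = np.concatenate([xy * vis, vis], axis=-1)  # (N, T, 3)
    return feats.reshape(feats.shape[0], -1)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1.0, size=(4, 8, 2))
visible = rng.uniform(size=(4, 8)) > 0.2
tokens = trajectories_to_tokens(xy, visible)

# The set is unordered: permuting the tracks only permutes the token rows.
perm = rng.permutation(4)
tokens_perm = trajectories_to_tokens(xy[perm], visible[perm])
```

Because each track is tokenized independently, reordering the input tracks reorders the tokens without changing them, which is the property a model over unordered sets would rely on.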

Submission Number: 7
