
TAP-Net: Tracking Any Point in a Video


DAVIS Point Tracking

TAP-Net generalizes to real-world videos from the DAVIS benchmark, even though its training labels came exclusively from the synthetic Kubric dataset. For each example, we show each tracked point in a different color. For simplicity, all query points are given on the first frame, although our network can track queries from any frame. The points are typically tracked consistently over hundreds of frames, through challenging occlusions and changes in the objects' appearance and pose. However, small objects and changes in scale remain challenging.
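As a rough illustration of the setup described above (this is a minimal NumPy sketch, not the released TAP-Net API): in the TAP-Vid formulation, each query point is a `(t, y, x)` triple giving a frame index and a pixel position on that frame, and a tracker predicts a position plus an occlusion flag for every point on every frame. The array names and image size here are illustrative assumptions.

```python
import numpy as np

num_points, num_frames = 5, 100  # hypothetical video length

# All queries placed on the first frame (t = 0), as in the examples
# shown above; queries from any other frame would simply use t > 0.
rng = np.random.default_rng(0)
queries = np.concatenate(
    [
        np.zeros((num_points, 1)),             # query frame index t
        rng.uniform(0, 256, (num_points, 2)),  # (y, x) in pixels (assumed 256px frames)
    ],
    axis=1,
)  # shape: (num_points, 3)

# A point tracker's output: a 2-D position and an occlusion flag for
# every point on every frame. Placeholder values stand in for the
# network's predictions.
tracks = np.zeros((num_points, num_frames, 2))        # (y, x) per frame
occluded = np.zeros((num_points, num_frames), bool)   # True when hidden

assert queries.shape == (num_points, 3)
assert tracks.shape == (num_points, num_frames, 2)
```

Visualizing one color per row of `tracks`, with occluded frames skipped, reproduces the style of the videos on this page.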

The goat's body is tracked quite well, despite having texture which is somewhat similar to the background.

In this example, the swan's body and face are tracked very well. However, the bill is too thin, and our algorithm loses track of one point.

Some points on these camels are tracked quite well, including one on the hump of a heavily occluded camel. However, large changes in viewpoint, as well as thin structures, cause some failures toward the end of the video.

In this example, the network starts off very precise, but tracking begins to deteriorate as the zooming camera causes large changes in scale.