Pixel-level Correspondence for Self-Supervised Learning from Video

Yash Sharma; Yi Zhu; Chris Russell; Thomas Brox

26 May 2022 (modified: 04 May 2025), ICML 2022 Pre-training Workshop, Readers: Everyone

Keywords: self-supervised, unsupervised, representation, video, optical flow, dense prediction, contrastive learning

TL;DR: Improve dense prediction, while maintaining global classification performance, by enabling dense contrastive learning on video via off-the-shelf optical flow.

Abstract: While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features at different points in time. We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks, without compromising performance on image classification.
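
The matching step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a precomputed dense flow field, nearest-pixel rounding of the tracked points, and InfoNCE as the contrastive loss (the abstract does not specify the loss). The function names `flow_correspondence` and `dense_info_nce` and the temperature of 0.1 are hypothetical choices made for illustration.

```python
import numpy as np

def flow_correspondence(flow):
    """Turn a dense flow field into a correspondence map.

    flow: (H, W, 2) array of (dx, dy) displacements from frame A to frame B
          (assumed to come from an off-the-shelf optical flow estimator).
    Returns flat target indices into frame B's H*W grid and a mask marking
    the points that stay inside the frame.
    """
    H, W, _ = flow.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Assumption: round tracked positions to the nearest pixel.
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
    # Clip so that indices are always legal; invalid points are masked out later.
    target = np.clip(ty, 0, H - 1) * W + np.clip(tx, 0, W - 1)
    return target.ravel(), valid.ravel()

def dense_info_nce(feat_a, feat_b, target, valid, temperature=0.1):
    """InfoNCE over local features (an assumed loss, for illustration only).

    feat_a, feat_b: (H*W, D) local features from the two frames.
    The positive for location i in frame A is location target[i] in frame B;
    all other locations in frame B act as negatives.
    """
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    positives = log_prob[np.arange(len(target)), target]
    return -positives[valid].mean()

# Toy usage: frame B is frame A shifted one pixel to the right.
rng = np.random.default_rng(0)
H, W, D = 4, 4, 8
grid_a = rng.normal(size=(H, W, D))
grid_b = np.roll(grid_a, 1, axis=1)
flow = np.zeros((H, W, 2))
flow[..., 0] = 1.0  # dx = +1 everywhere
target, valid = flow_correspondence(flow)
loss = dense_info_nce(grid_a.reshape(-1, D), grid_b.reshape(-1, D), target, valid)
```

In this toy setup the rightmost column of frame A is tracked out of the frame, so the mask excludes it from the loss. Replacing the true flow with a zero flow pairs features from different scene points and yields a higher loss.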

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/pixel-level-correspondence-for-self/code)
