Watching the World Go By: Representation Learning from Unlabeled Videos

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Representation Learning, Unsupervised Learning, Video Analytics
Abstract: Recent unsupervised representation learning techniques show remarkable success on many single-image tasks by using instance discrimination: learning to differentiate between two augmented versions of the same image and a large batch of unrelated images. Prior work uses artificial data augmentation techniques such as cropping and color jitter, which can only affect the image in superficial ways and are not aligned with how objects actually change, e.g., occlusion, deformation, and viewpoint change. We argue that videos offer this natural augmentation for free. Videos can provide entirely new views of objects, show deformation, and even connect semantically similar but visually distinct concepts. We propose Video Noise Contrastive Estimation, a method for using unlabeled video to learn strong, transferable, single-image representations. We demonstrate improvements over recent unsupervised single-image techniques, as well as over fully supervised ImageNet pretraining, across temporal and non-temporal tasks.
One-sentence Summary: Using unlabeled videos, we improve over recent unsupervised single image techniques, as well as over fully supervised ImageNet pretraining, across a variety of temporal and non-temporal tasks.
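The abstract describes instance discrimination via noise contrastive estimation, where two views of the same instance (here, two frames drawn from the same video) form a positive pair and the rest of the batch serves as negatives. The sketch below is a minimal, hypothetical illustration of such an InfoNCE-style loss on precomputed embeddings; the function name, temperature value, and use of NumPy are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def info_nce_loss(queries, keys, temperature=0.07):
    """Hypothetical InfoNCE-style contrastive loss sketch.

    queries[i] and keys[i] are embeddings of two frames from the
    same video (the positive pair); every keys[j] with j != i acts
    as a negative for queries[i].
    """
    # L2-normalize so the dot product is cosine similarity
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; minimize their negative log-likelihood
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
loss_random = info_nce_loss(q, rng.normal(size=(8, 16)))  # unrelated "frames"
loss_matched = info_nce_loss(q, q)  # identical embeddings: low loss
```

With matched query/key embeddings the diagonal dominates and the loss is near its minimum, while random pairs give a loss close to log N; this is the signal that pulls frames of the same video together and pushes other videos apart.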
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2003.07990/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=1hxAZGK7Ik