Dense Unsupervised Learning for Video Segmentation

Nikita Araslanov; Simone Schaub-Meyer; Stefan Roth

Dense Unsupervised Learning for Video Segmentation

Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

Published: 09 Nov 2021, Last Modified: 26 May 2025NeurIPS 2021 PosterReaders: Everyone

Keywords: unsupervised learning, video object segmentation, representation learning from unlabelled videos

Abstract: We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation allows to learn dense feature representations directly in a fully convolutional regime. We rely on uniform grid sampling to extract a set of anchors and train our model to disambiguate between them on both inter- and intra-video levels. However, a naive scheme to train such a model results in a degenerate solution. We propose to prevent this with a simple regularisation scheme, accommodating the equivariance property of the segmentation task to similarity transformations. Our training objective admits efficient implementation and exhibits fast training convergence. On established VOS benchmarks, our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: zip

TL;DR: Learning spatio-temporal correspondences from unlabelled video for video object segmentation

Code: https://github.com/visinf/dense-ulearn-vos

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/dense-unsupervised-learning-for-video/code)

11 Replies

Loading