Object-Contrastive Networks: Unsupervised Object Representations

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Withdrawn Submission
Abstract: Discovering objects and their attributes is of great importance for autonomous agents to effectively operate in human environments. This task is particularly challenging due to the ubiquity of objects and all their nuances in perceptual and semantic detail. In this paper we present an unsupervised approach for learning disentangled representations of objects entirely from unlabeled monocular videos. These continuous representations are not biased or limited by a discrete set of labels determined by human labelers. The proposed representation is trained with a metric learning loss, in which objects with homogeneous features are pulled together while those with heterogeneous features are pushed apart. We show that these unsupervised embeddings allow us to discover object attributes and can enable robots to self-supervise in previously unseen environments. We quantitatively evaluate performance on a large-scale synthetic dataset with 12k object models, as well as on a real dataset collected by a robot, and show that our unsupervised object understanding generalizes to previously unseen objects. Specifically, we demonstrate the effectiveness of our approach on robotic manipulation tasks such as pointing at and grasping objects. An interesting and perhaps surprising finding of this approach is that, given a limited set of objects, object correspondences naturally emerge when using metric learning without requiring explicit positive pairs.
Keywords: self-supervised robotics, object understanding, object representations, metric learning, unsupervised vision
TL;DR: An unsupervised approach for learning disentangled representations of objects entirely from unlabeled monocular videos.
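To make the abstract's description of the training objective concrete, below is a minimal sketch of a generic margin-based metric-learning loss that pulls embeddings of similar object crops together and pushes dissimilar ones apart. This is an illustrative assumption, not the paper's exact formulation: the pairing rule (which crops count as "same") and the loss variant used by the authors may differ.

# Minimal sketch (assumption, not the authors' exact formulation) of a
# metric-learning objective of the kind described in the abstract.
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """emb_a, emb_b: (N, D) L2-normalized embeddings of object crops.
    same: (N,) float tensor, 1.0 if the pair is treated as the same
    object/attribute, 0.0 otherwise. The pairing rule itself (e.g.
    nearest neighbours across video frames) is left to the caller."""
    d = F.pairwise_distance(emb_a, emb_b)            # Euclidean distance per pair
    pull = same * d.pow(2)                           # attract positive pairs
    push = (1.0 - same) * F.relu(margin - d).pow(2)  # repel negatives up to margin
    return 0.5 * (pull + push).mean()

if __name__ == "__main__":
    # Toy usage with random embeddings; a real setup would embed object
    # detections from monocular video frames with a shared CNN encoder.
    a = F.normalize(torch.randn(8, 16), dim=1)
    b = F.normalize(torch.randn(8, 16), dim=1)
    labels = (torch.rand(8) > 0.5).float()
    print(contrastive_loss(a, b, labels).item())

In this sketch the margin and the binary "same" signal are hypothetical placeholders; the abstract's finding is precisely that useful object correspondences can emerge without explicit positive pairs being supplied.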
