Hyperbolic Contrastive Learning for Visual Representations beyond Objects

16 May 2022 (modified: 12 Mar 2024) · NeurIPS 2022 Submitted · Readers: Everyone
Keywords: contrastive learning, self-supervised learning, Riemannian geometry, representation learning
Abstract: Despite the rapid progress in visual representation learning driven by self-/un-supervised methods, objects and scenes have largely been treated through the same lens. In this paper, we focus on learning representations for objects and scenes explicitly in the same space. Motivated by the observation that visually similar objects lie close together in the representation space, we argue that scenes and objects should further follow a hierarchical structure based on their compositionality. To exploit such a structure, we propose a contrastive learning framework in which a Euclidean loss is used to learn object representations and a hyperbolic loss is used to regularize scene representations according to the hierarchy. This hyperbolic objective encourages scene-object hypernymy among the representations by optimizing the magnitude of their norms. We show that when pretraining on the COCO and OpenImages datasets, the hyperbolic loss improves downstream performance across multiple datasets and tasks, including image classification, object detection, and semantic segmentation. We also show that the properties of the learned representations allow us to solve various vision tasks involving the interaction between scenes and objects in a zero-shot way.
TL;DR: We propose a contrastive learning framework that learns visual representations for both objects and scenes in the same representation space while preserving a hierarchical topology.
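
The abstract describes a contrastive objective computed with hyperbolic distances on scene and object embeddings. The snippet below is a minimal sketch of such a loss on the Poincaré ball, not the authors' released implementation: the exponential map at the origin, the temperature value, and the assumption that object `i` is cropped from scene `i` in the batch are illustrative choices.

```python
# Minimal sketch of a hyperbolic (Poincare-ball) contrastive loss.
# Not the authors' code; hyperparameters and pairing convention are assumed.
import torch
import torch.nn.functional as F

def expmap0(x, eps=1e-5):
    """Map Euclidean encoder outputs into the Poincare ball (exp map at the origin)."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * x / norm  # resulting norm < 1, i.e. inside the ball

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance between points u, v inside the unit Poincare ball."""
    sq = ((u - v) ** 2).sum(dim=-1)
    denom = (1 - (u ** 2).sum(dim=-1)) * (1 - (v ** 2).sum(dim=-1))
    return torch.acosh(1 + 2 * sq / denom.clamp_min(eps))

def hyperbolic_contrastive_loss(obj_emb, scene_emb, temperature=0.2):
    """InfoNCE-style loss where similarity is the negative Poincare distance.

    obj_emb, scene_emb: (N, D) Euclidean encoder outputs; row i of obj_emb is
    assumed to be an object cropped from the scene in row i of scene_emb.
    """
    u = expmap0(obj_emb)
    v = expmap0(scene_emb)
    # Pairwise distances between every object and every scene in the batch.
    dist = poincare_distance(u.unsqueeze(1), v.unsqueeze(0))  # (N, N)
    logits = -dist / temperature
    targets = torch.arange(obj_emb.size(0), device=obj_emb.device)
    return F.cross_entropy(logits, targets)

# Example with random features standing in for encoder outputs.
obj = torch.randn(8, 128) * 0.1
scene = torch.randn(8, 128) * 0.1
print(hyperbolic_contrastive_loss(obj, scene))
```

Because the Poincaré distance grows with the norm of the embeddings, pulling matched scene-object pairs together while pushing mismatched pairs apart implicitly orders representations by norm, which is how a norm-based hierarchy of the kind described in the abstract can emerge.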
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2212.00653/code)