Self-Organizing Visual Embeddings for Non-Parametric Self-Supervised Learning

23 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: self-supervised learning, clustering, representation learning, computer vision
TL;DR: Forget learning prototypes with SGD; that places too much responsibility on the prototypes. Instead, establish many anchors, or judges, per concept. Each judge assesses the similarity of views to the concept from a different perspective.
Abstract: We present Self-Organizing Visual Embeddings (SOVE), a new training technique for unsupervised representation learning. SOVE avoids learning prototypes from scratch and instead explores relationships between visual embeddings in a non-parametric space. Unlike existing clustering-based techniques that employ a single prototype to encode all the relevant features of a complex concept, we propose the SOVE method, where a concept is represented by many semantically similar representations, or judges, each containing a complementary set of features that together fully characterize the concept and maximize training performance. We reaffirm the feasibility of non-parametric self-supervised learning (SSL) by introducing novel non-parametric adaptations of two loss functions with the SOVE technique: (1) non-parametric cluster assignment prediction for class-level representations and (2) non-parametric Masked Image Modeling (MIM) for patch-level reconstruction. SOVE achieves state-of-the-art performance on many downstream benchmarks, including transfer learning, image retrieval, object detection, and segmentation. Moreover, SOVE scales well with Vision Transformers (ViTs), with performance gains increasing as larger encoders are employed.
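To make the judge-based objective concrete, here is a minimal sketch of what non-parametric cluster assignment prediction with multiple judges per concept could look like. The abstract does not specify the implementation, so everything below is an illustrative assumption: the function names, the use of a memory bank of embeddings as judges, the mean aggregation of judge scores, and the swap-style cross-entropy between two views are all hypothetical choices, not the authors' exact recipe.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of SOVE-style non-parametric cluster assignment.
# Judges are actual embeddings (e.g. sampled from an embedding queue),
# not prototypes learned by SGD; each judge scores views independently.

def judge_assignments(views, judges, temperature=0.1):
    """views:  (B, D)    embeddings of augmented views
       judges: (K, J, D) J judge embeddings for each of K concepts
       returns soft concept assignments of shape (B, K)."""
    views = F.normalize(views, dim=-1)
    judges = F.normalize(judges, dim=-1)
    # Each judge scores each view: (B, K, J) cosine similarities.
    sims = torch.einsum("bd,kjd->bkj", views, judges)
    # Aggregate the judges' verdicts per concept (mean is one simple choice).
    concept_scores = sims.mean(dim=-1)  # (B, K)
    return F.softmax(concept_scores / temperature, dim=-1)

def assignment_prediction_loss(view_a, view_b, judges):
    # One view provides the (detached) target assignment, the other
    # predicts it, in the spirit of cluster-assignment-prediction SSL.
    targets = judge_assignments(view_a, judges).detach()
    preds = judge_assignments(view_b, judges)
    return -(targets * preds.log()).sum(dim=-1).mean()

# Usage with toy shapes; the bank would normally be refreshed from
# encoder outputs rather than drawn at random.
B, K, J, D = 256, 1024, 8, 128
va, vb = torch.randn(B, D), torch.randn(B, D)
bank = torch.randn(K, J, D)
loss = assignment_prediction_loss(va, vb, bank)
```

The key contrast with prototype-based methods is that `bank` is populated from real embeddings rather than optimized parameters, so each concept's score reflects several complementary "perspectives" instead of a single learned vector.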
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3269