Self-Organizing Visual Prototypes for Non-Parametric Representation Learning

Thalles Silva; Helio Pedrini; Adín Ramírez Rivera

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY 4.0

TL;DR: A new SSL algorithm to learn visual features from unlabeled data based on a non-parametric pre-training strategy.

Abstract:

We present Self-Organizing Visual Prototypes (SOP), a new training technique for unsupervised visual feature learning. Existing prototypical self-supervised learning (SSL) methods rely on a single prototype to encode all relevant features of a hidden cluster in the data. In the SOP strategy, by contrast, a prototype is represented by many semantically similar representations, or support embeddings (SEs), each containing a complementary set of features that together better characterize their region in space and maximize training performance. We reaffirm the feasibility of non-parametric SSL by introducing novel non-parametric adaptations of two loss functions that implement the SOP strategy. Notably, we introduce the SOP Masked Image Modeling (SOP-MIM) task, where masked representations are reconstructed from the perspective of multiple non-parametric local SEs. We comprehensively evaluate the representations learned using the SOP strategy on a range of benchmarks, including retrieval, linear evaluation, fine-tuning, and object detection. Our pre-trained encoders achieve state-of-the-art performance on many retrieval benchmarks and demonstrate increasing performance gains with more complex encoders.
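
To make the idea of a prototype represented by several support embeddings more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The abstract does not say how SEs are chosen or combined, so everything below is an assumption made for illustration: a memory bank of stored embeddings, randomly sampled anchors, k-nearest-neighbour selection of SEs by cosine similarity, summing probabilities over each anchor's SEs, the temperature value, and all function names.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def select_support_embeddings(memory, num_anchors, k, rng):
    # ASSUMPTION: anchors are sampled at random from a memory bank, and each
    # anchor's SEs are its k most similar memory entries.
    anchor_idx = rng.choice(memory.shape[0], size=num_anchors, replace=False)
    sims = memory[anchor_idx] @ memory.T           # (num_anchors, N)
    return np.argsort(-sims, axis=1)[:, :k]        # (num_anchors, k) indices

def soft_assignment(views, memory, se_idx, temperature=0.1):
    # Softmax over all SEs, then sum the probabilities of the SEs that belong
    # to the same anchor (ASSUMPTION: sum aggregation), so each "prototype"
    # is scored through many embeddings instead of a single vector.
    num_anchors, k = se_idx.shape
    ses = memory[se_idx.reshape(-1)]               # (num_anchors * k, D)
    logits = views @ ses.T / temperature           # (B, num_anchors * k)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.reshape(-1, num_anchors, k).sum(axis=2)  # (B, num_anchors)

def cross_entropy(target, pred, eps=1e-12):
    # Mean cross-entropy between two batches of distributions.
    return float(-(target * np.log(pred + eps)).sum(axis=1).mean())

rng = np.random.default_rng(0)
memory = l2_normalize(rng.normal(size=(256, 32)))   # hypothetical memory bank
view_a = l2_normalize(rng.normal(size=(8, 32)))     # embeddings of view 1
view_b = l2_normalize(view_a + 0.1 * rng.normal(size=(8, 32)))  # noisy view 2

se_idx = select_support_embeddings(memory, num_anchors=16, k=4, rng=rng)
p_a = soft_assignment(view_a, memory, se_idx)
p_b = soft_assignment(view_b, memory, se_idx)
loss = cross_entropy(p_a, p_b)  # one view's assignment supervises the other
```

In this sketch no prototype vectors are learned: the "prototypes" are regions of the embedding space described by stored embeddings, which is one way to read the non-parametric aspect mentioned in the abstract.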

Lay Summary:

Current computer vision systems often use a large set of prototypes to help computers learn from unlabeled pictures, but this approach can miss important details. We developed a new method that lets computers learn from many similar images at once, capturing richer and more accurate information. This makes learning from unlabeled data more reliable and could improve technologies like medical imaging, search engines, and self-driving cars.

Link To Code: https://github.com/sthalles/sop

Primary Area: Deep Learning->Self-Supervised Learning

Keywords: self-supervised learning, computer vision, representation learning, non-parametric SSL

Submission Number: 7347
