TL;DR: Principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing
Abstract: We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances generalization and suffices to avoid known training failure modes, such as representation, dimensional, cluster, and intracluster collapses. We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. Our approach effectively combines the strengths of feature decorrelation and cluster-based self-supervised learning methods, overcoming training failure modes while achieving strong generalization in clustering and linear classification tasks.
Lay Summary: How can we teach computers to understand the world without constant human guidance? One promising approach, known as self-supervised learning, allows computers to learn on their own by discovering patterns in data. But building these systems is often laborious and error-prone; even experts frequently rely on trial and error to make them work.
In this work, we identify and categorize the common ways these systems can fail, and we provide a simple set of guidelines to train them more reliably. We also introduce a method to measure how robust they are.
Our findings make this powerful type of machine learning easier to use and understand. Ultimately, this could lead to more accessible and trustworthy AI systems that learn meaningful, structured representations of the real world, all without constant human supervision.
Link To Code: https://github.com/emsansone/CPLearn
Primary Area: Deep Learning->Self-Supervised Learning
Keywords: self-supervised learning, hyperdimensional computing, feature decorrelation, clustering, collapses
Submission Number: 12199