Occam’s Razor for SSL: Memory-Efficient Parametric Instance Discrimination

Eric Gan; Patrik Reizinger; Alice Bizeul; Attila Juhos; Mark Ibrahim; Randall Balestriero; David Klindt; Wieland Brendel; Baharan Mirzasoleiman

Published: 29 Dec 2025, Last Modified: 29 Dec 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: Self-supervised learning (SSL) is the prevalent paradigm for representation learning, often relying on pairwise similarity between multiple augmented views of each example. Numerous methods with added complexities, such as gradient stopping, negative sampling, projectors, and additional regularization terms, have been introduced in recent years. These methods can be effective, but they require careful hyperparameter tuning, have increased computational and memory requirements, and struggle with latent dimensionality collapse. Furthermore, complexities such as gradient stopping make them hard to analyse theoretically and confound the essential components of SSL. We introduce a simple parametric instance discrimination method, called Datum IndEx as its Target (DIET). DIET has a single computational branch, without explicit negative sampling, gradient stopping, or other hyperparameters. We empirically demonstrate that DIET (1) can be implemented in a memory-efficient way; (2) achieves performance competitive with state-of-the-art SSL methods on small-scale datasets; and (3) is robust to hyperparameters such as batch size. We uncover tight connections to Spectral Contrastive Learning in the lazy training regime, leading to practical insights about the role of feature normalization. Compared to SimCLR and VICReg, DIET also learns higher-rank embeddings on CIFAR100 and TinyImageNet, suggesting that DIET captures more latent information.
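The core idea named in the abstract, using each datum's index as its target, can be sketched as a linear head with one output per training example, trained with cross-entropy against the example's dataset index. The block below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the encoder and augmentations are replaced by fixed toy feature vectors, only the head is updated, and the names `diet_logits`, `diet_loss` and `sgd_step_on_head` are hypothetical.

```python
import math


def diet_logits(z, head):
    """One logit per training datum: the dot product <w_n, z> for each row w_n of the head."""
    return [sum(w * x for w, x in zip(row, z)) for row in head]


def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def diet_loss(embeddings, indices, head):
    """Mean cross-entropy in which each sample's target class is its own dataset index."""
    total = 0.0
    for z, i in zip(embeddings, indices):
        total -= log_softmax(diet_logits(z, head))[i]
    return total / len(embeddings)


def sgd_step_on_head(embeddings, indices, head, lr=0.5):
    """One plain gradient step on the N x K head; the encoder is frozen in this sketch.

    For softmax cross-entropy, dL/dw_n = (p_n - [n == i]) * z, averaged over the batch.
    """
    batch = len(embeddings)
    grads = [[0.0] * len(head[0]) for _ in head]
    for z, i in zip(embeddings, indices):
        probs = [math.exp(v) for v in log_softmax(diet_logits(z, head))]
        for n, row_grad in enumerate(grads):
            coeff = probs[n] - (1.0 if n == i else 0.0)
            for k, x in enumerate(z):
                row_grad[k] += coeff * x / batch
    return [[w - lr * g for w, g in zip(row, grad)] for row, grad in zip(head, grads)]


# Hypothetical toy setup: N = 4 training images, K = 3 feature dimensions.
# In a real run these would be encoder outputs f(aug(x_i)) for a mini-batch.
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
indices = [0, 1, 2, 3]                   # the datum index is the class label
head = [[0.0] * 3 for _ in range(4)]     # one row per training datum

loss_before = diet_loss(embeddings, indices, head)  # all-zero head gives log(N)
for _ in range(50):
    head = sgd_step_on_head(embeddings, indices, head)
loss_after = diet_loss(embeddings, indices, head)
print(f"{loss_before:.4f} -> {loss_after:.4f}")
```

Because the softmax normalizes over all N rows, the gradient for every row other than the target's pushes that row's logit down, which is one way to read why no explicit negative sampling or gradient stopping is needed. Note also that the head holds N × K parameters in this sketch, so its size grows with the number of training examples.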

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
- theoretical results for the cross-entropy loss
- restructured experimental and ablation sections with more extensive evaluations, MoCo and VICReg baselines, and additional experimental details (Sec. 6 and 7)

Assigned Action Editor: Georgios Leontidis

Submission Number: 6212