On the Importance of Embedding Norms in Self-Supervised Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We show in theory, simulation, and practice that embedding norms have critical consequences for self-supervised learning.
Abstract: Self-supervised learning (SSL) enables learning data representations without a supervised signal and has become an important paradigm in machine learning. Most SSL methods employ the cosine similarity between embedding vectors and hence effectively embed data on a hypersphere. While this seemingly implies that embedding norms cannot play any role in SSL, a few recent works have suggested that embedding norms carry information about network convergence and confidence. In this paper, we resolve this apparent contradiction and systematically establish the embedding norm's role in SSL training. Using theoretical analysis, simulations, and experiments, we show that embedding norms (i) govern SSL convergence rates and (ii) encode network confidence, with smaller norms corresponding to unexpected samples. Additionally, we show that manipulating embedding norms can have large effects on convergence speed. Our findings demonstrate that SSL embedding norms are integral to understanding and optimizing network behavior.
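A short derivation makes claim (i) concrete. The following is standard vector calculus, our illustration rather than notation taken from the paper: let $z$ be an embedding, $y$ a target embedding, $\hat{z} = z/\|z\|$, $\hat{y} = y/\|y\|$, and $\theta$ the angle between them. Then

$$
\nabla_z \left( \frac{z^\top y}{\|z\|\,\|y\|} \right) = \frac{1}{\|z\|}\bigl(\hat{y} - \cos\theta\,\hat{z}\bigr),
\qquad
\bigl\|\nabla_z \cos\theta\bigr\| = \frac{\sin\theta}{\|z\|}.
$$

Because the gradient magnitude scales as $1/\|z\|$, a cosine-similarity loss delivers proportionally smaller updates to large-norm embeddings, which is consistent with the abstract's claim that embedding norms govern convergence rates.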
Lay Summary: Machine learning models process information using embeddings -- high-dimensional points which encode what the model extracted from the input. Many machine learning training objectives treat these embeddings as having a fixed size and, consequently, most analysis of these embeddings ignores their size. In this paper, we show that embedding sizes (norms) *both* contain valuable information *and* control how well the model learns. Specifically, we show that the embedding norm represents the model's certainty in the corresponding input and that, if the embedding norm is large, then the model has a difficult time updating this representation.
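The lay summary's second point, that large-norm embeddings are hard to update, can be checked numerically. The sketch below is our illustration (not code from the linked repository) and uses PyTorch autograd to show that rescaling an embedding's norm inversely rescales its cosine-similarity gradient:

```python
# Minimal sketch: the gradient of cosine similarity w.r.t. an embedding z
# scales as 1 / ||z||, so large-norm embeddings receive smaller updates.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
base = torch.randn(128)    # fixed embedding direction
target = torch.randn(128)  # fixed "positive" embedding

for scale in [0.5, 1.0, 4.0, 16.0]:
    # Same direction, different norm, so the angle to the target is fixed.
    z = (scale * base).requires_grad_()
    F.cosine_similarity(z, target, dim=0).backward()
    print(f"||z|| = {z.norm():7.2f}  ||grad|| = {z.grad.norm():.5f}  "
          f"||z|| * ||grad|| = {(z.norm() * z.grad.norm()).item():.4f}")
# The product ||z|| * ||grad|| is constant across scales (it equals sin(theta)
# for the fixed angle), confirming that doubling the norm halves the gradient.
```

This is only the single-vector mechanism; the paper's analysis concerns how this effect plays out during full SSL training.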
Link To Code: https://github.com/Andrew-Draganov/SSLEmbeddingNorms
Primary Area: General Machine Learning->Representation Learning
Keywords: Representation Learning, Embedding Norm, Network Confidence, Self-Supervised Learning
Submission Number: 3679