Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

Panagiotis Koromilas; Theodoros Giannakopoulos; Mihalis Nicolaou; Yannis Panagakis

Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

Panagiotis Koromilas, Theodoros Giannakopoulos, Mihalis Nicolaou, Yannis Panagakis

Published: 02 Jun 2026, Last Modified: 02 Jun 2026Greeks in AI 2026EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Representation Learning, Neural Collapse, Supervised Contrastive Learning, Supervised Learning

Domains: Machine Learning Theory, Vision and Learning

TL;DR: We propose a unified view of Supervised Learning as contrasting prototypes on the hypersphere, and introduce objectives that attain Neural Collapse, the optimum of Supervised Learning, in practice.

External Link: https://arxiv.org/abs/2605.20302

Abstract: Supervised classification has a theoretical optimum, Neural Collapse (NC), yet neither of its two dominant paradigms reaches it in practice. Cross entropy (CE) leaves radial degrees of freedom unconstrained and converges to a degenerate geometry, while supervised contrastive learning (SCL) drives features toward NC during pretraining but discards this structure in a post hoc linear probing phase. We show that both paradigms are different appearances of the same method that contrasts prototypes on the unit hypersphere, and that closing the gap requires fixing each at its point of failure. From the CE side, we propose NTCE and NONL, two normalized losses that import contrastive optimization's missing ingredients into classifier learning: a large effective negative set and decoupled alignment and uniformity terms. From the SCL side, we prove that SCL's objective already optimizes throughout training for a principled classifier whose weights are the class mean embeddings, making linear probing both redundant and harmful. Empirically, on four benchmarks including ImageNet-1K, NTCE and NONL surpass CE accuracy, closely approximate NC ($\geq 95\%$), and match CE's converged NC on 4/5 metrics in under $7.5\%$ of its iterations, while SCL with fixed prototypes matches linear probing without the hours-long classifier training phase. The learned geometry yields $+5.5\%$ mean relative improvement in transfer learning, up to $+8.7\%$ under severe class imbalance, and improved robustness to corruptions on ImageNet-C. Our work recasts supervised learning as prototype learning on the hypersphere, with NC reached *by design*. Accepted at ICML 2026.

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 116

Loading