Linear Loss Classification: Efficient Training Through Neural Collapse

Wonyeong Song; Donghwan Kim

Published: 29 May 2026, Last Modified: 29 May 2026

Venue: HiLD at ICML 2026 (Poster)

License: CC BY 4.0

Keywords: Loss Function, Neural Collapse, Inductive Bias, Learning Dynamics

TL;DR: We introduce the linear loss, which avoids gradient decay, induces neural collapse, and accelerates convergence to strong generalization.

Abstract: Logistic loss is widely used for classification, as neural networks trained by gradient descent (GD) on this loss exhibit strong generalization. This success is commonly attributed to two inductive biases: the directional convergence of the last-layer classifier toward the max-margin solution for a given feature representation, and neural collapse (NC), a terminal-phase phenomenon characterized by structural simplification of last-layer features and classifiers. However, both can emerge slowly because the gradients of the logistic loss decay exponentially. In this work, we introduce the linear loss $l(u)=-u$, which eliminates gradient decay and leads to faster training dynamics. Under this loss, GD no longer converges in direction to the max-margin solution, but instead aligns with the difference between class means. While this may appear suboptimal, once NC occurs the two directions become closely aligned, mitigating the discrepancy. Empirically, we demonstrate that GD with the linear loss, combined with appropriate normalizations, induces NC, and that both NC and generalization occur faster than with logistic loss. On the theoretical side, we prove that two-layer neural networks with ReLU activation trained with GD on the linear loss exhibit directional NC for orthogonally separable data. Together, these results suggest that the linear loss, despite its simplicity and its deviation from standard classification losses, can be sufficient to induce NC and thereby achieve strong generalization faster than other losses.
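The two claims about the linear loss can be illustrated with a minimal sketch. It is not the authors' experimental setup: it assumes a plain linear classifier $w$ with labels $y \in \{-1, +1\}$, balanced classes, zero initialization, and no normalization, and every name in it (`logistic_loss_grad`, `linear_loss_grad`, the toy Gaussian data) is hypothetical. The first part compares the derivative of the logistic loss $\log(1 + e^{-u})$ with that of $l(u) = -u$ as the margin $u$ grows. The second part runs GD on the linear loss and measures how closely $w$ aligns with the difference between the class means.

```python
import numpy as np

def logistic_loss_grad(u):
    """Derivative of log(1 + exp(-u)) with respect to the margin u."""
    return -1.0 / (1.0 + np.exp(u))

def linear_loss_grad(u):
    """Derivative of the linear loss l(u) = -u: constant -1 at every margin."""
    return -np.ones_like(u)

# Part 1: gradient magnitude as the margin grows.
margins = np.array([0.0, 5.0, 10.0, 20.0])
logistic_grads = logistic_loss_grad(margins)  # shrinks toward 0 as u grows
linear_grads = linear_loss_grad(margins)      # stays at -1

# Part 2: GD on the linear loss for a linear classifier (toy data, assumed).
rng = np.random.default_rng(0)
n_per_class, dim = 50, 5
x_pos = rng.normal(loc=1.0, size=(n_per_class, dim))
x_neg = rng.normal(loc=-1.0, size=(n_per_class, dim))
X = np.vstack([x_pos, x_neg])
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])

w = np.zeros(dim)
lr = 0.1
for _ in range(100):
    # Gradient of mean(-y_i * <w, x_i>) with respect to w; it does not depend on w.
    grad = -(y[:, None] * X).mean(axis=0)
    w -= lr * grad

# Cosine similarity between w and the difference of the class means.
mean_diff = x_pos.mean(axis=0) - x_neg.mean(axis=0)
cosine = w @ mean_diff / (np.linalg.norm(w) * np.linalg.norm(mean_diff))

print("logistic gradients:", logistic_grads)
print("linear gradients:  ", linear_grads)
print("cosine(w, mean difference):", cosine)
```

In this simplified setting the gradient is the same at every step, so $w$ is an exact positive multiple of the class-mean difference and its norm grows linearly with the number of steps. The normalizations mentioned in the abstract are not modeled here.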

Submission Number: 190