Provable Emergence of Deep Neural Collapse and Low-Rank Bias in $L^2$-Regularized Nonlinear Networks

TMLR Paper 7384 Authors

06 Feb 2026 (modified: 01 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: We present a unified theoretical framework connecting the first property of Deep Neural Collapse (DNC1) to the emergence of implicit low-rank bias in nonlinear networks trained with $L^2$ weight decay regularization. Our main contributions are threefold. First, we derive a quantitative relation between the Total Cluster Variation (TCV) of intermediate embeddings and the numerical rank of stationary weight matrices. In particular, we establish that, at any critical point, the distance from a weight matrix to the set of rank-$K$ matrices is bounded by a constant times the TCV of earlier-layer features, scaled inversely with the weight-decay parameter. Second, we prove global optimality of DNC1 in a constrained representation-cost setting for both feedforward and residual architectures, showing that zero TCV across intermediate layers minimizes the representation cost under natural architectural constraints. Third, we establish a benign landscape property: for almost every interpolating initialization there exists a continuous, loss-decreasing path from the initialization to a globally optimal, DNC1-satisfying configuration. Our theoretical claims are validated empirically; numerical experiments confirm the predicted relations among TCV, singular-value structure, and weight decay. These results indicate that neural collapse and low-rank bias are intimately linked phenomena arising from the optimization geometry induced by weight decay.
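One way to read the central bound is sketched below in our own notation; the symbols $\mathrm{TCV}$, $H_\ell$, $W_{\ell+1}$, $\mu_{\ell,k}$, $\lambda$, $C$, and $K$ are not defined in this abstract, and the paper's exact definitions and constants may differ.

% Illustrative formalization only; all symbols are our own notation.
% TCV of layer-l features: within-class variation around class means.
\[
  \mathrm{TCV}(H_\ell) \;=\; \sum_{k=1}^{K} \sum_{i \in \mathcal{I}_k}
    \bigl\lVert h_{\ell,i} - \mu_{\ell,k} \bigr\rVert^2,
  \qquad
  \mu_{\ell,k} \;=\; \frac{1}{\lvert \mathcal{I}_k \rvert}
    \sum_{i \in \mathcal{I}_k} h_{\ell,i}.
\]
% Claimed rank-distance relation at a critical point of the L^2-regularized loss,
% with weight-decay parameter \lambda and K classes (form assumed, not quoted).
\[
  \operatorname{dist}\bigl(W_{\ell+1},\, \{\,W : \operatorname{rank}(W) \le K\,\}\bigr)
  \;\le\; \frac{C}{\lambda}\, \mathrm{TCV}(H_\ell).
\]

Under this reading, driving the TCV of intermediate embeddings to zero (DNC1) forces stationary weight matrices toward rank at most $K$, which is the claimed link between collapse and low-rank bias.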
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhihui_Zhu1
Submission Number: 7384