Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay

TMLR Paper7192 Authors

27 Jan 2026 (modified: 06 Feb 2026). Under review for TMLR. License: CC BY 4.0
Abstract: Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks, in which last-layer feature vectors of the same class "collapse" to a single point while the features of different classes become equally separated. We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC. In the near-optimal loss regime, we establish an asymptotic lower bound on the emergence of NC that depends only on the WD value, the training loss, and the presence of last-layer BN. Our experiments substantiate these theoretical insights by showing that models exhibit stronger NC with BN, appropriate WD values, and lower loss. Our findings offer a novel perspective on the role of BN and WD in shaping neural network features.
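The two geometric properties named in the abstract can be made concrete with a small sketch. The code below is an illustration only, not the measure used in the paper: the function names `class_means`, `within_class_variability`, and `pairwise_mean_cosines` are hypothetical, and the toy features are hand-built rather than taken from a trained network. It computes the average squared distance of each feature to its class mean (same-class collapse) and the pairwise cosine similarities between centered class means (equal separation).

```python
import math

def class_means(features, labels):
    """Return a dict mapping each label to the mean feature vector of that class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def within_class_variability(features, labels):
    """Average squared distance from each feature to its own class mean."""
    means = class_means(features, labels)
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((v - m) ** 2 for v, m in zip(x, means[y]))
    return total / len(features)

def pairwise_mean_cosines(features, labels):
    """Cosine similarity between every pair of class means, after
    subtracting the average of the class means."""
    means = class_means(features, labels)
    classes = sorted(means)
    dim = len(features[0])
    center = [sum(means[c][i] for c in classes) / len(classes) for i in range(dim)]
    centered = {c: [m - g for m, g in zip(means[c], center)] for c in classes}
    cosines = []
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            dot = sum(p * q for p, q in zip(centered[a], centered[b]))
            norm_a = math.sqrt(sum(p * p for p in centered[a]))
            norm_b = math.sqrt(sum(q * q for q in centered[b]))
            cosines.append(dot / (norm_a * norm_b))
    return cosines

# Toy data (an assumption for illustration): three classes whose features
# sit exactly on the vertices of an equilateral triangle, two samples each.
h = math.sqrt(3) / 2
toy_features = [[1.0, 0.0], [1.0, 0.0], [-0.5, h], [-0.5, h], [-0.5, -h], [-0.5, -h]]
toy_labels = [0, 0, 1, 1, 2, 2]

toy_variability = within_class_variability(toy_features, toy_labels)
toy_cosines = pairwise_mean_cosines(toy_features, toy_labels)
```

On this toy input the within-class variability is zero and all three cosines are equal (about -0.5), which is the fully collapsed, equally separated configuration for three classes. Features from a real network would typically give a small positive variability and cosines that only approximately agree.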
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Marco_Mondelli1
Submission Number: 7192