How Gradient Descent Separates Data with Neural Collapse: A Layer-Peeled Perspective

21 May 2021 (modified: 05 May 2023) | NeurIPS 2021 Submitted | Readers: Everyone
Abstract: In this paper, we provide a landscape analysis of a surrogate model to study the inductive bias of the last-layer features and classifiers of neural networks trained with the cross-entropy loss. We show that once the training cross-entropy loss decreases below a certain threshold, the features and classifiers in the last layer of the network converge to a particular geometric structure known as neural collapse \citep{papyan2020prevalence,fang2021layer}, \emph{i.e.}, the cross-example within-class variability of the last-layer features collapses to zero and the class means converge to a Simplex Equiangular Tight Frame (ETF). We further show that the cross-entropy loss enjoys a benign global landscape: every critical point is either a global minimizer, which exhibits the neural collapse phenomenon, or a strict saddle whose Hessian has a direction of negative curvature.
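For reference, the Simplex Equiangular Tight Frame mentioned in the abstract is the standard geometry from the neural collapse literature; the following is a common formulation (notation is assumed here following \citet{papyan2020prevalence}, not taken from this page):
$$
\mathbf{M} \;=\; \sqrt{\tfrac{K}{K-1}}\,\mathbf{P}\Big(\mathbf{I}_K - \tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K^\top\Big),
$$
where $K$ is the number of classes, $\mathbf{P}\in\mathbb{R}^{d\times K}$ (with $d\ge K$) is a partial orthogonal matrix satisfying $\mathbf{P}^\top\mathbf{P}=\mathbf{I}_K$, and the columns of $\mathbf{M}$ give the (rescaled) class means. Under this structure, any two distinct normalized columns have inner product $-\tfrac{1}{K-1}$, i.e., the class means are maximally and equally separated.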
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: zip
21 Replies
