Keywords: neural collapse, cross-entropy dynamics, spectral initialization
TL;DR: Hadamard initialization makes cross-entropy training analytically tractable in a simple two-layer classification model, allowing for an extension of the spectral initialization analysis of Saxe et al. (2013, 2019).
Abstract: In this work, we study cross-entropy (CE) dynamics in a two-layer linear network with orthogonal inputs, the simplest non-convex setting in which the implicit bias of CE remains unresolved. This setting coincides with the unconstrained features model used to study neural collapse (NC), a phenomenon observed in deep classification networks. Our analysis rests on a key observation: Hadamard initialization diagonalizes the softmax operator. This allows us to extend the spectral initialization framework that Saxe et al. (2013, 2019) developed for squared loss. We prove convergence to NC under spectral CE training and give the first finite-time analysis in this setting via an explicit Lyapunov function that decreases monotonically to NC. We further identify CE-specific phenomena that are absent under squared loss, and show empirically that the spectral dynamics model training from small random initialization.
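The key observation in the abstract can be checked numerically with a minimal sketch. The construction below is an assumption for illustration, not the authors' exact setup: it uses a Sylvester Hadamard matrix `H` of size K = 8, builds a hypothetical logit matrix `Z = H diag(a) H^T / K` with arbitrary spectrum `a`, applies a column-wise softmax, and measures how far `H^T softmax(Z) H / K` is from diagonal. The helper names (`sylvester_hadamard`, `column_softmax`, `offdiag_norm`) are invented for this example. A random orthogonal basis is included as a contrast case.

```python
import numpy as np

def sylvester_hadamard(m):
    """Return the 2^m x 2^m Sylvester Hadamard matrix (entries +1/-1)."""
    H = np.array([[1.0]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H

def column_softmax(Z):
    """Softmax applied independently to each column of Z."""
    Z = Z - Z.max(axis=0, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def offdiag_norm(M):
    """Frobenius norm of the off-diagonal part of M."""
    return np.linalg.norm(M - np.diag(np.diag(M)))

K = 8
H = sylvester_hadamard(3)            # K x K, with H @ H.T == K * I
rng = np.random.default_rng(0)
a = rng.normal(size=K)               # arbitrary spectrum (assumption)

# Hypothetical logits that are diagonal in the Hadamard basis.
Z = H @ np.diag(a) @ H.T / K
P = column_softmax(Z)

# Change of basis: this should be diagonal up to rounding error.
hadamard_offdiag = offdiag_norm(H.T @ P @ H / K)

# Contrast: the same spectrum in a random orthogonal basis Q.
Q, _ = np.linalg.qr(rng.normal(size=(K, K)))
Z_generic = Q @ np.diag(a) @ Q.T
generic_offdiag = offdiag_norm(Q.T @ column_softmax(Z_generic) @ Q)

print("off-diagonal norm, Hadamard basis:", hadamard_offdiag)
print("off-diagonal norm, random basis:  ", generic_offdiag)
```

In this sketch the Hadamard case works because entry (i, j) of `Z` depends only on the bitwise XOR of i and j, a structure that survives the elementwise exponential and the column normalization. A generic orthogonal basis has no such structure, so the softmax output is not diagonal in it.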
Submission Number: 106