Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
Keywords: Neural collapse, cross-entropy dynamics, spectral initialization
TL;DR: Hadamard initialization makes cross-entropy training analytically tractable in a simple two-layer classification model, allowing for an extension of the spectral initialization analysis of Saxe et al. (2013, 2019).
Abstract: We study cross-entropy (CE) dynamics in a two-layer linear network with orthogonal inputs, the simplest non-convex setting in which the implicit bias of CE remains unresolved. This setting coincides with the unconstrained features model used to study neural collapse (NC). Our analysis rests on a key observation: Hadamard initialization diagonalizes the softmax operator. This allows us to extend the spectral initialization framework that Saxe et al. (2013, 2019) developed for squared loss. We prove convergence to NC under spectral CE training and give the first finite-time analysis in this setting via an explicit Lyapunov function that decreases monotonically to NC. We further identify CE-specific phenomena absent under squared loss, and show empirically that spectral dynamics qualitatively model small random initialization.
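As a rough numerical illustration of why Hadamard structure interacts cleanly with the softmax, the sketch below checks an elementary identity: for any column h of a Sylvester Hadamard matrix other than the all-ones column, softmax(a·h) = (1 + tanh(a)·h)/k, so the softmax output stays in the span of the all-ones vector and h. This is only an assumption-laden toy check, not the construction or proof used in the paper; the helper names `sylvester_hadamard` and `softmax_columns` and the values of `k` and `a` are hypothetical choices made for the example.

```python
import numpy as np


def sylvester_hadamard(k):
    """Return the k x k Sylvester Hadamard matrix (k must be a power of 2)."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of 2")
    H = np.array([[1.0]])
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    while H.shape[0] < k:
        H = np.kron(base, H)  # doubles the size at each step
    return H


def softmax_columns(Z):
    """Numerically stable softmax applied to each column of Z."""
    Z = Z - Z.max(axis=0, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)


# Hypothetical example values: k classes and a logit scale a.
k, a = 8, 0.7
H = sylvester_hadamard(k)

# Drop the all-ones first column; the remaining columns have
# exactly k/2 entries equal to +1 and k/2 equal to -1.
H_rest = H[:, 1:]

P = softmax_columns(a * H_rest)
closed_form = (1.0 + np.tanh(a) * H_rest) / k

# Largest entrywise gap between the softmax and the closed form.
max_gap = np.abs(P - closed_form).max()
```

The identity follows because each such column is balanced, so the softmax normalizer equals k·cosh(a) and exp(±a)/cosh(a) = 1 ± tanh(a). The scale a therefore acts on a single scalar, tanh(a), rather than mixing directions.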
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 50