How Cross-Entropy Learns Data Modes: Emergence and Implicit Bias in the Unconstrained Features Model

Published: 29 May 2026, Last Modified: 29 May 2026 | HiLD at ICML 2026 Poster | CC BY 4.0
Keywords: Optimization Dynamics, Cross-entropy, Linear Networks, Unconstrained Features Model, Implicit Bias, Regularization Path
TL;DR: We study sequential learning in cross-entropy training through an explicit analysis of the regularization path and, under spectral initialization, of the gradient flow, in a canonical two-layer linear network with orthogonal inputs and imbalanced data.
Abstract: A classical result for linear networks trained with mean-squared-error loss is that gradient flow learns the singular modes of the data sequentially, in order of importance. The precise mechanics of such sequential emergence under cross-entropy loss remain largely unknown. We study this question in a minimal nonconvex setting: a two-layer linear network with orthogonal inputs and step-imbalanced classes, equivalent to the unconstrained features model used in neural collapse analyses. We derive a closed-form expression for the full regularization path, which exhibits sequential mode emergence, but with novel behavior: the active singular values diverge, and only the normalized logits converge, and they can overshoot the limiting geometry. We then show that a related sequential picture holds for gradient flow under an appropriate spectrally aligned initialization. Our analysis relies on a novel imbalance-adapted Hadamard basis in which softmax preserves a diagonal-plus-rank-one structure.
Submission Number: 129