Keywords: neural collapse, implicit geometry, label imbalance, overparameterization
Abstract: It has been empirically observed that training large models with weighted cross-entropy (CE) beyond the zero-training-error regime is not a satisfactory remedy for label-imbalanced data. Instead, researchers have proposed the vector-scaling (VS) loss, a parameterization of the CE loss tailored to this modern training regime. The main framework for understanding how such parameterizations affect the gradient-descent path has been the theory of implicit bias. For linear(ized) models specifically, this theory explains why weighted CE fails and how the VS-loss biases the optimization path towards solutions that favor minorities. Beyond linear models, however, the implicit bias is much less well understood. To gain insight into the impact of different CE parameterizations in non-linear models, we investigate the implicit geometry of the classifiers and embeddings they learn. Our main result characterizes the global minimizers of a non-convex cost-sensitive SVM classifier for the so-called unconstrained features model, which serves as an abstraction of deep models. We also empirically study the convergence of SGD to this global minimizer, observing slow-downs as the imbalance ratio and the scaling of the loss hyperparameters increase. For deep nets, we show preliminary results on the empirical convergence to the predicted geometry.
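To make the distinction between weighted CE and the VS-loss concrete, here is a minimal per-example sketch. It assumes the commonly used form of the VS-loss, in which each class logit is rescaled multiplicatively and shifted additively before the softmax, and the loss is then weighted per class. The names `vs_loss`, `delta`, `iota`, and `w` are our own illustrative labels, not identifiers from the paper's code, and the hyperparameter values in the usage note are arbitrary.

```python
import math
from typing import Sequence


def vs_loss(logits: Sequence[float], y: int,
            delta: Sequence[float], iota: Sequence[float],
            w: Sequence[float]) -> float:
    """Illustrative VS-loss for a single example (assumed common form):

        -w[y] * log softmax(delta * logits + iota)[y]

    delta: per-class multiplicative logit scaling
    iota:  per-class additive logit shift
    w:     per-class loss weight (delta=1, iota=0 recovers weighted CE)
    """
    # Apply the per-class multiplicative and additive adjustments.
    adjusted = [d * z + b for d, z, b in zip(delta, logits, iota)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    lse = m + math.log(sum(math.exp(a - m) for a in adjusted))
    # Negative log-probability of the true class, scaled by its weight.
    return w[y] * (lse - adjusted[y])
```

With `delta=[1, 1]` and `iota=[0, 0]` this reduces to (weighted) CE. Shrinking `delta` for a class lowers the effective margin that class's logit earns, so the loss keeps pushing that logit further; this is one way a parameterization can bias training towards a class, shown here only as an illustration.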
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/on-the-implicit-geometry-of-cross-entropy/code)